Caitlin Cisar
Hello — I’m Caitlin

Language Data Scientist
& Digital Tinkerer
they / them

I’m a Language Data Scientist working at the intersection of natural language processing (NLP), machine learning (ML), and large language models (LLMs).

I’m passionate about advancing human language technology through human-centered ML. I firmly believe we’ve entered an era where safe "AI" starts with a thorough understanding of how humans and machines use language systems, thoughtful data practices, and the mindset that the future is always scientifically worthwhile.

Before starting this journey, I was a professional photographer for over a decade. When I’m not toiling away at linguistics or ML, I’m likely taking film photos, hiking, writing, gaming, or spending time with my lovely family.

See my research for notable projects, or visit my gallery to see my photography.

Skillset

NLP & NLU

Developing linguistic analysis pipelines and applying linguistic theory across production-grade NLP systems.

LLM & VLM Evaluation

Building internal benchmarks for frontier LLMs and VLMs across long-context and multi-turn dialogue with turn-level failure localization.

Annotation & Data Quality

Improving annotation pipelines and UIs, guideline authoring, and demystifying inter-annotator agreement metrics.

Python & Tooling

Coding from-scratch JS tools and dashboards, Naive Bayes classifiers, phonaesthetics generators, and standalone reporting.

Statistical Analysis

Transforming model failure modes and annotation disagreement into insight that drives data-backed decisions.

Synthetic Data

Generating tens of thousands of synthetic examples across dozens of use cases for experimental and research studies.

Timeline

A little over eight years working with linguistics across academia and industry, plus a creative life on the internet before that.

  1. 2025 — Present
    Language Data Scientist
    Innodata

    Focused on data quality, LLM evaluation, and workflow design for large-scale ML systems.

  2. 2022 — 2024
    M.A. Linguistics + HLT Certificate
    University of Colorado Boulder

    Graduate work in computational linguistics and human language technology, with projects spanning LLM-generated text detection and corpus research on gendered vocatives.

  3. 2021 — 2025
    ML Data Linguist
    Amazon Web Services

    Four years working on data and quality frameworks for production ML, plus co-leading research on model steerability.

  4. 2018 — 2021
    B.A. Linguistics + Philosophy Minor
    Iowa State University

    The formal start of my linguistics and philosophy studies. Topics included computational linguistics, language and gender, and analytic philosophy.

  5. 2012 — 2023
    Professional Photographer
    Central Iowa

    Spent over a decade as a (digital) photographer, specializing in portraiture, weddings, and cats.

  6. 2000s
    Digital Tinkerer
    The Internet

    Picked up HTML/CSS by being chronically online, learned photo-editing programs like PaintShop Pro, and played a lot of Neopets.