Data development | Snorkel AI

Research-led data development

Datasets and environments that give frontier models domain expertise

Snorkel builds the human expert-authored datasets, evaluation environments, and benchmarks calibrated to push the limits of frontier model capability. Off-the-shelf or custom.

Where generic data runs out

Frontier model development stalls on data problems generic pipelines weren't built to solve, including distributional gaps in specialized domains, benchmark blind spots, and failure modes that only surface at scale. We build the data to solve them.

Two ways to get the data you need

The data that frontier models need most is rarely the data that already exists. Snorkel delivers it two ways: off-the-shelf for well-defined task areas, or custom-built for the gaps only you can see.

Snorkel Data Series

Curriculum-structured datasets for the task areas frontier models are pushing hardest, with rubrics, reviewer guidance, difficulty tiers, and eval slices built in.

Custom data development

Bespoke datasets, evaluation environments, and benchmark expansions to target the exact failure surface you're trying to close.

SNORKEL DATA SERIES

Built for the task areas that matter now

Co-developed with leading frontier AI teams. Each series is curriculum-structured to build difficulty progressively across a task area, with the evaluation infrastructure to match. A look at a few areas we support:

Agentic coding

Repo-grounded software engineering tasks inside real codebases, spanning multiple languages and difficulty tiers.

Terminal tasks

Real software engineering tasks grounded in production-style codebases, spanning multiple files and languages, with real test coverage.

Enterprise RL environments

Simulate real enterprise workflows with step-level reward signals. Built for agents that need to perform in production, not just on benchmarks.

Multimodal STEM

Multimodal scientific reasoning across figures, tables, and text. Calibrated so no single modality is enough to solve the task.

Specialized computer use agents

Long-horizon computer-use workflows across professional engineering desktop applications, with flows that require 50+ UI actions.

CUSTOM DATA DEVELOPMENT

Closing gaps existing datasets can’t reach

Custom data development engagements start with the failure surface: what the model can't do, where it's brittle, and what the correct evaluation criteria are. From there, Snorkel builds the datasets, environments, and benchmark expansions needed to close it.

Task specification and rubric design

Bespoke dataset construction

RL environment development

Benchmark and eval expansion

Provenance and adjudication

PUBLISHED RESEARCH