The Power of
Snorkel AI's technology is based on years of research represented in 40+ publications around programmatic labeling, weak supervision, and broader ML techniques.
40+ Peer-Reviewed Publications
In collaborations with the US Department of Veterans Affairs and the US Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provided 132% average improvements to predictive performance over prior heuristic approaches and came within an average 3.6% of the predictive performance of large hand-curated training sets.
In collaborations with users at research labs and Stanford Hospital, and on open-source datasets, Snorkel outperformed other automated approaches like semi-supervised learning by up to 14.4 F1 points.
The slice-based weak supervision approach in Snorkel improved over baselines in computational complexity, and in slice-specific and overall performance by up to 19 and 4.6 F1 points, respectively, on applications spanning natural language understanding and computer vision benchmarks as well as production-scale industrial systems.
On three classification tasks at Google, a Snorkel installation was used to create classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, convert non-servable organizational resources to servable models for an average 52% performance improvement, and execute over millions of data points in tens of minutes.
The cross-modal weak supervision approach in Snorkel yielded models that on average perform within 1.75 and 10.3 ROC-AUC points of those supervised with physician-years and physician-months of hand labeling, respectively, while using only person-days of developer time and clinician work, a time saving of 96%.
Conventional Approaches
The Problem with Legacy AI
Black box models or APIs ignore the nuances of your data and objectives, and offer no way to customize, adapt, or audit their behavior.
Rules-based approaches often don’t generalize as well as ML models on complex, unstructured data or adapt easily to data drift or changing objectives.
Hand-labeled ML is notoriously expensive and slow, especially when subject matter experts are required, with limited ability to iterate, adapt, audit, or be privacy compliant.
New Approach to AI
Snorkel introduces a radically new approach that enables users to programmatically label massive amounts of training data by writing “labeling functions.” This approach has advanced the state of AI but, like any new paradigm, it has also introduced new challenges, which Team Snorkel has spent over half a decade researching. The result of this work is the Snorkel Flow platform.
The Snorkel Framework
Snorkel’s framework is based on weak supervision, a classical but newly resurgent set of techniques proven in research as well as hundreds of production deployments. The key idea in weak supervision is to train machine learning models using more efficient but potentially less accurate or “noisier” labels instead of “ground truth” labels provided by groups of expert annotators. Such noisy or so-called weak labels are easier to acquire in massive quantities, often resulting in higher-quality models overall.
Snorkel Flow extends and subsumes years of weak supervision research with the concept of a “labeling function”. In Snorkel Flow, users write labeling functions which capture heuristics from domain knowledge of the data, and can leverage existing resources such as input from models, expert systems, and knowledge bases.
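To make the idea of a labeling function concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow API: the function names, the spam/ham task, the toy texts, and the `BLOCKLIST` resource (standing in for an existing knowledge base) are all hypothetical and chosen only for illustration. Each function encodes one heuristic and either votes for a label or abstains.

```python
# Illustrative only: plain-Python stand-ins, not the Snorkel Flow API.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Hypothetical existing resource, standing in for a knowledge base.
BLOCKLIST = {"prize", "winner"}


def lf_contains_link(text):
    """Heuristic: messages containing a URL are likely spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Heuristic: very short messages are likely legitimate."""
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_blocklist(text):
    """Heuristic backed by an external resource (the blocklist)."""
    words = set(text.lower().split())
    return SPAM if words & BLOCKLIST else ABSTAIN


def apply_lfs(lfs, texts):
    """Build a label matrix: one row per text, one column per function."""
    return [[lf(text) for lf in lfs] for text in texts]


texts = [
    "Claim your prize at http://example.com now",
    "see you at lunch",
    "The quarterly report is attached for your review",
]
lfs = [lf_contains_link, lf_short_message, lf_blocklist]
label_matrix = apply_lfs(lfs, texts)
```

In this toy run the first text gets two spam votes, the second gets one ham vote, and the third is left unlabeled by every function, which illustrates how labeling functions differ in coverage.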
Weak labels typically overlap and conflict, vary in accuracy and dataset coverage, and may even hide latent dependencies and correlations. Snorkel automatically models and combines their outputs using a generative model that looks for patterns of agreements and disagreements, then uses the resulting probabilistic labels to train a discriminative model. Team Snorkel has spent years on theoretically and empirically grounded research advances that go into the foundations of Snorkel Flow, and continues to integrate the latest state-of-the-art advances.
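As a rough illustration of how conflicting weak labels can be turned into probabilistic labels, the sketch below uses a simple accuracy-weighted vote. This is a deliberate simplification and not Snorkel's generative model: it assumes the labeling functions are conditionally independent, that the two classes are equally likely a priori, and that each function's accuracy is already known (the `accuracies` values are made up). The generative model described above instead estimates such quantities from the patterns of agreement and disagreement, without ground-truth labels.

```python
import math

# Illustrative only: a simplified stand-in for a generative label model.
ABSTAIN, NEG, POS = -1, 0, 1


def combine_votes(votes, accuracies):
    """Return P(label = POS) for one data point.

    Assumes independent labeling functions with known accuracies and a
    uniform class prior, so each non-abstaining vote adds or subtracts
    log(acc / (1 - acc)) from the log-odds of the positive class.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence
        weight = math.log(acc / (1.0 - acc))
        log_odds += weight if vote == POS else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


# Rows are data points, columns are labeling functions (hypothetical values).
label_matrix = [
    [POS, POS, ABSTAIN],          # two functions agree
    [POS, NEG, NEG],              # functions conflict
    [ABSTAIN, ABSTAIN, ABSTAIN],  # no coverage
]
accuracies = [0.9, 0.6, 0.7]  # assumed, not estimated from data

probabilistic_labels = [combine_votes(row, accuracies) for row in label_matrix]
```

With these assumed accuracies, the conflicting row still leans positive because the single accurate function outweighs the two weaker ones, and the uncovered row stays at 0.5. Probabilities like these can then serve as soft targets when training a discriminative model.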
Snorkel: Recommended For Modern ML Practice
Practical Weak Supervision: Doing More with Less Data
Generating Labels for Model Training Using Weak Supervision review