Document Classification




Use Snorkel Flow to build AI-powered document classification applications in a fraction of the time, without hand-labeling data.







Technology developed and deployed with the world’s leading organizations



Overview —

One Size Fits You, Not All


Achieve greater performance gains by exploiting domain-specific text features of your own data.



Faster, Lower-cost Development
Use programmatic labeling to develop high-quality AI applications in hours instead of spending weeks or months on expensive hand-labeling.
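As a rough illustration of what programmatic labeling means, the sketch below writes a few small heuristic functions (labeling functions) and combines their votes to label documents. It is a minimal, standard-library-only sketch and not Snorkel Flow's API: the function names, the keyword rules, the toy contract/invoice task, and the simple majority vote are all assumptions made for the example. A production system would typically weigh labeling functions by estimated accuracy rather than count votes equally.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions for a toy contract-vs-invoice task.
def lf_mentions_agreement(text):
    return "contract" if "agreement" in text.lower() else ABSTAIN

def lf_mentions_amount_due(text):
    return "invoice" if "amount due" in text.lower() else ABSTAIN

def lf_mentions_hereinafter(text):
    return "contract" if "hereinafter" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [
    lf_mentions_agreement,
    lf_mentions_amount_due,
    lf_mentions_hereinafter,
]

def majority_label(text, lfs=LABELING_FUNCTIONS):
    """Combine votes by simple majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between the top two labels
    return ranked[0][0]

# Made-up example documents.
docs = [
    "This Agreement is made between Acme Corp (hereinafter 'Supplier') and Beta LLC.",
    "Invoice 1042. Amount due: $300 within 30 days.",
    "Meeting notes from Tuesday.",
]
labels = [majority_label(d) for d in docs]
print(labels)
```

Documents that no function covers stay unlabeled, which signals where another labeling function may be worth writing.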
Higher-accuracy Models
Iterate on your application, using a closed-loop approach with intermediate results and analysis at every step to zero in on errors.
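One way to picture a single pass of such a loop: compare predictions against a small set of trusted labels, then group the mistakes so the most common confusion is addressed first. The sketch below is a generic illustration, not the platform's analysis tooling; the helper names and the toy labels are hypothetical.

```python
from collections import Counter

def accuracy(predicted, gold):
    """Fraction of predictions that match the trusted labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

def error_buckets(predicted, gold):
    """Count (gold, predicted) pairs for wrong predictions, most frequent first."""
    errors = Counter((g, p) for p, g in zip(predicted, gold) if p != g)
    return errors.most_common()

# Made-up labels for five documents.
gold = ["contract", "invoice", "invoice", "contract", "invoice"]
predicted = ["contract", "contract", "contract", "contract", "invoice"]

buckets = error_buckets(predicted, gold)
print(accuracy(predicted, gold))
print(buckets)
```

Here the largest bucket is invoices mistaken for contracts, which points to the next heuristic to refine before re-running the analysis.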
Flexible Integrations
Easily integrate labeling, training, and analysis pipelines defined over diverse input types (text, PDF, HTML, and more) with downstream applications using APIs or a Python SDK.
Easier SME Collaboration
Collaborate with subject matter experts (SMEs) to build complex classification apps intuitively while preserving natural information about data taxonomies.






Industry Use Cases —

Explore Enterprise Solutions For Classification


Build industry-specific AI applications combining state-of-the-art machine learning approaches with industry-specific best practices and last-mile connectors, all on an enterprise-scale platform.



FINANCIAL SERVICES



Contract Intelligence

Banks can classify contracts by terms and conditions to smoothly ensure regulatory compliance.
TELECOM & CYBER



Customer Segmentation

Telecom organizations can classify customer usage documents to target promotional offers.
HEALTHCARE



Clinical Trial Matching

Biotech organizations can classify patient records to identify actionable clinical trial candidates.
INSURANCE



Risk Classification

Insurance underwriters can classify policy documents by behavioral or occupational variables to assess risk.
SOFTWARE



Search Engine Optimization

Software companies can recognize named entities in customer search queries to optimize website content.
RETAIL



Product Recommendation

E-commerce sites can recognize entities in product descriptions (price, keywords, etc.) to improve recommender systems.






Case Study —

Google used Snorkel to replace 100K+ hand-annotated labels in critical ML pipelines for text classification.



Problem




Content, product, and event classification problems change too fast to hand-label, even with a significant annotation budget.

Solution




Google deployed early versions of Snorkel's core technology with three high-impact teams, repurposing many resources as labeling functions.

Results




Hours of labeling function development replaced 10-100K+ hand labels, significantly impacting the bottom line and accelerating ML adoption.

6 MONTHS
of hand-labeling data replaced in 30 mins
<0hrs
To develop the first custom ML model
52%
performance improvement
+0%
Accuracy for contract classification
100K+
hand labels replaced with programmatic approach
0K
Contracts processed in minutes







An End-to-end ML Platform —

Designed for Collaboration





Data Scientist Friendly


  • Integrated Jupyter notebooks
  • Guided error analysis
  • Ready-to-use models

Domain Expert Friendly


  • Intuitive, no-code UI
  • Rich dashboards and visualizations
  • Full-featured, push-button error analysis

Developer Friendly


  • Platform access via Python SDK
  • Online or batch API deployment
  • Containerized software for cloud or on-premises deployments






Resources —

Explore More About Snorkel


Learn more about groundbreaking techniques for programmatic labeling and weak supervision developed by Team Snorkel and the broader data science community.



NATURE COMMS

Weakly Supervised Classification of Aortic Valve Malformations Using …

J. Fries, et al, 2019
IEEE IVS

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving…

Z. Weng, et al, 2019
MEDIUM

Understanding Snorkel

Anna Zubova
Research Paper

Trove: Ontology-driven Weak Supervision for Medical Entity Classification

J. Fries, et al, 2020
AAAI

Training Complex Models with Multi-Task Weak Supervision

A. Ratner, et al, 2019
ACL

Training Classifiers with Natural Language Explanations

B. Hancock, et al, 2018
Research Paper

Train and You’ll Miss It: Interactive Model Iteration with Weak Supervision…

M. Chen, et al, 2020
CIDR

The Role of Massively Multi-Task and Weak Supervision in Software 2.0

A. Ratner, et al, 2019
FAST FORWARD LABS

Taking Snorkel for a Spin

Fast Forward Labs at Cloudera
Research Paper

SwellShark: A Generative Model for Biomedical NER without Labeled Data

J. Fries, et al, 2017
AI4 CYBER SUMMIT

State of AI in Cyber

Ai4 Cyber Summit
Course

Stanford University: CS229 – Machine Learning

Chris Ré