Information Extraction




Rapidly build AI-powered applications with Snorkel Flow that extract information from unstructured text, PDFs, tables, and forms across millions of documents, without expensive hand-labeling.






Technology developed and deployed with the world’s leading organizations



Overview —

Targeted Applications to Tackle Any Entity


Extract useful data from tables, cells, and forms, with each value linked to its headers, units, and references.



Faster, Lower-cost Development
Use programmatic labeling to develop high-quality AI applications in hours instead of spending weeks or months on expensive hand-labeling.
Rapidly Adaptable
Monitor for changes in the data, and rapidly adapt using built-in error analysis tools. Zoom in on errors to fine-tune training data & models with guided iteration.
High-accuracy Models
Leverage large amounts of labeled and unlabeled data, NLP primitives, and state-of-the-art model architectures to build high-accuracy models.
Flexible Integrations
Easily integrate labeling, training, and analysis pipelines defined over diverse input types (text, PDF, HTML, and more) with downstream applications using APIs or a Python SDK.
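
To give a feel for what programmatic labeling means in practice, here is a minimal sketch in plain Python. It does not use the Snorkel Flow SDK or any Snorkel API: the label names, the keyword heuristics, and the simple majority vote are all illustrative assumptions, and a real system would typically combine the votes with a learned model rather than a raw vote count.

```python
from collections import Counter

# Hypothetical label scheme for contract clauses (an assumption for this sketch).
ABSTAIN = -1
OTHER = 0
TERMINATION = 1


def lf_mentions_terminate(text):
    """Vote TERMINATION if the clause uses a form of 'terminate'."""
    return TERMINATION if "terminat" in text.lower() else ABSTAIN


def lf_mentions_notice_period(text):
    """Vote TERMINATION if the clause refers to a notice period in days."""
    lowered = text.lower()
    if "days' notice" in lowered or "days notice" in lowered:
        return TERMINATION
    return ABSTAIN


def lf_mentions_payment(text):
    """Vote OTHER if the clause is about invoices or payment."""
    lowered = text.lower()
    return OTHER if "invoice" in lowered or "payment" in lowered else ABSTAIN


LABELING_FUNCTIONS = [
    lf_mentions_terminate,
    lf_mentions_notice_period,
    lf_mentions_payment,
]


def majority_vote(text):
    """Combine labeling-function votes; abstain when there are no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    clauses = [
        "Either party may terminate this agreement with 30 days' notice.",
        "Payment is due on receipt of invoice.",
        "This agreement is governed by the laws of Delaware.",
    ]
    for clause in clauses:
        print(majority_vote(clause), clause)
```

Each heuristic is cheap to write and can be applied to every document at once, which is the idea behind replacing hand-labeling with labeling functions; clauses where the heuristics abstain or disagree are the natural candidates for error analysis.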






Industry Use Cases —

Information Extraction Customized for Your Workflow


Build industry-specific AI applications combining state-of-the-art machine learning approaches with industry-specific best practices and last-mile connectors, all on an enterprise-scale platform.



FINANCIAL SERVICES



Contract Intelligence

Banks can classify contracts by terms and conditions to smoothly ensure regulatory compliance.
TELECOM & CYBER



Customer Segmentation

Telecom organizations can classify customer usage documents to target promotional offers.
HEALTHCARE



Clinical Trial Matching

Biotech organizations can classify patient records to identify actionable clinical trial candidates.
INSURANCE



Risk Classification

Insurance underwriters can classify policy documents by behavioral or occupational variables to assess risk.
SOFTWARE



Search Engine Optimization

Software companies can recognize named entities in customer search queries to optimize website content.
RETAIL



Product Recommendation

E-commerce sites can recognize entities in product descriptions (price, keywords, etc.) to improve recommender systems.






Case Study —

A top U.S. bank uses Snorkel Flow to quickly build AI applications that classify and extract information from contracts and other legal documents.



Problem




The bank estimated that, for a time-sensitive use case, labeling data by hand would take over a month.

Solution




With Snorkel Flow, the team produced, in under 24 hours, an AI-powered contract intelligence application that was over 99% accurate.

Results




The resulting AI application was quickly and easily adapted to new problems.

99.1%
Snorkel Flow Accuracy
<0hrs
To develop the first custom ML model
<24hrs
From problem start
+0%
Accuracy for contract classification
>250K
# Documents processed
0K
Contracts processed in minutes







An End-to-end ML Platform —

Designed for Collaboration





Data Scientist Friendly


  • Integrated Jupyter notebooks
  • Guided error analysis
  • Ready-to-use models

Domain Expert Friendly


  • Intuitive, no-code UI
  • Rich dashboards and visualizations
  • Full-featured, push-button error analysis

Developer Friendly


  • Platform access via Python SDK
  • Online or batch API deployment
  • Containerized software for cloud or on-premises deployments






Resources —

Explore More About Snorkel


Learn more about groundbreaking techniques for programmatic labeling and weak supervision developed by Team Snorkel and the broader data science community.



NATURE COMMS

Weakly Supervised Classification of Aortic Valve Malformations Using …

J. Fries, et al., 2019
IEEE IVS

Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving…

Z. Weng, et al., 2019
MEDIUM

Understanding Snorkel

Anna Zubova
Research Paper

Trove: Ontology-driven Weak Supervision for Medical Entity Classification

J. Fries, et al., 2020
AAAI

Training Complex Models with Multi-Task Weak Supervision

A. Ratner, et al., 2019
ACL

Training Classifiers with Natural Language Explanations

B. Hancock, et al., 2018
Research Paper

Train and You’ll Miss It: Interactive Model Iteration with Weak Supervision…

M. Chen, et al., 2020
CIDR

The Role of Massively Multi-Task and Weak Supervision in Software 2.0

A. Ratner, et al., 2019
FAST FORWARD LABS

Taking Snorkel for a Spin

Fast Forward Labs at Cloudera
Research Paper

SwellShark: A Generative Model for Biomedical NER without Labeled Data

J. Fries, et al., 2017
AI4 CYBER SUMMIT

State of AI in Cyber

Ai4 Cyber Summit
Course

Stanford University: CS229 – Machine Learning

Chris Ré