WattzOn surpasses human accuracy in text extraction from structured documents

Apr 17, 2018 | GLYNTBlog

MOUNTAIN VIEW, CA (April 17, 2018)

WattzOn announced today that its flagship machine learning product, Mr Bill, has surpassed human-level performance (also called superhuman accuracy) in automated text extraction from structured documents such as invoices, forms, lab reports and medical records. A recent WattzOn study on real-world documents measured an F1 score of 98%, reflecting both high precision (99%) and high recall (98%). Mr Bill also trains on a small number of example documents, approaching one-shot learning. Hosted on a proprietary, fast and elastic AI workbench, Mr Bill is a complete, scalable machine learning system for extracting data from structured documents.

For data in PDF form, Mr Bill achieves a measured F1 score of 98% on fully supported fields. For the same documents converted to images, its estimated F1 score is 96%. The F1 score is a balanced measure of precision and recall, which makes it useful for comparing performance against the data extraction systems in use today. Manual systems typically have error rates of 10% or more before a final cycle of human review and correction.
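The F1 score is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The minimal Python sketch below applies that formula to the precision and recall rates quoted in this post and in its sources. The function name `f1_score` and the labels are our own illustration, not part of Mr Bill or any WattzOn API.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is extracted correctly
    return 2 * precision * recall / (precision + recall)


# Precision/recall figures quoted in this post and its listed sources.
systems = [
    ("Mr Bill (PDF)", 0.99, 0.98),
    ("ocrsolutions.com figures", 0.93, 0.90),
    ("Dropbox OCR pipeline figures", 0.87, 0.95),
]

for name, p, r in systems:
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

This prints F1 values of 0.985, 0.915 and 0.908. The first rounds to the 98% reported above. Because the harmonic mean is pulled toward the lower of the two rates, a system cannot reach a high F1 by trading recall away for precision, or the reverse.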

Mr Bill is a complete, advanced machine learning system for highly scalable automated text extraction. It runs on WattzOn's Elastic AI Workbench, which provides pipelining, orchestration, job control, automatic hyperparameter tuning and an elastic API. Mr Bill's key features are not only its better-than-human accuracy and recall, but also its uniquely low training costs. Because it needs only 20 training examples per text field, the product approaches the goal of one-shot learning, and these low training requirements make it economical to support the natural variation among documents.

The WattzOn study first measured F1 scores for four different document forms and layouts in PDF format. For each set, 20–30 example documents were used for training and 80–100 documents were held out, unseen, for validation. On a modest elastic computing cluster, early measurements of extraction speed came to 7 seconds per field. The study showed that Mr Bill's classification outperformed current real-world systems on all four document formats.
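The per-layout protocol described above (a small training set per document layout, with the remaining documents held out unseen) can be sketched in Python as follows. This is an illustration of a generic held-out split, not WattzOn's actual tooling. The helper `split_by_layout`, the synthetic document IDs, the 120 documents per layout and the fixed training size of 25 are all hypothetical choices made to fall inside the 20–30 / 80–100 ranges the study reports.

```python
import random
from collections import defaultdict


def split_by_layout(docs, n_train, seed=0):
    """Hypothetical helper: split (layout, doc_id) pairs into per-layout
    training and holdout sets. For each layout, n_train documents are drawn
    at random for training; the rest are held out, unseen, for validation."""
    by_layout = defaultdict(list)
    for layout, doc_id in docs:
        by_layout[layout].append(doc_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, holdout = {}, {}
    for layout, ids in by_layout.items():
        ids = sorted(ids)  # deterministic order before shuffling
        rng.shuffle(ids)
        train[layout] = ids[:n_train]
        holdout[layout] = ids[n_train:]
    return train, holdout


# Synthetic corpus (assumption): four layouts with 120 documents each.
docs = [(f"layout_{k}", f"doc_{k}_{i:03d}") for k in range(4) for i in range(120)]
train, holdout = split_by_layout(docs, n_train=25)

for layout in sorted(train):
    print(layout, len(train[layout]), len(holdout[layout]))
```

Each layout ends up with 25 training documents and 95 held-out documents. Splitting within each layout, rather than across the pooled corpus, ensures that every form type is both trained on and validated. A real evaluation would, of course, use the actual documents in place of these synthetic IDs.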

As organizations go through digital transformations, Mr Bill provides important business benefits:

** Liberating data trapped in structured documents
** Providing high-value semi-structured data for use within other machine learning and NLP algorithms
** Removing the processing-speed bottleneck of high-quality text extraction: reliance on human data entry teams

Mr Bill, with its Elastic AI Workbench, delivers text data extraction with the precision, recall, speed and scale that modern enterprises need, reducing operating costs and expanding product and market opportunities. Because training costs are very low, Mr Bill's ROI is measured in days and weeks, not months and years.

Mr Bill is immediately available. Please contact WattzOn for a customized product demonstration.

ABOUT MR BILL

Mr Bill applies machine learning algorithms in an innovative, complex combination to extract data quickly and accurately. It is supported by the Elastic AI Workbench, an infrastructure that manages, trains and tunes models and runs data extraction in an elastic computing cloud, and that provides a strong foundation for other AI initiatives. Mr Bill is available through a SaaS software license to enterprise customers in the healthcare, energy, defense, government and finance sectors.

ABOUT WATTZON

WattzOn provides text extraction data services through its complete machine learning system, MR BILL, and a vertical API for utility data, LINK, that covers 50 states and 94 million homes. WattzOn is a women-led company, with Martha Amram, CEO, and Sandra Carrico, VP of Engineering and Chief Data Scientist.

With performance that surpasses that of humans and uncommonly low setup costs, MR BILL is a natural fit for markets with fragmented data sources and high data volumes, providing valuable semi-structured data for AI-powered analytics and digital enterprise automation. WattzOn's LINK product serves market leaders in the solar, smart home and commercial utility bill processing markets, and is expanding into consumer credit. Both products are available via SaaS software license.

Sources for Performance Comparison:
— Human Data Entry, Two Passes: Customer interviews
— Software Systems: See https://ocrsolutions.com/typical-field-acceptance-rate-ocr-accuracy-level/ (F1 score calculated with 93% precision and 90% recall) and https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning (F1 score calculated with 87% precision and 95% recall)
— System of Software and Human Review: Customer interviews
