Learn more about GLYNT: why we built it, and how we knew what features to include, in this behind-the-scenes interview. Interview conducted by Shaul Stone (SS).
SS: Hey, David. Thanks for sitting down with me. So, can you tell me a little bit about GLYNT? What is it, exactly?
DAVID: That is my favorite question these days. GLYNT is a machine learning system that extracts data trapped in documents such as invoices, ID cards, and health lab reports. GLYNT delivers clean, labeled data ready for use. It’s a system we built in-house from the ground up to liberate data from documents.
SS: What’s the key innovation?
DAVID: GLYNT started with our energy customers at WattzOn, where we provide utility bill data to customers in the cleantech and smart home industries. There are a multitude of layouts on utility bills across the U.S. Those differences make it time-consuming and expensive to extract data from the bills for customers. Software engineers are reduced to writing hand-coded solutions for each utility. So we set out to automate data extraction from documents. We needed to account for the layout variations and hit accuracy in the “golden zone” of 98% or higher. That’s exactly what we built, but it was only the beginning…
SS: Only the beginning?
DAVID: Yes, we soon realized we had automated data extraction with high precision, low error rates, and minuscule training set sizes. This is a really big deal. We could account for document variations and custom field requirements. What we built has value for other industries too, such as healthcare, accounting, and supply chains: industries that require high accuracy and rely on semi-structured data trapped in faxes, PDFs, and scanned documents. So we launched a new website, GLYNT.AI, and expanded our customer base.
SS: Can you explain these features in a bit more detail? How do they benefit your customers?
DAVID: Automated data extraction is faster, cheaper, and more accurate than the alternatives. GLYNT has a very high accuracy rate, typically 98%. Our studies show that errors are costly to fix: 11X the cost of data extraction itself. Every machine learning system is challenged by edge cases, and when GLYNT has low confidence in a result (e.g., lower than 50% confident) it leaves the data field empty, so bad data is not perpetuated or hidden. The system points humans to these exceptions, so they can be easily fixed if needed. This is the cheapest way to get above 99% accuracy, which is what the market needs before extracted data can be inserted into structured databases. And every fix makes our system smarter.
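The empty-field behavior David describes can be sketched in a few lines. This is a minimal illustration, not GLYNT's actual implementation; the field names, values, and confidence scores below are hypothetical:

```python
# Minimal sketch of confidence thresholding: keep a value only when the
# model's confidence meets the threshold; otherwise leave the field empty
# and flag it for human review.

CONFIDENCE_THRESHOLD = 0.50  # below this, the field is left empty

def apply_threshold(extracted_fields):
    """extracted_fields maps a field name to a (value, confidence) pair."""
    results = {}
    exceptions = []
    for name, (value, confidence) in extracted_fields.items():
        if confidence >= CONFIDENCE_THRESHOLD:
            results[name] = value
        else:
            results[name] = ""        # never perpetuate low-confidence data
            exceptions.append(name)   # point a human at the exception
    return results, exceptions

# Hypothetical extraction results from a utility bill:
fields = {
    "account_number": ("123-456", 0.97),
    "total_due": ("$84.12", 0.42),   # low confidence -> empty and flagged
}
clean, needs_review = apply_threshold(fields)
```

Here `clean` carries only trusted values, and `needs_review` is the exception list a human would work through.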
GLYNT also flips the script on the size of the document set needed to train the machine learning models: GLYNT requires just seven documents to learn from. With such small training sets, you don’t need to be a Fortune 500 company to create high-quality data extraction models. I like to say it cuts the big data players down to size.
SS: And the secret sauce is…?
DAVID: GLYNT uses Mixed Formal Learning to quickly tune machine learning models and overcome document variations. Legacy solutions, like Zonal OCR or scripts of regular expressions that pull data out of PDFs, are brittle and require constant maintenance. Our system is flexible and easily encompasses new fields, new customer needs, and even new industries. It’s Ibuprofen for data solution headaches. That’s a big deal.
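To see why hand-coded extraction is brittle, consider a toy regex tuned to one utility's wording. The bill text and the pattern here are invented for illustration; they are not from any real bill or from GLYNT:

```python
import re

# A pattern hand-tuned to one utility's bill layout.
PATTERN = re.compile(r"Total Amount Due:\s*\$(\d+\.\d{2})")

bill_a = "Total Amount Due: $84.12"          # the layout the regex was written for
bill_b = "Amount Due by 06/01: $84.12 USD"   # another utility's wording

match_a = PATTERN.search(bill_a)  # matches, captures "84.12"
match_b = PATTERN.search(bill_b)  # no match: someone must hand-code a new rule
```

Every new layout forces a new rule, which is exactly the maintenance burden a learned model avoids.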
SS: Getting a machine learning system off the ground must have had its fair share of challenges. What were some of the obstacles you overcame in the design and launch phases?
DAVID: Our biggest stumbling blocks were usability and quality assurance. These are common problems with machine learning solutions and tough to overcome.
SS: So, how did you get over the hump?
DAVID: First, we had to build the technology infrastructure. Our Elastic AI Workbench allows GLYNT to easily add and orchestrate additional ML capabilities into our ecosystem. It’s self-contained and elastic, so we can serve both small and high-volume customers.
Second, we had to verify our results and build in the infrastructure to monitor our performance. This was a process of trial and error, with the goal of creating a highly transparent AI system.
SS: Where did the name “GLYNT” come from?
DAVID: I’m glad you asked. In the process of designing the product, we were trying to envision ways to visualize how it functions. The best description we came up with was clear light hitting a document and breaking into chunks of color. Those color chunks are the extracted data, glinting in the sunlight. For better or worse, “GLINT” was already taken, so we went with “GLYNT.” We think the Y makes it cooler anyway.
SS: What are your customers saying about GLYNT? And where do you go next?
DAVID: Customers consistently tell us they are glad GLYNT came along. Not to toot our own horn, but we often get feedback on how simple and scalable a solution GLYNT is. It relieves major chokepoints in company workflows and helps our customers grow their revenues.
So, the future of GLYNT is clear: high-performance data extraction from documents, with clean, labeled data delivered by API. Our mission is to continue to make this as easy as possible. We’re working hard on the user experience, so that our customers can do as much of the document management and data extraction workflow as they want.
We also pay attention to roles. We don’t want software engineers doing data detail work; that is best left to data analysts. We want to support good data security and governance, so we need an easy-to-integrate API that meets modern IT department requirements.
SS: David, thank you for taking the time to chat with me today, but as a bit of a coffee snob, I always like to ask: do you always take your coffee black?
DAVID: Haha. Yes, I do. I feel like it gets me going more efficiently. And, you know, I’ve really come to enjoy the way the light glints off the liquid black surface.