But today the situation is entirely different, because the available OCR engines have improved enormously. Black font on a grey background: no problem. Document skew: typically no problem. Confusion caused by lines: almost never. GLYNT’s accuracy on these scans is now in the high 90s (percent), only slightly below its accuracy on clean PDFs.
GLYNT’s advanced machine learning (ML) depends on the OCR engine. If the OCR can’t see a word, the ML can’t process it. And if the OCR misreads the text, the error weakens the ML results. Our solution is to bring in all the major world-class OCR engines: AWS, Microsoft (MSFT), Google (GOOG), and ABBYY. We typically use AWS, but the other OCR engines are available for testing when needed.
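How GLYNT actually decides when to switch engines isn't described here, so the following is only a minimal Python sketch of a "default engine first, others when needed" setup under our own assumptions. The engine functions, the per-word confidence scores, and the 0.9 threshold on mean word confidence are all hypothetical stand-ins, not GLYNT’s pipeline or any vendor’s API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class OcrWord:
    text: str
    confidence: float  # hypothetical per-word score in [0.0, 1.0]


# Hypothetical engine signature: raw image bytes in, recognized words out.
OcrEngine = Callable[[bytes], List[OcrWord]]


def mean_confidence(words: List[OcrWord]) -> float:
    """Average word confidence; an empty result counts as 0.0."""
    if not words:
        return 0.0
    return sum(w.confidence for w in words) / len(words)


def run_ocr(
    image: bytes,
    engines: Dict[str, OcrEngine],
    default: str = "aws",
    threshold: float = 0.9,  # hypothetical cut-off, not a published value
) -> Tuple[str, List[OcrWord]]:
    """Run the default engine; if its output looks weak, try the others and keep the best."""
    best_name = default
    best_words = engines[default](image)
    best_score = mean_confidence(best_words)
    if best_score >= threshold:
        return best_name, best_words

    # Fall back to the remaining engines in the order they were registered.
    for name, engine in engines.items():
        if name == default:
            continue
        words = engine(image)
        score = mean_confidence(words)
        if score > best_score:
            best_name, best_words, best_score = name, words, score
        if best_score >= threshold:
            break  # good enough; skip the remaining engines
    return best_name, best_words


# Usage with stub engines standing in for real OCR services.
engines = {
    "aws": lambda img: [OcrWord("Invoice", 0.95), OcrWord("Tota1", 0.60)],
    "google": lambda img: [OcrWord("Invoice", 0.97), OcrWord("Total", 0.93)],
    "abbyy": lambda img: [OcrWord("Invoice", 0.90), OcrWord("Total", 0.88)],
}
name, words = run_ocr(b"fake image bytes", engines)
print(name, [w.text for w in words])  # prints: google ['Invoice', 'Total']
```

Keeping one default engine keeps the common path cheap, and the fallback loop only pays for extra OCR calls on the scans where the default output looks weak.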
So the bad scan has largely disappeared! The upgraded OCR engines are not perfect, and some documents remain problematic. But if you have that stack of scanned documents, send them over. They may turn out to be gems after all.