[Part 2] Evaluating Offline Handwritten Text Recognition: Which Machine Learning Model is the Winner?
As we discussed in our previous post, Handwritten Text Recognition (HTR) is the conversion of handwritten text into machine-encoded text, a task made challenging by the variation in handwriting styles across individuals. After an initial model review and down-selection, we evaluated APIs from three major cloud providers (Google, AWS, and Microsoft) and two open source models (Tesseract and SimpleHTR) on handwritten text. In this post, we share the results of our evaluation and discuss which HTR models are best suited to different use cases.
HTR Evaluation Results
The Google Cloud Vision API was the clear top performer, with an average character error rate (CER) of 9.0% and an average match rate of 90.44%. We’ve also included examples below to visualize the segmentation and detection outputs of the different models when run on the same image (f0035_19_crop.png).
Google Cloud Vision API example
AWS Textract API example
AWS Rekognition API example
Microsoft Azure Read API example
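For reference, CER is typically computed as the Levenshtein (edit) distance between a model’s transcription and the ground-truth text, normalized by the reference length. Here is a minimal sketch in Python; the function names are our own illustrative choices, not part of any of the evaluated APIs:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a percentage of the reference length."""
    return 100.0 * levenshtein(reference, hypothesis) / max(len(reference), 1)
```

A lower CER is better; a one-character slip in a five-character word, for example, yields a CER of 20%.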
We found that the Google Cloud Vision API offers the best performance and flexibility for recognizing handwritten text. Thanks to its broad language support, the Google API also handles non-Latin scripts, languages other than English, and mixed-language text most gracefully. The AWS Textract and Rekognition APIs also performed relatively well on the NIST dataset, although we found that performance dropped off in separate testing on datasets with non-Latin-script characters.
For open source models and projects requiring on-premise deployment, Tesseract works well, although it requires some image preprocessing for optimal performance. Tesseract can detect the script a text is written in, but unlike the Google API it requires either an additional language identification (LID) integration or that the user specify which language model(s) to use. Both Tesseract and SimpleHTR can be retrained on additional handwriting data (for Tesseract, see the tesstrain repo), which is useful for custom datasets on which the out-of-the-box models may underperform. To develop robust, generalizable models, however, both require a large variety of handwritten samples along with ‘ground truth’ transcriptions.
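As a sketch of the kind of image preprocessing that helps Tesseract, the snippet below grayscales, auto-contrasts, and binarizes an image with Pillow before OCR. The function name and threshold value are our own illustrative choices, and the pytesseract call in the comment assumes the Tesseract binary is installed locally:

```python
from PIL import Image, ImageOps

def preprocess_for_tesseract(img: Image.Image, threshold: int = 128) -> Image.Image:
    """Grayscale, auto-contrast, and binarize an image ahead of OCR.

    This uses simple global thresholding; adaptive thresholding (e.g. via
    OpenCV) often works better on pages with uneven lighting.
    """
    gray = ImageOps.autocontrast(img.convert("L"))
    # Binarize: pixels above the threshold become white, the rest black.
    return gray.point(lambda p: 255 if p > threshold else 0)

# Usage with pytesseract (assumes the Tesseract binary is installed):
# import pytesseract
# text = pytesseract.image_to_string(
#     preprocess_for_tesseract(Image.open("f0035_19_crop.png")),
#     lang="eng",  # or e.g. "eng+fra" to combine language models
# )
```

Specifying `lang` explicitly sidesteps the language-identification gap noted above when you already know which scripts appear in your documents.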
We’re planning to post more model evaluations soon, so please keep an eye out for our next posts on Automatic License Plate Recognition (ALPR) and Named Entity Recognition (NER) services and models!
About the Authors
Joe Sherman is a Principal Data Scientist at Anno.Ai, where he leads several efforts focused on operational applications of advanced machine learning. Joe has a passion for developing usable and actionable machine learning models and techniques and bringing the cutting edge to operational users. Prior to his work at Anno, Joe led the research and development of a number of machine learning applications for the Department of Defense, focusing on deploying analytics at scale. Joe holds a BS in Chemistry from Virginia Tech.
Ashley Antonides is Anno.Ai’s Chief Artificial Intelligence Officer. Ashley has over 20 years of experience leading the research and development of machine learning and computer vision applications in the national security, public health, and commercial sectors. Ashley holds a B.S. in Symbolic Systems from Stanford University and a Ph.D. in Environmental Science, Policy, and Management from the University of California, Berkeley.