Sunday, March 15, 2015

OCR Engine Comparison

Today, I tested four OCR engines currently available to software developers.


OCR Engines Tested


  • Tesseract: Tesseract is free and open source.
  • Microsoft OCR Library Sample App: The Microsoft OCR library appears to be free. The implementation I used seems to work only for Windows Store apps.
  • Abbyy Fine Reader 12: Abbyy Fine Reader is a packaged software product, but Abbyy also appears to license its engine to developers.
  • LEADTOOLS OCR SDK: LEADTOOLS is well known for its imaging-related developer tools.



The Test Image

I created this image containing text in various sizes and styles and fed it to the OCR engines.


The OCR test image

Tesseract Result

Tesseract produced the following text:

DneszmsDcKzmngruHywurk> H -
How about a bigger font?
1254-5678'? I13
Whai‘ abow{’Hr\4'4«fow{'?

Microsoft OCR Library Result

Using the Microsoft OCR Library Sample App, this was the result:

(Screenshot of the Microsoft OCR Library test result)

Abbyy Fine Reader Result

Here is a partial screenshot of the results produced by Abbyy Fine Reader 12:

(Screenshot of the Abbyy Fine Reader 12 result)

LEADTOOLS OCR SDK Result

The LEADTOOLS OCR SDK produced the following results:

(Screenshot of the LEADTOOLS OCR SDK result)


Conclusion

These OCR engines seem to perform well on typewritten text but fail on the script font and on handwritten text. Overall, they are not as accurate as I expected them to be.
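One way to put a number on "not as accurate as I expected" (not something measured in this test) is the character error rate: the Levenshtein edit distance between the true text and the OCR output, divided by the length of the true text. Below is a minimal Python sketch, assuming you have the exact ground-truth text of the test image; the function names are illustrative, not part of any OCR engine's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] is the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match if equal
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the reference (ground-truth) text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # Assumed ground truth for one line of the test image (hypothetical).
    truth = "How about a bigger font?"
    print(character_error_rate(truth, "How about a bigger font?"))  # 0.0
```

A rate of 0.0 means a perfect match; scoring each engine's output line by line against the same ground truth would make the comparison above less subjective.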