Sunday, March 15, 2015

OCR Engine Comparison

Today, I tested four OCR engines currently available to software developers.


OCR Engines Tested


  • Tesseract  Tesseract is free and open source.
  • Microsoft OCR Library Sample App  The Microsoft OCR Library appears to be free.  The implementation I used seems to work only for Windows Store apps.
  • Abbyy Fine Reader 12  Abbyy Fine Reader is packaged software, but it appears that they also license their engine to developers.
  • LEADTOOLS OCR SDK  LEADTOOLS is well known for its imaging-related developer tools.

(links go to company websites for more information)


The Test Image

I created this image containing text in various sizes and styles and fed it to the OCR engines.


The OCR test image

Tesseract Result

Tesseract output the following text:

DneszmsDcKzmngruHywurk> H -
How about a bigger font?
1254-5678'? I13
Whai‘ abow{’Hr\4'4«fow{'?
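
The post does not say how Tesseract was invoked, so the following is only a minimal sketch of one way to get text like the above, assuming the `tesseract` command-line program is installed and on the PATH. The file name `ocr-test.png` and the helper names are hypothetical; `stdout` as the output base and the `-l` language flag are standard Tesseract CLI options.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a Tesseract CLI call that prints to stdout."""
    # "stdout" as the output base tells tesseract to print the recognized
    # text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract(image_path, lang="eng"):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; substitute the path to your own test image.
    print(build_tesseract_command("ocr-test.png"))
```

Calling `run_tesseract("ocr-test.png")` would return the recognized text as a string; the exact output depends on the Tesseract version and the image.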

Microsoft OCR Library Result

Using the Microsoft OCR Library Sample App, this was the result:

microsoft OCR library test result

Abbyy Fine Reader Result

Here is a partial screenshot of the results produced by Abbyy Fine Reader 12:

Abbyy Fine Reader Result

LEADTOOLS OCR SDK Result

The LEADTOOLS OCR SDK produced the following results:

LEADTOOLS OCR SDK Result


Conclusion

These OCR engines seem to perform well on typewritten text, but they fail on the script font and on text that looks handwritten.  They are not as good as I expected them to be.
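
One way to put a number on "perform well" is a character error rate: the edit distance between the expected text and the OCR output, divided by the length of the expected text. This was not part of the test above; it is only a sketch, and the function names are my own. The only string taken from the results is the line Tesseract reproduced exactly; the other pair is a made-up illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(expected, recognized):
    """Edit distance divided by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return edit_distance(expected, recognized) / len(expected)


if __name__ == "__main__":
    # The line Tesseract recognized exactly scores a rate of zero.
    line = "How about a bigger font?"
    print(character_error_rate(line, line))
    # Made-up pair with one wrong character out of four.
    print(character_error_rate("abcd", "abxd"))
```

A rate of 0 means a perfect match, and a rate can exceed 1 when the engine emits far more characters than the original text contains.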



1 comment:

  1. There are several reasons for that, since LEADTOOLS, ABBYY, and our Yunmai solution are used by various industries around the world:

    1] the image you tried to OCR is way too small (low count of pixels per character)
    2] the sample size is ridiculously small (1 picture?)

    To appreciate the accuracy and performance of OCR, you need to test in "real" working conditions: 300 dpi for an A4 document and a large variety of samples.

    If you would like to change your mind about OCR and daily usage, try downloading a business card scanner application on your mobile phone. You'll see that, even if the OCR results are not 100% accurate, you can save a lot of time on data entry using this kind of software.
