When comparing Tesseract OCR vs GOCR, the Slant community recommends GOCR for most people. In the question “What are the best Linux OCR programs?”, GOCR is ranked 1st while Tesseract OCR is ranked 2nd. The most important reason people chose GOCR is:
GOCR is very easy to use and it's callable from the command line. Just type `gocr -h` and you will have all the available commands with the needed information on how to use them.
Pro Works great with 300 DPI files
Pro Support for 40 languages
In the beginning, Tesseract only supported English. Newer versions support up to 40 languages.
Pro Very easy to use (see the manual page, not built-in help)
See the manual page.
Pro Can create sandwich PDF files
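As a rough illustration of the PDF output mentioned above, the sketch below builds a Tesseract command line that requests the `pdf` output config, which writes a searchable PDF (the scanned image with a text layer). The helper name `tesseract_pdf_command` and the file names are hypothetical, and the sketch assumes a Tesseract version recent enough to ship the `pdf` config and an installed `eng` language pack.

```python
import shutil
import subprocess


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the argument list for a Tesseract run that writes output_base + '.pdf'."""
    # CLI shape: tesseract <image> <outputbase> [-l lang] [configfile]
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract if it is installed; return the PDF path, or None if it is missing."""
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    return output_base + ".pdf"
```

Calling `make_searchable_pdf("page.tif", "page")` would, under these assumptions, leave a `page.pdf` next to the input.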
Pro Easy, straightforward use
GOCR is very easy to use and it's callable from the command line. Just type `gocr -h` and you will have all the available commands with the needed information on how to use them.
Pro Can be used with different frontends
Since GOCR is compatible with multiple different frontends, it can be very easy to port it to different operating systems.
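To make the command-line use described above concrete, here is a minimal sketch of calling GOCR from a script. It assumes the `gocr` binary is on the `PATH` and uses the `-i` (input file) and `-o` (output file) options from the GOCR manual page; the helper names and file names are hypothetical.

```python
import shutil
import subprocess


def gocr_command(image_path, output_path=None):
    """Build the argument list for a basic GOCR run."""
    cmd = ["gocr", "-i", image_path]
    if output_path is not None:
        # Without -o, GOCR writes the recognized text to standard output.
        cmd += ["-o", output_path]
    return cmd


def run_gocr(image_path):
    """Return the recognized text, or None if gocr is not installed."""
    if shutil.which("gocr") is None:
        return None
    result = subprocess.run(
        gocr_command(image_path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

A frontend could wrap `run_gocr` in the same way, since all it needs is the executable and a readable image file.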
Con Rudimentary image processing
Tesseract's image processing is very rudimentary. To get the most out of it, you need to use a preprocessor or an image that has already been processed.
A 100 dpi image that is perfectly readable to humans results in a huge number of failed characters, even if the source is free from physical scan artifacts (i.e. printed to file). If possible, it's better to use 300 dpi image files.
Con Only reads TIFF files
Tesseract can only read TIFF files. If you want to use a picture with a different format (JPEG, PNG, PDF) you need to convert it first.
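The convert-first workflow described above might look like the following sketch. It assumes ImageMagick's `convert` tool is available for the format conversion (an assumption, since any converter that writes TIFF would do), and the helper name `ocr_pipeline_commands` is hypothetical. If your Tesseract build already accepts the source format, the conversion step is unnecessary.

```python
from pathlib import Path


def ocr_pipeline_commands(source_path):
    """Return the two commands: convert the image to TIFF, then OCR the TIFF."""
    src = Path(source_path)
    tiff = src.with_suffix(".tif")        # e.g. scan.png -> scan.tif
    output_base = tiff.with_suffix("")    # Tesseract appends .txt itself
    return [
        ["convert", str(src), str(tiff)],         # ImageMagick (assumed installed)
        ["tesseract", str(tiff), str(output_base)],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in ocr_pipeline_commands("scan.png"):
        print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` in order once both tools are installed.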
Con Couldn't OCR a clean PDF saved to file (containing images only) after conversion to PNM (GOCR's native format)
Con Fails with multiline layouts
GOCR does not work very well with multiline layouts.