Showing posts from April, 2014

Auto image rotation

Using Tesseract OCR to automatically rotate scanned text to the correct orientation.

Problem

From time to time, everyone ends up with a batch of scanned text images that they don't want to rotate one by one into the correct orientation.

Solution

I decided to write a small script that automatically detects the orientation of pages using OCR and dictionary matching. First, I used the open-source Tesseract engine to extract text from the images. Second, I wrote down a list of the words most likely to occur in a text (the example below has only 5-6; feel free to write your own). Finally, the script tries each rotation of an image, runs it through OCR, and checks the recognized text against the dictionary. Because OCR accuracy isn't 100%, I allow a small deviation when comparing words. See the code below. Just copy the 3 files listed below into your ~/bin directory and run tesseract_rotate_all.

Dependencies

You need to install the Perl module re::engine::TRE - TRE regular expression engine download here

recognize_good_rotation perl script to evaluate