Swift 3 - How to Improve Image Quality for Tesseract

Swift 3 - Which pixel format type do I use for the best Tesseract text recognition?

kCVPixelFormatType_24RGB, kCVPixelFormatType_24BGR, kCVPixelFormatType_32ARGB, kCVPixelFormatType_32BGRA, kCVPixelFormatType_32ABGR, kCVPixelFormatType_32RGBA: any of these is a good choice, and they are the most common options (i.e. 24-bit bitmap, 24-bit PNG, 32-bit bitmap, 32-bit PNG, etc.).

Basically, 24-bit contains only the R, G, B pixel components; the alpha channel is missing entirely. 32-bit adds an alpha channel, so R, G, B, A components are used. Usually 24-bit works really well with Tesseract, and 32-bit works really well when the alpha channel is constant (0x00 or 0xFF for every pixel). This is equivalent to using the BMP or PNG format.

Note: the above covers only pixel formats. Ideally, your image needs to be decent quality as well; the best results usually come from white text on a black background, black text on a white background, or any strong contrast between the text and the background. Recognition depends on the image itself, not just the format.
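To request one of these formats from the camera, you can set it on the capture output. Here is a minimal sketch (assuming `session` is an already-configured AVCaptureSession) that asks an AVCaptureVideoDataOutput for 32-bit BGRA frames:

#import <AVFoundation/AVFoundation.h>

// Sketch: request kCVPixelFormatType_32BGRA frames from the camera.
// Assumes `session` is an AVCaptureSession that already has an input.
AVCaptureVideoDataOutput *videoOutput = [[AVCaptureVideoDataOutput alloc] init];
videoOutput.videoSettings = @{
    (NSString *)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)
};

if ([session canAddOutput:videoOutput]) {
    [session addOutput:videoOutput];
}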

As for capture settings: allocating an AVCapturePhotoSettings object gives you the default settings. You can create your own using:

https://developer.apple.com/reference/avfoundation/avcapturephotosettings/1648673-photosettingswithformat?changes=latest_minor&language=objc

It tells you what parameters to pass, and it also lets you determine whether the capture should be high resolution, a Live Photo, etc. See here for more: https://developer.apple.com/reference/avfoundation/avcapturephotosettings?changes=latest_minor&language=objc
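As a minimal sketch (assuming `photoOutput` is an AVCapturePhotoOutput already added to your session, and `self` implements the capture delegate), creating settings that request an uncompressed BGRA buffer and a high-resolution capture could look like this:

// Sketch: photo settings requesting an uncompressed 32-bit BGRA buffer
// and a high-resolution capture.
AVCapturePhotoSettings *settings = [AVCapturePhotoSettings photoSettingsWithFormat:@{
    (NSString *)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)
}];
settings.highResolutionPhotoEnabled = YES;

[photoOutput capturePhotoWithSettings:settings delegate:self];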

availablePhotoCodecTypes returns the codec types the photo output supports for capture (e.g. JPEG). These are the compressed formats available for capturing; capturing RAW is uncompressed. For comparison between the file formats mentioned above: BMP compression, when present, uses RLE (Run-Length Encoding); PNG compresses with zlib (DEFLATE); JPEG uses its own lossy, DCT-based compression.
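For example, a short sketch that checks the supported codecs before capturing (again assuming `photoOutput` exists):

// Sketch: inspect which codecs the photo output supports before capturing.
NSArray<NSString *> *codecs = photoOutput.availablePhotoCodecTypes;

if ([codecs containsObject:AVVideoCodecJPEG]) {
    AVCapturePhotoSettings *jpegSettings =
        [AVCapturePhotoSettings photoSettingsWithFormat:@{ AVVideoCodecKey : AVVideoCodecJPEG }];
    [photoOutput capturePhotoWithSettings:jpegSettings delegate:self];
}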

For videos, the analogous property returns codecs such as H.264, JPEG, etc. See https://www.thedroidsonroids.com/blog/ios/whats-new-avfoundation-ios-10/ for examples.
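A corresponding sketch for video, using AVCaptureMovieFileOutput (the choice of output type here is an assumption; it is simply one output that exposes this property):

// Sketch: inspect which video codecs a movie file output supports.
AVCaptureMovieFileOutput *movieOutput = [[AVCaptureMovieFileOutput alloc] init];
NSArray<NSString *> *videoCodecs = movieOutput.availableVideoCodecTypes; // e.g. AVVideoCodecH264
NSLog(@"Supported video codecs: %@", videoCodecs);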

iOS Tesseract: bad results

There's nothing wrong with the way you're taking the pictures from your iPad per se. But you can't just throw in such a complex image and expect Tesseract to magically determine which text to extract. Take a closer look at the image and you'll notice it has non-uniform lighting and is extremely noisy, so it may not be the best sample to start playing with.

In such scenarios it is mandatory to pre-process the image in order to give the Tesseract library something simpler to recognise.

Below is a very naive pre-processing example that uses OpenCV (http://www.opencv.org), a popular image processing framework. It should give you an idea to get you started.

#import <TesseractOCR/TesseractOCR.h>
#import <opencv2/opencv.hpp>
#import "UIImage+OpenCV.h"

using namespace cv;

...

// load source image
UIImage *img = [UIImage imageNamed:@"tesseract.jpg"];

Mat mat = [img CVMat];
Mat hsv;

// convert to HSV (better than RGB for this task)
cvtColor(mat, hsv, CV_RGB2HSV_FULL);

// blur slightly to reduce noise impact (radius scaled to image width,
// clamped to at least 1 so small images don't produce a zero-sized kernel)
const int blurRadius = MAX(1, (int)(img.size.width / 250));
blur(hsv, hsv, cv::Size(blurRadius, blurRadius));

// in range = extract pixels within a specified range
// here we work only on the V channel, keeping pixels with 0 <= V <= 120 (any H, any S)
Mat inranged;
inRange(hsv, cv::Scalar(0, 0, 0), cv::Scalar(255, 255, 120), inranged);

[Sample image: result of the inRange thresholding]

Mat inrangedforcontours;
inranged.copyTo(inrangedforcontours); // findContours alters src mat

// now find contours to find where characters are approximately located
vector<vector<cv::Point> > contours;
vector<Vec4i> hierarchy;

findContours(inrangedforcontours, contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, cv::Point(0, 0));

int minX = INT_MAX;
int minY = INT_MAX;
int maxX = 0;
int maxY = 0;

// find all contours that match expected character size
for (size_t i = 0; i < contours.size(); i++)
{
    cv::Rect brect = cv::boundingRect(contours[i]);
    float ratio = (float)brect.height / brect.width;

    if (brect.height > 250 && ratio > 1.2 && ratio < 2.0)
    {
        minX = MIN(minX, brect.x);
        minY = MIN(minY, brect.y);
        maxX = MAX(maxX, brect.x + brect.width);
        maxY = MAX(maxY, brect.y + brect.height);
    }
}

[Sample image: bounding boxes of the contours matching the expected character size]

// Now we know approximately where our characters are located;
// extract the relevant part of the image, adding a margin to enlarge the area
// (note: the bounds are not clamped to the image edges here)
const int margin = img.size.width / 50;
Mat roi = inranged(cv::Rect(minX - margin, minY - margin, maxX - minX + 2 * margin, maxY - minY + 2 * margin));
cvtColor(roi, roi, CV_GRAY2BGRA);
img = [UIImage imageWithCVMat:roi];

[Sample image: extracted region of interest]

Tesseract *t = [[Tesseract alloc] initWithLanguage:@"eng"];

[t setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];
[t setImage:img];

[t recognize];

NSString *recognizedText = [[t recognizedText] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

if ([recognizedText isEqualToString:@"1234567890"])
    NSLog(@"Yeah!");
else
    NSLog(@"Epic fail...");

Notes

  • The UIImage+OpenCV category can be found here. If you're under ARC check this.
  • Take a look at this to get you started with OpenCV in Xcode. Note that OpenCV is a C++ framework which can't be imported into plain C (or Objective-C) source files. The easiest workaround is to rename your view controller from .m to .mm (Objective-C++) and re-add it to your project.

Tesseract OCR won't recognize division symbol ÷

Train the OCR engine with different fonts.

Here is the tool for training the engine.
Have a look at this as well.

Or you can use jTessBoxEditor.
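If retraining is not an option, it can also help to whitelist the symbol explicitly, using the same wrapper API shown in the previous answer. This is only a sketch, and it only works if the traineddata you load actually contains a glyph for ÷:

// Sketch: whitelist digits plus the division symbol so Tesseract does not
// "correct" ÷ into a character from a narrower set. Only effective if the
// loaded traineddata actually knows the ÷ glyph.
Tesseract *t = [[Tesseract alloc] initWithLanguage:@"eng"];
[t setVariableValue:@"0123456789÷" forKey:@"tessedit_char_whitelist"];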


