Swift 3 - Which pixel format type do I use for best Tesseract text recognition?
kCVPixelFormatType_24RGB, kCVPixelFormatType_24BGR, kCVPixelFormatType_32ARGB, kCVPixelFormatType_32BGRA, kCVPixelFormatType_32ABGR, kCVPixelFormatType_32RGBA
All of these are good options, and they are also the most common ones (i.e. 24-bit bitmap, 24-bit PNG, 32-bit bitmap, 32-bit PNG, etc.).
Basically, 24-bit contains only the R, G, B pixel components; the alpha channel is missing entirely. 32-bit adds an alpha channel, so R, G, B, A components are used. Usually 24-bit works really well with Tesseract, and 32-bit works really well when the alpha channel is uniform, i.e. fully transparent or fully opaque (0x00 or 0xFF for all alpha bytes). This is equivalent to using the BMP or PNG format.
Note: the above covers only the formats. Ideally, your image needs to be of pretty decent quality as well (the best is usually white text on a black background, black text on a white background, or otherwise strong contrast between the text and the background). Results depend on the image itself, not just the format.
As for capture settings: allocating an AVCapturePhotoSettings object will give you the default settings. You can create your own using:
https://developer.apple.com/reference/avfoundation/avcapturephotosettings/1648673-photosettingswithformat?changes=latest_minor&language=objc
It tells you what parameters to pass, and it also lets you determine whether or not the capture should be high resolution, a live photo, etc. You can see here for more: https://developer.apple.com/reference/avfoundation/avcapturephotosettings?changes=latest_minor&language=objc
availablePhotoCodecTypes returns JPEG, PNG, BMP, etc. — just the different formats available for capture. RAW captures are uncompressed, and BMP is normally stored uncompressed as well, though it optionally supports RLE (Run-Length Encoding) compression. PNG compresses with zlib (DEFLATE), while JPEG uses its own lossy, DCT-based compression.
For videos, it would return formats such as MP4 (MPEG-4), etc. See https://www.thedroidsonroids.com/blog/ios/whats-new-avfoundation-ios-10/ for examples.
iOS Tesseract: bad results
There's nothing wrong with the way you're taking the pictures from your iPad per se. But you can't just throw in such a complex image and expect Tesseract to magically determine which text to extract. Take a closer look at the image and you'll notice it has no uniform lighting and is extremely noisy, so it may not be the best sample to start playing with.
In such scenarios it is mandatory to preprocess the image in order to provide the Tesseract library with something simpler to recognise.
Below is a very naive preprocessing example that uses OpenCV (http://www.opencv.org), a popular image processing framework. It should give you an idea to get you started.
#import <TesseractOCR/TesseractOCR.h>
#import <opencv2/opencv.hpp>
#import "UIImage+OpenCV.h"
using namespace cv;
...
// load source image
UIImage *img = [UIImage imageNamed:@"tesseract.jpg"];
Mat mat = [img CVMat];
Mat hsv;
// convert to HSV (better than RGB for this task)
cvtColor(mat, hsv, CV_RGB2HSV_FULL);
// blur slightly to reduce the impact of noise
const int blurRadius = MAX(1, (int)(img.size.width / 250)); // avoid a zero-size kernel on small images
blur(hsv, hsv, cv::Size(blurRadius, blurRadius));
// in range = extract pixels within a specified range
// here we work only on the V channel extracting pixels with 0 < V < 120
Mat inranged;
inRange(hsv, cv::Scalar(0, 0, 0), cv::Scalar(255, 255, 120), inranged);
Mat inrangedforcontours;
inranged.copyTo(inrangedforcontours); // findContours alters src mat
// now find contours to find where characters are approximately located
vector<vector<cv::Point> > contours;
vector<Vec4i> hierarchy;
findContours(inrangedforcontours, contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, cv::Point(0, 0));
int minX = INT_MAX;
int minY = INT_MAX;
int maxX = 0;
int maxY = 0;
// find all contours that match expected character size
for (size_t i = 0; i < contours.size(); i++)
{
    cv::Rect brect = cv::boundingRect(contours[i]);
    float ratio = (float)brect.height / brect.width;
    if (brect.height > 250 && ratio > 1.2 && ratio < 2.0)
    {
        minX = MIN(minX, brect.x);
        minY = MIN(minY, brect.y);
        maxX = MAX(maxX, brect.x + brect.width);
        maxY = MAX(maxY, brect.y + brect.height);
    }
}
// Now we know approximately where our characters are located.
// Extract the relevant part of the image, adding a margin to enlarge the area
// (clamped so the rect stays within the image bounds)
const int margin = img.size.width / 50;
cv::Rect roiRect(MAX(minX - margin, 0), MAX(minY - margin, 0), 0, 0);
roiRect.width = MIN(maxX + margin, inranged.cols) - roiRect.x;
roiRect.height = MIN(maxY + margin, inranged.rows) - roiRect.y;
Mat roi = inranged(roiRect);
cvtColor(roi, roi, CV_GRAY2BGRA);
img = [UIImage imageWithCVMat:roi];
Tesseract *t = [[Tesseract alloc] initWithLanguage:@"eng"];
[t setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];
[t setImage:img];
[t recognize];
NSString *recognizedText = [[t recognizedText] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
if ([recognizedText isEqualToString:@"1234567890"])
NSLog(@"Yeah!");
else
NSLog(@"Epic fail...");
Notes
- The UIImage+OpenCV category can be found here. If you're using ARC, check this.
- Take a look at this to get started with OpenCV in Xcode. Note that OpenCV is a C++ framework which can't be imported in plain C (or Objective-C) source files. The easiest workaround is to rename your view controller from .m to .mm (Objective-C++) and reimport it into your project.
Tesseract OCR won't recognize division symbol ÷
Train the OCR engine with different fonts.
Here is the tool for training the engine.
Have a look at this also.
Or you can use JTessBoxEditor