Simple Digit Recognition OCR in OpenCV-Python
I decided to work through my own question and solve the problem myself. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)
1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.
It contains a letter, along with 16 features of that letter.
And this SOF helped me to find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features at the end.)
2) Since I knew that this method is difficult to apply without understanding all those features, I tried some other papers, but all were a little difficult for a beginner.
So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, even with the lowest accuracy.)
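To make the "pixel values as features" idea concrete, here is a minimal NumPy sketch. The 10x10 array is made-up data standing in for a resized digit crop, and the names patch and features are only illustrative.

```python
import numpy as np

# Made-up 10x10 binary "digit" patch (255 = ink, 0 = background).
patch = np.zeros((10, 10), dtype=np.uint8)
patch[2:8, 4:6] = 255  # a crude vertical stroke, roughly like a "1"

# Flatten the patch row by row into a single 1x100 feature vector.
features = patch.reshape((1, 100)).astype(np.float32)

# Stack the feature rows of several samples into one (n_samples, 100) matrix.
samples = np.empty((0, 100), dtype=np.float32)
samples = np.append(samples, features, axis=0)
samples = np.append(samples, features, axis=0)

print(features.shape)  # (1, 100)
print(samples.shape)   # (2, 100)
```

Each row of samples is one digit, and each column is one pixel position, which is the layout a classifier expects.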
I took the below image for my training data:
(I know the amount of training data is small. But since all letters are of the same font and size, I decided to try it.)
To prepare the data for training, I wrote a small script in OpenCV. It does the following things:
- It loads the image.
- Selects the digits (obviously by contour finding and applying constraints on area and height of letters to avoid false detections).
- Draws the bounding rectangle around one letter and waits for a manual key press. At this point we press the digit key corresponding to the letter in the box ourselves.
- Once the corresponding digit key is pressed, it resizes this box to 10x10 and saves all 100 pixel values in an array (here, samples) and the corresponding manually entered digit in another array (here, responses).
- Then it saves both arrays in separate .txt files.
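The key-press step in the list above can be sketched on its own. This is only an illustration: key_to_digit is a hypothetical helper, and masking with 0xFF is a common precaution because some OpenCV builds return key codes with extra high bits set.

```python
# ASCII codes for the characters '0'..'9' are 48..57.
DIGIT_KEYS = set(range(48, 58))
ESCAPE_KEY = 27

def key_to_digit(raw_key):
    """Return the digit for a waitKey code, or None if it is not a digit key."""
    key = raw_key & 0xFF  # keep only the low byte of the key code
    if key in DIGIT_KEYS:
        return int(chr(key))
    return None

print(key_to_digit(ord('7')))    # 7
print(key_to_digit(ESCAPE_KEY))  # None
```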
At the end of the manual classification, all the digits in the training data (train.png) have been labeled by hand, and the image will look like below:
Below is the code I used for the above purpose (of course, not so clean):
import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Gaussian adaptive threshold, inverted so the digits come out white on black
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

################# Now finding Contours ###################

# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

samples = np.empty((0, 100))
responses = []
keys = [i for i in range(48, 58)]  # key codes for '0'..'9'

for cnt in contours:
    if cv2.contourArea(cnt) > 50:
        [x, y, w, h] = cv2.boundingRect(cnt)

        if h > 28:
            cv2.rectangle(im, (x, y), (x + w, y + h), (0, 0, 255), 2)
            roi = thresh[y:y + h, x:x + w]
            roismall = cv2.resize(roi, (10, 10))
            cv2.imshow('norm', im)
            key = cv2.waitKey(0) & 0xFF  # mask off any extra high bits

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1, 100))
                samples = np.append(samples, sample, 0)

responses = np.array(responses, np.float32)
responses = responses.reshape((responses.size, 1))
print("training complete")

np.savetxt('generalsamples.data', samples)
np.savetxt('generalresponses.data', responses)
Now we move on to the training and testing part.
For the testing part, I used the below image, which has the same type of letters I used for the training phase.
For training we do as follows:
- Load the .txt files we already saved earlier
- Create an instance of the classifier we are using (it is KNearest in this case)
- Then we use the KNearest.train function to train on the data
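One detail of the loading step is easy to miss: np.loadtxt returns a single-column file as a flat array, so the labels have to be reshaped back into a column. The sketch below shows the round trip with made-up data and temporary file names (not the files used in this answer).

```python
import os
import tempfile
import numpy as np

# Made-up data: 3 samples with 100 features each, and their digit labels.
samples = np.arange(300, dtype=np.float32).reshape((3, 100))
responses = np.array([1, 4, 1], dtype=np.float32).reshape((3, 1))

with tempfile.TemporaryDirectory() as tmp:
    samples_path = os.path.join(tmp, 'samples.data')
    responses_path = os.path.join(tmp, 'responses.data')
    np.savetxt(samples_path, samples)
    np.savetxt(responses_path, responses)

    loaded_samples = np.loadtxt(samples_path, dtype=np.float32)
    loaded_responses = np.loadtxt(responses_path, dtype=np.float32)

# loadtxt squeezes the single-column label file to shape (3,),
# so it is reshaped back into a column vector.
print(loaded_responses.shape)  # (3,)
loaded_responses = loaded_responses.reshape((loaded_responses.size, 1))
print(loaded_samples.shape, loaded_responses.shape)  # (3, 100) (3, 1)
```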
For testing purposes, we do as follows:
- We load the image used for testing
- process the image as earlier and extract each digit using contour methods
- Draw a bounding box for it, then resize it to 10x10, and store its pixel values in an array as done earlier.
- Then we use the KNearest.find_nearest() function (findNearest() in the cv2.ml module of OpenCV 3 and later) to find the nearest item to the one we gave. (If lucky, it recognizes the correct digit.)
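Conceptually, a lookup with k = 1 returns the label of the closest stored sample. The sketch below illustrates that idea in plain NumPy with Euclidean distance. It is not OpenCV's implementation; nearest_label and the tiny 4-feature data set are made up for the example.

```python
import numpy as np

def nearest_label(train_samples, train_labels, query):
    """Return the label of the training row closest to `query` (k = 1)."""
    # Squared Euclidean distance from the query to every training row.
    distances = np.sum((train_samples - query) ** 2, axis=1)
    return train_labels[np.argmin(distances)]

# Made-up 4-feature samples standing in for the 100-pixel vectors.
train_samples = np.array([[0, 0, 255, 255],
                          [255, 255, 0, 0]], dtype=np.float32)
train_labels = np.array([3, 8], dtype=np.float32)

query = np.array([10, 0, 250, 240], dtype=np.float32)
print(int(nearest_label(train_samples, train_labels, query)))  # 3
```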
I included the last two steps (training and testing) in a single script below:
import cv2
import numpy as np

#######   training part    ###############
samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32)
responses = responses.reshape((responses.size, 1))

# OpenCV 3+ API; OpenCV 2.4 used cv2.KNearest() and model.train(samples, responses)
model = cv2.ml.KNearest_create()
model.train(samples, cv2.ml.ROW_SAMPLE, responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape, np.uint8)
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

for cnt in contours:
    if cv2.contourArea(cnt) > 50:
        [x, y, w, h] = cv2.boundingRect(cnt)
        if h > 28:
            cv2.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 2)
            roi = thresh[y:y + h, x:x + w]
            roismall = cv2.resize(roi, (10, 10))
            roismall = roismall.reshape((1, 100))
            roismall = np.float32(roismall)
            # OpenCV 2.4 called this model.find_nearest(roismall, k = 1)
            retval, results, neigh_resp, dists = model.findNearest(roismall, k=1)
            string = str(int((results[0][0])))
            cv2.putText(out, string, (x, y + h), 0, 1, (0, 255, 0))

cv2.imshow('im', im)
cv2.imshow('out', out)
cv2.waitKey(0)
And it worked. Below is the result I got:
Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.
But anyway, this is a good starting point for beginners (I hope so).
Digit Recognition OCR in OpenCV-Python
I found the right way; it just needed more customised code.
Use the same preprocessing before detecting contours:
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)
Not
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)
And
cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
Not
cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
I get 99% accuracy, a good starting percentage.
Thanks anyway.
Recognizing digits with OpenCV and Python (Simple digit OCR)
Instead of using Template Matching, a better approach is to use Pytesseract OCR to read the number with image_to_string(). But before performing OCR, you need to preprocess the image. For optimal OCR performance, the preprocessed image should have the desired text/number/characters to OCR in black with the background in white. A simple preprocessing step is to convert the image to grayscale, Otsu's threshold to obtain a binary image, then invert the image. Here's a visualization of the preprocessing step:
Input image -> Grayscale -> Otsu's threshold -> Inverted image ready for OCR
Result from Pytesseract OCR
2
Here are the results with the other images:
2
5
We use the --psm 6 configuration option to assume a single uniform block of text. See here for more configuration options.
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, then invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# Perform OCR with Pytesseract
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
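For intuition about what the Otsu flag does, here is a NumPy sketch of the underlying idea: try every threshold and keep the one that maximizes the between-class variance of the two resulting pixel groups. This is an illustration of the technique, not OpenCV's code; otsu_threshold is a hypothetical helper and the image is made-up data.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    weight_low = np.cumsum(prob)          # probability of the class <= t
    weight_high = 1.0 - weight_low        # probability of the class > t
    cum_mean = np.cumsum(prob * levels)   # cumulative (unnormalized) mean
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate t; empty classes score 0.
    denom = weight_low * weight_high
    with np.errstate(divide='ignore', invalid='ignore'):
        variance = (total_mean * weight_low - cum_mean) ** 2 / denom
    variance[denom == 0] = 0.0
    return int(np.argmax(variance))

# Made-up image: dark background (50) with a bright block (200).
gray = np.full((8, 8), 50, dtype=np.uint8)
gray[2:6, 2:6] = 200

t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(int(binary.sum() // 255))  # 16 bright pixels survive the threshold
```

Any threshold between the two gray levels separates this toy image equally well, so the exact value of t matters less here than the resulting mask.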
Note: If you insist on using Template Matching, you need to use scale variant template matching. Take a look at how to isolate everything inside of a contour, scale it, and test the similarity to an image? and Python OpenCV line detection to detect X symbol in image for some examples. If you know for certain that your images are blue, then another approach would be to use color thresholding with cv2.inRange() to obtain a binary mask image then apply OCR on the image.
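As a rough picture of what color thresholding produces, the NumPy sketch below builds the same kind of mask that cv2.inRange() returns: 255 where every channel falls inside the bounds, 0 elsewhere. The helper in_range_mask, the tiny image, and the "blue" bounds are all made up for illustration; real images would need tuned bounds, often in HSV rather than BGR.

```python
import numpy as np

def in_range_mask(image, lower, upper):
    """Return 255 where every channel lies within [lower, upper], else 0."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((image >= lower) & (image <= upper), axis=-1)
    return inside.astype(np.uint8) * 255

# Made-up 1x3 BGR image: a blue pixel, a white pixel, a black pixel.
image = np.array([[[230, 40, 30], [255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# Hypothetical bounds for "blue" in BGR channel order.
mask = in_range_mask(image, lower=(150, 0, 0), upper=(255, 100, 100))
print(mask.tolist())  # [[255, 0, 0]]
```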
Most simple approach for digit recognition in Python
I suggest that you use both opencv and scikit-learn. After you turn your pdf into an image, you can use opencv for image pre-processing (Gaussian Blur, thresholding, Erosion/Dilation Filters), so that the digits become easier to extract. Then you can use contour tracing (again opencv) to detect the individual digits. After you have extracted your digits (and given that you have a training set), you can use scikit-learn for the classification.
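The classification step could look roughly like the sketch below. It assumes scikit-learn is installed and uses its bundled 8x8 digits data set in place of digits extracted from your own images; the choice of a k-nearest-neighbours classifier and its parameters is just one option among many.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled 8x8 digit images, already flattened to 64 pixel features per sample.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# k-nearest-neighbours is used here only as an example classifier.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print("test accuracy: %.3f" % accuracy)
```

With your own data, each extracted digit would be resized to a fixed size and flattened the same way before being passed to fit() and predict().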
Digit recognizing, using opencv
You can find many papers and software packages about OCR, because it is widely used in many applications. I want to present a quite simple solution for your problem, using numpy and opencv, that will do the job.
What we will do:
- Import numpy and opencv
- Load images you have provided
- Threshold them
- Make a function that returns an array of the digits in a given image
- Compare the digits from image 1 and image 2
- Make our "bank of digits" so we know what the number 9 looks like
- Compare the digits we found in image 3 with our "bank of digits"
Code:
import cv2
import numpy as np
threshold = 70
#Threshold every image, so "0" in image means no digit and "1" is digit
image1 = (cv2.imread("number_1.png",0) > threshold).astype(np.uint8)
image2 = (cv2.imread("number_2.png",0) > threshold).astype(np.uint8)
image3 = (cv2.imread("number_3.png",0) > threshold).astype(np.uint8)
image4 = (cv2.imread("number_4.png",0) > threshold).astype(np.uint8)
A function that returns an array of the digits in a given image:
def get_images_of_digits(image):
    components = cv2.connectedComponentsWithStats(image, 8, cv2.CV_16U) #Separate digits
    #Get position of every component
    #For details how this works take a look at
    #https://stackoverflow.com/questions/35854197/how-to-use-opencvs-connected-components-with-stats-in-python
    position_of_digits = components[2]
    number_of_digits = len(position_of_digits) - 1 #number of digits found in image (label 0 is the background)
    digits = [] #Array with every digit in image
    for i in range(number_of_digits):
        w = position_of_digits[i+1, cv2.CC_STAT_LEFT] #Left corner of digit
        h = position_of_digits[i+1, cv2.CC_STAT_TOP] #Top corner of digit
        #Assumption: the digit size is read from the component's bounding box
        width_of_digit = position_of_digits[i+1, cv2.CC_STAT_WIDTH]
        height_of_digit = position_of_digits[i+1, cv2.CC_STAT_HEIGHT]
        digit = image[h:h+height_of_digit, w:w+width_of_digit] #Cut this digit out of image
        #Count how many white pixels there are
        px_count = np.count_nonzero(digit)
        #Divide every pixel by square root of count of pixels in digit.
        #Why? If we make convolution with the same digit it will give us sweet "1", which means these digits are identical
        digit = digit / np.sqrt(px_count)
        digits.append(digit)
    return digits #Return all digits
Get digits
d_1 = get_images_of_digits(image1)[0] #Digit "9" from first image
d_2 = get_images_of_digits(image2)[0] #Digit "9" from second image
d_3 = get_images_of_digits(image4)[0] #Digit "6" from last image
print(cv2.filter2D(d_1,-1,d_2).max()) #Digit "9" on image 1 and 2 match perfectly (result of convolution is 1).
#Filter2D does convolution (correlation to be precise, but they are the same for our purpose)
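The normalization used in get_images_of_digits can be checked by hand on a tiny example. The sketch below uses two made-up 3x3 binary patterns and computes the aligned correlation directly as a sum of element-wise products, which is the value cv2.filter2D produces when the two images overlap exactly; normalize is a hypothetical helper.

```python
import numpy as np

def normalize(binary_digit):
    """Scale a 0/1 image so that correlating it with itself sums to 1."""
    return binary_digit / np.sqrt(np.count_nonzero(binary_digit))

# Two made-up 3x3 binary patterns with 5 "on" pixels each.
a = normalize(np.array([[1, 1, 0],
                        [0, 1, 0],
                        [0, 1, 1]], dtype=np.uint8))
b = normalize(np.array([[1, 1, 1],
                        [0, 0, 1],
                        [0, 0, 1]], dtype=np.uint8))

# Aligned correlation = sum of element-wise products.
print(round(float(np.sum(a * a)), 6))  # 1.0
print(round(float(np.sum(a * b)), 6))  # 0.6
```

Identical patterns score 1, while the two different patterns share only 3 of their 5 pixels and score 3/5.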
Put the number "9" from the first image and the number "6" from the last image into the digit bank. Then go through every number we find in image 3 and compare it with our digit bank. If the score is below 0.9, it is not a match.
bank_of_digits = {"9":d_1, "6":d_3}
for digit in get_images_of_digits(image3):
    #print(digit)
    best_result = 0.9 #If score is above 0.9, we say it is a match
    #Maybe tweak this higher for separating chars "8" and "9" and "0"
    matching_digit = "?" #Default char, when there is no match
    for number in bank_of_digits:
        score = cv2.filter2D(digit,-1,bank_of_digits[number]).max() #Returns 0-1. 1 means perfect match
        print("Score for number " + number +" is: "+ str(np.round(score,2)) )
        if score > best_result: #If we find a better match
            best_result = score #Set highest score yet
            matching_digit = number #Set best match number
    print("Best match: " + matching_digit)
The final result will then be "?" for the first digit in image 3, because there is no number "1" in our bank, and the second result will be "6" with a score of 0.97.
TLDR: I made an algorithm that separates the digits from your images and compares them. The best matches are printed.
Having problem with digits recognition in python using opencv, tesseract
Magic happens when adding config='--psm 6'.
According to the Tesseract OCR options page:
6 Assume a single uniform block of text.
Code sample:
import cv2
import pytesseract
crop_img = cv2.imread('crop_img.png')
text = pytesseract.image_to_string(crop_img, config='--psm 6')
print(text)
Result: 73.9