Simple Digit Recognition Ocr in Opencv-Python

Simple Digit Recognition OCR in OpenCV-Python

Well, I decided to workout myself on my question to solve the above problem. What I wanted is to implement a simple OCR using KNearest or SVM features in OpenCV. And below is what I did and how. (it is just for learning how to use KNearest for simple OCR purposes).

1) My first question was about letter_recognition.data file that comes with OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

And this SOF helped me to find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers.
(Although I didn't understand some of the features at the end)

2) Since I knew, without understanding all those features, it is difficult to do that method. I tried some other papers, but all were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance, I just wanted it to work, at least with the least accuracy)

I took the below image for my training data:

Sample Image

(I know the amount of training data is less. But, since all letters are of the same font and size, I decided to try on this).

To prepare the data for training, I made a small code in OpenCV. It does the following things:

It loads the image.
Selects the digits (obviously by contour finding and applying constraints on area and height of letters to avoid false detections).
Draws the bounding rectangle around one letter and wait for key press manually. This time we press the digit key ourselves corresponding to the letter in the box.
Once the corresponding digit key is pressed, it resizes this box to 10x10 and saves all 100 pixel values in an array (here, samples) and corresponding manually entered digit in another array(here, responses).
Then save both the arrays in separate .txt files.

At the end of the manual classification of digits, all the digits in the training data (train.png) are labeled manually by ourselves, image will look like below:

Sample Image

Below is the code I used for the above purpose (of course, not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)

Now we enter in to training and testing part.

For the testing part, I used the below image, which has the same type of letters I used for the training phase.

Sample Image

For training we do as follows:

Load the .txt files we already saved earlier
create an instance of the classifier we are using (it is KNearest in this case)
Then we use KNearest.train function to train the data

For testing purposes, we do as follows:

We load the image used for testing
process the image as earlier and extract each digit using contour methods
Draw a bounding box for it, then resize it to 10x10, and store its pixel values in an array as done earlier.
Then we use KNearest.find_nearest() function to find the nearest item to the one we gave. ( If lucky, it recognizes the correct digit.)

I included last two steps (training and testing) in single code below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

And it worked, below is the result I got:

Sample Image

Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good start to go for beginners (I hope so).

Digit Recognition OCR in OpenCV-Python

I found the right way, it needed just more customised code.

The same process before detecting countours :

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

Not

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

And

cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)

Not

cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)

I get 99% accuracy, good beggining percentage

Thanks for you anyway

Recognizing digits with OpenCV and Python (Simple digit OCR)

Instead of using Template Matching, a better approach is to use Pytesseract OCR to read the number with image_to_string(). But before performing OCR, you need to preprocess the image. For optimal OCR performance, the preprocessed image should have the desired text/number/characters to OCR in black with the background in white. A simple preprocessing step is to convert the image to grayscale, Otsu's threshold to obtain a binary image, then invert the image. Here's a visualization of the preprocessing step:

Input image -> Grayscale -> Otsu's threshold -> Inverted image ready for OCR

enter image description here

Result from Pytesseract OCR

2

Here's the results with the other images:

enter image description here

2

enter image description here

5

We use the --psm 6 configuration option to assume a single uniform block of text. See here for more configuration options.

Code

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold, then invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
invert = 255 - thresh

# Perfrom OCR with Pytesseract
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()

Note: If you insist on using Template Matching, you need to use scale variant template matching. Take a look at how to isolate everything inside of a contour, scale it, and test the similarity to an image? and Python OpenCV line detection to detect X symbol in image for some examples. If you know for certain that your images are blue, then another approach would be to use color thresholding with cv2.inRange() to obtain a binary mask image then apply OCR on the image.

Most simple approach for digit recognition in Python

I suggest that you should use both opencv and scikitlearn. After you turn your pdf into an image, you can use opencv for image pre-processing (Gaussian Blur, thresholding, Erosion/Dilation Filters), so that the digits will become more easy to extract. Then you can use contour tracing (again opencv) to detect the individual digits. After you have extracted your digits (and given that you have a training set), you can use scikitlearn for the classification.

Digit recognizing, using opencv

You can find many papers and software about OCR, because it is widely used in many applications. I want to present quite simple solution for your problem, using numpy and opencv, that will do the job.

What we will do:

Import numpy and opencv
Load images you have provided
Treshold them
Make function, that will return array of digits in given image
Compare digit from image 1 and image 2
Make our "bank of digits" so we know how number 9 looks like
Compare digits we found in image 3 with our "bank of digits"

Code:

import cv2
import numpy as np

treshold = 70

#Treshold every image, so "0" in image means no digit and "1" is digit 
image1 = (cv2.imread("number_1.png",0) > treshold).astype(np.uint8)
image2 = (cv2.imread("number_2.png",0) > treshold).astype(np.uint8)
image3 = (cv2.imread("number_3.png",0) > treshold).astype(np.uint8)
image4 = (cv2.imread("number_4.png",0) > treshold).astype(np.uint8)

Function, that will return array of digits in given image:

def get_images_of_digits(image):
    components = cv2.connectedComponentsWithStats(image, 8, cv2.CV_16U) #Separate digits
    #Get position of every components
    #For details how this works take a look at 
    #https://stackoverflow.com/questions/35854197/how-to-use-opencvs-connected-components-with-stats-in-python
    position_of_digits = components[2] 
    number_of_digits = len(position_of_digits) - 1 #number of digits found in image
    digits = [] #Array with every digit in image
    for i in range(number_of_digits):    
        w = position_of_digits[i+1,0] #Left corner of digit
        h = position_of_digits[i+1,1] #Top corner of digit
        digit = image[h:h+height_of_digit,w:w+width_of_digit] #Cut this digit out of image

        #Count how many white pixels there are
        px_count = np.count_nonzero(digit)
        #Divide every pixel by square root of count of pixels in digit. 
        #Why? If we make convolution with the same digit it will give us sweet "1", which means these digits are identical
        digit = digit / np.sqrt(px_count)  
    digits.append(digit)

return digits #Return all digits

Get digits

d_1 = get_images_of_digits(image1)[0] #Digit "9" from first image
d_2 = get_images_of_digits(image2)[0] #Digit "9" from second image
d_3 = get_images_of_digits(image4)[0] #Digit "6" from last image

print(cv2.filter2D(d_1,-1,d_2).max()) #Digit "9" on image 1 and 2 match perfectly (result of convolution is 1).
#Filter2D does convolution (correlation to be precise, but they are the same for our purpose)

Put number "9" from first image and number "6" from last image into digit bank. Then go trough every number we find in image 3 and compare it with our digit bank. If score is below 0.9, it is not match.

bank_of_digits = {"9":d_1, "6":d_3} 
for digit in get_images_of_digits(image3):
    #print(digit)
    best_restult = 0.9 #If score is above 0.9, we say it is match
    #Maybe tweak this higher for separating chars "8" and "9" and "0"
    matching_digit = "?" #Default char, when there is no match
    for number in bank_of_digits:
        score = cv2.filter2D(digit,-1,bank_of_digits[number]).max() #Returns 0-1 . 1 Means perfect match       
        print("Score for number " + number +" is: "+ str(np.round(score,2)) )
        if score > best_restult: #If we find better match
            best_restult = score #Set highest score yet
            matching_digit = number #Set best match number
    print("Best match: " + matching_digit)

Final result then will be "?" for first digit in image 3, because there is no number "1" in our bank, and second result will be "6" with score of 0.97.

TLDR: I made algorithm that separates digits from your images, and compares these digits. Best matches are printed.

Having problem with digits recognition in python using opencv, tesseract

A magic happens when adding config='--psm 6'.

According to Tesseract OCR options page:

6 Assume a single uniform block of text.

Code sample:

crop_img = cv2.imread('crop_img.png')
text = pytesseract.image_to_string(crop_img, config='--psm 6')
print(text)

Result:

73.9

Simple Digit Recognition Ocr in Opencv-Python