Removing Horizontal Lines in image (OpenCV, Python, Matplotlib)
Obtain binary image. Load the image, convert to grayscale, then Otsu's threshold to obtain a binary black/white image.
Detect and remove horizontal lines. To detect horizontal lines, we create a special horizontal kernel and morph open to detect horizontal contours. From here we find contours on the mask and "fill in"
the detected horizontal contours with white to effectively remove the linesRepair image. At this point the image may have gaps if the horizontal lines intersected through characters. To repair the text, we create a vertical kernel and morph close to reverse the damage
After converting to grayscale, we Otsu's threshold to obtain a binary image
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
Next we create a special horizontal kernel to detect horizontal lines. We draw these lines onto a mask and then find contours on the mask. To remove the lines, we fill in the contours with white
Detected lines
Mask
Filled in contours
# Remove horizontal
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
cv2.drawContours(image, [c], -1, (255,255,255), 2)
The image currently has gaps. To fix this, we construct a vertical kernel to repair the image
# Repair image
repair_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,6))
result = 255 - cv2.morphologyEx(255 - image, cv2.MORPH_CLOSE, repair_kernel, iterations=1)
Note depending on the image, the size of the kernel will change. You can think of the kernel as
(horizontal, vertical)
. For instance, to detect longer lines, we could use a(50,1)
kernel instead. If we wanted thicker lines, we could increase the 2nd parameter to say(50,2)
.
Here's the results with the other images
Detected lines
Original ->
Removed
Detected lines
Original ->
Removed
Full code
import cv2
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Remove horizontal
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
cv2.drawContours(image, [c], -1, (255,255,255), 2)
# Repair image
repair_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,6))
result = 255 - cv2.morphologyEx(255 - image, cv2.MORPH_CLOSE, repair_kernel, iterations=1)
cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.imshow('result', result)
cv2.waitKey()
Remove horizontal lines with Open CV
So, I saw that working on the drawing separated from the paper would lead to a better result. I used MORPH_CLOSE to work on the paper and MORPH_OPEN for the lines in the inner part. I hope your daughter likes it :)
img = cv2.imread(r'E:\Downloads\i0RDA.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Remove horizontal lines
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY_INV,81,17)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
# Using morph close to get lines outside the drawing
remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, horizontal_kernel, iterations=3)
cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
mask = np.zeros(gray.shape, np.uint8)
for c in cnts:
cv2.drawContours(mask, [c], -1, (255,255,255),2)
# First inpaint
img_dst = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
gray_dst = cv2.cvtColor(img_dst, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray_dst, 50, 150, apertureSize = 3)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
# Using morph open to get lines inside the drawing
opening = cv2.morphologyEx(edges, cv2.MORPH_OPEN, horizontal_kernel)
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
mask = np.uint8(img_dst)
mask = np.zeros(gray_dst.shape, np.uint8)
for c in cnts:
cv2.drawContours(mask, [c], -1, (255,255,255),2)
# Second inpaint
img2_dst = cv2.inpaint(img_dst, mask, 3, cv2.INPAINT_TELEA)
Removing horizontal underlines
All the answers so far seem to be using morphological operations. Here's something a bit different. This should give fairly good results if the lines are horizontal.
For this I use a part of your sample image shown below.
Load the image, convert it to gray scale and invert it.
import cv2
import numpy as np
import matplotlib.pyplot as plt
im = cv2.imread('sample.jpg')
gray = 255 - cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
Inverted gray-scale image:
If you scan a row in this inverted image, you'll see that its profile looks different depending on the presence or the absence of a line.
plt.figure(1)
plt.plot(gray[18, :] > 16, 'g-')
plt.axis([0, gray.shape[1], 0, 1.1])
plt.figure(2)
plt.plot(gray[36, :] > 16, 'r-')
plt.axis([0, gray.shape[1], 0, 1.1])
Profile in green is a row where there's no underline, red is for a row with underline. If you take the average of each profile, you'll see that red one has a higher average.
So, using this approach you can detect the underlines and remove them.
for row in range(gray.shape[0]):
avg = np.average(gray[row, :] > 16)
if avg > 0.9:
cv2.line(im, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)
cv2.imshow("gray", 255 - gray)
cv2.imshow("im", im)
Here are the detected underlines in red, and the cleaned image.
tesseract output of the cleaned image:
Convthed as th(
shot once in the
she stepped fr<
brother-in-lawii
collect on life in
applied for man
to the scheme i|
Reason for using part of the image should be clear by now. Since personally identifiable information have been removed in the original image, the threshold wouldn't have worked. But this should not be a problem when you apply it for processing. Sometimes you may have to adjust the thresholds (16, 0.9).
The result does not look very good with parts of the letters removed and some of the faint lines still remaining. Will update if I can improve it a bit more.
UPDATE:
Dis some improvements; cleanup and link the missing parts of the letters. I've commented the code, so I believe the process is clear. You can also check the resulting intermediate images to see how it works. Results are a bit better.
tesseract output of the cleaned image:
Convicted as th(
shot once in the
she stepped fr<
brother-in-law. ‘
collect on life ix
applied for man
to the scheme i|
tesseract output of the cleaned image:
)r-hire of 29-year-old .
revolver in the garage ‘
red that the victim‘s h
{2000 to kill her. mum
250.000. Before the kil
If$| 50.000 each on bin
to police.
python code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
im = cv2.imread('sample2.jpg')
gray = 255 - cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
# prepare a mask using Otsu threshold, then copy from original. this removes some noise
__, bw = cv2.threshold(cv2.dilate(gray, None), 128, 255, cv2.THRESH_BINARY or cv2.THRESH_OTSU)
gray = cv2.bitwise_and(gray, bw)
# make copy of the low-noise underlined image
grayu = gray.copy()
imcpy = im.copy()
# scan each row and remove lines
for row in range(gray.shape[0]):
avg = np.average(gray[row, :] > 16)
if avg > 0.9:
cv2.line(im, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)
cont = gray.copy()
graycpy = gray.copy()
# after contour processing, the residual will contain small contours
residual = gray.copy()
# find contours
contours, hierarchy = cv2.findContours(cont, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(contours)):
# find the boundingbox of the contour
x, y, w, h = cv2.boundingRect(contours[i])
if 10 < h:
cv2.drawContours(im, contours, i, (0, 255, 0), -1)
# if boundingbox height is higher than threshold, remove the contour from residual image
cv2.drawContours(residual, contours, i, (0, 0, 0), -1)
else:
cv2.drawContours(im, contours, i, (255, 0, 0), -1)
# if boundingbox height is less than or equal to threshold, remove the contour gray image
cv2.drawContours(gray, contours, i, (0, 0, 0), -1)
# now the residual only contains small contours. open it to remove thin lines
st = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
residual = cv2.morphologyEx(residual, cv2.MORPH_OPEN, st, iterations=1)
# prepare a mask for residual components
__, residual = cv2.threshold(residual, 0, 255, cv2.THRESH_BINARY)
cv2.imshow("gray", gray)
cv2.imshow("residual", residual)
# combine the residuals. we still need to link the residuals
combined = cv2.bitwise_or(cv2.bitwise_and(graycpy, residual), gray)
# link the residuals
st = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 7))
linked = cv2.morphologyEx(combined, cv2.MORPH_CLOSE, st, iterations=1)
cv2.imshow("linked", linked)
# prepare a msak from linked image
__, mask = cv2.threshold(linked, 0, 255, cv2.THRESH_BINARY)
# copy region from low-noise underlined image
clean = 255 - cv2.bitwise_and(grayu, mask)
cv2.imshow("clean", clean)
cv2.imshow("im", im)
How to remove horizontal and vertical lines from an image
Your case is less trivial than the one provided in the tutorial that you have based your solution on. With this approach you will not be able to filter the lines in 100%, because of the fact that horizontal parts of the characters will sometimes be treated as lines.
Depends on your expectations (which you haven't really specified) and specifically the accuracy that you expect, you might want to try to find the characters instead of finding the line. That should provide you with more robustness.
Regarding your code, by adding few lines of code right after finding horizontal lines on the image (before verticalsize = rows / 30
line of code), you can get some results. I've worked on a half size image.
Result with horizontalsize = int(cols/30)
Result with horizontalsize = int(cols/15)
Again, I'm stressing that those will never be accurate with that approach in your case. Here's the snippet:
#inverse the image, so that lines are black for masking
horizontal_inv = cv2.bitwise_not(horizontal)
#perform bitwise_and to mask the lines with provided mask
masked_img = cv2.bitwise_and(img, img, mask=horizontal_inv)
#reverse the image back to normal
masked_img_inv = cv2.bitwise_not(masked_img)
cv2.imshow("masked img", masked_img_inv)
cv2.imwrite("result2.jpg", masked_img_inv)
cv2.waitKey(0)
cv2.destroyAllWindows()
Try playing with horizontalsize
if the images I provided are somewhat satisfactory. I've also used int conversion, since that's what the getStructuringElement
function expects: horizontalsize = int(cols / 30)
.
You can also try some smoothing and morphology on the result. That should make the characters a little bit more readable.
Removing slant horizontal lines in image
Here's an approach:
- Convert image to grayscale
- Otsu's threshold to get binary image
- Create horizontal kernel and morph open to detect lines
- Find contours and draw in detected lines
After converting to grayscale, we Otsu's threshold to obtain a binary image
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
Now we create a special horizontal kernel to detect horizontal lines then morph open to obtain a mask of the detected lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
Here's the detected lines drawn on the original image
From here we find contours on this mask and draw them in to effectively remove the horizontal lines to get our result
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
cv2.drawContours(image, [c], -1, (255,255,255), 3)
Now that the horizontal lines are removed, to repair the text, you can try cv2.MORPH_CLOSE
with a cv2.MORPH_CROSS
kernel and experiment with various kernel sizes. There is a tradeoff between dilating too much to close the holes as the detail in the text will be lost. Another approach is to use image inpainting to fill in the holes. I'll leave this step to you
Full code
import cv2
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
cv2.drawContours(image, [c], -1, (255,255,255), 3)
cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()
Related Topics
Python: Sorting Items from Top Left to Bottom Right with Opencv
Typeerror: 'Range' Object Does Not Support Item Assignment
Combining Two Series into a Dataframe in Pandas
Is Generator.Next() Visible in Python 3
Best Way to Determine If a Sequence Is in Another Sequence
Convert Spreadsheet Number to Column Letter
Python Django Global Variables
Return Result from Python to Vba
Difference Between Pygame.Display.Update and Pygame.Display.Flip
How to Use Append with Pickle in Python
Factorize a Column of Strings in Pandas
Installing Module from Github Through Jupyter Notebook
How to Trace the Path in a Breadth-First Search
How to Set Ticks on Fixed Position , Matplotlib
Any Reason Not to Use '+' to Concatenate Two Strings
How to Convert Defaultdict to Dict
Python Pip Install Throws Typeerror: Unsupported Operand Type(S) for -=: 'Retry' and 'Int'