Removing Horizontal Lines in Image (Opencv, Python, Matplotlib)

Removing Horizontal Lines in image (OpenCV, Python, Matplotlib)

Obtain binary image. Load the image, convert to grayscale, then Otsu's threshold to obtain a binary black/white image.
Detect and remove horizontal lines. To detect horizontal lines, we create a special horizontal kernel and morph open to detect horizontal contours. From here we find contours on the mask and "fill in"
the detected horizontal contours with white to effectively remove the lines
Repair image. At this point the image may have gaps if the horizontal lines intersected through characters. To repair the text, we create a vertical kernel and morph close to reverse the damage

After converting to grayscale, we Otsu's threshold to obtain a binary image

Sample Image

image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Next we create a special horizontal kernel to detect horizontal lines. We draw these lines onto a mask and then find contours on the mask. To remove the lines, we fill in the contours with white

Detected lines

Sample Image

Mask

Sample Image

Filled in contours

Sample Image

# Remove horizontal
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(image, [c], -1, (255,255,255), 2)

The image currently has gaps. To fix this, we construct a vertical kernel to repair the image

Sample Image

# Repair image
repair_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,6))
result = 255 - cv2.morphologyEx(255 - image, cv2.MORPH_CLOSE, repair_kernel, iterations=1)

Note depending on the image, the size of the kernel will change. You can think of the kernel as (horizontal, vertical). For instance, to detect longer lines, we could use a (50,1) kernel instead. If we wanted thicker lines, we could increase the 2nd parameter to say (50,2).

Here's the results with the other images

Detected lines

Original -> Removed

Detected lines

Original -> Removed

Full code

import cv2

image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Remove horizontal
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(image, [c], -1, (255,255,255), 2)

# Repair image
repair_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,6))
result = 255 - cv2.morphologyEx(255 - image, cv2.MORPH_CLOSE, repair_kernel, iterations=1)

cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.imshow('result', result)
cv2.waitKey()

Remove horizontal lines with Open CV

So, I saw that working on the drawing separated from the paper would lead to a better result. I used MORPH_CLOSE to work on the paper and MORPH_OPEN for the lines in the inner part. I hope your daughter likes it :)

img = cv2.imread(r'E:\Downloads\i0RDA.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Remove horizontal lines
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY_INV,81,17)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))

# Using morph close to get lines outside the drawing
remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, horizontal_kernel, iterations=3)
cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
mask = np.zeros(gray.shape, np.uint8)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255),2)

# First inpaint
img_dst = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)

Sample Image

gray_dst = cv2.cvtColor(img_dst, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray_dst, 50, 150, apertureSize = 3)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))

# Using morph open to get lines inside the drawing
opening = cv2.morphologyEx(edges, cv2.MORPH_OPEN, horizontal_kernel)
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
mask = np.uint8(img_dst)
mask = np.zeros(gray_dst.shape, np.uint8)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255),2)

# Second inpaint
img2_dst = cv2.inpaint(img_dst, mask, 3, cv2.INPAINT_TELEA)

Sample Image

Removing horizontal underlines

All the answers so far seem to be using morphological operations. Here's something a bit different. This should give fairly good results if the lines are horizontal.

For this I use a part of your sample image shown below.

sample

Load the image, convert it to gray scale and invert it.

import cv2
import numpy as np
import matplotlib.pyplot as plt

im = cv2.imread('sample.jpg')
gray = 255 - cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

Inverted gray-scale image:

inverted-gray

If you scan a row in this inverted image, you'll see that its profile looks different depending on the presence or the absence of a line.

plt.figure(1)
plt.plot(gray[18, :] > 16, 'g-')
plt.axis([0, gray.shape[1], 0, 1.1])
plt.figure(2)
plt.plot(gray[36, :] > 16, 'r-')
plt.axis([0, gray.shape[1], 0, 1.1])

Profile in green is a row where there's no underline, red is for a row with underline. If you take the average of each profile, you'll see that red one has a higher average.

no-line line

So, using this approach you can detect the underlines and remove them.

for row in range(gray.shape[0]):
    avg = np.average(gray[row, :] > 16)
    if avg > 0.9:
        cv2.line(im, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
        cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)

cv2.imshow("gray", 255 - gray)
cv2.imshow("im", im)

Here are the detected underlines in red, and the cleaned image.

detected cleaned

tesseract output of the cleaned image:

Convthed as th(
shot once in the
she stepped fr<
brother-in-lawii
collect on life in
applied for man
to the scheme i|

Reason for using part of the image should be clear by now. Since personally identifiable information have been removed in the original image, the threshold wouldn't have worked. But this should not be a problem when you apply it for processing. Sometimes you may have to adjust the thresholds (16, 0.9).

The result does not look very good with parts of the letters removed and some of the faint lines still remaining. Will update if I can improve it a bit more.

UPDATE:

Dis some improvements; cleanup and link the missing parts of the letters. I've commented the code, so I believe the process is clear. You can also check the resulting intermediate images to see how it works. Results are a bit better.

1-clean

tesseract output of the cleaned image:

Convicted as th(
shot once in the
she stepped fr<
brother-in-law. ‘
collect on life ix
applied for man
to the scheme i|

2-clean

tesseract output of the cleaned image:

)r-hire of 29-year-old .
revolver in the garage ‘
red that the victim‘s h
{2000 to kill her. mum
250.000. Before the kil
If$| 50.000 each on bin
to police.

python code:

import cv2
import numpy as np
import matplotlib.pyplot as plt

im = cv2.imread('sample2.jpg')
gray = 255 - cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
# prepare a mask using Otsu threshold, then copy from original. this removes some noise
__, bw = cv2.threshold(cv2.dilate(gray, None), 128, 255, cv2.THRESH_BINARY or cv2.THRESH_OTSU)
gray = cv2.bitwise_and(gray, bw)
# make copy of the low-noise underlined image
grayu = gray.copy()
imcpy = im.copy()
# scan each row and remove lines
for row in range(gray.shape[0]):
    avg = np.average(gray[row, :] > 16)
    if avg > 0.9:
        cv2.line(im, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
        cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)

cont = gray.copy()
graycpy = gray.copy()
# after contour processing, the residual will contain small contours
residual = gray.copy()
# find contours
contours, hierarchy = cv2.findContours(cont, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(contours)):
    # find the boundingbox of the contour
    x, y, w, h = cv2.boundingRect(contours[i])
    if 10 < h:
        cv2.drawContours(im, contours, i, (0, 255, 0), -1)
        # if boundingbox height is higher than threshold, remove the contour from residual image
        cv2.drawContours(residual, contours, i, (0, 0, 0), -1)
    else:
        cv2.drawContours(im, contours, i, (255, 0, 0), -1)
        # if boundingbox height is less than or equal to threshold, remove the contour gray image
        cv2.drawContours(gray, contours, i, (0, 0, 0), -1)

# now the residual only contains small contours. open it to remove thin lines
st = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
residual = cv2.morphologyEx(residual, cv2.MORPH_OPEN, st, iterations=1)
# prepare a mask for residual components
__, residual = cv2.threshold(residual, 0, 255, cv2.THRESH_BINARY)

cv2.imshow("gray", gray)
cv2.imshow("residual", residual)   

# combine the residuals. we still need to link the residuals
combined = cv2.bitwise_or(cv2.bitwise_and(graycpy, residual), gray)
# link the residuals
st = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 7))
linked = cv2.morphologyEx(combined, cv2.MORPH_CLOSE, st, iterations=1)
cv2.imshow("linked", linked)
# prepare a msak from linked image
__, mask = cv2.threshold(linked, 0, 255, cv2.THRESH_BINARY)
# copy region from low-noise underlined image
clean = 255 - cv2.bitwise_and(grayu, mask)
cv2.imshow("clean", clean)
cv2.imshow("im", im)

How to remove horizontal and vertical lines from an image

Your case is less trivial than the one provided in the tutorial that you have based your solution on. With this approach you will not be able to filter the lines in 100%, because of the fact that horizontal parts of the characters will sometimes be treated as lines.

Depends on your expectations (which you haven't really specified) and specifically the accuracy that you expect, you might want to try to find the characters instead of finding the line. That should provide you with more robustness.

Regarding your code, by adding few lines of code right after finding horizontal lines on the image (before verticalsize = rows / 30 line of code), you can get some results. I've worked on a half size image.

Result with horizontalsize = int(cols/30)

Result with horizontalsize = int(cols/15)

Again, I'm stressing that those will never be accurate with that approach in your case. Here's the snippet:

#inverse the image, so that lines are black for masking
horizontal_inv = cv2.bitwise_not(horizontal)
#perform bitwise_and to mask the lines with provided mask
masked_img = cv2.bitwise_and(img, img, mask=horizontal_inv)
#reverse the image back to normal
masked_img_inv = cv2.bitwise_not(masked_img)
cv2.imshow("masked img", masked_img_inv)
cv2.imwrite("result2.jpg", masked_img_inv)
cv2.waitKey(0)
cv2.destroyAllWindows()

Try playing with horizontalsize if the images I provided are somewhat satisfactory. I've also used int conversion, since that's what the getStructuringElement function expects: horizontalsize = int(cols / 30).

You can also try some smoothing and morphology on the result. That should make the characters a little bit more readable.

Removing slant horizontal lines in image

Here's an approach:

Convert image to grayscale
Otsu's threshold to get binary image
Create horizontal kernel and morph open to detect lines
Find contours and draw in detected lines

After converting to grayscale, we Otsu's threshold to obtain a binary image

Sample Image

image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Now we create a special horizontal kernel to detect horizontal lines then morph open to obtain a mask of the detected lines

Sample Image

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

Here's the detected lines drawn on the original image

Sample Image

From here we find contours on this mask and draw them in to effectively remove the horizontal lines to get our result

Sample Image

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (255,255,255), 3)

Now that the horizontal lines are removed, to repair the text, you can try cv2.MORPH_CLOSE with a cv2.MORPH_CROSS kernel and experiment with various kernel sizes. There is a tradeoff between dilating too much to close the holes as the detail in the text will be lost. Another approach is to use image inpainting to fill in the holes. I'll leave this step to you

Full code

import cv2

image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (255,255,255), 3)

cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()

Removing Horizontal Lines in Image (Opencv, Python, Matplotlib)