Use Python to Extract Images from Excel Sheets

Use Python to extract images from Excel sheets

You can grab images from an existing Excel file like this (Windows only, with Excel installed):

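from PIL import Image, ImageGrab
import win32com.client as win32

excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'C:\Users\file.xlsx')

for sheet in workbook.Worksheets:
    for i, shape in enumerate(sheet.Shapes):
        if shape.Name.startswith('Picture'):  # or try 'Image'
            shape.Copy()  # put the picture on the clipboard
            image = ImageGrab.grabclipboard()
            if not isinstance(image, Image.Image):  # clipboard held no image
                continue
            # prefix with the sheet index so sheets do not overwrite each other's files
            image.convert('RGB').save('{}_{}.jpg'.format(sheet.Index, i + 1), 'jpeg')

workbook.Close(False)  # close without saving changes
excel.Quit()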
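
The COM approach above needs Windows and an installed copy of Excel. As a possible cross-platform alternative for .xlsx files (not the older .xls format): an .xlsx file is a ZIP archive, and embedded pictures are stored under xl/media/ inside it. The following is a minimal sketch using only the standard library; the function name extract_media and the default output folder are placeholders chosen for illustration. Note that this route does not tell you which sheet or cell a picture belongs to.

```python
import zipfile
from pathlib import Path


def extract_media(xlsx_path, out_dir="extracted_images"):
    """Copy every file under xl/media/ in the workbook into out_dir.

    Returns the list of paths written. (Hypothetical helper, not a library API.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Embedded pictures live in the xl/media/ folder of the archive
            if name.startswith("xl/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

Calling extract_media('myfile.xlsx') would then write files such as image1.png into the output folder, keeping the names Excel gave them inside the archive.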

Extract images from Excel file with python

I found a solution using the openpyxl and openpyxl-image-loader modules.

# installing the modules
pip3 install openpyxl
pip3 install openpyxl-image-loader

Then, in the script:

# import the modules
import openpyxl
from openpyxl_image_loader import SheetImageLoader

# load the Excel file and the sheet
pxl_doc = openpyxl.load_workbook('myfile.xlsx')
sheet = pxl_doc['Sheet_name']

# create the image loader for that sheet
image_loader = SheetImageLoader(sheet)

# get the image (put the cell you need instead of 'A1')
image = image_loader.get('A1')

# show the image
image.show()

# save the image (JPEG has no alpha channel, so convert to RGB first)
image.convert('RGB').save('my_path/image_name.jpg')

With this, I can loop over the rows and store each image's path and file name in my dictionaries.
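
A rough sketch of what such a per-row loop could look like. The helper name save_row_images, the column letter, and the image_<row>.png naming scheme are assumptions made for illustration; the loader argument is expected to be a SheetImageLoader, of which only the image_in() and get() methods are used. PNG is used here so that images with transparency save without conversion.

```python
from pathlib import Path


def save_row_images(loader, rows, column="A", out_dir="my_path"):
    """Save the image anchored in `column` for each row; return {row: path}.

    `loader` is expected to behave like SheetImageLoader (image_in / get).
    (Hypothetical helper, not part of openpyxl-image-loader.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = {}
    for row in rows:
        cell = f"{column}{row}"
        if not loader.image_in(cell):  # skip rows without a picture
            continue
        image = loader.get(cell)  # a Pillow image
        path = out / f"image_{row}.png"
        image.save(path)
        saved[row] = str(path)
    return saved
```

With the sheet and image_loader from the script above, a call could look like save_row_images(image_loader, range(2, sheet.max_row + 1)), assuming the first row is a header.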

Export Images From Excel using Python with specific name

The following script extracts all images from an Excel file and names each one after its "Channel name" value:

import re
from PIL import ImageGrab
import win32com.client as win32

FILE = r'C:\Users\user\Desktop\so\53994108\logo.xlsx'
# one tuple per sheet: (first data row, "Channel name" column number, picture column letter)
CELLS = [(4, 5, 'F'), (3, 3, 'D')]

excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(FILE)
for i, worksheet in enumerate(workbook.Sheets):
    row = CELLS[i][0]
    while True:
        name = worksheet.Cells(row, CELLS[i][1]).Value
        if not name:  # stop at the first empty name cell
            break
        name = re.sub(r'\W+ *', ' ', name)  # replace non-word characters with a space
        rng = worksheet.Range('{}{}'.format(CELLS[i][2], row))
        rng.CopyPicture(1, 2)  # xlScreen, xlBitmap
        im = ImageGrab.grabclipboard()
        im.save('{}.jpg'.format(name))
        row += 1

In the end I get one image file per row, each named after its "Channel name" value.
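
One caveat with the script above: re.sub expects a string, and numeric cells come back from Excel over COM as floats, so a numeric "Channel name" would raise a TypeError. A slightly more defensive naming helper might look like the sketch below; the name safe_filename and the 'unnamed' fallback are made up for illustration and are not part of the original answer.

```python
import re


def safe_filename(value, fallback="unnamed"):
    """Turn a cell value into a file-name-friendly string.

    (Hypothetical helper.) Non-word characters become single spaces,
    surrounding whitespace is trimmed, and empty results use `fallback`.
    """
    if value is None:
        return fallback
    # Excel hands whole numbers over as floats such as 7.0
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    text = re.sub(r"\W+", " ", str(value)).strip()
    return text or fallback


print(safe_filename("Discovery+ HD!"))  # Discovery HD
```

In the loop, im.save('{}.jpg'.format(safe_filename(name))) could then replace the re.sub line and the original save call.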

Is there a way to extract a picture from an Excel file using R? It could then be passed to Tesseract OCR

You can access the image path through the media field of your workbook object (wb@.xData$media in the code below).

Here's a reprex of plotting a PNG stored within an xlsx file:

require(png)
require(openxlsx)
require(grid)

wb <- openxlsx::loadWorkbook("~/img.xlsx")
img <- png::readPNG(wb@.xData$media[1])
grid::grid.newpage()
grid::grid.raster(img)

Created on 2020-03-04 by the reprex package (v0.3.0)


