Reading an Excel Named Range into a Pandas Dataframe

Maybe someday pandas will support this natively. Until then, I use a helper function:

import pandas as pd
import openpyxl

def data_frame_from_xlsx(xlsx_file, range_name):
    """ Get a single rectangular region from the specified file.
    range_name can be a standard Excel reference ('Sheet1!A2:B7') or
    refer to a named region ('my_cells')."""
    wb = openpyxl.load_workbook(xlsx_file, data_only=True, read_only=True)
    if '!' in range_name:
        # passed a worksheet!cell reference
        ws_name, reg = range_name.split('!')
        if ws_name.startswith("'") and ws_name.endswith("'"):
            # optionally strip single quotes around sheet name
            ws_name = ws_name[1:-1]
        region = wb[ws_name][reg]
    else:
        # passed a named range; find the cells in the workbook
        # (get_named_range() belongs to older openpyxl releases; it was later
        # deprecated in favor of wb.defined_names[name])
        full_range = wb.get_named_range(range_name)
        if full_range is None:
            raise ValueError(
                'Range "{}" not found in workbook "{}".'.format(range_name, xlsx_file)
            )
        # convert to list (openpyxl 2.3 returns a list but 2.4+ returns a generator)
        destinations = list(full_range.destinations)
        if len(destinations) > 1:
            raise ValueError(
                'Range "{}" in workbook "{}" contains more than one region.'
                .format(range_name, xlsx_file)
            )
        ws, reg = destinations[0]
        # convert to worksheet object (openpyxl 2.3 returns a worksheet object
        # but 2.4+ returns the name of a worksheet)
        if isinstance(ws, str):
            ws = wb[ws]
        region = ws[reg]
    # an anonymous user suggested this to catch a single-cell range (untested):
    # if not isinstance(region, tuple): df = pd.DataFrame([[region.value]])
    df = pd.DataFrame([cell.value for cell in row] for row in region)
    return df
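
The worksheet!cell branch above splits on '!' and strips the quotes around the sheet name. Here is a minimal standalone sketch of just that parsing step, using only the standard library. The helper name split_sheet_reference is hypothetical (it is not part of openpyxl or pandas). It assumes that splitting on the last '!' is what you want, and that a doubled apostrophe inside a quoted sheet name stands for a single one, which is how Excel writes such names.

```python
def split_sheet_reference(range_name):
    """Split "'My Sheet'!A2:B7" into ('My Sheet', 'A2:B7').

    Hypothetical helper for illustration only. Returns (None, range_name)
    when there is no sheet part, e.g. for a named range such as 'my_cells'.
    """
    # rpartition splits on the LAST '!', so a '!' inside a quoted
    # sheet name does not break the split.
    sheet, sep, cells = range_name.rpartition('!')
    if not sep:
        return None, range_name
    if len(sheet) >= 2 and sheet.startswith("'") and sheet.endswith("'"):
        # Strip the surrounding quotes and unescape doubled apostrophes.
        sheet = sheet[1:-1].replace("''", "'")
    # Drop absolute-reference markers so '$A$2:$B$7' becomes 'A2:B7'.
    return sheet, cells.replace('$', '')
```

The returned pair could then be used as wb[sheet][cells], while a None sheet signals that the string should be looked up as a named range instead.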

Python Pandas dataframe reading exact specified range in an excel sheet

One way to do this is to use the openpyxl module.

Here's an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx',
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)

Pandas dataframe to Excel with Defined Name range

I fixed this by simply switching from openpyxl to XlsxWriter, which can create defined names through workbook.define_name():

https://xlsxwriter.readthedocs.io/example_defined_name.html?highlight=names

Reading a named range from excel - Python - xlrd

My approach is to work out the row and column coordinates from the range reference myself, although I still recommend openpyxl as the more intuitive option.

def col2int(s: str):
    # Convert column letters to a 1-based column number ('A' -> 1, 'AA' -> 27)
    weight = 1
    n = 0
    list_s = list(s)
    while list_s:
        n += (ord(list_s.pop()) - ord('A')+1) * weight
        weight *= 26
    return n

# ...
# Print the contents of the cells, given the range reference in ref
# (e.g. 'Sheet1!$A$5:$B$10') and the xlrd sheet in sht:
temp, col_start, row_start, col_end, row_end = ref.replace(':', '').split('$')
for row in range(int(row_start)-1, int(row_end)):
    for col in range(col2int(col_start)-1, col2int(col_end)):
        print(sht.cell(row, col).value)
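
The split-on-'$' trick above only works when every part of the reference is absolute. As a rough sketch of a more tolerant variant, the snippet below parses the reference with a regular expression and returns zero-based bounds that can be fed straight into range(). The names parse_range and _REF are hypothetical, and the pattern assumes a simple two-corner reference such as 'Sheet1!$A$5:$B$10' or 'A5:B10'.

```python
import re

def col2int(letters):
    """Convert Excel column letters to a 1-based number ('A' -> 1, 'AA' -> 27)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord('A') + 1)
    return n

# Optional "sheet!" prefix, then two corners with optional '$' markers.
_REF = re.compile(r"^(?:.*!)?\$?([A-Za-z]+)\$?(\d+):\$?([A-Za-z]+)\$?(\d+)$")

def parse_range(ref):
    """Return zero-based (row_start, row_stop, col_start, col_stop).

    The stop values are exclusive, so they can be passed directly to range().
    """
    m = _REF.match(ref)
    if m is None:
        raise ValueError('Not a two-corner range reference: {!r}'.format(ref))
    col_start, row_start, col_end, row_end = m.groups()
    return (int(row_start) - 1, int(row_end),
            col2int(col_start) - 1, col2int(col_end))
```

With these bounds, the nested loops become range(row_start, row_stop) and range(col_start, col_stop), and the cell lookup itself stays the same.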

How can I read a range('A5:B10') and place these values into a dataframe using openpyxl

Using openpyxl

Since you are looking for a user-friendly way to specify the range (like Excel's own syntax), and as Charlie Clark already suggested, you can use openpyxl.

The following utility function takes a range string and a worksheet and returns a pandas DataFrame:

from openpyxl import load_workbook
from openpyxl.utils import get_column_interval
import pandas as pd
import re

def load_workbook_range(range_string, ws):
    # Column letters of the two corners, e.g. 'B1:C2' -> 'B', 'C'
    col_start, col_end = re.findall("[A-Z]+", range_string)

    data_rows = []
    for row in ws[range_string]:
        data_rows.append([cell.value for cell in row])

    return pd.DataFrame(data_rows, columns=get_column_interval(col_start, col_end))

Usage:

wb = load_workbook(filename='excel-sheet.xlsx', 
read_only=True)
ws = wb.active
load_workbook_range('B1:C2', ws)

Output:

   B  C
0  5  6
1  8  9

Pandas only Solution

Given the following data in an excel sheet:

    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12

You can load it with the following command:
pd.read_excel('excel-sheet.xlsx')

To limit the data being read, the pandas.read_excel method offers a number of options. Use usecols, skiprows and skipfooter to select the specific subset that you want to load (older pandas releases called these parse_cols and skip_footer; those names have since been removed):

pd.read_excel(
    'excel-sheet.xlsx',    # name of excel sheet
    names=['B','C'],       # new column header
    skiprows=range(0,1),   # list of rows you want to omit at the beginning
    skipfooter=1,          # number of rows you want to skip at the end
    usecols='B:C'          # columns to parse (note the excel-like syntax)
)

Output:

   B  C
0  5  6
1  8  9

Some notes:

The read_excel API is not meant to support more complex selections. If you need a complex filter, it is much easier (and cleaner) to load all the data into a DataFrame and use the slicing and indexing mechanisms that pandas provides.
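
The load-everything-then-slice approach can be sketched as follows. To keep the snippet runnable without the file, the DataFrame is built in memory as a stand-in for the result of pd.read_excel('excel-sheet.xlsx'); the variable names subset and same are only illustrative.

```python
import pandas as pd

# Stand-in for pd.read_excel('excel-sheet.xlsx'): the sample sheet shown earlier.
df = pd.DataFrame({'A': [1, 4, 7, 10],
                   'B': [2, 5, 8, 11],
                   'C': [3, 6, 9, 12]})

# Positional slice: rows 1-2 and columns 1-2 (end bounds are exclusive).
subset = df.iloc[1:3, 1:3].reset_index(drop=True)

# Label-based equivalent: .loc slices include both end points.
same = df.loc[1:2, 'B':'C'].reset_index(drop=True)

print(subset)
```

Both selections pick the same B and C cells as the read_excel call above. The reset_index(drop=True) call only renumbers the rows from 0.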
