Possible to Loop Through Excel Files With Differently Named Sheets, and Import into a List


Another (fast) R solution, using the readxl package:

# sheet = 1 selects the first sheet by position, so the sheet names do not matter
l <- lapply(file.list, readxl::read_excel, sheet = 1)

xlsx R: looping through list of files to check all sheet names; create a blank sheet if it does not exist

I'm old skool. I like a for loop. That's what happens when you come from other languages!

So I'd skip the lapply, which hides what is going on, and do this instead:

library(readxl) # for excel_sheets()

new_data = "" # a placeholder for the data you will insert in a new empty sheet

for (file in all_files) {
  # open the file and get its sheets
  sheets = excel_sheets(file)

  # check if the file has the sheet you want
  if ("sheet1" %in% sheets) {
    # do nothing
  } else {
    # if not - create a sheet
    xlsx::write.xlsx(new_data,
                     file, # You may need to add the path?
                     sheetName = "sheet1",
                     append = TRUE)
  } # if you are checking for all 3 sheets and adding any that are missing, add another for loop?
}


The R purists will say this is slow and inefficient. My question would be how fast does it need to be? How readable does it need to be?
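For comparison, here is a rough Python sketch of the same check using openpyxl, extended to the "all 3 sheets" case from the code comment. The names in `REQUIRED_SHEETS` and the helpers `missing_sheets` and `add_missing_sheets` are hypothetical; `openpyxl.load_workbook`, `Workbook.sheetnames`, `Workbook.create_sheet` and `Workbook.save` are the openpyxl calls it relies on.

```python
from typing import Iterable, List

# Hypothetical sheet names; replace with the ones every workbook must contain.
REQUIRED_SHEETS = ["sheet1", "sheet2", "sheet3"]


def missing_sheets(existing: Iterable[str], required: Iterable[str] = REQUIRED_SHEETS) -> List[str]:
    """Return the required sheet names absent from `existing`, in required order."""
    present = set(existing)
    return [name for name in required if name not in present]


def add_missing_sheets(path: str, required: Iterable[str] = REQUIRED_SHEETS) -> List[str]:
    """Open an .xlsx file, append a blank sheet for each missing name, and save it."""
    import openpyxl  # imported here so missing_sheets() works without openpyxl installed

    wb = openpyxl.load_workbook(path)
    added = missing_sheets(wb.sheetnames, required)
    for name in added:
        wb.create_sheet(title=name)  # new, empty worksheet appended at the end
    if added:
        wb.save(path)  # only rewrite files that actually changed
    return added
```

Usage would be something like `for path in glob.glob("*.xlsx"): print(path, add_missing_sheets(path))`. Since openpyxl does not read every kind of Excel object, it is worth testing on copies first: items such as charts or images may be lost when a file is loaded and saved back.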

Python Pandas - loop through folder of .xlsx files, only add data from Excel tabs with xx.xx in the name using regex

You're really close; you just have to filter the sheet names with re.match. Loop through the Excel files and, for each one, open it and get the list of tab names (excel_file.sheet_names). Use re.match with the expression you already defined to keep only the tabs that match the desired pattern. Read the content of those sheets (sheet_name=valid_sheets), adjusting the header and index arguments as needed for your particular case, and add the extracted content of each Excel file to a list. Finally, concatenate the list with pd.concat and write the new Excel file.

import pandas as pd
import os
import re

# filenames
files = os.listdir()
excel_names = list(filter(lambda f: f.endswith('.xlsx'), files))

regex = r'[0-9][0-9]+\.[0-9][0-9]'

frame_list = []
# loop through each Excel file
for name in excel_names:
    # open one excel file (the context manager closes it afterwards)
    with pd.ExcelFile(name, engine='openpyxl') as excel_file:
        # get the list of tabs whose name starts with xx.xx
        # (re.match anchors at the start; use re.search to find xx.xx anywhere)
        valid_sheets = [tab for tab in excel_file.sheet_names if re.match(regex, tab)]
        # read the content from that tab list; a list returns {sheet name: DataFrame}
        d = excel_file.parse(sheet_name=valid_sheets, header=0)
    # add the content to the frame list
    frame_list += list(d.values())

combined = pd.concat(frame_list)
combined.to_excel("combinedfiles.xlsx", header=False, index=False)
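One detail worth checking before running this: `re.match` only matches at the start of the tab name, while `re.search` finds the pattern anywhere in it. The short sketch below uses made-up tab names to compare the two filters, so you can pick the one that fits how your tabs are actually named.

```python
import re

regex = r'[0-9][0-9]+\.[0-9][0-9]'  # same pattern as in the answer

# Made-up tab names for illustration only.
tabs = ['12.34', '12.34 Sales', 'Sales 12.34', 'Summary', '1.23']

# re.match: the pattern must start at the first character of the tab name.
starts_with = [tab for tab in tabs if re.match(regex, tab)]

# re.search: the pattern may appear anywhere in the tab name.
contains = [tab for tab in tabs if re.search(regex, tab)]

print(starts_with)  # ['12.34', '12.34 Sales']
print(contains)     # ['12.34', '12.34 Sales', 'Sales 12.34']
```

Note that '1.23' matches neither, because `[0-9][0-9]+` requires at least two digits before the dot.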

Iterate through excel files' sheets and append if sheet names share common part in Python

Try:

import pandas as pd

dfs = pd.read_excel('Downloads/WS_1.xlsx', sheet_name=None, index_col=[0])

df_out = pd.concat(dfs.values(), keys=dfs.keys())

# group rows by the part of the sheet name after the last '_' and write one file per group
for n, g in df_out.groupby(df_out.index.to_series().str[0].str.rsplit('_', n=1).str[-1]):
    g.droplevel(level=0).dropna(how='all', axis=1).reset_index(drop=True).to_excel(f'Out_{n}.xlsx')
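The grouping key above is everything after the last underscore of the sheet name (`str.rsplit('_', n=1).str[-1]`). A plain-Python sketch of that key, run on hypothetical sheet names, makes it easier to predict which sheets end up together; note that a name with no underscore forms a group of its own. The helper `group_by_suffix` is illustrative, not part of the answer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_suffix(sheet_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group sheet names by the text after their last underscore."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sheet_names:
        # rsplit('_', 1)[-1] mirrors .str.rsplit('_', n=1).str[-1] in pandas;
        # a name without '_' is returned whole and forms its own group.
        groups[name.rsplit('_', 1)[-1]].append(name)
    return dict(groups)


# Hypothetical sheet names for illustration.
print(group_by_suffix(['Jan_North', 'Feb_North', 'Jan_South', 'Summary']))
# {'North': ['Jan_North', 'Feb_North'], 'South': ['Jan_South'], 'Summary': ['Summary']}
```

With the code above, these groups would be written to Out_North.xlsx, Out_South.xlsx and Out_Summary.xlsx.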

Update (combining the sheets of several files into one output workbook):

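import glob
import pandas as pd

output_path = 'Downloads/test_data/Output_file.xlsx'
# skip the output workbook itself if it is left over from a previous run
files = [f for f in glob.glob("Downloads/test_data/*.xlsx") if not f.endswith('Output_file.xlsx')]

frames, keys = [], []
for each in files:
    dfs = pd.read_excel(each, sheet_name=None, index_col=[0])
    # keep every sheet of every file; dict.update() would keep only one sheet per name,
    # and concatenating `dfs` after the loop would keep only the last file
    for sheet_name, df in dfs.items():
        frames.append(df)
        keys.append(sheet_name)

df_out = pd.concat(frames, keys=keys)

# the context manager saves and closes the file (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer:
    for n, g in df_out.groupby(df_out.index.to_series().str[0].str.rsplit('_', n=1).str[-1]):
        g.droplevel(level=0).dropna(how='all', axis=1).reset_index(drop=True).to_excel(writer, index=False, sheet_name=f'{n}')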

Loop in order to create several DataFrames for each sheet in an Excel file

You can use exec() for this: it executes Python code dynamically, given either as a string or as a code object.

You can get the sheet names with the xlrd library, but xlrd 2.0 and later only reads legacy .xls files. For .xlsx files, pandas provides them directly through pd.ExcelFile(filename).sheet_names:

import pandas as pd

filename = 'try.xlsx'
# xlrd.open_workbook(filename, on_demand=True).sheet_names() also works, but with xlrd >= 2.0 only for .xls
sheet_names = pd.ExcelFile(filename).sheet_names

print(sheet_names)

Output:

['see1', 'see2', 'Sheet3']

Now that you have the sheet names, you can loop over them and use exec to create DataFrames with the same names:

for name in sheet_names:
    exec(f"{name}=pd.read_excel('{filename}', sheet_name='{name}')")

This creates DataFrames in variables named see1, see2 and Sheet3.

print(see1)

Output:

   Col1  COl2
0     1     2
1     2     3
2     3     4
3     4     4

Hope this is what you need.

NOTE: If a sheet name consists only of digits, it cannot be used as a variable name, so you have to give the variable a different name.

So just for the OP's case, here's a solution:

for name in sheet_names:
    if name.isdigit():
        exec(f"Sheet_name{name}=pd.read_excel('{filename}', sheet_name='{name}')")

    else:
        exec(f"{name}=pd.read_excel('{filename}', sheet_name='{name}')")

If a sheet name is purely numeric, this code names the variable Sheet_name followed by the number, e.g. Sheet_name245.

In my case, the sheet names were ['Sheet1', '245', 'Sheet3'], and the second sheet ends up as a DataFrame in the variable Sheet_name245:

print(Sheet_name245)

Output:

   Col1  Col2
0     1     4
1     2     5
2     3     6

Hope this helps with your case.

NOTE2: If the sheet name is a decimal number rather than an integer (e.g. 245.63), the code above fails: isdigit() returns False, and a dot cannot appear in a variable name either. Here is a workaround:

for name in sheet_names:
    if name.isdigit():
        exec(f"Sheet_name{name}=pd.read_excel('{filename}', sheet_name='{name}')")

    elif '.' in name:
        temp_name = name.replace('.', '_')
        exec(f"Sheet_name{temp_name}=pd.read_excel('{filename}', sheet_name='{name}')")

    else:
        exec(f"{name}=pd.read_excel('{filename}', sheet_name='{name}')")

A sheet named 245.63 now becomes the variable Sheet_name245_63. I hope this resolves your issue.
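The workarounds above handle digits and dots case by case, but spaces, hyphens or Python keywords in a sheet name would still break the exec call. A more general option, sketched here as an alternative rather than the original answer's method, is to keep the DataFrames in a dict (pd.read_excel with sheet_name=None already returns one) and, only if you want identifier-like keys, normalise each name. The helpers `to_identifier` and `load_sheets` and the `Sheet_name` prefix rule are assumptions, chosen to reproduce the numeric examples above.

```python
import keyword
import re
from typing import Any, Dict


def to_identifier(sheet_name: str, prefix: str = "Sheet_name") -> str:
    """Map an arbitrary sheet name to a valid Python identifier."""
    # Replace every non-word character ('.', ' ', '-', ...) with '_'.
    candidate = re.sub(r"\W", "_", sheet_name)
    # Names that start with a digit, are empty, or are keywords get the prefix.
    if not candidate.isidentifier() or keyword.iskeyword(candidate):
        candidate = prefix + candidate
    return candidate


def load_sheets(filename: str) -> Dict[str, Any]:
    """Read every sheet once and key the DataFrames by identifier-safe names."""
    import pandas as pd  # imported here so to_identifier() has no pandas dependency

    frames = pd.read_excel(filename, sheet_name=None)  # {sheet name: DataFrame}
    return {to_identifier(name): df for name, df in frames.items()}
```

With this, `frames = load_sheets('try.xlsx')` gives `frames['see1']` or `frames['Sheet_name245_63']` instead of loose variables, and the workbook is read only once. Two different sheet names can map to the same key (for example 'a.b' and 'a b'), in which case the later sheet overwrites the earlier one.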

Loop through Excel sheets in Python

You can read all sheets at once by passing sheet_name=None, which makes read_excel return a dict of {sheet name: DataFrame}:

dict_of_frames = pd.read_excel(f, sheet_name=None)

Full example:

import glob
import pandas as pd

all_sheets = []
for f in glob.glob(r'C:\Users\Sarah\Desktop\test\*.xlsx'):
    # each call returns {sheet name: DataFrame}; keep just the DataFrames
    all_sheets.extend(pd.read_excel(f, sheet_name=None).values())
data = pd.concat(all_sheets)
data.to_excel(r'C:\Users\Sarah\Desktop\test\appended.xlsx')
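Once all sheets are stacked like this, the result no longer records which file and sheet each row came from. A small variation, assuming you want that provenance as columns, works directly on the dicts that `pd.read_excel(f, sheet_name=None)` returns. The helper `stack_sheets` and the column names `source_file` and `source_sheet` are illustrative choices, not part of the original answer.

```python
from typing import Dict

import pandas as pd


def stack_sheets(sheets_by_file: Dict[str, Dict[str, pd.DataFrame]]) -> pd.DataFrame:
    """Concatenate {file: {sheet: DataFrame}} into one frame, tagging each row's origin."""
    parts = []
    for file_name, sheets in sheets_by_file.items():
        for sheet_name, df in sheets.items():
            # assign() returns a copy with two constant columns added
            parts.append(df.assign(source_file=file_name, source_sheet=sheet_name))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each sheet's index
    return pd.concat(parts, ignore_index=True)
```

Usage with the folder from the example: `data = stack_sheets({f: pd.read_excel(f, sheet_name=None) for f in glob.glob(r'C:\Users\Sarah\Desktop\test\*.xlsx')})`. As with any pd.concat call, an empty input (no matching files) raises a ValueError, so check the file list first if that can happen.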

