How to Concatenate Three Excel (.xlsx) Files Using Python

In Python, how to concatenate corresponding sheets in multiple Excel files

I would iterate over each file, and then over each worksheet, adding each sheet to a different list based on the sheet name.

Then you'll have a structure like...

{
    'sheet1': [df_file1_sheet1, df_file2_sheet1, df_file3_sheet1],
    'sheet2': [df_file1_sheet2, df_file2_sheet2, df_file3_sheet2],
    'sheet3': [df_file1_sheet3, df_file2_sheet3, df_file3_sheet3],
}

Then concatenate each list into a single dataframe, and write the three dataframes to an Excel file.

# This part is just your own code, I've added it here because you
# couldn't figure out where `excel_files` came from
#################################################################

import os
import pandas as pd

path = r'mypath'
os.chdir(path)  # os.chdir() returns None, so keep the path in its own variable
files = os.listdir()

# pull files with `.xlsx` extension
excel_files = [file for file in files if file.endswith('.xlsx')]

# This part is my actual answer
###############################

from collections import defaultdict

worksheet_lists = defaultdict(list)
for file_name in excel_files:
    workbook = pd.ExcelFile(file_name)
    for sheet_name in workbook.sheet_names:
        worksheet = workbook.parse(sheet_name)
        worksheet['source'] = file_name
        worksheet_lists[sheet_name].append(worksheet)

worksheets = {
    sheet_name: pd.concat(sheet_list)
    for (sheet_name, sheet_list)
    in worksheet_lists.items()
}

# The context manager saves and closes the file on exit;
# `writer.save()` was removed in pandas 2.0.
with pd.ExcelWriter('family_reschedule.xlsx') as writer:
    for sheet_name, df in worksheets.items():
        df.to_excel(writer, sheet_name=sheet_name, index=False)
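
The grouping step can be tried without any Excel files on disk. The following is a minimal sketch, assuming each workbook has already been read into a dictionary of DataFrames (the shape that `pd.read_excel(..., sheet_name=None)` returns). The helper name `concat_by_sheet` and the sample data are made up for illustration.

```python
from collections import defaultdict

import pandas as pd


def concat_by_sheet(workbooks):
    """Combine same-named sheets from several workbooks.

    `workbooks` maps a file name to a {sheet name: DataFrame} dict.
    Hypothetical helper, not part of pandas.
    """
    sheet_lists = defaultdict(list)
    for file_name, sheets in workbooks.items():
        for sheet_name, df in sheets.items():
            # assign() returns a copy, so the caller's frames are untouched
            sheet_lists[sheet_name].append(df.assign(source=file_name))
    # ignore_index=True renumbers the rows instead of repeating 0, 1, 2, ...
    return {
        name: pd.concat(frames, ignore_index=True)
        for name, frames in sheet_lists.items()
    }


# Made-up stand-ins for two workbooks
workbooks = {
    'a.xlsx': {'sheet1': pd.DataFrame({'x': [1, 2]}),
               'sheet2': pd.DataFrame({'y': [10]})},
    'b.xlsx': {'sheet1': pd.DataFrame({'x': [3]}),
               'sheet2': pd.DataFrame({'y': [20, 30]})},
}
combined = concat_by_sheet(workbooks)
print(combined['sheet1'])
```

Each value in `combined` can then be written to its own worksheet, as in the answer above.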

Python: How to Combine (concat) Multiple Excel Files into One File? (Not append)

Try this code to loop through the files and combine them:

# Budget Roll-up
# Used to roll-up individual budgets into one master budget

#import libraries
import pandas as pd
import glob

# import excel files
path = '*.xlsx'
files = glob.glob(path)

# loop thru
combined_files = pd.DataFrame()
for i in files:
    df = pd.read_excel(i, index_col=None,
                       skiprows=11, nrows=147, usecols='D:P')
    df.rename(columns={df.columns[0]: 'test'}, inplace=True)
    df.set_index('test', inplace=True)
    # add cell by cell, aligned on the 'test' labels; a cell missing from one side counts as 0
    combined_files = combined_files.add(df, fill_value=0, axis=1)

# index=False leaves the 'test' labels out of the output; drop it to keep them
combined_files.to_excel('output.xlsx', index=False)
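
To see why this sums the files rather than appending them, here is a small sketch of `DataFrame.add` with `fill_value=0` on in-memory frames. The helper `roll_up` and the budget figures are invented for the example, and it starts from the first frame instead of an empty DataFrame.

```python
from functools import reduce

import pandas as pd


def roll_up(frames):
    """Sum DataFrames cell by cell, aligning on row and column labels."""
    # fill_value=0 treats a cell missing from one frame as 0;
    # a cell missing from both frames stays NaN.
    return reduce(lambda total, df: total.add(df, fill_value=0), frames)


# Made-up budgets: rows are categories, columns are months
budget_a = pd.DataFrame({'jan': [1, 2]}, index=['rent', 'food'])
budget_b = pd.DataFrame({'jan': [10, 20], 'feb': [5, 5]},
                        index=['rent', 'travel'])

total = roll_up([budget_a, budget_b])
print(total)
```

Rows and columns that appear in only one budget are kept, and cells present in both are added together.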

How can I concatenate multiple excel sheets in a single file to one single file with an extra column that contains original sheet name?

import pandas as pd

single_file = pd.read_excel('multiple_sheets.xlsx', sheet_name=None)
single_file = pd.concat([sheet.assign(identifier=i) for i, sheet in single_file.items()])

With sheet_name=None, pd.read_excel returns a dictionary that maps each sheet name to its dataframe, so iterating over items() yields every sheet together with its name.
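
As an alternative sketch (not what the answer above uses), `pd.concat` also accepts the dictionary directly and puts the keys into the index, from where they can be moved into a column. The dictionary below is a made-up stand-in for the result of `pd.read_excel(..., sheet_name=None)`, and the level name `row` is arbitrary.

```python
import pandas as pd

# Made-up stand-in for pd.read_excel(..., sheet_name=None)
sheets = {
    'Sheet1': pd.DataFrame({'value': [1, 2]}),
    'Sheet2': pd.DataFrame({'value': [3]}),
}

combined = (
    pd.concat(sheets, names=['identifier', 'row'])
    .reset_index(level='identifier')  # move the sheet name from the index to a column
    .reset_index(drop=True)           # renumber the rows 0..n-1
)
print(combined)
```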

How to merge multiple .xls files with hyperlinks in python?

Inspired by @Kunal, I managed to write code that avoids using pandas. The .xls files are read with xlrd and written to a new Excel file with xlwt. Hyperlinks are maintained. Because xlwt only writes the legacy .xls format, the output file is saved with an .xls extension:

import os
import xlwt
from xlrd import open_workbook

# read and combine data
directory = "random_directory"
required_files = os.listdir(directory)

#Define new file and sheet to get files into
new_file = xlwt.Workbook(encoding='utf-8', style_compression = 0)
new_sheet = new_file.add_sheet('Sheet1', cell_overwrite_ok = True)

#Initialize header row, can be done with any file
old_file = open_workbook(directory+"/"+required_files[0], formatting_info=True)
old_sheet = old_file.sheet_by_index(0)
for column in range(old_sheet.ncols):
    new_sheet.write(0, column, old_sheet.cell(0, column).value) #To create header row

#Add rows from all files present in folder
for file in required_files:
    old_file = open_workbook(directory+"/"+file, formatting_info=True)
    old_sheet = old_file.sheet_by_index(0) #Define old sheet
    hyperlink_map = old_sheet.hyperlink_map #Create map of all hyperlinks
    for row in range(1, old_sheet.nrows): #We need all rows except header row
        if row-1 < len(hyperlink_map.items()): #Statement to ensure we do not go out of range on the lower side of hyperlink_map.items()
            Row_depth = len(new_sheet._Worksheet__rows) #We need row depth to know where to add new row (private xlwt attribute)
            for col in range(old_sheet.ncols): #For every column we need to add row cell
                if col == 1: #We need to make an exception for column 2 being the hyperlinked column
                    click = list(hyperlink_map.items())[row-1][1].url_or_path #define URL
                    new_sheet.write(Row_depth, col, xlwt.Formula('HYPERLINK("{}", "{}")'.format(click, old_sheet.cell(row, 1).value)))
                else: #If not hyperlinked column
                    new_sheet.write(Row_depth, col, old_sheet.cell(row, col).value) #Write cell

new_file.save("random_directory/output_file.xls")
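
The code above picks each link by its position in `hyperlink_map.items()`, which assumes every data row has exactly one link and that they come in row order. xlrd documents `hyperlink_map` as a mapping keyed by `(rowx, colx)`, so a lookup by cell position is a possible alternative. The following is only a sketch: `hyperlink_formula` is a hypothetical helper, the map is a plain-dict stand-in with made-up URLs, and no escaping of quotes inside the URL or text is attempted.

```python
from types import SimpleNamespace


def hyperlink_formula(hyperlink_map, row, col, text):
    """Return a HYPERLINK formula string for a cell, or None if it has no link.

    Hypothetical helper: looks the link up by its (row, col) key
    instead of by its position in the map.
    """
    link = hyperlink_map.get((row, col))
    if link is None:
        return None
    return 'HYPERLINK("{}", "{}")'.format(link.url_or_path, text)


# Stand-in for old_sheet.hyperlink_map; the URLs are made up
fake_map = {
    (1, 1): SimpleNamespace(url_or_path='https://example.com/a'),
    (3, 1): SimpleNamespace(url_or_path='https://example.com/c'),
}
first = hyperlink_formula(fake_map, 1, 1, 'first')
second = hyperlink_formula(fake_map, 2, 1, 'second')
print(first)
print(second)
```

A non-None result would be wrapped in `xlwt.Formula(...)` as in the loop above, while a `None` result means the cell can be written as a plain value.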

How can I combine or merge all worksheets within an Excel file into one worksheet using Python?

If you want to do what is said in the title, you could do this solely with pandas, as pd.read_excel(path_input, sheet_name=None) can read all worksheets of a workbook in one pass:

import pandas as pd

path_input = r"test.xlsx"
path_save = r"finished.xlsx"

df_lst = pd.read_excel(path_input, sheet_name=None).values()
df_lst = [dfx.transpose().reset_index().transpose() for dfx in df_lst]
df_result = pd.concat(df_lst, ignore_index=True)
df_result.to_excel(path_save, index=False, header=False)

It would also be possible to do this with xlwings or openpyxl, but usually pandas is fast.
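
The `transpose().reset_index().transpose()` step is what lets sheets with different headers be stacked: it moves each sheet's column labels into its first data row and leaves plain integer column positions behind. A minimal sketch on in-memory frames, where `header_to_row` is a made-up name and the data is shortened from the example in this answer:

```python
import pandas as pd


def header_to_row(df):
    """Return a copy of df whose column labels become the first data row."""
    return df.transpose().reset_index().transpose()


# Made-up sheets with different widths
sheet_a = pd.DataFrame({'a': ['foo', 'bar'], 'b': ['cor', 'gra']})
sheet_b = pd.DataFrame({'u': [12], 'v': [92], 'w': [86]})

stacked = pd.concat([header_to_row(sheet_a), header_to_row(sheet_b)],
                    ignore_index=True)
print(stacked)
```

Narrower sheets are padded with NaN on the right, which `to_excel` writes as empty cells.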

Example with data
Assume an Excel workbook with three worksheets.

Worksheet1:

a   b   c
foo cor wal
bar gra plu
baz ult xyz
qux ply thu

Worksheet2:

u   v   w   x   y   z
12 92 86 22 80
23 29 74 21
16 10 75 67 61 99

Worksheet3:

I   II  III IV
1 5 9 1
2 6 0 6
3 7 3
4 8 2 0

Final output (after executing this snippet, i.e. after to_excel):

a   b   c
foo cor wal
bar gra plu
baz ult xyz
qux ply thu
u v w x y z
12 92 86 22 80
23 29 74 21
16 10 75 67 61 99
I II III IV
1 5 9 1
2 6 0 6
3 7 3
4 8 2 0

