Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer giving the zero-based sheet position (e.g. 0, 1), a list of sheet names or positions, or None. If a list is provided, it returns a dictionary where the keys are the sheet names/positions as given in the list and the values are the DataFrames. The default is to return only the first sheet (i.e. sheet_name=0).

If None is specified, all sheets are returned, as a {sheet_name:dataframe} dictionary.
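To make the different sheet_name forms concrete, here is a minimal sketch. It assumes pandas with openpyxl installed, and it writes its own throwaway workbook first; the file name, sheet names and data are made up for the example.

```python
import os
import tempfile

import pandas as pd

# Build a small throwaway workbook so the example is self-contained.
# The file name, sheet names and contents are invented for illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)

first = pd.read_excel(path)                            # default: first sheet only
by_name = pd.read_excel(path, sheet_name="Sheet2")     # one sheet, by name
by_pos = pd.read_excel(path, sheet_name=1)             # one sheet, by zero-based position
some = pd.read_excel(path, sheet_name=["Sheet1", 1])   # dict, keyed exactly as requested
everything = pd.read_excel(path, sheet_name=None)      # dict of all sheets, keyed by name

print(list(some))        # keys are 'Sheet1' and 1
print(list(everything))  # keys are the sheet names
```

Note that with a mixed list the dictionary keys are whatever you passed in, so a sheet requested by position is looked up by that integer, not by its name.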

I want to read Multiple sheets in excel into multiple dataframes in python

xls = pd.ExcelFile('path_to_file.xls')
sheets_names = ['Sheet1', 'Sheet2']
dfs = []
for sheet_name in sheets_names:
    # Append each sheet's DataFrame, in the same order as sheets_names
    dfs.append(pd.read_excel(xls, sheet_name))

How to open an excel file with multiple sheets in pandas?

Use the pandas read_excel() function, which accepts a sheet_name parameter:

import pandas as pd

df = pd.read_excel(excel_file_path, sheet_name="sheet_name")

Multiple data frames can be loaded by passing in a list. For a more in-depth explanation of how read_excel() works see: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

pd.read_excel - import multiple sheets with different settings

When you load multiple sheets using pandas.read_excel() as you have done here, the sheets will be stored in a dictionary with the key being the respective sheet names. When the skiprows argument is passed in loading multiple sheets this way, the specified number of top rows or the list of rows given will be skipped from all of the sheets.

For example, if your code is modified as follows,

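TuFile=pd.read_excel('TUp.xlsx', sheet_name=['T_up','Raw_Data','Base','Summary'], skiprows=8)

This will skip the top 8 rows in loading your data for all of the sheets. Note that it has to be the integer 8: a list such as [8] would instead skip only the single row at 0-based position 8.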

When a list of rows is specified,

TuFile=pd.read_excel('TUp.xlsx', sheet_name=['T_up','Raw_Data','Base','Summary'], skiprows=[1, 8])

This will skip the rows at positions 1 and 8 (counted from 0, so the second and ninth rows) in all of the sheets when loading the data.

Therefore, if you want the rows in just one of these sheets to be skipped, the best option would be to load that in separately by defining the sheet name and then load in the rest. Assuming that you want to skip rows only in the 'T_up' sheet and leave the rest intact, you could do something like this,

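# skiprows=8 skips the top 8 rows of 'T_up' only
TuFile_T_up=pd.read_excel('TUp.xlsx', sheet_name='T_up', skiprows=8)

# separate variable, so this call does not overwrite the 'T_up' result
TuFile=pd.read_excel('TUp.xlsx', sheet_name=['Raw_Data','Base','Summary'])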
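If several sheets each need their own options, one way to generalise the idea above is to keep the options in a dictionary and read the sheets one at a time from a single open workbook. This is only a sketch: the settings dictionary is a hypothetical helper, not a pandas feature, and the workbook is generated on the spot with made-up data (pandas plus openpyxl assumed).

```python
import os
import tempfile

import pandas as pd

# Throwaway workbook; sheet names and data are invented for the example.
path = os.path.join(tempfile.mkdtemp(), "settings_demo.xlsx")
with pd.ExcelWriter(path) as writer:
    # "T_up" has two junk rows above its real header row ("x", "y").
    pd.DataFrame(
        [["junk", None], ["junk", None], ["x", "y"], [1, 2], [3, 4]]
    ).to_excel(writer, sheet_name="T_up", index=False, header=False)
    pd.DataFrame({"p": [10, 20]}).to_excel(writer, sheet_name="Base", index=False)

# Hypothetical per-sheet options; each value is passed to read_excel as keyword arguments.
settings = {
    "T_up": {"skiprows": 2},  # integer: skip the first two rows
    "Base": {},               # read with the defaults
}

# Open the workbook once and parse each sheet with its own options.
with pd.ExcelFile(path) as xls:
    frames = {
        name: pd.read_excel(xls, sheet_name=name, **kwargs)
        for name, kwargs in settings.items()
    }

print(frames["T_up"])
```

The result has the same shape as what a list-valued sheet_name returns (a dictionary keyed by sheet name), so the rest of your code does not need to care how each sheet was loaded.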

split a workbook into different workbooks with worksheets using python pandas

I know this is a bit late, but perhaps better late than never...

I'm not sure what issue you ran into b/c it doesn't really say, but I suspect your issue was b/c you created a new writer for each sheet instead of each workbook. You also tried to write all months for all years and didn't create a new DF for each year.

Without testing, I can't say this is 100% working code, but I'd rearrange what you have to something like below. This should get you close.

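for value in each_year:
    # One filtered DataFrame and one workbook per year
    dfyear = df1[df1['year'] == value]
    output_file_name = str(value) + 'money.xlsx'
    writer = pd.ExcelWriter(output_file_name, engine='xlsxwriter')
    each_month = dfyear['month'].unique()
    for month in each_month:
        # One sheet per month: filter the rows, don't index by column name
        dfyear[dfyear['month'] == month].to_excel(writer, sheet_name=str(month), index=False)
    writer.close()  # writer.save() was removed in pandas 2.0

print('DataFrame written to Excel File successfully.')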
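For comparison, here is a self-contained variant of the same idea that you can actually run. It is a sketch under a few assumptions: the sample data is made up, groupby replaces the manual filtering, the default writer engine is used instead of xlsxwriter, and files go to a temporary folder (pandas plus openpyxl assumed).

```python
import os
import tempfile

import pandas as pd

# Made-up sample data; the 'year' and 'month' columns mirror the code above.
df1 = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "month": ["Jan", "Feb", "Jan", "Jan"],
    "amount": [100, 150, 200, 250],
})

out_dir = tempfile.mkdtemp()
written = []
for year, df_year in df1.groupby("year"):
    out_path = os.path.join(out_dir, f"{year}money.xlsx")
    # One writer per workbook; the context manager saves and closes it on exit.
    with pd.ExcelWriter(out_path) as writer:
        # sort=False keeps the months in order of first appearance.
        for month, df_month in df_year.groupby("month", sort=False):
            df_month.to_excel(writer, sheet_name=str(month), index=False)
    written.append(out_path)

# Read the first workbook back to check which sheets it contains.
check = pd.read_excel(written[0], sheet_name=None)
print(list(check))
```

Using the writer as a context manager means there is no save/close call to forget, which seems to be the usual cause of empty or missing output files.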

How to concat excels with multiple sheets into one excel?

I had to do something similar a while back:

This code should do the trick for you:

import pandas as pd
import os

collection = {}
for file in os.listdir():
    if file.endswith(".xlsx"):
        mysheets = pd.ExcelFile(file)
        mysheetnames = mysheets.sheet_names
        for i in mysheetnames[2:]:  # change the 2 in [2:] to change how many leading sheets are skipped
            mydata = pd.read_excel(mysheets, i)  # reuse the already opened workbook
            # Group the frames from every file by sheet name
            combi = collection.get(i, [])
            collection[i] = combi + [mydata]

writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

for key in collection:
    # Stack the same-named sheets from all files into one output sheet
    myresult = pd.concat(collection.get(key), sort=False)
    myresult.to_excel(writer, sheet_name=key)

writer.close()  # writer.save() was removed in pandas 2.0
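Here is a runnable sketch of the same collect-then-concat approach, in case you want to try it on dummy data first. It differs from the code above in a few labelled ways: it creates two small workbooks itself (all names and values are made up), it keeps every sheet rather than skipping the first two, and it writes the output to a separate folder so a rerun does not read output.xlsx back in as an input. pandas plus openpyxl assumed.

```python
import os
import tempfile
from collections import defaultdict

import pandas as pd

# Two throwaway workbooks with matching sheet names (names and data are made up).
src_dir = tempfile.mkdtemp()
for n in (1, 2):
    with pd.ExcelWriter(os.path.join(src_dir, f"book{n}.xlsx")) as writer:
        pd.DataFrame({"id": [n], "v": [n * 10]}).to_excel(writer, sheet_name="A", index=False)
        pd.DataFrame({"id": [n], "w": [n * 100]}).to_excel(writer, sheet_name="B", index=False)

# Collect the frames per sheet name across every workbook in the folder.
collection = defaultdict(list)
for file_name in sorted(os.listdir(src_dir)):
    if file_name.endswith(".xlsx"):
        sheets = pd.read_excel(os.path.join(src_dir, file_name), sheet_name=None)
        for sheet_name, frame in sheets.items():
            collection[sheet_name].append(frame)

# Write one combined sheet per sheet name, outside the input folder.
out_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
with pd.ExcelWriter(out_path) as writer:
    for sheet_name, frames in collection.items():
        combined_frame = pd.concat(frames, ignore_index=True, sort=False)
        combined_frame.to_excel(writer, sheet_name=sheet_name, index=False)

combined = pd.read_excel(out_path, sheet_name=None)
print(combined["A"])
```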

Extract Partial Data from multiple excel sheets in the same workbook using pandas

Not sure exactly what you are trying to do, but an easier way to traverse through the sheet names would be with a for-each loop:

 for sheet in input.sheet_names:

Now you can do something for all the sheets no matter their name.

Regarding "would like to assign each sheet to an individual variable", you could use a dictionary:

sheets = {}
for sheet in input.sheet_names:
    sheets[sheet] = pd.read_excel(xlsx, sheet)

Now to get a sheet from the dictionary sheets:

sheets.get("15")

Or to traverse all the sheets:

for name, df in sheets.items():
    # do something with each sheet, e.g.
    print(df)

This will print the data for each sheet in sheets.

Hope this helps / brings you further

reading multiple tabs from excel in different dataframes

Demo:

file name

In [94]: fn = r'D:\temp\.data\test.xlsx'

creating pandas.io.excel.ExcelFile object

In [95]: xl = pd.ExcelFile(fn)

it has sheet_names attribute

In [96]: xl.sheet_names
Out[96]: ['Sheet1', 'aaa']

we can use it for looping through sheets

In [98]: for sh in xl.sheet_names:
    ...:     df = xl.parse(sh)
    ...:     print('Processing: [{}] ...'.format(sh))
    ...:     print(df.head())
    ...:
Processing: [Sheet1] ...
   col1  col2  col3
0    11    12    13
1    21    22    23
2    31    32    33
Processing: [aaa] ...
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

a bit more elegant way is to generate a dictionary of DataFrames:

In [100]: dfs = {sh:xl.parse(sh) for sh in xl.sheet_names}

In [101]: dfs.keys()
Out[101]: dict_keys(['Sheet1', 'aaa'])

In [102]: dfs['Sheet1']
Out[102]:
   col1  col2  col3
0    11    12    13
1    21    22    23
2    31    32    33

In [103]: dfs['aaa']
Out[103]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
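As a cross-check, the dictionary built with xl.parse() should match what pd.read_excel() returns with sheet_name=None. The sketch below compares the two on a throwaway workbook; it reuses the sheet names from the demo, but the data and the temporary path are made up (pandas plus openpyxl assumed).

```python
import os
import tempfile

import pandas as pd

# Throwaway workbook reusing the demo's sheet names; the data is invented.
fn = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(fn) as writer:
    pd.DataFrame({"col1": [11, 21], "col2": [12, 22]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"a": [1, 4], "b": [2, 5]}).to_excel(writer, sheet_name="aaa", index=False)

# Dictionary comprehension over an open ExcelFile, as in the demo.
with pd.ExcelFile(fn) as xl:
    dfs = {sh: xl.parse(sh) for sh in xl.sheet_names}

# One-call alternative: sheet_name=None returns a {sheet_name: DataFrame} dictionary.
dfs_all = pd.read_excel(fn, sheet_name=None)

# True when every sheet holds identical data in both dictionaries.
same = all(dfs[name].equals(dfs_all[name]) for name in dfs)
print(same)
```

The ExcelFile route is handy when you want to inspect sheet_names first and parse only some sheets; the one-call form is shorter when you want everything.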

