Pandas: Looking Up the List of Sheets in an Excel File

You can still use the ExcelFile class (and the sheet_names attribute):

import pandas as pd

xl = pd.ExcelFile('foo.xls')
xl.sheet_names  # see all sheet names
xl.parse(sheet_name)  # read a specific sheet into a DataFrame

See the docs for parse for more options.

Choose A Specific Sheet In Excel Containing a String Pandas

I think the answer here would probably give you what you need.

Open the file as an ExcelFile before reading it into a DataFrame. Get its sheet_names, then pick the sheet name that starts with 'ITD_'.

excel = pd.ExcelFile("your_excel.xlsx")
excel.sheet_names
# ["Sheet1", "Sheet2"]
for n in excel.sheet_names:
    if n.startswith('ITD_'):
        sheetname = n
        break
df = excel.parse(sheetname)
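The loop above leaves sheetname undefined when no sheet matches, so excel.parse(sheetname) then fails with a NameError. A minimal sketch of a stricter lookup, assuming a hypothetical helper called find_sheet (not part of pandas) and made-up sheet names:

```python
def find_sheet(sheet_names, prefix):
    """Return the first sheet name that starts with `prefix`.

    `find_sheet` is a hypothetical helper, not part of pandas.
    """
    for name in sheet_names:
        if name.startswith(prefix):
            return name
    raise ValueError(f"no sheet name starts with {prefix!r}: {list(sheet_names)}")


# The list stands in for excel.sheet_names; the names are made up.
sheetname = find_sheet(["Sheet1", "ITD_2020", "ITD_2021"], "ITD_")
print(sheetname)  # ITD_2020
```

Raising an explicit error names the missing prefix and the sheets that were actually found, which is usually easier to debug.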

Iteratively read excel sheet names, split and save them as new columns for each sheet in Python

You can use f = pd.ExcelFile('data1.xlsx') to read the Excel file in as an object, then loop through the sheet names in f.sheet_names. Split each sheet name, such as "2019_q1_sh", into year, quarter and city, and set these as the values of new columns in the DataFrame you read from that sheet.

Then create a dictionary with sheet names as keys, and the corresponding modified DataFrame as the values. You can create a custom save_xls function that takes in such a dictionary and saves it, as described in this helpful answer.

Update: since you want to loop through all Excel files in your current directory, you can use the glob library to get every file with the extension .xlsx, read each one in, and save a new file with the string new_ in front of the file name.

import pandas as pd
from pandas import ExcelWriter
import glob


def save_xls(dict_df, path):
    """
    Save a dictionary of dataframes to an excel file, with each dataframe as a separate page

    Reference: https://stackoverflow.com/questions/14225676/save-list-of-dataframes-to-multisheet-excel-spreadsheet
    """
    # the context manager saves and closes the file on exit
    # (writer.save() was removed in pandas 2.0)
    with ExcelWriter(path) as writer:
        for key in dict_df:
            dict_df[key].to_excel(writer, sheet_name=key)


## loop through all excel files
for filename in glob.glob("*.xlsx"):
    f = pd.ExcelFile(filename)
    dict_dfs = {}
    for sheet_name in f.sheet_names:
        df_new = f.parse(sheet_name=sheet_name)

        ## get the year, quarter and city from the sheet name
        year, quarter, city = sheet_name.split("_")
        df_new["year"] = year
        df_new["quarter"] = quarter
        df_new["city"] = city

        ## populate dictionary
        dict_dfs[sheet_name] = df_new

    save_xls(dict_df=dict_dfs, path="new_" + filename)
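The three-way unpacking of sheet_name.split("_") fails with a generic "not enough values to unpack" or "too many values to unpack" error whenever a sheet name does not have exactly three parts. As a rough sketch, the split could be moved into a hypothetical helper, here called split_sheet_name, that reports the offending name:

```python
def split_sheet_name(sheet_name):
    """Split a name like "2019_q1_sh" into year, quarter and city.

    Hypothetical helper; assumes exactly three underscore-separated parts.
    """
    parts = sheet_name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 'year_quarter_city', got {sheet_name!r}")
    year, quarter, city = parts
    return {"year": year, "quarter": quarter, "city": city}


print(split_sheet_name("2019_q1_sh"))
# {'year': '2019', 'quarter': 'q1', 'city': 'sh'}
```

With pandas, the returned dict can be passed straight to DataFrame.assign, for example df_new.assign(**split_sheet_name(sheet_name)), which adds the three columns in one call.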

How to obtain sheet names from XLS files without loading the whole file?

You can use the xlrd library and open the workbook with the on_demand=True flag, so that the sheets won't be loaded automatically.

Then you can retrieve the sheet names in a similar way to pandas:

import xlrd
xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
print(xls.sheet_names())  # <- remember: xlrd sheet_names is a function, not a property
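Note that xlrd 2.0 and later reads only the legacy .xls format. For .xlsx files, a different technique (not the xlrd one above) needs only the standard library: an .xlsx file is a ZIP archive, and its sheet names are listed in a small XML member, so they can be read without parsing any worksheet data. A minimal sketch, where xlsx_sheet_names is a hypothetical helper and the workbook part is assumed to sit at the conventional xl/workbook.xml path:

```python
import zipfile
import xml.etree.ElementTree as ET


def xlsx_sheet_names(path_or_file):
    """List sheet names of an .xlsx workbook without parsing any sheet data.

    Hypothetical helper; reads only the xl/workbook.xml member of the archive.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Match on the local tag name so the XML namespace does not matter.
    return [
        elem.attrib["name"]
        for elem in root.iter()
        if elem.tag.rsplit("}", 1)[-1] == "sheet"
    ]
```

Calling xlsx_sheet_names('<path_to_your_excel_file>') returns the names in workbook order, hidden sheets included.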

excel sheets name in pandas dataframe

This should work:

xl = pd.ExcelFile('archvio.xlsx')
frames = []
for sheet_name in xl.sheet_names:
    df = xl.parse(sheet_name)
    df['Week'] = sheet_name  # this adds `sheet_name` into the column `Week`
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
df_combined = pd.concat(frames)

Read each excel sheet as a different dataframe in Python

Specifying sheet_name as None with read_excel reads all worksheets and returns a dict of DataFrames.

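import pandas as pd

# raw string, so the backslashes in the Windows path are not treated as escapes
file = r'C:\Users\filename.xlsx'
xl = pd.read_excel(file, sheet_name=None)
sheets = xl.keys()

# write each sheet to its own workbook, named after the sheet
for sheet in sheets:
    xl[sheet].to_excel(f"{sheet}.xlsx")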

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer specifying the sheet number (e.g. 0, 1, etc.), a list of sheet names or indices, or None. If a list is provided, it returns a dictionary where the keys are the sheet names/indices and the values are the DataFrames. The default is to return only the first sheet (i.e. sheet_name=0).

If None is specified, all sheets are returned, as a {sheet_name:dataframe} dictionary.
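Once the sheets are in a {sheet_name: DataFrame} dictionary, they can be stacked into one DataFrame that records which sheet each row came from. A minimal sketch, assuming a hypothetical helper called combine_sheets and made-up data standing in for the result of pd.read_excel(path, sheet_name=None):

```python
import pandas as pd


def combine_sheets(sheets, column="sheet"):
    """Stack a {sheet_name: DataFrame} dict into one DataFrame.

    Hypothetical helper; `column` names the added column holding the sheet name.
    """
    frames = [df.assign(**{column: name}) for name, df in sheets.items()]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Stand-in for pd.read_excel(path, sheet_name=None); the data is made up.
sheets = {
    "Sheet1": pd.DataFrame({"value": [1, 2]}),
    "Sheet2": pd.DataFrame({"value": [3]}),
}
combined = combine_sheets(sheets)
print(combined["sheet"].tolist())  # ['Sheet1', 'Sheet1', 'Sheet2']
```

DataFrame.assign returns a copy, so the DataFrames in the original dictionary are left unchanged.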


