Pandas: Looking up the list of sheets in an excel file
You can still use the ExcelFile class (and its sheet_names attribute):
import pandas as pd
xl = pd.ExcelFile('foo.xls')
xl.sheet_names  # see all sheet names
xl.parse(sheet_name)  # read a specific sheet to DataFrame
See the docs for parse for more options.
Choose A Specific Sheet In Excel Containing a String Pandas
I think the answer here would probably give you what you need.
Open the file as an ExcelFile before reading it into a DataFrame. Get the sheet_names, then pick the sheet name that starts with 'ITD_'.
excel = pd.ExcelFile("your_excel.xlsx")
excel.sheet_names
# ["Sheet1", "Sheet2"]
for n in excel.sheet_names:
    if n.startswith('ITD_'):
        sheetname = n
        break
df = excel.parse(sheetname)
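The loop above leaves sheetname undefined when no sheet matches the prefix, so the final parse call would fail with a NameError. A minimal sketch of the same lookup as a helper that fails with a clearer message; the function name first_sheet_with_prefix and the sample sheet names are hypothetical, not part of pandas:

```python
def first_sheet_with_prefix(sheet_names, prefix):
    """Return the first sheet name starting with `prefix`, or raise ValueError."""
    match = next((name for name in sheet_names if name.startswith(prefix)), None)
    if match is None:
        raise ValueError(f"no sheet name starts with {prefix!r}: {list(sheet_names)}")
    return match


# Hypothetical sheet names; in practice pass excel.sheet_names here
names = ["Summary", "ITD_2020", "ITD_2021"]
print(first_sheet_with_prefix(names, "ITD_"))  # prints ITD_2020
```

The result can then be passed to excel.parse in place of sheetname.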
Iteratively read excel sheet names, split and save them as new columns for each sheet in Python
You can use f = pd.ExcelFile('data1.xlsx') to read the Excel file in as an object, then loop through the list of sheet names by iterating through f.sheet_names, splitting each sheet name (such as the "2019_q1_sh" string) into the appropriate year, quarter, and city, and setting these as values of new columns in the DataFrame you read in from each sheet.
Then create a dictionary with sheet names as keys and the corresponding modified DataFrames as values. You can create a custom save_xls function that takes in such a dictionary and saves it, as described in this helpful answer.
Update: since you want to loop through all Excel files in your current directory, you can use the glob library to get all of the files with the extension .xlsx, loop through each of these files, read them in, and save a new file with the string new_ in front of the file name:
import glob

import pandas as pd

def save_xls(dict_df, path):
    """
    Save a dictionary of dataframes to an excel file, with each dataframe as a separate page
    Reference: https://stackoverflow.com/questions/14225676/save-list-of-dataframes-to-multisheet-excel-spreadsheet
    """
    # The context manager saves and closes the workbook on exit
    # (ExcelWriter.save() was removed in pandas 2.0)
    with pd.ExcelWriter(path) as writer:
        for key in dict_df:
            dict_df[key].to_excel(writer, sheet_name=key)

## loop through all excel files
for filename in glob.glob("*.xlsx"):
    f = pd.ExcelFile(filename)
    dict_dfs = {}
    for sheet_name in f.sheet_names:
        df_new = f.parse(sheet_name=sheet_name)
        ## get the year, quarter and city from the sheet name
        year, quarter, city = sheet_name.split("_")
        df_new["year"] = year
        df_new["quarter"] = quarter
        df_new["city"] = city
        ## populate dictionary
        dict_dfs[sheet_name] = df_new
    save_xls(dict_df=dict_dfs, path="new_" + filename)
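The split on "_" in the loop above assumes every sheet name has exactly three underscore-separated parts; a sheet called "Sheet1" or "2019_q1_new_york" would make the unpacking raise a generic ValueError. A small sketch of a stricter parser, assuming the "year_quarter_city" layout described above (SheetInfo and parse_sheet_name are hypothetical names introduced for illustration):

```python
from typing import NamedTuple


class SheetInfo(NamedTuple):
    """The three parts encoded in a sheet name such as '2019_q1_sh'."""
    year: str
    quarter: str
    city: str


def parse_sheet_name(sheet_name):
    """Split 'year_quarter_city' into a SheetInfo, rejecting other layouts."""
    parts = sheet_name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 'year_quarter_city', got {sheet_name!r}")
    return SheetInfo(*parts)


info = parse_sheet_name("2019_q1_sh")
print(info.year, info.quarter, info.city)  # prints 2019 q1 sh
```

Inside the loop, the result could be unpacked the same way as before (year, quarter, city = parse_sheet_name(sheet_name)), and sheets that do not follow the pattern can be skipped by catching the ValueError.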
How to obtain sheet names from XLS files without loading the whole file?
You can use the xlrd library and open the workbook with the on_demand=True flag, so that the sheets won't be loaded automatically.
Then you can retrieve the sheet names in a similar way to pandas:
import xlrd
xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
print(xls.sheet_names())  # <- remember: in xlrd, sheet_names is a method, not a property
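Note that xlrd 2.0 and later only reads the legacy .xls format. For .xlsx files, a rough standard-library sketch of the same idea is to read only the workbook index from the zip container instead of loading any sheet data. This assumes the usual layout where the index lives at xl/workbook.xml and uses the transitional OOXML namespace; files saved in "Strict" mode use a different namespace and would return an empty list. The function name xlsx_sheet_names is hypothetical:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of workbook.xml in standard (transitional) .xlsx files (assumption)
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def xlsx_sheet_names(path):
    """Return sheet names from an .xlsx file by parsing only xl/workbook.xml."""
    with zipfile.ZipFile(path) as archive:
        # Only this small XML member is decompressed; sheet data is untouched
        with archive.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # Each <sheet name="..."> element appears in workbook order
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

Calling xlsx_sheet_names(r'<path_to_your_excel_file>') returns a plain list of names, comparable to xls.sheet_names() above.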
excel sheets name in pandas dataframe
This should work:
import pandas as pd
xl = pd.ExcelFile('archvio.xlsx')
frames = []
for sheet_name in xl.sheet_names:
    df = xl.parse(sheet_name)
    df['Week'] = sheet_name  # this adds `sheet_name` into the column `Week`
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate the pieces instead
df_combined = pd.concat(frames)
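The same combination can also be sketched as a small helper built on pd.concat, which keeps working on pandas 2.0 and later, where DataFrame.append is no longer available. It takes a {sheet_name: DataFrame} dict, the shape that pd.read_excel(..., sheet_name=None) returns. The name combine_sheets and its label_column parameter are hypothetical, and unlike the loop above this version renumbers the index:

```python
import pandas as pd


def combine_sheets(sheets, label_column="Week"):
    """Stack a {sheet_name: DataFrame} dict, recording each sheet name in a column."""
    # assign() returns a copy, so the input frames are left unmodified
    labelled = [df.assign(**{label_column: name}) for name, df in sheets.items()]
    return pd.concat(labelled, ignore_index=True)


# Stand-in data; in practice: sheets = pd.read_excel('archvio.xlsx', sheet_name=None)
sheets = {
    "week1": pd.DataFrame({"sales": [10, 20]}),
    "week2": pd.DataFrame({"sales": [30]}),
}
df_combined = combine_sheets(sheets)
print(df_combined)
```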
Read each excel sheet as a different dataframe in Python
Specifying sheet_name as None with read_excel reads all worksheets and returns a dict of DataFrames.
import pandas as pd
file = r'C:\Users\filename.xlsx'  # raw string, so '\U' is not parsed as a Unicode escape
xl = pd.read_excel(file, sheet_name=None)
sheets = xl.keys()
for sheet in sheets:
    xl[sheet].to_excel(f"{sheet}.xlsx")
Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
Try pd.ExcelFile:
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')
As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.
Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer specifying the sheet number (e.g. 0, 1, etc.), a list of sheet names or indices, or None. If a list is provided, it returns a dictionary where the keys are the sheet names/indices and the values are the data frames. The default is to simply return the first sheet (i.e., sheet_name=0).
If None is specified, all sheets are returned, as a {sheet_name: dataframe} dictionary.