Reading an Excel File in Python Using Pandas

Reading an Excel file in python using pandas

Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.

>>> import pandas as pd
>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
                  Tid  dummy1    dummy2    dummy3    dummy4    dummy5  \
0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307   
1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360   
2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395   
3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187   
4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058   

     dummy6  dummy7    dummy8    dummy9  
0  0.623354       0  2.579108  2.681728  
1  0.554211       0  7.210000  3.028614  
2  0.567841       0  6.940000  3.644147  
3  0.581470       0  6.630000  4.016155  
4  0.595100       0  6.350000  3.974442

What you're doing is calling the method on the class itself rather than on the instance. That is okay (although not very idiomatic), but if you do that you also need to pass the sheet name:

>>> parsed = pd.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
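The difference between calling a method on the instance and on the class is plain Python and can be sketched without pandas. The Workbook class below is a hypothetical stand-in, not part of any library; it only illustrates that the class-level call needs the instance passed explicitly, plus every argument the method requires.

```python
class Workbook:
    """Hypothetical stand-in for an object with a parse method (not pandas)."""

    def __init__(self, sheets):
        self.sheets = sheets

    def parse(self, sheet_name):
        return self.sheets[sheet_name]


wb = Workbook({"Sheet1": [1, 2, 3]})

# Idiomatic: the instance is passed implicitly as self.
via_instance = wb.parse("Sheet1")

# Same call through the class: the instance is passed explicitly.
via_class = Workbook.parse(wb, "Sheet1")

# Leaving out the sheet name fails, because parse still expects it.
try:
    Workbook.parse(wb)
except TypeError as exc:
    missing_arg_error = str(exc)
```

Both calls return the same object; the instance form is simply shorter and is what most readers expect to see.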

read the excel file in directory using pandas python

The directory is missing when you call read_excel: you only pass the file name, as your print shows.

You need to rebuild the full path, for instance with os.path.join:

import os
import pandas as pd

for filename in os.listdir(my_path):
    if filename.startswith('PB orders Dec'):
        dec = pd.read_excel(os.path.join(my_path, filename), sheet_name='Raw data')
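The reason the join is needed can be seen without any Excel file: os.listdir returns bare file names with no directory part. The sketch below uses only the standard library; the temporary directory and the file name "PB orders Dec 2020.xlsx" are made up for illustration.

```python
import os
import tempfile

# Hypothetical directory and file, created here so the sketch runs anywhere.
my_path = tempfile.mkdtemp()
open(os.path.join(my_path, "PB orders Dec 2020.xlsx"), "w").close()

# os.listdir yields names only, without the directory they live in.
names = os.listdir(my_path)

# Rebuild the full path before handing it to a reader such as pd.read_excel.
full_paths = [
    os.path.join(my_path, name)
    for name in names
    if name.startswith("PB orders Dec")
]
```

A bare name is resolved against the current working directory, which is why the read fails unless the script happens to run from inside my_path.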

Reading an excel file into a pandas DF that has a pipe and spaces as delimiters

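You can use a regex in the sep argument:

from io import StringIO

import pandas as pd

my_file = '''
ID|Name|Job|Nationality|

123 Cian|IT|-|

222 John|Teacher|Spanish|
'''

# A regex separator is handled by the Python parsing engine.
df = pd.read_csv(StringIO(my_file), sep='[ |]', engine='python')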
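To see what the character class [ |] does to a single row, the same pattern can be tried with the standard re module. This is only an illustration of the regex, not of how pandas parses internally.

```python
import re

line = "123 Cian|IT|-|"

# Inside a character class, | is a literal pipe, so this splits on space or pipe.
fields = re.split(r"[ |]", line)
print(fields)  # ['123', 'Cian', 'IT', '-', '']
```

The trailing pipe produces an empty last field, so the parsed frame will probably carry an extra, empty column that you may want to drop. Also note that any value containing a space (a two-word job title, say) would be split as well.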

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.
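A minimal sketch of the open-once pattern, assuming an engine for .xlsx files (such as openpyxl) is installed. The workbook, its sheet names, and its contents are made up so that the example is self-contained.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook to read back (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3]}).to_excel(writer, sheet_name="Sheet2", index=False)

# Open the file once, then parse each sheet from the same ExcelFile object.
with pd.ExcelFile(path) as xls:
    frames = {name: xls.parse(name) for name in xls.sheet_names}
```

Using the ExcelFile as a context manager closes the underlying file handle once all sheets have been parsed.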

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer giving the zero-based sheet position (e.g. 0, 1), a list of sheet names or positions, or None. If a list is provided, it returns a dictionary where the keys are the requested sheet names/positions and the values are the data frames. The default is to return only the first sheet (i.e. sheet_name=0).

If None is specified, all sheets are returned, as a {sheet_name:dataframe} dictionary.
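The sheet_name variants can be compared side by side. This is a sketch under the assumption that an .xlsx engine such as openpyxl is available; the workbook and its contents are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook to read back (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4, 5]}).to_excel(writer, sheet_name="Sheet2", index=False)

first = pd.read_excel(path)                           # default: sheet_name=0
by_name = pd.read_excel(path, sheet_name="Sheet2")    # a single DataFrame
some = pd.read_excel(path, sheet_name=["Sheet1", 1])  # dict keyed by what was requested
all_sheets = pd.read_excel(path, sheet_name=None)     # dict keyed by sheet name
```

With a list, the dictionary keys mirror the entries you passed in (here the string "Sheet1" and the integer 1), whereas sheet_name=None always keys by sheet name.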

Read excel file in python using pandas

Your fileLocation variable already includes the name of the file, so reading fileLocation + fileName is essentially reading:

C:\\Users\\GTS\\Desktop\\Network Interdiction Problem\\Manuscript\\Interdiction_Data.xlsxInterdiction_Data.xlsx

Another issue is that you have quotation marks around your variable names when calling pd.read_excel(), which means you are passing the literal text of the names as a string rather than the values of the variables.

Try:

data = pd.read_excel(fileLocation)
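The doubled file name is easy to reproduce with plain strings. The sketch below uses a short, hypothetical path in place of the one from the question, and the standard ntpath module so that Windows path rules apply on any operating system.

```python
import ntpath  # Windows path rules, usable on any OS

# Hypothetical short path standing in for the one in the question.
file_location = r"C:\data\Interdiction_Data.xlsx"
file_name = "Interdiction_Data.xlsx"

# Plain concatenation appends the name to a path that already ends with it.
wrong = file_location + file_name

# If a folder and a file name are really kept separately, join them instead.
folder, name = ntpath.split(file_location)
right = ntpath.join(folder, file_name)
```

Here right is identical to file_location, which is why passing fileLocation on its own is enough.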
