Python Pandas: How to Specify Data Types When Reading an Excel File

Python pandas: how to specify data types when reading an Excel file?

You just specify converters. I created an Excel spreadsheet with the following structure:

names  ages
bob      05
tom       4
suzy      3

Where the "ages" column is formatted as strings. To load:

import pandas as pd

df = pd.read_excel('Book1.xlsx', sheet_name='Sheet1', header=0, converters={'names': str, 'ages': str})
>>> df
  names ages
0   bob   05
1   tom    4
2  suzy    3
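As an alternative to converters, read_excel also accepts a dtype mapping (pandas 0.20+). The sketch below uses read_csv on an in-memory buffer purely so it runs without a spreadsheet on disk; read_excel takes the same dtype argument:

```python
import io

import pandas as pd

# In-memory stand-in for the spreadsheet above; read_excel accepts
# the same dtype mapping as read_csv.
csv_data = io.StringIO("names,ages\nbob,05\ntom,4\nsuzy,3\n")
df = pd.read_csv(csv_data, dtype={'names': str, 'ages': str})
print(df['ages'].tolist())  # ['05', '4', '3'] -- leading zero preserved
```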

Specify datatype when reading in excel data to pandas/python

As per the sample given here, two blank rows are present after the heading. So if you want to keep the heading row, skip those rows explicitly:

pd.read_excel("test.xls", usecols="A,C", skiprows=[1,2])

(In current pandas, the old parse_cols argument has been renamed to usecols.)

Also, can you confirm whether there are any other NaN cells in that column? If there are NaN values in the column, its dtype will be promoted to float. Please see the link below:
http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na
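The promotion described at that link, and the nullable Int64 dtype (pandas 0.24+) that avoids it, can be seen in a small sketch:

```python
import numpy as np
import pandas as pd

# A NaN in an otherwise-integer column promotes it to float
s = pd.Series([1929, 1930, np.nan])
print(s.dtype)  # float64

# The nullable integer dtype keeps integers and represents the gap as <NA>
s2 = pd.Series([1929, 1930, np.nan], dtype='Int64')
print(s2.dtype)  # Int64
```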

Also please note that since the first column has no heading, pandas treats the first column as the index on import.
To avoid that, I have followed the steps below.

My Excel file looks like this:

 NaN     gdp    gdp (2009)
 NaN     NaN    NaN
 NaN     NaN    NaN
1929   104.6    1056.7
1930   173.6    962
1931    72.3    846.6
 NaN     NaN    NaN
1952    45.3    56.6

I removed the default headers and added my own to avoid the indexing issue:

test = pd.read_excel("test.xls", skiprows=[0,3], header=None, names=['Year','gdp (2009)'], usecols="A,C")

As stated above, since the column contains NaN values, its type will be converted to float. You can drop the NaN rows or fill them with 0 or some other value; in this case I'm dropping them:

test = test.dropna(axis=0, how='all')

Once you have removed the NaN values, you can use astype to convert the column to int:

test['Year'] = test['Year'].astype(int)
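If you would rather keep the NaN rows instead of dropping them, the nullable Int64 dtype (pandas 0.24+) is an alternative; a sketch on stand-in data mirroring the frame above:

```python
import numpy as np
import pandas as pd

# Stand-in for the frame read above; the real data comes from read_excel
test = pd.DataFrame({'Year': [1929.0, 1930.0, np.nan],
                     'gdp (2009)': [1056.7, 962.0, np.nan]})

# astype(int) would raise on NaN; 'Int64' keeps the gaps as <NA>
test['Year'] = test['Year'].astype('Int64')
print(test['Year'].dtype)  # Int64
```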

Please check if this works for you and let me know if you need more clarification on this.
Thanks,

handle datatype issue while reading excel file to pandas dataframe

Looks like you could deal with that using a custom converter:

def bcvt(x):
    # strip '>' and '%', then scale to a fraction
    return float(x.replace('>', '').replace('%', '')) / 100

dfd = pd.read_csv(r'd:\jchtempnew\t1.csv', converters={'Budget': bcvt})

dfd

  Location     Month          Desc Position  Budget
0      EUR  1/1/2020     In Europe    Right    0.34
1      AUS  1/1/2020  In Australia     Left    0.22

(Updated per @user128029 recommendation)
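The same cleanup can also be done after loading with the vectorized .str accessor instead of a converter; a sketch on stand-in data, since the original CSV isn't available:

```python
import pandas as pd

# Stand-in values in the shape of the Budget column
dfd = pd.DataFrame({'Budget': ['>34%', '22%']})

# Strip '>' and '%', then scale to a fraction
dfd['Budget'] = (dfd['Budget']
                 .str.replace('>', '', regex=False)
                 .str.replace('%', '', regex=False)
                 .astype(float) / 100)
print(dfd['Budget'].tolist())  # [0.34, 0.22]
```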

Changing data type with pandas on read_excel

You can convert to string with formatting:

df["DEPS"] = df["DEPS"].map(lambda x: '{0:03d}'.format(int(x)))

This converts to int to drop the decimal place, then formats the int as a zero-padded three-digit string.

Edit: just to elaborate, Excel stores numbers as floats, not as strings the way a CSV file does. When reading .csv files, you can specify each column's dtype directly.
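To see that round-trip concretely, here is the zero-padding on stand-in floats, using str.zfill as an alternative to the format string:

```python
import pandas as pd

# Excel hands numeric cells to pandas as floats, so '007' arrives as 7.0
df = pd.DataFrame({'DEPS': [7.0, 42.0, 123.0]})

# Drop the decimal place, then left-pad to three digits
df['DEPS'] = df['DEPS'].astype(int).astype(str).str.zfill(3)
print(df['DEPS'].tolist())  # ['007', '042', '123']
```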

Using specific column and cells in Excel workbook using Python

Not sure I understand your goal here, but here is one way to iterate over rows:

import pandas as pd

eIP = pd.DataFrame({"IP": ["a", "b", "c"]})

for _, row in eIP.iterrows():
    print(row.values[0])

# Output
a
b
c

You can also get a dictionary of the values like this:

print(eIP.to_dict()["IP"])
# Output
{0: 'a', 1: 'b', 2: 'c'}
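For larger frames, itertuples is usually faster than iterrows and yields named tuples; a minimal sketch:

```python
import pandas as pd

eIP = pd.DataFrame({"IP": ["a", "b", "c"]})

# index=False drops the row index from each named tuple
ips = [row.IP for row in eIP.itertuples(index=False)]
print(ips)  # ['a', 'b', 'c']
```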

