Python pandas: how to specify data types when reading an Excel file?
You can specify converters. I created an Excel spreadsheet with the following structure:
names  ages
bob    05
tom    4
suzy   3
Where the "ages" column is formatted as strings. To load:
import pandas as pd
df = pd.read_excel('Book1.xlsx', sheet_name='Sheet1', header=0, converters={'names': str, 'ages': str})
>>> df
names ages
0 bob 05
1 tom 4
2 suzy 3
Specify datatype when reading in excel data to pandas/python
In the sample given here, two blank rows are present after the heading row. If you want to keep the heading, you can skip those rows by index:
pd.read_excel("test.xls", usecols="A,C", skiprows=[1,2])
Also, check whether there are any other NaN cells in that column: if the column contains NaN values, its dtype will be promoted to float.
Please see the link below:
http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na
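The integer-NA gotcha can be seen directly. A small sketch (with made-up values): a NaN forces the column to float, but pandas' nullable "Int64" extension dtype (note the capital I) keeps it integral:

```python
import numpy as np
import pandas as pd

# A plain integer column with a NaN is silently promoted to float64.
floaty = pd.Series([1929, 1930, np.nan])
# The nullable extension dtype keeps integers and stores the gap as pd.NA.
nullable = pd.Series([1929, 1930, np.nan], dtype="Int64")
print(floaty.dtype)    # float64
print(nullable.dtype)  # Int64
```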
Also note that because the first column has no heading, pandas takes it as the index on import. To avoid that, I followed these steps:
My excel file looks like this
NaN    gdp    gdp (2009)
NaN    NaN    NaN
NaN    NaN    NaN
1929   104.6  1056.7
1930   173.6  962
1931   72.3   846.6
NaN    NaN    NaN
1952   45.3   56.6
I removed the default headers and added headers to avoid indexing issue:
test = pd.read_excel("test.xls", skiprows=[0,3], header=None, names=['Year','gdp (2009)'], usecols="A,C")
As stated above, since the column contains NaN values, its dtype will be promoted to float. You can drop the NaN rows or fill them with 0 or some other value. In this case I'm dropping the NaN rows.
test = test.dropna(axis=0, how='all')
Once you have removed NaN values, you can use astype to convert it into int
test['Year']=test.Year.astype(int)
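If you would rather keep the rows than drop them, filling the gaps before the cast works too. A sketch with a toy frame standing in for the GDP sheet above:

```python
import numpy as np
import pandas as pd

# Alternative to dropna: fill missing years with 0, then cast to int.
test = pd.DataFrame({'Year': [1929.0, np.nan, 1931.0]})
test['Year'] = test['Year'].fillna(0).astype(int)
print(test['Year'].tolist())  # [1929, 0, 1931]
```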
Please check if this works for you, and let me know if you need more clarification.
handle datatype issue while reading excel file to pandas dataframe
Looks like you could deal with that using a custom converter:
def bcvt(x):
    return float(x.replace('>','').replace('%',''))/100
dfd = pd.read_csv(r'd:\jchtempnew\t1.csv', converters={'Budget': bcvt})
dfd
Location Month Desc Position Budget
0 EUR 1/1/2020 In Europe Right 0.34
1 AUS 1/1/2020 In Australia Left 0.22
(Updated per @user128029's recommendation.)
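The converter can be exercised end to end without the original file. A self-contained sketch using an in-memory CSV with made-up values in place of t1.csv:

```python
import io
import pandas as pd

def bcvt(x):
    # Strip the '>' and '%' markers, then scale the percentage to a fraction.
    return float(x.replace('>', '').replace('%', '')) / 100

# Toy data standing in for t1.csv.
csv = "Desc,Budget\nIn Europe,>34%\nIn Australia,22%\n"
dfd = pd.read_csv(io.StringIO(csv), converters={'Budget': bcvt})
print(dfd['Budget'].tolist())  # approximately [0.34, 0.22]
```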
Changing data type with pandas on read_excel
You can convert to string with formatting:
df["DEPS"] = df["DEPS"].map(lambda x: '{0:03d}'.format(int(x)))
This converts to int to drop the decimal place, then formats the int as a zero-padded three-digit string.
Edit: to elaborate, Excel stores numbers as floats, not as strings the way a CSV file does. When reading .csv files, you can specify column dtypes directly.
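The int-then-pad step can be sketched on its own with made-up values standing in for the DEPS column read back from Excel:

```python
import pandas as pd

# Excel hands back floats like 5.0; int() drops the decimal and
# '{0:03d}' zero-pads the result to three digits.
df = pd.DataFrame({"DEPS": [5.0, 42.0, 123.0]})
df["DEPS"] = df["DEPS"].map(lambda x: '{0:03d}'.format(int(x)))
print(df["DEPS"].tolist())  # ['005', '042', '123']
```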
Using specific column and cells in Excel workbook using Python
I'm not sure I understand your goal here, but here is one way to iterate over rows:
import pandas as pd
eIP = pd.DataFrame({"IP": ["a", "b", "c"]})
for _, row in eIP.iterrows():
    print(row.values[0])
# Output
a
b
c
You can also get a dictionary of the values like this:
print(eIP.to_dict()["IP"])
# Output
{0: 'a', 1: 'b', 2: 'c'}
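If you only need the values of a single column, iterating the column directly is simpler than `iterrows`, which builds a Series for every row. A sketch with the same toy frame:

```python
import pandas as pd

# Iterating a column yields plain scalars, no per-row Series overhead.
eIP = pd.DataFrame({"IP": ["a", "b", "c"]})
ips = [ip for ip in eIP["IP"]]
print(ips)  # ['a', 'b', 'c']
```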