Pandas reading csv as string type
Update: this has been fixed. From 0.11.1, passing str/np.str is equivalent to using object.
Use the object dtype:
In [11]: pd.read_csv('a', dtype=object, index_col=0)
Out[11]:
A B
1A 0.35633069074776547 0.745585398803751
1B 0.20037376323337375 0.013921830784260236
or better yet, just don't specify a dtype:
In [12]: pd.read_csv('a', index_col=0)
Out[12]:
A B
1A 0.356331 0.745585
1B 0.200374 0.013922
but bypassing the type sniffer and truly returning only strings requires a hacky use of converters:
In [13]: pd.read_csv('a', converters={i: str for i in range(100)})
Out[13]:
A B
1A 0.35633069074776547 0.745585398803751
1B 0.20037376323337375 0.013921830784260236
where 100 is some number equal to or greater than your total number of columns.
It's best to avoid the str dtype, see for example here.
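As a minimal, self-contained sketch of the difference between dtype=object and the default type sniffing, the snippet below reads the same data twice from an in-memory buffer. The csv_text contents and the "id" header name are assumptions made up for illustration; they stand in for the file 'a' used above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file 'a' from the answer above.
csv_text = (
    "id,A,B\n"
    "1A,0.35633069074776547,0.745585398803751\n"
    "1B,0.20037376323337375,0.013921830784260236\n"
)

# dtype=object keeps every cell as the text that appeared in the file.
as_object = pd.read_csv(io.StringIO(csv_text), dtype=object, index_col=0)

# Without a dtype, pandas infers floats for the numeric-looking columns.
sniffed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(type(as_object.loc["1A", "A"]), as_object.loc["1A", "A"])
print(sniffed.dtypes)
```

The object version is useful when the exact digits matter; the sniffed version is more convenient when you want to do arithmetic straight away.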
Pandas read_csv dtype read all columns but few as string
EDIT - sorry, I misread your question. Updated my answer.
You can read the entire csv as strings then convert your desired columns to other types afterwards like this:
df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)
Another approach, if you really want to specify the proper types for all columns when reading the file and not change them afterwards: read in just the column names (no rows), then use those to fill in which columns should be strings:
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
pd.read_csv('file.csv', dtype=types_dict)
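To try the header-first approach without a file on disk, here is a small sketch that runs both passes over an in-memory buffer. The csv_text data and the column names A to D are invented for the example; only the nrows=0 and dtype usage come from the answer above.

```python
import io

import pandas as pd

# Hypothetical data standing in for 'file.csv'.
csv_text = "A,B,C,D\n1,2.5,x,10\n3,4.5,y,20\n"

# First pass: read only the header row to learn the column names.
col_names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Columns with an explicit type; every other column defaults to str.
types_dict = {"A": int, "B": float}
types_dict.update({col: str for col in col_names if col not in types_dict})

# Second pass: read the data with the complete dtype mapping.
df = pd.read_csv(io.StringIO(csv_text), dtype=types_dict)

print(df.dtypes)
print(df["D"].tolist())
```

Column D looks numeric but stays as text, because it was not listed in the explicit types.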
Import pandas dataframe column as string not int
Just want to reiterate this will work in pandas >= 0.9.1:
In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]:
ID
0 00013007854817840016671868
1 00013007854817840016749251
2 00013007854817840016754630
3 00013007854817840016781876
4 00013007854817840017028824
5 00013007854817840017963235
6 00013007854817840018860166
I'm creating an issue about detecting integer overflows also.
EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
Update as it helps others:
To have all columns as str, one can do this (from the comment):
pd.read_csv('sample.csv', dtype = str)
To have most or selective columns as str, one can do this:
# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use dictionary comprehension to make dict of dtypes
dict_dtypes = {x : 'str' for x in lst_str_cols}
# use dict on dtypes
pd.read_csv('sample.csv', dtype=dict_dtypes)
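A quick way to see why the selective mapping matters is to compare it with the default read on data that has leading zeros. This is only a sketch: the csv_text contents and the qty column are assumptions added for illustration.

```python
import io

import pandas as pd

# Hypothetical data standing in for 'sample.csv'.
csv_text = "prefix,serial,qty\n001,000123,5\n002,000456,7\n"

lst_str_cols = ["prefix", "serial"]
dict_dtypes = {x: "str" for x in lst_str_cols}

# Default read: numeric-looking columns become integers, dropping leading zeros.
df_default = pd.read_csv(io.StringIO(csv_text))

# Selective read: listed columns stay as text, the rest are still inferred.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype=dict_dtypes)

print(df_default["prefix"].tolist())
print(df_typed["prefix"].tolist())
print(df_typed["qty"].tolist())
```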
Read a cell from CSV using pandas in object data type
One of the joys/horrors (depending on your standpoint) of a scripting language like Python is that you can make up code on the fly, using the eval() function.
l = ['John','Graham','Michael']
strExpr = 'l[1]'
print(strExpr,'=',eval(strExpr))
gives a result of:
l[1] = Graham
So in this case,
l = ['client.read_holding_registers(0,unit=1)',
'client.read_holding_registers(1,unit=1)',
'client.read_holding_registers(2,unit=1)']
k1 = np.array([eval(reg) for reg in l])
will evaluate whatever is in the string-based expression.
EDIT: Since OP only wants to loop through items once, and assuming that client is already a valid object:
k1 = []
f = open('MB_REGISTERS.csv',mode='r')
while True:
    line = f.readline()
    if not line:
        break
    k1.append(eval(line))
print(k1)
(NB. Extra checks might be needed for blank lines etc. Also it seems that eval() does not mind having the newline character at the end of the string)
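One possible way to add the blank-line check mentioned in the note is sketched below. FakeClient and eval_lines are hypothetical names invented for this example; FakeClient only mimics the call shape used above and does not talk to any device. An in-memory buffer stands in for MB_REGISTERS.csv. Keep in mind that eval() runs whatever code the file contains, so this is only reasonable for files you trust.

```python
import io


class FakeClient:
    """Hypothetical stand-in for the real client object (illustration only)."""

    def read_holding_registers(self, address, unit=1):
        # Echo the arguments back instead of querying real hardware.
        return (address, unit)


def eval_lines(lines, client):
    """Evaluate each non-blank line as an expression that may refer to `client`."""
    results = []
    for line in lines:
        expr = line.strip()
        if not expr:
            continue  # skip blank lines
        # Expose only `client` to the evaluated expression.
        results.append(eval(expr, {"client": client}))
    return results


# Hypothetical file contents, including a blank line.
sample = io.StringIO(
    "client.read_holding_registers(0,unit=1)\n"
    "\n"
    "client.read_holding_registers(1,unit=1)\n"
)

k1 = eval_lines(sample, FakeClient())
print(k1)
```

With a real file, passing the handle from `with open('MB_REGISTERS.csv') as f:` to eval_lines would iterate the lines the same way and close the file afterwards.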