Pandas Reading CSV as String Type

Pandas reading csv as string type

Update: this has been fixed: from pandas 0.11.1, passing str/np.str is equivalent to using object.

Use the object dtype:

In [11]: pd.read_csv('a', dtype=object, index_col=0)
Out[11]:
                      A                     B
1A  0.35633069074776547     0.745585398803751
1B  0.20037376323337375  0.013921830784260236
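
As a quick check (reusing the same hypothetical file 'a'), .dtypes confirms that both columns really came back as object:

pd.read_csv('a', dtype=object, index_col=0).dtypes
# A    object
# B    object
# dtype: object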

or better yet, just don't specify a dtype:

In [12]: pd.read_csv('a', index_col=0)
Out[12]:
           A         B
1A  0.356331  0.745585
1B  0.200374  0.013922

but bypassing the type sniffer and truly returning only strings requires a hacky use of converters:

In [13]: pd.read_csv('a', converters={i: str for i in range(100)})
Out[13]:
                      A                     B
1A  0.35633069074776547     0.745585398803751
1B  0.20037376323337375  0.013921830784260236

where 100 is some number equal to or greater than your total number of columns.
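
If you'd rather not guess at an upper bound, a small variation (a sketch, reusing the same hypothetical file 'a') is to read just the header first and build the converters dict from the actual column names:

import pandas as pd

# read only the header row to discover the real column names
cols = pd.read_csv('a', nrows=0).columns
# key the converters by column label instead of a magic number of indices
pd.read_csv('a', index_col=0, converters={c: str for c in cols})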

It's best to avoid the str dtype; see for example here.

Pandas read_csv dtype read all columns but few as string

EDIT - sorry, I misread your question. Updated my answer.

You can read the entire CSV as strings, then convert your desired columns to other types afterwards, like this:

import pandas as pd

df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will come from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)
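
If some values might not parse cleanly, astype() will raise; a sketch of an alternative (reusing the df and types_dict above) is pd.to_numeric, which can coerce bad values to NaN instead:

import pandas as pd

# convert each listed column, turning unparseable values into NaN
# rather than raising; note that to_numeric infers int vs. float itself
for col in types_dict:
    df[col] = pd.to_numeric(df[col], errors='coerce')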

Another approach, if you really want to specify the proper types for all columns when reading the file in and not convert them afterwards: read in just the column names (no rows), then use them to fill in which columns should be strings:

# read only the header row to get the full list of column names
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
# every column not explicitly typed is read as str
types_dict.update({col: str for col in col_names if col not in types_dict})
pd.read_csv('file.csv', dtype=types_dict)
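
Newer pandas versions (1.5+, where defaultdict support for the dtype argument was added) let you express the same intent without enumerating the string columns; a sketch, assuming your pandas is recent enough:

import pandas as pd
from collections import defaultdict

# any column not listed explicitly falls back to str
types_dict = defaultdict(lambda: str, {'A': int, 'B': float})
pd.read_csv('file.csv', dtype=types_dict)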

Import pandas dataframe column as string not int

Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: pd.read_csv('sample.csv', dtype={'ID': object})
Out[2]:
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm creating an issue about detecting integer overflows also.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247

Update, as it may help others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype=str)

To have only selected columns as str, one can do this:

# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dtype dict
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as dtype
pd.read_csv('sample.csv', dtype=dict_dtypes)

Read a cell from CSV using pandas in object data type

One of the joys/horrors (depending on your standpoint) of a scripting language like Python is that you can make up code on the fly, using the eval() function.

l = ['John','Graham','Michael']
strExpr = 'l[1]'
print(strExpr,'=',eval(strExpr))

gives a result of:

l[1] = Graham

So in this case,

import numpy as np

l = ['client.read_holding_registers(0,unit=1)',
     'client.read_holding_registers(1,unit=1)',
     'client.read_holding_registers(2,unit=1)']

k1 = np.array([eval(reg) for reg in l])

will evaluate whatever is in the string-based expression (with the usual caveat that eval() executes arbitrary code, so only use it on input you trust).

EDIT: Since OP only wants to loop through items once, and assuming that client is already a valid object:

k1 = []

f = open('MB_REGISTERS.csv', mode='r')
while True:
    line = f.readline()
    if not line:
        break
    k1.append(eval(line))

print(k1)

(NB: extra checks might be needed for blank lines etc. Also, it seems that eval() does not mind having the newline character at the end of the string.)
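
For what it's worth, a more idiomatic sketch of the same loop (same assumed file, same caveats about eval()) uses a context manager and skips blank lines:

# the with-block closes the file automatically; iterate the lines directly
with open('MB_REGISTERS.csv', mode='r') as f:
    k1 = [eval(line) for line in f if line.strip()]

print(k1)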


