Import Pandas Dataframe Column as String Not Int

Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]:
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm also creating an issue about detecting integer overflows.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
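A self-contained way to see the difference this makes (using io.StringIO in place of sample.csv, with shorter made-up IDs):

```python
import io
import pandas as pd

# Hypothetical stand-in for sample.csv: IDs with leading zeros.
csv_data = "ID\n007\n042\n"

# Without dtype, pandas parses the column as integers and drops the zeros.
df_int = pd.read_csv(io.StringIO(csv_data))

# With dtype={'ID': object}, the raw strings are preserved.
df_str = pd.read_csv(io.StringIO(csv_data), dtype={'ID': object})

print(df_int['ID'].iloc[0])  # 7
print(df_str['ID'].iloc[0])  # 007
```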

Update, as it may help others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype=str)

To have only selected columns as str, one can do this:

# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dtype mapping
dict_dtypes = {x: str for x in lst_str_cols}
# pass the dict to the dtype parameter
pd.read_csv('sample.csv', dtype=dict_dtypes)
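A runnable sketch of this, with io.StringIO and made-up column values standing in for sample.csv:

```python
import io
import pandas as pd

# Hypothetical CSV: prefix/serial have leading zeros, qty is a count.
csv_data = "prefix,serial,qty\n0001,0042,3\n0007,0099,5\n"

lst_str_cols = ['prefix', 'serial']
dict_dtypes = {x: str for x in lst_str_cols}

df = pd.read_csv(io.StringIO(csv_data), dtype=dict_dtypes)

# prefix/serial keep their leading zeros; qty is still parsed as an integer.
print(df['prefix'].tolist())  # ['0001', '0007']
```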

Import pandas dataframe column as string not int or float

You can use the dtype parameter with str:

df = pd.read_csv('sample.csv', dtype=str)

Pandas read_csv dtype read all columns but few as string

EDIT - sorry, I misread your question. Updated my answer.

You can read the entire csv as strings, then convert the desired columns to other types afterwards, like this:

df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)

Another approach, if you really want to specify the proper types for all columns when reading the file and not change them afterwards: read in just the column names (no rows), then use them to fill in which columns should be strings.

col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
pd.read_csv('file.csv', dtype=types_dict)
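A self-contained version of the same idea (io.StringIO and the sample values below stand in for file.csv):

```python
import io
import pandas as pd

# Hypothetical file.csv contents; column C has leading zeros to preserve.
csv_text = "A,B,C\n1,2.5,007\n3,4.5,042\n"

# Read only the header row to discover the column names.
col_names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Explicit types for some columns; everything else defaults to str.
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})

df = pd.read_csv(io.StringIO(csv_text), dtype=types_dict)
print(df['C'].tolist())  # ['007', '042']
```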

Cannot convert pandas column to string

That is how pandas defines the column type: there is no dedicated string column type, so a column of Python strings falls under object.

df.column1.apply(type)
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
4    <class 'str'>
5    <class 'str'>
Name: column1, dtype: object
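For completeness, newer pandas versions (1.0+) do offer an opt-in dedicated string dtype; a minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical column values, purely for illustration.
df = pd.DataFrame({'column1': ['a', 'b', 'c']})

# By default, Python strings are stored under the object dtype.
print(df['column1'].dtype)  # object

# pandas >= 1.0: convert to the dedicated nullable string dtype.
s = df['column1'].astype('string')
print(s.dtype)  # string
```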

DataFrame does not have str.replace

You should do

df.replace({'...': '...'})

or, on a single column:

df['column1'] = df['column1'].str.replace('...', '...')
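For example, on a hypothetical column (the '-' and '_' values are made up for illustration), str.replace takes the pattern and replacement as arguments:

```python
import pandas as pd

df = pd.DataFrame({'column1': ['foo-1', 'bar-2']})

# str.replace operates element-wise and takes (pattern, replacement);
# regex=False treats the pattern as a literal string.
df['column1'] = df['column1'].str.replace('-', '_', regex=False)

print(df['column1'].tolist())  # ['foo_1', 'bar_2']
```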

Dataframe column won't convert from integer string to an actual integer

Per my answer in your previous question:

import numpy as np
import pandas as pd

data = ["20181231235959383171", "20181231235959383172"]
df = pd.DataFrame(data=data, columns=["A"])

# slow, but Python ints are arbitrary precision, so big enough
df["A_as_python_int"] = df["A"].apply(int)

# fast, but the value has to be split into two int64 columns
df["A_seconds"] = (df["A_as_python_int"] // 1000000).astype(np.int64)
df["A_fractions"] = (df["A_as_python_int"] % 1000000).astype(np.int64)

Convert columns to string in Pandas

One way to convert to string is to use astype:

total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)

However, perhaps you are looking for the to_json function, which will convert keys to valid json (and therefore your keys to strings):

In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])

In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'

In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'

Note: you can pass in a buffer/file to save this to, along with some other options...

Pandas Dataframe interpreting columns as float instead of String

A solution could be this, applied after you have imported the df:

df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)

Or, since there are NaNs, with:

df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
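A minimal sketch of the NaN-safe version, with a made-up ID column (floats with a missing value, as read_csv produces for a numeric column containing NaNs):

```python
import numpy as np
import pandas as pd

# Hypothetical IDs; the NaN forces the column to float64.
df = pd.DataFrame({'ID': [1234.0, np.nan, 5678.0]})

# Convert non-null values to integer strings, leaving NaN untouched.
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))

print(df['ID'].tolist())  # ['1234', nan, '5678']
```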

How to find string data-type that includes a number in Pandas DataFrame

You can use pandas.to_numeric with errors='coerce', then dropna to remove the invalid rows:

(data_df.assign(value=pd.to_numeric(data_df['value'], errors='coerce'))
.dropna(subset=['value'])
)

NB: this upcasts the integers to floats, but that is how a Series works; upcasting is better than forcing an object dtype.

output:

  name  value
1    B   10.0
3    D   10.0
5    F   20.0
6    G   25.1

If you just want to slice the rows and keep the string type:

data_df[pd.to_numeric(data_df['value'], errors='coerce').notna()]

output:

  name value
1    B    10
3    D    10
5    F  20.0
6    G  25.1
Updated question (multiple columns)

Build a mask and use any/all prior to slicing:

mask = data_df[data_df.columns[1:]].apply(pd.to_numeric, errors='coerce').notna().all(axis=1)
data_df[mask]
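A self-contained sketch with made-up data (only the 'name' column is non-numeric):

```python
import pandas as pd

data_df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'v1': ['x', '10', '20.5'],
    'v2': ['1', '2', 'oops'],
})

# Coerce every value column to numeric and keep only the rows
# where all of them parse successfully.
mask = (data_df[data_df.columns[1:]]
        .apply(pd.to_numeric, errors='coerce')
        .notna()
        .all(axis=1))
result = data_df[mask]

print(result['name'].tolist())  # ['B']
```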

