Import pandas dataframe column as string not int
Just want to reiterate this will work in pandas >= 0.9.1:
In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]:
ID
0 00013007854817840016671868
1 00013007854817840016749251
2 00013007854817840016754630
3 00013007854817840016781876
4 00013007854817840017028824
5 00013007854817840017963235
6 00013007854817840018860166
I'm creating an issue about detecting integer overflows also.
EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
Update as it helps others:
To have all columns as str, one can do this (from the comment):
pd.read_csv('sample.csv', dtype = str)
To have most or selective columns as str, one can do this:
# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dtype mapping
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype parameter
pd.read_csv('sample.csv', dtype=dict_dtypes)
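To see the selective-dtype approach end to end, here is a minimal sketch using an inline CSV in place of sample.csv (the column names prefix, serial, and qty are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for sample.csv
csv_text = "prefix,serial,qty\n0001,0007,3\n0002,0042,5\n"

# Force only the listed columns to str; other columns are still inferred
lst_str_cols = ['prefix', 'serial']
dict_dtypes = {col: str for col in lst_str_cols}
df = pd.read_csv(io.StringIO(csv_text), dtype=dict_dtypes)

print(df.dtypes)        # prefix/serial -> object, qty -> int64
print(df['prefix'][0])  # '0001' -- leading zeros preserved
```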
Import pandas dataframe column as string not int or float
You can use the dtype parameter with str:
df = pd.read_csv('sample.csv', dtype=str)
Pandas read_csv dtype read all columns but few as string
EDIT - sorry, I misread your question. Updated my answer.
You can read the entire csv as strings then convert your desired columns to other types afterwards like this:
df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
df[col] = df[col].astype(col_type)
Another approach, if you really want to specify the proper types for all columns when reading the file in (and not change them afterwards): read in just the column names (no rows), then use those to fill in which columns should be strings.
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
pd.read_csv('file.csv', dtype=types_dict)
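A minimal runnable sketch of the header-first approach above, using an in-memory CSV in place of file.csv (the column names A, B, C are assumptions):

```python
import io

import pandas as pd

csv_text = "A,B,C\n1,2.5,007\n3,4.5,042\n"

# Pass 1: read only the header row to learn the column names
col_names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Pass 2: explicit types for known columns, str for everything else
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})

df = pd.read_csv(io.StringIO(csv_text), dtype=types_dict)
print(df.dtypes)   # A int64, B float64, C object
print(df['C'][0])  # '007' -- leading zeros kept
```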
Cannot convert pandas column to string
That is how pandas defines column types: there is no dedicated string dtype, so str values fall under object:
df.column1.apply(type)
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
Name: column1, dtype: object
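For completeness, assuming pandas >= 1.0, there is also a dedicated (opt-in) string dtype you can cast to, while a plain object column simply holds Python str values:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])
print(s.dtype)  # object -- str values stored as Python objects
print(all(isinstance(x, str) for x in s))  # True

# Since pandas 1.0 there is a dedicated, opt-in string dtype
s2 = s.astype('string')
print(s2.dtype)  # string
```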
DataFrame does not support str.replace (it is a Series method).
You should do
df.replace({'...':'...'})
Or
df['column1'] = df['column1'].str.replace('...', '...')
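Note that Series.str.replace requires the pattern and replacement arguments. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'column1': ['foo-1', 'bar-2']})

# str.replace needs pat and repl; regex=False treats the pattern as literal text
df['column1'] = df['column1'].str.replace('-', '_', regex=False)
print(df['column1'].tolist())  # ['foo_1', 'bar_2']
```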
Dataframe column won't convert from integer string to an actual integer
Per my answer in your previous question:
import pandas as pd
import numpy as np

data = ["20181231235959383171", "20181231235959383172"]
df = pd.DataFrame(data=data, columns=["A"])
# slow, but Python ints are arbitrary precision, so 20 digits fit
df["A_as_python_int"] = df["A"].apply(int)
# fast, but the value has to be split into two 64-bit integers
df["A_seconds"] = (df["A_as_python_int"] // 1000000).astype(np.int64)
df["A_fractions"] = (df["A_as_python_int"] % 1000000).astype(np.int64)
Convert columns to string in Pandas
One way to convert to string is to use astype:
total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)
However, perhaps you are looking for the to_json function, which will convert keys to valid json (and therefore your keys to strings):
In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])
In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'
In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'
Note: you can pass in a buffer/file to save this to, along with some other options...
Pandas Dataframe interpreting columns as float instead of String
One solution, applied after the file has been read in:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Or, since the column contains NaN values:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
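As an alternative (assuming pandas >= 0.24), the nullable Int64 dtype can undo the float upcast in one step; note that casting on to str turns missing values into the literal string '<NA>' in recent pandas versions, unlike the lambda above, which leaves them as NaN. The sample values below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'ID': [13007.0, None, 42.0]})  # floats because of the NaN

# Nullable integer dtype keeps missing values without a float round-trip
df['ID_int'] = df['ID'].astype('Int64')
print(df['ID_int'].tolist())  # [13007, <NA>, 42]

# Casting on to str yields '13007' etc.; missing values become '<NA>'
print(df['ID_int'].astype(str).iloc[0])  # '13007'
```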
How to find string data-type that includes a number in Pandas DataFrame
You can use pandas.to_numeric with errors='coerce', then dropna to remove the invalid rows:
(data_df.assign(value=pd.to_numeric(data_df['value'], errors='coerce'))
.dropna(subset=['value'])
)
NB: this upcasts the integers to floats, but that is how a Series works, and upcasting is better than being forced into an object dtype.
output:
name value
1 B 10.0
3 D 10.0
5 F 20.0
6 G 25.1
If you just want to slice the rows and keep the string type:
data_df[pd.to_numeric(data_df['value'], errors='coerce').notna()]
output:
name value
1 B 10
3 D 10
5 F 20.0
6 G 25.1
For the updated question (multiple columns), build a mask and use any/all prior to slicing:
mask = data_df[data_df.columns[1:]].apply(pd.to_numeric, errors='coerce').notna().all(1)
data_df[mask]