Pandas: Cast Column to String Does Not Work

The dtype of a column holding strings, dicts, or lists is always object. To test the actual Python type, select a single value from the column, e.g. with iat:

type(resultstatsDF['file'].iat[0])

Sample:

resultstatsDF = pd.DataFrame({'file':['a','d','f']})
print (resultstatsDF)
  file
0    a
1    d
2    f

print (type(resultstatsDF['file'].iloc[0]))
<class 'str'>

print (resultstatsDF['file'].apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
Name: file, dtype: object

Sample:

df = pd.DataFrame({'strings':['a','d','f'],
                   'dicts':[{'a':4}, {'c':8}, {'e':9}],
                   'lists':[[4,8],[7,8],[3]],
                   'tuples':[(4,8),(7,8),(3,)],
                   'sets':[set([1,8]), set([7,3]), set([0,1])]})

print (df)
      dicts   lists    sets strings  tuples
0  {'a': 4}  [4, 8]  {8, 1}       a  (4, 8)
1  {'c': 8}  [7, 8]  {3, 7}       d  (7, 8)
2  {'e': 9}     [3]  {0, 1}       f    (3,)

All columns have the same dtype:

print (df.dtypes)
dicts      object
lists      object
sets       object
strings    object
tuples     object
dtype: object

But the element types differ. To check them in a loop:

for col in df:
    print (df[col].apply(type))

0 <class 'dict'>
1 <class 'dict'>
2 <class 'dict'>
Name: dicts, dtype: object
0 <class 'list'>
1 <class 'list'>
2 <class 'list'>
Name: lists, dtype: object
0 <class 'set'>
1 <class 'set'>
2 <class 'set'>
Name: sets, dtype: object
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
Name: strings, dtype: object
0 <class 'tuple'>
1 <class 'tuple'>
2 <class 'tuple'>
Name: tuples, dtype: object

Or check the first value of each column:

print (type(df['strings'].iat[0]))
<class 'str'>

print (type(df['dicts'].iat[0]))
<class 'dict'>

print (type(df['lists'].iat[0]))
<class 'list'>

print (type(df['tuples'].iat[0]))
<class 'tuple'>

print (type(df['sets'].iat[0]))
<class 'set'>
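
The per-column checks above can be folded into one helper. The following is a minimal sketch, not part of the original answer: the function name element_types and the sample frame are made up for illustration.

```python
import pandas as pd

def element_types(frame):
    """Map each column name to the sorted names of the Python types it holds."""
    return {col: sorted({type(v).__name__ for v in frame[col]})
            for col in frame.columns}

# Hypothetical sample data, similar to the frames used above
df = pd.DataFrame({'strings': ['a', 'd', 'f'],
                   'dicts': [{'a': 4}, {'c': 8}, {'e': 9}],
                   'mixed': ['3', 5, 9]})

summary = element_types(df)
print(summary)
```

A column that reports more than one type name is a mixed column and probably deserves a closer look.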

If a column may contain mixed types (which can break some pandas functions), you can filter it by type with boolean indexing:

df = pd.DataFrame({'mixed':['3', 5, 9,'2']})
print (df)
  mixed
0     3
1     5
2     9
3     2

print (df.dtypes)
mixed object
dtype: object

for col in df:
    print (df[col].apply(type))
0 <class 'str'>
1 <class 'int'>
2 <class 'int'>
3 <class 'str'>
Name: mixed, dtype: object

# Python 3: str
# Python 2: basestring
mask = df['mixed'].apply(lambda x: isinstance(x,str))
print (mask)
0 True
1 False
2 False
3 True
Name: mixed, dtype: bool

df = df[mask]
print (df)
  mixed
0     3
3     2
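
Before dropping rows, it can help to see how many values of each type a mixed column holds and which rows the mask would discard. This is a rough sketch along the same lines as the filter above; the names counts, strings_only and others are illustrative only.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'mixed': ['3', 5, 9, '2']})

# Tally the Python type of every element in the column
counts = Counter(type(v).__name__ for v in df['mixed'])

# Same isinstance test as above, kept as a reusable boolean mask
is_str = df['mixed'].map(lambda x: isinstance(x, str))

strings_only = df[is_str]   # rows that hold a str
others = df[~is_str]        # rows that the filter would drop

print(counts)
print(others)
```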

Convert columns to string in Pandas

One way to convert to string is to use astype:

total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)
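
As a small illustration of what astype(str) does, the sketch below builds a hypothetical total_rows frame (the values are invented) and checks that the elements are Python strings afterwards, so the .str accessor can be used on them.

```python
import pandas as pd

# Hypothetical data: an integer ID column
total_rows = pd.DataFrame({'ColumnID': [7, 42, 105]})

total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)

# Each element is now a Python str
first = total_rows['ColumnID'].iat[0]
print(type(first))

# String methods are available through the .str accessor
padded = total_rows['ColumnID'].str.zfill(3)
print(padded.tolist())
```

The dtype reported for the column depends on the pandas version, so checking an element is the more reliable test.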

However, perhaps you are looking for the to_json function, which converts keys to valid JSON (and therefore your keys to strings):

In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])

In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'

In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'

Note: you can pass in a buffer/file to save this to, along with some other options...
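
To illustrate the note about buffers, here is a minimal sketch that writes the JSON into an in-memory io.StringIO instead of returning a string. The choice of StringIO is an assumption for the example; a file path or open file handle should work the same way.

```python
import io
import json

import pandas as pd

df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])

# Write into an in-memory text buffer rather than returning the string
buffer = io.StringIO()
df.to_json(buffer)

# Parse the JSON back to confirm that every key has become a string
parsed = json.loads(buffer.getvalue())
print(parsed)
```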

Cannot convert pandas column to string

This is how pandas defines the column type: by default there is no separate string column type, so strings are stored in columns of dtype object.

df.column1.apply(type)
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
Name: column1, dtype: object

DataFrame does not have str.replace

You should use DataFrame.replace:

df.replace({'...':'...'})

Or the string method on a single column:

df['column1'] = df['column1'].str.replace('...', '...')
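
The two calls do not behave the same way, which the following sketch tries to show with invented data: DataFrame.replace with a plain dict swaps whole cell values, while Series.str.replace works on substrings. Passing regex=False is an assumption here so that the pattern is treated literally.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'column1': ['red apple', 'green apple', 'red']})

# DataFrame.replace with a dict: only cells exactly equal to 'red' change
whole = df.replace({'red': 'blue'})

# Series.str.replace: every occurrence of the substring 'red' changes
sub = df['column1'].str.replace('red', 'blue', regex=False)

print(whole['column1'].tolist())
print(sub.tolist())
```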

Pandas: change data type of Series to String

A newer answer reflecting more current practice: as of pandas v1.2.4, neither astype('str') nor astype(str) produces the dedicated string dtype; both leave the column with dtype object.

As per the documentation, a Series can be converted to the string datatype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
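
A short sketch of the first form, assuming pandas 1.1 or newer (where casting non-string values to "string" is expected to work). The sample frame and the names series are made up for illustration.

```python
import pandas as pd

# Hypothetical integer ID column
df = pd.DataFrame({'id': [1, 2, 3]})
df['id'] = df['id'].astype("string")

# The column now reports the dedicated string dtype
print(df['id'].dtype)
print(isinstance(df['id'].dtype, pd.StringDtype))

# Missing values are kept as missing rather than turned into text
names = pd.Series(['a', None, 'c'], dtype="string")
print(names.isna().tolist())
```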

