Pandas: Change Data Type of Series to String

A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) produces the dedicated string dtype; both return a Series of object dtype.

As per the documentation, a Series can be converted to the string datatype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
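
To see the difference between the two casts, here is a minimal sketch. It assumes pandas 1.1 or later (where integer data can be cast to "string" directly), and the sample `id` values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'id' comes from the answer above.
df = pd.DataFrame({"id": [101, 102, 103]})

as_str = df["id"].astype(str)          # values become Python str objects
as_string = df["id"].astype("string")  # values stored with the dedicated StringDtype

# The printed dtype names depend on the pandas version, so inspect them yourself.
print(as_str.dtype)
print(as_string.dtype)
print(isinstance(as_string.dtype, pd.StringDtype))
```

Both results hold the same text; what differs is the dtype that pandas reports and the missing-value handling that comes with it.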

How to convert pandas.core.series.Series to string?

There are many methods for doing this. You'll need to provide more details about your goal to get a more detailed answer.

You can use:

.to_string()

or

.astype(str)

or

.apply(str)
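
These three options do not return the same thing, which a short sketch makes visible. The sample Series is an assumption made for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # hypothetical sample data

text = s.to_string()         # ONE str for the whole Series, index included
per_element = s.astype(str)  # a Series whose values are strings
via_apply = s.apply(str)     # same values, produced by calling str() per element

print(type(text))
print(text)
print(per_element.tolist())
print(via_apply.tolist())
```

Use to_string() when you want printable text for display, and astype(str) or apply(str) when you want to keep working with a Series.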

Convert columns to string in Pandas

One way to convert to string is to use astype:

total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)

However, perhaps you are looking for the to_json function, which converts keys to valid JSON (and therefore converts your keys to strings):

In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])

In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'

In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'

Note: you can also pass a buffer or file path to write the output to, along with some other options.
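
As a rough sketch of the buffer option, the snippet below writes the same DataFrame to an in-memory io.StringIO object instead of returning a string. The choice of StringIO is an assumption for illustration; a file path should work the same way.

```python
import io
import json

import pandas as pd

df = pd.DataFrame([["A", 2], ["A", 4], ["B", 6]])

buffer = io.StringIO()
df.to_json(buffer)            # writes into the buffer instead of returning a str
json_text = buffer.getvalue()

# Parse the text back to confirm that the integer keys became strings.
parsed = json.loads(json_text)
print(list(parsed.keys()))
```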

Pandas DataFrame stored list as string: How to convert back to list

As you pointed out, this can commonly happen when saving and loading pandas DataFrames as .csv files, which is a text format.

In your case this happened because list objects have a string representation, allowing them to be stored as .csv files. Loading the .csv will then yield that string representation.

If you want to store the actual objects, you should use DataFrame.to_pickle() (note: objects must be picklable!).

To answer your second question, you can convert it back with ast.literal_eval:

>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]
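
The round trip can be sketched end to end as follows. The column name `values` and the in-memory CSV are assumptions for illustration; passing literal_eval through the converters argument of read_csv is one possible way to apply it while loading.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical DataFrame with a column of lists.
df = pd.DataFrame({"values": [[1.23, 2.34], [3.45]]})

csv_text = df.to_csv(index=False)  # lists are written as their string representation

# A plain load gives the string representation back.
loaded = pd.read_csv(io.StringIO(csv_text))
print(type(loaded["values"][0]))

# Parsing each cell with literal_eval restores the actual lists.
restored = pd.read_csv(io.StringIO(csv_text), converters={"values": literal_eval})
print(restored["values"][0])
```

literal_eval only evaluates Python literals, which makes it a safer choice than eval for data read from a file.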

pandas: Convert int Series to new StringDtype

This is explained in the docs, in the example section:

Unlike object dtype arrays, StringArray doesn’t allow non-string values

Where the following example is shown:

pd.array(['1', 1], dtype="string")

Traceback (most recent call last):
...
ValueError: StringArray requires an object-dtype ndarray of strings.

The only solution seems to be casting to object dtype first, as you were doing, and then to "string". Note that pandas 1.1 and later accept astype("string") on non-string data directly.

This is also clearly stated in the source code of StringArray, where right at the top you'll see the warning:

   .. warning::

      Currently, this expects an object-dtype ndarray
      where the elements are Python strings or :attr:`pandas.NA`.
      This may change without warning in the future. Use
      :meth:`pandas.array` with ``dtype="string"`` for a stable way of
      creating a `StringArray` from any sequence.

If you check the validation step in _validate, you'll see how this will fail for arrays of non-strings:

def _validate(self):
    """Validate that we only store NA or strings."""
    if len(self._ndarray) and not lib.is_string_array(self._ndarray, skipna=True):
        raise ValueError("StringArray requires a sequence of strings or pandas.NA")
    if self._ndarray.dtype != "object":
        raise ValueError(
            "StringArray requires a sequence of strings or pandas.NA. Got "
            f"'{self._ndarray.dtype}' dtype instead."
        )

For the example in the question:

import numpy as np
from pandas._libs import lib

lib.is_string_array(np.array(range(20)), skipna=True)
# False
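
A minimal sketch of the two-step conversion, assuming the intermediate cast is astype(str) so that every element is a Python string before the cast to "string". The integer range is made up for illustration.

```python
import pandas as pd

s = pd.Series(range(5))  # hypothetical integer Series

# Step 1: turn every element into a Python str.
# Step 2: cast those strings to the dedicated string dtype.
converted = s.astype(str).astype("string")

print(converted.dtype)
print(converted.tolist())
```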

pandas timestamp series to string?

Consider the dataframe df

df = pd.DataFrame(dict(timestamp=pd.to_datetime(['2000-01-01'])))

df

   timestamp
0 2000-01-01

Use the datetime accessor dt to access the strftime method. You can pass a format string to strftime and it will return a formatted string. When used with the dt accessor you will get a series of strings.

df.timestamp.dt.strftime('%Y-%m-%d')

0 2000-01-01
Name: timestamp, dtype: object

Visit strftime.org for a handy set of format strings.
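
As a small sketch of a different format string, the following uses day/month/year order. The second timestamp is added for illustration only.

```python
import pandas as pd

df = pd.DataFrame(dict(timestamp=pd.to_datetime(["2000-01-01", "2000-12-31"])))

# %d = day of month, %m = month number, %Y = four-digit year
formatted = df["timestamp"].dt.strftime("%d/%m/%Y")

print(formatted.tolist())  # ['01/01/2000', '31/12/2000']
```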

How to convert a pandas Series with multiple objects into a string Series?

The problem is you've defined a series of lists:

s = pd.Series({'A':[10,'héllo','world']})

print(s)

A [10, héllo, world]
dtype: object

If this is truly what you have, you need to modify each list in a Python-level loop. For example, via pd.Series.apply:

s = s.apply(lambda x: list(map(str, x)))
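
To check that the loop did what was intended, this sketch reruns the conversion and inspects the element types inside the list. The variable name `converted` is an assumption for illustration.

```python
import pandas as pd

s = pd.Series({"A": [10, "héllo", "world"]})

# Convert every item inside each list to str; the Series still holds lists.
converted = s.apply(lambda x: list(map(str, x)))

print(converted["A"])  # ['10', 'héllo', 'world']
print([type(item).__name__ for item in converted["A"]])
```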

If you have a series of scalars, then astype will work:

s = pd.Series([10,'héllo','world'])

res = s.astype(str)

print(res, res.map(type), sep='\n'*2)

0 10
1 héllo
2 world
dtype: object

0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
dtype: object

Convert multiple columns to string in pandas dataframe

To convert multiple columns to string, select them with a list of column names and apply astype(str) to the selection:

df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.

That means that one way to convert all columns is to construct the list of columns like this:

all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)

Note that the latter can also be done directly, without building the list, by calling df = df.astype(str).
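
The direct form, plus a per-column variant using a dict, can be sketched like this. The sample values are made up; only the column names come from the answer above.

```python
import pandas as pd

# Hypothetical sample data with mixed column types.
df = pd.DataFrame({"one": [1, 2], "two": [1.5, 2.5], "three": [True, False]})

# Convert every column in one call.
all_strings = df.astype(str)

# Convert only selected columns by passing a {column: dtype} mapping.
some_strings = df.astype({"one": str, "two": str})

print(all_strings["three"].tolist())
print(some_strings["three"].tolist())
```

The dict form leaves unlisted columns untouched, which is handy when only a few columns need converting.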
