Pandas: change data type of Series to String
A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) produces the dedicated string dtype; both return a Series with object dtype.
As per the documentation, a Series can be converted to the string datatype in the following ways:
df['id'] = df['id'].astype("string")
df['id'] = pandas.Series(df['id'], dtype="string")
df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
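To see the difference between the two kinds of cast, here is a minimal sketch. The sample frame and the variable names as_object and as_string are made up for illustration, and the dtype printed for astype(str) depends on the pandas version installed.

```python
import pandas as pd

# Hypothetical sample data; the 'id' column name mirrors the snippet above.
df = pd.DataFrame({"id": [1, 2, 3]})

# astype(str) turns each value into a Python str; in pandas 1.x/2.x the
# resulting Series is reported with the generic object dtype.
as_object = df["id"].astype(str)

# astype("string") requests the dedicated StringDtype extension type.
as_string = df["id"].astype("string")

print(as_object.dtype)
print(as_string.dtype)
```

Both Series hold the same text values; only the dtype that pandas reports differs.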
How to convert pandas.core.series.Series to string?
There are many methods for doing this. You'll need to provide more details about your goal to get a more detailed answer.
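You can use:
.to_string() (returns a single string rendering the whole Series)
or
.astype(str) (returns a Series whose elements are strings)
or
.apply(str) (same result, computed by calling str on each element)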
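The three options are not interchangeable, so a short comparison may help. This is only a sketch on a made-up Series; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # hypothetical example data

# One Python string containing a printable rendering of the whole Series.
whole = s.to_string()

# A new Series in which every element has been converted to a string.
elementwise = s.astype(str)

# Same values as above, produced by calling str() on each element.
applied = s.apply(str)

print(type(whole).__name__)
print(elementwise.tolist())
print(applied.tolist())
```

Use to_string() when you want text to display or log, and astype(str) or apply(str) when you still need a Series afterwards.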
Convert columns to string in Pandas
One way to convert to string is to use astype:
total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)
However, perhaps you are looking for the to_json function, which will convert keys to valid JSON (and therefore your keys to strings):
In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])
In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'
In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'
Note: you can pass in a buffer/file to save this to, along with some other options...
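As a rough illustration of the buffer option, the sketch below writes the JSON into an in-memory io.StringIO object (chosen here only so the example needs no file on disk) and decodes it again to show that the integer column labels come back as strings.

```python
import io
import json

import pandas as pd

df = pd.DataFrame([["A", 2], ["A", 4], ["B", 6]])

# Write the JSON into an in-memory text buffer instead of returning it.
buffer = io.StringIO()
df.to_json(buffer)

# Decode the JSON text; the integer column labels 0 and 1 are now strings.
decoded = json.loads(buffer.getvalue())
print(list(decoded))
```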
Pandas DataFrame stored list as string: How to convert back to list
As you pointed out, this can commonly happen when saving and loading pandas DataFrames as .csv files, which is a text format.
In your case this happened because list objects have a string representation, allowing them to be stored as .csv files. Loading the .csv will then yield that string representation.
If you want to store the actual objects, you should use DataFrame.to_pickle() (note: objects must be picklable!).
To answer your second question, you can convert it back with ast.literal_eval:
>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]
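Applied to a whole column, the same idea might look like the sketch below. The column name scores and the in-memory CSV text are assumptions made for the example; a real workflow would read from a file path instead.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical frame with a column of lists.
df = pd.DataFrame({"scores": [[1.23, 2.34], [3.45]]})

# Round-trip through CSV text; the lists are stored as their string form.
csv_text = df.to_csv(index=False)
loaded = pd.read_csv(io.StringIO(csv_text))
first_cell_type = type(loaded["scores"].iloc[0]).__name__  # 'str'

# Parse each cell back into a real Python list.
loaded["scores"] = loaded["scores"].apply(literal_eval)
print(loaded["scores"].tolist())
```

literal_eval only evaluates Python literals, which makes it a safer choice than eval for data read from a file.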
pandas: Convert int Series to new StringDtype
This is explained in the docs, in the example section:
Unlike object dtype arrays, StringArray doesn’t allow non-string values
Where the following example is shown:
pd.array(['1', 1], dtype="string")
Traceback (most recent call last):
...
ValueError: StringArray requires an object-dtype ndarray of strings.
The only solution seems to be casting to object dtype as you were doing and then to string.
This is also clearly stated in the source code of StringArray, where right at the top you'll see the warning:
.. warning::
Currently, this expects an object-dtype ndarray
where the elements are Python strings or :attr:`pandas.NA`.
This may change without warning in the future. Use
:meth:`pandas.array` with ``dtype="string"`` for a stable way of
creating a `StringArray` from any sequence.
If you check the validation step in _validate, you'll see how this will fail for arrays of non-strings:
def _validate(self):
    """Validate that we only store NA or strings."""
    if len(self._ndarray) and not lib.is_string_array(self._ndarray, skipna=True):
        raise ValueError("StringArray requires a sequence of strings or pandas.NA")
    if self._ndarray.dtype != "object":
        raise ValueError(
            "StringArray requires a sequence of strings or pandas.NA. Got "
            f"'{self._ndarray.dtype}' dtype instead."
        )
For the example in the question:
import numpy as np
from pandas._libs import lib
lib.is_string_array(np.array(range(20)), skipna=True)
# False
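The two-step cast described above (first to Python strings, then to the string dtype) can be sketched as follows. The Series is made up to match the range(20) example; newer pandas releases may also accept a direct cast, which is not assumed here.

```python
import pandas as pd

s = pd.Series(range(20))  # integer Series, as in the example above

# Step 1: convert every value to a Python str.
# Step 2: cast the resulting strings to the dedicated string dtype.
converted = s.astype(str).astype("string")

print(converted.dtype)
print(converted.iloc[0])
```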
pandas timestamp series to string?
Consider the dataframe df
df = pd.DataFrame(dict(timestamp=pd.to_datetime(['2000-01-01'])))
df
timestamp
0 2000-01-01
Use the datetime accessor dt to access the strftime method. You can pass a format string to strftime and it will return a formatted string. When used with the dt accessor you will get a series of strings.
df.timestamp.dt.strftime('%Y-%m-%d')
0 2000-01-01
Name: timestamp, dtype: object
Visit strftime.org for a handy set of format strings.
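Other format strings work the same way. As a small sketch, with a made-up timestamp that includes a time of day:

```python
import pandas as pd

# Hypothetical data: one timestamp with a time component.
df = pd.DataFrame({"timestamp": pd.to_datetime(["2000-01-01 13:45:00"])})

# Day/month/year followed by hours and minutes.
formatted = df["timestamp"].dt.strftime("%d/%m/%Y %H:%M")

print(formatted.iloc[0])  # 01/01/2000 13:45
```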
How to convert a pandas Series with multiple object into string series?
The problem is you've defined a series of lists:
s = pd.Series({'A':[10,'héllo','world']})
print(s)
A [10, héllo, world]
dtype: object
If this is truly what you have, you need to modify each list in a Python-level loop. For example, via pd.Series.apply:
s = s.apply(lambda x: list(map(str, x)))
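To confirm that the items inside each list really became strings, a quick check along these lines can help. The names converted and element_types are illustrative only.

```python
import pandas as pd

s = pd.Series({"A": [10, "héllo", "world"]})

# Convert every item inside each list to a string.
converted = s.apply(lambda items: [str(item) for item in items])

# Inspect the types of the items in the list stored under label 'A'.
element_types = [type(item).__name__ for item in converted["A"]]

print(converted["A"])
print(element_types)
```

The Series itself still has object dtype, because each element is a list rather than a string.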
If you have a series of scalars, then astype will work:
s = pd.Series([10,'héllo','world'])
res = s.astype(str)
print(res, res.map(type), sep='\n'*2)
0 10
1 héllo
2 world
dtype: object
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
dtype: object
Convert multiple columns to string in pandas dataframe
To convert multiple columns to string, pass a list of columns to the astype command from your question:
df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.
That means that one way to convert all columns is to construct the list of columns like this:
all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)
Note that the latter can also be done directly with df = df.astype(str).
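As an alternative sketch, astype also accepts a dict mapping column names to target types, which converts only the listed columns and leaves the rest untouched. The frame below is made up; the column names follow the example above.

```python
import pandas as pd

# Hypothetical frame with three differently typed columns.
df = pd.DataFrame({"one": [1, 2], "two": [1.5, 2.5], "three": [True, False]})

# Convert only selected columns by mapping column name to target type.
subset = df.astype({"one": str, "two": str})

# Convert every column in a single call.
everything = df.astype(str)

print(subset["one"].tolist(), subset["two"].tolist())
print(everything["three"].tolist())
```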