Pandas Dataframe Stored List as String: How to Convert Back to List

Pandas DataFrame stored list as string: How to convert back to list

As you pointed out, this can commonly happen when saving and loading pandas DataFrames as .csv files, which is a text format.

In your case this happened because list objects have a string representation, allowing them to be stored as .csv files. Loading the .csv will then yield that string representation.

If you want to store the actual objects, you should use DataFrame.to_pickle() (note: objects must be picklable!).
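To make the difference concrete, here is a minimal sketch that writes the same DataFrame to both formats and reads it back. The column name and the temporary file names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: one column whose cells are Python lists
df = pd.DataFrame({"scores": [[1.23, 2.34], [3.45]]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl")

    df.to_csv(csv_path, index=False)   # lists are written as their text form
    df.to_pickle(pkl_path)             # lists are serialized as objects

    from_csv = pd.read_csv(csv_path)
    from_pickle = pd.read_pickle(pkl_path)

print(type(from_csv.loc[0, "scores"]))     # a str such as '[1.23, 2.34]'
print(type(from_pickle.loc[0, "scores"]))  # a real list
```

Only load pickle files you created yourself or otherwise trust, since unpickling can run arbitrary code.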

To answer your second question, you can convert it back with ast.literal_eval:

>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]
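If the strings come from a .csv file, one option is to do the conversion while loading, using the converters argument of pd.read_csv. The sketch below assumes a column called scores (a made-up name) and reads from an in-memory string instead of a real file.

```python
import io
from ast import literal_eval

import pandas as pd

# Stand-in for a .csv file on disk; the column names are hypothetical
csv_text = 'id,scores\n1,"[1.23, 2.34]"\n2,"[3.45]"\n'

# Option 1: parse the column while reading
df = pd.read_csv(io.StringIO(csv_text), converters={"scores": literal_eval})

# Option 2: read as text first, then convert the column afterwards
df2 = pd.read_csv(io.StringIO(csv_text))
df2["scores"] = df2["scores"].apply(literal_eval)

print(df.loc[0, "scores"])        # [1.23, 2.34]
print(type(df2.loc[1, "scores"]))  # list
```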

pandas - convert string into list of strings

You can split the string manually:

>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
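The plain split keeps any spaces or quote characters that surround each item. A slightly more defensive sketch is shown here; split_tags is a hypothetical helper, and it assumes the tags themselves never contain commas.

```python
import pandas as pd

# Hypothetical sample covering unquoted, quoted and empty lists
df = pd.DataFrame({"Tags": ["[Tag1, Tag2]", "['Tag3','Tag4']", "[]"]})

def split_tags(text):
    """Split a '[a, b]'-style string into a list of clean tag strings."""
    inner = text.strip()[1:-1]          # drop the surrounding brackets
    if not inner.strip():
        return []                       # '[]' becomes an empty list
    # Remove whitespace and any quote characters around each item
    return [item.strip().strip("'\"") for item in inner.split(",")]

df["Tags"] = df["Tags"].apply(split_tags)
print(df["Tags"].tolist())
# [['Tag1', 'Tag2'], ['Tag3', 'Tag4'], []]
```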

How to convert string back to list using Pandas

You can use ast.literal_eval:

>>> import ast
>>> a = "['BONGO', 'TOZZO', 'FALLO', 'PINCO']"
>>> print(ast.literal_eval(a))
['BONGO', 'TOZZO', 'FALLO', 'PINCO']

Pandas stored list as string, but cannot convert it back due to decimal

You can do it with eval(), since ast.literal_eval() only accepts Python literals and cannot build Decimal objects. Be very careful with this approach: only use it on data you fully trust.

eval() executes the given string just as the Python interpreter would, so it creates whatever objects the string describes, in your case Decimal instances.

from decimal import Decimal

val = "[{'product':'ABC', 'quantity':1, 'price':Decimal(91.99)}, {'product':'YXZ', 'quantity':2, 'price':Decimal(11.99)}]"
print(eval(val))

Output

[{'product': 'ABC',
'quantity': 1,
'price': Decimal('91.9899999999999948840923025272786617279052734375')},
{'product': 'YXZ',
'quantity': 2,
'price': Decimal('11.9900000000000002131628207280300557613372802734375')}]
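If you would rather not call eval() on stored data, one possible workaround is to rewrite each Decimal(...) call as a quoted string, parse the result with ast.literal_eval, and then build the Decimal objects yourself. This is only a sketch: parse_with_decimals and its decimal_keys parameter are hypothetical names, and the regular expression assumes simple numeric arguments.

```python
import re
from ast import literal_eval
from decimal import Decimal

val = ("[{'product': 'ABC', 'quantity': 1, 'price': Decimal(91.99)}, "
       "{'product': 'YXZ', 'quantity': 2, 'price': Decimal('11.99')}]")

# Matches Decimal(91.99) as well as Decimal('91.99') and captures the number
DECIMAL_CALL = re.compile(r"Decimal\(\s*['\"]?([-+0-9.eE]+)['\"]?\s*\)")

def parse_with_decimals(text, decimal_keys=("price",)):
    """Parse a list-of-dicts string without eval, restoring Decimal fields."""
    plain = DECIMAL_CALL.sub(r"'\1'", text)   # Decimal(91.99) -> '91.99'
    records = literal_eval(plain)             # now contains literals only
    for record in records:
        for key in decimal_keys:
            if key in record:
                record[key] = Decimal(record[key])
    return records

records = parse_with_decimals(val)
print(records[0]["price"])  # 91.99
```

Because the number is passed to Decimal as text, the result is exactly 91.99 rather than the long binary approximation that Decimal(91.99) produces from a float.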

How can I save DataFrame as list and not as string

Try this:

import pandas as pd

df = pd.DataFrame({'a': ["[1,2,3,4]", "[6,7,8,9]"]})
df['b'] = df['a'].apply(eval)
print(df)

The data in column b is now a list.

           a             b
0  [1,2,3,4]  [1, 2, 3, 4]
1  [6,7,8,9]  [6, 7, 8, 9]
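When the strings happen to be valid JSON, as they are in this sample, json.loads is another way to parse them without executing code. This is a sketch under that assumption; it would fail on single-quoted strings such as "['a', 'b']".

```python
import json

import pandas as pd

df = pd.DataFrame({"a": ["[1,2,3,4]", "[6,7,8,9]"]})

# json.loads parses the text only; it never evaluates expressions
df["b"] = df["a"].apply(json.loads)

print(df.loc[0, "b"])        # [1, 2, 3, 4]
print(type(df.loc[0, "b"]))  # list
```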

Transform string that should be list of floats in a column of dataframe?

Use ast.literal_eval:

import ast

df['interval'] = df['interval'].apply(ast.literal_eval)

Output

>>> df
       interval
0  [100.0, 3.0]
1    [3.0, 2.0]
2    [2.0, 1.0]
3     [1, 0.25]
4   [0.25, 0.0]

>>> df.loc[0, 'interval']
[100.0, 3.0]

>>> type(df.loc[0, 'interval'])
list

Now you can convert to columns if you want:

>>> df['interval'].apply(pd.Series)
        0     1
0  100.00  3.00
1    3.00  2.00
2    2.00  1.00
3    1.00  0.25
4    0.25  0.00
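An alternative way to expand the lists is to pass them to the pd.DataFrame constructor directly, which avoids creating one Series per row. The column names start and end are made up for this sketch, and it assumes every list has exactly two items.

```python
import pandas as pd

# Column already converted to real lists
df = pd.DataFrame({"interval": [[100.0, 3.0], [3.0, 2.0], [1, 0.25]]})

# Build the new columns from the list of lists, keeping the original index
expanded = pd.DataFrame(
    df["interval"].tolist(),
    index=df.index,
    columns=["start", "end"],   # hypothetical names
)

print(expanded)
```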

Column of lists, convert list to string as a new column

List Comprehension

If performance is important, I strongly recommend this solution and I can explain why.

df['liststring'] = [','.join(map(str, l)) for l in df['lists']]
df

                lists    liststring
0  [1, 2, 12, 6, ABC]  1,2,12,6,ABC
1     [1000, 4, z, a]    1000,4,z,a

You can extend this to more complicated use cases using a function.

import numpy as np

def try_join(l):
    try:
        return ','.join(map(str, l))
    except TypeError:
        # Not iterable (e.g. NaN): return a missing value instead of failing
        return np.nan

df['liststring'] = [try_join(l) for l in df['lists']]
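To show what the fallback buys you, here is a small self-contained run on made-up data in which one row holds NaN instead of a list.

```python
import numpy as np
import pandas as pd

def try_join(l):
    try:
        return ",".join(map(str, l))
    except TypeError:
        # map() raises TypeError for non-iterables such as a float NaN
        return np.nan

# Hypothetical sample: the middle row is missing
df = pd.DataFrame({"lists": [[1, 2, "ABC"], np.nan, [1000, "z"]]})
df["liststring"] = [try_join(l) for l in df["lists"]]

print(df["liststring"].tolist())
# ['1,2,ABC', nan, '1000,z']
```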


Series.apply/Series.agg with ','.join

You need to convert your list items to strings first; that's where map comes in handy.

df['liststring'] = df['lists'].apply(lambda x: ','.join(map(str, x)))

Or,

df['liststring'] = df['lists'].agg(lambda x: ','.join(map(str, x)))

df
                lists    liststring
0  [1, 2, 12, 6, ABC]  1,2,12,6,ABC
1     [1000, 4, z, a]    1000,4,z,a


pd.DataFrame constructor with DataFrame.agg

A non-loopy/non-lambda solution.

df['liststring'] = (pd.DataFrame(df.lists.tolist())
                      .fillna('')
                      .astype(str)
                      .agg(','.join, axis=1)
                      .str.strip(','))

df
                lists    liststring
0  [1, 2, 12, 6, ABC]  1,2,12,6,ABC
1     [1000, 4, z, a]    1000,4,z,a
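Going the other way, a comma-joined string can be turned back into a list with Series.str.split. A minimal sketch, assuming the items never contain commas themselves; note that every item comes back as a string, so the original types are not restored.

```python
import pandas as pd

df = pd.DataFrame({"liststring": ["1,2,12,6,ABC", "1000,4,z,a"]})

# Split each string on commas; the result is a list of strings per row
df["lists"] = df["liststring"].str.split(",")

print(df.loc[0, "lists"])
# ['1', '2', '12', '6', 'ABC']
```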

Converting list of strings in pandas column into string

Edit:

As your edit shows, the rows are not actually lists but strings that look like lists. We can use eval to turn each string into a real list so that the join can be performed afterwards. Your sample data seems to be the following:

df = pd.DataFrame({'index':[0,1,2,3,4],
'words':["['me']","['they']","['it','we','it']","[]","['we','we','it']"]})

How about this? Use apply with eval to parse each row, then apply ' '.join to each resulting list:

df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)

Output:

   index     words
0      0        me
1      1      they
2      2  it we it
3      3
4      4  we we it

