How to Split a Column of List-Values into Multiple Columns

Split a Pandas column of lists into multiple columns

You can use the DataFrame constructor with lists created by to_list:

import pandas as pd

d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)
       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]


df2[['team1','team2']] = pd.DataFrame(df2.teams.tolist(), index= df2.index)
print (df2)
       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6  [SF, NYG]    SF   NYG

And for a new DataFrame:

df3 = pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
print (df3)
  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG

A solution with apply(pd.Series) is very slow:

#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)

In [121]: %timeit df2['teams'].apply(pd.Series)
1.79 s ± 52.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [122]: %timeit pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
1.63 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Split list in a column to multiple columns

You could map ast.literal_eval to items in df2["1"]; build a DataFrame and join it to df1:

import ast
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))

Output:

                          Text    Topic  feature_0  feature_1  feature_2
0  Where is the party tonight?    Party  -0.011571  -0.010117   0.062448
1                  Let's dance    Party  -0.082682  -0.001614   0.020942
2                  Hello world    Other  -0.063768  -0.015903   0.020942
3            It is rainy today  Weather   0.063796  -0.028781   0.056791
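The snippet above relies on df1 and df2 from the original question. A self-contained sketch with made-up sample data (the column name "1" matches the question; the numeric values are illustrative only):

```python
import ast

import pandas as pd

# Hypothetical stand-ins for the question's frames: df2["1"] holds
# string-encoded lists such as "[-0.01, -0.01, 0.06]".
df1 = pd.DataFrame({'Text': ["Where is the party tonight?", "Let's dance"],
                    'Topic': ['Party', 'Party']})
df2 = pd.DataFrame({'1': ['[-0.011571, -0.010117, 0.062448]',
                          '[-0.082682, -0.001614, 0.020942]']})

# Parse each string into a real list, build a DataFrame from the lists,
# prefix the resulting 0..n columns, and join back on the shared index
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2['1'].tolist())).add_prefix('feature_'))
print(out)
```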

Splitting a list in a Pandas cell into multiple columns

You can loop through the Series with the apply() function and convert each list to a Series; this automatically expands the list into columns:

df[0].apply(pd.Series)

#    0   1   2
# 0  8  10  12
# 1  7   9  11

Update: To keep other columns of the data frame, you can concatenate the result with the columns you want to keep:

pd.concat([df[0].apply(pd.Series), df[1]], axis = 1)

#    0   1   2  1
# 0  8  10  12  A
# 1  7   9  11  B
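The answer above assumes a df that is not shown; a self-contained sketch reconstructed from the outputs (the sample values are an assumption):

```python
import pandas as pd

# Column 0 holds lists, column 1 holds a label column to keep
df = pd.DataFrame({0: [[8, 10, 12], [7, 9, 11]], 1: ['A', 'B']})

# apply(pd.Series) turns each list into a row of a new DataFrame,
# producing one column per list element
expanded = df[0].apply(pd.Series)

# Concatenate sideways to keep the other column(s)
result = pd.concat([expanded, df[1]], axis=1)
print(result)
```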

Pandas: split column of lists of unequal length into multiple columns

Try:

pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')

   code_0   code_1   code_2
0   71020      NaN      NaN
1   77085      NaN      NaN
2   36415      NaN      NaN
3   99213  99287.0      NaN
4   99233  99233.0  99233.0

Include the index

pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')

   code_0   code_1   code_2
1   71020      NaN      NaN
2   77085      NaN      NaN
3   36415      NaN      NaN
4   99213  99287.0      NaN
5   99233  99233.0  99233.0

We can nail down all the formatting with this:

f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
    df.codes.values.tolist(),
    df.index, dtype=object
).fillna('').rename(columns=f)

  code_1 code_2 code_3
1  71020
2  77085
3  36415
4  99213  99287
5  99233  99233  99233
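The df above is not shown; a self-contained sketch reconstructed from the outputs (the codes and the 1-based index are taken from the answer's printout):

```python
import pandas as pd

# Sample frame with ragged lists of codes, index starting at 1
df = pd.DataFrame({'codes': [[71020], [77085], [36415],
                             [99213, 99287], [99233, 99233, 99233]]},
                  index=range(1, 6))

# The DataFrame constructor pads shorter lists with NaN automatically;
# passing df.index keeps the original row labels
wide = pd.DataFrame(df.codes.tolist(), index=df.index).add_prefix('code_')
print(wide)
```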

Split one column into multiple columns using a delimiter

Use str.split to split the dates on the '+' delimiter, after converting the 'NULL' placeholder to NaN:

import numpy as np

df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split('+', expand=True)

and count to count the non-missing dates per row:

df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)

print(df)

     ID  date date2 date3  number of dates
0  3009  2016  2017  None                2
1   129  2015  None  None                1
2   119  2014  2019  2020                3
3   120  2020  None  None                1
4   121   NaN   NaN   NaN                0
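The input df is not shown above; a self-contained sketch reconstructed from the output (the sample dates are an assumption):

```python
import numpy as np
import pandas as pd

# Dates joined by '+', with the string 'NULL' marking missing values
df = pd.DataFrame({'ID': [3009, 129, 119, 120, 121],
                   'date': ['2016+2017', '2015', '2014+2019+2020', '2020', 'NULL']})

# Convert 'NULL' to NaN so str.split leaves those rows as NaN,
# then split on '+' into up to three columns
df[['date', 'date2', 'date3']] = (df['date'].replace('NULL', np.nan)
                                            .str.split('+', expand=True))

# count(axis=1) counts non-missing cells per row
df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)
print(df)
```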

How to split a pandas column with a list of dicts into separate columns for each key

  • The columns are lists of dicts.
    • Each dict in the list can be moved to a separate row by using pandas.explode().
    • Convert the column of dicts to a dataframe, where the keys are column headers and the values are observations, by using pandas.json_normalize(), then .join() this back to df.
  • Use .drop() to remove the unneeded column.
  • If the column contains lists of dicts that are strings (e.g. "[{key: value}]"), refer to this solution in Splitting dictionary/list inside a Pandas Column into Separate Columns, and use:
    • df.col2 = df.col2.apply(literal_eval), with from ast import literal_eval.
import pandas as pd

# create sample dataframe
df = pd.DataFrame({'col1': ['x', 'y'], 'col2': [[{"target": "NAge", "segment": "21 and older"}, {"target": "MinAge", "segment": "21"}, {"target": "Retargeting", "segment": "people who may be similar to their customers"}, {"target": "Region", "segment": "the United States"}], [{"target": "NAge", "segment": "18 and older"}, {"target": "Location Type", "segment": "HOME"}, {"target": "Interest", "segment": "Hispanic culture"}, {"target": "Interest", "segment": "Republican Party (United States)"}, {"target": "Location Granularity", "segment": "country"}, {"target": "Country", "segment": "the United States"}, {"target": "MinAge", "segment": 18}]]})

# display(df)
col1 col2
0 x [{'target': 'NAge', 'segment': '21 and older'}, {'target': 'MinAge', 'segment': '21'}, {'target': 'Retargeting', 'segment': 'people who may be similar to their customers'}, {'target': 'Region', 'segment': 'the United States'}]
1 y [{'target': 'NAge', 'segment': '18 and older'}, {'target': 'Location Type', 'segment': 'HOME'}, {'target': 'Interest', 'segment': 'Hispanic culture'}, {'target': 'Interest', 'segment': 'Republican Party (United States)'}, {'target': 'Location Granularity', 'segment': 'country'}, {'target': 'Country', 'segment': 'the United States'}, {'target': 'MinAge', 'segment': 18}]

# use explode to give each dict in a list a separate row
df = df.explode('col2').reset_index(drop=True)

# normalize the column of dicts, join back to the remaining dataframe columns, and drop the unneeded column
df = df.join(pd.json_normalize(df.col2)).drop(columns=['col2'])

display(df)

   col1                target                                       segment
0     x                  NAge                                  21 and older
1     x                MinAge                                            21
2     x           Retargeting  people who may be similar to their customers
3     x                Region                             the United States
4     y                  NAge                                  18 and older
5     y         Location Type                                          HOME
6     y              Interest                              Hispanic culture
7     y              Interest              Republican Party (United States)
8     y  Location Granularity                                       country
9     y               Country                             the United States
10    y                MinAge                                            18

Get count

  • If the goal is to get the count for each 'target' and associated 'segment'
counts = df.groupby(['target', 'segment']).count()

Updated

  • This update applies the approach to the full file:
import pandas as pd
from ast import literal_eval

# load the file
df = pd.read_csv('en-US.csv')

# replace NaNs with '[]', otherwise literal_eval will error
df.targets = df.targets.fillna('[]')

# replace null with None, otherwise literal_eval will error
df.targets = df.targets.str.replace('null', 'None')

# convert the strings to lists of dicts
df.targets = df.targets.apply(literal_eval)

# use explode to give each dict in a list a separate row
df = df.explode('targets').reset_index(drop=True)

# fillna with {} is required for json_normalize
df.targets = df.targets.fillna({i: {} for i in df.index})

# normalize the column of dicts, join back to the remaining dataframe columns, and drop the unneeded column
normalized = pd.json_normalize(df.targets)

# get the counts
counts = normalized.groupby(['target', 'segment']).segment.count().reset_index(name='counts')
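The pipeline above reads 'en-US.csv', which is not available here; a self-contained sketch of the same steps against a tiny inline stand-in (the CSV contents are invented for illustration):

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Inline stand-in for 'en-US.csv': a 'targets' column of stringified
# lists of dicts, including an empty row to exercise the NaN handling
csv_data = '''id,targets
1,"[{'target': 'MinAge', 'segment': '21'}, {'target': 'Region', 'segment': 'the United States'}]"
2,
3,"[{'target': 'MinAge', 'segment': '21'}]"
'''
df = pd.read_csv(StringIO(csv_data))

df.targets = df.targets.fillna('[]')                 # literal_eval can't parse NaN
df.targets = df.targets.str.replace('null', 'None')  # JSON null -> Python None
df.targets = df.targets.apply(literal_eval)          # strings -> lists of dicts
df = df.explode('targets').reset_index(drop=True)    # one dict per row
df.targets = df.targets.fillna({i: {} for i in df.index})  # json_normalize needs dicts
normalized = pd.json_normalize(df.targets)

# Rows that came from empty lists normalize to NaN and drop out of the groupby
counts = normalized.groupby(['target', 'segment']).segment.count().reset_index(name='counts')
print(counts)
```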

Split/Parse Values in One Column and create multiple Columns in Python

Hope this will give you the solution you want.

Original Data:

df = pd.DataFrame({'A': ['Order ID:0001ACW120I .Record ID:01160000000UAxCCW .Type:Small .Amount:4596.35  .Booked Date 2021-06-14']})

Replace ' .' (whitespace followed by a dot) with ':', then split on ':':

df = df['A'].replace(to_replace ='\s[.]', value = ':', regex = True).str.split(':', expand = True)

Final dataset; rename the columns as needed.

print(df)
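Since the answer stops before renaming, here is a self-contained sketch that finishes the job; the final column names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Order ID:0001ACW120I .Record ID:01160000000UAxCCW .Type:Small .Amount:4596.35  .Booked Date 2021-06-14']})

# ' .' separates fields and ':' separates keys from values; normalizing
# ' .' to ':' makes one split yield alternating keys and values
parts = df['A'].replace(to_replace=r'\s[.]', value=':', regex=True).str.split(':', expand=True)

# Keep only the value columns and give them readable (hypothetical) names
result = parts.iloc[:, [1, 3, 5, 7, 8]].copy()
result.columns = ['Order ID', 'Record ID', 'Type', 'Amount', 'Booked Date']
print(result)
```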

