How to convert column with list of values into rows in Pandas DataFrame
You can do it this way:
In [84]: df
Out[84]:
A B
0 some value [[L1, L2]]
1 another value [[L3, L4, L5]]
In [85]: (df['B'].apply(lambda x: pd.Series(x[0]))
....: .stack()
....: .reset_index(level=1, drop=True)
....: .to_frame('B')
....: .join(df[['A']], how='left')
....: )
Out[85]:
B A
0 L1 some value
0 L2 some value
1 L3 another value
1 L4 another value
1 L5 another value
UPDATE: a more generic solution
Pandas column of lists, create a row for each list element
UPDATE: the solution below was useful for older pandas versions, where DataFrame.explode() was not available. Starting with pandas 0.25.0 you can simply use DataFrame.explode().
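As a minimal sketch of the DataFrame.explode() route mentioned above: the frame below is hypothetical data shaped like the subject/trial_num/samples example, not the original poster's data. Note that the exploded column comes back with object dtype, so it is cast to float here.

```python
import pandas as pd

# Hypothetical data: one list of samples per (subject, trial_num) row.
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[0.10, -0.20, 0.05], [0.25, 1.32, -0.17], [-0.03, -0.65, 0.76]],
})

# explode() emits one row per list element and repeats the original index label.
exploded = df.explode("samples")

# Build a fresh 0..n-1 index and cast the exploded (object) column to float.
r = exploded.reset_index(drop=True).astype({"samples": float})
print(r)
```

Dropping the reset_index() call keeps the original row labels, which can be handy if you need to trace each element back to its source row.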
lst_col = 'samples'
r = pd.DataFrame({
col:np.repeat(df[col].values, df[lst_col].str.len())
for col in df.columns.drop(lst_col)}
).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns]
Result:
In [103]: r
Out[103]:
samples subject trial_num
0 0.10 1 1
1 -0.20 1 1
2 0.05 1 1
3 0.25 1 2
4 1.32 1 2
5 -0.17 1 2
6 0.64 1 3
7 -0.22 1 3
8 -0.71 1 3
9 -0.03 2 1
10 -0.65 2 1
11 0.76 2 1
12 1.77 2 2
13 0.89 2 2
14 0.65 2 2
15 -0.98 2 3
16 0.65 2 3
17 -0.30 2 3
PS here you may find a bit more generic solution
UPDATE: some explanations: IMO the easiest way to understand this code is to try to execute it step-by-step:
In the following line we repeat the values in one column N times, where N is the length of the corresponding list:
In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)
This can be generalized to all columns containing scalar values:
In [11]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: )
Out[11]:
trial_num subject
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
.. ... ...
11 1 2
12 2 2
13 2 2
14 2 2
15 3 2
16 3 2
17 3 2
[18 rows x 2 columns]
Using np.concatenate() we can flatten all values in the list column (samples) and get a 1D vector:
In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])
Putting all this together:
In [13]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
trial_num subject samples
0 1 1 -1.04
1 1 1 -0.58
2 1 1 -1.32
3 2 1 0.82
4 2 1 -0.59
5 2 1 -0.34
6 3 1 0.25
.. ... ... ...
11 1 2 0.68
12 2 2 0.55
13 2 2 -0.56
14 2 2 0.65
15 3 2 -0.04
16 3 2 0.36
17 3 2 -0.31
[18 rows x 3 columns]
Using pd.DataFrame()[df.columns] guarantees that the columns are selected in their original order...
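The steps walked through above can be wrapped into one reusable function. This is only a sketch: the name unnest and the small test frame are hypothetical, and it assumes every cell in the list column is an actual list (no NaN).

```python
import numpy as np
import pandas as pd

def unnest(df, lst_col):
    """Hypothetical helper: return one row per element of df[lst_col]."""
    # Length of each list, i.e. how often every scalar value must be repeated.
    lens = df[lst_col].str.len()
    # Repeat all scalar columns according to those lengths.
    repeated = pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in df.columns.drop(lst_col)
    })
    # Flatten the list column into a 1D vector and attach it.
    flat = np.concatenate(df[lst_col].values)
    # Restore the original column order.
    return repeated.assign(**{lst_col: flat})[df.columns]

# Hypothetical input data.
df = pd.DataFrame({"samples": [[1.0, 2.0], [3.0]], "subject": [1, 2]})
out = unnest(df, "samples")
print(out)
```

One difference from DataFrame.explode() worth knowing: a row whose list is empty is repeated zero times here and so disappears, whereas explode() keeps such a row with NaN in the exploded column.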
How to convert dataframe columns with list of values into rows in Pandas DataFrame
You can chain the explode method like this:
df.explode('A').explode('B').explode('C').reset_index(drop=True)
A B C
0 X 1 aa
1 X 1 bb
2 X 1 cc
3 Y 2 xx
4 Y 2 yy
Alternatively, you can apply pd.Series.explode to the DataFrame like this:
df.apply(pd.Series.explode).reset_index(drop=True)
In pandas 1.3+ you can pass a list of columns to explode, so the code looks like this:
df.explode(['A', 'B', 'C']).reset_index(drop=True)
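A caveat when choosing between these variants: the list form explodes the columns in lockstep and needs the lists in each row to have the same number of elements, while chained calls explode one column after another. When more than one column holds multi-element lists, the two give different results. The sketch below uses hypothetical data and assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical data: both columns hold lists of equal length within each row.
df = pd.DataFrame({
    "A": [["x1", "x2"], ["y1"]],
    "B": [[1, 2], [3]],
})

# Lockstep explode: the i-th element of A is paired with the i-th element of B.
multi = df.explode(["A", "B"]).reset_index(drop=True)

# Chained explode: every element of A is combined with every element of B.
chained = df.explode("A").explode("B").reset_index(drop=True)

print(multi)    # 3 rows
print(chained)  # 5 rows
```

If only one of the columns actually contains multi-element lists, the chained form is the safer choice, because the list form raises an error when the element counts within a row do not match.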
converting list like column values into multiple rows using Pandas DataFrame
You can use findall or extractall to get lists from the hobbies column, then flatten them with chain.from_iterable and repeat the other columns:
a = df['hobbies'].str.findall("'(.*?)'").astype(object)  # np.object was removed in NumPy 1.24
lens = a.str.len()
from itertools import chain
df1 = pd.DataFrame({
'Location_City' : df['Location_City'].values.repeat(lens),
'Location_State' : df['Location_State'].values.repeat(lens),
'Name' : df['Name'].values.repeat(lens),
'hobbies' : list(chain.from_iterable(a.tolist())),
})
Or create a Series with extractall, drop the match level of its MultiIndex, and join it to the original DataFrame:
df1 = (df.join(df.pop('hobbies').str.extractall("'(.*?)'")[0]
.reset_index(level=1, drop=True)
.rename('hobbies'))
.reset_index(drop=True))
print (df1)
Location_City Location_State Name hobbies
0 Los Angeles CA John Music
1 Los Angeles CA John Running
2 Texas TX Jack Swimming
3 Texas TX Jack Trekking
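On pandas 0.25 or newer, the same result can probably be reached more directly by combining str.findall with DataFrame.explode. The sketch below is an assumption-laden illustration: the input frame is hypothetical and assumes the hobbies are stored as a string of single-quoted names.

```python
import pandas as pd

# Hypothetical input: hobbies stored as one string of quoted names per row.
df = pd.DataFrame({
    "Location_City": ["Los Angeles", "Texas"],
    "Location_State": ["CA", "TX"],
    "Name": ["John", "Jack"],
    "hobbies": ["'Music', 'Running'", "'Swimming', 'Trekking'"],
})

# findall returns a list of the captured names for every row ...
as_lists = df["hobbies"].str.findall(r"'(.*?)'")

# ... and explode turns each list element into its own row.
df1 = df.assign(hobbies=as_lists).explode("hobbies").reset_index(drop=True)
print(df1)
```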
Get list from pandas dataframe column or row?
Pandas DataFrame columns are pandas Series when you pull them out, so you can call x.tolist() on them to turn them into a Python list. Alternatively, you can cast with list(x).
import pandas as pd
data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_dict)
print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")
col_one_list = df['one'].tolist()
col_one_arr = df['one'].to_numpy()
print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")
Output:
DataFrame:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
column types:
one float64
two int64
dtype: object
col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>
col_one_arr:
[ 1. 2. 3. nan]
type:<class 'numpy.ndarray'>
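The question also asks about rows. A row selected with .loc is a Series as well, so the same tolist() call applies. The sketch below rebuilds the example frame from above; the variable names row_a and all_rows are only illustrative.

```python
import pandas as pd

data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
             'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_dict)

# A single row by label: the values are upcast to a common dtype (float here).
row_a = df.loc['a'].tolist()

# Every row at once, as a list of lists.
all_rows = df.to_numpy().tolist()

print(row_a)
print(all_rows)
```

Because a row mixes the dtypes of several columns, the integers of column two come back as floats here.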
Pandas: Convert columns of lists into a single list
You can use a nested list comprehension:
dataSet['combined'] = [[e for l in x for e in l]
for _,x in dataSet.filter(like='value').iterrows()]
Output:
key valueA valueB valueN combined
0 1_1 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
1 1_2 [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12]
2 1_3 [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18]
Timing comparison with repeated addition (100 rows, 100 columns, 1000 items per list):
# repeated addition of the lists
8.66 s ± 309 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# nested list comprehension
729 ms ± 285 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
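For reference, the same flattening can also be written with itertools.chain instead of the nested comprehension. This is a sketch on hypothetical data with two value columns; it has not been timed against the variants above.

```python
from itertools import chain

import pandas as pd

# Hypothetical data: two 'value' columns that each hold a list per row.
dataSet = pd.DataFrame({
    "key": ["1_1", "1_2"],
    "valueA": [[1, 2], [7, 8]],
    "valueB": [[3, 4], [9, 10]],
})

# Select only the columns whose name contains 'value'.
value_cols = dataSet.filter(like="value")

# For every row, chain the lists of all value columns into one flat list.
dataSet["combined"] = [
    list(chain.from_iterable(row))
    for row in value_cols.itertuples(index=False)
]
print(dataSet)
```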
Dataframe with a column which every row is a list of values
Here's an alternative approach:
result = pd.concat(
[df[["name", "favourite_fruits"]],
pd.DataFrame(lst for lst in df["votes"]).rename(columns=lambda n: f"vote{n + 1}")],
axis=1
)
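A self-contained variant of the snippet above, run on hypothetical data (the names and votes are made up). It passes index=df.index when building the vote columns, because pd.concat with axis=1 aligns on the index and would misalign rows if df did not have the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical data: each row carries a list of votes.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "favourite_fruits": ["apple", "pear"],
    "votes": [[1, 2, 3], [4, 5, 6]],
})

# One column per list position, labelled vote1, vote2, ...
votes = pd.DataFrame(df["votes"].tolist(), index=df.index)
votes = votes.rename(columns=lambda n: f"vote{n + 1}")

# Put the scalar columns and the new vote columns side by side.
result = pd.concat([df[["name", "favourite_fruits"]], votes], axis=1)
print(result)
```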
How to transform dataframe column containing list of values in to its own individual column with count of occurrence?
edited based on feedback from Henry Ecker in comments, might as well have the better answer here:
You can use DataFrame.explode() to turn every element of the lists into its own row, and then use pd.crosstab() to count the occurrences.
df = presence_data.explode('presence')
pd.crosstab(index=df['id'],columns=df['presence'])
This gave me the following:
presence A B C G I
id
id1 2 1 1 0 0
id2 1 2 0 1 1
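To try this end to end, here is a sketch with hypothetical input. The question's real presence_data is not shown, so the rows below are invented and merely chosen so that the counts match the table above.

```python
import pandas as pd

# Hypothetical input: several rows per id, each with a list of presence codes.
presence_data = pd.DataFrame({
    "id": ["id1", "id1", "id2", "id2"],
    "presence": [["A", "B"], ["A", "C"], ["A", "B", "G"], ["B", "I"]],
})

# One row per (id, code) pair.
exploded = presence_data.explode("presence")

# Count how often each code occurs for each id; absent pairs become 0.
counts = pd.crosstab(index=exploded["id"], columns=exploded["presence"])
print(counts)
```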
Python dataframe : converting columns into rows
If the numbers in the column names can be larger than 9, use Series.str.extract to split each column name into its text prefix and its integer suffix, build a MultiIndex on the columns from the two parts, and then use DataFrame.stack:
df = df.set_index('Movie')
df1 = df.columns.to_series().str.extract(r'([a-zA-Z]+)(\d+)')
df.columns = pd.MultiIndex.from_arrays([df1[0], df1[1].astype(int)])
df = df.rename_axis((None, None), axis=1).stack().reset_index(level=1, drop=True).reset_index()
print (df)
Movie FirstName ID LastName
0 The Shawshank Redemption Tim TM Robbins
1 The Shawshank Redemption Morgan MF Freeman
2 The Godfather Marlon MB Brando
3 The Godfather Al AP Pacino
Otherwise, if every suffix is a single digit, use string indexing to split the last character of each column name from everything before it and pass both parts to MultiIndex.from_arrays:
df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[:-1], df.columns.str[-1].astype(int)])
df = df.stack().reset_index(level=1, drop=True).reset_index()
print (df)
Movie FirstName ID LastName
0 The Shawshank Redemption Tim TM Robbins
1 The Shawshank Redemption Morgan MF Freeman
2 The Godfather Marlon MB Brando
3 The Godfather Al AP Pacino
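Another option that may fit this column layout is pd.wide_to_long, which handles the prefix/number split itself. This is a sketch, not part of the answer above: the wide frame is reconstructed from the printed output and its exact column names (FirstName1, LastName1, ID1, ...) are an assumption, as is the name n for the suffix level.

```python
import pandas as pd

# Assumed wide layout: one numbered set of columns per actor.
df = pd.DataFrame({
    "Movie": ["The Shawshank Redemption", "The Godfather"],
    "FirstName1": ["Tim", "Marlon"],
    "LastName1": ["Robbins", "Brando"],
    "ID1": ["TM", "MB"],
    "FirstName2": ["Morgan", "Al"],
    "LastName2": ["Freeman", "Pacino"],
    "ID2": ["MF", "AP"],
})

# Stub names are the column prefixes; the numeric suffix becomes index level 'n'.
tidy = pd.wide_to_long(
    df, stubnames=["FirstName", "LastName", "ID"], i="Movie", j="n"
)

# Flatten to ordinary columns and order the rows by movie and actor number.
flat = tidy.sort_index().reset_index()
print(flat)
```

Unlike the stack-based versions, this keeps the actor number as a column, which can be dropped afterwards if it is not needed.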