Split (explode) pandas dataframe string entry to separate rows
How about something like this:
In [55]: pd.concat([pd.Series(row['var2'], row['var1'].split(','))
    ...:            for _, row in a.iterrows()]).reset_index()
Out[55]:
   index  0
0      a  1
1      b  1
2      c  1
3      d  2
4      e  2
5      f  2
Then you just have to rename the columns.
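For a self-contained version, here is a minimal sketch that assumes a hypothetical frame `a` with comma-separated strings in `var1` and scalars in `var2` (the layout implied by the output above). It does the renaming with `DataFrame.rename`; the target names `var1`/`var2` are just a choice.

```python
import pandas as pd

# Hypothetical input, assuming the layout implied by the output above:
# comma-separated strings in 'var1' and a scalar in 'var2'.
a = pd.DataFrame({"var1": ["a,b,c", "d,e,f"], "var2": [1, 2]})

# One Series per row: the scalar var2 is broadcast over the split var1 labels.
parts = [pd.Series(row["var2"], index=row["var1"].split(","))
         for _, row in a.iterrows()]

# Concatenate, move the labels into a column, then name both columns.
result = (pd.concat(parts)
          .reset_index()
          .rename(columns={"index": "var1", 0: "var2"}))
print(result)
```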
Explode rows pandas dataframe
We can use Series.str.split to parse the relevant information into a list before calling explode.
(df.assign(Letters=df.Letters
                 .str.split(" : ", expand=True)[1]
                 .str.split(","))
   .explode("Letters"))
  Letters  Date
0       a  2021
1       a  2019
1       b  2019
1       c  2019
2       a  2017
2       b  2017
Please note that the index is not reset in this answer; if you need a fresh index, call reset_index.
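Because the original frame is not shown, the sketch below assumes a hypothetical `Letters` column of the form `label : a,b,c` (the `" : "` separator comes from the code above; the `x` label is invented) and chains `reset_index(drop=True)` so the rows are numbered 0..n-1 instead of repeating the original labels.

```python
import pandas as pd

# Hypothetical input: each 'Letters' entry is "<label> : <comma-separated letters>".
df = pd.DataFrame({
    "Letters": ["x : a", "x : a,b,c", "x : a,b"],
    "Date": [2021, 2019, 2017],
})

out = (df.assign(Letters=df["Letters"]
                 .str.split(" : ", expand=True)[1]   # keep the part after " : "
                 .str.split(","))                     # turn it into a list
         .explode("Letters")                          # one row per letter
         .reset_index(drop=True))                     # renumber rows 0..n-1
print(out)
```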
Python Dataframe Explode Rows with multiple values
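From the pandas docs for pandas.DataFrame.explode: to explode several columns at once, the column argument should "specify a non-empty list with each element be str or tuple".
In either case, explode only expands list-like values; plain strings are left as they are. So your 'tags' column needs to hold lists. Convert the comma-separated tag strings to lists first (for example with .str.split(',')), then go with option 1: df.explode('tags')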
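A minimal sketch of that conversion, assuming a hypothetical frame with an `id` column and comma-separated `tags` strings (the question's data is not shown here). It also shows that calling explode on the raw strings changes nothing.

```python
import pandas as pd

# Hypothetical frame: 'tags' holds comma-separated strings, not lists.
df = pd.DataFrame({"id": [1, 2], "tags": ["red,blue", "green"]})

# Strings are treated as scalars by explode, so this leaves the values as-is.
unchanged = df.explode("tags")

# Split into lists first, then explode: one row per tag.
exploded = df.assign(tags=df["tags"].str.split(",")).explode("tags")
print(exploded)
```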
Split cell into multiple rows in pandas dataframe
Here's one way using numpy.repeat and itertools.chain. Conceptually, this is exactly what you want to do: repeat some values and chain others. This is recommended for a small number of columns; otherwise, stack-based methods may fare better.
import numpy as np
import pandas as pd
from itertools import chain

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

# calculate lengths of splits
lens = df['package'].str.split(',').map(len)

# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({'order_id': np.repeat(df['order_id'], lens),
                    'order_date': np.repeat(df['order_date'], lens),
                    'package': chainer(df['package']),
                    'package_code': chainer(df['package_code'])})

print(res)
   order_id  order_date  package  package_code
0         1   20/5/2018       p1          #111
0         1   20/5/2018       p2          #222
0         1   20/5/2018       p3          #333
1         3   22/5/2018       p4          #444
2         7   23/5/2018       p5          #555
2         7   23/5/2018       p6          #666
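On pandas 1.3 or later, DataFrame.explode also accepts a list of columns, which covers this case without numpy. This is a swapped-in alternative, not the repeat/chain method above. The input is reconstructed from the printed output and is an assumption, and every row must split into the same number of packages and codes (otherwise explode raises a ValueError).

```python
import pandas as pd

# Hypothetical input reconstructed from the output above.
df = pd.DataFrame({
    "order_id": [1, 3, 7],
    "order_date": ["20/5/2018", "22/5/2018", "23/5/2018"],
    "package": ["p1,p2,p3", "p4", "p5,p6"],
    "package_code": ["#111,#222,#333", "#444", "#555,#666"],
})

cols = ["package", "package_code"]
# Turn each comma-separated column into lists (same length per row).
split = df.assign(**{c: df[c].str.split(",") for c in cols})
# Explode both columns together; needs pandas >= 1.3.
res = split.explode(cols)
print(res)
```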
Split words from dataframe by space to rows while duplicating the info from other columns (python, pandas)
You need to split and then explode:
df2 = (df
       .assign(comments=df['comments'].str.split())
       .explode('comments')
      )
Output:
   r_id       start   comments
0     1  2021-01-01          i
0     1  2021-01-01         am
0     1  2021-01-01        the
0     1  2021-01-01       text
0     1  2021-01-01       that
0     1  2021-01-01      needs
0     1  2021-01-01  splitting
0     1  2021-01-01         by
0     1  2021-01-01      space
0     1  2021-01-01         to
0     1  2021-01-01       rows
1     2  2021-01-02      hello
1     2  2021-01-02      hello
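A small runnable sketch with a hypothetical input modeled on that output (the double space before "splitting" is added on purpose). It shows why split() with no argument suits this case: it splits on runs of whitespace, while split(" ") leaves empty strings behind, and explode turns each of them into its own row.

```python
import pandas as pd

# Hypothetical input modeled on the output above (note the double space).
df = pd.DataFrame({
    "r_id": [1, 2],
    "start": ["2021-01-01", "2021-01-02"],
    "comments": ["i am the text that needs  splitting by space to rows",
                 "hello hello"],
})

# No-argument split(): runs of whitespace act as a single separator.
df2 = df.assign(comments=df["comments"].str.split()).explode("comments")
print(len(df2))  # 13

# split(" "): the double space yields an empty string, i.e. an extra row.
df3 = df.assign(comments=df["comments"].str.split(" ")).explode("comments")
```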
Splitting and Visualizing in Python
Hi, and welcome to Stack Overflow. You mentioned countplot(), which is available in seaborn; I'll assume that is what you plan to use. Note that countplot counts the number of entries, so the graph will show how many items are present once, how many are present twice, and so on.
The updated code is below.
>>> df
  Gender                                       KnownBrands
0    Man                                 NIVEA MEN;GATSBY;
1    Man            GATSBY;GARNIER MEN;L’OREAL MEN EXPERT;
2  Woman        CLINIQUE FOR MEN;SK-II MEN;Neutrogena MEN;
3    Man  NIVEA MEN;GARNIER MEN;L’OREAL MEN EXPERT;GATSBY;
4  Woman                                 NIVEA MEN;GATSBY;
brands = df["KnownBrands"].str.split(";").explode().astype(object).reset_index()
# the trailing ';' in every entry produces an empty-string column; drop it
output = (brands.pivot(index="index", columns="KnownBrands", values="KnownBrands")
                .reset_index(drop=True)
                .drop(columns=""))
>>> output.count()
KnownBrands
CLINIQUE FOR MEN      1
GARNIER MEN           2
GATSBY                4
L’OREAL MEN EXPERT    2
NIVEA MEN             3
Neutrogena MEN        1
SK-II MEN             1
dtype: int64
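If only the per-brand counts are needed, Series.value_counts gets there without the pivot. This is an alternative to the approach above, not a fix for it; the frame below is retyped from the printout, and sort_index is only there to match the order shown.

```python
import pandas as pd

# Frame retyped from the printout above.
df = pd.DataFrame({
    "Gender": ["Man", "Man", "Woman", "Man", "Woman"],
    "KnownBrands": [
        "NIVEA MEN;GATSBY;",
        "GATSBY;GARNIER MEN;L’OREAL MEN EXPERT;",
        "CLINIQUE FOR MEN;SK-II MEN;Neutrogena MEN;",
        "NIVEA MEN;GARNIER MEN;L’OREAL MEN EXPERT;GATSBY;",
        "NIVEA MEN;GATSBY;",
    ],
})

brands = df["KnownBrands"].str.split(";").explode()
# The trailing ';' leaves one empty string per row; drop those before counting.
brands = brands[brands != ""]
counts = brands.value_counts().sort_index()
print(counts)
```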
import seaborn as sns
sns.countplot(x=output.count())
Output plot
Python: How to expand column with list of values to multiple rows?
I think this will solve your issue:
import pandas as pd
df = pd.DataFrame({"Item": ["a", "b"], "Match": ["bb,cc", "dd,ee"]})
df["Match"] = df["Match"].str.split(",")
df.explode("Match")
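Note that explode returns a new frame rather than modifying df in place, so assign the result if you want to keep it. A minimal sketch with the same data, using ignore_index=True (available since pandas 1.1) to renumber the rows:

```python
import pandas as pd

df = pd.DataFrame({"Item": ["a", "b"], "Match": ["bb,cc", "dd,ee"]})
df["Match"] = df["Match"].str.split(",")

# ignore_index=True relabels the result 0..n-1 instead of repeating 0, 0, 1, 1.
out = df.explode("Match", ignore_index=True)
print(out)
```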