Concatenate strings from several rows using Pandas groupby
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12
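To try the transform approach end to end, here is a minimal sketch. The original input frame is not shown, so the rows below are a hypothetical reconstruction chosen to reproduce the output above.

```python
import pandas as pd

# Hypothetical input, reconstructed to match the output above (assumption)
df = pd.DataFrame({
    'name': ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text': ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform returns one value per original row, so every row in a group
# receives that group's joined text
df['text'] = df.groupby(['name', 'month'])['text'].transform(lambda x: ','.join(x))

# Rows within a group are now identical, so drop_duplicates keeps one per group
result = df[['name', 'text', 'month']].drop_duplicates()
print(result)
```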
I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.
EDIT: actually, I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
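Because the joining function reduces each group to a single string, agg should give the same result as apply here. A sketch on the same hypothetical reconstructed data (the input rows are an assumption):

```python
import pandas as pd

# Hypothetical input, reconstructed to match the outputs in this answer (assumption)
df = pd.DataFrame({
    'name': ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text': ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# agg reduces each (name, month) group to one joined string;
# reset_index moves the group keys from the index back into columns
out = df.groupby(['name', 'month'])['text'].agg(','.join).reset_index()
print(out)
```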
Update: the lambda is unnecessary here:
In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
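One caveat with passing ','.join directly: str.join only accepts strings, so a non-string column (integers, for example) raises a TypeError. A minimal sketch, using a hypothetical 'key'/'code' frame, that casts to str before grouping:

```python
import pandas as pd

# Hypothetical data: 'code' holds integers, not strings
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'code': [1, 2, 3]})

# Cast the column to str first; ','.join on the raw integers would fail
out = df.assign(code=df['code'].astype(str)).groupby('key')['code'].apply(','.join)
print(out)
```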
Pandas Dataframe Groupby join string whilst preserving order of strings
Use the sort=False parameter in groupby, and drop_duplicates instead of set (set does not preserve the order of the strings):
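# sort first so the strings appear in 'order_column' order within each id
df = df.sort_values(
    ['id', 'order_column']
).groupby('id', sort=False).agg(  # sort=False keeps groups in first-appearance order
    {
        # drop_duplicates keeps the first occurrence of each string, in order
        'channel': lambda x: ' > '.join(x.drop_duplicates()),
        'value': 'sum'
    }
)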
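To see the ordering behaviour, here is a sketch on a small hypothetical frame; the column values are invented for illustration, only the column names come from the answer.

```python
import pandas as pd

# Hypothetical data: 'order_column' gives the intended sequence within each id
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'order_column': [3, 1, 2, 2, 1],
    'channel': ['email', 'search', 'email', 'social', 'search'],
    'value': [5, 1, 2, 4, 3],
})

out = (
    df.sort_values(['id', 'order_column'])
    .groupby('id', sort=False)
    .agg({
        # after sorting, id 1 reads search, email, email; duplicates are dropped in order
        'channel': lambda x: ' > '.join(x.drop_duplicates()),
        'value': 'sum',
    })
)
print(out)
```

With set instead of drop_duplicates, 'search > email' could just as well come out as 'email > search'.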
How to use groupby to concatenate strings in python pandas?
You can apply join on your column after groupby:
df.groupby('index')['words'].apply(','.join)
Example:
In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df
Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd
In [327]:
df.groupby('id')['words'].apply(','.join)
Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
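If you want a DataFrame with a named output column instead of a Series, named aggregation combined with as_index=False should work. A sketch on the same example data; the output column name all_words is a hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'b', 'c', 'c'],
                   'words': ['asd', 'rtr', 's', 'rrtttt', 'dsfd']})

# Named aggregation: new column 'all_words' = ','.join applied to 'words' per group;
# as_index=False keeps 'id' as a regular column
out = df.groupby('id', as_index=False).agg(all_words=('words', ','.join))
print(out)
```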
how to combine and concatenate strings from several rows in dataframe if unique key value is NaN?
You can use the non-empty values in a column that is always filled on the first row of each record (here Item) to define the groups, then aggregate:
# group rows that follow a row with non-empty value in Item
group = df['Item'].fillna('').ne('').cumsum()
# create a dictionary of aggregation functions
# by default get first row of group
d = {c: 'first' for c in df}
# for Address, join the rows
d['Address'] = ' '.join
df2 = df.groupby(group).agg(d)
Output:
Item Date Invoice No Center Address
Item
1 44 24/2/2022 AF6026321237160 Japan 106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
2 45 24/2/2022 AF6026321237179 Korea Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
3 46 24/2/2022 AF6026321237188 HK Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
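A self-contained sketch of the same technique on a small hypothetical frame (the rows and values are invented; continuation rows of an address have NaN everywhere except Address):

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row of each record has Item filled in
df = pd.DataFrame({
    'Item': ['44', np.nan, '45', np.nan, np.nan],
    'Invoice No': ['A1', np.nan, 'A2', np.nan, np.nan],
    'Address': ['1 Main St', 'Springfield', 'Flat 2', '10 High Rd', 'London'],
})

# Each non-empty Item starts a new group: cumsum gives 1, 1, 2, 2, 2
group = df['Item'].fillna('').ne('').cumsum()

# 'first' takes the first non-null value per group; Address rows are joined
d = {c: 'first' for c in df}
d['Address'] = ' '.join
df2 = df.groupby(group).agg(d)
print(df2)
```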
Python Pandas: Groupby Sum AND Concatenate Strings
Let us do it in one line:
df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]:
ID Name COMMENT1 COMMENT2 NUM
0 1 dan hi you hello friend 3.0
1 2 jon dog cat 0.5
2 3 jon yeah yes nope no 3.1
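The dtype check x.dtype=='float64' only sums float columns; an integer column would be passed to ' '.join and fail. A sketch that checks for any numeric dtype instead, on hypothetical input rows chosen to reproduce the output above (the helper name sum_or_join is also hypothetical):

```python
import pandas as pd

# Hypothetical input, reconstructed to match the output above (assumption)
df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3],
    'Name': ['dan', 'dan', 'jon', 'jon', 'jon'],
    'COMMENT1': ['hi', 'you', 'dog', 'yeah', 'yes'],
    'COMMENT2': ['hello', 'friend', 'cat', 'nope', 'no'],
    'NUM': [1.0, 2.0, 0.5, 1.0, 2.1],
})

def sum_or_join(x):
    # Numeric columns (int or float) are summed; everything else is space-joined
    if pd.api.types.is_numeric_dtype(x):
        return x.sum()
    return ' '.join(x)

out = df.groupby(['ID', 'Name'], as_index=False).agg(sum_or_join)
print(out)
```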
pandas groupby concatenate strings in multiple columns
Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate them (note that set does not preserve the original order of the strings):
In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]:
category category2
id
a z 1
b yxz 2
c y 12
To move id from the index to a column of the resulting DataFrame, call reset_index:
In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]:
id category category2
0 a z 1
1 b yxz 2
2 c y 12
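Because set has no defined order, a result such as yxz can vary between runs. If the order of first appearance matters, dict.fromkeys is one way to keep unique values in order (a swapped-in technique, not the one used above). A sketch on hypothetical input chosen to give the same result:

```python
import pandas as pd

# Hypothetical input, chosen to reproduce the output above (assumption)
df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'y', 'x', 'z', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# dict.fromkeys keeps the first occurrence of each value in its original order,
# so unlike set() the joined string is deterministic
out = df.groupby('id').agg(lambda x: ''.join(dict.fromkeys(x))).reset_index()
print(out)
```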