Concatenate Strings from Several Rows Using Pandas Groupby

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and pass a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12

I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates so that only one row per group remains.
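For readers who want to run the transform approach end to end, here is a minimal sketch. The input frame is not shown in the original, so the sample data below is an assumption reconstructed from the printed output (eight rows, two per group).

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform returns one value per original row, so the result can be
# assigned straight back to the column.
df["text"] = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))

# Every row in a group now holds the same joined string; keep the first of each.
result = df[["name", "text", "month"]].drop_duplicates()
print(result)
```

Because transform keeps the original index, the surviving rows have the labels 0, 2, 4 and 6, which matches the index in the output above.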

EDIT: actually, I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite

Update: the lambda is unnecessary here, since ','.join can be passed directly:

In [38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
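To check that the two apply variants really are interchangeable, the sketch below runs both on the same frame and compares the results. The sample data is again an assumption reconstructed from the output, and the variable names are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

grouped = df.groupby(["name", "month"])["text"]

# Wrapping str.join in a lambda...
with_lambda = grouped.apply(lambda x: ",".join(x)).reset_index()
# ...gives the same result as passing the bound method directly.
without_lambda = grouped.apply(",".join).reset_index()

print(without_lambda)
print(with_lambda.equals(without_lambda))
```

Note that ','.join only accepts strings, so this assumes the text column contains no missing or numeric values.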

Pandas Dataframe Groupby join string whilst preserving order of strings

Use the sort=False parameter in groupby, and drop_duplicates instead of set, so that the order of the strings is preserved:

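df = (
    df.sort_values(['id', 'order_column'])
      .groupby('id', sort=False)
      .agg({
          # drop_duplicates keeps the first occurrence of each value, in order
          'channel': lambda x: ' > '.join(x.drop_duplicates()),
          # the string alias avoids needing a NumPy import
          'value': 'sum',
      })
)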
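The snippet above can be tried on a small frame as follows. The column names follow the answer, but the values are invented for illustration, and the result is stored under the hypothetical name journeys rather than overwriting df.

```python
import pandas as pd

# Hypothetical data: rows are deliberately out of order and contain a
# repeated channel for id 1.
df = pd.DataFrame({
    "id": [2, 1, 1, 2, 1],
    "order_column": [2, 3, 1, 1, 2],
    "channel": ["email", "search", "search", "social", "email"],
    "value": [5, 1, 2, 3, 4],
})

journeys = (
    df.sort_values(["id", "order_column"])   # fix the order within each id
      .groupby("id", sort=False)             # keep groups in order of appearance
      .agg({
          # drop_duplicates keeps first occurrences, so the sequence survives
          "channel": lambda x: " > ".join(x.drop_duplicates()),
          "value": "sum",
      })
)
print(journeys)
```

For id 1 the sorted channels are search, email, search; the repeat is dropped and the result is 'search > email'. A set would also remove the repeat, but it gives no guarantee about the order of what remains.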

How to use groupby to concatenate strings in python pandas?

You can apply join on your column after groupby:

df.groupby('index')['words'].apply(','.join)

Example:

In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df

Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd

In [327]:
df.groupby('id')['words'].apply(','.join)

Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object

How to combine and concatenate strings from several rows in a DataFrame if the unique key value is NaN?

You can use the non-empty values in a column that is reliably filled in for the first row of each record (here, Item) to define groups, then aggregate:

# group rows that follow a row with non-empty value in Item
group = df['Item'].fillna('').ne('').cumsum()

# create a dictionary of aggregation functions
# by default get first row of group
d = {c: 'first' for c in df}
# for Address, join the rows
d['Address'] = ' '.join

df2 = df.groupby(group).agg(d)

Output:

      Item       Date       Invoice No Center  Address
Item
1       44  24/2/2022  AF6026321237160  Japan  106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
2       45  24/2/2022  AF6026321237179  Korea  Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
3       46  24/2/2022  AF6026321237188     HK  Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
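The original input frame is not shown, so the following is only a sketch of the cumsum grouping on invented data with the same shape: Item is filled on the first row of each record and missing on the continuation rows that carry the rest of the address.

```python
import pandas as pd

# Hypothetical data: Item and Center are present only on the first row
# of each record; later rows continue the address.
df = pd.DataFrame({
    "Item": ["44", None, "45", None, None],
    "Center": ["Japan", None, "Korea", None, None],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku", "Bldg. 102", "Unit 304", "Seoul"],
})

# True on rows that start a record; the running sum then labels every
# continuation row with the number of the record it belongs to.
group = df["Item"].fillna("").ne("").cumsum()

# Take the first non-missing value of each column, except Address,
# whose pieces are joined with a space.
d = {c: "first" for c in df}
d["Address"] = " ".join

df2 = df.groupby(group).agg(d)
print(df2)
```

The group labels here are 1, 1, 2, 2, 2, so the five input rows collapse into two records.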

Python Pandas: Groupby Sum AND Concatenate Strings

This can be done in one line: sum the float columns and join the string columns with a space.

df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]:
ID Name COMMENT1 COMMENT2 NUM
0 1 dan hi you hello friend 3.0
1 2 jon dog cat 0.5
2 3 jon yeah yes nope no 3.1
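If the dtype check inside the lambda feels fragile (it only matches float64), one alternative is to build the aggregation mapping explicitly. This is a sketch of that variant, not the one-liner from the answer; the input data and the name agg_map are assumptions chosen to be consistent with the output above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical input, chosen so the result resembles the output above.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "dog", "yeah", "yes"],
    "COMMENT2": ["hello", "friend", "cat", "nope", "no"],
    "NUM": [1.0, 2.0, 0.5, 3.0, 0.1],
})

keys = ["ID", "Name"]

# Sum any numeric column, join every other column with a space.
agg_map = {
    col: "sum" if is_numeric_dtype(df[col]) else " ".join
    for col in df.columns.drop(keys)
}

out = df.groupby(keys, as_index=False).agg(agg_map)
print(out)
```

Using is_numeric_dtype means integer columns are summed as well, which the float64 comparison would instead try to join as strings.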

pandas groupby concatenate strings in multiple columns

Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate the strings:

In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]:
category category2
id
a z 1
b yxz 2
c y 12

To move id from the index to a column of the resultant DataFrame, call reset_index:

In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]:
id category category2
0 a z 1
1 b yxz 2
2 c y 12
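One caveat with set is that its iteration order is arbitrary, so the order of the joined characters (such as yxz above) is not guaranteed between runs. The sketch below sorts the unique values before joining to make the result deterministic; the input frame is invented, since the original is not shown.

```python
import pandas as pd

# Hypothetical data with repeated values inside some groups.
df = pd.DataFrame({
    "id": ["a", "b", "b", "b", "b", "c", "c"],
    "category": ["z", "y", "x", "z", "x", "y", "y"],
    "category2": ["1", "2", "2", "2", "2", "1", "2"],
})

# set removes duplicates; sorted gives a stable, alphabetical order.
out = df.groupby("id").agg(lambda x: "".join(sorted(set(x))))
print(out)
```

If the original row order matters more than alphabetical order, x.drop_duplicates() can be joined instead of sorted(set(x)).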

