How to Use Groupby to Concatenate Strings in Python Pandas

Concatenate strings from several rows using Pandas groupby

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and pass it a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12

Here I take a subset of the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates so that only one row per group remains.

EDIT: actually, I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite

Update: the lambda is unnecessary here, because ','.join can be passed directly:

In [38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
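
To run the two routes above end to end, here is a minimal sketch. The sample frame is an assumption, reconstructed from the outputs shown in this answer, and the variable names via_transform and via_apply are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown above.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# Route 1: transform keeps the original row count, so duplicates must be dropped.
via_transform = df.copy()
via_transform['text'] = (
    via_transform.groupby(['name', 'month'])['text'].transform(lambda x: ','.join(x))
)
via_transform = via_transform.drop_duplicates().reset_index(drop=True)

# Route 2: apply collapses each group to a single row directly.
via_apply = df.groupby(['name', 'month'])['text'].apply(','.join).reset_index()

print(via_apply)
```

Both routes should give the same four joined strings. The second avoids overwriting the 'text' column of the original frame.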

How to use groupby to concatenate strings in python pandas?

You can apply join on your column after groupby (here 'index' stands for the name of your grouping column):

df.groupby('index')['words'].apply(','.join)

Example:

In [326]:
import pandas as pd
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df

Out[326]:
  id   words
0  a     asd
1  a     rtr
2  b       s
3  c  rrtttt
4  c    dsfd

In [327]:
df.groupby('id')['words'].apply(','.join)

Out[327]:
id
a        asd,rtr
b              s
c    rrtttt,dsfd
Name: words, dtype: object
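
One caveat: str.join raises a TypeError when a group contains anything that is not a string, such as NaN or a number. The sketch below shows one possible workaround, assuming missing entries should simply be skipped and other values converted with astype(str). The sample values and that policy are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed data: same layout as above, but with a missing value and a number.
df = pd.DataFrame({'id': ['a', 'a', 'b', 'c', 'c'],
                   'words': ['asd', np.nan, 's', 'rrtttt', 7]})

# Drop missing entries, cast the rest to str, then join per group.
joined = df.groupby('id')['words'].agg(lambda s: ','.join(s.dropna().astype(str)))

print(joined)
```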

Concatenate string in groupby with conditions

Add a conditional expression inside your lambda function:

print(df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(set(x)) if len(set(x)) == 1 else ', '.join(x)).reset_index())

If set(x) has only one element, the group is constant and that single value is returned; otherwise all values of the group are joined, duplicates included.

The output of this will be:

   id       v1       v2
0   1        a     b, d
1   2  c, d, f  e, e, g
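
The same condition may be easier to read as a named function. This is a minimal sketch: the input frame is an assumption chosen to reproduce the output above, and join_or_collapse is a hypothetical helper name.

```python
import pandas as pd

# Assumed input, chosen so that the result matches the output shown above.
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'v1': ['a', 'a', 'c', 'd', 'f'],
    'v2': ['b', 'd', 'e', 'e', 'g'],
})

def join_or_collapse(values):
    """Return the single value if the group is constant, else join every value."""
    unique = set(values)
    if len(unique) == 1:
        return next(iter(unique))
    return ', '.join(values)

out = df.groupby('id')[['v1', 'v2']].agg(join_or_collapse).reset_index()
print(out)
```

Note that 'e, e, g' keeps its repeated 'e': duplicates are only collapsed when the whole group holds one distinct value.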

Pandas Dataframe Groupby join string whilst preserving order of strings

Use the sort=False parameter in groupby, and use drop_duplicates instead of set, since a set does not keep the order of its elements:

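df = (
    df.sort_values(['id', 'order_column'])
      .groupby('id', sort=False)
      .agg({
          # join channels in row order, keeping only the first occurrence of each
          'channel': lambda x: ' > '.join(x.drop_duplicates()),
          # 'sum' is equivalent to np.sum here and needs no numpy import
          'value': 'sum',
      })
)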
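
A small runnable sketch of this pattern follows. The column names are taken from the snippet above, but the values are invented for illustration, and the string 'sum' is used for the numeric column.

```python
import pandas as pd

# Hypothetical data: column names follow the snippet above, values are invented.
df = pd.DataFrame({
    'id':           [1, 1, 1, 2, 2],
    'order_column': [2, 1, 3, 1, 2],
    'channel':      ['b', 'a', 'b', 'x', 'x'],
    'value':        [1, 2, 3, 4, 5],
})

out = (
    df.sort_values(['id', 'order_column'])   # put rows in the desired order first
      .groupby('id', sort=False)             # keep groups in order of appearance
      .agg({
          'channel': lambda x: ' > '.join(x.drop_duplicates()),
          'value': 'sum',
      })
)
print(out)
```

For id 1 the channels are visited as a, b, b, so the repeated 'b' is dropped while 'a' stays first.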

Pandas: groupby and concat strings with condition

You can filter before groupby, then reindex to restore the groups that the filter removed:

out = (
    data.loc[data.status == 'Finished']
        .groupby(['id', 'category'])['description']
        .apply(' '.join)
        .reindex(pd.MultiIndex.from_frame(data[['id', 'category']].drop_duplicates()),
                 fill_value=' ')
        .reset_index()
)
Out[70]:
   id category    description
0  11        A         Text_1
1  22        A
2  33        B  Text_1 Text_2
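
The filter-then-reindex idea can be tried on a small frame. In this sketch the input rows are assumptions (only the three output rows above come from the answer), and all_groups is an illustrative name for the full set of (id, category) pairs.

```python
import pandas as pd

# Assumed input; the 'Pending' status and 'Text_9' value are invented.
data = pd.DataFrame({
    'id':          [11, 22, 33, 33],
    'category':    ['A', 'A', 'B', 'B'],
    'status':      ['Finished', 'Pending', 'Finished', 'Finished'],
    'description': ['Text_1', 'Text_9', 'Text_1', 'Text_2'],
})

# Every (id, category) pair, including those with no 'Finished' rows.
all_groups = pd.MultiIndex.from_frame(data[['id', 'category']].drop_duplicates())

out = (
    data.loc[data['status'] == 'Finished']
        .groupby(['id', 'category'])['description']
        .agg(' '.join)
        .reindex(all_groups, fill_value=' ')   # restore groups dropped by the filter
        .reset_index()
)
print(out)
```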

Pandas groupby and concatenate strings

Try with groupby and agg + join:

s = (df[['two', 'three']].agg('+'.join, axis=1)
       .groupby(df.one).agg('/n'.join)
       .to_frame('two + three').reset_index())
   one    two + three
0    1  x+a/ny+b/nz+c
1    2  x+a/ny+b/nz+c
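
The snippet above joins with the two literal characters '/n'. If a real line break was intended (an assumption), the escape sequence is '\n'. A minimal sketch of that variant follows, with an input frame inferred from the output above and plain string addition used for the row-wise step instead of agg.

```python
import pandas as pd

# Assumed input, inferred from the output shown above.
df = pd.DataFrame({
    'one':   [1, 1, 1, 2, 2, 2],
    'two':   ['x', 'y', 'z', 'x', 'y', 'z'],
    'three': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# Row-wise concatenation of the two columns with '+'.
pairs = df['two'] + '+' + df['three']

# One newline-separated string per value of 'one'.
s = pairs.groupby(df['one']).agg('\n'.join).to_frame('two + three').reset_index()

print(s.loc[0, 'two + three'])
```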

Concatenating strings after grouping by name and then sorting by date

Try apply with join

df.sort_values('date').groupby('name')['message'].apply(' '.join).reset_index()

  name         message
0    a        Hi there
1    b  Hello everyone
2    c            Test
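
A runnable sketch of the sort-then-join step follows. The names and messages follow the output above, but the dates and the shuffled row order are invented to show that sorting happens before grouping.

```python
import pandas as pd

# Hypothetical rows, deliberately out of date order.
df = pd.DataFrame({
    'name':    ['a', 'b', 'a', 'c', 'b'],
    'date':    pd.to_datetime(['2021-01-02', '2021-01-02', '2021-01-01',
                               '2021-01-01', '2021-01-01']),
    'message': ['there', 'everyone', 'Hi', 'Test', 'Hello'],
})

# groupby keeps the row order within each group, so sorting first fixes the join order.
out = df.sort_values('date').groupby('name')['message'].apply(' '.join).reset_index()
print(out)
```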

how to perform a groupby, sort, and concatenate strings in a pandas dataframe

This looks like a job for groupby() and aggregation:

df.groupby(['PK', 'Source'], as_index=False).Text.agg(' '.join)

You can add sort_values('Line') first to make sure that the lines are in order, e.g.:

(df.sort_values('Line')
   .groupby(['PK', 'Source'], as_index=False)
   .Text.agg(' '.join)
)

Output:

   PK Source             Text
0   1      A  The quick brown
1   2      A       fox jumped
2   3      A    over the lazy
3   4      A           yellow
4   5      A         dogs sam
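
To try this with two grouping keys, here is a minimal sketch. The input rows and their 'Line' numbers are assumptions that cover only the first two output rows above.

```python
import pandas as pd

# Assumed input: words stored out of order, with 'Line' giving their position.
df = pd.DataFrame({
    'PK':     [1, 1, 1, 2, 2],
    'Source': ['A', 'A', 'A', 'A', 'A'],
    'Line':   [2, 1, 3, 2, 1],
    'Text':   ['quick', 'The', 'brown', 'jumped', 'fox'],
})

out = (
    df.sort_values('Line')                            # order the words first
      .groupby(['PK', 'Source'], as_index=False)['Text']
      .agg(' '.join)                                  # keys stay as ordinary columns
)
print(out)
```

With as_index=False the grouping keys come back as regular columns, so no reset_index call is needed.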

