Concatenate Strings from Several Rows Using Pandas Groupby

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and pass a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12

I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates to keep one row per group.
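The input frame is not shown above, so here is a minimal self-contained sketch of the transform approach. The sample data is an assumption, chosen only so that it reproduces the output shown; the names joined and result are illustrative. It assigns the joined strings to a copy rather than overwriting df['text'].

```python
import pandas as pd

# Hypothetical sample data, chosen to match the output shown above.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1',
              'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform returns one value per original row, so every row in a group
# receives the same joined string.
joined = df.groupby(['name', 'month'])['text'].transform(lambda x: ','.join(x))

# assign works on a copy, so the original 'text' column is left untouched;
# drop_duplicates then keeps the first row of each group.
result = df.assign(text=joined)[['name', 'text', 'month']].drop_duplicates()
print(result)
```

Because drop_duplicates keeps the first row of each group, the result retains the original row labels (0, 2, 4, 6) rather than a fresh 0..n index.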

EDIT: actually, I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite

Update: the lambda is unnecessary here, because ','.join can be passed directly:

In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]: 
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
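As a runnable sketch of the apply variant, the snippet below rebuilds a hypothetical df (assumed data, matching the output above) and also shows an alternative that is not in the original answer: passing as_index=False to groupby and using agg, which avoids the separate reset_index call. The names via_apply and via_agg are illustrative.

```python
import pandas as pd

# Hypothetical sample data, chosen to match the output shown above.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1',
              'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# Variant from the answer: join each group, then move the keys back to columns.
via_apply = df.groupby(['name', 'month'])['text'].apply(','.join).reset_index()

# Alternative sketch: as_index=False keeps the group keys as columns directly.
via_agg = df.groupby(['name', 'month'], as_index=False)['text'].agg(','.join)

print(via_apply)
print(via_agg)
```

Both variants should give one row per (name, month) pair with the joined text.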

Pandas Dataframe Groupby join string whilst preserving order of strings

Use the sort=False parameter in groupby, and use drop_duplicates instead of set, since set does not preserve order:

df = df.sort_values(
        ['id', 'order_column']
    ).groupby('id', sort=False).agg(
        {
            'channel': lambda x: ' > '.join(x.drop_duplicates()),
            'value': 'sum'
        }
    )
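To make the order-preserving behaviour concrete, here is a small sketch with made-up data. The column names follow the snippet above; the values, and the name result, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: rows are deliberately out of order.
df = pd.DataFrame({
    'id':           [2, 1, 1, 2, 1],
    'order_column': [1, 2, 1, 2, 3],
    'channel':      ['email', 'search', 'social', 'email', 'social'],
    'value':        [10, 5, 1, 20, 2],
})

result = (
    df.sort_values(['id', 'order_column'])   # fix the order within each id
      .groupby('id', sort=False)             # keep groups in order of appearance
      .agg({
          # drop_duplicates keeps the first occurrence, so order is preserved
          'channel': lambda x: ' > '.join(x.drop_duplicates()),
          'value': 'sum',
      })
)
print(result)
```

For id 1 the sorted channels are social, search, social, so the repeated social is dropped and the joined string is 'social > search'.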

How to use groupby to concatenate strings in python pandas?

You can apply join on your column after groupby:

df.groupby('index')['words'].apply(','.join)

Example:

In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df

Out[326]:
  id   words
0  a     asd
1  a     rtr
2  b       s
3  c  rrtttt
4  c    dsfd

In [327]:
df.groupby('id')['words'].apply(','.join)

Out[327]:
id
a        asd,rtr
b              s
c    rrtttt,dsfd
Name: words, dtype: object

how to combine and concatenate strings from several rows in dataframe if unique key value is NaN?

You can use the non-empty values in a reliably populated column (here Item) to define groups, then aggregate:

# group rows that follow a row with non-empty value in Item
group = df['Item'].fillna('').ne('').cumsum()

# create a dictionary of aggregation functions
# by default get first row of group
d = {c: 'first' for c in df}
# for Address, join the rows
d['Address'] = ' '.join

df2 = df.groupby(group).agg(d)

Output:

     Item       Date       Invoice No Center                                                     Address
Item                                                                                                    
1      44  24/2/2022  AF6026321237160  Japan       106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
2      45  24/2/2022  AF6026321237179  Korea  Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
3      46  24/2/2022  AF6026321237188     HK    Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
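The input frame is not shown, so the following is a reduced sketch under stated assumptions: the data is invented, only three columns are kept, and missing keys are assumed to be NaN, which lets notna() stand in for the fillna('').ne('') test used above. The group key is renamed to 'group' (a hypothetical name) so the index does not share a name with the Item column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: continuation rows have NaN in every column but Address.
df = pd.DataFrame({
    'Item':    [44, np.nan, 45, np.nan, np.nan],
    'Center':  ['Japan', np.nan, 'Korea', np.nan, np.nan],
    'Address': ['106-0041 Tokyo-to,', 'Minato-ku',
                'Bldg. 102', 'Unit 304', 'Seoul'],
})

# Each non-missing Item starts a new group; cumsum turns the flags into
# a running group number that the following NaN rows inherit.
group = df['Item'].notna().cumsum().rename('group')

# Take the first non-null value per column, except Address, which is joined.
d = {c: 'first' for c in df}
d['Address'] = ' '.join

df2 = df.groupby(group).agg(d)
print(df2)
```

If the missing keys are empty strings rather than NaN, the fillna('').ne('') form from the answer covers both cases.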

Python Pandas: Groupby Sum AND Concatenate Strings

This can be done in one line, summing the float columns and joining the string columns:

df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]: 
   ID Name  COMMENT1      COMMENT2  NUM
0   1  dan    hi you  hello friend  3.0
1   2  jon       dog           cat  0.5
2   3  jon  yeah yes       nope no  3.1
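The one-liner relies on an exact 'float64' dtype check. A slightly more explicit sketch, not taken from the original answer, builds a per-column dictionary of aggregations instead. The sample data is an assumption reconstructed from the output above, and keys, funcs and result are illustrative names.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data, reconstructed to match the output shown above.
df = pd.DataFrame({
    'ID':       [1, 1, 2, 3, 3],
    'Name':     ['dan', 'dan', 'jon', 'jon', 'jon'],
    'COMMENT1': ['hi', 'you', 'dog', 'yeah', 'yes'],
    'COMMENT2': ['hello', 'friend', 'cat', 'nope', 'no'],
    'NUM':      [1.0, 2.0, 0.5, 1.1, 2.0],
})

keys = ['ID', 'Name']

# Sum numeric columns, join everything else with a space.
funcs = {
    col: ('sum' if is_numeric_dtype(df[col]) else ' '.join)
    for col in df.columns if col not in keys
}

result = df.groupby(keys, as_index=False).agg(funcs)
print(result)
```

Checking with is_numeric_dtype also covers integer columns, which the 'float64' comparison would send to the join branch.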

pandas groupby concatenate strings in multiple columns

Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate the strings:

In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]: 
   category category2
id                   
a         z         1
b       yxz         2
c         y        12

To move id from the index to a column of the resultant DataFrame, call reset_index:

In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]: 
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
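Since set has no defined iteration order, the joined strings above can come out in a different order from run to run. If a stable order matters, one possible sketch is to deduplicate with dict.fromkeys, which keeps the first occurrence of each value. This is a swapped-in technique rather than the original answer's method, and the sample data is an assumption made to match the output shown.

```python
import pandas as pd

# Hypothetical data, chosen to match the output shown above.
df = pd.DataFrame({
    'id':        ['a', 'a', 'b', 'b', 'b', 'c', 'c'],
    'category':  ['z', 'z', 'y', 'x', 'z', 'y', 'y'],
    'category2': ['1', '1', '2', '2', '2', '1', '2'],
})

# dict.fromkeys keeps unique values in order of first appearance,
# so the joined string is deterministic.
result = (
    df.groupby('id')
      .agg(lambda x: ''.join(dict.fromkeys(x)))
      .reset_index()
)
print(result)
```

Using sorted(set(x)) instead would also be deterministic, but it orders the strings alphabetically rather than by appearance.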
