Pandas Groupby With Delimiter Join

Pandas groupby with delimiter join

Alternatively, you can do it this way:

In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
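
For reference, the frame used in this first example can be rebuilt roughly like this (a sketch inferred from the output above, not part of the original answer):

import pandas as pd

# values inferred from the grouped output above
df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})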

UPDATE: answering a question from the comments:

In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg

In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object

Last, to convert the index (or MultiIndex) back to columns:

df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
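
On pandas >= 0.25 the same thing can be written with named aggregation, so the result column gets its name in one step (a sketch, not from the original answer; 'new' is just the chosen column name):

# 'new' is an arbitrary name for the joined column
df1 = df.groupby('col', as_index=False).agg(new=('val', '-'.join))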

Concatenate strings from several rows using Pandas groupby

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12

I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.

EDIT: actually I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite

UPDATE: the lambda is unnecessary here:

In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
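
agg works just as well as apply here (a sketch):

# identical result to the apply version above
df.groupby(['name','month'])['text'].agg(','.join).reset_index()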

Groupby and join in pandas dataframe for string column with newline character

You can't accomplish what you want without creating new indexed rows ('\n' inside a cell won't render as a line break in a DataFrame).

Easier solution:

df = df.sort_values('Category')
df['Last'] = df.groupby('Category')['Last'].transform(' '.join)
df.loc[df.duplicated('Category'), df.columns != 'Name'] = ''
>>> df
Name Last Loc Category
0 [1]Tabby buy buy NJ A
2 [3]Tabby
1 [2]Tabby buy sell JP B
3 [4]Tabby
>>> print(df.to_string(index=False))

Name Last Loc Category
[1]Tabby buy buy NJ A
[3]Tabby
[2]Tabby buy sell JP B
[4]Tabby

Old answer

An alternative could be:

out = df.groupby('Category', as_index=False) \
        .agg({'Name': list, 'Last': ' '.join, 'Loc': 'first'}) \
        .explode('Name')

At this point, the output is:

>>> out
Category Name Last Loc
0 A [1]Tabby buy buy NJ
0 A [3]Tabby buy buy NJ
1 B [2]Tabby buy sell JP
1 B [4]Tabby buy sell JP

Now you can use .loc to remove extra content:

out.loc[out.duplicated('Category'), out.columns != 'Name'] = ''
out = out[df.columns]

Final output:

>>> out
Name Last Loc Category
0 [1]Tabby buy buy NJ A
0 [3]Tabby
1 [2]Tabby buy sell JP B
1 [4]Tabby
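
On pandas >= 0.25 the same aggregation can also be written with named aggregation, which reads a little more explicitly (a sketch, not part of the original answer):

# equivalent of the dict-style agg above
out = (df.groupby('Category', as_index=False)
         .agg(Name=('Name', list), Last=('Last', ' '.join), Loc=('Loc', 'first'))
         .explode('Name'))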

Pandas groupby: How to get a union of strings
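
The transcript below assumes imports and a data string roughly like the following (reconstructed from the printed frame; not part of the original answer):

from io import StringIO
from pandas import read_csv, Series

# whitespace-separated input matching the frame shown in Out[5]
data = """A B C
1 0.749065 This
2 0.301084 is
3 0.463468 a
4 0.643961 random
1 0.866521 string
2 0.120737 !"""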

In [4]: df = read_csv(StringIO(data),sep='\s+')

In [5]: df
Out[5]:
A B C
0 1 0.749065 This
1 2 0.301084 is
2 3 0.463468 a
3 4 0.643961 random
4 1 0.866521 string
5 2 0.120737 !

In [6]: df.dtypes
Out[6]:
A int64
B float64
C object
dtype: object

When you apply your own function, there is no automatic exclusion of non-numeric columns. This is slower, though, than applying .sum() directly to the groupby:

In [8]: df.groupby('A').apply(lambda x: x.sum())
Out[8]:
A B C
A
1 2 1.615586 Thisstring
2 4 0.421821 is!
3 3 0.463468 a
4 4 0.643961 random

sum by default concatenates

In [9]: df.groupby('A')['C'].apply(lambda x: x.sum())
Out[9]:
A
1 Thisstring
2 is!
3 a
4 random
dtype: object

You can do pretty much what you want

In [11]: df.groupby('A')['C'].apply(lambda x: "{%s}" % ', '.join(x))
Out[11]:
A
1 {This, string}
2 {is, !}
3 {a}
4 {random}
dtype: object

Doing this on a whole frame, one group at a time. The key is to return a Series:

def f(x):
    return Series(dict(A = x['A'].sum(),
                       B = x['B'].sum(),
                       C = "{%s}" % ', '.join(x['C'])))

In [14]: df.groupby('A').apply(f)
Out[14]:
A B C
A
1 2 1.615586 {This, string}
2 4 0.421821 {is, !}
3 3 0.463468 {a}
4 4 0.643961 {random}
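
If only B and C need reducing, a plain per-column agg spec is simpler (a sketch; note that A then stays the group key instead of being summed as above):

# A remains the group key here
df.groupby('A').agg({'B': 'sum', 'C': lambda x: "{%s}" % ', '.join(x)})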

Pandas groupby concat ungrouped column into comma separated string

Try groupby and agg like so:

(df.groupby(['col1', 'col2', 'col3'])['doc_no']
   .agg(['count', ('doc_no', lambda x: ','.join(map(str, x)))])
   .sort_values('count', ascending=False)
   .reset_index())

col1 col2 col3 count doc_no
0 a x f 3 0,1,5
1 d x t 2 5,6
2 b x g 1 2
3 b y g 1 3
4 c x t 1 3
5 c y t 1 4

agg is simple to use because you can specify a list of reducers to run on a single column.
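
On pandas >= 0.25 the same result can also be spelled with named aggregation, which some find easier to read (a sketch, not from the original answer):

# named aggregation: new_column=(source_column, reducer)
(df.groupby(['col1', 'col2', 'col3'])
   .agg(count=('doc_no', 'count'),
        doc_no=('doc_no', lambda x: ','.join(map(str, x))))
   .sort_values('count', ascending=False)
   .reset_index())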

Pandas group by one column concatenate values of other column as delimited list

If you need unique strings:

You can add set or unique, and if there may be some Nones or NaNs, add dropna:

df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(set(x.dropna())))
         .reset_index())

print (df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1

If order is important:

df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(x.dropna().unique()))
         .reset_index())

print (df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 Diploma of Software Development,Certificate IV...
1

If you want NaN for groups with no values:

def f(x):
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        val = np.nan
    return val

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1 NaN

If you need unique lists:

df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(set(x)))
         .reset_index())

print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 [Diploma of Software Development, Diploma of S...
1 [None]

df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(x.unique()))
         .reset_index())

print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 [Diploma of Software Development, Certificate ...
1 [None]

How to use groupby to concatenate strings in python pandas?

You can apply join on your column after groupby:

df.groupby('index')['words'].apply(','.join)

Example:

In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df

Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd

In [327]:
df.groupby('id')['words'].apply(','.join)

Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
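
agg gives the same result (a sketch):

# same as apply(','.join)
df.groupby('id')['words'].agg(','.join)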

pandas groupby and join lists

object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta, so the column stores the lists as Python objects. convert_objects tries to convert a column to one of those dtypes.

You want

In [63]: df
Out[63]:
a b c
0 1 [1, 2, 3] foo
1 1 [2, 5] bar
2 2 [5, 6] baz
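
A frame like the one above can be built as follows (a sketch reconstructed from the display, not part of the original question):

import pandas as pd

# the list column 'b' is stored with object dtype
df = pd.DataFrame({'a': [1, 1, 2],
                   'b': [[1, 2, 3], [2, 5], [5, 6]],
                   'c': ['foo', 'bar', 'baz']})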


In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
c b
a
1 foo bar [1, 2, 3, 2, 5]
2 baz [5, 6]

This groups the data frame by the values in column a. Read more about groupby.

This is doing a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], which gives [1, 2, 3, 2, 5].

Python Pandas: Groupby Sum AND Concatenate Strings

Let us make it into one line:

df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]:
ID Name COMMENT1 COMMENT2 NUM
0 1 dan hi you hello friend 3.0
1 2 jon dog cat 0.5
2 3 jon yeah yes nope no 3.1
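
If you prefer to be explicit about which column gets which reducer, a per-column spec does the same thing (a sketch; column names taken from the output above):

# explicit reducers per column
df.groupby(['ID', 'Name'], as_index=False).agg(
    {'COMMENT1': ' '.join, 'COMMENT2': ' '.join, 'NUM': 'sum'})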

