Pandas Groupby With Delimiter Join

Pandas groupby with delimiter join

Alternatively, you can do it this way:

In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
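
For reference, the frame used in this first example can be rebuilt roughly like this (a sketch inferred from the output above, not part of the original answer):

import pandas as pd

# values inferred from the grouped output above
df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})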

UPDATE: answering a question from the comments:

In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg

In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object

Last, to convert the index (or MultiIndex) back to columns:

df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
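
On pandas >= 0.25 the same thing can be written with named aggregation, so the result column gets its name in one step (a sketch, not from the original answer; 'new' is just the chosen column name):

# 'new' is an arbitrary name for the joined column
df1 = df.groupby('col', as_index=False).agg(new=('val', '-'.join))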

Concatenate strings from several rows using Pandas groupby

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12

I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.

EDIT: actually I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite

UPDATE: the lambda is unnecessary here:

In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
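
agg works just as well as apply here (a sketch):

# identical result to the apply version above
df.groupby(['name','month'])['text'].agg(','.join).reset_index()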

Groupby and join in pandas dataframe for string column with newline character

You can't accomplish what you want without creating new indexed rows ('\n' inside a cell won't render as a line break in a DataFrame).

Easier solution:

df = df.sort_values('Category')
df['Last'] = df.groupby('Category')['Last'].transform(' '.join)
df.loc[df.duplicated('Category'), df.columns != 'Name'] = ''
>>> df
Name Last Loc Category
0 [1]Tabby buy buy NJ A
2 [3]Tabby
1 [2]Tabby buy sell JP B
3 [4]Tabby
>>> print(df.to_string(index=False))

Name Last Loc Category
[1]Tabby buy buy NJ A
[3]Tabby
[2]Tabby buy sell JP B
[4]Tabby

Old answer

An alternative could be:

out = df.groupby('Category', as_index=False) \
        .agg({'Name': list, 'Last': ' '.join, 'Loc': 'first'}) \
        .explode('Name')

At this point, the output is:

>>> out
Category Name Last Loc
0 A [1]Tabby buy buy NJ
0 A [3]Tabby buy buy NJ
1 B [2]Tabby buy sell JP
1 B [4]Tabby buy sell JP

Now you can use .loc to remove extra content:

out.loc[out.duplicated('Category'), out.columns != 'Name'] = ''
out = out[df.columns]

Final output:

>>> out
Name Last Loc Category
0 [1]Tabby buy buy NJ A
0 [3]Tabby
1 [2]Tabby buy sell JP B
1 [4]Tabby
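
On pandas >= 0.25 the same aggregation can also be written with named aggregation, which reads a little more explicitly (a sketch, not part of the original answer):

# equivalent of the dict-style agg above
out = (df.groupby('Category', as_index=False)
         .agg(Name=('Name', list), Last=('Last', ' '.join), Loc=('Loc', 'first'))
         .explode('Name'))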

Pandas groupby: How to get a union of strings
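
The transcript below assumes imports and a data string roughly like the following (reconstructed from the printed frame; not part of the original answer):

from io import StringIO
from pandas import read_csv, Series

# whitespace-separated input matching the frame shown in Out[5]
data = """A B C
1 0.749065 This
2 0.301084 is
3 0.463468 a
4 0.643961 random
1 0.866521 string
2 0.120737 !"""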

In [4]: df = read_csv(StringIO(data),sep='\s+')

In [5]: df
Out[5]:
A B C
0 1 0.749065 This
1 2 0.301084 is
2 3 0.463468 a
3 4 0.643961 random
4 1 0.866521 string
5 2 0.120737 !

In [6]: df.dtypes
Out[6]:
A int64
B float64
C object
dtype: object

When you apply your own function, there is no automatic exclusion of non-numeric columns. This is slower, though, than applying .sum() directly to the groupby:

In [8]: df.groupby('A').apply(lambda x: x.sum())
Out[8]:
A B C
A
1 2 1.615586 Thisstring
2 4 0.421821 is!
3 3 0.463468 a
4 4 0.643961 random

sum by default concatenates

In [9]: df.groupby('A')['C'].apply(lambda x: x.sum())
Out[9]:
A
1 Thisstring
2 is!
3 a
4 random
dtype: object

You can do pretty much what you want

In [11]: df.groupby('A')['C'].apply(lambda x: "{%s}" % ', '.join(x))
Out[11]:
A
1 {This, string}
2 {is, !}
3 {a}
4 {random}
dtype: object

Doing this on a whole frame, one group at a time. The key is to return a Series:

def f(x):
    return Series(dict(A = x['A'].sum(),
                       B = x['B'].sum(),
                       C = "{%s}" % ', '.join(x['C'])))

In [14]: df.groupby('A').apply(f)
Out[14]:
A B C
A
1 2 1.615586 {This, string}
2 4 0.421821 {is, !}
3 3 0.463468 {a}
4 4 0.643961 {random}
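
If only B and C need reducing, a plain per-column agg spec is simpler (a sketch; note that A then stays the group key instead of being summed as above):

# A remains the group key here
df.groupby('A').agg({'B': 'sum', 'C': lambda x: "{%s}" % ', '.join(x)})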

Pandas groupby concat ungrouped column into comma separated string

Try groupby and agg like so:

(df.groupby(['col1', 'col2', 'col3'])['doc_no']
   .agg(['count', ('doc_no', lambda x: ','.join(map(str, x)))])
   .sort_values('count', ascending=False)
   .reset_index())

col1 col2 col3 count doc_no
0 a x f 3 0,1,5
1 d x t 2 5,6
2 b x g 1 2
3 b y g 1 3
4 c x t 1 3
5 c y t 1 4

agg is simple to use because you can specify a list of reducers to run on a single column.
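
On pandas >= 0.25 the same result can also be spelled with named aggregation, which some find easier to read (a sketch, not from the original answer):

# named aggregation: new_column=(source_column, reducer)
(df.groupby(['col1', 'col2', 'col3'])
   .agg(count=('doc_no', 'count'),
        doc_no=('doc_no', lambda x: ','.join(map(str, x))))
   .sort_values('count', ascending=False)
   .reset_index())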

Pandas group by one column concatenate values of other column as delimited list

If you need unique strings:

You can add set or unique, and if there may be some Nones or NaNs, add dropna:

df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(set(x.dropna())))
         .reset_index())

print (df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1

If order is important:

df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(x.dropna().unique()))
         .reset_index())

print (df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 Diploma of Software Development,Certificate IV...
1

If you want NaN for groups with no values:

def f(x):
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        val = np.nan
    return val

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1 NaN

If you need unique lists:

df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(set(x)))
         .reset_index())

print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 [Diploma of Software Development, Diploma of S...
1 [None]

df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(x.unique()))
         .reset_index())

print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst

Qualification Name
0 [Diploma of Software Development, Certificate ...
1 [None]

How to use groupby to concatenate strings in python pandas?

You can apply join on your column after groupby:

df.groupby('index')['words'].apply(','.join)

Example:

In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df

Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd

In [327]:
df.groupby('id')['words'].apply(','.join)

Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
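
agg gives the same result (a sketch):

# same as apply(','.join)
df.groupby('id')['words'].agg(','.join)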

pandas groupby and join lists

object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta, so the column stores the lists as Python objects. convert_objects tries to convert a column to one of those dtypes.

You want

In [63]: df
Out[63]:
a b c
0 1 [1, 2, 3] foo
1 1 [2, 5] bar
2 2 [5, 6] baz
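
A frame like the one above can be built as follows (a sketch reconstructed from the display, not part of the original question):

import pandas as pd

# the list column 'b' is stored with object dtype
df = pd.DataFrame({'a': [1, 1, 2],
                   'b': [[1, 2, 3], [2, 5], [5, 6]],
                   'c': ['foo', 'bar', 'baz']})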


In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
c b
a
1 foo bar [1, 2, 3, 2, 5]
2 baz [5, 6]

This groups the data frame by the values in column a. Read more about groupby.

This is doing a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], which gives [1, 2, 3, 2, 5].

Python Pandas: Groupby Sum AND Concatenate Strings

Let us make it into one line:

df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]:
ID Name COMMENT1 COMMENT2 NUM
0 1 dan hi you hello friend 3.0
1 2 jon dog cat 0.5
2 3 jon yeah yes nope no 3.1
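
If you prefer to be explicit about which column gets which reducer, a per-column spec does the same thing (a sketch; column names taken from the output above):

# explicit reducers per column
df.groupby(['ID', 'Name'], as_index=False).agg(
    {'COMMENT1': ' '.join, 'COMMENT2': ' '.join, 'NUM': 'sum'})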

