Pandas groupby with delimiter join
Alternatively you can do it this way:
In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
UPDATE: answering the question from the comment:
In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg
In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object
Finally, to convert the index (or MultiIndex) to columns:
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
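A minimal, self-contained sketch of the steps above, assuming a small frame with the same `col` and `val` columns (the data is made up for illustration):

```python
import pandas as pd

# Illustrative data; only the column names follow the answer above.
df = pd.DataFrame({
    "col": ["A", "A", "B", "B"],
    "val": ["Cat", "Tiger", "Ball", "Bat"],
})

# One joined string per group; the group key becomes the index.
joined = df.groupby("col")["val"].agg("-".join)

# Move the group key back into a column and name the joined column 'new'.
df1 = joined.reset_index(name="new")
print(df1)
```

Passing the bound method `'-'.join` works here because every value in `val` is a string; non-string values would need converting first.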
Concatenate strings from several rows using Pandas groupby
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12
I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.
EDIT: actually I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
Update: the lambda is unnecessary here:
In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
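To make the difference between the two approaches concrete, here is a small sketch, assuming a frame rebuilt from the output shown above. `transform` keeps one value per original row, while aggregating yields one row per group:

```python
import pandas as pd

# Frame reconstructed from the outputs above (an assumption about the input).
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

grouped = df.groupby(["name", "month"])["text"]

# transform: result has the same length and index as df.
per_row = grouped.transform(",".join)

# agg: result has one row per (name, month) group.
per_group = grouped.agg(",".join).reset_index()

print(per_row)
print(per_group)
```

The `transform` route needs a `drop_duplicates` afterwards to collapse the repeated rows; the aggregating route does not.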
Groupby and join in pandas dataframe for string column with newline character
You can't accomplish what you want without adding new indexed rows ('\n' does not produce a line break when a DataFrame is displayed).
Easier solution:
df = df.sort_values('Category')
df['Last'] = df.groupby('Category')['Last'].transform(' '.join)
df.loc[df.duplicated('Category'), df.columns != 'Name'] = ''
>>> df
Name Last Loc Category
0 [1]Tabby buy buy NJ A
2 [3]Tabby
1 [2]Tabby buy sell JP B
3 [4]Tabby
>>> print(df.to_string(index=False))
Name Last Loc Category
[1]Tabby buy buy NJ A
[3]Tabby
[2]Tabby buy sell JP B
[4]Tabby
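The three steps above can be tried end to end with the sketch below. The input frame is hypothetical (the original question's data is not shown here, in particular the Loc values of the blanked rows), and `kind="stable"` is added so that rows keep their original order within each category:

```python
import pandas as pd

# Hypothetical input, shaped like the output above.
df = pd.DataFrame({
    "Name": ["[1]Tabby", "[2]Tabby", "[3]Tabby", "[4]Tabby"],
    "Last": ["buy", "buy", "buy", "sell"],
    "Loc": ["NJ", "JP", "NJ", "JP"],
    "Category": ["A", "B", "A", "B"],
})

# Bring rows of the same category together (stable sort keeps row order).
df = df.sort_values("Category", kind="stable")

# Replace 'Last' with the joined string of its whole group.
df["Last"] = df.groupby("Category")["Last"].transform(" ".join)

# Blank every column except 'Name' on the repeated rows of each category.
df.loc[df.duplicated("Category"), df.columns != "Name"] = ""

print(df.to_string(index=False))
```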
Old answer
An alternative could be:
out = df.groupby('Category', as_index=False) \
.agg({'Name': list, 'Last': ' '.join, 'Loc': 'first'}) \
.explode('Name')
At this point, the output is:
>>> out
Category Name Last Loc
0 A [1]Tabby buy buy NJ
0 A [3]Tabby buy buy NJ
1 B [2]Tabby buy sell JP
1 B [4]Tabby buy sell JP
Now you can use .loc to remove the extra content:
out.loc[out.duplicated('Category'), out.columns != 'Name'] = ''
out = out[df.columns]
Final output:
>>> out
Name Last Loc Category
0 [1]Tabby buy buy NJ A
0 [3]Tabby
1 [2]Tabby buy sell JP B
1 [4]Tabby
Pandas groupby: How to get a union of strings
In [4]: df = read_csv(StringIO(data), sep=r'\s+')
In [5]: df
Out[5]:
A B C
0 1 0.749065 This
1 2 0.301084 is
2 3 0.463468 a
3 4 0.643961 random
4 1 0.866521 string
5 2 0.120737 !
In [6]: df.dtypes
Out[6]:
A int64
B float64
C object
dtype: object
When you apply your own function, non-numeric columns are not automatically excluded. This is slower, though, than applying .sum() directly to the groupby:
In [8]: df.groupby('A').apply(lambda x: x.sum())
Out[8]:
A B C
A
1 2 1.615586 Thisstring
2 4 0.421821 is!
3 3 0.463468 a
4 4 0.643961 random
By default, sum concatenates strings:
In [9]: df.groupby('A')['C'].apply(lambda x: x.sum())
Out[9]:
A
1 Thisstring
2 is!
3 a
4 random
dtype: object
You can do pretty much what you want
In [11]: df.groupby('A')['C'].apply(lambda x: "{%s}" % ', '.join(x))
Out[11]:
A
1 {This, string}
2 {is, !}
3 {a}
4 {random}
dtype: object
Doing this on a whole frame, one group at a time. The key is to return a Series:
def f(x):
    return Series(dict(A = x['A'].sum(),
                       B = x['B'].sum(),
                       C = "{%s}" % ', '.join(x['C'])))
In [14]: df.groupby('A').apply(f)
Out[14]:
A B C
A
1 2 1.615586 {This, string}
2 4 0.421821 {is, !}
3 3 0.463468 {a}
4 4 0.643961 {random}
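As a possible alternative to the apply-based function above, the per-column reductions can also be written with named aggregation. This is a swapped-in technique, not what the answer itself uses, and the numbers below are rounded, illustrative values:

```python
import pandas as pd

# Illustrative data with the same column layout as the frame above.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 1, 2],
    "B": [0.75, 0.30, 0.46, 0.64, 0.87, 0.12],
    "C": ["This", "is", "a", "random", "string", "!"],
})

# Named aggregation: output column = (input column, reducer).
out = df.groupby("A").agg(
    B=("B", "sum"),
    C=("C", lambda s: "{%s}" % ", ".join(s)),
)
print(out)
```

Unlike the apply version, the grouping key stays in the index only and is not summed as a column.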
Pandas groupby concat ungrouped column into comma separated string
Try groupby and agg like so:
(df.groupby(['col1', 'col2', 'col3'])['doc_no']
   .agg(['count', ('doc_no', lambda x: ','.join(map(str, x)))])
   .sort_values('count', ascending=False)
   .reset_index())
col1 col2 col3 count doc_no
0 a x f 3 0,1,5
1 d x t 2 5,6
2 b x g 1 2
3 b y g 1 3
4 c x t 1 3
5 c y t 1 4
agg is simple to use because you can specify a list of reducers to run on a single column.
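A runnable sketch of the same idea on made-up data, with two grouping columns instead of three. It spells the reducers as keyword arguments to `agg` rather than as a list, which is an alternative form and not the one used above; `map(str, x)` is what allows integer values to be joined:

```python
import pandas as pd

# Made-up data; 'doc_no' holds integers, so it must be converted before joining.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "a"],
    "col2": ["x", "x", "x", "x"],
    "doc_no": [0, 1, 2, 5],
})

out = (df.groupby(["col1", "col2"])["doc_no"]
         .agg(count="count",
              doc_no=lambda x: ",".join(map(str, x)))
         .sort_values("count", ascending=False)
         .reset_index())
print(out)
```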
Pandas group by one column concatenate values of other column as delimited list
If you need unique strings, you can add set or unique, and if there may be None or NaN values, add dropna:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(set(x.dropna())))
         .reset_index())
print(df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1
If order is important:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(x.dropna().unique()))
         .reset_index())
print(df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 Diploma of Software Development,Certificate IV...
1
If you want NaN for groups with no values:
def f(x):
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        val = np.nan
    return val
df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1 NaN
If you need unique lists:
df2 = (df.groupby('Job Title')['Qualification Name']
.apply(lambda x: list(set(x)))
.reset_index())
print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 [Diploma of Software Development, Diploma of S...
1 [None]
df2 = (df.groupby('Job Title')['Qualification Name']
.apply(lambda x: list(x.unique()))
.reset_index())
print (df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 [Diploma of Software Development, Certificate ...
1 [None]
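The order-preserving join and the NaN fallback shown earlier can be combined in one helper. The sketch below assumes a tiny made-up frame, and `join_unique` is a hypothetical name, not something from the answer:

```python
import numpy as np
import pandas as pd

# Made-up data; the second job title has no qualification at all.
df = pd.DataFrame({
    "Job Title": ["Dev", "Dev", "Dev", "Analyst"],
    "Qualification Name": ["Diploma", "Certificate", "Diploma", None],
})

def join_unique(values):
    """Join unique non-missing values in first-seen order; NaN if none."""
    unique = values.dropna().unique()
    return ",".join(unique) if len(unique) else np.nan

out = (df.groupby("Job Title")["Qualification Name"]
         .apply(join_unique)
         .reset_index())
print(out)
```

`unique()` keeps the order of first appearance, whereas `set` does not guarantee any order.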
How to use groupby to concatenate strings in python pandas?
You can apply join on your column after groupby:
df.groupby('index')['words'].apply(','.join)
Example:
In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df
Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd
In [327]:
df.groupby('id')['words'].apply(','.join)
Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
pandas groupby and join lists
The object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta, so the column is storing the values as lists. convert_objects tries to convert a column to one of those dtypes.
You want:
In [63]: df
Out[63]:
a b c
0 1 [1, 2, 3] foo
1 1 [2, 5] bar
2 2 [5, 6] baz
In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
c b
a
1 foo bar [1, 2, 3, 2, 5]
2 baz [5, 6]
This groups the data frame by the values in column a. Read more about groupby.
This is doing a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], with the result [1, 2, 3, 2, 5].
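If you prefer not to rely on 'sum' for list values, one possible alternative is to flatten each group's lists explicitly with `itertools.chain`. This is a swapped-in technique, not the one used in the answer above; the frame is the one shown in the example:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
    "c": ["foo", "bar", "baz"],
})

out = df.groupby("a").agg(
    # Flatten the lists of each group into a single list.
    b=("b", lambda lists: list(chain.from_iterable(lists))),
    # Join the strings of each group with a space.
    c=("c", " ".join),
)
print(out)
```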
Python Pandas: Groupby Sum AND Concatenate Strings
Let us do it in one line:
df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]:
ID Name COMMENT1 COMMENT2 NUM
0 1 dan hi you hello friend 3.0
1 2 jon dog cat 0.5
2 3 jon yeah yes nope no 3.1
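A variant of the same idea, sketched under the assumption that every non-key column is either numeric or made of strings: build the reducer for each column up front instead of testing the dtype inside a lambda. The data is a made-up subset of the output above, and `spec` is just an illustrative name:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up subset of the example data.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Name": ["dan", "dan", "jon"],
    "COMMENT1": ["hi", "you", "dog"],
    "NUM": [1.0, 2.0, 0.5],
})

keys = ["ID", "Name"]

# Sum numeric columns, join string columns with a space.
spec = {
    col: ("sum" if is_numeric_dtype(df[col]) else " ".join)
    for col in df.columns
    if col not in keys
}

out = df.groupby(keys, as_index=False).agg(spec)
print(out)
```

This also covers integer columns, which the `x.dtype == 'float64'` test in the one-liner would send to the string join.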