Concatenate strings from several rows using Pandas groupby
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12
I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.
EDIT: actually, I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
Update: the lambda is unnecessary here:
In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
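For reference, here is a minimal self-contained sketch of the approaches above. The input frame is an assumption, reconstructed from the outputs shown, since the original df is not included in the answer.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs shown above.
df = pd.DataFrame({
    'name': ['name1'] * 4 + ['name2'] * 4,
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
    'text': ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
})

grouped = df.groupby(['name', 'month'])['text']

# transform returns one joined string per original row (same length as df).
per_row = grouped.transform(lambda x: ','.join(x))

# apply returns one joined string per group; reset_index turns the keys into columns.
per_group = grouped.apply(','.join).reset_index()
print(per_group)
```

The transform variant is the one to reach for when the joined string should sit next to every original row; the apply variant already gives one row per group.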
How to use groupby to concatenate strings in python pandas?
You can apply join on your column after groupby:
df.groupby('index')['words'].apply(','.join)
Example:
In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df
Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd
In [327]:
df.groupby('id')['words'].apply(','.join)
Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
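One caveat: str.join raises a TypeError if the column holds anything other than strings (numbers, NaN). A possible workaround, assuming it is acceptable to render every value with str, is to convert inside each group. The mixed column below is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical frame whose 'words' column mixes strings and an integer.
df = pd.DataFrame({'id': ['a', 'a', 'b'], 'words': ['asd', 5, 's']})

# Convert each group's values to str before joining, so the integer does not break join.
out = df.groupby('id')['words'].apply(lambda x: ','.join(x.astype(str)))
print(out)
```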
Concatenate string in groupby with conditions
You want to add an if statement inside your lambda function.
print (df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(set(x)) if len(set(x))==1 else ', '.join(x)).reset_index())
If set(x) has only one element, all values in the group are identical and that single value is returned; otherwise all of the values are joined.
The output of this will be:
id v1 v2
0 1 a b, d
1 2 c, d, f e, e, g
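A runnable sketch of the same idea, with the condition pulled out into a named function. The input frame is an assumption reconstructed from the output above, and join_unless_constant is a made-up helper name; it returns the single element directly instead of joining a one-element set, which gives the same result.

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'v1': ['a', 'a', 'c', 'd', 'f'],
    'v2': ['b', 'd', 'e', 'e', 'g'],
})

def join_unless_constant(x):
    # Collapse a group whose values are all identical to that single value;
    # otherwise join every value, duplicates included.
    unique = set(x)
    return unique.pop() if len(unique) == 1 else ', '.join(x)

out = df.groupby('id')[['v1', 'v2']].agg(join_unless_constant).reset_index()
print(out)
```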
Pandas Dataframe Groupby join string whilst preserving order of strings
Use the sort=False parameter in groupby, and drop_duplicates instead of set, because a set does not preserve the order of its elements:
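df = (
    df.sort_values(['id', 'order_column'])
    .groupby('id', sort=False)
    .agg({
        # drop_duplicates keeps the first occurrence of each value, in order
        'channel': lambda x: ' > '.join(x.drop_duplicates()),
        'value': 'sum',  # same result as np.sum, without needing the numpy import
    })
)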
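To see the ordering behaviour on concrete data, here is a small sketch. The frame and its values are invented for illustration; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical data: rows are deliberately out of order within id 1.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'order_column': [2, 1, 3, 1, 2],
    'channel': ['email', 'search', 'email', 'social', 'social'],
    'value': [10, 20, 30, 5, 5],
})

out = (
    df.sort_values(['id', 'order_column'])
    .groupby('id', sort=False)
    .agg({
        # Remove repeated channels but keep the order of first appearance.
        'channel': lambda x: ' > '.join(x.drop_duplicates()),
        'value': 'sum',
    })
)
print(out)
```

With set in place of drop_duplicates, the order of 'search' and 'email' would not be guaranteed.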
Pandas: groupby and concat strings with condition
You can filter before the groupby, then reindex with the missing groups:
out = data.loc[data.status == 'Finished'].groupby(['id', 'category'])['description'].apply(' '.join).reindex(pd.MultiIndex.from_frame(data[['id','category']].drop_duplicates()),fill_value= ' ').reset_index()
Out[70]:
id category description
0 11 A Text_1
1 22 A
2 33 B Text_1 Text_2
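The one-liner is easier to follow when split into steps. The sketch below does that on an assumed input: the frame is reconstructed from the output above, and the 'Pending' status is a made-up placeholder for any status other than 'Finished'.

```python
import pandas as pd

# Assumed input; 'Pending' stands in for any non-finished status.
data = pd.DataFrame({
    'id': [11, 22, 33, 33],
    'category': ['A', 'A', 'B', 'B'],
    'status': ['Finished', 'Pending', 'Finished', 'Finished'],
    'description': ['Text_1', 'Text_2', 'Text_1', 'Text_2'],
})

# Step 1: keep only finished rows and join their descriptions per group.
joined = (
    data.loc[data['status'] == 'Finished']
    .groupby(['id', 'category'])['description']
    .apply(' '.join)
)

# Step 2: build the full list of groups, including those filtered out in step 1.
all_groups = pd.MultiIndex.from_frame(data[['id', 'category']].drop_duplicates())

# Step 3: reindex so the missing groups reappear with a placeholder value.
out = joined.reindex(all_groups, fill_value=' ').reset_index()
print(out)
```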
Pandas groupby and concatenate strings
Try groupby and agg with join:
s=df[['two','three']].agg('+'.join,1).groupby(df.one).agg('/n'.join).\
to_frame('two + three').reset_index()
one two + three
0 1 x+a/ny+b/nz+c
1 2 x+a/ny+b/nz+c
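The chain above does two joins: one across columns within each row, then one across rows within each group. A step-by-step sketch, assuming an input frame reconstructed from the output, might look like this. It uses DataFrame.apply with axis=1 for the row-wise step in place of agg, and keeps the literal '/n' separator from the answer (a real newline would be '\n').

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({
    'one': [1, 1, 1, 2, 2, 2],
    'two': ['x', 'y', 'z', 'x', 'y', 'z'],
    'three': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# Step 1: join 'two' and 'three' within each row.
row_joined = df[['two', 'three']].apply('+'.join, axis=1)

# Step 2: join the per-row strings within each 'one' group.
out = (
    row_joined.groupby(df['one'])
    .agg('/n'.join)
    .to_frame('two + three')
    .reset_index()
)
print(out)
```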
Concatenating strings after grouping by name and then sorting by date
Try apply with join:
df.sort_values('date').groupby('name')['message'].apply(' '.join).reset_index()
name message
0 a Hi there
1 b Hello everyone
2 c Test
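A self-contained sketch of the same call follows. The rows and dates are invented so that the output matches the one shown; the point is that sorting before the groupby fixes the order in which the messages are joined.

```python
import pandas as pd

# Hypothetical input: rows are not in date order.
df = pd.DataFrame({
    'name': ['a', 'b', 'a', 'c', 'b'],
    'date': pd.to_datetime(
        ['2020-01-03', '2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05']
    ),
    'message': ['there', 'Hello', 'Hi', 'Test', 'everyone'],
})

# Sort first: groupby keeps the row order within each group.
out = (
    df.sort_values('date')
    .groupby('name')['message']
    .apply(' '.join)
    .reset_index()
)
print(out)
```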
how to perform a groupby, sort, and concatenate strings in a pandas dataframe
This looks like a groupby() followed by an aggregation:
df.groupby(['PK', 'Source'], as_index=False).Text.agg(' '.join)
You can add sort_values('Line') to make sure that the lines are in order, e.g.
(df.sort_values('Line')
.groupby(['PK', 'Source'], as_index=False)
.Text.agg(' '.join)
)
Output:
PK Source Text
0 1 A The quick brown
1 2 A fox jumped
2 3 A over the lazy
3 4 A yellow
4 5 A dogs sam