Aggregating Unique Values in Columns to Single Dataframe "Cell"

How to get unique values from multiple columns in a pandas groupby

You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
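For instance, with a small made-up frame whose column names follow the snippet (a sketch for illustration; note the double brackets, which newer pandas versions require for selecting multiple columns):

```python
import numpy as np
import pandas as pd

# Toy frame: grouping column 'c' and two value columns 'l1', 'l2'.
df = pd.DataFrame({'c': ['x', 'x', 'y'],
                   'l1': [1, 2, 2],
                   'l2': [2, 3, 4]})

# Collect the unique values across both columns, one list per group.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
print(g)
```

Each group's two columns are flattened together before np.unique, so the result is one de-duplicated, sorted list per group.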

Pandas, for each unique value in one column, get unique values in another column

Here are two strategies to do it. No doubt, there are other ways.

Assuming your dataframe looks something like this (obviously with more columns):

df = pd.DataFrame({'author':['a', 'a', 'b'], 'subreddit':['sr1', 'sr2', 'sr2']})

>>> df
  author subreddit
0      a       sr1
1      a       sr2
2      b       sr2

SOLUTION 1: groupby

More straightforward than solution 2, and similar to your first attempt:

group = df.groupby('author')

df2 = group.apply(lambda x: x['subreddit'].unique())

# Alternatively, same thing as a one liner:
# df2 = df.groupby('author').apply(lambda x: x['subreddit'].unique())

Result:

>>> df2
author
a    [sr1, sr2]
b         [sr2]
dtype: object

The author is the index, and the single column holds the list of all subreddits they are active in (this is how I interpreted your desired output, based on your description).

If you want each subreddit in a separate column instead, which might be more usable depending on what you do with it next, just follow up with:

df2 = df2.apply(pd.Series)

Result:

>>> df2
          0    1
author
a       sr1  sr2
b       sr2  NaN
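For what it's worth, the same result can be had without the lambda by selecting the column first and calling unique through the groupby (a runnable sketch on the sample frame above):

```python
import pandas as pd

df = pd.DataFrame({'author': ['a', 'a', 'b'],
                   'subreddit': ['sr1', 'sr2', 'sr2']})

# One array of unique subreddits per author.
df2 = df.groupby('author')['subreddit'].unique()

# Optionally split the arrays into separate columns.
wide = df2.apply(pd.Series)
print(wide)
```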

SOLUTION 2: Iterate through the dataframe

You can make a new dataframe with all unique authors:

df2 = pd.DataFrame({'author':df.author.unique()})

And then just get the list of all unique subreddits they are active in, assigning it to a new column:

df2['subreddits'] = [list(set(df['subreddit'].loc[df['author'] == x['author']]))
                     for _, x in df2.iterrows()]

This gives you:

>>> df2
  author  subreddits
0      a  [sr2, sr1]
1      b       [sr2]

Aggregate unique values from multiple columns with pandas GroupBy

Use groupby and agg, and aggregate only unique values by calling Series.unique:

df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))

              prop2       prop3      prop4
prop1
K20         12,1,66  travis,leo   10.0,4.0
L30      3,54,11,10    bob,john  11.2,10.0

To keep the groups in their original row order, pass sort=False:

df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))

              prop2       prop3      prop4
prop1
L30      3,54,11,10    bob,john  11.2,10.0
K20         12,1,66  travis,leo   10.0,4.0

If handling NaNs is important, call fillna in advance:

import re
df.fillna('').astype(str).groupby('prop1').agg(
    lambda x: re.sub(',+', ',', ','.join(x.unique()))
)

              prop2       prop3      prop4
prop1
K20         12,1,66  travis,leo   10.0,4.0
L30      3,54,11,10    bob,john  11.2,10.0
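An alternative sketch that avoids the regex cleanup by dropping NaNs inside the aggregation itself (sample data assumed, loosely matching the columns above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prop1': ['K20', 'K20', 'L30'],
                   'prop2': [12, 1, 3],
                   'prop3': ['travis', np.nan, 'bob']})

# Drop NaNs per group before joining, so no empty fields appear.
out = df.groupby('prop1').agg(
    lambda x: ','.join(x.dropna().astype(str).unique())
)
print(out)
```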

Unique combinations of values in selected columns in pandas data frame and count

You can group by columns 'A' and 'B', call size, then reset_index and rename the generated column:

In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

Update

A little explanation: by grouping on the 2 columns, rows where the A and B values are the same fall into the same group; calling size then returns the number of rows in each group:

In [202]:
df1.groupby(['A','B']).size()

Out[202]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

So now to restore the grouped columns, we call reset_index:

In [203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]:
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3

This restores the grouping columns, but the size aggregation lands in a generated column named 0, so we have to rename it:

In [204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

groupby does accept an as_index argument, which we could set to False so the grouped columns don't become the index; but with size this still produced a Series (recent pandas versions return a DataFrame with a size column instead), so you'd still have to tidy up the result:

In [205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
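Two shortcuts worth knowing: Series.reset_index takes a name argument, which removes the rename step, and recent pandas versions offer DataFrame.value_counts to do the whole thing in one call. A sketch with made-up data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['yes', 'yes', 'yes', 'no'],
                    'B': ['yes', 'yes', 'no', 'no']})

# Name the size column directly instead of renaming it afterwards.
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(counts)

# Equivalent one-liner on pandas 1.1+ (sorted by count, descending).
counts2 = df1.value_counts(['A', 'B']).reset_index(name='count')
```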

How can I merge rows by same value in a column in Pandas with aggregation functions?

You are looking for

aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
df_new = df.groupby('id').aggregate(aggregation_functions)

which gives

    price  amount     name
id
1     130       3     anna
2      42      30      bob
3       3     110  charlie
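As a runnable sketch, with sample data reconstructed (assumed) to reproduce the output above:

```python
import pandas as pd

# Assumed input: duplicate ids whose prices and amounts should be summed.
df = pd.DataFrame({'id': [1, 1, 2, 3, 3],
                   'price': [100, 30, 42, 1, 2],
                   'name': ['anna', 'anna', 'bob', 'charlie', 'charlie'],
                   'amount': [1, 2, 30, 100, 10]})

aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
df_new = df.groupby('id').aggregate(aggregation_functions)
print(df_new)
```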

Multiple rows to single cell space delimited values in pandas with group by

Convert the 'value' column from int to string, then perform a groupby on 'id' and apply the str.join function:

# Convert 'value' column to string.
df1['value'] = df1['value'].astype(str)

# Perform a groupby and apply a string join.
df1 = df1.groupby('id')['value'].apply(' '.join).reset_index()

The resulting output:

   id  value
0   1  67 45
1   2  7 5 9
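The same thing works as a single chain, casting to string inside the groupby rather than mutating the column first (sample data assumed from the output):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'value': [67, 45, 7, 5, 9]})

# Cast to string per group and space-join in one chain.
out = (df1.groupby('id')['value']
          .apply(lambda s: ' '.join(s.astype(str)))
          .reset_index())
print(out)
```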

Pandas aggregate count distinct

How about either of:

>>> df
         date  duration user_id
0  2013-04-01        30    0001
1  2013-04-01        15    0001
2  2013-04-01        20    0002
3  2013-04-02        15    0002
4  2013-04-02        30    0002
>>> df.groupby("date").agg({"duration": np.sum, "user_id": pd.Series.nunique})
            duration  user_id
date
2013-04-01        65        2
2013-04-02        45        1
>>> df.groupby("date").agg({"duration": np.sum, "user_id": lambda x: x.nunique()})
            duration  user_id
date
2013-04-01        65        2
2013-04-02        45        1
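On recent pandas versions, which warn about passing NumPy functions to agg, the same aggregation reads more cleanly with string aggregator names — a sketch:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2013-04-01'] * 3 + ['2013-04-02'] * 2,
                   'duration': [30, 15, 20, 15, 30],
                   'user_id': ['0001', '0001', '0002', '0002', '0002']})

# String aggregator names avoid both the lambda and the NumPy reference.
out = df.groupby('date').agg({'duration': 'sum', 'user_id': 'nunique'})
print(out)
```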

Aggregate pandas dataframe but collapse duplicate cell values

If I understand correctly:

Try groupby() + agg(), using set to collect unique values instead of a list:

df=df.groupby('query').agg(lambda x:' | '.join(set(x)))

OR

If order matters, use pd.unique(), which keeps values in order of first appearance:

df=df.groupby('query').agg(lambda x:' | '.join(pd.unique(x)))

OR

If you want to perform this on selected columns only, create a list of those columns and run the aggregation on just those:

cols=['knum','definition','A','B','C']
df=df.groupby('query')[cols].agg(lambda x:' | '.join(set(x)))
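A minimal runnable sketch of the set-based variant (column names and data assumed for illustration):

```python
import pandas as pd

# Assumed sample: duplicate rows per query with repeated and varying values.
df = pd.DataFrame({'query': ['q1', 'q1', 'q2'],
                   'knum': ['k1', 'k1', 'k2'],
                   'definition': ['d1', 'd2', 'd3']})

# Aggregate only the listed columns; set de-duplicates before joining.
cols = ['knum', 'definition']
out = df.groupby('query')[cols].agg(lambda x: ' | '.join(set(x)))
print(out)
```

Note that set does not preserve order, which is exactly why the pd.unique() variant above exists.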

