pandas groupby and join lists
The object dtype is a catch-all that basically means "not int, float, bool, datetime, or timedelta", so pandas is storing each cell as a Python list. convert_objects (deprecated in later pandas versions in favour of infer_objects and to_numeric) tries to convert a column to one of those dtypes.
You want:
In [63]: df
Out[63]:
a b c
0 1 [1, 2, 3] foo
1 1 [2, 5] bar
2 2 [5, 6] baz
In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
c b
a
1 foo bar [1, 2, 3, 2, 5]
2 baz [5, 6]
This groups the data frame by the values in column a. Read more about groupby in the pandas docs. The aggregation on b performs a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], with the result [1, 2, 3, 2, 5].
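A runnable version of the session above, reconstructing the frame it starts from:

```python
import pandas as pd

# Reconstruction of the frame shown in the session above
df = pd.DataFrame({'a': [1, 1, 2],
                   'b': [[1, 2, 3], [2, 5], [5, 6]],
                   'c': ['foo', 'bar', 'baz']})

# 'sum' concatenates the lists in b; the lambda joins the strings in c
out = df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
print(out)
```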
How to group dataframe rows into list in pandas groupby
You can do this by using groupby to group on the column of interest, and then applying list to every group:
In [1]: df = pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
df
Out[1]:
a b
0 A 1
1 A 2
2 B 5
3 B 5
4 B 4
5 C 6
In [2]: df.groupby('a')['b'].apply(list)
Out[2]:
a
A [1, 2]
B [5, 5, 4]
C [6]
Name: b, dtype: object
In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
df1
Out[3]:
a new
0 A [1, 2]
1 B [5, 5, 4]
2 C [6]
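In recent pandas versions, agg(list) is an equivalent spelling of apply(list); a minimal sketch of the same example:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# agg(list) collects each group's values into a list, like apply(list)
df1 = df.groupby('a')['b'].agg(list).reset_index(name='new')
print(df1)
```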
Pandas groupby() merge different lists of strings
That's because your Description column contains strings, not lists. You can strip out the [] and join the pieces back together:
'[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
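The question's data isn't shown, so here is a sketch with hypothetical Fruit/Description values demonstrating the strip-and-join:

```python
import pandas as pd

# Hypothetical sample data: Description holds bracketed strings, not lists
df = pd.DataFrame({
    'Fruit': ['apple', 'apple', 'pear'],
    'Description': ['[red, sweet]', '[crisp]', '[green]'],
})

# Strip the outer brackets, join per fruit, then re-wrap in brackets
merged = '[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
print(merged)
```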
Merge two lists in pandas groupby and apply
Concatenating lists is done by addition, so you can simply apply sum
to the relevant column:
In [24]: df
Out[24]:
make model year
0 Audi A3 [1991, 1992, 1993]
1 Audi A3 [1997, 1998]
In [25]: df.groupby([df.make, df.model]).year.apply(sum)
Out[25]:
make model
Audi A3 [1991, 1992, 1993, 1997, 1998]
Name: year, dtype: object
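A runnable sketch of the same idea; Series.sum is spelled out explicitly so the add-reduction over the lists is visible:

```python
import pandas as pd

df = pd.DataFrame({'make': ['Audi', 'Audi'],
                   'model': ['A3', 'A3'],
                   'year': [[1991, 1992, 1993], [1997, 1998]]})

# Series.sum add-reduces the lists, i.e. [a] + [b] concatenation
out = df.groupby(['make', 'model'])['year'].apply(lambda s: s.sum())
print(out)
```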
Pandas: groupby column, merge rows of lists into a single column for group?
You can GroupBy and aggregate the column containing lists with sum, to concatenate the lists within each group, and the Feature 2 column with first:
df.groupby('Groups').agg({'Feature 1':'sum', 'Feature 2':'first'}).reset_index()
Groups Feature 1 Feature 2
0 GROUP A [abc, def, ghi, jkl, mno, pqr] 1
1 GROUP B [stu, vwx, yz, xx, yx, zx] 2
2 GROUP C [text, more, stuff, here, last, one] 3
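Reconstructing a small input with the shape implied by the output above (the exact source data is an assumption):

```python
import pandas as pd

# Hypothetical input shaped to match the grouped output shown above
df = pd.DataFrame({
    'Groups': ['GROUP A', 'GROUP A', 'GROUP B', 'GROUP B'],
    'Feature 1': [['abc', 'def'], ['ghi'], ['stu'], ['vwx', 'yz']],
    'Feature 2': [1, 1, 2, 2],
})

# 'sum' concatenates the lists per group; 'first' keeps one value per group
out = df.groupby('Groups').agg({'Feature 1': 'sum', 'Feature 2': 'first'}).reset_index()
print(out)
```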
Pandas groupby with delimiter join
Alternatively you can do it this way:
In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
UPDATE: answering the question from the comments:
In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg
In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object
Finally, to convert the index (or MultiIndex) back to columns:
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
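Putting the session together as a self-contained script:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})

# '-'.join is a plain callable, so agg applies it to each group's values
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
print(df1)
```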
Groupby and append lists and strings
Build the aggregation dictionary dynamically: every column except value_1 and list gets a string-joining function, while list gets a lambda that flattens the nested lists with a list comprehension:
f1 = lambda x: ', '.join(x.dropna())
#alternative for join only strings
#f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
f2 = lambda x: [z for y in x for z in y]
d = dict.fromkeys(df.columns.difference(['value_1','list']), f1)
d['list'] = f2
df = df.groupby('value_1', as_index=False).agg(d)
print (df)
value_1 value_2 value_3 \
0 american california, nyc, texas walmart, kmart
1 canadian toronto dunkinDonuts, walmart
list
0 [supermarket, connivence, state]
1 [coffee, supermarket]
Explanation:
f1 and f2 are lambda functions.
The first removes missing values (if any exist) and joins the remaining strings with a separator:
f1 = lambda x: ', '.join(x.dropna())
The alternative keeps only string values (omitting missing values, since NaN is a float) and joins them with a separator:
f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
A further variant filters out empty strings before joining:
f1 = lambda x: ', '.join([y for y in x if y != ''])
Function f2 flattens the lists, because after aggregation you would otherwise get nested lists like [['a','b'], ['c']]:
f2 = lambda x: [z for y in x for z in y]
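The whole recipe as a runnable script; the input frame is a hypothetical reconstruction from the printed output above:

```python
import pandas as pd
import numpy as np

# Hypothetical input reconstructed from the printed output above
df = pd.DataFrame({
    'value_1': ['american', 'american', 'canadian'],
    'value_2': ['california', 'nyc', 'toronto'],
    'value_3': ['walmart', np.nan, 'dunkinDonuts'],
    'list': [['supermarket'], ['connivence'], ['coffee']],
})

f1 = lambda x: ', '.join(x.dropna())      # join strings, skipping NaN
f2 = lambda x: [z for y in x for z in y]  # flatten the nested lists

# every column except value_1 and list -> f1; the list column -> f2
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2
out = df.groupby('value_1', as_index=False).agg(d)
print(out)
```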
Pandas groupby and aggregate over multiple lists
You can get the average of the lists within each group in this way:
s = df.groupby("column_a")["column_b"].apply(lambda x: np.array(x.tolist()).mean(axis=0))
pd.DataFrame({'group':s.index, 'avg_list':s.values})
Gives:
group avg_list
0 1 [1.5, 3.5, 2.0]
1 2 [5.0, 6.0, 6.0]
2 3 [3.0, 1.0, 2.0]
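A self-contained sketch; the input lists are assumptions chosen to reproduce the averages shown:

```python
import pandas as pd
import numpy as np

# Hypothetical input chosen to reproduce the averages shown above
df = pd.DataFrame({
    'column_a': [1, 1, 2, 3],
    'column_b': [[1, 3, 2], [2, 4, 2], [5, 6, 6], [3, 1, 2]],
})

# Stack each group's lists into a 2-D array and average column-wise
s = df.groupby('column_a')['column_b'].apply(lambda x: np.array(x.tolist()).mean(axis=0))
res = pd.DataFrame({'group': s.index, 'avg_list': s.values})
print(res)
```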
Related Topics
How to Escape Curly-Brackets in F-Strings
Difference Between type(obj) and obj.__class__
How to Re Import an Updated Package While in Python Interpreter
Pip Ignores Dependency_Links in Setup.Py
Access Memory Address in Python
Installing Scipy in Python 3.5 on 32-Bit Windows 7 Machine
Builtin Function Not Working with Spyder
Python Pandas: Group Datetime Column into Hour and Minute Aggregations
Compare Two CSV Files and Search for Similar Items
Pandas Concat Generates Nan Values
High Performance Fuzzy String Comparison in Python, Use Levenshtein or Difflib
Is Close() Necessary When Using Iterator on a Python File Object
Upgrade Python in a Virtualenv
Matplotlib: Plotting Numerous Disconnected Line Segments with Different Colors
Why Is the Value of __name__ Changing After Assignment to sys.modules[__name__]