Pandas Groupby and Join Lists

The object dtype is a catch-all that essentially means "not int, float, bool, datetime, or timedelta", so pandas stores each list as a generic Python object. convert_objects (deprecated in later pandas versions in favor of functions like pd.to_numeric) tries to convert a column to one of those concrete dtypes.
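A quick way to confirm this, as a minimal sketch:

```python
import pandas as pd

# A column holding Python lists gets the catch-all object dtype,
# because pandas has no native list dtype.
df = pd.DataFrame({'b': [[1, 2, 3], [2, 5]]})
print(df['b'].dtype)  # object
```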

You want

In [63]: df
Out[63]:
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz

In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
         c                b
a
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]

This groups the DataFrame by the values in column a; see the pandas groupby documentation for more.

The 'sum' aggregation performs a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], which gives [1, 2, 3, 2, 5].
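The session above can be reproduced end to end; this is a minimal sketch with the same toy data:

```python
import pandas as pd

# Column b holds Python lists (object dtype), column c holds strings.
df = pd.DataFrame({'a': [1, 1, 2],
                   'b': [[1, 2, 3], [2, 5], [5, 6]],
                   'c': ['foo', 'bar', 'baz']})

# 'sum' concatenates the lists per group; the join merges the strings.
out = df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
print(out)
```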

How to group dataframe rows into list in pandas groupby

You can do this using groupby to group on the column of interest and then apply list to every group:

In [1]: df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'], 'b': [1, 2, 5, 5, 4, 6]})
        df
Out[1]:
   a  b
0  A  1
1  A  2
2  B  5
3  B  5
4  B  4
5  C  6

In [2]: df.groupby('a')['b'].apply(list)
Out[2]:
a
A       [1, 2]
B    [5, 5, 4]
C          [6]
Name: b, dtype: object

In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
        df1
Out[3]:
   a        new
0  A     [1, 2]
1  B  [5, 5, 4]
2  C        [6]

Pandas groupby() merge different lists of strings

That happens because your Description column holds strings, not lists. You can strip off the surrounding [] with .str, join the pieces per group, and add the brackets back:

 '[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
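A runnable sketch of that one-liner; the column contents here are made up for illustration, assuming Description holds bracket-wrapped strings:

```python
import pandas as pd

# Hypothetical data: Description looks like a list but is really a string.
df = pd.DataFrame({'Fruit': ['apple', 'apple', 'pear'],
                   'Description': ['[red, sweet]', '[round]', '[green]']})

# .str[1:-1] drops the brackets; the groupby joins per fruit; then re-wrap.
joined = '[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
print(joined)
```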

Merge two lists in pandas groupby and apply

Concatenating lists is done by addition, so you can simply apply sum to the relevant column:

In [24]: df
Out[24]:
   make model                year
0  Audi    A3  [1991, 1992, 1993]
1  Audi    A3        [1997, 1998]

In [25]: df.groupby([df.make, df.model]).year.apply(sum)
Out[25]:
make  model
Audi  A3       [1991, 1992, 1993, 1997, 1998]
Name: year, dtype: object

Pandas: groupby column, merge rows of lists into a single column for group?

You can GroupBy and aggregate on the column containing lists with sum to concatenate the lists within the group and on Feature 2 with first:

df.groupby('Groups').agg({'Feature 1':'sum', 'Feature 2':'first'}).reset_index()

    Groups                             Feature 1  Feature 2
0  GROUP A        [abc, def, ghi, jkl, mno, pqr]          1
1  GROUP B            [stu, vwx, yz, xx, yx, zx]          2
2  GROUP C  [text, more, stuff, here, last, one]          3
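A self-contained sketch of that pattern, with a smaller made-up frame of the same shape:

```python
import pandas as pd

# Hypothetical input: lists in 'Feature 1', a scalar in 'Feature 2'.
df = pd.DataFrame({'Groups': ['GROUP A', 'GROUP A', 'GROUP B'],
                   'Feature 1': [['abc', 'def'], ['ghi'], ['stu']],
                   'Feature 2': [1, 1, 2]})

# 'sum' concatenates the lists; 'first' keeps one representative scalar.
out = df.groupby('Groups').agg({'Feature 1': 'sum',
                                'Feature 2': 'first'}).reset_index()
print(out)
```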

Pandas groupby with delimiter join

Alternatively you can do it this way:

In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A    Cat-Tiger
B     Ball-Bat
Name: val, dtype: object

UPDATE: answering question from the comment:

In [2]: df
Out[2]:
  col    val
0   A    Cat
1   A  Tiger
2   A  Panda
3   B   Ball
4   B    Bat
5   B  Mouse
6   B    Egg

In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A       Cat-Tiger-Panda
B    Ball-Bat-Mouse-Egg
Name: val, dtype: object

Finally, to convert the index (or MultiIndex) back into columns:

df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
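Put together, a minimal sketch of the full round trip, using the data from the example above:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})

# agg returns a Series indexed by 'col'; reset_index(name='new') moves the
# index back into a column and names the joined column 'new'.
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
print(df1)
```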

Groupby and append lists and strings

Dynamically build the aggregation dictionary: map every column other than value_1 and list to a string-joining function, and map list to a lambda that flattens the nested lists with a list comprehension:

f1 = lambda x: ', '.join(x.dropna())
# alternative for joining only strings
# f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
f2 = lambda x: [z for y in x for z in y]
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2

df = df.groupby('value_1', as_index=False).agg(d)
print(df)
    value_1                 value_2                value_3  \
0  american  california, nyc, texas         walmart, kmart
1  canadian                 toronto  dunkinDonuts, walmart

                               list
0  [supermarket, connivence, state]
1             [coffee, supermarket]

Explanation:

f1 and f2 are lambda functions.

The first version removes missing values (if any exist) and joins the strings with a separator:

f1 = lambda x: ', '.join(x.dropna())

The alternative keeps only string values (thereby omitting missing values, since NaN is a float) and joins them with the separator:

f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])

A further variant filters out empty strings before joining:

f1 = lambda x: ', '.join([y for y in x if y != '']) 

Function f2 flattens the lists, because the per-group values are themselves lists, and collecting them without flattening would yield nested lists like [['a','b'], ['c']]:

f2 = lambda x: [z for y in x for z in y]
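The pieces above fit together like this; a minimal sketch with a made-up frame of the same shape:

```python
import pandas as pd
import numpy as np

# Hypothetical input: string columns (one with a NaN) plus a list column.
df = pd.DataFrame({'value_1': ['american', 'american', 'canadian'],
                   'value_2': ['california', 'nyc', 'toronto'],
                   'value_3': ['walmart', np.nan, 'dunkinDonuts'],
                   'list': [['supermarket'], ['state'], ['coffee']]})

f1 = lambda x: ', '.join(x.dropna())      # join strings, skipping NaN
f2 = lambda x: [z for y in x for z in y]  # flatten nested lists

# Every column except the key and 'list' gets f1; 'list' gets f2.
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2

out = df.groupby('value_1', as_index=False).agg(d)
print(out)
```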

Pandas groupby and aggregate over multiple lists

You can get the average of the lists within each group in this way:

s = df.groupby("column_a")["column_b"].apply(lambda x: np.array(x.tolist()).mean(axis=0))

pd.DataFrame({'group':s.index, 'avg_list':s.values})

Gives:

   group         avg_list
0      1  [1.5, 3.5, 2.0]
1      2  [5.0, 6.0, 6.0]
2      3  [3.0, 1.0, 2.0]
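A runnable sketch of that averaging trick, assuming the lists within each group have equal length (the data here is made up to match the output above):

```python
import pandas as pd
import numpy as np

# Hypothetical input: each row of column_b is a list of equal length.
df = pd.DataFrame({'column_a': [1, 1, 2],
                   'column_b': [[1, 3, 2], [2, 4, 2], [5, 6, 6]]})

# Stack each group's lists into a 2-D array, then average element-wise.
s = df.groupby('column_a')['column_b'].apply(
    lambda x: np.array(x.tolist()).mean(axis=0))
out = pd.DataFrame({'group': s.index, 'avg_list': s.values})
print(out)
```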

