pandas groupby and join lists
The object dtype is a catch-all that basically means "not int, float, bool, datetime, or timedelta", so pandas is storing each cell as a Python list. convert_objects (deprecated in later pandas versions in favour of infer_objects and to_numeric) tries to convert a column to one of those dtypes.
You want:
In [63]: df
Out[63]:
a b c
0 1 [1, 2, 3] foo
1 1 [2, 5] bar
2 2 [5, 6] baz
In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
c b
a
1 foo bar [1, 2, 3, 2, 5]
2 baz [5, 6]
This groups the data frame by the values in column a. Read more about groupby in the pandas docs. The aggregation on b performs a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], with the result [1, 2, 3, 2, 5].
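A runnable version of the session above, reconstructing the frame it starts from:

```python
import pandas as pd

# Reconstruction of the frame shown in the session above
df = pd.DataFrame({'a': [1, 1, 2],
                   'b': [[1, 2, 3], [2, 5], [5, 6]],
                   'c': ['foo', 'bar', 'baz']})

# 'sum' concatenates the lists in b; the lambda joins the strings in c
out = df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
print(out)
```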
How to group dataframe rows into list in pandas groupby
You can do this by using groupby to group on the column of interest, and then applying list to every group:
In [1]: df = pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
df
Out[1]:
a b
0 A 1
1 A 2
2 B 5
3 B 5
4 B 4
5 C 6
In [2]: df.groupby('a')['b'].apply(list)
Out[2]:
a
A [1, 2]
B [5, 5, 4]
C [6]
Name: b, dtype: object
In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
df1
Out[3]:
a new
0 A [1, 2]
1 B [5, 5, 4]
2 C [6]
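In recent pandas versions, agg(list) is an equivalent spelling of apply(list); a minimal sketch of the same example:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# agg(list) collects each group's values into a list, like apply(list)
df1 = df.groupby('a')['b'].agg(list).reset_index(name='new')
print(df1)
```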
Pandas groupby() merge different lists of strings
That's because your Description column contains strings, not lists. You can strip out the [] and join the pieces back together:
'[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
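The question's data isn't shown, so here is a sketch with hypothetical Fruit/Description values demonstrating the strip-and-join:

```python
import pandas as pd

# Hypothetical sample data: Description holds bracketed strings, not lists
df = pd.DataFrame({
    'Fruit': ['apple', 'apple', 'pear'],
    'Description': ['[red, sweet]', '[crisp]', '[green]'],
})

# Strip the outer brackets, join per fruit, then re-wrap in brackets
merged = '[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
print(merged)
```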
Merge two lists in pandas groupby and apply
Concatenating lists is done by addition, so you can simply apply sum
to the relevant column:
In [24]: df
Out[24]:
make model year
0 Audi A3 [1991, 1992, 1993]
1 Audi A3 [1997, 1998]
In [25]: df.groupby([df.make, df.model]).year.apply(sum)
Out[25]:
make model
Audi A3 [1991, 1992, 1993, 1997, 1998]
Name: year, dtype: object
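A runnable sketch of the same idea; Series.sum is spelled out explicitly so the add-reduction over the lists is visible:

```python
import pandas as pd

df = pd.DataFrame({'make': ['Audi', 'Audi'],
                   'model': ['A3', 'A3'],
                   'year': [[1991, 1992, 1993], [1997, 1998]]})

# Series.sum add-reduces the lists, i.e. [a] + [b] concatenation
out = df.groupby(['make', 'model'])['year'].apply(lambda s: s.sum())
print(out)
```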
Pandas: groupby column, merge rows of lists into a single column for group?
You can GroupBy and aggregate the column containing lists with sum, to concatenate the lists within each group, and the Feature 2 column with first:
df.groupby('Groups').agg({'Feature 1':'sum', 'Feature 2':'first'}).reset_index()
Groups Feature 1 Feature 2
0 GROUP A [abc, def, ghi, jkl, mno, pqr] 1
1 GROUP B [stu, vwx, yz, xx, yx, zx] 2
2 GROUP C [text, more, stuff, here, last, one] 3
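Reconstructing a small input with the shape implied by the output above (the exact source data is an assumption):

```python
import pandas as pd

# Hypothetical input shaped to match the grouped output shown above
df = pd.DataFrame({
    'Groups': ['GROUP A', 'GROUP A', 'GROUP B', 'GROUP B'],
    'Feature 1': [['abc', 'def'], ['ghi'], ['stu'], ['vwx', 'yz']],
    'Feature 2': [1, 1, 2, 2],
})

# 'sum' concatenates the lists per group; 'first' keeps one value per group
out = df.groupby('Groups').agg({'Feature 1': 'sum', 'Feature 2': 'first'}).reset_index()
print(out)
```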
Pandas groupby with delimiter join
Alternatively you can do it this way:
In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
UPDATE: answering the question from the comments:
In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg
In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object
Finally, to convert the index (or MultiIndex) back to columns:
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
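Putting the session together as a self-contained script:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})

# '-'.join is a plain callable, so agg applies it to each group's values
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
print(df1)
```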
Groupby and append lists and strings
Build the aggregation dictionary dynamically: every column except value_1 and list gets a string-joining function, while list gets a lambda that flattens the nested lists with a list comprehension:
f1 = lambda x: ', '.join(x.dropna())
#alternative for join only strings
#f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
f2 = lambda x: [z for y in x for z in y]
d = dict.fromkeys(df.columns.difference(['value_1','list']), f1)
d['list'] = f2
df = df.groupby('value_1', as_index=False).agg(d)
print (df)
value_1 value_2 value_3 \
0 american california, nyc, texas walmart, kmart
1 canadian toronto dunkinDonuts, walmart
list
0 [supermarket, connivence, state]
1 [coffee, supermarket]
Explanation:
f1 and f2 are lambda functions.
The first removes missing values (if any exist) and joins the remaining strings with a separator:
f1 = lambda x: ', '.join(x.dropna())
The alternative keeps only string values (omitting missing values, since NaN is a float) and joins them with a separator:
f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
A further variant filters out empty strings before joining:
f1 = lambda x: ', '.join([y for y in x if y != ''])
Function f2 flattens the lists, because after aggregation you would otherwise get nested lists like [['a','b'], ['c']]:
f2 = lambda x: [z for y in x for z in y]
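The whole recipe as a runnable script; the input frame is a hypothetical reconstruction from the printed output above:

```python
import pandas as pd
import numpy as np

# Hypothetical input reconstructed from the printed output above
df = pd.DataFrame({
    'value_1': ['american', 'american', 'canadian'],
    'value_2': ['california', 'nyc', 'toronto'],
    'value_3': ['walmart', np.nan, 'dunkinDonuts'],
    'list': [['supermarket'], ['connivence'], ['coffee']],
})

f1 = lambda x: ', '.join(x.dropna())      # join strings, skipping NaN
f2 = lambda x: [z for y in x for z in y]  # flatten the nested lists

# every column except value_1 and list -> f1; the list column -> f2
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2
out = df.groupby('value_1', as_index=False).agg(d)
print(out)
```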
Pandas groupby and aggregate over multiple lists
You can get the average of the lists within each group in this way:
s = df.groupby("column_a")["column_b"].apply(lambda x: np.array(x.tolist()).mean(axis=0))
pd.DataFrame({'group':s.index, 'avg_list':s.values})
Gives:
group avg_list
0 1 [1.5, 3.5, 2.0]
1 2 [5.0, 6.0, 6.0]
2 3 [3.0, 1.0, 2.0]
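A self-contained sketch; the input lists are assumptions chosen to reproduce the averages shown:

```python
import pandas as pd
import numpy as np

# Hypothetical input chosen to reproduce the averages shown above
df = pd.DataFrame({
    'column_a': [1, 1, 2, 3],
    'column_b': [[1, 3, 2], [2, 4, 2], [5, 6, 6], [3, 1, 2]],
})

# Stack each group's lists into a 2-D array and average column-wise
s = df.groupby('column_a')['column_b'].apply(lambda x: np.array(x.tolist()).mean(axis=0))
res = pd.DataFrame({'group': s.index, 'avg_list': s.values})
print(res)
```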
Related Topics
How to Escape Curly-Brackets in F-Strings
Difference Between type(obj) and obj.__class__
How to Re Import an Updated Package While in Python Interpreter
Pip Ignores Dependency_Links in Setup.Py
Access Memory Address in Python
Installing Scipy in Python 3.5 on 32-Bit Windows 7 Machine
Builtin Function Not Working with Spyder
Python Pandas: Group Datetime Column into Hour and Minute Aggregations
Compare Two CSV Files and Search for Similar Items
Pandas Concat Generates Nan Values
High Performance Fuzzy String Comparison in Python, Use Levenshtein or Difflib
Is Close() Necessary When Using Iterator on a Python File Object
Upgrade Python in a Virtualenv
Matplotlib: Plotting Numerous Disconnected Line Segments with Different Colors
Why Is the Value of __name__ Changing After Assignment to sys.modules[__name__]