DataFrame Group by Column

Pandas DataFrame Groupby two columns and get counts

Following up on @Andy's answer, you can do the following to solve your second question:

In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
      0
col2
A     3
B     2
C     1
D     3
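As a self-contained sketch of the same pattern (the column names and values here are made up, not the asker's data):

```python
import pandas as pd

# Hypothetical data with the same col2/col5 layout as the question
df = pd.DataFrame({
    'col2': ['A', 'A', 'A', 'B', 'B', 'C'],
    'col5': ['x', 'x', 'y', 'x', 'y', 'x'],
})

# Count rows per (col5, col2) pair, then keep the largest count per col2
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')
result = counts.groupby('col2')['count'].max()
print(result)
```

Passing `name='count'` to `reset_index` gives the size column a readable name instead of the default `0` used above.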

Is there an easy way to group columns in a Pandas DataFrame?

In your case, you basically just need to manipulate the column names.

Starting with your original DataFrame (and a tiny index manipulation):

from io import StringIO  # Python 3; on Python 2 use `from StringIO import StringIO`
import pandas as pd
a = pd.read_csv(StringIO('T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n\
0,1,2,1,3,2,1,4,2,1,5,2,1\n\
1,8,2,3,3,2,9,9,1,3,4,9,1\n\
2,4,5,7,7,7,1,8,3,6,9,2,3'))
a.set_index('T', inplace=True)

So that:

>> a
   Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3

Then simply create a list of tuples for your columns, and use MultiIndex.from_tuples:

a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

>> a
   A        B        C        D
   x  y  z  x  y  z  x  y  z  x  y  z
T
0  1  2  1  3  2  1  4  2  1  5  2  1
1  8  2  3  3  2  9  9  1  3  4  9  1
2  4  5  7  7  7  1  8  3  6  9  2  3
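Once the columns are a MultiIndex, selecting a top-level label returns that whole group, which is usually the point of the exercise. A minimal sketch with a smaller, made-up frame:

```python
from io import StringIO
import pandas as pd

a = pd.read_csv(StringIO(
    'T,Ax,Ay,Az,Bx,By,Bz\n'
    '0,1,2,1,3,2,1\n'
    '1,8,2,3,3,2,9\n'))
a.set_index('T', inplace=True)
a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

sub = a['A']                  # sub-frame with columns x, y, z
row_means = sub.mean(axis=1)  # per-row mean within the A group
print(sub)
print(row_means)
```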

How to Group by column value in Pandas DataFrame

Convert groupby object to dictionary of DataFrames:

d = dict(tuple(df.groupby('App_Name')))

print (d['com.alpha.studio'])
            App_Name        Date     Response     Gross  Revenue
9   com.alpha.studio  2018-10-16  1731.429858  11643154      NaN
11  com.alpha.studio  2018-10-17  2769.373388  13198984      NaN
14  com.alpha.studio  2018-10-18  2784.822039  24217875      NaN

EDIT:

d1 = {}
for k, v in d.items():
    d1[k] = v['Gross Revenue'].rolling(2).mean()
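The whole pattern on a toy frame (the column names and values are stand-ins for the question's data):

```python
import pandas as pd

# Hypothetical stand-in for the App_Name / revenue data above
df = pd.DataFrame({
    'App_Name':      ['a', 'a', 'a', 'b', 'b'],
    'Gross Revenue': [10.0, 20.0, 30.0, 5.0, 15.0],
})

# One DataFrame per app
d = dict(tuple(df.groupby('App_Name')))

# Rolling mean of window 2 per group, as in the EDIT above
d1 = {k: v['Gross Revenue'].rolling(2).mean() for k, v in d.items()}
print(d1['a'])
```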

DataFrame: Group by one column and average other columns

You just need groupby:

data['state'] = data['state'].eq('True')
data.drop('id',axis=1).groupby('group', as_index=False).mean()

Output:

   group     state      value
0      1  0.666667  10.333333
1      2  0.500000   4.000000
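A runnable version, with invented input values chosen so they reproduce the output above:

```python
import pandas as pd

# Hypothetical frame matching the shape implied by the answer
data = pd.DataFrame({
    'id':    [1, 2, 3, 4, 5],
    'group': [1, 1, 1, 2, 2],
    'state': ['True', 'False', 'True', 'True', 'False'],
    'value': [10, 11, 10, 3, 5],
})

# Turn the string column into booleans so mean() yields the share of True rows
data['state'] = data['state'].eq('True')
out = data.drop('id', axis=1).groupby('group', as_index=False).mean()
print(out)
```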

pandas dataframe group columns based on name and apply a function

You can move the columns without a separator to the index, then group the remaining columns with a lambda function and an aggregate function such as max:

m = df.columns.str.contains('_')

df = (df.set_index(df.columns[~m].tolist())
.groupby(lambda x: x.split('_')[0], axis=1)
.max()
.reset_index())
print (df)
   A  B  C  D  E  K
0  a  2  r  4  6  9
1  e  g  1  d  8  7

Solution with custom function:

def rms(x):
    return np.sqrt(np.sum(x**2, axis=1)/len(x.columns))

m = df.columns.str.contains('_')

df1 = (df.set_index(df.columns[~m].tolist())
.groupby(lambda x: x.split('_')[0], axis=1)
.agg(rms)
.reset_index())
print (df1)
   A  B  C  D         E         K
0  a  2  r  4  3.915780  5.972158
1  e  g  1  d  5.567764  4.690416
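Note that `groupby(..., axis=1)` is deprecated in recent pandas; the same prefix grouping can be done on the transpose instead. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical frame: a plain column plus suffixed numeric groups
df = pd.DataFrame({
    'A':   ['a', 'e'],
    'E_1': [3.0, 4.0],
    'E_2': [4.5, 6.5],
    'K_1': [5.0, 3.0],
    'K_2': [6.5, 6.0],
})

m = df.columns.str.contains('_')

# Same idea as above, but grouping the transposed rows by prefix
# avoids the deprecated axis=1 form of groupby
df1 = (df.set_index(df.columns[~m].tolist())
         .T
         .groupby(lambda x: x.split('_')[0])
         .max()
         .T
         .reset_index())
print(df1)
```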

Pandas Dataframe, how to group columns together in Python


groupby / concat hack

m = {'A': 'AB', 'B': 'AB', 'C': 'CD', 'D': 'CD'}
pd.concat(dict((*df.groupby(m, axis=1),)), axis=1)

         AB          CD
          A    B      C     D
Index
1      0.25  0.3   0.25  0.66
2      0.25  0.3   0.25  0.66
3      0.25  0.3   0.25  0.66

Note that this method makes it possible to select an arbitrary subset of the columns in the original DataFrame, whereas the alternative answer appears to require a valid dictionary mapping for every column in the parent DataFrame.
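An alternative that avoids the hack is to build a MultiIndex from the same mapping; unlike the hack, this form needs an entry in the mapping for every column you keep. The sample values below are made up:

```python
import pandas as pd

df = pd.DataFrame({'A': [0.25] * 3, 'B': [0.3] * 3,
                   'C': [0.25] * 3, 'D': [0.66] * 3},
                  index=pd.Index([1, 2, 3], name='Index'))

m = {'A': 'AB', 'B': 'AB', 'C': 'CD', 'D': 'CD'}

# Prepend each column's group label as a new top level
df.columns = pd.MultiIndex.from_tuples([(m[c], c) for c in df.columns])
print(df)
```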

Pandas Groupby and Sum Only One Column

The only way to do this would be to include C in your groupby (the groupby function can accept a list).

Give this a try:

df.groupby(['A','C'])['B'].sum()

One other thing to note: if you need to work with df after the aggregation, you can also use the as_index=False option to get back a DataFrame object. This one tripped me up when I was first working with Pandas. Example:

df.groupby(['A','C'], as_index=False)['B'].sum()
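Both forms side by side, on a tiny made-up frame:

```python
import pandas as pd

# Hypothetical data with grouping keys A, C and a value column B
df = pd.DataFrame({'A': ['x', 'x', 'y'],
                   'C': ['c1', 'c1', 'c2'],
                   'B': [1, 2, 3]})

# Series result, indexed by the (A, C) pairs
s = df.groupby(['A', 'C'])['B'].sum()

# DataFrame result, grouping keys kept as ordinary columns
flat = df.groupby(['A', 'C'], as_index=False)['B'].sum()
print(flat)
```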

Pandas dataframe group by 10 min intervals with different actions on other columns

Your approach is correct: dataframe.resample("10min").agg() does the calculations for you.
You may get more rows than you expect because resample keeps stepping forward 10 minutes at a time and computes the requested aggregations for each interval; if an interval contains no data, it produces a NULL row. If your data is not continuous, that is where the NULL rows come from.

You can simply delete the NULL rows by using dataframe.dropna()
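To see both effects, here is a sketch with an invented series that has a half-hour gap:

```python
import pandas as pd

# Hypothetical time series: two points in the first 10 minutes,
# then nothing until minute 31
idx = pd.to_datetime(['2020-01-01 00:01', '2020-01-01 00:05',
                      '2020-01-01 00:31'])
df = pd.DataFrame({'value': [1.0, 3.0, 5.0]}, index=idx)

agg = df.resample('10min').agg({'value': 'mean'})
print(agg)            # the empty 00:10 and 00:20 buckets come out as NaN

clean = agg.dropna()  # drop the empty intervals
print(clean)
```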

Pandas - dataframe groupby - how to get sum of multiple columns

By using apply

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x : x.astype(int).sum())
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
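A self-contained version of the agg form (the string-typed columns are an assumption carried over from the apply example above, so they are cast first):

```python
import pandas as pd

# Hypothetical data; col3/col4 start out as strings, as implied above
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                   'col2': ['c', 'd', 'd', 'e'],
                   'col3': ['1', '1', '1', '2'],
                   'col4': ['2', '2', '2', '4']})

# Cast the string columns once, then aggregate both with a dict
df[['col3', 'col4']] = df[['col3', 'col4']].astype(int)
out = df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'sum'})
print(out)
```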

Group by a specific column, list the other columns Pandas

Use:

df.groupby('Route', as_index=False).agg(list)

Output:

  Route       Station      Position
0    A1  [X1, X2, X3]  [P1, P2, P3]
1    B2      [Y1, Y2]      [P1, P2]
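A runnable sketch with input rows reconstructed from the output above:

```python
import pandas as pd

df = pd.DataFrame({'Route':    ['A1', 'A1', 'A1', 'B2', 'B2'],
                   'Station':  ['X1', 'X2', 'X3', 'Y1', 'Y2'],
                   'Position': ['P1', 'P2', 'P3', 'P1', 'P2']})

# Collect every other column into a list per Route
out = df.groupby('Route', as_index=False).agg(list)
print(out)
```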

