Pandas DataFrame Groupby two columns and get counts
Following @Andy's answer, you can do the following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
      0
col2
A     3
B     2
C     1
D     3
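A minimal runnable sketch of the same two-step groupby, using small hypothetical data (the column values below are made up, so the counts differ from the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'col2': ['A', 'A', 'A', 'B', 'B', 'C'],
    'col5': ['x', 'x', 'y', 'x', 'y', 'x'],
})

# Count rows per (col5, col2) pair, then take the largest count per col2.
counts = df.groupby(['col5', 'col2']).size().reset_index(name='n')
result = counts.groupby('col2')['n'].max()
print(result)
```

Naming the size column with `reset_index(name='n')` avoids the slightly awkward integer column label `0` used in the answer above.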
Is there an easy way to group columns in a Pandas DataFrame?
In your case, you basically just need to manipulate the column names.
Starting with your original DataFrame (and a tiny index manipulation):
from io import StringIO  # on Python 2: from StringIO import StringIO

import numpy as np
import pandas as pd

a = pd.read_csv(StringIO('T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
                         '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
                         '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
                         '2,4,5,7,7,7,1,8,3,6,9,2,3'))
a.set_index('T', inplace=True)
So that:
>> a
   Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3
Then simply create a list of tuples for your columns and use MultiIndex.from_tuples:
a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])
>> a
   A        B        C        D
   x  y  z  x  y  z  x  y  z  x  y  z
T
0  1  2  1  3  2  1  4  2  1  5  2  1
1  8  2  3  3  2  9  9  1  3  4  9  1
2  4  5  7  7  7  1  8  3  6  9  2  3
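With the MultiIndex columns in place, selecting a whole group is just top-level indexing. A self-contained sketch with a trimmed-down version of the same data:

```python
from io import StringIO

import pandas as pd

a = pd.read_csv(StringIO('T,Ax,Ay,Az,Bx,By,Bz\n'
                         '0,1,2,1,3,2,1\n'
                         '1,8,2,3,3,2,9'))
a.set_index('T', inplace=True)
a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

# The top-level label selects all of that group's sub-columns at once.
print(a['A'])        # sub-columns x, y, z of group A
print(a['A', 'x'])   # a single column, returned as a Series
```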
How to Group by column value in Pandas Data frame
Convert the groupby object to a dictionary of DataFrames:
d = dict(tuple(df.groupby('App_Name')))
print (d['com.alpha.studio'])
App_Name Date Response Gross Revenue
9 com.alpha.studio 2018-10-16 1731.429858 11643154 NaN
11 com.alpha.studio 2018-10-17 2769.373388 13198984 NaN
14 com.alpha.studio 2018-10-18 2784.822039 24217875 NaN
EDIT:
d1 = {}
for k, v in d.items():
    d1[k] = v['Gross Revenue'].rolling(2).mean()
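Putting the two steps together as a runnable sketch, with hypothetical app names and revenue figures:

```python
import pandas as pd

df = pd.DataFrame({
    'App_Name': ['app1', 'app1', 'app1', 'app2', 'app2'],
    'Gross Revenue': [10.0, 20.0, 30.0, 4.0, 6.0],
})

# Split into one DataFrame per app, then compute a window-2 rolling
# mean of 'Gross Revenue' within each piece.
d = dict(tuple(df.groupby('App_Name')))
d1 = {k: v['Gross Revenue'].rolling(2).mean() for k, v in d.items()}
print(d1['app1'])  # first value is NaN: the window is not yet full
```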
DataFrame: Group by one column and average other columns
You just need groupby:
data['state'] = data['state'].eq('True')
data.drop('id',axis=1).groupby('group', as_index=False).mean()
Output:
   group     state      value
0      1  0.666667  10.333333
1      2  0.500000   4.000000
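A self-contained version of the steps above, with hypothetical input data chosen to be consistent with the output shown:

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'group': [1, 1, 1, 2, 2],
    'state': ['True', 'False', 'True', 'True', 'False'],
    'value': [10, 11, 10, 4, 4],
})

# Turn the string flag into a boolean so it averages numerically,
# then drop the id column and take per-group means.
data['state'] = data['state'].eq('True')
out = data.drop('id', axis=1).groupby('group', as_index=False).mean()
print(out)
```

The mean of a boolean column is the fraction of `True` rows, which is why `state` comes out as 0.666667 and 0.5.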
pandas dataframe group columns based on name and apply a function
You can set the columns without a separator as the index, then group the remaining columns with a lambda that keys on the prefix before the underscore, aggregating with a function like max:
m = df.columns.str.contains('_')
df = (df.set_index(df.columns[~m].tolist())
        .groupby(lambda x: x.split('_')[0], axis=1)
        .max()
        .reset_index())
print (df)
   A  B  C  D  E  K
0  a  2  r  4  6  9
1  e  g  1  d  8  7
Solution with custom function:
import numpy as np

def rms(x):
    return np.sqrt(np.sum(x**2, axis=1) / len(x.columns))

m = df.columns.str.contains('_')
df1 = (df.set_index(df.columns[~m].tolist())
         .groupby(lambda x: x.split('_')[0], axis=1)
         .agg(rms)
         .reset_index())
print (df1)
   A  B  C  D         E         K
0  a  2  r  4  3.915780  5.972158
1  e  g  1  d  5.567764  4.690416
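Note that `DataFrame.groupby(axis=1)` is deprecated in recent pandas. A self-contained sketch of the same prefix grouping via a transpose instead, with hypothetical input data chosen to reproduce the `max` output above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'e'], 'B': [2, 'g'], 'C': ['r', 1], 'D': [4, 'd'],
    'E_1': [6, 8], 'E_2': [5, 3],
    'K_1': [9, 7], 'K_2': [2, 4],
})

# Columns without '_' become the index; the remaining columns are
# grouped by their prefix on the transposed frame, so no axis=1 needed.
m = df.columns.str.contains('_')
out = (df.set_index(df.columns[~m].tolist())
         .T
         .groupby(lambda x: x.split('_')[0])
         .max()
         .T
         .reset_index())
print(out)
```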
Pandas Dataframe, how to group columns together in Python
groupby / concat hack
m = {'A': 'AB', 'B': 'AB', 'C': 'CD', 'D': 'CD'}
pd.concat(dict(tuple(df.groupby(m, axis=1))), axis=1)
         AB          CD
          A    B      C     D
Index
1      0.25  0.3   0.25  0.66
2      0.25  0.3   0.25  0.66
3      0.25  0.3   0.25  0.66
Note that this method lets you select an arbitrary subset of the columns in the original DataFrame, whereas the alternative answer appears to require a valid dictionary mapping for every column in the parent DataFrame.
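On recent pandas, where `DataFrame.groupby(axis=1)` is deprecated, the same regrouping can be achieved by building the MultiIndex columns directly. This is a different technique from the answer's hack, shown as a sketch with the data from the output above:

```python
import pandas as pd

df = pd.DataFrame({'A': [0.25, 0.25, 0.25], 'B': [0.3, 0.3, 0.3],
                   'C': [0.25, 0.25, 0.25], 'D': [0.66, 0.66, 0.66]},
                  index=pd.Index([1, 2, 3], name='Index'))

m = {'A': 'AB', 'B': 'AB', 'C': 'CD', 'D': 'CD'}
# Prepend the mapped group label to each column name, producing
# two-level columns equivalent to the concat result.
df.columns = pd.MultiIndex.from_tuples([(m[c], c) for c in df.columns])
print(df)
```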
Pandas Groupby and Sum Only One Column
The only way to do this would be to include C in your groupby (the groupby function can accept a list).
Give this a try:
df.groupby(['A','C'])['B'].sum()
One other thing to note: if you need to work with df after the aggregation, you can use the as_index=False option to return a DataFrame object. This one gave me problems when I was first working with Pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
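A runnable sketch of both variants, with small hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b'],
                   'C': ['x', 'x', 'y'],
                   'B': [1, 2, 3]})

# Default: A and C become the index of the result.
s = df.groupby(['A', 'C'])['B'].sum()

# as_index=False keeps A and C as regular columns in a DataFrame.
out = df.groupby(['A', 'C'], as_index=False)['B'].sum()
print(out)
```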
Pandas dataframe group by 10 min intervals with different actions on other columns
Your approach is correct: dataframe.resample("10min").agg() does the calculations for you.
You might get more output rows than you expect, because resample keeps adding 10 minutes to the time and performing the requested calculations; if a 10 min interval contains no data, it produces a NULL row. Your data is probably not continuous, which causes those NULL rows.
You can simply delete the NULL rows using dataframe.dropna()
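A sketch of the resample-then-dropna pattern, with hypothetical timestamps that include a gap so that empty bins appear:

```python
import pandas as pd

idx = pd.to_datetime(['2021-01-01 00:01', '2021-01-01 00:05',
                      '2021-01-01 00:31'])  # note the 20-minute gap
df = pd.DataFrame({'value': [1.0, 3.0, 5.0]}, index=idx)

# Aggregate into 10-minute bins; bins with no data come back as NaN rows.
out = df.resample('10min').agg({'value': 'mean'})
print(out)  # 4 rows: 00:00, 00:10 (NaN), 00:20 (NaN), 00:30

# Drop the NaN rows produced by the empty intervals.
out = out.dropna()
print(out)
```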
Pandas - dataframe groupby - how to get sum of multiple columns
By using apply:
df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
If you want to use agg:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
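A self-contained version of the agg variant, with hypothetical string-valued input chosen to be consistent with the output above (the values are cast to int first, matching the astype in the apply variant):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b'],
                   'col2': ['c', 'c', 'd', 'd', 'e'],
                   'col3': ['1', '1', '1', '1', '2'],
                   'col4': ['2', '2', '2', '2', '4']})

# Cast the string columns to int, then sum both columns per group.
out = (df.astype({'col3': int, 'col4': int})
         .groupby(['col1', 'col2'])
         .agg({'col3': 'sum', 'col4': 'sum'}))
print(out)
```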
Group by a specific column, list the other columns Pandas
Use:
df.groupby('Route', as_index=False).agg(list)
Output:
  Route       Station      Position
0    A1  [X1, X2, X3]  [P1, P2, P3]
1    B2      [Y1, Y2]      [P1, P2]
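A runnable sketch, reconstructing input data consistent with the output above:

```python
import pandas as pd

df = pd.DataFrame({'Route': ['A1', 'A1', 'A1', 'B2', 'B2'],
                   'Station': ['X1', 'X2', 'X3', 'Y1', 'Y2'],
                   'Position': ['P1', 'P2', 'P3', 'P1', 'P2']})

# Collect every non-key column into a list per Route.
out = df.groupby('Route', as_index=False).agg(list)
print(out)
```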