Aggregate a dataframe on a given column and display another column
First, split the data using split:
split(z,z$Group)
Then, for each chunk, select the row with the max Score:
lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),])
Finally, reduce back to a data.frame with do.call and rbind:
do.call(rbind,lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),]))
Result:
Group Score Info
1 1 3 c
2 2 4 d
One line, no magic spells, fast, and the result has good names =)
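For comparison, the same split-apply-combine idea in pandas is groupby plus idxmax. This is a sketch with hypothetical data mirroring the result above:

```python
import pandas as pd

# Hypothetical data shaped like the R example's input
z = pd.DataFrame({'Group': [1, 1, 2, 2],
                  'Score': [1, 3, 2, 4],
                  'Info': ['a', 'c', 'b', 'd']})

# One row per group: the row holding the group's max Score
best = z.loc[z.groupby('Group')['Score'].idxmax()]
print(best)
```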
Aggregate column in dataframe by values of another column?
Not sure what you want to do with the other columns, but this should answer your query:
data.groupby('product_code')['value'].sum()
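A minimal runnable version of that one-liner, with hypothetical data using the column names from the answer:

```python
import pandas as pd

# Hypothetical data; column names taken from the snippet above
data = pd.DataFrame({'product_code': ['A', 'A', 'B'],
                     'value': [10, 5, 7]})

# Sum 'value' within each product_code
totals = data.groupby('product_code')['value'].sum()
print(totals)
```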
How to aggregate one column based on another column in Pandas
First aggregate min and max into df1 and rename with DataFrame.add_suffix, then pivot with DataFrame.pivot and DataFrame.add_prefix, and finally join everything together with concat:
df1 = df.groupby('fruit')['year'].agg(['min','max']).add_suffix('_year')
df2 = df.pivot(index='fruit', columns='year', values='sales').add_prefix('sales_')
df = pd.concat([df1, df2], axis=1)
print (df)
min_year max_year sales_2010 sales_2011
fruit
Apple 2010 2011 10 20
Banans 2010 2011 50000 30
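The snippet above assumes an existing df; a self-contained version, with a hypothetical input reconstructed from the printed output, looks like this:

```python
import pandas as pd

# Hypothetical input consistent with the printed result above
df = pd.DataFrame({'fruit': ['Apple', 'Apple', 'Banans', 'Banans'],
                   'year': [2010, 2011, 2010, 2011],
                   'sales': [10, 20, 50000, 30]})

# min/max year per fruit, then one sales column per year, joined side by side
df1 = df.groupby('fruit')['year'].agg(['min', 'max']).add_suffix('_year')
df2 = df.pivot(index='fruit', columns='year', values='sales').add_prefix('sales_')
out = pd.concat([df1, df2], axis=1)
print(out)
```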
Pandas DataFrame: aggregate a column depending on values in another column
First replace non-matching rows with missing values using Series.where, then pass the helper columns to agg:
grouped_orders = (
orders
.assign(cash = orders['order_price'].where(orders['payment_type'] == 'cash'),
card = orders['order_price'].where(orders['payment_type'] == 'card'))
.groupby('driver_uuid')
.agg(
cash_order_price_sum=('cash', 'sum'),
card_order_price_sum=('card', 'sum'),
bonus_payment_sum=('bonus_payment', 'sum')
)
)
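To see the pattern end to end, here is the same pipeline on hypothetical orders data matching the column names used above:

```python
import pandas as pd

# Hypothetical orders; column names taken from the answer above
orders = pd.DataFrame({'driver_uuid': ['d1', 'd1', 'd2'],
                       'payment_type': ['cash', 'card', 'cash'],
                       'order_price': [10.0, 20.0, 5.0],
                       'bonus_payment': [1.0, 0.0, 2.0]})

grouped_orders = (
    orders
    # keep order_price only where the payment type matches, else NaN
    .assign(cash=orders['order_price'].where(orders['payment_type'] == 'cash'),
            card=orders['order_price'].where(orders['payment_type'] == 'card'))
    .groupby('driver_uuid')
    .agg(
        cash_order_price_sum=('cash', 'sum'),
        card_order_price_sum=('card', 'sum'),
        bonus_payment_sum=('bonus_payment', 'sum')
    )
)
print(grouped_orders)
```

Note that sum skips the NaNs, so a driver with no card orders simply gets 0.0.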
Aggregate a dataframe column based on a hierarichal condition from another column
Very tricky question, especially with the structure of your data (because your grouper, which is really the parts "A,B", "X,Y", etc., is not in a separate column). But I think you can do:
import functools

df.sort_values(by='Samples', inplace=True, ignore_index=True)
# grouper containing the groupby keys ('A,B', 'X,Y', etc.)
g = df['Category'].str.extract("(.*),+")[0]
# create a column that keeps sample and category together
df['sample_category'] = list(zip(df['Samples'], df['Category']))
Then use functools.reduce to reduce the list, iteratively merging in the next tuple while the accumulated sample count is less than 5:
df2 = df.groupby(g, as_index=False).agg(
{'sample_category': lambda s:
functools.reduce(lambda x, y: (x[0] + y[0], y[1]) if x[0] < 5 else (x, y), s)})
Then do some munging to modify the elements to a list type:
df2['sample_category'] = df2['sample_category'].apply(
lambda x: [x] if isinstance(x[0], int) else list(x))
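The reduce-and-munge logic can be sanity-checked on plain tuples (hypothetical values for one group):

```python
import functools

# Hypothetical (sample, category) tuples for one group
pairs = [(2, 'A,B,1'), (3, 'A,B,2'), (6, 'A,B,3')]

# Merge the next tuple in while the accumulated sample count is under 5
merged = functools.reduce(
    lambda x, y: (x[0] + y[0], y[1]) if x[0] < 5 else (x, y), pairs)
# merged is now ((5, 'A,B,2'), (6, 'A,B,3')): the first two rows fused,
# the third kept separate once the threshold was reached

# Normalize to a list of tuples, as in the munging step above
rows = [merged] if isinstance(merged[0], int) else list(merged)
```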
Then explode, extract the columns, and finally drop the intermediate column 'sample_category':
df2 = df2.explode('sample_category', ignore_index=True)
df2['Sample'] = df2['sample_category'].str[0]
df2['Category'] = df2['sample_category'].str[1]
df2.drop('sample_category', axis=1, inplace=True)
print(df2)
Sample Category
0 10 A,B,123
1 4 L,M,456
2 5 P,Q,789
3 8 S,T,123
4 5 S,T,456
5 9 X,Y,456
6 18 X,Y,123
Resample and aggregate data according to another column value
One way is to create the columns Volume_B and Volume_S first with np.where, as you did, then aggregate:
res = (
df.assign(Volume_B= lambda x: np.where(x['Type']=='B', x['Volume'], 0),
Volume_S= lambda x: np.where(x['Type']=='S', x['Volume'], 0))
.groupby(df['Time']) # you can replace by resample here
[['Volume_B','Volume_S']].sum()
.reset_index()
)
print(res)
Time Volume_B Volume_S
0 09:25:00 400 253
Edit: with your input like that (and aggregating on the Time column), you could also use pivot_table:
(df.pivot_table(index='Time', columns='Type',
                values='Volume', aggfunc='sum')
.add_prefix('Volume_')
.reset_index()
.rename_axis(columns=None)
)
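A self-contained sketch of the pivot_table variant, with a hypothetical input consistent with the printed result above:

```python
import pandas as pd

# Hypothetical trades; two 'B' rows sum to 400, one 'S' row gives 253
df = pd.DataFrame({'Time': ['09:25:00'] * 3,
                   'Type': ['B', 'S', 'B'],
                   'Volume': [300, 253, 100]})

# One column per Type, volumes summed within each Time
res = (df.pivot_table(index='Time', columns='Type',
                      values='Volume', aggfunc='sum')
         .add_prefix('Volume_')
         .reset_index()
         .rename_axis(columns=None))
print(res)
```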
How to aggregate a column by a value on another column?
You can use DataFrameGroupBy.idxmax to get the indices of the max values of C, then select column D with loc:
# ensure a unique index
df.reset_index(drop=True, inplace=True)
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax'])
df1['idxmax'] = df.loc[df1['idxmax'], 'D'].values
df1 = df1.rename(columns={'idxmax':'D','sum':'C'}).reset_index()
A similar solution with map:
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax']).reset_index()
df1['idxmax'] = df1['idxmax'].map(df['D'])
df1 = df1.rename(columns={'idxmax':'D','sum':'C'})
print (df1)
A B C D
0 x a 101 v
1 y b 1010 w
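A runnable version of the map variant, with a hypothetical input reconstructed to be consistent with the printed result:

```python
import pandas as pd

# Hypothetical input: per (A, B) group, C sums to 101 / 1010 and the
# max-C row carries D = 'v' / 'w'
df = pd.DataFrame({'A': ['x', 'x', 'y', 'y'],
                   'B': ['a', 'a', 'b', 'b'],
                   'C': [100, 1, 1000, 10],
                   'D': ['v', 'z', 'w', 'u']})

df1 = df.groupby(['A', 'B'])['C'].agg(['sum', 'idxmax']).reset_index()
# look up D at the index of each group's max C
df1['idxmax'] = df1['idxmax'].map(df['D'])
df1 = df1.rename(columns={'idxmax': 'D', 'sum': 'C'})
print(df1)
```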
Aggregate contents of a column based on the range of values in another column in Pandas
The most intuitive approach would be to filter and then aggregate. To solve your specific problem, I would do this:
>>> df = pd.DataFrame({"min": [1, 0, 6, 3],
...                    "max": [5, 5, 8, 4],
...                    "value": [['a','b'], ['d'], ['a','c'], ['e','a']]})
>>> print(df)
min max value
0 1 5 [a, b]
1 0 5 [d]
2 6 8 [a, c]
3 3 4 [e, a]
>>> sum_filtered_values = df[(df["max"]<=5) & (df["min"]>=0)].value.sum()
>>> print(sum_filtered_values)
['a', 'b', 'd', 'e', 'a']
>>> sum_filtered_values = df[(df["max"]<=10) & (df["min"]>=5)].value.sum()
>>> print(sum_filtered_values)
['a', 'c']
Is it possible to groupby-aggregate a column based on another column?
You can sort the values by product and date, then take the last row in each group:
df.sort_values(['product', 'date']).groupby('product').tail(1)
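A quick demonstration with hypothetical data: after sorting, tail(1) keeps the chronologically last row per product.

```python
import pandas as pd

# Hypothetical data; dates as ISO strings so lexical sort equals date sort
df = pd.DataFrame({'product': ['a', 'a', 'b'],
                   'date': ['2021-01-01', '2021-02-01', '2021-01-15'],
                   'price': [1, 2, 3]})

latest = df.sort_values(['product', 'date']).groupby('product').tail(1)
print(latest)
```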