Aggregate a Dataframe on a Given Column and Display Another Column

First, you split the data using split:

split(z,z$Group)

Then, for each chunk, select the row with the max Score:

lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),])

Finally, reduce back to a data.frame by do.call-ing rbind:

do.call(rbind,lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),]))

Result:

  Group Score Info
1     1     3    c
2     2     4    d

One line, no magic spells, fast, and the result has good names =)

Aggregate column in dataframe by values of another column?

Not sure what you want to do with the other columns, but this should solve your query:

data.groupby('product_code')['value'].sum()
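For context, here is a minimal sketch with made-up data (only the column names product_code and value come from the question):

import pandas as pd

# hypothetical sample data; only the column names come from the question
data = pd.DataFrame({'product_code': ['A', 'A', 'B'],
                     'value': [10, 20, 5]})

print(data.groupby('product_code')['value'].sum())
# product_code
# A    30
# B     5
# Name: value, dtype: int64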

How to aggregate one column based on another column in Pandas

First aggregate min and max into df1 with DataFrame.add_suffix, then pivot with DataFrame.pivot and DataFrame.add_prefix, and finally join them together with concat:

df1 = df.groupby('fruit')['year'].agg(['min','max']).add_suffix('_year')
df2 = df.pivot(index='fruit', columns='year', values='sales').add_prefix('sales_')

df = pd.concat([df1, df2], axis=1)
print (df)
        min_year  max_year  sales_2010  sales_2011
fruit
Apple       2010      2011          10          20
Banans      2010      2011       50000          30
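If you want to reproduce this, here is a minimal sketch of the assumed long-format input, reconstructed from the output above:

import pandas as pd

# assumed input, reconstructed from the output above
df = pd.DataFrame({'fruit': ['Apple', 'Apple', 'Banans', 'Banans'],
                   'year': [2010, 2011, 2010, 2011],
                   'sales': [10, 20, 50000, 30]})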

Pandas DataFrame: aggregate a column depending on values in another column

First, replace non-matching rows with missing values using Series.where, then pass the helper columns to agg:

grouped_orders = (
    orders
    .assign(cash=orders['order_price'].where(orders['payment_type'] == 'cash'),
            card=orders['order_price'].where(orders['payment_type'] == 'card'))
    .groupby('driver_uuid')
    .agg(
        cash_order_price_sum=('cash', 'sum'),
        card_order_price_sum=('card', 'sum'),
        bonus_payment_sum=('bonus_payment', 'sum')
    )
)
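To try it out, here is a minimal sketch of a hypothetical orders frame (the column names are from the question; the values are illustrative):

import pandas as pd

# hypothetical input; column names are from the question, values are made up
orders = pd.DataFrame({'driver_uuid': ['d1', 'd1', 'd2'],
                       'payment_type': ['cash', 'card', 'cash'],
                       'order_price': [10.0, 20.0, 30.0],
                       'bonus_payment': [1.0, 0.0, 2.0]})

With this input, d1 gets cash 10.0, card 20.0, bonus 1.0, and d2 gets cash 30.0, card 0.0, bonus 2.0 (an all-NaN helper column sums to 0.0).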

Aggregate a dataframe column based on a hierarchical condition from another column

Very tricky question, especially with the structure of your data, because your grouper (which is really the parts "A,B", "X,Y", etc.) is not in a separate column. But I think you can do:

df.sort_values(by='Samples', inplace=True, ignore_index=True)
# grouper containing the groupby keys ('A,B', 'X,Y', etc.)
g = df['Category'].str.extract("(.*),+")[0]
# create a column to keep the sample and its category together
df['sample_category'] = list(zip(df['Samples'], df['Category']))

Then use functools.reduce to fold the list, merging in the next tuple while the accumulated sample count is less than 5:

import functools

df2 = df.groupby(g, as_index=False).agg(
    {'sample_category': lambda s:
        functools.reduce(lambda x, y: (x[0] + y[0], y[1]) if x[0] < 5 else (x, y), s)})

Then do some munging to modify the elements to a list type:

df2['sample_category'] = df2['sample_category'].apply(
    lambda x: [x] if isinstance(x[0], int) else list(x))

Then explode, extract the columns, and finally drop the intermediate column 'sample_category':

df2 = df2.explode('sample_category', ignore_index=True)
df2['Sample'] = df2['sample_category'].str[0]
df2['Category'] = df2['sample_category'].str[1]
df2.drop('sample_category', axis=1, inplace=True)

print(df2)

   Sample Category
0      10  A,B,123
1       4  L,M,456
2       5  P,Q,789
3       8  S,T,123
4       5  S,T,456
5       9  X,Y,456
6      18  X,Y,123
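To see what the reduce step is doing, here is the same folding logic on a plain list of (sample, category) tuples with illustrative values; note how it nests a tuple once the running count reaches 5, which is exactly why the isinstance check above is needed:

import functools

# illustrative tuples for a single group; the categories are made up
pairs = [(3, 'X,Y,111'), (2, 'X,Y,222'), (9, 'X,Y,333')]
merged = functools.reduce(
    lambda x, y: (x[0] + y[0], y[1]) if x[0] < 5 else (x, y), pairs)
print(merged)
# ((5, 'X,Y,222'), (9, 'X,Y,333'))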

Resample and aggregate data according to another column value

One way is to create the columns Volume_B (and Volume_S) first with np.where, as you did, and then aggregate:

import numpy as np

res = (
    df.assign(Volume_B=lambda x: np.where(x['Type'] == 'B', x['Volume'], 0),
              Volume_S=lambda x: np.where(x['Type'] == 'S', x['Volume'], 0))
      .groupby(df['Time'])  # you can replace this with resample here
      [['Volume_B', 'Volume_S']].sum()
      .reset_index()
)
print(res)
       Time  Volume_B  Volume_S
0  09:25:00       400       253

Edit: with your input like that (and aggregating on the Time column), you could also do a pivot_table:

(df.pivot_table(index='Time', columns='Type',
                values='Volume', aggfunc='sum')
   .add_prefix('Volume_')
   .reset_index()
   .rename_axis(columns=None)
)
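A minimal sketch of an input that reproduces the aggregated row above (the individual trade values are illustrative):

import pandas as pd

# assumed input, reconstructed from the aggregated output above
df = pd.DataFrame({'Time': ['09:25:00'] * 4,
                   'Type': ['B', 'S', 'B', 'S'],
                   'Volume': [300, 200, 100, 53]})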

How to aggregate a column by a value on another column?

You can use DataFrameGroupBy.idxmax to get the indices of the max values of C, then select with loc:

# ensure a unique index so the idxmax labels map back to rows
df.reset_index(drop=True, inplace=True)
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax'])
df1['idxmax'] = df.loc[df1['idxmax'], 'D'].values
df1 = df1.rename(columns={'idxmax':'D','sum':'C'}).reset_index()

Similar solution with map:

df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax']).reset_index()
df1['idxmax'] = df1['idxmax'].map(df['D'])
df1 = df1.rename(columns={'idxmax':'D','sum':'C'})

print (df1)
   A  B     C  D
0  x  a   101  v
1  y  b  1010  w
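If you want to try it, here is a minimal input reconstructed from that output:

import pandas as pd

# assumed input, reconstructed from the output above
df = pd.DataFrame({'A': ['x', 'x', 'y'],
                   'B': ['a', 'a', 'b'],
                   'C': [1, 100, 1010],
                   'D': ['u', 'v', 'w']})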

Aggregate contents of a column based on the range of values in another column in Pandas

The most intuitive approach would be to filter and then aggregate. To solve your specific problem, I would do this:

>>> df = pd.DataFrame({"min": [1, 0, 6, 3],
...                    "max": [5, 5, 8, 4],
...                    "value": [['a','b'], ['d'], ['a','c'], ['e','a']]})

>>> print(df)
   min  max   value
0    1    5  [a, b]
1    0    5     [d]
2    6    8  [a, c]
3    3    4  [e, a]

>>> sum_filtered_values = df[(df["max"]<=5) & (df["min"]>=0)].value.sum()
>>> print(sum_filtered_values)
['a', 'b', 'd', 'e', 'a']

>>> sum_filtered_values = df[(df["max"]<=10) & (df["min"]>=5)].value.sum()
>>> print(sum_filtered_values)
['a', 'c']

Is it possible to groupby-aggregate a column based on another column?

You can sort the values by product and date and then take the last row in each group:

df.sort_values(['product', 'date']).groupby('product').tail(1)
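For example, with a hypothetical frame (only the product and date columns are from the question):

import pandas as pd

# hypothetical data; only 'product' and 'date' come from the question
df = pd.DataFrame({'product': ['a', 'a', 'b'],
                   'date': ['2020-01-01', '2020-03-01', '2020-02-15'],
                   'price': [10, 12, 7]})

print(df.sort_values(['product', 'date']).groupby('product').tail(1))
#   product        date  price
# 1       a  2020-03-01     12
# 2       b  2020-02-15      7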

