Calculate Max Value Across Multiple Columns by Multiple Groups - ITCodar

# Calculate Max Value Across Multiple Columns by Multiple Groups

## Calculate max value across multiple columns by multiple groups

Solution using `data.table`: find the maximum value across columns `3:5` (the Score columns) for each combination of `ID` and `Group`.

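```r
library(data.table)
setDT(d)
# .SD holds columns 3:5 within each ID/Group combination; do.call(max, .SD) takes the max over all of them
d[, .(Max = do.call(max, .SD)), .SDcols = 3:5, .(ID, Group)]
```

```
   ID Group Max
1: a1   abc  11
2: a1   def   5
3: a2   def  11
```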

Data:

```r
d <- structure(list(
  ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1", "a2"), class = "factor"),
  Group = structure(c(1L, 1L, 2L, 2L), .Label = c("abc", "def"), class = "factor"),
  Score1 = c(10L, 0L, 0L, 5L),
  Score2 = c(0L, 0L, 5L, 10L),
  Score3 = c(0L, 11L, 2L, 11L)
), class = "data.frame", row.names = c(NA, -4L))
```
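For readers working in Python, here is a rough pandas translation of the same idea (my own sketch, not part of the original answer): take each group's per-column maxima, then the maximum across the Score columns. The DataFrame simply rebuilds the `d` data shown above.

```python
import pandas as pd

# The `d` data from the R example, rebuilt as a pandas DataFrame
d = pd.DataFrame({
    "ID": ["a1", "a1", "a1", "a2"],
    "Group": ["abc", "abc", "def", "def"],
    "Score1": [10, 0, 0, 5],
    "Score2": [0, 0, 5, 10],
    "Score3": [0, 11, 2, 11],
})

score_cols = ["Score1", "Score2", "Score3"]
result = (
    d.groupby(["ID", "Group"])[score_cols]
    .max()            # per-group maximum of each Score column
    .max(axis=1)      # then the maximum across those columns
    .reset_index(name="Max")
)
print(result)
```

This gives one row per `ID`/`Group` pair with `Max` values 11, 5 and 11, matching the `data.table` output.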

## Select the row with the maximum value in each group based on multiple columns in R dplyr

We can get the row-wise maximum of the 'count' columns with `pmax`, group by 'col1', and then `filter` the rows where 'Max' equals the group's maximum.

```r
library(dplyr)
df1 %>%
  mutate(Max = pmax(count_col1, count_col2)) %>%  # row-wise max of the count columns
  group_by(col1) %>%
  filter(Max == max(Max)) %>%                     # keep rows matching the group max (ties kept)
  ungroup() %>%
  select(-Max)
```

Output:

```
# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1
```

We can also use `slice_max`:

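```r
library(dplyr)
library(purrr)
# purrr::invoke() is deprecated as of purrr 1.0.0; do.call(pmax, across(starts_with("count"))) is equivalent
df1 %>%
  group_by(col1) %>%
  slice_max(invoke(pmax, across(starts_with("count")))) %>%
  ungroup()
```

```
# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1
```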
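A pandas sketch of the same `pmax` + `filter` idea, under one loud assumption: the original `df1` is not shown, so the rows below are hypothetical and only reuse the column names. `max(axis=1)` plays the role of `pmax`, and `transform("max")` broadcasts each group's maximum back to its rows so that ties are kept.

```python
import pandas as pd

# Hypothetical toy data: the original df1 is not shown, only its column names are reused
df1 = pd.DataFrame({
    "col1": ["apple", "apple", "banana", "banana", "banana"],
    "col2": ["apla", "aple", "banan", "bananb", "bananc"],
    "count_col1": [1, 1, 4, 4, 2],
    "count_col2": [2, 4, 1, 1, 3],
})

count_cols = ["count_col1", "count_col2"]
# Row-wise max across the count columns (the pandas analogue of pmax)
row_max = df1[count_cols].max(axis=1)
# Keep rows whose row-wise max equals the maximum within their col1 group (ties kept)
keep = row_max == row_max.groupby(df1["col1"]).transform("max")
result = df1[keep]
print(result)
```

Both tied `banana` rows survive, as with `filter` and with `slice_max`'s default `with_ties = TRUE`.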

## How to get the max value of a multiple column group-by pandas?

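If you need the `bookid` and `conceptid` for the maximum weight, try this:

```python
# idxmax() returns the index label of the max-weight row in each bookid group
idx = annotations.groupby(['bookid'], sort=False)['weight'].idxmax()
annotations.loc[idx, ['bookid', 'conceptid', 'weight']]
```

Note: the original version of this answer used `ix`, which was deprecated in pandas 0.20 and removed in 1.0, so use label-based `.loc` as above.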
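A self-contained sketch of the `idxmax` + `.loc` pattern. The `annotations` rows below are hypothetical, since the original data is not shown; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical annotations table with the column names used above
annotations = pd.DataFrame({
    "bookid": [1, 1, 2, 2, 2],
    "conceptid": ["c1", "c2", "c3", "c4", "c5"],
    "weight": [0.2, 0.9, 0.5, 0.1, 0.7],
})

# idxmax gives the index label of the heaviest row in each bookid group
idx = annotations.groupby("bookid", sort=False)["weight"].idxmax()
best = annotations.loc[idx, ["bookid", "conceptid", "weight"]]
print(best)
```

`idxmax` returns a single label per group (the first one on ties), so this keeps exactly one row per `bookid`, unlike the R `filter` approach above, which keeps ties.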

## Find maximum value of one column based on group_by multiple other columns

We can use `slice_max` instead of `summarise` to return all the columns after the `select` step

```r
library(dplyr)
df_k %>%
  group_by(COUNTRY, date_start) %>%
  select(-code) %>%
  # order_by expects a bare column name, not the string 'ord'
  slice_max(order_by = ord, n = 1)
```

If you instead need to keep every row and add the group maximum as a new column, use `mutate`:

```r
df_k %>%
  group_by(COUNTRY, date_start) %>%
  select(-code) %>%
  mutate(ordMax = max(ord, na.rm = TRUE)) %>%
  ungroup()
```
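A pandas sketch of both variants, assuming hypothetical `df_k` rows (only the column names `COUNTRY`, `date_start`, `code` and `ord` come from the R code). `transform("max")` computes each group's maximum and aligns it with the original rows, which serves both the filtering (`slice_max`-like) and the new-column (`mutate`-like) case.

```python
import pandas as pd

# Hypothetical df_k; only the column names come from the R example
df_k = pd.DataFrame({
    "COUNTRY": ["FR", "FR", "FR", "DE"],
    "date_start": ["2020-01", "2020-01", "2020-02", "2020-01"],
    "code": ["a", "b", "c", "d"],
    "ord": [3, 7, 1, 5],
})

sub = df_k.drop(columns="code")  # like select(-code)
# Each group's max of 'ord', broadcast back to every row of the group
group_max = sub.groupby(["COUNTRY", "date_start"])["ord"].transform("max")

# slice_max analogue: keep rows equal to their group's max (ties kept)
top = sub[sub["ord"] == group_max]

# mutate analogue: keep all rows and add the group max as a new column
with_max = sub.assign(ordMax=group_max)
print(top)
print(with_max)
```

pandas' `max` skips missing values by default, which mirrors `na.rm = TRUE` in the `mutate` call.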

## python get max and min values across multiple columns while grouping a dataframe

You can `melt` the DataFrame so that the 'actual' and 'budget' values are stacked into one column and both are considered when calculating the min and max. Then group the melted DataFrame and merge the result back.

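```python
id_vars = ['measure', 'measure_group', 'route']
# Stack 'actual' and 'budget' into one 'value' column, then take min/max per group
df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
         .groupby(id_vars)['value']
         .agg(['min', 'max']))
# df1 is indexed by id_vars; merge can join those index levels to df's columns
df = df.merge(df1, how='left', on=id_vars)
```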

```
   measure    measure_group route      year  actual  budget  min   max
0       AC  electrification     A  20182019     103      99   99   122
1       AC  electrification     A  20192020     110     122   99   122
2       AC  electrification     B  20182019       9      10    9    55
3       AC  electrification     B  20192020      55      50    9    55
4       HV  electrification     A  20182019       2      10    2    15
5       HV  electrification     A  20192020       7      15    2    15
6       HV  electrification     B  20182019      67      10   10   115
7       HV  electrification     B  20192020     100     115   10   115
8     cat1            track     A  20182019      10      15   10   111
9     cat1            track     A  20192020     111      25   10   111
10    cat1            track     B  20182019      55      16   16   175
11    cat1            track     B  20192020      75     175   16   175
12    cat2            track     A  20182019      84       5    5  1005
13    cat2            track     A  20192020     125    1005    5  1005
14    cat2            track     B  20182019       7       4    4    25
15    cat2            track     B  20192020      15      25    4    25
```
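A runnable sketch of the same melt, group and merge technique, using only the four `AC` rows from the output table as input. One small deviation from the answer: the aggregated frame is `reset_index()`-ed first, so the merge is a plain column-on-column join.

```python
import pandas as pd

# The four 'AC' rows from the output table above (without the computed min/max)
df = pd.DataFrame({
    "measure": ["AC"] * 4,
    "measure_group": ["electrification"] * 4,
    "route": ["A", "A", "B", "B"],
    "year": [20182019, 20192020, 20182019, 20192020],
    "actual": [103, 110, 9, 55],
    "budget": [99, 122, 10, 50],
})

id_vars = ["measure", "measure_group", "route"]
# Stack 'actual' and 'budget' into one 'value' column, then take min/max per group
stats = (df.melt(id_vars=id_vars, value_vars=["actual", "budget"])
           .groupby(id_vars)["value"]
           .agg(["min", "max"])
           .reset_index())
# Left-merge the per-group stats back onto every original row
df = df.merge(stats, how="left", on=id_vars)
print(df)
```

The resulting `min`/`max` columns match the first four rows of the table above.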

## Multiple column groupby with pandas to find maximum value for each group

I would do it by using `merge` on the grouped data.

Based on this data:

```python
import pandas as pd

df = pd.DataFrame({'Feature': ['age']*9 + ['talk']*9,
                   'value': (['No']*3 + ['Yes']*3 + ['[Null]']*3)*2,
                   'frequency': [2700, 1707, 83, 222, 15, 8, 323, 8, 5,
                                 20, 170, 500, 210, 1500, 809, 234, 43, 85],
                   'label': ['N', 'P', 'O']*6})
```

Using:

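```python
# Max frequency per (Feature, value), then merge back to recover the matching label
df_1 = (df.groupby(['Feature', 'value'], as_index=False)['frequency'].max()
          .merge(df, on=['Feature', 'value', 'frequency']))
```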

Outputs:

```
  Feature   value  frequency label
0     age      No       2700     N
1     age     Yes        222     N
2     age  [Null]        323     N
3    talk      No        500     O
4    talk     Yes       1500     P
5    talk  [Null]        234     N
```

The extra column (each group's total frequency minus its maximum) can be added with a simple assignment:

```python
# Assumes df_1 has exactly one row per group, in the same sorted order as this groupby
df_1['sum_no_max'] = df.groupby(['Feature', 'value'])['frequency'].sum().values - df_1['frequency'].values
```

Final output:

```
  Feature   value  frequency label  sum_no_max
0     age      No       2700     N        1790
1     age     Yes        222     N          23
2     age  [Null]        323     N          13
3    talk      No        500     O         190
4    talk     Yes       1500     P        1019
5    talk  [Null]        234     N         128
```
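An alternative sketch (not the answer's method) that avoids relying on `.values` lining up row by row: `transform` returns results aligned to the original index, so both the max flag and the group sum stay attached to the right rows. It reuses the data from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Feature': ['age']*9 + ['talk']*9,
                   'value': (['No']*3 + ['Yes']*3 + ['[Null]']*3)*2,
                   'frequency': [2700, 1707, 83, 222, 15, 8, 323, 8, 5,
                                 20, 170, 500, 210, 1500, 809, 234, 43, 85],
                   'label': ['N', 'P', 'O']*6})

freq = df.groupby(['Feature', 'value'])['frequency']
# Flag rows holding their group's maximum frequency
is_max = df['frequency'] == freq.transform('max')
# Group total minus this row's frequency, aligned by index rather than by position
df_1 = (df.assign(sum_no_max=freq.transform('sum') - df['frequency'])
          .loc[is_max]
          .reset_index(drop=True))
print(df_1)
```

Rows come out in their original order, which here matches the merge result. If a group had tied maxima, every tied row would be kept and each would get the same `sum_no_max`.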

## SQL MAX of multiple columns?

This is an old answer and broken in many ways.

See https://stackoverflow.com/a/6871572/194653, which has far more upvotes, works with SQL Server 2008+, and handles NULLs, etc.

Original but problematic answer:

Well, you can use a CASE expression:

```sql
SELECT
    CASE
        WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
        WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
        WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
        ELSE                                        Date1
    END AS MostRecentDate
FROM YourTable  -- hypothetical table name; the original snippet omits the FROM clause
```
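To see one way the CASE version is problematic, here is a small sketch using Python's built-in `sqlite3` module. It runs on SQLite, not SQL Server, so treat it as an illustration only; the table `t` and its rows are hypothetical, and ISO-8601 date strings are used because they compare correctly as text. Every WHEN branch compares all three columns, so a NULL in any of them makes each branch evaluate to NULL (not true) and the result falls through to `ELSE Date1`. An unpivot with `UNION ALL` plus the aggregate `MAX`, which skips NULLs, is one NULL-tolerant alternative; it is not the method from the linked answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table and rows for illustration
con.execute("CREATE TABLE t (id INTEGER, Date1 TEXT, Date2 TEXT, Date3 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "2020-01-05", "2021-03-01", "2019-12-31"),
    (2, None, "2021-03-01", "2019-12-31"),  # Date1 is NULL
])

# The CASE expression from the answer
rows = con.execute("""
    SELECT id,
           CASE
               WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
               WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
               WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
               ELSE Date1
           END AS MostRecentDate
    FROM t ORDER BY id
""").fetchall()

# Unpivot the three columns, then let the aggregate MAX ignore NULLs
safe = con.execute("""
    SELECT id, MAX(d) AS MostRecentDate
    FROM (SELECT id, Date1 AS d FROM t
          UNION ALL SELECT id, Date2 FROM t
          UNION ALL SELECT id, Date3 FROM t)
    GROUP BY id
    ORDER BY id
""").fetchall()

print(rows)
print(safe)
```

For row 2 the CASE version returns `None` (the NULL `Date1` from the ELSE branch), while the unpivot version returns `2021-03-01`.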