How to Output the Columns With the Maximum Value

How to output the columns with the maximum value

Here is a dplyr approach that I tried to generalize so it handles any number of columns of interest. Using your test data frame from above, start by defining a function that finds the maximum within the current group, gets the indices of the columns whose values match it, and builds the output label from the number of matching columns:

foo <- function(df_, cols = 1:3) {
  # Get max
  m <- max(df_[, cols], na.rm = TRUE)

  # Get columns
  ix <- as.data.frame(which(df_[, cols] == m, arr.ind = TRUE))[, 2]
  matchlen <- length(ix)
  columns <- names(df_[, cols])[ix]

  # Get varname based on length
  out <- ifelse(matchlen == length(cols), "all_equal", paste(columns, collapse = "&"))
  df_$col_name <- out
  return(df_)
}

Because the function returns a data frame, you need do() to apply it to each group with dplyr:

library(dplyr)

test %>%
  group_by(gr) %>%
  do(foo(.))

# A tibble: 9 x 5
# Groups:   gr [3]
      A     B     C gr    col_name
  <dbl> <dbl> <dbl> <fct> <chr>
1     5    NA    NA 1     A
2    NA     2    NA 1     A
3    NA    NA     1 1     A
4     1    NA    NA 2     all_equal
5    NA     1    NA 2     all_equal
6    NA    NA     1 2     all_equal
7     3    NA    NA 3     A&C
8    NA     1    NA 3     A&C
9    NA    NA     3 3     A&C

The function should allow for a flexible number of columns to be input, as long as they're numeric. For example,

test %>%
  group_by(gr) %>%
  do(foo(., cols = 1:2))

and

test %>%
  group_by(gr) %>%
  do(foo(., cols = c(1, 3)))

both seem to work.

Edit:

Yes, you can also pass the columns by name instead of by position:

test %>%
  group_by(gr) %>%
  do(foo(., cols = c("A", "B", "C")))
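
For readers who want to check the grouping logic outside R, here is a minimal Python sketch of the same idea. It is not a translation of foo: it counts each matching column once rather than counting matching cells, and the names max_columns_label, test, and labels are made up for illustration. The rows mirror the tibble output above, with None standing in for NA.

```python
from collections import defaultdict


def max_columns_label(rows, cols):
    """Label a group with the columns that hold the group's maximum value."""
    # Ignore missing values, like na.rm = TRUE in the R version.
    values = [r[c] for r in rows for c in cols if r[c] is not None]
    m = max(values)
    # Keep column order; a column counts once even if it matches in several rows.
    hits = [c for c in cols if any(r[c] == m for r in rows)]
    return "all_equal" if len(hits) == len(cols) else "&".join(hits)


# Rows mirroring the tibble above (None stands in for NA).
test = [
    {"A": 5, "B": None, "C": None, "gr": 1},
    {"A": None, "B": 2, "C": None, "gr": 1},
    {"A": None, "B": None, "C": 1, "gr": 1},
    {"A": 1, "B": None, "C": None, "gr": 2},
    {"A": None, "B": 1, "C": None, "gr": 2},
    {"A": None, "B": None, "C": 1, "gr": 2},
    {"A": 3, "B": None, "C": None, "gr": 3},
    {"A": None, "B": 1, "C": None, "gr": 3},
    {"A": None, "B": None, "C": 3, "gr": 3},
]

# Split the rows by group, then label each group.
groups = defaultdict(list)
for row in test:
    groups[row["gr"]].append(row)

labels = {g: max_columns_label(rows, ["A", "B", "C"]) for g, rows in groups.items()}
print(labels)
```

Passing a shorter column list, such as ["A", "B"], restricts the search in the same way the cols argument does in the R function.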

Output name of column with max value per line

With the samples you have shown, please try the following awk code, written and tested in GNU awk.

awk -v startField="3" -v endField="6" '
FNR==1{
  for(i=startField;i<=endField;i++){
    heading[i]=$i
  }
  next
}
{
  max=maxInd=""
  for(i=startField;i<=endField;i++){
    maxInd=(max<$i?i:maxInd)
    max=(max<$i?$i:max)
  }
  NF=(startField-1)
  print $0,heading[maxInd]
}
' Input_file

Advantages of this code's approach:

  • The starting and ending field numbers are passed in through the startField and endField variables, so nothing inside the main awk code needs to change.
  • Because nothing is hardcoded, if you later want the maximum from the 10th to the 20th field, you do not need to list the first 9 fields explicitly for printing; the code takes care of that itself.

Detailed explanation of the code above:

awk -v startField="3" -v endField="6" '  ## Start awk; startField and endField bound the fields searched for the maximum.
FNR==1{  ## Check whether this is the first line of Input_file (the header).
  for(i=startField;i<=endField;i++){  ## Loop over only the fields of interest.
    heading[i]=$i  ## Store each header name in the heading array, indexed by field number.
  }
  next  ## Skip all further statements for this line.
}
{
  max=maxInd=""  ## Reset max and maxInd for each data line.
  for(i=startField;i<=endField;i++){  ## Loop over only the fields of interest.
    maxInd=(max<$i?i:maxInd)  ## If the current field value is greater than max, set maxInd to the current field number; otherwise keep maxInd.
    max=(max<$i?$i:max)  ## If the current field value is greater than max, set max to it; otherwise keep max.
  }
  NF=(startField-1)  ## Set NF (number of fields in the current line) to startField-1, truncating the line.
  print $0,heading[maxInd]  ## Print the truncated line followed by the heading stored at index maxInd.
}
' Input_file  ## Name of the input file.
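
To make the per-line logic easier to follow, here is a rough Python sketch of the same steps. The function name label_row_max and the sample lines are hypothetical, and it assumes whitespace-separated numeric fields with a header on the first line.

```python
def label_row_max(lines, start_field=3, end_field=6):
    """Yield each data row's leading fields plus the header of its largest field."""
    header = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        # awk fields are 1-based, so shift to 0-based list indices.
        candidates = range(start_field - 1, end_field)
        # max() returns the first maximal index, matching the strict "<" in the awk loop.
        best = max(candidates, key=lambda i: float(fields[i]))
        # Keep only the fields before start_field, as NF=(startField-1) does.
        yield " ".join(fields[:start_field - 1] + [header[best]])


# Hypothetical input: two leading fields, then four numeric fields.
sample = [
    "id name q1 q2 q3 q4",
    "1 ann 3 9 4 1",
    "2 bob 7 2 7 5",
]

result = list(label_row_max(sample))
for row in result:
    print(row)
```

In the second sample row, q1 and q3 tie, and the sketch reports q1, the first of the two, which is also what the awk comparison does.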

Find the column name which has the maximum value for each row

You can use idxmax with axis=1 to find the column with the greatest value on each row:

>>> df.idxmax(axis=1)
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object

To create the new column 'Max', use df['Max'] = df.idxmax(axis=1).

To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).
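
If it helps to see what the two axis settings compute, the following plain-Python sketch mimics both directions on a tiny table. It does not use pandas; the numbers are hypothetical, the names table, row_wise, and col_wise are made up, and missing values are assumed absent.

```python
# Hypothetical scores; only the column names come from the output above.
table = [
    {"Communications": 0.8, "Business": 0.3},
    {"Communications": 0.2, "Business": 0.9},
    {"Communications": 0.6, "Business": 0.5},
]

# Like df.idxmax(axis=1): for each row, the column name holding the largest value.
row_wise = [max(row, key=row.get) for row in table]

# Like df.idxmax(axis=0): for each column, the row position holding the largest value.
col_wise = {
    col: max(range(len(table)), key=lambda i: table[i][col])
    for col in table[0]
}

print(row_wise)
print(col_wise)
```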

Getting the maximum values of every column and storing those value to an array without pandas

Use:

with open(datafile) as infile:
    # convert each line to an iterable of ints
    rows = (map(int, line.strip()) for line in infile)

    # find the maximum per col, exclude the last one
    *res, _ = (max(col) for col in zip(*rows))
    print(res)

Output

[7, 3, 4, 5]

As an alternative:

with open(datafile) as infile:
    # convert each line to an iterable of ints, excluding the last one
    rows = (map(int, line.strip()[:-1]) for line in infile)

    # find the maximum per col
    res = [max(col) for col in zip(*rows)]
    print(res)
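
Both versions above iterate over the characters of each line, so they treat every single digit as one column. If your file instead holds whitespace-separated values with more than one digit, a variant along these lines may be closer to what you need. This is a sketch under that assumption; the file contents are hypothetical and io.StringIO stands in for the opened file.

```python
import io

# Hypothetical file contents: whitespace-separated integers, one row per line.
datafile = io.StringIO("1 20 3 4 9\n7 3 14 5 0\n2 11 4 1 6\n")

# split() breaks each line on whitespace, so multi-digit values stay intact.
rows = (map(int, line.split()) for line in datafile)

# zip(*rows) transposes rows into columns; the last column's maximum is discarded.
*res, _ = (max(col) for col in zip(*rows))
print(res)
```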

Find a maximum value from the second column and print value from the first column in awk or bash

$ awk '
$2>max || max=="" {   # or $2>=max, depending on whether you want the first or the last maximum
    max=$2
    val=$1
}
END {
    print val
}' file

Output:

1.36957
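
The same scan can be sketched in Python for comparison. The function name first_col_at_max_second and the sample lines are hypothetical; the sketch assumes whitespace-separated fields with a numeric second column.

```python
def first_col_at_max_second(lines):
    """Return the first-column value from the line whose second column is largest."""
    best_val = None
    best_key = None
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or short lines
        val = float(fields[1])
        # Strict ">" keeps the first maximum; use ">=" to keep the last one instead.
        if best_val is None or val > best_val:
            best_val, best_key = val, fields[0]
    return best_key


# Hypothetical input with a tie on the second column.
sample = ["0.12345 3", "1.36957 8", "2.50000 8", "0.98765 5"]
winner = first_col_at_max_second(sample)
print(winner)
```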

Find maximum value of a column and return the corresponding row values using Pandas

Assuming df has a unique index, this gives the row with the maximum value:

In [34]: df.loc[df['Value'].idxmax()]
Out[34]:
Country        US
Place      Kansas
Value         894
Name: 7

Note that idxmax returns index labels. If the DataFrame has duplicates in the index, a label may not uniquely identify a row, and df.loc may then return more than one row.

Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):

df = df.reset_index()
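
The effect of duplicate labels can be illustrated without pandas. The sketch below only imitates the two steps (find the label of the maximum, then look rows up by that label); it is not how pandas is implemented, and every row except Kansas is hypothetical.

```python
# Hypothetical rows; only the Kansas row comes from the output above.
labels = [7, 7, 2]  # note the duplicated label 7
rows = [
    {"Country": "US", "Place": "Kansas", "Value": 894},
    {"Country": "UK", "Place": "Leeds", "Value": 120},
    {"Country": "US", "Place": "Ohio", "Value": 433},
]

# Step 1 (the role idxmax plays): the label of the row holding the maximum.
pos = max(range(len(rows)), key=lambda i: rows[i]["Value"])
max_label = labels[pos]

# Step 2 (the role a label lookup plays): every row carrying that label.
by_label = [row for lbl, row in zip(labels, rows) if lbl == max_label]

# Renumbering from 0, as reset_index does, makes each label unique again.
unique_labels = list(range(len(rows)))
by_unique = [
    row for lbl, row in zip(unique_labels, rows) if lbl == unique_labels[pos]
]

print(len(by_label), len(by_unique))
```

With the duplicated label, the lookup returns two rows; after renumbering, it returns only the Kansas row.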

R output BOTH maximum and minimum value by group in dataframe

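You can use range to get the minimum and maximum in one call, and use it inside summarise so that each Name produces two rows. Note that on dplyr 1.1.0 or later, summarise warns when a group returns more than one row; reframe is the replacement for that case.

library(dplyr)

df %>%
  group_by(Name) %>%
  summarise(Value = range(Value), .groups = "drop")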

#  Name  Value
#  <chr> <int>
#1 A        27
#2 A        57
#3 B        20
#4 B        89
#5 C        58
#6 C        97

If you have a large dataset, data.table might be faster.

library(data.table)
setDT(df)[, .(Value = range(Value)), Name]
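
For comparison, the same two-rows-per-group result can be sketched in plain Python. The input pairs are hypothetical, chosen so that the minimum and maximum per name match the output above; the names data, grouped, and result are made up.

```python
from collections import defaultdict

# Hypothetical (Name, Value) pairs whose per-group extremes match the output above.
data = [
    ("A", 27), ("A", 40), ("A", 57),
    ("B", 89), ("B", 20),
    ("C", 58), ("C", 97), ("C", 70),
]

# Collect the values belonging to each name.
grouped = defaultdict(list)
for name, value in data:
    grouped[name].append(value)

# Emit two rows per group: the minimum first, then the maximum, like range() in R.
result = [
    (name, v)
    for name in sorted(grouped)
    for v in (min(grouped[name]), max(grouped[name]))
]

for name, v in result:
    print(name, v)
```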

