How to output the columns with the maximum value
Here is a dplyr approach that I tried to make a little more generalized to accommodate a different number of columns of interest. With your test data frame from above, start by defining a function that finds the max of the current group, gets the indices of the columns with matching values, then builds the output based on the number of matching columns:
foo <- function(df_, cols = 1:3) {
  # Get max
  m = max(df_[, cols], na.rm = TRUE)
  # Get columns (unique, in case the max occurs more than once in one column)
  ix <- unique(as.data.frame(which(df_[, cols] == m, arr.ind = TRUE))[, 2])
  matchlen = length(ix)
  columns <- names(df_[,cols])[ix]
  # Get varname based on length
  out = ifelse(matchlen == length(cols), "all_equal", paste(columns, collapse = "&"))
  df_$col_name = out
  return(df_)
}
Because the output from that is a data frame, you need to make use of do to apply it to groups with dplyr:
test %>%
group_by(gr) %>%
do(foo(.))
# A tibble: 9 x 5
# Groups: gr [3]
A B C gr col_name
<dbl> <dbl> <dbl> <fct> <chr>
1 5 NA NA 1 A
2 NA 2 NA 1 A
3 NA NA 1 1 A
4 1 NA NA 2 all_equal
5 NA 1 NA 2 all_equal
6 NA NA 1 2 all_equal
7 3 NA NA 3 A&C
8 NA 1 NA 3 A&C
9 NA NA 3 3 A&C
The function should allow for a flexible number of columns to be input, as long as they're numeric. For example,
test %>%
group_by(gr) %>%
do(foo(., cols = 1:2))
and
test %>%
group_by(gr) %>%
do(foo(., cols = c(1,3)))
both seem to work.
Edit:
Yes, you can also pass column names instead of indices:
test %>%
group_by(gr) %>%
do(foo(., cols = c("A", "B", "C")))
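For comparison, here is a minimal plain-Python sketch of the same per-group logic. It is not a translation of the dplyr code, only an illustration of the idea. The name `max_columns` is hypothetical, the rows are reconstructed from the output shown above, and `None` stands in for `NA`.

```python
def max_columns(rows, cols):
    """Label one group: columns holding the group's max, or 'all_equal'."""
    # Collect the non-missing values in the columns of interest
    values = [r[c] for r in rows for c in cols if r[c] is not None]
    m = max(values)
    # Columns that contain the max at least once, in column order
    hits = [c for c in cols if any(r[c] == m for r in rows)]
    return "all_equal" if len(hits) == len(cols) else "&".join(hits)


# Rows reconstructed from the printed tibble; None plays the role of NA
test = [
    {"A": 5, "B": None, "C": None, "gr": 1},
    {"A": None, "B": 2, "C": None, "gr": 1},
    {"A": None, "B": None, "C": 1, "gr": 1},
    {"A": 1, "B": None, "C": None, "gr": 2},
    {"A": None, "B": 1, "C": None, "gr": 2},
    {"A": None, "B": None, "C": 1, "gr": 2},
    {"A": 3, "B": None, "C": None, "gr": 3},
    {"A": None, "B": 1, "C": None, "gr": 3},
    {"A": None, "B": None, "C": 3, "gr": 3},
]

# Split the rows by group, then label each group
groups = {}
for row in test:
    groups.setdefault(row["gr"], []).append(row)

labels = {g: max_columns(rs, ["A", "B", "C"]) for g, rs in groups.items()}
print(labels)  # {1: 'A', 2: 'all_equal', 3: 'A&C'}
```

Passing a shorter list such as `["A", "B"]` restricts the comparison to those columns, much like the `cols` argument above.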
Output name of column with max value per line
With your shown samples, please try the following awk code, written and tested in GNU awk.
awk -v startField="3" -v endField="6" '
FNR==1{
for(i=startField;i<=endField;i++){
heading[i]=$i
}
next
}
{
max=maxInd=""
for(i=startField;i<=endField;i++){
maxInd=(max<$i?i:maxInd)
max=(max<$i?$i:max)
}
NF=(startField-1)
print $0,heading[maxInd]
}
' Input_file
Advantages of this code's approach:
- The user can set the starting and ending field numbers through the variables startField and endField, so nothing inside the main awk code needs to change.
- Since nothing is hardcoded, if the user later wants to check for the maximum value from the 10th field to the 20th field, there is no need to explicitly list the first 9 fields for printing; the code takes care of that.
Detailed explanation of the above:
awk -v startField="3" -v endField="6" ' ##Starting awk program and setting startField and endField to values on which user wants to look for maximum values.
FNR==1{ ##Checking condition if this is first line of Input_file.
for(i=startField;i<=endField;i++){ ##Traversing through only those fields which user needs to get max value.
heading[i]=$i ##Creating array heading whose index is i and value is current field value.
}
next ##next will skip all further statements from here.
}
{
max=maxInd="" ##Nullifying max and maxInd variables here.
for(i=startField;i<=endField;i++){ ##Traversing through only those fields which user needs to get max value.
maxInd=(max<$i?i:maxInd) ##Setting maxInd to the current field number if the current field value is greater than max, else keeping maxInd as it is.
max=(max<$i?$i:max) ##Getting max variable to current field value if current field value is greater than max else keep it as max itself.
}
NF=(startField-1) ##Setting NF(number of fields of current line) to startField-1 here.
print $0,heading[maxInd] ##Printing the current line (now truncated to the first startField-1 fields) followed by the heading array value whose index is maxInd.
}
' Input_file ##Mentioning Input_file name here.
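The same idea can be sketched in Python for readers who prefer it. This is only an illustration under assumptions: the function name `label_max_field` and the sample lines are hypothetical, fields are whitespace-separated, and the fields in the chosen range are numeric.

```python
def label_max_field(lines, start_field, end_field):
    """For each data line, keep the leading fields and append the header
    of the field (1-based, start_field..end_field) holding the max."""
    header = lines[0].split()
    out = []
    for line in lines[1:]:
        fields = line.split()
        # Convert the 1-based inclusive range to 0-based indices
        candidates = range(start_field - 1, end_field)
        # max() returns the first index on ties, like the strict < in the awk code
        best = max(candidates, key=lambda i: float(fields[i]))
        out.append(" ".join(fields[: start_field - 1] + [header[best]]))
    return out


# Hypothetical input, not taken from the original question
sample = [
    "id name a b c d",
    "1 x 3 9 2 4",
    "2 y 7 1 8 8",
]
for row in label_max_field(sample, 3, 6):
    print(row)
# 1 x b
# 2 y c
```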
Find the column name which has the maximum value for each row
You can use idxmax with axis=1 to find the column with the greatest value on each row:
>>> df.idxmax(axis=1)
0 Communications
1 Business
2 Communications
3 Communications
4 Business
dtype: object
To create the new column 'Max', use df['Max'] = df.idxmax(axis=1).
To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).
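A small self-contained sketch may help contrast the two directions. The values below are made up for illustration; only the column names echo the output above.

```python
import pandas as pd

# Hypothetical values; the original question's data is not shown here
df = pd.DataFrame(
    {
        "Communications": [0.9, 0.2, 0.7],
        "Business": [0.1, 0.8, 0.3],
    }
)

# Column label of the largest value in each row
per_row = df.idxmax(axis=1)
print(per_row.tolist())  # ['Communications', 'Business', 'Communications']

# Row label of the largest value in each column
per_col = df.idxmax()
print(per_col.to_dict())  # {'Communications': 0, 'Business': 1}

# Store the row-wise result as a new column, computed before assignment
df["Max"] = per_row
```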
Getting the maximum values of every column and storing those value to an array without pandas
Use:
with open(datafile) as infile:
    # convert each line to an iterable of ints
    rows = (map(int, line.strip()) for line in infile)
    # find the maximum per col, exclude the last one
    *res, _ = (max(col) for col in zip(*rows))
print(res)
Output
[7, 3, 4, 5]
As an alternative:
with open(datafile) as infile:
    # convert each line to an iterable of ints exclude the last one
    rows = (map(int, line.strip()[:-1]) for line in infile)
    # find the maximum per col,
    res = [max(col) for col in zip(*rows)]
print(res)
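Both snippets iterate over the characters of each line, so they treat every character as one single-digit value. If the file instead holds whitespace-separated numbers, possibly with several digits, a variant along these lines might be closer to what is needed. The function name `column_maxima` and the sample data are hypothetical, and `io.StringIO` stands in for an open file.

```python
import io


def column_maxima(infile, skip_last=True):
    """Return the maximum of each column of whitespace-separated ints."""
    # Split on whitespace instead of iterating over characters
    rows = (map(int, line.split()) for line in infile if line.strip())
    # zip(*rows) transposes rows into columns
    maxima = [max(col) for col in zip(*rows)]
    return maxima[:-1] if skip_last else maxima


# Hypothetical file contents
sample = io.StringIO("10 3 4 5 0\n7 12 1 2 9\n")
print(column_maxima(sample))  # [10, 12, 4, 5]
```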
Find a maximum value from the second column and print value from the first column in awk or bash
$ awk '
$2>max || max=="" { # or $2>=max, depending on if you want first or last
max=$2
val=$1
}
END {
print val
}' file
Output:
1.36957
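For comparison, a rough Python equivalent of the awk one-liner could look like the following. The function name and the input lines are assumptions made for illustration; as in the awk version with `>`, the first row wins on ties.

```python
def first_col_at_max_second(lines):
    """Return the first-column value of the row whose second column is largest."""
    best_val = None
    best_key = None
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or short lines
        key, val = parts[0], float(parts[1])
        # Strict > keeps the first row on ties; use >= to keep the last
        if best_val is None or val > best_val:
            best_val, best_key = val, key
    return best_key


# Hypothetical input lines
sample = ["0.5 10", "1.36957 42", "2.0 42"]
print(first_col_at_max_second(sample))  # 1.36957
```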
Find maximum value of a column and return the corresponding row values using Pandas
Assuming df has a unique index, this gives the row with the maximum value:
In [34]: df.loc[df['Value'].idxmax()]
Out[34]:
Country US
Place Kansas
Value 894
Name: 7
Note that idxmax returns index labels. If the DataFrame has duplicates in the index, the label may not uniquely identify the row, and df.loc may return more than one row.
Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):
df = df.reset_index()
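The duplicate-index caveat is easy to see on a toy frame. The data below is hypothetical and was chosen so that the label returned by idxmax is shared by two rows.

```python
import pandas as pd

# Hypothetical frame with a duplicated index label (7 appears twice)
df = pd.DataFrame(
    {
        "Country": ["US", "US", "UK"],
        "Place": ["Kansas", "Texas", "London"],
        "Value": [894, 100, 500],
    },
    index=[7, 7, 8],
)

label = df["Value"].idxmax()  # the label 7, not a position
ambiguous = df.loc[label]     # both rows labelled 7 come back
print(len(ambiguous))         # 2

# After resetting the index, every label is unique again
unique_df = df.reset_index()
row = unique_df.loc[unique_df["Value"].idxmax()]
print(row["Place"])           # Kansas
```

Note that `reset_index()` keeps the old labels in a new column named `index`; pass `drop=True` if they are not needed.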
R output BOTH maximum and minimum value by group in dataframe
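You can use range to get the min and max value and use it in summarise to get a separate row for each within every Name. Note that in dplyr 1.1.0 and later, summarise warns when a group returns more than one row; reframe is the replacement for this use.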
library(dplyr)
df %>%
group_by(Name) %>%
summarise(Value = range(Value), .groups = "drop")
# Name Value
# <chr> <int>
#1 A 27
#2 A 57
#3 B 20
#4 B 89
#5 C 58
#6 C 97
If you have a large dataset, using data.table might be faster.
library(data.table)
setDT(df)[, .(Value = range(Value)), Name]
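As a language-neutral illustration of what the grouped range produces, here is a short Python sketch. The function name `range_by_group` and the records are hypothetical; the values were chosen so the extremes match the output shown above.

```python
from collections import defaultdict


def range_by_group(records):
    """Return (name, min) and (name, max) rows for each group, sorted by name."""
    groups = defaultdict(list)
    for name, value in records:
        groups[name].append(value)
    # Two output rows per group: the minimum first, then the maximum
    return [
        (name, v)
        for name in sorted(groups)
        for v in (min(groups[name]), max(groups[name]))
    ]


# Hypothetical records
records = [
    ("A", 27), ("A", 40), ("A", 57),
    ("B", 89), ("B", 20),
    ("C", 58), ("C", 97),
]
for name, value in range_by_group(records):
    print(name, value)
# A 27
# A 57
# B 20
# B 89
# C 58
# C 97
```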