Add a Variable to a Data Frame Containing Max Value of Each Row

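You can use apply() with MARGIN = 1 to run a function over each row. For instance, to store the row-wise maximum of columns 2 to 26 in a new column: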

df[, "max"] <- apply(df[, 2:26], 1, max)

Here's a basic example:

> df <- data.frame(a = 1:50, b = rnorm(50), c = rpois(50, 10))
> df$max <- apply(df, 1, max)
> head(df, 2)
  a          b  c max
1 1  1.3527115  9   9
2 2 -0.6469987 20  20
> tail(df, 2)
    a          b  c max
49 49 -1.4796887 10  49
50 50  0.1600679 13  50
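
One caveat with apply() on a data frame: it converts the data frame to a matrix first, so a single character column turns every value into a string. The following is a minimal sketch of restricting the call to the numeric columns. It uses a small hypothetical data frame, and the column names id, a and b are made up for illustration.

```r
# Hypothetical data: one character column plus two numeric columns
df <- data.frame(id = c("x", "y", "z"),
                 a  = c(1, 20, 3),
                 b  = c(10, 2, 30))

# Applying max over all columns coerces every value to character
bad <- apply(df, 1, max)
is.character(bad)  # TRUE: the "maximum" is now a string comparison

# Restrict apply() to the numeric columns to get numeric row maxima
num_cols <- sapply(df, is.numeric)
df$max <- apply(df[, num_cols], 1, max)
```

Selecting the columns explicitly, as in the df[, 2:26] call above, has the same effect when you already know which columns are numeric.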

Add a variable to a data frame containing max value of each row depending on a regex syntax in the column

You can select the 'result' columns and use max.col:

cols <- grep('result', names(df), value = TRUE)
df$max_column <- cols[max.col(df[cols], ties.method = 'first')]
df

#  result_1    date_1 result_2    date_2 result_3    date_3 result_4    date_4 max_column
#1        1 12.8.2020        4 13.8.2020        2 15.8.2020        1 20.8.2020   result_2
#2        3 15.8.2020        3 14.8.2020        5 17.8.2020        2 21.8.2020   result_3

This gives, for each row, the name of the 'result' column that holds the maximum value. With ties.method = 'first', the leftmost column wins a tie.

data

df <- structure(list(result_1 = c(1L, 3L), date_1 = c("12.8.2020", 
"15.8.2020"), result_2 = 4:3, date_2 = c("13.8.2020", "14.8.2020"
), result_3 = c(2L, 5L), date_3 = c("15.8.2020", "17.8.2020"),
result_4 = 1:2, date_4 = c("20.8.2020", "21.8.2020")),
class = "data.frame", row.names = c(NA, -2L))
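
If you also want the maximum value itself rather than only the column name, one option is to pass the same 'result' columns to pmax(). This is a sketch on the data above, rebuilt with data.frame() so it runs on its own. The column name max_value is chosen here for illustration.

```r
# Same data as above, written with data.frame()
df <- data.frame(
  result_1 = c(1L, 3L), date_1 = c("12.8.2020", "15.8.2020"),
  result_2 = 4:3,       date_2 = c("13.8.2020", "14.8.2020"),
  result_3 = c(2L, 5L), date_3 = c("15.8.2020", "17.8.2020"),
  result_4 = 1:2,       date_4 = c("20.8.2020", "21.8.2020")
)

# Pick the 'result' columns by name pattern
cols <- grep("^result_", names(df), value = TRUE)

# pmax() takes one vector per column; do.call() spreads the list of columns
# into separate arguments (unname() keeps column names from being read as
# argument names)
df$max_value <- do.call(pmax, unname(as.list(df[cols])))

# Name of the column holding that maximum, as in the answer above
df$max_column <- cols[max.col(df[cols], ties.method = "first")]
```

For this data, max_value is 4 for the first row and 5 for the second, matching result_2 and result_3.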

Finding the maximum value for each row among 3 columns in R

You can use the apply function for this like so:

df$max <- apply(X = df, MARGIN = 1, FUN = max)

The MARGIN = 1 argument indicates that FUN is applied to every row of X. With MARGIN = 2 it is applied to every column, and with MARGIN = c(1, 2) it is applied to every individual cell.
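
To make the three MARGIN settings concrete, here is a small sketch on a hypothetical 2 x 3 matrix (the matrix m is not from the question):

```r
# matrix() fills column by column, so the rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_max  <- apply(m, 1, max)        # one value per row:    5 6
col_max  <- apply(m, 2, max)        # one value per column: 2 4 6
cell_max <- apply(m, c(1, 2), max)  # one value per cell: same shape as m
```

With MARGIN = c(1, 2) each call to max sees a single number, so the result simply reproduces m. That setting is only useful with functions that transform individual values.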

Creating new column based on highest values

You can use the pmax function from base R to take the element-wise maximum across a chosen set of columns in your data frame. Here those are the education and education_partner fields; na.rm = TRUE makes pmax ignore an NA when the other value is present.

library(dplyr)

new_data <- data %>%
  mutate(highest_degree = pmax(education, education_partner, na.rm = TRUE))

Output:

  ID marital education education_partner highest_degree
1  1       1        14                18             18
2  2       4        18                NA             18
3  3       0        10                NA             10
4  4       2        12                14             14
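
The effect of na.rm is easy to check in base R alone. The sketch below rebuilds the input from the output table above (an assumption about what the original data looked like) and compares both settings:

```r
# Input reconstructed from the printed output above
data <- data.frame(ID = 1:4,
                   marital = c(1, 4, 0, 2),
                   education = c(14, 18, 10, 12),
                   education_partner = c(18, NA, NA, 14))

# Default: any NA in a row makes the result NA for that row
with_na <- pmax(data$education, data$education_partner)

# na.rm = TRUE: NA is ignored as long as one value is present
without_na <- pmax(data$education, data$education_partner, na.rm = TRUE)
```

Here with_na is NA for IDs 2 and 3, while without_na falls back to the respondent's own education for those rows.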

How to select the max value of each row (not all columns) and mutate 2 columns which are the max value and name in R?

Method 1

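Simply use the pmax and max.col functions to identify the maximum values and the columns they come from. Columns 3 and 4 are a and b, so 2 is added to the index returned by max.col to map it back onto colnames(df).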

library(dplyr)

df %>% mutate(max = pmax(a, b), type = colnames(df)[max.col(df[, 3:4]) + 2])
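
Two details of this approach are worth spelling out: max.col breaks ties at random unless told otherwise, and the + 2 offset can be avoided by indexing the names of the selected columns directly. The following base R sketch illustrates both under those assumptions. Its second row is a hypothetical tie added for illustration.

```r
# First row taken from the example data; second row is a made-up tie
df <- data.frame(lon = c(102, 103), lat = c(31, 32),
                 a = c(4, 3), b = c(5, 3))

# Work on the value columns by name instead of by position
vals <- df[, c("a", "b")]

df$max  <- pmax(df$a, df$b)
# ties.method = "first" makes the result deterministic: leftmost column wins
df$type <- names(vals)[max.col(vals, ties.method = "first")]
```

With this data, type is "b" for the first row and "a" for the tied second row.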

Method 2

Alternatively, first reshape your data to a "long" format for easier manipulation, then use mutate to extract the max value and its column name within each group. Finally, pivot back to a "wide" format and relocate the new columns to match your target.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(a:b, names_to = "colname") %>%
  group_by(lon, lat) %>%
  mutate(max = max(value),
         type = colname[which.max(value)]) %>%
  # id_cols defaults to every column other than names_from/values_from
  pivot_wider(names_from = "colname", values_from = "value") %>%
  relocate(max, type, .after = b)

Output

# A tibble: 4 × 6
# Groups:   lon, lat [4]
    lon   lat     a     b   max type
  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1   102    31     4     5     5 b
2   103    32     3     2     3 a
3   104    33     7     4     7 a
4   105    34     6     9     9 b

Conditionally format each cell containing the max value of a row in a data frame - R Markdown PDF

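One way is to combine dplyr::mutate(across(...)) with max(c_across(...)) on a row-wise data frame, so that each cell is compared with the maximum of its own row: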

---
output:
  pdf_document:
    toc: yes
---

```{r, include=FALSE}

require("pacman")
p_load(dplyr, forcats, knitr, kableExtra, tinytex, janitor)

segment<- c('seg1', 'seg1', 'seg2', 'seg2', 'seg3', 'seg3')
subSegment<- c('subseg1', 'subseg2', 'subseg1', 'subseg2', 'subseg1', 'subseg2')
var.1<- c(100, 20, 30, 50, 40, 40)
var.2<- c(200, 30, 30, 70, 30, 140)
var.3<- c(50, 50, 40, 20, 30, 40)
var.4<- c(60, 50, 35, 53, 42, 20)

df <-
  data.frame(segment, subSegment, var.1, var.2, var.3, var.4) %>%
  adorn_totals('row') %>%
  rowwise() %>%
  # bold a cell when it equals the maximum of var.1 to var.4 in its row
  mutate(across(var.1:var.4,
                ~ cell_spec(.x, 'latex', bold = .x == max(c_across(var.1:var.4)))))

```

```{r, results='asis'}


df %>%
  kable(booktabs = TRUE,
        caption = "Title",
        align = "c",
        escape = FALSE) %>%
  kable_styling(latex_options = c("HOLD_position", "repeat_header", "scale_down"),
                font_size = 6) %>%
  pack_rows(index = table(fct_inorder(df$segment)),
            italic = FALSE,
            bold = FALSE,
            underline = TRUE,
            latex_gap_space = "1em",
            background = "#f2f2f2") %>%
  column_spec(1, monospace = TRUE, color = "white") %>%
  row_spec(nrow(df), bold = TRUE)

```

In the resulting PDF, the largest value in each row of var.1 to var.4 is printed in bold.

Add columns to pandas dataframe containing max of each row, AND corresponding column name

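You can compare the first three columns against maxval using eq with axis=0, which yields a boolean frame marking where each row equals its maximum. Then use apply with a lambda along axis=1 to pick the matching column names and join them with commas: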

In [183]:
# .ix was removed in pandas 1.0; .loc performs the same label-based slice
df['maxcol'] = df.loc[:, :'c'].eq(df['maxval'], axis=0).apply(lambda x: ','.join(df.columns[:3][x == x.max()]), axis=1)
df

Out[183]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a

Apply max function to multiple rows

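max.col returns, for each row, the index of the column holding the largest value. Restricting both the names and the data to the same columns (here 12 to 16) maps that index back to a column name. Ties are broken at random by default; pass ties.method = "first" for a deterministic result.

df$Main_Mode <- names(df)[12:16][max.col(df[12:16])]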

R help - change the maximum value of each row in a certain condition

Try this out and see what happens :)

df <- read.table(text = "A B C D E
1 1 0.74286670 0.3222136 0.9381296 10
2 1 -0.03352498 0.5262685 0.1225731 15
3 5 -0.17689629 -0.8949740 -1.4376567 100
4 5 0.48329153 1.1574834 -1.1116581 100
5 10 0.13117277 -0.2068736 0.4841806 100", stringsAsFactors = FALSE)

# find the max in columns B,C,D
z <- apply(df[df$A == 1, 2:4], 1, max)

# substitute the maximum value of each row for columns B,C,D where A == 1
# with the value of column E. Assign 0 to the others
y <- ifelse(df[df$A == 1, 2:4] == z, df$E[df$A == 1], 0)

# Change the values in your dataframe
df[df$A == 1, 2:4] <- y

