What Is the Equivalent of mutate_at (dplyr) in data.table

What is the equivalent of mutate_at (dplyr) in data.table?

With data.table, we can specify the columns of interest in .SDcols, loop over .SD with lapply, and apply the function of interest. Here, the function rollapply is repeated with only the width parameter changing, so it is better to wrap it in a function (f1) than to repeat all the arguments. For each column, f1 is applied once per width and the results are kept in a list; the nested list is then flattened with unlist(recursive = FALSE) and assigned (:=) to the new columns.

library(data.table)
library(zoo)

nm1 <- c("B", "C")
nm2 <- paste0(nm1, "_Roll.Mean.Week")
nm3 <- paste0(nm1, "_Roll.Mean.Two.Week")

# Right-aligned rolling mean; partial = TRUE averages the shorter windows at the start
f1 <- function(x, width) rollapply(x, width = width, mean,
    align = "right", fill = 0, na.rm = TRUE, partial = TRUE)

# The flattened list is ordered column by column (B 7, B 14, C 7, C 14),
# so the names are interleaved with c(rbind(nm2, nm3)) to match that order
setDT(Data)[, c(rbind(nm2, nm3)) := unlist(lapply(.SD, function(x)
    list(f1(x, 7), f1(x, 14))), recursive = FALSE), by = A, .SDcols = nm1]
head(Data)
# A B C B_Roll.Mean.Week B_Roll.Mean.Two.Week C_Roll.Mean.Week C_Roll.Mean.Two.Week
#1: 1 1 101 1 1 101 101
#2: 2 2 102 2 2 102 102
#3: 1 3 103 2 2 102 102
#4: 2 4 104 3 3 103 103
#5: 1 5 105 3 3 103 103
#6: 2 6 106 4 4 104 104
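
Stripped of the rolling-window details, the data.table counterpart of mutate_at is the pattern DT[, (new_cols) := lapply(.SD, fn), .SDcols = cols]. A minimal sketch of that pattern, assuming made-up toy data and a hypothetical helper cum_mean (neither comes from the original question):

```r
library(data.table)

# Hypothetical toy data, only for illustration
DT <- data.table(A = c(1, 2, 1, 2), B = 1:4, C = 101:104)

cols     <- c("B", "C")                 # columns to transform
new_cols <- paste0(cols, "_cummean")    # names for the results

# Hypothetical helper: running mean of a vector
cum_mean <- function(x) cumsum(x) / seq_along(x)

# .SD holds only the columns named in .SDcols; lapply returns one result
# per column, in the same order as new_cols
DT[, (new_cols) := lapply(.SD, cum_mean), by = A, .SDcols = cols]
```

Because lapply(.SD, ...) returns its results in the order of .SDcols, the names on the left of := line up with the results as long as new_cols is built from cols in the same order.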

Note that funs() is deprecated in dplyr; in its place, pass a named list of lambdas (list(name = ~ ...)) or a single lambda (~ ...). The dplyr version of the same operation:

library(dplyr)

Data %>%
    group_by(A) %>%
    mutate_at(vars(B, C), list(Roll.Mean.Week = ~ f1(., 7),
                               Roll.Mean.Two.Week = ~ f1(., 14))) %>%
    ungroup()
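
Since dplyr 1.0.0, mutate_at is superseded by mutate combined with across, which accepts the same named list of lambdas and names the results "{.col}_{.fn}" by default. A minimal sketch, assuming toy data and a hypothetical base-R stand-in roll_mean for the rolling function (it is not zoo::rollapply):

```r
library(dplyr)

# Hypothetical toy data, only for illustration
Data <- tibble(A = c(1, 2, 1, 2), B = 1:4, C = 101:104)

# Hypothetical stand-in: mean over the last `width` values,
# using shorter windows at the start of the vector
roll_mean <- function(x, width) {
  vapply(seq_along(x),
         function(i) mean(x[max(1, i - width + 1):i]),
         numeric(1))
}

out <- Data %>%
  group_by(A) %>%
  mutate(across(c(B, C),
                list(w2 = ~ roll_mean(.x, 2),
                     w3 = ~ roll_mean(.x, 3)))) %>%
  ungroup()
```

The new columns are named B_w2, B_w3, C_w2 and C_w3; the .names argument of across can be used to change that scheme.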

What is the equivalent of dplyr mutate in data.table, R?

With data.table, do an update join: join 'DT' with 'RefDT' on 'KEY' and assign (:=, similar to mutate) the 'TYPE' from 'RefDT' to create the 'TYPE' column in 'DT'. Rows without a match are filled with NA by default. Then make the subsequent assignments by giving a logical condition in i: grepl("-", NO) selects rows whose 'NO' contains a "-", and is.na(TYPE) & grepl("P|R", GROUP) selects rows where 'GROUP' contains "P" or "R" and 'TYPE' is still NA.

# i.TYPE refers explicitly to the TYPE column of RefDT (the table in i)
setDT(DT)[RefDT, TYPE := i.TYPE, on = .(KEY)]
DT[grepl("-", NO), TYPE := "INN"
   ][is.na(TYPE) & grepl("P|R", GROUP), TYPE := "OTHER"][]
# NO GROUP KEY TYPE
#1: 12-19 N 1701 INN
#2: 10-20 N 1602 INN
#3: 13 P 1501John BANK
#4: 14 R 1408Mary POOL
#5: 15 G 1408Peter PARK
#6: 19 K 1408Paul BANK
#7: 25 P 1708 OTHER
#8: 36 R 1503 OTHER

data

DT <- structure(list(NO = c("12-19", "10-20", "13", "14", "15", "19", 
"25", "36"), GROUP = c("N", "N", "P", "R", "G", "K", "P", "R"
), KEY = c("1701", "1602", "1501John", "1408Mary", "1408Peter",
"1408Paul", "1708", "1503")), .Names = c("NO", "GROUP", "KEY"
), row.names = c(NA, -8L), class = "data.frame")

RefDT <- structure(list(KEY = c("1609TOM", "1501John", "1408Mary", "1408Peter",
"1408Paul", "1309Sue"), TYPE = c("PARK", "BANK", "POOL", "PARK",
"BANK", "POOL")), .Names = c("KEY", "TYPE"),
class = "data.frame", row.names = c(NA,
-6L))

opposite of mutate_at in dplyr

Turning @Axeman's comment into an answer as community wiki:

library(dplyr)
df %>%
    mutate_at(vars(-var_not_to_be_modified), as.numeric)
# A tibble: 10 x 2
# var_not_to_be_modified var_to_be_modified
# <chr> <dbl>
# 1 F 1
# 2 F 1
# 3 F 1
# 4 F 1
# 5 F 1
# 6 T 0
# 7 T 1
# 8 F 1
# 9 F 0
#10 T 1

From the help page of vars:

Arguments

... Variables to include/exclude in mutate/summarise. You can use same specifications as in select(). If missing, defaults to all non-grouping variables.
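
The same negative selection works with across, which supersedes mutate_at from dplyr 1.0.0 onward. A minimal sketch, assuming a made-up data frame with a hypothetical column id that should be left alone:

```r
library(dplyr)

# Hypothetical toy data, only for illustration
df <- tibble(
  id = c("a", "b", "c"),        # column to leave untouched
  x  = c("1", "2", "3"),        # numbers stored as text
  y  = c(TRUE, FALSE, TRUE)
)

# -id selects every column except id
out <- df %>% mutate(across(-id, as.numeric))
```

Here id stays a character column while x and y are both converted to numeric.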

Mutate_if or mutate_at in dplyr with Dates

As akrun noted, one of the columns is already in dttm format. Once that column is ignored the following code works for me:

library(dplyr)
library(lubridate)  # provides parse_date_time

tib %>%
    select(-fifth_dt) %>%
    mutate_at(vars(ends_with("dt")), parse_date_time, orders = "%d-%m-%y")
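
If the already-parsed column should be kept rather than dropped, one option is to restrict the selection to character columns. A minimal sketch, assuming made-up data and swapping in base R's as.Date for parse_date_time (so the result is a Date, not a date-time); the column names other than fifth_dt are hypothetical:

```r
library(dplyr)

# Hypothetical data: two date columns stored as text, one already parsed
tib <- tibble(
  first_dt  = c("01-02-20", "15-03-20"),
  second_dt = c("10-04-20", "25-12-20"),
  fifth_dt  = as.POSIXct(c("2020-02-01", "2020-03-15"), tz = "UTC")
)

# Only columns that end in "dt" AND are still character get converted
out <- tib %>%
  mutate(across(where(is.character) & ends_with("dt"),
                ~ as.Date(.x, format = "%d-%m-%y")))
```

Combining where(is.character) with ends_with("dt") leaves fifth_dt as it was, so no column has to be removed first.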

convert selected columns at once to integer in R in dplyr

You can use across to apply the same function to multiple columns.

library(dplyr)

df %>% mutate(across(all_of(cols), as.integer))
# In older versions of dplyr, use `mutate_at`
#df %>% mutate_at(vars(all_of(cols)), as.integer)

# depth table price x y z
# <int> <int> <int> <int> <int> <dbl>
#1 61 55 326 3 3 2.43
#2 59 61 326 3 3 2.31
#3 56 65 327 4 4 2.31

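Using all_of is not strictly required, but it is good practice when the column names come from an external character vector (here cols) rather than being typed as column names of the data frame: it makes the intent explicit and raises an error if any of the names is missing.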
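
A minimal self-contained sketch of this, assuming made-up data; any_of is shown alongside as the lenient variant that skips names not found in the data:

```r
library(dplyr)

# Hypothetical toy data, only for illustration
df <- tibble(x = c(1.9, 2.1), y = c(3, 4), z = c(0.5, 1.5))

# Column names held in an external character vector
cols <- c("x", "y")

# all_of() is strict: every name in cols must exist in df
out <- df %>% mutate(across(all_of(cols), as.integer))

# any_of() silently ignores names that are absent
out2 <- df %>% mutate(across(any_of(c("x", "not_a_column")), as.integer))
```

Note that as.integer truncates toward zero, so 1.9 becomes 1 rather than 2.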

dplyr: access column name in mutate_at function

Instead of using mutate_at, why not use mutate combined with across and cur_column? For example:

df %>%
    mutate(across(c(carb, disp),
                  ~ . - pull(df, paste0(cur_column(), "_new")),
                  .names = "{.col}_corrected"))

dplyr::mutate_at() with external variables and conditional on their values

Here's an approach with the dplyr across functionality (version >= 1.0.0):

library(dplyr)
ex_dat %>%
    group_by(ID) %>%
    summarize(across(-one_of(c("start_dt", "end_dt", "diagnosis_dt")),
                     ~ if_else(any(diagnosis_dt > start_dt & diagnosis_dt < end_dt & .),
                               1, 0)))
## A tibble: 3 x 4
# ID disease1 disease2 disease3
# <fct> <dbl> <dbl> <dbl>
#1 a 0 1 0
#2 b 1 0 1
#3 c NA NA NA

Note that using the & operator on the integer column . coerces it to logical. I'm using the -one_of tidyselect helper because then we don't even need to know how many disease columns there are. The columns being grouped by are excluded automatically.

Your version isn't working because 1) you need to summarize, not mutate, and 2) inside the function call, . refers to the column being worked on, not the data coming through the pipe. Instead, refer to the other columns by their bare names, without $.
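
To make both points concrete, here is a minimal runnable sketch on made-up data. The values in ex_dat are hypothetical, and any_of is used in place of one_of since it is the currently recommended helper:

```r
library(dplyr)

# Hypothetical data: one row per diagnosis, disease flags stored as integers
ex_dat <- tibble(
  ID           = c("a", "a", "b"),
  start_dt     = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  end_dt       = as.Date(c("2020-12-31", "2020-12-31", "2020-12-31")),
  diagnosis_dt = as.Date(c("2020-06-01", "2021-06-01", "2020-03-01")),
  disease1     = c(0L, 1L, 1L),
  disease2     = c(1L, 0L, 0L)
)

out <- ex_dat %>%
  group_by(ID) %>%
  # .x is the current disease column; the date columns are
  # referred to by bare name, without $
  summarize(across(-any_of(c("start_dt", "end_dt", "diagnosis_dt")),
                   ~ if_else(any(diagnosis_dt > start_dt &
                                 diagnosis_dt < end_dt & .x), 1, 0)))
```

For ID a, the disease1 flag is set only on the row whose diagnosis date falls outside the window, so disease1 is 0, while disease2 is 1; for ID b the result is disease1 = 1 and disease2 = 0.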


