What is the equivalent of mutate_at (dplyr) in data.table?
With data.table, we can specify the columns of interest in .SDcols, loop through .SD with lapply, and apply the function of interest. Here, the function rollapply is repeated with only the width parameter changing, so it is better to wrap it in a function (f1) to avoid repeating all the arguments. While applying f1, the outputs can be kept in a list, which is later flattened with unlist(recursive = FALSE) and assigned (:=) to the columns of interest.
library(data.table)
library(zoo)
nm1 <- c("B", "C")
nm2 <- paste0(nm1, "_Roll.Mean.Week")
nm3 <- paste0(nm1, "_Roll.Mean.Two.Week")
f1 <- function(x, width) rollapply(x, width = width, mean,
align = "right", fill = 0, na.rm = TRUE, partial = TRUE)
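# unlist() returns the results column by column (B: 7, 14; then C: 7, 14),
# so the new names are interleaved with c(rbind()) to match that order
setDT(Data)[, c(rbind(nm2, nm3)) := unlist(lapply(.SD, function(x)
list(f1(x, 7), f1(x, 14))), recursive = FALSE), by = A, .SDcols = nm1]
head(Data)
# A B C B_Roll.Mean.Week B_Roll.Mean.Two.Week C_Roll.Mean.Week C_Roll.Mean.Two.Week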
#1: 1 1 101 1 1 101 101
#2: 2 2 102 2 2 102 102
#3: 1 3 103 2 2 102 102
#4: 2 4 104 3 3 103 103
#5: 1 5 105 3 3 103 103
#6: 2 6 106 4 4 104 104
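The order in which unlist(recursive = FALSE) flattens the nested list decides which result lands in which new column, so it is worth checking. The following is a minimal base R sketch of that ordering; cols_list and the tagged values are made-up stand-ins for .SD and for the two f1 calls, not part of the original data.

```r
nm1 <- c("B", "C")
nm2 <- paste0(nm1, "_Roll.Mean.Week")
nm3 <- paste0(nm1, "_Roll.Mean.Two.Week")

# Hypothetical stand-in for .SD: a named list of columns
cols_list <- list(B = 1:3, C = 101:103)

# Stand-in for list(f1(x, 7), f1(x, 14)); the names make the origin visible
flat <- unlist(lapply(cols_list, function(x) list(week = x, two_week = x)),
               recursive = FALSE)

# Results come out grouped by input column, not by window width
print(names(flat))

# c(rbind()) interleaves the two name vectors into that same order
target <- c(rbind(nm2, nm3))
print(target)
```

If the names were supplied as c(nm2, nm3) instead, the second and third results would be assigned to the wrong columns.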
Note that funs is deprecated in the tidyverse; in its place, we can use list(~ ...) or just ~ ...
Data %>%
group_by(A) %>%
mutate_at(vars(B,C), list(Roll.Mean.Week = ~f1(., 7),
Roll.Mean.Two.Week = ~ f1(., 14)))%>%
ungroup()
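With dplyr >= 1.0.0 the same thing can be written with across instead of mutate_at. This is only a sketch: the Data frame below is hypothetical (built to resemble the output shown earlier), and f1 is the helper defined above.

```r
library(dplyr)
library(zoo)

# Hypothetical data: A is the grouping column, B and C are the value columns
Data <- data.frame(A = rep(1:2, 5), B = 1:10, C = 101:110)

# Same helper as in the answer: right-aligned rolling mean with partial windows
f1 <- function(x, width) rollapply(x, width = width, mean,
  align = "right", fill = 0, na.rm = TRUE, partial = TRUE)

out <- Data %>%
  group_by(A) %>%
  mutate(across(c(B, C), list(Roll.Mean.Week = ~ f1(.x, 7),
                              Roll.Mean.Two.Week = ~ f1(.x, 14)))) %>%
  ungroup()

# New columns are named <column>_<function name> by default
print(names(out))
```

Because across names each result from the column and the function name, there is no separate name vector to keep in sync.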
What is the equivalent of dplyr mutate in data.table, R?
If we are using data.table, do a join with 'RefDT' on 'KEY' and assign (:=, similar to mutate) the 'TYPE' from 'RefDT' to create the 'TYPE' column in 'DT'. Rows without a match are filled with NA by default. Then do the subsequent assignments by specifying a logical condition in i: grepl("-", NO) checks for - in the "NO" column, and is.na(TYPE) & grepl("P|R", GROUP) checks for "P" or "R" in "GROUP" where "TYPE" is still NA.
setDT(DT)[RefDT, TYPE := TYPE, on = .(KEY)]
DT[grepl("-", NO), TYPE := "INN"
][is.na(TYPE) & grepl("P|R", GROUP), TYPE := "OTHER"][]
# NO GROUP KEY TYPE
#1: 12-19 N 1701 INN
#2: 10-20 N 1602 INN
#3: 13 P 1501John BANK
#4: 14 R 1408Mary POOL
#5: 15 G 1408Peter PARK
#6: 19 K 1408Paul BANK
#7: 25 P 1708 OTHER
#8: 36 R 1503 OTHER
data
DT <- structure(list(NO = c("12-19", "10-20", "13", "14", "15", "19",
"25", "36"), GROUP = c("N", "N", "P", "R", "G", "K", "P", "R"
), KEY = c("1701", "1602", "1501John", "1408Mary", "1408Peter",
"1408Paul", "1708", "1503")), .Names = c("NO", "GROUP", "KEY"
), row.names = c(NA, -8L), class = "data.frame")
RefDT <- structure(list(KEY = c("1609TOM", "1501John", "1408Mary", "1408Peter",
"1408Paul", "1309Sue"), TYPE = c("PARK", "BANK", "POOL", "PARK",
"BANK", "POOL")), .Names = c("KEY", "TYPE"),
class = "data.frame", row.names = c(NA,
-6L))
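To see the three steps in isolation, here is a smaller sketch on made-up rows (the values in DT and RefDT below are hypothetical and much shorter than the data above). It uses the i. prefix to make explicit that TYPE is taken from the table in i, which is an optional variation on the code in the answer.

```r
library(data.table)

# Hypothetical, shortened versions of the two tables
DT <- data.table(NO    = c("12-19", "13", "25", "15"),
                 GROUP = c("N", "P", "P", "G"),
                 KEY   = c("1701", "1501John", "1708", "9999"))
RefDT <- data.table(KEY  = c("1501John", "1309Sue"),
                    TYPE = c("BANK", "POOL"))

# Step 1: update join; unmatched rows of DT get NA in TYPE
DT[RefDT, TYPE := i.TYPE, on = .(KEY)]

# Step 2: rows whose NO contains "-" become "INN"
DT[grepl("-", NO), TYPE := "INN"]

# Step 3: remaining NA rows with GROUP P or R become "OTHER"
DT[is.na(TYPE) & grepl("P|R", GROUP), TYPE := "OTHER"]

print(DT)
```

The last row matches none of the conditions, so its TYPE stays NA.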
opposite of mutate_at in dplyr
Turning @Axeman's comment into an answer as community wiki:
library(dplyr)
df %>%
  mutate_at(vars(-var_not_to_be_modified), as.numeric)
# A tibble: 10 x 2
# var_not_to_be_modified var_to_be_modified
# <chr> <dbl>
# 1 F 1
# 2 F 1
# 3 F 1
# 4 F 1
# 5 F 1
# 6 T 0
# 7 T 1
# 8 F 1
# 9 F 0
#10 T 1
From the help page of vars:
Arguments
... Variables to include/exclude in mutate/summarise. You can use same specifications as in select(). If missing, defaults to all non-grouping variables.
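The same negative selection also works with across in dplyr >= 1.0.0. A minimal sketch, assuming a small made-up tibble with the same two column names as in the output above:

```r
library(dplyr)

# Hypothetical data with one column to keep and one to convert
df <- tibble(var_not_to_be_modified = c("F", "T", "F"),
             var_to_be_modified     = c("1", "0", "1"))

# Convert every column except the excluded one
out <- df %>%
  mutate(across(-var_not_to_be_modified, as.numeric))

str(out)
```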
Mutate_if or mutate_at in dplyr with Dates
As akrun noted, one of the columns is already in dttm format. Once that column is ignored, the following code works for me:
library(dplyr)
library(lubridate) # provides parse_date_time
tib %>%
  select(-fifth_dt) %>%
  mutate_at(vars(ends_with("dt")), parse_date_time, orders = "%d-%m-%y")
convert selected columns at once to integer in R in dplyr
You can use across to apply the same function to multiple columns.
library(dplyr)
df %>% mutate(across(all_of(cols), as.integer))
# In older versions of dplyr (before 1.0.0), use `mutate_at`
#df %>% mutate_at(vars(all_of(cols)), as.integer)
# depth table price x y z
# <int> <int> <int> <int> <int> <dbl>
#1 61 55 326 3 3 2.43
#2 59 61 326 3 3 2.31
#3 56 65 327 4 4 2.31
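Using all_of is not strictly required, but it is good practice whenever the column names are stored in an external character vector (such as cols) rather than written as bare column names. It makes the intent unambiguous, and recent versions of tidyselect emit a deprecation warning when an external vector is used without it.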
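A self-contained sketch of the same call, assuming a small made-up data frame (the values are illustrative) and a hypothetical cols vector naming the columns to convert:

```r
library(dplyr)

# Hypothetical data; only depth and table should become integer
df <- tibble(depth = c(61.5, 59.8), table = c(55, 61), z = c(2.43, 2.31))

# Column names held in an external character vector
cols <- c("depth", "table")

out <- df %>% mutate(across(all_of(cols), as.integer))

str(out)
```

all_of raises an error if any name in cols is missing from the data; any_of is the lenient counterpart that silently skips missing names.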
dplyr: access column name in mutate_at function
Instead of using mutate_at, why not use mutate combined with across and cur_column, i.e.:
df %>%
  mutate(across(c(carb, disp),
                ~ . - pull(df, paste0(cur_column(), "_new")),
                .names = "{.col}_corrected"))
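To try this out without the original data, here is a sketch on a made-up tibble. The carb_new and disp_new columns and all values are hypothetical; the point is only that cur_column() returns the name of the column currently being processed, which is then used to look up its "_new" partner.

```r
library(dplyr)

# Hypothetical data: each column has a partner column with the suffix "_new"
df <- tibble(carb = c(4, 1, 2), disp = c(160, 108, 258),
             carb_new = c(1, 1, 1), disp_new = c(10, 8, 58))

out <- df %>%
  mutate(across(c(carb, disp),
                ~ .x - pull(df, paste0(cur_column(), "_new")),
                .names = "{.col}_corrected"))

print(out)
```

Note that pull(df, ...) reads from the original df, so this form assumes the data is not grouped or reordered earlier in the pipe.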
dplyr::mutate_at() with external variables and conditional on their values
Here's an approach with the dplyr across functionality (version >= 1.0.0):
library(dplyr)
ex_dat %>%
group_by(ID) %>%
summarize(across(-one_of(c("start_dt","end_dt","diagnosis_dt")),
~ if_else(any(diagnosis_dt > start_dt & diagnosis_dt < end_dt & .),
1, 0)))
## A tibble: 3 x 4
# ID disease1 disease2 disease3
# <fct> <dbl> <dbl> <dbl>
#1 a 0 1 0
#2 b 1 0 1
#3 c NA NA NA
Note that using the & operator on the integer column . converts it to logical. I'm using the -one_of tidyselect helper because then we don't even need to know how many disease columns there are. The columns that are being grouped by are automatically excluded.
Your version isn't working because 1) you need to summarize, not mutate, and 2) inside the function call, . refers to the column that is being worked on, not the data coming through the pipe. Instead, you need to access those columns by bare name, without $.
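A runnable sketch of the same pattern on made-up data follows. The ex_dat below is hypothetical (two IDs, two disease flags), and it selects the date columns with -c(...) rather than -one_of, which is just a variation.

```r
library(dplyr)

# Hypothetical data: one row per diagnosis, with 0/1 flags per disease
ex_dat <- tibble(
  ID           = c("a", "a", "b"),
  start_dt     = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  end_dt       = as.Date(c("2020-12-31", "2020-12-31", "2020-12-31")),
  diagnosis_dt = as.Date(c("2020-06-01", "2021-06-01", "2020-03-01")),
  disease1     = c(0L, 1L, 1L),
  disease2     = c(1L, 0L, 0L)
)

out <- ex_dat %>%
  group_by(ID) %>%
  summarize(across(-c(start_dt, end_dt, diagnosis_dt),
                   # .x is the current disease column; the date columns are
                   # referred to by bare name
                   ~ if_else(any(diagnosis_dt > start_dt &
                                 diagnosis_dt < end_dt & .x), 1, 0)))

print(out)
```

For ID a, the disease1 diagnosis falls outside the window, so only disease2 is flagged; for ID b, disease1 is flagged.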