How to Change Type of Target Column When Doing := by Group in a Data.Table in R

How to change type of target column when doing := by group in a data.table in R?

We can convert the class of 'x' column to 'numeric' before assigning the 'mean(y)' to 'x' as the class of 'x' is 'integer'. This may be useful if we are replacing 'x' with the mean of any other numeric variable (including 'x').

db[, x:= as.numeric(x)][, x:= mean(y), by=id][]

Or assign to a new column, and change the column name afterwards

setnames(db[, x1:= mean(y),by=id][,x:=NULL],'x1', 'x')

Or we can assign 'x' to 'NULL' and then create 'x' as the mean of 'y' ( @David Arenburg's suggestion)

db[, x:=NULL][, x:= mean(y), by= id][]

How do you change the type of target column when using .() group by in R data.table

Apparently all you have to do is add as.numeric() inside of median().

b <- a[, .(Count = .N, "Failure Count" = sum(CLASS == "2"), 
"Median DIF" = median(as.numeric(TYPE))), by = ID]

R error: Coercing 'list' RHS to 'double' to match the type of the target column

Here is my attempt. I did not handle the cases where there is NA in date. What you can do is to calculate the gaps between 1) index date and date 1, and 2) index date and date 2. Using the gaps, you can run logical checks. Based on that, you can assign targeted dates and values.

I am not the right person who can explain the error message well. But, I think you are facing coercion issues. See page 16 in the CRAN manual (version 1.12.8), where you can find information on := (assignment by reference). If anybody can provide technical explanation, please do so.

setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = date_1,
no = date_2),
taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = res_1,
no = res_2))][]

# index_date date_1 date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25 1.5 1.5 2015-08-25 1.5
# 2: 2017-09-11 2016-09-29 2017-05-12 2.7 2.4 2017-05-12 2.4
# 3: 2015-08-17 2014-08-08 2015-06-08 2.0 2.6 2015-06-08 2.6
# 4: 2017-05-14 2016-05-31 2016-12-19 1.3 1.2 2016-12-19 1.2
# 5: 2015-11-14 2014-11-11 2015-08-10 1.6 2.8 2015-08-10 2.8
# 6: 2016-08-08 <NA> 2016-08-08 NA 1.4 <NA> NA
# 7: 2018-12-01 2014-05-30 2017-07-24 1.7 1.8 2017-07-24 1.8
# 8: 2013-01-11 <NA> 2012-10-23 NA 3.7 <NA> NA
# 9: 2015-06-06 <NA> 2015-02-07 NA 1.3 <NA> NA
#10: 2015-05-19 <NA> 2015-05-19 NA 1.4 <NA> NA

Although you have not clearly mentioned what you wanna do with rows with NA, it seems to me that you were trying to do something like this.

setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = date_1,
no = date_2),
taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = res_1,
no = res_2))][is.na(date_1),
`:=` (taken_date = date_2, taken_res = res_2)][is.na(date_2),
`:=` (taken_date = date_1, taken_res = res_1)]

# index_date date_1 date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25 1.5 1.5 2015-08-25 1.5
# 2: 2017-09-11 2016-09-29 2017-05-12 2.7 2.4 2017-05-12 2.4
# 3: 2015-08-17 2014-08-08 2015-06-08 2.0 2.6 2015-06-08 2.6
# 4: 2017-05-14 2016-05-31 2016-12-19 1.3 1.2 2016-12-19 1.2
# 5: 2015-11-14 2014-11-11 2015-08-10 1.6 2.8 2015-08-10 2.8
# 6: 2016-08-08 <NA> 2016-08-08 NA 1.4 2016-08-08 1.4
# 7: 2018-12-01 2014-05-30 2017-07-24 1.7 1.8 2017-07-24 1.8
# 8: 2013-01-11 <NA> 2012-10-23 NA 3.7 2012-10-23 3.7
# 9: 2015-06-06 <NA> 2015-02-07 NA 1.3 2015-02-07 1.3
#10: 2015-05-19 <NA> 2015-05-19 NA 1.4 2015-05-19 1.4

DATA

mydt <- structure(list(index_date = structure(c(16672, 17420, 16664, 
17300, 16753, 17021, 17866, 15716, 16592, 16574), class = "Date"),
date_1 = structure(c(16022, 17073, 16290, 16952, 16385, NA,
16220, NA, NA, NA), class = "Date"), date_2 = structure(c(16672,
17298, 16594, 17154, 16657, 17021, 17371, 15636, 16473, 16574
), class = "Date"), res_1 = c(1.5, 2.7, 2, 1.3, 1.6, NA,
1.7, NA, NA, NA), res_2 = c(1.5, 2.4, 2.6, 1.2, 2.8, 1.4,
1.8, 3.7, 1.3, 1.4)), row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10"), class = "data.frame")

Change value of column based on criteria and by group

We return 1 for the rows from the first occurrence where flag = 1 and the group has at least one flag = 1

library(data.table)
dt[,flag := +(seq_len(.N)>= which.max(flag == 1) & any(flag == 1)),by = group]

dt

# year month flag group
# 1: 1992 6 1 8
# 2: 1992 7 1 8
# 3: 1992 8 1 8
# 4: 1992 9 1 8
# 5: 1992 10 1 8
# 6: 1992 11 1 8
# 7: 1992 12 1 8
# 8: 1995 6 0 10
# 9: 1995 7 0 11
#10: 1995 8 0 11
#11: 1995 9 1 11
#12: 1995 10 1 11
#13: 1995 11 1 11
#14: 1995 12 1 11
#15: 1998 6 0 13
#16: 1998 7 0 13
#17: 1998 8 0 13
#18: 1998 9 0 13
#19: 1998 10 0 13
#20: 1998 11 0 13
#21: 1998 12 0 13
# year month flag group

Which in dplyr would be

library(dplyr)
dt %>%
group_by(group) %>%
mutate(flag = +(row_number() >= which.max(flag == 1) & any(flag == 1)))

and in base R using ave would be

dt$flag <- with(dt, +(ave(flag == 1, group, FUN = function(x) 
seq_along(x) >= which.max(x) & any(x))))

data

dt <- structure(list(year = c(1992, 1992, 1992, 1992, 1992, 1992, 1992, 
1992, 1992, 1992, 1992, 1992, 1995, 1995, 1995, 1995, 1995, 1995,
1995, 1995, 1995, 1995, 1995, 1995, 1998, 1998, 1998, 1998, 1998,
1998, 1998, 1998, 1998, 1998, 1998, 1998), month = c(1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), flag = c(0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), group = c(8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L)), row.names = c(NA, -36L), class =
c("data.table","data.frame"))

R mutate a column by group in ifelse

with data.table:

df[,result:=fifelse(is.na(category),last(stopId),first(stopId)),by=id][]

id stopId category result
1: 1 a 1 a
2: 1 b 1 a
3: 1 c 1 a
4: 2 a NA c
5: 2 b NA c
6: 2 c NA c
7: 3 a 2 a
8: 3 b 2 a
9: 3 c 2 a

Column type set by first element being evaluated in r/data.table

Do not let your function return NA, but NA_integer_, or NA_real_..
problem solved ;-)

myfun <- function(x) {
if(x == 0) {
return(NA_integer_) #<-- !!
} else {
return(x*2)
}
}

multiple data.table columns to one column of vectors

Try this?

> asplit(unname(tab[, V1:V3]), 1)
[[1]]
"a" "d" "g"

[[2]]
"b" "e" "h"

[[3]]
"c" "f" "i"


Related Topics



Leave a reply



Submit