How to change type of target column when doing := by group in a data.table in R?
We can convert the class of 'x' column to 'numeric' before assigning the 'mean(y)' to 'x' as the class of 'x' is 'integer'. This may be useful if we are replacing 'x' with the mean
of any other numeric variable (including 'x').
db[, x:= as.numeric(x)][, x:= mean(y), by=id][]
Or assign to a new column, and change the column name afterwards
setnames(db[, x1:= mean(y),by=id][,x:=NULL],'x1', 'x')
Or we can assign 'x' to 'NULL' and then create 'x' as the mean
of 'y' ( @David Arenburg's suggestion)
db[, x:=NULL][, x:= mean(y), by= id][]
How do you change the type of target column when using .() group by in R data.table
Apparently all you have to do is add as.numeric()
inside of median()
.
b <- a[, .(Count = .N, "Failure Count" = sum(CLASS == "2"),
"Median DIF" = median(as.numeric(TYPE))), by = ID]
R error: Coercing 'list' RHS to 'double' to match the type of the target column
Here is my attempt. I did not handle the cases where there is NA in date. What you can do is to calculate the gaps between 1) index date and date 1, and 2) index date and date 2. Using the gaps, you can run logical checks. Based on that, you can assign targeted dates and values.
I am not the right person who can explain the error message well. But, I think you are facing coercion issues. See page 16 in the CRAN manual (version 1.12.8), where you can find information on :=
(assignment by reference). If anybody can provide technical explanation, please do so.
setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = date_1,
no = date_2),
taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = res_1,
no = res_2))][]
# index_date date_1 date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25 1.5 1.5 2015-08-25 1.5
# 2: 2017-09-11 2016-09-29 2017-05-12 2.7 2.4 2017-05-12 2.4
# 3: 2015-08-17 2014-08-08 2015-06-08 2.0 2.6 2015-06-08 2.6
# 4: 2017-05-14 2016-05-31 2016-12-19 1.3 1.2 2016-12-19 1.2
# 5: 2015-11-14 2014-11-11 2015-08-10 1.6 2.8 2015-08-10 2.8
# 6: 2016-08-08 <NA> 2016-08-08 NA 1.4 <NA> NA
# 7: 2018-12-01 2014-05-30 2017-07-24 1.7 1.8 2017-07-24 1.8
# 8: 2013-01-11 <NA> 2012-10-23 NA 3.7 <NA> NA
# 9: 2015-06-06 <NA> 2015-02-07 NA 1.3 <NA> NA
#10: 2015-05-19 <NA> 2015-05-19 NA 1.4 <NA> NA
Although you have not clearly mentioned what you wanna do with rows with NA, it seems to me that you were trying to do something like this.
setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = date_1,
no = date_2),
taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = res_1,
no = res_2))][is.na(date_1),
`:=` (taken_date = date_2, taken_res = res_2)][is.na(date_2),
`:=` (taken_date = date_1, taken_res = res_1)]
# index_date date_1 date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25 1.5 1.5 2015-08-25 1.5
# 2: 2017-09-11 2016-09-29 2017-05-12 2.7 2.4 2017-05-12 2.4
# 3: 2015-08-17 2014-08-08 2015-06-08 2.0 2.6 2015-06-08 2.6
# 4: 2017-05-14 2016-05-31 2016-12-19 1.3 1.2 2016-12-19 1.2
# 5: 2015-11-14 2014-11-11 2015-08-10 1.6 2.8 2015-08-10 2.8
# 6: 2016-08-08 <NA> 2016-08-08 NA 1.4 2016-08-08 1.4
# 7: 2018-12-01 2014-05-30 2017-07-24 1.7 1.8 2017-07-24 1.8
# 8: 2013-01-11 <NA> 2012-10-23 NA 3.7 2012-10-23 3.7
# 9: 2015-06-06 <NA> 2015-02-07 NA 1.3 2015-02-07 1.3
#10: 2015-05-19 <NA> 2015-05-19 NA 1.4 2015-05-19 1.4
DATA
mydt <- structure(list(index_date = structure(c(16672, 17420, 16664,
17300, 16753, 17021, 17866, 15716, 16592, 16574), class = "Date"),
date_1 = structure(c(16022, 17073, 16290, 16952, 16385, NA,
16220, NA, NA, NA), class = "Date"), date_2 = structure(c(16672,
17298, 16594, 17154, 16657, 17021, 17371, 15636, 16473, 16574
), class = "Date"), res_1 = c(1.5, 2.7, 2, 1.3, 1.6, NA,
1.7, NA, NA, NA), res_2 = c(1.5, 2.4, 2.6, 1.2, 2.8, 1.4,
1.8, 3.7, 1.3, 1.4)), row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10"), class = "data.frame")
Change value of column based on criteria and by group
We return 1 for the rows from the first occurrence where flag = 1
and the group has at least one flag = 1
library(data.table)
dt[,flag := +(seq_len(.N)>= which.max(flag == 1) & any(flag == 1)),by = group]
dt
# year month flag group
# 1: 1992 6 1 8
# 2: 1992 7 1 8
# 3: 1992 8 1 8
# 4: 1992 9 1 8
# 5: 1992 10 1 8
# 6: 1992 11 1 8
# 7: 1992 12 1 8
# 8: 1995 6 0 10
# 9: 1995 7 0 11
#10: 1995 8 0 11
#11: 1995 9 1 11
#12: 1995 10 1 11
#13: 1995 11 1 11
#14: 1995 12 1 11
#15: 1998 6 0 13
#16: 1998 7 0 13
#17: 1998 8 0 13
#18: 1998 9 0 13
#19: 1998 10 0 13
#20: 1998 11 0 13
#21: 1998 12 0 13
# year month flag group
Which in dplyr
would be
library(dplyr)
dt %>%
group_by(group) %>%
mutate(flag = +(row_number() >= which.max(flag == 1) & any(flag == 1)))
and in base R using ave
would be
dt$flag <- with(dt, +(ave(flag == 1, group, FUN = function(x)
seq_along(x) >= which.max(x) & any(x))))
data
dt <- structure(list(year = c(1992, 1992, 1992, 1992, 1992, 1992, 1992,
1992, 1992, 1992, 1992, 1992, 1995, 1995, 1995, 1995, 1995, 1995,
1995, 1995, 1995, 1995, 1995, 1995, 1998, 1998, 1998, 1998, 1998,
1998, 1998, 1998, 1998, 1998, 1998, 1998), month = c(1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), flag = c(0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), group = c(8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L)), row.names = c(NA, -36L), class =
c("data.table","data.frame"))
R mutate a column by group in ifelse
with data.table
:
df[,result:=fifelse(is.na(category),last(stopId),first(stopId)),by=id][]
id stopId category result
1: 1 a 1 a
2: 1 b 1 a
3: 1 c 1 a
4: 2 a NA c
5: 2 b NA c
6: 2 c NA c
7: 3 a 2 a
8: 3 b 2 a
9: 3 c 2 a
Column type set by first element being evaluated in r/data.table
Do not let your function return NA
, but NA_integer_
, or NA_real_
..
problem solved ;-)
myfun <- function(x) {
if(x == 0) {
return(NA_integer_) #<-- !!
} else {
return(x*2)
}
}
multiple data.table columns to one column of vectors
Try this?
> asplit(unname(tab[, V1:V3]), 1)
[[1]]
"a" "d" "g"
[[2]]
"b" "e" "h"
[[3]]
"c" "f" "i"
Related Topics
Use Filter in Dplyr Conditional on an If Statement in R
Predict.Lm() in a Loop. Warning: Prediction from a Rank-Deficient Fit May Be Misleading
How to Label a Barplot Bar with Positive and Negative Bars with Ggplot2
How to Write to JSON with Children from R
If Else Condition in Ggplot to Add an Extra Layer
Populating a Data Frame in R in a Loop
Ggplot2 Bar Plot, No Space Between Bottom of Geom and X Axis Keep Space Above
Simple Approach to Assigning Clusters for New Data After K-Means Clustering
How to Extract the Fill Colours from a Ggplot Object
How to Get a Barplot with Several Variables Side by Side Grouped by a Factor
How to Use Multiple Versions of the Same R Package
R Knitr Markdown: Output Plots Within for Loop
Pattern Matching Using a Wildcard
R Scatter Plot: Symbol Color Represents Number of Overlapping Points
Dplyr Issues When Using Group_By(Multiple Variables)
Extract Hours and Seconds from Posixct for Plotting Purposes in R