R data.table struggling with conditional subsetting when column name is predefined elsewhere
I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:
DT[get(column_name) == 1,]
x y
1: 1 0
2: 1 1
If you need to use column_name
in J
, you can use get(..column_name)
:
DT[,get(..column_name)]
[1] 1 1 0 0
The ..
instructs evaluation to occur in the parent environment.
Another approach for using a string in either I
or J
is with eval(as.name(column_name))
:
DT[eval(as.name(column_name)) == 1]
x y
1: 1 0
2: 1 1
DT[,eval(as.name(column_name))]
[1] 1 1 0 0
Why is my effort in transposing a data table not working?
instead of t() which is good for matrices() use transpose from data.table.
Making changes only where required, its works for me now without any warning error.
with t() now commented our and next row with the only change in code :
data = if(input$transposeDT=='Rows'){
#as.data.frame(t(results()))
results()%>%transpose(make.names = 'to_state',keep.names = 'to_state')
} else {results()},
You should see the transposition working by toggling the radioButton.
From this Default :
To this transposed. [I could not understand your logic but transposing is a reali life problem.
How to create means and s.d. columns with data.table (based on multiple conditions)
As far as I can tell you want the sum of con
per id
, for data where ds
is between -365 and 0, then the mean, median and standard deviation of those sums. This can be done by chaining two expressions.
DT[ds >= -365 & ds < 0, sum(con), by = "id"][, .(mean(V1), median(V1), sd(V1))]
This first subsets your data, then sums con
per id
, then takes the mean etc. of the sums with a second expression
R data.table: efficiently access and update a variable column name in j expression with grouping
You can use .SDcols
to specify the columns that you want to operate on :
library(data.table)
columns <- c("A", "B", "C")
newcolumns <- paste0(columns, ".prime")
DT[, (newcolumns) := lapply(.SD, function(x) (x- mean(x))/sd(x)),
year(date), .SDcols = columns]
This avoids using get(x)
everytime and updates data.table
by reference.
Error subsetting data table with [] but not with $-operator
First let's understand why it doesn't work in your case. When you are doing
df[1:110,2] > 10
# Number
# [1,] FALSE
# [2,] FALSE
# [3,] FALSE
# [4,] FALSE
# [5,] FALSE
# [6,] FALSE
# [7,] FALSE
#....
it returns a 1 column matrix which is used for subsetting.
class(df[1:110,2] > 10)
#[1] "matrix"
which works fine on dataframe
df1 <- data.frame(df)
subset(df1[1:110,], df1[1:110,2] > 10)
# Day Number
#11 2014-01-11 11
#12 2014-01-12 12
#13 2014-01-13 13
#14 2014-01-14 14
#15 2014-01-15 15
#....
but not on data.table
. Unfortunately subsetting doesn't work that way in data.table
. You could convert it into a vector instead of matrix and then use it for subsetting
subset(df[1:110,], df[1:110][[2]] > 10)
# Day Number
# 1: 2014-01-11 11
# 2: 2014-01-12 12
# 3: 2014-01-13 13
# 4: 2014-01-14 14
# 5: 2014-01-15 15
#...
The difference would be more clear when you see the results of
df[matrix(TRUE), ]
vs
df1[matrix(TRUE), ]
PS - in the first case doing
subset(df[1:110,], Number > 10)
would also have worked.
Overriding data.table key order causes incorrect merge results
The problem is that with dplyr::arrange(x, b)
you do not remove the sorted
attribute from your data.table
contrary to using x <- x[order(b)]
or setorder(x, "b")
.
The data.table
way would be to use setorder
in the first place e.g.
library(data.table)
x <- data.table(a = sample(1000:1100), b = sample(c("A", NA, "B", "C", "D"), replace = TRUE), c = letters)
setorder(x, "b", "a", na.last=TRUE)
The wrong results of joins on data.table
s which have a key
although they are not sorted by it, is a known bug (see also #5361 in data.table
bug tracker).
Pass a data.table column name to a function using :=
set
is the data.table-idiomatic way. If you need to do other stuff like use by
, rlang
has a generic way to delay evaluation, which is to enexpr
the args (or enquo if you want them evaluated in the original environment) and !!
them inside an inject
with the expression you'd normally use.
library(rlang)
#> Warning: package 'rlang' was built under R version 4.1.2
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following object is masked from 'package:rlang':
#>
#> :=
dat <- data.table(x = 1:4,
y = 5:8)
new_column <- function(df, col_name, expr) {
col_name <- enexpr(col_name)
expr <- enexpr(expr)
inject(df[, !!col_name := !!expr])
}
new_column(dat, z, x + y)
dat
#> x y z
#> <int> <int> <int>
#> 1: 1 5 6
#> 2: 2 6 8
#> 3: 3 7 10
#> 4: 4 8 12
Created on 2022-02-25 by the reprex package (v2.0.1)
Or, similarly without rlang
library(data.table)
dat <- data.table(x = 1:4,
y = 5:8)
new_column <- function(df, col_name, expr) {
col_name <- deparse(substitute(col_name))
expr <- substitute(expr)
df[, (col_name) := eval(expr)]
}
new_column(dat, z, x + y)
dat
#> x y z
#> <int> <int> <int>
#> 1: 1 5 6
#> 2: 2 6 8
#> 3: 3 7 10
#> 4: 4 8 12
Created on 2022-02-25 by the reprex package (v2.0.1)
Using the "programming on the language" interface in the dev version.
https://rdatatable.gitlab.io/data.table/news/index.html
library(data.table)
dat <- data.table(x = 1:4,
y = 5:8)
new_column <- function(df, col_name, expr) {
df[, col_name := expr,
env = list(col_name = substitute(col_name),
expr = substitute(expr))]
}
new_column(dat, z, x + y)
dat
#> x y z
#> <int> <int> <int>
#> 1: 1 5 6
#> 2: 2 6 8
#> 3: 3 7 10
#> 4: 4 8 12
Created on 2022-03-01 by the reprex package (v2.0.1)
Or using match.call
library(data.table)
dat <- data.table(x = 1:4,
y = 5:8)
new_column <- function(df, col_name, expr) {
fun_call <- match.call()
df[, col_name := expr,
env = as.list(fun_call)[c('col_name', 'expr')]]
}
new_column(dat, z, x + y)
dat
#> x y z
#> <int> <int> <int>
#> 1: 1 5 6
#> 2: 2 6 8
#> 3: 3 7 10
#> 4: 4 8 12
Created on 2022-03-01 by the reprex package (v2.0.1)
Related Topics
R Data.Table Join: SQL "Select *" Alike Syntax in Joined Tables
Removing a Group of Words from a Character Vector
Ggplot2: Cannot Color Area Between Intersecting Lines Using Geom_Ribbon
Shiny - How to Change the Font Size in Select Tags
How to Install 2 Different R Versions on Debian
How to Prevent Rplots.Pdf from Being Generated
Making Gsub Only Replace Entire Words
Perform Operation on Each Imputed Dataset in R's Mice
R - How to One Hot Encoding a Single Column While Keep Other Columns Still
Error in Bind_Rows_(X, .Id):Column Can't Be Converted from Factor to Numeric
How to Write Data from R to Postgresql Tables with an Autoincrementing Primary Key
How to Apply Separate Coord_Cartesian() to "Zoom In" into Individual Panels of a Facet_Grid()
Why Is Date Is Being Returned as Type 'Double'
How to Run a Function Every Second
Shiny App Does Not Reflect Changes in Update Rdata File