data.table objects assigned with := from within function not printed
As David Arenburg mentions in a comment, the answer can be found here. A bug was fixed in version 1.9.6, but the fix introduced this downside.
One should call DT[] at the end of the function to prevent this behaviour.
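library(data.table)
mydt <- data.table(x = 1:3, y = 5:7)  # example input, inferred from the output below
myfunction <- function(dt) {
  dt[, z := y - x][]  # trailing [] makes the result print
}
myfunction(mydt) # prints immediately
#    x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4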
This is described in data.table FAQ 2.23:
Why do I have to type DT sometimes twice after using := to print the result to console?
This is an unfortunate downside to get #869 to work. If a := is used inside a function with no DT[] before the end of the function, then the next time DT is typed at the prompt, nothing will be printed. A repeated DT will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then print(DT) and DT[] at the prompt are guaranteed to print. As before, adding an extra [] on the end of := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][].
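The idiom from the FAQ entry can be sketched as follows. This is only an illustration: the helper names add_z_quiet and add_z_visible and the example table are assumptions, not part of the FAQ, and whether the quiet variant suppresses the next auto-print depends on the data.table version and on running interactively.

```r
library(data.table)

# Hypothetical helper: := is the last call and there is no trailing [].
# Per the FAQ, the next auto-print of the table at the prompt may be skipped.
add_z_quiet <- function(dt) {
  dt[, z := y - x]
}

# Hypothetical helper: same update, followed by the [] the FAQ recommends.
add_z_visible <- function(dt) {
  dt[, z := y - x][]
}

DT <- data.table(x = 1:3, y = 5:7)   # example data, assumed for illustration

add_z_quiet(DT)   # the column is still added by reference
DT[]              # per the FAQ, DT[] at the prompt is guaranteed to print

DT2 <- data.table(x = 1:3, y = 5:7)
res <- add_z_visible(DT2)            # returns the updated table
```

Both helpers modify the table by reference, so the only difference is whether the result is displayed; the trailing [] does not change the data.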
data.table is not displayed on first call after being modified in a function
If you are using v1.9.6, see the corresponding Readme (sec. Bugfixes, 1st entry, https://github.com/Rdatatable/data.table):
if (TRUE) DT[,LHS:=RHS] no longer prints, #869 and #1122. Tests added. To get this to work we've had to live with one downside: if a := is used inside a function with no DT[] before the end of the function, then the next time DT or print(DT) is typed at the prompt, nothing will be printed. A repeated DT or print(DT) will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then DT[] at the prompt is guaranteed to print. As before, adding an extra [] on the end of a := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][]. Thanks to Jureiss and Jan Gorecki for reporting.
Thus: Does calling DT[] after your function call help?
R function returns nothing instead of data.table object when data.table := is last operation
library(data.table)
data <- data.table(x = 1:3)
test_function_1 <- function(df) {
  df[, new_column := 1][]
}
test_function_2 <- function(df) {
  df[, new_column := 1][]
  return(df)
}
test_function_3 <- function(df) {
  df[, new_column := 1]
  data.table(df)
}
test_function_1(data) # returns the modified data.table
test_function_2(data) # returns the modified data.table
test_function_3(data) # returns the modified data.table
More info: here
An option to not suppress output after := assignment in data.table
One approach in 1.9.6 is to patch the print.data.table S3 method.
Before calling the original method, set the .global$print value to "" (the default). This undoes the change data.table made to that value just before the generic print method was called (using dynamic scoping rules), in the case where data.table would like to return invisibly (e.g., after a := assignment).
The effect is that the custom print method for data.table is still called, but data.table no longer tries to modify R's default logic to decide when and when not to print.
Likely a naive solution, as I'm still learning about packages, namespaces, environments, S3 methods, etc.
library(data.table)
print.data.table.orig = get('print.data.table', envir=asNamespace('data.table'))
print.data.table.patch = function(x, ...) {
  .globalRef = get('.global', envir=asNamespace('data.table'))
  .globalRef$print = ""
  print.data.table.orig(x, ...)
}
library(R.methodsS3)
setMethodS3('print', 'data.table', print.data.table.patch)
> fTbl = data.table(x=1:500000)
> fTbl[, x := 5]
        x
     1: 5
     2: 5
     3: 5
     4: 5
     5: 5
    ---
499996: 5
499997: 5
499998: 5
499999: 5
500000: 5
> fTbl
        x
     1: 5
     2: 5
     3: 5
     4: 5
     5: 5
    ---
499996: 5
499997: 5
499998: 5
499999: 5
500000: 5
Manipulate data.table objects within user defined function
This problem is a different flavor of the one described in the post Function on data.table environment errors. It's not exactly a problem, just how dget is designed. But for those curious, this happens because dget assigns the object to the parent environment base, and the namespace base isn't data.table aware.
If x is a function the associated environment is stripped. Hence scoping information can be lost.
One workaround is to assign the function to the global environment:
> environment(foo) <- .GlobalEnv
But I think the best solution here is to use saveRDS to transfer R objects, which is what ?dget recommends.
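A minimal sketch of the saveRDS route, assuming a hypothetical function foo defined at top level and a temporary file path chosen only for illustration:

```r
library(data.table)

# Hypothetical function standing in for the one being transferred.
foo <- function(dt) {
  dt[, z := x * 2L][]
}

path <- tempfile(fileext = ".rds")   # illustrative location only
saveRDS(foo, path)                   # serialises the function object itself
foo_restored <- readRDS(path)        # read it back, e.g. in another session

DT <- data.table(x = 1:3)            # example data, assumed for illustration
foo_restored(DT)                     # := works inside the restored function
```

Unlike the dput/dget round trip, this does not go through a text representation of the function, so the environment(foo) <- .GlobalEnv workaround should not be needed.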
Data.table is not returned visibly after applying ':='
We need to specify the [] after the assignment:
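library(data.table)
dt <- data.table(a = 1, b = 2)  # example input, inferred from the output below
testf <- function(dt) {
  dt[, t := seq_len(nrow(dt))][]  # seq_len() is the idiomatic form of seq(1:nrow(dt))
}
testf(dt)
#    a b t
# 1: 1 2 1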