Remove multiple columns from data.table
This looks like a solid, reproducible bug. It's been filed as Bug #2791.
It appears that naming the same column twice makes data.table try to delete the columns that follow it as well.
If no columns remain, then R crashes.
UPDATE: Now fixed in v1.8.11. From NEWS:
Assigning to the same column twice in the same query is now an error rather than a crash in some circumstances; e.g., DT[,c("B","B"):=NULL] (delete by reference the same column twice). Thanks to Ricardo (#2751) and matt_k (#2791) for reporting. Tests added.
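A defensive sketch, not part of the original answer: if the vector of names is built programmatically and might contain repeats, passing it through unique() first means each column is deleted only once. That avoids the old crash as well as the newer error. The table DT and the vector cols are made up for illustration.

```r
library(data.table)

DT <- data.table(A = 1:3, B = 4:6, C = 7:9)

# A column vector that (accidentally) names "B" twice
cols <- c("B", "B")

# unique() collapses the repeat, so "B" is deleted exactly once, by reference
DT[, (unique(cols)) := NULL]
```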
data.table - delete columns programmatically
We can wrap the variable in parentheses and then assign (:=) NULL to it (the preferred way):
DT[, (cols.to.del) := NULL]
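A minimal, self-contained version of the line above. The table and its column names are invented for the example; only the `(cols.to.del) := NULL` idiom comes from the answer.

```r
library(data.table)

DT <- data.table(id = 1:3, x = c(2.5, 3.1, 4.8), tmp1 = NA, tmp2 = "drop me")

# Character vector holding the names of the columns to remove
cols.to.del <- c("tmp1", "tmp2")

# The parentheses tell data.table to evaluate cols.to.del and use its contents
# as column names, instead of looking for a column literally called "cols.to.del"
DT[, (cols.to.del) := NULL]

print(names(DT))
```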
Another option, if we don't want to wrap it in parentheses, is to loop over cols.to.del with a for loop and assign NULL to each column:
for(j in seq_along(cols.to.del)){
DT[, cols.to.del[j] := NULL]
}
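The same loop can also be written with data.table's set(), swapped in here as an alternative to calling `[.data.table` on every iteration. Assigning value = NULL removes the column by reference. The toy table is an assumption for the sketch.

```r
library(data.table)

DT <- data.table(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
cols.to.del <- c("b", "d")

# set() is the low-overhead, loopable counterpart of :=;
# value = NULL deletes column j in place, without copying DT
for (col in cols.to.del) {
  set(DT, j = col, value = NULL)
}
```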
Or, to subset the columns (returning a new data.table rather than deleting from DT), we can use setdiff along with with=FALSE:
DT[, setdiff(names(DT), cols.to.del), with=FALSE]
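A short sketch showing that the setdiff approach returns a copy and leaves DT untouched. The table and the name kept are illustrative.

```r
library(data.table)

DT <- data.table(a = 1:2, b = 3:4, c = 5:6)
cols.to.del <- "b"

# with = FALSE makes j a vector of column names to select;
# the result is a new data.table, and DT itself still has all three columns
kept <- DT[, setdiff(names(DT), cols.to.del), with = FALSE]
```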
How to remove two columns with the same name in data table R
You can use indices instead.
cols_to_delete = c(1, 3)
# OR
# cols_to_delete <- which(duplicated(names(my_dt)) | duplicated(names(my_dt),fromLast = TRUE))
my_dt[, (cols_to_delete) := NULL]
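A runnable sketch of the commented-out alternative, assuming a made-up table in which the name "a" appears twice. It relies on data.table() keeping duplicated names as given.

```r
library(data.table)

# Two columns share the name "a"
my_dt <- data.table(a = 1:2, b = 3:4, a = 5:6)

# Positions of every column whose name occurs more than once (here 1 and 3)
cols_to_delete <- which(duplicated(names(my_dt)) |
                        duplicated(names(my_dt), fromLast = TRUE))

# Deleting by position sidesteps any ambiguity about which "a" is meant
my_dt[, (cols_to_delete) := NULL]
```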
Delete column in data.table based on condition (row wise)
How about this?
dt[,dt[1]!="-", with=FALSE]
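The idea above keeps only the columns whose first-row value is not "-". Below is a sketch on an invented table that first turns that test into a plain logical vector, then either selects the surviving columns (a copy) or deletes the others by reference.

```r
library(data.table)

dt <- data.table(a = c("x", "y"), b = c("-", "z"), c = c("w", "-"))

# TRUE for every column whose first-row value is not "-"
keep <- unlist(dt[1]) != "-"

# Select (copy) only those columns ...
view <- dt[, keep, with = FALSE]

# ... or delete the other columns from dt itself, by reference
dt[, which(!keep) := NULL]
```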
Removing columns from a data.table in R based on conditions
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
## your categorical columns
col1, col2, col3, col4, col5,
## new column pasted from non-0 numeric columns
new = as.numeric(paste0(unlist(.SD), collapse=""))
),
## this filters columns to be provided in .SD column subset
.SDcols = not0,
## we group by each row so it will handle input of multiple rows
by = .(row=seq_len(nrow(dt)))
][, row:=NULL ## this removes extra grouping column
][] ## this prints
# col1 col2 col3 col4 col5 new
#1: a b c d e 9799
Alternatively, if you want to update the existing table in place:
is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove numeric columns that contain only 0s
dt[, which(sapply(dt, is0)) := NULL]
## add new column
dt[, new := as.numeric(
paste0(unlist(.SD), collapse="")
), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
][]
# col1 col2 col3 col4 col5 col6 col8 col10 new
#1: a b c d e 9 7 99 9799
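A possible variation, swapping in a different technique from the answer's by-row grouping: paste0() is already vectorised, so do.call(paste0, .SD) joins the numeric columns element-wise for all rows in one call. Like the code above, this assumes a data.table version that accepts a function in .SDcols; the shortened example table is made up.

```r
library(data.table)

dt <- data.table(col1 = "a", col2 = "b", col6 = 9, col7 = 0, col8 = 7,
                 col9 = 0, col10 = 99)

is0 <- function(x) is.numeric(x) && !anyNA(x) && all(x == 0)

# Drop the all-zero numeric columns in place (same idea as above)
dt[, which(vapply(dt, is0, logical(1))) := NULL]

# Join the remaining numeric columns row by row without grouping:
# paste0(col6, col8, col10) is evaluated element-wise across all rows
dt[, new := as.numeric(do.call(paste0, .SD)), .SDcols = is.numeric]
```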
How do you delete a column by name in data.table?
Any of the following will remove column foo from the data.table df3:
# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]
df3[, c("foo","bar"):=NULL] # remove two columns
myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents
# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]
# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]
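A runnable sketch of Methods 1 and 2a on a toy df3. The extra columns bar, fool and baz are invented so the effect of each call is visible.

```r
library(data.table)

df3 <- data.table(foo = 1:2, bar = 3:4, fool = 5:6, baz = 7:8)

# Method 1: a bare name on the left of := is taken literally as the column name
df3[, foo := NULL]

# Parentheses make data.table use the value of myVar ("baz") instead
myVar <- "baz"
df3[, (myVar) := NULL]

# Method 2a: the anchored regex matches "bar" exactly, so "fool" survives
df3[, grep("^bar$", colnames(df3)) := NULL]
```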
data.table also supports the following syntax:
## Method 3 (could then assign to df3)
df3[, !"foo"]
Note, though, that if you actually want to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo"), you'd really want to use Method 1 instead.
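A small sketch of that difference, using a made-up two-column df3. The check is stored as a logical because data.table may update a previously captured names() vector in place.

```r
library(data.table)

df3 <- data.table(foo = 1:2, bar = 3:4)

# Method 3 builds a new data.table without "foo"; df3 is not modified
without_foo <- df3[, !"foo"]
foo_still_there <- "foo" %in% names(df3)

# Method 1 removes the column from df3 itself, by reference
df3[, foo := NULL]
```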
(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo" if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
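A quick base-R check of the anchoring point. The names are invented to mirror the ones mentioned above.

```r
cols <- c("foo", "fool", "buffoon", "bar")

# Unanchored: matches any name containing "foo" as a substring
loose <- grep("foo", cols)      # 1 2 3

# Anchored: ^ and $ force the pattern to match the whole name
strict <- grep("^foo$", cols)   # 1
```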
Less safe options, fine for interactive use:
The next idiom will also work if df3 contains a column matching "foo", but will fail in a probably unexpected way if it does not. If, for instance, you use it to search for the non-existent column "bar", you'll end up with a zero-row data.table.
As a consequence, such idioms are best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you want to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.
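For the programming case, one way to avoid surprises is to compute the matching positions first and skip the deletion entirely when nothing matches. drop_matching() below is a hypothetical helper written for illustration; it is not part of data.table.

```r
library(data.table)

# Hypothetical helper (not a data.table function): deletes, by reference,
# every column whose name matches `pattern`, and does nothing if none match
drop_matching <- function(DT, pattern) {
  idx <- grep(pattern, names(DT))
  if (length(idx) > 0L) {
    DT[, (idx) := NULL]
  }
  invisible(DT)
}

df3 <- data.table(foo = 1:2, bar = 3:4)
drop_matching(df3, "^foo$")   # removes "foo" from df3 itself
drop_matching(df3, "^qux$")   # no match, so df3 is left as it is
```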
# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]
Lastly, there are approaches using with=FALSE, though data.table is gradually moving away from this argument, so it is now discouraged where you can avoid it. They are shown here so you know the option exists in case you really do need it:
# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
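As a sketch of what can often replace with=FALSE, data.table's `..` prefix (available since v1.10.2) looks a variable up in the calling scope and treats its contents as column names. The table and the variable keep are made up for the example.

```r
library(data.table)

df3 <- data.table(foo = 1:2, bar = 3:4, baz = 5:6)

# Names to keep, computed outside the query
keep <- setdiff(names(df3), "foo")

# "..keep" means "the columns named in the variable keep"; returns a copy
out <- df3[, ..keep]
```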