How to Delete a Row by Reference in Data.Table

Remove rows conditionally from a data.table in R

In this scenario, data.table is not so different from data.frame:

data <- data[ menuitem != 'coffee' | amount > 0] 

Deleting or adding rows by reference has not been implemented yet. You can find more information in the linked question.
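To make the filter above concrete, here is a minimal sketch. The table contents are made up for illustration; only the column names menuitem and amount come from the snippet above. It shows that the subset is a new object, and the original only "loses" rows once you assign the result back.

```r
library(data.table)

# Hypothetical example data; only the column names come from the answer.
data <- data.table(
  menuitem = c("coffee", "coffee", "tea", "cake"),
  amount   = c(0, 3, 0, 2)
)

# Keep rows that are not coffee, or that have a positive amount.
# This creates a new data.table; `data` itself is not modified.
filtered <- data[menuitem != "coffee" | amount > 0]

nrow(data)      # still all four rows
nrow(filtered)  # the coffee row with amount 0 has been dropped

# Re-binding the name is what "removes" the rows in practice:
# data <- filtered
```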

Regarding speed:

1. You can benefit from keys by doing something like:

setkey(data, menuitem)
data <- data[!"coffee"]

which will be faster than data <- data[ menuitem != 'coffee']. However, to apply the same filters you asked about in the question, you'll need a rolling join (I've finished my lunch break; I can add something later :-)).
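A small sketch of the keyed version, with invented rows, in case it helps to see it end to end. It only covers the menuitem part of the filter, not the amount condition.

```r
library(data.table)

# Hypothetical example data.
data <- data.table(
  menuitem = c("tea", "coffee", "cake", "coffee"),
  amount   = c(1, 2, 3, 4)
)

# setkey() sorts the table by menuitem and marks that column as the key.
setkey(data, menuitem)

# With a key set, a character value in i is matched against the key column;
# the leading ! turns the lookup into a "not-join" (everything except coffee).
no_coffee <- data[!"coffee"]

# The plain vector-scan version returns the same rows.
scan_version <- data[menuitem != "coffee"]
```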

2. Even without a key, data.table is much faster on relatively large tables (the speed is similar for a handful of rows):

> microbenchmark(dt[id == "a"], df[df$id == "a", ])
Unit: milliseconds
               expr       min        lq    median        uq       max neval
      dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100
 df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100

Remove rows by reference to column values in data.table (R)

I assume you're learning data.table, so a data.table way is:

setkey(sample, STUDY_LOCATION)
sample[!c('Malaysia', 'South Africa', 'Singapore')]
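Here is the same idea as a self-contained sketch. The table name studies and its rows are hypothetical (I avoided the name sample because it masks base R's sample() function); only the STUDY_LOCATION column comes from the answer. The %in% version is shown as an alternative that needs no key.

```r
library(data.table)

# Hypothetical example data.
studies <- data.table(
  STUDY_LOCATION = c("Malaysia", "Kenya", "Singapore", "Brazil", "South Africa"),
  n_subjects     = c(10L, 20L, 30L, 40L, 50L)
)

# Keyed not-join: drop every row whose key matches one of the listed values.
setkey(studies, STUDY_LOCATION)
kept_keyed <- studies[!c("Malaysia", "South Africa", "Singapore")]

# Equivalent vector scan, which works with or without a key.
excluded  <- c("Malaysia", "South Africa", "Singapore")
kept_scan <- studies[!STUDY_LOCATION %in% excluded]
```

Either way you get a new table; assign it back to the original name if you want the rows gone.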

Remove rows in data.table according to another data.table

Use an anti-join:

dtA[!dtB, on=.(date, company, value)]

This returns all rows of dtA that have no match in dtB on the columns listed in on.
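A minimal sketch of the anti-join with invented data. The column names date, company and value come from the answer; the rows are assumptions for illustration.

```r
library(data.table)

# Hypothetical example data.
dtA <- data.table(
  date    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  company = c("A", "B", "C"),
  value   = c(10L, 20L, 30L)
)
dtB <- data.table(
  date    = as.Date("2020-01-02"),
  company = "B",
  value   = 20L
)

# Anti-join: keep the rows of dtA with no matching (date, company, value) in dtB.
result <- dtA[!dtB, on = .(date, company, value)]

# dtA itself is unchanged; assign the result back if you want to drop the rows.
```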

How to delete a row from a Datatable

You get "Cannot read property 'row' of undefined" because table is not yet initialised. The API has simply not been passed back to the variable reference at the time createdRow is executed.

But inside DataTables callbacks you always have this, which is the dataTable (jQuery) instance, and this.api() gives you the DataTables API. So:

createdRow: function ( row, data, index ) {
    // Remove the row when data[1], data[2] and data[3] are all equal
    if ( data[1] == data[2] && data[2] == data[3] ) {
        this.api().row(row).remove() //<-----
    }
}

How do you delete a column by name in data.table?

Any of the following will remove column foo from the data.table df3:

# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)

df3[, c("foo","bar"):=NULL] # remove two columns

myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents

# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]

# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]

data.table also supports the following syntax:

# Method 3 (you could then assign the result back to df3)
df3[, !"foo"]

though if you were actually wanting to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
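To illustrate the point about anchoring the regex, here is a small sketch with a made-up df3 that has both a foo and a fool column. With the pattern "^foo$" only the exact match is removed, and the removal happens by reference (no assignment back to df3 is needed).

```r
library(data.table)

# Hypothetical example table with a near-miss column name.
df3 <- data.table(foo = 1:3, fool = 4:6, bar = 7:9)

# Method 2a with an anchored pattern: only the column named exactly "foo" goes.
df3[, grep("^foo$", colnames(df3)) := NULL]
names_after_anchored <- names(df3)

# For contrast, see what an unanchored pattern would have matched
# (computed on the original names, nothing is deleted here).
unanchored_hits <- grep("foo", c("foo", "fool", "bar"), value = TRUE)
```

After this runs, df3 keeps fool and bar, while the unanchored pattern would have matched both foo and fool.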

Less safe options, fine for interactive use:

The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.

As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you are wanting to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.

# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]

Lastly there are approaches using with=FALSE, though data.table is gradually moving away from using this argument so it's now discouraged where you can avoid it; showing here so you know the option exists in case you really do need it:

# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]

Is there a way to specify the current row in R data.table

Using match:

df2[, parent_value := value[match(parent_ID, ID)]]
   ID parent_ID value parent_value
1:  1         0    S1         <NA>
2:  4         3    S4           S3
3:  5         3    S5           S3
4:  3         1    S3           S1
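If you want to run this yourself, the following sketch rebuilds df2 from the values in the printed output above (the column types are my assumption) and spells out what match() is doing.

```r
library(data.table)

# df2 reconstructed from the printed output; column types are assumed.
df2 <- data.table(
  ID        = c(1L, 4L, 5L, 3L),
  parent_ID = c(0L, 3L, 3L, 1L),
  value     = c("S1", "S4", "S5", "S3")
)

# match() gives, for each parent_ID, the row position where ID equals it
# (NA when there is no such row, as for parent_ID 0).
parent_row <- match(df2$parent_ID, df2$ID)

# Indexing value by those positions looks up each row's parent value;
# := adds the result as a new column by reference.
df2[, parent_value := value[match(parent_ID, ID)]]
```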
