Remove rows conditionally from a data.table in R
In this scenario it is not so different from a data.frame:
data <- data[ menuitem != 'coffee' | amount > 0]
Deleting or adding rows by reference has not been implemented yet. You can find more info in this question.
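To make the filter above concrete, here is a minimal sketch on a toy table. The values are made up for illustration; only the column names menuitem and amount come from the answer.

```r
library(data.table)

# Toy data (hypothetical values, only for illustration)
data <- data.table(
  menuitem = c("coffee", "coffee", "tea", "cake"),
  amount   = c(0, 3, 0, 2)
)

# Keep rows that are not coffee, or that have a positive amount;
# the only row dropped is the coffee row with amount 0
data <- data[menuitem != "coffee" | amount > 0]
print(data)
```

Note that the result is assigned back to data, so this builds a new table rather than deleting rows by reference.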
Regarding speed:
1. You can benefit from keys by doing something like:
setkey(data, menuitem)
data <- data[!"coffee"]
which will be faster than data <- data[menuitem != 'coffee']. However, to apply the same filters you asked about in the question you'll need a rolling join (I've finished my lunch break, I can add something later :-)).
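A small sketch of the keyed not-join, assuming a toy table with made-up values. It also shows the on= form, which should give the same rows without setting a key first.

```r
library(data.table)

# Toy data (hypothetical values)
data <- data.table(
  menuitem = c("coffee", "tea", "coffee", "cake"),
  amount   = c(1, 2, 0, 5)
)

# setkey() sorts the table by menuitem and marks that column as the key
setkey(data, menuitem)

# Not-join against the key: every row whose key is not "coffee"
kept_keyed <- data[!"coffee"]

# Same not-join, naming the column explicitly instead of relying on the key
kept_on <- data[!"coffee", on = "menuitem"]

print(kept_keyed)
```

Both forms filter on a single column only; the amount > 0 part of the original condition is not covered here.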
2. Even without a key, data.table is much faster for relatively big tables (the speed is similar for a handful of rows):
library(data.table)
library(microbenchmark)
# One million rows with a random letter as id
dt <- data.table(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
df <- data.frame(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
microbenchmark(dt[id == "a"], df[df$id == "a", ])
Unit: milliseconds
               expr       min        lq    median        uq       max neval
      dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100
 df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100
Remove rows by reference to column values in data.table R
I assume you're learning data.table. Thus a data.table way is:
setkey(sample, STUDY_LOCATION)
sample[!c('Malaysia', 'South Africa', 'Singapore')]
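For reference, a sketch of the same idea without setting a key. The table name studies, the variable excluded and the row values are assumptions made for this example (studies is used instead of sample so that base::sample() is not masked).

```r
library(data.table)

# Hypothetical table; only the STUDY_LOCATION column name comes from the answer
studies <- data.table(
  STUDY_LOCATION = c("Malaysia", "Kenya", "Singapore", "Brazil", "South Africa"),
  n              = 1:5
)
excluded <- c("Malaysia", "South Africa", "Singapore")

# Not-join on the named column, no key required
kept <- studies[!excluded, on = "STUDY_LOCATION"]

# Equivalent plain logical filter
kept2 <- studies[!STUDY_LOCATION %in% excluded]

print(kept)
```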
Remove rows in data.table according to another data.table
Use an anti-join:
dtA[!dtB, on=.(date, company, value)]
This matches all records in dtA that are not found in dtB using the columns in on.
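A minimal worked example of the anti-join, with made-up rows. Only the names dtA, dtB and the three columns come from the answer.

```r
library(data.table)

# Hypothetical data for illustration
dtA <- data.table(
  date    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  company = c("A", "B", "C"),
  value   = c(10, 20, 30)
)
dtB <- data.table(
  date    = as.Date("2020-01-02"),
  company = "B",
  value   = 20
)

# Rows of dtA with no matching (date, company, value) combination in dtB
only_in_A <- dtA[!dtB, on = .(date, company, value)]
print(only_in_A)
```

A row is dropped only when all three columns match, so a row in dtB with the same date and company but a different value would leave dtA untouched.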
How to delete a row from a Datatable
You get "Cannot read property 'row' of undefined" because table is not initialised yet. The API is simply not yet passed back to the variable reference at the time createdRow is executed.
But inside DT's callbacks you always have this, which actually is the dataTable (jQuery) instance of the DataTable API. So:
createdRow: function ( row, data, index ) {
if ( data[1] == data[2] && data[2] == data[3] ) {
$(row).addClass('table-success')
this.api().row(row).remove() //<-----
}
}
How do you delete a column by name in data.table?
Any of the following will remove column foo from the data.table df3:
# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]
df3[, c("foo","bar"):=NULL] # remove two columns
myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents
# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]
# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]
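As a quick check of Method 1, here is a sketch on a toy df3. The columns bar and baz and all values are made up for the example.

```r
library(data.table)

# Toy table (hypothetical contents)
df3 <- data.table(foo = 1:3, bar = letters[1:3], baz = c(TRUE, FALSE, TRUE))

# Remove a column by reference; no assignment back to df3 is needed
df3[, foo := NULL]

# Remove a column whose name is stored in a variable
myVar <- "bar"
df3[, (myVar) := NULL]

remaining <- names(df3)
print(remaining)
```

The parentheses around myVar matter: without them, := would look for a column literally called myVar.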
data.table also supports the following syntax:
# Method 3 (could then assign the result back to df3)
df3[, !"foo"]
though if you actually wanted to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.
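A short sketch of that difference, assuming a reasonably recent data.table version and a toy df3 with made-up columns: Method 3 returns a new table and leaves df3 as it was.

```r
library(data.table)

# Toy table (hypothetical contents)
df3 <- data.table(foo = 1:3, bar = 4:6)

# Method 3 returns a table without foo ...
without_foo <- df3[, !"foo"]

# ... but df3 itself is unchanged until the result is assigned back
names_view     <- names(without_foo)
names_original <- names(df3)
print(names_view)
print(names_original)
```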
(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
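The effect of anchoring the pattern can be seen in base R alone. The column names below are the ones mentioned in the note, plus a hypothetical "bar".

```r
# Column names to test the patterns against
cols <- c("foo", "fool", "buffoon", "bar")

# Unanchored pattern: matches every name that contains "foo"
loose <- grep("foo", cols, value = TRUE)

# Anchored pattern: matches only the name that is exactly "foo"
exact <- grep("^foo$", cols, value = TRUE)

print(loose)
print(exact)
```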
Less safe options, fine for interactive use:
The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.
As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you are wanting to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.
# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]
Lastly there are approaches using with=FALSE, though data.table is gradually moving away from using this argument so it's now discouraged where you can avoid it; showing here so you know the option exists in case you really do need it:
# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
Is there a way to specify the current row in R data.table
Using match:
df2[, parent_value := value[match(parent_ID, ID)]]
   ID parent_ID value parent_value
1:  1         0    S1         <NA>
2:  4         3    S4           S3
3:  5         3    S5           S3
4:  3         1    S3           S1
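To reproduce the output above, df2 can be rebuilt from the printed rows. The column types are an assumption (numeric IDs, character values), since the original definition of df2 is not shown.

```r
library(data.table)

# df2 reconstructed from the printed result; column types are assumed
df2 <- data.table(
  ID        = c(1, 4, 5, 3),
  parent_ID = c(0, 3, 3, 1),
  value     = c("S1", "S4", "S5", "S3")
)

# match() returns, for each parent_ID, the row position of that ID
# (NA when no such ID exists); indexing value with it looks up the parent
df2[, parent_value := value[match(parent_ID, ID)]]
print(df2)
```

The first row gets NA because no row has ID 0.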