Datatable, How to Conditionally Delete Rows

Remove rows conditionally from a data.table in R

In this scenario data.table is not much different from data.frame:

data <- data[ menuitem != 'coffee' | amount > 0] 
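To see which rows that filter actually keeps, here is a minimal sketch on a made-up table. The values are assumptions for illustration; only the column names `menuitem` and `amount` come from the answer above. A row is kept when it is not coffee or when its amount is positive, so the only rows dropped are coffee rows with a non-positive amount.

```r
library(data.table)

# Hypothetical sample data; only the column names come from the answer above
data <- data.table(
  menuitem = c("coffee", "coffee", "tea", "tea"),
  amount   = c(0, 3, 0, 2)
)

# Keep rows that are not coffee, or that have a positive amount
kept <- data[menuitem != "coffee" | amount > 0]

# Equivalent form: negate the condition that describes the rows to delete
kept2 <- data[!(menuitem == "coffee" & amount <= 0)]
```

Writing the filter as "not (rows to delete)" is often easier to read when the deletion rule is the thing you have in mind.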

Deleting or adding rows by reference is yet to be implemented in data.table, so the filtered result has to be assigned back as above. You can find more info in this question.

Regarding speed:

1. You can benefit from keys by doing something like:

setkey(data, menuitem)
data <- data[!"coffee"]

which will be faster than data <- data[menuitem != 'coffee']. However, to apply the same filters you asked about in the question you'll need a rolling join (I've finished my lunch break, I can add something later :-)).
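As a rough sketch of the keyed approach, the example below builds a small made-up table (the values are assumptions, not from the question) and checks that the not-join on the key returns the same rows as the plain `!=` filter. It only illustrates the syntax and makes no claim about timing.

```r
library(data.table)

# Hypothetical sample data for illustration
data <- data.table(
  menuitem = c("tea", "coffee", "juice", "coffee"),
  amount   = c(2, 0, 1, 3)
)

setkey(data, menuitem)                  # sorts by menuitem and marks it as the key

by_key  <- data[!"coffee"]              # not-join on the key column
by_scan <- data[menuitem != "coffee"]   # ordinary vector scan, same rows
```

Note that `setkey` reorders the table by reference, so rows come back sorted by `menuitem` in both results.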

2. Even without a key, data.table is much faster on a relatively big table (the speed is similar for a handful of rows):

library(data.table)
library(microbenchmark)
# One million rows in each structure
dt <- data.table(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
df <- data.frame(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
> microbenchmark(dt[id == "a"], df[df$id == "a", ])
Unit: milliseconds
               expr       min        lq    median        uq       max neval
      dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100
 df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100

DataTable, How to conditionally delete rows

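You could query the table and then loop over the selected rows to mark them as deleted.

// Select returns the rows matching the filter expression
var rows = dt.Select("col1 > 5");
foreach (var row in rows)
{
    row.Delete(); // marks the row as deleted
}
dt.AcceptChanges(); // commits the change, removing the rows marked as deleted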

... and you could also create some extension methods to make it easier ...

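myTable.Delete("col1 > 5");

// Extension methods must live in a static class; the class name here is arbitrary
public static class DataTableExtensions
{
    public static DataTable Delete(this DataTable table, string filter)
    {
        // Select returns DataRow[], so the IEnumerable<DataRow> overload below applies
        table.Select(filter).Delete();
        return table;
    }

    public static void Delete(this IEnumerable<DataRow> rows)
    {
        // Marks each row as deleted; call AcceptChanges() on the table to commit
        foreach (var row in rows)
            row.Delete();
    }
}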

Delete rows conditionally in data table

We may use %chin% (or %in%) with negation (!):

library(data.table)
exclude <- c("A", "C", "E")
dt[!customerID %chin% exclude]

-output

   customerID    V1     V2
       <char> <int> <char>
1:          B    42     GS
2:          B    43     XC
3:          B    46     XZ
4:          D    34     XZ
5:          D    19     RF
6:          F    44     ZS
7:          G    23     AA

== and != are elementwise operators that work best when the lhs and rhs have the same length, or when the rhs has length 1 (which is recycled). Otherwise the recycling compares row by row and gives undesirable results, i.e. the first element of 'exclude' is compared to the first element of customerID, the 2nd element to the 2nd element, ..., then the 1st element again to the 4th element of customerID, and so on.
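The recycling behaviour described above can be seen on plain vectors. This is a small sketch using the first six customerID values from the data below and base R's %in% (%chin% is data.table's character-only equivalent):

```r
customerID <- c("A", "A", "B", "B", "B", "C")
exclude    <- c("A", "C", "E")

# != recycles 'exclude' as A, C, E, A, C, E and compares position by position
elementwise <- customerID != exclude

# %in% asks, for each customerID, whether it appears anywhere in 'exclude'
membership <- !customerID %in% exclude

print(elementwise)  # FALSE TRUE TRUE TRUE TRUE TRUE
print(membership)   # FALSE FALSE TRUE TRUE TRUE FALSE
```

The elementwise comparison wrongly keeps the second "A" and the "C", because those positions happened to line up with a different element of 'exclude'.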

data

dt <- structure(list(customerID = c("A", "A", "B", "B", "B", "C", "C", 
"D", "D", "E", "E", "F", "G"), V1 = c(24L, 56L, 42L, 43L, 46L,
42L, 25L, 34L, 19L, 19L, 37L, 44L, 23L), V2 = c("RT", "ES", "GS",
"XC", "XZ", "GE", "WD", "XZ", "RF", "DW", "XS", "ZS", "AA")),
class = c("data.table",
"data.frame"), row.names = c(NA, -13L))

Remove rows from data.table that meet condition

You can do an anti join:

mDT = DT[(condition), !"condition"][, rbind(.SD, rev(.SD), use.names = FALSE)]
DT[!mDT, on=names(mDT)]

#    col1 col2 condition
# 1:    c    c     FALSE
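The original table is not shown here, so the following is only a sketch of the same idea on assumed data: `DT` below is hypothetical, chosen so that one flagged pair (a, b) also appears reversed as (b, a). Instead of `rev(.SD)`, the reversed pairs are built by naming the columns explicitly, which is a substitution made for readability rather than the answer's exact code.

```r
library(data.table)

# Hypothetical input: the row flagged TRUE and its reversed pair should both go
DT <- data.table(
  col1      = c("a", "b", "c"),
  col2      = c("b", "a", "c"),
  condition = c(TRUE, FALSE, FALSE)
)

flagged <- DT[(condition), .(col1, col2)]            # pairs that meet the condition
swapped <- flagged[, .(col1 = col2, col2 = col1)]    # the same pairs, reversed
mDT     <- rbind(flagged, swapped)                   # everything to remove

# Anti join: keep only rows of DT with no match in mDT
result <- DT[!mDT, on = c("col1", "col2")]
```

With this input only the (c, c) row survives, since (a, b) is flagged and (b, a) is its reverse.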

Remove Row from DataTable Depending on Condition

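Using LINQ you can create a new DataTable that keeps only the rows you want. Note that CopyToDataTable throws an InvalidOperationException when no rows pass the filter:

// Keep only rows whose IDCOLUMN value is not in ListLinkedIds
DataTable newDataTable = dt.AsEnumerable()
    .Where(r => !ListLinkedIds.Contains(r.Field<string>("IDCOLUMN")))
    .CopyToDataTable();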

