How to Filter Cases in a Data.Table by Multiple Conditions Defined in Another Data.Table

setkey(dt1, A)

dt1[dt_filter, allow.cartesian = TRUE][B != i.B, !'i.B']
#    A B C
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 3 1
# 4: 1 9 2
# 5: 2 1 1
# 6: 2 1 2
# 7: 2 4 1
# 8: 2 5 2

Filtering a data.table using two variables, an elegant fast way

In this case, you can do

data[filter, on=names(filter), nomatch=0]

See "Perform a semi-join with data.table" for similar filtering joins.
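
The same filtering idea can be written out without data.table. The following Python sketch is only an illustration: semi_join, data, and flt are hypothetical names and values, not taken from the question. It keeps each row of data whose values on the filter's columns match some filter row. Unlike the join above, which returns rows in the order of filter and repeats matches when filter contains duplicate rows, this sketch preserves the original row order and never duplicates rows.

```python
def semi_join(rows, filter_rows):
    """Keep rows that match some filter row on every filter column."""
    if not filter_rows:
        return []
    cols = list(filter_rows[0])  # the filter's columns define the match
    wanted = {tuple(f[c] for c in cols) for f in filter_rows}
    return [r for r in rows if tuple(r[c] for c in cols) in wanted]


# Hypothetical example data
data = [
    {"x": 1, "y": "a", "v": 10},
    {"x": 1, "y": "b", "v": 20},
    {"x": 2, "y": "a", "v": 30},
    {"x": 2, "y": "b", "v": 40},
]
flt = [{"x": 1, "y": "b"}, {"x": 2, "y": "a"}]

kept = semi_join(data, flt)
print([r["v"] for r in kept])
```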

Datatable select with multiple conditions

Yes, the DataTable.Select method supports boolean operators in the same way that you would use them in a "real" SQL statement:

DataRow[] results = table.Select("A = 'foo' AND B = 'bar' AND C = 'baz'");

See DataColumn.Expression in MSDN for the syntax supported by DataTable's Select method.

How to make multiple conditions inside filter datatable jQuery

According to your JSFiddle, each piece of your code works fine on its own. The multiple conditions fail because the logic is wrong.

Let me explain.

  • The active/inactive condition works on the table's current data.

  • First case: the table holds all the data (active and inactive). If you
    filter by "Office -> Regional Director", you find 3 records. If you
    then trigger the active/inactive event, it works correctly.

  • Second case: you have hidden the inactive data, so the table holds
    only active data. If you then filter by "Office -> Regional Director",
    you find only 2 records. The inactive record cannot be shown because
    the table's current data no longer contains it.

Solution:

  • When calling the filter function, reload all the table data first, then run the filter on the full data.
  • Once the data is filtered, check the active/inactive condition.
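
The order matters because every filter has to run against the full data, not against rows that survived an earlier filter. Here is a minimal sketch of that logic in Python. The names (visible_rows, office, active) are hypothetical, and the made-up rows only mirror the record counts mentioned above.

```python
def visible_rows(all_rows, office=None, active_only=False):
    """Always start from the full data, then apply each condition in turn."""
    rows = all_rows
    if office is not None:
        rows = [r for r in rows if r["office"] == office]
    if active_only:
        rows = [r for r in rows if r["active"]]
    return rows


# Hypothetical data: three Regional Directors, one of them inactive
all_rows = [
    {"name": "r1", "office": "Regional Director", "active": True},
    {"name": "r2", "office": "Regional Director", "active": True},
    {"name": "r3", "office": "Regional Director", "active": False},
    {"name": "r4", "office": "Accountant", "active": True},
]

print(len(visible_rows(all_rows, office="Regional Director")))
print(len(visible_rows(all_rows, office="Regional Director", active_only=True)))
```

Because the function never mutates or narrows all_rows, switching the active-only condition off again brings the inactive record back.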

Example code, based on the code you provided. I hope it makes this clear.

let tableId = "example";
$('.filtering-system').hide();
$('.button-container').hide();

addSpliting('');

function showAllVaues() {
    // a page length of -1 shows all rows on a single page
    $('#' + tableId + '').dataTable().api().page.len(-1).draw();
}

function addSpliting(val, length) {
    if (val != '') {
        // start reload: before filtering, reload all data
        var table = $('#' + tableId + '').DataTable();
        $.fn.dataTable.ext.search.pop();
        table.draw();
        // end reload
        $('#' + tableId + '').DataTable({
            destroy: true,
            searchPanes: {
                layout: 'columns-' + length + ''
            },
            columnDefs: [{
                searchPanes: {
                    show: true
                },
                targets: '_all'
            }],

            dom: 'Pfrtip'
        });

        $('.dtsp-searchPanes').children().each(function (i, obj) {
            if (!val.includes(i)) $(this).hide();
            else $(this).show();
        });
        // start check: once the data is filtered, check active or not
        if ($(".fa").hasClass("fa-eye")) {
            drawTable(table, '.fa-inactive, .fa-invisible-data', 0);
        }
        // end check
    } else {
        $('#' + tableId + '').DataTable({
            destroy: true
        });
    }
}

function setFilters() {
    $("table thead tr th").each(function (index) {
        var boxes = `<label>
            <input type="checkbox" class="custom-checkbox" id="${index}"/>
            ${$(this).text()}
            </label>`;
        if ($(this).text() != "" && $(this).hasClass('checkbox-mark') === false) $(".checkBoxes").append(boxes);
    });
}

setFilters();

$("#createFilter").on("click", function () {
    var columFilters = [];

    $('.custom-checkbox:checked').each(function () {
        columFilters.push(parseInt($(this).attr('id')));
    });

    addSpliting(columFilters, columFilters.length);
});

$("#fil-sys").on("click", function () {
    $('.filtering-system').slideToggle('slow');
});

function drawTable(table, className, lenth) {
    // keep only rows containing exactly `lenth` elements that match className
    $.fn.dataTable.ext.search.push(
        function (settings, data, dataIndex) {
            return $(table.row(dataIndex).node()).find(className).length == lenth;
        }
    );
    table.draw();
}

$(document).ready(function () {
    var table = $('#' + tableId + '').DataTable();
    if ($(".fa").hasClass("fa-eye")) {
        drawTable(table, '.fa-inactive, .fa-invisible-data', 0);
    }

    $(".fa-eye-slash").click(function () {
        $(".fa-visibles").removeClass('hide');
        $(".fa-invisibles").addClass('hide');
        $.fn.dataTable.ext.search.pop();
        table.draw();
    });

    $(".fa-eye").click(function () {
        $(".fa-visibles").addClass('hide');
        $(".fa-invisibles").removeClass('hide');
        drawTable(table, '.fa-inactive, .fa-invisible-data', 0);
    });
});

Select rows in a data.table given by a filter in an other data.table

New solution

I realised that going through each line of the bigger data.table takes too much time, so I built a new function foo_new which works the other way around:

foo_new <- function(data, i.A, i.C){
  data[C %in% i.C & A %between% i.A, INDEX2]
}

Instead of matching every row of DT2 with a row of DT1, I select every row in DT2 which matches the values of one row of DT1.
DT1 is ordered by descending TARGET because I need the match with the highest TARGET value. Also, once a row in DT2 has been selected, it is removed for the next iteration.

The whole process is sped up a lot:

function    user  system  elapsed
foo       61.511   0.327   62.052
foo_new    0.045   0.003    0.047

This is probably only the case when DT1 is smaller than DT2, which is my case.
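
To make the direction of the loop explicit, here is a rough Python sketch of the same strategy. It is not a translation of the R code: assign_targets and the small rules and rows lists are hypothetical, and the range check is inclusive on both ends, as %between% is by default. The rules play the role of DT1 and the rows play the role of DT2.

```python
def assign_targets(rules, rows):
    """Visit rules from highest to lowest TARGET; each row keeps its first hit."""
    targets = [None] * len(rows)
    for rule in sorted(rules, key=lambda r: r["TARGET"], reverse=True):
        days = set(rule["C"].split(","))
        for i, row in enumerate(rows):
            if targets[i] is not None:
                continue  # already matched by a rule with a higher TARGET
            if row["C"] in days and rule["minA"] <= row["A"] <= rule["maxA"]:
                targets[i] = rule["TARGET"]
    return targets


# Hypothetical example data
rules = [
    {"minA": 1, "maxA": 4, "C": "Mon,Tue", "TARGET": 101},
    {"minA": 3, "maxA": 5, "C": "Mon,Wed", "TARGET": 102},
    {"minA": 1, "maxA": 2, "C": "Thu", "TARGET": 103},
]
rows = [
    {"A": 3, "C": "Mon"},
    {"A": 4, "C": "Thu"},
    {"A": 1, "C": "Thu"},
    {"A": 2, "C": "Tue"},
]

print(assign_targets(rules, rows))
```

The outer loop runs once per rule, so the number of passes depends on the size of the smaller table rather than the larger one.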


Here is my whole simulation code:

rm(list = ls())
library(data.table)

DT1 <- data.table(INDEX1 = 1:12,
                  minA = c(1,1,1,2,2,2,3,3,3,4,4,4),
                  maxA = c(4,5,6),
                  C = c("Mon,Tue", "Mon,Wed", "Tue,Thu", "Wed,Thu"),
                  TARGET = c(101:112))

size <- 20000
DT2 <- data.table(A = rep(c(3,4), size),
                  C = rep(c("Mon", "Thu"), size),
                  INDEX2 = 1:(2*size))

foo <- function(i.A, i.C){
  DT1[INDEX1 %in% grep(i.C, C) &
        minA <= i.A &
        maxA >= i.A,
      ][TARGET == max(TARGET),]
}

foo_new <- function(data, i.A, i.C){
  data[C %in% i.C & A %between% i.A, INDEX2]
}

# with foo
DT2[, foo(i.A = A, i.C = C), by = INDEX2]

# with foo_new
DT1.ordered <- copy(DT1[order(TARGET, decreasing = TRUE)])
tmp.index <- list()
DT2[, TARGET := as.numeric(NA)]
for (i in c(1:dim(DT1.ordered)[1])) {
  # i <- 1
  restdata <- copy(DT2[is.na(TARGET),])
  tmp.index <- foo_new(data = restdata,
                       i.A = unlist(DT1.ordered[i, list(minA, maxA)]),
                       i.C = DT1.ordered[i, strsplit(C, ",")[[1]]])
  DT2[INDEX2 %in% tmp.index, TARGET := DT1.ordered[i, TARGET]]
}

Performance benefits of chaining over ANDing when filtering a data table

Mostly, the answer was already given in the comments: the "chaining method" for data.table is faster in this case than the "anding method" because chaining runs the conditions one after another. As each step reduces the size of the data.table, there is less to evaluate for the next one. "Anding" evaluates the conditions for the full-size data each time.

We can demonstrate this with an example in which the individual steps do NOT decrease the size of the data.table (i.e. both approaches evaluate every condition on the same number of rows):

# Chaining method
chain_filter <- function(){
  dt[a %between% c(1, 1000) # runs evaluation but does not filter out cases
     ][b %between% c(1, 1000)
       ][c %between% c(750, 760)]
}

# Anding method
and_filter <- function(){
  dt[a %between% c(1, 1000) & b %between% c(1, 1000) & c %between% c(750, 760)]
}

Using the same data but the bench package, which automatically checks if results are identical:

res <- bench::mark(
  chain = chain_filter(),
  and = and_filter()
)
summary(res)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 chain         299ms    307ms      3.26     691MB     9.78
#> 2 and           123ms    142ms      7.18     231MB     5.39
summary(res, relative = TRUE)
#> # A tibble: 2 x 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 chain       2.43   2.16      1         2.99     1.82
#> 2 and         1      1         2.20      1        1

As you can see here the anding approach is 2.43 times faster in this case. That means chaining actually adds some overhead, suggesting that usually anding should be quicker. EXCEPT if conditions are reducing the size of the data.table step by step. Theoretically, the chaining approach could even be slower (even leaving the overhead aside), namely if a condition would increase the size of the data. But practically I think that's not possible since recycling of logical vectors is not allowed in data.table. I think this answers your bonus question.
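
The effect can be illustrated by counting how many condition checks each approach performs. The Python sketch below is only a toy model under stated assumptions: and_filter and chain_filter here are hypothetical stand-ins that count evaluations, not the R functions benchmarked above, and they ignore the copying overhead that chaining incurs in practice.

```python
def and_filter(rows, preds):
    """Evaluate every predicate on every row, then combine the masks."""
    evals = 0
    masks = []
    for p in preds:
        masks.append([p(r) for r in rows])
        evals += len(rows)
    kept = [r for i, r in enumerate(rows) if all(m[i] for m in masks)]
    return kept, evals


def chain_filter(rows, preds):
    """Apply predicates one after another to the shrinking result."""
    evals = 0
    for p in preds:
        evals += len(rows)
        rows = [r for r in rows if p(r)]
    return rows, evals


rows = list(range(1000))
reducing = [lambda x: x < 100, lambda x: x % 2 == 0, lambda x: x > 50]
non_reducing = [lambda x: x >= 0] * 3

print(and_filter(rows, reducing)[1], chain_filter(rows, reducing)[1])
print(and_filter(rows, non_reducing)[1], chain_filter(rows, non_reducing)[1])
```

When each step shrinks the data, chaining performs far fewer checks. When no step removes any rows, both perform the same number, so only the chaining overhead remains.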

For comparison, original functions on my machine with bench:

res <- bench::mark(
  chain = chain_filter_original(),
  and = and_filter_original()
)
summary(res)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 chain        29.6ms   30.2ms     28.5     79.5MB     7.60
#> 2 and         125.5ms  136.7ms      7.32   228.9MB     7.32
summary(res, relative = TRUE)
#> # A tibble: 2 x 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 chain       1      1         3.89      1        1.04
#> 2 and         4.25   4.52      1         2.88     1

Filter table with multiple conditions with shiny

You are using ==, which is intended for specific equality. That is:

1 == 1
# [1] TRUE
1 == 2
# [1] FALSE

But when you start doing multiples on one side or the other:

1 == c(1,2)
# [1] TRUE FALSE

In this case, it says that "1 is equal to 1, 1 is not equal to 2", which is not quite what you want. It gets worse when the comparison yields a single TRUE, where the one TRUE is recycled over all rows, returning everything.

Let's try %in%:

1 %in% 1
# [1] TRUE
1 %in% c(1,2)
# [1] TRUE
c(1,3,4) %in% c(1,2)
# [1] TRUE FALSE FALSE

This has the desired properties of set-membership and returning a logical vector of the same length as the left-hand side of the operator.
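
The difference between the two operators can be mimicked in Python to show why == goes wrong on a column. This is only a sketch: r_equals and r_in are hypothetical helpers that imitate R's elementwise comparison with recycling and its set-membership test, and the generation values are made up.

```python
def r_equals(lhs, rhs):
    """Elementwise equality, recycling the shorter vector as R does."""
    if not lhs or not rhs:
        return []
    n = max(len(lhs), len(rhs))
    return [lhs[i % len(lhs)] == rhs[i % len(rhs)] for i in range(n)]


def r_in(lhs, rhs):
    """Membership test: one result per element of the left-hand side."""
    table = set(rhs)
    return [x in table for x in lhs]


generation = [1, 2, 1, 2]  # hypothetical column values
selected = [2, 1]          # hypothetical multi-select input

print(r_equals(generation, selected))
print(r_in(generation, selected))
```

With these values the elementwise comparison lines up 1 against 2 and 2 against 1 on every row, so no row is selected, while the membership test selects all of them.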

Applying to your use, just change the one reactive component to:

output$Generation <- renderDataTable({
  pokemon[pokemon$Generation %in% input$Generation, -11]
})

How to subset data.table by external function with arbitrary conditions

One possible solution (note the use of == rather than =, as in the post):

foo = function(dt, ...) {
  eval(substitute(dt[Reduce(`&`, list(...)),]))
}

foo(dt,Var2==1,Var3==1)

    Var1  Var2  Var3
   <int> <int> <int>
1:     1     1     1
2:     2     1     1
The best way to search in DataTable on multiple conditions in C#?

One possibility would be to use the DataTable class's built-in filtering. You can define a dynamic filter and apply it to the DataTable object. The dynamic filter language is something like a subset of SQL; it has LIKE and other SQL keywords. An example of filtering code:

var dt = new DataTable("test");
dt.Columns.Add("A", typeof(string));
dt.Columns.Add("B", typeof(string));
dt.Rows.Add(new object[] { "a", "1" });
dt.Rows.Add(new object[] { "a", "2" });
var rows = dt.Select("B = '2'");

This way you can define the filter and apply it to both tables and compare only the result set and not every entry. The result is an array of Rows.

I used it in a project, that has DataTable objects containing more than 2K entries each and the performance is really good.

Another possibility would be to use LINQ to filter the data. You can query the DataTable's rows like this:

// cast to string so that == compares values, not object references
var rows = (from DataRow dr in dt.Rows
            where (string)dr["B"] == "2"
            select dr).ToList();

This query returns the same result as the direct filtering. You can again apply the same approach here and check only the matching results.

If I understood your question correctly, a possible solution to your problem could look like this:

// test DataTable objects for the example
var dt1 = new DataTable("Table 1");
dt1.Columns.Add("title", typeof(string));
dt1.Columns.Add("number", typeof(int));
dt1.Columns.Add("subnum1", typeof(int));
dt1.Columns.Add("subnum2", typeof(int));
dt1.Rows.Add(new object[] { "a", 1111, 1, 1 }); // Exact match!
dt1.Rows.Add(new object[] { "b", 2222, 1, 1 }); // Only NUMBER match
dt1.Rows.Add(new object[] { "b", 2222, 2, 2 }); // Only NUMBER match
dt1.Rows.Add(new object[] { "d", 3333, 1, 1 }); // Exact match!
dt1.Rows.Add(new object[] { "d", 3333, 1, 2 });
dt1.Rows.Add(new object[] { "d", 3333, 2, 1 });

var dt2 = new DataTable("Table 2");
dt2.Columns.Add("number", typeof(int));
dt2.Columns.Add("subnum1", typeof(int));
dt2.Columns.Add("subnum2", typeof(int));
dt2.Rows.Add(new object[] { 1111, 1, 1 }); // Exact match!
dt2.Rows.Add(new object[] { 2222, 0, 5 }); // Only NUMBER match
dt2.Rows.Add(new object[] { 3333, 1, 1 }); // Exact match!
dt2.Rows.Add(new object[] { 3333, 0, 0 }); // Only NUMBER match

foreach (DataRow row in dt1.Rows)
{
    var matches = dt2.Select(string.Format("number = {0} and subnum1 = {1} and subnum2 = {2}", row["number"], row["subnum1"], row["subnum2"]));
    if (matches.Count() > 0)
    {
        Console.WriteLine(row["title"]);
    }
    else
    {
        var fallback = dt2.Select(string.Format("number = {0}", row["number"]));
        if (fallback.Count() > 0)
        {
            Console.WriteLine(" > " + row["title"]);
        }
    }
}

The output in this case is:

a
 > b
 > b
d
 > d
 > d

What values should be written to the output is up to you. At the point where the match is found, you have everything you need.
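
The same two-level match (exact first, then number only) can also be done with precomputed lookup sets instead of one Select call per row. The Python sketch below reuses the sample data from the example. The function name classify is hypothetical, and this is an alternative illustration, not the approach used in the C# code.

```python
def classify(rows1, rows2):
    """Return titles with an exact match, or prefixed with ' > ' on a number-only match."""
    exact = {(r["number"], r["subnum1"], r["subnum2"]) for r in rows2}
    numbers = {r["number"] for r in rows2}
    out = []
    for r in rows1:
        if (r["number"], r["subnum1"], r["subnum2"]) in exact:
            out.append(r["title"])
        elif r["number"] in numbers:
            out.append(" > " + r["title"])
    return out


def row1(title, number, s1, s2):
    return {"title": title, "number": number, "subnum1": s1, "subnum2": s2}


def row2(number, s1, s2):
    return {"number": number, "subnum1": s1, "subnum2": s2}


table1 = [row1("a", 1111, 1, 1), row1("b", 2222, 1, 1), row1("b", 2222, 2, 2),
          row1("d", 3333, 1, 1), row1("d", 3333, 1, 2), row1("d", 3333, 2, 1)]
table2 = [row2(1111, 1, 1), row2(2222, 0, 5), row2(3333, 1, 1), row2(3333, 0, 0)]

for line in classify(table1, table2):
    print(line)
```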


