How to filter cases in a data.table by multiple conditions defined in another data.table
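setkey(dt1, A)
# join on the key A (a dt1 row may match several dt_filter rows), keep the rows
# whose B differs from dt_filter's B (i.B), then drop the i.B column
dt1[dt_filter, allow.cartesian = TRUE][B != i.B, !'i.B']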
# A B C
#1: 1 1 1
#2: 1 1 2
#3: 1 3 1
#4: 1 9 2
#5: 2 1 1
#6: 2 1 2
#7: 2 4 1
#8: 2 5 2
Filtering a data.table using two variables, an elegant fast way
In this case, you can do
data[filter, on=names(filter), nomatch=0]
See "Perform a semi-join with data.table" for similar filtering joins.
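For readers who want to see what such a filtering (semi-)join does row by row, here is a minimal sketch in JavaScript over arrays of plain objects; it is not data.table code. The function name semiJoin and the JSON.stringify key encoding are assumptions made for illustration: a row of data is kept when its values in all of filter's columns match at least one filter row.

```javascript
// Semi-join sketch: keep the rows of `data` whose values in every column of
// `filter` match at least one row of `filter`.
function semiJoin(data, filter) {
  if (filter.length === 0) return [];
  const cols = Object.keys(filter[0]); // join columns = the filter's columns
  const keyOf = (row) => JSON.stringify(cols.map((c) => row[c]));
  const wanted = new Set(filter.map(keyOf)); // one hash lookup per data row
  return data.filter((row) => wanted.has(keyOf(row)));
}

const data = [
  { x: 1, y: "a", v: 10 },
  { x: 1, y: "b", v: 20 },
  { x: 2, y: "a", v: 30 },
];
const filter = [
  { x: 1, y: "b" },
  { x: 2, y: "a" },
];

console.log(semiJoin(data, filter).map((r) => r.v)); // [ 20, 30 ]
```

Unlike data[filter, on = names(filter), nomatch = 0], which returns rows in the order of filter and repeats them if filter contains duplicate rows, this sketch keeps data's row order and returns each row at most once.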
Datatable select with multiple conditions
Yes, the DataTable.Select method supports boolean operators in the same way that you would use them in a "real" SQL statement:
DataRow[] results = table.Select("A = 'foo' AND B = 'bar' AND C = 'baz'");
See DataColumn.Expression in MSDN for the syntax supported by DataTable's Select method.
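When the conditions come from user input, the filter string is usually assembled in code. The sketch below is written in JavaScript only so that it runs without .NET; the helper name buildSelectExpression is hypothetical, and it assumes simple column names (no spaces or brackets) and string values. The quote handling assumes the rule described in the DataColumn.Expression documentation, where a single quote inside a string literal is escaped by doubling it.

```javascript
// Build a Select-style filter such as "A = 'foo' AND B = 'bar'" from an
// object of column -> value pairs. Assumes plain column names and string values.
function buildSelectExpression(criteria) {
  return Object.entries(criteria)
    .map(([col, value]) => {
      // Double any single quote so it stays inside the quoted literal.
      const escaped = String(value).replace(/'/g, "''");
      return `${col} = '${escaped}'`;
    })
    .join(" AND ");
}

console.log(buildSelectExpression({ A: "foo", B: "bar", C: "baz" }));
// A = 'foo' AND B = 'bar' AND C = 'baz'
console.log(buildSelectExpression({ Name: "O'Brien" }));
// Name = 'O''Brien'
```

In C# the same idea would be a string.Join(" AND ", ...) over the individual conditions, with the result passed to table.Select.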
How to make multiple conditions inside filter datatable jQuery
According to your JSFiddle code, each part of your code works fine on its own. Multiple conditions do not work together because the logic is wrong.
Let me explain.
The active/inactive condition only works on the table's current data.
First, the table holds all the data (active and inactive). If you filter by "Office -> Regional Director", you find 3 records. If you then click the active/inactive condition, it works correctly.
Second, if you first hide the inactive data, the table holds only active data. If you then filter by "Office -> Regional Director", you find only 2 records: the inactive record cannot be shown because the table's current data doesn't contain it.
Solution:
- When the filter function is called, reload all table data first, then run the filter over the full data.
- When filtering is finished, check the active/inactive condition again.
Example code, based on the code you provided:
let tableId = "example";
$('.filtering-system').hide();
$('.button-container').hide();
addSpliting('');

function showAllVaues() {
    // Show every row on a single page (a page length of -1 means "all")
    $('#' + tableId).dataTable().api().page.len(-1).draw();
}

function addSpliting(val, length) {
    if (val != '') {
        //start Reload - before filtering, drop the active/inactive filter so the full data is searched
        var table = $('#' + tableId).DataTable();
        $.fn.dataTable.ext.search.pop();
        table.draw();
        //end Reload
        // Re-create the table with SearchPanes and keep the new API instance
        table = $('#' + tableId).DataTable({
            destroy: true,
            searchPanes: {
                layout: 'columns-' + length
            },
            columnDefs: [{
                searchPanes: {
                    show: true
                },
                targets: '_all'
            }],
            dom: 'Pfrtip'
        });
        // Show only the search panes of the columns that were ticked
        $('.dtsp-searchPanes').children().each(function (i, obj) {
            if (!val.includes(i)) $(this).hide();
            else $(this).show();
        });
        //start check - after filtering, re-apply the active/inactive condition
        if ($(".fa").hasClass("fa-eye")) {
            drawTable(table, '.fa-inactive, .fa-invisible-data', 0);
        }
        //end check
    } else {
        $('#' + tableId).DataTable({
            destroy: true
        });
    }
}

function setFilters() {
    // One checkbox per non-empty header cell, skipping the checkbox column
    $("table thead tr th").each(function (index) {
        var boxes = `<label>
<input type="checkbox" class="custom-checkbox" id="${index}"/>
${$(this).text()}
</label>`;
        if ($(this).text() != "" && $(this).hasClass('checkbox-mark') === false) $(".checkBoxes").append(boxes);
    });
}
setFilters();

$("#createFilter").on("click", function () {
    // Collect the indexes of the ticked columns and rebuild the search panes
    var columFilters = [];
    $('.custom-checkbox:checked').each(function () {
        columFilters.push(parseInt($(this).attr('id'), 10));
    });
    addSpliting(columFilters, columFilters.length);
});

$("#fil-sys").on("click", function () {
    $('.filtering-system').slideToggle('slow');
});

function drawTable(table, className, count) {
    // Custom search: keep a row only if it contains exactly `count` elements matching className
    $.fn.dataTable.ext.search.push(
        function (settings, data, dataIndex) {
            // Use the instance being searched, so this still works after the table is re-created
            var api = new $.fn.dataTable.Api(settings);
            return $(api.row(dataIndex).node()).find(className).length == count;
        }
    );
    table.draw();
}

$(document).ready(function () {
    var table = $('#' + tableId).DataTable();
    if ($(".fa").hasClass("fa-eye")) {
        drawTable(table, '.fa-inactive, .fa-invisible-data', 0);
    }
    // Show inactive rows: remove the custom search function and redraw
    $(".fa-eye-slash").click(function () {
        $(".fa-visibles").removeClass('hide');
        $(".fa-invisibles").addClass('hide');
        $.fn.dataTable.ext.search.pop();
        // Fetch the current instance, since addSpliting may have re-created the table
        $('#' + tableId).DataTable().draw();
    });
    // Hide inactive rows again
    $(".fa-eye").click(function () {
        $(".fa-visibles").addClass('hide');
        $(".fa-invisibles").removeClass('hide');
        drawTable($('#' + tableId).DataTable(), '.fa-inactive, .fa-invisible-data', 0);
    });
});
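To see why the reload step matters, here is a framework-free sketch of the same situation, with no DataTables involved. The rows, the helper applyFilters and the two predicate names are hypothetical; they are chosen so that the record counts mirror the 3 vs 2 Regional Directors described above. Filtering what is already on screen can never bring back a row that an earlier filter hid, while recomputing from the full data can.

```javascript
// Hypothetical rows: three Regional Directors, one of them inactive.
const allRows = [
  { name: "A", office: "Regional Director", active: true },
  { name: "B", office: "Regional Director", active: true },
  { name: "C", office: "Regional Director", active: false },
  { name: "D", office: "Engineer", active: true },
];

// A row is shown only if every currently enabled filter accepts it.
function applyFilters(rows, filters) {
  return rows.filter((row) => filters.every((keep) => keep(row)));
}

const byOffice = (row) => row.office === "Regional Director";
const onlyActive = (row) => row.active;

// Problem: the office filter runs on what is already on screen, so turning
// "active only" off afterwards cannot bring back row C.
const onScreen = applyFilters(allRows, [onlyActive]);
const staleResult = applyFilters(onScreen, [byOffice]);

// Fix: always start from the full data with the filters that are currently on.
const freshResult = applyFilters(allRows, [byOffice]);
const freshActiveOnly = applyFilters(allRows, [byOffice, onlyActive]);

console.log(staleResult.length, freshResult.length, freshActiveOnly.length); // 2 3 2
```

Treating every filter as a predicate over the full data also makes the order in which the user toggles conditions irrelevant.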
Select rows in a data.table given by a filter in another data.table
New solution
I realised that going through each row of the bigger data.table takes too much time, so I built a new function, foo_new, which works the other way around:
foo_new <- function(data, i.A, i.C){
data[C %in% i.C & A %between% i.A, INDEX2]
}
Instead of matching every row of DT2 with a row of DT1, I select every row in DT2 that matches the values of one row of DT1.
DT1 is ordered by TARGET (descending) because each row of DT2 should get the matching DT1 row with the highest TARGET value. Also, once a row in DT2 has been assigned, it is excluded from the next iteration.
This speeds up the whole process considerably:
function   user (s)  system (s)  elapsed (s)
foo          61.511       0.327       62.052
foo_new       0.045       0.003        0.047
This is probably only the case when DT1 is smaller than DT2, which is true in my case.
Here is my whole simulation code:
rm(list = ls())
library(data.table)
DT1 <- data.table(INDEX1 = 1:12,
minA = c(1,1,1,2,2,2,3,3,3,4,4,4),
maxA = c(4,5,6),
C = c("Mon,Tue", "Mon,Wed", "Tue,Thu", "Wed,Thu"),
TARGET = c(101:112))
size <- 20000
DT2 <- data.table(A = rep(c(3,4), size),
C = rep(c("Mon", "Thu"), size),
INDEX2 = 1:(2*size))
foo <- function(i.A, i.C){
DT1[INDEX1 %in% grep(i.C, C) &
minA <= i.A &
maxA >= i.A,
][TARGET == max(TARGET),]
}
foo_new <- function(data, i.A, i.C){
data[C %in% i.C & A %between% i.A, INDEX2]
}
# with foo (one DT1 lookup per row of DT2)
DT2[, foo(i.A = A, i.C = C), by = INDEX2]
# with foo_new
DT1.ordered <- DT1[order(TARGET, decreasing = TRUE)]  # highest TARGET first
DT2[, TARGET := NA_real_]
for (i in seq_len(nrow(DT1.ordered))) {
  # only the rows of DT2 that have not been assigned a TARGET yet
  restdata <- DT2[is.na(TARGET)]
  tmp.index <- foo_new(data = restdata,
                       i.A = unlist(DT1.ordered[i, list(minA, maxA)]),
                       i.C = DT1.ordered[i, strsplit(C, ",")[[1]]])
  DT2[INDEX2 %in% tmp.index, TARGET := DT1.ordered[i, TARGET]]
}
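The core of foo_new is a greedy pass: visit the DT1 rules from highest to lowest TARGET and give each still-unassigned DT2 row the first TARGET whose rule it matches. The JavaScript sketch below shows only that logic; the function name assignTargets is hypothetical, and the rules are the 12 DT1 rows of the simulation with maxA and C recycled the way R does. In JavaScript both directions cost roughly rules × rows comparisons, so the speed-up measured above presumably comes from data.table evaluating each rule as one vectorised subset instead of one grouped call per DT2 row.

```javascript
// Greedy assignment in the spirit of the foo_new loop: visit the rules (DT1 rows)
// from highest to lowest TARGET; each DT2 row keeps the first TARGET it gets.
function assignTargets(rules, rows) {
  const ordered = [...rules].sort((a, b) => b.TARGET - a.TARGET);
  const result = rows.map((row) => ({ ...row, TARGET: null }));
  for (const rule of ordered) {
    const days = new Set(rule.C.split(",")); // like strsplit(C, ",")
    for (const row of result) {
      if (row.TARGET !== null) continue; // already assigned by a higher TARGET
      // inclusive bounds, like A %between% c(minA, maxA)
      if (days.has(row.C) && row.A >= rule.minA && row.A <= rule.maxA) {
        row.TARGET = rule.TARGET;
      }
    }
  }
  return result;
}

// The 12 rules of DT1 from the simulation (maxA and C recycled as in R).
const minA = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4];
const maxCycle = [4, 5, 6];
const dayCycle = ["Mon,Tue", "Mon,Wed", "Tue,Thu", "Wed,Thu"];
const rules = minA.map((m, i) => ({
  INDEX1: i + 1,
  minA: m,
  maxA: maxCycle[i % maxCycle.length],
  C: dayCycle[i % dayCycle.length],
  TARGET: 101 + i,
}));

const rows = [
  { INDEX2: 1, A: 3, C: "Mon" },
  { INDEX2: 2, A: 4, C: "Thu" },
];
console.log(assignTargets(rules, rows).map((r) => r.TARGET)); // [ 109, 112 ]
```

Rows that match no rule keep TARGET null, which corresponds to the NA left in DT2 by the R loop.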
Performance benefits of chaining over ANDing when filtering a data table
The answer was mostly given in the comments already: the "chaining method" for data.table is faster in this case than the "anding method" because chaining applies the conditions one after another. Each step reduces the size of the data.table, so there is less to evaluate in the next one. "Anding" evaluates every condition on the full-size data.
We can demonstrate this with an example in which the individual steps do NOT decrease the size of the data.table (i.e. both approaches have to check the same conditions on the same number of rows):
# Chaining method
chain_filter <- function(){
dt[a %between% c(1, 1000) # runs evaluation but does not filter out cases
][b %between% c(1, 1000)
][c %between% c(750, 760)]
}
# Anding method
and_filter <- function(){
dt[a %between% c(1, 1000) & b %between% c(1, 1000) & c %between% c(750, 760)]
}
Using the same data, but with the bench package, which automatically checks whether the results are identical:
res <- bench::mark(
chain = chain_filter(),
and = and_filter()
)
summary(res)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 chain 299ms 307ms 3.26 691MB 9.78
#> 2 and 123ms 142ms 7.18 231MB 5.39
summary(res, relative = TRUE)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 chain 2.43 2.16 1 2.99 1.82
#> 2 and 1 1 2.20 1 1
As you can see, the anding approach is 2.43 times faster here (comparing minimum times). So chaining actually adds some overhead, which suggests that anding should usually be quicker, EXCEPT when the conditions reduce the size of the data.table step by step. In theory, the chaining approach could be slower even leaving the overhead aside, namely if a condition increased the size of the data. In practice I don't think that's possible, since recycling of logical vectors is not allowed in data.table. I think this answers your bonus question.
For comparison, the original functions on my machine with bench:
res <- bench::mark(
chain = chain_filter_original(),
and = and_filter_original()
)
summary(res)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 chain 29.6ms 30.2ms 28.5 79.5MB 7.60
#> 2 and 125.5ms 136.7ms 7.32 228.9MB 7.32
summary(res, relative = TRUE)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 chain 1 1 3.89 1 1.04
#> 2 and 4.25 4.52 1 2.88 1
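The argument above is really about how many condition evaluations each strategy performs. The JavaScript sketch below counts them; it is a model, not data.table. The names chainFilter and andFilter are hypothetical, and andFilter assumes that R's vectorised & evaluates every condition on every row (it does not short-circuit element-wise).

```javascript
// Chaining: each condition only sees the rows that survived the previous one.
function chainFilter(rows, preds) {
  let evals = 0;
  let current = rows;
  for (const pred of preds) {
    current = current.filter((row) => {
      evals++;
      return pred(row);
    });
  }
  return { rows: current, evals };
}

// Anding: every condition is evaluated on every row, then the
// logical vectors are combined element-wise.
function andFilter(rows, preds) {
  let evals = 0;
  const masks = preds.map((pred) =>
    rows.map((row) => {
      evals++;
      return pred(row);
    })
  );
  const kept = rows.filter((_, i) => masks.every((mask) => mask[i]));
  return { rows: kept, evals };
}

const data = Array.from({ length: 1000 }, (_, i) => ({ a: i }));
// Selective first step: only 10 rows survive it.
const selective = [(r) => r.a < 10, (r) => r.a % 2 === 0];
// Non-reducing first step: every row survives it.
const nonReducing = [(r) => r.a >= 0, (r) => r.a < 10];

console.log(chainFilter(data, selective).evals, andFilter(data, selective).evals); // 1010 2000
console.log(chainFilter(data, nonReducing).evals, andFilter(data, nonReducing).evals); // 2000 2000
```

When no step shrinks the data, both strategies do the same amount of work, so the intermediate copies made by chaining (visible as the higher mem_alloc in the first benchmark) are pure overhead.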
Filter table with multiple conditions with shiny
You are using ==, which tests element-wise equality. That is:
1 == 1
# [1] TRUE
1 == 2
# [1] FALSE
But when one side has more than one value:
1 == c(1,2)
# [1] TRUE FALSE
In this case, it says "1 is equal to 1, 1 is not equal to 2", which is not quite what you want. In your code, the one TRUE is recycled over all rows, returning everything.
Let's try %in%:
1 %in% 1
# [1] TRUE
1 %in% c(1,2)
# [1] TRUE
c(1,3,4) %in% c(1,2)
# [1] TRUE FALSE FALSE
This has the desired properties: it tests set membership and returns a logical vector of the same length as the left-hand side of the operator.
Applied to your app, just change the one reactive component to:
output$Generation <- renderDataTable({
pokemon[pokemon$Generation %in% input$Generation,-11]
})
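To make the difference concrete, the sketch below models R's == (with recycling) and %in% on plain JavaScript arrays. It illustrates the semantics only and is not R itself; the helper names rEquals and rIn and the sample Generation values are hypothetical. Note how == compares the fourth row (Generation 1) against the recycled value 2 and therefore drops it, while %in% keeps every row whose value is in the selection.

```javascript
// R-style element-wise ==: the shorter vector is recycled to the length of the longer one.
function rEquals(x, y) {
  if (x.length === 0 || y.length === 0) return [];
  const n = Math.max(x.length, y.length);
  return Array.from({ length: n }, (_, i) => x[i % x.length] === y[i % y.length]);
}

// R-style %in%: one TRUE/FALSE per element of the left-hand side.
function rIn(x, table) {
  const set = new Set(table);
  return x.map((v) => set.has(v));
}

const generation = [1, 2, 3, 1, 2, 3]; // hypothetical Generation column
const selected = [1, 2]; // hypothetical input$Generation

console.log(rEquals(generation, selected)); // [ true, true, false, false, false, false ]
console.log(rIn(generation, selected)); // [ true, true, false, true, true, false ]
```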
How to subset data.table by external function with arbitrary conditions
One possible solution (note the use of == and not =, as in the post):
foo = function(dt, ...) {
eval(substitute(dt[Reduce(`&`, list(...)),]))
}
foo(dt,Var2==1,Var3==1)
Var1 Var2 Var3
<int> <int> <int>
1: 1 1 1
2: 2 1 1
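The trick in foo is Reduce(`&`, list(...)): each condition yields one logical value per row, and the vectors are folded together with element-wise AND. The JavaScript sketch below mirrors that fold over arrays of objects; the function name subsetAll is hypothetical, and the four rows are invented so that the result has the same shape as the output above. Returning every row when no condition is given is a choice made for this sketch only.

```javascript
// Mirror of dt[Reduce(`&`, list(...)), ]: evaluate each condition per row,
// fold the logical vectors with element-wise AND, keep the rows that stay true.
function subsetAll(rows, ...conditions) {
  if (conditions.length === 0) return rows.slice(); // sketch's choice: keep everything
  const masks = conditions.map((cond) => rows.map((row) => Boolean(cond(row))));
  const combined = masks.reduce((acc, mask) => acc.map((v, i) => v && mask[i]));
  return rows.filter((_, i) => combined[i]);
}

const dt = [
  { Var1: 1, Var2: 1, Var3: 1 },
  { Var1: 2, Var2: 1, Var3: 1 },
  { Var1: 3, Var2: 2, Var3: 1 },
  { Var1: 4, Var2: 1, Var3: 2 },
];

console.log(subsetAll(dt, (r) => r.Var2 === 1, (r) => r.Var3 === 1).map((r) => r.Var1)); // [ 1, 2 ]
```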
The best way to search in DataTable on multiple conditions in C#?
One possibility would be to use the DataTable class's built-in filtering. You can define a dynamic filter and apply it to the DataTable object. The filter language is roughly a subset of SQL; it supports LIKE and other SQL keywords. An example of filtering code:
var dt = new DataTable("test");
dt.Columns.Add("A", typeof(string));
dt.Columns.Add("B", typeof(string));
dt.Rows.Add(new object[] { "a", "1" });
dt.Rows.Add(new object[] { "a", "2" });
var rows = dt.Select("B = '2'");
This way you can define the filter, apply it to both tables, and compare only the result sets rather than every entry. The result is an array of DataRow objects.
I used it in a project with DataTable objects containing more than 2K entries each, and the performance was really good.
Another possibility would be to use LINQ to filter the data. You can query the DataTable's rows like this:
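// requires using System.Linq; Field<T> comes from System.Data.DataRowExtensions.
// Comparing the raw object dr["B"] with == would compare references, not string values.
var rows = (from DataRow dr in dt.Rows
where dr.Field<string>("B") == "2"
select dr).ToList();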
This query returns the same result as the direct filtering. Again, you can apply the same approach here and check only the matching results.
If I understood your question correctly, a possible solution to your problem could look like this:
// test DataTable objects for the example
var dt1 = new DataTable("Table 1");
dt1.Columns.Add("title", typeof(string));
dt1.Columns.Add("number", typeof(int));
dt1.Columns.Add("subnum1", typeof(int));
dt1.Columns.Add("subnum2", typeof(int));
dt1.Rows.Add(new object[] { "a", 1111, 1, 1 }); // Exact match!
dt1.Rows.Add(new object[] { "b", 2222, 1, 1 }); // Only NUMBER match
dt1.Rows.Add(new object[] { "b", 2222, 2, 2 }); // Only NUMBER match
dt1.Rows.Add(new object[] { "d", 3333, 1, 1 }); // Exact match!
dt1.Rows.Add(new object[] { "d", 3333, 1, 2 });
dt1.Rows.Add(new object[] { "d", 3333, 2, 1 });
var dt2 = new DataTable("Table 2");
dt2.Columns.Add("number", typeof(int));
dt2.Columns.Add("subnum1", typeof(int));
dt2.Columns.Add("subnum2", typeof(int));
dt2.Rows.Add(new object[] { 1111, 1, 1 }); // Exact match!
dt2.Rows.Add(new object[] { 2222, 0, 5 }); // Only NUMBER match
dt2.Rows.Add(new object[] { 3333, 1, 1 }); // Exact match!
dt2.Rows.Add(new object[] { 3333, 0, 0 }); // Only NUMBER match
foreach (DataRow row in dt1.Rows)
{
var matches = dt2.Select(string.Format("number = {0} and subnum1 = {1} and subnum2 = {2}", row["number"], row["subnum1"], row["subnum2"]));
if (matches.Count() > 0)
{
Console.WriteLine(row["title"]);
}
else
{
var fallback = dt2.Select(string.Format("number = {0}", row["number"]));
if (fallback.Count() > 0)
{
Console.WriteLine(" > " + row["title"]);
}
}
}
The output in this case is:
a
> b
> b
d
> d
> d
What values should be written to the output is up to you; at the point where a match is found, you have everything you need.
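The loop above runs one Select scan over Table 2 for every row of Table 1 (two scans when the exact match fails). A common alternative is to build hash lookups from Table 2 once and test each row in constant time. The sketch below is translated to JavaScript so that it runs here; the function name matchRows and the "number|subnum1|subnum2" key format are hypothetical, and in C# the same idea would use a HashSet<string> for the exact keys and a HashSet<int> for the numbers. The data are the rows of the example, so the output matches the one shown above.

```javascript
// Exact match first, NUMBER-only match as fallback, using sets built once from table 2.
function matchRows(table1, table2) {
  const exactKey = (r) => `${r.number}|${r.subnum1}|${r.subnum2}`;
  const exact = new Set(table2.map(exactKey));
  const numbers = new Set(table2.map((r) => r.number));
  const out = [];
  for (const row of table1) {
    if (exact.has(exactKey(row))) out.push(row.title); // exact match
    else if (numbers.has(row.number)) out.push(" > " + row.title); // NUMBER-only match
  }
  return out;
}

const dt1 = [
  { title: "a", number: 1111, subnum1: 1, subnum2: 1 },
  { title: "b", number: 2222, subnum1: 1, subnum2: 1 },
  { title: "b", number: 2222, subnum1: 2, subnum2: 2 },
  { title: "d", number: 3333, subnum1: 1, subnum2: 1 },
  { title: "d", number: 3333, subnum1: 1, subnum2: 2 },
  { title: "d", number: 3333, subnum1: 2, subnum2: 1 },
];
const dt2 = [
  { number: 1111, subnum1: 1, subnum2: 1 },
  { number: 2222, subnum1: 0, subnum2: 5 },
  { number: 3333, subnum1: 1, subnum2: 1 },
  { number: 3333, subnum1: 0, subnum2: 0 },
];

console.log(matchRows(dt1, dt2).join("\n"));
```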