How to Store Filter Expressions as Strings

How to store filter expressions as strings?

filter_

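You can pass your filter expression as a string using filter_() in dplyr. Note that filter_() has been deprecated since dplyr 0.7.0 in favour of tidy evaluation: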

mtcars %>%
filter_("cyl == 4")

Handling strings

To take this further and compare against a string value, use single quotes for the string inside the filter:

data.frame(col_A = LETTERS[1:10],
col_B = 1:10,
stringsAsFactors = FALSE) %>%
filter_("col_A == 'A'")

Handling "

If you really want to use double quotes inside the filter string, you have to escape them:

data.frame(col_A = LETTERS[1:10],
col_B = 1:10,
stringsAsFactors = FALSE) %>%
filter_("col_A == \"A\"")
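With current dplyr versions, the same stored strings can be used through rlang instead. The following is a minimal sketch, assuming the filter is kept in a character variable (the name stored_filter is made up for illustration):

```r
library(dplyr)
library(rlang)

# A filter stored as plain text, e.g. read from a config file or a database
stored_filter <- "cyl == 4"

# parse_expr() turns the string into an unevaluated R expression;
# !! injects that expression into the filter() call
four_cyl <- mtcars %>%
  filter(!!parse_expr(stored_filter))

nrow(four_cyl)
```

Parsing and evaluating stored strings runs whatever code they contain, so this only suits strings that come from a trusted source.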

Better approach

I would suggest avoiding the string-based approach above. Instead, turn the column name into a symbol with sym() from rlang and unquote it inside a regular filter() call, which gives you more flexibility in building filter expressions. The !! operator used below is the current spelling of UQ():

library(dplyr)
library(rlang)
col_nme <- sym("cyl")
flt_val <- 4
mtcars %>%
  filter(!!col_nme == !!flt_val)

This is equivalent to:

mtcars %>%
  filter(!!col_nme == flt_val)

You don't have to unquote the second argument, because flt_val is an ordinary variable that filter() finds in the calling environment.
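If the column name is kept as a plain string rather than a symbol, the .data pronoun is another option. A short sketch, with col_name and flt_val as illustrative variable names:

```r
library(dplyr)

col_name <- "cyl"   # column name kept as a plain string
flt_val  <- 4

# .data[[col_name]] looks the column up by name inside the data frame
res <- mtcars %>%
  filter(.data[[col_name]] == flt_val)

nrow(res)
```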

Side points

The syntax of your filter is:

rlb == "1" | rlb == "2" | rlb== "3" | rlb == "G" | rlb == "R" |

This would be equivalent to:

rlb %in% c("1", "2", "3" , "G" , "R")

The vector c("1", "2", "3", "G", "R") can easily be passed as a variable, without any additional effort involving quosures or non-standard evaluation. I would start by simplifying the filters and then use the simplified expressions with rlang features.
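As a small illustration of passing the vector as a variable, here is a sketch on a made-up data frame (the data and the name allowed are assumptions; only the rlb column name comes from the filter above):

```r
library(dplyr)

# Hypothetical data with an `rlb` column, for illustration only
df <- data.frame(rlb = c("1", "2", "X", "G", "Z", "R"),
                 stringsAsFactors = FALSE)

# The allowed values live in an ordinary character vector
allowed <- c("1", "2", "3", "G", "R")

kept <- df %>%
  filter(rlb %in% allowed)

kept$rlb
```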


Code sharing

Following the comment on code sharing, it may be good to look at the sqldf package:

require(sqldf)
sqldf(x = "SELECT * FROM mtcars WHERE CYL = 4")

This would let you share your filters in SQL, which is usually more familiar than dplyr syntax.

Store client/server string filtering directives for realtime use

I would suggest using the Factory pattern and storing a JSON object in the database that contains instructions for constructing a filter. The JSON object is retrieved from the database and passed to the factory, which creates the correct kind of Filter object.

Suppose you had two filters, one that uses a regular expression and another that filters for a value range. You could then create each filter as follows:

// Base class: filter() keeps every item for which isMatch() returns true
class Filter {
    public function filter($results) {
        $filteredResults = array();
        foreach ($results as $item) {
            if ($this->isMatch($item)) {
                $filteredResults[] = $item;
            }
        }
        return $filteredResults;
    }
    // Default: match everything; child classes override this
    public function isMatch($item) {
        return true;
    }
}
class RegexFilter extends Filter {
    private $regex;
    public function __construct($regexString) {
        $this->regex = $regexString;
    }
    public function isMatch($item) {
        return preg_match($this->regex, $item) === 1;
    }
}
class RangeFilter extends Filter {
    private $min;
    private $max;
    public function __construct($min, $max) {
        $this->min = $min;
        $this->max = $max;
    }
    public function isMatch($item) {
        return $this->min < $item && $item < $this->max;
    }
}
class FilterFactory {
    // static, so it can be called as FilterFactory::createFilter()
    public static function createFilter($json) {
        $filterData = json_decode($json);
        if ($filterData->filterType == 'regex') {
            return new RegexFilter($filterData->regexStr);
        } elseif ($filterData->filterType == 'range') {
            return new RangeFilter($filterData->min, $filterData->max);
        }
        throw new InvalidArgumentException('Unknown filter type');
    }
}
// Example use ($results holds the items to filter)
// Match only results that start with http
$regexJsonStr = '{
    "filterType":"regex",
    "regexStr":"/^http/"
}';
$filter = FilterFactory::createFilter($regexJsonStr);
$filteredResults = $filter->filter($results);

// Match results with values between 0 and 100, exclusive
$rangeJsonStr = '{
    "filterType":"range",
    "min":0,
    "max":100
}';
$filter = FilterFactory::createFilter($rangeJsonStr);
$filteredResults = $filter->filter($results);

This approach allows you to use best practices for maintaining your codebase while still storing filtering instructions in your database.

Child classes use Filter's filter method but their own isMatch method, which makes adding new filters easier. You simply A) inherit from the Filter class and implement a new isMatch method, B) add a new clause to FilterFactory::createFilter to create the filter correctly, and C) add the JSON describing the filter to the database.

Everywhere else you can use the exact same logic to filter the results:

$filter=FilterFactory::createFilter($jsonFromDatabase);
$filteredResults = $filter->filter($resultsFromApi);
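Since most of this page deals with R, here is a rough sketch of the same factory idea in R, using closures in place of classes. It assumes the jsonlite package is available; all function names are made up for illustration, and the regular expression is written without the PHP-style / delimiters because R's grepl() does not use them:

```r
library(jsonlite)

# Each constructor returns a predicate function (the counterpart of isMatch)
make_regex_filter <- function(pattern) {
  force(pattern)
  function(item) grepl(pattern, item)
}
make_range_filter <- function(min, max) {
  force(min)
  force(max)
  function(item) min < item & item < max
}

# Factory: turn a stored JSON description into a predicate function
create_filter <- function(json) {
  spec <- fromJSON(json)
  switch(spec$filterType,
    regex = make_regex_filter(spec$regexStr),
    range = make_range_filter(spec$min, spec$max),
    stop("Unknown filterType: ", spec$filterType)
  )
}

# Keep only values that start with http
urls <- c("http://example.com", "ftp://example.com", "https://example.org")
is_http <- create_filter('{"filterType": "regex", "regexStr": "^http"}')
http_urls <- urls[is_http(urls)]

# Keep only values between 0 and 100, exclusive
values <- c(-5, 0, 42, 100, 250)
in_range <- create_filter('{"filterType": "range", "min": 0, "max": 100}')
kept_values <- values[in_range(values)]
```

Adding a new filter type then means writing one more constructor and one more branch in the switch() call.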

String based filtering in dplyr - NSE

What if I want to dynamically change the operator of the filtering too?

You can do it with tidy eval by unquoting a symbol representing the operator (note that I use expr() to illustrate the result of the unquoting):

lhs <- "foo"

# Storing the symbol `<` in `op`
op <- quote(`<`)

expr(`!!`(op)(!!sym(lhs), 5))
#> foo < 5

However, it is cleaner to do it outside tidy eval with regular R code. Unquoting is only necessary when the symbol you unquote represents a column from the data frame, i.e. something that's not in the context. Here you can just store the operator in a variable and then call that variable in your filtering expression:

# Storing the function `<` in `op`
op <- `<`

expr(op(!!sym(lhs), 5))
#> op(foo, 5)
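To see this run against real data, here is a minimal sketch on mtcars. It assumes the operator is stored by name as a string and looked up with match.fun(); the column and threshold are arbitrary choices for illustration:

```r
library(dplyr)
library(rlang)

lhs <- "cyl"
op  <- match.fun("<")   # look up the operator function from its stored name

# op is found in the calling environment; only the column needs unquoting
res <- mtcars %>%
  filter(op(!!sym(lhs), 6))

nrow(res)
```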

What if I want to apply multiple filters?

You save the expressions in a list and then you splice them in a call with !!!:

filters <- list(
quote(x >= 5),
quote(y <= 10),
quote(z >= 2)
)

expr(df %>% filter(!!!filters))
#> df %>% filter(x >= 5, y <= 10, z >= 2)

Note: I said above that it is not necessary to unquote variables from the context, but it is still often a good idea to do so if you're writing a function that takes the data frame as input. Since the data frame varies, you don't know in advance what columns it contains, and columns always take precedence over the objects you have defined in the environment. In the case here, this is not an issue because op is called as a function: R keeps looking for a function even if it finds a similarly named non-function object in the data frame.
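The splicing of a list of filters can also be run directly rather than only printed with expr(). A small sketch on mtcars, with the two conditions chosen purely for illustration:

```r
library(dplyr)
library(rlang)

# Filters saved as a list of unevaluated expressions
filters <- list(
  quote(cyl == 4),
  quote(mpg > 30)
)

# !!! splices the list into filter() as separate, comma-separated conditions
res <- mtcars %>%
  filter(!!!filters)

nrow(res)
```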

Apply a vector of filters based on a string (or vector of strings) in dplyr

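Learning from @Ronak Shah's answer, I found that in dplyr multiple conditions can be combined with a single & in filter() instead of a comma. This surprised me at first, because & is not the same thing as the scalar logical &&: & compares the columns element by element, while && in older R versions only looked at the first element of each side (and raises an error for longer vectors since R 4.3.0):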

> df %>% filter(A<3 & B<5)
A B
1 1 1
2 2 2
> df %>% filter(A<3 && B<5)
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10

Nevertheless, the following does work:

> df %>% filter(eval(str2expression("A<3 & B<5")))
A B
1 1 1
2 2 2
> df %>% filter(eval(str2expression("A<6 & B<5")))
A B
1 1 1
2 2 2
3 3 3
4 4 4
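For a vector of condition strings, one option is to join them with & before parsing. The sketch below assumes df has the shape suggested by the output above (A and B both running from 1 to 10) and uses rlang's parse_expr() in place of eval(str2expression()); the name conditions is illustrative:

```r
library(dplyr)
library(rlang)

df <- data.frame(A = 1:10, B = 1:10)   # assumed shape of the data above
conditions <- c("A < 6", "B < 5")

# Wrap each condition in parentheses and join them with the vectorized &
combined <- paste0("(", conditions, ")", collapse = " & ")

res <- df %>%
  filter(!!parse_expr(combined))

res
```

The parentheses keep each stored condition intact even if it contains operators such as |.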

Expression.Like alternative for filtering based on text in c#

I was able to get this working as below:

private static readonly MethodInfo StringContainsIgnoreCase =
    typeof(DbFunctionsExtensions).GetMethod(
        nameof(DbFunctionsExtensions.Like),
        new[]
        {
            Expression.Property(
                null,
                typeof(EF).GetProperty(nameof(EF.Functions))
                    ?? throw new InvalidOperationException()).Type,
            typeof(string),
            typeof(string)
        });

and then use below to make Expression:

var equals = nonNullValues.Select(value =>
    Expression.Call(
        null,
        StringContainsIgnoreCase,
        Expression.Property(null, typeof(EF), nameof(EF.Functions)),
        left,
        Expression.Constant($"%{value}%", typeof(string)))).ToList();
contExpressions.AddRange(equals);

This works like a charm when you need an Expression that performs a case-insensitive string contains check.


