How to store filter expressions as strings?
filter_
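You can pass your filter expression as a string using filter_ in dplyr (note that filter_ has been deprecated since dplyr 0.7.0; see the better approach further down):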
mtcars %>%
filter_("cyl == 4")
Handling strings
If you want to take this further and compare against a string value, wrap the value in single quotes inside the filter string:
data.frame(col_A = LETTERS[1:10],
col_B = 1:10,
stringsAsFactors = FALSE) %>%
filter_("col_A == 'A'")
Handling double quotes
If you really want to use double quotes around the value inside the filter string, you have to escape them:
data.frame(col_A = LETTERS[1:10],
col_B = 1:10,
stringsAsFactors = FALSE) %>%
filter_("col_A == \"A\"")
Better approach
I would suggest that you avoid the approach above. Have a look at the suggestion below, which lets you pass your column name using the sym function. In a dplyr pipeline you can make use of rlang, which gives you more flexibility in building your filter expressions:
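library(dplyr)
library(rlang)
col_nme <- sym("cyl")
flt_val <- 4
# !! is the current spelling of the older UQ() function
mtcars %>%
filter(!!col_nme == !!flt_val)
This is equivalent to:
mtcars %>%
filter(!!col_nme == flt_val)
You don't have to unquote the second argument, because flt_val is an ordinary variable in the calling environment rather than a column name.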
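If the column name is stored as a plain string, the same idea can be wrapped in a small helper. The sketch below is only an illustration: filter_by is a hypothetical name, and it assumes a dplyr version that provides the .data pronoun.

```r
library(dplyr)

# Hypothetical helper: keep rows where the column named by `col` equals `val`.
# .data[[col]] looks the column up by its string name inside the data frame.
filter_by <- function(data, col, val) {
  data %>% filter(.data[[col]] == val)
}

res <- filter_by(mtcars, "cyl", 4)
head(res)
```

Because the column is looked up through .data, the string never has to be parsed as code, which avoids the quoting problems shown earlier.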
Side points
The syntax of your filter is:
rlb == "1" | rlb == "2" | rlb== "3" | rlb == "G" | rlb == "R" |
This would be equivalent to:
rlb %in% c("1", "2", "3" , "G" , "R")
The vector c("1", "2", "3", "G", "R") could easily be passed as a variable, without any additional effort involving quosures or non-standard evaluation. I would start by simplifying the filters and then use the simplified expressions via rlang features.
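As a minimal sketch of that simplification, the allowed values can live in an ordinary character vector. The data frame df and the variable name keep are made up for illustration; only the rlb column name comes from the filter above.

```r
library(dplyr)

# Made-up example data with an rlb column
df <- data.frame(rlb = c("1", "2", "X", "G", "R", "Z"),
                 val = 1:6,
                 stringsAsFactors = FALSE)

# The allowed values are a plain vector, so they can be stored,
# loaded from a file, or passed around like any other variable.
keep <- c("1", "2", "3", "G", "R")

res <- df %>% filter(rlb %in% keep)
res
```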
Code sharing
Following the comment on code sharing, it may be good to look at the sqldf package:
require(sqldf)
sqldf(x = "SELECT * FROM mtcars WHERE CYL = 4")
This would let you share your filters in SQL, which is usually more familiar than dplyr syntax.
Store client/server string filtering directives for realtime use
I would suggest using the Factory pattern and storing a JSON object in the database that contains instructions for constructing a filter. The JSON object would be retrieved from the DB and passed to the Factory, which would create the correct kind of Filter object.
Suppose you had two filters, one that uses a regular expression and another that filters for a value range. You could then create each filter as follows:
class Filter{
public function filter($results){
$filteredResults = array();
foreach($results as $item){
if($this->isMatch($item)){
$filteredResults[]=$item;
}
}
return $filteredResults;
}
public function isMatch($row){
return true;
}
}
class RegexFilter extends Filter{
public function __construct($regexString){
$this->regex = $regexString;
}
public function isMatch($item){
return preg_match($this->regex,$item) ==1;
}
}
class RangeFilter extends Filter{
public function __construct($min,$max){
$this->max = $max;
$this->min = $min;
}
public function isMatch($item){
return $this->min < $item && $item <$this->max;
}
}
class FilterFactory{
public static function createFilter($json){
$filterData = json_decode($json);
if($filterData->filterType == 'regex'){
return new RegexFilter($filterData->regexStr);
}
elseif($filterData->filterType == 'range'){
return new RangeFilter($filterData->min,$filterData->max);
}
}
}
// Example use
// Match only results that start with http
$regexJsonStr = '{
"filterType":"regex",
"regexStr":"/^http/"
}';
$filter = FilterFactory::createFilter($regexJsonStr);
$filteredResults = $filter->filter($results);
// Match only values between 0 and 100, exclusive
$rangeJsonStr = '{
"filterType":"range",
"min":0,
"max":100
}';
$filter = FilterFactory::createFilter($rangeJsonStr);
$filteredResults = $filter->filter($results);
This approach allows you to use best practices for maintaining your codebase while still storing filtering instructions in your database.
Child classes use Filter's filter method but their own isMatch method, which makes adding new filters easier. You simply A) inherit from the Filter class and implement a new isMatch method, B) add a new clause to FilterFactory::createFilter to create the filter correctly, and C) add the JSON describing the filter to the database.
Everywhere else you can use the exact same logic to filter the results:
$filter=FilterFactory::createFilter($jsonFromDatabase);
$filteredResults = $filter->filter($resultsFromApi);
String based filtering in dplyr - NSE
What if I want to dynamically change the operator of the filtering too?
You can do it with tidy eval by unquoting a symbol representing the operator (note that I use expr() to illustrate the result of the unquoting):
lhs <- "foo"
# Storing the symbol `<` in `op`
op <- quote(`<`)
expr(`!!`(op)(!!sym(lhs), 5))
#> foo < 5
However it is cleaner to do it outside tidy eval with regular R code. Unquoting is only necessary when the symbol you unquote represents a column from the data frame, i.e. something that's not in the context. Here you can just store the operator in a variable and then call that variable in your filtering expression:
# Storing the function `<` in `op`
op <- `<`
expr(op(!!sym(lhs), 5))
#> op(foo, 5)
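To see this applied to real data, here is a small sketch in which the operator itself is stored as a string and turned into a function with base R's match.fun. The data frame df and the variable op_name are made up for illustration.

```r
library(dplyr)
library(rlang)

# Made-up data with a column called foo
df <- data.frame(foo = c(1, 4, 7, 10))

lhs <- "foo"       # column name stored as a string
op_name <- "<"     # operator stored as a string (hypothetical stored value)

# match.fun() turns the operator name into the actual function
op <- match.fun(op_name)

# Only the column needs unquoting; op is an ordinary function in scope
res <- df %>% filter(op(!!sym(lhs), 5))
res
```

Swapping op_name for ">=" or "==" changes the comparison without touching the filtering code.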
What if I want to apply multiple filters?
You save the expressions in a list and then splice them into a call with !!!:
filters <- list(
quote(x >= 5),
quote(y <= 10),
quote(z >= 2)
)
expr(df %>% filter(!!!filters))
#> df %>% filter(x >= 5, y <= 10, z >= 2)
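When the filters are stored as strings rather than quoted expressions, they can be parsed first and then spliced in the same way. This is a sketch under the assumption that the strings come from a trusted source, since parsing turns arbitrary text into code; the vector filter_strings is made up for illustration.

```r
library(dplyr)
library(rlang)

# Filters stored as plain strings (illustrative values)
filter_strings <- c("cyl == 4", "mpg > 30")

# Parse each string into an expression, giving a list of expressions
filters <- lapply(filter_strings, parse_expr)

# Splice the list into filter(); the conditions are combined with AND
res <- mtcars %>% filter(!!!filters)
res
```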
Note: I said above that it is not necessary to unquote variables from the context, but it is still often a good idea to do so if you're writing a function that takes the data frame as input. Since the data frame varies, you don't know in advance which columns it contains, and columns always take precedence over objects you have defined in the environment. In the case here this is not an issue, because op is called as a function and R keeps looking for a function if it finds a similarly named non-function object in the data frame.
Apply a vector of filters based on a string (or vector of strings) in dplyr
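Learning from @Ronak Shah's answer: in dplyr I can combine multiple conditions with a single & in filter instead of a comma. Note that this is not the same thing as the scalar && operator. & is the vectorised, element-wise AND and yields one logical value per row, whereas && expects single logical values. In older versions of R it silently used only the first element of each side, which is why every row comes back in the second call below (since R 4.3.0 that call is an error instead):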
> df %>% filter(A<3 & B<5)
A B
1 1 1
2 2 2
> df %>% filter(A<3 && B<5)
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Nevertheless, the following does work:
> df %>% filter(eval(str2expression("A<3 & B<5")))
A B
1 1 1
2 2 2
> df %>% filter(eval(str2expression("A<6 & B<5")))
A B
1 1 1
2 2 2
3 3 3
4 4 4
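For a whole vector of condition strings, one option is to parse each string and join the resulting calls with & before handing them to filter. The sketch below assumes R 3.6 or later for str2lang and uses a made-up vector conds; df mirrors the ten-row data frame from the output above.

```r
library(dplyr)

df <- data.frame(A = 1:10, B = 1:10)

# Conditions stored as strings (illustrative values)
conds <- c("A < 6", "B < 5")

# Parse each string into a call, then fold the calls together with `&`.
# Building the call tree directly avoids operator-precedence surprises
# that can occur when pasting strings together.
parsed <- lapply(conds, str2lang)
combined <- Reduce(function(a, b) call("&", a, b), parsed)

res <- df %>% filter(!!combined)
res
```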
Expression.Like alternative for filtering based on text in C#
I was able to get this working as below:
private static readonly MethodInfo StringContainsIgnoreCase =
    typeof(DbFunctionsExtensions).GetMethod(
        nameof(DbFunctionsExtensions.Like),
        new[]
        {
            Expression.Property(
                null,
                typeof(EF).GetProperty(nameof(EF.Functions)) ?? throw new InvalidOperationException()).Type,
            typeof(string),
            typeof(string)
        });
and then use it as below to build the Expression:
var equals = nonNullValues.Select(value =>
    Expression.Call(
        null,
        StringContainsIgnoreCase,
        Expression.Property(null, typeof(EF), nameof(EF.Functions)),
        left,
        Expression.Constant($"%{value}%", typeof(string)))).ToList();
contExpressions.AddRange(equals);
This works like a charm when you need an Expression that performs a case-insensitive string Contains (whether LIKE actually ignores case depends on the database collation).