Using filter_ in dplyr where both field and value are in variables
You can try interp from lazyeval:
library(lazyeval)
library(dplyr)
df %>%
  filter_(interp(~ v == sval, v = as.name(fld)))
# V Unhappy
#1 1 Y
#2 5 Y
#3 3 Y
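In current dplyr (>= 0.7) the same filter no longer needs lazyeval; the .data pronoun handles a column name stored in a string. A minimal sketch, where df, fld and sval are hypothetical reconstructions of the question's objects:

```r
library(dplyr)

# hypothetical reconstruction of the question's data and variables
df   <- data.frame(V = c(2, 1, 5, 3, 4),
                   Unhappy = c("N", "Y", "Y", "Y", "N"))
fld  <- "Unhappy"  # field name held in a variable
sval <- "Y"        # value held in a variable

# .data[[fld]] looks the column up by its string name
df %>% filter(.data[[fld]] == sval)
```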
For multiple key/value pairs, the following works, though there is probably a cleaner way:
df1 %>%
  filter_(interp(~ v == sval1[1] & y == sval1[2],
                 .values = list(v = as.name(fld1[1]), y = as.name(fld1[2]))))
# V Unhappy Col2
#1 1 Y B
#2 5 Y B
For these cases, I find the base R option easier. For example, to filter the rows based on the 'key' variables in 'fld1' with corresponding values in 'sval1', one option is Map: we subset the dataset (df1[fld1]) and apply the FUN (==) to each column of df1[fld1] with the corresponding value in 'sval1', then combine the resulting logical columns with & via Reduce to get a single logical vector that can be used to filter the rows of 'df1'.
df1[Reduce(`&`, Map(`==`, df1[fld1],sval1)),]
# V Unhappy Col2
#2 1 Y B
#3 5 Y B
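A join-based alternative for the multiple key/value case: build a one-row lookup frame from 'fld1'/'sval1' and semi_join against it. A sketch, with df1, fld1 and sval1 reconstructed hypothetically to match the output above:

```r
library(dplyr)

# hypothetical reconstruction of df1, fld1, sval1
df1   <- data.frame(V = c(2, 1, 5, 3, 4),
                    Unhappy = c("N", "Y", "Y", "Y", "N"),
                    Col2 = c("A", "B", "B", "C", "A"))
fld1  <- c("Unhappy", "Col2")
sval1 <- c("Y", "B")

# one-row data frame holding the key/value pairs
keys <- setNames(as.data.frame(as.list(sval1), stringsAsFactors = FALSE), fld1)

# keep the rows of df1 that match the keys on all 'fld1' columns
semi_join(df1, keys, by = fld1)
```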
data
df1 <- cbind(df, Col2= c("A", "B", "B", "C", "A"))
fld1 <- c(fld, 'Col2')
sval1 <- c(sval, 'B')
Using filter in dplyr (version 1.0.0) where both field and value are in variables
1) Firstly note that
search_value <- c("setosa", 5.0)
will coerce 5.0 to character so this is problematic. Instead create a data frame and then use inner_join
.
(If the search_value were all strings and not a mix of strings and numerics then in terms of the variables shown in the question we could use the commented out line or we could just use search_cols in place of names(searchDF) below and use search_value in place of searchDF elsewhere.)
# searchDF <- as.data.frame(setNames(as.list(search_value), search_col))
searchDF <- data.frame(Species = "setosa", Sepal.Length = 5.0)
inner_join(iris, searchDF, by = names(searchDF))
giving:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5 3.6 1.4 0.2 setosa
2 5 3.4 1.5 0.2 setosa
3 5 3.0 1.6 0.2 setosa
4 5 3.4 1.6 0.4 setosa
5 5 3.2 1.2 0.2 setosa
6 5 3.5 1.3 0.3 setosa
7 5 3.5 1.6 0.6 setosa
8 5 3.3 1.4 0.2 setosa
2) If you must use filter
then use cur_data()
to refer to the data. The scalar variable can be used directly.
filter(iris, cur_data()[, names(searchDF)[1]] == searchDF[[1]] &
cur_data()[, names(searchDF)[2]] == searchDF[[2]])
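Note that cur_data() was deprecated in dplyr 1.1.0; pick() plays the same role there. A sketch of the same lookup with pick() and [[ to extract each column as a vector (searchDF as defined above):

```r
library(dplyr)

searchDF <- data.frame(Species = "setosa", Sepal.Length = 5.0)

# dplyr >= 1.1.0: pick(everything()) replaces cur_data()
filter(iris,
       pick(everything())[[names(searchDF)[1]]] == searchDF[[1]],
       pick(everything())[[names(searchDF)[2]]] == searchDF[[2]])
```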
3) For an arbitrary number of conditions use reduce
(from purrr) or Reduce
(from base R).
library(dplyr)
library(purrr)
myfilter <- function(x, nm) filter(x, cur_data()[, nm] == searchDF[[nm]])
iris %>% reduce(names(searchDF), myfilter, .init = .)
Filter multiple values on a string column in dplyr
You need %in%
instead of ==
:
library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target) # equivalently, dat %>% filter(name %in% target)
Produces
days name
1 88 Lynn
2 11 Tom
3 1 Tom
4 222 Lynn
5 2 Lynn
To understand why, consider what happens here:
dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:
Lynn == Tom
Tom == Lynn
Chris == Tom
Lisa == Lynn
... continue repeating Tom and Lynn until end of data frame
In this case we don't get an error because the sample you provide has 8 rows, a multiple of the target length, so recycling succeeds silently; I suspect your real data frame has a number of rows that is not a multiple of two, which is why you saw the error. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target
is equivalent to saying:
return
TRUE
for every odd value that is equal to "Tom" or every even value that is equal to "Lynn".
It so happens that the last value in your sample data frame is even and equal to "Lynn", hence the one TRUE
above.
To contrast, dat$name %in% target says: for each value in dat$name, check whether it exists in target.
Very different. Here is the result:
[1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
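The recycling behaviour is easy to demonstrate directly in base R (a minimal sketch with made-up vectors):

```r
# == recycles the shorter vector: positions 1 and 3 are compared
# with "Tom", positions 2 and 4 with "Lynn"
names_vec  <- c("Tom", "Tom", "Lynn", "Lynn")
target_vec <- c("Tom", "Lynn")

names_vec == target_vec    # TRUE FALSE FALSE TRUE
names_vec %in% target_vec  # TRUE TRUE TRUE TRUE
```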
Note your problem has nothing to do with dplyr, just the misuse of ==.
Using variables in `dplyr` filter
Perhaps we don't need a loop; use filter_at. To keep rows where either of the 'Con' columns is 1, use any_vars to specify that the predicate expression should hold for at least one of the selected variables (here we select the columns by index; to select them by name, wrap a pattern with vars(matches("Con"))):
database %>%
filter_at(2:3, any_vars(.==1))
If instead both columns need to be 1, use all_vars:
database %>%
filter_at(2:3, all_vars(.==1))
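In dplyr >= 1.0 the filter_at()/any_vars()/all_vars() family is superseded; if_any() and if_all() are the equivalents. A sketch, with a hypothetical database shaped like the OP's:

```r
library(dplyr)

# hypothetical reconstruction of the OP's data
database <- data.frame(ID = c(1, 2, 4, 8),
                       Con1 = c(1, 1, 0, 0),
                       Con2 = c(1, 0, 1, 0))

database %>% filter(if_any(2:3, ~ .x == 1))  # either Con column is 1
database %>% filter(if_all(2:3, ~ .x == 1))  # both Con columns are 1
```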
To get one filtered dataset per variable, initialise a list and store the output of each iteration in it:
tmp <- setNames(vector("list", length(varibles)), varibles)
for(i in seq_along(varibles)){
tmp[[i]] <- database %>%
filter_at(vars(varibles[i]), all_vars(. == 1))
}
Or with sym
from rlang
tmp <- setNames(vector("list", length(varibles)), varibles)
for(i in seq_along(varibles)){
tmp[[i]] <- database %>%
filter(UQ(rlang::sym(varibles[i])) == 1)
}
tmp
#$Con1
# ID Con1 Con2
#1 1 1 1
#2 2 1 0
#3 4 1 0
#4 8 1 0
#5 10 1 0
#$Con2
# ID Con1 Con2
#1 1 1 1
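With current dplyr the whole loop can be replaced by lapply() and the .data pronoun, with no rlang::sym()/UQ() needed. A sketch ('varibles' keeps the OP's spelling; database is a hypothetical reconstruction):

```r
library(dplyr)

# hypothetical reconstruction of the OP's data
database <- data.frame(ID = c(1, 2, 4, 8),
                       Con1 = c(1, 1, 1, 1),
                       Con2 = c(1, 0, 0, 0))
varibles <- c("Con1", "Con2")

# one filtered data frame per variable, named after that variable
tmp <- lapply(setNames(varibles, varibles),
              function(v) filter(database, .data[[v]] == 1))
```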
The above approaches were done using R 3.4.1 and dplyr_0.7.2. As the OP mentioned difficulties updating R to a newer version, here is a get-based approach tested with R 3.1.3 and dplyr_0.4.3:
tmp <- setNames(vector("list", length(varibles)), varibles)
for(i in seq_along(varibles)){
tmp[[i]] <- database %>%
filter(get(varibles[i], envir = as.environment(.))==1)
}
tmp
#$Con1
# ID Con1 Con2
#1 1 1 1
#2 2 1 0
#3 4 1 0
#4 8 1 0
#5 10 1 0
#$Con2
# ID Con1 Con2
#1 1 1 1
How to use dplyr::filter with input values in a reactive environment?
The issue lies in your filter(): the column name stored in input$class has to be converted to a symbol and unquoted. Try this:
server <- function(input, output, session) {
output$classtype <- renderUI({
if (input$class != "NONE") {
checkboxGroupInput("classtype", label = "Class Type", choices = sapply(classlist(), unique))
}
})
classlist <- reactive({
data1 %>%
select(input$class)
})
output$table <- renderTable({
data1 %>%
filter(!!sym(input$class) %in% input$classtype)
})
}
This is equivalent to the following non-shiny problem:
mtcars %>% filter("cyl" == 6)
which returns an empty data.frame, because the literal string "cyl" (not the column) is compared with 6. Converting the string to a symbol and unquoting it gives you what you are looking for:
mtcars %>% filter(!!sym("cyl") == 6)
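An equivalent that avoids explicit symbol construction is the .data pronoun, which also accepts the string in input$class directly:

```r
library(dplyr)

# .data[["cyl"]] looks up the column named by the string
mtcars %>% filter(.data[["cyl"]] == 6)
```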
dplyr filter with condition on multiple columns
A possible dplyr
(0.5.0.9004 <= version < 1.0) solution is:
# > packageVersion('dplyr')
# [1] ‘0.5.0.9004’
dataset %>%
filter(!is.na(father), !is.na(mother)) %>%
filter_at(vars(-father, -mother), all_vars(is.na(.)))
Explanation:
- vars(-father, -mother): select all columns except father and mother.
- all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.
Note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any column are to be kept.
Update (2020-11-28)
As the _at
functions and vars
have been superseded by the use of across
since dplyr 1.0, the following way (or similar) is recommended now:
dataset %>%
filter(across(c(father, mother), ~ !is.na(.x))) %>%
filter(across(c(-father, -mother), is.na))
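Since dplyr 1.0.4, using across() inside filter() is itself deprecated in favour of if_all()/if_any(); an equivalent sketch (the small dataset here is hypothetical, shaped like the question's):

```r
library(dplyr)

# hypothetical data: father/mother plus one other column
dataset <- data.frame(id     = c(NA, NA, 3),
                      father = c(1, 1, NA),
                      mother = c(2, NA, 2))

dataset %>%
  filter(if_all(c(father, mother), ~ !is.na(.x))) %>%  # father and mother present
  filter(if_all(-c(father, mother), is.na))            # all other columns NA
```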
See more examples of across and how to rewrite previous code with the new approach in the column-wise operations vignette: type vignette("colwise") in R after installing the latest version of dplyr.
How to use a variable in dplyr::filter?
You could use the get
function to fetch the value of the variable from the environment.
df %>% filter(b == get("b")) # Note the "" around b
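One caution: inside filter() the expression is evaluated in a data mask where columns are found first, so if the column and the variable share the name b, get("b") can resolve to the column itself. The .env pronoun (from rlang, available in dplyr verbs) makes the intent explicit. A sketch with hypothetical data:

```r
library(dplyr)

df <- data.frame(a = 1:3, b = c(10, 20, 30))
b  <- 20  # environment variable, same name as the column

# .env$b is the variable; plain b is the column
df %>% filter(b == .env$b)
```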
Selecting and filtering on the same variables in dplyr
You could do this:
library(dplyr)
set.seed(2)
a_dem <- runif(100,0,100)
b_dem <- runif(100,0,100)
c_blah <- runif(100,0,100)
dat <- data.frame(a_dem, b_dem, c_blah)
newdat1 <- dat %>%
select(ends_with("_dem"))
filtered <- sapply(newdat1, function(x) ifelse(x>50, x, NA))
> head(filtered)
a_dem b_dem
[1,] NA NA
[2,] 70.23740 NA
[3,] 57.33263 98.06000
[4,] NA 82.89221
[5,] 94.38393 NA
[6,] 94.34750 59.59169
And then depending on what you want to do next you could easily just exclude the NA
values.
Update:
To do this completely in dplyr
you can use the method that was linked to here by @sgp667
newdat2 <- dat %>%
  select(ends_with("_dem")) %>%
  mutate_each(funs(ifelse(. > 50, ., NA)))
> head(newdat2)
a_dem b_dem
1 NA NA
2 70.23740 NA
3 57.33263 98.06000
4 NA 82.89221
5 94.38393 NA
6 94.34750 59.59169
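mutate_each() has since been removed from dplyr; with dplyr >= 1.0 the same transformation is written with across():

```r
library(dplyr)

set.seed(2)
dat <- data.frame(a_dem  = runif(100, 0, 100),
                  b_dem  = runif(100, 0, 100),
                  c_blah = runif(100, 0, 100))

# NA out any selected value at or below 50
newdat2 <- dat %>%
  select(ends_with("_dem")) %>%
  mutate(across(everything(), ~ ifelse(.x > 50, .x, NA)))
```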
column names as variables in dplyr: select v filter
Two potential solutions:
- Using get
colName = "NAME"
filter(data, get({{colName}}) == colName)
- Using rlang::sym
colName = "NAME"
colSym <- rlang::sym(colName)
filter(data, !!colSym == colName)
Not sure which is best.
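A third option, avoiding both get() and explicit symbol construction, is the .data pronoun (hypothetical data for illustration):

```r
library(dplyr)

# hypothetical: a column whose values include its own name
data    <- data.frame(NAME = c("NAME", "OTHER"), x = 1:2)
colName <- "NAME"

# .data[[colName]] refers to the column named by the string
filter(data, .data[[colName]] == colName)
```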