Using filter_ in dplyr Where Both Field and Value Are in Variables

Using filter_ in dplyr where both field and value are in variables

You can try with interp from lazyeval

library(lazyeval)
library(dplyr)
df %>%
  filter_(interp(~v == sval, v = as.name(fld)))
#   V Unhappy
# 1 1       Y
# 2 5       Y
# 3 3       Y

For multiple key/value pairs, I found the following to work, though I suspect there is a better way.

df1 %>% 
  filter_(interp(~v == sval1[1] & y == sval1[2],
                 .values = list(v = as.name(fld1[1]), y = as.name(fld1[2]))))
#   V Unhappy Col2
# 1 1       Y    B
# 2 5       Y    B

For these cases, I find the base R option easier. For example, if we are trying to filter the rows based on the 'key' variables in 'fld1' with corresponding values in 'sval1', one option is Map. We subset the dataset (df1[fld1]), apply the FUN (==) to each column of df1[fld1] with the corresponding value in 'sval1', and then combine the results with & via Reduce to get a single logical vector that can be used to filter the rows of 'df1'.

df1[Reduce(`&`, Map(`==`, df1[fld1], sval1)), ]
#   V Unhappy Col2
# 2 1       Y    B
# 3 5       Y    B
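To make the intermediate steps explicit, here is a small sketch (the names cmp and keep are just illustrative, and the df1, fld1 and sval1 objects are the ones defined in the data section below):

cmp  <- Map(`==`, df1[fld1], sval1)   # one logical vector per key column
keep <- Reduce(`&`, cmp)              # combine them element-wise with &
df1[keep, ]                           # same rows as the one-liner above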

data

df1 <- cbind(df, Col2= c("A", "B", "B", "C", "A"))
fld1 <- c(fld, 'Col2')
sval1 <- c(sval, 'B')

Using filter in dplyr (version 1.0.0) where both field and value are in variables

1) Firstly note that

search_value <- c("setosa", 5.0)

will coerce 5.0 to character so this is problematic. Instead create a data frame and then use inner_join.
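A quick illustration of the coercion in plain base R:

search_value <- c("setosa", 5.0)
search_value
# [1] "setosa" "5"      # both elements are now character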

(If search_value were all strings rather than a mix of strings and numerics, then, in terms of the variables shown in the question, we could use the commented-out line below, or simply use search_cols in place of names(searchDF) and search_value in place of searchDF elsewhere.)

# searchDF <- as.data.frame(setNames(as.list(search_value), search_col))
searchDF <- data.frame(Species = "setosa", Sepal.Length = 5.0)
inner_join(iris, searchDF, by = names(searchDF))

giving:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1            5         3.6          1.4         0.2  setosa
2            5         3.4          1.5         0.2  setosa
3            5         3.0          1.6         0.2  setosa
4            5         3.4          1.6         0.4  setosa
5            5         3.2          1.2         0.2  setosa
6            5         3.5          1.3         0.3  setosa
7            5         3.5          1.6         0.6  setosa
8            5         3.3          1.4         0.2  setosa

2) If you must use filter then use cur_data() to refer to the data. The scalar variable can be used directly.

filter(iris, cur_data()[, names(searchDF)[1]] == searchDF[[1]] &
             cur_data()[, names(searchDF)[2]] == searchDF[[2]])

3) For an arbitrary number of conditions use reduce (from purrr) or Reduce (from base R).

library(dplyr)
library(purrr)

myfilter <- function(x, nm) filter(x, cur_data()[, nm] == searchDF[[nm]])
iris %>% reduce(names(searchDF), myfilter, .init = .)
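For completeness, the base R Reduce() mentioned above can be used in the same way (a sketch relying on the myfilter and searchDF objects already defined):

# base R equivalent of the purrr reduce() call above
Reduce(myfilter, names(searchDF), init = iris)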

Filter multiple values on a string column in dplyr

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target) # equivalently, dat %>% filter(name %in% target)

Produces

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:

Lynn  == Tom
Tom   == Lynn
Chris == Tom
Lisa  == Lynn
... continue repeating Tom and Lynn until the end of the data frame

In this case we don't get an error because I suspect your real data frame has a number of rows that doesn't allow clean recycling (hence the warning you saw), whereas the 8-row sample you provide does. If the sample had had an odd number of rows, I would have gotten the same warning as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:

return TRUE for every odd value that is equal to "Tom" or every even value that is equal to "Lynn".

It so happens that the last value in your sample data frame is in an even position and equal to "Lynn", hence the one TRUE above.

To contrast, dat$name %in% target says:

for each value in dat$name, check that it exists in target.

Very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE

Note that your problem has nothing to do with dplyr, just the misuse of ==.
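To actually see the warning described above, here is a minimal sketch with hypothetical names and a vector whose length is not a multiple of two:

x <- c("Lynn", "Tom", "Chris", "Lisa", "Tom")   # length 5: recycling is uneven
x == c("Tom", "Lynn")
# [1] FALSE FALSE FALSE FALSE  TRUE
# Warning: longer object length is not a multiple of shorter object length
x %in% c("Tom", "Lynn")
# [1]  TRUE  TRUE FALSE FALSE  TRUE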

Using variables in `dplyr` filter

Perhaps we don't need a loop; we can use filter_at.

If we need to filter rows where either of the 'Con' columns is 1, we use any_vars to indicate that the predicate expression should be applied to any of the variables selected in filter_at (here we select them by index; if we want the string names instead, wrap them with vars(matches("Con"))).

database %>%
  filter_at(2:3, any_vars(. == 1))

If instead both columns need to be 1, use all_vars:

database %>%
  filter_at(2:3, all_vars(. == 1))
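In newer dplyr (1.0.4 and later), where the _at verbs are superseded, roughly the same thing can be written with if_any()/if_all() inside filter(); a sketch using the same database object:

database %>% filter(if_any(matches("Con"), ~ .x == 1))   # either Con column is 1
database %>% filter(if_all(matches("Con"), ~ .x == 1))   # both Con columns are 1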

For multiple output datasets, initialise a list and store the result of each iteration in it:

tmp <- setNames(vector("list", length(varibles)), varibles)
for(i in seq_along(varibles)) {
  tmp[[i]] <- database %>%
                filter_at(vars(varibles[i]), all_vars(. == 1))
}

Or with sym from rlang

tmp <- setNames(vector("list", length(varibles)), varibles)
for(i in seq_along(varibles)) {
  tmp[[i]] <- database %>%
                filter(UQ(rlang::sym(varibles[i])) == 1)
}

tmp
# $Con1
#   ID Con1 Con2
# 1  1    1    1
# 2  2    1    0
# 3  4    1    0
# 4  8    1    0
# 5 10    1    0

# $Con2
#   ID Con1 Con2
# 1  1    1    1

The above approaches were done using R 3.4.1 and dplyr_0.7.2. As the OP mentioned some difficulties in updating R to a newer version, we tried the get approach using R 3.1.3 and dplyr_0.4.3:

tmp <- setNames(vector("list", length(varibles)), varibles)
for(i in seq_along(varibles)) {
  tmp[[i]] <- database %>%
                filter(get(varibles[i], envir = as.environment(.)) == 1)
}

tmp
# $Con1
#   ID Con1 Con2
# 1  1    1    1
# 2  2    1    0
# 3  4    1    0
# 4  8    1    0
# 5 10    1    0

# $Con2
#   ID Con1 Con2
# 1  1    1    1
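The same named list can also be built without an explicit for loop; a sketch using lapply and the .data pronoun (newer dplyr versions, same database and varibles objects as above):

tmp <- setNames(
  lapply(varibles, function(v) database %>% filter(.data[[v]] == 1)),
  varibles
)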

How to use dplyr::filter with input values in a reactive environment?

The issue lies in your filter() call: you have to pass the column name as a symbol. Try this:

server <- function(input, output, session) {
  output$classtype <- renderUI({
    if (input$class != "NONE") {
      checkboxGroupInput("classtype", label = "Class Type",
                         choices = sapply(classlist(), unique))
    }
  })

  classlist <- reactive({
    data1 %>%
      select(input$class)
  })

  output$table <- renderTable({
    data1 %>%
      filter(!!sym(input$class) %in% input$classtype)
  })
}

This is the equivalent of the following non-Shiny problem:

mtcars %>% filter("cyl" == 6)

This will return an empty data.frame. However, if you modify it accordingly, it will give you what you are looking for:

mtcars %>% filter(!!sym("cyl") == 6)
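The same pattern works with %in%, mirroring the filter in the server code above (a sketch with hypothetical values):

mtcars %>% filter(!!sym("cyl") %in% c(4, 6))   # rows where cyl is 4 or 6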

dplyr filter with condition on multiple columns

A possible dplyr (0.5.0.9004 <= version < 1.0) solution is:

# > packageVersion('dplyr')
# [1] ‘0.5.0.9004’

dataset %>%
  filter(!is.na(father), !is.na(mother)) %>%
  filter_at(vars(-father, -mother), all_vars(is.na(.)))

Explanation:

  • vars(-father, -mother): select all columns except father and mother.
  • all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

Note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any of the selected columns are to be kept.


Update (2020-11-28)

As the _at functions and vars() have been superseded by across() since dplyr 1.0, the following approach (or something similar) is now recommended:

dataset %>%
  filter(across(c(father, mother), ~ !is.na(.x))) %>%
  filter(across(c(-father, -mother), is.na))

See more examples of across and how to rewrite previous code with the new approach in the Column-wise operations vignette, or type vignette("colwise") in R after installing the latest version of dplyr.
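Note that in still newer releases (dplyr 1.0.4 and up), using across() inside filter() is itself deprecated in favour of if_all()/if_any(); a sketch of the same filter:

dataset %>%
  filter(if_all(c(father, mother), ~ !is.na(.x))) %>%
  filter(if_all(c(-father, -mother), is.na))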

How to use a variable in dplyr::filter?

You could use the get function to fetch the value of the variable from the environment.

df %>% filter(b == get("b")) # Note the "" around b
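For newer dplyr/rlang, a more explicit alternative (not the answer's own code) is the .env pronoun, which makes clear that the right-hand b comes from the environment rather than from the data; a sketch with hypothetical data:

library(dplyr)
b  <- 3                       # hypothetical environment variable
df <- data.frame(b = 1:5)     # hypothetical data with a column also named b
df %>% filter(b == .env$b)    # column b compared against the environment b
# returns the row where the column b equals 3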

Selecting and filtering on the same variables in dplyr

You could do this:

library(dplyr)
set.seed(2)

a_dem <- runif(100,0,100)
b_dem <- runif(100,0,100)
c_blah <- runif(100,0,100)

dat <- data.frame(a_dem, b_dem, c_blah)

newdat1 <- dat %>%
  select(ends_with("_dem"))

filtered <- sapply(newdat1, function(x) ifelse(x > 50, x, NA))

> head(filtered)
        a_dem    b_dem
[1,]       NA       NA
[2,] 70.23740       NA
[3,] 57.33263 98.06000
[4,]       NA 82.89221
[5,] 94.38393       NA
[6,] 94.34750 59.59169

And then depending on what you want to do next you could easily just exclude the NA values.


Update:

To do this completely in dplyr you can use the method that was linked to here by @sgp667

newdat2 <- dat %>%
  select(ends_with("_dem")) %>%
  mutate_each(funs(((function(x) {ifelse(x > 50, x, NA)})(.))))

> head(newdat2)
     a_dem    b_dem
1       NA       NA
2 70.23740       NA
3 57.33263 98.06000
4       NA 82.89221
5 94.38393       NA
6 94.34750 59.59169
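Since mutate_each() is deprecated in current dplyr (>= 1.0), roughly the same result can be obtained with across(); a sketch using the same dat:

newdat3 <- dat %>%
  select(ends_with("_dem")) %>%
  mutate(across(everything(), ~ ifelse(.x > 50, .x, NA)))
head(newdat3)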

column names as variables in dplyr: select v filter

Two potential solutions:

  1. Using get:

colName = "NAME"
filter(data, get({{colName}}) == colName)

  2. Using rlang::sym:

colName = "NAME"
colSym <- rlang::sym(colName)
filter(data, !!colSym == colName)

Not sure which is best.
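A third option, shown here as a sketch with a hypothetical data frame, is the .data pronoun from dplyr:

library(dplyr)
data <- data.frame(NAME = c("NAME", "OTHER"), VALUE = 1:2)  # hypothetical data
colName <- "NAME"
filter(data, .data[[colName]] == colName)   # keeps the row where NAME == "NAME"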


