R Dplyr Filter Not Masking Base Filter

R dplyr filter not masking base filter?

When you load packages in your .Rprofile they get attached very early in the R startup process, before the stats package is attached, so stats::filter() ends up masking dplyr::filter(). The other way around, when you call library(dplyr) in a script or interactive session, you attach dplyr after stats has already been loaded, so dplyr::filter() does the masking. You can learn about R's startup process by typing ?Startup. There it says:

Note that when the site and user profile files are sourced only the base package is loaded, so objects in other packages need to be referred to by e.g. utils::dump.frames or after explicitly loading the package concerned.

I've seen Hadley recommend against loading packages in .Rprofile for this reason, i.e. the discrepancies in package loading order, although personally I don't have strong feelings about it.

One possible solution is to simply add library(stats) as the very first library call, before loading dplyr, so that dplyr ends up attached after stats and its filter() takes precedence on the search path.
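A minimal sketch of the intended attach order (assuming dplyr has not already been attached earlier in the session, since library() does nothing for a package that is already on the search path):

library(stats)
library(dplyr)

# filter() should now resolve to dplyr's version
environmentName(environment(filter))
#> [1] "dplyr"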

Another (long-term) option to avoid these sorts of issues more globally would be to transition your workflows from "a large collection of scripts" to one or more packages.

Using a function in dplyr filter

You can wrap the expression in your function with quo() to defuse it, and then use the !! operator to inject it into the filter() call.

library(dplyr)

sepal_config <- function(length, width, species) {
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}

iris %>%
  filter(!!sepal_config(length = 4, width = 3, species = "versicolor") |
         !!sepal_config(length = 3, width = 3, species = "virginica"))

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.5         2.3          4.0         1.3 versicolor
2           6.5         2.8          4.6         1.5 versicolor
3           5.7         2.8          4.5         1.3 versicolor
4           4.9         2.4          3.3         1.0 versicolor
5           6.6         2.9          4.6         1.3 versicolor
6           5.2         2.7          3.9         1.4 versicolor
7           5.0         2.0          3.5         1.0 versicolor
8           6.0         2.2          4.0         1.0 versicolor
9           6.1         2.9          4.7         1.4 versicolor
10          5.6         2.9          3.6         1.3 versicolor
...

Why does this dplyr filter not work in shiny, but works fine when run without shiny?

Is this closer to what you need?

library(shiny)
library(tidyverse)

ui <-
  fluidPage(
    h3("Data table:"),
    tableOutput("data"),
    h3("Sum the data table columns:"),
    radioButtons(
      inputId = "grouping",
      label = NULL,
      choiceNames = c("By period 1", "By period 2"),
      choiceValues = c("Period_1", "Period_2"),
      selected = "Period_1",
      inline = TRUE
    ),
    tableOutput("sums")
  )

server <- function(input, output, session) {
  data <- reactive({
    data.frame(
      ID = c(1, 1, 2, 2, 2, 2),
      Period_1 = c("2020-03", "2020-04", "2020-01", "2020-02", "2020-03", "2020-04"),
      Period_2 = c(1, 2, 1, 2, 3, 4),
      ColA = c(10, 20, 30, 40, 50, 52),
      ColB = c(15, 25, 35, 45, 55, 87)
    )
  })

  dataExpand <- reactive({
    data() %>%
      tidyr::complete(ID, nesting(Period_2)) %>%
      tidyr::fill(ColA, ColB, .direction = "down")
  })

  choice <- reactive(input$grouping)

  summed_data <- reactive({
    dataExpand() %>%
      group_by(across(all_of(choice()))) %>%
      select("ColA", "ColB") %>%
      summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
      # Removes rows where the grouping column is NA, i.e. the rows that
      # complete() adds because Period_2 stops before 4 for some IDs
      filter(if_all(1, ~ !is.na(.x)))
  })

  output$data <- renderTable(data())
  output$sums <- renderTable(summed_data())
}

shinyApp(ui, server)

How to filter out rows that do not fit specified condition in R

In addition to PaulS's tidyverse/dplyr solution, here's how you can filter out rows using base R.

df[df$Ethnicity != "Asian",]

#>    ID Ethnicity Age Set
#>  1  1     White   1   1
#>  3  3     Black   3   3
#>  4  4  Hispanic   4   4
#>  5  5     Other   5   1
#>  6  6     White   6   2
#>  8  8     Black   8   4
#>  9  9  Hispanic   9   1
#> 10 10     Other  10   2
#> 11 11     White  11   3
#> 13 13     Black  13   1
#> 14 14  Hispanic  14   2
#> 15 15     Other  15   3
#> 16 16     White  16   4
#> 18 18     Black  18   2
#> 19 19  Hispanic  19   3
#> 20 20     Other  20   4
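The same condition can also be written with base R's subset(), which avoids repeating the data frame name (a minor stylistic alternative, same result):

subset(df, Ethnicity != "Asian")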

dplyr filter not working as expected in function with same argument names

dplyr and friends [1] are smart, but they cannot differentiate between the two references to gear (and carb). For instance, in gear == as.numeric(gear), you intend the first to refer to the gear column within the frame and the second to refer to the function argument, but in these data-masking functions the first match of gear wins (the frame's columns are searched first, then the function environment, then its enclosing environments) and is used for both references. In this case, both sides match the column of the frame, and the condition is therefore always TRUE (in this example).
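For illustration, a sketch of the failing pattern (the name bad_lookup is made up; the body mirrors the question's lookup, minus the JSON step). Because both sides of each comparison resolve to the columns, the conditions are tautologies and every row comes back:

library(dplyr)

bad_lookup <- function(carb, gear) {
  # inside filter(), "gear" and "carb" on *both* sides resolve to the
  # mtcars columns, so each condition is column == column, i.e. all TRUE
  mtcars %>%
    filter(gear == as.numeric(gear),
           carb == as.numeric(carb))
}

nrow(bad_lookup(carb = 4, gear = 4))   # all of mtcars, regardless of the arguments
#> [1] 32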

Try:

function(carb., gear.) {
  mtcars %>%
    filter(gear == as.numeric(gear.),
           carb == as.numeric(carb.)) %>%
    jsonlite::toJSON()
}

This has the unfortunate side-effect that the API arguments are less aesthetic. So if you want to preserve the way they look (or there are external motivators to keeping them as-is), then do a quick reassignment.

function(carb, gear) {
  c. <- carb
  g. <- gear
  mtcars %>%
    filter(gear == as.numeric(g.),
           carb == as.numeric(c.)) %>%
    jsonlite::toJSON()
}

Side note: I find it useful at times to implement permissive filtering, where an omitted (or intentionally-null) argument means no filtering.

function(carb = NA, gear = NA) {
  c. <- carb
  g. <- gear
  mtcars %>%
    filter(is.na(g.) | gear == as.numeric(g.),
           is.na(c.) | carb == as.numeric(c.)) %>%
    jsonlite::toJSON()
}
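For example, binding that permissive handler to a name (get_cars is purely for illustration), omitting an argument simply skips that filter:

library(dplyr)

get_cars <- function(carb = NA, gear = NA) {
  c. <- carb
  g. <- gear
  mtcars %>%
    filter(is.na(g.) | gear == as.numeric(g.),
           is.na(c.) | carb == as.numeric(c.)) %>%
    jsonlite::toJSON()
}

nrow(jsonlite::fromJSON(get_cars()))           # no filters applied
#> [1] 32
nrow(jsonlite::fromJSON(get_cars(gear = 4)))   # only the gear filter applies
#> [1] 12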

Another side note: is there a reason you are doing a double JSON-encoding here? (Presumably plumber's default serializer already converts the return value to JSON, so an explicit toJSON() gets encoded a second time.) For instance, I'm seeing:

$ curl -s localhost:8000/test2?gear=4
"[{\"mpg\":21,\"cyl\":6,\"disp\":160,\"hp\":110,\"drat\":3.9,\"wt\":2.62,\"qsec\":16.46,\"vs\":0,\"am\":1,\"gear\":4,\"carb\":4},...]"

which is returning one long string (note the surrounding quotes and the escaped inner quotes). Many parsers will see that as a string and preserve it as such. (For instance, piping curl ... | jq . does not break open the JSON as it should; it just returns the literal string.)

Instead, if you remove the toJSON, you see:

$ curl -s localhost:8000/test2?gear=4
[{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3.9,"wt":2.62,"qsec":16.46,"vs":0,"am":1,"gear":4,"carb":4},...]

which is a "proper" json return, and can be parsed correctly. Adding | jq . after the curl call correctly parses the output:

$ curl -s localhost:8000/test2?gear=4 | jq .
[
  {
    "mpg": 21,
    "cyl": 6,
    "disp": 160,
    "hp": 110,
    "drat": 3.9,
    "wt": 2.62,
    "qsec": 16.46,
    "vs": 0,
    "am": 1,
    "gear": 4,
    "carb": 4
  },
  ...
]
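For reference, a rough sketch of what the route might look like with the toJSON() call removed (the @get annotation and route path are assumed here, since the original plumber file isn't shown); plumber's default JSON serializer then handles the encoding once:

# plumber.R (sketch)
library(dplyr)

#* @get /test2
function(carb = NA, gear = NA) {
  c. <- carb
  g. <- gear
  mtcars %>%
    filter(is.na(g.) | gear == as.numeric(g.),
           is.na(c.) | carb == as.numeric(c.))   # return the data frame itself, no toJSON()
}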

Notes:

  1. I should note that this is not unique to dplyr, and there should be no blame assigned there. The same behavior can be seen with base::with and base::within. Compare the two:

    func <- function(carb, gear) { browser(); 1; }
    func(1, 3)
    # Called from: func(1, 3)
    # debug at #1: [1] 1
    c. <- carb
    g. <- gear
    with(mtcars, { gear == as.numeric(gear) & carb == as.numeric(carb); })
    # [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    # [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    # [31] TRUE TRUE
    with(mtcars, { gear == as.numeric(g.) & carb == as.numeric(c.); })
    # [1] FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
    # [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
    # [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

filtering with a variable does not give the same results as with a constant - R

The problem is that your data contains a column named i. Data-masking functions in the tidyverse always look within the data first, so what you are essentially doing with patch_sparse %>% filter(period == i) is filtering for rows where period is equal to the column i of your data.

So if you want to filter based on an external scalar, make sure the name of the scalar is different from your data's column names, e.g. something like:

filter_i <- 0
patch_sparse %>% filter(period==filter_i)
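Alternatively, rlang's .env pronoun offers another way out (shown here as a supplementary sketch): it lets you keep the name i and state explicitly that it should be looked up in the calling environment rather than in the data:

library(dplyr)

i <- 0
patch_sparse %>% filter(period == .env$i)   # column "period" vs. environment variable "i"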

filter function in dplyr errors: object 'name' not found

It does seem like you are getting the stats::filter function and not the dplyr one. To make sure you get the right one, use the notation dplyr::filter.

d <- data.frame(x = 1:10,
                name = c("foo", "bar", "baz", "bar", "bar", "baz", "fnord", "qar", "qux", "quux"))

filter(d, !grepl("ar|ux", name))
Error in grepl("ar|ux", name) : object 'name' not found

dplyr::filter(d, !grepl("ar|ux", name))
  x  name
1 1   foo
2 3   baz
3 6   baz
4 7 fnord

You don't even need to do library(dplyr) for this to work - you do need dplyr installed though.

This works for functions from any package.

dplyr::filter() to grab the rows

"Species" == "setosa" is like matching two unequal strings, which are evidently not equal.

So you are filtering the data frame with a condition that is only ever FALSE, and thus no rows are returned. To filter a data frame in dplyr we need a logical condition that evaluates to TRUE or FALSE for each row; wherever it is TRUE, that row is kept.

You are actually doing something like this:

filter(iris, 'something' == 'something else')
[1] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<0 rows> (or 0-length row.names)

If instead you do something like this, all rows will be returned.

filter(iris, 'a' == 'a')

#check
str(filter(iris, 'a' == 'a'))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

To match the contents of the column Species you have to remove its quotation marks, so that R recognises it as an object (a column) and not a string.

Moreover, inside dplyr/tidyverse verbs the data frame's columns are available directly (data masking), so we do not have to use $.
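Putting that together, the call the answer is describing would be (first few of the 50 setosa rows shown):

filter(iris, Species == "setosa")

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
...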

Multiple if_all() in dplyr::filter not working

The comparison should be included inside if_all(). When using external character vectors as column names, it is good practice to wrap them in all_of().

library(dplyr)

df %>% filter(if_all(all_of(yes), ~. == 1) & if_all(all_of(no), ~. == 0))

#  dim pub sco wos
#1   1   0   0   1
#2   1   0   0   1

In base R, you can use rowSums -

df[rowSums(df[yes] == 1) == length(yes) & rowSums(df[no] == 0) == length(no), ]

