Referring to Variables by Name in a Dplyr Function Returns Object Not Found Error

mutate() returns error of 'Object not found'

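data1 itself does not have a Net Sales column; that column only exists in the renamed data produced by the select() step of your pipe. You can use . to refer to the data frame being piped into the current step.

library(dplyr)
library(tidyr)  # provides replace_na()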

data1 %>% 
     select(`Product Code` = `product id`, `Net Sales` = `total sales`) %>%
     replace_na(list(`Net Sales` = 0))%>%
     arrange(desc(`Net Sales`)) %>%
     mutate(Volume = rank_volume(., `Net Sales`))

# `Product Code` `Net Sales` Volume
#  <chr>                <dbl> <chr> 
#1 X109                   300 H     
#2 X180                   200 M     
#3 X918                   200 M     
#4 X273                   150 L     
#5 X988                   120 L

Alternatively, you can use cur_data() (deprecated since dplyr 1.1.0 in favour of pick()):

data1 %>% 
     select(`Product Code` = `product id`, `Net Sales` = `total sales`) %>%
     replace_na(list(`Net Sales` = 0))%>%
     arrange(desc(`Net Sales`)) %>%
     mutate(Volume = rank_volume(cur_data(), `Net Sales`))
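
For newer dplyr versions, here is a minimal self-contained sketch of the same idea using pick(), which needs dplyr 1.1.0 or later. Both the data1 values and the rank_volume() helper are hypothetical stand-ins, since the question's originals are not shown here; the thresholds are made up purely for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data
data1 <- tibble(
  `product id`  = c("X109", "X180", "X918", "X273", "X988"),
  `total sales` = c(300, 200, 200, 150, 120)
)

# Hypothetical helper: takes a data frame and a bare column name.
# The cut-offs (250 and 175) are assumptions, not the asker's rules.
rank_volume <- function(data, col) {
  x <- pull(data, {{ col }})
  case_when(x >= 250 ~ "H", x >= 175 ~ "M", TRUE ~ "L")
}

result <- data1 %>%
  select(`Product Code` = `product id`, `Net Sales` = `total sales`) %>%
  arrange(desc(`Net Sales`)) %>%
  # pick(everything()) hands the current data frame to the helper
  mutate(Volume = rank_volume(pick(everything()), `Net Sales`))

result
```

The point is only that the helper receives the transformed data (which has Net Sales), not the original data1.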

Mutate inside a function: object not found

Below, I have included an ergonomic solution that (almost) seamlessly mimics the familiar feel of the dplyr workflow. I have also spent some time diagnosing and addressing the conceptual pitfalls.

A common source of confusion is the fact that R passes by value and not by reference. Along with the nuances of programmatic dplyr, this fact is responsible for two conceptual errors in your code.

For the sake of convenience, I have reproduced your sp_merrel_df_raw here as a data.frame:

structure(list(product_name = c("Merrell Riverbed 3", "Sapatilhas Montanha Merrell", "Merrell Moab Adventure", "Merrell Moab 2 Vent"),
               price_new = c("59,99 €", "149,99 €", "99,99 €", "79,99 €"),
               price_old = c("69,99 €", NA, NA, "99,99 €"),
               date = c(210720, 210720, 210720, 210720)),
          row.names = c(NA, -4L),
          class = "data.frame")

You Cannot Modify "in Place"

Serendipitously, I clarified this exact confusion some time ago. Suppose you have a simple numeric variable x and a simple function doubler():

x <- 2

doubler <- function(num) {
  num <- 2 * num
}

Now simply running doubler(x) will do absolutely nothing aside from (invisibly) returning 4. All that happens is that the parameter num is passed the value 2, and then num is overwritten with 4 within the scope of the function. However, the original variable x remains untouched:

doubler(x)

x
# [1] 2

In order to modify x, we must overwrite it (<-) with the results of the function:

x <- doubler(x)

x
# [1] 4

Analogously, when you run your currency_to_numeric() function

currency_to_numeric(raw = sp_merrel_df_raw, clean = sp_merrel_df_clean2, var = price_new)

it will accept the value of sp_merrel_df_clean2, and assign that value to its clean parameter. Everything that happens afterward

    clean <- raw %>% # ...

affects only clean within the scope of the function. When all is said and done, sp_merrel_df_clean2 will never be affected.

Instead, something like this is required, to overwrite sp_merrel_df_clean2 with the new value:

currency_to_numeric <- function(raw, ...) {
  # ...
}

sp_merrel_df_clean2 <- currency_to_numeric(raw = sp_merrel_df_raw, ...)
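
The same behaviour can be seen with a data frame in a few lines of base R. This is only an illustrative sketch; add_flag() and original are hypothetical names that do not appear in your code.

```r
# Hypothetical helper: adds a column to its (local) copy of df and returns it
add_flag <- function(df) {
  df$flag <- TRUE
  df
}

original <- data.frame(id = 1:3)

# Calling the function without assigning the result leaves `original` untouched
invisible(add_flag(original))
unchanged <- "flag" %in% names(original)
unchanged

# Only an explicit assignment overwrites `original`
original <- add_flag(original)
changed <- "flag" %in% names(original)
changed
```

Here unchanged is FALSE and changed is TRUE, which mirrors what happens with doubler(x) versus x <- doubler(x).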

Decontextualized Variables

As discussed in the dplyr documentation

env-variables are “programming” variables that live in an environment

whereas

data-variables are “statistical” variables that live in a data frame.

Now data-variables are "masked" in the context of certain functions, especially in dplyr. Such masking lets us refer to the sp_merrel_df_raw$price_new column as simply price_new when we perform (say) a mutate() on sp_merrel_df_raw:

sp_merrel_df_raw %>%
  mutate(
    price_new = sub(" €", "", price_new)
    # ...
  )

However, when you run your currency_to_numeric() function

currency_to_numeric(raw = sp_merrel_df_raw, clean = sp_merrel_df_clean2, var = price_new)

var does not become the sp_merrel_df_raw$price_new variable itself.

Rather, R looks for some env-variable named price_new in the surrounding environment (here .GlobalEnv), and attempts to assign its value to the var parameter. Naturally, since no such price_new variable exists in .GlobalEnv, there is no such value, so R throws an error as soon as it tries to use that value in mutate():

Error: Problem with `mutate()` column `var`.
i `var = sub(" €", "", var)`.
x object 'price_new' not found

This error is comparable to what you would get if you called a function on any other variable that didn't exist:

doubler(num = nonexistent_variable)

# Error in doubler(num = nonexistent_variable) : 
#   object 'nonexistent_variable' not found

However, even if price_new were actually floating around in .GlobalEnv as a typical env-variable, you would still get an error. This is because passing the value of price_new to var is not the same as "pasting" the "name" price_new

mutate(
        price_new = sub(" €", "", price_new),
        price_new = sub(",", ".", price_new),
        price_new = as.numeric(price_new))

wherever the "name" var used to be.

mutate(
        var = sub(" €", "", var),
        var = sub(",", ".", var),
        var = as.numeric(var))
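
To make the distinction concrete, here is a minimal sketch of how a column name can be forwarded (rather than its value passed) using dplyr's embrace operator {{ }}. The function name currency_to_numeric_one() and the two-row sample are hypothetical, and the "{{ var }}" := naming syntax assumes rlang 0.4.3 or later.

```r
library(dplyr)

# Two-row sample in the same shape as sp_merrel_df_raw (illustrative only)
sample_raw <- data.frame(
  product_name = c("Merrell Riverbed 3", "Merrell Moab Adventure"),
  price_new    = c("59,99 €", "99,99 €")
)

# Hypothetical single-column variant: `var` is a bare column name
currency_to_numeric_one <- function(raw, var) {
  raw %>%
    mutate(
      # "{{ var }}" := names the output after the column the caller supplied;
      # {{ var }} on the right-hand side refers to that column's values
      "{{ var }}" := as.numeric(
        sub(",", ".", sub(" €", "", {{ var }}, fixed = TRUE), fixed = TRUE)
      )
    )
}

cleaned <- currency_to_numeric_one(sample_raw, price_new)
cleaned$price_new
```

With the embrace in place, price_new is looked up as a data-variable inside raw instead of as an env-variable in .GlobalEnv.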

Solution

Here's a nifty reworking of currency_to_numeric() that closely imitates the typical functionality of dplyr:

currency_to_numeric <- function(raw, ...) {
  raw %>%
    mutate(
      across(c(...), ~ sub(" €", "", .x)),
      across(c(...), ~ sub(",", ".", .x)),
      across(c(...), ~ as.numeric(.x))
      )
}

As with virtually any R function, you must still assign the results to sp_merrel_df_clean2, but this solution will help you do so very cleanly

sp_merrel_df_clean2 <- sp_merrel_df_raw %>%
  currency_to_numeric(price_new)

with the following results for sp_merrel_df_clean2:

                 product_name price_new price_old   date
1          Merrell Riverbed 3     59.99   69,99 € 210720
2 Sapatilhas Montanha Merrell    149.99      <NA> 210720
3      Merrell Moab Adventure     99.99      <NA> 210720
4         Merrell Moab 2 Vent     79.99   99,99 € 210720

In fact, you can simultaneously target as many data-variables (like price_new and price_old) as you want

sp_merrel_df_clean2 <- sp_merrel_df_raw %>%
  currency_to_numeric(price_new, price_old)

and convert all your currency columns in one fell swoop!

                 product_name price_new price_old   date
1          Merrell Riverbed 3     59.99     69.99 210720
2 Sapatilhas Montanha Merrell    149.99        NA 210720
3      Merrell Moab Adventure     99.99        NA 210720
4         Merrell Moab 2 Vent     79.99     99.99 210720

dplyr rename() function not working. I get object X not found

We need to specify the names the other way around: rename() expects new_name = old_name.

library(dplyr)
test %>% 
  rename(column1 = X1)
#  column1 X2
#1       2  5
#2       3  6
#3       4  7
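
For a runnable version, here is a small sketch with a stand-in test data frame (the values are assumed from the output above). It also shows that the reversed order fails, without relying on the exact wording of the error, which differs between dplyr versions.

```r
library(dplyr)

# Stand-in for the asker's data, reconstructed from the printed output
test <- data.frame(X1 = 2:4, X2 = 5:7)

# Correct order: new_name = old_name
renamed <- test %>%
  rename(column1 = X1)
names(renamed)

# Reversed order: dplyr looks for an existing column called column1 and fails
reversed <- tryCatch(
  test %>% rename(X1 = column1),
  error = function(e) "error"
)
reversed
```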

R Object not found error when using tidy evaluation and group by in function

Ok, found it :)

library(dplyr)   # group_by(), tally(), as_tibble(), %>%
library(plotly)  # plot_ly()

pie_plot <- function(data, labels, varname, colors) {
  lbls <- labels
  count <- data %>%
    dplyr::group_by({{varname}}) %>% 
    tally() %>%
    .[2] %>%
    .[1:length(labels), ] %>%
    unlist(.)
  df <- as_tibble(cbind(count, lbls))
  plot_ly(df, labels = df$lbls, values = df$count,
         marker = list(colors = colors,
                       line = list(color = '#FFFFFF', width = 1)),
         type = "pie", width = 280, height = 280)
}

pie_plot(data, labels = c("Male", "Female"), varname = t1gender, colors = c('#440154FF', '#21908CFF'))

What do backticks do in R?

A pair of backticks is a way to refer to names or combinations of symbols that are otherwise reserved or illegal. Reserved words, like if, are part of the language, while illegal names include combinations like c a t. R's documentation refers to both categories as non-syntactic names.

Thus,

`c a t` <- 1 # is valid R

and

> `+` # is equivalent to typing in a syntactic function name
function (e1, e2)  .Primitive("+")

As a commenter mentioned, ?Quotes does contain some information on the backtick, under Names and Identifiers:

Identifiers consist of a sequence of letters, digits, the period (.) and the underscore. They must not start with a digit nor underscore, nor with a period followed by a digit. Reserved words are not valid identifiers.
The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
Such identifiers are also known as syntactic names and may be used directly in R code. Almost always, other names can be used provided they are quoted. The preferred quote is the backtick (`), and deparse will normally use it, but under many circumstances single or double quotes can be used (as a character constant will often be converted to a name). One place where backticks may be essential is to delimit variable names in formulae: see formula

This prose is a little hard to parse. What it means is that for R to parse a token as a name, it must 1) be a sequence of letters, digits, periods and underscores, 2) not start with a digit, an underscore, or a period followed by a digit, and 3) not be a reserved word in the language. Otherwise, backticks must be used for it to be parsed as a name.

Also check out ?Reserved:

Reserved words outside quotes are always parsed to be references to the objects linked to in the 'Description', and hence they are not allowed as syntactic names (see make.names). They are allowed as non-syntactic names, e.g. inside backtick quotes.

In addition, Advanced R has some examples of how backticks are used in expressions, environments, and functions.
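
A few base R lines tie these cases together. This is just an illustrative sketch; the sales data frame and its Net Sales column are made-up examples.

```r
# A non-syntactic name (contains spaces) must be wrapped in backticks
`c a t` <- 1
cat_plus_one <- `c a t` + 1

# Operators are ordinary functions once quoted with backticks
three <- `+`(1, 2)

# Column names with spaces are a common place where backticks are needed;
# check.names = FALSE stops data.frame() from rewriting the name
sales <- data.frame(`Net Sales` = c(300, 200), check.names = FALSE)
net <- sales$`Net Sales`

# make.names() shows how R would turn names into syntactic ones
fixed_names <- make.names(c("Net Sales", "if", "ok_name"))
fixed_names
```

Here make.names() replaces the space with a period and appends a period to the reserved word if, while the already syntactic ok_name is left alone.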

Why do I get an Object not found error when using group_by and summarise() in r?

From @mouli3c3 on twitter:

I know what caused the problem. Can't explain clearly why though.
library(operators) is somehow masking/changing the original behaviour
of %>%. Adding library(magrittr) below library(operators) solved the
problem. Let me know if it works.

It worked! :)
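
If you suspect this kind of masking, one way to check is to ask R which namespace the %>% on your search path comes from. The sketch below assumes only that magrittr is installed; I have not verified how the operators package defines its own %>%, so treat this as a diagnostic idea rather than a confirmed explanation.

```r
library(magrittr)

# Which namespace does the `%>%` currently found on the search path belong to?
pipe_source <- environmentName(environment(`%>%`))
pipe_source

# If another package has masked it, one option is to rebind it explicitly
`%>%` <- magrittr::`%>%`
roots <- c(1, 4, 9) %>% sqrt()
roots
```

Loading magrittr (or dplyr) after the conflicting package, as the tweet suggests, has the same effect because the most recently attached package wins.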

How to pass column name as argument to function for dplyr verbs?

Here is another way of making it work. You can use the .data[[var]] construct for a column name that is stored as a string:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows

If you decide not to pass colName as a string, you can wrap it in double curly braces ({{ }}) inside your function to get a similar result:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows
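
If the column names arrive as a character vector with more than one element, neither construct above applies directly. A minimal sketch of one option, using across() with all_of(), follows; count_by() is a hypothetical name and this is only one of several ways to do it.

```r
library(dplyr)

# Hypothetical helper: colNames is a character vector of one or more columns
count_by <- function(data, colNames) {
  data %>%
    # all_of() selects exactly the named columns and errors if one is missing
    group_by(across(all_of(colNames))) %>%
    summarise(count = n(), .groups = "drop")
}

result <- count_by(quakes, c("mag", "stations"))
head(result)
```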