mutate() returns error of 'Object not found'
data1 does not have a Net Sales column; that column exists only in the transformed data frame that the pipe has produced by that point. You can use . to refer to the current data frame in the pipe.
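library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr

data1 %>%
  select(`Product Code` = `product id`, `Net Sales` = `total sales`) %>%
  replace_na(list(`Net Sales` = 0)) %>%
  arrange(desc(`Net Sales`)) %>%
  # `.` is the data frame produced by the steps above
  mutate(Volume = rank_volume(., `Net Sales`))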
# `Product Code` `Net Sales` Volume
# <chr> <dbl> <chr>
#1 X109 300 H
#2 X180 200 M
#3 X918 200 M
#4 X273 150 L
#5 X988 120 L
Or you can use cur_data() (deprecated since dplyr 1.1.0 in favor of pick()):
data1 %>%
select(`Product Code` = `product id`, `Net Sales` = `total sales`) %>%
replace_na(list(`Net Sales` = 0))%>%
arrange(desc(`Net Sales`)) %>%
mutate(Volume = rank_volume(cur_data(), `Net Sales`))
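Since the body of rank_volume() is not shown here, the following is only a minimal sketch of the same pattern. The helper share_of_max() and the data are made up for illustration; the point is that . inside mutate() is the data frame coming down the pipe, so the helper can find a column that exists only after select().

```r
library(dplyr)

# Hypothetical stand-in for rank_volume(): takes a data frame and a column,
# and returns each value as a share of the column's maximum.
share_of_max <- function(data, col) {
  values <- pull(data, {{ col }})
  values / max(values)
}

# Made-up data that reuses the original column names
data1 <- tibble(
  `product id`  = c("X109", "X180", "X273"),
  `total sales` = c(300, 200, 150)
)

result <- data1 %>%
  select(`Product Code` = `product id`, `Net Sales` = `total sales`) %>%
  # `.` is the renamed data frame, so `Net Sales` can be looked up in it
  mutate(Share = share_of_max(., `Net Sales`))

result
```

Calling share_of_max(data1, `Net Sales`) instead would fail, because data1 itself has no column of that name.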
Mutate inside a function: object not found
Below, I have included an ergonomic Solution, which (almost) seamlessly mimics the familiar feel of the dplyr workflow. I have also spent some time diagnosing and addressing conceptual pitfalls.
A common source of confusion is the fact that R passes by value and not by reference. Along with the nuances of programmatic dplyr, this fact is responsible for two conceptual errors in your code.
For the sake of convenience, I have reproduced your sp_merrel_df_raw here as a data.frame:
sp_merrel_df_raw <- structure(
  list(product_name = c("Merrell Riverbed 3", "Sapatilhas Montanha Merrell", "Merrell Moab Adventure", "Merrell Moab 2 Vent"),
       price_new = c("59,99 €", "149,99 €", "99,99 €", "79,99 €"),
       price_old = c("69,99 €", NA, NA, "99,99 €"),
       date = c(210720, 210720, 210720, 210720)),
  row.names = c(NA, -4L),
  class = "data.frame")
You Cannot Modify "in Place"
Serendipitously, I clarified this exact confusion some time ago. Suppose you have a simple numeric variable x and a simple function doubler():
x <- 2
doubler <- function(num) {
num <- 2 * num
}
Now simply running doubler(x) will do absolutely nothing aside from (invisibly) returning 4. All that happens is that the parameter num is passed the value 2, and then num is overwritten with 4 within the scope of the function. However, the original variable x remains untouched:
doubler(x)
x
# [1] 2
In order to modify x, we must overwrite it (<-) with the results of the function:
x <- doubler(x)
x
# [1] 4
Analogously, when you run your currency_to_numeric() function
currency_to_numeric(raw = sp_merrel_df_raw, clean = sp_merrel_df_clean2, var = price_new)
it will accept the value of sp_merrel_df_clean2 and assign that value to its clean parameter. Everything that happens afterward
clean <- raw %>% # ...
affects only clean within the scope of the function. When all is said and done, sp_merrel_df_clean2 will never be affected.
Instead, something like this is required, to overwrite sp_merrel_df_clean2 with the new value:
currency_to_numeric <- function(raw, ...) {
# ...
}
sp_merrel_df_clean2 <- currency_to_numeric(raw = sp_merrel_df_raw, ...)
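The same rule holds for data frames, not just for single numbers like x. Here is a small sketch with a made-up helper, add_flag(), and a made-up data frame; neither comes from your code.

```r
# Hypothetical helper: adds a column to its own local copy of the data frame
add_flag <- function(df) {
  df$flag <- TRUE
  df
}

original <- data.frame(id = 1:3)

# The result is printed and then thrown away; `original` is not touched
add_flag(original)
unchanged <- "flag" %in% names(original)

# Only an explicit assignment overwrites `original`
original <- add_flag(original)
changed <- "flag" %in% names(original)

c(unchanged = unchanged, changed = changed)
```

The first check is FALSE and the second is TRUE, which mirrors the doubler() example above.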
Decontextualized Variables
As discussed in the dplyr documentation, env-variables are “programming” variables that live in an environment, whereas data-variables are “statistical” variables that live in a data frame.
Now data-variables are "masked" in the context of certain functions, especially in dplyr. Such masking lets us refer to the sp_merrel_df_raw$price_new column as simply price_new when we perform (say) a mutate() on sp_merrel_df_raw:
sp_merrel_df_raw %>%
mutate(
price_new = sub(" €", "", price_new)
# ...
)
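To make the two kinds of variable concrete, here is a short sketch with made-up data. It uses the .data and .env pronouns that dplyr provides for disambiguation; the data frame df and the variable price are assumptions for illustration only.

```r
library(dplyr)

df <- data.frame(price = c(10, 20))  # `price` as a data-variable (a column)
price <- 1000                        # `price` as an env-variable

# Inside mutate(), the column masks the env-variable of the same name
masked <- df %>% mutate(out = price * 2)

# The pronouns state explicitly which one is meant
from_data <- df %>% mutate(out = .data$price * 2)
from_env  <- df %>% mutate(out = .env$price * 2)

masked$out
from_env$out
```

Here masked and from_data agree, because the column takes precedence, while from_env uses the value 1000 from the surrounding environment.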
However, when you run your currency_to_numeric() function
currency_to_numeric(raw = sp_merrel_df_raw, clean = sp_merrel_df_clean2, var = price_new)
var does not become the sp_merrel_df_raw$price_new variable itself.
Rather, R looks for some env-variable named price_new in the surrounding environment (here .GlobalEnv), and attempts to assign its value to the var parameter. Naturally, since no such price_new variable exists in .GlobalEnv, there is no such value, so R throws an error as soon as it tries to use that value in mutate():
Error: Problem with `mutate()` column `var`.
i `var = sub(" \200", "", var)`.
x object 'price_new' not found
This error is comparable to what you would get if you called a function on any other variable that didn't exist:
doubler(num = nonexistent_variable)
# Error in doubler(num = nonexistent_variable) :
# object 'nonexistent_variable' not found
However, even if price_new were actually floating around in .GlobalEnv as a typical env-variable, you would still get an error. This is because passing the value of price_new to var is not the same as "pasting" the "name" price_new
mutate(
price_new = sub(" €", "", price_new),
price_new = sub(",", ".", price_new),
price_new = as.numeric(price_new))
wherever the "name" var used to be.
mutate(
var = sub(" €", "", var),
var = sub(",", ".", var),
var = as.numeric(var))
Solution
Here's a nifty reworking of currency_to_numeric() that closely imitates the typical functionality of dplyr:
currency_to_numeric <- function(raw, ...) {
raw %>%
mutate(
across(c(...), ~ sub(" €", "", .x)),
across(c(...), ~ sub(",", ".", .x)),
across(c(...), ~ as.numeric(.x))
)
}
As with virtually any R function, you must still assign the results to sp_merrel_df_clean2, but this solution will help you do so very cleanly
sp_merrel_df_clean2 <- sp_merrel_df_raw %>%
currency_to_numeric(price_new)
with the following results for sp_merrel_df_clean2:
                 product_name price_new price_old   date
1          Merrell Riverbed 3     59.99   69,99 € 210720
2 Sapatilhas Montanha Merrell    149.99      <NA> 210720
3      Merrell Moab Adventure     99.99      <NA> 210720
4         Merrell Moab 2 Vent     79.99   99,99 € 210720
In fact, you can simultaneously target as many data-variables (like price_new and price_old) as you want
sp_merrel_df_clean2 <- sp_merrel_df_raw %>%
currency_to_numeric(price_new, price_old)
and convert all your currency columns in one fell swoop!
                 product_name price_new price_old   date
1          Merrell Riverbed 3     59.99     69.99 210720
2 Sapatilhas Montanha Merrell    149.99        NA 210720
3      Merrell Moab Adventure     99.99        NA 210720
4         Merrell Moab 2 Vent     79.99     99.99 210720
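If you only ever need a single column, an alternative to the ... version is the embrace operator {{ }}, which forwards the column the caller typed into mutate(). The following is a sketch, not a replacement for the Solution: the name currency_to_numeric_one() and the two-row data frame are made up, and the "{{ var }}" := naming syntax assumes a reasonably recent dplyr (1.0 or later).

```r
library(dplyr)

# Hypothetical single-column variant of currency_to_numeric()
currency_to_numeric_one <- function(raw, var) {
  raw %>%
    mutate(
      # "{{ var }}" on the left names the result after the column passed in;
      # {{ var }} on the right refers to that column's values
      "{{ var }}" := as.numeric(sub(",", ".", sub(" €", "", {{ var }})))
    )
}

# Made-up sample in the same format as price_new
prices <- data.frame(price_new = c("59,99 €", "149,99 €"))

cleaned <- currency_to_numeric_one(prices, price_new)
cleaned
```

As before, the result has to be assigned (here to cleaned); prices itself stays a character column.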
dplyr rename() function not working. I get object X not found
rename() expects new_name = old_name, so the two names need to be given in the reverse order:
library(dplyr)
test %>%
rename(column1 = X1)
# column1 X2
#1 2 5
#2 3 6
#3 4 7
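For completeness, here is a self-contained sketch. The data frame test is reconstructed from the printed output above, so treat it as an assumption, and the lookup vector is a made-up illustration of renaming from strings with all_of().

```r
library(dplyr)

# Assumed input, reconstructed from the output shown above
test <- data.frame(X1 = 2:4, X2 = 5:7)

# new_name = old_name
renamed <- test %>%
  rename(column1 = X1)

# When the names are stored as strings, a named character vector
# (new name = "old name") can be passed through all_of()
lookup <- c(column1 = "X1", column2 = "X2")
renamed_all <- test %>%
  rename(all_of(lookup))

names(renamed)
names(renamed_all)
```

Writing rename(X1 = column1) instead asks dplyr to find an existing column called column1, which is what triggers the "object not found" style of error.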
R Object not found error when using tidy evaluation and group by in function
Ok, found it :)
library(dplyr)   # group_by(), tally(), as_tibble(), %>%
library(plotly)  # plot_ly()

pie_plot <- function(data, labels, varname, colors) {
  lbls <- labels
  # Count rows per level of the embraced column, then keep one count per label
  count <- data %>%
    dplyr::group_by({{varname}}) %>%
    tally() %>%
    .[2] %>%
    .[1:length(labels), ] %>%
    unlist(.)
  df <- as_tibble(cbind(count, lbls))
  plot_ly(df, labels = df$lbls, values = df$count,
          marker = list(colors = colors,
                        line = list(color = '#FFFFFF', width = 1)),
          type = "pie", width = 280, height = 280)
}
pie_plot(data, labels = c("Male", "Female"), varname = t1gender, colors = c('#440154FF', '#21908CFF'))
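The part that fixes the error is the {{varname}} embrace, which can be tried on its own without plotly. This is a minimal sketch with a made-up helper, count_levels(), and made-up survey data; it uses dplyr::count() as a shorter equivalent of group_by() plus tally().

```r
library(dplyr)

# Hypothetical helper isolating the counting step of pie_plot()
count_levels <- function(data, varname) {
  data %>%
    count({{ varname }}, name = "count")
}

# Made-up data with a column named like the one in the call above
survey <- data.frame(t1gender = c(1, 2, 2, 1, 2))

counts <- count_levels(survey, t1gender)
counts
```

Without the braces, count(varname) would look for a column literally called varname and fail with an "object not found" error.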
What do backticks do in R?
A pair of backticks is a way to refer to names or combinations of symbols that are otherwise reserved or illegal. Reserved words, like if, are part of the language, while illegal names include non-syntactic combinations like c a t. These two categories, reserved and illegal, are referred to in R documentation as non-syntactic names.
Thus,
`c a t` <- 1 # is valid R
and
> `+` # is equivalent to typing in a syntactic function name
function (e1, e2) .Primitive("+")
As a commenter mentioned, ?Quotes does contain some information on the backtick, under Names and Identifiers:
Identifiers consist of a sequence of letters, digits, the period (.) and the underscore. They must not start with a digit nor underscore, nor with a period followed by a digit. Reserved words are not valid identifiers. The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
Such identifiers are also known as syntactic names and may be used directly in R code. Almost always, other names can be used provided they are quoted. The preferred quote is the backtick (`), and deparse will normally use it, but under many circumstances single or double quotes can be used (as a character constant will often be converted to a name). One place where backticks may be essential is to delimit variable names in formulae: see formula.
This prose is a little hard to parse. What it means is that for R to parse a token as a name, it must be 1) a sequence of letters, digits, periods and underscores (not starting with a digit, an underscore, or a period followed by a digit) that 2) is not a reserved word in the language. Otherwise, to be parsed as a name, backticks must be used.
Also check out ?Reserved:
Reserved words outside quotes are always parsed to be references to the objects linked to in the 'Description', and hence they are not allowed as syntactic names (see make.names). They are allowed as non-syntactic names, e.g. inside backtick quotes.
In addition, Advanced R has some examples of how backticks are used in expressions, environments, and functions.
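A few quick cases to try at the console may help. The variable and column names below are made up for illustration.

```r
# A name containing a space is non-syntactic, so it needs backticks
`net sales` <- 100
total <- `net sales` + 1

# Data frame columns can have non-syntactic names too
# (check.names = FALSE stops data.frame() from rewriting them)
df <- data.frame(`net sales` = c(1, 2), check.names = FALSE)
col <- df$`net sales`

# Operators are ordinary functions once their names are backticked
three <- `+`(1, 2)

# make.names() shows what R would turn each name into to make it syntactic
fixed <- make.names(c("net sales", "if"))

total
names(df)
three
fixed
```

Here fixed is "net.sales" and "if.", which shows both categories from above: an illegal character is replaced, and a reserved word gets a dot appended.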
Why do I get an Object not found error when using group_by and summarise() in r?
From @mouli3c3 on twitter:
I know what caused the problem. Can't explain clearly why, though.
library(operators) is somehow masking/changing the original behaviour
of %>%. Adding library(magrittr) below library(operators) solved the
problem. Let me know if it works.
It worked! :)
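When you suspect that one package is masking another's %>%, a general way to check (a sketch, not something specific to the operators package) is to ask R which namespace the currently visible %>% comes from, or to call the magrittr version explicitly.

```r
library(magrittr)

# Which package does the currently visible %>% belong to?
pipe_source <- environmentName(environment(`%>%`))
pipe_source

# Calling the pipe through its namespace sidesteps any masking
roots <- magrittr::`%>%`(c(1, 4, 9), sqrt)
roots
```

Because the package attached last wins, loading magrittr (or dplyr) after the conflicting package restores the usual pipe, which matches the fix described in the tweet. conflicts(detail = TRUE) lists every name that is currently masked.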
How to pass column name as argument to function for dplyr verbs?
Here is another way of making it work. You can use the .data[[colName]] construct for a column name that is stored as a string:
foo <- function(data, colName) {
result <- data %>%
group_by(.data[[colName]]) %>%
summarise(count = n())
return(result)
}
foo(quakes, "stations")
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
If you decide not to pass colName as a string, you can wrap it in a pair of curly braces inside your function to get the same result:
foo <- function(data, colName) {
result <- data %>%
group_by({{ colName }}) %>%
summarise(count = n())
return(result)
}
foo(quakes, stations)
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
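If the column names arrive as a character vector with more than one entry, one option is across(all_of(...)) inside group_by(). This is a sketch under that assumption; the helper name count_by() is made up, and quakes is the same built-in data set used above.

```r
library(dplyr)

# Hypothetical helper: group by any number of columns given as strings
count_by <- function(data, col_names) {
  data %>%
    group_by(across(all_of(col_names))) %>%
    summarise(count = n(), .groups = "drop")
}

by_two <- count_by(quakes, c("stations", "mag"))
head(by_two)
```

all_of() also fails loudly if one of the strings is not a column of data, which tends to be easier to debug than a silent mismatch.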