Meaning of error using . shorthand inside dplyr function
As @aosmith noted in the comments, it's due to the way magrittr parses the dot in this case. From ?'%>%':
Using the dot-place holder as lhs
When the dot is used as lhs, the
result will be a functional sequence, i.e. a function which applies
the entire chain of right-hand sides in turn to its input.
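To see the quoted behaviour in isolation, here is a minimal sketch using only base functions. The names root_of_sum and total are made up for illustration.

```r
library(magrittr)

# A chain that starts with a bare dot is not evaluated;
# magrittr returns a function (a "functional sequence") instead.
root_of_sum <- . %>% sum() %>% sqrt()

is.function(root_of_sum)   # the chain is stored as a function
root_of_sum(c(9, 16))      # computes sqrt(9 + 16)

# Wrapping the dot in parentheses turns it back into an ordinary value,
# so the inner chain is evaluated on the piped-in data.
total <- c(9, 16) %>% { (.) %>% sum() }
```

The second part is the same trick, on a toy input, that the parenthesised version of the bind_rows() call relies on.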
To avoid triggering this, any modification of the expression on the lhs will do:
df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows((.) %>% mutate(name = "New England"))

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows({.} %>% mutate(name = "New England"))

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows(identity(.) %>% mutate(name = "New England"))
Here's a suggestion that avoids the problem altogether:
df %>%
  # arbitrary piped operation
  mutate(name = str_to_lower(name)) %>%
  replicate(2, ., simplify = FALSE) %>%
  map_at(2, mutate_at, "name", ~"New England") %>%
  bind_rows
# # A tibble: 12 x 2
# name estimate
# <chr> <dbl>
# 1 ct 501074
# 2 ma 1057316
# 3 me 47369
# 4 nh 76630
# 5 ri 141206
# 6 vt 27464
# 7 New England 501074
# 8 New England 1057316
# 9 New England 47369
# 10 New England 76630
# 11 New England 141206
# 12 New England 27464
Understand the warning message in across in R
There is not much difference between using where and not using it; the warning simply suggests the preferred syntax. where takes a predicate function and applies it to every variable (column) of your data set, then returns every variable for which the function returns TRUE. The following examples are taken from the documentation of where:
iris %>% select(where(is.numeric))
# or an anonymous function
iris %>% select(where(function(x) is.numeric(x)))
# or a purrr style formula as a shortcut for creating a function on the spot
iris %>% select(where(~ is.numeric(.x)))
Or you can combine two conditions with &&:
# The following code selects all numeric variables whose means are greater than 3.5
iris %>% select(where(~ is.numeric(.x) && mean(.x) > 3.5))
You can use where(is.character) as the .cols argument of across() and then apply the function given in the .fns argument to the selected columns.
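As a small sketch of that pattern (iris has no character columns, so is.factor stands in for is.character here, and the result names are illustrative):

```r
library(dplyr)

# where(is.factor) picks the columns; as.character is applied to each of them
iris_chr <- iris %>%
  mutate(across(where(is.factor), as.character))

# Numeric columns can be summarised the same way
iris_means <- iris %>%
  summarise(across(where(is.numeric), mean))
```

Here where(...) fills the .cols argument and the second argument fills .fns.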
For more information you can always refer to the documentation, which is the best source for learning more about these functions.
What does the dplyr period character . reference?
The dot is used within dplyr mainly (not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it refers to all the columns to which the functions in funs are applied. In do it refers to the (potentially grouped) data.frame, so you can reference single columns by using .$xyz to reference a column named "xyz".
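The do() usage described above might look like the following sketch. Note that do() is superseded in current dplyr releases, so this is shown only to illustrate what the dot refers to; the column name mean_mpg is made up.

```r
library(dplyr)

# Inside do(), `.` is the data frame for the current group,
# so .$mpg is that group's mpg column.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(mean_mpg = mean(.$mpg)))

by_cyl
```

The result has one row per value of cyl.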
The reason you cannot run
filter(df, . == 5)
is that a) filter is not designed to work with multiple columns the way mutate_each is, for example, and b) you would need to use the pipe operator %>% (originally from magrittr).
However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:
> filter(mtcars, rowSums(. > 5) > 4)
Error: object '.' not found
> mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
4 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
5 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
6 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
You should also take a look at the magrittr help files:
library(magrittr)
help("%>%")
From the help page:
Placing lhs elsewhere in rhs call
Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).
Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
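The three placeholder rules from the help page can be tried out with a short sketch like the one below; the variable names are illustrative only.

```r
library(magrittr)

# Dot as a direct argument: lhs goes only where the dot is
pasted <- "b" %>% paste("a", .)            # same as paste("a", "b")

# Dot only inside a nested call: lhs is also inserted as the first argument
even_rows <- iris %>% subset(1:nrow(.) %% 2 == 0)

# Braces suppress the first-argument insertion
range_1_10 <- 1:10 %>% {c(min(.), max(.))}
```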
What is the meaning of the `~` operator in the tidyverse context?
Most commonly, it's a shorthand way of writing an anonymous function.
map_dbl(HEIGHT, ~ sum(.x, 5))
is the same as
map_dbl(HEIGHT, function(.x) {sum(.x, 5)})
It has other meanings in other contexts. E.g., at the R prompt, type
?case_when
to see how it uses ~.
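A quick way to check the equivalence is to run both forms on the same input. In this sketch, heights is a hypothetical stand-in for HEIGHT, and add_five is a made-up name.

```r
library(purrr)

heights <- c(150, 160, 170)   # hypothetical stand-in for HEIGHT

with_formula  <- map_dbl(heights, ~ sum(.x, 5))
with_function <- map_dbl(heights, function(.x) sum(.x, 5))

identical(with_formula, with_function)

# as_mapper() exposes the function that purrr builds from the formula
add_five <- as_mapper(~ sum(.x, 5))
add_five(10)
```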
Problem programming with dplyr--column which is definitely a vector being picked up as a formula
{{a}} is shorthand for !!enquo(a), which captures the expression provided to a as well as the context where this expression should be evaluated. In your case, the context is the data frame, which is already being provided to the function. So, a better rlang verb to use here is ensym(a), which captures the symbol name provided to a instead:
plot_high_chart <- function(.data,
                            chart_type = "column",
                            x_value = "Year",        # <-- Note: strings
                            y_value = "total",
                            group_value = "service") {
  .data %>%
    hchart(chart_type, hcaes(x = !!rlang::ensym(x_value),   # <- ensym instead of {{
                             y = !!rlang::ensym(y_value),
                             group = !!rlang::ensym(group_value)))
}
As a bonus, the function will now work with symbols AND with strings:
data %>%
plot_high_chart(x_value= "Year", y_value= "total", group_value= "service") # Works
data %>%
plot_high_chart(x_value= Year, y_value= total, group_value= service) # Also Works
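The same idea can be tried without highcharter. The sketch below uses a hypothetical helper, mean_of(), on mtcars to show that ensym() accepts both a bare name and a string; it is not part of the original answer's code.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: summarise one column given as a symbol or a string
mean_of <- function(.data, col) {
  col <- ensym(col)                        # symbol or string -> symbol
  .data %>% summarise(mean = mean(!!col))
}

from_symbol <- mean_of(mtcars, mpg)
from_string <- mean_of(mtcars, "mpg")

identical(from_symbol, from_string)
```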
pivot_longer gives error when using dtplyr
dtplyr version 1.2.0 is now available on CRAN, which means this issue is now resolved!
For anyone experiencing this error, check/update your version of dtplyr to ensure you are running >=1.2.0:
install.packages("dtplyr")
(NB. this isn't updated as part of the tidyverse packages so make sure to do it separately)
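One way to check the installed version before updating is sketched below. The variable name needs_update is made up; the snippet only reports whether an update looks necessary.

```r
# TRUE when dtplyr is missing or older than 1.2.0
needs_update <- !requireNamespace("dtplyr", quietly = TRUE) ||
  utils::packageVersion("dtplyr") < "1.2.0"

if (needs_update) {
  message("Run install.packages(\"dtplyr\") to get version 1.2.0 or later")
}
```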
https://www.tidyverse.org/blog/2021/12/dtplyr-1-2-0/
https://cran.r-project.org/web/packages/dtplyr/index.html
Use dplyr's _if() functions like mutate_if() with a negative predicate function
We can use the shorthand notation ~ for an anonymous function in the tidyverse:
library(dplyr)
iris %>%
  mutate_if(~ !is.numeric(.), as.character)
Or, without an anonymous function, use negate from purrr:
library(purrr)
iris %>%
  mutate_if(negate(is.numeric), as.character)
In addition to negate, Negate from base R also works:
iris %>%
  mutate_if(Negate(is.numeric), as.character)
The same notation works with select_if/arrange_if:
iris %>%
  select_if(negate(is.numeric)) %>%
  head(2)
# Species
#1 setosa
#2 setosa
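In dplyr 1.0.0 and later the scoped _if verbs are superseded by across() and where(). A sketch of the same negated predicate in that style, assuming a dplyr version that provides both functions (the name iris_chr is illustrative):

```r
library(dplyr)

# Convert every non-numeric column to character
iris_chr <- iris %>%
  mutate(across(where(~ !is.numeric(.x)), as.character))

# Keep only the non-numeric columns
iris %>%
  select(where(Negate(is.numeric))) %>%
  head(2)
```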