What Are the Differences Between R's New Native Pipe '|>' and the Magrittr Pipe '%>%'

R: transition from magrittr to native pipe and translation of a function

complete_data_native_wrong():

complete_data_native_wrong <- function(data){
  res <- data |> (\(x) filter(x, complete.cases(x)))()
  return(res)
}

Data masking is the reason that this lovely function doesn't work as expected.

"So, what actually happens?", you ask.

dplyr::filter() first checks the data for a column named x. It finds one, so it passes the contents of that column, not the whole data frame, to complete.cases(). The same happens when you use y instead of x.

complete.cases() ends up acting on a "vector" instead of a data.frame, hence the results.

"But... How do I ensure dplyr::filter() doesn't act that way?", you enquire.

That's where the bang-bang operator !! comes in. And we can now have complete_data_native_right():

complete_data_native_right <- function(data){
  res <- data |> (\(x) filter(x, complete.cases(!!x)))()
  # res <- data |> (\(y) filter(y, complete.cases(!!y)))()
  return(res)
}
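
The masking itself is not specific to dplyr. As a minimal sketch in base R only, subset() also evaluates its condition inside the data frame, so a column named x shadows the function argument x in the same way. The data frame df and its values are made up for illustration.

```r
# Hypothetical data: the column names x and y collide with typical argument names
df <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))

# Inside subset(), `x` resolves to the column df$x, not to the whole data frame,
# so complete.cases() only checks that one column
masked <- df |> (\(x) subset(x, complete.cases(x)))()

# Indexing with `[` involves no data masking, so the whole data frame is checked
unmasked <- df |> (\(x) x[complete.cases(x), ])()

nrow(masked)    # rows where column x is not NA
nrow(unmasked)  # rows with no NA in any column
```

The masked version keeps a row that still has an NA in y, which is the same kind of surprise that complete_data_native_wrong() produces.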


move_row_native_attempt():

For this one you can use the shorthand function notation without any hiccups:

move_row_native_attempt <- function(df, ini_pos, fin_pos){
  row_pick <- slice(df, ini_pos)

  if (fin_pos == "last"){
    res <- df |>
      slice(-ini_pos) |>
      (\(x) add_row(x, row_pick, .before = nrow(x)))()

  } else{
    res <- df |>
      slice(-ini_pos) |>
      add_row(row_pick, .before = fin_pos)
  }

  return(res)
}

What is the difference between %>% and %,% in magrittr?

The normal piping operator is %>%. You can use %,% to create a reusable pipe, a pipe without data. Then later you can use the same pipe with various data sets. Here is an example.

library(magrittr)
library(dplyr)
library(Lahman)

Suppose you want to find the top 5 baseball players according to total games played (column G). Then you can do something like this (taken from the magrittr README):

Batting %>%
  group_by(playerID) %>%
  summarise(total = sum(G)) %>%
  arrange(desc(total)) %>%
  head(5)
# Source: local data frame [5 x 2]
#
# playerID total
# 1 rosepe01 3562
# 2 yastrca01 3308
# 3 aaronha01 3298
# 4 henderi01 3081
# 5 cobbty01 3035

So far so good. Now let's assume that you have several data sets in the same format as Batting, so you could just reuse the same pipe again. %,% helps you create, save and reuse the pipe:

top_total <- group_by(playerID) %,%
  summarise(total = sum(G)) %,%
  arrange(desc(total)) %,%
  head(5)

top_total(Batting)
# Source: local data frame [5 x 2]
#
# playerID total
# 1 rosepe01 3562
# 2 yastrca01 3308
# 3 aaronha01 3298
# 4 henderi01 3081
# 5 cobbty01 3035

Of course you could also create a function the regular R way, i.e. top_total <- function(...) ..., but %,% is a more concise way.
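
If %,% is not available in your installed magrittr (as far as I can tell it only existed in early versions), a plain function wrapping a pipeline gives the same reuse. Below is a minimal sketch with the native pipe and base R only. The use of mtcars, the grouping by cyl, and the argument n are stand-ins chosen for illustration, and the _ placeholder assumes R 4.2 or later.

```r
# Reusable pipeline stored as an ordinary function (hypothetical example on mtcars)
top_total <- function(data, n = 2) {
  data |>
    aggregate(hp ~ cyl, data = _, FUN = sum) |>  # total horsepower per cylinder count
    (\(d) d[order(-d$hp), ])() |>                # sort by total, descending
    head(n)                                      # keep the top n groups
}

top_total(mtcars)
top_total(mtcars[mtcars$am == 1, ], n = 1)  # same pipeline, different data
```

The function form is slightly longer than the %,% version, but it needs no extra package and can take further arguments such as n.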

Why isn't the magrittr %<>% assignment pipe working with R's native pipe (|>)?

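The assignment pipe %<>% no longer behaves as expected because of how the operators are parsed. %<>% and |> have the same precedence and group from left to right, so the %<>% call is formed first and covers only the first step; the |> steps are then applied to its result without being assigned. See the example below: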

library(magrittr)
library(tidyverse)

a <- tibble(a = 1:3)
a %<>%
  mutate(b = a * 2) |>
  mutate(c = a * 3) |>
  filter(a <= 2)
a

Returns:

# A tibble: 3 × 2
      a     b
  <int> <dbl>
1     1     2
2     2     4
3     3     6

Thus the

a %<>%
  mutate(b = a * 2)

is the only part that was saved. A hint that this is happening is that the fully piped table is printed to the console, which should never happen when the result of a pipeline is assigned.
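
The grouping can be checked without running the pipeline, because |> is rewritten by the parser. Below is a small sketch using only base R (R 4.1 or later). f() and g() are placeholder names and are never called, since quote() only parses the expression.

```r
# quote() returns the parsed, unevaluated call; the native pipe is already gone
expr <- quote(a %<>% f() |> g())
print(expr)

# The %<>% call covers only the first step and is nested inside g()
identical(expr, quote(g(a %<>% f())))
```

By the time %<>% runs, it only ever sees a and f(), so it cannot assign the result of the later steps.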

How does the new native pipe placeholder work, exactly?

You will need to restructure this a bit to take advantage of _. The placeholder does not directly address the problem of using the LHS multiple times on the RHS, nor the problem of nesting functions on the RHS, and the code in the question faces both. Also note that the code in the question reuses m within the pipeline, which defeats the left-to-right idea of pipes, and that names(m) is NULL since m has no names.

We create a list with a single element named x and then use that in the next line to solve the problem of having to refer to it 3 times and also to address the nested calls. In the rbind we eliminated reference to m since rbinding NULL is pointless. We did manage to use _ twice and eliminate all the anonymous functions while keeping mostly to the idea of the code in the question.

m |>
  list(x = _) |>
  with(.colSums(is.na(x), NROW(x), NCOL(x))) |>
  rbind(sum.NA = _) |>
  t()
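
Since m is not defined in this answer, here is the same pipeline as a self-contained sketch. The matrix m below is a hypothetical stand-in for the one in the question, and the name na_counts is introduced only for illustration. R 4.2 or later is assumed for the _ placeholder.

```r
# Hypothetical input: a 2 x 3 numeric matrix without dimnames
m <- matrix(c(1, NA, NA, NA, 5, 6), nrow = 2)

na_counts <- m |>
  list(x = _) |>                                 # wrap so the matrix can be referred to as x
  with(.colSums(is.na(x), NROW(x), NCOL(x))) |>  # count NAs per column
  rbind(sum.NA = _) |>                           # one labelled row
  t()                                            # one row per column of m

na_counts
```

Each _ appears exactly once per call and always as a named argument, which is what the placeholder rules require.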

What is the difference between the + operator in ggplot2 and the %>% operator in magrittr?

Piping is very different from ggplot2's addition. What the pipe operator, %>%, does is take the result of the left-hand side and put it as the first argument of the function on the right-hand side. For example:

1:10 %>% mean()
# [1] 5.5

is exactly equivalent to mean(1:10). The pipe is more useful for replacing multiply nested function calls, e.g.,

x = factor(2008:2012)
x_num = as.numeric(as.character(x))
# could be rewritten to read from left-to-right as
x_num = x %>% as.character() %>% as.numeric()

but this is all explained nicely over at "What does %>% mean in R?"; you should read through that for a couple more examples.

Using this knowledge, we can re-write your pipe examples as nested functions and see that they still do the same things; but now it (hopefully) is obvious why #4 doesn't work:

# 3. This is acceptable ggplot2 syntax
ggplot(data = mtcars) + geom_point(aes(x=wt, y = mpg))

# 4. This is not
geom_point(aes(ggplot(data = mtcars), x=wt, y = mpg))

ggplot2 includes a special "+" method for ggplot objects, which it uses to add layers to plots. I didn't know until you asked your question that it also works with the aes() function, but apparently that's defined as well. These are all specially defined within ggplot2. The use of + in ggplot2 predates the pipe, and while the usage is similar, the functionality is quite different.

As an interesting side-note, Hadley Wickham (the creator of ggplot2) said that:

...if I'd discovered the pipe earlier, there never would've been a ggplot2, because you could write ggplot graphics as

ggplot(mtcars, aes(wt, mpg)) %>%
  geom_point() %>%
  geom_smooth()

How to pipe purely in base R ('base pipe')?

In R |> is used as a pipe operator. (Since 4.1.0)

The left-hand side expression lhs is inserted as the first free argument in the call to the right-hand side expression rhs.

mtcars |> head()                      # same as head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

mtcars |> head(2) # same as head(mtcars, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4

It is also possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs. (Since 4.2.0)

mtcars |> lm(mpg ~ disp, data = _)
#mtcars |> lm(mpg ~ disp, _) #Error: pipe placeholder can only be used as a named argument
#Call:
#lm(formula = mpg ~ disp, data = mtcars)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122

Alternatively, explicitly name the argument(s) that come before the one the lhs should fill:

mtcars |> lm(formula = mpg ~ disp)

If the lhs has to be used more than once, passed as a named or unnamed argument in an arbitrary position, or passed to a disabled function, use an (anonymous) function.

mtcars |> (\(.) .[.$cyl == 6,])()
#mtcars ->.; .[.$cyl == 6,] # Alternative using bizarro pipe
#local(mtcars ->.; .[.$cyl == 6,]) # Without overwriting and keeping .
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6

mtcars |> (\(.) lm(mpg ~ disp, .))()
#Call:
#lm(formula = mpg ~ disp, data = .)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122

1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") :
# pipe placeholder may only appear once
1:3 |> (\(.) setNames(., .))()
#1 2 3
#1 2 3

1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3
#1 2 3

#The same but over a function
._ <- \(data, expr, ...) {
  eval(substitute(expr), list(. = data), enclos = parent.frame())
}
1:3 |> ._(setNames(., .))
#1 2 3
#1 2 3

Some functions are disabled.

mtcars |> `$`(cyl)
#Error: function '$' not supported in RHS call of a pipe

But some can still be called by placing them in parentheses, calling them via ::, calling them inside a function, or defining an alias for the function.

mtcars |> (`$`)(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> base::`$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> (\(.) .$cyl)()
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

fun <- `$`
mtcars |> fun(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

An expression written as x |> f(y) is parsed as f(x, y). While the code in a pipeline is written sequentially, regular R semantics for evaluation apply. So piped expressions will be evaluated only when first used in the rhs expression.

-1 |> sqrt() |> (\(x) 0)()
#[1] 0

. <- -1
. <- sqrt(.)
#Warning message:
#In sqrt(.) : NaNs produced
(\(x) 0)(.)
#[1] 0


x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

x|> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2

f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2

. <- x
. <- f1(.)
#IN 1
#OUT 1
f2(.)
#IN 2
#OUT 2
# a b c
#1 0 1 2
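
Both points can also be seen in a compact form. The sketch below uses base R only (R 4.1 or later). ignore_input() is a hypothetical helper that never touches its argument, and f, x and y inside quote() are just symbols that are never evaluated.

```r
# The parser rewrites the pipe into an ordinary nested call
identical(quote(x |> f(y)), quote(f(x, y)))

# Hypothetical function that never uses its argument
ignore_input <- \(x) "lhs was never evaluated"

# stop() sits in an unevaluated promise, so no error is raised
result <- stop("this error is never triggered") |> ignore_input()
result
```

Because the lhs is passed as a lazily evaluated argument, an lhs that would fail is harmless as long as the rhs function never uses it.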

Behavior of R pipe operator after transformation

In your example, it is only the 100 that is passed to round() -- so it doesn't affect anything since 100 is already a whole number. The same thing happens with %>%.

With magrittr's pipe, we can fix this by calling * as a function explicitly with backticks:

table(iris$Species) %>% prop.table() %>% `*`(100) %>% round(1)

Within base R, AFAIK we have to include an anonymous function to multiply:

table(iris$Species) |> prop.table() |> (\(x) x * 100)() |> round(1)

(But see Anoushiravan R's answer for a workaround.)
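
To see why only the 100 reaches round(), the parsed expression can be inspected. The question itself is not reproduced here, so this sketch assumes it piped a proportion table into a multiplication by 100 followed by round(1). tab is a placeholder symbol, and unrounded and rounded are names introduced for illustration.

```r
# |> binds more tightly than *, so `100 |> round(1)` is grouped first
identical(
  quote(tab |> prop.table() * 100 |> round(1)),
  quote(prop.table(tab) * round(100, 1))
)

# Only the constant is rounded, so the percentages keep all their digits
unrounded <- table(iris$Species) |> prop.table() * 100 |> round(1)

# Multiplying inside an anonymous function keeps the whole result in the pipeline
rounded <- table(iris$Species) |> prop.table() |> (\(x) x * 100)() |> round(1)

unrounded
rounded
```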


