%>% .$Column_Name Equivalent for R Base Pipe |>

We can use getElement().

iris |> getElement('Sepal.Length') |> cut(5)
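As a quick check, the sketch below confirms that the getElement() call returns the same vector as $. The last part is an assumption about the R version: it relies on R 4.3.0 or later, where the placeholder _ may head an extraction such as _$name. It is parsed from a string so that the script still loads on older versions.

```r
# getElement() extracts a column by name, so it can sit on the rhs of |>
via_get_element <- iris |> getElement("Sepal.Length")
via_dollar      <- iris$Sepal.Length
identical(via_get_element, via_dollar)

# Chaining works as usual: bin the column into 5 intervals
bins <- iris |> getElement("Sepal.Length") |> cut(5)
nlevels(bins)

# Assumption: R >= 4.3.0, where `_$name` is accepted on the rhs.
# Parsed from text so that older R versions do not fail at parse time.
if (getRversion() >= "4.3.0") {
  via_placeholder <- eval(parse(text = "iris |> _$Sepal.Length"))
  print(identical(via_placeholder, via_dollar))
}
```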

How to pipe purely in base R ('base pipe')?

In R, |> is the native pipe operator (available since R 4.1.0).

The left-hand side expression lhs is inserted as the first free argument in the call on the right-hand side, rhs.

mtcars |> head()                      # same as head(mtcars)
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars |> head(2) # same as head(mtcars, 2)
#              mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

It is also possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs. (Since 4.2.0)

mtcars |> lm(mpg ~ disp, data = _)
#mtcars |> lm(mpg ~ disp, _) #Error: pipe placeholder can only be used as a named argument
#Call:
#lm(formula = mpg ~ disp, data = mtcars)
#
#Coefficients:
#(Intercept)         disp
#   29.59985     -0.04122

Alternatively, explicitly name the argument(s) that come before the one the lhs should fill, so that the lhs lands in the first remaining free position:

mtcars |> lm(formula = mpg ~ disp)
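To see why naming formula is enough, here is a small sketch using only base R. It inspects the call the parser produces and checks that the fit matches the conventional spelling; the variable names are illustrative.

```r
# x |> f(named = ...) is parsed as f(x, named = ...)
piped_call <- quote(mtcars |> lm(formula = mpg ~ disp))
print(piped_call)

# Because `formula` is matched by name, the unnamed mtcars falls through
# to the next free formal of lm(), which is `data`
first_formals <- names(formals(lm))[1:2]
print(first_formals)

# Both spellings fit the same model
fit_piped  <- mtcars |> lm(formula = mpg ~ disp)
fit_direct <- lm(mpg ~ disp, data = mtcars)
all.equal(coef(fit_piped), coef(fit_direct))
```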

If the placeholder is needed more than once, has to be passed as an unnamed argument or at an arbitrary position, or the function is disabled in the pipe, use an (anonymous) function.

mtcars |> (\(.) .[.$cyl == 6,])()
#mtcars ->.; .[.$cyl == 6,] # Alternative using bizarro pipe
#local(mtcars ->.; .[.$cyl == 6,]) # Without overwriting and keeping .
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

mtcars |> (\(.) lm(mpg ~ disp, .))()
#Call:
#lm(formula = mpg ~ disp, data = .)
#
#Coefficients:
#(Intercept)         disp
#   29.59985     -0.04122

1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") :
# pipe placeholder may only appear once
1:3 |> (\(.) setNames(., .))()
#1 2 3
#1 2 3

1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3
#1 2 3

# The same, but wrapped in a helper function
._ <- \(data, expr, ...) {
  eval(substitute(expr), list(. = data), enclos = parent.frame())
}
1:3 |> ._(setNames(., .))
#1 2 3
#1 2 3

Some functions are disabled.

mtcars |> `$`(cyl)
#Error: function '$' not supported in RHS call of a pipe

However, some of them can still be called by wrapping the function name in parentheses, by qualifying it with ::, by calling it inside an anonymous function, or by assigning the function to another name.

mtcars |> (`$`)(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> base::`$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> (\(.) .$cyl)()
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

fun <- `$`
mtcars |> fun(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
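The restriction is enforced by the parser rather than at run time, which is why the workarounds above succeed. A minimal sketch to check this, using a hypothetical helper parses() that only reports whether a piece of source text parses:

```r
# Hypothetical helper: TRUE if the source text parses, FALSE on a parse error
parses <- function(code) {
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)
}

bare_symbol   <- parses("mtcars |> `$`(cyl)")        # bare special symbol: rejected
parenthesized <- parses("mtcars |> (`$`)(cyl)")      # wrapped in parentheses: accepted
qualified     <- parses("mtcars |> base::`$`(cyl)")  # qualified with ::, accepted

c(bare_symbol = bare_symbol, parenthesized = parenthesized, qualified = qualified)
```

Nothing is evaluated here, so the error in the first case cannot come from calling $ itself.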

An expression written as x |> f(y) is parsed as f(x, y). While the code in a pipeline is written sequentially, regular R semantics for evaluation apply. So piped expressions will be evaluated only when first used in the rhs expression.

-1 |> sqrt() |> (\(x) 0)()
#[1] 0

. <- -1
. <- sqrt(.)
#Warning message:
#In sqrt(.) : NaNs produced
(\(x) 0)(.)
#[1] 0

x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

x |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2

f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2

. <- x
. <- f1(.)
#IN 1
#OUT 1
f2(.)
#IN 2
#OUT 2
# a b c
#1 0 1 2
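Both claims above (the pipe is rewritten by the parser, and arguments are evaluated lazily) can be checked directly. The following is a small sketch; ignore_arg and warned are illustrative names.

```r
# The pipe is resolved by the parser: both spellings give the same call object
same_call <- identical(quote(x |> f(y)), quote(f(x, y)))

# Chained pipes become nested calls, so the outermost function is entered first
nested_call <- quote(x |> f1() |> f2())
print(nested_call)

# Lazy evaluation: a function argument is a promise that is only forced
# when it is used. ignore_arg() never touches x, so sqrt(-1) is never
# computed and no NaN warning is raised.
ignore_arg <- function(x) 0
warned <- FALSE
result <- withCallingHandlers(
  -1 |> sqrt() |> ignore_arg(),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
```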

Does the Bizarro pipe ->.; have disadvantages making it not recommended for use?

The main issue with the bizarro pipe is that it creates hidden side-effects and makes it easier to create subtle bugs. It decreases code maintainability.

The persistent existence of the . variable makes it all too easy to accidentally refer to this value later down the line: its presence masks mistakes if you at some point forget to assign to it and think you did. It’s easy to dismiss this possibility but such errors are fairly common and, worse, very non-obvious: you won’t get an error message, you’ll just get a wrong result. By contrast, if you forget the pipe symbol somewhere, you’ll get an immediate error message.

Worse, the bizarro pipe hides this error-prone side-effect in two different ways. First, because it makes the assignment non-obvious. I’ve argued previously that -> assignment shouldn’t be used since left-to-right assignment hides a side-effect, and side-effects should be made syntactically obvious. The side-effect in this case is the assignment, and it should happen where it’s most prominent: in the first column of the expression, not hidden away at its end. This is a fundamental objection to the use of -> (or any other attempt to mask side-effects), not limited to the bizarro pipe.

And because . is by default hidden (from ls and from the inspector pane in IDEs), this makes it even easier to accidentally depend on it.

Therefore, if you want to assign to a temporary name instead of using a pipe, just do that. But:

  1. Perform right-to-left assignment, i.e. use name = value or name <- value, not value -> name.
  2. Use a descriptive name.

I can’t stress enough that this is an actual source of subtle bugs — don’t underestimate it!
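The failure mode described above can be reproduced in a few lines. This is only a sketch with made-up variable names (n_four_cyl, n_rows_wrong, iris_data), meant to be run in a fresh session:

```r
# A first bizarro-pipe chain leaves `.` behind in the workspace
mtcars ->.; subset(., cyl == 4) ->.; nrow(.) -> n_four_cyl

# A later chain where the initial `iris ->.;` was forgotten.
# No error is raised: `.` still holds the 4-cylinder subset of mtcars.
n_rows_wrong <- nrow(.)

# The same slip with a descriptive name fails loudly instead,
# because `iris_data` was never created
n_rows_named <- tryCatch(nrow(iris_data), error = function(e) conditionMessage(e))

# `.` is hidden from ls() unless all.names = TRUE is given
"." %in% ls()
"." %in% ls(all.names = TRUE)
```

Here n_rows_wrong silently reports the row count of the stale mtcars subset, whereas the descriptive name produces an "object not found" error at the point of the mistake.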

Another issue is that its use breaks editor support for auto-formatting code. This is a “solvable issue” in some IDEs via plugins but the solution, as it were, solves an issue that should not even exist. To clarify what I mean, if you’re using the bizarro pipe you’d presumably want a hanging indent, i.e. something along these lines:

mtcars ->.
    subset(., cyl == 4) ->.
    lm(mpg ~ disp, data = .)

… but auto-indentation won’t indent the code like this, and auto-formatters will flatten the hanging indent.

Neither of these issues is prohibitive (though the first is quite serious), but in the absence of a positive argument for using the bizarro pipe they tip the balance decisively. After all, what problem does the bizarro pipe solve that isn’t better solved by a proper pipeline operator [1] or by regular assignment? If you can’t use R 4.1, use ‘magrittr’. If you don’t like the semantics of ‘magrittr’, write your own pipe operator, use one of the many other existing implementations, or just use regular assignment.

Lastly, one might argue that this code is sufficiently unusual to trip up readers, but honestly I don’t think that’s a very compelling argument if the usage is consistent and clearly documented somewhere. But it presents another argument against recommending its use to beginners.


[1] Of course that’s easy to answer: |> does not allow explicit dot substitution. And while I understand the arguments against supporting it, the fact that its absence encourages hacks such as the bizarro pipe is a very strong argument that this was in fact a huge mistake.

Create a matrix from a vector and flipping it using base pipe

Another way is to call the function [ directly.

v <- c(1,2,3,4,1,2,3,4,2)

matrix(data=v,ncol=3) |> `[`(x=_,,3:1, drop = FALSE)
#     [,1] [,2] [,3]
#[1,]    3    4    1
#[2,]    4    1    2
#[3,]    2    2    3

or without placeholder:

#Does not work as '[' is currently not supported in RHS call of a pipe
#matrix(data=v,ncol=3) |> `[`(,3:1, drop = FALSE)

#But the following will currently work
matrix(data=v,ncol=3) |> base::`[`(,3:1, drop = FALSE)
matrix(data=v,ncol=3) |> (`[`)(,3:1, drop = FALSE)

Or without pipe:

matrix(data=v,ncol=3)[, 3:1, drop = FALSE]
#matrix(v, ncol=3)[, 3:1] #Short alternative

Equivalent of max and over(partition by) in R for flattening

Using data.table:

library(data.table)
setDT(df)[
, recorded_dt:=NULL][
, lapply(.SD, \(x) sort(x, na.last = TRUE, decreasing = TRUE)[1])
, by=.(ID, time0)]
##    ID      time0 day0 day1 day4 day30
## 1:  1 2009-01-01    A <NA>    B     D
## 2:  2 2005-02-02 <NA>    B <NA>  <NA>

The internal variable .SD represents a subset of the data.table including all columns except those included in the by=... clause. This is why we have to remove the column recorded_dt first.
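The input df is not shown above, so the sketch below builds a hypothetical one that is consistent with the printed result. To stay runnable without extra packages it uses base R's aggregate() as a stand-in for the data.table call (a swapped-in technique, not the answer's own method), with the same per-column function. The name largest_or_na is made up for illustration.

```r
# Hypothetical input: one row per recording, value columns mostly NA
df <- data.frame(
  ID          = c(1, 1, 1, 2),
  time0       = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01", "2005-02-02")),
  recorded_dt = as.Date(c("2009-01-01", "2009-01-05", "2009-01-31", "2005-02-03")),
  day0        = c("A", NA, NA, NA),
  day1        = c(NA, NA, NA, "B"),
  day4        = c(NA, "B", NA, NA),
  day30       = c(NA, NA, "D", NA)
)

# Same per-column function as in the data.table call: the largest
# non-missing value, or NA when the whole group is missing
largest_or_na <- function(x) sort(x, na.last = TRUE, decreasing = TRUE)[1]

# Base R stand-in for lapply(.SD, ...) with by = .(ID, time0).
# Only the value columns are passed in, which mirrors dropping recorded_dt.
value_cols <- c("day0", "day1", "day4", "day30")
flat <- aggregate(df[value_cols], by = df[c("ID", "time0")], FUN = largest_or_na)
flat
```

The row order of flat may differ from the data.table output, so it is safer to look rows up by ID than by position.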


