Should I Avoid Programming Packages with Pipe Operators

Should I avoid programming packages with pipe operators?

Like all advanced functions written in R, %>% carries a lot of overhead, so don't use it in loops (this includes implicit loops, such as the *apply family, or the per-group loops in packages like dplyr or data.table). Here's an example:

library(magrittr)
x = 1:10

system.time({for(i in 1:1e5) identity(x)})
#   user  system elapsed 
#   0.07    0.00    0.08 
system.time({for(i in 1:1e5) x %>% identity})
#   user  system elapsed 
#  15.39    0.00   16.68
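
If you want to check this on your own machine, here is a rough sketch rather than a definitive benchmark. It assumes magrittr is installed; the names n, f and the t_* variables are mine, not from the example above. It times a direct call, a pipe evaluated on every iteration, and a functional sequence built once outside the loop. The timings depend heavily on the magrittr and R versions, so no expected numbers are shown.

```r
library(magrittr)

x <- 1:10
n <- 1e4  # fewer iterations than above, so the sketch finishes quickly

# Baseline: a direct function call
t_direct <- system.time(for (i in seq_len(n)) identity(x))

# The pipe expression is evaluated again on every iteration
t_pipe <- system.time(for (i in seq_len(n)) x %>% identity)

# Functional sequence: the pipeline is built once, then called in the loop
f <- . %>% identity
t_fseq <- system.time(for (i in seq_len(n)) f(x))

# Elapsed seconds for each variant
c(direct = t_direct[["elapsed"]],
  pipe   = t_pipe[["elapsed"]],
  fseq   = t_fseq[["elapsed"]])
```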

Should I use %$% instead of %>%?

No, you shouldn't use %$% routinely. It is like using the with() function, i.e. it exposes the component parts of the LHS when evaluating the RHS. But it only works when the value on the left has names, like a list or data frame, so you can't always use it. For example,

library(magrittr)
x <- 1:10
x %>% mean()
#> [1] 5.5
x %$% mean()
#> Error in eval(substitute(expr), data, enclos = parent.frame()): numeric 'envir' arg not of length one

Created on 2022-02-06 by the reprex package (v2.0.1.9000)

You'd get a similar error with x %$% mean(.).

Even when the LHS has names, it doesn't automatically put the . argument in the first position. For example,

mtcars %>% nrow()
#> [1] 32
mtcars %$% nrow()
#> Error in nrow(): argument "x" is missing, with no default

Created on 2022-02-06 by the reprex package (v2.0.1.9000)

In this case mtcars %$% nrow(.) would work, because mtcars has names.
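
For contrast, here is a small sketch of the kind of call %$% is meant for, where the RHS refers to the columns of the LHS by name. It assumes magrittr is installed and uses the built-in mtcars data; the variable names r_pipe and r_with are only for illustration.

```r
library(magrittr)

# %$% exposes the columns of mtcars by name, much like with()
r_pipe <- mtcars %$% cor(hp, mpg)

# The equivalent base R call
r_with <- with(mtcars, cor(hp, mpg))

all.equal(r_pipe, r_with)
```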

Your example involving .$hp and .$mpg is illustrating one of the oddities of magrittr pipes. Because the . is only used in expressions, not alone as an argument, it is passed as the first argument as well as being passed in those expressions. You can avoid this using braces, e.g.

mtcars %>% {plot(.$hp, .$mpg)}
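
To see the difference without drawing a plot, here is a sketch that swaps plot() for c() (my substitution, purely so the result can be inspected). It assumes magrittr is installed; no_braces and with_braces are illustrative names.

```r
library(magrittr)

# Without braces: mtcars is also inserted as the first argument,
# so this evaluates c(mtcars, mean(mtcars$hp), mean(mtcars$mpg))
no_braces <- mtcars %>% c(mean(.$hp), mean(.$mpg))

# With braces: only the explicit uses of . are substituted
with_braces <- mtcars %>% {c(mean(.$hp), mean(.$mpg))}

length(no_braces)    # the 11 columns of mtcars plus the two means
length(with_braces)  # just the two means
```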

Why don't I see pipe operators in most high-level languages?

You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.

Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.

You already think in terms of pipelines - how about "gzcat foo.tar.gz | tar xf -"? You may not have known it, but the shell is running the unzip and untar in parallel - the stdin read in tar just blocks until data is sent to stdout by gzcat.

Well a lot of tasks can be expressed in terms of pipelines, and if you can do that then getting some level of parallelisation is simple with David King's helper code (even across Erlang nodes, i.e. machines):
pipeline:run([pipeline:generator(BigList),
          {filter,fun some_filter/1},
          {map,fun some_map/1},
          {generic,fun some_complex_function/2},
          fun some_more_complicated_function/1,
          fun pipeline:collect/1]).
So basically what he's doing here is making a list of the steps - each step being implemented in a fun that accepts as input whatever the previous step outputs (the funs can even be defined inline of course). Go check out David's blog entry for the code and more detailed explanation.

Store intermediary results in pipe

Pipe chains starting with . %>% build functional sequences; the content of . is not evaluated.

If you use (.) %>% you'll get the behavior you expected.

library(magrittr)
a <- 1:5 

b2 <- a %>% exp %T>%
{ a.mean2 <<- (.) %>% sqrt %>% mean } %T>%
{ a.sd2 <<- (.) %>% sqrt %>% sd } %>%
  round(2)

b2
#> [1]   2.72   7.39  20.09  54.60 148.41

a.mean2 
#> [1] 5.684048
a.sd2
#> [1] 4.232675

Created on 2019-03-02 by the reprex package (v0.2.1)

R: use magrittr pipe operator in self written package

It should have worked correctly if you had magrittr listed in Depends. However, this is not advised. Instead, you leave magrittr in Imports and add the following line to NAMESPACE:

importFrom(magrittr,"%>%")

I suggest reading Writing R Extensions. Your question is covered in sections 1.1.3 and 1.5.1.
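
If your NAMESPACE is generated by roxygen2 rather than edited by hand (an assumption on my part; the question doesn't say), the same import is usually declared with a tag in one of the package's R files. The file name and the function mean_sqrt below are hypothetical, and magrittr still has to be listed under Imports in DESCRIPTION.

```r
# R/utils-pipe.R (hypothetical file inside your package)

# roxygen2 turns this tag into importFrom(magrittr,"%>%") in NAMESPACE
#' @importFrom magrittr %>%
NULL

#' Mean of the square roots (hypothetical example function)
#' @param x A numeric vector.
#' @export
mean_sqrt <- function(x) {
  # %>% is found through the package namespace once the import is in place
  x %>% sqrt() %>% mean()
}
```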

Functional pipes in Python like %>% from R's magrittr

One possible way of doing this is by using a module called macropy. Macropy allows you to apply transformations to the code that you have written. Thus a | b can be transformed to b(a). This has a number of advantages and disadvantages.

In comparison to the solution mentioned by Sylvain Leroux, the main advantage is that you do not need to create infix objects for the functions you are interested in using -- just mark the areas of code where you intend to use the transformation. Secondly, since the transformation is applied at compile time rather than at runtime, the transformed code suffers no overhead during runtime -- all the work is done when the byte code is first produced from the source code.

The main disadvantage is that macropy must be activated in a particular way for it to work (described below). Also, in exchange for the faster runtime, parsing the source code is more computationally expensive, so the program takes longer to start. Finally, it adds a syntactic style that programmers who are not familiar with macropy may find harder to understand.

Example Code:

run.py

import macropy.activate 
# Activates macropy, modules using macropy cannot be imported before this statement
# in the program.
import target
# import the module using macropy

target.py

from fpipe import macros, fpipe
from macropy.quick_lambda import macros, f
# The `from module import macros, ...` must be used for macropy to know which 
# macros it should apply to your code.
# Here two macros have been imported `fpipe`, which does what you want
# and `f` which provides a quicker way to write lambdas.

from math import sqrt

# Using the fpipe macro in a single expression.
# The code between the square braces is interpreted as - str(sqrt(12))
print fpipe[12 | sqrt | str] # prints 3.46410161514

# using a decorator
# All code within the function is examined for `x | y` constructs.
x = 1 # global variable
@fpipe
def sum_range_then_square():
    "expected value (1 + 2 + 3)**2 -> 36"
    y = 4 # local variable
    return range(x, y) | sum | f[_**2]
    # `f[_**2]` is macropy syntax for -- `lambda x: x**2`, which would also work here

print sum_range_then_square() # prints 36

# using a with block.
# same as a decorator, but for limited blocks.
with fpipe:
    print range(4) | sum # prints 6
    print 'a b c' | f[_.split()] # prints ['a', 'b', 'c']

And finally the module that does the hard work. I've called it fpipe for functional pipe as it's emulating shell syntax for passing output from one process to another.

fpipe.py

from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.decorator
@macros.block
@macros.expr
def fpipe(tree, **kw):

    @Walker
    def pipe_search(tree, stop, **kw):
        """Search code for bitwise or operators and transform `a | b` to `b(a)`."""
        if isinstance(tree, BinOp) and isinstance(tree.op, BitOr):
            operand = tree.left
            function = tree.right
            newtree = q[ast[function](ast[operand])]
            return newtree

    return pipe_search.recurse(tree)

What does %>% function mean in R?

%...% operators

%>% has no builtin meaning but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function will return a string consisting of its left argument followed by a comma and space and then its right argument.

"%,%" <- function(x, y) paste0(x, ", ", y)

# test run

"Hello" %,% "World"
## [1] "Hello, World"

The base of R provides %*% (matrix multiplication), %/% (integer division), %in% (is lhs a component of the rhs?), %o% (outer product) and %x% (Kronecker product). It is not clear whether %% falls in this category or not but it represents modulo.
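
A quick sketch of the base operators listed above, using only base R. The matrix m is just an illustrative value.

```r
# Operators that ship with base R; no packages needed
m <- matrix(1:4, nrow = 2)

m %*% m          # matrix multiplication
7 %/% 2          # integer division
7 %% 2           # modulo
2 %in% c(1, 2)   # membership test
1:2 %o% 1:3      # outer product, a 2 x 3 matrix
diag(2) %x% m    # Kronecker product, a 4 x 4 matrix
```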

expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R .

operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf

igraph This package defines %--% , %->% and %<-% to select edges.

lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .

Pipes

magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

magrittr has also defined a number of other such operators. See the Additional Pipe Operators section of the prior link, which discusses %T>%, %<>% and %$%, and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.

dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated and dplyr now recommends that users use %>% which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments this SO question discusses the differences between it and magrittr's %>% : Differences between %.% (dplyr) and %>% (magrittr)

pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/

The pipeR package has also defined a number of other such operators. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf

postlogic The postlogic package defines %if% and %unless% operators.

wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html

Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:

1:8 %>% sum %>% sqrt
## [1] 6

one writes the following. In this case we explicitly use dot rather than eliding the dot argument and end each component of the pipeline with an assignment to the variable whose name is dot (.) . We follow that with a semicolon.

1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6

Update Added info on expm package and simplified example at top. Added postlogic package.

Update 2 The development version of R (since released as R 4.1.0) has defined a |> pipe. Unlike magrittr's %>% it can only substitute into the first argument of the right hand side. Although limited, it works via syntax transformation so it has no performance impact.
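
A minimal sketch of the native pipe, assuming R 4.1.0 or later and no extra packages. The variable names total and n_four_cyl are only for illustration.

```r
# Requires R >= 4.1.0; no packages needed

# Equivalent to sum(sqrt(c(1, 4, 9)))
total <- c(1, 4, 9) |> sqrt() |> sum()
total

# The left-hand side always becomes the first argument of the call on the right
n_four_cyl <- mtcars |> subset(cyl == 4) |> nrow()
n_four_cyl
```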

R combinations with dot ( . ), ~ , and pipe (%>%) operator

That line uses the . in three different ways.

         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

Generally speaking you pass in the value from the pipe into your function at a specific location with . but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so it behaves like it would without any escaping. For example

aggregate(. ~ cyl, data=mydata)

And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.

The . at [2] is the value that's being passed in as the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
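
Here is a small sketch of uses [1] and [2] together, assuming magrittr is installed. The subset condition hp > 100 and the names piped and direct are my own illustrative choices, not taken from the question.

```r
library(magrittr)

# The plain `.` in data = . receives the piped data frame;
# the `.` in the formula still means "all the other columns"
piped <- mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = mean)

# The same call written without the pipe
direct <- aggregate(. ~ cyl, data = subset(mtcars, hp > 100), FUN = mean)

all.equal(piped, direct)
```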

The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So

. %>% mean %>% round(2)

is the same as

function(x) round(mean(x), 2)

So you're just creating a custom function with the . at [3].
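
You can check that equivalence directly. A short sketch, assuming magrittr is installed; f, g and x are illustrative names.

```r
library(magrittr)

# A chain that starts with `.` is a function, not a value
f <- . %>% mean %>% round(2)

# The hand-written equivalent
g <- function(x) round(mean(x), 2)

x <- c(1.234, 2.345, 3.456)
is.function(f)
identical(f(x), g(x))
```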
