R: Further Subset a Selection Using the Pipe %>% and Placeholder

Since you're going from a bunch of data to one (row of) value(s), you're summarizing. In a dplyr pipeline you can use the summarize function for that. Within summarize you don't need to subset; you can refer to pre and post directly.

Like so:

dat %>% select(pre, post) %>% summarize(CD = cohensD(pre, post)) 

(The select statement isn't actually necessary in this case, but I left it in to show how this works in a pipeline)
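A minimal, self-contained sketch of the same pattern follows. The data frame and the helper cohens_d() are made up for illustration; cohens_d() is a simple pooled-SD stand-in, not the cohensD() function the answer calls (which comes from a separate package).

```r
library(dplyr)

# Hypothetical data: six paired pre/post measurements.
dat <- data.frame(
  pre  = c(10, 12, 9, 11, 13, 10),
  post = c(12, 15, 10, 14, 15, 13)
)

# Illustrative stand-in for an effect-size function (an assumption, not cohensD()).
cohens_d <- function(x, y) {
  pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                      (length(x) + length(y) - 2))
  abs(mean(x) - mean(y)) / pooled_sd
}

# Inside summarize() the columns are referred to by bare name; no `.` or `$` is needed.
res <- dat %>% summarize(CD = cohens_d(pre, post))
print(res)
```

The result is a one-row data frame, which is what makes summarize the natural verb here.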

How to subset a dataframe in R using placeholder

Use starts_with().

library(dplyr)
dat %>% select(-starts_with("x"))

There are other helpers like this: ends_with(), matches(), contains(), and one_of() (newer tidyselect versions supersede one_of() with any_of() and all_of()). If all else fails, you can always use regular expressions and base R:

dat <- dat[ , !grepl("^x", colnames(dat)) ]

Explanation: grepl returns a logical vector. The regular expression "^x" matches anything that starts with an x. This is matched against the column names of dat. We negate the logical vector with the bang (!) and thus select everything that does not match our regex.
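To make the intermediate logical vector visible, here is a small base R sketch. The data frame and its column names are invented for illustration.

```r
# Hypothetical data frame; two column names start with "x", one merely contains it.
dat <- data.frame(x1 = 1:3, x2 = 4:6, y = 7:9, max = 10:12)

# TRUE where the column name starts with "x".
starts_with_x <- grepl("^x", colnames(dat))
print(starts_with_x)

# Negate the vector to keep everything else; drop = FALSE keeps a data frame
# even if only one column survives.
kept <- dat[, !starts_with_x, drop = FALSE]
print(colnames(kept))
```

Because the pattern is anchored with ^, the column max is kept even though its name contains an x.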

R Conditional evaluation when using the pipe operator %>%

Here is a quick example that takes advantage of the . and ifelse:

library(magrittr)  # provides %>% and add()

X <- 1
Y <- TRUE

X %>% add(1) %>% { ifelse(Y, add(., 1), .) }

In the ifelse, if Y is TRUE it will add 1; otherwise it just returns the last value of X. The . is a stand-in that tells the function where the output from the previous step of the chain goes, so I can use it in both branches.

Edit
As @BenBolker pointed out, you might not want ifelse, so here is an if version.

X %>%
  add(1) %>%
  { if (Y) add(., 1) else . }

Thanks to @Frank for pointing out that I should use { braces around my if and ifelse statements to continue the chain.
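One likely reason to prefer if over ifelse here is that ifelse is vectorised over its test, so a length-one condition yields a length-one result. The sketch below uses made-up names (v, flag) to show the difference when the piped value is a vector.

```r
library(magrittr)

v <- 1:3
flag <- TRUE

# ifelse() shapes its result like the test, so only the first element comes back.
via_ifelse <- v %>% { ifelse(flag, add(., 1), .) }

# if/else returns the chosen branch unchanged, so the whole vector survives.
via_if <- v %>% { if (flag) add(., 1) else . }

print(via_ifelse)
print(via_if)
```

With a scalar such as X in the answer, both forms give the same value.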

How to pipe purely in base R ('base pipe')?

In base R, |> is the pipe operator (since R 4.1.0).

The left-hand side expression lhs is inserted as the first free argument in the call on the right-hand side, rhs.

mtcars |> head()                      # same as head(mtcars)
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars |> head(2)                     # same as head(mtcars, 2)
#              mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

It is also possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs. (Since 4.2.0)

mtcars |> lm(mpg ~ disp, data = _)
#
#Call:
#lm(formula = mpg ~ disp, data = mtcars)
#
#Coefficients:
#(Intercept)         disp
#   29.59985     -0.04122

#mtcars |> lm(mpg ~ disp, _) #Error: pipe placeholder can only be used as a named argument

Alternatively, explicitly name the argument(s) that come before the one the lhs should fill:

mtcars |> lm(formula = mpg ~ disp)
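Why this works can be checked with quote(): the pipe is rewritten by the parser, and ordinary argument matching then assigns the unnamed lhs to the first formal that is still free (data in the case of lm). A small sketch, requiring R 4.1.0 or later; the names fit_piped and fit_plain are illustrative:

```r
# The parser turns the pipe into a plain call with lhs as the first unnamed argument.
rewritten <- quote(mtcars |> lm(formula = mpg ~ disp))
print(rewritten)

# formula is matched by name, so the unnamed mtcars falls through to `data`.
fit_piped <- mtcars |> lm(formula = mpg ~ disp)
fit_plain <- lm(mpg ~ disp, data = mtcars)

print(coef(fit_piped))
```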

When the placeholder is needed more than once, has to be passed as an unnamed argument or in an arbitrary position, or the function is disabled on the rhs, use an (anonymous) function instead.

mtcars |> (\(.) .[.$cyl == 6,])()
#mtcars ->.; .[.$cyl == 6,] # Alternative using bizarro pipe
#local(mtcars ->.; .[.$cyl == 6,]) # Without overwriting and keeping .
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

mtcars |> (\(.) lm(mpg ~ disp, .))()
#
#Call:
#lm(formula = mpg ~ disp, data = .)
#
#Coefficients:
#(Intercept)         disp
#   29.59985     -0.04122

1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") :
# pipe placeholder may only appear once
1:3 |> (\(.) setNames(., .))()
#1 2 3
#1 2 3

1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3
#1 2 3

# The same, but via a helper function
._ <- \(data, expr, ...) {
  eval(substitute(expr), list(. = data), enclos = parent.frame())
}
1:3 |> ._(setNames(., .))
#1 2 3
#1 2 3

Some functions are disabled on the rhs of the pipe.

mtcars |> `$`(cyl)
#Error: function '$' not supported in RHS call of a pipe

Some of them can still be called by wrapping them in parentheses, calling them via ::, calling them inside a function, or defining an alias for the function.

mtcars |> (`$`)(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> base::`$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> (\(.) .$cyl)()
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

fun <- `$`
mtcars |> fun(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

An expression written as x |> f(y) is parsed as f(x, y). While the code in a pipeline is written sequentially, regular R semantics for evaluation apply. So piped expressions will be evaluated only when first used in the rhs expression.

-1 |> sqrt() |> (\(x) 0)()
#[1] 0

. <- -1
. <- sqrt(.)
#Warning message:
#In sqrt(.) : NaNs produced
(\(x) 0)(.)
#[1] 0

x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

x|> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2

f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2

. <- x
. <- f1(.)
#IN 1
#OUT 1
f2(.)
#IN 2
#OUT 2
# a b c
#1 0 1 2
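Both points, the parse-time rewriting and the lazy evaluation of the lhs, can also be checked in a couple of lines. This is a sketch with made-up names (ignore_input, result); it needs R 4.1.0 or later.

```r
# The parser rewrites the pipe before anything is evaluated.
parsed <- quote(x |> f(y))
print(parsed)

# A function that never touches its argument never forces the promise,
# so the lhs expression is not evaluated at all.
ignore_input <- function(x) "input never touched"
result <- stop("this error is never raised") |> ignore_input()
print(result)
```

The same call written as ignore_input(stop(...)) behaves identically, which is the point: the pipe adds no evaluation step of its own.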

%>% .$column_name equivalent for R base pipe |>

We can use getElement().

iris |> getElement('Sepal.Length') |> cut(5)
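As a quick check that getElement() returns the same vector as $, here is a short sketch (variable names are illustrative; R 4.1.0 or later for the base pipe):

```r
# getElement() extracts a column by name, so it can sit on the rhs of |>.
sepal_piped  <- iris |> getElement("Sepal.Length")
sepal_dollar <- iris$Sepal.Length

# The extracted vector can be piped onward as usual.
bins <- iris |> getElement("Sepal.Length") |> cut(5)
print(table(bins))
```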

dplyr: group_by & mutate variable can't call mean/sd functions

You want the base R function with().

mtcars %>%
  group_by(cyl) %>%
  mutate(group_pct = hp / sum(hp)) %>%
  with(paste0('Words: ', mean(group_pct)))

[1] "Words: 0.09375"

The issue with your original attempt is that group_pct is not defined in the global environment, so the lookup fails and you get the error message.

with() tells R to evaluate the paste0() call within the environment of the data frame passed in by the pipe, so it finds group_pct and returns your expected result.
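The effect of with() can be seen without dplyr. The sketch below uses only base R and the built-in mtcars data; the label text is made up.

```r
# with() evaluates the expression using the data frame's columns as variables.
avg_with  <- with(mtcars, mean(hp))
avg_plain <- mean(mtcars$hp)

# The same idea at the end of a pipeline (base pipe, R 4.1.0 or later).
label <- mtcars |> with(paste0("Mean hp: ", round(mean(hp), 2)))
print(label)
```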

Using table() in dplyr chain

This behavior is by design: https://github.com/tidyverse/magrittr/blob/00a1fe3305a4914d7c9714fba78fd5f03f70f51e/README.md#re-using-the-placeholder-for-attributes

Since you don't have a . on its own as an argument, the tibble is still passed as the first parameter, so the call is really more like

... %>% table(., .$type, .$colour)

The official magrittr work-around is to use curly braces

... %>% {table(.$type, .$colour)}
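A small sketch of the braces work-around, using an invented data frame with the two column names from the answer:

```r
library(magrittr)

# Hypothetical data with the two columns named in the answer.
items <- data.frame(
  type   = c("a", "a", "b", "b", "b"),
  colour = c("red", "blue", "red", "red", "blue")
)

# Braces stop magrittr from also inserting `items` as the first argument,
# so only the two columns are cross-tabulated.
braced <- items %>% { table(.$type, .$colour) }
direct <- table(items$type, items$colour)

print(braced)
```

Without the braces, magrittr would build table(items, items$type, items$colour), as described above.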

R combinations with dot (.), ~, and pipe (%>%) operator

That line uses the . in three different ways.

         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

Generally speaking, you pass the value from the pipe into your function at a specific location with ., but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so it behaves like it would without any escaping. For example

aggregate(. ~ cyl, data=mydata)

And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.

The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.

The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So

. %>% mean %>% round(2)

is the same as

function(x) round(mean(x), 2)

so you're just creating a custom function with the . at [3]
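The equivalence at [3], and the three roles of . together, can be tried out on the built-in mtcars data. This is a sketch; the names round_mean, round_mean_plain, and by_cyl are illustrative.

```r
library(magrittr)

# [3] A chain that starts with `.` is a function (a "functional sequence").
round_mean       <- . %>% mean %>% round(2)
round_mean_plain <- function(x) round(mean(x), 2)

x <- c(1.2, 2.5, 3.9)
print(round_mean(x))

# All three uses in one call: formula dot [1], piped value [2], function [3].
by_cyl <- mtcars %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
print(by_cyl)
```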

How to pipe an output tibble into further calculations without saving the tibble as a separate object in R?

In your case, you can further manipulate the tibble you have generated using dplyr functions.

Note the existence of mutate_at and summarize_at, which let you transform a set of columns, with the option to select them by column position. (In dplyr 1.0.0 and later they are superseded by across().)

Using . as a placeholder for the tibble you are currently manipulating, and calling an anonymous function inside mutate_at, will give you the result you expect.

sr_df %>%
  group_by(ResolutionViolated) %>%
  tally() %>%
  arrange(desc(n)) %>%
  mutate(total = sum(n)) %>%
  # .vars selects the columns (the older name .cols is deprecated)
  mutate_at(.vars = c(1, 2),
            .funs = function(column) round(column / .$total * 100, digits = 2))
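For comparison, here is a sketch of an alternative that avoids the . placeholder altogether by computing the percentage in a plain mutate(). It is not the answer's method; the toy sr_df and the column name pct are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sr_df with a single grouping column.
sr_df <- data.frame(ResolutionViolated = c("No", "No", "No", "Yes"))

shares <- sr_df %>%
  # roughly group_by() + tally() + arrange(desc(n)) in one step
  count(ResolutionViolated, sort = TRUE) %>%
  # n and sum(n) are looked up in the current tibble, so no `.` is needed
  mutate(pct = round(n / sum(n) * 100, digits = 2))

print(shares)
```

Here the intermediate tibble is never saved; each step simply receives the previous result.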

