R Pipelining Functions

We can use Compose from the functional package to create our own binary operator that does something similar to what you want:

# Define our helper functions
square <- function(x){x^2}
add5 <- function(x){x + 5}

# functional contains Compose
library(functional)

# Define our binary operator
"%|>%" <- Compose

# Create our complexFunction by 'piping' our functions
complexFunction <- square %|>% add5 %|>% as.character
complexFunction(1:5)
#[1] "6"  "9"  "14" "21" "30"

# previously had this until flodel pointed out
# that the above was sufficient
#"%|>%" <- function(fun1, fun2){ Compose(fun1, fun2) }

Technically we can do this without the functional package, although Compose feels like the right tool for this task:

"%|>%" <- function(fun1, fun2){
    function(x){fun2(fun1(x))}
}
complexFunction <- square %|>% add5 %|>% as.character
complexFunction(1:5)
#[1] "6"  "9"  "14" "21" "30"
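
The two-function operator above generalizes to any number of functions. As a minimal sketch in base R (compose_all is a hypothetical helper name, not part of the functional package), Reduce can fold a list of functions from left to right:

```r
# Hypothetical helper: compose any number of functions, applied left to right.
compose_all <- function(...) {
  fs <- list(...)
  # Start from x and feed the running result into each function in turn.
  function(x) Reduce(function(acc, f) f(acc), fs, x)
}

square <- function(x) x^2
add5 <- function(x) x + 5

complexFunction <- compose_all(square, add5, as.character)
complexFunction(1:5)
```

This should give the same character vector as the operator-based versions above.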

applying custom function in R pipeline

Store the output of EZ in a list column, one element per ppt, and use unnest_wider to spread it into separate columns.

library(dplyr)
library(tidyr)

data %>%
  group_by(ppt) %>%
  summarise(Pc = mean(correct),
            VRT = var(key_rt[correct==1]),
            MRT = mean(key_rt[correct==1]), 
            out = list(EZ(Pc, VRT, MRT, n()))) %>%
  unnest_wider(out)

#    ppt    Pc    VRT   MRT      v     a     Ter
#  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>   <dbl>
#1     1   0.6 0.0399 0.500  0.407 0.997  0.255 
#2     2   1   0.130  0.391  1.36  1.61  -0.0840
#3     3   0.4 0.282  0.574 -0.249 1.63  -0.0775

I pass n() as the last argument, i.e. the number of rows in the group. I am guessing that this is what the argument is supposed to mean here.
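
Since neither data nor EZ is shown here, the following is a minimal self-contained sketch of the same pattern. Both toy_model (a stand-in for EZ) and the toy data frame are hypothetical and only illustrate how a named list stored in a list column is spread out by unnest_wider:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for EZ(): takes scalar summaries, returns a named list.
toy_model <- function(pc, mrt, n) {
  list(score = pc * n, speed = 1 / mrt)
}

# Hypothetical toy data with the same column names as above.
toy <- data.frame(
  ppt     = c(1, 1, 2, 2),
  correct = c(1, 0, 1, 1),
  key_rt  = c(0.5, 0.7, 0.4, 0.6)
)

res <- toy %>%
  group_by(ppt) %>%
  summarise(Pc  = mean(correct),
            MRT = mean(key_rt[correct == 1]),
            # One named list per group, wrapped in list() to form a list column.
            out = list(toy_model(Pc, MRT, n()))) %>%
  unnest_wider(out)

res
```

Each name in the returned list (score and speed here) becomes its own column.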

Piping into `if` returns the pipeline without evaluating it

The syntax . %>% round(.) means function(.) round(.). Whenever a dot starts a pipeline, magrittr defines a function instead of evaluating the pipeline as an ordinary one. Put parentheses around the dot to prevent it from starting the inner pipeline:

library(magrittr)

 maybe_round = function(x, round = TRUE){
   x %>%
     `if`(
       round,
       (.) %>% round(),
       .
     )
 }

maybe_round(5.3, TRUE)
## [1] 5

Another possibility is to just leave it as a function and then evaluate that function at the outer dot like this:

 maybe_round = function(x, round = TRUE){
   x %>%
     `if`(
       round,
       (. %>% round())(.),
       .
     )
 }
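
The distinction between a dot that starts a pipeline and a parenthesized dot can be checked directly. A small sketch, assuming magrittr is attached (rounder is a hypothetical name):

```r
library(magrittr)

# A pipeline that starts with a bare dot is a function (a "functional sequence").
rounder <- . %>% round()
is.function(rounder)
rounder(5.3)

# With parentheses, the dot is treated as an ordinary value and piped right away.
. <- 5.3
(.) %>% round()
```

The first form has to be called before it produces a value, which is why the unparenthesized version inside `if` returned a function instead of a number.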

R Function to import data set and pipeline create variables based on field name/existence

Solution

Simply use regex to change the column names:

library(readr)  # provides read_table

temp_set <- read_table(input_path)

names(temp_set) <- gsub(x = names(temp_set), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX")

Or equivalently with `names<-`() in the dplyr workflow:

temp_set <- read_table(input_path) %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))

Regex

The pattern = "^(.+)(\\d{4,4})$" breaks each name into two capturing groups:

Any prefix of positive length: .+
Some year comprised of 4 digits: \\d{4,4}

The replacement = "\\1XXXX" then prepends the first group (\\1) to the code (XXXX); so the code essentially "replaces" the year.
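
To see the pattern in isolation, it can be applied to a plain character vector. The names below are hypothetical and chosen only to show which strings match:

```r
# Hypothetical column names used only to illustrate the pattern.
nms <- c("MSA2003", "var2019", "other_var", "2003")

renamed <- gsub(pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX", x = nms)
renamed
```

A name that consists of nothing but four digits ("2003" here) is left alone, because .+ requires at least one character before the year.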

Example

Here are two possible cases, where the MSAXXXX column starts as MSA2003 and as MSA2013 respectively:

case_1 <- data.frame(
  MSA2003 = c(41929, 33820, 27642, 88111),
  var2019 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_1
#>   MSA2003 var2019 other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

case_2 <- data.frame(
  MSA2013 = c(41929, 33820, 27642, 88111),
  var2009 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_2
#>   MSA2013 var2009 other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

Notice how the solution standardizes all variables with years in their names, yet leaves the other variables untouched:

library(dplyr)

case_1 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#>   MSAXXXX varXXXX other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

case_2 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#>   MSAXXXX varXXXX other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

R Pipelining with Anonymous Functions

Using Compose and immediately calling the resulting function gives this:

"%|>%" <- function(...) Compose(...)()

Now drop the 'x' that served as the final "function". Here it is replaced with an actual function, which is not needed but is kept for illustration:

anonymousPipelineTest <-
     function(x){x^2} %|>%
     function(x){x+5} %|>% function(x){x}
anonymousPipelineTest(1:10)

[1]   6   9  14  21  30  41  54  69  86 105
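
The pipeline above depends on how R parses function(x) ...: the body of an anonymous function extends as far to the right as possible, so everything after the first operator becomes part of the first function's body. Below is a minimal base R sketch of that parsing behaviour. It uses a hypothetical operator %then%, which is not the Compose-based definition above and simply applies its right-hand side to its left-hand side:

```r
# Hypothetical operator: apply the function on the right to the value on the left.
"%then%" <- function(lhs, rhs) rhs(lhs)

# Parsed as:
# function(x) ({x^2} %then% function(x) ({x + 5} %then% function(x) {x}))
anonymousPipelineSketch <-
  function(x) {x^2} %then%
  function(x) {x + 5} %then% function(x) {x}

anonymousPipelineSketch(1:10)
```

Because the whole chain is one function body, the left-hand side of each operator is a value ({x^2}, then {x + 5}) rather than a function.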

How do you call a function that takes no inputs within a pipeline?

With the ‘magrittr’ pipe operator you can put an operand inside {…} to prevent automatic argument substitution:

c(1,3,5) %>% {ls()} %>% mean()
# NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA

… but of course this serves no useful purpose.

Incidentally, ls() inside a pipeline is executed in its own environment rather than the calling environment, which makes it even less useful here. A different function that returns a sensible value could be used instead, e.g.:

c(1,3,5) %>% {rnorm(10)} %>% mean()
# [1] -0.01068046

Or, if you intended for the left-hand side to be passed on, skipping the intermediate ls(), you could do the following:

c(1,3,5) %>% {ls(); .} %>% mean()
# [1] 3

… again, using ls() here won’t be meaningful but some other function that has a side-effect would work.
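
As a sketch of the side-effect case, message() can stand in for any zero-input or logging call. The message text is only an illustration:

```r
library(magrittr)

result <- c(1, 3, 5) %>%
  # Run a side effect, then return the dot so the input is passed on unchanged.
  {message("values so far: ", toString(.)); .} %>%
  mean()

result
```

The braces stop magrittr from inserting the left-hand side as the first argument, and the trailing dot is what the next step receives.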

dplyr pipeline in a function

With rlang 0.4.0, we can do this much more easily by using the {{ }} (curly-curly) operator:

library(rlang)
library(dplyr)
n_occurences <- function(df, n1, column){
  df %>%
    filter(!is.na({{column}})) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}

n_occurences(dat, n1 = 3, column = B)
#[1] Y
#Levels: X Y

If we intend to pass a quoted string, convert it to a symbol (sym) and then unquote it for evaluation (!!):

n_occurences <- function(df, n1, column){
  column <- rlang::sym(column)
  df %>%
    filter(!is.na(!!column)) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}

col <- 'B'
n_occurences(dat, n1=3, column = col)
#[1] Y
#Levels: X Y
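
The dat object is not shown here, so the following is a self-contained sketch with a hypothetical toy data frame. It shows the {{ }} version next to a string-based variant that uses the .data pronoun instead of sym and !!. The names ids_bare and ids_string are made up for this example:

```r
library(dplyr)

# Hypothetical toy data; the original `dat` is not shown.
dat <- data.frame(
  ID = c("X", "X", "Y", "Y", "Y", "Y"),
  B  = c(1, NA, 2, 3, 4, 5)
)

# Unquoted column name, forwarded with {{ }}.
ids_bare <- function(df, n1, column) {
  df %>%
    filter(!is.na({{ column }})) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}

# Column name as a string, looked up through the .data pronoun.
ids_string <- function(df, n1, column) {
  df %>%
    filter(!is.na(.data[[column]])) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}

ids_bare(dat, n1 = 3, column = B)
ids_string(dat, n1 = 3, column = "B")
```

Both calls should return the ID with more than three non-missing values of B, which is Y in this toy data.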

R using which function after group_by

You could use

library(dplyr)

data %>%
  group_by(a,b,c) %>%
  filter(
    d > quantile(d, na.rm = TRUE)[4] + 1.5 * IQR(d, na.rm = TRUE) | 
      d < quantile(d, na.rm = TRUE)[4] - 1.5 * IQR(d, na.rm = TRUE))

This returns:

# A tibble: 2,464 x 5
    ...1 a         d b         c
   <dbl> <chr> <dbl> <chr> <dbl>
 1 10533 gas   321.  CAISO  2011
 2 10534 gas    51.8 CAISO  2012
 3 15067 gas    52.6 CAISO  2013
 4 25890 oil    51.0 ISONE  2010
 5 26485 gas   416.  PJM    2008
 6 26489 gas   468.  PJM    2012
 7 38153 gas   Inf   SPP    2014
 8 38154 gas   Inf   SPP    2015
 9 38155 gas    67.4 SPP    2016
10 38156 gas    58.8 SPP    2017
# ... with 2,454 more rows
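
Note that both conditions in the filter above index quantile(...)[4], which is the 75% quantile. If the intent is the conventional 1.5 x IQR rule, the lower fence is normally based on the 25% quantile instead. A minimal sketch of that variant, on a hypothetical toy data frame rather than the original data:

```r
library(dplyr)

# Hypothetical toy data: one group with a low and a high outlier.
toy <- data.frame(
  a = "gas",
  d = c(-80, 10, 11, 12, 13, 14, 100)
)

outliers <- toy %>%
  group_by(a) %>%
  filter(
    # Upper fence: third quartile plus 1.5 * IQR.
    d > quantile(d, 0.75, na.rm = TRUE) + 1.5 * IQR(d, na.rm = TRUE) |
      # Lower fence: first quartile minus 1.5 * IQR.
      d < quantile(d, 0.25, na.rm = TRUE) - 1.5 * IQR(d, na.rm = TRUE)
  ) %>%
  ungroup()

outliers$d
```

Passing the probability to quantile() directly avoids having to remember which position in the default output holds which quartile.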

R: combine several apply() function in a pipe

If you break this down step by step it will be easier to understand. The output after getting the minimum value in each column is:

library(dplyr)
library(palmerpenguins)

penguins_raw %>%
  select_if(is.numeric) %>%
  apply(2, min, na.rm = TRUE)

# Sample Number  Culmen Length (mm)   Culmen Depth (mm) Flipper Length (mm) 
#          1.00               32.10               13.10              172.00 

#      Body Mass (g)   Delta 15 N (o/oo)   Delta 13 C (o/oo) 
#            2700.00                7.63              -27.02

You are then applying apply(2, max, na.rm = TRUE) to this output, which is not what you want: you want the maximum of each column of the penguins_raw dataset, not of the output above.
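
The problem can be reproduced in base R without any pipes. In this sketch the built-in mtcars data stands in for penguins_raw, and the choice of columns is arbitrary:

```r
# Built-in data set used as a stand-in for penguins_raw.
num <- mtcars[, c("mpg", "hp", "wt")]

mins <- apply(num, 2, min, na.rm = TRUE)
is.null(dim(mins))  # TRUE: a named vector, so there are no columns left to apply over

# Computing both summaries from the original data and binding them works.
both <- rbind(
  min = apply(num, 2, min, na.rm = TRUE),
  max = apply(num, 2, max, na.rm = TRUE)
)
both
```

Each summary has to start from the original data; the result of the first apply is no longer a table.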

If you are using pipes and dplyr functions, there are dedicated functions for this kind of calculation. In this case you can use across:

penguins_raw %>%
  summarise(across(where(is.numeric), list(min = ~min(., na.rm = TRUE), 
                                           max = ~max(., na.rm = TRUE))))

Or, if you are on an older version of dplyr, use summarise_if:

penguins_raw %>%
  summarise_if(is.numeric, list(min = ~min(., na.rm = TRUE),
                                max = ~max(., na.rm = TRUE)))

To get the data into a 3-column format we can use pivot_longer from tidyr (this needs library(tidyr)):

penguins_raw %>%
  summarise(across(where(is.numeric), list(min = ~min(., na.rm = TRUE), 
                                           max = ~max(., na.rm = TRUE)))) %>%
  pivot_longer(cols = everything(), 
               names_to = c('name', '.value'), 
               names_sep = '_')

#  name                    min    max
#  <chr>                 <dbl>  <dbl>
#1 Sample Number          1     152  
#2 Culmen Length (mm)    32.1    59.6
#3 Culmen Depth (mm)     13.1    21.5
#4 Flipper Length (mm)  172     231  
#5 Body Mass (g)       2700    6300  
#6 Delta 15 N (o/oo)      7.63   10.0
#7 Delta 13 C (o/oo)    -27.0   -23.8

Turn string into the contents of a function in R for pipeline

I don't think there's a way to write the unstring function you want, but you can do this:

makeContrastsFromString <- function(s)
  eval(parse(text = paste("makeContrasts(", s, ")")))

then

makeContrastsFromString(aarts_2)

should give you what you want. I haven't tested it, since I can't install limma to get makeContrasts. My function is pretty fragile; if a user breaks up the lines into separate elements of a string vector, it won't work. I'll leave it to you to make it robust against that kind of thing.
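
One possible way to handle a multi-element string vector is to collapse it before parsing. This is only a sketch: call_from_string is a hypothetical generalization, and c stands in for makeContrasts so that the example runs without limma:

```r
# Hypothetical generalization: build and evaluate a call to `fname` from text.
call_from_string <- function(fname, s) {
  # Collapse a multi-element character vector into one argument list.
  args <- paste(s, collapse = ", ")
  eval(parse(text = paste0(fname, "(", args, ")")))
}

# `c` stands in for makeContrasts, which needs the limma package.
call_from_string("c", "a = 1, b = 2")
call_from_string("c", c("a = 1", "b = 2"))
```

Both calls should produce the same named vector. Keep in mind that eval(parse(text = ...)) runs whatever code the string contains, so it should only be used with trusted input.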
