R Pipelining functions
We can use Compose from the functional package to create our own binary operator that does something similar to what you want:
# Define our helper functions
square <- function(x){x^2}
add5 <- function(x){x + 5}
# functional contains Compose
library(functional)
# Define our binary operator
"%|>%" <- Compose
# Create our complexFunction by 'piping' our functions
complexFunction <- square %|>% add5 %|>% as.character
complexFunction(1:5)
#[1] "6" "9" "14" "21" "30"
# previously had this until flodel pointed out
# that the above was sufficient
#"%|>%" <- function(fun1, fun2){ Compose(fun1, fun2) }
Technically we could do this without the functional package, but Compose feels like the right tool for this task.
"%|>%" <- function(fun1, fun2){
  function(x){fun2(fun1(x))}
}
complexFunction <- square %|>% add5 %|>% as.character
complexFunction(1:5)
#[1] "6" "9" "14" "21" "30"
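The same idea extends to any number of functions. As a minimal sketch in base R (the name compose_all is hypothetical and is not part of the functional package), Reduce can thread a value through a list of functions from left to right:

```r
# Helper functions from the example above
square <- function(x) x^2
add5 <- function(x) x + 5

# Hypothetical helper: compose any number of functions, applied left to right
compose_all <- function(...) {
  fs <- list(...)
  # Start from x and feed the running result into each function in turn
  function(x) Reduce(function(acc, f) f(acc), fs, x)
}

complexFunction <- compose_all(square, add5, as.character)
complexFunction(1:5)
#[1] "6"  "9"  "14" "21" "30"
```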
Applying a custom function in an R pipeline
Store the output in a list for each ppt and use unnest_wider to get separate columns.
library(dplyr)
library(tidyr)
data %>%
  group_by(ppt) %>%
  summarise(Pc = mean(correct),
            VRT = var(key_rt[correct == 1]),
            MRT = mean(key_rt[correct == 1]),
            out = list(EZ(Pc, VRT, MRT, n()))) %>%
  unnest_wider(out)
#    ppt    Pc    VRT   MRT      v     a     Ter
#  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>   <dbl>
#1     1   0.6 0.0399 0.500  0.407 0.997  0.255
#2     2   1   0.130  0.391  1.36  1.61  -0.0840
#3     3   0.4 0.282  0.574 -0.249 1.63  -0.0775
I am using n = n(), the number of rows in the group, which I am guessing is what n is supposed to mean here.
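Since EZ is not defined here, the list-column pattern can be tried with a stand-in. The following is only a sketch: range_stats and the toy data frame are hypothetical, and the assumption is that the real function likewise returns a named list.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for EZ(): returns a named list of two values
range_stats <- function(x) list(lo = min(x), hi = max(x))

# Toy data, invented for illustration
toy <- data.frame(ppt = c(1, 1, 2, 2),
                  key_rt = c(0.4, 0.6, 0.3, 0.9))

result <- toy %>%
  group_by(ppt) %>%
  # Wrap the named list in list() so each group contributes one list-column cell
  summarise(out = list(range_stats(key_rt))) %>%
  # Spread the names of each list (lo, hi) into separate columns
  unnest_wider(out)

result
```

Each name in the returned list becomes its own column, so the function can return as many values as needed.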
Piping into `if` returns the pipeline without evaluating it
The syntax . %>% round(.) means function(.) round(.). Any time a dot starts a pipeline, it defines a function rather than an ordinary pipeline. Put parentheses around the dot to prevent it from starting the inner pipeline.
maybe_round = function(x, round = TRUE){
  x %>%
    `if`(
      round,
      (.) %>% round(),
      .
    )
}
maybe_round(5.3, TRUE)
## [1] 5
Another possibility is to just leave it as a function and then evaluate that function at the outer dot like this:
maybe_round = function(x, round = TRUE){
  x %>%
    `if`(
      round,
      (. %>% round())(.),
      .
    )
}
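The point that a dot-led pipeline is a function can be checked directly. A small sketch (the name rounder is made up for this illustration):

```r
library(magrittr)

# A pipeline that starts with a bare dot is a function, not a value
rounder <- . %>% round()
is.function(rounder)
#[1] TRUE

# Calling it applies the pipeline to its argument
rounder(5.3)
#[1] 5
```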
R Function to import data set and pipeline create variables based on field name/existence
Solution
Simply use regex to change the column names:
temp_set <- read_table(input_path)
names(temp_set) <- gsub(x = names(temp_set), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX")
Or equivalently with `names<-`() in the dplyr workflow:
temp_set <- read_table(input_path) %>%
`names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
Regex
The pattern = "^(.+)(\\d{4,4})$" breaks each name into two capturing groups:
- Any prefix of positive length: .+
- Some year comprised of 4 digits: \\d{4,4}
The replacement = "\\1XXXX" then prepends the first group (\\1) to the code (XXXX); so the code essentially "replaces" the year.
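To see how the two groups behave on less tidy names, here is a short sketch in base R. The names in nms are hypothetical and were picked to probe edge cases:

```r
pattern <- "^(.+)(\\d{4,4})$"

# Hypothetical column names chosen to probe edge cases
nms <- c("MSA2003", "other_var", "2003", "x12345")

renamed <- gsub(pattern = pattern, replacement = "\\1XXXX", x = nms)
renamed
#[1] "MSAXXXX"   "other_var" "2003"      "x1XXXX"
```

A bare "2003" is left alone because the prefix group needs at least one character, and in "x12345" only the last four digits count as the year.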
Example
Here are two possible cases, where the MSAXXXX column starts as MSA2003 and as MSA2013 respectively:
case_1 <- data.frame(
MSA2003 = c(41929, 33820, 27642, 88111),
var2019 = c(41929, 33820, 27642, 88111),
other_var = 1:4
)
case_1
#> MSA2003 var2019 other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
case_2 <- data.frame(
MSA2013 = c(41929, 33820, 27642, 88111),
var2009 = c(41929, 33820, 27642, 88111),
other_var = 1:4
)
case_2
#> MSA2013 var2009 other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
Notice how the solution standardizes all variables with years in their names, yet leaves the other variables untouched:
library(dplyr)
case_1 %>%
`names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#> MSAXXXX varXXXX other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
case_2 %>%
`names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#> MSAXXXX varXXXX other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
R Pipelining with Anonymous Functions
Using Compose, and calling the resulting function gives this:
"%|>%" <- function(...) Compose(...)()
Now drop the bare 'x' as the final "function". Below it is replaced with an actual identity function, which is not needed but is included for illustration:
anonymousPipelineTest <-
function(x){x^2} %|>%
function(x){x+5} %|>% function(x){x}
anonymousPipelineTest(1:10)
#[1] 6 9 14 21 30 41 54 69 86 105
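For context on why anonymous functions behave differently from named ones here: an unparenthesized function(x) ... absorbs everything to its right into its body, so the operator is only called when the outer function runs. A sketch in base R, using a hypothetical %then% operator defined just for this illustration:

```r
# Hypothetical composition operator, defined only for this illustration
`%then%` <- function(f, g) function(x) g(f(x))

# Without parentheses, the function body swallows the rest of the line
f <- function(x) {x^2} %then% sqrt
body(f)[[1]]   # the symbol `%then%`: the operator call is the function body

# With parentheses, the operator receives two functions, as intended
g <- (function(x) x^2) %then% sqrt
g(1:3)
#[1] 1 2 3
```

Wrapping each anonymous function in parentheses is therefore an alternative to having the operator call the composed function itself.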
How do you call a function that takes no inputs within a pipeline?
With the ‘magrittr’ pipe operator you can put an operand inside {…} to prevent automatic argument substitution:
c(1,3,5) %>% {ls()} %>% mean()
# NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA
… but of course this serves no useful purpose.
Incidentally, ls() inside a pipeline is executed in its own environment rather than the calling environment, so its use here is even less useful. But a different function that returned a sensible value could be used, e.g.:
c(1,3,5) %>% {rnorm(10)} %>% mean()
# [1] -0.01068046
Or, if you intended for the left-hand side to be passed on, skipping the intermediate ls(), you could do the following:
c(1,3,5) %>% {ls(); .} %>% mean()
# [1] 3
… again, using ls() here won’t be meaningful but some other function that has a side-effect would work.
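As a sketch of that last case, message() can stand in for a function that is called only for its side effect (the message text is arbitrary):

```r
library(magrittr)

# message() is called only for its side effect; the trailing dot hands
# the original left-hand side on to the next step
result <- c(1, 3, 5) %>% {message("about to average"); .} %>% mean()
result
#[1] 3
```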
dplyr pipeline in a function
With rlang 0.4.0, we can do this much more easily by using the {{...}} or curly-curly operator:
library(rlang)
library(dplyr)
n_occurences <- function(df, n1, column){
  df %>%
    filter(!is.na({{column}})) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}
n_occurences(dat, n1 = 3, column = B)
#[1] Y
#Levels: X Y
If we intend to pass a quoted string, convert it to a symbol (sym) and then do the evaluation (!!):
n_occurences <- function(df, n1, column){
  column <- rlang::sym(column)
  df %>%
    filter(!is.na(!!column)) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}
col <- 'B'
n_occurences(dat, n1=3, column = col)
#[1] Y
#Levels: X Y
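Another option for string input is the .data pronoun, which avoids the explicit conversion to a symbol. This is a sketch only: dat is not shown in the original, so the toy data frame below is invented, and n_occurrences_str is a hypothetical name.

```r
library(dplyr)

# Toy data, invented for illustration: Y has four non-missing values of B
dat <- data.frame(ID = c("X", "X", "Y", "Y", "Y", "Y"),
                  B  = c(1, NA, 1, 2, 3, 4))

# Hypothetical variant that takes the column name as a string
n_occurrences_str <- function(df, n1, column) {
  df %>%
    filter(!is.na(.data[[column]])) %>%  # look the column up by name
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}

ids <- n_occurrences_str(dat, n1 = 3, column = "B")
ids
#[1] "Y"
```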
R using which function after group_by
You could use
library(dplyr)
data %>%
  group_by(a, b, c) %>%
  filter(
    d > quantile(d, na.rm = TRUE)[4] + 1.5 * IQR(d, na.rm = TRUE) |
    d < quantile(d, na.rm = TRUE)[4] - 1.5 * IQR(d, na.rm = TRUE))
This returns:
# A tibble: 2,464 x 5
...1 a d b c
<dbl> <chr> <dbl> <chr> <dbl>
1 10533 gas 321. CAISO 2011
2 10534 gas 51.8 CAISO 2012
3 15067 gas 52.6 CAISO 2013
4 25890 oil 51.0 ISONE 2010
5 26485 gas 416. PJM 2008
6 26489 gas 468. PJM 2012
7 38153 gas Inf SPP 2014
8 38154 gas Inf SPP 2015
9 38155 gas 67.4 SPP 2016
10 38156 gas 58.8 SPP 2017
# ... with 2,454 more rows
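If the intent is the conventional 1.5 × IQR outlier rule, the lower fence is usually taken from the first quartile rather than the third. The following is a sketch of that variant under that assumption, on toy data invented for illustration; it is not a rerun of the result above.

```r
library(dplyr)

# Toy data, invented for illustration: one high and one low outlier
toy <- data.frame(g = rep(c("a", "b"), each = 6),
                  d = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, -50))

outliers <- toy %>%
  group_by(g) %>%
  filter(
    # above the third quartile plus 1.5 * IQR ...
    d > quantile(d, 0.75, na.rm = TRUE) + 1.5 * IQR(d, na.rm = TRUE) |
    # ... or below the first quartile minus 1.5 * IQR
    d < quantile(d, 0.25, na.rm = TRUE) - 1.5 * IQR(d, na.rm = TRUE))

outliers$d
#[1] 100 -50
```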
R: combine several apply() function in a pipe
If you break this down step by step, it will be easier to understand. The output after getting the minimum value in each column is:
library(dplyr)
library(palmerpenguins)
penguins_raw %>%
select_if(is.numeric) %>%
apply(2, min, na.rm = TRUE)
# Sample Number Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm)
# 1.00 32.10 13.10 172.00
# Body Mass (g) Delta 15 N (o/oo) Delta 13 C (o/oo)
# 2700.00 7.63 -27.02
Now you are applying apply(2, max, na.rm = TRUE) to this output, which is not what you want: you want the max for each column of the penguins_raw dataset, not of the output above.
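Both statistics have to be computed from the original columns. As a base R sketch of that idea (the data frame df is a hypothetical stand-in for penguins_raw):

```r
# Toy data, invented for illustration
df <- data.frame(a = c(1, 5, NA),
                 b = c(10, 2, 7),
                 label = c("x", "y", "z"))

# Keep only the numeric columns
num <- df[sapply(df, is.numeric)]

# range() returns c(min, max); compute it once per original column
stats <- t(sapply(num, range, na.rm = TRUE))
colnames(stats) <- c("min", "max")
stats
#   min max
# a   1   5
# b   2  10
```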
If you are using pipes and dplyr functions, there are dedicated functions to perform such calculations. In this case you can use across.
penguins_raw %>%
  summarise(across(where(is.numeric), list(min = ~min(., na.rm = TRUE),
                                           max = ~max(., na.rm = TRUE))))
Or if you are on an older version of dplyr, use summarise_if:
penguins_raw %>%
  summarise_if(is.numeric, list(min = ~min(., na.rm = TRUE),
                                max = ~max(., na.rm = TRUE)))
To get the data into a 3-column format we can use pivot_longer from tidyr.
library(tidyr)
penguins_raw %>%
  summarise(across(where(is.numeric), list(min = ~min(., na.rm = TRUE),
                                           max = ~max(., na.rm = TRUE)))) %>%
  pivot_longer(cols = everything(),
               names_to = c('name', '.value'),
               names_sep = '_')
# name min max
# <chr> <dbl> <dbl>
#1 Sample Number 1 152
#2 Culmen Length (mm) 32.1 59.6
#3 Culmen Depth (mm) 13.1 21.5
#4 Flipper Length (mm) 172 231
#5 Body Mass (g) 2700 6300
#6 Delta 15 N (o/oo) 7.63 10.0
#7 Delta 13 C (o/oo) -27.0 -23.8
Turn string into the contents of a function in R for pipeline
I don't think there's a way to write the unstring function you want, but you can do this:
makeContrastsFromString <- function(s)
eval(parse(text = paste("makeContrasts(", s, ")")))
then
makeContrastsFromString(aarts_2)
should give you what you want. I haven't tested it, since I can't install limma to get makeContrasts. My function is pretty fragile; if a user breaks up the lines into separate elements of a string vector, it won't work. I'll leave it to you to make it robust against that kind of thing.
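One way to handle the multi-element case is to collapse the vector into a single string before parsing. This is only a sketch: call_from_string is a hypothetical generalization, and base R's c() stands in for makeContrasts so that limma is not needed to run it.

```r
# Hypothetical generalization: call a function by name with arguments given as text
call_from_string <- function(fname, s) {
  # Collapse a multi-element character vector into one string first
  args_text <- paste(s, collapse = " ")
  eval(parse(text = paste0(fname, "(", args_text, ")")))
}

# The arguments are split over two vector elements, as a user might write them
res <- call_from_string("c", c("a = 1,", "b = 2"))
res
# a b
# 1 2
```

Keep in mind that eval(parse(...)) runs whatever code the string contains, so this approach is only suitable for trusted input.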