Use pipe operator %>% with replacement functions like colnames()<-
You could use `colnames<-` or setNames (thanks to @David Arenburg):
library(dplyr)
group_by(mtcars, cyl) %>%
summarise(mean(disp), mean(hp)) %>%
`colnames<-`(c("cyl", "disp_mean", "hp_mean"))
# or
# `names<-`(c("cyl", "disp_mean", "hp_mean"))
# setNames(., c("cyl", "disp_mean", "hp_mean"))
# cyl disp_mean hp_mean
# 1 4 105.1364 82.63636
# 2 6 183.3143 122.28571
# 3 8 353.1000 209.21429
Or pick an alias (set_colnames) from magrittr:
library(magrittr)
group_by(mtcars, cyl) %>%
summarise(mean(disp), mean(hp)) %>%
set_colnames(c("cyl", "disp_mean", "hp_mean"))
dplyr::rename may be more convenient if you are only (re)naming a few out of many columns (it requires writing both the old and the new name; see @Richard Scriven's answer).
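As a rough sketch of the rename route mentioned above (the new names cylinders and horsepower are made up for illustration), only the columns you list are touched and the rest keep their names:

```r
library(dplyr)

# Rename two of mtcars' columns by name; every other column is left as-is.
# The new names are illustrative assumptions, not from the original answer.
renamed <- mtcars %>%
  rename(cylinders = cyl, horsepower = hp)

names(renamed)[1:4]
```

The syntax is new_name = old_name, which is why this form needs both names spelled out.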
Transform data to data.frame with the pipe operator
After the transpose (t), convert to a tibble with as_tibble and change the column names with setNames:
library(dplyr)
library(tibble)
x %>%
t %>%
as_tibble(.name_repair = "unique") %>%
setNames(c("a", "b"))
# A tibble: 1 x 2
# a b
# <int> <int>
#1 1 2
Another option, if we want to keep the OP's syntax, is to wrap the code in {}:
x %>%
{data.frame(a = .[1], b = .[2])}
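To see why the braces matter, here is a small sketch that assumes x is the two-element integer vector 1:2 (the original input is not shown, so this is a guess). Without braces, magrittr still inserts the left-hand side as the first argument because the dot only appears inside nested calls:

```r
library(magrittr)

x <- 1:2  # assumed input; the original x is not shown in the answer

# With braces: the dot is used only where we write it.
with_braces <- x %>% {data.frame(a = .[1], b = .[2])}

# Without braces: x is also passed as the first argument, giving an extra column.
without_braces <- x %>% data.frame(a = .[1], b = .[2])

ncol(with_braces)
ncol(without_braces)
```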
Update row names of a data frame in a pipe (%>%)
Use the `rownames<-`() replacement function.
library(magrittr)
d %>% `rownames<-`(NULL)
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Data:
d <- structure(list(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12), class = "data.frame", row.names = c("a",
"b", "c"))
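If the backtick form feels awkward, magrittr also exports set_rownames as an alias for `rownames<-`. A minimal sketch, using a smaller stand-in for d:

```r
library(magrittr)

# A cut-down stand-in for the data frame d above (two columns instead of four).
d_small <- data.frame(X1 = 1:3, X2 = 4:6, row.names = c("a", "b", "c"))

# set_rownames(NULL) resets the row names to the default 1, 2, 3.
reset <- d_small %>% set_rownames(NULL)

rownames(reset)
```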
How to use the R pipe operator (%>%) in the following cases
If myvar is just a variable in the environment, you can use an if/else statement within mutate.
library(dplyr)
# Generate dataset
df <- tibble(oldColumn = rnorm(100))
# Mutate with if-else conditions
df <- df %>% mutate(newColumn = if(myvar == "A") oldColumn else if(myvar=="B") oldColumn * 3)
If myvar is a column in the data frame, you can use case_when.
# Generate dataset
df <- tibble(myvar = sample(c("A", "B"), 100, replace = TRUE),
oldColumn = rnorm(100))
# Create a new column which depends on the value of myvar
df <- df %>%
mutate(newColumn = case_when(myvar == "A" ~ oldColumn*3,
myvar == "B" ~ oldColumn))
As for question 2, you can use mutate with the "." placeholder, which refers to the left-hand side of the pipe (i.e. "df") inside the right-hand side call. You can then filter down to the row with the minimum value of seconds (top_n with -1 as its argument) and pull out the value of the numbers variable.
# Generate data
df <- tibble(numbers = sample(1:60),
seconds = sample(1:60))
# Do computation
df <- df %>% mutate(newCol = numbers / top_n(.,-1,seconds) %>% pull(numbers))
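For comparison, a sketch of the same computation without the dot, indexing numbers by which.min(seconds) inside mutate. This is an alternative I am suggesting, not part of the original answer, and it assumes seconds has a unique minimum (true here because the values are a permutation of 1:60):

```r
library(dplyr)

set.seed(1)  # fixed seed so the two versions see the same data
df <- tibble(numbers = sample(1:60),
             seconds = sample(1:60))

# Version from the answer: the dot refers to df inside the nested call.
with_dot <- df %>%
  mutate(newCol = numbers / top_n(., -1, seconds) %>% pull(numbers))

# Alternative: pick the value of numbers at the row where seconds is smallest.
direct <- df %>%
  mutate(newCol = numbers / numbers[which.min(seconds)])

all.equal(with_dot$newCol, direct$newCol)
```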
R combinations with dot ( . ), ~ , and pipe (%>%) operator
That line uses the . in three different ways.
         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
Generally speaking, you pass the value from the pipe into your function at a specific location with ., but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so it behaves as it would without the pipe. For example:
aggregate(. ~ cyl, data=mydata)
And that's just because aggregate requires a formula with both a left- and a right-hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.
The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So
. %>% mean %>% round(2)
is the same as
function(x) round(mean(x), 2)
So you're just creating a custom function with the . at [3].
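A quick way to convince yourself of that equivalence is to build both versions and compare them on some made-up numbers (the names round_mean, explicit and vals are hypothetical):

```r
library(magrittr)

# A chain that starts with the dot is stored as a function.
round_mean <- . %>% mean %>% round(2)

# The hand-written equivalent.
explicit <- function(x) round(mean(x), 2)

vals <- c(1.234, 2.345, 3.456)  # made-up input
identical(round_mean(vals), explicit(vals))
```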
Pipe operator using in R
You want to do:
`colnames<-`(c("Improved", "Hospitalized", "Death"))
Note the backticks and no spaces, as `colnames<-` is actually a function.
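As a small illustration on a made-up matrix, the ordinary assignment form and the piped call to `colnames<-` give the same result, because colnames(x) <- value is itself a call to that function:

```r
library(magrittr)

m <- matrix(1:6, ncol = 3)  # hypothetical data, just to have three columns

# Ordinary replacement syntax.
assigned <- m
colnames(assigned) <- c("Improved", "Hospitalized", "Death")

# The same replacement function called directly at the end of a pipe.
piped <- m %>% `colnames<-`(c("Improved", "Hospitalized", "Death"))

identical(assigned, piped)
```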
Access result later in pipe
This is due to R's lazy evaluation. It occurs even if pipes are not used; see the code below. In that code the argument to n_excluded is filter(n_before(iris), Species != 'setosa'). At the point where rows is used in the print statement, n_excluded has not yet referenced its argument, so the argument (including the call to n_before) has not been evaluated and rows does not yet exist.
if (exists("rows")) rm(rows) # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'x' in selecting a method for function
## 'print': object 'rows' not found
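The evaluation order can be made visible with a tiny base R sketch. The helpers record_inner and record_outer are hypothetical stand-ins for n_before and n_excluded; each logs when its body runs:

```r
events <- character()

# Logs when it is actually evaluated, then returns its input.
record_inner <- function(x) {
  events <<- c(events, "inner ran")
  x
}

# Logs first, and only then touches its argument.
record_outer <- function(x) {
  events <<- c(events, "outer body started")
  x  # the argument (and so record_inner) is evaluated here, not before
}

result <- record_outer(record_inner(1))
events
```

The outer function's body starts before the inner call is evaluated, which is the same ordering that leaves rows undefined in the example above.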
To fix this:
1) We can force x before the print statement.
n_excluded = function(x) {
force(x)
print(rows - nrow(x))
return(x)
}
2) Alternatively, we can use the magrittr eager pipe, which guarantees that the steps are run in order. magrittr exports it as a function but does not provide an operator for it, so we assign it to an operator ourselves like this.
`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
n_before() %>%
filter(Species != 'setosa') %s>% # note use of %s>% on this line
n_excluded()
The magrittr developer has stated that he will add it as an operator if there is sufficient demand for it, so you might want to add such a request to magrittr issue #247 on GitHub.
How can I write this R expression in the pipe operator format?
It would be good practice to make a reproducible example, with dummy data like this:
height <- 1:30
weight <- 1:30
df <- data.frame(height, weight)
These pipe operators work with the majority of the tidyverse (not just magrittr). What you are trying to do actually comes from dplyr. The na.rm = TRUE argument is needed for many summary functions such as mean and sd, as well as functions that pick out specific data points such as min and max, because they return NA when the data contain NA values.
df %>% pull(height) %>% mean(na.rm = TRUE) %>% print()
Unless your data is nested, you may not even need to use pull:
df %>% summarise(mean = mean(height, na.rm = TRUE))
Also, using summarise you can pipe the results into another data frame rather than just printing them, and pull them out of that data frame whenever you want.
df %>% summarise(meanHt = mean(height, na.rm = TRUE), sdHt = sd(height, na.rm = TRUE)) -> height_summary
height_summary[1]
height_summary[2]
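The dummy data above has no missing values, so here is a small sketch with a made-up vector that does, to show what na.rm changes in a pipe:

```r
library(magrittr)

heights <- c(150, 160, NA, 170)  # made-up values with one missing entry

without_na_rm <- heights %>% mean()              # NA: the missing value propagates
with_na_rm <- heights %>% mean(na.rm = TRUE)     # 160: NA is dropped before averaging

without_na_rm
with_na_rm
```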