Use Pipe Operator %>% with Replacement Functions Like colnames()<-

Use pipe operator %>% with replacement functions like colnames()<-

You could use colnames<- or setNames (thanks to @David Arenburg)

library(dplyr)
group_by(mtcars, cyl) %>%
  summarise(mean(disp), mean(hp)) %>%
  `colnames<-`(c("cyl", "disp_mean", "hp_mean"))
# or
# `names<-`(c("cyl", "disp_mean", "hp_mean"))
# setNames(., c("cyl", "disp_mean", "hp_mean"))

# cyl disp_mean hp_mean
# 1 4 105.1364 82.63636
# 2 6 183.3143 122.28571
# 3 8 353.1000 209.21429
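These variants are interchangeable because R evaluates a replacement call such as colnames(x) <- value as x <- `colnames<-`(x, value), so the backticked name is an ordinary function whose first argument the pipe can fill. A minimal sketch of that equivalence, using a small slice of mtcars and new column names that are chosen here only for illustration:

```r
library(magrittr)

# Small illustrative data frame (the column choice is arbitrary).
small <- head(mtcars[, c("cyl", "disp", "hp")], 3)
new_names <- c("cylinders", "displacement", "horsepower")

# Replacement function called directly; the pipe supplies the first argument.
by_colnames <- small %>% `colnames<-`(new_names)

# For a data frame, names() and colnames() refer to the same attribute.
by_names <- small %>% `names<-`(new_names)

# setNames() (stats package, shipped with R) returns the renamed object.
by_setnames <- small %>% setNames(new_names)

# The conventional assignment form, for comparison.
by_assignment <- small
colnames(by_assignment) <- new_names

print(by_colnames)
```

None of the piped forms modifies small itself; each returns a renamed copy.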

Or pick an alias (set_colnames) from magrittr:

library(magrittr)
group_by(mtcars, cyl) %>%
  summarise(mean(disp), mean(hp)) %>%
  set_colnames(c("cyl", "disp_mean", "hp_mean"))

dplyr::rename may be more convenient if you are only (re)naming a few out of many columns. It requires writing both the old and the new name; see @Richard Scriven's answer.
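As a rough illustration of the rename approach, the sketch below renames two mtcars columns and leaves the rest alone. The new names (cylinders, horsepower) are made up for the example.

```r
library(dplyr)

# Rename only the columns of interest; the syntax is new_name = old_name.
renamed <- mtcars %>%
  rename(cylinders = cyl, horsepower = hp)

# All other column names are left untouched.
print(names(renamed))
```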

Transform data to data.frame with the pipe operator

After the transpose, convert to a tibble with as_tibble and change the column names with setNames:

library(dplyr)
library(tibble)
x %>%
  t %>%
  as_tibble(.name_repair = "unique") %>%
  setNames(c("a", "b"))
# A tibble: 1 x 2
# a b
# <int> <int>
#1 1 2

Another option, if we want to keep the OP's syntax, is to wrap the code in {}:

x %>%
  {data.frame(a = .[1], b = .[2])}
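The braces matter because magrittr otherwise inserts the piped value as the first argument whenever `.` appears only inside nested calls. The sketch below compares both forms. It assumes x is a length-two integer vector, which the tibble output above suggests but the text never states.

```r
library(magrittr)

# Assumption: x is a length-two integer vector (it is not defined in the text).
x <- c(1L, 2L)

# Braces stop the pipe from inserting x as the first argument of data.frame();
# inside them, `.` refers to the piped value.
wrapped <- x %>%
  {data.frame(a = .[1], b = .[2])}

# Without braces, x is still passed as the first argument because `.` only
# appears inside nested calls, so an extra column is created.
unwrapped <- x %>%
  data.frame(a = .[1], b = .[2])

print(wrapped)
print(names(unwrapped))
```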

Update row names of a data frame in a pipe (%>%)

Use the `rownames<-`() assignment function:

library(magrittr)
d %>% `rownames<-`(NULL)
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
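A small sketch of the same idea, using a cut-down two-column version of d so that it runs on its own. It also uses set_rownames, the alias magrittr exports for `rownames<-`. The labels r1 to r3 are arbitrary.

```r
library(magrittr)

d <- data.frame(X1 = 1:3, X2 = 4:6, row.names = c("a", "b", "c"))

# Function form of `rownames(d) <- NULL`; returns a modified copy.
reset <- d %>% `rownames<-`(NULL)

# magrittr also exports set_rownames() as an alias for `rownames<-`.
relabelled <- d %>% set_rownames(c("r1", "r2", "r3"))

print(rownames(reset))  # row names fall back to the default sequence
print(rownames(d))      # the original d is unchanged
```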

Data:

d <- structure(list(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12), class = "data.frame", row.names = c("a", 
"b", "c"))

How to use the R pipe operator (%>%) in the following cases

If myvar is just a variable in the environment, you can use an if/else statement within mutate:

library(dplyr)
# Generate dataset
df <- tibble(oldColumn = rnorm(100))
# Mutate with if-else conditions
df <- df %>%
  mutate(newColumn = if (myvar == "A") oldColumn else if (myvar == "B") oldColumn * 3)
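The snippet above does not define myvar, so the following sketch sets it explicitly to show what happens. Because myvar is a single value rather than a column, the if/else is evaluated once for the whole column. If myvar matched neither branch, the expression would return NULL and mutate would add no column. The value "B" and the seed are assumptions made only for this example.

```r
library(dplyr)

set.seed(1)    # only for reproducibility
myvar <- "B"   # assumption: a plain variable in the calling environment
df <- tibble(oldColumn = rnorm(5))

# The condition is checked once, then the chosen expression fills the column.
df <- df %>%
  mutate(newColumn = if (myvar == "A") oldColumn else if (myvar == "B") oldColumn * 3)

print(df)
```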

If myvar is included as a column in the data frame, then you can use case_when.

# Generate dataset
df <- tibble(myvar = sample(c("A", "B"), 100, replace = TRUE),
             oldColumn = rnorm(100))

# Create a new column which depends on the value of myvar
df <- df %>%
  mutate(newColumn = case_when(myvar == "A" ~ oldColumn * 3,
                               myvar == "B" ~ oldColumn))

As for question 2, you can use mutate with the "." placeholder, which refers to the left-hand side of the pipe (here df) inside the right-hand-side call. You can then filter down to the row with the minimum value of seconds (top_n with -1 as its argument) and pull out the value of the numbers variable:

# Generate data
df <- tibble(numbers = sample(1:60),
             seconds = sample(1:60))
# Do computation
df <- df %>%
  mutate(newCol = numbers / top_n(., -1, seconds) %>% pull(numbers))
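For comparison, here is a sketch that sets the version above next to an alternative without the "." placeholder. The alternative indexes numbers at the position of the smallest seconds with which.min. That is a swapped-in technique, not part of the original answer, and it assumes the minimum of seconds is unique, as it is when seconds is a permutation.

```r
library(dplyr)

set.seed(42)   # only for reproducibility
df <- tibble(numbers = sample(1:60),
             seconds = sample(1:60))

# Version from the text: `.` is the data frame on the left of the pipe.
with_dot <- df %>%
  mutate(newCol = numbers / top_n(., -1, seconds) %>% pull(numbers))

# Alternative sketch: index numbers at the position of the smallest seconds.
with_which_min <- df %>%
  mutate(newCol = numbers / numbers[which.min(seconds)])

print(head(with_which_min))
```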

R combinations with dot ( . ), ~ , and pipe (%>%) operator

That line uses the . in three different ways.

         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

Generally speaking, you pass the value from the pipe into your function at a specific location with ., but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so it behaves as it would without the pipe. For example:

aggregate(. ~ cyl, data=mydata)

And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.

The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.

The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So

. %>% mean %>% round(2)

is the same as

function(x) round(mean(x), 2)

So you're just creating a custom function with the . at [3].
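The equivalence can be checked directly. In the sketch below the names round_mean, round_mean_explicit and values are hypothetical, and mtcars stands in for the piped data. It builds the function both ways and then passes it to aggregate, with data = . receiving the piped data frame.

```r
library(magrittr)

# A pipe chain that starts with `.` is a function of one argument.
round_mean <- . %>% mean %>% round(2)

# Explicit equivalent, written as an ordinary function.
round_mean_explicit <- function(x) round(mean(x), 2)

values <- c(1.234, 2.345, 3.456)
print(round_mean(values))
print(round_mean_explicit(values))

# The same function can be passed to aggregate(); `data = .` receives mtcars,
# while the `.` in the formula still means "all other columns".
by_cyl <- mtcars %>% aggregate(. ~ cyl, data = ., FUN = round_mean)
print(by_cyl)
```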

Pipe operator using in R

You want to do:

`colnames<-`(c("Improved", "Hospitalized", "Death"))

Note the backticks and no spaces as colnames<- is actually a function.

Access result later in pipe

This is due to R's lazy evaluation, and it occurs even if pipes are not used. In the code below, the argument to n_excluded is filter(n_before(iris), Species != 'setosa'). When rows is used in the print statement, that argument has not yet been referenced from within n_excluded, so it has not been evaluated, n_before has not run, and rows does not yet exist.

if (exists("rows")) rm(rows)  # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'x' in selecting a method for function
## 'print': object 'rows' not found

To fix this

1) we can force x before the print statement.

n_excluded = function(x) {
  force(x)
  print(rows - nrow(x))
  return(x)
}

2) Alternatively, we can use the magrittr sequential pipe, which guarantees that the legs are run in order. magrittr makes it available as a function but does not provide an operator for it, so we can assign it to an operator like this:

`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
  n_before() %>%
  filter(Species != 'setosa') %s>% # note use of %s>% on this line
  n_excluded()

The magrittr developer has stated that he will add it as an operator if there is sufficient demand, so you might want to add such a request to magrittr issue #247 on GitHub.
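The definitions of n_before and the original n_excluded are not shown here, so the sketch below uses hypothetical reconstructions: n_before stores the row count in rows via <<-, and n_excluded prints the difference. It uses base subset instead of filter to avoid a dplyr dependency, and it is meant to be run at the top level of a fresh session. Under those assumptions it reproduces the failure and both fixes.

```r
library(magrittr)

# Hypothetical reconstructions: neither function is defined in the text above.
n_before <- function(x) {
  rows <<- nrow(x)             # record the row count as a side effect
  x
}
n_excluded <- function(x) {
  print(rows - nrow(x))        # `rows` is looked up before x is evaluated
  x
}

if (exists("rows")) rm(rows)   # ensure rows does not exist

# Nested call: the argument is still unevaluated when `rows` is needed.
lazy_outcome <- tryCatch(
  n_excluded(subset(n_before(iris), Species != "setosa")),
  error = function(e) "failed: rows not found"
)

# Fix 1: force the argument first, so n_before() has already run.
n_excluded_forced <- function(x) {
  force(x)
  print(rows - nrow(x))
  x
}
forced_outcome <- n_excluded_forced(subset(n_before(iris), Species != "setosa"))

# Fix 2: an eager pipe evaluates its left-hand side before the next step.
`%s>%` <- magrittr::pipe_eager_lexical
rm(rows)
eager_outcome <- iris %>%
  n_before() %>%
  subset(Species != "setosa") %s>%
  n_excluded()
```

Forcing the argument changes the function itself, whereas the eager pipe leaves n_excluded untouched and changes only how it is called.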

How can I write this R expression in the pipe operator format?

It would be good practice to make a reproducible example, with dummy data like this:

height <- 1:30
weight <- 1:30
df <- data.frame(height, weight)

These pipe operators work with most of the tidyverse, not just magrittr. What you are trying to do actually comes from dplyr. na.rm = TRUE is needed with many summary functions, such as mean and sd, and with functions that extract specific data points, such as min and max, because they return NA when the data contain NA values.

df %>% pull(height) %>% mean(na.rm=T) %>% print()

Unless your data is nested, you may not even need to use pull:

df %>% summarise(mean = mean(height,na.rm=T))

Also, using summarise you can pipe these into another dataframe rather than just printing, and call them out of the dataframe whenever you want.

df %>% summarise(meanHt = mean(height,na.rm=T), sdHt = sd(height,na.rm=T)) -> summary
summary[1]
summary[2]
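The dummy data above contains no missing values, so na.rm has no visible effect there. The sketch below uses made-up heights with one NA to show the difference. The data frame and the name stats are assumptions for illustration only.

```r
library(dplyr)

# Illustrative data with one missing height (values are made up).
df <- data.frame(height = c(150, 160, NA, 170),
                 weight = c(50, 60, 70, 80))

# Without na.rm, a single NA makes the mean NA.
mean_with_na <- df %>% pull(height) %>% mean()

# With na.rm = TRUE the missing value is dropped before summarising.
stats <- df %>%
  summarise(meanHt = mean(height, na.rm = TRUE),
            sdHt = sd(height, na.rm = TRUE))

print(mean_with_na)
print(stats)
```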

