Save Output Between Pipes in Dplyr

Save output between pipes in dplyr

Thanks for the help. I found a better solution using braces {} and ->> (rightward super-assignment). See below:

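   library(dplyr)

   c = cars %>% mutate(var1 = dist * speed) %>%
   {. ->> b } %>%   # save the intermediate data frame as b
   summary()
   c         # summary of the mutated data
   head(b)   # intermediate data frame saved mid-pipe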

Assign intermediate output to temp variable as part of dplyr pipeline

This does not create an object in the global environment:

df %>% 
   filter(b < 3) %>% 
   { 
     { . -> tmp } %>% 
     mutate(b = b*2) %>% 
     bind_rows(tmp) 
   }

This can also be used for debugging: use . ->> tmp instead of . -> tmp so that tmp is still available after the pipeline finishes, or insert this into the pipeline:

{ browser(); . } %>%
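The local-assignment pattern above can be tried on a small, made-up data frame. This is only a minimal sketch: the data frame df and its columns a and b are assumptions for illustration, and the final check assumes no object called tmp already exists in the session.

```r
library(dplyr)

# Hypothetical data, not from the original question
df <- data.frame(a = 1:4, b = c(1, 2, 3, 4))

res <- df %>%
  filter(b < 3) %>%
  {
    { . -> tmp } %>%          # tmp is assigned locally, inside the braces
      mutate(b = b * 2) %>%
      bind_rows(tmp)          # doubled rows first, then the saved originals
  }

res
exists("tmp")  # tmp was local to the braces, so it should not be found here
```

Swapping . -> tmp for . ->> tmp in the same sketch should leave tmp behind for inspection after the pipeline has run.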

Pipe output of one data.frame to another using dplyr

Below is a tidyverse (dplyr, tidyr, and purrr) solution that I hope will help.

Note that the use of map_df in the last line returns all results as a data frame. If you'd prefer it to be a list object for each group, then simply use map.

library(dplyr)
library(tidyr)
library(purrr)

# Save unique groups for later use
P_Groups <- unique(P_Lookup$Group)

# Convert lookup table to product IDs and Groups
P_Lookup <- P_Lookup %>% 
              gather(ProductIDn, ProductID, ProductID1, ProductID2) %>% 
              select(ProductID, Group) %>% 
              distinct() %>% 
              nest(-ProductID, .key = Group)

# Bind Group information to transactions
# and group for next analysis
P_Trans <- P_Trans %>%
             left_join(P_Lookup) %>%
             filter(!map_lgl(Group, is.null)) %>%  
             unnest(Group) %>% 
             group_by(TransactionID)

# Iterate through Groups to produce results
map(P_Groups, ~ filter(P_Trans, Group == .)) %>% 
  map(~ mutate(., No_of_PIDs = n_distinct(ProductType))) %>% 
  map_df(~ filter(., No_of_PIDs > 1))
#> Source: local data frame [12 x 5]
#> Groups: TransactionID [4]
#> 
#>    TransactionID ProductID ProductType  Group No_of_PIDs
#>            <chr>     <chr>       <dbl>  <chr>      <int>
#> 1             a1         A           1 Group1          2
#> 2             a1         B           1 Group1          2
#> 3             a1         1           2 Group1          2
#> 4             a2         C           1 Group2          2
#> 5             a2         4           2 Group2          2
#> 6             a2         5           2 Group2          2
#> 7             a3         D           1 Group3          2
#> 8             a3         C           1 Group3          2
#> 9             a3         7           2 Group3          2
#> 10            a3         8           2 Group3          2
#> 11            a6         H           1 Group5          2
#> 12            a6        15           2 Group5          2
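To make the map versus map_df remark concrete, here is a small sketch on the built-in mtcars data rather than the P_Trans and P_Lookup tables from the question. The vector cyl_groups and the filter on cyl are assumptions chosen only for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical grouping values; mtcars has cyl values 4, 6 and 8
cyl_groups <- c(4, 6, 8)

# map() keeps one data frame per group in a list
as_list <- map(cyl_groups, ~ filter(mtcars, cyl == .x))

# map_df() row-binds the per-group results into a single data frame
as_df <- map_df(cyl_groups, ~ filter(mtcars, cyl == .x))

length(as_list)       # one element per group
is.data.frame(as_df)  # a single combined data frame
```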

output two dataframes inside dplyr pipes

Try the following code:

df <- read.csv(file) %>%
  mutate(....) %>%
  mutate(....)

# save df state to new df2 object here
df2 <- df

df %>% group_by(....) %>%
  arrange(var) %>%
  summary()

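Inside the pipe chain as requested (note that here df ends up holding the output of summary(), while df2 holds the intermediate data frame):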

df <- read.csv(file) %>%
  mutate(....) %>%
  mutate(....) %>%
  {df2 <<- .} %>% # save df state to new df2 object here
  group_by(....) %>%
  arrange(var) %>%
  summary()

Hope it helps!
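For a version that runs as-is, here is a minimal sketch of the same idea on the built-in mtcars data. The names snapshot, result and wt_kg are made up for this example, and the unit conversion is only there to give mutate something to do.

```r
library(dplyr)

result <- mtcars %>%
  mutate(wt_kg = wt * 453.592) %>%   # wt is recorded in units of 1000 lbs
  { snapshot <<- . } %>%             # save the intermediate data frame, then pass it on
  group_by(cyl) %>%
  summarise(mean_wt_kg = mean(wt_kg))

head(snapshot)  # intermediate data frame, including wt_kg
result          # one row per value of cyl
```

The braces return the value of the assignment, which is why the pipeline can carry on after the save step.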

How to pipe an output tibble into further calculations without saving the tibble as a separate object in R?

In your case, you can further manipulate the tibble you have generated using dplyr functions.

Note the existence of mutate_at and summarize_at, which let you transform a set of columns, with the option to select them by column position.

Using . as a placeholder for the tibble you are currently manipulating, and calling an anonymous function inside mutate_at, gives you the result you expect.

sr_df %>%
  group_by(ResolutionViolated) %>%
  tally() %>% 
  arrange(desc(n)) %>% 
  mutate(total = sum(n)) %>% 
  mutate_at(.vars = c(1, 2),  # .vars is the current argument name; .cols is deprecated
            .funs = function(column) round(column / .$total * 100, digits = 2))
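If all you need is a percentage per category, a plainer route than mutate_at is to compute it directly from the counts. This is a different technique from the answer above, sketched on a made-up sr_df; the values in ResolutionViolated and the column name pct are assumptions.

```r
library(dplyr)

# Hypothetical stand-in for the sr_df used in the question
sr_df <- data.frame(ResolutionViolated = c("Yes", "No", "No", "No"))

pct_tbl <- sr_df %>%
  count(ResolutionViolated, sort = TRUE) %>%         # group, tally and sort in one step
  mutate(pct = round(n / sum(n) * 100, digits = 2))  # share of all rows, in percent

pct_tbl
```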

Temporarily store variable in series of pipes dplyr

You could use a code block to hold a local variable. This would look like:

df %>% 
{ n = nrow(.)
  gather(., var, value, -Grp) %>% 
  mutate(newval = value * n)
}

Notice that we have to pass the . to gather explicitly here, and that the pipe continues inside the block. You can still put other steps after the block:

df %>% 
{ n = nrow(.)
  gather(., var, value, -Grp) %>% 
  mutate(newval = value * n)
} %>% 
select(newval)
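Here is a self-contained sketch of the same code-block idea that avoids the reshaping step, so it needs only dplyr. The data frame df, its columns and the name x_scaled are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data
df <- data.frame(Grp = c("a", "b"), x = c(1, 2), y = c(3, 4))

out <- df %>%
  {
    n <- nrow(.)                 # local variable, visible only inside the braces
    mutate(., x_scaled = x * n)  # the . must be passed explicitly inside the block
  } %>%
  select(x_scaled)

out
```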

Is there a way to split piping in R to two output functions?

You can use the %T>% (tee) operator from magrittr, which returns the left-hand side of the operator rather than the output of the right-hand side. However, it's messy here and not really the intended use case for the operator, so I would go with @akrun's solution. There are other cases, though, where you don't need { } and the tee operator is preferable to the normal pipe.

library(magrittr)
library(gt)         # gt(), gtsave()
library(flextable)  # flextable(), save_as_docx()

species_tab %T>% 
  { gtsave(data = gt(.), filename = "species.tex")} %>% 
  flextable() %>% 
  save_as_docx(path = "species.docx")
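A case where the tee operator needs no braces is a side-effect function that takes the data as its first argument. The sketch below uses base R's write.csv and the built-in cars data instead of gt and flextable; the temporary file path and the names out_file and n_rows are assumptions for the example.

```r
library(magrittr)

out_file <- tempfile(fileext = ".csv")  # hypothetical output location

n_rows <- head(cars, 5) %T>%
  write.csv(file = out_file, row.names = FALSE) %>%  # side effect: write the file
  nrow()                                             # receives the data, not write.csv's result

n_rows
file.exists(out_file)
```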

Creating multiple data frames using same pipes but different columns in R

Here is one approach which nests a call to purrr::map_df inside dplyr::across. It should work for all variables, but we need to convert them to factors first.

library(tidyverse)

master %>% 
  mutate(across(!Date, as.factor)) %>% 
  group_by(Date) %>% 
  summarise(across(everything(), # uses all variables except Date, because it is the grouping variable
                   ~ map_df(set_names(unique(.x)),
                    function(y) {
                      sum(y == .x, na.rm = TRUE)
                    }))
            ) %>%
  do.call("data.frame", args = .) %>% 
  replace(is.na(.), 0)

#>   Date Age.70 Age.80 Age.60 Age.40 Gender.M Gender.F
#> 1 5/21      1      1      0      0        2        0
#> 2 5/22      2      0      0      0        1        1
#> 3 5/24      0      0      1      2        2        1

Created on 2021-06-11 by the reprex package (v0.3.0)

How do I write a dplyr pipe-friendly function where a new column name is provided from a function argument?

In this case you can just stick to using the embrace {{}} option for your variables. If you want to dynamically create column names, you're going to still need to use :=. The difference here is that you can use the glue-style syntax with the embrace operator to get the name of the symbol. This works with the data provided.

elective_open <- function(.data, name_for_elective, course, tiebreaker) {
  .data %>%
    mutate("{{name_for_elective}}" := ifelse({{ tiebreaker }} == max({{ tiebreaker }}), 1, 0)) %>%
    mutate("{{name_for_elective}}" := ifelse({{ name_for_elective }} == 0, {{ course }}[{{ name_for_elective }} == 1], "")) %>%
    filter(!({{ course }} %in% {{ name_for_elective }}))
}
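A stripped-down sketch may make the glue-style naming easier to see in isolation. The function add_top_flag, its arguments and the sample data are all hypothetical; the sketch assumes a dplyr and rlang recent enough to support "{{ }}" on the left of :=.

```r
library(dplyr)

# Hypothetical helper: adds a 0/1 column, named by the caller, marking the maximum score
add_top_flag <- function(.data, new_name, score) {
  .data %>%
    mutate("{{new_name}}" := ifelse({{ score }} == max({{ score }}), 1, 0))
}

scores <- data.frame(id = 1:3, pts = c(5, 9, 7))  # made-up data

flagged <- scores %>% add_top_flag(top, pts)
flagged
```

Both top and pts are passed unquoted: the embrace operator forwards pts as a column reference, and the string on the left of := turns top into the new column's name.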
