Return a List in Dplyr Mutate()

The idiomatic way to do this using data.table would be to use the := (assignment by reference) operator. Here's an illustration:

it[, c(paste0("V", 4:5)) := myfun(V2, V3)]

If you really want a list, why not:

as.list(it[, myfun(V2, V3)])

Alternatively, if what you want is the original columns plus the new ones in a single result, you can build that directly in j with plain data.table functionality:

it[, c(.SD, myfun(V2, V3))]
#    V1 V2 V3 V4 V5
# 1:  a  1  2  3 -1
# 2:  a  2  3  5 -1
# 3:  b  3  4  7 -1
# 4:  b  4  2  6  2
# 5:  c  5  2  7  3

Note that if myfun were to name its output, then the names would show up in the final result columns:

#    V1 V2 V3 new.1 new.2
# 1:  a  1  2     3    -1
# 2:  a  2  3     5    -1
# 3:  b  3  4     7    -1
# 4:  b  4  2     6     2
# 5:  c  5  2     7     3
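
The original it and myfun are not shown, so here is a minimal sketch of the named-output case under stated assumptions: the data is reconstructed from the printed output above, and myfun_named is a hypothetical stand-in that returns a named list holding the sum and the difference of its inputs.

```r
library(data.table)

# Assumed example data, reconstructed from the printed output above.
it <- data.table(V1 = c("a", "a", "b", "b", "c"),
                 V2 = 1:5,
                 V3 = c(2, 3, 4, 2, 2))

# Hypothetical stand-in for myfun: returns a *named* list of two vectors.
myfun_named <- function(x, y) list(new.1 = x + y, new.2 = x - y)

# In j, the names of the returned list become the new column names.
res <- it[, c(.SD, myfun_named(V2, V3))]
names(res)

# With :=, the names on the left-hand side are used instead.
# copy() keeps the original `it` untouched.
it2 <- copy(it)[, c("V4", "V5") := myfun_named(V2, V3)]
names(it2)
```

The c(.SD, ...) form builds a new table, while := modifies a table in place, which is why the sketch works on a copy.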

Return list using mutate and rowwise

Your TestFn returns a four-element list per row, which can't fit in a single cell. You can wrap the returned elements in a vector first, so that the returned list has a single element:

TestFn <- function(X, Y) list(c(X*5, Y/2, X+Y, X*2+5*Y))
#                             ^ 
df %>% rowwise() %>% mutate(R=TestFn(X,Y)) %>% pull(R)
#[[1]]
#[1]  5  1  3 12

#[[2]]
#[1] 10.0  1.5  5.0 19.0

#[[3]]
#[1] 15  2  7 26

#[[4]]
#[1] 20  1  6 18

#[[5]]
#[1] 25  1  7 20

rowwise is usually not as efficient. If you want to vectorize the solution, you can calculate the four expressions first and then transpose the result:

df$R = with(df, data.table::transpose(list(X*5, Y/2, X+Y, X*2+5*Y)))
df
#  Name X Y                    R
#1    a 1 2          5, 1, 3, 12
#2    a 2 3 10.0, 1.5, 5.0, 19.0
#3    b 3 4         15, 2, 7, 26
#4    b 4 2         20, 1, 6, 18
#5    c 5 2         25, 1, 7, 20
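
If you would rather stay within the tidyverse, a similar list column can probably be built with purrr::map2. This is only a sketch: the data frame below is an assumption reconstructed from the printed output, and map2 is swapped in for the rowwise and transpose approaches shown above.

```r
library(dplyr)
library(purrr)

# Assumed example data, reconstructed from the printed output above.
df <- tibble(Name = c("a", "a", "b", "b", "c"),
             X = 1:5,
             Y = c(2, 3, 4, 2, 2))

# map2() walks X and Y in parallel and always returns a list,
# so each row gets one length-4 numeric vector in the list column R.
df2 <- df %>%
  mutate(R = map2(X, Y, ~ c(.x * 5, .y / 2, .x + .y, .x * 2 + 5 * .y)))

df2$R[[1]]
```

Because map2 returns a list, there is no need to wrap the result in list() inside the function.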

R dplyr::mutate - add all elements in a returned list

We can use pmap_dfr from purrr:

library(purrr)
library(dplyr)
pmap_dfr(have %>% 
              select(x, z), testFun) %>%
   bind_cols(have, .)
#    x y z  x2   x3
#1   1 1 0   1    1
#2   2 1 0   4    8
#3   3 1 0   9   27
#4   4 1 0  16   64
#5   5 1 0  25  125
#6   6 1 0  36  216
#7   7 1 0  49  343
#8   8 1 0  64  512
#9   9 1 0  81  729
#10 10 1 0 100 1000

Or, if we can change the function so that it returns quoted expressions (quote or quo), this becomes easier:

testFun <- function(x,z){
  list(x2= quo(x*x + z), x3= quo(x*x*x + z))
 }

have %>% 
   mutate(!!! testFun(x, z))
#    x y z  x2   x3
#1   1 1 0   1    1
#2   2 1 0   4    8
#3   3 1 0   9   27
#4   4 1 0  16   64
#5   5 1 0  25  125
#6   6 1 0  36  216
#7   7 1 0  49  343
#8   8 1 0  64  512
#9   9 1 0  81  729
#10 10 1 0 100 1000
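
Another option, assuming dplyr 1.0.0 or later, is to have the function return a data frame and call it unnamed inside mutate, which then unpacks the columns. The sketch below uses assumed example data matching the printed output; testFun_df is a hypothetical variant of testFun, not the original function.

```r
library(dplyr)

# Assumed example data, matching the printed output above.
have <- tibble(x = 1:10, y = 1, z = 0)

# Hypothetical variant of testFun that returns a tibble instead of a list.
testFun_df <- function(x, z) {
  tibble(x2 = x * x + z, x3 = x * x * x + z)
}

# The call is left unnamed on purpose, so mutate() unpacks its columns.
res <- have %>% mutate(testFun_df(x, z))
names(res)
```

Giving the call a name, as in mutate(out = testFun_df(x, z)), would instead store the whole tibble as a single packed column called out.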

Return multiple columns in dplyr mutate

Well, you don't have to modify your function. Just do this

CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    across(c(Treatment), one_hot)$Treatment # see here
  )

Output

# A tibble: 84 x 7
   Plant Type   Treatment   conc uptake   conc2 Isnonchilled
   <ord> <fct>  <fct>      <dbl>  <dbl>   <dbl>        <int>
 1 Qn1   Quebec nonchilled    95   16      9025            1
 2 Qn1   Quebec nonchilled   175   30.4   30625            1
 3 Qn1   Quebec nonchilled   250   34.8   62500            1
 4 Qn1   Quebec nonchilled   350   37.2  122500            1
 5 Qn1   Quebec nonchilled   500   35.3  250000            1
 6 Qn1   Quebec nonchilled   675   39.2  455625            1
 7 Qn1   Quebec nonchilled  1000   39.7 1000000            1
 8 Qn2   Quebec nonchilled    95   13.6    9025            1
 9 Qn2   Quebec nonchilled   175   27.3   30625            1
10 Qn2   Quebec nonchilled   250   37.1   62500            1
# ... with 74 more rows

For mutation across many columns,

CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    bind_cols(as.list(across(starts_with("T"), one_hot)))
  )

Output

# A tibble: 84 x 8
   Plant Type   Treatment   conc uptake   conc2 IsQuebec Isnonchilled
   <ord> <fct>  <fct>      <dbl>  <dbl>   <dbl>    <int>        <int>
 1 Qn1   Quebec nonchilled    95   16      9025        1            1
 2 Qn1   Quebec nonchilled   175   30.4   30625        1            1
 3 Qn1   Quebec nonchilled   250   34.8   62500        1            1
 4 Qn1   Quebec nonchilled   350   37.2  122500        1            1
 5 Qn1   Quebec nonchilled   500   35.3  250000        1            1
 6 Qn1   Quebec nonchilled   675   39.2  455625        1            1
 7 Qn1   Quebec nonchilled  1000   39.7 1000000        1            1
 8 Qn2   Quebec nonchilled    95   13.6    9025        1            1
 9 Qn2   Quebec nonchilled   175   27.3   30625        1            1
10 Qn2   Quebec nonchilled   250   37.1   62500        1            1
# ... with 74 more rows
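
The one_hot function itself is not shown here. The sketch below is a guess at its shape, assuming it returns a one-column tibble named after the first factor level, which is consistent with the Isnonchilled column in the output. It calls the function directly and unnamed inside mutate, a simpler route than across when there is only one column.

```r
library(dplyr)

# Hypothetical one_hot(): a one-column tibble flagging the first factor level.
one_hot <- function(x) {
  lv <- levels(x)[1]
  out <- tibble(flag = as.integer(x == lv))
  names(out) <- paste0("Is", lv)
  out
}

# CO2 ships with base R (datasets package).
res <- CO2 %>%
  as_tibble() %>%
  mutate(conc2 = conc^2, one_hot(Treatment))
names(res)
```

An unnamed data frame result is unpacked by mutate, so the new column keeps the name chosen inside one_hot.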

Mutate with a list column function in dplyr

You could simply add rowwise

df_comp_jaccard <- df_comp %>%
  rowwise() %>%
  dplyr::mutate(jaccard_sim = length(intersect(names_vec, source_vec))/
                              length(union(names_vec, source_vec)))

# A tibble: 3 x 3
  names_ names_vec jaccard_sim
   <chr>    <list>       <dbl>
1  b d f <chr [3]>         0.2
2  u k g <chr [3]>         0.0
3  m o c <chr [3]>         0.2

Using rowwise you get the intuitive behavior some would expect when using mutate: "do this operation for every row".

Without rowwise you take advantage of vectorized functions, which is much faster. That's why it's the default, but it may yield unexpected results if you're not careful.

The impression that mutate (or other dplyr functions) works row-wise is an illusion created by vectorized functions. In fact, you're always juggling full columns.

I'll illustrate with a couple of examples:

Sometimes the result is the same, with a vectorized function such as paste:

tibble(a=1:5,b=5:1) %>% mutate(X = paste(a,b,sep="_"))
tibble(a=1:5,b=5:1) %>% rowwise %>% mutate(X = paste(a,b,sep="_"))
# # A tibble: 5 x 3
#       a     b     X
#   <int> <int> <chr>
# 1     1     5   1_5
# 2     2     4   2_4
# 3     3     3   3_3
# 4     4     2   4_2
# 5     5     1   5_1

And sometimes it's different, with a function that is not vectorized, such as max:

tibble(a=1:5,b=5:1) %>% mutate(max(a,b))
# # A tibble: 5 x 3
#       a     b `max(a, b)`
#   <int> <int>       <int>
# 1     1     5           5
# 2     2     4           5
# 3     3     3           5
# 4     4     2           5
# 5     5     1           5

tibble(a=1:5,b=5:1) %>% rowwise %>% mutate(max(a,b))
# # A tibble: 5 x 3
#       a     b `max(a, b)`
#   <int> <int>       <int>
# 1     1     5           5
# 2     2     4           4
# 3     3     3           3
# 4     4     2           4
# 5     5     1           5

Note that in a real-life situation you shouldn't use rowwise for this, but pmax, which is the vectorized counterpart of max:

tibble(a=1:5,b=5:1) %>% mutate(pmax(a,b))
# # A tibble: 5 x 3
#       a     b `pmax(a, b)`
#   <int> <int>        <int>
# 1     1     5            5
# 2     2     4            4
# 3     3     3            3
# 4     4     2            4
# 5     5     1            5

intersect is such a function: it is not vectorized over rows. Without rowwise, you fed it one whole list column of vectors and one other whole vector, and these two objects have no elements in common.
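
If you want to avoid rowwise here, one possible alternative is to map over the list column explicitly. This is a sketch with hypothetical data: df_comp and source_vec below are made up for illustration, and jaccard is a small helper that is not part of the original code.

```r
library(dplyr)
library(purrr)

# Hypothetical data: a list column of character vectors and one reference vector.
df_comp <- tibble(
  names_ = c("b d f", "u k g", "m o c"),
  names_vec = strsplit(names_, " ")
)
source_vec <- c("a", "b", "c")

# Jaccard similarity of two character vectors.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# map_dbl() calls jaccard() once per element of the list column.
res <- df_comp %>%
  mutate(jaccard_sim = map_dbl(names_vec, jaccard, b = source_vec))
res$jaccard_sim
```

This makes the per-row iteration explicit, so it no longer depends on whether the data frame happens to be rowwise.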

How do I mutate a list-column to a common one leaving only the last value when there is a vector in the list?

Here is an option. We can use if/else instead of ifelse here

library(dplyr)
library(tidyr)
library(purrr)  # map_chr() comes from purrr
x %>% 
   mutate(two = map_chr(two, ~ if(is.null(.x)) NA_character_ else last(.x)))
#   one two
#1   a   d
#2   b   g
#3   c  NA

Or replace the NULL elements with NA and extract the last

x %>% 
   mutate(two = map_chr(two, ~ last(replace(.x, is.null(.x), NA_character_))))
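
The same idea can be sketched without purrr, using base vapply. The data frame x below is an assumption shaped like the printed output, and last_or_na is a hypothetical helper name.

```r
library(dplyr)

# Assumed example data: a list column with one NULL element.
x <- tibble(one = c("a", "b", "c"),
            two = list(c("c", "d"), c("e", "f", "g"), NULL))

# Hypothetical helper: last element, or NA for an empty/NULL entry.
last_or_na <- function(v) if (length(v) == 0) NA_character_ else v[[length(v)]]

# vapply() guarantees a character vector with one value per row.
res <- x %>% mutate(two = vapply(two, last_or_na, character(1)))
res$two
```

Checking length(v) == 0 covers both NULL and zero-length vectors, which is slightly broader than is.null.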

Mutating several columns with a function creates a list for each component inside the resulting column in dplyr

Based on the input dataset, and assuming that we only need one value per row from moe_prop, convert the column names to symbols and then splice them into the call with !!!:

tt %>% 
  mutate(moe = moe_prop(!!! rlang::syms(names(.)[c(1, 3, 4, 2)])))
# A tibble: 6 x 7
#      A     B     C     D     E     F   moe
#  <int> <int> <int> <int> <int> <int> <dbl>
#1     1     7    13    19    25    31  1.46
#2     2     8    14    20    26    32  1.43
#3     3     9    15    21    27    33  1.39
#4     4    10    16    22    28    34  1.37
#5     5    11    17    23    29    35  1.34
#6     6    12    18    24    30    36  1.31

It is similar to calling

tt %>%
   mutate(moe = moe_prop(!!! rlang::syms(c("A", "C", "D", "B"))))

Or do a rowwise() operation

tt %>%
    rowwise %>% 
    mutate(moe = moe_prop(A, C, D, B))

By checking the row values individually

moe_prop(1, 13, 19, 7)
#[1] 1.460951

moe_prop(2, 14, 20, 8)
#[1] 1.426237
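
To make the arithmetic behind these checks explicit, here is a sketch with a hypothetical stand-in, moe_prop_sketch. The formula is an assumption chosen because it reproduces the values printed above; the real moe_prop may handle edge cases differently. The tt data is reconstructed from the printed output.

```r
# Hypothetical stand-in, with an assumed formula:
# sqrt(moe_num^2 - (num / denom)^2 * moe_denom^2) / denom
moe_prop_sketch <- function(num, denom, moe_num, moe_denom) {
  prop <- num / denom
  sqrt(moe_num^2 - prop^2 * moe_denom^2) / denom
}

# Assumed example data, reconstructed from the printed output above.
tt <- data.frame(A = 1:6, B = 7:12, C = 13:18,
                 D = 19:24, E = 25:30, F = 31:36)

# The stand-in is fully vectorized, so whole columns can be passed at once.
tt$moe <- with(tt, moe_prop_sketch(A, C, D, B))
round(tt$moe, 2)
```

Because this stand-in only uses vectorized arithmetic, it gives one value per row without rowwise.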

Use recode to mutate across multiple columns using named list of named vectors

Below are three approaches:

First, we can make it work with dplyr::across in a custom function using dplyr::cur_column().

library(tidyverse)

myfun <- function(x) {
  mycol <- cur_column()
  dplyr::recode(x, !!! dicts[[mycol]])
}

test %>% 
  mutate(across(c("A", "B", "C"), myfun))

#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta

A second option is to transform the dicts into a list of expressions and then splice it into mutate using the !!! operator:

expr_ls <-  imap(dicts, ~ quo(recode(!!sym(.y), !!!.x)))

test %>% 
  mutate(!!! expr_ls)

#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta

Finally, in the larger tidyverse we could use purrr::lmap_at, but it makes the underlying function more complex than it needs to be:

myfun2 <- function(x) {
  x_nm <- names(x)
  mutate(x, !! x_nm := recode(!! sym(x_nm), !!! dicts[[x_nm]]))
}

lmap_at(test, 
        names(dicts),
        myfun2)
#> # A tibble: 3 x 4
#>   Names A       B     C      
#>   <chr> <chr>   <chr> <chr>  
#> 1 Alice charlie yes   delta  
#> 2 Bob   delta   no    epsilon
#> 3 Cindy bravo   bad   beta
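
A fourth possibility, sketched here as an alternative to recode, is to index each named vector directly by the column values. The data is repeated from the "Original data" section so the block runs on its own. Note one behavioral difference: values with no match become NA rather than being kept.

```r
library(dplyr)

test <- tibble(Names = c("Alice", "Bob", "Cindy"),
               A = c(3, "q", 7),
               B = c(1, 2, "b"),
               C = c("a", "g", 9))

dicts <- list(
  A = c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta"),
  B = c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing"),
  C = c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")
)

# Look up each value by name in the dictionary of the current column;
# unname() drops the lookup keys from the result.
res <- test %>%
  mutate(across(all_of(names(dicts)), ~ unname(dicts[[cur_column()]][.x])))
res
```

This relies on the columns being character vectors, as they are here, because the lookup is done by name.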

Original data

# Starting tibble
test <- tibble(Names = c("Alice","Bob","Cindy"),
               A = c(3,"q",7),
               B = c(1,2,"b"),
               C = c("a","g",9))

# Named vector
A <- c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta")
B <- c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
C <- c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")

# Named list of named vectors
dicts <- list("A" = A, "B" = B, "C" = C) # Same names as columns

Created on 2021-12-15 by the reprex package (v2.0.1)

Dplyr : use mutate with columns that contain lists

You can't subtract directly from a list column in that way using dplyr. The best way I have found to accomplish the task you are referencing is to use purrr::map. Here is how it works:

data <- data %>% mutate(y = map2(mnt_ope, ref_amount, function(x, y){ x - y }))

Or, more tersely:

data <- data %>% mutate(y = map2(mnt_ope, ref_amount, ~.x - .y))

map2 here applies a two-input function to two vectors in parallel (in your case, two columns of a data frame) and returns the result as a list, which mutate then stores as a new list column in your data frame.
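
Since the original data is not shown, here is a small self-contained sketch with hypothetical values: mnt_ope is assumed to be a list column of numeric vectors and ref_amount a plain numeric column.

```r
library(dplyr)
library(purrr)

# Hypothetical data: mnt_ope is a list column, ref_amount a numeric column.
data <- tibble(id = 1:2,
               mnt_ope = list(c(10, 20, 30), c(5, 15)),
               ref_amount = c(10, 5))

# Each element of mnt_ope has the matching ref_amount subtracted from it.
res <- data %>% mutate(y = map2(mnt_ope, ref_amount, ~ .x - .y))
res$y[[1]]
```

If every result had length one, map2_dbl could be used instead to get an ordinary numeric column.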

Hope that helps!

How to use mutate on list?

Your attempt with group_by fails because you override mutate's search path. Mutate uses non-standard evaluation, so it will look for variables first among the columns of its data argument.

When you use pipes (%>%), a dot . refers to the whole data frame, and .$SelfEsteem refers to the whole SelfEsteem column from the whole data frame.

You just need to simplify a little bit (not override the defaults) to get the expected result.

csv %>% 
  group_by(participant_number) %>%
  mutate(SE_variance = var(SelfEsteem))
# Source: local data frame [5 x 3]
# Groups: participant_number [2]
# 
#   participant_number SelfEsteem SE_variance
#                (dbl)      (dbl)       (dbl)
# 1                  1          3           1
# 2                  1          4           1
# 3                  1          2           1
# 4                  2          1           2
# 5                  2          3           2
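
To see the difference side by side, here is a sketch using data reconstructed from the output above (an assumption, since the original csv is not shown). It compares the bare column name with the .$ form.

```r
library(dplyr)

# Assumed example data, reconstructed from the printed output above.
csv <- tibble(participant_number = c(1, 1, 1, 2, 2),
              SelfEsteem = c(3, 4, 2, 1, 3))

# Bare column name: var() sees only the rows of the current group.
per_group <- csv %>%
  group_by(participant_number) %>%
  mutate(SE_variance = var(SelfEsteem))

# .$SelfEsteem: var() sees the whole column, whatever the grouping.
whole_column <- csv %>%
  group_by(participant_number) %>%
  mutate(SE_variance = var(.$SelfEsteem))

per_group$SE_variance
whole_column$SE_variance
```

In the second version every row gets the same value, because the variance is computed over all five rows and then recycled within each group.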
