Return a list in dplyr mutate()
The idiomatic way to do this using data.table would be to use the := (assignment by reference) operator. Here's an illustration:
it[, c(paste0("V", 4:5)) := myfun(V2, V3)]
If you really want a list, why not:
as.list(it[, myfun(V2, V3)])
Alternatively, maybe this is what you want, but why not just use the data.table functionality:
it[, c(.SD, myfun(V2, V3))]
# V1 V2 V3 V4 V5
# 1: a 1 2 3 -1
# 2: a 2 3 5 -1
# 3: b 3 4 7 -1
# 4: b 4 2 6 2
# 5: c 5 2 7 3
Note that if myfun were to name its output, then the names would show up in the final result columns:
# V1 V2 V3 new.1 new.2
# 1: a 1 2 3 -1
# 2: a 2 3 5 -1
# 3: b 3 4 7 -1
# 4: b 4 2 6 2
# 5: c 5 2 7 3
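For reference, here is a minimal reproducible sketch of both variants. The contents of it and the body of myfun are not given above, so they are assumptions reconstructed from the printed output (V4 looks like V2 + V3 and V5 like V2 - V3):

```r
library(data.table)

# Assumed data and function, reconstructed from the printed output above
it <- data.table(V1 = c("a", "a", "b", "b", "c"),
                 V2 = 1:5,
                 V3 = c(2, 3, 4, 2, 2))
myfun <- function(x, y) list(new.1 = x + y, new.2 = x - y)

# Returns a new data.table with the named list elements as extra columns;
# `it` itself is left untouched
res <- it[, c(.SD, myfun(V2, V3))]

# Adds V4 and V5 by reference, here on a copy so `it` stays as it was
it2 <- copy(it)
it2[, c("V4", "V5") := myfun(V2, V3)]
```

The first form is convenient when the function already names its output; the second avoids copying the table but needs the new column names spelled out on the left-hand side.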
Return list using mutate and rowwise
Your TestFn returns a four-element list per row, which can't really fit in a single cell. You can wrap the returned elements in a vector first, so that the returned list has a single element:
TestFn <- function(X, Y) list(c(X*5, Y/2, X+Y, X*2+5*Y))
#                             ^
df %>% rowwise() %>% mutate(R=TestFn(X,Y)) %>% pull(R)
#[[1]]
#[1] 5 1 3 12
#[[2]]
#[1] 10.0 1.5 5.0 19.0
#[[3]]
#[1] 15 2 7 26
#[[4]]
#[1] 20 1 6 18
#[[5]]
#[1] 25 1 7 20
rowwise is usually not very efficient. If you want to vectorize the solution, you can calculate the four expressions first and then transpose the result:
df$R = with(df, data.table::transpose(list(X*5, Y/2, X+Y, X*2+5*Y)))
df
# Name X Y R
#1 a 1 2 5, 1, 3, 12
#2 a 2 3 10.0, 1.5, 5.0, 19.0
#3 b 3 4 15, 2, 7, 26
#4 b 4 2 20, 1, 6, 18
#5 c 5 2 25, 1, 7, 20
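To check that the two approaches agree, here is a small sketch. The data frame df is an assumption reconstructed from the printed output above (Name, X and Y columns):

```r
library(dplyr)

# Assumed input, reconstructed from the output shown above
df <- data.frame(Name = c("a", "a", "b", "b", "c"),
                 X = 1:5,
                 Y = c(2, 3, 4, 2, 2))

TestFn <- function(X, Y) list(c(X * 5, Y / 2, X + Y, X * 2 + 5 * Y))

# Row-by-row version: TestFn is called once per row
row_res <- df %>% rowwise() %>% mutate(R = TestFn(X, Y)) %>% pull(R)

# Vectorized version: four full-column computations, then a transpose
# that turns "one vector per expression" into "one vector per row"
vec_res <- with(df, data.table::transpose(list(X * 5, Y / 2, X + Y, X * 2 + 5 * Y)))
```

Both results are lists with one numeric vector of length four per row, so either can be stored as a list column.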
R dplyr::mutate - add all elements in a returned list
We can use pmap
library(purrr)
library(dplyr)
pmap_dfr(have %>%
select(x, z), testFun) %>%
bind_cols(have, .)
# x y z x2 x3
#1 1 1 0 1 1
#2 2 1 0 4 8
#3 3 1 0 9 27
#4 4 1 0 16 64
#5 5 1 0 25 125
#6 6 1 0 36 216
#7 7 1 0 49 343
#8 8 1 0 64 512
#9 9 1 0 81 729
#10 10 1 0 100 1000
Or, if we can change the function by quoting it (with quote or quo), this becomes easier:
testFun <- function(x,z){
list(x2= quo(x*x + z), x3= quo(x*x*x + z))
}
have %>%
mutate(!!! testFun(x, z))
# x y z x2 x3
#1 1 1 0 1 1
#2 2 1 0 4 8
#3 3 1 0 9 27
#4 4 1 0 16 64
#5 5 1 0 25 125
#6 6 1 0 36 216
#7 7 1 0 49 343
#8 8 1 0 64 512
#9 9 1 0 81 729
#10 10 1 0 100 1000
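If the function is already vectorized, a row-by-row pmap call may not be needed at all. The sketch below assumes a have data frame and an unquoted testFun that match the printed output; neither definition appears above, so both are reconstructions:

```r
library(dplyr)

# Assumed input and function, reconstructed from the output above
have <- data.frame(x = 1:10, y = 1, z = 0)
testFun <- function(x, z) list(x2 = x * x + z, x3 = x * x * x + z)

# Call the function once on whole columns, then bind the named list
# elements on as new columns
res <- bind_cols(have, as_tibble(testFun(have$x, have$z)))
```

This relies on every list element having the same length as the input columns, which holds here because the arithmetic is element-wise.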
Return multiple columns in dplyr mutate
Well, you don't have to modify your function. Just do this
CO2 %>%
as_tibble() %>%
mutate(
conc2 = conc^2,
across(c(Treatment), one_hot)$Treatment # see here
)
Output
# A tibble: 84 x 7
Plant Type Treatment conc uptake conc2 Isnonchilled
<ord> <fct> <fct> <dbl> <dbl> <dbl> <int>
1 Qn1 Quebec nonchilled 95 16 9025 1
2 Qn1 Quebec nonchilled 175 30.4 30625 1
3 Qn1 Quebec nonchilled 250 34.8 62500 1
4 Qn1 Quebec nonchilled 350 37.2 122500 1
5 Qn1 Quebec nonchilled 500 35.3 250000 1
6 Qn1 Quebec nonchilled 675 39.2 455625 1
7 Qn1 Quebec nonchilled 1000 39.7 1000000 1
8 Qn2 Quebec nonchilled 95 13.6 9025 1
9 Qn2 Quebec nonchilled 175 27.3 30625 1
10 Qn2 Quebec nonchilled 250 37.1 62500 1
# ... with 74 more rows
For mutation across many columns,
CO2 %>%
as_tibble() %>%
mutate(
conc2 = conc^2,
bind_cols(as.list(across(starts_with("T"), one_hot)))
)
Output
# A tibble: 84 x 8
Plant Type Treatment conc uptake conc2 IsQuebec Isnonchilled
<ord> <fct> <fct> <dbl> <dbl> <dbl> <int> <int>
1 Qn1 Quebec nonchilled 95 16 9025 1 1
2 Qn1 Quebec nonchilled 175 30.4 30625 1 1
3 Qn1 Quebec nonchilled 250 34.8 62500 1 1
4 Qn1 Quebec nonchilled 350 37.2 122500 1 1
5 Qn1 Quebec nonchilled 500 35.3 250000 1 1
6 Qn1 Quebec nonchilled 675 39.2 455625 1 1
7 Qn1 Quebec nonchilled 1000 39.7 1000000 1 1
8 Qn2 Quebec nonchilled 95 13.6 9025 1 1
9 Qn2 Quebec nonchilled 175 27.3 30625 1 1
10 Qn2 Quebec nonchilled 250 37.1 62500 1 1
# ... with 74 more rows
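The definition of one_hot is not shown above, so the sketch below uses a hypothetical stand-in that returns a one-column data frame flagging the first factor level. It illustrates that, in dplyr 1.0.0 or later, an unnamed data frame passed to mutate has its columns added directly, which may be enough when only a single column is involved:

```r
library(dplyr)

# Hypothetical stand-in for one_hot(): returns a one-column data frame
# named after the first factor level, with 1 where x equals that level
one_hot <- function(x) {
  lvl <- levels(x)[1]
  out <- data.frame(as.integer(x == lvl))
  names(out) <- paste0("Is", lvl)
  out
}

res <- CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    one_hot(Treatment)  # unnamed data frame result: columns are spliced in
  )
```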
Mutate with a list column function in dplyr
You could simply add rowwise
df_comp_jaccard <- df_comp %>%
rowwise() %>%
dplyr::mutate(jaccard_sim = length(intersect(names_vec, source_vec))/
length(union(names_vec, source_vec)))
# A tibble: 3 x 3
names_ names_vec jaccard_sim
<chr> <list> <dbl>
1 b d f <chr [3]> 0.2
2 u k g <chr [3]> 0.0
3 m o c <chr [3]> 0.2
Using rowwise, you get the intuitive behavior some would expect when using mutate: "do this operation for every row".
Not using rowwise means you take advantage of vectorized functions, which is much faster. That's why it's the default, but it may yield unexpected results if you're not careful.
The impression that mutate (or other dplyr functions) works row-wise is an illusion that comes from working with vectorized functions; in fact, you're always juggling full columns.
I'll illustrate with a couple of examples:
Sometimes the result is the same, with a vectorized function such as paste:
tibble(a=1:5,b=5:1) %>% mutate(X = paste(a,b,sep="_"))
tibble(a=1:5,b=5:1) %>% rowwise %>% mutate(X = paste(a,b,sep="_"))
# # A tibble: 5 x 3
# a b X
# <int> <int> <chr>
# 1 1 5 1_5
# 2 2 4 2_4
# 3 3 3 3_3
# 4 4 2 4_2
# 5 5 1 5_1
And sometimes it's different, with a function that is not vectorized, such as max:
tibble(a=1:5,b=5:1) %>% mutate(max(a,b))
# # A tibble: 5 x 3
# a b `max(a, b)`
# <int> <int> <int>
# 1 1 5 5
# 2 2 4 5
# 3 3 3 5
# 4 4 2 5
# 5 5 1 5
tibble(a=1:5,b=5:1) %>% rowwise %>% mutate(max(a,b))
# # A tibble: 5 x 3
# a b `max(a, b)`
# <int> <int> <int>
# 1 1 5 5
# 2 2 4 4
# 3 3 3 3
# 4 4 2 4
# 5 5 1 5
Note that in this case you shouldn't use rowwise in a real-life situation, but pmax, which is vectorized for this purpose:
tibble(a=1:5,b=5:1) %>% mutate(pmax(a,b))
# # A tibble: 5 x 3
# a b `pmax(a, b)`
# <int> <int> <int>
# 1 1 5 5
# 2 2 4 4
# 3 3 3 3
# 4 4 2 4
# 5 5 1 5
intersect is such a function. Without rowwise, you fed it the whole list column (whose elements are vectors) and one other vector, and those two objects have no elements in common.
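If you would rather avoid rowwise, one alternative is to map over the list column explicitly. This is only a sketch: the contents of df_comp and source_vec are assumptions chosen to be consistent with the output above, and jaccard is a hypothetical helper name:

```r
library(dplyr)
library(purrr)

# Assumed inputs, chosen to reproduce the similarities printed above
source_vec <- c("a", "b", "c")
df_comp <- tibble(names_ = c("b d f", "u k g", "m o c")) %>%
  mutate(names_vec = strsplit(names_, " "))

# Hypothetical helper: Jaccard similarity of two character vectors
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# map_dbl() calls jaccard() once per list element, so intersect() sees
# one character vector at a time rather than the whole list column
res <- df_comp %>%
  mutate(jaccard_sim = map_dbl(names_vec, ~ jaccard(.x, source_vec)))
```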
How do I mutate a list-column to a common one leaving only the last value when there is a vector in the list?
Here is an option. We can use if/else instead of ifelse here:
library(dplyr)
library(purrr)
x %>%
mutate(two = map_chr(two, ~ if(is.null(.x)) NA_character_ else last(.x)))
# one two
#1 a d
#2 b g
#3 c NA
Or replace the NULL elements with NA and extract the last:
x %>%
mutate(two = map_chr(two, ~ last(replace(.x, is.null(.x), NA_character_))))
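A self-contained version of the first option, for anyone who wants to run it. The input x is an assumption (a list column with two character vectors and one NULL) built to match the output above, and the base R variant is shown only for comparison:

```r
library(dplyr)
library(purrr)

# Assumed input: a list column where one cell is NULL
x <- tibble(one = c("a", "b", "c"),
            two = list(c("b", "d"), c("e", "f", "g"), NULL))

# if/else is evaluated once per list element, so the NULL case is
# handled before last() is ever called on it
res <- x %>%
  mutate(two = map_chr(two, ~ if (is.null(.x)) NA_character_ else last(.x)))

# Base R equivalent without purrr
base_two <- vapply(x$two,
                   function(v) if (length(v) == 0) NA_character_ else v[length(v)],
                   character(1))
```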
mutate several columns into a function creates a lists for each component inside the resulted column in dplyr
Based on the input dataset, and assuming that we need only one value per row from applying moe_prop to each row, convert the column names to symbols and then splice them into the call with !!!:
tt %>%
mutate(moe = moe_prop(!!! rlang::syms(names(.)[c(1, 3, 4, 2)])))
# A tibble: 6 x 7
# A B C D E F moe
# <int> <int> <int> <int> <int> <int> <dbl>
#1 1 7 13 19 25 31 1.46
#2 2 8 14 20 26 32 1.43
#3 3 9 15 21 27 33 1.39
#4 4 10 16 22 28 34 1.37
#5 5 11 17 23 29 35 1.34
#6 6 12 18 24 30 36 1.31
It is similar to calling
tt %>%
mutate(moe = moe_prop(!!! rlang::syms(c("A", "C", "D", "B"))))
Or do a rowwise() operation
tt %>%
rowwise %>%
mutate(moe = moe_prop(A, C, D, B))
By checking the row values individually
moe_prop(1, 13, 19, 7)
#[1] 1.460951
moe_prop(2, 14, 20, 8)
#[1] 1.426237
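The splicing step can be tried without the original package. In the sketch below, moe_sketch is a hypothetical stand-in whose formula was chosen only because it reproduces the values printed above; it is not claimed to be the package's own implementation. The tt data is likewise reconstructed from the output:

```r
library(dplyr)

# Assumed input, reconstructed from the printed tibble
tt <- tibble(A = 1:6, B = 7:12, C = 13:18, D = 19:24, E = 25:30, F = 31:36)

# Hypothetical stand-in for moe_prop(); element-wise, so it is vectorized
moe_sketch <- function(num, denom, moe_num, moe_denom) {
  prop <- num / denom
  sqrt(moe_num^2 - prop^2 * moe_denom^2) / denom
}

# syms() turns the names into symbols; !!! splices them in as the four
# positional arguments, i.e. moe_sketch(A, C, D, B)
cols <- rlang::syms(c("A", "C", "D", "B"))
res <- tt %>% mutate(moe = moe_sketch(!!!cols))
```

Because the stand-in is vectorized, mutate returns one number per row rather than a list.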
Use recode to mutate across multiple columns using named list of named vectors
Below are three approaches:
First, we can make it work with dplyr::across in a custom function using dplyr::cur_column().
library(tidyverse)
myfun <- function(x) {
mycol <- cur_column()
dplyr::recode(x, !!! dicts[[mycol]])
}
test %>%
mutate(across(c("A", "B", "C"), myfun))
#> # A tibble: 3 x 4
#> Names A B C
#> <chr> <chr> <chr> <chr>
#> 1 Alice charlie yes delta
#> 2 Bob delta no epsilon
#> 3 Cindy bravo bad beta
A second option is to transform dicts into a list of expressions and then splice it into mutate using the !!! operator:
expr_ls <- imap(dicts, ~ quo(recode(!!sym(.y), !!!.x)))
test %>%
mutate(!!! expr_ls)
#> # A tibble: 3 x 4
#> Names A B C
#> <chr> <chr> <chr> <chr>
#> 1 Alice charlie yes delta
#> 2 Bob delta no epsilon
#> 3 Cindy bravo bad beta
Finally, in the larger tidyverse we could use purrr::lmap_at, but it makes the underlying function more complex than it needs to be:
myfun2 <- function(x) {
x_nm <- names(x)
mutate(x, !! x_nm := recode(!! sym(x_nm), !!! dicts[[x_nm]]))
}
lmap_at(test,
names(dicts),
myfun2)
#> # A tibble: 3 x 4
#> Names A B C
#> <chr> <chr> <chr> <chr>
#> 1 Alice charlie yes delta
#> 2 Bob delta no epsilon
#> 3 Cindy bravo bad beta
Original data
# Starting tibble
test <- tibble(Names = c("Alice","Bob","Cindy"),
A = c(3,"q",7),
B = c(1,2,"b"),
C = c("a","g",9))
# Named vector
A <- c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta")
B <- c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing")
C <- c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")
# Named list of named vectors
dicts <- list("A" = A, "B" = B, "C" = C) # Same names as columns
Created on 2021-12-15 by the reprex package (v2.0.1)
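For comparison, the same mapping can be sketched in base R with plain named-vector lookup. This is an alternative to the approaches above, not one of them, and it behaves differently for values missing from a dictionary: they become NA, whereas recode leaves unmatched character values unchanged.

```r
# Same data as above, as a plain data frame
test <- data.frame(Names = c("Alice", "Bob", "Cindy"),
                   A = c("3", "q", "7"),
                   B = c("1", "2", "b"),
                   C = c("a", "g", "9"),
                   stringsAsFactors = FALSE)

dicts <- list(
  A = c("5" = "alpha", "7" = "bravo", "3" = "charlie", "q" = "delta"),
  B = c("1" = "yes", "2" = "no", "b" = "bad", "c" = "missing"),
  C = c("9" = "beta", "8" = "gamma", "a" = "delta", "g" = "epsilon")
)

# Index each dictionary by the column's values; unname() drops the
# lookup keys that would otherwise be kept as element names
for (col in names(dicts)) {
  test[[col]] <- unname(dicts[[col]][test[[col]]])
}
```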
Dplyr : use mutate with columns that contain lists
You can't subtract directly from a list column in that way using dplyr. The best way I have found to accomplish the task you are describing is to use purrr::map2. Here is how it works:
data <- data %>%
mutate(y = map2(mnt_ope, ref_amount, function(x, y){
x - y
}))
Or, more tersely:
data <- data %>%
mutate(y = map2(mnt_ope, ref_amount, ~.x - .y))
map2 here applies a two-input function to two vectors in parallel (in your case, two columns of a data frame) and returns the results as a list, which mutate then appends to your data frame as a new list column.
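A small runnable sketch, assuming mnt_ope is a list column of numeric vectors and ref_amount is an ordinary numeric column (the question's data is not shown, so the values here are made up for illustration):

```r
library(dplyr)
library(purrr)

# Made-up data: one numeric vector per row in mnt_ope, one scalar in ref_amount
data <- tibble(id = 1:2,
               mnt_ope = list(c(10, 20, 30), c(5, 15)),
               ref_amount = c(10, 5))

# For each row, subtract that row's ref_amount from every element of mnt_ope
res <- data %>%
  mutate(y = map2(mnt_ope, ref_amount, ~ .x - .y))
```

The new column y is itself a list column, with vectors of the same lengths as those in mnt_ope.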
Hope that helps!
How to use mutate on list?
Your attempt with group_by fails because you override mutate's search path. mutate uses non-standard evaluation, so it will look for variables first among the columns of its data argument.
When you use pipes (%>%), a dot . refers to the whole data frame, and .$SelfEsteem refers to the whole SelfEsteem column from the whole data frame, regardless of grouping.
You just need to simplify a little bit (not override the defaults) to get the expected result.
csv %>%
group_by(participant_number) %>%
mutate(SE_variance = var(SelfEsteem))
# Source: local data frame [5 x 3]
# Groups: participant_number [2]
#
# participant_number SelfEsteem SE_variance
# (dbl) (dbl) (dbl)
# 1 1 3 1
# 2 1 4 1
# 3 1 2 1
# 4 2 1 2
# 5 2 3 2
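The difference is easy to see side by side. The sketch below assumes a csv data frame matching the output above; the .$SelfEsteem line is a guess at what the failing attempt looked like, since the original code is not shown:

```r
library(dplyr)

# Assumed input, reconstructed from the printed result
csv <- data.frame(participant_number = c(1, 1, 1, 2, 2),
                  SelfEsteem = c(3, 4, 2, 1, 3))

# Bare column name: evaluated within each group
grouped <- csv %>%
  group_by(participant_number) %>%
  mutate(SE_variance = var(SelfEsteem))

# .$SelfEsteem: the full column of the piped data frame, so every row
# gets the variance of all five values
whole <- csv %>%
  group_by(participant_number) %>%
  mutate(SE_variance = var(.$SelfEsteem))
```

With this data, the grouped version gives 1 and 2 for the two participants, while the .$ version gives 1.3 on every row.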