dplyr mutate - How do I pass one row as a function argument?
Take a look at ?dplyr::do and ?purrr::map, which allow you to apply arbitrary functions to arbitrary columns and to chain the results through multiple unary operators. For example,
df1 <- df %>% rowwise %>% do( X = as_data_frame(.) ) %>% ungroup
# # A tibble: 6 x 1
# X
# * <list>
# 1 <tibble [1 x 2]>
# 2 <tibble [1 x 2]>
# ...
Notice that column X now contains 1x2 data.frames (or tibbles) made up of the rows of your original data.frame. You can now pass each one to your custom myFunc using map.
myFunc <- function(Y) {paste0( Y$columnA, Y$columnB )}
df1 %>% mutate( Result = map(X, myFunc) )
# # A tibble: 6 x 2
# X Result
# <list> <list>
# 1 <tibble [1 x 2]> <chr [1]>
# 2 <tibble [1 x 2]> <chr [1]>
# ...
The Result column now contains the output of myFunc applied to each row in your original data.frame, as desired. You can retrieve the values by appending a tidyr::unnest operation to the pipeline.
df1 %>% mutate( Result = map(X, myFunc) ) %>% unnest
# # A tibble: 6 x 3
# Result columnA columnB
# <chr> <fctr> <fctr>
# 1 AZ A Z
# 2 BY B Y
# 3 CX C X
# ...
If desired, unnest can be limited to specific columns, e.g., unnest(Result).
EDIT: Because your original data.frame contains only two columns, you can actually skip the do step and use purrr::map2 instead. The syntax is very similar to map:
myFunc <- function( a, b ) {paste0(a,b)}
df %>% mutate( Result = map2( columnA, columnB, myFunc ) )
Note that myFunc is now defined as a binary function.
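As a self-contained sketch of the map2 approach: the data frame below is an assumption reconstructed from the printed output (the original df is not shown), and map2_chr is swapped in for map2 so that Result comes back as a plain character column rather than a list column.

```r
library(dplyr)
library(purrr)

# Assumed stand-in for the original df, which is not shown in the question
df <- tibble(columnA = c("A", "B", "C"), columnB = c("Z", "Y", "X"))

# Binary function: receives one value from each column per call
myFunc <- function(a, b) paste0(a, b)

# map2_chr() calls myFunc once per row and returns a character vector
res <- df %>% mutate(Result = map2_chr(columnA, columnB, myFunc))
res$Result
```

With a typed variant such as map2_chr there is no need for a later unnest step, at the cost of requiring myFunc to return exactly one string per row.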
Apply function to a row in a data.frame using dplyr
We just need to specify the data as ., because a data.frame is a list with the columns as list elements. If we wrap it as list(.), it becomes a nested list.
library(dplyr)
library(purrr) # provides pmap_int()
d %>%
mutate(u = pmap_int(., ~ which.max(c(...))))
# a b c u
#1 1 4 2 2
#2 2 3 3 2
#3 3 2 4 3
#4 4 1 5 3
Or we can use cur_data()
d %>%
mutate(u = pmap_int(cur_data(), ~ which.max(c(...))))
Or, if we want to use everything(), place it inside select, because list(everything()) doesn't specify the data from which the columns should be selected
d %>%
mutate(u = pmap_int(select(., everything()), ~ which.max(c(...))))
Or using rowwise
d %>%
rowwise %>%
mutate(u = which.max(cur_data())) %>%
ungroup
# A tibble: 4 x 4
# a b c u
# <int> <int> <int> <int>
#1 1 4 2 2
#2 2 3 3 2
#3 3 2 4 3
#4 4 1 5 3
Or this is more efficient with max.col
max.col(d, 'first')
#[1] 2 2 3 3
Or with collapse
library(collapse)
dapply(d, which.max, MARGIN = 1)
#[1] 2 2 3 3
The max.col approach can also be used inside a dplyr pipeline:
d %>%
mutate(u = max.col(cur_data(), 'first'))
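Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick(). The following is a minimal sketch of the same idea with pick(), assuming dplyr 1.1.0 or later; the data frame d is reconstructed from the printed output above, since its definition is not shown.

```r
library(dplyr)

# Assumed reconstruction of d from the printed output
d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# Base R: column position of the row maximum, ties resolved to the first column
u_base <- max.col(d, ties.method = "first")

# Same computation inside mutate(); pick(everything()) selects the current columns
res <- d %>% mutate(u = max.col(pick(everything()), ties.method = "first"))
res$u
```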
where function in sum for row wise calculation
We can use where within c_across
library(dplyr)
data.frame(a_team=c(1:3), b_team=c(2:4),
team_league=c('dd','ee','ff'),c_team=c(5,9,1)) %>%
rowwise() %>%
mutate(league_points = sum(c_across(where(is.numeric)), na.rm = TRUE)) %>%
ungroup
-output
# A tibble: 3 × 5
a_team b_team team_league c_team league_points
<int> <int> <chr> <dbl> <dbl>
1 1 2 dd 5 8
2 2 3 ee 9 14
3 3 4 ff 1 8
rowwise would be slow on larger data. Here a vectorized function, rowSums, is already available:
data.frame(a_team=c(1:3), b_team=c(2:4),
team_league=c('dd','ee','ff'),c_team=c(5,9,1)) %>%
mutate(league_points = rowSums(across(where(is.numeric)), na.rm = TRUE))
-output
a_team b_team team_league c_team league_points
1 1 2 dd 5 8
2 2 3 ee 9 14
3 3 4 ff 1 8
Mutate a data frame in the tidyverse passed as a parameter in a function
Edit
Ritchie Sacramento's answer in the comments is better; use that.
--
Here is one potential solution:
library(tidyverse)
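test_scale <- function(outcome, data){
  outcome <- ensym(outcome)
  # name of the new column, e.g. "Sepal.Length_s"
  outcome_scaled <- paste0(outcome, "_s")
  # `!!outcome_scaled :=` uses the value of outcome_scaled as the column name
  data2 <- data %>% mutate(!!outcome_scaled := scale(as.numeric(!!outcome)))
  print(head(data2[, outcome_scaled]))
}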
test_scale("Sepal.Length", iris)
#> [,1]
#> [1,] -0.8976739
#> [2,] -1.1392005
#> [3,] -1.3807271
#> [4,] -1.5014904
#> [5,] -1.0184372
#> [6,] -0.5353840
Using ensym() means that you don't necessarily need to quote "outcome":
test_scale(Sepal.Length, iris)
#> [,1]
#> [1,] -0.8976739
#> [2,] -1.1392005
#> [3,] -1.3807271
#> [4,] -1.5014904
#> [5,] -1.0184372
#> [6,] -0.5353840
Created on 2021-12-02 by the reprex package (v2.0.1)
dplyr - apply a custom function using rowwise()
I don't think your problem is with rowwise. The way your function is written, it's expecting a single object. Try adding a c():
dt2 %>% rowwise() %>% mutate(nr_of_0s = zerocount(c(A, B, C)))
Note that, if you aren't committed to using your own function, you can skip rowwise entirely, as Nettle also notes. rowSums already treats data frames in a rowwise fashion, which is why this works:
dt2 %>% mutate(nr_of_0s = rowSums(. == 0))
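To see that the two approaches agree, here is a small sketch. Both dt2 and zerocount are hypothetical stand-ins, since neither is shown in the question; zerocount is assumed to count the zeros in a single vector.

```r
library(dplyr)

# Hypothetical stand-ins: neither dt2 nor zerocount() appears in the question
dt2 <- tibble(A = c(0, 1, 0), B = c(0, 2, 0), C = c(3, 4, 0))
zerocount <- function(x) sum(x == 0)

# Row by row: c(A, B, C) bundles the three values of the current row
by_row <- dt2 %>%
  rowwise() %>%
  mutate(nr_of_0s = zerocount(c(A, B, C))) %>%
  ungroup()

# Vectorized: compare the whole data frame to 0, then sum each row
vectorized <- dt2 %>% mutate(nr_of_0s = rowSums(. == 0))
```

Both versions give the same counts; the vectorized one avoids calling the function once per row.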
mutate(across()) with external function that references other variables in current data frame without passing second argument
You can extract the x value from cur_data(), which also works when the data is grouped.
library(dplyr)
dtmp = tibble(x = 1:4, y = 10, z = 20)
# Function to pass to mutate(across())
addx = function(col) {col + cur_data()$x}
dtmp %>% mutate(across(c(y,z), addx))
# x y z
# <int> <dbl> <dbl>
#1 1 11 21
#2 2 12 22
#3 3 13 23
#4 4 14 24
If you need the function to reference a grouping variable, use cur_data_all() instead.
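If defining the function inline is acceptable, a sketch that does not rely on cur_data() is to use a lambda inside across(). The lambda is evaluated with the data mask, so x refers to the column of the current data frame (or current group).

```r
library(dplyr)

dtmp <- tibble(x = 1:4, y = 10, z = 20)

# .x is the column being transformed; x is looked up in the current data
res <- dtmp %>% mutate(across(c(y, z), ~ .x + x))
res
```

This trades the reusable external function for a shorter call that needs no context helper.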
R: Use dplyr::mutate/dplyr::transmute with a function which acts on an entire row
I think you're running into a dimension error.
If I do
library(dplyr)
transmute(head(women, n=10),
some_index=calc_some_index(head(women,10)))
then it works (the error in your code complained about differing sizes).
Alternatively, you could use the pipe and it works:
head(women, 10) %>%
transmute(calc_some_index(.))
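The underlying rule is that the value returned inside transmute must have one element per row of the data being transformed. A minimal sketch follows; calc_some_index is not shown in the question, so the definition below is a hypothetical stand-in that takes a whole data frame and returns one value per row.

```r
library(dplyr)

# Hypothetical stand-in for calc_some_index(); takes a data frame, returns one value per row
calc_some_index <- function(df) df$weight / df$height^2

# The function sees the same 10 rows that transmute() operates on, so sizes match
res <- head(women, 10) %>%
  transmute(some_index = calc_some_index(.))
nrow(res)
```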
R: applying custom function row by row with mutate()
Edit 7 July:
From your comments I understand you were looking for something different, the assumption I made about why your function was giving multiple values was wrong. Hence this new answer from scratch:
The custom function you've written doesn't lend itself to row-by-row application, because it already processes all rows at once:
Given the following input:
congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)
point_geo_test contains these values:
> point_geo_test
[...]
congress geometry
1 104 POINT (-122.0095 37.32935)
2 111 POINT (-122.0095 37.32935)
3 104 POINT (73.72036 41.1134)
4 111 POINT (73.72036 41.1134)
5 104 POINT (-87.86885 42.15549)
6 111 POINT (-87.86885 42.15549)
and extract_district() returns this:
> extract_district(point_geo_test, 104)
[...]
[1] "California-14" "California-14" "NA-NA" "NA-NA" "Illinois-10" "Illinois-10"
This is already a result for each row. The only problem is that, while these are the correct results for the coordinates of each row, they are the names for those coordinates only during congress 104. Hence, these values are only valid for the rows in point_geo_test where congress == 104.
Extracting correct values for all rows
We will create a function that returns the correct data for all rows, i.e. the correct name for the coordinates during the associated congress.
I've simplified your code slightly: df_test is no longer an intermediate data frame, but is defined directly in the creation of point_geo_test. Any values I extract, I'll save into this data frame as well.
library(tidyverse)
library(sf)
sf_use_s2(FALSE)
districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")
congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)
point_geo_test <- st_as_sf(data.frame(congress, latitude, longitude),
coords = c(x = "longitude", y = "latitude"),
crs = st_crs(districts_104))
To keep the code more flexible and organized, I'll create a generic function that can fetch any parameter for the given coordinates:
extract_values <- function(points, parameter) {
# initialize return values, one for each row in `points`
values <- rep(NA, nrow(points))
# for each congress present in `points`, lookup parameter and store in the rows with matching congress
for(cong in unique(points$congress)) {
shapefile <- get(paste0("districts_", cong))
st_join_results <- st_join(points, shapefile, join = st_within)
values[points$congress == cong] <- st_join_results[[parameter]][points$congress == cong]
}
return(values)
}
Examples:
> extract_values(point_geo_test, 'STATENAME')
[1] "California" "California" NA NA "Illinois" "Illinois"
> extract_values(point_geo_test, 'DISTRICT')
[1] "14" "15" NA NA "10" "10"
Storing values
point_geo_test$state <- extract_values(point_geo_test, 'STATENAME')
point_geo_test$district <- extract_values(point_geo_test, 'DISTRICT')
point_geo_test$name <- paste(point_geo_test$state, point_geo_test$district, sep = "-")
Result:
> point_geo_test
Simple feature collection with 6 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -122.0095 ymin: 37.32935 xmax: 73.72036 ymax: 42.15549
Geodetic CRS: GRS 1980(IUGG, 1980)
congress state district name geometry
1 104 California 14 California-14 POINT (-122.0095 37.32935)
2 111 California 15 California-15 POINT (-122.0095 37.32935)
3 104 <NA> <NA> NA-NA POINT (73.72036 41.1134)
4 111 <NA> <NA> NA-NA POINT (73.72036 41.1134)
5 104 Illinois 10 Illinois-10 POINT (-87.86885 42.15549)
6 111 Illinois 10 Illinois-10 POINT (-87.86885 42.15549)