R applying a function to a subset of a data frame
> aggregate( . ~ z, data=temp, FUN=mean)
z x y
1 1 1.505304 2.474642
2 3 1.533418 2.477191
When you need to apply the same function to several columns within the categories of another column, think of 'aggregate'. This is the version that takes a formula argument: the "dot" before the tilde means "take the mean of every column other than z", and "z" after the tilde is the grouping column.
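The data frame temp is not shown in the answer, so here is a minimal sketch with made-up values. The column names x, y and z follow the output above; the numbers themselves are assumptions chosen only to make the grouping easy to follow.

```r
# Hypothetical stand-in for `temp`; the original data are not shown.
temp <- data.frame(
  z = c(1, 1, 3, 3),
  x = c(1, 2, 3, 5),
  y = c(10, 20, 30, 50)
)

# ". ~ z" means: every column other than z, grouped by the values of z.
res <- aggregate(. ~ z, data = temp, FUN = mean)
print(res)
```

Each row of res holds one value of z followed by the per-group means of x and y.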
R Apply() function on specific dataframe columns
Using an example data.frame and an example function (it just adds 1 to every value):
A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 1 1 1 1 1 1
#2 2 2 2 2 2 2 2 2 2
#3 3 3 3 3 3 3 3 3 3
#4 4 4 4 4 4 4 4 4 4
data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 2 2 2 2 2 2
#2 2 2 2 3 3 3 3 3 3
#3 3 3 3 4 4 4 4 4 4
#4 4 4 4 5 5 5 5 5 5
Or even:
data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 2 2 2 2 2 2
#2 2 2 2 3 3 3 3 3 3
#3 3 3 3 4 4 4 4 4 4
#4 4 4 4 5 5 5 5 5 5
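If the aim is to update the selected columns in place rather than build a new data frame, one possible sketch is to assign the lapply result back into the same columns. It reuses A and wifi from above; the helper name cols is only for illustration.

```r
A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

# Columns to transform; `cols` is just an illustrative name.
cols <- 4:9

# lapply returns a list with one element per column, which can be
# assigned straight back, so wifi stays a data frame.
wifi[cols] <- lapply(wifi[cols], A)
print(wifi)
```

Unlike apply, this does not go through a matrix, so it should also be safe when the untouched columns are not numeric.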
R: apply function to subsets based on column value
Using base R for the grouping (Gini() itself comes from DescTools):
library(DescTools)
lapply(split(df, df$region),
       function(x) Gini(x$income, n = rep(1, length(x$income)), unbiased = TRUE,
                        conf.level = NA, R = 1000, type = "bca", na.rm = TRUE))
Using tidyverse:
library(tidyverse)
library(DescTools)
df %>% group_by(region) %>% nest() %>%
  mutate(gini_coef = map(data, ~Gini(.x$income, n = rep(1, length(.x$income)),
    unbiased = TRUE, conf.level = NA, R = 1000, type = "bca", na.rm = TRUE))) %>%
  # naming the column avoids the warning a bare unnest() raises in tidyr 1.0.0 and later
  select(-data) %>% unnest(gini_coef) %>% left_join(df)
Joining, by = "region"
# A tibble: 10 x 4
region gini_coef ID income
<fct> <dbl> <int> <int>
1 rot 0.177 1 3700
2 rot 0.177 9 4000
3 rot 0.177 10 4400
4 rot 0.177 12 2000
5 ams 0.0698 2 2500
6 ams 0.0698 6 3100
7 ams 0.0698 8 3000
8 utr 0.154 3 3300
9 utr 0.154 4 5300
10 utr 0.154 5 4400
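The gini_coef values above can be cross-checked without DescTools. The sketch below is a hand-rolled estimate, not the package's implementation: it assumes the "unbiased" Gini is the mean-absolute-difference formula multiplied by n / (n - 1), and the function name gini_unbiased is made up for this example. It repeats the data from the Data section so that it runs on its own.

```r
df <- read.table(text = "
ID region income
1 rot 3700
2 ams 2500
3 utr 3300
4 utr 5300
5 utr 4400
6 ams 3100
8 ams 3000
9 rot 4000
10 rot 4400
12 rot 2000
", header = TRUE)

# Hypothetical helper: sample Gini coefficient from all pairwise absolute
# differences, with an n / (n - 1) small-sample correction.
# Returns NaN for groups with a single observation.
gini_unbiased <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  biased <- sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
  biased * n / (n - 1)
}

# tapply splits income by region and applies the function to each piece.
gini_by_region <- tapply(df$income, df$region, gini_unbiased)
print(round(gini_by_region, 3))
```

Rounded to three significant digits, the per-region results should agree with the gini_coef column printed above.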
Data
df <- read.table(text="
ID region income
1 rot 3700
2 ams 2500
3 utr 3300
4 utr 5300
5 utr 4400
6 ams 3100
8 ams 3000
9 rot 4000
10 rot 4400
12 rot 2000
",header=T)
Use an apply function to a subset of rows in a data frame - vectorised solution
All the other solutions assume that the function being called is vectorized. Here is another one for when that is not the case:
# seq_len() is safe even when df.data has zero rows, unlike 1:nrow(df.data)
sapply(seq_len(nrow(df.data)), function(x) {
  fnATimesB(df.data[x, 'days'], df.data[x, 'sal'])
})
Alternatively, you can use apply here and avoid the anonymous function call, while slightly modifying your original function instead. The only thing to remember is that apply converts the data set to a matrix, so the input should not contain non-numeric columns (which is why the first column is dropped below). Here is an example:
fnATimesB <- function(df, a, b) {
df[a] * df[b]
}
apply(df.data[-1L], 1L, fnATimesB, a = 'days', b = 'sal')
## [1] 1000 12000 25000
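Another way to call a non-vectorized function row by row is mapply, which walks over several vectors in parallel. This is only a sketch: df.data is not shown in the answer, so the values below are invented to reproduce the products above, and this two-argument fnATimesB is a hypothetical stand-in for the original function.

```r
# Hypothetical data; only the column names days and sal come from the answer.
df.data <- data.frame(
  name = c("a", "b", "c"),
  days = c(10, 20, 25),
  sal  = c(100, 600, 1000)
)

# Hypothetical non-vectorized function: the scalar `if` means it cannot be
# called on whole columns at once.
fnATimesB <- function(a, b) {
  if (a <= 0) return(NA_real_)
  a * b
}

# mapply calls fnATimesB once per row, pairing days[i] with sal[i].
res <- mapply(fnATimesB, df.data$days, df.data$sal)
print(res)
```

Because only the two needed columns are passed, the character column name does not get in the way as it would with apply.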
apply function to subsets of dataframe r
You could use the dplyr package, as follows perhaps?
library(dplyr)
data1 %>%
group_by(Meteostation, Year) %>%
do(data.frame(biovars(.$pr, .$tasmin, .$tasmax)))
Apply custom function to each subset of a data frame and result a dataframe
dplyr
You could use do in dplyr:
library(dplyr)
df %>%
group_by(sample_id) %>%
do(f.get_reg(.))
Which gives:
sample_id N slope intercept S
(int) (int) (dbl) (dbl) (dbl)
1 6724 3 -0.08518211 26.12125 7.716050e-15
2 6728 3 -0.22387160 41.41037 5.551115e-17
data.table
Use .SD in data.table:
library(data.table)
df <- data.table(df)
df[,f.get_reg(.SD),sample_id]
Which gives the same result:
sample_id N slope intercept S
1: 6724 3 -0.08518211 26.12125 7.716050e-15
2: 6728 3 -0.22387160 41.41037 5.551115e-17
base R
Using by:
resultList <- by(df,df$sample_id,f.get_reg)
sample_id <- names(resultList)
result <- do.call(rbind,resultList)
result$sample_id <- sample_id
rownames(result) <- NULL
Which gives:
N slope intercept S sample_id
1 3 -0.08518211 26.12125 7.716050e-15 6724
2 3 -0.22387160 41.41037 5.551115e-17 6728
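Neither df nor f.get_reg is shown in the answer, so the base R variant can be tried with the following sketch. The data and this version of f.get_reg are assumptions: it fits a simple linear model per group and returns a one-row data frame, and it leaves out the S column because its definition is not given.

```r
# Hypothetical data: two samples with three points each.
df <- data.frame(
  sample_id = rep(c(6724, 6728), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 5, 4, 3)
)

# Hypothetical stand-in for f.get_reg: one row of regression results per subset.
f.get_reg <- function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(
    N = nrow(d),
    slope = unname(coef(fit)["x"]),
    intercept = unname(coef(fit)["(Intercept)"])
  )
}

# by() calls f.get_reg on each subset; rbind stacks the one-row results.
resultList <- by(df, df$sample_id, f.get_reg)
result <- do.call(rbind, resultList)
result$sample_id <- names(resultList)
rownames(result) <- NULL
print(result)
```

The important design point is that the custom function returns a data frame with the same columns for every group, which is what lets do, .SD and rbind combine the pieces.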
creating a function to subset data frame in R
We can use [[ inside a function:
f1 <- function(id){
df[df[["ID"]] == id,]
}
f1(11)
# ID Item
#1 11 a
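f1 looks up df in the calling environment. A slightly more general sketch passes the data frame and the column name as arguments. The data below are made up apart from the row with ID 11, and the names subset_by_id and id_col are illustrative only.

```r
# Hypothetical data; only the row with ID 11 and Item "a" appears in the answer.
df <- data.frame(ID = c(11, 12, 13), Item = c("a", "b", "c"))

# [[ extracts the column by its name held in a string, so the column
# to filter on can be chosen by the caller.
subset_by_id <- function(data, id, id_col = "ID") {
  data[data[[id_col]] == id, , drop = FALSE]
}

res <- subset_by_id(df, 11)
print(res)
```

Using drop = FALSE keeps the result a data frame even if the input has a single column.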
Apply function to subsets of dataset
Here is one option:
- Use ceiling to round up the time values.
- For each ID and year, calculate the average value.
- Use complete to create the missing year values and fill to carry forward the average value.
library(dplyr)
library(tidyr)
df %>%
  group_by(ID, year = ceiling(time)) %>%
  summarise(mean_value = mean(value)) %>%
  complete(year = min(year):max(year)) %>%
  fill(mean_value) %>%
  ungroup()
# ID year mean_value
# <int> <dbl> <dbl>
#1 1 1 4
#2 1 2 4
#3 1 3 6
#4 1 4 12
#5 2 1 3
#6 2 2 6
#7 2 3 8
#8 3 1 1.7
#9 3 2 5
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L), time = c(0.1, 0.5, 2.1, 3.3, 0.3, 0.4,
0.6, 1.2, 1.5, 2.6, 2.7, 0.1, 0.4, 1.3, 1.5, 1.6), value = c(3,
5, 6, 12, 1, 3, 5, 4, 8, 2, 14, 1.1, 2.3, 6, 3, 6)),
class = "data.frame", row.names = c(NA, -16L))
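The first two steps (rounding up and averaging per ID and year) can also be sketched in base R with aggregate. This version deliberately stops before the complete and fill steps, so the missing year 2 for ID 1 is not filled in; the data are the same as above, written more compactly.

```r
df <- data.frame(
  ID = rep(1:3, times = c(4, 7, 5)),
  time = c(0.1, 0.5, 2.1, 3.3, 0.3, 0.4, 0.6, 1.2, 1.5, 2.6, 2.7,
           0.1, 0.4, 1.3, 1.5, 1.6),
  value = c(3, 5, 6, 12, 1, 3, 5, 4, 8, 2, 14, 1.1, 2.3, 6, 3, 6)
)

# Step 1: round each time up to the year it falls in.
df$year <- ceiling(df$time)

# Step 2: mean of value for every ID and year combination present in the data.
means <- aggregate(value ~ ID + year, data = df, FUN = mean)
means <- means[order(means$ID, means$year), ]
print(means)
```

The result has one row per ID and year that actually occurs, which is the table that complete and fill then extend in the tidyverse version.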