loess regression on each group with dplyr::group_by()
You may have already figured this out -- but if not, here's some help.
Basically, you need to feed the predict function a data.frame (a vector may work too but I didn't try it) of the values you want to predict at.
So for your case:
fems <- fems %>%
group_by(CpG) %>%
arrange(CpG, AVGMOrder) %>%
mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = .5, data=.),
data.frame(AVGMOrder = seq(min(AVGMOrder), max(AVGMOrder), 1))))
Note, loess requires a minimum number of observations to run (~4? I can't remember precisely). Also, this will take a while to run so test with a slice of your data to make sure it's working properly.
Predict specific x-value after loess regression on each group with dplyr::group_by()
Feels like I'm making this too simple but don't you just want...
map(model$m, ~ predict(.x, newdata = 70))
[[1]]
[1] 88.66499
[[2]]
[1] 94.66321
Reversing the direction of the prediction since it's bivariate
library(dplyr)
library(purrr)
library(tidyr)
model <- df %>%
nest(-sample) %>%
drop_na() %>%
group_by(sample) %>%
mutate(m = purrr::map(data, loess, # Perform loess calculation on each sample_long group
formula = time ~ measurement, span = 0.25), # Make span as small as possible in order to draw the nearest straighest line
fitted = purrr::map(m, `[[`, "fitted")) # Retrieve the fitted values from each model
#> Warning: Problem with `mutate()` input `m`.
#> x pseudoinverse used at 90.2
#> ℹ Input `m` is `purrr::map(data, loess, formula = time ~ measurement, span = 0.25)`.
names(model$m) <- model$sample
map(model$m, ~ predict(.x, newdata = 70))
#> $cat
#> [1] 17.08772
#>
#> $dog
#> [1] 15.03579
How can apply a loess function and get predictions by groups using dplyr in r?
If you want to get loess predictions for each country, you might want to use a nest()
ed data frame. This will let you set up a column that contains data frames for country-specific data, and then run loess()
and predict()
on those individual data frames, then unnest()
to bring the results back into a standard format.
Here's some code that nests your data, runs the analysis on each country, then pulls it back out to a regular data frame:
library(tidyverse)
data.1.with.pred <- data.1 %>%
group_by(country) %>%
arrange(country, year) %>%
nest() %>%
mutate(pred.response = purrr::map(data, function(x)stats::loess(response~year, span= 0.75, data = x) %>%
stats::predict(data.frame(year = seq(min(x$year), max(x$year), 1))))) %>%
unnest(cols = c(data, pred.response))
data.1.with.pred %>%
ggplot() +
geom_point(aes(x = year, y = response, colour = country)) +
geom_line(aes(x = year,y=pred.response, colour = country))
The resulting data frame has annual loess predictions for each country, as opposed to for all countries together, and the plot looks like this:
Is this what you were trying to do?
Loess smooth extracted values by group errors
First group by id
, then fit a model to each group's data and use broom::augment
to extract the fitted values and, as a bonus, the residuals. This preserves the inputs x
and y
; it will be easier to plot x
vs .fitted
for example.
library("tidyverse")
test %>%
group_by(id) %>%
group_modify(
# .x refers to the subset of rows that belong to a group.
# It's a smaller data frame with the same columns as the input
# but fewer rows.
~ loess(y ~ x, span = .5, data = .x) %>% broom::augment()
)
#> # A tibble: 145 × 5
#> # Groups: id [3]
#> id y x .fitted .resid
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.658 -8.10 0.834 -0.176
#> 2 1 0.738 -8.06 0.897 -0.159
#> 3 1 0.846 -8.01 0.989 -0.142
#> 4 1 0.906 -7.96 0.975 -0.0688
#> 5 1 0.964 -7.91 0.982 -0.0180
#> 6 1 0.971 -7.90 0.983 -0.0121
#> 7 1 0.962 -7.90 0.980 -0.0179
#> 8 1 0.953 -7.91 0.981 -0.0289
#> 9 1 0.943 -7.91 0.983 -0.0404
#> 10 1 0.928 -7.92 0.977 -0.0487
#> # … with 135 more rows
The original data frame might have more columns than x
and y
that are not used in the loess fit. To keep those columns, pass the group data .x
to the augment
function as well.
test %>%
mutate(
# Extra columns that we don't need for the loess fit but we want to keep.
z = rnorm(n()),
w = rnorm(n())
) %>%
group_by(id) %>%
group_modify(
# Now `broom::augment` appends .fitted and .resid to the original columns.
~ loess(y ~ x, span = .5, data = .x) %>% broom::augment(.x)
)
#> # A tibble: 145 × 7
#> # Groups: id [3]
#> id x y z w .fitted .resid
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -8.10 0.658 1.31 -0.199 0.834 -0.176
#> 2 1 -8.06 0.738 0.395 -1.74 0.897 -0.159
#> 3 1 -8.01 0.846 -1.34 0.382 0.989 -0.142
#> 4 1 -7.96 0.906 0.376 0.231 0.975 -0.0688
#> 5 1 -7.91 0.964 -0.224 0.327 0.982 -0.0180
#> 6 1 -7.90 0.971 0.678 -0.504 0.983 -0.0121
#> 7 1 -7.90 0.962 0.736 -0.0186 0.980 -0.0179
#> 8 1 -7.91 0.953 0.368 0.0313 0.981 -0.0289
#> 9 1 -7.91 0.943 -1.35 1.02 0.983 -0.0404
#> 10 1 -7.92 0.928 0.280 -0.471 0.977 -0.0487
#> # … with 135 more rows
Created on 2022-03-08 by the reprex package (v2.0.1)
Fit loess smoothers for multiple groups across multiple numeric variables
We can get the data in long format using. pivot_longer
, group_by
Animal
and column name and apply loess
to each combinaton.
library(dplyr)
library(tidyr)
TwoVarDF %>%
pivot_longer(cols = starts_with('Var')) %>%
group_by(Animal, name) %>%
mutate(model = loess(value~Day, span = 0.3)$fitted)
regression by group and retain all the columns in R
Use group_split
then loop through each group using map_dfr
to bind ID, ID2
and augment
output using bind_cols
library(dplyr)
library(purrr)
dat %>% group_split(ID3) %>%
map_dfr(~bind_cols(select(.x,ID,ID2), augment(lm(value~yearRef, data=.x))), .id = "ID3")
# A tibble: 18,576 x 12
ID3 ID ID2 value yearRef .fitted .se.fit .resid .hat .sigma .cooksd
<chr> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 196 16 -0.385 2009 -0.0406 0.0308 -0.344 1.00e-3 0.973 6.27e-5
2 1 372 47 -0.793 2012 -0.0676 0.0414 -0.726 1.81e-3 0.973 5.05e-4
3 1 470 15 -0.496 2011 -0.0586 0.0374 -0.438 1.48e-3 0.973 1.50e-4
4 1 242 40 -1.13 2010 -0.0496 0.0338 -1.08 1.21e-3 0.973 7.54e-4
5 1 471 34 1.28 2006 -0.0135 0.0262 1.29 7.26e-4 0.972 6.39e-4
6 1 434 35 -1.09 1998 0.0586 0.0496 -1.15 2.61e-3 0.973 1.82e-3
7 1 467 45 -0.0663 2011 -0.0586 0.0374 -0.00769 1.48e-3 0.973 4.64e-8
8 1 334 27 -1.37 2003 0.0135 0.0305 -1.38 9.86e-4 0.972 9.92e-4
9 1 186 25 -0.0195 2003 0.0135 0.0305 -0.0331 9.86e-4 0.973 5.71e-7
10 1 114 34 1.09 2014 -0.0857 0.0500 1.18 2.64e-3 0.973 1.94e-3
# ... with 18,566 more rows, and 1 more variable: .std.resid <dbl>
Fitting several regression models with dplyr
As of mid 2020 (and updated to fit dplyr
1.0+ as of 2022-04), tchakravarty's answer will fail. In order to circumvent the new approach of broom
and dpylr
seem to interact, the following combination of broom::tidy
, broom::augment
and broom::glance
can be used. We just have to use them in conjunvtion with nest_by()
and summarize()
(previously inside do()
and later unnest()
the tibble).
library(dplyr)
library(broom)
library(tidyr)
set.seed(42)
df.h = data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = - 10, max = 25)
)
df.h %>%
nest_by(hour) %>%
mutate(mod = list(lm(price ~ wind + temp, data = data))) %>%
summarize(tidy(mod))
# # A tibble: 72 × 6
# # Groups: hour [24]
# hour term estimate std.error statistic p.value
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 (Intercept) 87.4 15.8 5.55 0.0000289
# 2 1 wind -0.0129 0.0120 -1.08 0.296
# 3 1 temp 0.588 0.849 0.693 0.497
# 4 2 (Intercept) 92.3 21.6 4.27 0.000466
# 5 2 wind -0.0227 0.0134 -1.69 0.107
# 6 2 temp -0.216 0.841 -0.257 0.800
# 7 3 (Intercept) 61.1 18.6 3.29 0.00409
# 8 3 wind 0.00471 0.0128 0.367 0.718
# 9 3 temp 0.425 0.964 0.442 0.664
# 10 4 (Intercept) 31.6 15.3 2.07 0.0529
df.h %>%
nest_by(hour) %>%
mutate(mod = list(lm(price ~ wind + temp, data = data))) %>%
summarize(augment(mod))
# # A tibble: 504 × 10
# # Groups: hour [24]
# hour price wind temp .fitted .resid .hat .sigma .cooksd .std.resid
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 113. 288. -1.75 82.7 30.8 0.123 37.8 0.0359 0.877
# 2 1 117. 2234. 18.4 69.5 47.0 0.201 36.4 0.165 1.40
# 3 1 28.6 1438. 4.75 71.7 -43.1 0.0539 37.1 0.0265 -1.18
# 4 1 102. 366. 9.77 88.5 13.7 0.151 38.4 0.00926 0.395
# 5 1 76.6 2257. -4.69 55.6 21.0 0.245 38.2 0.0448 0.644
# 6 1 60.1 633. -3.18 77.4 -17.3 0.0876 38.4 0.00749 -0.484
# 7 1 89.4 376. -4.16 80.1 9.31 0.119 38.5 0.00314 0.264
# 8 1 8.18 1921. 19.2 74.0 -65.9 0.173 34.4 0.261 -1.93
# 9 1 78.7 575. -6.11 76.4 2.26 0.111 38.6 0.000170 0.0640
# 10 1 85.2 763. -0.618 77.2 7.94 0.0679 38.6 0.00117 0.219
# # … with 494 more rows
df.h %>%
nest_by(hour) %>%
mutate(mod = list(lm(price ~ wind + temp, data = data))) %>%
summarize(glance(mod))
# # A tibble: 24 × 13
# # Groups: hour [24]
# hour r.squared adj.r.squared sigma statistic p.value df logLik AIC
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.0679 -0.0357 37.5 0.655 0.531 2 -104. 217.
# 2 2 0.139 0.0431 42.7 1.45 0.261 2 -107. 222.
# 3 3 0.0142 -0.0953 43.1 0.130 0.879 2 -107. 222.
# 4 4 0.0737 -0.0293 36.7 0.716 0.502 2 -104. 216.
# 5 5 0.213 0.126 37.8 2.44 0.115 2 -104. 217.
# 6 6 0.0813 -0.0208 33.5 0.796 0.466 2 -102. 212.
# 7 7 0.0607 -0.0437 40.7 0.582 0.569 2 -106. 220.
# 8 8 0.153 0.0592 36.3 1.63 0.224 2 -104. 215.
# 9 9 0.166 0.0736 36.5 1.79 0.195 2 -104. 216.
# 10 10 0.110 0.0108 40.0 1.11 0.351 2 -106. 219.
# # … with 14 more rows, and 4 more variables: BIC <dbl>, deviance <dbl>,
# # df.residual <int>, nobs <int>
Credits to Bob Muenchen's Blog for inspiration on that.
How to get R^2 list when doing regression for each group separately in R
You want broom::glance
for this:
df %>%
group_by(art, shop) %>%
group_modify(~ {
fit <- my_lm(.x)
# tidy(fit)
# glance(fit)
full_join(tidy(fit), glance(fit), by = character())
}) %>%
ungroup()
Here I use a different approach, as what you're asking can be accomplished in a much more concise way. Not that the nest
-strategy is bad.
Note that the by = character()
in full_join
indicates that it should be a Cartesian product.
For loop over a grouped tibble: to run gam LOESS with specific span value for each group
Use index to store the models. In your current code, you are overwriting the models in list_mymodels
.
library(dplyr)
library(purrr)
span <- c(0.5, 0.3)
list_mymodels = vector('list', length(span))
for(i in seq_along(list_mymodels)) {
span1 <- span[[i]]
list_mymodels[[i]] <- turning_rate_4954 %>%
group_by(TEMP, SHARK, VIDEO) %>%
do(models= mygamLoess(x=.$Time_s,
y=.$turn_rate_dgs_s,
data=.,
span= span1,
degree=1))
}
Instead of for
loop, you can use map
or lapply
:
list_mymodels <- map(span, ~turning_rate_4954 %>%
group_by(TEMP, SHARK, VIDEO) %>%
do(models= mygamLoess(x=.$Time_s,
y=.$turn_rate_dgs_s,
data=.,
span= .x,
degree=1)))
If we want to apply only one model in each group we can use map2
.
list_mymodels <- turning_rate_4954 %>%
group_split(TEMP, SHARK, VIDEO) %>%
map2(span, ~mygamLoess(x=.x$Time_s,
y=.x$turn_rate_dgs_s,
data=.x,
span= .y,
degree=1))
How to perform a group_by with elements that are contiguous in R and dplyr
If you want to use only base R + tidyverse, this code exactly replicates your desired results
mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
item = c(1, 2, 2, 3, 2, 2, 2, 1))
mydf
# A tibble: 8 × 2
group item
<chr> <dbl>
1 x 1
2 x 2
3 x 2
4 y 3
5 z 2
6 x 2
7 x 2
8 z 1
runs <- rle(mydf$group)
mydf %>%
mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>%
group_by(group, run_id) %>%
summarise(item = sum(item)) %>%
arrange(run_id) %>%
select(-run_id)
Source: local data frame [5 x 2]
Groups: group [3]
group item
<chr> <dbl>
1 x 5
2 y 3
3 z 2
4 x 4
5 z 1
Related Topics
Ggplot2: Using Gtable to Move Strip Labels to Top of Panel for Facet_Grid
R Shiny, How to Make Datatable React to Checkboxes in Datatable
Automated Httr Authentication with Twitter , Provide Response to Interactive Prompt in "Batch" Mode
Keeping Only Certain Rows of a Data Frame Based on a Set of Values
Remove Text After Final Period in String
Dictionary() Is Not Supported Anymore in Tm Package. How to Emend Code
Extend an Irregular Sequence and Add Zeros to Missing Values
Convert a Printed Message into a Character Vector
Adding Percentage Labels on Pie Chart in R
R Data.Table Breaks in Exported Functions
Programmatically Insert Header and Plot in Same Code Chunk with R Markdown Using Results='Asis'
Two Y-Axes with Different Scales for Two Datasets in Ggplot2
Convert Roman Numerals to Numbers in R
Partially Color Histogram in R
Lapply to Add Columns to Each Dataframe in a List
How to Expand Axis Asymmetrically with Ggplot2 Without Setting Limits Manually