Group_By() into Fill() Not Working as Expected


This appears to have been fixed in the development version of tidyr. With tidyr_0.3.1.9000, `fill()` respects the grouping, so you get the expected result per id.

df %>% group_by(id) %>% fill(email)

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA
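Below is a minimal reproducible sketch of this behavior. It assumes a current tidyr (1.0 or later) and a hypothetical `df` shaped like the output above. A grouped `fill()` stops at group boundaries. An ungrouped `fill()` carries the last value into the next group.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: id 3 has no email at all
df <- tibble(
  id    = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA)
)

# Grouped: values are only carried forward within each id
grouped <- df %>% group_by(id) %>% fill(email) %>% ungroup()

# Ungrouped: joe's address leaks into id 3
ungrouped <- df %>% fill(email)

grouped$email
ungrouped$email
```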

Unable to use tidyselect `everything()` in combination with `group_by()` and `fill()`

You can do:

df %>%
 group_by(x1) %>%
 fill(-x1, .direction = "updown")

  x1       x2    x3
  <chr> <dbl> <dbl>
1 A         8     3
2 A         8     6
3 A         8     5
4 B         5     9
5 B         5     1
6 B         5     9

This behavior is documented in tidyr's `fill()` help page (see also the comment from @Gregor):

You can supply bare variable names, select all variables between x and z with x:z, exclude y with -y.
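Here is a self-contained version of the answer's approach. It uses hypothetical data in the shape of the output above and assumes tidyr 1.0 or later, which provides `.direction = "updown"`. The `-x1` selection covers every column except the grouping column.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with gaps in x2 only
df <- tibble(
  x1 = c("A", "A", "A", "B", "B", "B"),
  x2 = c(NA, 8, NA, 5, NA, NA),
  x3 = c(3, 6, 5, 9, 1, 9)
)

out <- df %>%
  group_by(x1) %>%
  # -x1 selects every column except the grouping column
  fill(-x1, .direction = "updown") %>%
  ungroup()

out
```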

How to groupby and back-fill only certain groups

One option is to create a second column that copies `return` only for the groups you want to fill (and is NA elsewhere). Back-fill that column within each group, then use `coalesce()` to combine the two columns.

library(tidyverse)

df %>% 
  mutate(return2 = ifelse(firms %in% c("B", "C"), return, NA)) %>% 
  group_by(firms) %>% 
  fill(return2, .direction="up") %>% 
  mutate(return = coalesce(return, return2)) %>% 
  select(-return2)

Another option is to fill a data frame that contains only the groups you want and join it back to the original data frame. Then `coalesce()` the two resulting columns that start with "return" (`return.x` and `return.y`).

df %>% 
  filter(firms != "A") %>% 
  group_by(firms) %>% 
  fill(return, .direction="up") %>% 
  left_join(df, ., by = c("date", "firms")) %>% 
  mutate(return = coalesce(!!!select(., starts_with("return")))) %>% 
  select(-c(return.x, return.y))

Another option is to split the data frame by group into a list of tibbles, fill only the selected groups, and bind them back together.

df %>%
    group_split(firms, .keep = TRUE) %>%              # list of tibbles, one per firm: A, B, C
    map_at(2:3, fill, return, .direction = "up") %>%  # back-fill only B and C
    bind_rows()

Output

   date firms return
  <int> <chr>  <int>
1  1999 A          5
2  2000 A         NA
3  2001 A          6
4  1999 B          9
5  2000 B         10
6  2001 B         10
7  1999 C          8
8  2000 C          3
9  2001 C          3
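The following variant is not from the original answer. It avoids the join by binding the untouched groups back on and re-sorting. The data below is hypothetical, chosen to be consistent with the output above. The sketch assumes rows are uniquely identified and ordered by `firms` and `date`.

```r
library(dplyr)
library(tidyr)

# Hypothetical data consistent with the output above
df <- tibble(
  date   = rep(1999:2001, times = 3),
  firms  = rep(c("A", "B", "C"), each = 3),
  return = c(5L, NA, 6L, 9L, NA, 10L, 8L, NA, 3L)
)
to_fill <- c("B", "C")

out <- df %>%
  filter(firms %in% to_fill) %>%                  # only the groups to back-fill
  group_by(firms) %>%
  fill(return, .direction = "up") %>%
  ungroup() %>%
  bind_rows(filter(df, !firms %in% to_fill)) %>%  # add the untouched groups back
  arrange(firms, date)                            # restore the original row order

out
```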

group_by function is not working with another group_by

Since both steps use the same groups, there is no need to group twice. Group once and calculate `hr_rain` and `RAINFALL` in the same `mutate()` call:

library(dplyr)

df %>% 
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>% 
  mutate(hr_rain = zoo::na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE), 
         RAINFALL = hr_rain - lag(hr_rain, default = 0))

data

df <- structure(list(STATION = c("SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA"), CODE = c(163, 163, 163, 163, 163, 163, 163, 163, 
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 
163, 163, 163), DATE = c("06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/10/19", "06/10/19", "06/10/19", "06/10/19", 
"06/10/19", "06/10/19", "06/10/19"), HOUR = c("00", "04", "05", 
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "03", "05", "06", "07", "08", "09", "10"), 
hr_rain = c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)), row.names = c(NA, 
-24L), class = "data.frame")
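The grouping trick here is `cumsum(HOUR == '09')`, which starts a new group at every 09 reading. Because `mutate()` evaluates its arguments in order, the `RAINFALL` line already sees the interpolated `hr_rain`. Below is a small sketch of the grouping and `lag()` step on hypothetical readings. It leaves out the `zoo::na.approx()` interpolation.

```r
library(dplyr)

# Hypothetical readings spanning one 09 boundary
readings <- tibble(
  HOUR    = c("07", "08", "09", "10", "11"),
  hr_rain = c(1, 2, 0, 0.5, 1.5)
)

out <- readings %>%
  # cumsum() increments the group id at every "09" row: 0, 0, 1, 1, 1
  group_by(gr = cumsum(HOUR == "09")) %>%
  # lag() restarts in each group, so a group's first row uses default = 0
  mutate(RAINFALL = hr_rain - lag(hr_rain, default = 0)) %>%
  ungroup()

out
```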

fill missing categorical values using dplyr group_by

Within each id, you can `coalesce()` the value column with its unique non-missing value when there is exactly one (`n_distinct(na.omit(value)) == 1`). Otherwise, coalesce it with NA, which leaves the column as is:

incomplete_table %>% 
    group_by(id) %>% 
    mutate(value = coalesce(value, if (n_distinct(na.omit(value)) == 1) na.omit(value)[1] else NA_character_))

# A tibble: 7 x 2
# Groups:   id [3]
#     id value
#  <dbl> <chr>
#1     1     a
#2     1     a
#3     2     b
#4     2     b
#5     3     c
#6     3     d
#7     3  <NA>
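The same rule can be wrapped in a small reusable function. `fill_if_unique()` is a hypothetical helper, not part of dplyr, and the data is hypothetical, chosen to match the output above. The helper fills NAs only when the non-missing values in the group agree, and it works for any column type.

```r
library(dplyr)

# Hypothetical helper: fill NAs only when the non-missing values agree
fill_if_unique <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 1) replace(x, is.na(x), vals) else x
}

# Hypothetical data matching the output above
incomplete_table <- tibble(
  id    = c(1, 1, 2, 2, 3, 3, 3),
  value = c("a", NA, NA, "b", "c", "d", NA)
)

out <- incomplete_table %>%
  group_by(id) %>%
  mutate(value = fill_if_unique(value)) %>%
  ungroup()

out
```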

Filling missing value in group

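An alternative solution, though it relies on a few assumptions. Each group must have at most one distinct non-missing V2. Also, `arrange()` sorts NA values last, so `V2[1]` picks up the known value:

library(dplyr)
y %>%
  group_by(V1) %>%
  # sort within each group; NAs go last, so V2[1] is the known value
  arrange(V2, .by_group = TRUE) %>%
  mutate(V2 = V2[1])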
# Source: local data frame [9 x 2]
# Groups: V1 [3]
#      V1    V2
#   (chr) (int)
# 1     A     1
# 2     A     1
# 3     A     1
# 4     B     2
# 5     B     2
# 6     B     2
# 7     C    NA
# 8     C    NA
# 9     C    NA
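With tidyr 1.0 or later (assumed here), `fill()` with `.direction = "downup"` gives the same result without reordering rows. Like the approach above, it assumes at most one distinct non-missing V2 per group. The data is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: at most one known V2 per group
y <- tibble(
  V1 = rep(c("A", "B", "C"), each = 3),
  V2 = c(NA, 1L, NA, 2L, NA, NA, NA, NA, NA)
)

out <- y %>%
  group_by(V1) %>%
  # fill downwards, then upwards, without crossing group boundaries
  fill(V2, .direction = "downup") %>%
  ungroup()

out
```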

group_by and fill specific rows based on capitalised row observations

You can replace each capitalized row with the sum of the non-capitalized rows in its group, where a new group starts at every capitalized row:

# `data` is the data frame from the question
data %>% 
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  mutate(REGIONNAME = first(RegionName)) %>% 
  relocate(REGIONNAME, .before = RegionName) %>% 
  # Replace each capitalized row with the sum of the other rows in its group
  mutate(across(starts_with("Año"), 
                ~ ifelse(REGIONNAME == RegionName, sum(.x[REGIONNAME != RegionName], na.rm = TRUE), .x)))

# A tibble: 10 x 6
# Groups:   grp [3]
   REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3`   grp
   <chr>      <chr>             <dbl>        <dbl>        <dbl> <int>
 1 ANDALUCÍA  ANDALUCÍA           210          274          156     1
 2 ANDALUCÍA  Almería              NA           NA           NA     1
 3 ANDALUCÍA  Abla                 58           61           54     1
 4 ANDALUCÍA  Abrucena              6            2            1     1
 5 ANDALUCÍA  Adra                146          211          101     1
 6 ALBÁNCHEZ  ALBÁNCHEZ            35           68           37     2
 7 ALBÁNCHEZ  Alboloduy             2            2            2     2
 8 ALBÁNCHEZ  Albox                33           66           35     2
 9 ALCOLEA    ALCOLEA               1            1            2     3
10 ALCOLEA    Alcóntar              1            1            2     3
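Below is a variant of the same idea on hypothetical data, with ASCII column names standing in for the `Año` columns. Each group begins at its header row, so the header can be located with `row_number() == 1` instead of the helper `REGIONNAME` column. This assumes the data starts with a header row.

```r
library(dplyr)

# Hypothetical data: all-caps names are region headers, the rest are towns
data <- tibble(
  RegionName = c("NORTE", "Alpha", "Beta", "SUR", "Gamma"),
  y2004_1    = c(NA, 3, 4, NA, 10),
  y2004_2    = c(NA, 1, NA, NA, 2)
)

out <- data %>%
  # a new group starts at every all-caps (header) row
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  # the header is its group's first row; replace it with the sum of the rest
  mutate(across(starts_with("y2004"),
                ~ if_else(row_number() == 1, sum(.x[-1], na.rm = TRUE), .x))) %>%
  ungroup()

out
```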

Using group_map to create multiple plots: fill color by each group?

Interesting question; I think this is a possible solution, taking the color for each plot from the group key `.y`:

library(tidyverse)

mtcars %>% 
  group_by(cyl) %>%
  group_map(
    .f = ~ ggplot(.x, aes(x = mpg, y = disp, color = factor(.y$cyl))) +
      geom_point() +
      scale_color_manual(values = c("4" = "purple", "6" = "firebrick3", "8" = "deepskyblue"))
  )
#> [[1]]
#> [[2]]
#> [[3]]

(Three scatter plots of disp against mpg, one per cyl group.)

Created on 2022-03-22 by the reprex package (v2.0.1)

Does that solve your problem?
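`.y$cyl` is needed because `group_map()` passes each group's rows as `.x` and the group key as a one-row tibble `.y`. By default (`.keep = FALSE`), the grouping column is dropped from `.x`. The sketch below assumes dplyr 1.0 or later and shows both points. It also includes a variant that keeps `cyl` in `.x` with `.keep = TRUE`, so it can be mapped like any other column.

```r
library(dplyr)
library(ggplot2)

# .y holds the group key; by default cyl is not a column of .x
keys    <- mtcars %>% group_by(cyl) %>% group_map(~ .y$cyl)
has_cyl <- mtcars %>% group_by(cyl) %>% group_map(~ "cyl" %in% names(.x))

# With .keep = TRUE, cyl stays in .x and can be used inside aes()
plots <- mtcars %>%
  group_by(cyl) %>%
  group_map(
    ~ ggplot(.x, aes(x = mpg, y = disp, color = factor(cyl))) +
        geom_point() +
        scale_color_manual(values = c("4" = "purple", "6" = "firebrick3", "8" = "deepskyblue")),
    .keep = TRUE
  )
```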

tidyr: group_by() %>% mutate(foo = fill())

It seems you need the first LET of each group. You can extract the first element of LET within each group with `first()`; `mutate()` then recycles that single value across all rows of the group:

df %>% group_by(id, grp) %>% mutate(grp_LET = first(LET))

# A tibble: 17 x 4
# Groups:   id, grp [5]
#      id   grp   LET grp_LET
#   <int> <dbl> <chr>   <chr>
# 1     0     0     A       A
# 2     0     0     B       A
# 3     0     0     B       A
# 4     0     1     B       B
# 5     0     1     B       B
# 6     0     1     A       B
# 7     0     1     A       B
# 8     1     0     A       A
# 9     1     0     B       A
#10     1     1     B       B
#11     1     1     B       B
#12     1     1     A       B
#13     1     1     A       B
#14     1     1     A       B
#15     1     2     A       A
#16     1     2     B       A
#17     1     2     B       A
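`fill()` takes a data frame rather than a vector, so it cannot be called inside `mutate()`. It can still reach the same result as a separate step. The sketch below compares both routes on hypothetical data in the shape of the output above. The fill route keeps LET on each group's first row and fills downwards within the group.

```r
library(dplyr)
library(tidyr)

# Hypothetical data in the shape of the output above
df <- tibble(
  id  = c(0L, 0L, 0L, 1L, 1L),
  grp = c(0, 0, 1, 0, 0),
  LET = c("A", "B", "B", "B", "A")
)

# first(): one value per group, recycled to every row by mutate()
with_first <- df %>%
  group_by(id, grp) %>%
  mutate(grp_LET = first(LET)) %>%
  ungroup()

# fill() as its own step: keep LET on the first row, then fill down per group
with_fill <- df %>%
  group_by(id, grp) %>%
  mutate(grp_LET = if_else(row_number() == 1, LET, NA_character_)) %>%
  fill(grp_LET) %>%
  ungroup()
```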
