Adding Default Values to Item X Group Pairs That Don't Have a Value (Df %>% Spread %>% Gather Seems Strange)

The complete function in tidyr does this (when this answer was written it was only in the development version; it has since been released).

df1 %>% complete(itemid, groupid, fill = list(value = 0))
##    itemid groupid value
## 1       1     one     3
## 2       1     two     0
## 3       2     one     2
## 4       2     two     0
## 5       3     one     1
## 6       3     two     0
## 7       4     one     0
## 8       4     two     2
## 9       5     one     0
## 10      5     two     3
## 11      6     one    22
## 12      6     two     1
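The call above can be tried end to end with a small input. This is a minimal sketch: df1 is not shown in the original, so the version below is a hypothetical reconstruction from the non-zero rows of the output, and the name filled is introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical input, reconstructed from the non-zero rows of the output above
df1 <- tibble(
  itemid  = c(1, 2, 3, 4, 5, 6, 6),
  groupid = c("one", "one", "one", "two", "two", "one", "two"),
  value   = c(3, 2, 1, 2, 3, 22, 1)
)

# complete() adds every itemid x groupid pair that is missing from df1,
# and fill sets value to 0 in the rows it adds
filled <- df1 %>%
  complete(itemid, groupid, fill = list(value = 0))

filled
```

Pairs that already exist keep their values; only the rows that complete adds receive the fill value.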

Spread/Gather Error: Must supply a symbol or a string as argument

The following approach works by keeping the data in long form until you want to view it in wide form at the end. The basic approach is:

library(dplyr)
library(tidyr)
library(lubridate)

df <- tribble(
~Timestamp, ~area, ~count, ~type,
"2019-08-28 00:30:00", "area1", 4, "A",
"2019-08-28 00:30:01", "area1", 1, "B",
"2019-08-28 00:30:02", "area1", 8, "C",
"2019-08-28 00:30:03", "area2", 8, "A",
"2019-08-28 00:30:04", "area2", 1, "B",
"2019-08-28 00:30:04", "area2", 8, "C",
"2019-08-28 00:30:06", "area3", 18, "A")

df$Timestamp <- ymd_hms(df$Timestamp)
# Timestamp is already parsed, so only the date part needs extracting
df$date <- as_date(df$Timestamp)
df$area <- factor(df$area)
df$type <- factor(df$type)

df %>%
  group_by(date, area, type) %>%
  summarize(count = sum(count)) %>%
  spread(key = type, value = count)

# # A tibble: 3 x 5
# # Groups:   date, area [3]
# date       area      A     B     C
# <date>     <fct> <dbl> <dbl> <dbl>
# 2019-08-28 area1     4     1     8
# 2019-08-28 area2     8     1     8
# 2019-08-28 area3    18    NA    NA
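The NA cells for area3 in the output above are area/type pairs with no rows. If zeros are wanted instead, one option is to fill while widening. The sketch below assumes tidyr >= 1.0.0 for pivot_wider; the names counts and wide are introduced here for illustration, and the data is the summarised form of the example above.

```r
library(dplyr)
library(tidyr)

# Same counts as the example above, already reduced to one row per date/area/type
counts <- tibble(
  date  = as.Date("2019-08-28"),
  area  = c("area1", "area1", "area1", "area2", "area2", "area2", "area3"),
  type  = c("A", "B", "C", "A", "B", "C", "A"),
  count = c(4, 1, 8, 8, 1, 8, 18)
)

# values_fill supplies the value for cells that have no matching row
wide <- counts %>%
  pivot_wider(names_from = type, values_from = count,
              values_fill = list(count = 0))

wide
```

With the older interface, spread(key = type, value = count, fill = 0) should give the same effect.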

How to complete missing data in R

You can expand using factor levels in complete:

tidyr::complete(x, Name = factor(Name, levels = c('John', 'Dora')), 
                   fill = list(Age = 0))
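To see what this does, here is a minimal sketch with a hypothetical one-row x (the original data is not shown); completed is a name introduced here.

```r
library(tidyr)

# Hypothetical data: only John is present
x <- data.frame(Name = "John", Age = 30, stringsAsFactors = FALSE)

# Turning Name into a factor with explicit levels makes complete()
# add a row for every level that has no data, here "Dora"
completed <- complete(x, Name = factor(Name, levels = c("John", "Dora")),
                      fill = list(Age = 0))

completed
```

The levels vector decides which names must appear, so a name that is absent from x still gets a row, with Age set to the fill value.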

How to complete the missing values of the long form data frame based on reference vectors

We can use complete:

library(tidyr)
library(dplyr)
complete(df, source = complete_source, day = complete_day, fill = list(score = 0))
# A tibble: 12 x 3
#   source day   score
#   <chr>  <chr> <dbl>
# 1 a      D1       10
# 2 a      D2        0
# 3 a      D3        0
# 4 a      D4        0
# 5 b      D1        0
# 6 b      D2        5
# 7 b      D3        3
# 8 b      D4        0
# 9 c      D1        0
#10 c      D2        0
#11 c      D3        0
#12 c      D4        0

Or build every combination with crossing and join the data back on:

crossing(source = complete_source, day = complete_day) %>% 
        left_join(df, by = c("source", "day")) %>%
        mutate(score = replace_na(score, 0))
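The two tidyr approaches can be compared side by side. This is a sketch under assumptions: df, complete_source and complete_day are not shown in the original, so the versions below are hypothetical reconstructions from the output shown earlier, and via_complete and via_join are names introduced here.

```r
library(dplyr)
library(tidyr)

# Hypothetical inputs, reconstructed from the output shown earlier
df <- tibble(
  source = c("a", "b", "b"),
  day    = c("D1", "D2", "D3"),
  score  = c(10, 5, 3)
)
complete_source <- c("a", "b", "c")
complete_day    <- c("D1", "D2", "D3", "D4")

# Route 1: let complete() expand to the reference vectors and fill
via_complete <- complete(df, source = complete_source, day = complete_day,
                         fill = list(score = 0))

# Route 2: build all combinations first, then join the scores on
via_join <- crossing(source = complete_source, day = complete_day) %>%
  left_join(df, by = c("source", "day")) %>%
  mutate(score = replace_na(score, 0))

# Compare the two results
all.equal(as.data.frame(via_complete), as.data.frame(via_join))
```

The join route is a little longer but makes the intermediate grid explicit, which can help when the grid needs filtering before the join.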

In base R, this can be done with expand.grid/merge

transform(merge(expand.grid(source = complete_source, 
      day = complete_day), df, all.x = TRUE), 
      score = replace(score, is.na(score), 0))

How to expand a large dataframe in R

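expand.grid is a useful function here:

mergedData <- merge(
    expand.grid(id = unique(df$id), spp = unique(df$spp)),
    df, by = c("id", "spp"), all = TRUE)

# Pairs that were missing from df get y = 0
mergedData$y[is.na(mergedData$y)] <- 0

# Assumes date and spp are factors, with one date level per id
# in the order merge() sorts the ids
mergedData$date <- rep(levels(df$date),
                       each = length(levels(df$spp)))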

Since you're not actually doing anything to subsets of the data, I don't think plyr will help. There may be more efficient ways to do this with data.table.
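A small base R sketch of the expand.grid/merge step, using hypothetical data (the original df is not shown) and leaving out the date column; all_pairs and merged are names introduced here.

```r
# Hypothetical data: two ids, three species, some pairs missing
df <- data.frame(
  id  = c(1, 1, 2),
  spp = c("a", "b", "c"),
  y   = c(5, 2, 7),
  stringsAsFactors = FALSE
)

# Every id x spp combination, whether or not it occurs in df
all_pairs <- expand.grid(id = unique(df$id), spp = unique(df$spp),
                         stringsAsFactors = FALSE)

# Keep all combinations; pairs absent from df come back with y = NA
merged <- merge(all_pairs, df, by = c("id", "spp"), all = TRUE)

# Replace the NAs introduced by the merge with the default value
merged$y[is.na(merged$y)] <- 0

merged
```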

Add rows to grouped data with dplyr?

Without dplyr it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))

giving:

       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0

and this can be rewritten as a magrittr or dplyr pipeline like this:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

The as.data.frame() at the end could be omitted if a wide form solution were desired.
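As a runnable sketch of both forms: the input below is hypothetical (the original data is not shown) and contains only the non-zero demands from the output above; wide and long are names introduced here.

```r
# Hypothetical data containing only the non-zero demands from the output above
data <- data.frame(
  Week    = c("2013-W01", "2013-W02", "2013-W03", "2013-W04", "2013-W04", "2013-W01"),
  Article = c(10004, 10004, 10004, 10004, 10006, 10007),
  Demand  = c(1215, 900, 774, 1170, 5, 2)
)

# Week x Article table of summed Demand; combinations with no rows are 0
wide <- xtabs(Demand ~ Week + Article, data)

# Long form with one row per Week/Article pair and the sum in Freq
long <- as.data.frame(wide)

long
```

Because xtabs sums Demand within each cell, Week/Article pairs that never occur in the data come out as 0 without any explicit fill step.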

Sum across multiple columns with dplyr

dplyr >= 1.0.0 using across

sum up each row using rowSums (rowwise works for any aggregation, but is slower)

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(across(where(is.numeric))))

sum down each column

df %>%
   summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
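A small sketch of both directions on data with missing values, assuming dplyr >= 1.0.0. The data frame and the names row_totals and col_totals are hypothetical, and na.rm = TRUE is used here in place of replacing the NAs first.

```r
library(dplyr)

# Hypothetical all-numeric data with some missing values
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))

# Sum across each row, ignoring NAs
row_totals <- df %>%
  mutate(total = rowSums(across(where(is.numeric)), na.rm = TRUE))

# Sum down each column, ignoring NAs
col_totals <- df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

row_totals
col_totals
```

Dropping NAs inside the sum leaves the original columns untouched, whereas replace(is.na(.), 0) overwrites them with zeros; which is preferable depends on whether the NAs need to stay visible afterwards.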

dplyr < 1.0.0

sum up each row

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(.[1:5]))

sum down each column using the superseded summarise_all:

df %>%
   replace(is.na(.), 0) %>%
   summarise_all(sum)
