Adding Default Values to Item x Group Pairs That Don't Have a Value (df %>% spread %>% gather Seems Strange)

The complete function in tidyr does this. It was new in the development version of tidyr when this answer was written and has since shipped in the released package.

df1 %>% complete(itemid, groupid, fill = list(value = 0))
## itemid groupid value
## 1 1 one 3
## 2 1 two 0
## 3 2 one 2
## 4 2 two 0
## 5 3 one 1
## 6 3 two 0
## 7 4 one 0
## 8 4 two 2
## 9 5 one 0
## 10 5 two 3
## 11 6 one 22
## 12 6 two 1
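
The input df1 is not shown above, so the following is only a sketch with a hypothetical df1 reconstructed from the printed output (the seven rows whose value is not 0). It should make the call reproducible end to end.

```r
library(tidyr)

# Hypothetical input, reconstructed from the output above; the original df1 is not shown
df1 <- data.frame(
  itemid  = c(1, 2, 3, 4, 5, 6, 6),
  groupid = c("one", "one", "one", "two", "two", "one", "two"),
  value   = c(3, 2, 1, 2, 3, 22, 1)
)

# Build every itemid x groupid combination and fill the absent pairs with 0
completed <- complete(df1, itemid, groupid, fill = list(value = 0))
print(completed)
```

Without the fill argument the added rows would carry NA in value instead of 0.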

Spread/Gather Error: Must supply a symbol or a string as argument

The following approach keeps the data in long form, aggregates it there, and reshapes it to wide form only at the end, when you want to view it:

library(dplyr)
library(tidyr)
library(lubridate)

df <- tribble(
~Timestamp, ~area, ~count, ~type,
"2019-08-28 00:30:00", "area1", 4, "A",
"2019-08-28 00:30:01", "area1", 1, "B",
"2019-08-28 00:30:02", "area1", 8, "C",
"2019-08-28 00:30:03", "area2", 8, "A",
"2019-08-28 00:30:04", "area2", 1, "B",
"2019-08-28 00:30:04", "area2", 8, "C",
"2019-08-28 00:30:06", "area3", 18, "A")

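# Parse the timestamp once, then derive the calendar date from the parsed value
df$Timestamp <- ymd_hms(df$Timestamp)
df$date <- date(df$Timestamp)
df$area <- factor(df$area)
df$type <- factor(df$type)

# Aggregate in long form, then reshape to wide form for viewing
df %>%
  group_by(date, area, type) %>%
  summarize(count = sum(count)) %>%
  spread(key = type, value = count)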

# # A tibble: 3 x 5
# # Groups: date, area [3]
# date area A B C
# <date> <fct> <dbl> <dbl> <dbl>
# 2019-08-28 area1 4 1 8
# 2019-08-28 area2 8 1 8
# 2019-08-28 area3 18 NA NA
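
The NA cells for area3 appear because that area has no rows of type B or C. As a hedged sketch of one way to get 0 there instead, the example below uses pivot_wider, the successor to spread in tidyr 1.0.0 and later, with its values_fill argument (assuming tidyr 1.1.0 or later for the scalar form). The long table is a small stand-in for the summarised data above, not part of the original answer.

```r
library(tidyr)

# Stand-in for the summarised long data (counts taken from the example above)
long <- tribble(
  ~area,   ~type, ~count,
  "area1", "A",    4,
  "area1", "B",    1,
  "area1", "C",    8,
  "area2", "A",    8,
  "area2", "B",    1,
  "area2", "C",    8,
  "area3", "A",   18
)

# values_fill replaces the NA cells created by missing area x type pairs
wide <- long %>%
  pivot_wider(names_from = type, values_from = count, values_fill = 0)
print(wide)
```

With spread itself, the fill argument serves the same purpose: spread(key = type, value = count, fill = 0).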

How to complete missing data in R

You can expand using factor levels in complete:

tidyr::complete(x, Name = factor(Name, levels = c('John', 'Dora')), 
fill = list(Age = 0))
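
A slightly more explicit variant of the same idea, sketched here with a hypothetical one-row x: convert Name to a factor first, so that complete expands over all factor levels, including those that never occur in the data.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: only "John" is present
x <- tibble(Name = "John", Age = 30)

completed <- x %>%
  # Declare every expected name as a factor level
  mutate(Name = factor(Name, levels = c("John", "Dora"))) %>%
  # complete() expands a factor column to its full set of levels
  complete(Name, fill = list(Age = 0))
print(completed)
```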

How to complete the missing values of the long form data frame based on reference vectors

We can use complete and pass the reference vectors as the values to expand over:

library(tidyr)
library(dplyr)
complete(df, source = complete_source, day = complete_day, fill = list(score = 0))
# A tibble: 12 x 3
# source day score
# <chr> <chr> <dbl>
# 1 a D1 10
# 2 a D2 0
# 3 a D3 0
# 4 a D4 0
# 5 b D1 0
# 6 b D2 5
# 7 b D3 3
# 8 b D4 0
# 9 c D1 0
#10 c D2 0
#11 c D3 0
#12 c D4 0
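
The inputs are not shown above. The sketch below assumes a df and reference vectors chosen to match the printed result (three observed rows, three sources, four days), so the names and values are a reconstruction rather than the original data.

```r
library(tidyr)

# Assumed reference vectors and data, reconstructed from the output above
complete_source <- c("a", "b", "c")
complete_day    <- c("D1", "D2", "D3", "D4")
df <- tibble(
  source = c("a", "b", "b"),
  day    = c("D1", "D2", "D3"),
  score  = c(10, 5, 3)
)

# Expand over the reference vectors rather than the values present in df
result <- complete(df, source = complete_source, day = complete_day,
                   fill = list(score = 0))
print(result)
```

Because the expansion uses the reference vectors, source "c" and day "D4" appear in the result even though df never mentions them.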

Or do a crossing with the vectors and join:

crossing(source = complete_source, day = complete_day) %>%
  left_join(df, by = c("source", "day")) %>%  # explicit keys avoid the "Joining, by" message
  mutate(score = replace_na(score, 0))

In base R, this can be done with expand.grid/merge

transform(merge(expand.grid(source = complete_source, 
day = complete_day), df, all.x = TRUE),
score = replace(score, is.na(score), 0))

How to expand a large dataframe in R

expand.grid is a useful function here:

mergedData <- merge(
  expand.grid(id = unique(df$id), spp = unique(df$spp)),
  df, by = c("id", "spp"), all = TRUE)

# Combinations absent from df come back with NA in y; set them to 0
mergedData$y[is.na(mergedData$y)] <- 0

mergedData$date <- rep(levels(df$date),
each = length(levels(df$spp)))

Since you're not actually doing anything to subsets of the data, I don't think plyr will help. There may be more efficient ways to do this with data.table.
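
As a rough illustration of the data.table route, here is a minimal sketch using CJ (cross join) and a keyed join. The table dt and its values are hypothetical, the column names id, spp and y follow the example above, and no efficiency claim is made for any particular data size.

```r
library(data.table)

# Hypothetical data: id 2 has no row for spp "b"
dt <- data.table(
  id  = c(1, 1, 2),
  spp = c("a", "b", "a"),
  y   = c(5, 3, 7)
)

# CJ() builds every combination of the unique ids and species
grid <- CJ(id = unique(dt$id), spp = unique(dt$spp))

# Join dt onto the grid: one row per grid row, NA in y where dt has no match
full <- dt[grid, on = c("id", "spp")]

# Replace the NA values in place
full[is.na(y), y := 0]
print(full)
```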

Add rows to grouped data with dplyr?

Without dplyr it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))

giving:

       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0

and this can be rewritten as a magrittr or dplyr pipeline like this:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

The as.data.frame() at the end could be omitted if a wide form solution were desired.
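
The input data is not shown, so the following sketch uses a small hypothetical data frame with the same column names to show both forms side by side.

```r
# Hypothetical input with the column names used above
data <- data.frame(
  Week    = c("2013-W01", "2013-W02", "2013-W04"),
  Article = c("10004", "10004", "10006"),
  Demand  = c(1215, 900, 5)
)

# Wide form: a contingency table of summed Demand, with 0 for absent pairs
wide <- xtabs(Demand ~ Week + Article, data)
print(wide)

# Long form: one row per Week x Article pair, sums in the Freq column
long <- as.data.frame(wide)
print(long)
```

Note that xtabs only knows the weeks and articles that occur somewhere in the data. A week missing from every article will not appear unless Week is a factor that already has that level.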

Sum across multiple columns with dplyr


dplyr >= 1.0.0 using across

sum up each row using rowSums (rowwise works for any aggregation, but is slower)

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))
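
For an aggregation that has no vectorised row function, the rowwise route mentioned above might look like the sketch below. It uses c_across from dplyr 1.0.0 or later; the data frame and the row_max column are hypothetical.

```r
library(dplyr)

# Hypothetical data with NA values and one non-numeric column
df <- tibble(
  a     = c(1, NA, 3),
  b     = c(4, 5, NA),
  label = c("x", "y", "z")
)

result <- df %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(row_max = max(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  ungroup()
print(result)
```

Calling ungroup() at the end drops the row-wise grouping so that later verbs operate on whole columns again.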

sum down each column

df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

dplyr < 1.0.0

sum up each row

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))

sum down each column using the superseded summarise_all:

df %>%
  replace(is.na(.), 0) %>%
  summarise_all(sum)

