adding default values to item x group pairs that don't have a value (df %>% spread %>% gather seems strange)
There is a new function, complete(), in the development version of tidyr that does this (it is now available in released versions of tidyr as well).
library(tidyr)
df1 %>% complete(itemid, groupid, fill = list(value = 0))
## itemid groupid value
## 1 1 one 3
## 2 1 two 0
## 3 2 one 2
## 4 2 two 0
## 5 3 one 1
## 6 3 two 0
## 7 4 one 0
## 8 4 two 2
## 9 5 one 0
## 10 5 two 3
## 11 6 one 22
## 12 6 two 1
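A minimal self-contained sketch of the call above. The input df1 is hypothetical: it is reconstructed from the printed output by keeping only the non-zero rows, and it assumes itemid is numeric and groupid is character.

```r
library(tidyr)  # tidyr also re-exports the %>% pipe

# Hypothetical input: only the item/group pairs that actually have a value
df1 <- data.frame(
  itemid  = c(1, 2, 3, 4, 5, 6, 6),
  groupid = c("one", "one", "one", "two", "two", "one", "two"),
  value   = c(3, 2, 1, 2, 3, 22, 1)
)

# complete() adds every missing itemid x groupid combination and fills
# value with 0 (instead of NA) in the rows it creates
res <- df1 %>% complete(itemid, groupid, fill = list(value = 0))
print(res)
```

This replaces the spread/gather round trip with one step: the missing pairs are generated directly in long form.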
Spread/Gather Error: Must supply a symbol or a string as argument
The following approach works by keeping the data in long form until you want to view it in wide form at the end. The basic approach is:
library(dplyr)
library(tidyr)
library(lubridate)
df <- tribble(
~Timestamp, ~area, ~count, ~type,
"2019-08-28 00:30:00", "area1", 4, "A",
"2019-08-28 00:30:01", "area1", 1, "B",
"2019-08-28 00:30:02", "area1", 8, "C",
"2019-08-28 00:30:03", "area2", 8, "A",
"2019-08-28 00:30:04", "area2", 1, "B",
"2019-08-28 00:30:04", "area2", 8, "C",
"2019-08-28 00:30:06", "area3", 18, "A")
df$Timestamp <- ymd_hms(df$Timestamp)
# Timestamp is already a date-time at this point, so no need to parse it again
df$date <- date(df$Timestamp)
df$area <- factor(df$area)
df$type <- factor(df$type)
df %>%
group_by(date, area, type) %>%
summarize(count = sum(count)) %>%
spread(key = type, value = count)
# # A tibble: 3 x 5
# # Groups: date, area [3]
# date area A B C
# <date> <fct> <dbl> <dbl> <dbl>
# 2019-08-28 area1 4 1 8
# 2019-08-28 area2 8 1 8
# 2019-08-28 area3 18 NA NA
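A hedged sketch of the same pipeline with pivot_wider(), the successor to spread() in newer tidyr. It assumes tidyr >= 1.1.0 (for a scalar values_fill) and dplyr >= 1.0.0 (for .groups); values_fill = 0 fills the area/type combinations that never occur, so area3 gets 0 rather than NA. The factor conversions from the original are dropped here for brevity.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- tribble(
  ~Timestamp, ~area, ~count, ~type,
  "2019-08-28 00:30:00", "area1", 4, "A",
  "2019-08-28 00:30:01", "area1", 1, "B",
  "2019-08-28 00:30:02", "area1", 8, "C",
  "2019-08-28 00:30:03", "area2", 8, "A",
  "2019-08-28 00:30:04", "area2", 1, "B",
  "2019-08-28 00:30:04", "area2", 8, "C",
  "2019-08-28 00:30:06", "area3", 18, "A")

wide <- df %>%
  mutate(date = as_date(ymd_hms(Timestamp))) %>%
  group_by(date, area, type) %>%
  # stay in long form for the aggregation, then drop the grouping
  summarize(count = sum(count), .groups = "drop") %>%
  # widen only at the end; missing area/type cells become 0
  pivot_wider(names_from = type, values_from = count, values_fill = 0)
print(wide)
```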
How to complete missing data in R
You can expand using factor levels in complete:
tidyr::complete(x, Name = factor(Name, levels = c('John', 'Dora')),
fill = list(Age = 0))
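A minimal sketch of how this behaves, assuming a hypothetical x in which 'Dora' never appears. Because the expansion uses all levels of the factor, complete() adds a row for 'Dora' with Age filled as 0. Depending on the dplyr version, Name may come back as character or factor after the internal join, so the checks compare it as character.

```r
library(tidyr)

# Hypothetical input: 'Dora' is absent from the data
x <- data.frame(Name = "John", Age = 30)

# Expanding over factor levels creates rows for levels with no data
res <- complete(x, Name = factor(Name, levels = c("John", "Dora")),
                fill = list(Age = 0))
print(res)
```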
How to complete the missing values of the long form data frame based on reference vectors
We can use complete:
library(tidyr)
library(dplyr)
complete(df, source = complete_source, day = complete_day, fill = list(score = 0))
# A tibble: 12 x 3
# source day score
# <chr> <chr> <dbl>
# 1 a D1 10
# 2 a D2 0
# 3 a D3 0
# 4 a D4 0
# 5 b D1 0
# 6 b D2 5
# 7 b D3 3
# 8 b D4 0
# 9 c D1 0
#10 c D2 0
#11 c D3 0
#12 c D4 0
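A self-contained sketch of the complete() call above. The inputs df, complete_source and complete_day are hypothetical, reconstructed from the printed output; the key point is that naming the arguments (source = complete_source) makes complete() expand over the reference vectors instead of only the values present in the data.

```r
library(tidyr)

# Hypothetical inputs reconstructed from the output above
df <- data.frame(source = c("a", "b", "b"),
                 day    = c("D1", "D2", "D3"),
                 score  = c(10, 5, 3))
complete_source <- c("a", "b", "c")   # every source that should appear
complete_day    <- paste0("D", 1:4)   # every day that should appear

# Source "c" and day "D4" are added even though df never contains them
res <- complete(df, source = complete_source, day = complete_day,
                fill = list(score = 0))
print(res)
```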
Or do a crossing with the vectors and join:
crossing(source = complete_source, day = complete_day) %>%
left_join(df, by = c("source", "day")) %>%
mutate(score = replace_na(score, 0))
In base R, this can be done with expand.grid/merge:
transform(merge(expand.grid(source = complete_source,
day = complete_day, stringsAsFactors = FALSE), df, all.x = TRUE),
score = replace(score, is.na(score), 0))
How to expand a large dataframe in R
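expand.grid is a useful function here:
mergedData <- merge(
expand.grid(id = unique(df$id), spp = unique(df$spp)),
df, by = c("id", "spp"), all = TRUE)
# pairs missing from df get y = NA after the merge; set them to 0
mergedData$y[is.na(mergedData$y)] <- 0
# assumes date and spp are factors and each id has exactly one date,
# with the date levels in the same order as the sorted ids
mergedData$date <- rep(levels(df$date),
each = length(levels(df$spp)))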
Since you're not actually doing anything to subsets of the data, I don't think plyr will help; there may be more efficient ways with data.table.
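A hedged sketch of the data.table route, assuming the data.table package is installed; the small dt table is hypothetical and no performance comparison with merge() has been made. CJ() builds the cross join of the unique ids and species, and joining dt onto it keeps every combination with y = NA where there was no row.

```r
library(data.table)

# Hypothetical small input: id 2 has no row for species "b"
dt <- data.table(id = c(1, 1, 2), spp = c("a", "b", "a"), y = c(5, 2, 7))

# Every id x spp pair; unmatched pairs get y = NA
full <- dt[CJ(id = dt$id, spp = dt$spp, unique = TRUE), on = c("id", "spp")]

# Replace the missing values by reference (no copy of the table)
full[is.na(y), y := 0]
print(full)
```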
Add rows to grouped data with dplyr?
Without dplyr it can be done like this:
as.data.frame(xtabs(Demand ~ Week + Article, data))
giving:
Week Article Freq
1 2013-W01 10004 1215
2 2013-W02 10004 900
3 2013-W03 10004 774
4 2013-W04 10004 1170
5 2013-W01 10006 0
6 2013-W02 10006 0
7 2013-W03 10006 0
8 2013-W04 10006 5
9 2013-W01 10007 2
10 2013-W02 10007 0
11 2013-W03 10007 0
12 2013-W04 10007 0
and this can be rewritten as a magrittr or dplyr pipeline like this:
data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()
The as.data.frame() at the end could be omitted if a wide form solution were desired.
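A self-contained sketch of the xtabs() approach. The input data is hypothetical, reconstructed from the output above by keeping only the Week/Article pairs with non-zero demand. xtabs() sums Demand over every Week x Article cell, so absent pairs appear with 0, and as.data.frame() turns the table back into long form with a Freq column.

```r
# Hypothetical input: only the pairs with recorded demand are present
data <- data.frame(
  Week    = c("2013-W01", "2013-W02", "2013-W03", "2013-W04", "2013-W04", "2013-W01"),
  Article = c(10004, 10004, 10004, 10004, 10006, 10007),
  Demand  = c(1215, 900, 774, 1170, 5, 2)
)

# Cross-tabulate (summing Demand), then convert back to long form
long <- as.data.frame(xtabs(Demand ~ Week + Article, data))
print(long)
```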
Sum across multiple columns with dplyr
dplyr >= 1.0.0 using across
sum up each row using rowSums (rowwise works for any aggregation, but is slower)
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))
sum down each column
df %>%
summarise(across(everything(), ~ sum(., na.rm = TRUE)))
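A minimal sketch of both dplyr >= 1.0.0 patterns on a hypothetical tibble that also has a character column. It uses na.rm = TRUE in place of replace(is.na(.), 0), which gives the same sums, and restricts both operations to numeric columns with where(is.numeric).

```r
library(dplyr)

# Hypothetical data with missing values and one non-numeric column
df <- tibble(x1 = c(1, NA, 3), x2 = c(4, 5, NA), label = c("a", "b", "c"))

# Row-wise: add up the numeric columns of each row
row_sums <- df %>%
  mutate(sum = rowSums(across(where(is.numeric)), na.rm = TRUE))

# Column-wise: one total per numeric column
col_sums <- df %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)))

print(row_sums)
print(col_sums)
```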
dplyr < 1.0.0
sum up each row
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))  # assumes the first five columns are the ones to add
sum down each column using the superseded summarise_all:
df %>%
replace(is.na(.), 0) %>%
summarise_all(sum)