R group by date, and summarize the values
Use as.Date(), then aggregate().
energy$Date <- as.Date(energy$Datetime)
aggregate(energy$value, by=list(energy$Date), sum)
EDIT
Emma made a good point about column names. You can preserve them in aggregate() by passing one-column data frames instead of bare vectors:
aggregate(energy["value"], by=energy["Date"], sum)
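As a minimal, self-contained sketch of the two steps above: the `energy` data frame below is invented for illustration (a character `Datetime` column and a numeric `value` column), so the column contents are an assumption, not the asker's data.

```r
# Hypothetical data: three readings spread over two days
energy <- data.frame(
  Datetime = c("2021-01-01 08:00:00", "2021-01-01 20:00:00", "2021-01-02 09:30:00"),
  value = c(1.5, 2.5, 4),
  stringsAsFactors = FALSE
)

# as.Date() on character input keeps the leading date and ignores the time part
energy$Date <- as.Date(energy$Datetime)

# One-column data frames (single brackets) carry their names into the result
daily <- aggregate(energy["value"], by = energy["Date"], FUN = sum)
print(daily)
```

If `Datetime` is already POSIXct rather than character, note that as.Date() converts in UTC by default, so you may want to pass `tz` explicitly.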
R: how to sum values grouped by date
With aggregate from base R:
aggregate(confirmed ~ date, Table, sum, na.rm = TRUE)
Or with dplyr
library(dplyr)
Table %>%
group_by(date) %>%
summarise(confirmed = sum(confirmed, na.rm = TRUE))
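To make the base R call concrete, here is a small sketch on made-up data. The `Table` contents are an assumption; only the column names `date` and `confirmed` come from the answer above.

```r
# Hypothetical data with one missing count
Table <- data.frame(
  date = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02")),
  confirmed = c(5, NA, 2, 3)
)

# Same call as in the answer: sum confirmed per date, ignoring NA
totals <- aggregate(confirmed ~ date, Table, sum, na.rm = TRUE)
print(totals)
```

One design point worth knowing: the formula interface of aggregate() uses `na.action = na.omit` by default, so rows with NA are dropped before `sum` is called. A date whose values are all NA would therefore be missing from the result rather than shown as 0.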
Group by summarize in between dates with dplyr
I believe your map2 statement is incorrect.
Here is another possible option using lubridate's %within% operator.
library(dplyr)
library(lubridate)
df <- structure(list(IDsub = c("1001", "1002", "1003", "1004"),
ID = c("id1", "id1", "id2", "id2"),
start_date = structure(c(18628, 18629, 18632, 18637), class = "Date"),
end_date = structure(c(18637, 18636, 18640, 18639), class = "Date"),
value = c(1, 2, 2, 0)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
# Find the overall start and end dates and build a daily sequence
firstdate <- min(df$start_date)
lastdate <- max(df$end_date)
timeseq <- seq(firstdate, lastdate, by = "1 day")
# Split by ID
dflist <- split(df, df$ID)
lapply(names(dflist), function(dfname) {
  iddf <- dflist[[dfname]]
  # Create one time interval per row
  intervals <- interval(iddf$start_date, iddf$end_date)
  # For each day, average the values of the rows whose interval contains it
  meanvalues <- sapply(timeseq, function(day) {
    withinresult <- day %within% intervals
    mean(iddf$value[withinresult], na.rm = TRUE)
  })
  tibble(dfname, timeseq, meanvalues)
})
The final result of the lapply call is a list of data frames, one per ID. One could bind these together and reshape them depending on the final intent.
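The binding and reshaping step could look roughly like the base R sketch below. The `result_list` object is a hypothetical stand-in for the list of per-ID data frames, with invented values, so treat it as an illustration rather than the actual output.

```r
# Hypothetical stand-in for the per-ID list (values are made up)
result_list <- list(
  data.frame(dfname = "id1", timeseq = as.Date("2021-01-01") + 0:1,
             meanvalues = c(1, 1.5)),
  data.frame(dfname = "id2", timeseq = as.Date("2021-01-01") + 0:1,
             meanvalues = c(NaN, 2))
)

# Stack the per-ID data frames into one long data frame
combined <- do.call(rbind, result_list)

# Optionally go wide: one row per date, one meanvalues column per ID
wide <- reshape(combined, idvar = "timeseq", timevar = "dfname",
                direction = "wide")
print(wide)
```

The NaN in the stand-in data is deliberate: mean() of an empty vector returns NaN, which is what you get for days that fall inside no interval for a given ID.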
How do I group my date variable into month/year in R?
Here is an example using dplyr. You simply use the corresponding date format string, %m for month or %Y for year, in the format() call.
set.seed(123)
df <- data.frame(date = seq.Date(from =as.Date("01/01/1998", "%d/%m/%Y"),
to=as.Date("01/01/2000", "%d/%m/%Y"), by="day"),
value = sample(seq(5), 731, replace = TRUE))
head(df)
date value
1 1998-01-01 2
2 1998-01-02 4
3 1998-01-03 3
4 1998-01-04 5
5 1998-01-05 5
6 1998-01-06 1
library(dplyr)
df %>%
mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
group_by(month, year) %>%
summarise(total = sum(value))
Source: local data frame [25 x 3]
Groups: month [?]
month year total
(chr) (chr) (int)
1 01 1998 105
2 01 1999 91
3 01 2000 3
4 02 1998 74
5 02 1999 77
6 03 1998 96
7 03 1999 86
8 04 1998 91
9 04 1999 95
10 05 1998 93
.. ... ... ...
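If a single sortable key is more convenient than separate month and year columns, one possible variation is to format the date as "%Y-%m" and aggregate on that. This is a base R sketch on a small invented data frame, not the 731-row example above.

```r
# Hypothetical data spanning two years
df <- data.frame(
  date = as.Date(c("1998-01-15", "1998-01-20", "1998-02-01", "1999-01-05")),
  value = c(2, 4, 3, 5)
)

# "%Y-%m" puts year first, so the character key sorts chronologically
df$year_month <- format(df$date, "%Y-%m")

monthly <- aggregate(value ~ year_month, data = df, FUN = sum)
print(monthly)
```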
R - Summarize values between specific date range
A couple of alternatives to consider. I assume your dates are actual dates and not character values.
You can try using fuzzyjoin to merge the two data.frames, including rows where the dates fall between start_dates and end_dates.
library(tidyverse)
library(fuzzyjoin)
fuzzy_left_join(
date_df,
df,
by = c("start_dates" = "dates", "end_dates" = "dates"),
match_fun = list(`<=`, `>=`)
) %>%
group_by(start_dates, end_dates) %>%
summarise(new_goal_column = sum(x))
Output
start_dates end_dates new_goal_column
<date> <date> <dbl>
1 2021-01-01 2021-01-06 19
2 2021-01-07 2021-01-10 6
You can also try a non-equi join with data.table.
library(data.table)
setDT(date_df)
setDT(df)
df[date_df, .(start_dates, end_dates, x), on = .(dates >= start_dates, dates <= end_dates)][
, .(new_goal_column = sum(x)), by = .(start_dates, end_dates)
]
Output
start_dates end_dates new_goal_column
1: 2021-01-01 2021-01-06 19
2: 2021-01-07 2021-01-10 6
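For small inputs, the same range sum can be sketched in base R without any join package. The data below are invented (they are not the asker's data, so the totals differ from the output shown above); the column names follow the answer.

```r
# Hypothetical daily values and two date ranges
df <- data.frame(dates = as.Date("2021-01-01") + 0:9, x = 1:10)
date_df <- data.frame(
  start_dates = as.Date(c("2021-01-01", "2021-01-07")),
  end_dates = as.Date(c("2021-01-06", "2021-01-10"))
)

# For each range, sum x over the rows whose date lies inside it (inclusive)
date_df$new_goal_column <- sapply(seq_len(nrow(date_df)), function(i) {
  in_range <- df$dates >= date_df$start_dates[i] & df$dates <= date_df$end_dates[i]
  sum(df$x[in_range])
})
print(date_df)
```

This loops over ranges and scans all rows each time, so it is fine for a handful of ranges but the join-based approaches should scale better.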
Conditional sum grouped by date in R
This should work, but without reproducible data it is difficult to be sure:
library(dplyr)
DF %>%
group_by(Date) %>%
summarise(peq1 = sum(People == 1),
pgeq1 = sum(People[People > 1]))
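The two expressions do different things: summing a logical vector counts the rows that match, while summing a subset adds up the matching values. A base R sketch on invented data (the `DF` contents are an assumption) shows the difference per date:

```r
# Hypothetical data: group sizes recorded on two dates
DF <- data.frame(
  Date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01",
                   "2021-01-02", "2021-01-02")),
  People = c(1, 1, 3, 2, 1)
)

# Count of rows per date where People equals 1 (TRUE counts as 1)
peq1 <- tapply(DF$People == 1, DF$Date, sum)

# Total of People per date, restricted to rows where People exceeds 1
pgt1 <- tapply(DF$People, DF$Date, function(p) sum(p[p > 1]))

print(peq1)
print(pgt1)
```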
R - Mean calculation using group_by based on Date column?
You're almost there. First ensure that your Date column is actually of class Date. Then, when you do the grouping, group by year only, not by the full ymd date that is in your data frame. The script can be modified as follows.
library(dplyr)
library(lubridate)
years_nc$Date <- ymd(years_nc$Date)
years_nc %>%
group_by(year(Date)) %>%
summarize(avg_preci = mean(Average, na.rm = TRUE))
# #A tibble: 5 x 2
# `year(Date)` avg_preci
# <dbl> <dbl>
# 1 2010 0.00196
# 2 2011 0.00196
# 3 2012 0.00196
# 4 2013 0.00196
# 5 2014 0.00196
Group by day and hour
You can use lubridate::ymd_hms to convert the date variable to date-time, group by day and hour, and take the mean of price for each hour.
library(dplyr)
prices_2019 %>%
mutate(date = lubridate::ymd_hms(date),
date_hour = format(date, "%Y-%m-%d %H")) %>%
group_by(date_hour) %>%
summarize(mean_price = mean(price))
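For comparison, a base R sketch of the same idea follows. The `prices` data frame and the "UTC" time zone are assumptions made for the example; substitute the time zone your timestamps were recorded in.

```r
# Hypothetical price ticks as character timestamps
prices <- data.frame(
  date = c("2019-01-01 00:15:00", "2019-01-01 00:45:00", "2019-01-01 01:30:00"),
  price = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Parse to date-time; the time zone is an assumption for this sketch
prices$date <- as.POSIXct(prices$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Keep date plus hour, so the same hour on different days stays separate
prices$date_hour <- format(prices$date, "%Y-%m-%d %H")

hourly <- aggregate(price ~ date_hour, data = prices, FUN = mean)
print(hourly)
```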
Summarizing a dataframe by date and group
It sounds like what you are looking for is a pivot table. I like to use reshape::cast for these types of tables. If more than one value is returned for a given expenditure type and household/year/month combination, cast sums those values; if there is only one value, it returns that value. The sum argument is not required and is only there to handle such duplicates, so if your data is clean you shouldn't need it. Note that the example dates are drawn with sample() and no seed is set, so your table will differ from the one shown.
hh <- c("hh1", "hh1", "hh1", "hh2", "hh2", "hh2", "hh3", "hh3", "hh3")
date <- c(sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 9))
value <- c(1:9)
type <- c("income", "water", "energy", "income", "water", "energy", "income", "water", "energy")
df <- data.frame(hh, date, value, type)
# Load lubridate library, add month and year
library(lubridate)
df$month <- month(df$date)
df$year <- year(df$date)
# Load reshape library, run cast from reshape, creates pivot table
library(reshape)
dfNew <- cast(df, hh+year+month~type, value = "value", sum)
> dfNew
hh year month energy income water
1 hh1 1999 4 3 0 0
2 hh1 1999 10 0 1 0
3 hh1 1999 11 0 0 2
4 hh2 1999 2 0 4 0
5 hh2 1999 3 6 0 0
6 hh2 1999 6 0 0 5
7 hh3 1999 1 9 0 0
8 hh3 1999 4 0 7 0
9 hh3 1999 8 0 0 8
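If you would rather avoid an extra package, a similar summed cross-tabulation can be sketched with xtabs() from base R. This is an alternative technique, not what cast does internally, and the data below are invented, with a duplicate household/type pair included on purpose.

```r
# Hypothetical long-format data with a repeated hh1/income entry
df <- data.frame(
  hh = c("hh1", "hh1", "hh2", "hh2"),
  type = c("income", "income", "water", "income"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# The left-hand side is summed within each hh/type cell; empty cells are 0
pivot <- xtabs(value ~ hh + type, data = df)
print(pivot)
```

Adding more row keys (for example year and month) to the right-hand side produces a higher-dimensional table, which ftable() can flatten for printing.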
Summarizing values and plotting using POSIXct
If you want to group by each day separately, extract only the date from day_time. If you want to group by each hour of each day separately, extract the date along with the hour using format, so that 8 AM today is in a separate group from 8 AM on any other day.
library(dplyr)
df %>%
mutate(date = as.Date(day_time),
hour = format(day_time, '%Y %m %d %H'))
# id day_time value date hour
# <int> <dttm> <dbl> <date> <chr>
# 1 1 2021-06-10 01:56:48 6 2021-06-10 2021 06 10 01
# 2 2 2021-06-10 01:47:53 0 2021-06-10 2021 06 10 01
# 3 4 2021-06-02 04:11:35 -2 2021-06-02 2021 06 02 04
# 4 7 2021-06-04 03:45:22 6 2021-06-04 2021 06 04 03
# 5 11 2021-06-09 19:46:59 -2 2021-06-09 2021 06 09 19
# 6 13 2021-06-04 21:44:34 0 2021-06-04 2021 06 04 21
# 7 14 2021-06-04 21:43:19 -6 2021-06-04 2021 06 04 21
# 8 15 2021-06-10 01:43:03 -2 2021-06-10 2021 06 10 01
# 9 20 2021-06-05 00:07:10 8 2021-06-05 2021 06 05 00
#10 23 2021-06-07 07:30:43 -1 2021-06-07 2021 06 07 07
You can use this column in group_by and use summarise as usual.
df %>%
group_by(hour = format(day_time, '%Y-%m-%d %H')) %>%
summarise(value = sum(value))
# hour value
# <chr> <dbl>
#1 2021-06-02 04 -2
#2 2021-06-04 03 6
#3 2021-06-04 21 -6
#4 2021-06-05 00 8
#5 2021-06-07 07 -1
#6 2021-06-09 19 -2
#7 2021-06-10 01 4