R Group by Date, and Summarize the Values

R group by date, and summarize the values

Use as.Date() then aggregate().

energy$Date <- as.Date(energy$Datetime)
aggregate(energy$value, by=list(energy$Date), sum)

EDIT

Emma made a good point about column names. You can preserve column names in aggregate by using the following instead:

aggregate(energy["value"], by=energy["Date"], sum)
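
To make the column-name difference concrete, here is a minimal sketch on made-up data. The `energy` data frame below is hypothetical sample data; only the column names `Datetime` and `value` follow the snippets above.

```r
# Hypothetical sample data; only the column names come from the answer above.
energy <- data.frame(
  Datetime = as.POSIXct(c("2021-01-01 08:00:00", "2021-01-01 17:30:00",
                          "2021-01-02 09:15:00"), tz = "UTC"),
  value = c(10, 5, 7)
)
energy$Date <- as.Date(energy$Datetime)

# Vector plus unnamed list: aggregate() falls back to generic column names.
by_list <- aggregate(energy$value, by = list(energy$Date), sum)

# One-column data frames: the original column names are carried over.
by_name <- aggregate(energy["value"], by = energy["Date"], sum)

names(by_list)
names(by_name)
```

The first call labels its columns `Group.1` and `x`, while the second keeps `Date` and `value`; the sums themselves are identical.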

R: how to sum values grouped by date

With aggregate from base R

aggregate(confirmed ~ date, Table, sum, na.rm = TRUE)

Or with dplyr

library(dplyr)
Table %>%
  group_by(date) %>%
  summarise(confirmed = sum(confirmed, na.rm = TRUE))
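
One detail worth checking on your own data: the formula interface of aggregate drops rows containing NA before summing (its default na.action is na.omit), so a date whose values are all NA disappears from the result. The sketch below uses a small hypothetical `Table` to show the difference; passing `na.action = na.pass` is one way to keep such dates.

```r
# Hypothetical data: the second date has only a missing value.
Table <- data.frame(
  date      = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  confirmed = c(1, NA, NA)
)

# Default na.action = na.omit: NA rows are removed first, so 2021-01-02 vanishes.
dropped <- aggregate(confirmed ~ date, Table, sum, na.rm = TRUE)

# na.pass keeps the rows; sum(..., na.rm = TRUE) then returns 0 for that date.
kept <- aggregate(confirmed ~ date, Table, sum, na.rm = TRUE, na.action = na.pass)

nrow(dropped)
nrow(kept)
```

With na.pass the all-NA date is reported with a total of 0 rather than being left out.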

Group by summarize in between dates with dplyr

I believe your map2 statement is incorrect.

Here is another possible option using lubridate's %within% operator.

library(dplyr)
library(lubridate)

df <- structure(list(IDsub = c("1001", "1002", "1003", "1004"), 
                     ID = c("id1", "id1", "id2", "id2"), 
                     start_date = structure(c(18628, 18629, 18632, 18637), class = "Date"), 
                     end_date = structure(c(18637, 18636, 18640, 18639), class = "Date"), 
                     value = c(1, 2, 2, 0)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

#find start end date and create sequence
firstdate <- min(df$start_date)
lastdate <- max(df$end_date)
timeseq <-seq(firstdate, lastdate, by="1 day")

#split by id
dflist<-split(df, df$ID)
lapply(names(dflist), function(dfname){
   
   iddf<-dflist[[dfname]]
   #create time intervals for each row 
   intervals <-interval(iddf$start_date, iddf$end_date)
   # 'day' is a single date from timeseq, tested against every interval for this ID
   meanvalues <- sapply(timeseq, function(day){
      withinresult <- day %within% intervals
      mean(iddf$value[withinresult], na.rm=TRUE)
   })
   tibble(dfname, timeseq, meanvalues)
})

The final result of the lapply call is a list of data frames, one per ID. These can be bound together and reshaped, depending on the final intent.
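
As a rough sketch of that last step, the per-ID data frames can be stacked and then spread into one column per ID using base R. The `results` list below is a hypothetical stand-in for the lapply output, with made-up values; only the column names follow the code above.

```r
# Hypothetical stand-in for the list returned by lapply(): one data frame per ID.
results <- list(
  data.frame(dfname = "id1", timeseq = as.Date("2021-01-01") + 0:1,
             meanvalues = c(1, 1.5)),
  data.frame(dfname = "id2", timeseq = as.Date("2021-01-01") + 0:1,
             meanvalues = c(NaN, 2))
)

# Stack the pieces into one long data frame.
long <- do.call(rbind, results)

# Spread to wide format: one row per date, one meanvalues column per ID.
wide <- reshape(long, idvar = "timeseq", timevar = "dfname", direction = "wide")
names(wide)
```

dplyr::bind_rows() would do the same stacking; base R is used here only to keep the sketch free of extra packages.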

How do I group my date variable into month/year in R?

Here is an example using dplyr. Call format() on the date with the matching format string: %m for the month or %Y for the year.

set.seed(123)
df <- data.frame(date = seq.Date(from =as.Date("01/01/1998", "%d/%m/%Y"), 
                                 to=as.Date("01/01/2000", "%d/%m/%Y"), by="day"),
                 value = sample(seq(5), 731, replace = TRUE))

head(df)
        date value
1 1998-01-01     2
2 1998-01-02     4
3 1998-01-03     3
4 1998-01-04     5
5 1998-01-05     5
6 1998-01-06     1

library(dplyr)

df %>%
  mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
  group_by(month, year) %>%
  summarise(total = sum(value))

Source: local data frame [25 x 3]
Groups: month [?]

   month  year total
   (chr) (chr) (int)
1     01  1998   105
2     01  1999    91
3     01  2000     3
4     02  1998    74
5     02  1999    77
6     03  1998    96
7     03  1999    86
8     04  1998    91
9     04  1999    95
10    05  1998    93
..   ...   ...   ...
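
Note that the output above is ordered by month first and then year. If chronological order is preferred, one option is a single "%Y-%m" key, which sorts correctly as text. This is a minimal base R sketch on hypothetical data, not the answer's original code.

```r
# Hypothetical sample data with the same column names as above.
df <- data.frame(
  date  = as.Date(c("1998-01-15", "1998-01-20", "1998-02-03", "1999-01-10")),
  value = c(2, 4, 3, 5)
)

# A combined year-month key sorts chronologically even though it is a string.
df$year_month <- format(df$date, "%Y-%m")

monthly <- aggregate(value ~ year_month, data = df, FUN = sum)
monthly
```

The same key works inside dplyr's group_by() in place of the separate month and year columns.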

R - Summarize values between specific date range

A couple of alternatives to consider. I assume your dates are actual Date values and not character strings.

You can try using fuzzyjoin to merge the two data.frames, keeping the rows where dates falls between start_dates and end_dates.

library(tidyverse)
library(fuzzyjoin)

fuzzy_left_join(
  date_df,
  df,
  by = c("start_dates" = "dates", "end_dates" = "dates"),
  match_fun = list(`<=`, `>=`)
) %>%
  group_by(start_dates, end_dates) %>%
  summarise(new_goal_column = sum(x))

Output

  start_dates end_dates  new_goal_column
  <date>      <date>               <dbl>
1 2021-01-01  2021-01-06              19
2 2021-01-07  2021-01-10               6

You can also try using data.table and joining.

library(data.table)

setDT(date_df)
setDT(df)

df[date_df, .(start_dates, end_dates, x), on = .(dates >= start_dates, dates <= end_dates)][
  , .(new_goal_column = sum(x)), by = .(start_dates, end_dates)
]

Output

   start_dates  end_dates new_goal_column
1:  2021-01-01 2021-01-06              19
2:  2021-01-07 2021-01-10               6
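
If neither package is available, the same range logic can be sketched in base R by looping over the rows of date_df. The data below is hypothetical (the original question's data is not shown here), so the totals differ from the output above.

```r
# Hypothetical data; column names follow the answer above.
df <- data.frame(dates = as.Date("2021-01-01") + 0:9, x = 1:10)
date_df <- data.frame(
  start_dates = as.Date(c("2021-01-01", "2021-01-07")),
  end_dates   = as.Date(c("2021-01-06", "2021-01-10"))
)

# For each range, sum x over the rows whose date lies inside it (inclusive).
date_df$new_goal_column <- sapply(seq_len(nrow(date_df)), function(i) {
  in_range <- df$dates >= date_df$start_dates[i] & df$dates <= date_df$end_dates[i]
  sum(df$x[in_range])
})

date_df
```

This scans df once per range, so it is best suited to a modest number of ranges; the join-based approaches scale better.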

Conditional sum grouped by date in R

This should work, but without data to reproduce it's difficult to say:

library(dplyr)
DF %>%
   group_by(Date) %>%
   summarise(peq1 = sum(People == 1),
             pgeq1 = sum(People[People > 1]))
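
Since the answer was written without data, here is a small made-up example to show what the two expressions compute. The `DF` below is hypothetical; only the names `Date` and `People` come from the code above.

```r
library(dplyr)

# Hypothetical data for illustration.
DF <- data.frame(
  Date   = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02")),
  People = c(1, 3, 2, 1)
)

res <- DF %>%
  group_by(Date) %>%
  summarise(
    peq1  = sum(People == 1),        # counts the rows where People is exactly 1
    pgeq1 = sum(People[People > 1])  # adds up People over rows where People exceeds 1
  )

res
```

The first expression sums a logical vector, so it is a count; the second subsets before summing, so it is a total and yields 0 for a date with no qualifying rows.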

R - Mean calculation using group_by based on Date column?

You're almost there. First make sure that your Date column is actually a date. Then group by year only, not by the full year-month-day date that is in your data frame. The script can be modified as follows.

library(dplyr)
library(lubridate)

years_nc$Date <- ymd(years_nc$Date)

years_nc %>%
  group_by(year(Date)) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))
# #A tibble: 5 x 2
#     `year(Date)` avg_preci
#           <dbl>     <dbl>
# 1         2010   0.00196
# 2         2011   0.00196
# 3         2012   0.00196
# 4         2013   0.00196
# 5         2014   0.00196
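
The grouping column in that output ends up named `year(Date)`. A small variation, sketched here on hypothetical data, names it inside group_by so that later steps can refer to it as `year`.

```r
library(dplyr)
library(lubridate)

# Hypothetical data; column names follow the answer above.
years_nc <- data.frame(
  Date    = c("2010-01-01", "2010-06-01", "2011-01-01"),
  Average = c(0.001, 0.003, 0.002)
)

yearly <- years_nc %>%
  mutate(Date = ymd(Date)) %>%          # parse the character dates
  group_by(year = year(Date)) %>%       # name the grouping column explicitly
  summarize(avg_preci = mean(Average, na.rm = TRUE))

names(yearly)
```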

Group by day and hour

You can use lubridate::ymd_hms to convert the date variable to a date-time, group by day and hour, and take the mean of price for each hour.

library(dplyr)

prices_2019 %>%
  mutate(date = lubridate::ymd_hms(date),
         date_hour = format(date, "%Y-%m-%d %H")) %>%
  group_by(date_hour) %>%
  summarize(mean_price = mean(price))
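
The format() key above is a character string. If the hourly key should stay a date-time (for plotting, say), one possible base R alternative is to truncate each timestamp to the hour. This is a sketch on hypothetical data; `prices` and its values are made up.

```r
# Hypothetical data; times are in UTC to keep the example deterministic.
prices <- data.frame(
  date  = as.POSIXct(c("2019-01-01 10:05:00", "2019-01-01 10:45:00",
                       "2019-01-01 11:10:00"), tz = "UTC"),
  price = c(10, 20, 30)
)

# trunc() floors each timestamp to the start of its hour; convert back to POSIXct.
prices$date_hour <- as.POSIXct(trunc(prices$date, units = "hours"))

hourly <- aggregate(prices["price"], by = prices["date_hour"], FUN = mean)
hourly
```

lubridate::floor_date(date, "hour") gives the same kind of key when lubridate is already loaded.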

Summarizing a dataframe by date and group

It sounds like what you are looking for is a pivot table. I like to use reshape::cast for these types of tables. If there is more than one value for a given expenditure type in a given household/year/month combination, this will sum those values. If there is only one value, it returns that value. The sum argument is not strictly required; it is there only to handle such duplicates. If your data is clean, you shouldn't need it.

hh <- c("hh1", "hh1", "hh1", "hh2", "hh2", "hh2", "hh3", "hh3", "hh3")
date <- c(sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"),  9))
value <- c(1:9)
type <- c("income", "water", "energy", "income", "water", "energy", "income", "water", "energy")
df <- data.frame(hh,  date, value,  type)

# Load lubridate library, add month and year
library(lubridate)
df$month <- month(df$date)
df$year <- year(df$date)

# Load reshape library, run cast from reshape, creates pivot table
library(reshape)
dfNew <- cast(df, hh+year+month~type, value = "value", sum)

> dfNew
   hh year month energy income water
1 hh1 1999     4      3      0     0
2 hh1 1999    10      0      1     0
3 hh1 1999    11      0      0     2
4 hh2 1999     2      0      4     0
5 hh2 1999     3      6      0     0
6 hh2 1999     6      0      0     5
7 hh3 1999     1      9      0     0
8 hh3 1999     4      0      7     0
9 hh3 1999     8      0      0     8
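
For a quick check without the reshape package, base R's xtabs builds a similar summed cross-tabulation. The sketch below uses hypothetical data with one deliberate duplicate and, for brevity, pivots on household and type only.

```r
# Hypothetical data: hh1 has two income rows, which should be summed.
df <- data.frame(
  hh    = c("hh1", "hh1", "hh1", "hh2"),
  type  = c("income", "water", "income", "income"),
  value = c(1, 2, 4, 7)
)

# xtabs() sums the left-hand side within each hh/type cell; empty cells become 0.
tab <- xtabs(value ~ hh + type, data = df)
tab
```

Adding year and month to the right-hand side of the formula gives a multi-way table, which ftable() can flatten for printing.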

Summarizing values and plotting using POSIXct

If you want to group by each day separately, extract only the date from day_time. If you want to group by each hour of each day separately, extract the date along with the hour using format, so that 8 AM today lands in a different group from 8 AM on any other day.

library(dplyr)

df %>%
  mutate(date = as.Date(day_time), 
         hour = format(day_time, '%Y %m %d %H'))

#      id day_time            value date       hour         
#   <int> <dttm>              <dbl> <date>     <chr>        
# 1     1 2021-06-10 01:56:48     6 2021-06-10 2021 06 10 01
# 2     2 2021-06-10 01:47:53     0 2021-06-10 2021 06 10 01
# 3     4 2021-06-02 04:11:35    -2 2021-06-02 2021 06 02 04
# 4     7 2021-06-04 03:45:22     6 2021-06-04 2021 06 04 03
# 5    11 2021-06-09 19:46:59    -2 2021-06-09 2021 06 09 19
# 6    13 2021-06-04 21:44:34     0 2021-06-04 2021 06 04 21
# 7    14 2021-06-04 21:43:19    -6 2021-06-04 2021 06 04 21
# 8    15 2021-06-10 01:43:03    -2 2021-06-10 2021 06 10 01
# 9    20 2021-06-05 00:07:10     8 2021-06-05 2021 06 05 00
#10    23 2021-06-07 07:30:43    -1 2021-06-07 2021 06 07 07

You can use either column in group_by and then summarise as usual. Here the hourly key is built directly inside group_by:

df %>%
  group_by(hour = format(day_time, '%Y-%m-%d %H')) %>%
  summarise(value = sum(value))

#   hour          value
#  <chr>         <dbl>
#1 2021-06-02 04    -2
#2 2021-06-04 03     6
#3 2021-06-04 21    -6
#4 2021-06-05 00     8
#5 2021-06-07 07    -1
#6 2021-06-09 19    -2
#7 2021-06-10 01     4
