R Group by Date, and Summarize the Values

R group by date, and summarize the values

Use as.Date() then aggregate().

energy$Date <- as.Date(energy$Datetime)
aggregate(energy$value, by=list(energy$Date), sum)

EDIT

Emma made a good point about column names. You can preserve column names in aggregate by using the following instead:

aggregate(energy["value"], by=energy["Date"], sum)
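To see the difference in column names, here is a minimal self-contained sketch. The sample data is made up for illustration, and the tz = "UTC" argument is an assumption, added so that the date conversion does not depend on the local time zone.

```r
# Hypothetical sample data: four readings spanning two days
energy <- data.frame(
  Datetime = as.POSIXct(c("2021-01-01 00:00:00", "2021-01-01 12:00:00",
                          "2021-01-02 06:00:00", "2021-01-02 18:00:00"),
                        tz = "UTC"),
  value = c(1, 2, 3, 4)
)

# Drop the time-of-day component
energy$Date <- as.Date(energy$Datetime, tz = "UTC")

# Bare vector plus unnamed list: result columns are called Group.1 and x
unnamed <- aggregate(energy$value, by = list(energy$Date), FUN = sum)

# Single-column data frames: result columns keep the names Date and value
named <- aggregate(energy["value"], by = energy["Date"], FUN = sum)

# The formula interface keeps the names as well
formula_result <- aggregate(value ~ Date, data = energy, FUN = sum)
```

All three calls compute the same daily sums; only the column names of the result differ.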

R: how to sum values grouped by date

With aggregate from base R

aggregate(confirmed ~ date, Table, sum, na.rm = TRUE)

Or with dplyr

library(dplyr)
Table %>%
  group_by(date) %>%
  summarise(confirmed = sum(confirmed, na.rm = TRUE))

Group by summarize in between dates with dplyr

I believe your map2 statement is incorrect.

Here is another possible option using lubridate's %within% operator.

library(dplyr)
library(lubridate)

df <- structure(list(IDsub = c("1001", "1002", "1003", "1004"),
ID = c("id1", "id1", "id2", "id2"),
start_date = structure(c(18628, 18629, 18632, 18637), class = "Date"),
end_date = structure(c(18637, 18636, 18640, 18639), class = "Date"),
value = c(1, 2, 2, 0)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

# Find the overall start and end dates and build a daily sequence
firstdate <- min(df$start_date)
lastdate <- max(df$end_date)
timeseq <- seq(firstdate, lastdate, by = "1 day")

# Split by ID
dflist <- split(df, df$ID)
lapply(names(dflist), function(dfname) {
  iddf <- dflist[[dfname]]
  # Create one time interval per row
  intervals <- interval(iddf$start_date, iddf$end_date)
  # For each day, average the values of the rows whose interval contains it
  meanvalues <- sapply(timeseq, function(current_day) {
    withinresult <- current_day %within% intervals
    mean(iddf$value[withinresult], na.rm = TRUE)
  })
  tibble(dfname, timeseq, meanvalues)
})

The final result from the lapply statement is a list of data frames by ID. One could bind these together and reshape depending on the final intent.
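As a rough sketch of that last step in base R: the list below is a hypothetical stand-in for the lapply result (same three columns, made-up values), not the actual output of the code above. dplyr::bind_rows would do the stacking just as well.

```r
# Hypothetical stand-in for the list returned by lapply():
# one data frame per ID, all with the same three columns
days <- seq(as.Date("2021-01-01"), by = "1 day", length.out = 3)
result_list <- list(
  data.frame(dfname = "id1", timeseq = days, meanvalues = c(1, 1.5, 1.5)),
  data.frame(dfname = "id2", timeseq = days, meanvalues = c(NaN, 2, 1))
)

# Stack the per-ID pieces into one long data frame
long <- do.call(rbind, result_list)

# Reshape to one row per day and one column per ID
wide <- reshape(long, idvar = "timeseq", timevar = "dfname", direction = "wide")
```

The wide result has one meanvalues column per ID, named after the ID. Days on which no interval applies stay NaN, because the mean of an empty vector is NaN.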

How do I group my date variable into month/year in R?

Here is an example using dplyr. You simply use the corresponding date format string for month %m or year %Y in the format statement.

set.seed(123)
df <- data.frame(date = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"),
                                 to = as.Date("01/01/2000", "%d/%m/%Y"), by = "day"),
                 value = sample(seq(5), 731, replace = TRUE))

head(df)
        date value
1 1998-01-01     2
2 1998-01-02     4
3 1998-01-03     3
4 1998-01-04     5
5 1998-01-05     5
6 1998-01-06     1

library(dplyr)

df %>%
  mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
  group_by(month, year) %>%
  summarise(total = sum(value))

Source: local data frame [25 x 3]
Groups: month [?]

   month  year total
   (chr) (chr) (int)
1     01  1998   105
2     01  1999    91
3     01  2000     3
4     02  1998    74
5     02  1999    77
6     03  1998    96
7     03  1999    86
8     04  1998    91
9     04  1999    95
10    05  1998    93
..   ...   ...   ...
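Grouping by separate month and year columns sorts the result by month first, so January 1998, 1999 and 2000 end up next to each other. If chronological order matters, one option is a single "YYYY-MM" key. Below is a minimal base R sketch with made-up data (one value of 1 per day for two months); the column name year_month is just an illustration.

```r
# Hypothetical data: one value per day for January and February 1998
df <- data.frame(
  date  = seq(as.Date("1998-01-01"), as.Date("1998-02-28"), by = "day"),
  value = 1
)

# A single year-month key sorts chronologically as plain text
df$year_month <- format(df$date, "%Y-%m")

# Sum the values within each year-month
monthly <- aggregate(value ~ year_month, data = df, FUN = sum)
```

The same key works inside a dplyr pipeline via mutate(year_month = format(date, "%Y-%m")) followed by group_by(year_month).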

R - Summarize values between specific date range

A couple of alternatives to consider. I assume your dates are actual dates and not character values.

You can try using fuzzyjoin to merge the two data.frames, including rows where the dates fall between start_dates and end_dates.

library(tidyverse)
library(fuzzyjoin)

fuzzy_left_join(
  date_df,
  df,
  by = c("start_dates" = "dates", "end_dates" = "dates"),
  match_fun = list(`<=`, `>=`)
) %>%
  group_by(start_dates, end_dates) %>%
  summarise(new_goal_column = sum(x))

Output

  start_dates end_dates  new_goal_column
  <date>      <date>               <dbl>
1 2021-01-01  2021-01-06              19
2 2021-01-07  2021-01-10               6

You can also try using data.table and joining.

library(data.table)

setDT(date_df)
setDT(df)

df[date_df, .(start_dates, end_dates, x), on = .(dates >= start_dates, dates <= end_dates)][
, .(new_goal_column = sum(x)), by = .(start_dates, end_dates)
]

Output

   start_dates  end_dates new_goal_column
1:  2021-01-01 2021-01-06              19
2:  2021-01-07 2021-01-10               6
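If neither package is available, the same idea can be sketched in base R: for each date range, keep the rows whose date falls inside it and sum x. The data below is made up for illustration (it does not reproduce the numbers above); only the column names follow the code in this answer.

```r
# Hypothetical daily values: x = 1..10 on 2021-01-01 through 2021-01-10
df <- data.frame(dates = as.Date("2021-01-01") + 0:9, x = 1:10)

# Hypothetical, non-overlapping date ranges
date_df <- data.frame(
  start_dates = as.Date(c("2021-01-01", "2021-01-07")),
  end_dates   = as.Date(c("2021-01-06", "2021-01-10"))
)

# For each range, sum x over the rows whose date lies inside it (inclusive)
date_df$new_goal_column <- vapply(
  seq_len(nrow(date_df)),
  function(i) {
    in_range <- df$dates >= date_df$start_dates[i] &
      df$dates <= date_df$end_dates[i]
    sum(df$x[in_range])
  },
  numeric(1)
)
```

This loops over the ranges, so it is fine for a handful of them but will likely be slower than the join-based approaches on large tables.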

Conditional sum grouped by date in R

This should work, but without data to reproduce it's difficult to say:

library(dplyr)
DF %>%
  group_by(Date) %>%
  summarise(peq1 = sum(People == 1),
            pgeq1 = sum(People[People > 1]))
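To make the two pieces concrete: sum(People == 1) counts the rows where People equals 1, while sum(People[People > 1]) adds up the values that are greater than 1. Here is a small base R sketch of the same logic on made-up data. The column names Date and People follow the code above; the result name pgt1 is my own.

```r
# Hypothetical data: three rows on the first day, two on the second
DF <- data.frame(
  Date   = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01",
                     "2021-01-02", "2021-01-02")),
  People = c(1, 3, 1, 2, 5)
)

# sum() of a logical vector counts the TRUE values: rows with People == 1
peq1 <- tapply(DF$People == 1, DF$Date, sum)

# Zero out the rows that fail the condition, then add up what is left
pgt1 <- tapply(DF$People * (DF$People > 1), DF$Date, sum)

result <- data.frame(
  Date = as.Date(names(peq1)),
  peq1 = as.vector(peq1),
  pgt1 = as.vector(pgt1)
)
```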

R - Mean calculation using group_by based on Date column?

You're almost there. First make sure that your Date column is actually a date. Then, when you group, group by year only, not by the full year-month-day date that is in your data frame. The script can be modified as follows.

library(dplyr)
library(lubridate)

# Parse the Date column as an actual date
years_nc$Date <- ymd(years_nc$Date)

years_nc %>%
  group_by(year(Date)) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))
# A tibble: 5 x 2
# `year(Date)` avg_preci
# <dbl> <dbl>
# 1 2010 0.00196
# 2 2011 0.00196
# 3 2012 0.00196
# 4 2013 0.00196
# 5 2014 0.00196

Group by day and hour

You can use lubridate::ymd_hms to convert the date variable to date-time, group by day and hour from it and take mean value of price for each hour.

library(dplyr)

prices_2019 %>%
  mutate(date = lubridate::ymd_hms(date),
         date_hour = format(date, "%Y-%m-%d %H")) %>%
  group_by(date_hour) %>%
  summarize(mean_price = mean(price))

Summarizing a dataframe by date and group

It sounds like what you are looking for is a pivot table. I like to use reshape::cast for these types of tables. If more than one value is returned for a given expenditure type within a household/year/month combination, cast sums those values; if there is only one value, it returns that value. The sum argument is not strictly required; it is only there to handle those duplicate cases. If your data is clean, you shouldn't need it.

hh <- c("hh1", "hh1", "hh1", "hh2", "hh2", "hh2", "hh3", "hh3", "hh3")
date <- c(sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 9))
value <- c(1:9)
type <- c("income", "water", "energy", "income", "water", "energy", "income", "water", "energy")
df <- data.frame(hh, date, value, type)

# Load lubridate library, add month and year
library(lubridate)
df$month <- month(df$date)
df$year <- year(df$date)

# Load reshape library, run cast from reshape, creates pivot table
library(reshape)
dfNew <- cast(df, hh+year+month~type, value = "value", sum)

> dfNew
   hh year month energy income water
1 hh1 1999     4      3      0     0
2 hh1 1999    10      0      1     0
3 hh1 1999    11      0      0     2
4 hh2 1999     2      0      4     0
5 hh2 1999     3      6      0     0
6 hh2 1999     6      0      0     5
7 hh3 1999     1      9      0     0
8 hh3 1999     4      0      7     0
9 hh3 1999     8      0      0     8
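If you would rather not depend on the reshape package, a similar pivot can be sketched in base R with xtabs, which sums the left-hand variable within each cell. The data below is made up and smaller than the example above, and I left out the year column to keep it short.

```r
# Hypothetical expenditure records; hh1 has two income rows in the same month
df <- data.frame(
  hh    = c("hh1", "hh1", "hh1", "hh2"),
  month = c(1, 1, 1, 2),
  type  = c("income", "income", "water", "income"),
  value = c(1, 2, 3, 4)
)

# xtabs() with a left-hand side sums that variable within each cell;
# combinations that never occur come out as 0
pivot <- xtabs(value ~ hh + month + type, data = df)

# ftable() flattens the three-way table:
# one row per hh/month combination, one column per type
flat <- ftable(pivot)
```

One difference to be aware of: the flattened table contains a row for every hh/month combination, including ones that never occur in the data, and those rows are all zeros.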

Summarizing values and plotting using POSIXct

If you want to group by each day separately, extract only the date from day_time. If you want to group by each hour of each day separately, extract the date along with the hour using format, so that 8 AM today falls in a different group from 8 AM on any other day.

library(dplyr)

df %>%
  mutate(date = as.Date(day_time),
         hour = format(day_time, '%Y %m %d %H'))

# id day_time value date hour
# <int> <dttm> <dbl> <date> <chr>
# 1 1 2021-06-10 01:56:48 6 2021-06-10 2021 06 10 01
# 2 2 2021-06-10 01:47:53 0 2021-06-10 2021 06 10 01
# 3 4 2021-06-02 04:11:35 -2 2021-06-02 2021 06 02 04
# 4 7 2021-06-04 03:45:22 6 2021-06-04 2021 06 04 03
# 5 11 2021-06-09 19:46:59 -2 2021-06-09 2021 06 09 19
# 6 13 2021-06-04 21:44:34 0 2021-06-04 2021 06 04 21
# 7 14 2021-06-04 21:43:19 -6 2021-06-04 2021 06 04 21
# 8 15 2021-06-10 01:43:03 -2 2021-06-10 2021 06 10 01
# 9 20 2021-06-05 00:07:10 8 2021-06-05 2021 06 05 00
#10 23 2021-06-07 07:30:43 -1 2021-06-07 2021 06 07 07

You can build the same kind of key directly inside group_by and then use summarise as usual.

df %>%
  group_by(hour = format(day_time, '%Y-%m-%d %H')) %>%
  summarise(value = sum(value))

# hour value
# <chr> <dbl>
#1 2021-06-02 04 -2
#2 2021-06-04 03 6
#3 2021-06-04 21 -6
#4 2021-06-05 00 8
#5 2021-06-07 07 -1
#6 2021-06-09 19 -2
#7 2021-06-10 01 4
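For plotting, a character key like the one above is awkward on a time axis. One alternative, sketched here in base R with a few made-up rows, is to truncate each timestamp to the hour and keep it as POSIXct. The tz = "UTC" argument is an assumption so that the example does not depend on the local time zone.

```r
# Hypothetical timestamps: two in the same hour, one on another day
df <- data.frame(
  day_time = as.POSIXct(c("2021-06-10 01:56:48", "2021-06-10 01:47:53",
                          "2021-06-02 04:11:35"), tz = "UTC"),
  value = c(6, 0, -2)
)

# trunc() returns POSIXlt; convert back to POSIXct for use as a column
df$hour <- as.POSIXct(trunc(df$day_time, units = "hours"))

# Sum the values within each hour; the key stays a real date-time
hourly <- aggregate(value ~ hour, data = df, FUN = sum)
```

Because hourly$hour is still a date-time, something like plot(hourly$hour, hourly$value, type = "b") should place the points on a proper time axis.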

