Summing Rows by Month in R

Summing rows by month in R

I create the data set with:

data <- read.table( text="   Date    Hour    Melbourne   Southern    Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
header=TRUE,stringsAsFactors=FALSE)

You can do the summation with the function aggregate:

byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
data=data,FUN=sum)

Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier) the arguments do the following:

  • FUN is the function that should be used for the aggregation. I use sum to sum up the values, but it could also be mean, max or some function you wrote yourself.
  • data is used to indicate the data frame that I want to aggregate.
  • The first argument tells the function what exactly I want to aggregate. On the left side of the ~, I indicate the variables I want to aggregate. If there is more than one, they are combined with cbind. On the right hand side is the variable by which the data should be split. Putting Date means that aggregate will sum up the variables for each distinct value of Date.
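To make the role of FUN concrete, here is a minimal sketch that swaps sum for max. The data frame hourly and its values are made up for illustration; only the column names follow the example above.

```r
# Hypothetical hourly counts (values invented for illustration)
hourly <- data.frame(
  Date      = c("2009-05-01", "2009-05-01", "2009-05-02"),
  Southern  = c(5, 2, 11),
  Flagstaff = c(17, 1, 0),
  stringsAsFactors = FALSE
)

# Same formula interface as before, but FUN = max returns the
# largest hourly value per day instead of the daily total
daily_max <- aggregate(cbind(Southern, Flagstaff) ~ Date,
                       data = hourly, FUN = max)
print(daily_max)
```

Any function that reduces a vector to a single value should work in the same position.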

For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month for a given date. You may first need to install the package with install.packages("lubridate").

If you prefer not to use lubridate, you could do the following instead:

data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
data=data,FUN=sum)

Here I added a new column to data that contains the month and then aggregated by that column.
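One thing to keep in mind: a month number alone does not distinguish years, so May 2009 and May 2010 would land in the same group. If the data span several years, a year-month key is one way around that. The following is a sketch in base R; the data frame counts and its values are hypothetical.

```r
# Hypothetical sample spanning two years (values invented for illustration)
counts <- data.frame(
  Date      = c("2009-05-01", "2009-05-02", "2010-05-01", "2010-06-01"),
  Melbourne = c(1, 2, 10, 20),
  stringsAsFactors = FALSE
)

# Build a "YYYY-MM" key so the same month in different years stays separate
counts$yearmonth <- format(as.Date(counts$Date), "%Y-%m")

by_yearmonth <- aggregate(Melbourne ~ yearmonth, data = counts, FUN = sum)
print(by_yearmonth)
```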

Summing rows by month over years in R

You can try this:

library(tidyverse)
library(lubridate)
#Data
total.orders.by.date <- structure(list(Day = structure(c(17931, 17931, 17932, 18303,
18304), class = "Date"), Sales = c(219.12, 60.84, 200.27, 157.17,
12.14)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")

#Code
# Extract month (as a labelled factor) and year, then sum sales per year-month
total.orders.by.date %>% mutate(Month=month(Day,label = TRUE),Year=year(Day)) %>%
group_by(Year,Month) %>% summarise(Sales=sum(Sales,na.rm=TRUE))

Output:

# A tibble: 2 x 3
# Groups: Year [2]
Year Month Sales
<dbl> <ord> <dbl>
1 2019 Feb 480.
2 2020 Feb 169.

R: Summarize rows per month

You can convert the dates to the %Y-%m format inside group_by() and use summarise_if() or summarise_at() to select which variables get summed.

(First confirm that capture.date is of class Date.)

df %>%
group_by(Date = strftime(capture.date, "%Y-%m")) %>%
summarise_if(is.numeric, sum)

# # A tibble: 2 x 4
# Date Test1 Test2 Test3
# <chr> <int> <int> <int>
# 1 2016-03 3 3 3
# 2 2016-04 1 0 1
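In dplyr 1.0.0 and later, summarise_if() is superseded by across(). A sketch of the same idea in that style follows; the data frame df here is a made-up stand-in for the one in the question, and it assumes a dplyr version that provides across() and where().

```r
library(dplyr)

# Hypothetical stand-in for the question's data (values invented)
df <- data.frame(
  capture.date = as.Date(c("2016-03-01", "2016-03-15", "2016-04-02")),
  Test1 = c(1L, 2L, 1L),
  Test2 = c(3L, 0L, 0L)
)

# Group by year-month and sum every numeric column;
# the Date column is not numeric, so it is left out of the sum
monthly <- df %>%
  group_by(Date = format(capture.date, "%Y-%m")) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(monthly)
```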

How to sum data by month and store them in the first day of the month

First make sure your "date" column is properly formatted as a date object in R:

qfile$date <- as.Date(qfile$date, format = "%Y-%m-%d")

Then we can use format to extract the month and year, and group_by that for a sum and take the first date:

library(dplyr)
qfile %>% mutate(monthyear = as.character(format(date, "%m-%Y"))) %>%
arrange(date) %>%
group_by(monthyear) %>%
summarise(date=date[1], flow = sum(flow_cms))

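This gives you, for each month, the monthly sum together with the date of the first record in that month. Note that this date is the first day of the month only if the data contain a record for that day.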
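If the sum should always be stored on the first calendar day of the month, regardless of which days have records, one option is to build that date explicitly. This is a base R sketch; the sample values are invented, and the column month_start is a name chosen here for illustration.

```r
# Hypothetical flow records (values invented for illustration)
qfile <- data.frame(
  date     = as.Date(c("2020-01-03", "2020-01-20", "2020-02-10")),
  flow_cms = c(1.5, 2.5, 4)
)

# Replace the day component with "01" to get the first day of each month
qfile$month_start <- as.Date(format(qfile$date, "%Y-%m-01"))

# Sum the flow per month, keyed by that first-of-month date
monthly_flow <- aggregate(flow_cms ~ month_start, data = qfile, FUN = sum)
print(monthly_flow)
```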

Summing Consecutive Rows based on DateTime

You can check if the gap between observations is greater than an hour and increment a group index if it is using cumsum. This lets you create groups of consecutive observations and then summarise to get the totals you want. Note that I added T into the DateTime column so that read_table2 would understand it as one column; it is best practice to provide a dput so that we can easily have the data in the format you have.

library(tidyverse)
library(lubridate)

tbl <- read_table2(
"Year Month Day Hour Total Date DateTime
1950 1 2 21 0.01 1/2/1950 1/2/1950T21:00
1950 1 2 23 0.01 1/2/1950 1/2/1950T23:00
1950 1 3 0 0.06 1/3/1950 1/3/1950T0:00
1950 1 3 1 0.01 1/3/1950 1/3/1950T1:00
1950 1 3 2 0.02 1/3/1950 1/3/1950T2:00
1950 1 4 11 0.24 1/4/1950 1/4/1950T11:00
1950 1 4 12 0.07 1/4/1950 1/4/1950T12:00
1950 1 4 15 0.10 1/4/1950 1/4/1950T15:00
1950 1 4 16 0.04 1/4/1950 1/4/1950T16:00
1950 1 4 17 0.01 1/4/1950 1/4/1950T17:00"
) %>%
mutate(Date = mdy(Date), DateTime = mdy_hm(DateTime))

tbl %>%
mutate(group = cumsum(c(0, int_diff(DateTime)) > 3600)) %>%
group_by(Year, Month, group) %>%
summarise(Sum = sum(Total))
#> # A tibble: 4 x 4
#> # Groups: Year, Month [?]
#> Year Month group Sum
#> <dbl> <dbl> <int> <dbl>
#> 1 1950 1 0 0.01
#> 2 1950 1 1 0.1
#> 3 1950 1 2 0.31
#> 4 1950 1 3 0.15

Created on 2019-02-11 by the reprex package (v0.2.1)
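The cumsum trick can be easier to follow when stripped down to plain vectors. The sketch below uses base R only, with a shortened, hypothetical subset of the timestamps; the variable names are chosen here for illustration.

```r
# Shortened hypothetical series of observation times and totals
times <- as.POSIXct(
  c("1950-01-02 21:00", "1950-01-02 23:00", "1950-01-03 00:00",
    "1950-01-03 01:00", "1950-01-04 11:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)
total <- c(0.01, 0.01, 0.06, 0.01, 0.24)

# Seconds elapsed since the previous observation (0 for the first one)
gap_secs <- c(0, diff(as.numeric(times)))

# Each gap longer than one hour starts a new group:
# TRUE counts as 1 in cumsum, so the index increases at every break
group <- cumsum(gap_secs > 3600)

# Sum the totals within each run of consecutive hourly observations
sums <- tapply(total, group, sum)
print(sums)
```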

Sum amount over the last 6 months prior to the date of transaction

This is simply a non-equi join in data.table. You can create a variable holding date - 180 and restrict the join to rows whose date falls between that variable and the current date. This should be fairly quick.

library(data.table)
setDT(dt)[, date_minus_180 := date - 180]
dt[, amnt_6_m := .SD[dt, sum(amount, na.rm = TRUE),
on = .(to = from, date <= date, date >= date_minus_180), by = .EACHI]$V1]
head(dt, 10)
# id from to date amount date_minus_180 amnt_6_m
# 1: 18529 5370 9356 2005-05-31 24.4 2004-12-02 0.0
# 2: 13742 5370 5605 2005-08-05 7618.0 2005-02-06 0.0
# 3: 9913 5370 8567 2005-09-12 21971.0 2005-03-16 0.0
# 4: 956 8605 5370 2005-10-05 5245.0 2005-04-08 0.0
# 5: 2557 5370 5636 2005-11-12 2921.0 2005-05-16 5245.0
# 6: 1602 6390 5370 2005-11-26 8000.0 2005-05-30 0.0
# 7: 18669 5370 8933 2005-11-30 169.2 2005-06-03 13245.0
# 8: 35900 5370 8483 2006-01-31 71.5 2005-08-04 13245.0
# 9: 48667 8934 5370 2006-03-31 14.6 2005-10-02 0.0
# 10: 51341 5370 7626 2006-04-11 4214.0 2005-10-13 8014.6
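To spell out what the join conditions select, here is a slow but explicit base R sketch of the same logic as I read it: for each transaction, sum the amounts that the sender received (rows whose to equals this row's from) in the 180 days up to and including the transaction date. The data frame tx and its values are hypothetical, and this is meant only to illustrate the semantics, not to replace the data.table version.

```r
# Hypothetical transactions (values invented for illustration)
tx <- data.frame(
  from   = c(1, 2, 1, 1),
  to     = c(2, 1, 3, 2),
  date   = as.Date(c("2005-01-01", "2005-03-01", "2005-04-01", "2006-01-01")),
  amount = c(100, 50, 30, 10)
)

# For each row i, pick rows where the receiver is row i's sender and the
# date lies in [date_i - 180, date_i], then sum their amounts
tx$amnt_6_m <- sapply(seq_len(nrow(tx)), function(i) {
  in_window <- tx$to == tx$from[i] &
    tx$date <= tx$date[i] &
    tx$date >= tx$date[i] - 180
  sum(tx$amount[in_window], na.rm = TRUE)
})

print(tx)
```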

Periodic sum over time conditional on start and end date

You can expand each row into one row per date between its start and end date. Then, for each date, you can sum the x values. We use complete to fill in any missing dates with 0.

library(tidyverse)

df %>%
mutate(dates = map2(start, end, seq, by = 'days')) %>%
unnest(dates) %>%
group_by(dates) %>%
summarise(x = sum(x)) %>%
complete(dates = seq(min(dates), max(dates), by = 'days'), fill = list(x = 0))

# dates x
# <date> <dbl>
# 1 2017-05-19 43
# 2 2017-05-20 43
# 3 2017-05-21 43
# 4 2017-05-22 43
# 5 2017-05-23 43
# 6 2017-05-24 43
# 7 2017-05-25 43
# 8 2017-05-26 43
# 9 2017-05-27 43
#10 2017-05-28 43
# … with 1,465 more rows
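The expansion step can also be sketched in base R, which may help to see what map2 and unnest are doing. The data frame ranges and its values are hypothetical.

```r
# Hypothetical intervals with a value x that applies on every day in the range
ranges <- data.frame(
  start = as.Date(c("2017-05-19", "2017-05-20")),
  end   = as.Date(c("2017-05-21", "2017-05-20")),
  x     = c(43, 7)
)

# Expand each row into one row per day between start and end (inclusive)
expanded <- do.call(rbind, lapply(seq_len(nrow(ranges)), function(i) {
  data.frame(
    dates = seq(ranges$start[i], ranges$end[i], by = "day"),
    x     = ranges$x[i]
  )
}))

# Sum x over all intervals that cover each day
daily <- aggregate(x ~ dates, data = expanded, FUN = sum)
print(daily)
```

Days not covered by any interval do not appear in this result; the complete() call in the answer above is what adds them back with x = 0.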

R: aggregate rows to transform monthly into quarterly data

You can group every 3 consecutive rows into one quarter and calculate the summary statistics for each variable. Since col1 and col3 have the same value within each quarter, we can take any value from those columns; I have selected the first one. This assumes the rows are sorted by month and that the first row is the first month of a quarter.

library(dplyr)

survey1 %>%
group_by(Quarter = paste0('Q', ceiling(row_number()/3))) %>%
summarise(across(c(col1, col3), first),
col2 = sum(col2)) %>%
select(Quarter, col1, col2, col3)

# Quarter col1 col2 col3
# <chr> <dbl> <dbl> <dbl>
#1 Q1 10 37 12
#2 Q2 20 98 22

Another hack is to use col1 and col3 as grouping variables, since we know they are constant within each quarter. This works provided that every quarter has a different combination of col1 and col3 values.

survey1 %>% 
group_by(col1, col3) %>%
summarise(col2 = sum(col2), .groups = 'drop')

