Summing rows by month in R
I create the data set by
data <- read.table( text=" Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
header=TRUE,stringsAsFactors=FALSE)
You can do the summation with the function aggregate:
byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
data=data,FUN=sum)
Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier), the arguments do the following:
- FUN is the function that should be used for the aggregation. I use sum to sum up the values, but it could also be mean, max or some function you wrote yourself.
- data is used to indicate the data frame that I want to aggregate.
- The first argument tells the function what exactly I want to aggregate. On the left side of the ~, I indicate the variables I want to aggregate. If there is more than one, they are combined with cbind. On the right-hand side is the variable by which the data should be split. Putting Date means that aggregate will sum up the variables for each distinct value of Date.
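To make the role of FUN concrete, here is a minimal sketch that keeps the same formula but swaps the aggregation function. The data frame counts and its values are hypothetical and only mirror the column names used above.

```r
# Small hypothetical data frame (not the original pedestrian counts).
counts <- data.frame(
  Date = c("2009-05-01", "2009-05-01", "2009-05-02"),
  Melbourne = c(2, 4, 10),
  Southern = c(1, 3, 5)
)

# Same formula as before, different FUN: the mean per date instead of the sum.
daily_mean <- aggregate(cbind(Melbourne, Southern) ~ Date,
                        data = counts, FUN = mean)

# A user-written function works too; here the spread (max minus min) per date.
daily_spread <- aggregate(cbind(Melbourne, Southern) ~ Date,
                          data = counts, FUN = function(x) max(x) - min(x))

print(daily_mean)
print(daily_spread)
```

Only the FUN argument changes between the two calls; the formula and the data argument stay the same.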
For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month for a given date. You may first need to install the package with install.packages("lubridate").
If you prefer not to use lubridate, you could do the following instead:
data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
data=data,FUN=sum)
Here I added a new column to data that contains the month and then aggregated by that column.
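Note that a month number alone puts the same calendar month from different years into one group. If the data spans several years, one option is to keep the year in the grouping key. The following is a minimal sketch in base R; the data frame sales, its values, and the column name yearmonth are hypothetical.

```r
# Hypothetical sample covering two years; column names mirror the example above.
sales <- data.frame(
  Date = c("2009-05-01", "2009-05-02", "2009-06-01", "2010-05-01"),
  Melbourne = c(1, 2, 3, 4),
  Southern = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# "%Y-%m" keeps the year in the key, so May 2009 and May 2010 stay separate.
sales$yearmonth <- format(as.Date(sales$Date), "%Y-%m")

by_yearmonth <- aggregate(cbind(Melbourne, Southern) ~ yearmonth,
                          data = sales, FUN = sum)
print(by_yearmonth)
```

Grouping on month(Date) instead would merge the two May rows groups into one.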
Summing rows by month over years in R
You can try this:
library(tidyverse)
library(lubridate)
#Data
total.orders.by.date <- structure(list(Day = structure(c(17931, 17931, 17932, 18303,
18304), class = "Date"), Sales = c(219.12, 60.84, 200.27, 157.17,
12.14)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")
#Code
total.orders.by.date %>% mutate(Month = month(Day, label = TRUE), Year = year(Day)) %>%
  group_by(Year, Month) %>% summarise(Sales = sum(Sales, na.rm = TRUE))
Output:
# A tibble: 2 x 3
# Groups: Year [2]
Year Month Sales
<dbl> <ord> <dbl>
1 2019 Feb 480.
2 2020 Feb 169.
R: Summarize rows per month
You can extract dates into the %Y-%m format in group_by() and use summarise_if() or summarise_at() to select which variables get summed.
(Confirm that capture.date is of class Date.)
df %>%
group_by(Date = strftime(capture.date, "%Y-%m")) %>%
summarise_if(is.numeric, sum)
# # A tibble: 2 x 4
# Date Test1 Test2 Test3
# <chr> <int> <int> <int>
# 1 2016-03 3 3 3
# 2 2016-04 1 0 1
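In recent dplyr releases, summarise_if() and summarise_at() are marked as superseded in favour of across(). As a hedged sketch, the same grouping could be written as follows. The data frame below is hypothetical; only the column names capture.date and Test1 to Test3 are taken from the example above.

```r
library(dplyr)

# Hypothetical data; only the column names come from the example above.
df <- data.frame(
  capture.date = as.Date(c("2016-03-01", "2016-03-15", "2016-04-02")),
  Test1 = c(1L, 2L, 1L),
  Test2 = c(3L, 0L, 0L),
  Test3 = c(2L, 1L, 1L)
)

monthly <- df %>%
  # Year-month key as a character string, e.g. "2016-03".
  group_by(Date = format(capture.date, "%Y-%m")) %>%
  # Sum every numeric column; capture.date is not numeric, so it is dropped.
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(monthly)
```

To sum only specific columns, across(c(Test1, Test2), sum) plays the role of summarise_at().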
How to sum data by month and store them in the first day of the month
First make sure your "date" column is properly formatted as a date object in R:
qfile$date <- as.Date(qfile$date, format = "%Y-%m-%d")
Then we can use format to extract the month and year, group_by that for a sum, and take the first date:
library(dplyr)
qfile %>% mutate(monthyear = as.character(format(date, "%m-%Y"))) %>%
arrange(date) %>%
group_by(monthyear) %>%
summarise(date=date[1], flow = sum(flow_cms))
This will give you the date of the first record in each month of the data, alongside that month's total flow.
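If the sums should be stored on the calendar first day of each month, even when no record exists for that day, one possible alternative is lubridate's floor_date(). This is a sketch under that assumption, not the original answer's method; the sample values are hypothetical, while qfile, date and flow_cms mirror the names above.

```r
library(dplyr)
library(lubridate)

# Hypothetical flow records; only the names qfile, date and flow_cms are from above.
qfile <- data.frame(
  date = as.Date(c("2020-01-03", "2020-01-20", "2020-02-05")),
  flow_cms = c(1.5, 2.5, 4)
)

monthly <- qfile %>%
  # floor_date() rounds each date down to the first day of its month.
  group_by(date = floor_date(date, "month")) %>%
  summarise(flow = sum(flow_cms), .groups = "drop")

print(monthly)
```

Because the grouping key is a Date, the result sorts chronologically across years, which a "%m-%Y" character key does not.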
Summing Consecutive Rows based on DateTime
You can check whether the gap between observations is greater than an hour and, if it is, increment a group index using cumsum. This lets you create groups of consecutive observations and then summarise to get the totals you want. Note that I added T into the DateTime column so that read_table2 would understand it as one column; it is best practice to provide a dput so that we can easily have the data in the format you have.
library(tidyverse)
library(lubridate)
tbl <- read_table2(
"Year Month Day Hour Total Date DateTime
1950 1 2 21 0.01 1/2/1950 1/2/1950T21:00
1950 1 2 23 0.01 1/2/1950 1/2/1950T23:00
1950 1 3 0 0.06 1/3/1950 1/3/1950T0:00
1950 1 3 1 0.01 1/3/1950 1/3/1950T1:00
1950 1 3 2 0.02 1/3/1950 1/3/1950T2:00
1950 1 4 11 0.24 1/4/1950 1/4/1950T11:00
1950 1 4 12 0.07 1/4/1950 1/4/1950T12:00
1950 1 4 15 0.10 1/4/1950 1/4/1950T15:00
1950 1 4 16 0.04 1/4/1950 1/4/1950T16:00
1950 1 4 17 0.01 1/4/1950 1/4/1950T17:00"
) %>%
mutate(Date = mdy(Date), DateTime = mdy_hm(DateTime))
tbl %>%
mutate(group = cumsum(c(0, int_diff(DateTime)) > 3600)) %>%
group_by(Year, Month, group) %>%
summarise(Sum = sum(Total))
#> # A tibble: 4 x 4
#> # Groups: Year, Month [?]
#> Year Month group Sum
#> <dbl> <dbl> <int> <dbl>
#> 1 1950 1 0 0.01
#> 2 1950 1 1 0.1
#> 3 1950 1 2 0.31
#> 4 1950 1 3 0.15
Created on 2019-02-11 by the reprex package (v0.2.1)
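The cumsum step is the core of this approach, so here is a minimal base R sketch of it in isolation. The timestamps and totals are hypothetical, and the variable names gap_secs and group are chosen for illustration only.

```r
# Hypothetical timestamps: two runs separated by a gap of more than one hour.
times <- as.POSIXct(c("1950-01-02 21:00", "1950-01-02 22:00",
                      "1950-01-03 02:00", "1950-01-03 03:00"), tz = "UTC")
total <- c(0.1, 0.2, 0.3, 0.4)

# Gap to the previous observation in seconds; the first one has no predecessor.
gap_secs <- c(0, diff(as.numeric(times)))

# Every gap above 3600 seconds starts a new run, so the running count of such
# gaps serves as a group index for consecutive observations.
group <- cumsum(gap_secs > 3600)

# Total per run of consecutive observations.
sums <- tapply(total, group, sum)
print(group)
print(sums)
```

An observation exactly one hour after the previous one stays in the same group, because the comparison is strictly greater than 3600.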
Sum amount last 6 month prior to the date of transaction
This is simply a non-equi join in data.table. You can create a variable of date - 180 and limit the join between the current date and that variable. This should be fairly quick:
library(data.table)
setDT(dt)[, date_minus_180 := date - 180]
dt[, amnt_6_m := .SD[dt, sum(amount, na.rm = TRUE),
on = .(to = from, date <= date, date >= date_minus_180), by = .EACHI]$V1]
head(dt, 10)
# id from to date amount date_minus_180 amnt_6_m
# 1: 18529 5370 9356 2005-05-31 24.4 2004-12-02 0.0
# 2: 13742 5370 5605 2005-08-05 7618.0 2005-02-06 0.0
# 3: 9913 5370 8567 2005-09-12 21971.0 2005-03-16 0.0
# 4: 956 8605 5370 2005-10-05 5245.0 2005-04-08 0.0
# 5: 2557 5370 5636 2005-11-12 2921.0 2005-05-16 5245.0
# 6: 1602 6390 5370 2005-11-26 8000.0 2005-05-30 0.0
# 7: 18669 5370 8933 2005-11-30 169.2 2005-06-03 13245.0
# 8: 35900 5370 8483 2006-01-31 71.5 2005-08-04 13245.0
# 9: 48667 8934 5370 2006-03-31 14.6 2005-10-02 0.0
# 10: 51341 5370 7626 2006-04-11 4214.0 2005-10-13 8014.6
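To see what the join conditions select, the same logic can be written as a slow but explicit row-by-row loop in base R. This is only a cross-check sketch on hypothetical data (the data frame tx and its values are made up); it is not a replacement for the data.table join on large inputs.

```r
# Hypothetical transactions; column names mirror the example above.
tx <- data.frame(
  from = c(1, 2, 1, 2),
  to = c(2, 1, 3, 1),
  date = as.Date(c("2005-01-10", "2005-03-01", "2005-04-01", "2005-12-01")),
  amount = c(100, 50, 20, 70)
)

# For each transaction, sum the amounts that its sender received
# (to == from) on or before the transaction date and at most 180 days earlier.
tx$amnt_6_m <- sapply(seq_len(nrow(tx)), function(i) {
  received <- tx$to == tx$from[i] &
    tx$date <= tx$date[i] &
    tx$date >= tx$date[i] - 180
  sum(tx$amount[received])
})

print(tx)
```

The three conditions inside the loop correspond one to one to the three terms in the on = .(...) clause of the join.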
Periodic sum over time conditional on start and end date
You can expand each row into one row per date between its start and end dates. For each date you can then sum the x values. We use complete to fill in any missing dates with 0.
library(tidyverse)
df %>%
mutate(dates = map2(start, end, seq, by = 'days')) %>%
unnest(dates) %>%
group_by(dates) %>%
summarise(x = sum(x)) %>%
complete(dates = seq(min(dates), max(dates), by = 'days'), fill = list(x = 0))
# dates x
# <date> <dbl>
# 1 2017-05-19 43
# 2 2017-05-20 43
# 3 2017-05-21 43
# 4 2017-05-22 43
# 5 2017-05-23 43
# 6 2017-05-24 43
# 7 2017-05-25 43
# 8 2017-05-26 43
# 9 2017-05-27 43
#10 2017-05-28 43
# … with 1,465 more rows
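The expansion step can also be followed on a tiny example in base R. This is a sketch with hypothetical data; the data frame spans and its values are assumptions, and only the column names start, end and x come from the answer above.

```r
# Hypothetical intervals; only the column names start, end and x are from above.
spans <- data.frame(
  start = as.Date(c("2017-05-19", "2017-05-20")),
  end = as.Date(c("2017-05-21", "2017-05-20")),
  x = c(10, 5)
)

# One row per day covered by each interval, carrying that interval's x value.
expanded <- do.call(rbind, lapply(seq_len(nrow(spans)), function(i) {
  data.frame(dates = seq(spans$start[i], spans$end[i], by = "day"),
             x = spans$x[i])
}))

# Sum x over all intervals that are active on each day.
daily <- aggregate(x ~ dates, data = expanded, FUN = sum)
print(daily)
```

Here 2017-05-20 is covered by both intervals, so its total is the sum of both x values.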
R: aggregate rows to transform monthly into quarterly data
You may divide every 3 rows into one quarter and calculate the summarising statistics for each variable. Since col1 and col3 have the same value in each quarter, we can select any value from those columns; I have selected the first one.
library(dplyr)
survey1 %>%
group_by(Quarter = paste0('Q', ceiling(row_number()/3))) %>%
summarise(across(c(col1, col3), first),
col2 = sum(col2)) %>%
select(Quarter, col1, col2, col3)
# Quarter col1 col2 col3
# <chr> <dbl> <dbl> <dbl>
#1 Q1 10 37 12
#2 Q2 20 98 22
A hack can also be to use col1 and col3 as grouping variables, since we know they are the same every quarter. This would work provided every quarter has a different combination of col1 and col3 values.
survey1 %>%
group_by(col1, col3) %>%
summarise(col2 = sum(col2), .groups = 'drop')
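The row-number trick can be checked in base R on a small data frame. The values below are hypothetical (the original survey1 is not shown) and assume the rows are sorted by month with exactly three rows per quarter.

```r
# Hypothetical monthly data, sorted by month, three rows per quarter.
survey1 <- data.frame(
  col1 = rep(c(10, 20), each = 3),
  col2 = c(12, 10, 15, 30, 33, 35),
  col3 = rep(c(12, 22), each = 3)
)

# Rows 1-3 map to Q1, rows 4-6 to Q2, and so on.
survey1$Quarter <- paste0("Q", ceiling(seq_len(nrow(survey1)) / 3))

# Sum col2 within each quarter.
quarterly <- aggregate(col2 ~ Quarter, data = survey1, FUN = sum)
print(quarterly)
```

If months are missing or the rows are unsorted, the row-number grouping no longer lines up with calendar quarters.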