R Aggregate Data.Frame with Date Column

R aggregate data.frame with date column

Name the columns you want to aggregate explicitly in the aggregate() call, and the problem should go away:

dta.sum <- aggregate(x = dta[c("Expenditure","Indicator")],
FUN = sum,
by = list(Group.date = dta$Date))

EDITED TO ADD EXPLANATION: When you pass the whole of dta as x, aggregate() applies FUN to every column, including Date. sum is not defined for Date values in R, which is why you get an error. Selecting only the columns to be summed, as in the code above, keeps the grouping column out of the computation.
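The failure is easy to reproduce on a small made-up data frame. The sketch below uses hypothetical values for dta (only the column names come from the answer above) and also shows the formula interface as an equivalent way to name the columns to aggregate.

```r
# Hypothetical sample data; only the column names come from the answer above.
dta <- data.frame(
  Date        = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02")),
  Expenditure = c(10, 20, 5),
  Indicator   = c(1, 0, 1)
)

# sum() is not defined for Date objects, so this call raises an error,
# which tryCatch() turns into the string "error".
date_sum <- tryCatch(sum(dta$Date), error = function(e) "error")

# Aggregating only the numeric columns, grouped by Date, works.
dta.sum <- aggregate(x = dta[c("Expenditure", "Indicator")],
                     FUN = sum,
                     by = list(Group.date = dta$Date))

# Equivalent call using the formula interface.
dta.sum2 <- aggregate(cbind(Expenditure, Indicator) ~ Date,
                      data = dta, FUN = sum)
```

Both calls return one row per date; the first names the grouping column Group.date, the second keeps the name Date.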

Aggregate Data by date

Here is a solution using the reshape2 package (tidyr or reshape could also have been used) to reshape your data frame and the dplyr package to summarize the results:

df <- data.frame(VAL1, D01012016, D02012016, D03022016,D05022016,D03032016,D01042016,D02042016,D03042016,D05042016,D23062016,D05072016,D03082016,D01092016,D12092016)

library(reshape2)
ndf<-melt(df)
ndf$date<-as.Date(ndf$variable, format="D%d%m%Y")

library(dplyr)
summarize(group_by(ndf, VAL1, month = cut(date, breaks = "1 month")), total = sum(value))

Dates stored as column names are difficult to work with, so it is easier to convert the data from wide format to long format first. VAL1 is carried over as the id column by melt(). If you want quarterly results, change the breaks from "1 month" to "3 months".
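The date parsing and the effect of the breaks argument can be checked without reshape2 or dplyr. The following is a minimal base R sketch with made-up values; the column names imitate the D%d%m%Y pattern above, and tapply() stands in for the dplyr summary.

```r
# Column names in the "D%d%m%Y" style used above; the values are made up.
cols   <- c("D01012016", "D03022016", "D02042016", "D23062016", "D12092016")
values <- c(1, 2, 3, 4, 5)

# The literal "D" in the format string matches the leading letter.
dates <- as.Date(cols, format = "D%d%m%Y")

# Sum by calendar month and by three-month block.
monthly   <- tapply(values, cut(dates, breaks = "1 month"), sum)
quarterly <- tapply(values, cut(dates, breaks = "3 months"), sum)
```

cut() labels each bin with its start date, and months without any observation show up as NA in the monthly result because tapply() keeps empty factor levels.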

Aggregate week and date in R by some specific rules

First we can convert the dates in df2 into year-month-date format, then join the two tables:

library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
df2$wk = day(df2$dt) %/% 7 + 1
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))

df1 %>%
left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
select(year_month_week, temperature))

Result

Joining, by = "year_month_week"
id year_month_week points temperature
1 1 2022051 65 36.1
2 1 2022052 58 36.6
3 1 2022053 47 NA
4 2 2022041 21 34.3
5 2 2022042 25 34.9
6 2 2022043 27 NA
7 2 2022044 43 NA
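The week-of-month arithmetic is worth checking on a few dates, because the boundaries may not fall where you expect. This is a base R sketch with hypothetical dates; format() replaces the lubridate helpers, and wk_alt is an alternative rule that the answer above does not use.

```r
# Hypothetical dates chosen to show where the week boundaries fall.
dt <- as.Date(c("2022-05-01", "2022-05-06", "2022-05-07",
                "2022-05-14", "2022-05-31"))

day_of_month <- as.integer(format(dt, "%d"))

# Same arithmetic as in the answer: days 1-6 -> 1, 7-13 -> 2, 14-20 -> 3, ...
wk <- day_of_month %/% 7 + 1

# Numeric key in the form YYYYMM followed by the week number.
year_month_week <- as.numeric(paste0(format(dt, "%Y%m"), wk))

# Alternative rule (not used above): days 1-7 -> 1, 8-14 -> 2, ...
wk_alt <- (day_of_month - 1) %/% 7 + 1
```

With the rule from the answer, the 7th of a month already belongs to week 2. If your weekly keys in df1 assume that days 1 to 7 form week 1, the alternative rule is the one to use.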

R aggregate data.frame having dates and hours in one column misformatted

You can filter the data according to your needs before you plot it:

library(tidyverse) 

dt_sum <- dt %>%
# First filter according to your input
filter(Equipment %in% c("AC", "furnace") & ("2015-01-12" <= date) & ("2015-02-22" > date)) %>%
group_by(Equipment) %>% # Group the data by Equipment
top_n(1, kWh) %>% # Take the maximum kWh value per Equipment
top_n(1, date) # Take the maximum date if there are several with the same max kWh value

dt_sum
# A tibble: 2 x 3
# Groups: Equipment [2]
# kWh Equipment date
# <dbl> <fct> <dttm>
# 1 0.92 furnace 2015-01-21 20:00:00
# 2 0.95 AC 2015-01-14 17:00:00

p <- ggplot(dt_sum, aes(x = Equipment, y = kWh)) +
geom_bar(position = 'dodge', stat = 'identity') +
geom_text(aes(label = date), position = position_stack(vjust = 0.5),
angle = 90, size = 2) +
xlab("Equipment") +
ylab("Consumption (kWh)") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

p


The angle problem comes from ggplotly: the static plot from the ggplot call honors angle = 90, but the rotation is lost when the plot is converted.

library(plotly)
ggplotly(p)
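In recent dplyr versions top_n() is documented as superseded by slice_max(). The sketch below repeats the "largest kWh, then latest date" step on hypothetical readings; it assumes dplyr 1.0.0 or later, and the data values are invented for illustration.

```r
library(dplyr)

# Hypothetical readings; the column names follow the answer above.
dt <- data.frame(
  Equipment = c("AC", "AC", "furnace", "furnace"),
  kWh       = c(0.95, 0.80, 0.92, 0.92),
  date      = as.POSIXct(c("2015-01-14 17:00:00", "2015-01-15 17:00:00",
                           "2015-01-20 20:00:00", "2015-01-21 20:00:00"),
                         tz = "UTC")
)

dt_sum <- dt %>%
  group_by(Equipment) %>%
  slice_max(kWh, n = 1) %>%                      # highest kWh, ties kept
  slice_max(date, n = 1, with_ties = FALSE) %>%  # latest reading among ties
  ungroup()
```

The second slice_max() only matters when several rows share the maximum kWh, as the two furnace rows do here.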


R, how to aggregate data with same date field in an R dataframe

You could try

library(dplyr)
res <- df1 %>%
group_by(SURVEY.DATE) %>%
summarise(across(everything(), mean)) # summarise_each() and funs() are deprecated

Or

res1 <- aggregate(.~SURVEY.DATE, df1, mean)

and then convert it to xts

library(xts)
xts(res1[-1], order.by= as.Date(res1[,1]))
# A B C
#2010-05-13 38.0 33.5 21
#2010-05-14 37.0 34.0 21
#2010-05-21 38.5 30.5 21
#2010-05-23 39.0 32.0 21
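One detail of the formula interface deserves a check when the data contain missing values: by default it drops every row that has an NA in any column before aggregating. A small sketch with hypothetical values (only the column name SURVEY.DATE is taken from the answer above):

```r
# Hypothetical survey data with one missing value in column B.
df1 <- data.frame(
  SURVEY.DATE = c("2010-05-13", "2010-05-13", "2010-05-14"),
  A = c(38, 40, 37),
  B = c(33, NA, 34)
)

# Default na.action = na.omit: the whole second row is dropped,
# so A for 2010-05-13 is computed from a single value.
res_default <- aggregate(. ~ SURVEY.DATE, df1, mean)

# Keep all rows and let mean() skip the NA within each column instead.
res_keep <- aggregate(. ~ SURVEY.DATE, df1, mean,
                      na.rm = TRUE, na.action = na.pass)
```

Which behavior is appropriate depends on the data; the point is only that the two calls can give different means for the same group.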

Aggregate dataframe with many columns per week

Without a minimal reproducible example it's difficult to know exactly which function to use to summarize the data (sum, mean, median, etc.).

For now we'll assume that each row represents a day or some more granular unit (since the date column is called Timestamp and we can't see whether there are actual time values in the field).

We use a combination of tidyr, dplyr and lubridate to create a summarized data frame that sums the data in the columns.

First, we generate some raw data in a format similar to the data in the screenshot and read it into R.

rawData <- "Timestamp,Var.2,Amazonas,Antioquia,Arauca
2022-01-01,0,0,0,1
2022-01-02,0,0,1,3
2022-01-03,0,1,1,2
2022-01-04,0,0,1,0
2022-01-05,0,2,0,0
2022-01-06,3,0,2,2
2022-01-07,2,3,0,2
2022-01-08,1,0,0,0
2022-01-09,0,1,3,0
2022-01-10,0,0,0,0
2022-01-11,0,2,0,5
2022-01-12,0,0,3,0
2022-01-13,0,3,0,4
2022-01-14,0,0,4,0
2022-01-15,0,0,0,3
2022-01-16,0,0,0,0
2022-01-17,0,3,0,0
2022-01-18,0,0,2,3
2022-01-19,0,0,0,0
2022-01-20,0,2,0,0
2022-01-21,0,0,5,2
2022-01-22,0,0,0,0
2022-01-23,0,1,0,0
2022-01-24,0,0,3,1
2022-01-25,0,1,0,1
2022-01-26,0,0,0,1
2022-01-27,0,2,0,0
2022-01-28,0,2,0,1
2022-01-29,0,0,1,0
2022-01-30,0,0,1,0"

df <- read.csv(text = rawData,
colClasses = c("Date","numeric","numeric","numeric","numeric"))

Next, we load the required libraries. From the lubridate package we'll use the year() and week() functions to group the data by week of the year.

library(lubridate)
library(tidyr)
library(dplyr)
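It helps to know what "week" means here. lubridate documents week() as the number of complete seven-day periods since January 1, plus one, so weeks always start on January 1 rather than on a Monday. The base R sketch below re-implements that rule in a hypothetical helper, week_of_year(), and computes ISO 8601 week numbers next to it for comparison.

```r
# Base R re-implementation of the rule lubridate documents for week():
# complete seven-day periods since January 1, plus one.
# week_of_year() is a hypothetical helper, not a lubridate function.
week_of_year <- function(d) (as.integer(format(d, "%j")) - 1) %/% 7 + 1

d <- as.Date(c("2022-01-01", "2022-01-07", "2022-01-08", "2022-01-30"))
weeks <- week_of_year(d)

# ISO 8601 week numbers for comparison; these weeks start on Monday.
iso_weeks <- as.integer(format(d, "%V"))
```

January 1, 2022 was a Saturday, so it falls in week 1 under the first rule but in the last ISO week of 2021. If you need Monday-based weeks, lubridate's isoweek() and isoyear() are the functions to group by.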

Finally, we use tidyr::pivot_longer() to create long format tidy data where each row represents one day's observations for one column in the wide format data frame, create the Year and Week columns, and summarise() the remaining columns in the data frame.

df %>% pivot_longer(-Timestamp,names_to="Area") %>%
mutate(Year = year(Timestamp),
Week = week(Timestamp)) %>%
group_by(Year,Week,Area) %>%
summarise(summedValue = sum(value)) -> summarisedData

head(summarisedData)

...and the first few rows of output:

> head(summarisedData)
# A tibble: 6 × 4
# Groups: Year, Week [2]
Year Week Area summedValue
<dbl> <dbl> <chr> <dbl>
1 2022 1 Amazonas 6
2 2022 1 Antioquia 5
3 2022 1 Arauca 10
4 2022 1 Var.2 5
5 2022 2 Amazonas 6
6 2022 2 Antioquia 10

If we need the data in the original format (wide format tidy data), we can use pivot_wider() to restore the data to its original shape.

# if necessary, pivot_wider() to restore data to original format
summarisedData %>%
pivot_wider(id_cols=c("Year","Week"),
names_from=Area,
values_from=summedValue)

...and the output:

# A tibble: 5 × 6
# Groups: Year, Week [5]
Year Week Amazonas Antioquia Arauca Var.2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2022 1 6 5 10 5
2 2022 2 6 10 9 1
3 2022 3 5 7 8 0
4 2022 4 6 3 4 0
5 2022 5 0 2 0 0

Aggregate data.frame for each day

This can be done pretty cleanly in dplyr, grouping by date using group_by and then summarizing with summarize:

library(dplyr)
(out <- dat %>%
group_by(Date) %>%
summarize(Buys=sum(Buy == 1), Sells=sum(Buy == 0),
Price_Buys=sum(Price[Buy == 1]), Price_Sells=sum(Price[Buy == 0])))
# Date Buys Sells Price_Buys Price_Sells
# (fctr) (int) (int) (int) (int)
# 1 29-06-2015 2 1 15000 8000
# 2 30-06-2015 0 2 0 15500

You can now manipulate this object as you would a normal data frame, e.g. with something like:

out$newvar <- with(out, Sells*Price_Sells - Buys*Price_Buys)
out
# Source: local data frame [2 x 6]
# Date Buys Sells Price_Buys Price_Sells newvar
# (fctr) (int) (int) (int) (int) (int)
# 1 29-06-2015 2 1 15000 8000 -22000
# 2 30-06-2015 0 2 0 15500 31000
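Note that Date is a factor of dd-mm-yyyy strings in the output above, so groups are ordered as text, not chronologically. A minimal base R sketch with hypothetical trades shows the difference; tapply() stands in for the dplyr summary.

```r
# Hypothetical trades; the dd-mm-yyyy strings mimic the Date column above.
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "01-07-2015"),
  Buy   = c(1, 0, 1),
  Price = c(7000, 8000, 9000)
)

# As text, "01-07-2015" sorts before "29-06-2015".
text_order <- sort(unique(dat$Date))

# Parsed as Date, the groups come out in chronological order.
dat$Date <- as.Date(dat$Date, format = "%d-%m-%Y")
buys_per_day <- tapply(dat$Buy == 1, dat$Date, sum)
```

Converting the column with as.Date() before grouping keeps the summary rows in date order once the data span more than one month.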

R group by date, and summarize the values

Use as.Date() then aggregate().

energy$Date <- as.Date(energy$Datetime)
aggregate(energy$value, by=list(energy$Date), sum)

EDIT

Emma made a good point about column names. You can preserve column names in aggregate by using the following instead:

aggregate(energy["value"], by=energy["Date"], sum)
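The difference between the two calls is only in the column names of the result. A short sketch on hypothetical data (the energy values and timestamps are invented) makes that visible:

```r
# Hypothetical readings with a POSIXct timestamp column.
energy <- data.frame(
  Datetime = as.POSIXct(c("2020-01-01 01:00:00", "2020-01-01 13:00:00",
                          "2020-01-02 05:00:00"), tz = "UTC"),
  value = c(1, 2, 3)
)

# Drop the time of day.
energy$Date <- as.Date(energy$Datetime)

# Vectors lose their names: the result columns are Group.1 and x.
unnamed <- aggregate(energy$value, by = list(energy$Date), sum)

# One-column data frames keep their names: Date and value.
named <- aggregate(energy["value"], by = energy["Date"], sum)
```

One caveat, stated as an assumption about your data: as.Date() on a POSIXct value works in UTC unless a tz argument is given, so timestamps stored in another time zone may land on a neighboring day.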

Creating a row that aggregates the data for that day

We can use adorn_totals from janitor after grouping by 'Date'; it adds a new row to each group with the sum of the numeric column.

library(dplyr)
library(janitor)
df1 %>%
group_by(Date) %>%
group_modify(~ adorn_totals(.x, name = "Overall")) %>%
ungroup

-output

# A tibble: 8 × 3
Date Item Purchased
<chr> <chr> <int>
1 01/01/08 Fruit 48
2 01/01/08 Confectionary 42
3 01/01/08 Appliance 11
4 01/01/08 Overall 101
5 01/06/08 Confectionary 16
6 01/06/08 Fruit 19
7 01/06/08 Appliance 50
8 01/06/08 Overall 85

data

df1 <- structure(list(Date = c("01/01/08", "01/01/08", "01/01/08", "01/06/08", 
"01/06/08", "01/06/08"), Item = c("Fruit", "Confectionary", "Appliance",
"Confectionary", "Fruit", "Appliance"), Purchased = c(48L, 42L,
11L, 16L, 19L, 50L)), class = "data.frame", row.names = c(NA,
-6L))
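If janitor is not available, the same per-day total rows can be built in base R. This is a sketch of an alternative technique, not the approach used above: add_total() is a hypothetical helper that appends one "Overall" row to each Date group produced by split().

```r
# Same data as above.
df1 <- data.frame(
  Date = c("01/01/08", "01/01/08", "01/01/08",
           "01/06/08", "01/06/08", "01/06/08"),
  Item = c("Fruit", "Confectionary", "Appliance",
           "Confectionary", "Fruit", "Appliance"),
  Purchased = c(48L, 42L, 11L, 16L, 19L, 50L)
)

# Hypothetical helper: append a total row to one day's rows.
add_total <- function(d) {
  rbind(d, data.frame(Date = d$Date[1],
                      Item = "Overall",
                      Purchased = sum(d$Purchased)))
}

# Split by Date, add the total row to each piece, and stack the pieces.
res <- do.call(rbind, lapply(split(df1, df1$Date), add_total))
rownames(res) <- NULL
```

The result has the same eight rows as the janitor output, with the "Overall" row closing each day.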

