Aggregating Values on a Data Tree with R

Aggregating values on a data tree with R

You could simply use a recursive function:

myApply <- function(node) {
  # Total = own hours plus the totals of all children (computed recursively)
  node$totalHours <-
    sum(c(node$hours, purrr::map_dbl(node$children, myApply)), na.rm = TRUE)
}
myApply(tree)
print(tree, "hours", "totalHours")

Result:

           levelName hours totalHours
1 Ned                   NA          5
2  °--John               1          5
3      °--Kate           1          4
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1
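
As a possible alternative to the hand-written recursion, the same totals can be sketched with data.tree's own post-order traversal. This is only an illustration under the assumption that the tree is built from the edge list used in this answer; the helper name `child_totals` is introduced here and is not part of the original code.

```r
library(data.tree)

# Edge list used in this answer: "from" is the child, "to" is the parent.
df <- data.frame(
  from  = c("John", "Kate", "Dan", "Ron", "Sienna"),
  to    = c("Ned", "John", "Kate", "Kate", "Kate"),
  hours = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
tree <- FromDataFrameNetwork(df)

# Post-order visits children before their parent, so each child's
# totalHours already exists when the parent is processed.
tree$Do(function(node) {
  child_totals <- vapply(node$children,
                         function(child) child$totalHours,
                         numeric(1))
  node$totalHours <- sum(node$hours, child_totals, na.rm = TRUE)
}, traversal = "post-order")

print(tree, "hours", "totalHours")
```

The traversal order does the work that the recursive call does in `myApply`, so no return value has to be passed back up by hand.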

Edit: To fill two attributes in a single pass:

library(data.tree)

# Create data frame
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1, 1, 1, 1, 1)
hours2 <- 5:1
df <- data.frame(from, to, hours, hours2)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours", "hours2")

myApply <- function(node) {
  # Recurse first: each child returns list(totalHours, totalHours2)
  res.ch <- purrr::map(node$children, myApply)
  a <- node$totalHours <-
    sum(c(node$hours, purrr::map_dbl(res.ch, 1)), na.rm = TRUE)
  b <- node$totalHours2 <-
    sum(c(node$hours2, purrr::map_dbl(res.ch, 2)), na.rm = TRUE)
  list(a, b)
}
myApply(tree)
print(tree, "hours", "totalHours", "hours2", "totalHours2")

Result:

           levelName hours totalHours hours2 totalHours2
1 Ned                   NA          5     NA          15
2  °--John               1          5      5          15
3      °--Kate           1          4      4          10
4          ¦--Dan        1          1      3           3
5          ¦--Ron        1          1      2           2
6          °--Sienna     1          1      1           1

Recursive aggregation of data tree

Assume that the data.tree looks like this:

               levelName hours
1 Team1                     NA
2  °--Team1-1                0
3      ¦--Team1-1-1        150
4      ¦   ¦--Team1-1-1a    65
5      ¦   ¦--Team1-1-1b    20
6      ¦   °--Team1-1-1c    30
7      °--Team1-1-2        200
8          °--Team1-1-2a    30

Here are two versions for you to try out as I'm not sure about which one you want.


Non-cumulative (each node subtracts its children's original hours)

tree$Do(function(node) {
  node$actual_hours <- node$hours -
    if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
}, traversal = "post-order")
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -350  # see here, -350=0-(150+200)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30

Cumulative (each node subtracts its children's already adjusted actual_hours)

tree$Do(function(node) {
  node$actual_hours <- node$hours -
    if (node$isLeaf) 0 else Aggregate(node, attribute = "actual_hours", aggFun = sum)
}, traversal = "post-order")
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -205  # -205=0-(35+170)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
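
To make the difference between the two versions easier to check by hand, here is a minimal base R sketch of the same arithmetic. It does not use data.tree; the flat `nodes` table with a `parent` column is a hypothetical encoding of the tree above, introduced only for this illustration.

```r
# Hypothetical flat encoding of the tree above; rows are in pre-order,
# so walking them in reverse visits children before their parents.
nodes <- data.frame(
  name   = c("Team1", "Team1-1", "Team1-1-1", "Team1-1-1a",
             "Team1-1-1b", "Team1-1-1c", "Team1-1-2", "Team1-1-2a"),
  parent = c(NA, "Team1", "Team1-1", "Team1-1-1",
             "Team1-1-1", "Team1-1-1", "Team1-1", "Team1-1-2"),
  hours  = c(NA, 0, 150, 65, 20, 30, 200, 30),
  stringsAsFactors = FALSE
)

non_cumulative <- nodes$hours
cumulative     <- nodes$hours
for (i in rev(seq_len(nrow(nodes)))) {
  kids <- which(nodes$parent == nodes$name[i])  # empty for leaves
  # Version 1: subtract the children's original hours
  non_cumulative[i] <- nodes$hours[i] - sum(nodes$hours[kids])
  # Version 2: subtract the children's already adjusted values
  cumulative[i] <- nodes$hours[i] - sum(cumulative[kids])
}

print(data.frame(nodes, non_cumulative, cumulative))
```

The two columns differ only where a node has grandchildren, which here is Team1-1.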

R - data.tree aggregate along ancestors of a leaf?

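For this, Do comes in handy. Traverse visits the nodes in pre-order by default, so each parent's result is set before its children are processed: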

thetree$result <- thetree$p
traversal <- Traverse(thetree, filterFun = isNotRoot)
Do(traversal, function(node) node$result <- node$parent$result + node$p)

This then gets the desired result:

print(thetree, "p", "result")
      levelName   p result
1 1             0.1    0.1
2  ¦--1.1       0.2    0.3
3  ¦   ¦--1.1.1 0.3    0.6
4  ¦   °--1.1.2 0.4    0.7
5  °--1.2       0.5    0.6
6      ¦--1.2.1 0.6    1.2
7      °--1.2.2 0.7    1.3
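
The same top-down accumulation can be sketched in base R, which makes the dependence on visiting order explicit. This is only an illustration, not the answer's method: the flat `nodes` table and its `parent` column are assumptions introduced here, with rows listed in pre-order so that every parent appears before its children.

```r
# Hypothetical flat encoding of the tree above; rows are in pre-order.
nodes <- data.frame(
  name   = c("1", "1.1", "1.1.1", "1.1.2", "1.2", "1.2.1", "1.2.2"),
  parent = c(NA, "1", "1.1", "1.1", "1", "1.2", "1.2"),
  p      = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)

result <- nodes$p                      # the root keeps its own p
for (i in seq_len(nrow(nodes))[-1]) {  # skip the root, like filterFun = isNotRoot
  parent_row <- match(nodes$parent[i], nodes$name)
  # The parent's result is final by now because parents come first.
  result[i] <- result[parent_row] + nodes$p[i]
}
nodes$result <- result

print(nodes)
```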

How to aggregate data one after another in R?

We can use rollmean from the zoo package, where the second argument is the window width:

library(zoo)
rollmean(h_1, 2)
#[1] 3.0 5.0 6.5 8.0 10.0
rollmean(h_1, 3)
#[1] 4.000000 5.666667 7.333333 9.000000
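
Since `h_1` itself is not shown, the sketch below uses a vector reconstructed to be consistent with the two outputs above; treat it as an assumption, not the asker's data. It computes the same rolling means in base R with `embed`, which can be handy for checking what `rollmean` does. The helper name `roll_mean` is made up for this example.

```r
# Assumed input: chosen so that it is consistent with the outputs shown above.
h_1 <- c(2, 4, 6, 7, 9, 11)

# Base R rolling mean: embed() lays out each window of width k as one row.
roll_mean <- function(x, k) rowMeans(embed(x, k))

roll_mean(h_1, 2)
roll_mean(h_1, 3)
```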

Hierarchical data in R: How do I sum over subsets while maintaining the tree?

Another job for aggregate. Calling your data frame dat:

aggregate(cbind(cost, amount) ~ state+district+branch+order, data=dat, FUN=sum)

##   state district   branch  order  cost amount
## 1    CA  central columbus gloves 19.38 633.82
## 2    CA  central  newtown gloves 20.90 222.79
## 3    CA  central columbus  shoes 71.73 361.99
## 4    CA  central  newtown  shoes 57.62 202.80

On the left side of the ~, cbind indicates that we want each column aggregated separately. If cost + amount were specified instead, the + would be ordinary arithmetic, so the row-wise sum of the two columns would be aggregated as a single column. On the right side of the ~, we have factors, so the + separates grouping variables and the data are aggregated for each combination of their levels.
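
A small sketch of the difference between `cbind` and `+` on the left side of the formula. The `toy` data frame is hypothetical and much smaller than the asker's `dat`.

```r
# Hypothetical toy data, not the asker's `dat`.
toy <- data.frame(
  branch = c("columbus", "columbus", "newtown"),
  cost   = c(1, 2, 3),
  amount = c(10, 20, 30)
)

# cbind(): one aggregated column per variable.
separate <- aggregate(cbind(cost, amount) ~ branch, data = toy, FUN = sum)

# `+` on the left side: a single column holding sum(cost + amount).
combined <- aggregate(cost + amount ~ branch, data = toy, FUN = sum)

print(separate)
print(combined)
```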

R: aggregate based on multiple columns

The last two column names have two dots at the end and Species is incorrectly spelled:

> names(Trees)
[1] "Tree.Speices" "DBH"          "Basal.Area"   "Compartment"  "Stand"
[6] "Transect.."   "Plot.."

Try:

aggregate(. ~ Compartment + Stand + Transect.. + Plot.. + Tree.Speices,
          data = Trees, FUN = sum)

or remove the dots at the end of all names and correct the spelling:

names(Trees) <- sub("\\.+$", "", names(Trees))
names(Trees) <- sub("Speices", "Species", names(Trees))
aggregate(. ~ Compartment + Stand + Transect + Plot + Tree.Species,
          data = Trees, FUN = sum)
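
Trailing dots like these typically appear when headers are sanitised through `make.names`, as `read.csv` does by default. The sketch below assumes the original headers were something like "Transect #" and "Plot #" (a guess; the question does not show them) and then applies the same clean-up as above.

```r
# Hypothetical raw headers; the real ones are not shown in the question.
raw <- c("Tree Speices", "DBH", "Transect #", "Plot #")

# make.names() replaces characters that are invalid in R names with dots.
syntactic <- make.names(raw)

# Strip trailing dots, then fix the misspelling.
clean <- sub("\\.+$", "", syntactic)
clean <- sub("Speices", "Species", clean)

print(syntactic)
print(clean)
```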

How to aggregate/sum values by time in R

Using dplyr and lubridate packages we can extract the hour from each Time and sum.

library(dplyr)
library(lubridate)

df %>%
  mutate(hour = hour(dmy_hms(Time))) %>%
  group_by(hour) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))

For aggregation by date, we can do

df %>%
  mutate(day = as.Date(dmy_hms(Time))) %>%
  group_by(day) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))

Using base R, we could do

# Parse once; a fixed tz keeps as.Date() from shifting times close to midnight
time <- as.POSIXct(df$Time, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
df$Hour <- format(time, "%H")
df$Day <- as.Date(time)

#Aggregation by hour
aggregate(Precipitation~Hour, df, sum, na.rm = TRUE)

#Aggregation by date
aggregate(Precipitation~Day, df, sum, na.rm = TRUE)

EDIT

Based on updated data and information, we can do

df <- readxl::read_xlsx("/path/to/file/df (1).xlsx")

hour_df <- df %>%
  group_by(hour = hour(Time)) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))

day_df <- df %>%
  mutate(day = as.Date(Time)) %>%
  group_by(day) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))

So hour_df holds the sum of Precipitation for each hour of the day, pooled across all dates, and day_df holds the sum of Precipitation for each day.
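
If hourly totals should be kept separate per day rather than pooled across dates, one option is to group by both. A minimal base R sketch, using the sample data from this answer plus one extra row on the following day; that extra row is hypothetical and only there to show the effect. Parsing "am"/"pm" with %p is locale-dependent, so this assumes an English-style locale.

```r
Time <- c("10/7/2014 12:30:00 am", "10/7/2014 01:00:05 am",
          "10/7/2014 01:30:10 am", "10/7/2014 02:00:15 am",
          "11/7/2014 01:15:00 am")  # last row is hypothetical, not in the source
Precipitation <- c(0.06, 0.02, 0, 0.25, 0.10)
df <- data.frame(Time, Precipitation, stringsAsFactors = FALSE)

# Parse with a fixed time zone so the derived date cannot shift.
stamp <- as.POSIXct(df$Time, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
df$Day  <- as.Date(stamp)
df$Hour <- format(stamp, "%H")

# One row per (day, hour) combination that occurs in the data.
by_day_hour <- aggregate(Precipitation ~ Day + Hour, data = df, FUN = sum)
print(by_day_hour)
```

With this grouping, hour 01 on the two days ends up in two separate rows instead of being summed together.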

data

Id <- c(1,2,3,4)
Time <- c("10/7/2014 12:30:00 am", "10/7/2014 01:00:05 am",
"10/7/2014 01:30:10 am", "10/7/2014 02:00:15 am")
Precipitation <- c(0.06, 0.02,0,0.25)
df <- data.frame(Id, Time, Precipitation)

Aggregate multiple columns at once

We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . on the left stands for all other variables in df1 (from the example, we assume the mean is needed for every column except the grouping ones). The remaining arguments are the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)
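
To illustrate what the . expands to, the sketch below compares it with an explicit cbind of the value columns. It rebuilds `df1` with `data.frame()` using the values from the data section of this answer so that it runs on its own; the names `with_dot` and `with_cbind` are introduced here.

```r
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L),
  stringsAsFactors = FALSE
)

# `.` stands for every column that is not a grouping variable ...
with_dot <- aggregate(. ~ id1 + id2, df1, mean)

# ... so here it gives the same group means as naming the columns explicitly.
with_cbind <- aggregate(cbind(val1, val2) ~ id1 + id2, df1, mean)

print(with_dot)
```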

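Or we can use summarise_each from dplyr after grouping (group_by). Note that summarise_each and funs have since been deprecated in dplyr, so the across form further down is preferable on current versions.

library(dplyr)
df1 %>%
  group_by(id1, id2) %>%
  summarise_each(funs(mean))

Or using summarise with across (introduced in the dplyr development version 0.8.99.9000 and available since dplyr 1.0.0)

df1 %>%
  group_by(id1, id2) %>%
  summarise(across(starts_with('val'), mean))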

Another option is data.table. We convert the data.frame to a data.table with setDT(df1), group by id1 and id2, and loop over the columns of each subset (.SD) with lapply to get the mean.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]

data

df1 <- structure(list(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
  .Names = c("id1", "id2", "val1", "val2"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

