
# Aggregating Values on a Data Tree with R

## Aggregating values on a data tree with R

You could simply use a recursive function:

```r
myApply <- function(node) {
  node$totalHours <-
    sum(c(node$hours, purrr::map_dbl(node$children, myApply)), na.rm = TRUE)
}
myApply(tree)
print(tree, "hours", "totalHours")
```

Result:

```
           levelName hours totalHours
1 Ned                   NA          5
2  °--John               1          5
3      °--Kate           1          4
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1
```
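To see the recursion in isolation, here is a minimal base R sketch on a plain nested list. The list layout and the helpers `make_node` and `total_hours` are hypothetical, invented for illustration, and are not part of `data.tree`. Plain lists are copied rather than modified in place, so this version returns the total instead of storing it on the node.

```r
# Hypothetical helper: a node is a list with a name, hours and child nodes
make_node <- function(name, hours, ...) {
  list(name = name, hours = hours, children = list(...))
}

# Same shape as the Ned/John/Kate tree printed above
tree_list <- make_node("Ned", NA_real_,
  make_node("John", 1,
    make_node("Kate", 1,
      make_node("Dan", 1), make_node("Ron", 1), make_node("Sienna", 1))))

# A node's total is its own hours plus the totals of its children;
# na.rm = TRUE lets the root (hours = NA) still report the sum below it
total_hours <- function(node) {
  child_totals <- vapply(node$children, total_hours, numeric(1))
  sum(c(node$hours, child_totals), na.rm = TRUE)
}

total_hours(tree_list)                              # 5
total_hours(tree_list$children[[1]]$children[[1]])  # 4, Kate's subtree
```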

Edit: Filling two elements:

```r
library(data.tree)

# Create data frame
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1,1,1,1,1)
hours2 <- 5:1
df <- data.frame(from,to,hours, hours2)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours", "hours2")

# Return both totals as a list so the parent call can pick them up by position
myApply <- function(node) {
  res.ch <- purrr::map(node$children, myApply)
  a <- node$totalHours <-
    sum(c(node$hours,  purrr::map_dbl(res.ch, 1)), na.rm = TRUE)
  b <- node$totalHours2 <-
    sum(c(node$hours2, purrr::map_dbl(res.ch, 2)), na.rm = TRUE)
  list(a, b)
}
myApply(tree)
print(tree, "hours", "totalHours", "hours2", "totalHours2")
```

Result:

```
           levelName hours totalHours hours2 totalHours2
1 Ned                   NA          5     NA          15
2  °--John               1          5      5          15
3      °--Kate           1          4      4          10
4          ¦--Dan        1          1      3           3
5          ¦--Ron        1          1      2           2
6          °--Sienna     1          1      1           1
```

## Recursive aggregation of data tree

Assume that the data.tree looks like this:

```
               levelName hours
1 Team1                     NA
2  °--Team1-1                0
3      ¦--Team1-1-1        150
4      ¦   ¦--Team1-1-1a    65
5      ¦   ¦--Team1-1-1b    20
6      ¦   °--Team1-1-1c    30
7      °--Team1-1-2        200
8          °--Team1-1-2a    30
```

Here are two versions for you to try out as I'm not sure about which one you want.

Non-cumulative

```r
tree$Do(function(node) {
  node$actual_hours <- node$hours - if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
}, traversal = "post-order")
```

```
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -350 # see here, -350=0-(150+200)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
```

Cumulative

```r
tree$Do(function(node) {
  node$actual_hours <- node$hours - if (node$isLeaf) 0 else Aggregate(node, attribute = "actual_hours", aggFun = sum)
}, traversal = "post-order")
```

```
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -205 # -205=0-(35+170)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
```
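The two versions differ only in what is subtracted from a parent: the children's raw `hours`, or the children's already computed `actual_hours`. The sketch below reproduces that arithmetic for the `Team1-1` subtree in base R. The helpers `make_node` and `own_hours` and the nested-list layout are assumptions made for illustration, not `data.tree` functions, and the `NA` root is left out.

```r
# Hypothetical helper: a node is a list with a name, hours and child nodes
make_node <- function(name, hours, ...) {
  list(name = name, hours = hours, children = list(...))
}

team <- make_node("Team1-1", 0,
  make_node("Team1-1-1", 150,
    make_node("Team1-1-1a", 65),
    make_node("Team1-1-1b", 20),
    make_node("Team1-1-1c", 30)),
  make_node("Team1-1-2", 200,
    make_node("Team1-1-2a", 30)))

own_hours <- function(node, cumulative) {
  # Leaves keep their hours unchanged
  if (length(node$children) == 0) return(node$hours)
  child_values <- if (cumulative) {
    # Cumulative: subtract the children's computed values (recursive)
    vapply(node$children, own_hours, numeric(1), cumulative = TRUE)
  } else {
    # Non-cumulative: subtract the children's raw hours
    vapply(node$children, function(child) child$hours, numeric(1))
  }
  node$hours - sum(child_values)
}

own_hours(team, cumulative = FALSE)  # -350 = 0 - (150 + 200)
own_hours(team, cumulative = TRUE)   # -205 = 0 - (35 + 170)
```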

## R - data.tree aggregate along ancestors of a leaf?

For this, `Do` comes in handy:

```r
thetree$result <- thetree$p
traversal <- Traverse(thetree, filterFun = isNotRoot)
Do(traversal, function(node) node$result <- node$parent$result + node$p)
```

This then gets the desired result:

```
print(thetree, "p", "result")
      levelName   p result
1 1             0.1    0.1
2  ¦--1.1       0.2    0.3
3  ¦   ¦--1.1.1 0.3    0.6
4  ¦   °--1.1.2 0.4    0.7
5  °--1.2       0.5    0.6
6      ¦--1.2.1 0.6    1.2
7      °--1.2.2 0.7    1.3
```
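The approach works because every parent is visited before its children, so `node$parent$result` is already final when a child reads it. As a rough base R illustration of the same idea, the sketch below stores the tree as a data frame with a parent row index. That flat encoding and the name `nodes` are assumptions, not something `data.tree` provides.

```r
# Hypothetical flat encoding of the tree above; rows are listed parent-first
nodes <- data.frame(
  name   = c("1", "1.1", "1.1.1", "1.1.2", "1.2", "1.2.1", "1.2.2"),
  parent = c(NA, 1, 2, 2, 1, 5, 5),   # row index of each node's parent
  p      = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)
)

# Start with result = p, then add the parent's result for every non-root row
nodes$result <- nodes$p
for (i in seq_len(nrow(nodes))[-1]) {
  nodes$result[i] <- nodes$result[nodes$parent[i]] + nodes$p[i]
}
nodes
```

If the rows were not ordered parent-first, a child could read a parent value that has not been updated yet, so the ordering matters here.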

## How to aggregate data one after another in r?

We can use `rollmean` from `zoo`:

```r
library(zoo)
rollmean(h_1, 2)
#[1]  3.0  5.0  6.5  8.0 10.0
rollmean(h_1, 3)
#[1] 4.000000 5.666667 7.333333 9.000000
```
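For comparison, a rolling mean can also be sketched in base R with `embed`, which lays out each window as one row of a matrix. The input vector is not shown in the answer, so the `h_1` below is an assumption: it is a vector chosen to be consistent with the printed results. `roll_mean` is a hypothetical helper name.

```r
# Assumed input, picked so that it reproduces the output printed above
h_1 <- c(2, 4, 6, 7, 9, 11)

# Each row of embed(x, k) is one window of k consecutive values
roll_mean <- function(x, k) rowMeans(embed(x, k))

roll_mean(h_1, 2)
roll_mean(h_1, 3)
```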

## Hierarchical data in R- How do I sum over subsets, while maintaing the tree?

Another job for `aggregate`. Calling your data frame `dat`:

```r
aggregate(cbind(cost, amount) ~ state+district+branch+order, data=dat, FUN=sum)
##   state district   branch  order  cost amount
## 1    CA  central columbus gloves 19.38 633.82
## 2    CA  central  newtown gloves 20.90 222.79
## 3    CA  central columbus  shoes 71.73 361.99
## 4    CA  central  newtown  shoes 57.62 202.80
```

On the left side of the `~`, `cbind` indicates that we want each column aggregated separately. If `cost + amount` were specified instead, the two columns would be added together before aggregating, because both are numeric. On the right side of the `~` we have factors, so the `+` means that we aggregate by each combination of the factor levels.
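A small sketch of the difference between the two left-hand sides. The data frame here is made up for illustration and is unrelated to the data in the question.

```r
# Hypothetical toy data
dat <- data.frame(
  state  = c("CA", "CA", "NY"),
  cost   = c(1, 2, 4),
  amount = c(10, 20, 40)
)

# cbind(): one aggregated column per variable
separate <- aggregate(cbind(cost, amount) ~ state, data = dat, FUN = sum)

# cost + amount: the columns are added row by row first, then summed per group
combined <- aggregate(cost + amount ~ state, data = dat, FUN = sum)

separate
combined
```

`separate` has a `cost` and an `amount` column, while `combined` has a single value column holding the per-state sum of `cost + amount`.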

## R: aggregate based of multiple columns

The last two column names have two dots at the end and Species is misspelled:

```
> names(Trees)
[1] "Tree.Speices" "DBH"          "Basal.Area"   "Compartment"  "Stand"
[6] "Transect.."   "Plot.."
```

Try:

```r
aggregate(.~ Compartment + Stand + Transect.. + Plot.. + Tree.Speices,
    data = Trees, FUN = sum)
```

or remove the dots at the end of all names and correct the spelling:

```r
names(Trees) <- sub("\\.+$", "", names(Trees))
names(Trees) <- sub("Speices", "Species", names(Trees))
aggregate(.~ Compartment + Stand + Transect + Plot + Tree.Species,
    data = Trees, FUN = sum)
```
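To check what the two `sub` calls do, they can be applied to a plain character vector holding the names shown above. The variable `nm` is only for illustration.

```r
nm <- c("Tree.Speices", "DBH", "Basal.Area", "Compartment", "Stand",
        "Transect..", "Plot..")

# "\\.+$" matches one or more dots at the end of the string only,
# so the dots inside "Tree.Speices" and "Basal.Area" are kept
nm <- sub("\\.+$", "", nm)

# Fix the spelling
nm <- sub("Speices", "Species", nm)
nm
```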

## How to aggregate/ sum values by time in r

Using `dplyr` and `lubridate` packages we can extract the `hour` from each `Time` and `sum`.

```r
library(dplyr)
library(lubridate)

df %>%
  mutate(hour = hour(dmy_hms(Time))) %>%
  group_by(hour) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
```

For aggregation by date, we can do

```r
df %>%
  mutate(day = as.Date(dmy_hms(Time))) %>%
  group_by(day) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
```

Using base R, we could do

```r
df$Hour <- format(as.POSIXct(df$Time, format = "%d/%m/%Y %I:%M:%S %p"), "%H")
df$Day <- as.Date(as.POSIXct(df$Time, format = "%d/%m/%Y %I:%M:%S %p"))

#Aggregation by hour
aggregate(Precipitation~Hour, df, sum, na.rm = TRUE)

#Aggregation by date
aggregate(Precipitation~Day, df, sum, na.rm = TRUE)
```

EDIT

Based on updated data and information, we can do

```r
df <- readxl::read_xlsx("/path/to/file/df (1).xlsx")

hour_df <- df %>%
             group_by(hour = hour(Time)) %>%
             summarise(Precipitation = sum(Precipitation, na.rm = TRUE))

day_df <-  df %>%
              mutate(day = as.Date(Time)) %>%
              group_by(day) %>%
              summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
```

So `hour_df` holds the `sum` of `Precipitation` for each hour of the day, pooled across all dates, and `day_df` holds the `sum` of `Precipitation` for each day.
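The pooling across dates is easy to miss, so here is a small base R sketch with made-up readings that fall in the same clock hour on two different days. The data frame `obs` and its values are hypothetical, and the times are fixed to UTC to keep the example independent of the local time zone.

```r
# Hypothetical readings: hour 01 occurs on both 10 and 11 July
obs <- data.frame(
  Time = as.POSIXct(c("2014-07-10 01:00:00", "2014-07-10 01:30:00",
                      "2014-07-11 01:15:00", "2014-07-11 02:00:00"),
                    tz = "UTC"),
  Precipitation = c(0.1, 0.2, 0.4, 0.8)
)

obs$hour <- format(obs$Time, "%H")   # clock hour only, the date is dropped
obs$day  <- as.Date(obs$Time)

by_hour <- aggregate(Precipitation ~ hour, obs, sum)  # hour 01 mixes both days
by_day  <- aggregate(Precipitation ~ day, obs, sum)
by_hour
by_day
```

To keep the days apart in an hourly summary, one option is to group by both columns, for example `Precipitation ~ day + hour`.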

data

```r
Id <- c(1,2,3,4)
Time <- c("10/7/2014  12:30:00 am", "10/7/2014  01:00:05 am",
          "10/7/2014  01:30:10 am", "10/7/2014  02:00:15 am")
Precipitation <- c(0.06, 0.02,0,0.25)
df <- data.frame(Id, Time, Precipitation)
```

## Aggregate multiple columns at once

We can use the formula method of `aggregate`. The variables on the right-hand side of `~` are the grouping variables, while `.` stands for all other columns in `df1` (from the example, we assume that the `mean` is wanted for every column except the grouping ones). We then pass the dataset and the function (`mean`).

``aggregate(.~id1+id2, df1, mean)``

Or, after grouping with `group_by`, we can use `summarise_each` from `dplyr` (note that `summarise_each` and `funs` are deprecated in current `dplyr` releases):

```r
library(dplyr)

df1 %>%
    group_by(id1, id2) %>%
    summarise_each(funs(mean))
```

Or using `summarise` with `across` (written against the `dplyr` devel version `‘0.8.99.9000’`; `across` is available from `dplyr` 1.0.0 onward):

```r
df1 %>%
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))
```

Another option is `data.table`. We convert the 'data.frame' to a 'data.table' with `setDT(df1)`, group by 'id1' and 'id2', loop over the columns of the subset of data.table (`.SD`) and take the `mean` of each.

```r
library(data.table)

setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
```

### data

```r
df1 <- structure(list(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
  .Names = c("id1", "id2", "val1", "val2"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
```