Aggregating values on a data tree with R
You could simply use a recursive function:
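library(data.tree)

# Post-order recursion: a node's total is its own hours plus its children's totals
myApply <- function(node) {
  node$totalHours <-
    sum(c(node$hours, purrr::map_dbl(node$children, myApply)), na.rm = TRUE)
}
myApply(tree)
print(tree, "hours", "totalHours")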
Result:
           levelName hours totalHours
1 Ned                   NA          5
2  °--John               1          5
3      °--Kate           1          4
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1
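The snippet above assumes that tree already exists. As a self-contained sketch, the block below rebuilds it from the edge list given in the edit to this answer and swaps purrr::map_dbl() for base R's vapply(). The function name sumHours is only an illustrative choice, not something from the original answer.

```r
library(data.tree)

# Edge list taken from the edit to this answer: each row links a child
# (from) to its parent (to) and carries the child's own hours.
df <- data.frame(
  from  = c("John", "Kate", "Dan", "Ron", "Sienna"),
  to    = c("Ned", "John", "Kate", "Kate", "Kate"),
  hours = c(1, 1, 1, 1, 1)
)
tree <- FromDataFrameNetwork(df)

# Recursive post-order sum; vapply() stands in for purrr::map_dbl().
# sumHours is a hypothetical name used only for this sketch.
sumHours <- function(node) {
  childTotals <- vapply(node$children, sumHours, numeric(1))
  node$totalHours <- sum(node$hours, childTotals, na.rm = TRUE)
  node$totalHours
}

sumHours(tree)
print(tree, "hours", "totalHours")
```

Returning node$totalHours explicitly makes the value each parent receives visible, instead of relying on the invisible result of the assignment.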
Edit: Filling two elements:
library(data.tree)

# Create data frame (each row links a child in 'from' to its parent in 'to')
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1, 1, 1, 1, 1)
hours2 <- 5:1
df <- data.frame(from, to, hours, hours2)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours", "hours2")

# Return both subtree totals so the parent can pick them up by position
myApply <- function(node) {
  res.ch <- purrr::map(node$children, myApply)
  a <- node$totalHours <-
    sum(c(node$hours, purrr::map_dbl(res.ch, 1)), na.rm = TRUE)
  b <- node$totalHours2 <-
    sum(c(node$hours2, purrr::map_dbl(res.ch, 2)), na.rm = TRUE)
  list(a, b)
}
myApply(tree)
print(tree, "hours", "totalHours", "hours2", "totalHours2")
Result:
           levelName hours totalHours hours2 totalHours2
1 Ned                   NA          5     NA          15
2  °--John               1          5      5          15
3      °--Kate           1          4      4          10
4          ¦--Dan        1          1      3           3
5          ¦--Ron        1          1      2           2
6          °--Sienna     1          1      1           1
Recursive aggregation of data tree
Assume that the data.tree looks like this
               levelName hours
1 Team1                     NA
2  °--Team1-1                0
3      ¦--Team1-1-1        150
4      ¦   ¦--Team1-1-1a    65
5      ¦   ¦--Team1-1-1b    20
6      ¦   °--Team1-1-1c    30
7      °--Team1-1-2        200
8          °--Team1-1-2a    30
Here are two versions for you to try out as I'm not sure about which one you want.
Non-cumulative (each parent subtracts the original hours of its children)
tree$Do(function(node) {
  node$actual_hours <- node$hours -
    if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
}, traversal = "post-order")
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -350   # -350 = 0 - (150 + 200)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
Cumulative (each parent subtracts the already adjusted actual_hours of its children)
tree$Do(function(node) {
  node$actual_hours <- node$hours -
    if (node$isLeaf) 0 else Aggregate(node, attribute = "actual_hours", aggFun = sum)
}, traversal = "post-order")
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -205   # -205 = 0 - (35 + 170)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
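To try both variants side by side, the following sketch rebuilds the example tree by hand and stores the two results under separate attribute names. The names own_noncum and own_cum, and the choice to give the root an explicit NA for hours, are assumptions made for this illustration only.

```r
library(data.tree)

# Rebuild the example tree; the root has no hours of its own (assumed NA).
tree <- Node$new("Team1", hours = NA)
t11  <- tree$AddChild("Team1-1", hours = 0)
t111 <- t11$AddChild("Team1-1-1", hours = 150)
t111$AddChild("Team1-1-1a", hours = 65)
t111$AddChild("Team1-1-1b", hours = 20)
t111$AddChild("Team1-1-1c", hours = 30)
t112 <- t11$AddChild("Team1-1-2", hours = 200)
t112$AddChild("Team1-1-2a", hours = 30)

# Non-cumulative: subtract the children's original hours.
tree$Do(function(node) {
  node$own_noncum <- node$hours -
    if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
}, traversal = "post-order")

# Cumulative: subtract the children's already adjusted values. Post-order
# guarantees that every child is processed before its parent.
tree$Do(function(node) {
  node$own_cum <- node$hours -
    if (node$isLeaf) 0 else Aggregate(node, attribute = "own_cum", aggFun = sum)
}, traversal = "post-order")

print(tree, "hours", "own_noncum", "own_cum")
```

The two columns differ only at Team1-1, the one node whose children are themselves parents.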
R - data.tree aggregate along ancestors of a leaf?
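For this, Do comes in handy:
# Seed the root, then visit the other nodes in pre-order (the Traverse default),
# so every parent already has its result when its children are processed
thetree$result <- thetree$p
traversal <- Traverse(thetree, filterFun = isNotRoot)
Do(traversal, function(node) node$result <- node$parent$result + node$p)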
This then gets the desired result:
print(thetree, "p", "result")
      levelName   p result
1 1             0.1    0.1
2  ¦--1.1       0.2    0.3
3  ¦   ¦--1.1.1 0.3    0.6
4  ¦   °--1.1.2 0.4    0.7
5  °--1.2       0.5    0.6
6      ¦--1.2.1 0.6    1.2
7      °--1.2.2 0.7    1.3
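The answer does not show how thetree was built. A minimal reproduction, assuming the node names and p values printed above, could look like this (the helper variables n11 and n12 are hypothetical):

```r
library(data.tree)

# Build the tree shown in the printed result; p is each node's own value.
thetree <- Node$new("1", p = 0.1)
n11 <- thetree$AddChild("1.1", p = 0.2)
n11$AddChild("1.1.1", p = 0.3)
n11$AddChild("1.1.2", p = 0.4)
n12 <- thetree$AddChild("1.2", p = 0.5)
n12$AddChild("1.2.1", p = 0.6)
n12$AddChild("1.2.2", p = 0.7)

# Running sum from the root down to each node.
thetree$result <- thetree$p
traversal <- Traverse(thetree, filterFun = isNotRoot)
Do(traversal, function(node) node$result <- node$parent$result + node$p)

print(thetree, "p", "result")
```

The approach depends on the traversal order: with a post-order traversal a child would be visited before its parent had a result.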
How to aggregate data one after another in r?
We can use rollmean from zoo
library(zoo)
rollmean(h_1, 2)
#[1] 3.0 5.0 6.5 8.0 10.0
rollmean(h_1, 3)
#[1] 4.000000 5.666667 7.333333 9.000000
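For readers without zoo installed, the same windowed means can be sketched in base R. Note that h_1 is not shown in the answer; the vector below is an assumption, chosen because it reproduces both printed results, and roll_mean_base is a hypothetical helper name.

```r
# Assumed input: not given in the answer, picked to match the printed output.
h_1 <- c(2, 4, 6, 7, 9, 11)

# Mean of every run of k consecutive values, using base R only.
# embed(x, k) lays out each length-k window as one row of a matrix.
roll_mean_base <- function(x, k) rowMeans(embed(x, k))

roll_mean_base(h_1, 2)
#[1]  3.0  5.0  6.5  8.0 10.0
roll_mean_base(h_1, 3)
#[1] 4.000000 5.666667 7.333333 9.000000
```

Like rollmean with its defaults, this returns length(x) - k + 1 values and does not pad the ends.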
Hierarchical data in R- How do I sum over subsets, while maintaing the tree?
Another job for aggregate. Calling your data frame dat:
aggregate(cbind(cost, amount) ~ state+district+branch+order, data=dat, FUN=sum)
##   state district   branch  order  cost amount
## 1    CA  central columbus gloves 19.38 633.82
## 2    CA  central  newtown gloves 20.90 222.79
## 3    CA  central columbus  shoes 71.73 361.99
## 4    CA  central  newtown  shoes 57.62 202.80
On the left side of the ~, cbind indicates that we want each column aggregated separately. If cost + amount were specified instead, the two columns would be added together before aggregating, because both are numeric. On the right side of the ~ we have factors, so + means that we aggregate by each combination of factor levels.
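The difference between cbind and + on the left side is easy to see on a tiny example. The data frame below is made up for illustration and is not the asker's data.

```r
# Hypothetical toy data, not taken from the question.
dat <- data.frame(
  state  = c("CA", "CA", "NY"),
  cost   = c(1, 2, 3),
  amount = c(10, 20, 30)
)

# cbind(): one aggregated column per variable.
sep <- aggregate(cbind(cost, amount) ~ state, data = dat, FUN = sum)
sep
#   state cost amount
# 1    CA    3     30
# 2    NY    3     30

# +: the columns are added row by row first, then that single sum is aggregated.
comb <- aggregate(cost + amount ~ state, data = dat, FUN = sum)
comb[[2]]
# [1] 33 33
```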
R: aggregate based on multiple columns
The last two column names have two dots at the end and Species is incorrectly spelled:
> names(Trees)
[1] "Tree.Speices" "DBH" "Basal.Area" "Compartment" "Stand"
[6] "Transect.." "Plot.."
Try:
aggregate(. ~ Compartment + Stand + Transect.. + Plot.. + Tree.Speices,
          data = Trees, FUN = sum)
or remove the dots at the end of all names and correct the spelling:
names(Trees) <- sub("\\.+$", "", names(Trees))
names(Trees) <- sub("Speices", "Species", names(Trees))
aggregate(. ~ Compartment + Stand + Transect + Plot + Tree.Species,
          data = Trees, FUN = sum)
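Trailing dots like these usually come from the import step: read.csv() passes headers through make.names() by default, which replaces every character that is not valid in an R name with a dot. The raw headers below are a guess at what the source file might have contained, used only to illustrate the mechanism and the cleanup.

```r
# Hypothetical raw headers; the real file is not shown in the question.
raw <- c("Tree Speices", "DBH", "Basal Area", "Compartment", "Stand",
         "Transect #", "Plot #")

# What read.csv() would turn them into with check.names = TRUE (the default).
nms <- make.names(raw)
nms

# Strip trailing dots, then fix the spelling, as in the answer.
nms <- sub("\\.+$", "", nms)
nms <- sub("Speices", "Species", nms)
nms
```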
How to aggregate/ sum values by time in r
Using the dplyr and lubridate packages, we can extract the hour from each Time and sum Precipitation within each hour.
library(dplyr)
library(lubridate)
df %>%
  mutate(hour = hour(dmy_hms(Time))) %>%
  group_by(hour) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
For aggregation by date, we can do
df %>%
  mutate(day = as.Date(dmy_hms(Time))) %>%
  group_by(day) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
Using base R, we could do
df$Hour <- format(as.POSIXct(df$Time, format = "%d/%m/%Y %I:%M:%S %p"), "%H")
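# Parse the date straight from the string: as.Date() on a POSIXct defaults to UTC
# and can shift times shortly after midnight to the previous day
df$Day <- as.Date(df$Time, format = "%d/%m/%Y")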
#Aggregation by hour
aggregate(Precipitation~Hour, df, sum, na.rm = TRUE)
#Aggregation by date
aggregate(Precipitation~Day, df, sum, na.rm = TRUE)
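Put together with the sample data from this answer, the base R route can be run end to end as follows. This is a sketch that assumes the timestamps are day/month/year and pins the time zone to UTC so the result does not depend on the local settings; the names ts, by_hour and by_day are illustrative. Parsing am/pm with %p also assumes an English or C locale.

```r
# Sample data copied from the answer.
df <- data.frame(
  Id = 1:4,
  Time = c("10/7/2014 12:30:00 am", "10/7/2014 01:00:05 am",
           "10/7/2014 01:30:10 am", "10/7/2014 02:00:15 am"),
  Precipitation = c(0.06, 0.02, 0, 0.25)
)

# Assumption: timestamps are in UTC; using one explicit tz for both parsing
# and date extraction keeps Hour and Day consistent.
ts <- as.POSIXct(df$Time, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
df$Hour <- format(ts, "%H")
df$Day  <- as.Date(ts, tz = "UTC")

by_hour <- aggregate(Precipitation ~ Hour, df, sum)
by_day  <- aggregate(Precipitation ~ Day, df, sum)
by_hour
by_day
```

With this data, by_hour has one row each for hours 00, 01 and 02, and by_day has a single row for 2014-07-10.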
EDIT
Based on updated data and information, we can do
df <- readxl::read_xlsx("/path/to/file/df (1).xlsx")
hour_df <- df %>%
  group_by(hour = hour(Time)) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
day_df <- df %>%
  mutate(day = as.Date(Time)) %>%
  group_by(day) %>%
  summarise(Precipitation = sum(Precipitation, na.rm = TRUE))
So hour_df holds the sum of values for each hour of the day, without taking the date into consideration, and day_df holds the sum of Precipitation for each day.
data
Id <- c(1,2,3,4)
Time <- c("10/7/2014 12:30:00 am", "10/7/2014 01:00:05 am",
"10/7/2014 01:30:10 am", "10/7/2014 02:00:15 am")
Precipitation <- c(0.06, 0.02,0,0.25)
df <- data.frame(Id, Time, Precipitation)
Aggregate multiple columns at once
We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . stands for all other columns of df1 (from the example, we assume the mean is needed for every column except the grouping ones). We then pass the dataset and the function (mean).
aggregate(.~id1+id2, df1, mean)
Or we can use summarise_each from dplyr after grouping (group_by). Note that summarise_each and funs are deprecated in current dplyr releases.
library(dplyr)
df1 %>%
  group_by(id1, id2) %>%
  summarise_each(funs(mean))
Or using summarise with across (dplyr 1.0.0 or later; the answer was written against the development version '0.8.99.9000')
df1 %>%
  group_by(id1, id2) %>%
  summarise(across(starts_with('val'), mean))
Or another option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'id1' and 'id2', loop through the Subset of Data.table (.SD) and get the mean.
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
data
df1 <- structure(list(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
  .Names = c("id1", "id2", "val1", "val2"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
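The answer does not print any output. As a quick check, the base R variant applied to the sample data gives the group means below. The data frame is rebuilt here with data.frame() only so that the block runs on its own, and res is an illustrative name.

```r
# Same values as the sample data above, rebuilt so the block is self-contained.
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Mean of every non-grouping column for each id1/id2 combination.
res <- aggregate(. ~ id1 + id2, df1, mean)
res
#   id1 id2 val1 val2
# 1   a   x  1.5  6.5
# 2   b   x  2.0  8.0
# 3   a   y  3.5  7.0
# 4   b   y  3.0  6.0
```

The dplyr and data.table variants should return the same means, though possibly in a different row order.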