Cumulative count of unique values over time
Give the countries an ID number based on first appearance, and then the cumulative count is the same as the cumulative max of that ID:
mydf = mydf[order(mydf$Year, mydf$Country), ]
mydf$country_id = as.integer(factor(mydf$Country, levels = unique(mydf$Country)))
mydf$cum_n_country = cummax(mydf$country_id)
If years are repeated, you'll need to aggregate/summarize the max cum_n_country by year.
library(dplyr)
library(ggplot2)
mydf %>%
  group_by(Year) %>%
  summarize(cum_n_country = max(cum_n_country)) %>%
  ggplot(aes(x = Year, y = cum_n_country)) +
  geom_line()
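To make the ID trick concrete, here is a minimal base R sketch on made-up data. The toy values and the name per_year are assumptions for illustration only, and tapply() stands in for the dplyr summarize step.

```r
# Toy data (hypothetical): one row per observation, years may repeat
mydf <- data.frame(
  Year    = c(2000, 2000, 2001, 2001, 2002),
  Country = c("FR", "DE", "FR", "IT", "DE"),
  stringsAsFactors = FALSE
)

# Sort by year so IDs are handed out in order of first appearance
mydf <- mydf[order(mydf$Year, mydf$Country), ]

# levels = unique(...) numbers countries by first appearance: DE = 1, FR = 2, IT = 3
mydf$country_id <- as.integer(factor(mydf$Country, levels = unique(mydf$Country)))

# The running maximum of the ID equals the number of distinct countries so far
mydf$cum_n_country <- cummax(mydf$country_id)

# One value per year: the largest running count reached within that year
per_year <- tapply(mydf$cum_n_country, mydf$Year, max)
```

Here cum_n_country comes out as 1, 2, 2, 3, 3 and per_year holds 2, 3, 3 for 2000, 2001 and 2002.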
R: Calculating cumulative number of unique entries
Here's another solution with dplyr:
library(dplyr)
test %>%
  mutate(cum_unique_entries = cumsum(!duplicated(entries))) %>%
  group_by(exp) %>%
  slice(n()) %>%
  select(-entries)
or
test %>%
  mutate(cum_unique_entries = cumsum(!duplicated(entries))) %>%
  group_by(exp) %>%
  summarise(cum_unique_entries = last(cum_unique_entries))
Result:
# A tibble: 4 x 2
  exp    cum_unique_entries
  <fctr>              <int>
1 exp1                    4
2 exp2                    6
3 exp3                    7
4 exp4                    9
Note:
First compute the cumulative sum of all non-duplicates with cumsum(!duplicated(entries)), then group_by exp and take the last cumsum of each group. That number is the cumulative count of unique entries up to and including that group.
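The same two steps can be tried out in base R on made-up data. This is only a sketch: the toy rows and the name result are assumptions, and aggregate() is used here in place of the dplyr grouping.

```r
# Toy data (hypothetical), already ordered by experiment
test <- data.frame(
  exp     = c("exp1", "exp1", "exp1", "exp2", "exp2", "exp3"),
  entries = c("a", "b", "a", "b", "c", "a"),
  stringsAsFactors = FALSE
)

# Step 1: running count of distinct entries over the whole column
test$cum_unique_entries <- cumsum(!duplicated(test$entries))

# Step 2: keep the last running count within each experiment
result <- aggregate(cum_unique_entries ~ exp, data = test,
                    FUN = function(x) x[length(x)])
```

On this input the running count is 1, 2, 2, 2, 3, 3, so exp1 ends at 2 and both exp2 and exp3 end at 3.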
Cumulative count of unique values per group
Another possibility using ave:
df$obs <- with(df, ave(elig_end_date, names,
FUN = function(x) cumsum(!duplicated(x))))
# names date_of_claim elig_end_date obs
# 1 tom 2010-01-01 2010-07-01 1
# 2 tom 2010-05-04 2010-07-01 1
# 3 tom 2010-06-01 2014-01-01 2
# 4 tom 2010-10-10 2014-01-01 2
# 5 mary 2010-03-01 2014-06-14 1
# 6 mary 2010-05-01 2014-06-14 1
# 7 mary 2010-08-01 2014-06-14 1
# 8 mary 2010-11-01 2014-06-14 1
# 9 mary 2011-01-01 2014-06-14 1
# 10 john 2010-03-27 2011-03-01 1
# 11 john 2010-07-01 2011-03-01 1
# 12 john 2010-11-01 2011-03-01 1
# 13 john 2011-02-01 2011-03-01 1
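A self-contained sketch of the ave approach on made-up rows follows. One thing to watch, as far as I can tell: ave() writes the results back into its first argument, so passing a character column would give character counts. The sketch therefore passes integer codes for the dates; date_code is a helper name introduced here, not part of the original answer.

```r
# Toy data (hypothetical), mirroring the columns used above
df <- data.frame(
  names         = c("tom", "tom", "tom", "mary", "mary"),
  elig_end_date = c("2010-07-01", "2010-07-01", "2014-01-01",
                    "2014-06-14", "2014-06-14"),
  stringsAsFactors = FALSE
)

# Integer codes for the dates, so the counter comes back as integer
date_code <- as.integer(factor(df$elig_end_date))

# Within each name, count how many distinct end dates have been seen so far
df$obs <- ave(date_code, df$names,
              FUN = function(x) cumsum(!duplicated(x)))
```

For this input obs is 1, 1, 2 for tom and 1, 1 for mary.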
Counting the cumulative sum of unique values in a vector
We can use count() from dplyr, which tallies the occurrences of each unique value:
library(tidyverse)
count(tibble(v1 = vector), v1) %>%
pull(n)
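For comparison, here is a rough base R counterpart on a made-up vector. It is a sketch under the assumption that either per-value frequencies or a running count of distinct values is wanted; v, freq and running_distinct are names chosen for illustration.

```r
v <- c("b", "a", "b", "c", "a", "b")   # hypothetical input vector

# Occurrences of each distinct value, in sorted order of the values
freq <- as.vector(table(v))

# Running count of distinct values seen so far, one element per input element
running_distinct <- cumsum(!duplicated(v))
```

Here freq is 2, 3, 1 (for a, b, c) and running_distinct is 1, 2, 2, 3, 3, 3.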
Cumulative sum of unique values based on multiple criteria
This could help, without the need for a join.
library(dplyr)
library(data.table)  # rowid() comes from data.table, not dplyr
df %>%
  arrange(Country, Site, species, Year) %>%
  filter(Year > 1980) %>%
  group_by(Site, species) %>%
  mutate(nYear = length(unique(Year))) %>%
  mutate(spsum = rowid(species))
# A tibble: 30 x 6
# Groups:   Site, species [5]
   Country Site  species  Year nYear spsum
   <chr>   <chr>   <int> <int> <int> <int>
 1 A       F           1  1981     6     1
 2 A       F           1  1986     6     2
 3 A       F           1  1991     6     3
 4 A       F           1  1996     6     4
 5 A       F           1  2001     6     5
 6 A       F           1  2006     6     6
 7 B       G           2  1982     6     1
 8 B       G           2  1987     6     2
 9 B       G           2  1992     6     3
10 B       G           2  1997     6     4
# ... with 20 more rows
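The two per-group columns can also be sketched in base R with ave(), which avoids needing rowid() from data.table. The toy rows and the helper name grp are assumptions for illustration.

```r
# Toy data (hypothetical), already sorted and filtered
df <- data.frame(
  Site    = c("F", "F", "F", "G", "G"),
  species = c(1L, 1L, 1L, 2L, 2L),
  Year    = c(1981L, 1986L, 1991L, 1982L, 1987L)
)

# One grouping factor for each Site/species combination
grp <- interaction(df$Site, df$species, drop = TRUE)

# Number of distinct years within each group, repeated on every row of the group
df$nYear <- ave(df$Year, grp, FUN = function(x) length(unique(x)))

# Running row counter within each group
df$spsum <- ave(seq_along(grp), grp, FUN = seq_along)
```

With this input nYear is 3, 3, 3, 2, 2 and spsum is 1, 2, 3, 1, 2.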
Cumulative sum of unique events for each year
One dplyr option could be:
df %>%
  group_by(id) %>%
  mutate(cum_sum = cumsum(!duplicated(event))) %>%
  group_by(id, year) %>%
  summarise(cum_sum = max(cum_sum))
  id     year cum_sum
  <chr> <dbl>   <int>
1 1      1900       3
2 1      1901       3
3 1      1902       5
4 2      1900       1
5 2      1901       3
6 3      1900       1
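A base R sketch of the same idea, on made-up data, runs the count within each id first and then takes the maximum per id and year. The toy rows and the names event_code and result are assumptions; the integer codes are only there so that ave() returns integers.

```r
# Toy data (hypothetical), ordered by id and year
df <- data.frame(
  id    = c("1", "1", "1", "1", "2", "2"),
  year  = c(1900, 1900, 1901, 1901, 1900, 1901),
  event = c("a", "b", "a", "c", "a", "a"),
  stringsAsFactors = FALSE
)

# Running count of distinct events within each id
event_code <- as.integer(factor(df$event))
df$cum_sum <- ave(event_code, df$id,
                  FUN = function(x) cumsum(!duplicated(x)))

# Highest running count reached by each id in each year
result <- aggregate(cum_sum ~ id + year, data = df, FUN = max)
result <- result[order(result$id, result$year), ]
```

Here id 1 reaches 2 distinct events in 1900 and 3 in 1901, while id 2 stays at 1 in both years.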
Cumulative count of each value
The dplyr way:
library(dplyr)
foo <- data.frame(id=c(1, 2, 3, 2, 2, 1, 2, 3))
foo <- foo %>% group_by(id) %>% mutate(count=row_number())
foo
# A tibble: 8 x 2
# Groups:   id [3]
     id count
  <dbl> <int>
1     1     1
2     2     1
3     3     1
4     2     2
5     2     3
6     1     2
7     2     4
8     3     2
That ends up grouped by id. If you want it not grouped, add %>% ungroup().
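If dplyr is not available, the same running count per value can be sketched in base R with ave(). The variable name count is just chosen to match the column above.

```r
id <- c(1, 2, 3, 2, 2, 1, 2, 3)

# Within each distinct value of id, number the occurrences 1, 2, 3, ...
count <- ave(seq_along(id), id, FUN = seq_along)
```

This gives 1, 1, 1, 2, 3, 2, 4, 2, the same counts as the tibble shown above, and the result is a plain vector with no grouping attached.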
dplyr Running count of unique entries
This seems to give the result you are after:
df %>%
  group_by(subjectID) %>%
  mutate(
    n_tot = row_number(),
    n_case = cumsum(!duplicated(caseID))
  )
We use duplicated to see if the case ID is new or not, and then use cumsum() to get a running count of new cases.
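To see what each step does, here are the intermediate vectors for a single subject in base R. The case IDs are made up, and is_repeat and is_new are helper names introduced only for this sketch.

```r
caseID <- c("x", "x", "y", "x", "z")   # hypothetical case IDs for one subject

# TRUE where the case ID has already appeared earlier in the vector
is_repeat <- duplicated(caseID)

# TRUE where the case ID appears for the first time
is_new <- !is_repeat

# Summing the TRUEs as we go gives the running count of distinct cases
n_case <- cumsum(is_new)

# Plain running row count, the analogue of row_number()
n_tot <- seq_along(caseID)
```

For this subject is_new is TRUE, FALSE, TRUE, FALSE, TRUE, so n_case is 1, 1, 2, 2, 3 while n_tot runs from 1 to 5.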