sumifs and countifs equivalent in R
Group by id, sum the values for the year 2020, and count the number of rows for that year as well.
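library(dplyr)
# Example data, reconstructed from the printed output below
df <- data.frame(id    = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
                 value = c(10L, 15L, 12L, 7L, 10L, 18L, 27L),
                 year  = c(2019L, 2019L, 2020L, 2020L, 2019L, 2020L, 2019L))
df %>%
  group_by(id) %>%
  mutate(total_value = sum(value),                 # SUMIF over every year
         total_frequency = n(),                    # COUNTIF over every year
         value_2020 = sum(value[year %in% 2020]),  # SUMIFS restricted to 2020
         frequency_2020 = sum(year %in% 2020)) %>% # COUNTIFS restricted to 2020
  ungroup()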
#      id value  year total_value total_frequency value_2020 frequency_2020
#   <int> <int> <int>       <int>           <int>      <int>          <int>
# 1     1    10  2019          44               4         19              2
# 2     1    15  2019          44               4         19              2
# 3     1    12  2020          44               4         19              2
# 4     1     7  2020          44               4         19              2
# 5     2    10  2019          28               2         18              1
# 6     2    18  2020          28               2         18              1
# 7     3    27  2019          27               1          0              0
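If you'd rather not depend on dplyr, here is a minimal base R sketch of the same two 2020 columns using ave(), which applies a function within each id group and spreads the result back over that group's rows. The df below is reconstructed from the printed output above, which is an assumption since the original data isn't shown.

```r
# Example data reconstructed from the printed output above (an assumption)
df <- data.frame(id    = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
                 value = c(10L, 15L, 12L, 7L, 10L, 18L, 27L),
                 year  = c(2019L, 2019L, 2020L, 2020L, 2019L, 2020L, 2019L))

# SUMIFS(value, id, <this id>, year, 2020): zero out other years, then sum within id
df$value_2020 <- ave(df$value * (df$year == 2020), df$id, FUN = sum)
# value_2020: 19 19 19 19 18 18 0

# COUNTIFS(id, <this id>, year, 2020): add up the TRUEs (as 1s) within id
df$frequency_2020 <- ave(as.integer(df$year == 2020), df$id, FUN = sum)
# frequency_2020: 2 2 2 2 1 1 0

df
```

Multiplying value by the logical test year == 2020 turns the non-matching years into 0, so a plain grouped sum gives the conditional sum.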
SUMIF rows equivalent in R
A base R solution:
df <- data.frame(
stringsAsFactors = FALSE,
Region = c("North", "North", "South", "South"),
Area = c("AreaA", "AreaB", "AreaC", "AreaD"),
Numerator = c(1L, 1L, 1L, 1L),
Denominator = c(10L, 10L, 10L, 10L)
)
aggregate(cbind(Numerator, Denominator) ~ Region, data = df, FUN = sum)
#> Region Numerator Denominator
#> 1 North 2 20
#> 2 South 2 20
Another possible solution, based on dplyr:
library(dplyr)
df %>%
group_by(Region) %>%
summarise(across(-Area, sum))
#> # A tibble: 2 × 3
#> Region Numerator Denominator
#> <chr> <int> <int>
#> 1 North 2 20
#> 2 South 2 20
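Base R also has rowsum(), which sums the rows of a matrix or data frame within each group, so it behaves like a SUMIF applied to several columns at once. A sketch using the same df (recreated here so the block runs on its own); the last line shows that a single-condition SUMIF is just a filtered sum:

```r
df <- data.frame(
  Region = c("North", "North", "South", "South"),
  Area = c("AreaA", "AreaB", "AreaC", "AreaD"),
  Numerator = c(1L, 1L, 1L, 1L),
  Denominator = c(10L, 10L, 10L, 10L)
)

# Sum Numerator and Denominator within each Region; groups become the row names
totals <- rowsum(df[c("Numerator", "Denominator")], df$Region)
totals

# Equivalent of =SUMIF(Region, "North", Numerator)
north_numerator <- sum(df$Numerator[df$Region == "North"])
north_numerator  # 2
```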
R equivalent of Excel's Sumif(s) function across like columns
Using melt() to reshape your data long is the way to go. If I understand what you're after correctly, it's simply:
library(data.table)
# The two 6/2/2017 columns deliberately share a name; they get combined below
x = fread('PERSON_ID PRODUCT_ID 6/23/2017 6/16/2017 6/9/2017 6/2/2017 6/2/2017
0003603 3024 10.000 5.000 4.000 3.000 2.000
0003603 3024 1.000 2.000 3.000 8.000 1.000
0007654 2111 8.000 3.000 2.000 1.000 0.000
0008885 3025 0.000 0.000 1.000 3.000 9.000
0950645 3024 6.000 5.000 4.000 3.000 2.000',
colClasses = c('character', 'character', rep('numeric', 5L)))
xmlt =
melt(x, id.vars = c('PERSON_ID', 'PRODUCT_ID'),
variable.name = 'week', value.name = 'sales')
xmlt[ , week := as.IDate(week, format = '%m/%d/%Y')]
xmlt[ , .(total_sales = sum(sales)),
keyby = .(PERSON_ID, PRODUCT_ID, week)]
#     PERSON_ID PRODUCT_ID       week total_sales
#  1:   0003603       3024 2017-06-02          14
#  2:   0003603       3024 2017-06-09           7
#  3:   0003603       3024 2017-06-16           7
#  4:   0003603       3024 2017-06-23          11
#  5:   0007654       2111 2017-06-02           1
#  6:   0007654       2111 2017-06-09           2
#  7:   0007654       2111 2017-06-16           3
#  8:   0007654       2111 2017-06-23           8
#  9:   0008885       3025 2017-06-02          12
# 10:   0008885       3025 2017-06-09           1
# 11:   0008885       3025 2017-06-16           0
# 12:   0008885       3025 2017-06-23           0
# 13:   0950645       3024 2017-06-02           5
# 14:   0950645       3024 2017-06-09           4
# 15:   0950645       3024 2017-06-16           5
# 16:   0950645       3024 2017-06-23           6
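If you only need like-named columns added together within each row, and not also across rows that share a PERSON_ID and PRODUCT_ID, a base R sketch can skip the reshape: transpose, group the rows by column name with rowsum(), and transpose back. The matrix below takes two rows of the sample data; storing them as a matrix is an assumption of this sketch, chosen because a matrix keeps the duplicated 6/2/2017 column names exactly as they are.

```r
# Two rows of the sample data (6/9/2017 and both 6/2/2017 columns) as a numeric matrix
sales <- matrix(c(4, 3, 2,
                  2, 1, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("0003603", "0007654"),
                                c("6/9/2017", "6/2/2017", "6/2/2017")))

# rowsum() groups *rows* by a key, so transpose, group by column name, transpose back
summed <- t(rowsum(t(sales), colnames(sales)))
summed
```

Unlike the melt() version, this does not merge the two rows for person 0003603, so it is only a substitute when each row should be summed on its own.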
How to perform the equivalent of Excel sumifs in dplyr where there are multiple conditions?
You're doing pretty much the same thing; you just need to add another & condition where you subset.
Also, you don't need to call Code1[1:n()]: referring to Code1 already gives you all of the values in the Code1 column.
I believe you are looking for:
library(dplyr)
myData %>%
  mutate(SumIfs_2 = sapply(1:n(), function(x)
    sum(Code2[(Code1 < Code1[x]) & (Group < Group[x])])))
#   Name Group Code1 Code2 SumIfs_2
# 1    B     1     0     1        0
# 2    R     1     1     0        0
# 3    R     1     1     2        0
# 4    R     2     3     0        3
# 5    R     2     3     1        3
# 6    B     3     4     2        4
# 7    A     3    -1     1        0
# 8    A     4     0     0        1
# 9    A     1     0     0        0
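For comparison, a base R sketch of the same two-condition SUMIFS without dplyr. myData is reconstructed from the printed output above (an assumption, since the question's data isn't shown), and vapply() is used so the result is guaranteed to be exactly one number per row:

```r
# Data reconstructed from the printed output above (an assumption)
myData <- data.frame(Name  = c("B", "R", "R", "R", "R", "B", "A", "A", "A"),
                     Group = c(1, 1, 1, 2, 2, 3, 3, 4, 1),
                     Code1 = c(0, 1, 1, 3, 3, 4, -1, 0, 0),
                     Code2 = c(1, 0, 2, 0, 1, 2, 1, 0, 0))

# For each row i, sum Code2 over rows whose Code1 AND Group are both smaller
# than row i's: SUMIFS(Code2, Code1, "<"&Code1_i, Group, "<"&Group_i)
myData$SumIfs_2 <- with(myData, vapply(seq_along(Code1), function(i)
  sum(Code2[Code1 < Code1[i] & Group < Group[i]]), numeric(1)))

myData$SumIfs_2  # 0 0 0 3 3 4 0 1 0
```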
How to perform the equivalent of Excel rolling sumifs in dplyr?
You could use map() or imap() from purrr:
library(dplyr)
library(purrr)
# (1)
myData %>%
mutate(SumIfs = map_dbl(1:n(), ~ sum(Code2[1:.x][Code1[1:.x] < Code1[.x]])))
# (2)
myData %>%
mutate(SumIfs = imap_dbl(Code1, ~ sum(Code2[1:.y][Code1[1:.y] < .x])))
#   Name Group Code1 Code2 SumIfs
# 1    B     0     0     1      0
# 2    R     1     1     0      1
# 3    R     1     1     2      1
# 4    R     2     3     0      3
# 5    R     2     3     1      3
# 6    B     0     4     2      4
# 7    A     0    -1     1      0
# 8    A     0     0     0      1
# 9    A     0     0     0      1
If you don't want to rely on purrr, the map() solution can be adapted directly to a base sapply() version:
# \(x) is the anonymous-function shorthand available from R 4.1.0
myData %>%
  mutate(SumIfs = sapply(1:n(), \(x) sum(Code2[1:x][Code1[1:x] < Code1[x]])))
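A vectorized alternative avoids calling a function once per row: build a logical matrix whose row i marks the rows at or before i that satisfy the condition, then sum Code2 over them with a matrix product. The Code1 and Code2 values are taken from the printed output above; the matrix is n by n, so this sketch only makes sense for a modest number of rows.

```r
# Code1/Code2 values taken from the printed output above
Code1 <- c(0, 1, 1, 3, 3, 4, -1, 0, 0)
Code2 <- c(1, 0, 2, 0, 1, 2, 1, 0, 0)
n <- length(Code1)

# cond[i, j] is TRUE when row j is at or before row i (lower triangle)
# and Code1[j] < Code1[i]; the diagonal never qualifies because of the strict "<"
cond <- outer(Code1, Code1, ">") & lower.tri(diag(n), diag = TRUE)

# The matrix product adds up Code2[j] for every TRUE in row i
SumIfs <- drop(cond %*% Code2)
SumIfs  # 0 1 1 3 3 4 0 1 1
```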
Concise R code equivalent to the sumif function in Excel - particularly applied over data frame rows
#DATA
set.seed(42)
df = data.frame(matrix(sample(0:3, 250, replace = TRUE), ncol = 50))
Here's how you'd do it with rowSums(); multiplying by the logical mask (df > 1) zeroes out every cell that doesn't meet the criterion:
rowSums(df[,2:11] * (df[,2:11] > 1), na.rm = TRUE)
#[1] 18 13 11 19 11
Compare with your solution:
apply(df[,2:11], 1, function(x) sum(2*(x==2), 3*(x==3), na.rm=TRUE))
#[1] 18 13 11 19 11
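To see what the mask is doing, here is a small deterministic sketch (the values are made up for illustration in place of the random df). Summing the same logical mask on its own gives a row-wise COUNTIF, and na.rm = TRUE lets missing cells drop out of both results:

```r
# Hypothetical values, including one NA
df <- data.frame(a = c(0, 3, 2), b = c(2, 1, NA), c = c(3, 3, 1))

mask <- df > 1                                # logical matrix: cells meeting the criterion
sumif <- rowSums(df * mask, na.rm = TRUE)     # SUMIF(row, ">1"):   5 6 2
countif <- rowSums(mask, na.rm = TRUE)        # COUNTIF(row, ">1"): 2 2 1
cbind(sumif, countif)
```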