Excel Sumifs Equivalent in R

sumifs and countifs equivalent in R

Group by id, then sum value over the rows where year is 2020 and count those rows as well.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(value_2020 = sum(value[year %in% 2020]),
         frequency_2020 = sum(year %in% 2020)) %>%
  ungroup()

#     id value  year total_value total_frequency value_2020 frequency_2020
#  <int> <int> <int>       <int>           <int>      <int>          <int>
#1     1    10  2019          44               4         19              2
#2     1    15  2019          44               4         19              2
#3     1    12  2020          44               4         19              2
#4     1     7  2020          44               4         19              2
#5     2    10  2019          28               2         18              1
#6     2    18  2020          28               2         18              1
#7     3    27  2019          27               1          0              0
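
The input data frame is not shown above, so the sketch below rebuilds it from the id, value and year columns of the printed output (an assumption). It is a base R alternative using tapply() rather than the dplyr approach, and it returns one row per id, which is closer to what a SUMIFS/COUNTIFS summary table looks like in Excel. The name summary_2020 is only illustrative.

```r
# Assumed input, rebuilt from the id/value/year columns of the output above
df <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  value = c(10L, 15L, 12L, 7L, 10L, 18L, 27L),
  year  = c(2019L, 2019L, 2020L, 2020L, 2019L, 2020L, 2019L)
)

# Logical mask for the condition; TRUE/FALSE behave as 1/0 in arithmetic
is_2020 <- df$year == 2020

# SUMIFS: rows outside 2020 contribute 0 to the per-id sum
value_2020 <- tapply(df$value * is_2020, df$id, sum)

# COUNTIFS: summing the mask counts the matching rows per id
frequency_2020 <- tapply(is_2020, df$id, sum)

summary_2020 <- data.frame(
  id             = as.integer(names(value_2020)),
  value_2020     = as.vector(value_2020),
  frequency_2020 = as.vector(frequency_2020)
)
summary_2020
```

An id with no 2020 rows (id 3 here) still appears, with a sum and a count of 0.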

SUMIF rows equivalent in R

A base R solution:

df <- data.frame(
  stringsAsFactors = FALSE,
  Region = c("North", "North", "South", "South"),
  Area = c("AreaA", "AreaB", "AreaC", "AreaD"),
  Numerator = c(1L, 1L, 1L, 1L),
  Denominator = c(10L, 10L, 10L, 10L)
)

aggregate(cbind(Numerator, Denominator) ~ Region, data = df, FUN = sum)

#> Region Numerator Denominator
#> 1 North 2 20
#> 2 South 2 20

Another possible solution, based on dplyr:

library(dplyr)

df %>%
  group_by(Region) %>%
  summarise(across(-Area, sum))

#> # A tibble: 2 × 3
#> Region Numerator Denominator
#> <chr> <int> <int>
#> 1 North 2 20
#> 2 South 2 20

R equivalent of Excel's Sumif(s) function across like columns

Melting your data (reshaping it to long format) is the way to go. If I understand correctly what you're after, it's simply:

library(data.table)

x = fread('PERSON_ID    PRODUCT_ID  6/23/2017   6/16/2017   6/9/2017    6/2/2017    6/2/2017
0003603 3024 10.000 5.000 4.000 3.000 2.000
0003603 3024 1.000 2.000 3.000 8.000 1.000
0007654 2111 8.000 3.000 2.000 1.000 0.000
0008885 3025 0.000 0.000 1.000 3.000 9.000
0950645 3024 6.000 5.000 4.000 3.000 2.000',
colClasses = c('character', 'character', rep('numeric', 5L)))

xmlt =
  melt(x, id.vars = c('PERSON_ID', 'PRODUCT_ID'),
       variable.name = 'week', value.name = 'sales')

xmlt[ , week := as.IDate(week, format = '%m/%d/%Y')]

xmlt[ , .(total_sales = sum(sales)),
      keyby = .(PERSON_ID, PRODUCT_ID, week)]
#     PERSON_ID PRODUCT_ID       week total_sales
#  1:   0003603       3024 2017-06-02          14
#  2:   0003603       3024 2017-06-09           7
#  3:   0003603       3024 2017-06-16           7
#  4:   0003603       3024 2017-06-23          11
#  5:   0007654       2111 2017-06-02           1
#  6:   0007654       2111 2017-06-09           2
#  7:   0007654       2111 2017-06-16           3
#  8:   0007654       2111 2017-06-23           8
#  9:   0008885       3025 2017-06-02          12
# 10:   0008885       3025 2017-06-09           1
# 11:   0008885       3025 2017-06-16           0
# 12:   0008885       3025 2017-06-23           0
# 13:   0950645       3024 2017-06-02           5
# 14:   0950645       3024 2017-06-09           4
# 15:   0950645       3024 2017-06-16           5
# 16:   0950645       3024 2017-06-23           6

How to perform the equivalent of Excel sumifs in dplyr where there are multiple conditions?

You're doing pretty much the same thing; you just need to add another condition with & where you subset.

Also, you don't need to write Code1[1:n()]. Referring to Code1 already gives you every value in the Code1 column.

I believe you are looking for:

myData %>%
  mutate(SumIfs_2 = sapply(1:n(), function(x) sum(Code2[(Code1 < Code1[x]) & (Group < Group[x])])))

#   Name Group Code1 Code2 SumIfs_2
# 1    B     1     0     1        0
# 2    R     1     1     0        0
# 3    R     1     1     2        0
# 4    R     2     3     0        3
# 5    R     2     3     1        3
# 6    B     3     4     2        4
# 7    A     3    -1     1        0
# 8    A     4     0     0        1
# 9    A     1     0     0        0
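
To make the role of the extra & condition easier to see, here is a minimal base R sketch of the same calculation. myData is rebuilt from the first four columns of the output above (an assumption about the original input), and sumifs_row() is a hypothetical helper name, not part of any package.

```r
# Assumed input, rebuilt from the first four columns of the output above
myData <- data.frame(
  Name  = c("B", "R", "R", "R", "R", "B", "A", "A", "A"),
  Group = c(1, 1, 1, 2, 2, 3, 3, 4, 1),
  Code1 = c(0, 1, 1, 3, 3, 4, -1, 0, 0),
  Code2 = c(1, 0, 2, 0, 1, 2, 1, 0, 0)
)

# Hypothetical helper: for row i, sum Code2 over the rows that meet
# BOTH conditions (each condition is one criterion of the SUMIFS)
sumifs_row <- function(i, d) {
  keep <- d$Code1 < d$Code1[i] & d$Group < d$Group[i]
  sum(d$Code2[keep])
}

# vapply() guarantees one numeric result per row
myData$SumIfs_2 <- vapply(seq_len(nrow(myData)), sumifs_row, numeric(1), d = myData)
myData$SumIfs_2
```

Adding a third criterion would just mean adding another & term to keep.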

How to perform the equivalent of Excel rolling sumifs in dplyr?

You could use map_dbl() or imap_dbl() from purrr:

library(dplyr)
library(purrr)

# (1)
myData %>%
  mutate(SumIfs = map_dbl(1:n(), ~ sum(Code2[1:.x][Code1[1:.x] < Code1[.x]])))

# (2)
myData %>%
  mutate(SumIfs = imap_dbl(Code1, ~ sum(Code2[1:.y][Code1[1:.y] < .x])))
#   Name Group Code1 Code2 SumIfs
# 1    B     0     0     1      0
# 2    R     1     1     0      1
# 3    R     1     1     2      1
# 4    R     2     3     0      3
# 5    R     2     3     1      3
# 6    B     0     4     2      4
# 7    A     0    -1     1      0
# 8    A     0     0     0      1
# 9    A     0     0     0      1

If you don't want to rely on purrr, the map_dbl() version translates directly to base R's sapply(). Note that the \(x) lambda shorthand requires R 4.1 or later:

myData %>%
  mutate(SumIfs = sapply(1:n(), \(x) sum(Code2[1:x][Code1[1:x] < Code1[x]])))
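
For small data, the same rolling condition can also be written without any per-row loop by building an n-by-n mask with outer(). This is a different technique from the answers above, sketched here only as an illustration. The Code1 and Code2 vectors are rebuilt from the printed output (an assumption about the original input).

```r
# Assumed input, rebuilt from the Code1 and Code2 columns of the output above
Code1 <- c(0, 1, 1, 3, 3, 4, -1, 0, 0)
Code2 <- c(1, 0, 2, 0, 1, 2, 1, 0, 0)

# mask[i, j] is TRUE when row j should be included in the sum for row i
mask <- outer(Code1, Code1, ">")             # Code1[j] < Code1[i]
mask <- mask & lower.tri(mask, diag = TRUE)  # j <= i (the rolling window)

# Multiplying by 1 turns TRUE/FALSE into 1/0 before the matrix product
SumIfs <- as.vector((mask * 1) %*% Code2)
SumIfs
```

The mask needs memory proportional to the square of the number of rows, so the sapply() or map_dbl() versions are the safer choice for long data frames.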

Concise R code equivalent to the sumif function in Excel - particularly applied over data frame rows

#DATA
set.seed(42)
df = data.frame(matrix(sample(0:3, 250, replace = TRUE), ncol = 50))

Here's how you'd do it using rowSums():

rowSums(df[,2:11] * (df[,2:11] > 1), na.rm = TRUE)
#[1] 18 13 11 19 11

Compare with your solution, which gives the same result here because the data only contain the values 0 to 3:

apply(df[,2:11], 1, function(x) sum(2*(x==2), 3*(x==3), na.rm=TRUE))
#[1] 18 13 11 19 11
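
The rowSums() call works because multiplying by a logical mask zeroes out every cell that fails the condition. The sketch below shows this on a hypothetical toy data frame (toy is not from the original question) that is small enough to check by hand and includes an NA to show the effect of na.rm = TRUE.

```r
# Hypothetical toy data, small enough to verify by hand
toy <- data.frame(a = c(0, 2, 3), b = c(3, 1, NA), c = c(2, 2, 0))

# Logical mask: TRUE where the SUMIF condition (> 1) holds
mask <- toy > 1

# TRUE/FALSE act as 1/0, so non-matching cells contribute 0 to the row sum;
# na.rm = TRUE drops the NA cell instead of making the whole row NA
row_sumif <- rowSums(toy * mask, na.rm = TRUE)
row_sumif
```

Row 1 sums 3 + 2, row 2 sums 2 + 2, and row 3 keeps only the 3, so the result is 5, 4 and 3.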

