Getting All Combinations Which Sum Up to 100 Using R

Getting all combinations which sum up to 100 using R

followed by Stéphane Laurent's answer, I am able to get a super fast solution by using the uniqueperm2 function here.

library(partitions)

C = t(restrictedparts(10,8))
do.call(rbind, lapply(1:nrow(C),function(i)uniqueperm2(C[i,])))

Update, there is faster solution using iterpc the package.

library(partitions)
library(iterpc)
C = t(restrictedparts(10,8))
do.call(rbind, lapply(1:nrow(C),function(i) getall(iterpc(table(C[i,]), order=T))))

It is about twice the speed of the uniqueperm2

> f <- function(){
do.call(rbind, lapply(1:nrow(C),function(i)uniqueperm2(C[i,])))
}
> g <- function(){
do.call(rbind, lapply(1:nrow(C),function(i) getall(iterpc(table(C[i,]), order=T))))
}
> microbenchmark(f(),g())
Unit: milliseconds
expr min lq mean median uq max neval cld
f() 36.37215 38.04941 40.43063 40.07220 42.29389 46.92574 100 b
g() 16.77462 17.45665 19.46206 18.10101 20.65524 64.11858 100 a

R: get all combinations of three numbers that add up to 100

Since you're limited to 1:100 on only three columns, this is easy to brute force. Would need a more clever solution if the range was larger.

library(data.table)

df <- expand.grid(X = 0:100,
Y = 0:100,
Z = 0:100)

setDT(df)

df[, Sum := X + Y + Z]
df[Sum == 100]
# X Y Z Sum
# 1: 100 0 0 100
# 2: 99 1 0 100
# 3: 98 2 0 100
# 4: 97 3 0 100
# 5: 96 4 0 100
# ---
# 5147: 1 1 98 100
# 5148: 0 2 98 100
# 5149: 1 0 99 100
# 5150: 0 1 99 100
# 5151: 0 0 100 100

Find all combinations of a set of numbers that add up to a certain total

This is precisely what combo/permuteGeneral from RcppAlgos (I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool and arrangements.

Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral to find all combinations of our vector that meet a specific criteria:

vec <- c(1,1,2,3,5)  ## using variables from @r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5

library(RcppAlgos)
lapply(seq_along(uni), function(x) {
comboGeneral(uni, x, freqs = myRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = ans)
})

[[1]]
[,1]
[1,] 5

[[2]]
[,1] [,2]
[1,] 2 3

[[3]]
[,1] [,2] [,3]
[1,] 1 1 3

[[4]]
[,1] [,2] [,3] [,4] ## no solutions of length 4

These functions are highly optimized and extend well to larger cases. For example, consider the following example that would produce over 30 million combinations:

## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))

rle(bigVec)
Run Length Encoding
lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...

bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12

comboCount(bigUni, len, freqs = bigRep)
[1] 32248100

All 300000+ results are returned very quickly:

system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = bigAns))
user system elapsed
0.273 0.004 0.271

head(bigTest)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 2 3 4 25 26 26 26 27 28 30
[2,] 1 1 2 3 5 24 26 26 26 27 28 30
[3,] 1 1 2 3 5 25 25 26 26 27 28 30
[4,] 1 1 2 3 7 24 24 26 26 27 28 30
[5,] 1 1 2 3 7 24 25 25 26 27 28 30
[6,] 1 1 2 3 7 24 25 26 26 26 28 30

nrow(bigTest)
[1] 280018

all(rowSums(bigTest) == bigAns)
[1] TRUE

Addendum

I must mention that generally when I see a problem like: "finding all combinations that sum to a particular number" my first thought is integer partitions. For example, in the related problem Getting all combinations which sum up to 100 in R, we can easily solve with the partitions library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition or we have a vector that contains values that don't easily convert to an integer equivalent (E.g. the vector (0.1, 0.2, 0.3, 0.4) can easily be treated as 1:4, however treating c(3.98486 7.84692 0.0038937 7.4879) as integers and subsequently applying an integer partitions approach would require an extravagant amount of computing power rendering this method useless).

Get all possible combinations within a given product sum

The number of valid combinations under 1244 will increase as you increase n, so I'm not entirely clear on the goal here. That said, here's a version that uses base R:

y <- c(24,48,72) #elements to find product sum
n <- 30
instances <- expand.grid(1:n, 1:n, 1:n)
instances$product_sum = rowSums(data.frame(Map('*', instances, y)))
instances <- subset(instances, product_sum <= 1244 )

First few rows of the result:

  Var1 Var2 Var3 product_sum
1 1 1 1 144
2 2 1 1 168
3 3 1 1 192
4 4 1 1 216
5 5 1 1 240
6 6 1 1 264

If instead of 1:n you use only your original x values of 2, 4, and 6, you should get 27 valid combinations.

Finding all possible combinations of numbers from a vector to reach a given sum (No repetitions)

In the code below, we identify combinations of values with sums in the desired range and then get the indices of those combinations. We then use the indices to get the names of the elements that we want to sum together. lapply takes care of iterating over each number of values that we want to sum together.

This is packaged into a function where x is the named vector of numbers, n is the maximum number of values you want to sum together, and interval is the range that the sums should be in.

Note that the function below returns strings of the form "A + B", as described in your question. If your goal is to do additional processing on the combinations of letters, you'll probably be better off working directly with the matrices (or lists) of combinations returned by combn().

numbers <- c(40,60,20,65,45,30,5,70,100,85,75,10);
names(numbers) <- LETTERS[1:length(numbers)]

library(dplyr) # For between function

combos = function(x, n, interval) {

res = lapply(2:n, function(i) {
cc = combn(x, i)
idx = which(between(colSums(cc), interval[1], interval[2]))
apply(combn(names(x), i)[ , idx], 2, paste, collapse=" + ")
})

cbind(unlist(c(names(x)[between(x, interval[1], interval[2])], res)))
}

combos(numbers, 3, c(90, 110))
     [,1]       
[1,] "I"
[2,] "A + B"
[3,] "A + D"
[4,] "A + H"
[5,] "B + E"
[6,] "B + F"
[7,] "C + H"
[8,] "C + J"
[9,] "C + K"
[10,] "D + E"
[11,] "D + F"
[12,] "F + H"
[13,] "F + K"
[14,] "G + I"
[15,] "G + J"
[16,] "I + L"
[17,] "J + L"
[18,] "A + B + G"
[19,] "A + B + L"
[20,] "A + C + E"
[21,] "A + C + F"
[22,] "A + D + G"
[23,] "A + E + G"
[24,] "A + E + L"
[25,] "B + C + F"
[26,] "B + C + L"
[27,] "B + E + G"
[28,] "B + F + G"
[29,] "B + F + L"
[30,] "C + D + G"
[31,] "C + D + L"
[32,] "C + E + F"
[33,] "C + G + H"
[34,] "C + G + J"
[35,] "C + G + K"
[36,] "C + H + L"
[37,] "C + K + L"
[38,] "D + F + G"
[39,] "D + F + L"
[40,] "F + G + H"
[41,] "F + G + K"
[42,] "F + H + L"
[43,] "G + J + L"
[44,] "G + K + L"
set.seed(2)
nn = rnorm(10)
names(nn) = LETTERS[1:length(nn)]

combos(nn, 3, c(2,2.5))
      [,1]       
[1,] "B + I"
[2,] "C + G"
[3,] "F + I"
[4,] "B + C + G"
[5,] "B + E + I"
[6,] "B + F + I"
[7,] "B + I + J"
[8,] "C + D + I"
[9,] "C + E + G"
[10,] "C + F + G"
[11,] "C + G + H"
[12,] "C + G + J"
[13,] "E + F + I"
[14,] "G + H + I"

find all possible 3 number combinations where sum is less than a given number using R

Try the following to see if it's what the question asks for.

x <- c(10,17,5,7,15)
i <- combn(x, 3, sum) <= 35

combn(x, 3)[, i]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 10 10 10 10 10 17 5
#[2,] 17 17 5 5 7 5 7
#[3,] 5 7 7 15 15 7 15

The above is the general idea. A more efficient implementation, both memory and speed wise, is f2 below.

f1 <- function(x, n = 3, thres = 35){
i <- combn(x, n, sum) <= thres
combn(x, n)[, i]
}
f2 <- function(x, n = 3, thres = 35){
cmb <- combn(x, n)
cmb[, colSums(cmb) <= thres]
}

Check if the results are all with different numbers.

res <- f2(x)
apply(res, 2, function(y){
all(y[-1] != y[1])
})
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

identical(f1(x), f2(x))
#[1] TRUE

Now time the functions.

microbenchmark::microbenchmark(f1 = f1(x), 
f2 = f2(x))
#Unit: microseconds
# expr min lq mean median uq max neval cld
# f1 105.150 107.383 110.66616 108.6535 109.896 238.899 100 b
# f2 62.779 65.568 67.65754 66.4290 67.145 122.119 100 a

Finding all combinations of four numbers that equal a sum in R

I'm sure there are many solutions, here is one with partitions library,

library(partitions)
restrictedparts(10, 4, include.zero = FALSE)

# [1,] 7 6 5 4 5 4 3 4 3
# [2,] 1 2 3 4 2 3 3 2 3
# [3,] 1 1 1 1 2 2 3 2 2
# [4,] 1 1 1 1 1 1 1 2 2

This would be the 4 integers that sum to 10 (not including 0).



Related Topics



Leave a reply



Submit