Find all combinations of a set of numbers that add up to a certain total
This is precisely what combo/permuteGeneral
from RcppAlgos
(I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos
handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool
and arrangements
.
Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral
to find all combinations of our vector that meet a specific criteria:
vec <- c(1,1,2,3,5) ## using variables from @r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5
library(RcppAlgos)
lapply(seq_along(uni), function(x) {
comboGeneral(uni, x, freqs = myRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = ans)
})
[[1]]
[,1]
[1,] 5
[[2]]
[,1] [,2]
[1,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 1 3
[[4]]
[,1] [,2] [,3] [,4] ## no solutions of length 4
These functions are highly optimized and extend well to larger cases. For example, consider the following example that would produce over 30 million combinations:
## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))
rle(bigVec)
Run Length Encoding
lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...
bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12
comboCount(bigUni, len, freqs = bigRep)
[1] 32248100
All 300000+ results are returned very quickly:
system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = bigAns))
user system elapsed
0.273 0.004 0.271
head(bigTest)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 2 3 4 25 26 26 26 27 28 30
[2,] 1 1 2 3 5 24 26 26 26 27 28 30
[3,] 1 1 2 3 5 25 25 26 26 27 28 30
[4,] 1 1 2 3 7 24 24 26 26 27 28 30
[5,] 1 1 2 3 7 24 25 25 26 27 28 30
[6,] 1 1 2 3 7 24 25 26 26 26 28 30
nrow(bigTest)
[1] 280018
all(rowSums(bigTest) == bigAns)
[1] TRUE
Addendum
I must mention that generally when I see a problem like: "finding all combinations that sum to a particular number" my first thought is integer partitions. For example, in the related problem Getting all combinations which sum up to 100 in R, we can easily solve with the partitions
library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition or we have a vector that contains values that don't easily convert to an integer equivalent (E.g. the vector (0.1, 0.2, 0.3, 0.4)
can easily be treated as 1:4
, however treating c(3.98486 7.84692 0.0038937 7.4879)
as integers and subsequently applying an integer partitions approach would require an extravagant amount of computing power rendering this method useless).
Finding all possible combinations of numbers to reach a given sum
This problem can be solved with a recursive combinations of all possible sums filtering out those that reach the target. Here is the algorithm in Python:
def subset_sum(numbers, target, partial=[]):
s = sum(partial)
# check if the partial sum is equals to target
if s == target:
print "sum(%s)=%s" % (partial, target)
if s >= target:
return # if we reach the number why bother to continue
for i in range(len(numbers)):
n = numbers[i]
remaining = numbers[i+1:]
subset_sum(remaining, target, partial + [n])
if __name__ == "__main__":
subset_sum([3,9,8,4,5,7,10],15)
#Outputs:
#sum([3, 8, 4])=15
#sum([3, 5, 7])=15
#sum([8, 7])=15
#sum([5, 10])=15
This type of algorithms are very well explained in the following Stanford's Abstract Programming lecture - this video is very recommendable to understand how recursion works to generate permutations of solutions.
Edit
The above as a generator function, making it a bit more useful. Requires Python 3.3+ because of yield from
.
def subset_sum(numbers, target, partial=[], partial_sum=0):
if partial_sum == target:
yield partial
if partial_sum >= target:
return
for i, n in enumerate(numbers):
remaining = numbers[i + 1:]
yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
Here is the Java version of the same algorithm:
package tmp;
import java.util.ArrayList;
import java.util.Arrays;
class SumSet {
static void sum_up_recursive(ArrayList<Integer> numbers, int target, ArrayList<Integer> partial) {
int s = 0;
for (int x: partial) s += x;
if (s == target)
System.out.println("sum("+Arrays.toString(partial.toArray())+")="+target);
if (s >= target)
return;
for(int i=0;i<numbers.size();i++) {
ArrayList<Integer> remaining = new ArrayList<Integer>();
int n = numbers.get(i);
for (int j=i+1; j<numbers.size();j++) remaining.add(numbers.get(j));
ArrayList<Integer> partial_rec = new ArrayList<Integer>(partial);
partial_rec.add(n);
sum_up_recursive(remaining,target,partial_rec);
}
}
static void sum_up(ArrayList<Integer> numbers, int target) {
sum_up_recursive(numbers,target,new ArrayList<Integer>());
}
public static void main(String args[]) {
Integer[] numbers = {3,9,8,4,5,7,10};
int target = 15;
sum_up(new ArrayList<Integer>(Arrays.asList(numbers)),target);
}
}
It is exactly the same heuristic. My Java is a bit rusty but I think is easy to understand.
C# conversion of Java solution: (by @JeremyThompson)
public static void Main(string[] args)
{
List<int> numbers = new List<int>() { 3, 9, 8, 4, 5, 7, 10 };
int target = 15;
sum_up(numbers, target);
}
private static void sum_up(List<int> numbers, int target)
{
sum_up_recursive(numbers, target, new List<int>());
}
private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
int s = 0;
foreach (int x in partial) s += x;
if (s == target)
Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);
if (s >= target)
return;
for (int i = 0; i < numbers.Count; i++)
{
List<int> remaining = new List<int>();
int n = numbers[i];
for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);
List<int> partial_rec = new List<int>(partial);
partial_rec.Add(n);
sum_up_recursive(remaining, target, partial_rec);
}
}
Ruby solution: (by @emaillenin)
def subset_sum(numbers, target, partial=[])
s = partial.inject 0, :+
# check if the partial sum is equals to target
puts "sum(#{partial})=#{target}" if s == target
return if s >= target # if we reach the number why bother to continue
(0..(numbers.length - 1)).each do |i|
n = numbers[i]
remaining = numbers.drop(i+1)
subset_sum(remaining, target, partial + [n])
end
end
subset_sum([3,9,8,4,5,7,10],15)
Edit: complexity discussion
As others mention this is an NP-hard problem. It can be solved in exponential time O(2^n), for instance for n=10 there will be 1024 possible solutions. If the targets you are trying to reach are in a low range then this algorithm works. So for instance:
subset_sum([1,2,3,4,5,6,7,8,9,10],100000)
generates 1024 branches because the target never gets to filter out possible solutions.
On the other hand subset_sum([1,2,3,4,5,6,7,8,9,10],10)
generates only 175 branches, because the target to reach 10
gets to filter out many combinations.
If N
and Target
are big numbers one should move into an approximate version of the solution.
Find out which combinations of numbers in a set add up to a given total
This special case of the Knapsack problem is called Subset Sum.
Find all combinations of a list of numbers with a given sum
You could use itertools to iterate through every combination of every possible size, and filter out everything that doesn't sum to 10:
import itertools
numbers = [1, 2, 3, 7, 7, 9, 10]
target = 10
result = [seq for i in range(len(numbers), 0, -1)
for seq in itertools.combinations(numbers, i)
if sum(seq) == target]
print(result)
Result:
[(1, 2, 7), (1, 2, 7), (1, 9), (3, 7), (3, 7), (10,)]
Unfortunately this is something like O(2^N) complexity, so it isn't suitable for input lists larger than, say, 20 elements.
calculate all combinations that add up to a number
As pointed out in the comments, this is a version of the subset-sum problem (but instead of summing to 0
, you are summing to some desired number).
The Wikipedia page does give an O(2 ^ (n/2))
, however since your inputs seem small, I will just implement the O(2 ^ N * N)
brute-force algorithm to solve this.
function get_summing_subsets(set_arr, target){
let finish = [];
let working = [[]];
while (working.length){
let next_work = [];
for (let i = 0; i < working.length; i++){
for (let j = 0; j < set_arr.length; j++){
let subset = working[i].concat([set_arr[j]]);
let sum = subset.reduce((a,b) => a + b, 0);
if (sum <= target){
(sum == target ? finish : next_work).push(subset);
}
}
}
working = next_work
}
return finish;
}
which works well:
//your example
get_summing_subsets([1,2], 3)
[[1,2],[2,1],[1,1,1]]
//another
get_summing_subsets([1,2,3], 4)
[[1,3],[2,2],[3,1],[1,1,2],[1,2,1],[2,1,1],[1,1,1,1]]
How to get all combination for a given sum with a given number of elements
A very simple approach is to use itertools.combinations
to generate all the possible combinations, and select those that produce the desired sum.
>>> import itertools
>>> numbers = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
>>> target_sum = 1
>>> number = 3
>>> [c for c in itertools.combinations(numbers, number) if sum(c) == target_sum]
[(0.1, 0.2, 0.7), (0.1, 0.3, 0.6), (0.1, 0.4, 0.5), (0.2, 0.3, 0.5)]
If you want all the possible different orderings as distinct solutions, use itertools.permutations
instead:
>>> [c for c in itertools.permutations(numbers, number) if sum(c) == target_sum]
[(0.1, 0.2, 0.7), (0.1, 0.3, 0.6), (0.1, 0.4, 0.5), (0.1, 0.5, 0.4), (0.1, 0.6, 0.3), (0.1, 0.7, 0.2), (0.2, 0.1, 0.7), (0.2, 0.3, 0.5), (0.2, 0.5, 0.3), (0.3, 0.1, 0.6), (0.3, 0.2, 0.5), (0.3, 0.5, 0.2), (0.4, 0.1, 0.5), (0.4, 0.5, 0.1), (0.5, 0.1, 0.4), (0.5, 0.2, 0.3), (0.5, 0.3, 0.2), (0.5, 0.4, 0.1), (0.6, 0.1, 0.3), (0.7, 0.1, 0.2)]
If you want to be able to use each element more than once, use itertools.combinations_with_replacement
:
>>> [c for c in itertools.combinations_with_replacement(numbers, number) if sum(c) == target_sum]
[(0.1, 0.1, 0.8), (0.1, 0.2, 0.7), (0.1, 0.3, 0.6), (0.1, 0.4, 0.5), (0.2, 0.2, 0.6), (0.2, 0.3, 0.5), (0.2, 0.4, 0.4), (0.3, 0.3, 0.4)]
If you want to use elements multiple times and get all the possible orderings, use itertools.product
:
>>> [c for c in itertools.product(numbers, repeat=number) if sum(c) == target_sum]
[(0.1, 0.1, 0.8), (0.1, 0.2, 0.7), (0.1, 0.3, 0.6), (0.1, 0.4, 0.5), (0.1, 0.5, 0.4), (0.1, 0.6, 0.3), (0.1, 0.7, 0.2), (0.1, 0.8, 0.1), (0.2, 0.1, 0.7), (0.2, 0.2, 0.6), (0.2, 0.3, 0.5), (0.2, 0.4, 0.4), (0.2, 0.5, 0.3), (0.2, 0.6, 0.2), (0.3, 0.1, 0.6), (0.3, 0.2, 0.5), (0.3, 0.3, 0.4), (0.3, 0.4, 0.3), (0.3, 0.5, 0.2), (0.4, 0.1, 0.5), (0.4, 0.2, 0.4), (0.4, 0.3, 0.3), (0.4, 0.4, 0.2), (0.4, 0.5, 0.1), (0.5, 0.1, 0.4), (0.5, 0.2, 0.3), (0.5, 0.3, 0.2), (0.5, 0.4, 0.1), (0.6, 0.1, 0.3), (0.6, 0.2, 0.2), (0.7, 0.1, 0.2), (0.8, 0.1, 0.1)]
Find set of numbers in one collection that adds up to a number in another
This problem is NP-Complete... This is some variation of the sub-set sum problem which is known to be NP-Complete (actually, the sub-set sum problem is easier than yours).
Read here for more information:
http://en.wikipedia.org/wiki/Subset_sum_problem
Related Topics
Remove Quotes from a Character Vector in R
Append Data Frames Together in a for Loop
For Each Row Return the Column Name of the Largest Value
Replacing Nas With Latest Non-Na Value
What Are the Differences Between "=" and "≪-" Assignment Operators
Strptime, As.Posixct and As.Date Return Unexpected Na
Run R Script from Command Line
Insert Rows For Missing Dates/Times
Global and Local Variables in R
Axis Labels on Two Lines With Nested X Variables (Year Below Months)
Get the Difference Between Dates in Terms of Weeks, Months, Quarters, and Years
Subset/Filter Rows in a Data Frame Based on a Condition in a Column
Storing Ggplot Objects in a List from Within Loop in R