Algorithm That Will Take Numbers or Words and Find All Possible Combinations

Take a look at http://pear.php.net/package/Math_Combinatorics

<?php
require_once 'Math/Combinatorics.php';
$words = array('cat', 'dog', 'fish');
$combinatorics = new Math_Combinatorics;
foreach($combinatorics->permutations($words, 2) as $p) {
  echo join(' ', $p), "\n"; 
}

prints

cat dog
dog cat
cat fish
fish cat
dog fish
fish dog
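For comparison, the same ordered pairs can be produced in Python with the standard library's itertools.permutations. This is only a sketch alongside the PEAR example, not a port of it, and the pairs come out in a different order than in the listing above.

```python
from itertools import permutations

words = ['cat', 'dog', 'fish']

# permutations(iterable, r) yields every ordered selection of r distinct items
pairs = [' '.join(p) for p in permutations(words, 2)]

for line in pairs:
    print(line)
```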

Finding all possible combinations of numbers to reach a given sum

This problem can be solved by recursively building all possible combinations and keeping those whose sum reaches the target. Here is the algorithm in Python:

def subset_sum(numbers, target, partial=[]):
    s = sum(partial)

    # check whether the partial sum equals the target
    if s == target:
        print("sum(%s)=%s" % (partial, target))
    if s >= target:
        return  # target reached or exceeded (numbers assumed positive), so stop
    
    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        subset_sum(remaining, target, partial + [n]) 
   

if __name__ == "__main__":
    subset_sum([3,9,8,4,5,7,10],15)

    #Outputs:
    #sum([3, 8, 4])=15
    #sum([3, 5, 7])=15
    #sum([8, 7])=15
    #sum([5, 10])=15

This type of algorithm is explained very well in the following Stanford Abstract Programming lecture. The video is highly recommended for understanding how recursion generates permutations of solutions.

Edit

The above as a generator function, making it a bit more useful. Requires Python 3.3+ because of yield from.

def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
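A short usage sketch of the generator, repeated here so the snippet runs on its own. The None default for partial and the names first and everything are choices made for this example only. It shows the practical gain of the generator form: the caller can stop after the first solution instead of waiting for the whole search.

```python
def subset_sum(numbers, target, partial=None, partial_sum=0):
    # None instead of a shared mutable default list (a stylistic choice of this sketch)
    partial = [] if partial is None else partial
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return  # assumes positive numbers, as in the versions above
    for i, n in enumerate(numbers):
        yield from subset_sum(numbers[i + 1:], target, partial + [n], partial_sum + n)

numbers = [3, 9, 8, 4, 5, 7, 10]
first = next(subset_sum(numbers, 15), None)   # stops at the first hit
everything = list(subset_sum(numbers, 15))    # exhausts the search
print(first)
print(everything)
```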

Here is the Java version of the same algorithm:

package tmp;

import java.util.ArrayList;
import java.util.Arrays;

class SumSet {
    static void sum_up_recursive(ArrayList<Integer> numbers, int target, ArrayList<Integer> partial) {
       int s = 0;
       for (int x: partial) s += x;
       if (s == target)
            System.out.println("sum("+Arrays.toString(partial.toArray())+")="+target);
       if (s >= target)
            return;
       for(int i=0;i<numbers.size();i++) {
             ArrayList<Integer> remaining = new ArrayList<Integer>();
             int n = numbers.get(i);
             for (int j=i+1; j<numbers.size();j++) remaining.add(numbers.get(j));
             ArrayList<Integer> partial_rec = new ArrayList<Integer>(partial);
             partial_rec.add(n);
             sum_up_recursive(remaining,target,partial_rec);
       }
    }
    static void sum_up(ArrayList<Integer> numbers, int target) {
        sum_up_recursive(numbers,target,new ArrayList<Integer>());
    }
    public static void main(String args[]) {
        Integer[] numbers = {3,9,8,4,5,7,10};
        int target = 15;
        sum_up(new ArrayList<Integer>(Arrays.asList(numbers)),target);
    }
}

It is exactly the same heuristic. My Java is a bit rusty, but I think it is easy to understand.

C# conversion of Java solution: (by @JeremyThompson)

public static void Main(string[] args)
{
    List<int> numbers = new List<int>() { 3, 9, 8, 4, 5, 7, 10 };
    int target = 15;
    sum_up(numbers, target);
}

private static void sum_up(List<int> numbers, int target)
{
    sum_up_recursive(numbers, target, new List<int>());
}

private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
    int s = 0;
    foreach (int x in partial) s += x;

    if (s == target)
        Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);

    if (s >= target)
        return;

    for (int i = 0; i < numbers.Count; i++)
    {
        List<int> remaining = new List<int>();
        int n = numbers[i];
        for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);

        List<int> partial_rec = new List<int>(partial);
        partial_rec.Add(n);
        sum_up_recursive(remaining, target, partial_rec);
    }
}

Ruby solution: (by @emaillenin)

def subset_sum(numbers, target, partial=[])
  s = partial.inject 0, :+
  # check whether the partial sum equals the target

  puts "sum(#{partial})=#{target}" if s == target

  return if s >= target # if we reach the number why bother to continue

  (0..(numbers.length - 1)).each do |i|
    n = numbers[i]
    remaining = numbers.drop(i+1)
    subset_sum(remaining, target, partial + [n])
  end
end

subset_sum([3,9,8,4,5,7,10],15)

Edit: complexity discussion

As others mention, this is an NP-hard problem. It can be solved in exponential time, O(2^n); for instance, for n=10 there are 1024 possible subsets to consider. If the targets you are trying to reach are in a low range, this algorithm works well. For instance:

subset_sum([1,2,3,4,5,6,7,8,9,10],100000) generates 1024 branches because the target never gets to filter out possible solutions.

On the other hand subset_sum([1,2,3,4,5,6,7,8,9,10],10) generates only 175 branches, because the target to reach 10 gets to filter out many combinations.

If N and Target are both large, one should move to an approximate version of the solution.
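The branch counts quoted above can be reproduced with a small counter. This is a sketch that mirrors the recursion of subset_sum without printing; the name count_calls is made up for this example, and one "branch" is taken to mean one call of the recursive function.

```python
def count_calls(numbers, target, partial_sum=0):
    """Count how many times the recursive search function is entered."""
    calls = 1  # this call
    if partial_sum >= target:
        return calls  # pruned: no deeper recursion from here
    for i, n in enumerate(numbers):
        calls += count_calls(numbers[i + 1:], target, partial_sum + n)
    return calls

one_to_ten = list(range(1, 11))
print(count_calls(one_to_ten, 100000))  # no pruning ever happens
print(count_calls(one_to_ten, 10))      # pruning cuts most branches
```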

Algorithm to get all possible combinations with restrictions

To make things more efficient (and supposing there are far fewer solutions than permutations), an idea is:

Generate all permutations one by one.
Don't interpret the permutation in its usual way, but interpret it as telling for each digit to which position it will go. So given a permutation (3, 2, 0, 1), interpret it as 0 goes to pos 3, 1 goes to pos 2, 2 to pos 0, 3 to pos 1 (so, the inverse permutation: (2, 3, 1, 0)). The test for accepting or rejecting a permutation then becomes much more straightforward.

Here is an implementation in Python, but the same idea can be applied in any programming language:

from itertools import permutations

num = 0
q = [0 for _ in range(8)] # create a list to fit 8 values
for p in permutations(range(8)):
    if p[0] < p[1] < p[3] and p[2] < p[3] and p[4] < p[5] and p[6] < p[7]:
        for i, pi in enumerate(p):
            q[pi] = i
        num += 1
        print(num, q)

Output:

1 [0, 1, 2, 3, 4, 5, 6, 7]
2 [0, 1, 2, 3, 4, 6, 5, 7]
3 [0, 1, 2, 3, 4, 6, 7, 5]
4 [0, 1, 2, 3, 6, 4, 5, 7]
5 [0, 1, 2, 3, 6, 4, 7, 5]
6 [0, 1, 2, 3, 6, 7, 4, 5]
...
1257 [4, 6, 7, 5, 2, 0, 1, 3]
1258 [6, 4, 5, 7, 2, 0, 1, 3]
1259 [6, 4, 7, 5, 2, 0, 1, 3]
1260 [6, 7, 4, 5, 2, 0, 1, 3]
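The "inverse permutation" reading described above can be checked in isolation. A minimal sketch, with invert as a made-up helper name:

```python
def invert(p):
    """Return q such that q[p[i]] == i, i.e. value i is placed at position p[i]."""
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

print(invert((3, 2, 0, 1)))
```

Applying invert twice gives back the original permutation, which is a quick sanity check for the helper.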

How to get all possible combinations to make a string with a dictionary

Let the dictionary D = {w₁, w₂, ..., w_n} where the w_i are distinct words. Let S be a string.

Let Count(S, D) be a function that returns the number of possible combinations of forming S using the words of the dictionary D. Count(S, D) is defined as follows.

1. If S is the empty string, return 1.
2. Set c = 0.
3. For every word w in D, if S does not start with w, continue with the next word. Otherwise set c = c + Count(S - w, D) where S - w is the string S with w removed from the start.
4. Return c.

A dynamic programming implementation can easily be derived by storing previous results in a map of strings to their counts. In step 2, before iterating over all words, we check if S is in the map, and return its count, if it is. In step 4, before returning c, we store S in the map with its count.
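The memoized version might look like the following Python sketch. The function name count and the use of functools.lru_cache as the "map of strings to their counts" are choices of this example; empty words are dropped because they would make the recursion loop forever.

```python
from functools import lru_cache

def count(s, words):
    """Number of ways to write s as a concatenation of words from the dictionary."""
    words = tuple(w for w in set(words) if w)  # distinct, non-empty words

    @lru_cache(maxsize=None)  # plays the role of the map from strings to counts
    def go(rest):
        if rest == "":
            return 1
        # only words that are a prefix of the remaining string contribute
        return sum(go(rest[len(w):]) for w in words if rest.startswith(w))

    return go(s)

print(count("catsanddog", ["cat", "cats", "and", "sand", "dog"]))
```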

Algorithm to return all combinations of k elements from n

Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.

Gray Codes

An issue you will quickly run into is memory: even with 20 elements in your set you already have ²⁰C₃ = 1140 combinations. If you want to iterate over the set, it's best to use a modified Gray code algorithm so you aren't holding all of them in memory. These generate the next combination from the previous one and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? Minimize them? Et cetera.
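To illustrate only the "next combination from the previous one" idea, here is a plain lexicographic successor function in Python. Note that this is not a Gray code (successive combinations may differ in more than one element); it is a simpler stand-in, and next_combination is a made-up name.

```python
def next_combination(comb, n):
    """Lexicographic successor of a sorted k-combination of range(n), or None at the end."""
    comb = list(comb)
    k = len(comb)
    # find the rightmost position that has not yet reached its maximum value
    i = k - 1
    while i >= 0 and comb[i] == n - k + i:
        i -= 1
    if i < 0:
        return None  # comb was the last combination
    comb[i] += 1
    # reset everything to the right to the smallest valid values
    for j in range(i + 1, k):
        comb[j] = comb[j - 1] + 1
    return comb

# Walk all 3-combinations of 20 elements while holding only one in memory.
total, current = 0, [0, 1, 2]
while current is not None:
    total += 1
    current = next_combination(current, 20)
print(total)
```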

Some of the original papers describing gray codes:

Some Hamilton Paths and a Minimal Change Algorithm
Adjacent Interchange Combination Generation Algorithm

Here are some other papers covering the topic:

An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
Combination Generators
Survey of Combinatorial Gray Codes (PostScript)
An Algorithm for Gray Codes

Chase's Twiddle (algorithm)

Phillip J Chase, `Algorithm 382: Combinations of M out of N Objects' (1970)

The algorithm in C...

Index of Combinations in Lexicographical Order (Buckles Algorithm 515)

You can also reference a combination by its index (in lexicographical order). Since the index reflects how much the combination has changed, position by position from right to left, we can construct a procedure that recovers a combination from it.

So, we have a set {1,2,3,4,5,6}... and we want three elements. Let's say {1,2,3} we can say that the difference between the elements is one and in order and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of 'changes' in the last place accounts for one change in the lexicographical ordering. The second place, with one change {1,3,4} has one change but accounts for more change since it's in the second place (proportional to the number of elements in the original set).

The method I've described is, in effect, a deconstruction from the set to the index; we need to do the reverse, which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes: I used the index of the sets rather than a number range to represent the set, so we are always working from 0...n.

Note:

Since combinations are unordered, {1,3,2} = {1,2,3} --we order them to be lexicographical.
This method has an implicit 0 to start the set for the first difference.
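The set-to-index direction discussed above can be sketched with a standard counting argument. This is not Buckles' Algorithm 515, just a generic lexicographic ranking formula; lex_rank is a made-up name, the set is assumed to be {1..n}, and the returned index is 0-based (so {1,2,4} gets 1 rather than 2).

```python
from math import comb  # Python 3.8+
from itertools import combinations

def lex_rank(combo, n):
    """0-based lexicographic index of a k-subset of {1..n}."""
    c = sorted(combo)
    k = len(c)
    # Combinations that come later first differ at some position i with a larger
    # element there; the remaining k - i elements all come from the n - c[i] larger values.
    after = sum(comb(n - x, k - i) for i, x in enumerate(c))
    return comb(n, k) - 1 - after

print(lex_rank({1, 2, 3}, 6), lex_rank({1, 2, 4}, 6), lex_rank({1, 3, 4}, 6))
```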

Index of Combinations in Lexicographical Order (McCaffrey)

There is another way. Its concept is easier to grasp and program, but it lacks the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:

The set x_k...x_1 in N that maximizes i = C(x_1,k) + C(x_2,k-1) + ... + C(x_k,1), where C(n,r) = {n choose r}.

For example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1). So, the 27th lexicographical combination of four things is {1,2,5,6}; those are the indexes into whatever set you want to look at. The example below (OCaml) requires a choose function, which is left to the reader:

(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
    (* maximize function -- maximize a that is aCb              *)
    (* return largest c where c < i and choose(c,i) <= z        *)
    let rec maximize a b x =
        if (choose a b ) <= x then a else maximize (a-1) b x
    in
    let rec iterate n x i = match i with
        | 0 -> []
        | i ->
            let max = maximize n i x in
            max :: iterate n (x - (choose max i)) (i-1)
    in
    if x < 0 then failwith "errors" else
    let idxs =  iterate (List.length set) x k in
    List.map (List.nth set) (List.sort (-) idxs)
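The greedy decomposition itself can be sketched in Python, where math.comb supplies the choose function. The name combinadic is made up for this example, and the sketch only returns the x values; mapping them onto a concrete set is left out.

```python
from math import comb  # Python 3.8+

def combinadic(index, k):
    """Greedy decomposition index = C(x_1,k) + C(x_2,k-1) + ... + C(x_k,1)."""
    result = []
    for r in range(k, 0, -1):
        x = r - 1  # C(r-1, r) == 0, so this is always a valid starting point
        # grow x while the next binomial coefficient still fits into the remainder
        while comb(x + 1, r) <= index:
            x += 1
        result.append(x)
        index -= comb(x, r)
    return result

print(combinadic(27, 4))
```

Because each step takes the largest value that fits, the x values come out strictly decreasing, and different indexes give different decompositions.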

A small and simple combinations iterator

The following two algorithms are provided for didactic purposes. They implement an iterator and (more generally) a folder over all combinations.
They are as fast as possible, having the complexity O(ⁿC_k). The memory consumption is bounded by k.

We will start with the iterator, which calls a user-provided function for each combination:

let iter_combs n k f =
  let rec iter v s j =
    if j = k then f v
    else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
  iter [] 0 0

A more general version calls the user-provided function along with a state variable, starting from the initial state. Since we need to pass the state from one call to the next, we won't use the for-loop but use recursion instead:

let fold_combs n k f x =
  let rec loop i s c x =
    if i < n then
      loop (i+1) s c @@
      let c = i::c and s = s + 1 and i = i + 1 in
      if s < k then loop i s c x else f c x
    else x in
  loop 0 0 [] x
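For readers less familiar with OCaml, here is a rough Python counterpart of the two functions. It is a sketch of the same idea rather than a line-by-line translation: combinations are passed to the callback as tuples in increasing order, and the working list never holds more than k elements.

```python
def fold_combs(n, k, f, state):
    """Thread `state` through f(combination, state) for every k-combination of range(n)."""
    def go(chosen, start, state):
        if len(chosen) == k:
            return f(tuple(chosen), state)
        for i in range(start, n):
            chosen.append(i)               # choose i
            state = go(chosen, i + 1, state)
            chosen.pop()                   # un-choose i before trying the next value
        return state
    return go([], 0, state)

def iter_combs(n, k, f):
    """Call f on every k-combination of range(n); the state is simply ignored."""
    fold_combs(n, k, lambda c, _: f(c), None)

seen = []
iter_combs(3, 2, seen.append)
print(seen)
print(fold_combs(5, 3, lambda c, acc: acc + 1, 0))  # counts the combinations
```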

Finding all possible combinations of numbers of an array to reach a given sum

It took me a while to code this. It's basically brute force: I recursively generate (by backtracking) all possible expressions with the given operators and then evaluate them. Note that these are plain infix expressions.

This is a very slow solution. There are several optimizations one can do here.

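#include <algorithm>
#include <cassert>
#include <cctype>
#include <climits>
#include <functional>
#include <string>
#include <vector>
using namespace std;

vector<string> allCombinations(vector<int> &arr, int k)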
{
    int n = (int)arr.size();
    string operators = "+-*";
    vector<string> ans;
    // To check precedence of operators
    auto prec = [&](char op) -> int
    {
        if (op == '*' or op == '/') return 2;
        if (op == '+' or op == '-') return 1;
        return -1;
    };
    // For infix evaluation (kindof a helper function)
    auto compute = [&](int v1, char op, int v2) -> int
    {
        if (op == '+') return v1 + v2;
        if (op == '-') return v1 - v2;
        if (op == '*') return v1 * v2;
        if (op == '/') return v1 / v2;
        assert(false);
        return INT_MAX;
    };
    // Main infix evaluation function
    auto evaluate = [&](string s) -> int
    {
        int len = (int)s.size();
        // vector is being used as a STACK
        vector<int> val;
        vector<char> ops;
        for (int i = 0; i < len; i++)
        {
            char curr = s[i];
            if (curr == ' ') continue;
            if (isdigit(curr))
            {
                int v = 0;
                while (i < len and isdigit(s[i])) v = 10 * v + (s[i++] - '0');
                val.push_back(v);
                i--;
            }
            else
            {
                while (!ops.empty() and prec(curr) <= prec(ops.back()))
                {
                    int v1 = val.back();
                    val.pop_back();
                    int v2 = val.back();
                    val.pop_back();
                    char op = ops.back();
                    ops.pop_back();
                    val.push_back(compute(v2, op, v1));
                }
                ops.push_back(curr);
            }
        }
        while (!ops.empty())
        {
            int v1 = val.back();
            val.pop_back();
            int v2 = val.back();
            val.pop_back();
            char op = ops.back();
            ops.pop_back();
            val.push_back(compute(v2, op, v1));
        }
        return val.back();
    };
    // Generates all expression possible
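    function<void(int, string&)> generate = [&](int i, string &s) -> void
    {
        // Remember the length so that multi-digit numbers are removed completely
        size_t start = s.size();
        s += to_string(arr[i]);
        if (i == n - 1)
        {
            if (evaluate(s) == k) ans.push_back(s);
            // Backtrack: drop the whole number, not just its last digit
            s.resize(start);
            return;
        }
        for (char op : operators)
        {
            s.push_back(op);
            generate(i + 1, s);
            // Backtrack: remove the operator
            s.pop_back();
        }
        // Backtrack: drop the whole number
        s.resize(start);
    };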
    string s;
    // Try all combinations
    sort(arr.begin(), arr.end());
    do
    {
        generate(0, s);
    } while (next_permutation(arr.begin(), arr.end()));
    return ans;
}
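The same brute-force idea can be written more compactly in Python. This is a sketch under a few assumptions: the input is a non-empty list of non-negative integers, only + - * are used, and the names evaluate and all_expressions are made up here. Instead of a stack-based infix parser, it folds multiplications first and then applies + and - from left to right, which gives the usual precedence.

```python
from itertools import permutations, product

def evaluate(nums, ops):
    """Evaluate nums[0] ops[0] nums[1] ... with * binding tighter than + and -."""
    terms, signs = [nums[0]], []
    for op, x in zip(ops, nums[1:]):
        if op == '*':
            terms[-1] *= x      # fold the product into the current term
        else:
            signs.append(op)
            terms.append(x)
    total = terms[0]
    for sign, term in zip(signs, terms[1:]):
        total = total + term if sign == '+' else total - term
    return total

def all_expressions(arr, k, operators="+-*"):
    """All infix expressions over every ordering of arr that evaluate to k."""
    found = []
    for perm in sorted(set(permutations(arr))):  # set() skips repeated orderings
        for ops in product(operators, repeat=len(perm) - 1):
            if evaluate(perm, ops) == k:
                found.append(str(perm[0]) + "".join(o + str(x) for o, x in zip(ops, perm[1:])))
    return found

print(all_expressions([1, 2, 3], 7))
```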

Need an algorithm to evenly iterate over all possible combinations of a set of values

You've already said that you can get the combinations you are looking for by enumerating all n^k possible sequences, except that you don't get them in the desired order.

You could generate the sequences in the right order if you used an odometer-style enumerator. At first, all digits must be 0 or 1. When the odometer would wrap (after 1111...), you increment the set of the digits to [0, 1, 2]. Reset the sequence to 2000... and keep iterating, but only emit sequences that have at least one 2 in them, because you've already generated all sequences of 0's and 1's. Repeat until after wrapping you go beyond the maximum threshold.

Filtering out the duplicates that don't have the current top digit in them can be done by keeping track of the count of top numbers.

Here's an implementation in C with the limits hard-coded in an enum:

#include <string.h>     /* memset */

enum {
    SIZE = 3,
    TOP = 4
};

typedef struct Generator Generator;

struct Generator {
    unsigned top;           // current threshold
    unsigned val[SIZE];     // sequence array
    unsigned tops;          // count of "top" values
};

/*
 *      "raw" generator backend which produces all sequences
 *      and keeps track of how many top numbers there are
 */
int gen_next_raw(Generator *gen)
{
    int i = 0;
    
    do {
        if (gen->val[i] == gen->top) gen->tops--;
        gen->val[i]++;
        if (gen->val[i] == gen->top) gen->tops++;
        
        if (gen->val[i] <= gen->top) return 1;

        gen->val[i++] = 0;
    } while (i < SIZE);
   
    return 0;
}

/*
 *      actual generator, which filters out duplicates
 *      and increases the threshold if needed
 */
int gen_next(Generator *gen)
{
    while (gen_next_raw(gen)) {
        if (gen->tops) return 1;
    }
        
    gen->top++;
    
    if (gen->top > TOP) return 0;
    
    memset(gen->val, 0, sizeof(gen->val));
    gen->val[0] = gen->top;
    gen->tops = 1;    
    
    return 1;
}

The gen_next_raw function is the base implementation of the odometer with the addition of keeping a count of current top digits. The gen_next function uses it as backend. It filters out the duplicates and increases the threshold as needed. (All that can probably be done more efficiently.)

Generate the sequence with:

Generator gen = {0};

while (gen_next(&gen)) {
    if (is_good(gen.val)) {
        puts("Bingo!");
        break;
    }        
}
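The enumeration order can also be sketched in a few lines of Python. This version does not keep an odometer state; it simply regenerates the sequences for each threshold with itertools.product and keeps those that contain the current top digit. The name by_max_digit is made up, and starting at threshold 0 is a choice of this sketch.

```python
from itertools import product

def by_max_digit(size, top):
    """Yield all sequences of `size` digits in 0..top, grouped by their largest digit."""
    for current in range(top + 1):
        for seq in product(range(current + 1), repeat=size):
            # sequences without the current top digit were already emitted earlier
            if current in seq:
                yield seq

for seq in by_max_digit(2, 2):
    print(seq)
```

Every sequence is produced exactly once, so by_max_digit(size, top) yields (top + 1) ** size sequences in total.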
