R: Generating All Permutations of N Weights in Multiples of P

Assuming the order of the weights matters, these are compositions; if it does not, they are partitions. In either case they are restricted by the number of parts, which you have called N (the code below uses numparts). There is also the question of whether weights of 0 are allowed.

Since you want the weights to add up to 1, you need 1/p to be an integer, which in the following code is sumparts; it does not depend on the number of weights. Once you have the compositions, you can multiply them by p, i.e. divide by sumparts, to get your weights.

R has a partitions package to generate such compositions or restricted partitions. The following code should be self-explanatory: each column in the matrix is a set of weights. I have taken seven weights and p = 0.1 (10%), and prohibited weights of 0: this gives 84 possibilities; allowing weights of 0 would mean 8008 possibilities. With p = 0.01 (1%) there would be 1,120,529,256 possibilities without weights of 0, and 1,705,904,746 with them. If order does not matter, use restrictedparts instead of compositions.

> library(partitions)
> numparts <- 7 # number of weights
> sumparts <- 10 # reciprocal of p
> weights <- compositions(n=sumparts, m=numparts, include.zero=FALSE)/sumparts
> weights

[1,] 0.4 0.3 0.2 0.1 0.3 0.2 0.1 0.2 0.1 0.1 0.3 0.2 0.1 0.2 0.1 0.1 0.2 0.1 0.1
[2,] 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.1 0.2 0.1 0.1 0.2 0.3 0.1 0.2 0.1 0.1 0.2 0.1
[3,] 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.1 0.1 0.1 0.2 0.2 0.3 0.1 0.1 0.2
[4,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2 0.2 0.3 0.3 0.3
[5,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
[6,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
[7,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

[1,] 0.1 0.3 0.2 0.1 0.2 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.3 0.2 0.1
[2,] 0.1 0.1 0.2 0.3 0.1 0.2 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.3
[3,] 0.1 0.1 0.1 0.1 0.2 0.2 0.3 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1
[4,] 0.4 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1
[5,] 0.1 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.3 0.3 0.3 0.3 0.4 0.1 0.1 0.1
[6,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2
[7,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

[1,] 0.2 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.3
[2,] 0.1 0.2 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1
[3,] 0.2 0.2 0.3 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1
[4,] 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1
[5,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.1 0.1 0.1 0.1 0.2 0.1 0.1
[6,] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.3 0.3 0.3 0.3 0.3 0.4 0.1
[7,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2

[1,] 0.2 0.1 0.2 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1
[2,] 0.2 0.3 0.1 0.2 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1
[3,] 0.1 0.1 0.2 0.2 0.3 0.1 0.1 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.1
[4,] 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.1
[5,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.1 0.1 0.1 0.1 0.2
[6,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
[7,] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2

[1,] 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
[2,] 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1
[3,] 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1
[4,] 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1
[5,] 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.1
[6,] 0.3 0.1 0.1 0.1 0.1 0.1 0.2 0.1
[7,] 0.2 0.3 0.3 0.3 0.3 0.3 0.3 0.4
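The counts quoted above follow from the standard stars-and-bars formulas, and can be cross-checked with a few binomial coefficients (a quick sketch in Python, since this is just arithmetic):

```python
from math import comb

# Compositions of k = 1/p into m positive parts number C(k-1, m-1);
# allowing zero parts gives C(k+m-1, m-1).
assert comb(10 - 1, 7 - 1) == 84               # p = 0.1, zeros prohibited
assert comb(10 + 7 - 1, 7 - 1) == 8008         # p = 0.1, zeros allowed
assert comb(100 - 1, 7 - 1) == 1120529256      # p = 0.01, zeros prohibited
assert comb(100 + 7 - 1, 7 - 1) == 1705904746  # p = 0.01, zeros allowed
```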

Generate combinations of values which sum to one, sorted in descending order

A non-base-R possibility:

library(partitions)

step <- 0.1
n_weights <- 3

t(restrictedparts(n = 1/step, m = n_weights) * step)
# [1,] 1.0 0.0 0.0
# [2,] 0.9 0.1 0.0
# [3,] 0.8 0.2 0.0
# [4,] 0.7 0.3 0.0
# [5,] 0.6 0.4 0.0
# [6,] 0.5 0.5 0.0
# [7,] 0.8 0.1 0.1
# [8,] 0.7 0.2 0.1
# [9,] 0.6 0.3 0.1
# [10,] 0.5 0.4 0.1
# [11,] 0.6 0.2 0.2
# [12,] 0.5 0.3 0.2
# [13,] 0.4 0.4 0.2
# [14,] 0.4 0.3 0.3
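If the partitions package is not available, the same table can be produced with a short recursive sketch; here in Python for illustration (partitions_at_most is a name of my own, not from any library), though the row order differs from restrictedparts:

```python
def partitions_at_most(n, k, largest=None):
    # Partitions of n into at most k parts, each part listed in non-increasing order.
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    if k == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions_at_most(n - first, k - 1, first):
            yield (first,) + rest

step = 0.1
parts = round(1 / step)  # 10
# pad each partition with zeros to a fixed width of 3 weights
rows = [p + (0,) * (3 - len(p)) for p in partitions_at_most(parts, 3)]
print(len(rows))  # 14, matching the restrictedparts output above
```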

Generate first N combinations of length L from weighted set in order of weight

Below is some enumeration code that uses a heap. The implementation principle is slightly different from what user3386109 proposes in their comment.

Order the symbols by decreasing probability. There's a constructive one-to-one correspondence between length-L combinations of S symbols and binary strings of length S + L − 1 with S − 1 zeros (counting out each symbol in unary, with the S − 1 zeros as delimiters). We can do bit-at-a-time enumeration of the possibilities for the latter.

The part that saves us from having to enumerate every single combination is that, for each binary prefix, the most probable word can be found by repeating the most probable letter still available. By storing the prefixes in a heap, we can open only the ones that appear in the top N.

Note that this uses memory proportional to the number of enumerated combinations. This may still be too much, in which case you probably want something like iterative deepening depth-first search.

symbol_probability_dict = {"a": 0.7, "b": 0.1, "c": 0.3, "z": 0.01}
L = 4

import heapq
import math

loss_symbol_pairs = [(-math.log(p), c) for (c, p) in symbol_probability_dict.items()]
loss_symbol_pairs.sort()

heap = [(0, 0, "")]
while heap:
    min_loss, i, s = heapq.heappop(heap)
    if len(s) < L:
        # extend the prefix with the cheapest symbol still available
        heapq.heappush(heap, (min_loss, i, s + loss_symbol_pairs[i][1]))
        if i + 1 < len(loss_symbol_pairs):
            # alternatively, bump the floor for all remaining positions to i + 1
            heapq.heappush(
                heap,
                (
                    min_loss
                    + (L - len(s))
                    * (loss_symbol_pairs[i + 1][0] - loss_symbol_pairs[i][0]),
                    i + 1,
                    s,
                ),
            )
    else:
        print(s)

Output:

aaaa
aaac
aacc
aaab
accc
aacb
cccc
accb
aabb
aaaz
cccb
acbb
aacz
ccbb
abbb
accz
aabz
cbbb
cccz
acbz
bbbb
ccbz
abbz
aazz
cbbz
aczz
bbbz
cczz
abzz
cbzz
bbzz
azzz
czzz
bzzz
zzzz
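As a sanity check, the 35 lines of output agree with the count of length-L multisets over S symbols implied by the binary-string correspondence described above:

```python
from math import comb

S, L = 4, 4  # four symbols {a, b, c, z}, words of length 4
# binary strings of length S + L - 1 with S - 1 delimiter zeros
assert comb(S + L - 1, S - 1) == 35
```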

Enumerating distributions

Based on your response in the comments above, here's a recursive solution in Ruby:

$resolution = 100
$res_f = $resolution.to_f

def allocate(remainder, bin_number, all_bin_values, number_of_bins)
  if bin_number >= number_of_bins
    all_bin_values << remainder / $res_f
    puts all_bin_values.join(", ")
    all_bin_values.pop
  else
    remainder.downto(1) do |i|
      if remainder - i >= number_of_bins - bin_number
        all_bin_values << i / $res_f
        allocate(remainder - i, bin_number + 1, all_bin_values, number_of_bins)
        all_bin_values.pop
      end
    end
  end
end

num_bins = (ARGV.shift || 4).to_i
allocate($resolution, 1, [], num_bins)

The number of containers defaults to 4, but you can override that at run time by providing a command-line argument.

ADDENDUM

I was surprised by your comment that a looped version "was way too slow". All else being equal, looping should be faster than recursion, and that was the case when I timed the iterative version given here:

resolution = 100
res_f = resolution.to_f

resolution.downto(1) do |first|
  remainder1 = resolution - first
  remainder1.downto(1) do |second|
    remainder2 = remainder1 - second
    remainder2.downto(1) do |third|
      fourth = remainder2 - third
      printf "%g, %g, %g, %g\n", first / res_f,
             second / res_f, third / res_f, fourth / res_f if fourth > 0
    end
  end
end

Although this is faster, the downside is that supporting a different number of containers would require modifying the code by adding or removing nested loops.
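The generalization that the nested-loop version lacks (one loop level per container) is exactly what a recursive generator provides. A sketch of the idea, written in Python rather than Ruby for neutrality (the function name compositions is my own):

```python
def compositions(total, bins):
    # All ordered ways to write `total` as a sum of `bins` positive integers;
    # recursion replaces one hard-coded loop level per bin.
    if bins == 1:
        yield (total,)
        return
    for first in range(total - bins + 1, 0, -1):
        for rest in compositions(total - first, bins - 1):
            yield (first,) + rest

# 10 units over 4 bins: C(9, 3) = 84 ordered splits, each summing to 10
splits = list(compositions(10, 4))
print(len(splits))  # 84
```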

Finding all possible combinations of numbers to reach a given sum

This problem can be solved by recursively building all possible partial sums and keeping those that hit the target. Here is the algorithm in Python:

def subset_sum(numbers, target, partial=[]):
    s = sum(partial)

    # check if the partial sum equals the target
    if s == target:
        print("sum(%s)=%s" % (partial, target))
    if s >= target:
        return  # if we reach the number why bother to continue

    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        subset_sum(remaining, target, partial + [n])


if __name__ == "__main__":
    subset_sum([3, 9, 8, 4, 5, 7, 10], 15)

#Outputs:
#sum([3, 8, 4])=15
#sum([3, 5, 7])=15
#sum([8, 7])=15
#sum([5, 10])=15

This type of algorithm is explained very well in the following Stanford Abstract Programming lecture; the video is highly recommended for understanding how recursion generates permutations of solutions.

Edit

The above as a generator function, making it a bit more useful. Requires Python 3.3+ because of yield from.

def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
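A quick self-contained check of the generator version (the definition is repeated here so the snippet runs on its own):

```python
def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        yield from subset_sum(numbers[i + 1:], target, partial + [n], partial_sum + n)

# collect all solutions instead of printing them
print(list(subset_sum([3, 9, 8, 4, 5, 7, 10], 15)))
# [[3, 8, 4], [3, 5, 7], [8, 7], [5, 10]]
```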

Here is the Java version of the same algorithm:

package tmp;

import java.util.ArrayList;
import java.util.Arrays;

class SumSet {
    static void sum_up_recursive(ArrayList<Integer> numbers, int target, ArrayList<Integer> partial) {
        int s = 0;
        for (int x : partial) s += x;
        if (s == target)
            System.out.println("sum(" + Arrays.toString(partial.toArray()) + ")=" + target);
        if (s >= target)
            return;
        for (int i = 0; i < numbers.size(); i++) {
            ArrayList<Integer> remaining = new ArrayList<Integer>();
            int n = numbers.get(i);
            for (int j = i + 1; j < numbers.size(); j++) remaining.add(numbers.get(j));
            ArrayList<Integer> partial_rec = new ArrayList<Integer>(partial);
            partial_rec.add(n);
            sum_up_recursive(remaining, target, partial_rec);
        }
    }

    static void sum_up(ArrayList<Integer> numbers, int target) {
        sum_up_recursive(numbers, target, new ArrayList<Integer>());
    }

    public static void main(String args[]) {
        Integer[] numbers = {3, 9, 8, 4, 5, 7, 10};
        int target = 15;
        sum_up(new ArrayList<Integer>(Arrays.asList(numbers)), target);
    }
}

It is exactly the same algorithm. My Java is a bit rusty, but I think it is easy to understand.

C# conversion of Java solution: (by @JeremyThompson)

public static void Main(string[] args)
{
    List<int> numbers = new List<int>() { 3, 9, 8, 4, 5, 7, 10 };
    int target = 15;
    sum_up(numbers, target);
}

private static void sum_up(List<int> numbers, int target)
{
    sum_up_recursive(numbers, target, new List<int>());
}

private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
    int s = 0;
    foreach (int x in partial) s += x;

    if (s == target)
        Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);

    if (s >= target)
        return;

    for (int i = 0; i < numbers.Count; i++)
    {
        List<int> remaining = new List<int>();
        int n = numbers[i];
        for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);

        List<int> partial_rec = new List<int>(partial);
        partial_rec.Add(n);
        sum_up_recursive(remaining, target, partial_rec);
    }
}

Ruby solution: (by @emaillenin)

def subset_sum(numbers, target, partial=[])
  s = partial.inject 0, :+

  # check if the partial sum equals the target
  puts "sum(#{partial})=#{target}" if s == target

  return if s >= target # if we reach the number why bother to continue

  (0..(numbers.length - 1)).each do |i|
    n = numbers[i]
    remaining = numbers.drop(i+1)
    subset_sum(remaining, target, partial + [n])
  end
end

subset_sum([3,9,8,4,5,7,10],15)

Edit: complexity discussion

As others mention, this is an NP-hard problem. It can be solved in exponential time O(2^n); for instance, for n = 10 there are 1024 possible subsets to examine. If the targets you are trying to reach are in a low range, then this algorithm works. So for instance:

subset_sum([1,2,3,4,5,6,7,8,9,10],100000) generates 1024 branches because the target never gets to filter out possible solutions.

On the other hand subset_sum([1,2,3,4,5,6,7,8,9,10],10) generates only 175 branches, because the target to reach 10 gets to filter out many combinations.

If N and the target are big numbers, one should move to an approximate version of the solution.
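For integer inputs there is also a standard pseudo-polynomial alternative worth knowing when N and the target grow: a dynamic-programming table over reachable sums, running in O(n * target) time, which decides existence (though it does not enumerate the subsets). A sketch, not from the original answer:

```python
def subset_sum_exists(numbers, target):
    # reachable[s] is True iff some subset of the numbers seen so far sums to s
    reachable = [True] + [False] * target
    for n in numbers:
        # iterate backwards so each number is used at most once
        for s in range(target, n - 1, -1):
            if reachable[s - n]:
                reachable[s] = True
    return reachable[target]

print(subset_sum_exists([3, 9, 8, 4, 5, 7, 10], 15))  # True
print(subset_sum_exists([3, 9, 8], 15))               # False
```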

R: sample() command subject to a constraint

To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:

library(gtools)
perms <- permutations(8, 7, 0:7, repeats.allowed=T)
perms7 <- perms[rowSums(perms) == 7,]

From nrow(perms7), we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:

set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=T),]
head(my.perms)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 0 0 0 2 5 0 0
# [2,] 1 3 0 1 2 0 0
# [3,] 1 4 1 1 0 0 0
# [4,] 1 0 0 3 0 3 0
# [5,] 0 2 0 0 0 5 0
# [6,] 1 1 2 0 0 2 1

An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick -- building perms7 took 0.3 seconds on my computer and building a 1 million-row my.perms took 0.04 seconds. If you need to draw many vectors this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7 instead of generating each vector separately.

Here's a distribution of counts of numbers in the sample:

#      0      1      2      3      4      5      6      7
# 323347 188162 102812  51344  22811   8629   2472    423
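The figure of 1716 valid rows can be confirmed directly by stars and bars: length-7 vectors of non-negative integers summing to 7 number C(7 + 7 − 1, 7 − 1). A brute-force enumeration over all 8^7 candidate rows agrees:

```python
from itertools import product
from math import comb

assert comb(7 + 7 - 1, 7 - 1) == 1716

# cross-check against the same enumeration gtools::permutations performs
count = sum(1 for v in product(range(8), repeat=7) if sum(v) == 7)
print(count)  # 1716
```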

Combinations of weighted elements in a set where weighted sum equal to fixed integer (in python)

Looping through all n! permutations is much too expensive. Instead, generate all 2^n subsets.

from itertools import chain, combinations

def weight(A):
    return sum(weights[x] for x in A)

# Copied from the example at http://docs.python.org/library/itertools.html
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

[x for x in powerset({'A', 'B', 'C', 'D', 'E'}) if weight(x) == W]

yields

[('A', 'D'), ('C', 'B'), ('C', 'E'), ('A', 'B', 'E'), ('B', 'E', 'D')]

This can be converted to sorted tuples by changing the return part of the list comprehension to tuple(sorted(x)), or by replacing the list call in powerset with one to sorted.
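The answer assumes weights and W come from the question. One hypothetical assignment that reproduces the listed result is weights = {'A': 2, 'B': 1, 'C': 3, 'D': 2, 'E': 1} with W = 4 (values chosen purely for illustration):

```python
from itertools import chain, combinations

weights = {'A': 2, 'B': 1, 'C': 3, 'D': 2, 'E': 1}  # hypothetical values
W = 4

def weight(A):
    return sum(weights[x] for x in A)

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# use a list (not a set) so the iteration order is deterministic
hits = [x for x in powerset(['A', 'B', 'C', 'D', 'E']) if weight(x) == W]
print(sorted(hits))
# [('A', 'B', 'E'), ('A', 'D'), ('B', 'C'), ('B', 'D', 'E'), ('C', 'E')]
```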

Python: get every possible combination of weights for a portfolio

Something like this, although it's not really efficient.

from decimal import Decimal
import itertools

# possible optimization: use integers rather than Decimal
weights = [Decimal("-0.4"), Decimal("-0.2"), Decimal(0), Decimal("0.2"), Decimal("0.4")]

def possible_weightings(n=5, target=0):
    for all_bar_one in itertools.product(weights, repeat=n - 1):
        final = target - sum(all_bar_one)
        if final in weights:
            yield all_bar_one + (final,)

I repeat from the comments: you cannot do this for n = 50. The code yields the right values, but there isn't time in the universe to iterate over all the possible weightings.

This code isn't brilliant. It does some unnecessary work examining cases where, for example, the sum of all but the first two is already greater than 0.8 and so there's no point separately checking all the possibilities for the first of those two.

So, this does n = 5 in almost no time, but there is some value of n at which this code becomes infeasibly slow, and better code would get you further. You still won't get to 50. I'm too lazy to write that better code, but the idea is this: instead of all_bar_one, make recursive calls to possible_weightings with successively smaller values of n, passing a target equal to the target you were given minus the sum so far. Then prune all the branches you don't need to take, bailing out early whenever the target is too large (positive or negative) to be reached using only n values.
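The pruned recursion sketched in the last paragraph might look like this (my own rendering of the idea, not the answer's code; pruned_weightings is a made-up name):

```python
from decimal import Decimal
from itertools import product

weights = [Decimal("-0.4"), Decimal("-0.2"), Decimal(0), Decimal("0.2"), Decimal("0.4")]

def pruned_weightings(n, target=Decimal(0)):
    # Yield n-tuples drawn from `weights` that sum to `target`.
    if n == 0:
        if target == 0:
            yield ()
        return
    # prune: even picking all-min or all-max values cannot bridge the gap
    if target < n * min(weights) or target > n * max(weights):
        return
    for w in weights:
        for rest in pruned_weightings(n - 1, target - w):
            yield (w,) + rest

# agrees with the brute-force product for n = 5
brute = [t for t in product(weights, repeat=5) if sum(t) == 0]
assert sorted(pruned_weightings(5)) == sorted(brute)
```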


