Translating Function for Finding All Partitions of a Set from Python to Ruby

Finding all partitions of a set in Java

You're very close to the right answer. You say you are getting infinite recursion, but in reality the program is stuck in an infinite loop: the outermost loop never terminates.

The primary difference from the Python code is that the i variable always advances in the outer loop in the Python version, but in your Java version, the i >>= 1 statement inside the inner loop always leaves i back at zero. The easy way to fix that is to simply use separate variables for the inner and outer loops.
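
To see why the Python original is immune to this, here is a minimal sketch (the name `visited_masks` is made up for illustration). A Python `for` loop rebinds its variable from the `range` iterator at the start of every iteration, so shifting `i` inside the body cannot change how the loop advances:

```python
def visited_masks(limit, width):
    """Record the value the outer loop hands out on each iteration."""
    seen = []
    for i in range(limit):
        seen.append(i)
        # Shift the loop variable down to zero, as the inner loop does.
        for _ in range(width):
            i >>= 1
    return seen

print(visited_masks(4, 3))  # [0, 1, 2, 3]
```

A C-style `for (int i = 0; i < limit; ++i)` has no such rebinding, so the same body would keep restarting from zero.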

In general, this is why it's a bad idea to try and directly translate a program from one language to another. Almost every program has some idioms that make sense in the original language that will be bizarre or meaningless in the target language. In particular, the Python code relies on implicit promotion to arbitrary precision integers for its correctness. This won't work well in Java, so the implementation below suffers from integer overflow if the input set is larger than 31 elements. Your example is only 4 elements, so for this specific case, it will produce the right answer.
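
As a rough illustration of that overflow, the sketch below emulates Java's 32-bit `1 << (size - 1)` in Python. The helper name `java_int_shl` is hypothetical; it relies on the fact that Java masks an `int` shift distance to its low 5 bits and wraps the result to 32 signed bits.

```python
def java_int_shl(value, shift):
    """Emulate Java's `value << shift` for 32-bit int operands."""
    shift &= 31                              # only the low 5 bits of the distance are used
    result = (value << shift) & 0xFFFFFFFF   # keep the low 32 bits
    # Reinterpret the bit pattern as a signed two's-complement value.
    return result - (1 << 32) if result >= (1 << 31) else result

for size in (4, 31, 32, 33):
    print(size, java_int_shl(1, size - 1))
# 4 8
# 31 1073741824
# 32 -2147483648
# 33 1
```

With 32 elements the limit is negative, so the outer loop never runs; with 33 it wraps around to 1, so only a single split is tried.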

Here's a corrected Java version:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Partition {
    private static List<List<List<String>>> partitions(List<String> inputSet) {
        List<List<List<String>>> res = new ArrayList<>();
        if (inputSet.isEmpty()) {
            List<List<String>> empty = new ArrayList<>();
            res.add(empty);
            return res;
        }
        // Note that this algorithm only works if inputSet.size() <= 31
        // since the shift below overflows int beyond that. This is true
        // even if you use Math.pow and cast back to int. The original
        // Python code does not have this limitation because Python
        // integers are arbitrary precision (Python 2 implicitly promotes
        // to long), similar to Java's BigInteger.
        int limit = 1 << (inputSet.size() - 1);
        // Note the separate variable to avoid resetting
        // the loop variable on each iteration.
        for (int j = 0; j < limit; ++j) {
            List<List<String>> parts = new ArrayList<>();
            List<String> part1 = new ArrayList<>();
            List<String> part2 = new ArrayList<>();
            parts.add(part1);
            parts.add(part2);
            int i = j;
            for (String item : inputSet) {
                parts.get(i&1).add(item);
                i >>= 1;
            }
            for (List<List<String>> b : partitions(part2)) {
                List<List<String>> holder = new ArrayList<>();
                holder.add(part1);
                holder.addAll(b);
                res.add(holder);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        for (List<List<String>> partitions :
                 partitions(Arrays.asList("a", "b", "c", "d"))) {
            System.out.println(partitions);
        }
    }
}

Here's the output of my Java version:

[[a, b, c, d]]
[[b, c, d], [a]]
[[a, c, d], [b]]
[[c, d], [a, b]]
[[c, d], [b], [a]]
[[a, b, d], [c]]
[[b, d], [a, c]]
[[b, d], [c], [a]]
[[a, d], [b, c]]
[[a, d], [c], [b]]
[[d], [a, b, c]]
[[d], [b, c], [a]]
[[d], [a, c], [b]]
[[d], [c], [a, b]]
[[d], [c], [b], [a]]

partition of a set or all possible subgroups of a list

Instead of generating all permutations and removing the duplicates, which was my initial thought, you can use this recursive function, which I found here and here:

def partitions(set_):
    if not set_:
        yield []
        return
    for i in range(2**len(set_)//2):
        parts = [set(), set()]
        for item in set_:
            parts[i&1].add(item)
            i >>= 1
        for b in partitions(parts[1]):
            yield [parts[0]]+b

l = [1, 2, 3, 4]
for p in reversed(sorted(partitions(l))):
    print(p)
print('The Bell number is', len(list(partitions(l))))

It prints:

[{1, 2, 3, 4}]
[{1, 2, 4}, {3}]
[{1, 4}, {2, 3}]
[{1, 4}, {3}, {2}]
[{2, 4}, {1, 3}]
[{2, 4}, {3}, {1}]
[{1, 3, 4}, {2}]
[{2, 3, 4}, {1}]
[{3, 4}, {1, 2}]
[{3, 4}, {2}, {1}]
[{4}, {1, 2, 3}]
[{4}, {1, 3}, {2}]
[{4}, {2, 3}, {1}]
[{4}, {3}, {1, 2}]
[{4}, {3}, {2}, {1}]
The Bell number is 15

Algorithm to produce all partitions of a list in order

You can think of the problem as follows: each of the partitions you want is characterized by an integer between 0 and 2^(n-1) - 1. Each 1 in the binary representation of such a number corresponds to a "partition break" between two consecutive elements, e.g.

 a b|c|d e|f
  0 1 1 0 1

so the number 01101 corresponds to the partition {a,b},{c},{d,e},{f}. To generate the partition from a known partition number, loop through the list and slice off a new subset whenever the corresponding bit is set.

I can understand your pain reading the fashionable functional-programming-flavored Ruby example. Here's a complete example in Python if that helps.

array = ['a', 'b', 'c', 'd', 'e']
n = len(array)

for partition_index in range(2 ** (n-1)):

    # current partition, e.g., [['a', 'b'], ['c', 'd', 'e']]
    partition = []

    # used to accumulate the subsets, e.g., ['a', 'b']
    subset = []

    for position in range(n):

        subset.append(array[position])

        # check whether to "break off" a new subset
        if 1 << position & partition_index or position == n-1:
            partition.append(subset)
            subset = []

    print(partition)
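
If you want the same logic as a reusable function, here is one possible sketch (the names `split_by_breaks` and `all_contiguous_partitions` are mine, not from the question). Note the bit order: the loop above tests bit `position` counting from the least significant end, so the a b|c|d e|f diagram, read left to right as 0 1 1 0 1, corresponds to the Python literal `0b10110`.

```python
def split_by_breaks(items, mask):
    """Split items into consecutive runs; bit k of mask set means 'break after items[k]'."""
    partition, subset = [], []
    last = len(items) - 1
    for position, item in enumerate(items):
        subset.append(item)
        # Close the current run on a break bit, and always at the final element.
        if (mask >> position) & 1 or position == last:
            partition.append(subset)
            subset = []
    return partition

def all_contiguous_partitions(items):
    """One partition per (n-1)-bit mask; an empty input gives a single empty partition."""
    return [split_by_breaks(items, mask)
            for mask in range(1 << max(len(items) - 1, 0))]

print(split_by_breaks(list("abcdef"), 0b10110))
# [['a', 'b'], ['c'], ['d', 'e'], ['f']]
print(len(all_contiguous_partitions(list("abcde"))))  # 16
```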

Yielding sub combinations

You may change your code as follows:

def sub_combinations(segment):
  if len(segment) == 1:
    yield (segment,)
  else:
    for j in sub_combinations(segment[1:]):
      yield (segment[0],)+j
      for k in range(len(j)):
        yield (segment[0]+j[k],)+j[:k]+j[k+1:]

If your segment contains only one character, the result is trivial. Otherwise, split off the first character and determine all partitions of the rest of your string. From each of those you get the following (distinct) solutions: either the split-off character forms a tuple element of its own, or you add it to any one of the elements of the previous solution.

Due to the recursive calls this method builds the solution set from the single character case up to the full argument.

Your example case gives the following result:

('A', 'B', 'C', 'D')
('AB', 'C', 'D')
('AC', 'B', 'D')
('AD', 'B', 'C')
('A', 'BC', 'D')
('ABC', 'D')
('AD', 'BC')
('A', 'BD', 'C')
('ABD', 'C')
('AC', 'BD')
('A', 'B', 'CD')
('AB', 'CD')
('ACD', 'B')
('A', 'BCD')
('ABCD',)
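
As a quick sanity check, the number of results should equal the Bell number for the length of the input (15 for four elements). The sketch below repeats the generator so it runs on its own and compares the count against a Bell-triangle computation; the helper `bell` is an illustrative addition, not part of the original code.

```python
def sub_combinations(segment):
    # Same generator as above, repeated so this sketch is self-contained.
    if len(segment) == 1:
        yield (segment,)
    else:
        for j in sub_combinations(segment[1:]):
            yield (segment[0],) + j
            for k in range(len(j)):
                yield (segment[0] + j[k],) + j[:k] + j[k+1:]

def bell(n):
    """n-th Bell number, computed row by row with the Bell triangle."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]              # each row starts with the previous row's last entry
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]

results = list(sub_combinations("ABCD"))
print(len(results), len(set(results)), bell(4))  # 15 15 15
```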

How to find all partitions of a set

I've found a straightforward recursive solution.

First, let's solve a simpler problem: how to find all partitions consisting of exactly two parts. For an n-element set, we can count an int from 0 to (2^n)-1. This creates every n-bit pattern, with each bit corresponding to one input element. If the bit is 0, we place the element in the first part; if it is 1, the element is placed in the second part. This leaves one problem: For each partition, we'll get a duplicate result where the two parts are swapped. To remedy this, we'll always place the first element into the first part. We then only distribute the remaining n-1 elements by counting from 0 to (2^(n-1))-1.
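
For readers who prefer Python, the two-part step described above could be sketched roughly as follows. The function name `two_part_partitions` is hypothetical. The pattern starts at 1 rather than 0 so that the second part is never empty, which is also what the C# implementation in this answer does.

```python
def two_part_partitions(elements):
    """Yield (first, second) splits with elements[0] pinned to the first part."""
    n = len(elements)
    # Fewer than two elements: the range is empty and nothing is yielded.
    for pattern in range(1, 1 << max(n - 1, 0)):
        parts = ([elements[0]], [])
        # Bit (index - 1) of the pattern decides where elements[index] goes.
        for index in range(1, n):
            parts[(pattern >> (index - 1)) & 1].append(elements[index])
        yield parts

print(list(two_part_partitions([1, 2, 3])))
# [([1, 3], [2]), ([1, 2], [3]), ([1], [2, 3])]
```

For n elements this gives 2^(n-1) - 1 splits, e.g. 7 for a four-element input.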

Now that we can partition a set into two parts, we can write a recursive function that solves the rest of the problem. The function starts off with the original set and finds all two-part-partitions. For each of these partitions, it recursively finds all ways to partition the second part into two parts, yielding all three-part partitions. It then divides the last part of each of these partitions to generate all four-part partitions, and so on.

The following is an implementation in C#. Calling

Partitioning.GetAllPartitions(new[] { 1, 2, 3, 4 })

yields

{ {1, 2, 3, 4} },
{ {1, 3, 4}, {2} },
{ {1, 2, 4}, {3} },
{ {1, 4}, {2, 3} },
{ {1, 4}, {2}, {3} },
{ {1, 2, 3}, {4} },
{ {1, 3}, {2, 4} },
{ {1, 3}, {2}, {4} },
{ {1, 2}, {3, 4} },
{ {1, 2}, {3}, {4} },
{ {1}, {2, 3, 4} },
{ {1}, {2, 4}, {3} },
{ {1}, {2, 3}, {4} },
{ {1}, {2}, {3, 4} },
{ {1}, {2}, {3}, {4} }.

using System;
using System.Collections.Generic;
using System.Linq;

namespace PartitionTest {
    public static class Partitioning {
        public static IEnumerable<T[][]> GetAllPartitions<T>(T[] elements) {
            return GetAllPartitions(new T[][]{}, elements);
        }

        private static IEnumerable<T[][]> GetAllPartitions<T>(
            T[][] fixedParts, T[] suffixElements)
        {
            // A trivial partition consists of the fixed parts
            // followed by all suffix elements as one block
            yield return fixedParts.Concat(new[] { suffixElements }).ToArray();

            // Get all two-group-partitions of the suffix elements
            // and sub-divide them recursively
            var suffixPartitions = GetTuplePartitions(suffixElements);
            foreach (Tuple<T[], T[]> suffixPartition in suffixPartitions) {
                var subPartitions = GetAllPartitions(
                    fixedParts.Concat(new[] { suffixPartition.Item1 }).ToArray(),
                    suffixPartition.Item2);
                foreach (var subPartition in subPartitions) {
                    yield return subPartition;
                }
            }
        }

        private static IEnumerable<Tuple<T[], T[]>> GetTuplePartitions<T>(
            T[] elements)
        {
            // No result if less than 2 elements
            if (elements.Length < 2) yield break;

            // Generate all 2-part partitions
            for (int pattern = 1; pattern < 1 << (elements.Length - 1); pattern++) {
                // Create the two result sets and
                // assign the first element to the first set
                List<T>[] resultSets = {
                    new List<T> { elements[0] }, new List<T>() };
                // Distribute the remaining elements
                for (int index = 1; index < elements.Length; index++) {
                    resultSets[(pattern >> (index - 1)) & 1].Add(elements[index]);
                }

                yield return Tuple.Create(
                    resultSets[0].ToArray(), resultSets[1].ToArray());
            }
        }
    }
}

Number of partitions with a given constraint

Here's a solution using dynamic programming.

It starts from an empty set, then adds one element at a time and calculates all the valid partitions.

The state space is huge, but to calculate the next step we only need to know two things about a partition:

For each nationality, how many sets it contains that consist of only a single member of that nationality (e.g. {a}).
How many sets it contains with mixed elements (e.g. {a, b, c}).

For each of these configurations I only store the total count. Example:

[0, 1, 2, 2] -> 3
{a}{b}{c}{mixed} 
   e.g.: 3 partitions that look like: {b}, {c}, {c}, {a,c}, {b,c}

Here's the code in Python:

import collections
from functools import reduce
from operator import mul
from fractions import Fraction

def nCk(n,k):
  return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )

def good_partitions(l):
    n = len(l)
    i = 0
    prev = collections.defaultdict(int)
    while l:
        #any more from this kind?
        if l[0] == 0:
            l.pop(0)
            i += 1
            continue
        l[0] -= 1
        curr = collections.defaultdict(int)

        for solution,total in prev.items():
            for idx,item in enumerate(solution):
                my_solution = list(solution)
                if idx == i:
                    # add element as a new set
                    my_solution[i] += 1
                    curr[tuple(my_solution)] += total
                elif my_solution[idx]:
                    if idx != n:
                        # add to a set consisting of one element
                        # or merge into multiple sets that consist of one element
                        cnt = my_solution[idx]
                        c = cnt
                        while c > 0:
                            my_solution = list(solution)
                            my_solution[n] += 1
                            my_solution[idx] -= c
                            curr[tuple(my_solution)] += total * nCk(cnt, c)
                            c -= 1
                    else:
                        # add to a mixed set
                        cnt = my_solution[idx]
                        curr[tuple(my_solution)] += total * cnt

        if not prev:
            # one set with one element
            lone = [0] * (n+1)
            lone[i] = 1
            curr[tuple(lone)] = 1

        prev = curr
    return sum(prev.values())

print(good_partitions([1, 1, 1, 1]))      # 15
print(good_partitions([1, 1, 1, 1, 1]))   # 52
print(good_partitions([2, 1]))            # 4
print(good_partitions([13, 11, 8]))       # 29811734589499214658370837

It produces correct values for the test cases. I also tested it against a brute-force solution (for small values), and it produces the same results.
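
A brute-force check along those lines might look like the sketch below. It is not the original test code. It ASSUMES the constraint is that any set with two or more members must mix nationalities, which is how the state description above reads and which matches the small test values. The names `set_partitions` and `brute_force_good_partitions` are made up for illustration.

```python
def set_partitions(items):
    """Yield every partition of items as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into a block of its own ...
        yield [[first]] + partition
        # ... or into each existing block in turn.
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k+1:]

def brute_force_good_partitions(counts):
    # Label each element with its group index (its "nationality").
    labels = [group for group, count in enumerate(counts) for _ in range(count)]
    elements = list(enumerate(labels))  # (unique id, group) pairs
    good = 0
    for partition in set_partitions(elements):
        # ASSUMPTION: a block of two or more elements must mix groups.
        if all(len(block) == 1 or len({g for _, g in block}) > 1
               for block in partition):
            good += 1
    return good

print(brute_force_good_partitions([1, 1, 1, 1]))  # 15
print(brute_force_good_partitions([2, 1]))        # 4
```

This enumerates every partition, so it is only usable for a handful of elements.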

How to get all subsets of a set? (powerset)

The Python itertools page has a powerset recipe for exactly this:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

Output:

>>> list(powerset("abcd"))
[(), ('a',), ('b',), ('c',), ('d',), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd')]

If you don't like that empty tuple at the beginning, you can just change the range statement to range(1, len(s)+1) to avoid a 0-length combination.
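
For illustration, that variant could look like this (the name `nonempty_subsets` is mine; the itertools recipe itself is unchanged apart from the range):

```python
from itertools import chain, combinations

def nonempty_subsets(iterable):
    s = list(iterable)
    # Starting r at 1 skips the single 0-length combination.
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

print(list(nonempty_subsets("abc")))
# [('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
```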

if/elsif/else return behavior difference between Ruby and Python

The Ruby code isn't very idiomatic. The last return in the if-elsif-else branch is a red herring and inconsistent with the other branches in the conditional chain. There's an implicit return on the other branches in the if-elsif-else chain as well.

To generalize the above idea, in all Ruby functions and methods, if control reaches the end of the function without encountering a return, the return value is that of the last expression evaluated. In a conditional chain, that'd be whichever branch was taken.

A minimal example is:

def foo(n)
  if n == 0
    "zero"
  elsif n == 1
    "one"
  else
    "something else"
  end
end

puts foo(0) # => zero
puts foo(1) # => one
puts foo(2) # => something else

Adding return to any or all of the branches above does not change the behavior, so it is normally omitted.

Python's implicit return, on the other hand, is always None. Returning a non-None value involves using an explicit return ("explicit is better than implicit" I guess?).

def foo(n):
    if n == 0:
        return "zero"
    elif n == 1:
        return "one"
    else:
        return "something else"

print(foo(0)) # => zero
print(foo(1)) # => one
print(foo(2)) # => something else
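
For contrast, here is a small sketch of what happens if you write the Python function in the Ruby style and leave the returns out (the name `foo_without_returns` is just for illustration). Each string is evaluated and discarded, and the function falls off the end and returns None:

```python
def foo_without_returns(n):
    if n == 0:
        "zero"            # evaluated, then discarded
    elif n == 1:
        "one"
    else:
        "something else"

print(foo_without_returns(0))  # None
print(foo_without_returns(2))  # None
```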

Another note about the Ruby code: usually ! after a function/method name is used for in-place algorithms, that is, ones that modify the input, or otherwise more dangerous versions of non-bang methods. Since quickselect! doesn't modify any class state, it's confusing that it has a bang.