How to Find All the Subsets of a Set, with Exactly N Elements

How can I find all the subsets of a set, with exactly n elements?

itertools.combinations is your friend if you have Python 2.6 or greater. Otherwise, check the link for an implementation of an equivalent function.

import itertools
def findsubsets(S,m):
    return set(itertools.combinations(S, m))

S: The set for which you want to find subsets

m: The number of elements in the subset

How to find all possible n-elements subsets of a set?

I saw this in topcoder's algorithm tutorials.Take some time and understand how it works.

Iterate through all k-element subsets of {0, 1, … N-1}

   int s = (1 << k) - 1;
   while (!(s & 1 << N))
   {
     // do stuff with s
     int lo = s & ~(s - 1);       // lowest one bit
     int lz = (s + lo) & ~s;      // lowest zero bit above lo
     s |= lz;                     // add lz to the set
     s &= ~(lz - 1);              // reset bits below lz
     s |= (lz / lo / 2) - 1;      // put back right number of bits at end
   }

Finding all the subsets of a set

It's very simple to do this recursively. The basic idea is that for each element, the set of subsets can be divided equally into those that contain that element and those that don't, and those two sets are otherwise equal.

For n=1, the set of subsets is {{}, {1}}
For n>1, find the set of subsets of 1,...,n-1 and make two copies of it. For one of them, add n to each subset. Then take the union of the two copies.

Edit To make it crystal clear:

The set of subsets of {1} is {{}, {1}}
For {1, 2}, take {{}, {1}}, add 2 to each subset to get {{2}, {1, 2}} and take the union with {{}, {1}} to get {{}, {1}, {2}, {1, 2}}
Repeat till you reach n

How to get all subsets of a set? (powerset)

The Python itertools page has exactly a powerset recipe for this:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

Output:

>>> list(powerset("abcd"))
[(), ('a',), ('b',), ('c',), ('d',), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd')]

If you don't like that empty tuple at the beginning, you can just change the range statement to range(1, len(s)+1) to avoid a 0-length combination.

Finding all subsets of specified size

Let us call n the size of the array and k the number of elements to be out in a subarray.

Let us consider the first element A[0] of the array A.

If this element is put in the subset, the problem becomes a (n-1, k-1) similar problem.

If not, it becomes a (n-1, k) problem.

This can be simply implemented in a recursive function.

We just have to pay attention to deal with the extreme cases k == 0 or k > n.

During the process, we also have to keep trace of:

n: the number of remaining elements of A to consider
k: the number of elements that remain to be put in the current subset
index: the index of the next element of A to consider
The current_subset array that memorizes the elements already selected.
Here is a simple code in c++ to illustrate the algorithm

Output

For 5 elements and subsets of size 3:

#include    <iostream>
#include    <vector>

void print (const std::vector<std::vector<int>>& subsets) {
    for (auto &v: subsets) {
        for (auto &x: v) {
            std::cout << x << " ";
        }
        std::cout << "\n";
    }
}
//  n: number of remaining elements of A to consider
//  k: number of elements that remain to be put in the current subset
//  index: index of next element of A to consider

void Get_subset_rec (std::vector<std::vector<int>>& subsets, int n, int k, int index, std::vector<int>& A, std::vector<int>& current_subset) {
    if (n < k) return;   
    if (k == 0) {
        subsets.push_back (current_subset);
        return;
    }  
    Get_subset_rec (subsets, n-1, k, index+1, A, current_subset);
    current_subset.push_back(A[index]);
    Get_subset_rec (subsets, n-1, k-1, index+1, A, current_subset);
    current_subset.pop_back();         // remove last element
    return;
}

void Get_subset (std::vector<std::vector<int>>& subsets, int subset_length, std::vector<int>& A) {
    std::vector<int> current_subset;
    Get_subset_rec (subsets, A.size(), subset_length, 0, A, current_subset);
}

int main () {
    int subset_length = 3;     // subset size
    std::vector A = {1, 2, 3, 4, 5};
    int size = A.size();
    std::vector<std::vector<int>> subsets;

    Get_subset (subsets, subset_length, A);
    std::cout << subsets.size() << "\n";
    print (subsets);
}

Live demo

Find all subsets of a set that sum up to n

It has been pointed out that I've done a mistake. I were multiplying the complexities of recursive calls while I should have added them. So C(N) = C(N-1) + C(N-2) + .... The same would then apply to C(N-1), C(N-2), etc. This means that the complexity isnt' O(N!).

This have made me thinking on the algorithm from another point of view. It is checking every single possible subset. Since there are 2^N - 1 possible subsets (the empty subset is not taken into account), then the complexity is O(2^N), which I think is your original bet.

Is there any algorithm to generate all subsets of a given set in O(n^2) time?

If you want to for example print all the subsets or store them explicitly anywhere there is no such algorithm. Notice that there are 2^n subsets of a set of n elements and simply enumerating them requires an exponential time.

Find unique compositions of k-distinct-element subsets for a set of n elements

I implemented the integral maximum flow construction that's used to prove Baranyai's theorem. More details in your favorite textbook covering factors of a complete hypergraph.

from collections import defaultdict
from fractions import Fraction
from math import factorial
from operator import itemgetter

def binomial(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

def find_path(graph, s, t):
    stack = [s]
    predecessor = {s: t}
    while stack:
        v = stack.pop()
        for u in graph[v]:
            if u not in predecessor:
                stack.append(u)
                predecessor[u] = v
    assert t in predecessor
    path = [t]
    while path[-1] != s:
        path.append(predecessor[path[-1]])
    path.reverse()
    return path

def round_flow(flow):
    while True:
        capacities = []
        for (u, v), x in flow.items():
            z = x - x.numerator // x.denominator
            if z:
                capacities.append(((v, u), z))
                capacities.append(((u, v), 1 - z))
        if not capacities:
            break
        (t, s), delta = min(capacities, key=itemgetter(1))
        graph = defaultdict(list)
        for (v, u), z in capacities:
            if (v, u) not in [(s, t), (t, s)]:
                graph[v].append(u)
        path = find_path(graph, s, t)
        for i, v in enumerate(path):
            u = path[i - 1]
            if (u, v) in flow:
                flow[(u, v)] += delta
            else:
                flow[(v, u)] -= delta

def baranyai(n, k):
    m, r = divmod(n, k)
    assert not r
    M = binomial(n - 1, k - 1)
    partition = [[()] * m for i in range(M)]
    for l in range(n):
        flow = defaultdict(Fraction)
        for i, A_i in enumerate(partition):
            for S in A_i:
                flow[(i, S)] += Fraction(k - len(S), n - l)
        round_flow(flow)
        next_partition = []
        for i, A_i in enumerate(partition):
            next_A_i = []
            for S in A_i:
                if flow[(i, S)]:
                    next_A_i.append(S + (l,))
                    flow[(i, S)] -= 1
                else:
                    next_A_i.append(S)
            next_partition.append(next_A_i)
        partition = next_partition
    assert len(partition) == M
    classes = set()
    for A in partition:
        assert len(A) == m
        assert all(len(S) == k for S in A)
        assert len({x for S in A for x in S}) == n
        classes.update(map(frozenset, A))
    assert len(classes) == binomial(n, k)
    return partition

if __name__ == '__main__':
    print(baranyai(9, 3))
    print(baranyai(20, 2))

Let me dump an email that I wrote about this answer here, since it may be useful to others.

Unfortunately there's nothing that would be better suited as a source for transliteration.

The construction I used is due to Brouwer and Schrijver (1979). I could see only half of it at the time because I was looking on Google Books, but there's a PDF floating around now. It's a high-level mathematical description of an inductive proof that sets up a max flow problem, exhibits a fractional solution, and asserts the existence of an integer solution without going into detail about how to get it. My Python implementation used pipage rounding to follow the exact structure of the proof, but if you're just trying to get work done I would suggest calling out to an R library that computes max flows (e.g., igraph).

Let me dash off a concrete example of how to set up the max flows because the abstract proof was pretty mysterious to me before I figured it out. The smallest non trivial example is n=6 and k=3. This means we have (6-1) choose (3-1) = 10 partitions, each with 2 sets of size 3. Starting off with a 10 by 2 by 3 array having all elements blank, the B&S proof calls for us to place the 1s (one per row), then the 2s, then ..., then the 6s. Placing the 1s is boring due to symmetry considerations, so I'll just do it without explanation. (You don't need to special case this in code.)

{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}
{1,_,_}, {_,_,_}

Starting with the 2s, things get interesting. The key invariant in intuitive English is that we want each censored subset to appear exactly as many times as we would expect. Of the 6 choose 3 = 20 size-3 subsets of {1, 2, ..., 6} with numbers >2 censored with , there are 4 choose 1 = 4 subsets that look like {1,2,} and 4 choose 2 = 6 that look like {1,,} and 4 choose 2 = 6 that look like {2,,} and 4 choose 3 = 4 that look like {,,_}. What we want to do is place one 2 in each row to get this distribution. Easy enough to do this one by hand:

{1,2,_}, {_,_,_}
{1,2,_}, {_,_,_}
{1,2,_}, {_,_,_}
{1,2,_}, {_,_,_}
{1,_,_}, {2,_,_}
{1,_,_}, {2,_,_}
{1,_,_}, {2,_,_}
{1,_,_}, {2,_,_}
{1,_,_}, {2,_,_}
{1,_,_}, {2,_,_}

When we get to 3 we have 3 choose 0 = 1x {1,2,3}, 3 choose 1 = 3x {1,2,}, 3x {1,3,}, 3x {2,3,}, 3 choose 2 = 3x {1,,}, 3x {2,,}, 3x {3,,}, 3 choose 3 = 1x {,,}. Actual choices to make! This is where max flow comes in.

Let ell be the number that we're placing. The flow network we construct has a source, a vertex for each row, a vertex for each censored subset containing ell, and a sink. There is an arc from the source to each row vertex of capacity 1. There is an arc from each censored subset S to the sink of capacity (n-1 - ell) choose (k - |S|). There is an arc of capacity 1 from each row vertex to each censored subset that could appear in that row if we placed ell there.

Lettering the rows a..j, the middle arcs look like

a-->{1,2,3}
a-->{3,_,_}
b-->{1,2,3}
b-->{3,_,_}
c-->{1,2,3}
c-->{3,_,_}
d-->{1,2,3}
d-->{3,_,_}
e-->{1,3,_}
e-->{2,3,_}
...
j-->{1,3,_}
j-->{2,3,_}

Get an integral max flow, which will place exactly one ell in each row. The placement looks something like

{1,2,3}, {_,_,_}
{1,2,_}, {3,_,_}
{1,2,_}, {3,_,_}
{1,2,_}, {3,_,_}
{1,3,_}, {2,_,_}
{1,3,_}, {2,_,_}
{1,3,_}, {2,_,_}
{1,_,_}, {2,3,_}
{1,_,_}, {2,3,_}
{1,_,_}, {2,3,_}

And on it goes until we get

{1,2,3}, {4,5,6}
{1,2,4}, {3,5,6}
{1,2,5}, {3,4,6}
{1,2,6}, {3,4,5}
{1,3,4}, {2,5,6}
{1,3,5}, {2,4,6}
{1,3,6}, {2,4,5}
{1,4,5}, {2,3,6}
{1,4,6}, {2,3,5}
{1,5,6}, {2,3,4}

Hope this helps!