Algorithm to return all combinations of k elements from n
Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.
Gray Codes
An issue that you will come across is of course memory and pretty quickly, you'll have problems by 20 elements in your set -- 20C3 = 1140. And if you want to iterate over the set it's best to use a modified gray code algorithm so you aren't holding all of them in memory. These generate the next combination from the previous and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? minimize? et cetera.
Some of the original papers describing gray codes:
- Some Hamilton Paths and a Minimal Change Algorithm
- Adjacent Interchange Combination Generation Algorithm
Here are some other papers covering the topic:
- An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
- Combination Generators
- Survey of Combinatorial Gray Codes (PostScript)
- An Algorithm for Gray Codes
Chase's Twiddle (algorithm)
Phillip J Chase, `Algorithm 382: Combinations of M out of N Objects' (1970)
The algorithm in C...
Index of Combinations in Lexicographical Order (Buckles Algorithm 515)
You can also reference a combination by its index (in lexicographical order). Realizing that the index should be some amount of change from right to left based on the index we can construct something that should recover a combination.
So, we have a set {1,2,3,4,5,6}... and we want three elements. Let's say {1,2,3} we can say that the difference between the elements is one and in order and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of 'changes' in the last place accounts for one change in the lexicographical ordering. The second place, with one change {1,3,4} has one change but accounts for more change since it's in the second place (proportional to the number of elements in the original set).
The method I've described is a deconstruction, as it seems, from set to the index, we need to do the reverse – which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes – I used the index of the sets rather than a number range to represent the set, so we are always working from 0...n.
Note:
- Since combinations are unordered, {1,3,2} = {1,2,3} --we order them to be lexicographical.
- This method has an implicit 0 to start the set for the first difference.
Index of Combinations in Lexicographical Order (McCaffrey)
There is another way:, its concept is easier to grasp and program but it's without the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:
The set that maximizes , where .
For an example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1)
. So, the 27th lexicographical combination of four things is: {1,2,5,6}, those are the indexes of whatever set you want to look at. Example below (OCaml), requires choose
function, left to reader:
(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
(* maximize function -- maximize a that is aCb *)
(* return largest c where c < i and choose(c,i) <= z *)
let rec maximize a b x =
if (choose a b ) <= x then a else maximize (a-1) b x
in
let rec iterate n x i = match i with
| 0 -> []
| i ->
let max = maximize n i x in
max :: iterate n (x - (choose max i)) (i-1)
in
if x < 0 then failwith "errors" else
let idxs = iterate (List.length set) x k in
List.map (List.nth set) (List.sort (-) idxs)
A small and simple combinations iterator
The following two algorithms are provided for didactic purposes. They implement an iterator and (a more general) folder overall combinations.
They are as fast as possible, having the complexity O(nCk). The memory consumption is bound by k
.
We will start with the iterator, which will call a user provided function for each combination
let iter_combs n k f =
let rec iter v s j =
if j = k then f v
else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
iter [] 0 0
A more general version will call the user provided function along with the state variable, starting from the initial state. Since we need to pass the state between different states we won't use the for-loop, but instead, use recursion,
let fold_combs n k f x =
let rec loop i s c x =
if i < n then
loop (i+1) s c @@
let c = i::c and s = s + 1 and i = i + 1 in
if s < k then loop i s c x else f c x
else x in
loop 0 0 [] x
all combinations of k elements out of n
In C++ given the following routine:
template <typename Iterator>
inline bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
/* Credits: Thomas Draper */
if ((first == last) || (first == k) || (last == k))
return false;
Iterator itr1 = first;
Iterator itr2 = last;
++itr1;
if (last == itr1)
return false;
itr1 = last;
--itr1;
itr1 = k;
--itr2;
while (first != itr1)
{
if (*--itr1 < *itr2)
{
Iterator j = k;
while (!(*itr1 < *j)) ++j;
std::iter_swap(itr1,j);
++itr1;
++j;
itr2 = k;
std::rotate(itr1,j,last);
while (last != j)
{
++j;
++itr2;
}
std::rotate(k,itr2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
You can then proceed to do the following:
// 9-choose-3
std::string s = "123456789";
std::size_t k = 3;
do
{
std::cout << std::string(s.begin(),s.begin() + k) << std::endl;
}
while(next_combination(s.begin(),s.begin() + k,s.end()));
Or for a std::vector of int's:
// 5-choose-3
std::size_t n = 5;
std::size_t k = 3;
std::vector<int> ints;
for (int i = 0; i < n; ints.push_back(i++));
do
{
for (int i = 0; i < k; ++i)
{
std::cout << ints[i];
}
std::cout << "\n";
}
while(next_combination(ints.begin(),ints.begin() + k,ints.end()));
Algorithm to return all combinations of k out of n as well as corresponding complement list
You can use a basic combinations-algorithm, with the exception that it returns a tuple: the elements that are "in" and those that are "out". Then just recursively generate combinations for the rest of the list and add the first element to either the "in" or the "out" list, respectively.
Here's some Python code:
def comb_and_comp(lst, n):
# no combinations
if len(lst) < n:
return
# trivial 'empty' combination
if n == 0 or lst == []:
yield [], lst
else:
first, rest = lst[0], lst[1:]
# combinations that contain the first element
for in_, out in comb_and_comp(rest, n - 1):
yield [first] + in_, out
# combinations that do not contain the first element
for in_, out in comb_and_comp(rest, n):
yield in_, [first] + out
This will create both the "in" and "out" lists in one go, instead of creating the complement in a second pass.
Fast algorithm for generating all combinations (n choose k) based on an initial input
I think the biggest problem you're going to encounter is not calculation but disk write speed or memory size. By the way, it seems you wrongly determined number of combinations for n = 250
and k = 6
. Did you use uint64_t
? My number is 244 140 625 000 000
.
So for this number you need ~1.4 Petabyte
(~1400 Tb
) of memory. This is your main problem. If you have that much big hard drive, you'd better use memory mapping, when write. You may consider using several threads to write: each will write its own chunk of memory.
So, I think you should think of other ways for providing combinations for solving your actual goal.
A naive solution. Change std::ofstream
with memory mapped object.
int main()
{
const constexpr uint8_t N = 250;
const constexpr uint8_t K = 6;
const constexpr uint64_t CombinationsCount = std::pow(N, K);
using TCombination = std::array<uint8_t, K>;
std::cout << CombinationsCount << std::endl;
std::ofstream file("output.txt");
TCombination c;
for (uint64_t i = 0; i < CombinationsCount; ++i)
{
auto I = i;
for (auto j = 0; j < K; ++j)
{
c[j] = I % N;
I /= N;
file << (int)c[j];
}
file << std::endl;
}
}
If you want to use threads, just divide CombinationsCount
with cores number and give each thread a task to write from specific address of memory (offset).
You asked for a function-like solution. You can pass different names of files and use different threads. Buy you still need to use memory mapping.
const constexpr uint8_t N = 250;
const constexpr uint8_t K = 6;
const constexpr uint64_t CombinationsCount = std::pow(N, K);
using TCombination = std::array<uint8_t, K>;
void Generate(uint64_t start, uint64_t size, const char* fileName)
{
std::ofstream file(fileName);
TCombination c;
for (uint64_t i = start; i < start + size; ++i)
{
auto I = i;
for (auto j = 0; j < K; ++j)
{
c[j] = I % N;
I /= N;
file << (int)c[j];
}
file << std::endl;
}
}
int main()
{
std::cout << CombinationsCount << std::endl;
unsigned int threadsNum = std::thread::hardware_concurrency();
std::vector<std::thread> workers;
for (size_t i = 0; i < threadsNum; ++i)
workers.emplace_back(
Generate,
i * CombinationsCount / threadsNum,
CombinationsCount / threadsNum,
(std::string("output") + std::to_string(i)).c_str());
for (size_t i = 0; i < threadsNum; ++i)
workers[i].join();
}
Fastest solution for all possible combinations, taking k elements out of n possible with k 2 and n large
Preamble
As yourself noticed in your question, it would be nice not to have nchoosek
to return all possible combinations at the same time but rather to enumerate them one by one in order not to explode memory when n
becomes large. So something like:
enumerator = CombinationEnumerator(k, n);
while(enumerator.MoveNext())
currentCombination = enumerator.Current;
...
end
Here is an implementation of such enumerator as a Matlab class. It is based on classic IEnumerator<T>
interface in C# / .NET and mimics the subfunction combs
in nchoosek
(the unrolled way):
%
% PURPOSE:
%
% Enumerates all combinations of length 'k' in a set of length 'n'.
%
% USAGE:
%
% enumerator = CombinaisonEnumerator(k, n);
% while(enumerator.MoveNext())
% currentCombination = enumerator.Current;
% ...
% end
%
%% ---
classdef CombinaisonEnumerator < handle
properties (Dependent) % NB: Matlab R2013b bug => Dependent must be declared before their get/set !
Current; % Gets the current element.
end
methods
function [enumerator] = CombinaisonEnumerator(k, n)
% Creates a new combinations enumerator.
if (~isscalar(n) || (n < 1) || (~isreal(n)) || (n ~= round(n))), error('`n` must be a scalar positive integer.'); end
if (~isscalar(k) || (k < 0) || (~isreal(k)) || (k ~= round(k))), error('`k` must be a scalar positive or null integer.'); end
if (k > n), error('`k` must be less or equal than `n`'); end
enumerator.k = k;
enumerator.n = n;
enumerator.v = 1:n;
enumerator.Reset();
end
function [b] = MoveNext(enumerator)
% Advances the enumerator to the next element of the collection.
if (~enumerator.isOkNext),
b = false; return;
end
if (enumerator.isInVoid)
if (enumerator.k == enumerator.n),
enumerator.isInVoid = false;
enumerator.current = enumerator.v;
elseif (enumerator.k == 1)
enumerator.isInVoid = false;
enumerator.index = 1;
enumerator.current = enumerator.v(enumerator.index);
else
enumerator.isInVoid = false;
enumerator.index = 1;
enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
enumerator.recursion.MoveNext();
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
end
else
if (enumerator.k == enumerator.n),
enumerator.isInVoid = true;
enumerator.isOkNext = false;
elseif (enumerator.k == 1)
enumerator.index = enumerator.index + 1;
if (enumerator.index <= enumerator.n)
enumerator.current = enumerator.v(enumerator.index);
else
enumerator.isInVoid = true;
enumerator.isOkNext = false;
end
else
if (enumerator.recursion.MoveNext())
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
else
enumerator.index = enumerator.index + 1;
if (enumerator.index <= (enumerator.n - enumerator.k + 1))
enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
enumerator.recursion.MoveNext();
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
else
enumerator.isInVoid = true;
enumerator.isOkNext = false;
end
end
end
end
b = enumerator.isOkNext;
end
function [] = Reset(enumerator)
% Sets the enumerator to its initial position, which is before the first element.
enumerator.isInVoid = true;
enumerator.isOkNext = (enumerator.k > 0);
end
function [c] = get.Current(enumerator)
if (enumerator.isInVoid), error('Enumerator is positioned (before/after) the (first/last) element.'); end
c = enumerator.current;
end
end
properties (GetAccess=private, SetAccess=private)
k = [];
n = [];
v = [];
index = [];
recursion = [];
current = [];
isOkNext = false;
isInVoid = true;
end
end
We can test implementation is ok from command window like this:
>> e = CombinaisonEnumerator(3, 6);
>> while(e.MoveNext()), fprintf(1, '%s\n', num2str(e.Current)); end
Which returns as expected the following n!/(k!*(n-k)!)
combinations:
1 2 3
1 2 4
1 2 5
1 2 6
1 3 4
1 3 5
1 3 6
1 4 5
1 4 6
1 5 6
2 3 4
2 3 5
2 3 6
2 4 5
2 4 6
2 5 6
3 4 5
3 4 6
3 5 6
4 5 6
Implementation of this enumerator may be further optimized for speed, or by enumerating combinations in an order more appropriate for your case (e.g., test some combinations first rather than others) ... Well, at least it works! :)
Problem solving
Now solving your problem is really easy:
n = 100;
m = 25;
matrix = rand(n, m);
k = n;
cont = true;
while(cont && (k >= 1))
e = CombinationEnumerator(k, n);
while(cont && e.MoveNext());
cont = f(matrix(e.Current(:), :)) ~= 1;
end
if (cont), k = k - 1; end
end
Related Topics
Should I Use Virtual, Override, or Both Keywords
Difference Between Rdtscp, Rdtsc:Memory and Cpuid/Rdtsc
Implementing Multiple Interfaces in C++
Std::String Length() and Size() Member Functions
Allow for Range-Based for with Enum Classes
Visual Studio Code Formatting for "{ }"
How to Set Timeout for Std::Cin
Random Output Different Between Implementations
How to Use Glortho() in Opengl
Is It a Good Practice to Define C++ Functions Inside Header Files
How to Make Reading from 'Std::Cin' Timeout After a Particular Amount of Time
How to Store Functional Objects with Different Signatures in a Container
Why Do We Actually Need Private or Protected Inheritance in C++
How to Make a Multiple-Read/Single-Write Lock from More Basic Synchronization Primitives
Boost Thread Throwing Exception "Thread_Resource_Error: Resource Temporarily Unavailable"
What Happens When a Computer Program Runs