Efficient Cartesian Product Algorithm

Efficient Cartesian Product algorithm

The complexity of the Cartesian product of two sets of size n is O(n²); there is no shortcut, because the output itself contains n² pairs.

In specific cases, the order in which you iterate your two axes is important. For example, if your code is visiting every slot in an array, or every pixel in an image, then you should try to visit the slots in their natural memory order. An image is typically laid out in 'scanlines', so the pixels of any given Y are adjacent. Therefore, you should iterate over Y in the outer loop and X in the inner loop.
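
For instance, a minimal sketch in Python, assuming a row-major NumPy grayscale image of shape (height, width); the function name is purely illustrative:

import numpy as np

def brighten(image, amount):
    # Visit pixels in memory (scanline) order: rows (Y) on the outer loop,
    # adjacent pixels (X) on the inner loop.
    height, width = image.shape
    for y in range(height):
        for x in range(width):
            image[y, x] += amount
    return image

img = np.zeros((480, 640), dtype=np.float32)
brighten(img, 0.1)

(In real NumPy code you would vectorize this, but the loop order is the point here.)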

Whether you need the Cartesian product or whether there is a more efficient algorithm depends on the problem you are solving.

Improving the performance of cartesian product of multiple lists

A few comments:

  • If you think about it, what car_multiple_sets is doing is iterating on its parameter lists. You're doing that using recursion, but iterating on a list can also be done with a for-loop. And it so happens that recursion is somewhat slow and memory-inefficient in Python, so for-loops are preferable.

  • You don't need to convert to str to group the ints together. You can use tuples. That's precisely what they're for. Replace str(x)+str(y) with (x,y) to get a pair of two integers instead of a string.

def car_two_sets(a, b, unpack=False):
    if unpack:
        return [(*x, y) for x in a for y in b]
    else:
        return [(x, y) for x in a for y in b]

def car_multiple_sets(lists):
    if len(lists) == 0:
        return [()]  # empty Cartesian product has one element, the empty tuple
    elif len(lists) == 1:
        return list(zip(lists[0]))  # list of "1-uples" for homogeneity
    else:
        result = car_two_sets(lists[0], lists[1])
        for l in lists[2:]:
            result = car_two_sets(result, l, unpack=True)
        return result

print( car_multiple_sets((range(3), 'abc', range(2))) )
# [(0, 'a', 0), (0, 'a', 1), (0, 'b', 0), (0, 'b', 1), (0, 'c', 0), (0, 'c', 1),
# (1, 'a', 0), (1, 'a', 1), (1, 'b', 0), (1, 'b', 1), (1, 'c', 0), (1, 'c', 1),
# (2, 'a', 0), (2, 'a', 1), (2, 'b', 0), (2, 'b', 1), (2, 'c', 0), (2, 'c', 1)]
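
For reference, the standard library's itertools.product generates the same tuples lazily and in the same order, so you can also compare against it:

from itertools import product

print(list(product(range(3), 'abc', range(2))))
# same output as car_multiple_sets((range(3), 'abc', range(2)))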

Efficient cartesian product excluding items

Okay, this would be a bit more efficient, because the recursion prunes any branch as soon as the remaining budget is exceeded, and you can use a generator like this and take your values as needed:

import numpy as np

def get_solution(uniques, length, constraint):
    if length == 1:
        for u in uniques[uniques <= constraint + 1e-8]:
            yield u
    else:
        for u in uniques[uniques <= constraint + 1e-8]:
            for s in get_solution(uniques, length - 1, constraint - u):
                yield np.hstack((u, s))

# unique_values: a sorted NumPy array of candidate values (e.g. np.arange(0, 1.01, 0.1))
g = get_solution(unique_values, 4, 1)
for _ in range(5):
    print(next(g))

prints

[0. 0. 0. 0.]
[0. 0. 0. 0.1]
[0. 0. 0. 0.2]
[0. 0. 0. 0.3]
[0. 0. 0. 0.4]

Comparing with your function:

from itertools import product

def get_solution_product(uniques, length, constraint):
    return np.array([p for p in product(uniques, repeat=length) if np.sum(p) <= constraint + 1e-8])

%timeit np.vstack(list(get_solution(unique_values, 5, 1)))
346 ms ± 29.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit get_solution_product(unique_values, 5, 1)
2.94 s ± 256 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Cartesian Product of n-sets in C

You can implement Cartesian products with an odometer-like counter:

  • Set all current items to the first items in each list.
  • Process the current state, e.g. print it.
  • Advance the first list. If you reach the end, reset the state to the beginning of that list and advance the next list and so on. If you reach the end of the last list, you are done.

So, assuming that you have read the information from the files into NULL-terminated arrays of strings, you could do the following:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

int main(int argc, char *argv[])
{
    const char *adj1[] = {"little", "big", "huge", NULL};
    const char *adj2[] = {"red", "yellow", "grey", "orange", NULL};
    const char *noun[] = {"house", "car", NULL};

    const char **list[3] = {adj1, adj2, noun};
    const char **curr[3];

    unsigned n = sizeof(list) / sizeof(*list);
    unsigned count = 0;
    bool done = false;

    for (unsigned i = 0; i < n; i++) {
        curr[i] = list[i];
    }

    while (!done) {
        unsigned i = 0;

        printf("%s %s %s\n", *curr[0], *curr[1], *curr[2]);
        count++;

        curr[i]++;

        while (*curr[i] == NULL) {
            curr[i] = list[i];    // back to beginning
            i++;                  // move to next list

            if (i == n) {         // stop when the last list is exhausted
                done = true;
                break;
            }

            curr[i]++;            // ... and increment that
        }
    }

    printf("%u combinations.\n", count);

    return 0;
}

This approach is not limited to three lists. (If you know exactly how many lists you have, you could use nested loops, of course.)
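
For illustration, here is a rough sketch of the same odometer idea in Python for an arbitrary number of lists (the function name is illustrative):

def odometer_product(lists):
    # One index per list; the first index is advanced fastest,
    # carrying into the next list when it wraps around.
    idx = [0] * len(lists)
    while True:
        yield tuple(lst[i] for lst, i in zip(lists, idx))
        j = 0
        idx[j] += 1
        while idx[j] == len(lists[j]):   # reached the end of list j
            idx[j] = 0                   # back to beginning
            j += 1                       # move to next list
            if j == len(lists):          # last list exhausted: done
                return
            idx[j] += 1                  # ... and advance that one

for words in odometer_product([["little", "big"], ["red", "grey"], ["house", "car"]]):
    print(" ".join(words))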

Cartesian product with checking condition when adding a new element to the set and without a repeat of elements

A Cartesian product without repetition of elements within any given result set and without permutations:

// Note: dest (List<List<string>>) and number_elem (int) are assumed to be
// static fields of the enclosing class; they are assigned in init() below.
static void combinations(List<List<string>> srs, int[] size, List<string> curr, int index)
{
    if (index == srs.Count())
    {
        int s = curr.Count();
        string[] d = new string[s];
        List<string> x = d.ToList();
        x.AddRange(curr);
        x.RemoveAll(item => item == null);
        dest.Add(x);
        string res = "";
        foreach (var item in x)
        {
            res += item + ", ";
        }
        //dest.add(d);
        Console.WriteLine(res);
    }
    else
    {
        for (int i = 0; i < size[index]; i++)
        {
            int n = size[index];
            string[] dim = new string[n];
            dim = srs[index].ToArray();
            curr.Add(dim[i]);
            index++;
            combinations(srs, size, curr, index);
            index--;
            curr.RemoveAt(curr.Count() - 1);
        }
    }
}

public static void init()
{
    string[] confidence = new string[] { "High", "Low Variable" };
    string[] goalspec = new string[] { "High", "Low Some" };
    string[] quality = new string[] { "High", "Low Variable" };
    string[] tansSkills = new string[] { "High", "Low some" };
    string[] sitLead = new string[] { "S1", "S2", "S3", "S4" };
    string[] devLev = new string[] { "D1", "D2", "D3", "D4" };
    string[] statReason = new string[] { "In backlog", "Canceled", "Completed" };
    List<List<string>> srs = new List<List<string>>
    {
        confidence.ToList(),
        goalspec.ToList(),
        quality.ToList(),
        tansSkills.ToList(),
        sitLead.ToList(),
        devLev.ToList(),
        statReason.ToList()
    };
    number_elem = srs.Count();
    int[] size = new int[number_elem];
    size[0] = confidence.Count();
    size[1] = goalspec.Count();
    size[2] = quality.Count();
    size[3] = tansSkills.Count();
    size[4] = sitLead.Count();
    size[5] = devLev.Count();
    size[6] = statReason.Count();
    dest = new List<List<string>>();
    List<string> curr = new List<string>();
    combinations(srs, size, curr, 0);
}

Generate cartesian product in decreasing sum order

If the problem instance has N sets to cross, then you can think of the tuples in the product as an N-dimensional "rectangular" grid, where each tuple corresponds to a grid element. You'll start by emitting the max-sum tuple [9,4,5], which is at one corner of the grid.

You'll keep track of a "candidate set" of un-emitted tuples, each of which is one step smaller in a single dimension than at least one already-emitted tuple. If it helps, you can visualize the already-emitted tuples as a "solid" in the grid. The candidate set is all tuples that touch the solid's surface.

You'll repeatedly choose the next tuple to emit from the candidate set, then update the set with the neighbors of the newly emitted tuple. When the set is empty, you're done.

After emitting [9,4,5], the candidate set is

[8,4,5]  (one smaller on first dimension)
[9,2,5]  (one smaller on second dimension)
[9,4,1]  (one smaller on third dimension)

Next emit the one of these with the largest sum. That's [8,4,5]. Adjacent to that are

[0,4,5], [8,2,5], [8,4,1]

Add those to the candidate set, so we now have

[9,2,5], [9,4,1], [0,4,5], [8,2,5], [8,4,1]

Again pick the highest sum. That's [9,2,5]. Adjacent are

[8,2,5], [9,2,1]. 

So the new candidate set is

[9,4,1], [0,4,5], [8,2,5], [8,4,1], [9,2,1]

Note [8,2,5] came up again. Don't duplicate it.

This time the highest sum is [8,2,5]. Adjacent are

[0,2,5], [8,2,1]

At this point you should have the idea.

Use a max heap for the candidate set. Then finding the tuple with max sum requires O(log |C|) where C is the candidate set.

How big can the set get? Interesting question. I'll let you think about it. For 3 input sets as in your example, it's

|C| = O(|A1||A2| + |A2||A3| + |A1||A3|)

So the cost of emitting each tuple is

O(log(|A1||A2| + |A2||A3| + |A1||A3|))

If the sets have size at most N, then this is O(log(3N^2)) = O(log 3 + 2 log N) = O(log N).

There are |A1||A2||A3| tuples to emit, which is O(N^3).

The simpler algorithm of generating all tuples and then sorting them costs O(log N^3) = O(3 log N) = O(log N) per emitted tuple. That's very roughly only 50% slower than the heap approach, which is asymptotically the same. The main advantage of the more complex algorithm is that it saves a factor of O(N) in space: the heap/priority queue holds only O(N^2) candidates instead of all O(N^3) tuples.
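
Before the full Java version below, here is a rough Python sketch of the same candidate-set idea, assuming plain lists of numbers as input (Python's heapq is a min-heap, so sums are negated; the names are illustrative):

import heapq

def product_by_decreasing_sum(*sets):
    # Sort each set so index 0 holds its largest element.
    lists = [sorted(s, reverse=True) for s in sets]
    start = (0,) * len(lists)                        # indices of the max-sum tuple
    heap = [(-sum(lst[0] for lst in lists), start)]  # max-heap via negated sums
    seen = {start}                                   # avoid duplicate candidates
    while heap:
        neg_sum, idx = heapq.heappop(heap)
        yield tuple(lst[i] for lst, i in zip(lists, idx))
        for d in range(len(lists)):                  # push the "one smaller" neighbours
            if idx[d] + 1 < len(lists[d]):
                nxt = idx[:d] + (idx[d] + 1,) + idx[d + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    new_neg = neg_sum + lists[d][idx[d]] - lists[d][idx[d] + 1]
                    heapq.heappush(heap, (new_neg, nxt))

for t in product_by_decreasing_sum([9, 8, 0], [4, 2], [5, 1]):
    print(t)

On the example sets this prints (9, 4, 5), (8, 4, 5), (9, 2, 5), ..., matching the walk-through above.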

Here is a quick Java implementation meant to keep the code size small.

import java.util.Arrays;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

public class SortedProduct {
    final SortedTuple [] tuples;
    final NoDupHeap candidates = new NoDupHeap();

    SortedProduct(SortedTuple [] tuple) {
        this.tuples = Arrays.copyOf(tuple, tuple.length);
        reset();
    }

    static class SortedTuple {
        final int [] elts;

        SortedTuple(int... elts) {
            this.elts = Arrays.copyOf(elts, elts.length);
            Arrays.sort(this.elts);
        }

        @Override
        public String toString() {
            return Arrays.toString(elts);
        }
    }

    class RefTuple {
        final int [] refs;
        final int sum;

        RefTuple(int [] index, int sum) {
            this.refs = index;
            this.sum = sum;
        }

        RefTuple getSuccessor(int i) {
            if (refs[i] == 0) return null;
            int [] newRefs = Arrays.copyOf(this.refs, this.refs.length);
            int j = newRefs[i]--;
            return new RefTuple(newRefs, sum - tuples[i].elts[j] + tuples[i].elts[j - 1]);
        }

        int [] getTuple() {
            int [] val = new int[refs.length];
            for (int i = 0; i < refs.length; ++i)
                val[i] = tuples[i].elts[refs[i]];
            return val;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(refs);
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof RefTuple) {
                RefTuple t = (RefTuple) o;
                return Arrays.equals(refs, t.refs);
            }
            return false;
        }
    }

    RefTuple getInitialCandidate() {
        int [] index = new int[tuples.length];
        int sum = 0;
        for (int j = 0; j < index.length; ++j)
            sum += tuples[j].elts[index[j] = tuples[j].elts.length - 1];
        return new RefTuple(index, sum);
    }

    final void reset() {
        candidates.clear();
        candidates.add(getInitialCandidate());
    }

    int [] getNext() {
        if (candidates.isEmpty()) return null;
        RefTuple next = candidates.poll();
        for (int i = 0; i < tuples.length; ++i) {
            RefTuple successor = next.getSuccessor(i);
            if (successor != null) candidates.add(successor);
        }
        return next.getTuple();
    }

    /** A max heap of indirect ref tuples that ignores addition of duplicates. */
    static class NoDupHeap {
        final PriorityQueue<RefTuple> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(b.sum, a.sum));
        final Set<RefTuple> set = new HashSet<>();

        void add(RefTuple t) {
            if (set.contains(t)) return;
            heap.add(t);
            set.add(t);
        }

        RefTuple poll() {
            RefTuple t = heap.poll();
            set.remove(t);
            return t;
        }

        boolean isEmpty() {
            return heap.isEmpty();
        }

        void clear() {
            heap.clear();
            set.clear();
        }
    }

    public static void main(String [] args) {
        SortedTuple [] tuples = {
            new SortedTuple(9, 8, 0),
            new SortedTuple(4, 2),
            new SortedTuple(5, 1),
        };
        SortedProduct product = new SortedProduct(tuples);
        for (;;) {
            int[] next = product.getNext();
            if (next == null) break;
            System.out.println(Arrays.toString(next));
        }
    }
}

