Iteratively compute the Cartesian product of an arbitrary number of sets
I've written a solution that doesn't require you to fill up a large collection in memory. Unfortunately, the code required is hundreds of lines long. You may have to wait until it appears in the Guava project (https://github.com/google/guava), which I hope will be by the end of the year. Sorry. :(
Note that you may not need such a utility if the number of sets you're cartesian-producting is a fixed number known at compile time -- you could just use that number of nested for loops.
EDIT: the code is released now.
Sets.cartesianProduct()
I think you'll be very happy with it. It only creates the individual lists as you ask for them; doesn't fill up memory with all MxNxPxQ of them.
If you want to inspect the source, it's here.
Enjoy!
Cartesian product of an arbitrary number of sets
Edit: Previous solutions for two sets removed. See edit history for details.
Here is a way to do it recursively for an arbitrary number of sets:
public static Set<Set<Object>> cartesianProduct(Set<?>... sets) {
if (sets.length < 2)
throw new IllegalArgumentException(
"Can't have a product of fewer than two sets (got " +
sets.length + ")");
return _cartesianProduct(0, sets);
}
private static Set<Set<Object>> _cartesianProduct(int index, Set<?>... sets) {
Set<Set<Object>> ret = new HashSet<Set<Object>>();
if (index == sets.length) {
ret.add(new HashSet<Object>());
} else {
for (Object obj : sets[index]) {
for (Set<Object> set : _cartesianProduct(index+1, sets)) {
set.add(obj);
ret.add(set);
}
}
}
return ret;
}
Note that it is impossible to keep any generic type information with the returned sets. If you knew in advance how many sets you wanted to take the product of, you could define a generic tuple to hold that many elements (for instance Triple<A, B, C>
), but there is no way to have an arbitrary number of generic parameters in Java.
Cartesian product of a variable number of sets each of variable size
Create an array with the maximum value per digit; in your example, that would be [2,3,4]
.
Set the maximum of digit j
to 1, so you get [2,1,4]
.
Create a first combination [1,1,1]
and add it to the result. Then repeatedly increment the combination and add it to the result.
To increment the combination, increment the last digit; if that digit equals the maximum, set it to 1
and increment the previous digit, and so on. If the first digit needs to be incremented and it's already set to the maximum value, you've found all combinations.
[1,1,1] -> [1,1,2] -> [1,1,3] -> [1,1,4] ->
4 is maximum for last digit, so set it to 1 and increment previous digit ->
1 is maximum for second digit, so set it to 1 and increment previous digit ->
[2,1,1] -> [2,1,2] -> [2,1,3] -> [2,1,4] ->
4 is maximum for last digit, so set it to 1 and increment previous digit ->
1 is maximum for second digit, so set it to 1 and increment previous digit ->
2 is maximum for first digit, so all combinations have been found.
How can I compute a Cartesian product iteratively?
1) Create a list of indexes into the respective lists, initialized to 0, i.e:
indexes = [0,0,0,0,0,0]
2) Yield the appropriate element from each list (in this case the first).
3) Increase the last index by one.
4) If the last index equals the length of the last list, reset it to zero and carry one. Repeat this until there is no carry.
5) Go back to step 2 until the indexes wrap back to [0,0,0,0,0,0]
It's similar to how counting works, except the base for each digit can be different.
Here's an implementation of the above algorithm in Python:
def cartesian_product(aListOfList):
indexes = [0] * len(aListOfList)
while True:
yield [l[i] for l,i in zip(aListOfList, indexes)]
j = len(indexes) - 1
while True:
indexes[j] += 1
if indexes[j] < len(aListOfList[j]): break
indexes[j] = 0
j -= 1
if j < 0: return
Here is another way to implement it using modulo tricks:
def cartesian_product(aListOfList):
i = 0
while True:
result = []
j = i
for l in aListOfList:
result.append(l[j % len(l)])
j /= len(l)
if j > 0: return
yield result
i += 1
Note that this outputs the results in a slightly different order than in your example. This can be fixed by iterating over the lists in reverse order.
Compute Cartesian Product Fast like DBMS
Basically, no - if you want a cartesian product of two sets of sizes n,m
- the size of the cartesian product is n*m
- and you need O(nm)
algorithm to generate it.
However, for data base languages, when you do things like 'join', you sometimes do not need to generate the entire join to get the answer - if all you are going to do later on is just count number of items, or sum.
This can be shown in Pig Latin's COGROUP operator, which only creates an implicit join, in order to later just use the sizes or sums - and not the real cartesian product. This potentially could be much more efficient if used properly in the right situation.
Efficiently count sets in a cartesian product that sum above a specific number
Here is a solution. The idea is to calculate all possible sums of our values you can obtain with n loops, counting the different possible sums, and counting together all sums that are larger than the threshold.
Then, we can generate all possible sums for n+1 loops by adding our values to the previous sums. We can hope that the number of different possible sums won't grow too large, as we add many times the same values, and we regroup all sums larger than the threshold.
from collections import Counter
def all_sums(values, threshold, previous_sums = None):
"""
values must be sorted
previous_sums is a Counter of previously obtained possible sums
Returns a Counter of all possible sums of values and the previous sums
"""
if not previous_sums:
previous_sums = Counter({0:1})
new = Counter()
for existing_sum, ex_sum_count in sorted(previous_sums.items()):
for index, val in enumerate(values):
total = existing_sum + val
if total < threshold:
# With the current value, we have found ex_sum_count
# ways to obtain that total
new.update({total: ex_sum_count})
else:
# We don't need the exact sum, as anything we could
# later add to it will be over the threshold.
# We count them under the value = threshold
# As 'values' is sorted, all subsequent values will also give
# a sum over the threshold
values_left = len(values) - index
new.update({threshold: values_left * ex_sum_count})
break
return new
def count_sums(values, threshold, repeat):
"""
values must be sorted!
Recursively calculates the possible sums of 'repeat' values,
counting together all values over 'threshold'
"""
if repeat == 1:
return all_sums(values, threshold, previous_sums=None)
else:
return all_sums(values, threshold, previous_sums=count_sums(values, threshold, repeat=repeat-1))
Let's try it on your example:
loops = 10
results = [4, 2.75, 2.75, 1.5, 1.5, 1.5, 0]
threshold = loops * 2
values = sorted(results)
sums = count_sums(values, threshold, repeat=loops)
print(sums)
# Counter({20: 137401794, 19.75: 16737840, 18.25: 14016240, 18.5: 13034520, 19.5: 12904920,
# 17.0: 12349260, 15.75: 8573040, 17.25: 8048160, 15.5: 6509160, 16.75: 6395760, 14.25: 5171040,
# 18.0: 5037480, 14.5: 4461480, 16: 3739980, 18.75: 3283020, 19.25: 3220800, 13.0: 3061800,
# 14.0: 2069550, 12.75: 1927800, 15.25: 1708560, 13.25: 1574640, 17.5: 1391670, 11.5: 1326780,
# 11.75: 1224720, 14.75: 1182660, 16.5: 1109640, 10.25: 612360, 17.75: 569520, 11.25: 453600,
# 16.25: 444060, 12.5: 400680, 10.0: 374220, 12: 295365, 13.75: 265104, 10.5: 262440, 19.0: 229950,
# 13.5: 204390, 8.75: 204120, 15.0: 192609, 9.0: 153090, 8.5: 68040, 9.75: 65520, 7.5: 61236,
# 7.25: 45360, 11.0: 44940, 12.25: 21840, 6.0: 17010, 7.0: 7560, 5.75: 6480, 8.25: 5280, 4.5: 3240,
# 9.5: 2520, 10.75: 720, 4.25: 540, 5.5: 450, 3.0: 405, 6.75: 180, 8: 45, 1.5: 30, 2.75: 20, 4: 10, 0: 1})
number_of_sums = len(results) ** loops
# 282475249
good = sums[threshold]
# 137401794
bad = number_of_sums - good
# 145073455
I timed it, it takes about 9 ms on my rather old machine.
And with some other data: 10 different values, 20 loops:
loops = 20
results = [4, 2.75, 2.45, 1.5, 1.3, 0.73, 0.12, 1.4, 1.5, 0]
threshold = loops * 2
values = sorted(results)
sums = count_sums(values, threshold, repeat=loops)
number_of_sums = len(results) ** loops
good = sums[threshold]
bad = number_of_sums - good
print(good)
print(bad)
# 5440943363190360728
# 94559056636809639272
which I obtain in less than 12 seconds.
How to calculate the Cartesian Power of a list in R
Thanks @G.Grothendieck for solving the question!
s <- c(1, 2, 3)
n <- 2
do.call("expand.grid", rep(list(s), n))
This is the R code that gives the correct results.
Lazy Cartesian product of arrays (arbitrary nested loops)
Coincidentally working on the same thing over the weekend. I was looking to find alternative implementations to my [].every
-based algo which turned out to have abyssmal performance in Firefox (but screams in Chrome -- more than twice as fast as the next).
The end result is http://jsperf.com/lazy-cartesian-product/19 . It's similar to Tomalak's approach but there is only one arguments array which is mutated as the carets move instead of being generated each time.
I'm sure it could be improved further by using the clever maths in the other algos. I don't quite understand them though, so I leave it to others to try.
EDIT: the actual code, same interface as Tomalak's. I like this interface because it could be break
ed anytime. It's only slightly slower than if the loop is inlined in the function itself.
var xp = crossProduct([
[2,3,4,5],['angry','happy'],
['monkeys','anteaters','manatees']]);
while (xp.next()) xp.do(console.log, console);
function crossProduct(sets) {
var n = sets.length, carets = [], args = [];
function init() {
for (var i = 0; i < n; i++) {
carets[i] = 0;
args[i] = sets[i][0];
}
}
function next() {
if (!args.length) {
init();
return true;
}
var i = n - 1;
carets[i]++;
if (carets[i] < sets[i].length) {
args[i] = sets[i][carets[i]];
return true;
}
while (carets[i] >= sets[i].length) {
if (i == 0) {
return false;
}
carets[i] = 0;
args[i] = sets[i][0];
carets[--i]++;
}
args[i] = sets[i][carets[i]];
return true;
}
return {
next: next,
do: function (block, _context) {
return block.apply(_context, args);
}
}
}
Related Topics
JSON - Iterate Through JSONarray
Observer Is Deprecated in Java 9. What Should We Use Instead of It
How to Change the Highlight Color of a Focused Jcombobox
Java: Rationale of the Cloneable Interface
How to Use Urlclassloader to Load a *.Class File
Java Server with Multiclient Communication
How to Use 3Rd Party Library in Java9 Module
Execute Jar File with Multiple Classpath Libraries from Command Prompt
Maximum Size of Hashset, Vector, Linkedlist
How to Use Selenium Webdriver Without Informing the Document That It Is Controlled by Webdriver
Difference Between Wait and Blocked Thread States
Why Is Each Public Class in a Separate File
Filling Combobox from Database by Using Hibernate in Java
Converting to Upper and Lower Case in Java
Tomcat in Idea. War Exploded: Server Is Not Connected. Deploy Is Not Available