Best Way to Find the Intersection of Multiple Sets

Best way to find the intersection of multiple sets?

From Python version 2.6 on you can use multiple arguments to set.intersection(), like

u = set.intersection(s1, s2, s3)

If the sets are in a list, this translates to:

u = set.intersection(*setlist)

where *a_list is list expansion

Note that set.intersection is not a static method, but this uses the functional notation to apply intersection of the first set with the rest of the list. So if the argument list is empty this will fail.

What is the best way to find ALL possible intersections of multiple sets?

If you're only looking for intersections of two sets, you can simply do nested for loops:

Set1 = {1,2,3,4,5}
Set2 = {4,5,6,7}
Set3 = {6,7,8,9,10}
Set4 = {1,8,9,15}
sets = [Set1,Set2,Set3,Set4]
for i,s1 in enumerate(sets[:-1]):
for j,s2 in enumerate(sets[i+1:]):
print(f"Set{i+1} and Set{i+j+2} = {s1&s2}")

# Set1 and Set2 = {4, 5}
# Set1 and Set3 = set()
# Set1 and Set4 = {1}
# Set2 and Set3 = {6, 7}
# Set2 and Set4 = set()
# Set3 and Set4 = {8, 9}

If you're looking for intersections of any number of these sets then you can use combinations() from itertools to produce a power set of indices and perform the intersection for each combination:

from itertools import combinations
for comboSize in range(2,len(sets)):
for combo in combinations(range(len(sets)),comboSize):
intersection = sets[combo[0]]
for i in combo[1:]: intersection = intersection & sets[i]
print(" and ".join(f"Set{i+1}" for i in combo),"=",intersection)

Set1 and Set2 = {4, 5}
Set1 and Set3 = set()
Set1 and Set4 = {1}
Set2 and Set3 = {6, 7}
Set2 and Set4 = set()
Set3 and Set4 = {8, 9}
Set1 and Set2 = {4, 5}
Set1 and Set3 = set()
Set1 and Set4 = {1}
Set2 and Set3 = {6, 7}
Set2 and Set4 = set()
Set3 and Set4 = {8, 9}
Set1 and Set2 and Set3 = set()
Set1 and Set2 and Set4 = set()
Set1 and Set3 and Set4 = set()
Set2 and Set3 and Set4 = set()
  • A "power set" is when you take all possible combinations of various sizes from a set of values. itertools documentation has a recipe for that.
  • In this case we are only interested in combinations of 2,3,..., n-1 sizes. Hence the loop on comboSize in range(2,len(sets))
  • For each of these sizes, we get combination of indexes in the sets list using itertool's combinations function. e.g. for comboSize=3 and 4 items in sets, combo will get: (0, 1, 2) (0, 1, 3) (0, 2, 3) (1, 2, 3)
  • The intersection set will intersect all the sets at these indexes using the & operator (set intersection) starting with the first index (combo[0]) and intersecting the remaining indexes (combo[1:]) into a single set.
  • The print function joins the set identifiers (f"Set{i+1}") with an " and " string and prints the resulting intersection on the same line.
  • f"Set{i+1}" is a format string that replaces {i+1} with the set index from combo (+1).

Intersection of multiple sets iteratively

You might itertools.accumulate for this task as follows

import itertools
letters = [{"a","b","c","d","e"},{"b","c","d","e","f"},{"b","c","d","f","g"},{"a","b","c","f","g"},{"a","c","d","e","h"}]
intersections = list(itertools.accumulate(letters, set.intersection))
print(intersections)

output

[{'e', 'a', 'b', 'c', 'd'}, {'b', 'e', 'c', 'd'}, {'b', 'c', 'd'}, {'b', 'c'}, {'c'}]

Note first element is {'e', 'a', 'b', 'c', 'd'} rather than None, so you would need to alter intersections in that regard.

Intersection of multiple sets (as collections)

public static <T> Collection<T> getIntersection(Collection<T>... sets) {

Collection<T> firstSet;

if (sets == null || sets.length == 0 || (firstSet = sets[0]) == null)
return Collections.<T>emptySet();

Collection<T> intersection = new HashSet(firstSet);

for (Collection c : sets) {
if (c == null)
return Collections.<T>emptySet();
intersection.retainAll(c);
}
return intersection;
}

Python -Intersection of multiple lists?

for 2.4, you can just define an intersection function.

def intersect(*d):
sets = iter(map(set, d))
result = sets.next()
for s in sets:
result = result.intersection(s)
return result

for newer versions of python:

the intersection method takes an arbitrary amount of arguments

result = set(d[0]).intersection(*d[1:])

alternatively, you can intersect the first set with itself to avoid slicing the list and making a copy:

result = set(d[0]).intersection(*d)

I'm not really sure which would be more efficient and have a feeling that it would depend on the size of the d[0] and the size of the list unless python has an inbuilt check for it like

if s1 is s2:
return s1

in the intersection method.

>>> d = [[1,2,3,4], [2,3,4], [3,4,5,6,7]]
>>> set(d[0]).intersection(*d)
set([3, 4])
>>> set(d[0]).intersection(*d[1:])
set([3, 4])
>>>

How to find the list of all intersections of multiple sets in Java?

What would be the best way to do this in Java?

The first part you are describing is a powerset (as I edited your question to include last week). You would then get the intersection for each set of sets in the powerset.

Because you are doing a powerset of sets, rather than a simple powerset of something like integers, the implementation is going to be a bit more involved.

Extra Credit

I wrote a basic implementation of your requirements, as an example of how you might do it. All of the methods and types in this example are members of the Example class.

Example class with just its main method which demonstrates the working code. I'm sure you'll forgive my using deprecated Date constructors for the demonstration.

import java.text.*;
import java.util.*;

public class Example
{
public static void main(String[] args) {
// create simple test harness
Set<PeopleByDays> peepsByDay = new HashSet<PeopleByDays>();
peepsByDay.add(new PeopleByDays(new Date(2015 - 1900, Calendar.JANUARY, 1),
Person.BOB, Person.FRANK, Person.JIM, Person.JUDY, Person.SALLY));
peepsByDay.add(new PeopleByDays(new Date(2015 - 1900, Calendar.JANUARY, 2),
Person.BOB, Person.FRANK, Person.JIM, Person.JUDY, Person.TOMMY));
peepsByDay.add(new PeopleByDays(new Date(2015 - 1900, Calendar.JANUARY, 3),
Person.BOB, Person.FRANK, Person.JIM, Person.SALLY, Person.TOMMY));
peepsByDay.add(new PeopleByDays(new Date(2015 - 1900, Calendar.JANUARY, 4),
Person.BOB, Person.FRANK, Person.JUDY, Person.SALLY, Person.TOMMY));
peepsByDay.add(new PeopleByDays(new Date(2015 - 1900, Calendar.JANUARY, 5),
Person.BOB, Person.JIM, Person.JUDY, Person.SALLY, Person.TOMMY));

// make powerSet, then intersect, then sort
Set<Set<PeopleByDays>> powerPeeps = powerSet(peepsByDay);
List<PeopleByDays> powerPeepsIntersected = intersect(powerPeeps);
sort(powerPeepsIntersected);

// print out results
for (PeopleByDays peeps: powerPeepsIntersected) {
String daysFormatted = format(peeps.getDays());
System.out.print(daysFormatted);
System.out.println(peeps);
}
}

// all other Example members as defined in this answer
}

Person is a simple enum type for the people's names. Benefit of using an enum here is it takes care of appropriate equals() and hashCode() implementations for desired HashSet behavior.

    static enum Person {
BOB, FRANK, JIM, JUDY, SALLY, TOMMY;
}

PeopleByDays extends HashSet<Person> to collect an additional set of Date objects to represent the days. Overrides retainAll() (intersect) to combine days; overrides equals() and hashSet() for proper behavior in outer set.

    static class PeopleByDays extends HashSet<Person> {
private final Set<Date> days = new HashSet<Date>();

public PeopleByDays() {
super();
}
public PeopleByDays(Date day, Person... people) {
super(Arrays.asList(people));
this.days.add(day);
}
public PeopleByDays(PeopleByDays other) {
super(other);
this.days.addAll(other.days);
}

public List<Date> getDays() {
return new ArrayList<Date>(this.days);
}

@Override
public boolean retainAll(Collection<?> c) {
if (c instanceof PeopleByDays) {
this.days.addAll(((PeopleByDays)c).days);
}
return super.retainAll(c);
}

@Override
public boolean equals(Object o) {
return super.equals(o) && this.days.equals(((PeopleByDays) o).days);
}
@Override
public int hashCode() {
return super.hashCode() + this.days.hashCode();
}
}

powerSet() method, taken verbatim from this answer.

    public static <T> Set<Set<T>> powerSet(Set<T> originalSet) {
Set<Set<T>> sets = new HashSet<Set<T>>();
if (originalSet.isEmpty()) {
sets.add(new HashSet<T>());
return sets;
}
List<T> list = new ArrayList<T>(originalSet);
T head = list.get(0);
Set<T> rest = new HashSet<T>(list.subList(1, list.size()));
for (Set<T> set: powerSet(rest)) {
Set<T> newSet = new HashSet<T>();
newSet.add(head);
newSet.addAll(set);
sets.add(newSet);
sets.add(set);
}
return sets;
}

intersect() method to create intersection for each set of sets in the powerset.

    static List<PeopleByDays> intersect(Set<Set<PeopleByDays>> powerSet) {
List<PeopleByDays> intersected = new ArrayList<PeopleByDays>();
for (Set<PeopleByDays> powerElement: powerSet) {
PeopleByDays intersection = null;
if (powerElement.isEmpty()) {
intersection = new PeopleByDays();
} else for (PeopleByDays peeps: powerElement) {
if (intersection == null) {
intersection = new PeopleByDays(peeps);
} else {
intersection.retainAll(peeps);
}
}
intersected.add(intersection);
}
return intersected;
}

sort() method to sort the resulting intersected sets by date.

    static void sort(List<PeopleByDays> peeps) {
Collections.sort(peeps, new Comparator<PeopleByDays>() {
@Override
public int compare(PeopleByDays p1, PeopleByDays p2) {
List<Date> days1 = p1.getDays();
List<Date> days2 = p2.getDays();
Collections.sort(days1);
Collections.sort(days2);
for (int i = 0; i < days1.size() && i < days2.size(); i++) {
int compare = days1.get(i).compareTo(days2.get(i));
if (compare != 0) {
return compare;
}
}
return days1.size() - days2.size();
}
});
}

format() method to format the list of days for each intersection.

    static String format(List<Date> days) {
if (days.isEmpty()) {
return "";
}
StringBuilder sb = new StringBuilder();
DateFormat format = new SimpleDateFormat("MMM d");
Collections.sort(days);
String separator = "";
for (Date day: days) {
sb.append(separator);
sb.append(format.format(day));
separator = ", ";
}
sb.append(" ");
return sb.toString();
}

And finally, the output.

[]
Jan 1 [BOB, JUDY, FRANK, JIM, SALLY]
Jan 1, Jan 2 [BOB, JUDY, FRANK, JIM]
Jan 1, Jan 2, Jan 3 [BOB, FRANK, JIM]
Jan 1, Jan 2, Jan 3, Jan 4 [BOB, FRANK]
Jan 1, Jan 2, Jan 3, Jan 4, Jan 5 [BOB]
Jan 1, Jan 2, Jan 3, Jan 5 [BOB, JIM]
Jan 1, Jan 2, Jan 4 [BOB, JUDY, FRANK]
Jan 1, Jan 2, Jan 4, Jan 5 [BOB, JUDY]
Jan 1, Jan 2, Jan 5 [BOB, JUDY, JIM]
Jan 1, Jan 3 [BOB, FRANK, JIM, SALLY]
Jan 1, Jan 3, Jan 4 [BOB, FRANK, SALLY]
Jan 1, Jan 3, Jan 4, Jan 5 [BOB, SALLY]
Jan 1, Jan 3, Jan 5 [BOB, JIM, SALLY]
Jan 1, Jan 4 [BOB, JUDY, FRANK, SALLY]
Jan 1, Jan 4, Jan 5 [BOB, JUDY, SALLY]
Jan 1, Jan 5 [BOB, JUDY, JIM, SALLY]
Jan 2 [BOB, JUDY, TOMMY, FRANK, JIM]
Jan 2, Jan 3 [BOB, TOMMY, FRANK, JIM]
Jan 2, Jan 3, Jan 4 [BOB, TOMMY, FRANK]
Jan 2, Jan 3, Jan 4, Jan 5 [BOB, TOMMY]
Jan 2, Jan 3, Jan 5 [BOB, TOMMY, JIM]
Jan 2, Jan 4 [BOB, JUDY, TOMMY, FRANK]
Jan 2, Jan 4, Jan 5 [BOB, JUDY, TOMMY]
Jan 2, Jan 5 [BOB, JUDY, TOMMY, JIM]
Jan 3 [BOB, TOMMY, FRANK, JIM, SALLY]
Jan 3, Jan 4 [BOB, TOMMY, FRANK, SALLY]
Jan 3, Jan 4, Jan 5 [BOB, TOMMY, SALLY]
Jan 3, Jan 5 [BOB, TOMMY, JIM, SALLY]
Jan 4 [BOB, JUDY, TOMMY, FRANK, SALLY]
Jan 4, Jan 5 [BOB, JUDY, TOMMY, SALLY]
Jan 5 [BOB, JUDY, TOMMY, JIM, SALLY]

Hope that helps. I tinkered with it a lot longer than I intended ;) Still didn't sort the Person names in the output though.

Efficient & safe intersection of more than two sets

As you can read on cppreference,

[...] The resulting range cannot overlap with either of the input ranges.

so you're in undefined behavior land.

As a proof by verification of this, I can tell you that I've copied your code, compiled it, run it, and for me it prints 23, so your correct result is just a coincidence.

Therefore, it looks like to have to rely on another temporary.

The STL doesn't seem to contain a solution for intersecting more than two sets, and you can't even use std::set_intersection in a nested fashion (e.g. result = my_set_intersection(set_1, my_set_intersection(set_2,set_3)), the reason being pretty simple: the algorithm's interface is "tainted" by iterators, i.e. it takes begin and end iterators to the sets, rather than the sets themselves as inputs; and it also returns an iterator.

Porbably Boost has something useful, but I haven't found it yet.



Related Topics



Leave a reply



Submit