How to Find List Intersection

How to find list intersection?

If order is not important and you don't need to worry about duplicates then you can use set intersection:

>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> list(set(a) & set(b))
[1, 3, 5]

Find elements NOT in the intersection of two lists

I usually prefer a shortcut:

set(a) ^ set(b)
{2, 4, 6}

Find intersection of two nested lists?

If you want:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]

Then here is your solution for Python 2:

c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]

In Python 3 filter returns an iterable instead of list, so you need to wrap filter calls with list():

c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]

Explanation:

The filter part takes each sublist's item and checks to see if it is in the source list c1.
The list comprehension is executed for each sublist in c2.

Find intersection of words from two list of strings

You can achieve this using set() with itertools.chain() and map() as:

>>> from itertools import chain

>>> a=['ketab khaneh','danesh gah', 'shi rin i']
>>> b=['ketab khaneh','dan esh gah','shirin i']

>>> set(chain(*map(str.split, a))).intersection(chain(*map(str.split, b)))
set(['i', 'khaneh', 'ketab', 'gah'])

Python -Intersection of multiple lists?

for 2.4, you can just define an intersection function.

def intersect(*d):
sets = iter(map(set, d))
result = sets.next()
for s in sets:
result = result.intersection(s)
return result

for newer versions of python:

the intersection method takes an arbitrary amount of arguments

result = set(d[0]).intersection(*d[1:])

alternatively, you can intersect the first set with itself to avoid slicing the list and making a copy:

result = set(d[0]).intersection(*d)

I'm not really sure which would be more efficient and have a feeling that it would depend on the size of the d[0] and the size of the list unless python has an inbuilt check for it like

if s1 is s2:
return s1

in the intersection method.

>>> d = [[1,2,3,4], [2,3,4], [3,4,5,6,7]]
>>> set(d[0]).intersection(*d)
set([3, 4])
>>> set(d[0]).intersection(*d[1:])
set([3, 4])
>>>

Python: How to find the intersection between two lists based on object's id?

There's no need to intersect two sets. In this case you can just check if the id() exists in another set.

set2 = {id(n) for n in list2}
result = [n for n in list1 if id(n) in set2]

The complexity of this code is O(n1 + n2). I'll explain this in following equivalent but more readable code:

set2 = {id(n) for n in list2}  # O(n2)
result = []
for n in list1: # O(n1)
if id(n) in set2: # O(1)
result.append(n) # O(1)

In total it's O(n1 + n2).


There is also an alternative solution if you can make change to the Node class by just defining the __hash__ and __eq__ method.

class Node:
...

def __hash__(self):
return id(self)

def __eq__(self, another):
return id(self) == id(another)


list1 = [...]
list2 = [...]

result = set(list1) & set(list2)

Find the size of intersection between two lists

This part is logically incorrect:

    if dL1[1:] == [] or dL2[1:] == []:
return 0

The base case should be when one or both lists are empty. But you are returning 0 when one or both lists have a single element. That's incorrect because that element could still be a match, so the size of the intersection would not be 0. If you change this condition to dL1 == [] or dL2 == [] then it works:

>>> intersection(L1, L2)
2
>>> intersection(L1, L3)
5
>>> intersection([1, 6, 1, 4], [1, 2, 3, 4])
2
>>> intersection([1, 1, 1, 2, 2], [1, 1, 2, 2, 2])
4

How to find the intersection of list of intervals?

For this trivial example, you seem to want the maximum of the first value and minimum of the second value.

l = [[-5, 10], [-3, 5], [-53.7228, 3.72281], [-3, 4], [-6.32456, 6.32456]]

v1 = max([pair[0] for pair in l]) # -3
v2 = min([pair[1] for pair in l]) # 3.72281

assert v1 < v2

Length of the intersections between a list an list of list

Since y contains disjoint ranges and the union of them is also a range, a very fast solution is to first perform a binary search on y and then count the resulting indices and only return the ones that appear at least 10 times. The complexity of this algorithm is O(Nx log Ny) with Nx and Ny the number of items in respectively x and y. This algorithm is nearly optimal (since x needs to be read entirely).



Actual implementation

First of all, you need to transform your current y to a Numpy array containing the beginning value of all ranges (in an increasing order) with N as the last value (assuming N is excluded for the ranges of y, or N+1 otherwise). This part can be assumed as free since y can be computed at compile time in your case. Here is an example:

import numpy as np
y = np.array([1, 4, 8, 10, 13, ..., N])

Then, you need to perform the binary search and check that the values fits in the range of y:

indices = np.searchsorted(y, x, 'right')

# The `0 < indices < len(y)` check should not be needed regarding the input.
# If so, you can use only `indices -= 1`.
indices = indices[(0 < indices) & (indices < len(y))] - 1

Then you need to count the indices and filter the ones with at least :

uniqueIndices, counts = np.unique(indices, return_counts=True)
result = uniqueIndices[counts >= 10]

Here is an example based on your:

x = np.array([1, 2, 3, 4, 5, 6])

# [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
y = np.array([1, 4, 5, 7, 8, 12])

# Actual simplified version of the above algorithm
indices = np.searchsorted(y, x, 'right') - 1
uniqueIndices, counts = np.unique(indices, return_counts=True)
result = uniqueIndices[counts >= 2]

# [0, 2]
print(result.tolist())

It runs in less than 0.1 ms on my machine on a random input based on your input constraints.



Related Topics



Leave a reply



Submit