Cannot Use a Lambda Expression as an Argument to a Dynamically Dispatched Operation Without First Casting It to a Delegate or Expression Tree Type

Check for any element that exists in two collections

Sounds like you just want:

bool hasSameElements = listA.Intersect(listB).Any();

EDIT: As noted in comments, Intersect uses lazy evaluation. It defers all execution until the first element is read from the result; at that point it will load all of listB into a set, and then stream listA until it finds a result to yield. At that point, Any() will return true and so no more work will be done. See my Edulinq post on Intersect for more information.

Determine if two collections share at least one element

You can use the Any().

var listA = new List<int>();
var listB = new List<int>();

bool hasCommonItem = listA.Any(i => listB.Contains(i));

Moreover, you can write an IEqualityComparer implementation to pass it as a parameter to the Contains() if necessary.

Check if one list contains element from the other

If you just need to test basic equality, this can be done with the basic JDK without modifying the input lists in the one line

!Collections.disjoint(list1, list2);

If you need to test a specific property, that's harder. I would recommend, by default,

list1.stream()
.map(Object1::getProperty)
.anyMatch(
list2.stream()
.map(Object2::getProperty)
.collect(toSet())
::contains)

...which collects the distinct values in list2 and tests each value in list1 for presence.

Check if one collection of values contains another

Yes, there is a faster way, provided you're not space-constrained. (See space/time tradeoff.)

The algorithm:

Just insert all the elements in Set2 into a hashtable (in C# 3.5, that's a HashSet<string>), and then go through all the elements of Set1 and check if they're in the hashtable. This method is faster (Θ(m + n) time complexity), but uses O(n) space.

Alternatively, just say:

bool isSuperset = new HashSet<string>(set2).IsSupersetOf(set1);

Edit 1:

For those people concerned about the possibility of duplicates (and hence the misnomer "set"), the idea can easily be extended:

Just make a new Dictionary<string, int> representing the count of each word in the super-list (add one to the count each time you see an instance of an existing word, and add the word with a count of 1 if it's not in the dictionary), and then go through the sub-list and decrement the count each time. If every word exists in the dictionary and the count is never zero when you try to decrement it, then the subset is in fact a sub-list; otherwise, you had too many instances of a word (or it didn't exist at all), so it's not a real sub-list.


Edit 2:

If the strings are very big and you're concerned about space efficiency, and an algorithm that works with (very) high probability works for you, then try storing a hash of each string instead. It's technically not guaranteed to work, but the probability of it not working is pretty darn low.

Check if two list have the same items

That's what sets (e.g., HashSet<T>) are for. Sets have no defined order, and SetEquals verifies whether the set and another collection contain the same elements.

var set = new HashSet<int>(list1);
var equals = set.SetEquals(list2);

Something like 'contains any' for Java set?

Stream::anyMatch

Since Java 8 you could use Stream::anyMatch.

setA.stream().anyMatch(setB::contains)

Test if lists share any items in python

Short answer: use not set(a).isdisjoint(b), it's generally the fastest.

There are four common ways to test if two lists a and b share any items. The first option is to convert both to sets and check their intersection, as such:

bool(set(a) & set(b))

Because sets are stored using a hash table in Python, searching them is O(1) (see here for more information about complexity of operators in Python). Theoretically, this is O(n+m) on average for n and m objects in lists a and b. But

  1. it must first create sets out of the lists, which can take a non-negligible amount of time, and
  2. it supposes that hashing collisions are sparse among your data.

The second way to do it is using a generator expression performing iteration on the lists, such as:

any(i in a for i in b)

This allows to search in-place, so no new memory is allocated for intermediary variables. It also bails out on the first find. But the in operator is always O(n) on lists (see here).

Another proposed option is an hybridto iterate through one of the list, convert the other one in a set and test for membership on this set, like so:

a = set(a); any(i in a for i in b)

A fourth approach is to take advantage of the isdisjoint() method of the (frozen)sets (see here), for example:

not set(a).isdisjoint(b)

If the elements you search are near the beginning of an array (e.g. it is sorted), the generator expression is favored, as the sets intersection method have to allocate new memory for the intermediary variables:

from timeit import timeit
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000)
26.077727576019242
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000)
0.16220548999262974

Here's a graph of the execution time for this example in function of list size:

Element sharing test execution time when shared at the beginning

Note that both axes are logarithmic. This represents the best case for the generator expression. As can be seen, the isdisjoint() method is better for very small list sizes, whereas the generator expression is better for larger list sizes.

On the other hand, as the search begins with the beginning for the hybrid and generator expression, if the shared element are systematically at the end of the array (or both lists does not share any values), the disjoint and set intersection approaches are then way faster than the generator expression and the hybrid approach.

>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
13.739536046981812
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
0.08102107048034668

Element sharing test execution time when shared at the end

It is interesting to note that the generator expression is way slower for bigger list sizes. This is only for 1000 repetitions, instead of the 100000 for the previous figure. This setup also approximates well when when no elements are shared, and is the best case for the disjoint and set intersection approaches.

Here are two analysis using random numbers (instead of rigging the setup to favor one technique or another):

Element sharing test execution time for randomly generated data with high chance of sharing
Element sharing test execution time for randomly generated data with high chance of sharing

High chance of sharing: elements are randomly taken from [1, 2*len(a)]. Low chance of sharing: elements are randomly taken from [1, 1000*len(a)].

Up to now, this analysis supposed both lists are of the same size. In case of two lists of different sizes, for example a is much smaller, isdisjoint() is always faster:

Element sharing test execution time on two differently sized lists when shared at the beginning
Element sharing test execution time on two differently sized lists when shared at the end

Make sure that the a list is the smaller, otherwise the performance decreases. In this experiment, the a list size was set constant to 5.

In summary:

  • If the lists are very small (< 10 elements), not set(a).isdisjoint(b) is always the fastest.
  • If the elements in the lists are sorted or have a regular structure that you can take advantage of, the generator expression any(i in a for i in b) is the fastest on large list sizes;
  • Test the set intersection with not set(a).isdisjoint(b), which is always faster than bool(set(a) & set(b)).
  • The hybrid "iterate through list, test on set" a = set(a); any(i in a for i in b) is generally slower than other methods.
  • The generator expression and the hybrid are much slower than the two other approaches when it comes to lists without sharing elements.

In most cases, using the isdisjoint() method is the best approach as the generator expression will take much longer to execute, as it is very inefficient when no elements are shared.

How to make lookup between two collections when an item in an array exists in the other collection?

You were going in the right direction the only thing you missed here is to use expr with in aggregation operator which matches the same fields of the document

db.getCollection("Orders").aggregate([
{ "$lookup": {
"from": "Products",
"let": { "localId": "$_id" , "prods": "$Products" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$prods" ] } } },
{ "$project": { "_id": 1, "name": "$ProductName" } }
],
"as": "linkedData"
}},
{ "$skip": 0 },
{ "$limit": 1 }
])

See the docs here

Check of two list have colliding element?

.NET has a number of set operations that work on enumerables, so you could take the set intersection to find members in both lists. Use Any() to find out if the resulting sequence has any entries.

E.g.

if(list1.Intersect(list2).Any()) 


Related Topics



Leave a reply



Submit