Find First Element by Predicate

Find first element by predicate

No, filter does not scan the whole stream. It's an intermediate operation, which returns a lazy stream (actually all intermediate operations return a lazy stream). To convince you, you can simply do the following test:

List<Integer> list = Arrays.asList(1, 10, 3, 7, 5);
int a = list.stream()
.peek(num -> System.out.println("will filter " + num))
.filter(x -> x > 5)
.findFirst()
.get();
System.out.println(a);

Which outputs:

will filter 1
will filter 10
10

You see that only the two first elements of the stream are actually processed.

So you can go with your approach which is perfectly fine.

Find first element in a sequence that matches a predicate

To find the first element in a sequence seq that matches a predicate:

next(x for x in seq if predicate(x))

Or simply:

Python 2:

next(itertools.ifilter(predicate, seq))

Python 3:

next(filter(predicate, seq))

These will raise a StopIteration exception if the predicate does not match for any element.


To return None if there is no such element:

next((x for x in seq if predicate(x)), None)

Or:

next(filter(predicate, seq), None)

Is there a way to find the first element to match a Predicate?

It does exactly what you want. It doesn't run over all the elements. The filter method creates a stream and on top of it findFirst creates another stream.

So when you try to get the element of the stream that was created after findFirst() you'll get only the first one that matches the predicate.

Best way to check that is to add a print line or something like that inside the predicate.

Create a stream of integers for example from 0 to 10 and create a predicate that prints the number and then checks if it's divided by 3. You'll get this printed out: 0, 1, 2, 3 and that's it.

I wrote a question + answer in the past to explain in more details how it works: Understanding java 8 stream's filter method

Fetch first element of stream matching the criteria

This might be what you are looking for:

yourStream
.filter(/* your criteria */)
.findFirst()
.get();

And better, if there's a possibility of matching no element, in which case get() will throw a NPE. So use:

yourStream
.filter(/* your criteria */)
.findFirst()
.orElse(null); /* You could also create a default object here */
An example:
public static void main(String[] args) {
class Stop {
private final String stationName;
private final int passengerCount;

Stop(final String stationName, final int passengerCount) {
this.stationName = stationName;
this.passengerCount = passengerCount;
}
}

List<Stop> stops = new LinkedList<>();

stops.add(new Stop("Station1", 250));
stops.add(new Stop("Station2", 275));
stops.add(new Stop("Station3", 390));
stops.add(new Stop("Station2", 210));
stops.add(new Stop("Station1", 190));

Stop firstStopAtStation1 = stops.stream()
.filter(e -> e.stationName.equals("Station1"))
.findFirst()
.orElse(null);

System.out.printf("At the first stop at Station1 there were %d passengers in the train.", firstStopAtStation1.passengerCount);
}

Output is:

At the first stop at Station1 there were 250 passengers in the train.

How do these four methods for finding the first element of a vector that matches a predicate compare?

Despite your request, I guess that a lot is opinion based; nonetheless, here is my take.

It really depends on the use case. Even your question might hide ambiguity; here I see two different tasks:

  1. find (the position of) the first element of a list that obeys a condition
  2. find (the position of) the first TRUE value inside a logical vector.

Of course, the second task solves also the first, but with the cost of applying the predicate for each element of the list.

A lot has to do with vectorization in R: for many tasks, even if "algorithmically" wrong, it's more convenient to evaluate the predicate for each element of the list and then find the TRUE position. This is certainly the case of your examples: there is no doubt that, given how isFour and the other are defined that match is to prefer: we don't mind to check each element of data since the operation is very fast, even if one could stop before the end to get the answer. This because, if we "devectorize" your function, on average we are going to slow things a lot since subsetting and checking a single element is way slower. Consider that here I'm using list not in the R-meaning (list object), but just as a collection of values.

On the other hand, Position is thought to be used when your data is list and/or the f function is not vectorized and very expensive. Imagine for instance that f consists in training a machine learning model, evaluate it against some data and grabbing some performance statistics. We have a list of algorithms we want to train and we want to stop when we reach a nice performance. In this case, we don't want to train every possible model in the list (super expensive), but stop as soon as we can. So, we are going to use Position (see also its source code to understand why).

Regarding the two tasks that I outlined at the beginning, all your solutions deal with the second task, while only Position solve exactly only the first.

So, in general:

  • if the predicate is vectorized and efficient, go with match;
  • if the predicate is very expensive, go with Position.

Don't see much of a reason to use the other two solutions in any case: which and which.max are used mainly for other tasks and not to determine the position of a TRUE value (even if they can, of course).

Just to outline better the differences between the match solution and the Position one, here is a recap.

  • For match the isFour function is applied to each element of the input and only after match actually acts. Of course, for the specific task of the example a better way is match(4, data), since match will stop internally as soon as a 4 is found. But the important point is that isFour is applied to the whole input in your implementation.
  • For Position instead, you pass the function, not the result of its application. Then, the function is applied element by element and when the condition is met it exits, without necessarily processing the whole input.

Now it should be clear what's to prefer: it depends on the cost of "devectorize" the function against the gain of not processing the whole input.

Use Functor / Predicate to find the first element smaller than its predecessor in vector

You can use std::adjacent_find() for that.

auto iter = std::adjacent_find(begin(v), end(v), [](const auto& a, const auto& b) { return a < b; });

Racket function to find the first element in a list that matches a predicate

You're correct in your assumption that (filter f lst) will traverse the whole input list (and create a new list on top of that!) before passing the result to first. Your second version is more efficient, and will only traverse the list until the element is found.

But there's an easier solution in Racket, there's a built-in procedure that does exactly what you want: behold findf!

(findf even? '(1 3 5 2 4))
=> 2
(findf even? '(1 3 5 7 9))
=> #f

Filter first element that match predicate with multiple checks

Here's a solution which traverses your list only once, giving you the first red and blue toys at the end. We can filter out all the other irrelevent colors and then create a map whose key is whether the toy is red and the value is the first toy matching the given criteria. Here's how it looks.

Map<Boolean, Toy> firstRedAndBlueToysMap = toyList.stream()
.filter(t -> t.isBlue() || t.isRed())
.collect(Collectors.toMap(Toy::isRed, t -> t, (a, b) -> a));
Toy firstRedToy = firstRedAndBlueToysMap.get(true);
Toy firstBlueToy = firstRedAndBlueToysMap.get(false);

And here's a one step approach to solve your problem.

RedBlueExtravaganza firstRedAndBlueToyPair = toyList.stream()
.filter(t -> t.isBlue() || t.isRed())
.collect(Collectors.collectingAndThen(Collectors.toMap(Toy::isRed,
t -> t, (a, b) -> a),
m -> new RedBlueExtravaganza(m.get(true), m.get(false))));

P.S. For this to work you need to have the following constructor in your RedBlueExtravaganza class contrary to the one you have provided above.

public RedBlueExtravaganza(Toy rt, Toy bt) {
if (!(rt instanceof RedToy) || !(bt instanceof BlueToy))
throw new IllegalArgumentException();

// remainder omitted.
}

How to find first element in a sequence that matches a predicate in Python?

You can combine ifilter and islice to get just the first matching element.

>>> list(itertools.islice(itertools.ifilter(lambda n: n % 2 == 0, lst), 1))
[8]

However, I wouldn't consider this anyhow more readable or nicer than the original code you posted. Wrapped in a function it will be much nicer though. And since next only returns one element there is no need for islice anymore:

def find(pred, iterable):
return next(itertools.ifilter(pred, iterable), None)

It returns None if no element was found.

However, you still have the rather slow call of the predicate function every loop. Please consider using a list comprehension or generator expression instead:

>>> next((x for x in lst if x % 2 == 0), None)
8


Related Topics



Leave a reply



Submit