Find first element by predicate
No, filter does not scan the whole stream. It's an intermediate operation, which returns a lazy stream (actually all intermediate operations return a lazy stream). To convince you, you can simply do the following test:
List<Integer> list = Arrays.asList(1, 10, 3, 7, 5);
int a = list.stream()
.peek(num -> System.out.println("will filter " + num))
.filter(x -> x > 5)
.findFirst()
.get();
System.out.println(a);
Which outputs:
will filter 1
will filter 10
10
You see that only the two first elements of the stream are actually processed.
So you can go with your approach which is perfectly fine.
Find first element in a sequence that matches a predicate
To find the first element in a sequence seq
that matches a predicate
:
next(x for x in seq if predicate(x))
Or simply:
Python 2:
next(itertools.ifilter(predicate, seq))
Python 3:
next(filter(predicate, seq))
These will raise a StopIteration
exception if the predicate does not match for any element.
To return None
if there is no such element:
next((x for x in seq if predicate(x)), None)
Or:
next(filter(predicate, seq), None)
Is there a way to find the first element to match a Predicate?
It does exactly what you want. It doesn't run over all the elements. The filter
method creates a stream and on top of it findFirst
creates another stream.
So when you try to get the element of the stream that was created after findFirst()
you'll get only the first one that matches the predicate.
Best way to check that is to add a print line or something like that inside the predicate.
Create a stream of integers for example from 0 to 10 and create a predicate that prints the number and then checks if it's divided by 3. You'll get this printed out: 0, 1, 2, 3 and that's it.
I wrote a question + answer in the past to explain in more details how it works: Understanding java 8 stream's filter method
Fetch first element of stream matching the criteria
This might be what you are looking for:
yourStream
.filter(/* your criteria */)
.findFirst()
.get();
And better, if there's a possibility of matching no element, in which case get()
will throw a NPE. So use:
yourStream
.filter(/* your criteria */)
.findFirst()
.orElse(null); /* You could also create a default object here */
An example:public static void main(String[] args) {
class Stop {
private final String stationName;
private final int passengerCount;
Stop(final String stationName, final int passengerCount) {
this.stationName = stationName;
this.passengerCount = passengerCount;
}
}
List<Stop> stops = new LinkedList<>();
stops.add(new Stop("Station1", 250));
stops.add(new Stop("Station2", 275));
stops.add(new Stop("Station3", 390));
stops.add(new Stop("Station2", 210));
stops.add(new Stop("Station1", 190));
Stop firstStopAtStation1 = stops.stream()
.filter(e -> e.stationName.equals("Station1"))
.findFirst()
.orElse(null);
System.out.printf("At the first stop at Station1 there were %d passengers in the train.", firstStopAtStation1.passengerCount);
}
Output is:
At the first stop at Station1 there were 250 passengers in the train.
How do these four methods for finding the first element of a vector that matches a predicate compare?
Despite your request, I guess that a lot is opinion based; nonetheless, here is my take.
It really depends on the use case. Even your question might hide ambiguity; here I see two different tasks:
- find (the position of) the first element of a
list
that obeys a condition - find (the position of) the first TRUE value inside a
logical
vector.
Of course, the second task solves also the first, but with the cost of applying the predicate for each element of the list.
A lot has to do with vectorization in R: for many tasks, even if "algorithmically" wrong, it's more convenient to evaluate the predicate for each element of the list and then find the TRUE
position. This is certainly the case of your examples: there is no doubt that, given how isFour
and the other are defined that match
is to prefer: we don't mind to check each element of data
since the operation is very fast, even if one could stop before the end to get the answer. This because, if we "devectorize" your function, on average we are going to slow things a lot since subsetting and checking a single element is way slower. Consider that here I'm using list not in the R-meaning (list
object), but just as a collection of values.
On the other hand, Position
is thought to be used when your data is list
and/or the f
function is not vectorized and very expensive. Imagine for instance that f
consists in training a machine learning model, evaluate it against some data and grabbing some performance statistics. We have a list of algorithms we want to train and we want to stop when we reach a nice performance. In this case, we don't want to train every possible model in the list (super expensive), but stop as soon as we can. So, we are going to use Position
(see also its source code to understand why).
Regarding the two tasks that I outlined at the beginning, all your solutions deal with the second task, while only Position
solve exactly only the first.
So, in general:
- if the predicate is vectorized and efficient, go with
match
; - if the predicate is very expensive, go with
Position
.
Don't see much of a reason to use the other two solutions in any case: which
and which.max
are used mainly for other tasks and not to determine the position of a TRUE value (even if they can, of course).
Just to outline better the differences between the match
solution and the Position
one, here is a recap.
- For
match
theisFour
function is applied to each element of the input and only aftermatch
actually acts. Of course, for the specific task of the example a better way ismatch(4, data)
, sincematch
will stop internally as soon as a 4 is found. But the important point is thatisFour
is applied to the whole input in your implementation. - For
Position
instead, you pass the function, not the result of its application. Then, the function is applied element by element and when the condition is met it exits, without necessarily processing the whole input.
Now it should be clear what's to prefer: it depends on the cost of "devectorize" the function against the gain of not processing the whole input.
Use Functor / Predicate to find the first element smaller than its predecessor in vector
You can use std::adjacent_find()
for that.
auto iter = std::adjacent_find(begin(v), end(v), [](const auto& a, const auto& b) { return a < b; });
Racket function to find the first element in a list that matches a predicate
You're correct in your assumption that (filter f lst)
will traverse the whole input list (and create a new list on top of that!) before passing the result to first
. Your second version is more efficient, and will only traverse the list until the element is found.
But there's an easier solution in Racket, there's a built-in procedure that does exactly what you want: behold findf
!
(findf even? '(1 3 5 2 4))
=> 2
(findf even? '(1 3 5 7 9))
=> #f
Filter first element that match predicate with multiple checks
Here's a solution which traverses your list only once, giving you the first red and blue toys at the end. We can filter out all the other irrelevent colors and then create a map whose key is whether the toy is red and the value is the first toy matching the given criteria. Here's how it looks.
Map<Boolean, Toy> firstRedAndBlueToysMap = toyList.stream()
.filter(t -> t.isBlue() || t.isRed())
.collect(Collectors.toMap(Toy::isRed, t -> t, (a, b) -> a));
Toy firstRedToy = firstRedAndBlueToysMap.get(true);
Toy firstBlueToy = firstRedAndBlueToysMap.get(false);
And here's a one step approach to solve your problem.
RedBlueExtravaganza firstRedAndBlueToyPair = toyList.stream()
.filter(t -> t.isBlue() || t.isRed())
.collect(Collectors.collectingAndThen(Collectors.toMap(Toy::isRed,
t -> t, (a, b) -> a),
m -> new RedBlueExtravaganza(m.get(true), m.get(false))));
P.S. For this to work you need to have the following constructor in your RedBlueExtravaganza
class contrary to the one you have provided above.
public RedBlueExtravaganza(Toy rt, Toy bt) {
if (!(rt instanceof RedToy) || !(bt instanceof BlueToy))
throw new IllegalArgumentException();
// remainder omitted.
}
How to find first element in a sequence that matches a predicate in Python?
You can combine ifilter
and islice
to get just the first matching element.
>>> list(itertools.islice(itertools.ifilter(lambda n: n % 2 == 0, lst), 1))
[8]
However, I wouldn't consider this anyhow more readable or nicer than the original code you posted. Wrapped in a function it will be much nicer though. And since next
only returns one element there is no need for islice
anymore:
def find(pred, iterable):
return next(itertools.ifilter(pred, iterable), None)
It returns None
if no element was found.
However, you still have the rather slow call of the predicate function every loop. Please consider using a list comprehension or generator expression instead:
>>> next((x for x in lst if x % 2 == 0), None)
8
Related Topics
Printing a Jframe and Its Components
How to Set the Style of List View Cells Based on a Condition in Javafx
How to Disable Sslv3 in Android for Httpsurlconnection
Java.Lang.Runtimeexception: Takepicture Failed
Android - Activity Constructor VS Oncreate
Inside Onclicklistener I Cannot Access a Lot of Things - How to Approach
Jcenter Deprecation; Impact on Gradle and Android
Problem Unmarshalling Parcelables
Navigation Drawer Closes on Click
Mapping Manytomany with Composite Primary Key and Annotation:
What's the Difference Between Session.Persist() and Session.Save() in Hibernate
How Apply CSS on a <H:Inputtext>
Android: Switch Camera When Button Clicked