Java 8 Distinct by Property

Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    // Set.add returns true only the first time a given key is added.
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

Then you can write:

persons.stream().filter(distinctByKey(Person::getName))

Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
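Here is a minimal runnable sketch of the helper above. It assumes a hypothetical Person class with getName() and getAge(). On a sequential stream the predicate keeps the first person seen for each name, in encounter order. The key set is a ConcurrentHashMap view, so a key extractor that returns null would throw a NullPointerException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {
    // Hypothetical Person class, for illustration only.
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // Same helper as above: the returned predicate remembers every key it has seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Sequential stream: the first person with each name is kept, in encounter order.
    static List<Person> firstPerName(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(Person::getName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", 30), new Person("Bob", 25),
                new Person("Alice", 41), new Person("Carol", 35));
        for (Person p : firstPerName(persons)) {
            System.out.println(p.getName() + " " + p.getAge());
        }
    }
}
```

The predicate carries state, so create a fresh one for each stream pipeline rather than reusing it.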

(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)

Java 8 Streams - Get distinct integers from object list as an array

You can map each person to their age with map and then call distinct() to keep only the distinct values:

Integer[] ages = persons.stream()
        .map(Person::getAge)
        .distinct()
        .toArray(Integer[]::new);

If you want an int[] instead, use mapToInt, which avoids boxing the ages:

int[] ages = persons.stream()
        .mapToInt(Person::getAge)
        .distinct()
        .toArray();
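The following sketch contrasts the two variants. It assumes a hypothetical Person class with an int getAge(). On an ordered sequential stream, distinct() keeps the first occurrence of each value, so both arrays list the ages in the order they first appear.

```java
import java.util.Arrays;
import java.util.List;

class DistinctAgesDemo {
    // Hypothetical Person class; only the age matters here.
    static class Person {
        private final int age;
        Person(int age) { this.age = age; }
        int getAge() { return age; }
    }

    // Boxed result: map(...) yields Stream<Integer>, so toArray needs an array generator.
    static Integer[] boxedAges(List<Person> persons) {
        return persons.stream()
                .map(Person::getAge)
                .distinct()
                .toArray(Integer[]::new);
    }

    // Primitive result: mapToInt(...) yields IntStream, whose toArray() returns int[].
    static int[] primitiveAges(List<Person> persons) {
        return persons.stream()
                .mapToInt(Person::getAge)
                .distinct()
                .toArray();
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person(30), new Person(25), new Person(30));
        System.out.println(Arrays.toString(boxedAges(persons)));     // [30, 25]
        System.out.println(Arrays.toString(primitiveAges(persons))); // [30, 25]
    }
}
```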

Java 8 stream collect max object and distinct by property

First, group the books by name into a Map<String, List<Book>>. Then, from map.values(), pick the highest-priced book in each group:

List<Book> books = list.stream()
        .collect(Collectors.groupingBy(Book::getName)) // Map<String, List<Book>>
        .values()
        .stream()
        .map(group -> Collections.max(group, Comparator.comparingInt(Book::getCost)))
        .collect(Collectors.toList());
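Here is a runnable version of the grouping approach. It assumes a hypothetical Book class with getName() and an int getCost(). The lambda parameter is named group because map receives a whole List<Book> for each name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class MaxBookByNameDemo {
    // Hypothetical Book class with a name and an integer cost.
    static class Book {
        private final String name;
        private final int cost;
        Book(String name, int cost) { this.name = name; this.cost = cost; }
        String getName() { return name; }
        int getCost() { return cost; }
    }

    // Group by name, then keep the most expensive book of each group.
    static List<Book> maxPerName(List<Book> list) {
        return list.stream()
                .collect(Collectors.groupingBy(Book::getName)) // Map<String, List<Book>>
                .values()
                .stream()
                .map(group -> Collections.max(group, Comparator.comparingInt(Book::getCost)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> list = Arrays.asList(
                new Book("Java", 40), new Book("Java", 55), new Book("Go", 30));
        // groupingBy makes no guarantee about map type, so the result order is unspecified.
        for (Book b : maxPerName(list)) {
            System.out.println(b.getName() + " " + b.getCost());
        }
    }
}
```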

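The other solution, suggested by @Holger, uses Collectors.toMap with a merge function. It is more efficient than collecting each group into a list and then finding the max element, because it keeps only the current most expensive book for each name: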

List<Book> books = list.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toMap(Book::getName, Function.identity(),
                        BinaryOperator.maxBy(Comparator.comparingInt(Book::getCost))),
                m -> new ArrayList<>(m.values())));
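If the result should follow the order in which each book name first appears, the four-argument Collectors.toMap overload accepts a map supplier such as LinkedHashMap::new. This sketch assumes the same hypothetical Book class. It collects into a Map first and then copies the values into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class MaxBookToMapDemo {
    // Hypothetical Book class with a name and an integer cost.
    static class Book {
        private final String name;
        private final int cost;
        Book(String name, int cost) { this.name = name; this.cost = cost; }
        String getName() { return name; }
        int getCost() { return cost; }
    }

    // The merge function keeps the more expensive of two books with the same name;
    // LinkedHashMap::new preserves the order in which names are first seen.
    static List<Book> maxPerNameInOrder(List<Book> list) {
        Map<String, Book> best = list.stream()
                .collect(Collectors.toMap(Book::getName, Function.identity(),
                        BinaryOperator.maxBy(Comparator.comparingInt(Book::getCost)),
                        LinkedHashMap::new));
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<Book> list = Arrays.asList(
                new Book("Java", 40), new Book("Go", 30), new Book("Java", 55));
        for (Book b : maxPerNameInOrder(list)) {
            System.out.println(b.getName() + " " + b.getCost()); // Java 55, then Go 30
        }
    }
}
```

Replacing an existing key's value does not move it in a LinkedHashMap, so "Java" stays first even though its winning book appears last.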

Stream API distinct by name and max by value

You could use the Collectors.groupingBy(classifier, downstream) method.

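The classification function maps elements to some key type K. The downstream collector operates on elements of type T and produces a result of type D. The resulting collector produces a Map<K, D>. Here the downstream collector is maxBy wrapped in collectingAndThen(..., Optional::get), which unwraps the Optional. This is safe because every group contains at least one element.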

Snippet:

fooList.stream().collect(
        Collectors.groupingBy(Foo::getName,
                Collectors.collectingAndThen(
                        Collectors.maxBy(Comparator.comparing(Foo::getValue)),
                        Optional::get
                )
        )
).values().forEach(System.out::println);

Output:

Foo{name='A', value=2}
Foo{name='B', value=1}
Foo{name='C', value=1}
Foo{name='D', value=3}
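The two-argument groupingBy makes no guarantee about the type or iteration order of the returned Map, so the order of the printed values is not part of the contract. This sketch uses the three-argument groupingBy(classifier, mapFactory, downstream) with TreeMap::new to sort the groups by name. Foo here is a hypothetical class with a String name and an int value, because the original definition is not shown.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MaxByNameDemo {
    // Hypothetical Foo class; the original answer does not show its definition.
    static class Foo {
        private final String name;
        private final int value;
        Foo(String name, int value) { this.name = name; this.value = value; }
        String getName() { return name; }
        int getValue() { return value; }
        @Override
        public String toString() {
            return "Foo{name='" + name + "', value=" + value + "}";
        }
    }

    // TreeMap::new sorts the keys; maxBy + Optional::get unwraps each group's maximum.
    static Map<String, Foo> maxPerName(List<Foo> fooList) {
        return fooList.stream().collect(
                Collectors.groupingBy(Foo::getName, TreeMap::new,
                        Collectors.collectingAndThen(
                                Collectors.maxBy(Comparator.comparingInt(Foo::getValue)),
                                Optional::get)));
    }

    public static void main(String[] args) {
        List<Foo> fooList = Arrays.asList(
                new Foo("B", 1), new Foo("A", 1), new Foo("A", 2));
        maxPerName(fooList).values().forEach(System.out::println);
    }
}
```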

Java Lambda Stream Distinct() on arbitrary key?

The distinct operation is a stateful pipeline operation; in this case it's a stateful filter. It's a bit inconvenient to create these yourself, as there's nothing built-in, but a small helper class should do the trick:

/**
 * Stateful filter. T is type of stream element, K is type of extracted key.
 */
static class DistinctByKey<T,K> {
    private final Map<K,Boolean> seen = new ConcurrentHashMap<>();
    private final Function<T,K> keyExtractor;
    public DistinctByKey(Function<T,K> ke) {
        this.keyExtractor = ke;
    }
    // putIfAbsent returns null only when the key was not already present.
    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}

I don't know your domain classes, but I think that, with this helper class, you could do what you want like this:

BigDecimal totalShare = orders.stream()
        .filter(new DistinctByKey<Order,CompanyId>(o -> o.getCompany().getId())::filter)
        .map(Order::getShare)
        .reduce(BigDecimal.ZERO, BigDecimal::add);

Unfortunately, type inference couldn't get far enough inside the expression, so I had to explicitly specify the type arguments for the DistinctByKey class.

This involves more setup than the collectors approach described by Louis Wasserman, but it has the advantage that distinct items pass through immediately instead of being buffered until the collection completes. Space should be the same, since (unavoidably) both approaches end up accumulating all distinct keys extracted from the stream elements.
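This sketch illustrates that distinct items pass through immediately. It applies the DistinctByKey<T,K> helper, given final fields here, to an infinite Stream.iterate sequence keyed by the remainder modulo 3. Each new key is emitted as soon as it is seen, so limit(3) can stop the pipeline. An approach that has to finish collecting before producing results would never terminate on this input.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDistinctDemo {
    // The DistinctByKey<T,K> helper from above, with final fields.
    static class DistinctByKey<T, K> {
        private final Map<K, Boolean> seen = new ConcurrentHashMap<>();
        private final Function<T, K> keyExtractor;
        DistinctByKey(Function<T, K> ke) { this.keyExtractor = ke; }
        boolean filter(T t) {
            return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
        }
    }

    // Stream.iterate is infinite; the filter emits each new key at once,
    // so limit(3) short-circuits after three distinct remainders.
    static List<Integer> firstOfEachRemainder() {
        return Stream.iterate(0, i -> i + 1)
                .filter(new DistinctByKey<Integer, Integer>(i -> i % 3)::filter)
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstOfEachRemainder()); // [0, 1, 2]
    }
}
```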

UPDATE

It's possible to get rid of the K type parameter, since it's not actually used for anything other than being stored in a map, so Object is sufficient:

/**
 * Stateful filter. T is type of stream element.
 */
static class DistinctByKey<T> {
    private final Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    private final Function<T,Object> keyExtractor;
    public DistinctByKey(Function<T,Object> ke) {
        this.keyExtractor = ke;
    }
    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}

BigDecimal totalShare = orders.stream()
        .filter(new DistinctByKey<Order>(o -> o.getCompany().getId())::filter)
        .map(Order::getShare)
        .reduce(BigDecimal.ZERO, BigDecimal::add);

This simplifies things a bit, but I still had to specify the type argument to the constructor. Trying to use diamond or a static factory method doesn't seem to improve things. I think the difficulty is that the compiler can't infer generic type parameters for a constructor or a static method call when either is in the instance expression of a method reference. Oh well.

(Another variation that would probably simplify this is to make DistinctByKey<T> implement Predicate<T> and rename the method to test, which is Predicate's abstract method. This would remove the need for a method reference and would probably improve type inference. However, it's unlikely to be as nice as the solution below.)
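Here is a sketch of that Predicate variation. It assumes the method is named test, the abstract method of java.util.function.Predicate, and it widens the key extractor to Function<? super T, ?>, which is this sketch's own choice. The instance is passed directly to filter, so in this sketch the diamond form compiles: the target type Predicate<? super String> lets the compiler infer T. The length-based key is only an illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyPredicateDemo {
    // Stateful filter that is itself a Predicate, so no method reference is needed.
    static class DistinctByKey<T> implements Predicate<T> {
        private final Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        private final Function<? super T, ?> keyExtractor;

        DistinctByKey(Function<? super T, ?> keyExtractor) {
            this.keyExtractor = keyExtractor;
        }

        // True only for the first element that produces a given key.
        @Override
        public boolean test(T t) {
            return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
        }
    }

    // Keeps the first word of each length.
    static List<String> firstWordPerLength(List<String> words) {
        return words.stream()
                .filter(new DistinctByKey<>(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstWordPerLength(Arrays.asList("a", "bb", "c", "dd", "eee")));
        // [a, bb, eee]
    }
}
```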

UPDATE 2

Can't stop thinking about this. Instead of a helper class, use a higher-order function. We can use captured locals to maintain state, so we don't even need a separate class! Bonus: things are simplified enough that type inference works!

public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

BigDecimal totalShare = orders.stream()
        .filter(distinctByKey(o -> o.getCompany().getId()))
        .map(Order::getShare)
        .reduce(BigDecimal.ZERO, BigDecimal::add);
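Here is a self-contained version of the UPDATE 2 example. It assumes hypothetical Company and Order classes, because the original domain classes are not shown, with a String company id and a BigDecimal share. Only the first order for each company contributes to the total.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

class TotalShareDemo {
    // Hypothetical domain classes; the original answer did not show them.
    static class Company {
        private final String id;
        Company(String id) { this.id = id; }
        String getId() { return id; }
    }

    static class Order {
        private final Company company;
        private final BigDecimal share;
        Order(Company company, BigDecimal share) { this.company = company; this.share = share; }
        Company getCompany() { return company; }
        BigDecimal getShare() { return share; }
    }

    // Higher-order version from UPDATE 2: the map lives in the lambda's closure.
    public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Sums the share of the first order seen for each company id.
    static BigDecimal totalShare(List<Order> orders) {
        return orders.stream()
                .filter(distinctByKey(o -> o.getCompany().getId()))
                .map(Order::getShare)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        Company acme = new Company("acme");
        Company globex = new Company("globex");
        List<Order> orders = Arrays.asList(
                new Order(acme, new BigDecimal("10")),
                new Order(acme, new BigDecimal("20")),   // skipped: acme already seen
                new Order(globex, new BigDecimal("5")));
        System.out.println(totalShare(orders)); // 15
    }
}
```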
