Ignore Duplicates When Producing Map Using Streams

Ignore duplicates when producing map using streams

This is possible using the mergeFunction parameter of Collectors.toMap(keyMapper, valueMapper, mergeFunction):

Map<String, String> phoneBook = people.stream()
        .collect(Collectors.toMap(
                Person::getName,
                Person::getAddress,
                (address1, address2) -> {
                    System.out.println("duplicate key found!");
                    return address1;
                }));

mergeFunction is a function that operates on two values associated with the same key. address1 corresponds to the first address that was encountered when collecting elements and address2 corresponds to the second: this lambda simply keeps the first address and ignores the second.

Fixing the duplicate key error when collecting to a Map using streams in Java 8

You need to pass a merge function to Collectors.toMap(), which handles values having the same key:

Map<BgwContract, List<Fee>> bgwContractFeeMap = bgwContractList
        .stream()
        .filter(bgwContract -> !bgwContract.getStatus().equals(BgwContractStatus.CLOSED))
        .filter(bgwContract -> availableIbans.contains(bgwContract.getFeeAccount()))
        .collect(Collectors.toMap(
                Function.identity(),
                bgwContractFeeService::getContractMonthlyFees,
                (l1, l2) -> {
                    l1.addAll(l2);
                    return l1;
                }));

In this case, the elements of two value lists having the same key will be concatenated into a single list.
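As an aside, here is a minimal, self-contained sketch of the same list-merging idea; the Map.entry pairs are hypothetical stand-ins for the contract and fee objects above:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListMergeDemo {
    public static void main(String[] args) {
        // duplicate key "a": its values end up concatenated into one list
        Map<String, List<Integer>> merged = Stream.of(
                        Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> new ArrayList<>(List.of(e.getValue())), // mutable single-element list
                        (l1, l2) -> { l1.addAll(l2); return l1; })); // merge: append the second list
        System.out.println(merged); // {a=[1, 3], b=[2]} (map order may vary)
    }
}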

How to remove Keys that would cause Collisions before executing Collectors.toMap()

I would like to remove any values from that stream beforehand.

As @JimGarrison has pointed out, preprocessing the data doesn't make sense.

You can't know in advance whether a name is unique or not until the whole data set has been processed.

Another thing to consider: inside the stream pipeline (before the collector) you have no knowledge of what data has been encountered previously, because the results of intermediate operations must not depend on any state.

If you are thinking of streams as acting like a sequence of loops, and therefore assuming it's possible to preprocess stream elements before collecting them, that's not correct. Elements of the stream pipeline are processed lazily, one at a time; each operation in the pipeline is applied to a single element, and only if it's needed (that's what laziness means).
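To see this in action, here is a minimal sketch (with hypothetical names) demonstrating that elements are pulled through the pipeline one at a time and that short-circuiting stops further processing:

import java.util.stream.Stream;

public class LazinessDemo {
    public static void main(String[] args) {
        Stream.of("Alice", "Bob", "Carol")
                .peek(name -> System.out.println("encountered: " + name)) // runs once per pulled element
                .filter(name -> name.startsWith("B"))
                .findFirst() // short-circuits as soon as a match is found
                .ifPresent(name -> System.out.println("found: " + name));
        // Prints: encountered: Alice, encountered: Bob, found: Bob
        // "Carol" is never processed, because the pipeline stopped early.
    }
}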

For more information, have a look at the official streams tutorial and the java.util.stream API documentation.

Implementations

You can segregate unique values and duplicates in a single stream statement by utilizing Collectors.teeing() and a custom object that will contain separate collections of duplicated and unique entries of the phone book.

Since the primary function of this object is only to carry data, I've implemented it as a Java 16 record.

public record FilteredPhoneBook(Map<String, String> uniquePersonsAddressByName,
                                List<String> duplicatedNames) {}

Collector teeing() expects three arguments: two collectors and a function that merges the results produced by both collectors.

The map generated by groupingBy() in conjunction with counting() is meant to determine the duplicated names.

Since there's no point in preprocessing the data, toMap(), used as the second collector, will create a map containing all the names.

When both collectors hand their results to the merger function, it takes care of removing the duplicates.

public static FilteredPhoneBook getFilteredPhoneBook(Collection<Person> people) {
    return people.stream()
            .collect(Collectors.teeing(
                    Collectors.groupingBy(Person::getName, Collectors.counting()), // intermediate Map<String, Long>
                    Collectors.toMap( // intermediate Map<String, String>
                            Person::getName,
                            Person::getAddress,
                            (left, right) -> left),
                    (Map<String, Long> countByName, Map<String, String> addressByName) -> {
                        countByName.values().removeIf(count -> count == 1); // removing unique names
                        addressByName.keySet().removeAll(countByName.keySet()); // removing all duplicates

                        return new FilteredPhoneBook(addressByName, new ArrayList<>(countByName.keySet()));
                    }
            ));
}

Another way to address this problem is to utilize a Map<String, Boolean> as the means of discovering duplicates, as @Holger has suggested.

The first collector will be written using toMap(). It will associate true with a key that has been encountered only once, and its mergeFunction will assign false if at least one duplicate is found.

The rest of the logic remains the same.

public static FilteredPhoneBook getFilteredPhoneBook(Collection<Person> people) {
    return people.stream()
            .collect(Collectors.teeing(
                    Collectors.toMap( // intermediate Map<String, Boolean>
                            Person::getName,
                            person -> true, // not proved to be a duplicate, initially considered unique
                            (left, right) -> false), // is a duplicate
                    Collectors.toMap( // intermediate Map<String, String>
                            Person::getName,
                            Person::getAddress,
                            (left, right) -> left),
                    (Map<String, Boolean> isUniqueByName, Map<String, String> addressByName) -> {
                        isUniqueByName.values().removeIf(Boolean::booleanValue); // removing unique names
                        addressByName.keySet().removeAll(isUniqueByName.keySet()); // removing all duplicates

                        return new FilteredPhoneBook(addressByName, new ArrayList<>(isUniqueByName.keySet()));
                    }
            ));
}

main() - demo

public static void main(String[] args) {
    List<Person> people = List.of(
            new Person("Alise", "address1"),
            new Person("Bob", "address2"),
            new Person("Bob", "address3"),
            new Person("Carol", "address4"),
            new Person("Bob", "address5")
    );

    FilteredPhoneBook filteredPhoneBook = getFilteredPhoneBook(people);

    System.out.println("Unique entries:");
    filteredPhoneBook.uniquePersonsAddressByName().forEach((k, v) -> System.out.println(k + " : " + v));
    System.out.println("\nDuplicates:");
    filteredPhoneBook.duplicatedNames().forEach(System.out::println);
}

Output

Unique entries:
Alise : address1
Carol : address4

Duplicates:
Bob

Java 8 List to Map with stream (avoid duplicates)

Map<String, Adegae> adegaeMap = adegaes.stream()
        .collect(Collectors.toMap(adegae -> adegae.getId().getAecode().trim(),
                Function.identity(), (e1, e2) -> e1));
This may help in your case.

How to handle the exception for duplicate keys in a map with a lambda

You can use the overloaded version of Collectors.toMap which takes a third parameter, the mergeFunction. From the Java docs:

If the mapped keys contains duplicates (according to
Object.equals(Object)), the value mapping function is applied to each
equal element, and the results are merged using the provided merging
function.

Collector<T, ?, Map<K, U>> toMap(Function<? super T, ? extends K> keyMapper,
                                 Function<? super T, ? extends U> valueMapper,
                                 BinaryOperator<U> mergeFunction)

The third parameter, a BinaryOperator, resolves the merge conflict when there is a duplicate key:

Map<Object, Object> personMap = person.stream().limit(5)
        .collect(Collectors.toMap(Person::getName,
                Person::getAge,
                (age1, age2) -> age2));

In the above code, the last parameter is the BinaryOperator, which takes the second value when there is a duplicate key and ignores the first one.

For example, in your data there are two duplicates, new Person("Dickens","Charles",60) and again new Person("Charles","Dickens",60), so when the Map is created from the Person stream there would be a merge error because the key is the same for the two objects. Supplying the third parameter, the mergeFunction, tells it how to resolve that conflict.

In my sample code it will take the second value when two keys have the same name.

If the data had been new Person("Charles","Dickens",60) and new Person("Charles","Dickens",61), the key Charles would be the same, but with my code the second value of 61 would be kept and the first value of 60 discarded in the final Map.
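For illustration, a minimal sketch of that last scenario, assuming a hypothetical Person class with getName() and getAge() accessors:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateAgeDemo {
    // hypothetical Person, mirroring the fields used in the answer above
    record Person(String name, String surname, int age) {
        String getName() { return name; }
        int getAge() { return age; }
    }

    public static void main(String[] args) {
        List<Person> person = List.of(
                new Person("Charles", "Dickens", 60),
                new Person("Charles", "Dickens", 61));

        Map<String, Integer> personMap = person.stream()
                .collect(Collectors.toMap(Person::getName, Person::getAge,
                        (age1, age2) -> age2)); // on a duplicate key, keep the later value

        System.out.println(personMap); // {Charles=61}
    }
}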

Java-Stream, toMap with duplicate keys

Use the other groupingBy overload.

paymentController.findPaymentsByIds(pmtIds)
        .stream()
        .collect(groupingBy(Payment::getAbcId,
                mapping(Payment::getPaymentId, toList())));
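The unqualified groupingBy, mapping, and toList above assume static imports from java.util.stream.Collectors. Here is a minimal sketch with a hypothetical Payment record standing in for the real class:

import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

public class GroupingDemo {
    // hypothetical stand-in for the Payment class used above
    record Payment(String abcId, String paymentId) {
        String getAbcId() { return abcId; }
        String getPaymentId() { return paymentId; }
    }

    public static void main(String[] args) {
        List<Payment> payments = List.of(
                new Payment("A", "p1"), new Payment("A", "p2"), new Payment("B", "p3"));

        // duplicate keys are gathered into lists instead of triggering a merge conflict
        Map<String, List<String>> byAbcId = payments.stream()
                .collect(groupingBy(Payment::getAbcId,
                        mapping(Payment::getPaymentId, toList())));

        System.out.println(byAbcId); // {A=[p1, p2], B=[p3]} (map order may vary)
    }
}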

Handle duplicate keys while mapping

Add a merge function. For example:

Map<String, List<Fee>> feeAccountMap = ContractList
        .stream()
        .filter(o -> !o.getStatus().equals(ContractStatus.CLOSED))
        .collect(Collectors.toMap(
                o -> o.getFeeAccount(),
                o -> {
                    List<Fee> monthlyFees;
                    try {
                        monthlyFees = contractFeeService.getContractMonthlyFees(o);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                    return monthlyFees;
                },
                (value1, value2) -> value1));

Since the value of your Map seems to be a function of the key, you can simply return one of the values when two values share the same key.

This is assuming that if two elements of ContractList return the same String for getFeeAccount(), they are equal to each other.

Collect key values from array to map without duplicates

You are misunderstanding the final argument of toMap (the merge operator). When it finds a duplicate key, it hands the current value in the map and the new value with the same key to the merge operator, which produces the single value to store.

For example, if you want to store just the first value found, use (s1, s2) -> s1. If you want to comma-separate them, use (s1, s2) -> s1 + ", " + s2.
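A minimal sketch (with hypothetical key=value strings) showing both strategies side by side:

import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeOperatorDemo {
    public static void main(String[] args) {
        Map<String, String> keepFirst = Stream.of("a=1", "b=2", "a=3")
                .collect(Collectors.toMap(s -> s.split("=")[0], s -> s.split("=")[1],
                        (s1, s2) -> s1)); // keep the first value seen
        Map<String, String> joined = Stream.of("a=1", "b=2", "a=3")
                .collect(Collectors.toMap(s -> s.split("=")[0], s -> s.split("=")[1],
                        (s1, s2) -> s1 + ", " + s2)); // comma-separate the values

        System.out.println(keepFirst); // {a=1, b=2}
        System.out.println(joined);    // {a=1, 3, b=2} (map order may vary)
    }
}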

Java 8 API streams. Need to resolve duplicate keys issue

You can try this. I added lambdas to make it a little cleaner to read. Basically it uses the merge function of toMap to copy the lists from the newly created instance into the already existing one. Here are the mods I made to your class:

  • the constructor puts the initial values into the lists.
  • added a copy method that merges the lists from another DateAndTimeInfo instance into this one.
  • added a toString method.
String[] lines = {
        "LSR2019-07-12_12:07:21.554",
        "KMH2019-07-12_12:09:44.291",
        "KMH2019-07-12_12:09:44.292",
        "RGH2019-07-12_12:29:28.352",
        "RGH2019-07-12_12:33:08.603",
        "RGH2019-07-12_12:33:08.604"};

Function<String, LocalTime> toLT = str -> LocalTime
        .from(DateTimeFormatter.ofPattern("HH:mm:ss.SSS")
                .parse(str.substring(3).split("_")[1]));

Function<String, LocalDate> toLD = str -> LocalDate.parse(
        str.substring(3).split("_")[0],
        DateTimeFormatter.ofPattern("yyyy-MM-dd"));

Map<String, DateAndTimeInfo> map = Arrays.stream(lines) // an array must be turned into a stream first
        .collect(Collectors.toMap(
                string -> string.substring(0, 3), // key: the three-letter prefix
                string -> new DateAndTimeInfo(toLT.apply(string), toLD.apply(string)),
                (dti1, dti2) -> dti1.copy(dti2))); // merge: fold the new entry's lists into the existing one

class DateAndTimeInfo {
    private List<LocalTime> localTime = new ArrayList<>();
    private List<LocalDate> localDate = new ArrayList<>();

    public DateAndTimeInfo(LocalTime lt, LocalDate ld) {
        localTime.add(lt);
        localDate.add(ld);
    }

    public DateAndTimeInfo copy(DateAndTimeInfo dti) {
        this.localTime.addAll(dti.localTime);
        this.localDate.addAll(dti.localDate);
        return this;
    }

    @Override
    public String toString() {
        return localTime.toString() + "\n " + localDate.toString();
    }
}

For the given test data, it prints:

RGH=[12:29:28.352, 12:33:08.603, 12:33:08.604]
[2019-07-12, 2019-07-12, 2019-07-12]
KMH=[12:09:44.291, 12:09:44.292]
[2019-07-12, 2019-07-12]
LSR=[12:07:21.554]
[2019-07-12]

Note: did you consider creating a map like Map<String, List<DateAndTimeInfo>>, storing just the date and time in each class as fields? You could get them with getters, and it would be trivial to implement. The value for each key would then be a list of DateAndTimeInfo objects.

Map<String, List<DateAndTimeInfo>> map = Arrays.stream(lines)
        .collect(Collectors.groupingBy(str -> str.substring(0, 3),
                Collectors.mapping(str -> new DateAndTimeInfo(toLT.apply(str), toLD.apply(str)),
                        Collectors.toList())));

