How to Find Duplicate Values in a List and Merge Them

How to find duplicates in a List to merge them

You can collect the elements into a Map keyed by id, merging the children of duplicate ids with the mergeFunction argument of Collectors.toMap. Then map the entries back to the final objects:

private Collection<Foo> mergeDuplicates(Collection<Foo> fooCollection) {
    return fooCollection.stream()
            .collect(Collectors.toMap(Foo::getId, Foo::getChildren, this::mergeChildren))
            .entrySet().stream()
            .map(e -> new Foo(e.getKey(), e.getValue()))
            .collect(Collectors.toCollection(ArrayList::new)); // collect accordingly
}

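with the mergeChildren method implemented in the same class as:

private Collection<String> mergeChildren(Collection<String> foo1Children, Collection<String> foo2Children) {
    // Copy first: calling addAll on foo1Children directly would mutate the
    // original Foo and would fail if its collection is immutable
    Collection<String> merged = new ArrayList<>(foo1Children);
    merged.addAll(foo2Children);
    return merged;
}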

Note: The mergeFunction ((a, b) -> {...}) is invoked only when two elements map to the same id, that is, when a duplicate is found.

How to find duplicate values in a list and merge them

Sort the list then use itertools.groupby:

>>> from itertools import groupby
>>> l = ['a','b','a','b','c','c']
>>> [list(g) for _, g in groupby(sorted(l))]
[['a', 'a'], ['b', 'b'], ['c', 'c']]

EDIT: This is probably not the fastest approach. Sorting takes O(n log n) time in the average case, and not every solution requires it; grouping with a dictionary, for example, needs only a single O(n) pass.
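As a rough sketch of a sort-free alternative, the grouping can be done in one pass with a dictionary. The helper name group_duplicates is made up for illustration, and the approach assumes the list items are hashable:

```python
from collections import defaultdict


def group_duplicates(items):
    """Group equal items without sorting; groups appear in first-seen order."""
    groups = defaultdict(list)
    for item in items:
        # Each item is appended to the bucket keyed by its own value
        groups[item].append(item)
    return list(groups.values())


l = ['a', 'b', 'a', 'b', 'c', 'c']
print(group_duplicates(l))  # [['a', 'a'], ['b', 'b'], ['c', 'c']]
```

Unlike the sorted version, the groups come back in the order each value was first seen, not in sorted order.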

Find duplicate object values in an array and merge them - JAVASCRIPT

I'm not sure if you're looking for pure JavaScript, but if you are, here's one solution. It's a bit heavy on nesting, but it gets the job done.

// Loop through all objects in the array
for (var i = 0; i < jsonData.length; i++) {

    // Loop through all of the objects beyond i
    // Don't increment automatically; we will do this later
    for (var j = i + 1; j < jsonData.length; ) {

        // Check if our x values are a match
        if (jsonData[i].x == jsonData[j].x) {

            // Loop through all of the keys in our matching object
            for (var key in jsonData[j]) {

                // Ensure the key actually belongs to the object
                // This is to avoid any prototype inheritance problems
                if (jsonData[j].hasOwnProperty(key)) {

                    // Copy over the values to the first object
                    // Note this will overwrite any values if the key already exists!
                    jsonData[i][key] = jsonData[j][key];
                }
            }

            // After copying the matching object, delete it from the array
            // By deleting this object, the "next" object in the array moves back one
            // Therefore it will be what j is prior to being incremented
            // This is why we don't automatically increment
            jsonData.splice(j, 1);
        } else {
            // If there's no match, increment to the next object to check
            j++;
        }
    }
}

Note there is no defensive code in this sample; you probably want to add a few checks to make sure the data you have is formatted correctly before passing it along.

Also keep in mind that you might have to decide how to handle instances where two keys overlap but do not match (e.g. two objects both having machine1, but one with the value of 5 and the other with the value of 9). As is, whatever object comes later in the array will take precedence.
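One way to make that precedence decision explicit is to pass the conflict rule in as a function. The sketch below is in Python rather than JavaScript, and the names merge_records and resolve, as well as the sample data, are hypothetical. It builds new objects and leaves the input untouched, which differs from the in-place loop above:

```python
def merge_records(records, key="x", resolve=lambda old, new: new):
    """Merge dicts that share the same value under `key`.

    `resolve(old, new)` picks the surviving value when two records define
    the same field with different values; the default keeps the later one.
    """
    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for record in records:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            if field in target and target[field] != value:
                target[field] = resolve(target[field], value)
            else:
                target[field] = value
    return list(merged.values())


data = [
    {"x": 1, "machine1": 5},
    {"x": 2, "machine2": 7},
    {"x": 1, "machine1": 9, "machine3": 4},
]
print(merge_records(data))                                 # later value wins
print(merge_records(data, resolve=lambda old, new: old))   # earlier value wins
```

With the default rule the merged record for x = 1 ends up with machine1 set to 9; with the second rule it keeps 5.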

Checking Duplicate values and merge them php mysql

Here is a snippet that will work with the data format you posted.

$initialData = $data = [
    [
        'max_start' => '2020-07-02 05:30:00',
        'max_end' => '2020-07-02 06:30:00',
    ],
    [
        'max_start' => '2020-07-02 07:00:00',
        'max_end' => '2020-07-02 07:30:00',
    ],
    [
        'max_start' => '2020-07-02 06:30:00',
        'max_end' => '2020-07-02 07:00:00',
    ],
    [
        'max_start' => '2020-07-02 06:30:00',
        'max_end' => '2020-07-02 07:30:00',
    ]
];

// Order the list chronologically by the "max_start" value, to make comparison easier later
usort($data, function($a, $b){
    return $a['max_start'] <=> $b['max_start'];
});


// Final result will be collected here
$result = [];

// Work with the first list value as long as there is one
while ($currentInterval = array_shift($data)) {

    // Compare with each other value in the list
    foreach ($data as $index => $interval) {

        // Check if intervals start at the same time
        if ($interval['max_start'] == $currentInterval['max_start']) {

            // Merge when needed
            $currentInterval['max_end'] = max($currentInterval['max_end'], $interval['max_end']);

            // Remove the merged interval
            unset($data[$index]);

        }
    }

    // Add to result
    $result[] = $currentInterval;
}

echo 'Initial list: ', PHP_EOL, print_r($initialData, true);
echo 'Merged list: ', PHP_EOL, print_r($result, true);

This snippet has the following output:

Initial list: 
Array
(
[0] => Array
(
[max_start] => 2020-07-02 05:30:00
[max_end] => 2020-07-02 06:30:00
)

[1] => Array
(
[max_start] => 2020-07-02 07:00:00
[max_end] => 2020-07-02 07:30:00
)

[2] => Array
(
[max_start] => 2020-07-02 06:30:00
[max_end] => 2020-07-02 07:00:00
)

[3] => Array
(
[max_start] => 2020-07-02 06:30:00
[max_end] => 2020-07-02 07:30:00
)

)
Merged list:
Array
(
[0] => Array
(
[max_start] => 2020-07-02 05:30:00
[max_end] => 2020-07-02 06:30:00
)

[1] => Array
(
[max_start] => 2020-07-02 06:30:00
[max_end] => 2020-07-02 07:30:00
)

[2] => Array
(
[max_start] => 2020-07-02 07:00:00
[max_end] => 2020-07-02 07:30:00
)

)

Let me know if it fits your needs or if further tweaking is required.


For PHP versions prior to 7.0, replace the usort code with this one:

usort($data, function($a, $b){

    if ($a['max_start'] == $b['max_start']) {
        return 0;
    }

    // Ascending order, matching the <=> version above
    return $a['max_start'] < $b['max_start'] ? -1 : 1;
});

Note that PHP 5.6 reached end of life on 31 December 2018, so using it is no longer recommended.

How to merge 2 List<T> and removing duplicate values from it in C#

Have you had a look at Enumerable.Union?

This method excludes duplicates from the return set. This is different behavior to the Concat method, which returns all the elements in the input sequences including duplicates.

List<int> list1 = new List<int> { 1, 12, 12, 5};
List<int> list2 = new List<int> { 12, 5, 7, 9, 1 };
List<int> ulist = list1.Union(list2).ToList();

// ulist output : 1, 12, 5, 7, 9

Merge duplicate values in a dictionary

Similar to the established solutions you found, you can store a representation of the lists (and lists of lists) as strings, which makes using them as dictionary keys straightforward.

def timesheetMerge(timesheet):
    output = []
    unique_shifts = {}
    for key, val in timesheet.items():
        if str(val) not in unique_shifts.keys():
            unique_shifts[str(val)] = len(output)
            output.append({"weekdays": [int(key)], "time_spans": val})
        else:
            output[unique_shifts[str(val)]]["weekdays"].append(int(key))

    return output
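A quick usage sketch may help here. The sample timesheet below is hypothetical (the original input format is not shown, so weekday numbers as string keys and lists of [start, end] spans are an assumption), and the function is repeated only so the block runs on its own:

```python
def timesheetMerge(timesheet):
    output = []
    unique_shifts = {}  # str(time_spans) -> index of its entry in output
    for key, val in timesheet.items():
        if str(val) not in unique_shifts:
            unique_shifts[str(val)] = len(output)
            output.append({"weekdays": [int(key)], "time_spans": val})
        else:
            output[unique_shifts[str(val)]]["weekdays"].append(int(key))
    return output


# Hypothetical input: weekday number (as a string) -> list of [start, end] spans
timesheet = {
    "1": [["09:00", "17:00"]],
    "2": [["09:00", "17:00"]],
    "3": [["10:00", "14:00"], ["15:00", "18:00"]],
}
merged = timesheetMerge(timesheet)
print(merged)
```

Weekdays 1 and 2 share the same spans, so they end up in a single entry with "weekdays": [1, 2], while weekday 3 keeps its own entry. Because the comparison relies on str(val), two values count as duplicates only if their string representations are identical, including element order.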

