Java 8 Lambda for Selecting Top Salary Employee for Each Department

Java 8 lambda for selecting top salary employee for each department

You can do that with a grouping collector:

Map<String, Employee> topEmployees =
    allEmployees.stream()
                .collect(groupingBy(
                    e -> e.department,
                    collectingAndThen(maxBy(comparingInt(e -> e.salary)), Optional::get) 
                ));

with the static imports

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;

This code creates a Stream of all the employees and groups them by department with Collectors.groupingBy. For all the values classified under the same key, we need to keep only the employee with the maximum salary, so we collect them with Collectors.maxBy, using a comparator built with Comparator.comparingInt that compares salaries. Since maxBy returns an Optional<Employee> (to handle the case where there are no elements to compare), we wrap it in a call to Collectors.collectingAndThen with a finisher that just unwraps the employee: groupingBy only creates a group once at least one element maps to it, so we know the Optional won't be empty here.
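To try the collector chain end to end, here is a minimal self-contained sketch. The Employee class, its fields, and the sample data are hypothetical stand-ins (the original answer does not show them); only the collector chain itself comes from the snippet above.

```java
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class TopSalaryDemo {

    // Hypothetical Employee type; the field names mirror the snippet above.
    static class Employee {
        final String name;
        final String department;
        final int salary;

        Employee(String name, String department, int salary) {
            this.name = name;
            this.department = department;
            this.salary = salary;
        }
    }

    // Groups by department and keeps only the highest-paid employee of each group.
    static Map<String, Employee> topByDepartment(List<Employee> allEmployees) {
        return allEmployees.stream()
                .collect(groupingBy(
                        e -> e.department,
                        collectingAndThen(
                                maxBy(comparingInt((Employee e) -> e.salary)),
                                Optional::get)));
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Employee> staff = Arrays.asList(
                new Employee("Ann", "Sales", 50),
                new Employee("Bob", "Sales", 70),
                new Employee("Cid", "Research", 60));

        topByDepartment(staff)
                .forEach((dept, e) -> System.out.println(dept + " -> " + e.name));
    }
}
```

The explicit (Employee e) parameter type is not strictly required; it is there only to keep type inference obvious to the reader.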

Java stream groupingBy key as max salary employee and value as all employee of department

You can get the required result as follows:

Map<Employee, List<Employee>> result = employees.stream()
         .sorted(Comparator.comparingDouble(Employee::getSalary).reversed())
         .collect(groupingBy(Employee::getDepartment, LinkedHashMap::new, toList())).values().stream()
         .collect(toMap(l -> l.get(0), Function.identity()));

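Because the stream is sorted by salary in descending order before grouping, the first element of each department's list is that department's top earner. There are probably better and more efficient solutions out there, and I would have explored them had I not been on my phone.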
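One possible variation, sketched here under assumptions, skips the global sort: group first, then pick each department's maximum with Collections.max. The Employee class and sample data are hypothetical, and getSalary is assumed to return a double as in the snippet above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopEarnerToDepartment {

    // Hypothetical Employee type with the getters used in the snippet above.
    static class Employee {
        private final String name;
        private final String department;
        private final double salary;

        Employee(String name, String department, double salary) {
            this.name = name;
            this.department = department;
            this.salary = salary;
        }

        String getName() { return name; }
        String getDepartment() { return department; }
        double getSalary() { return salary; }
    }

    // Key: the top earner of a department. Value: everyone in that department.
    static Map<Employee, List<Employee>> byTopEarner(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getDepartment))
                .values().stream()
                .collect(Collectors.toMap(
                        // Groups produced by groupingBy are never empty, so max is safe.
                        dept -> Collections.max(dept, Comparator.comparingDouble(Employee::getSalary)),
                        Function.identity()));
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Employee> employees = Arrays.asList(
                new Employee("Ann", "Sales", 50.0),
                new Employee("Bob", "Sales", 70.0),
                new Employee("Cid", "Research", 60.0));

        byTopEarner(employees).forEach((top, all) ->
                System.out.println(top.getName() + " -> " + all.size() + " employee(s)"));
    }
}
```

This does a linear scan per department instead of sorting every employee, at the cost of no longer preserving a salary-ordered iteration order in the result.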

How to apply sorting and limiting after groupby using Java streams

This part of the code:

e -> e.stream().sorted().limit(limit).collect(Collectors.toList())

returns a List<Employee>, not a List<String>, so either change the type of the result to:

Map<String, List<Employee>> groupByTeachers = 
                // ...The rest of your code

Or, if you expect a Map<String, List<String>>, change the finisher passed to collectingAndThen so that it maps each employee to the expected field, for example getName or getDep:

e -> e.stream().sorted().limit(limit)
        .map(Employee::getDep) // for example getDep
        .collect(Collectors.toList())
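Putting the pieces together, a complete version might look like the sketch below. The Employee class, the descending-salary ordering, and the sample data are assumptions for illustration; the original question's classifier and sort order are not shown here. Note that sorted() with no argument only works if Employee implements Comparable, so this sketch passes an explicit comparator instead.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopNPerGroup {

    // Hypothetical Employee type; getDep and getName follow the names used above.
    static class Employee {
        private final String name;
        private final String dep;
        private final int salary;

        Employee(String name, String dep, int salary) {
            this.name = name;
            this.dep = dep;
            this.salary = salary;
        }

        String getName() { return name; }
        String getDep() { return dep; }
        int getSalary() { return salary; }
    }

    // For each department: sort by salary (highest first), keep `limit` entries, map to names.
    static Map<String, List<String>> topNamesPerDep(List<Employee> employees, int limit) {
        return employees.stream()
                .collect(Collectors.groupingBy(
                        Employee::getDep,
                        Collectors.collectingAndThen(
                                Collectors.toList(),
                                (List<Employee> group) -> group.stream()
                                        .sorted(Comparator.comparingInt(Employee::getSalary).reversed())
                                        .limit(limit)
                                        .map(Employee::getName)
                                        .collect(Collectors.toList()))));
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Employee> employees = Arrays.asList(
                new Employee("Ann", "Sales", 10),
                new Employee("Bob", "Sales", 30),
                new Employee("Cid", "Sales", 20),
                new Employee("Dee", "Research", 5));

        System.out.println(topNamesPerDep(employees, 2));
    }
}
```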

Retrieving list of employees with lowest salary using stream

First, create a TreeMap whose key is the salary. A TreeMap keeps its entries sorted by key. Then grab the first entry, which is the one with the lowest salary, and take the value associated with it. This solution iterates over the list only once. Here's how it looks.

List<Employee> empsWithLowestSalary = employees.stream()
    .collect(Collectors.groupingBy(Employee::getSalary, TreeMap::new, Collectors.toList()))
    .firstEntry()
    .getValue();

TreeMap stores its entries in a red-black tree. Inserting one element into a red-black tree costs O(log n). Since we are inserting n elements, the total time complexity of this solution is O(n log n) in the worst case. The firstEntry() call walks down from the root to the leftmost node, which costs O(log n) and therefore does not change the overall bound. The leftmost node holds the smallest key in the tree, whereas the rightmost node holds the largest.

Following this great answer, I thought of writing a custom collector that serves our purpose. This collector iterates over the list only once and its runtime complexity is O(n), which is asymptotically better than the O(n log n) of the TreeMap approach above. Furthermore, it allows you to write your client code in a single statement. Here's how it looks.

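// Requires java.util.ArrayList, java.util.Comparator, java.util.List and java.util.stream.Collector.
static <T> Collector<T, ?, List<T>> minList(Comparator<? super T> comp) {
    return Collector.of(ArrayList::new, (list, t) -> {
        int c;
        if (list.isEmpty() || (c = comp.compare(t, list.get(0))) == 0)
            list.add(t);
        else if (c < 0) {
            /*
             * We have found a smaller element than what we already have. Clear the list and
             * add this smallest element to it.
             */
            list.clear();
            list.add(t);
        }
    }, (list1, list2) -> {
        // A partial result can be empty in a parallel stream, so guard before calling get(0).
        if (list1.isEmpty())
            return list2;
        if (list2.isEmpty())
            return list1;
        int c = comp.compare(list1.get(0), list2.get(0));
        if (c < 0)
            return list1;
        else if (c > 0)
            return list2;
        else {
            // Both sides hold the same minimum, so keep the elements of both.
            list1.addAll(list2);
            return list1;
        }
    });
}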

And here's your client code.

Collection<Employee> empsWithLowestSalary = employees.stream()
                .collect(minList(Comparator.comparing(Employee::getSalary)));
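For comparison, if a custom collector feels like too much machinery, a plain two-pass version is also O(n): find the minimum salary first, then filter on it. This is a different technique from the single-pass collector above, sketched under the assumption that getSalary returns an int; the Employee class and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;

public class LowestSalaryTwoPass {

    // Hypothetical Employee type; salary is assumed to be an int.
    static class Employee {
        private final String name;
        private final int salary;

        Employee(String name, int salary) {
            this.name = name;
            this.salary = salary;
        }

        String getName() { return name; }
        int getSalary() { return salary; }
    }

    static List<Employee> lowestPaid(List<Employee> employees) {
        // Pass 1: find the minimum salary (empty Optional for an empty list).
        OptionalInt min = employees.stream().mapToInt(Employee::getSalary).min();
        if (!min.isPresent()) {
            return Collections.emptyList();
        }
        int lowest = min.getAsInt();
        // Pass 2: keep every employee who earns exactly that minimum.
        return employees.stream()
                .filter(e -> e.getSalary() == lowest)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Employee> employees = Arrays.asList(
                new Employee("Ann", 30),
                new Employee("Bob", 20),
                new Employee("Cid", 20));

        lowestPaid(employees).forEach(e -> System.out.println(e.getName()));
    }
}
```

The trade-off is that the source must be traversable twice, which holds for a List but not for a one-shot Stream.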

Java 8 get all employee having address start with P

Use anyMatch inside filter instead of mapping (note that startsWith is case-sensitive, so the prefix must match the case of your data):

employees.stream()
         .filter(employee -> employee.getAddresses().stream()
                 .anyMatch(adr -> adr.getCity().startsWith("p")))
         .forEach(System.out::println); // collecting not required to use forEach
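If the result is needed as a list, or the match should ignore case, the same idea can be wrapped in a small helper. The sketch below is one way to do it; the Employee and Address classes, the helper name withCityStartingWith, and the sample data are all hypothetical. It uses String.regionMatches with ignoreCase set to true in place of startsWith.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CityPrefixFilter {

    // Hypothetical types; only the getters used by the snippet above are modelled.
    static class Address {
        private final String city;

        Address(String city) { this.city = city; }

        String getCity() { return city; }
    }

    static class Employee {
        private final String name;
        private final List<Address> addresses;

        Employee(String name, Address... addresses) {
            this.name = name;
            this.addresses = Arrays.asList(addresses);
        }

        String getName() { return name; }
        List<Address> getAddresses() { return addresses; }
    }

    // Keeps employees that have at least one address whose city starts with `prefix`,
    // ignoring case.
    static List<Employee> withCityStartingWith(List<Employee> employees, String prefix) {
        return employees.stream()
                .filter(e -> e.getAddresses().stream()
                        .anyMatch(a -> a.getCity()
                                .regionMatches(true, 0, prefix, 0, prefix.length())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Employee> employees = Arrays.asList(
                new Employee("Ann", new Address("Paris"), new Address("Oslo")),
                new Employee("Bob", new Address("Rome")),
                new Employee("Cid", new Address("pune")));

        withCityStartingWith(employees, "P").forEach(e -> System.out.println(e.getName()));
    }
}
```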

Count employee by department ID & identify top two departments with the most employee IDs

Here is my PySpark code for this. I have assumed the input files are delimited by "|".

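# `spark` is assumed to be an existing SparkSession (for example, in a Databricks notebook).
# Import only what is used; a wildcard import would shadow Python builtins such as sum and max.
from pyspark.sql.functions import count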

emp = spark.read.option("header","true") \
                .option("inferSchema","true") \
                .option("sep","|") \
                .csv("/FileStore/tables/employee.txt")

dept = spark.read.option("header","true") \
                 .option("inferSchema","true") \
                 .option("sep","|") \
                 .option("removeQuotes","true") \
                 .csv("/FileStore/tables/department.txt")

# Employee count by department
empCountByDept = emp.groupBy("deptno") \
                       .agg(count("empno").alias("no_of_employees"))

empCountByDept.show(20,False)

# Top two department names with the most employees 
topTwoDept = empCountByDept.join(dept, empCountByDept.deptno == dept.deptno, "inner") \
                           .orderBy(empCountByDept.no_of_employees.desc()).drop(dept.deptno) \
                           .select("dname","no_of_employees") \
                           .limit(2)
topTwoDept.show(20,False)

Result:

+------+---------------+
|deptno|no_of_employees|
+------+---------------+
|20    |5              |
|10    |3              |
|30    |6              |
+------+---------------+

+----------+---------------+
|dname     |no_of_employees|
+----------+---------------+
|'Sales'   |6              |
|'Research'|5              |
+----------+---------------+
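For readers staying with Java streams, as in the rest of this page, the same two steps (count per department, then take the top two) can be sketched without Spark. This is a swapped-in technique, not a translation of the PySpark job: the Employee class and the helper names are hypothetical, the sample data is made up to reproduce the counts shown above, and the join to department names is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DepartmentCounts {

    // Hypothetical Employee type carrying only the two columns used here.
    static class Employee {
        private final int empNo;
        private final int deptNo;

        Employee(int empNo, int deptNo) {
            this.empNo = empNo;
            this.deptNo = deptNo;
        }

        int getEmpNo() { return empNo; }
        int getDeptNo() { return deptNo; }
    }

    // Employee count by department number.
    static Map<Integer, Long> countByDept(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getDeptNo, Collectors.counting()));
    }

    // The n departments with the most employees, largest first.
    static List<Map.Entry<Integer, Long>> topDepartments(Map<Integer, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    // Made-up data: 3 employees in dept 10, 5 in dept 20, 6 in dept 30.
    static List<Employee> sampleEmployees() {
        int[][] deptSizes = {{10, 3}, {20, 5}, {30, 6}};
        List<Employee> employees = new ArrayList<>();
        int empNo = 1;
        for (int[] dept : deptSizes) {
            for (int i = 0; i < dept[1]; i++) {
                employees.add(new Employee(empNo++, dept[0]));
            }
        }
        return employees;
    }

    public static void main(String[] args) {
        Map<Integer, Long> counts = countByDept(sampleEmployees());
        topDepartments(counts, 2).forEach(e ->
                System.out.println("deptno " + e.getKey() + ": " + e.getValue()));
    }
}
```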
