What's the Difference Between map() and flatMap() Methods in Java 8

What's the difference between map() and flatMap() methods in Java 8?

Both map and flatMap can be applied to a Stream<T> and they both return a Stream<R>. The difference is that the map operation produces exactly one output value for each input value, whereas the flatMap operation produces an arbitrary number (zero or more) of values for each input value.

This is reflected in the arguments to each operation.

The map operation takes a Function, which is called for each value in the input stream and produces one result value, which is sent to the output stream.

The flatMap operation takes a function that conceptually wants to consume one value and produce an arbitrary number of values. However, in Java, it's cumbersome for a method to return an arbitrary number of values, since methods can return only zero or one value. One could imagine an API where the mapper function for flatMap takes a value and returns an array or a List of values, which are then sent to the output. Given that this is the streams library, a particularly apt way to represent an arbitrary number of return values is for the mapper function itself to return a stream! The values from the stream returned by the mapper are drained from the stream and are passed to the output stream. The "clumps" of values returned by each call to the mapper function are not distinguished at all in the output stream, thus the output is said to have been "flattened."

Typical use is for the mapper function of flatMap to return Stream.empty() if it wants to send zero values, or something like Stream.of(a, b, c) if it wants to return several values. But of course any stream can be returned.
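The contrast described above can be sketched as follows. This is a minimal illustration, not code from the original answer; the class and method names (StreamMapVsFlatMap, squares, evensTwice) are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamMapVsFlatMap {

    // map: exactly one output value per input value.
    static List<Integer> squares(List<Integer> input) {
        return input.stream()
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    // flatMap: the mapper returns a stream holding zero or more values.
    // Odd numbers contribute nothing; even numbers contribute two values.
    static List<Integer> evensTwice(List<Integer> input) {
        return input.stream()
                .flatMap(n -> n % 2 == 0 ? Stream.of(n, n) : Stream.<Integer>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(squares(numbers));    // [1, 4, 9, 16]
        System.out.println(evensTwice(numbers)); // [2, 2, 4, 4]
    }
}
```

The map result always has as many elements as the input, while the flatMap result can be shorter or longer.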

What is the difference between Optional.flatMap and Optional.map?

Use map if the function returns the object you need or flatMap if the function returns an Optional. For example:

import java.util.Optional;

public class Test {
    public static void main(String[] args) {
        Optional<String> s = Optional.of("input");
        // map wraps the plain String result in an Optional
        System.out.println(s.map(Test::getOutput));
        // flatMap uses the Optional returned by the function as-is
        System.out.println(s.flatMap(Test::getOutputOpt));
    }

    static String getOutput(String input) {
        return input == null ? null : "output for " + input;
    }

    static Optional<String> getOutputOpt(String input) {
        return input == null ? Optional.empty() : Optional.of("output for " + input);
    }
}

Both print statements print the same thing: Optional[output for input].
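The two methods diverge when map is handed a function that itself returns an Optional. Here is a minimal sketch of that case; the lookup helper and the class name are hypothetical and not part of the original answer.

```java
import java.util.Optional;

class OptionalNesting {

    // Hypothetical lookup that may or may not produce a value.
    static Optional<String> lookup(String key) {
        return key.isEmpty() ? Optional.empty() : Optional.of("value for " + key);
    }

    public static void main(String[] args) {
        Optional<String> key = Optional.of("k");

        // map wraps whatever the function returns, so the Optional gets nested.
        Optional<Optional<String>> nested = key.map(OptionalNesting::lookup);

        // flatMap uses the Optional returned by the function directly.
        Optional<String> flat = key.flatMap(OptionalNesting::lookup);

        System.out.println(nested); // Optional[Optional[value for k]]
        System.out.println(flat);   // Optional[value for k]
    }
}
```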

map vs flatMap in Reactor

  • map is for synchronous, non-blocking, 1-to-1 transformations
  • flatMap is for asynchronous (non-blocking) 1-to-N transformations

The difference is visible in the method signature:

  • map takes a Function<T, U> and returns a Flux<U>
  • flatMap takes a Function<T, Publisher<V>> and returns a Flux<V>

That's the major hint: you can pass a Function<T, Publisher<V>> to a map, but it wouldn't know what to do with the Publishers, and that would result in a Flux<Publisher<V>>, a sequence of inert publishers.

On the other hand, flatMap expects a Publisher<V> for each T. It knows what to do with it: subscribe to it and propagate its elements in the output sequence. As a result, the return type is Flux<V>: flatMap will flatten each inner Publisher<V> into the output sequence of all the Vs.

About the 1-N aspect:

For each <T> input element, flatMap maps it to a Publisher<V>. In some cases (e.g. an HTTP request), that publisher will emit only one item, in which case we're pretty close to an async map.

But that's the degenerate case. The generic case is that a Publisher can emit multiple elements, and flatMap works just as well.

For example, imagine you have a reactive database and you flatMap from a sequence of user IDs, with a request that returns each user's set of Badge objects. You end up with a single Flux<Badge> containing all the badges of all these users.

Is map really synchronous and non-blocking?

Yes: it is synchronous in the way the operator applies it (a simple method call, and then the operator emits the result) and non-blocking in the sense that the function itself shouldn't block the operator calling it. In other words, it shouldn't introduce latency. That's because a Flux is still asynchronous as a whole. If the function blocks mid-sequence, it will impact the rest of the Flux processing, or even other Flux instances.

If your map function is blocking or introduces latency but cannot be converted to return a Publisher, consider publishOn/subscribeOn to offload that blocking work to a separate thread.

What is the difference between map and flatMap and a good use case for each?

Here is an example of the difference, as a spark-shell session:

First, some data - two lines of text:

val rdd = sc.parallelize(Seq("Roses are red", "Violets are blue"))  // lines

rdd.collect

res0: Array[String] = Array("Roses are red", "Violets are blue")

Now, map transforms an RDD of length N into another RDD of length N.

For example, it maps from two lines into two line-lengths:

rdd.map(_.length).collect

res1: Array[Int] = Array(13, 16)

But flatMap (loosely speaking) transforms an RDD of length N into a collection of N collections, then flattens these into a single RDD of results.

rdd.flatMap(_.split(" ")).collect

res2: Array[String] = Array("Roses", "are", "red", "Violets", "are", "blue")

We have multiple words per line, and multiple lines, but we end up with a single output array of words.

Just to illustrate that, flatMapping from a collection of lines to a collection of words looks like:

["aa bb cc", "", "dd"] => [["aa","bb","cc"],[],["dd"]] => ["aa","bb","cc","dd"]

The input and output RDDs will therefore typically be of different sizes for flatMap.

If we had tried to use map with our split function, we'd have ended up with nested structures (an RDD of arrays of words, with type RDD[Array[String]]) because we have to have exactly one result per input:

rdd.map(_.split(" ")).collect

res3: Array[Array[String]] = Array(
Array(Roses, are, red),
Array(Violets, are, blue)
)
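For readers following along in Java rather than Spark, the same lines-to-words contrast can be sketched with java.util.stream. This is an analogue under stated assumptions, not Spark code: the class and method names are made up, and the empty-line check is needed because "".split(" ") in Java yields one empty string rather than an empty array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LinesToWords {

    // map keeps one result per line, so the words stay grouped per line.
    static List<List<String>> wordsPerLine(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(line.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap merges the per-line words into one flat list.
    // An empty line is mapped to an empty stream, contributing no words.
    static List<String> allWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.isEmpty()
                        ? Stream.<String>empty()
                        : Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("aa bb cc", "", "dd");
        System.out.println(wordsPerLine(lines).size()); // 3
        System.out.println(allWords(lines));            // [aa, bb, cc, dd]
    }
}
```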

Finally, one useful special case is mapping with a function which might not return an answer, and so returns an Option. We can use flatMap to filter out the elements that return None and extract the values from those that return a Some:

val rdd = sc.parallelize(Seq(1,2,3,4))

def myfn(x: Int): Option[Int] = if (x <= 2) Some(x * 10) else None

rdd.flatMap(myfn).collect

res4: Array[Int] = Array(10,20)

(noting here that an Option behaves rather like a list that has either one element, or zero elements)

FlatMap vs Filter, Map Java

I, along with the Java language architects, agree with you that Stream#flatMap for a Stream<Optional<T>> isn't readable in Java 8, which is why they introduced Optional#stream in Java 9.

Using this, your code becomes much more readable:

.flatMap(Optional::stream)
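As a rough side-by-side, here is a sketch of the Java 9 form next to the Java 8 filter-then-map form it replaces. The class and method names are hypothetical, and the first method needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class OptionalStreamDemo {

    // Java 9+: Optional::stream turns each Optional into a stream
    // of zero or one element, which flatMap then flattens.
    static <T> List<T> presentValues(List<Optional<T>> optionals) {
        return optionals.stream()
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    // Java 8 equivalent: keep the present Optionals, then unwrap them.
    static <T> List<T> presentValuesJava8(List<Optional<T>> optionals) {
        return optionals.stream()
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Optional<String>> input =
                Arrays.asList(Optional.of("a"), Optional.empty(), Optional.of("b"));
        System.out.println(presentValues(input));      // [a, b]
        System.out.println(presentValuesJava8(input)); // [a, b]
    }
}
```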

Java 8: Difference between map and flatMap for null-checking style

The only difference is that one will return Optional<?> and the other will return Optional<Optional<?>> (replace ? with the return type of processing()). Since you're discarding the return value, there's no difference.

But it's best to avoid the mapping functions, which by convention should avoid side-effects, and instead use the more idiomatic ifPresent():

person.ifPresent(p -> car.ifPresent(c -> processing(p, c)));

This also works if processing() has a void return type, which isn't the case with a mapping function.
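To make the nested ifPresent() call concrete, here is a small self-contained sketch. The original question's Person and Car types are not shown, so plain strings stand in for them, and processing() is a hypothetical void method that records its arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class IfPresentDemo {

    // Collects the side effects so they can be inspected afterwards.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for processing(); note the void return type.
    static void processing(String person, String car) {
        LOG.add(person + " drives " + car);
    }

    // Runs processing only when both values are present.
    static void processBoth(Optional<String> person, Optional<String> car) {
        person.ifPresent(p -> car.ifPresent(c -> processing(p, c)));
    }

    public static void main(String[] args) {
        processBoth(Optional.of("Ann"), Optional.of("Volvo")); // processing runs
        processBoth(Optional.of("Bob"), Optional.empty());     // skipped: no car
        System.out.println(LOG); // [Ann drives Volvo]
    }
}
```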

Does the flatMap method only flatten the stream and not map it?

It's only those specific method references that limit you. Using a lambda expression, you can still do both:

.flatMap(list -> list.stream().map(String::toUpperCase))
.collect(Collectors.toList());

I should mention that it's only that sequence of method references that limited you, not that it's impossible to do it with method references. The key is the mapper you pass to it. Take these as examples:

Stream<String> uppercaseArray(String[] a) {
    return Arrays.stream(a).map(String::toUpperCase);
}
Stream.of(new String[] {"ab"}, new String[] {"cd", "ef"})
    .flatMap(this::uppercaseArray); // map and transform

// Or a completely different perspective
Stream.of("abc", "def").flatMapToInt(String::chars);
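Since the snippets above are fragments, here is one way they might be assembled into something runnable. It is only a sketch: the class name, the method names, and the nested-list input are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlattenAndMap {

    // Flattens the nested lists and upper-cases every element in one flatMap call.
    static List<String> flattenUpper(List<List<String>> nested) {
        return nested.stream()
                .flatMap(list -> list.stream().map(String::toUpperCase))
                .collect(Collectors.toList());
    }

    // flatMapToInt with String::chars: each string becomes an IntStream
    // of its char values, and the streams are concatenated.
    static int[] allChars(String... words) {
        return Stream.of(words).flatMapToInt(String::chars).toArray();
    }

    public static void main(String[] args) {
        List<List<String>> nested =
                Arrays.asList(Arrays.asList("ab"), Arrays.asList("cd", "ef"));
        System.out.println(flattenUpper(nested));                 // [AB, CD, EF]
        System.out.println(Arrays.toString(allChars("ab", "c"))); // [97, 98, 99]
    }
}
```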

Java 8 Streams Shallow copy of map object, cross join using streams

You can use flatMap() with a function that generates a stream of new maps, much like the loop version does, and collect everything back into a list. Your stream version modifies existing maps in-place, and keeps overwriting previously added "4" elements with new ones.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Demo {
    public static void main(String[] args) {
        List<Map<String, String>> listOfMap =
            List.of(Map.of("1", "a", "2", "b", "3", "c"),
                    Map.of("1", "d", "2", "e", "3", "f"));
        List<String> stringList = List.of("x", "y", "z");

        // For every source map, emit one new map per string (a cross join).
        List<Map<String, String>> result =
            listOfMap.stream()
                .flatMap(map -> stringList.stream().map(elem -> {
                    // Copy the source map so the original is never modified.
                    Map<String, String> newMap = new HashMap<>(map);
                    newMap.put("4", elem);
                    return newMap;
                }))
                .collect(Collectors.toList());

        for (Map<String, String> elem : result) {
            System.out.println(elem);
        }
    }
}

outputs

{1=a, 2=b, 3=c, 4=x}
{1=a, 2=b, 3=c, 4=y}
{1=a, 2=b, 3=c, 4=z}
{1=d, 2=e, 3=f, 4=x}
{1=d, 2=e, 3=f, 4=y}
{1=d, 2=e, 3=f, 4=z}

