Why Is String.chars() a Stream of Ints in Java 8

Why is String.chars() a stream of ints in Java 8?

As others have already mentioned, the design decision behind this was to prevent the explosion of methods and classes.

Still, I personally think this was a very bad decision. Given that they did not want to add a CharStream (which is reasonable), there should have been different methods instead of chars(). I would think of:

  • Stream<Character> chars(), which gives a stream of boxed characters and carries a light performance penalty.
  • IntStream unboxedChars(), which would be used for performance-sensitive code.

However, instead of focusing on why it is done this way currently, I think this answer should focus on showing a way to do it with the API that we have gotten with Java 8.

In Java 7 I would have done it like this:

for (int i = 0; i < hello.length(); i++) {
    System.out.println(hello.charAt(i));
}

And I think a reasonable method to do it in Java 8 is the following:

hello.chars()
    .mapToObj(i -> (char) i)
    .forEach(System.out::println);

Here I obtain an IntStream and map each value to an object via the lambda i -> (char)i. The resulting char is boxed automatically, which gives a Stream<Character>. From there we can do what we want, and still use method references as a plus.

Be aware, though, that you must use mapToObj. If you forget and use map, nothing will complain, but you will still end up with an IntStream, and you might be left wondering why it prints the integer values instead of the characters.
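To make the difference concrete, here is a minimal sketch that runs the same cast through map and through mapToObj. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MapVersusMapToObj {

    // map keeps the pipeline an IntStream: the char is widened straight back to int.
    static List<Integer> withMap(String s) {
        return s.chars()
                .map(i -> (char) i)
                .boxed()
                .collect(Collectors.toList());
    }

    // mapToObj switches to a Stream<Character>: the char is boxed instead.
    static List<Character> withMapToObj(String s) {
        return s.chars()
                .mapToObj(i -> (char) i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withMap("hi"));      // [104, 105]
        System.out.println(withMapToObj("hi")); // [h, i]
    }
}
```

Both variants compile without warnings, which is exactly why the mistake is easy to miss.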

Other ugly alternatives for Java 8:

If you remain in an IntStream and ultimately want to print the characters, you can no longer use method references for printing:

hello.chars()
    .forEach(i -> System.out.println((char) i));

Moreover, using method references to your own method does not work anymore! Consider the following:

private void print(char c) {
    System.out.println(c);
}

and then

hello.chars()
    .forEach(this::print);

This will give a compile error, because converting the int values to char is a possibly lossy conversion.
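Two ways around that compile error can be sketched as follows. The helpers are static here only so the example runs standalone, and printCodeUnit is a hypothetical name that does not appear in the original.

```java
final class PrintChars {

    // Same shape as the print(char) helper above, made static for a standalone demo.
    static void print(char c) {
        System.out.println(c);
    }

    // Hypothetical helper (not in the original) that accepts the int directly.
    static void printCodeUnit(int codeUnit) {
        System.out.println((char) codeUnit);
    }

    public static void main(String[] args) {
        String hello = "Hello";

        // Option 1: box to Character first; the method reference then unboxes to char.
        hello.chars().mapToObj(i -> (char) i).forEach(PrintChars::print);

        // Option 2: stay in the IntStream and let the helper do the cast.
        hello.chars().forEach(PrintChars::printCodeUnit);
    }
}
```

Option 1 pays for boxing, while option 2 moves the cast into the helper. Which one reads better is largely a matter of taste.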

Conclusion:

The API was designed this way to avoid adding a CharStream. I personally think the method should return a Stream<Character>. The current workaround is to use mapToObj(i -> (char)i) on the IntStream to be able to work with the characters properly.

Are Java 8 streams just not intended to be used with Characters and Strings?

If you can operate on Unicode code points instead of characters, it becomes a bit less cumbersome than operating on char:

String input = "Hel!lo";

String result = input.codePoints()
    .map(Character::toUpperCase)
    .collect(StringBuilder::new,
             StringBuilder::appendCodePoint,
             StringBuilder::append)
    .toString();

System.out.println(result);

There's no boxing required, no conversion to string at the point of collection, and you are less likely to be tripped up by surrogate pairs in your input data. It is one of those nice occasions where it's less painful to implement something that caters for a broader set of inputs.
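The surrogate-pair point is easiest to see with a character outside the Basic Multilingual Plane. The following sketch wraps the pipeline above in a hypothetical upper method and compares chars() with codePoints() on such input.

```java
final class CodePointsDemo {

    // Upper-cases per code point, mirroring the collect(...) pipeline above.
    static String upper(String input) {
        return input.codePoints()
                .map(Character::toUpperCase)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it occupies two char values.
        String s = "a\uD83D\uDE00b";
        System.out.println(s.chars().count());      // 4 UTF-16 code units
        System.out.println(s.codePoints().count()); // 3 code points
        System.out.println(upper("Hel!lo"));        // HEL!LO
    }
}
```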

Mapping letters in a string to number of occurrences, using Stream

The problem

The real cause of your problem is that there isn’t any CharStream class. For this reason s.chars() gives you an IntStream, and so in map(c -> count(s, c)), c has type int. You are therefore trying to pass an int to your count method where it expects a char, which gives you your error message. An int can be converted to a char, but some bits will be discarded, which is why the compiler treats the conversion as possibly lossy and requires an explicit cast.

The two obvious solutions

Solution 1: Tell Java that you mean to make this lossy conversion. Since you know that c comes from a char from your string, converting it back to a char again won’t do any harm.

    int[] occurences = s.chars().map(c -> count(s, (char) c)).toArray();

Solution 2: You may declare your method to accept an int:

public int count(String text, int ch) {

Now no conversion (cast) is needed. The comparison c == ch still works fine. It is comparing two ints anyway, so no harm is done.
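Since the body of count is not shown in full, the following is only a sketch of what Solution 2 might look like. The loop is an assumption about what the method does, and the class name is hypothetical.

```java
import java.util.Arrays;

final class LetterCounts {

    // Hypothetical body: the original count method is not shown in full,
    // so this loop is only an assumption about what it does.
    static int count(String text, int ch) {
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ch) { // the char is promoted to int for the comparison
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "hello";
        // No cast needed: c is already an int.
        int[] occurrences = s.chars().map(c -> count(s, c)).toArray();
        System.out.println(Arrays.toString(occurrences)); // [1, 1, 2, 2, 1]
    }
}
```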

PS: There are other fine solutions and refinements in the other answers. Personally I’d be tempted to preprocess the string into a map of counts so I need not count again for each letter. This is not necessary, though; your code works fine with just one of the two changes mentioned.
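The map-of-counts idea from the PS could look roughly like this. It is one possible sketch using Collectors.groupingBy; the class and method names are hypothetical, and a TreeMap is chosen only to get a predictable printing order.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class CountOnce {

    // Walks the string once and tallies every character.
    static Map<Character, Long> letterCounts(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(c -> c, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(letterCounts("hello")); // {e=1, h=1, l=2, o=1}
    }
}
```

Each later lookup is then a map access instead of another pass over the string.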

Java 8 has introduced IntStream, LongStream and DoubleStream, why have they not added CharStream and StringStream?

There is no char stream for the same reason there is no byte stream. The primitive stream specializations were added as an optimization (no boxing/unboxing). char and byte values are handled as int internally when computing with them, so there would be no performance gain in adding dedicated streams for them.

Of course they should add them for convenience, but they didn't.

There is no String stream because there is no need for one. String is a reference type, so a regular Stream<String> works with it just fine.
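As a small illustration of that last point, a plain Stream<String> needs no specialization. The names below are hypothetical.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PlainStringStream {

    // Strings are references already, so the generic Stream<T> handles them directly.
    static String shout(String... words) {
        return Stream.of(words)
                .map(String::toUpperCase)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(shout("no", "boxing", "needed")); // NO BOXING NEEDED
    }
}
```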

How to convert Stream of Characters into a String in Java 8

Refer to @jubobs solution link. That is, you could do it this way in your case:

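import java.util.stream.Collector;
import java.util.stream.Stream;

Stream<Character> testStream = Stream.of('a', 'b', 'c');

String result = testStream.collect(Collector.of(
    StringBuilder::new,        // supplier: a fresh builder
    StringBuilder::append,     // accumulator: append each character
    StringBuilder::append,     // combiner: merge partial builders
    StringBuilder::toString)); // finisher: produce the String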

This is more performant than mapping/casting each character to a String first and then joining, as appending directly to a StringBuilder cuts out that intermediate step.
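For comparison, here is a sketch that puts the map-then-join variant mentioned above next to the StringBuilder collector. Both method names are hypothetical. No timings are claimed here; the two approaches are only shown to produce the same result.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JoinCharacters {

    // Converts every Character to a String first, then joins the pieces.
    static String viaJoining(Stream<Character> chars) {
        return chars.map(String::valueOf).collect(Collectors.joining());
    }

    // Appends each Character straight into one StringBuilder.
    static String viaStringBuilder(Stream<Character> chars) {
        return chars.collect(Collector.of(
                StringBuilder::new,
                StringBuilder::append,
                StringBuilder::append,
                StringBuilder::toString));
    }

    public static void main(String[] args) {
        System.out.println(viaJoining(Stream.of('a', 'b', 'c')));       // abc
        System.out.println(viaStringBuilder(Stream.of('a', 'b', 'c'))); // abc
    }
}
```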

How to convert a String to a Java 8 Stream of Characters?

I was going to point you to my earlier answer on this topic but it turns out that you've already linked to that question. The other answer also provides useful information.

If you want char values, you can use the IntStream returned by String.chars() and cast the int values to char without loss of information. The other answers explained why there's no CharStream primitive specialization for the Stream class.

If you really want boxed Character objects, then use mapToObj() to convert from IntStream to a stream of reference type. Within mapToObj(), cast the int value to char. Since an object is expected as a return value here, the char will be autoboxed into a Character. This results in Stream<Character>. For example,

Stream<Character> sch = "abc".chars().mapToObj(i -> (char)i);
sch.forEach(ch -> System.out.printf("%c %s%n", ch, ch.getClass().getName()));

a java.lang.Character
b java.lang.Character
c java.lang.Character

Why do I need to map IntStream to Stream<Character>

The method CharSequence::chars returns an IntStream, which of course doesn't provide a method for converting to int, such as mapToInt; it offers mapToObj instead. Therefore the method IntStream::map(IntUnaryOperator mapper), which both takes and returns an int, should be used, since IntUnaryOperator plays the same role as Function<Integer, Integer> or UnaryOperator<Integer>:

int count = myString.chars()                   // IntStream
    .map(c -> (set.add((char) c) ? 1 : 0))     // IntStream
    .sum();

long count = myString.chars()                  // IntStream
    .filter(c -> set.add((char) c))            // IntStream
    .count();

Also, using Set<Integer> helps you to avoid conversion to a Character:

Set<Integer> set = new HashSet<>();

int count = myString.chars()                   // IntStream
    .map(c -> (set.add(c) ? 1 : 0))            // IntStream
    .sum();

long count = myString.chars()                  // IntStream
    .filter(set::add)                          // IntStream
    .count();

However, regardless of what you are trying to achieve, your code is wrong in principle; see the "Stateless behaviors" section of the java.util.stream package documentation. Consider using the snippet further down instead, whose result does not depend on a stateful operation such as Set::add.

Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful.

long count = myString.chars()                  // IntStream
    .distinct()                                // IntStream
    .count();
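Because distinct() keeps its bookkeeping inside the stream implementation, the same pipeline can also be run in parallel without an external set. A minimal sketch, with hypothetical names:

```java
final class DistinctChars {

    // Sequential version of the snippet above.
    static long distinctCount(String s) {
        return s.chars().distinct().count();
    }

    // Same pipeline in parallel; no shared mutable state is involved.
    static long distinctCountParallel(String s) {
        return s.chars().parallel().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount("mississippi"));         // 4
        System.out.println(distinctCountParallel("mississippi")); // 4
    }
}
```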

Why are char[] the only arrays not supported by Arrays.stream()?

As Eran said, it's not the only one missing.

A BooleanStream would be useless, a ByteStream (if it existed) can be handled as an InputStream or converted to IntStream (as can short), and float can be handled as a DoubleStream.

As char is not able to represent all characters anyway (see linked), it would be a bit of a legacy stream. Most people never have to deal with code points, though, so the omission can seem strange. I mean you use String.charAt() without thinking "this doesn't actually work in all cases".

So some things were left out because they weren't deemed that important. As said by JB Nizet in the linked question:

The designers explicitly chose to avoid the explosion of classes and
methods by limiting the primitive streams to 3 types, since the other
types (char, short, float) can be represented by their larger
equivalent (int, double) without any significant performance penalty.
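In practice, the missing array overloads can be worked around by going through the larger type, as the quote suggests. The following is one possible sketch; the class and the overloaded stream helpers are hypothetical, while CharBuffer.wrap and IntStream.range are standard library calls.

```java
import java.nio.CharBuffer;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

final class MissingArrayStreams {

    // char[] -> IntStream: CharBuffer is a CharSequence, so chars() is available.
    static IntStream stream(char[] chars) {
        return CharBuffer.wrap(chars).chars();
    }

    // float[] -> DoubleStream: index into the array and widen each element.
    static DoubleStream stream(float[] floats) {
        return IntStream.range(0, floats.length).mapToDouble(i -> floats[i]);
    }

    public static void main(String[] args) {
        char[] letters = {'a', 'b', 'c'};
        System.out.println(stream(letters).sum()); // 294

        float[] halves = {0.5f, 1.5f};
        System.out.println(stream(halves).sum()); // 2.0
    }
}
```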

The reason a BooleanStream would be useless is that you only have two values, and that limits the operations a lot. There are no mathematical operations to do, and how often are you working with lots of boolean values anyway?


As can be seen from the comments, a BooleanStream is not needed. If it were, there would be a lot of actual use cases instead of theoretical situations, a use case going back to Java 1.4, and a fallacious comparison to a while loop.


