What Is the Easiest/Best/Most Correct Way to Iterate Through the Characters of a String in Java

What is the easiest/best/most correct way to iterate through the characters of a string in Java?

I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.

String s = "...stuff...";

for (int i = 0; i < s.length(); i++){
char c = s.charAt(i);
//Process char
}

That's what I would do. It seems the easiest to me.

As far as correctness goes, I don't believe that exists here. It is all based on your personal style.

Fastest way to iterate over all the chars in a String

FIRST UPDATE: Before you try this ever in a production environment (not advised), read this first: http://www.javaspecialists.eu/archive/Issue237.html
Starting from Java 9, the solution as described won't work anymore, because now Java will store strings as byte[] by default.

SECOND UPDATE: As of 2016-10-25, on my AMDx64 8core and source 1.8, there is no difference between using 'charAt' and field access. It appears that the jvm is sufficiently optimized to inline and streamline any 'string.charAt(n)' calls.

THIRD UPDATE: As of 2020-09-07, on my Ryzen 1950-X 16 core and source 1.14, 'charAt1' is 9 times slower than field access and 'charAt2' is 4 times slower than field access. Field access is back as the clear winner. Note than the program will need to use byte[] access for Java 9+ version jvms.

It all depends on the length of the String being inspected. If, as the question says, it is for long strings, the fastest way to inspect the string is to use reflection to access the backing char[] of the string.

A fully randomized benchmark with JDK 8 (win32 and win64) on an 64 AMD Phenom II 4 core 955 @ 3.2 GHZ (in both client mode and server mode) with 9 different techniques (see below!) shows that using String.charAt(n) is the fastest for small strings and that using reflection to access the String backing array is almost twice as fast for large strings.

THE EXPERIMENT

  • 9 different optimization techniques are tried.

  • All string contents are randomized

  • The test are done for string sizes in multiples of two starting with 0,1,2,4,8,16 etc.

  • The tests are done 1,000 times per string size

  • The tests are shuffled into random order each time. In other words, the tests are done in random order every time they are done, over 1000 times over.

  • The entire test suite is done forwards, and backwards, to show the effect of JVM warmup on optimization and times.

  • The entire suite is done twice, once in -client mode and the other in -server mode.

CONCLUSIONS

-client mode (32 bit)

For strings 1 to 256 characters in length, calling string.charAt(i) wins with an average processing of 13.4 million to 588 million characters per second.

Also, it is overall 5.5% faster (client) and 13.9% (server) like this:

    for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}

than like this with a local final length variable:

    final int len = data.length();
for (int i = 0; i < len; i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}

For long strings, 512 to 256K characters length, using reflection to access the String's backing array is fastest. This technique is almost twice as fast as String.charAt(i) (178% faster). The average speed over this range was 1.111 billion characters per second.

The Field must be obtained ahead of time and then it can be re-used in the library on different strings. Interestingly, unlike the code above, with Field access, it is 9% faster to have a local final length variable than to use 'chars.length' in the loop check. Here is how Field access can be setup as fastest:

   final Field field = String.class.getDeclaredField("value");
field.setAccessible(true);

try {
final char[] chars = (char[]) field.get(data);
final int len = chars.length;
for (int i = 0; i < len; i++) {
if (chars[i] <= ' ') {
doThrow();
}
}
return len;
} catch (Exception ex) {
throw new RuntimeException(ex);
}

Special comments on -server mode

Field access starting winning after 32 character length strings in server mode on a 64 bit Java machine on my AMD 64 machine. That was not seen until 512 characters length in client mode.

Also worth noting I think, when I was running JDK 8 (32 bit build) in server mode, the overall performance was 7% slower for both large and small strings. This was with build 121 Dec 2013 of JDK 8 early release. So, for now, it seems that 32 bit server mode is slower than 32 bit client mode.

That being said ... it seems the only server mode that is worth invoking is on a 64 bit machine. Otherwise it actually hampers performance.

For 32 bit build running in -server mode on an AMD64, I can say this:

  1. String.charAt(i) is the clear winner overall. Although between sizes 8 to 512 characters there were winners among 'new' 'reuse' and 'field'.
  2. String.charAt(i) is 45% faster in client mode
  3. Field access is twice as fast for large Strings in client mode.

Also worth saying, String.chars() (Stream and the parallel version) are a bust. Way slower than any other way. The Streams API is a rather slow way to perform general string operations.

Wish List

Java String could have predicate accepting optimized methods such as contains(predicate), forEach(consumer), forEachWithIndex(consumer). Thus, without the need for the user to know the length or repeat calls to String methods, these could help parsing libraries beep-beep beep speedup.

Keep dreaming :)

Happy Strings!

~SH

The test used the following 9 methods of testing the string for the presence of whitespace:

"charAt1" -- CHECK THE STRING CONTENTS THE USUAL WAY:

int charAtMethod1(final String data) {
final int len = data.length();
for (int i = 0; i < len; i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return len;
}

"charAt2" -- SAME AS ABOVE BUT USE String.length() INSTEAD OF MAKING A FINAL LOCAL int FOR THE LENGTh

int charAtMethod2(final String data) {
for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return data.length();
}

"stream" -- USE THE NEW JAVA-8 String's IntStream AND PASS IT A PREDICATE TO DO THE CHECKING

int streamMethod(final String data, final IntPredicate predicate) {
if (data.chars().anyMatch(predicate)) {
doThrow();
}
return data.length();
}

"streamPara" -- SAME AS ABOVE, BUT OH-LA-LA - GO PARALLEL!!!

// avoid this at all costs
int streamParallelMethod(final String data, IntPredicate predicate) {
if (data.chars().parallel().anyMatch(predicate)) {
doThrow();
}
return data.length();
}

"reuse" -- REFILL A REUSABLE char[] WITH THE STRINGS CONTENTS

int reuseBuffMethod(final char[] reusable, final String data) {
final int len = data.length();
data.getChars(0, len, reusable, 0);
for (int i = 0; i < len; i++) {
if (reusable[i] <= ' ') {
doThrow();
}
}
return len;
}

"new1" -- OBTAIN A NEW COPY OF THE char[] FROM THE STRING

int newMethod1(final String data) {
final int len = data.length();
final char[] copy = data.toCharArray();
for (int i = 0; i < len; i++) {
if (copy[i] <= ' ') {
doThrow();
}
}
return len;
}

"new2" -- SAME AS ABOVE, BUT USE "FOR-EACH"

int newMethod2(final String data) {
for (final char c : data.toCharArray()) {
if (c <= ' ') {
doThrow();
}
}
return data.length();
}

"field1" -- FANCY!! OBTAIN FIELD FOR ACCESS TO THE STRING'S INTERNAL char[]

int fieldMethod1(final Field field, final String data) {
try {
final char[] chars = (char[]) field.get(data);
final int len = chars.length;
for (int i = 0; i < len; i++) {
if (chars[i] <= ' ') {
doThrow();
}
}
return len;
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}

"field2" -- SAME AS ABOVE, BUT USE "FOR-EACH"

int fieldMethod2(final Field field, final String data) {
final char[] chars;
try {
chars = (char[]) field.get(data);
} catch (Exception ex) {
throw new RuntimeException(ex);
}
for (final char c : chars) {
if (c <= ' ') {
doThrow();
}
}
return chars.length;
}

COMPOSITE RESULTS FOR CLIENT -client MODE (forwards and backwards tests combined)

Note: that the -client mode with Java 32 bit and -server mode with Java 64 bit are the same as below on my AMD64 machine.

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1 charAt 77.0 72.0 462.0 584.0 127.5 89.5 86.0 159.5 165.0
2 charAt 38.0 36.5 284.0 32712.5 57.5 48.3 50.3 89.0 91.5
4 charAt 19.5 18.5 458.6 3169.0 33.0 26.8 27.5 54.1 52.6
8 charAt 9.8 9.9 100.5 1370.9 17.3 14.4 15.0 26.9 26.4
16 charAt 6.1 6.5 73.4 857.0 8.4 8.2 8.3 13.6 13.5
32 charAt 3.9 3.7 54.8 428.9 5.0 4.9 4.7 7.0 7.2
64 charAt 2.7 2.6 48.2 232.9 3.0 3.2 3.3 3.9 4.0
128 charAt 2.1 1.9 43.7 138.8 2.1 2.6 2.6 2.4 2.6
256 charAt 1.9 1.6 42.4 90.6 1.7 2.1 2.1 1.7 1.8
512 field1 1.7 1.4 40.6 60.5 1.4 1.9 1.9 1.3 1.4
1,024 field1 1.6 1.4 40.0 45.6 1.2 1.9 2.1 1.0 1.2
2,048 field1 1.6 1.3 40.0 36.2 1.2 1.8 1.7 0.9 1.1
4,096 field1 1.6 1.3 39.7 32.6 1.2 1.8 1.7 0.9 1.0
8,192 field1 1.6 1.3 39.6 30.5 1.2 1.8 1.7 0.9 1.0
16,384 field1 1.6 1.3 39.8 28.4 1.2 1.8 1.7 0.8 1.0
32,768 field1 1.6 1.3 40.0 26.7 1.3 1.8 1.7 0.8 1.0
65,536 field1 1.6 1.3 39.8 26.3 1.3 1.8 1.7 0.8 1.0
131,072 field1 1.6 1.3 40.1 25.4 1.4 1.9 1.8 0.8 1.0
262,144 field1 1.6 1.3 39.6 25.2 1.5 1.9 1.9 0.8 1.0

COMPOSITE RESULTS FOR SERVER -server MODE (forwards and backwards tests combined)

Note: this is the test for Java 32 bit running in server mode on an AMD64. The server mode for Java 64 bit was the same as Java 32 bit in client mode except that Field access starting winning after 32 characters size.

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1 charAt 74.5 95.5 524.5 783.0 90.5 102.5 90.5 135.0 151.5
2 charAt 48.5 53.0 305.0 30851.3 59.3 57.5 52.0 88.5 91.8
4 charAt 28.8 32.1 132.8 2465.1 37.6 33.9 32.3 49.0 47.0
8 new2 18.0 18.6 63.4 1541.3 18.5 17.9 17.6 25.4 25.8
16 new2 14.0 14.7 129.4 1034.7 12.5 16.2 12.0 16.0 16.6
32 new2 7.8 9.1 19.3 431.5 8.1 7.0 6.7 7.9 8.7
64 reuse 6.1 7.5 11.7 204.7 3.5 3.9 4.3 4.2 4.1
128 reuse 6.8 6.8 9.0 101.0 2.6 3.0 3.0 2.6 2.7
256 field2 6.2 6.5 6.9 57.2 2.4 2.7 2.9 2.3 2.3
512 reuse 4.3 4.9 5.8 28.2 2.0 2.6 2.6 2.1 2.1
1,024 charAt 2.0 1.8 5.3 17.6 2.1 2.5 3.5 2.0 2.0
2,048 charAt 1.9 1.7 5.2 11.9 2.2 3.0 2.6 2.0 2.0
4,096 charAt 1.9 1.7 5.1 8.7 2.1 2.6 2.6 1.9 1.9
8,192 charAt 1.9 1.7 5.1 7.6 2.2 2.5 2.6 1.9 1.9
16,384 charAt 1.9 1.7 5.1 6.9 2.2 2.5 2.5 1.9 1.9
32,768 charAt 1.9 1.7 5.1 6.1 2.2 2.5 2.5 1.9 1.9
65,536 charAt 1.9 1.7 5.1 5.5 2.2 2.4 2.4 1.9 1.9
131,072 charAt 1.9 1.7 5.1 5.4 2.3 2.5 2.5 1.9 1.9
262,144 charAt 1.9 1.7 5.1 5.1 2.3 2.5 2.5 1.9 1.9

FULL RUNNABLE PROGRAM CODE

(to test on Java 7 and earlier, remove the two streams tests)

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

/**
* @author Saint Hill <http://stackoverflow.com/users/1584255/saint-hill>
*/
public final class TestStrings {

// we will not test strings longer than 512KM
final int MAX_STRING_SIZE = 1024 * 256;

// for each string size, we will do all the tests
// this many times
final int TRIES_PER_STRING_SIZE = 1000;

public static void main(String[] args) throws Exception {
new TestStrings().run();
}

void run() throws Exception {

// double the length of the data until it reaches MAX chars long
// 0,1,2,4,8,16,32,64,128,256 ...
final List<Integer> sizes = new ArrayList<>();
for (int n = 0; n <= MAX_STRING_SIZE; n = (n == 0 ? 1 : n * 2)) {
sizes.add(n);
}

// CREATE RANDOM (FOR SHUFFLING ORDER OF TESTS)
final Random random = new Random();

System.out.println("Rate in nanoseconds per character inspected.");
System.out.printf("==== FORWARDS (tries per size: %s) ==== \n", TRIES_PER_STRING_SIZE);

printHeadings(TRIES_PER_STRING_SIZE, random);

for (int size : sizes) {
reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));
}

// reverse order or string sizes
Collections.reverse(sizes);

System.out.println("");
System.out.println("Rate in nanoseconds per character inspected.");
System.out.printf("==== BACKWARDS (tries per size: %s) ==== \n", TRIES_PER_STRING_SIZE);

printHeadings(TRIES_PER_STRING_SIZE, random);

for (int size : sizes) {
reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));

}
}

///
///
/// METHODS OF CHECKING THE CONTENTS
/// OF A STRING. ALWAYS CHECKING FOR
/// WHITESPACE (CHAR <=' ')
///
///
// CHECK THE STRING CONTENTS
int charAtMethod1(final String data) {
final int len = data.length();
for (int i = 0; i < len; i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return len;
}

// SAME AS ABOVE BUT USE String.length()
// instead of making a new final local int
int charAtMethod2(final String data) {
for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return data.length();
}

// USE new Java-8 String's IntStream
// pass it a PREDICATE to do the checking
int streamMethod(final String data, final IntPredicate predicate) {
if (data.chars().anyMatch(predicate)) {
doThrow();
}
return data.length();
}

// OH LA LA - GO PARALLEL!!!
int streamParallelMethod(final String data, IntPredicate predicate) {
if (data.chars().parallel().anyMatch(predicate)) {
doThrow();
}
return data.length();
}

// Re-fill a resuable char[] with the contents
// of the String's char[]
int reuseBuffMethod(final char[] reusable, final String data) {
final int len = data.length();
data.getChars(0, len, reusable, 0);
for (int i = 0; i < len; i++) {
if (reusable[i] <= ' ') {
doThrow();
}
}
return len;
}

// Obtain a new copy of char[] from String
int newMethod1(final String data) {
final int len = data.length();
final char[] copy = data.toCharArray();
for (int i = 0; i < len; i++) {
if (copy[i] <= ' ') {
doThrow();
}
}
return len;
}

// Obtain a new copy of char[] from String
// but use FOR-EACH
int newMethod2(final String data) {
for (final char c : data.toCharArray()) {
if (c <= ' ') {
doThrow();
}
}
return data.length();
}

// FANCY!
// OBTAIN FIELD FOR ACCESS TO THE STRING'S
// INTERNAL CHAR[]
int fieldMethod1(final Field field, final String data) {
try {
final char[] chars = (char[]) field.get(data);
final int len = chars.length;
for (int i = 0; i < len; i++) {
if (chars[i] <= ' ') {
doThrow();
}
}
return len;
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}

// same as above but use FOR-EACH
int fieldMethod2(final Field field, final String data) {
final char[] chars;
try {
chars = (char[]) field.get(data);
} catch (Exception ex) {
throw new RuntimeException(ex);
}
for (final char c : chars) {
if (c <= ' ') {
doThrow();
}
}
return chars.length;
}

/**
*
* Make a list of tests. We will shuffle a copy of this list repeatedly
* while we repeat this test.
*
* @param data
* @return
*/
List<Jobber> makeTests(String data) throws Exception {
// make a list of tests
final List<Jobber> tests = new ArrayList<Jobber>();

tests.add(new Jobber("charAt1") {
int check() {
return charAtMethod1(data);
}
});

tests.add(new Jobber("charAt2") {
int check() {
return charAtMethod2(data);
}
});

tests.add(new Jobber("stream") {
final IntPredicate predicate = new IntPredicate() {
public boolean test(int value) {
return value <= ' ';
}
};

int check() {
return streamMethod(data, predicate);
}
});

tests.add(new Jobber("streamPar") {
final IntPredicate predicate = new IntPredicate() {
public boolean test(int value) {
return value <= ' ';
}
};

int check() {
return streamParallelMethod(data, predicate);
}
});

// Reusable char[] method
tests.add(new Jobber("reuse") {
final char[] cbuff = new char[MAX_STRING_SIZE];

int check() {
return reuseBuffMethod(cbuff, data);
}
});

// New char[] from String
tests.add(new Jobber("new1") {
int check() {
return newMethod1(data);
}
});

// New char[] from String
tests.add(new Jobber("new2") {
int check() {
return newMethod2(data);
}
});

// Use reflection for field access
tests.add(new Jobber("field1") {
final Field field;

{
field = String.class.getDeclaredField("value");
field.setAccessible(true);
}

int check() {
return fieldMethod1(field, data);
}
});

// Use reflection for field access
tests.add(new Jobber("field2") {
final Field field;

{
field = String.class.getDeclaredField("value");
field.setAccessible(true);
}

int check() {
return fieldMethod2(field, data);
}
});

return tests;
}

/**
* We use this class to keep track of test results
*/
abstract class Jobber {

final String name;
long nanos;
long chars;
long runs;

Jobber(String name) {
this.name = name;
}

abstract int check();

final double nanosPerChar() {
double charsPerRun = chars / runs;
long nanosPerRun = nanos / runs;
return charsPerRun == 0 ? nanosPerRun : nanosPerRun / charsPerRun;
}

final void run() {
runs++;
long time = System.nanoTime();
chars += check();
nanos += System.nanoTime() - time;
}
}

// MAKE A TEST STRING OF RANDOM CHARACTERS A-Z
private String makeTestString(int testSize, char start, char end) {
Random r = new Random();
char[] data = new char[testSize];
for (int i = 0; i < data.length; i++) {
data[i] = (char) (start + r.nextInt(end));
}
return new String(data);
}

// WE DO THIS IF WE FIND AN ILLEGAL CHARACTER IN THE STRING
public void doThrow() {
throw new RuntimeException("Bzzzt -- Illegal Character!!");
}

/**
* 1. get random string of correct length 2. get tests (List<Jobber>) 3.
* perform tests repeatedly, shuffling each time
*/
List<Jobber> test(int size, int tries, Random random) throws Exception {
String data = makeTestString(size, 'A', 'Z');
List<Jobber> tests = makeTests(data);
List<Jobber> copy = new ArrayList<>(tests);
while (tries-- > 0) {
Collections.shuffle(copy, random);
for (Jobber ti : copy) {
ti.run();
}
}
// check to make sure all char counts the same
long runs = tests.get(0).runs;
long count = tests.get(0).chars;
for (Jobber ti : tests) {
if (ti.runs != runs && ti.chars != count) {
throw new Exception("Char counts should match if all correct algorithms");
}
}
return tests;
}

private void printHeadings(final int TRIES_PER_STRING_SIZE, final Random random) throws Exception {
System.out.print(" Size");
for (Jobber ti : test(0, TRIES_PER_STRING_SIZE, random)) {
System.out.printf("%9s", ti.name);
}
System.out.println("");
}

private void reportResults(int size, List<Jobber> tests) {
System.out.printf("%6d", size);
for (Jobber ti : tests) {
System.out.printf("%,9.2f", ti.nanosPerChar());
}
System.out.println("");
}
}

Java - Most Efficent way to traverse a String

The first version is more efficient. The second version of the code will end up being slower and use more memory because of the cost of creating and filling the new char[] with toCharArray().

For long strings (more than 512 chars, approximately), the fastest way to inspect the string is to use reflection to access the backing char[] of the String (but only works up to Java 8, because of Compact Strings):

String data = "a really long string";
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] chars = (char[]) field.get(data);

for (int i = 0, n = chars.length; i < n; i++)
System.out.println(chars[i]);

By using the above approach, we were able to avoid entirely the need to create a new char[] and also to pay the cost of an extra method call to charAt() in each iteration.

Take a look at this post, the answer contains detailed benchmarks. The best of both worlds, but anyway it was a hack and it no longer works.

How do I apply the for-each loop to every character in a String?

The easiest way to for-each every char in a String is to use toCharArray():

for (char ch: "xyz".toCharArray()) {
}

This gives you the conciseness of for-each construct, but unfortunately String (which is immutable) must perform a defensive copy to generate the char[] (which is mutable), so there is some cost penalty.

From the documentation:

[toCharArray() returns] a newly allocated character array whose length is the length of this string and whose contents are initialized to contain the character sequence represented by this string.

There are more verbose ways of iterating over characters in an array (regular for loop, CharacterIterator, etc) but if you're willing to pay the cost toCharArray() for-each is the most concise.

Is there a more efficient way to iterate through a string until you reach a certain character?

This way if there are any characters other than A or B it would immediately fails.

public static final String checkInput(final String answers) {
if (answers==null || answers.length() != ANSWER_LENGTH)
return "ERR!"; // fail fast
if(!answer.matches("[AB]*"))
return "ERR!"; // fail fast
return "";
}

What is the best correct way to iterate through the columns and rows of a string matrix in Java?

If you just need to access the values then you can use for-each loops:

for (String[] row: array) {
for (String value: row) {
// use value
}
}

This won't work if you want to change the values. In that case you need to have the indices available. You can use traditional for loops for that (as referenced in the other answers) or, in Java 8, IntStream:

IntStream.range(0, array.length).forEach(r ->
IntStream.range(0, array[r].length).forEach(c -> {
// use array[r][c]
}));

How to iterate through a String

If you want to use enhanced loop, you can convert the string to charArray

for (char ch : exampleString.toCharArray()) {
System.out.println(ch);
}

Faster way to iterate through if a string contains a phrase from array list?

What about...

public boolean checkName(String nameInputed, java.util.List<String> phraseList) {
return phraseList.stream()
.filter(phrase -> nameInputed.contains(phrase))
.findFirst()
.isPresent();
}

And if you don't want to use stream API then how about...

public boolean checkName(String nameInputed, java.util.List<String> phraseList) {
for (String phrase : phraseList) {
if (nameInputed.contains(phrase)) {
return true;
}
}
return false;
}

Edit

As @user16320675 suggested in this comment

public boolean checkName(String nameInputed, java.util.List<String> phraseList) {
return phraseList.stream()
.anyMatch(phrase -> nameInputed.contains(phrase));
}

Which is a faster approach when iterating over strings in Java and why?

The second answer suggested creates a char array and iterates over that. This mechanism is probably slower because of the additional overhead in creating the array prior to iterating over it; this overhead correlates with the length of the String.

A third method not mentioned in the answer you've referenced involves using a StringCharacterIterator; e.g.

CharacterIterator it = new StringCharacterIterator("Hello, World");
char c;

while ((c = it.next()) != CharacterIterator.DONE) {
System.err.println("Char: " + c);
}

However, my personal preference would be to use the first index-based solution in the answer you reference.



Related Topics



Leave a reply



Submit