Is String.Contains() Faster Than String.Indexof()

Is String.Contains() faster than String.IndexOf()?

Contains calls IndexOf:

public bool Contains(string value)
{
return (this.IndexOf(value, StringComparison.Ordinal) >= 0);
}

Which calls CompareInfo.IndexOf, which ultimately uses a CLR implementation.

If you want to see how strings are compared in the CLR this will show you (look for CaseInsensitiveCompHelper).

IndexOf(string) has no options and Contains()uses an Ordinal compare (a byte-by-byte comparison rather than trying to perform a smart compare, for example, e with é).

So IndexOf will be marginally faster (in theory) as IndexOf goes straight to a string search using FindNLSString from kernel32.dll (the power of reflector!).

Updated for .NET 4.0 - IndexOf no longer uses Ordinal Comparison and so Contains can be faster. See comment below.

String: Why is indexOf significantely faster than contains?

The contains method is implemented as:

public boolean contains(CharSequence s) {
return indexOf(s.toString()) > -1;
}

Which means that it may be slower if CharSequence s is not a java.lang.String because calling s.toString() will result in the allocation and initialization of a new string instance. If s is a string - then there should not be any measurable difference.

PS: The test from here is flawed: https://stackoverflow.com/a/18340277/2588800
Java initially execute in "interpreted" mode, which is quite slow, and when it detects that a piece of codes gets executed a lot of times it compiles it to native code in order to speed it up (read about JIT compilation).

As you can see contains internally calls indexOf, which means that indexOf eventually will get compiled to native. So when he tests the indexOf (notice that he tests it after contains) it might have already been compiled to native code. And that's the reason for the time difference. Try to reverse the order of that tests - first test indexOf and then contains and I bet you'll see just the opposite results.

JMH to the rescue

Benchmark                            Mode  Cnt   Score   Error   Units
StringSearchBenchmark.testContains thrpt 500 22,071 ± 0,269 ops/us
StringSearchBenchmark.testIndexOf thrpt 500 22,654 ± 0,233 ops/us

As you can see the difference is negligible and might be casued by the additional method calls (indexOf() + toString()) and load on the system.

Source-Code:

@Fork(1)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 500, time = 50, timeUnit = TimeUnit.MILLISECONDS)
@Warmup(iterations = 10)
@BenchmarkMode(Mode.Throughput)
public class StringSearchBenchmark {
private static final String STATE = "absdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyz";
private static final String SEARCH_TERM = "abcd";

@Benchmark
public void testContains(Blackhole sink) {
sink.consume(STATE.contains(SEARCH_TERM));
}

@Benchmark
public void testIndexOf(Blackhole sink) {
sink.consume(STATE.indexOf(SEARCH_TERM));
}
}

Performance of IndexOf(char) vs Contains(string) for checking the presence of a character in a string

Just test and see

  String source = new String('a', 10000000) + "b" + new String('c', 10000000);

Stopwatch sw = new Stopwatch();

sw.Start();

// Just an estimation
int result = source.IndexOf('b');
// int result = source.IndexOf("b");
// Boolean result = source.Contains("b");
// Boolean result = source.Contains('b');

sw.Stop();

int time = sw.ElapsedMilliseconds;

At my workstation (i5 3.2 GHz, .Net 5.0 64 bit) it takes about 10 ms for Char and 38 ms for String

Edit: Performance's outcome is

  IndexOf(Char)     10 -- Fastest
IndexOf(String) 38
Contains(Char) 100 -- Slowest
Contains(String) 41

so IndexOf(String) and Contains(String) are roughly the same

Which String method: contains or indexOf -1 ?

Take a look at the java.lang.String source code. The contains method is implemented using a call to indexOf, so they are essentially the same.

public boolean contains(CharSequence s) {
return indexOf(s.toString()) > -1;
}

You should use whichever method makes your code more readable. If you are checking to see if a String contains a specific substring, use contains. If you are looking for the substring's starting index, use indexOf.


Edit:

A couple of answers mention that indexOf should be preferred over contains due to the fact that contains makes an additional method call, and is thus, less efficient. This is wrong. The overhead caused by an additional method call in this case is totally insignificant. Use whichever method makes the most sense in the context of your implementation. This will make your code more readable.

Java performance String.indexOf(char) vs String.indexOf(single String)

Your JMH test is incorrect as you don't consume the result, so the indexOf call can be (or can be not) removed at all by JIT compiler. In your case it seems that JIT-compiler determined that indexOf(String) has no side-effect and removed this call at all, but did not do the same for indexOf(char). Always consume the result (the simplest way is to return it from the benchmark). Here's my version:

import java.util.*;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class IndexOfTest {
private String str;
private char c;
private String s;

@Setup
public void setup() {
str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
c = 'z';
s = "z";
}

@Benchmark
public int indexOfChar() {
return str.indexOf('z');
}

@Benchmark
public int indexOfString() {
return str.indexOf("z");
}

@Benchmark
public int indexOfCharIndirect() {
return str.indexOf(c);
}

@Benchmark
public int indexOfStringIndirect() {
return str.indexOf(s);
}
}

I test the same thing, but added two indirect tests: when searched char or String is loaded from the field, thus its exact value is unknown during the JIT-compilation. The results are the following (Intel x64):

# JMH 1.11.2 (released 27 days ago)
# VM version: JDK 1.8.0_45, VM 25.45-b02
Benchmark Mode Cnt Score Error Units
IndexOfTest.indexOfChar avgt 30 25,364 ± 0,424 ns/op
IndexOfTest.indexOfCharIndirect avgt 30 25,287 ± 0,210 ns/op
IndexOfTest.indexOfString avgt 30 24,370 ± 0,100 ns/op
IndexOfTest.indexOfStringIndirect avgt 30 27,198 ± 0,048 ns/op

As you can see, indexOfChar performs in the same way regardless of direct or indirect access. The indexOfString is slightly faster for direct access, but somewhat slower for indirect. That's because indexOf(String) is a JVM intrinsic: its Java code is actually replaced by JIT compiler with efficient inline implementation. For constant string known at JIT compilation time it's possible to generate more efficient code.

In general there's no big difference at least for such short strings. Thus you may use either of these methods for single symbol match.

Differences between contains and indexOf, Java


contains()

It is a common case in programming when you want to check if specific String contains a particular substring. For example, If you want to test if the String "The big red fox" contains the substring "red." This method is useful in such situation.

    String s1 = "My name is bar"; 
System.out.println(s1.contains("bar"));
System.out.println(s1.contains("foo"));

//output
true
false

2..

indexOf()

This method returns the index within this string of the first occurrence of the specified character or -1, if the character does not occur.

    String gfg = new String("Welcome to stackoverflow"); 
System.out.print("Found s first at position : ");
System.out.println(gfg.indexOf('s'));

//output
Found s first at position : 11

In Java, which is faster - String.contains( some text ) or Regex that looks for same text?

They're both fast enough to be over before you know it. Better to go for the one that you can read more easily.

But from forums, blogs contains is faster, but still negligible performance difference

When should Regex be used over String.IndexOf()? or String.Contains()?

It's variable. Comparison performance is a complex function of the input data, the culture being used for comparing, case sensitivity and CompareOptions. A Regex object is more expensive to instantiate (unless it's in the Regex cache), so if you're doing a lot of one off comparisons, it not that great to use and I've found it's typically slower than IndexOf(), but YMMV.

Keep in mind that when using Contains/IndexOf that the culture under which the user/thread is running will decide how the comparison is done. That can have a significant impact on performance. Not all cultures are as fast.

The Invariant culture is a very fast culture. If you use a CompareInfo directly, rather than doing String.IndexOf(), it will be somewhat faster still.

CultureInfo.InvariantCulture.CompareInfo.IndexOf(..)

The only way to have some confidence in making the right choice is to benchmark. That said, unless you're shifting through many megabytes of strings, it won't make a difference that matters to anyone. As ChrisF said earlier, focus on readable/maintainble code in that case.

Here's a good article on getting the most out of regex:
Optimizing Regular Expression Performance

String#indexOf: Why comparing 15 million strings with JDK code is faster than my code?

Because String.indexOf() is an intrinsic method, making the JDK call a native implementation and your call a Java implementation.

The JVM doesn't actually execute the Java code you see, it knows that it can replace it with a far more efficient version. When you copy the code it goes through regular JIT compilation, making it less efficient. Just one of the dozens of tricks the JVM does to make things more performant, without the developer often even realizing.



Related Topics



Leave a reply



Submit