Time Complexity of Java's Substring()

Time complexity of Java's substring()

New answer

As of update 6 within Java 7's lifetime, the behaviour of substring changed to create a copy - so every String refers to a char[] which is not shared with any other object, as far as I'm aware. So at that point, substring() became an O(n) operation where n is the numbers in the substring.

Old answer: pre-Java 7

Undocumented - but in practice O(1) if you assume no garbage collection is required, etc.

It simply builds a new String object referring to the same underlying char[] but with different offset and count values. So the cost is the time taken to perform validation and construct a single new (reasonably small) object. That's O(1) as far as it's sensible to talk about the complexity of operations which can vary in time based on garbage collection, CPU caches etc. In particular, it doesn't directly depend on the length of the original string or the substring.

Java 7 String - substring complexity

Why they decided is discussed in Oracle bug #4513622 : (str) keeping a substring of a field prevents GC for object:

When you call String.substring as in the example, a new character array for storage is not allocated. It uses the character array of the original String. Thus, the character array backing the the original String can not be GC'd until the substring's references can also be GC'd. This is an intentional optimization to prevent excessive allocations when using substring in common scenarios. Unfortunately, the problematic code hits a case where the overhead of the original array is noticeable. It is difficult to optimize for both edges cases. Any optimization for space/size trade-offs are generally complex and can often be platform-specific.

There's also this note, noting that what once was an optimization had become a pessimization according to tests:

For a long time preparations and planing have been underway to remove the offset and count fields from java.lang.String. These two fields enable multiple String instances to share the same backing character buffer. Shared character buffers were an important optimization for old benchmarks but with current real world code and benchmarks it's actually better to not share backing buffers. Shared char array backing buffers only "win" with very heavy use of String.substring. The negatively impacted situations can include parsers and compilers however current testing shows that overall this change is beneficial.

Time complexity of Stringbuilder subSequence method

In java-11-openjdk, AbstractStringBuilder.subSequence calls substring...

public CharSequence subSequence(int start, int end) {
    return substring(start, end);
}

... which is almost identical to the method of the same name of String ...

public String substring(int start, int end) {
    checkRangeSIOOBE(start, end, count);
    if (isLatin1()) {
        return StringLatin1.newString(value, start, end - start);
    }
    return StringUTF16.newString(value, start, end - start);
}

... calling newString which will (in both cases) copy the backing array, so it's O(n), too.

public static String newString(byte[] val, int index, int len) {
    return new String(Arrays.copyOfRange(val, index, index + len),
                      LATIN1);
}

How to find numbers of valid subString using O(n) time complexity algorithm

We need to make few observations to solve this problem:

Assume that we have the longest valid substring that end at index ith, which contains x character c, with x <= k, so, in total, it contains x + 2 character c. We can say that, any substring that end at this index ith and start at any of the first x+1 character is a valid substring.

So, for each character c in the string, we just need to find the number of character c (denotes as count) that stand before it, if it greater than k + 1, just add k + 1 to the result, otherwise, just add count.

So here is the pseudo code:

int result = 0;
int count = 0;
for(int i = 0; i < s.length(); i++){
    if(s.charAt(i) == c){
       result += Integer.min(k + 1, count);
       count++;
    } 
}

Time complexity: O(n)

Working demo: https://ideone.com/aUpxk5

java indexof(String str) method complexity

The complexity of Java's implementation of indexOf is O(m*n) where n and m are the length of the search string and pattern respectively.

What you can do to improve complexity is to use e.g., the Boyer-More algorithm to intelligently skip comparing logical parts of the string which cannot match the pattern.

Time Complexity of checking whether a string is present in a HashSet Java

If the String you’re checking is a new String, its hashCode must be calculated to find its bucket, so that would be O(m) for the hashCode calculation and O(1) for the existence check, so O(m).

If the String object already exists in the Set (or its hashCode has otherwise been calculated), checking it is O(1), because its hashCode is cached.

What is the Big-O of String.contains() in Java?

One of the best known algorithms is the Boyer-Moore string searching algorithm which is O(n) although it can give sublinear performance in the best case.

Which algorithm is used in Java depends on which implemetation you download. It seems that for example OpenJDK uses a naive algorithm that runs in O(nm) and linear performance in the best case. See lines 1770-1806 here.

Time Complexity of Java's Substring()