Java 7 String - Substring Complexity

Java 7 String - substring complexity

Why they decided is discussed in Oracle bug #4513622 : (str) keeping a substring of a field prevents GC for object:

When you call String.substring as in the example, a new character array for storage is not allocated. It uses the character array of the original String. Thus, the character array backing the the original String can not be GC'd until the substring's references can also be GC'd. This is an intentional optimization to prevent excessive allocations when using substring in common scenarios. Unfortunately, the problematic code hits a case where the overhead of the original array is noticeable. It is difficult to optimize for both edges cases. Any optimization for space/size trade-offs are generally complex and can often be platform-specific.

There's also this note, noting that what once was an optimization had become a pessimization according to tests:

For a long time preparations and planing have been underway to remove the offset and count fields from java.lang.String. These two fields enable multiple String instances to share the same backing character buffer. Shared character buffers were an important optimization for old benchmarks but with current real world code and benchmarks it's actually better to not share backing buffers. Shared char array backing buffers only "win" with very heavy use of String.substring. The negatively impacted situations can include parsers and compilers however current testing shows that overall this change is beneficial.

Time complexity of Java's substring()

New answer

As of update 6 within Java 7's lifetime, the behaviour of substring changed to create a copy - so every String refers to a char[] which is not shared with any other object, as far as I'm aware. So at that point, substring() became an O(n) operation where n is the numbers in the substring.

Old answer: pre-Java 7

Undocumented - but in practice O(1) if you assume no garbage collection is required, etc.

It simply builds a new String object referring to the same underlying char[] but with different offset and count values. So the cost is the time taken to perform validation and construct a single new (reasonably small) object. That's O(1) as far as it's sensible to talk about the complexity of operations which can vary in time based on garbage collection, CPU caches etc. In particular, it doesn't directly depend on the length of the original string or the substring.

Time complexity of Stringbuilder subSequence method

In java-11-openjdk, AbstractStringBuilder.subSequence calls substring...

public CharSequence subSequence(int start, int end) {
    return substring(start, end);
}

... which is almost identical to the method of the same name of String ...

public String substring(int start, int end) {
    checkRangeSIOOBE(start, end, count);
    if (isLatin1()) {
        return StringLatin1.newString(value, start, end - start);
    }
    return StringUTF16.newString(value, start, end - start);
}

... calling newString which will (in both cases) copy the backing array, so it's O(n), too.

public static String newString(byte[] val, int index, int len) {
    return new String(Arrays.copyOfRange(val, index, index + len),
                      LATIN1);
}

How to find numbers of valid subString using O(n) time complexity algorithm

We need to make few observations to solve this problem:

Assume that we have the longest valid substring that end at index ith, which contains x character c, with x <= k, so, in total, it contains x + 2 character c. We can say that, any substring that end at this index ith and start at any of the first x+1 character is a valid substring.

So, for each character c in the string, we just need to find the number of character c (denotes as count) that stand before it, if it greater than k + 1, just add k + 1 to the result, otherwise, just add count.

So here is the pseudo code:

int result = 0;
int count = 0;
for(int i = 0; i < s.length(); i++){
    if(s.charAt(i) == c){
       result += Integer.min(k + 1, count);
       count++;
    } 
}

Time complexity: O(n)

Working demo: https://ideone.com/aUpxk5

Complexity of finding all substrings of a string

Adding a character to a string is a O(1) operation. You get O(n³) if you take into account also the time need to print the output with println.

Efficient way to test if a string is substring of any in a list of strings

You could just do this:

 for (String small : list2) {
    if (set1.contains(small)) {
        // DO SOMETHING
    } else {
        // NOT FOR ME
    }
}

set1 should be the larger list of String, and instead of keeping it as a List<String>, use a Set<String> or a HashSet<String>

Java 7 String - Substring Complexity