Time Complexity of String Slice


Short answer: str slices, in general, copy. That means that a function that takes a slice for each of your string's n suffixes is doing O(n²) work. That said, you can avoid copies if you can work with bytes-like objects, using memoryviews to get zero-copy views of the original bytes data. See How you can do zero copy slicing below for how to make it work.
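
To see where the O(n²) comes from, here is a hypothetical sketch of the naive suffix loop: each slice copies its characters, so the total work is roughly n²/2.

```python
def all_suffixes(s):
    # Each s[i:] copies len(s) - i characters, so building every suffix
    # of an n-character string copies about n^2 / 2 characters in total.
    return [s[i:] for i in range(len(s))]

suffixes = all_suffixes("abcd")
# Four suffixes, each an independent copy of its characters
assert suffixes == ["abcd", "bcd", "cd", "d"]
```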

Long answer: (C)Python str does not slice by referencing a view of a subset of the data. There are exactly three modes of operation for str slicing:

  1. Complete slice, e.g. mystr[:]: Returns a reference to the exact same str (not just shared data, the same actual object: mystr is mystr[:]; since str is immutable, there is no risk in doing so)
  2. The zero-length slice and (implementation-dependent) cached length-1 slices: the empty string is a singleton (mystr[1:1] is mystr[2:2] is ''), and low-ordinal strings of length one are cached singletons as well (on CPython 3.5.0, it looks like all characters representable in latin-1, that is, Unicode ordinals in range(256), are cached)
  3. All other slices: The sliced str is copied at creation time and is thereafter unrelated to the original str
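
These three modes are easy to check interactively. The identity checks below reflect CPython behaviour (an implementation detail, not a language guarantee):

```python
s = "hello world"
assert s[:] is s        # mode 1: a complete slice returns the very same object
assert s[3:3] is ""     # mode 2: any empty slice is the singleton empty string
t = s[1:5]
assert t == "ello"
assert t is not s       # mode 3: an ordinary slice is an independent copy
```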

The reason #3 is the general rule is to avoid large str objects being kept alive in memory by a view of a small portion of them. If you had a 1 GB file, read it in, and sliced it like so (yes, it's wasteful when you can seek; this is for illustration):

with open(myfile) as f:
    data = f.read()[-1024:]

then you'd have 1 GB of data being held in memory to support a view that shows the final 1 KB, a serious waste. Since slices are usually smallish, it's almost always faster to copy on slice instead of creating views. It also means str can be simpler: it needs to know its size, but it need not track an offset into the data as well.
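
The copy-on-slice cost is easy to observe with timeit: the cost of a slice scales with the length of the slice, not with the length of the source string (a rough sketch; exact numbers depend on your machine).

```python
import timeit

big = "x" * 10_000_000

# Copy-on-slice means slicing cost grows with the slice length.
t_small = timeit.timeit(lambda: big[:10], number=100)
t_large = timeit.timeit(lambda: big[:5_000_000], number=100)
assert t_large > t_small  # copying 5M characters dwarfs copying 10
```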

How you can do zero copy slicing

There are ways to perform view-based slicing in Python, and in Python 2 it will work on str (because str is bytes-like in Python 2, supporting the buffer protocol). With Py2 str and Py3 bytes (as well as many other data types like bytearray, array.array, numpy arrays, mmap.mmaps, etc.), you can create a memoryview that is a zero-copy view of the original object, and it can be sliced without copying data. So if you can use (or encode to) Py2 str/Py3 bytes, and your function can work with arbitrary bytes-like objects, then you could do:

def do_something_on_all_suffixes(big_string):
    # In Py3, may need to encode as latin-1 or the like
    remaining_suffix = memoryview(big_string)
    # Rather than an explicit loop index, just replace the view with a
    # one-shorter view on each iteration
    while remaining_suffix:  # Stop when we've sliced down to an empty view
        some_constant_time_operation(remaining_suffix)
        remaining_suffix = remaining_suffix[1:]

Slices of memoryviews do make new view objects (they're just ultra-lightweight, with a fixed size unrelated to the amount of data they view); they just don't copy any data, so some_constant_time_operation can store a copy if needed and it won't be changed when we slice the view down later. Should you need a proper copy as Py2 str/Py3 bytes, you can call .tobytes() to get a raw bytes object, or (in Py3 only, it appears) decode it directly to a str that copies from the buffer, e.g. str(remaining_suffix[10:20], 'latin-1').
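
A minimal illustration of the zero-copy view and the two copy-out paths mentioned above:

```python
data = b"hello world"
view = memoryview(data)

sub = view[6:]                    # O(1): a new lightweight view, no data copied
assert sub.tobytes() == b"world"  # explicit copy out to a bytes object
assert str(view[:5], 'latin-1') == "hello"  # decode straight from the buffer
```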

What is the time complexity of string slice? O(k) or O(n)

The relevant code is here, and it is O(k), as can be seen at line 1628:

result_buf = PyBytes_AS_STRING(result);
for (cur = start, i = 0; i < slicelength; cur += step, i++) {
    result_buf[i] = source_buf[cur];
}
return result;

What is the algorithmic complexity of string slicing? (Practical, Real World)

Yes, V8 has optimised string slicing to O(1). This does, of course, depend a lot on what else happens to all the strings; they might need to get copied later on.

The relevant implementation from the above link is:

Sliced String diagram

// The Sliced String class describes strings that are substrings of another
// sequential string. The motivation is to save time and memory when creating
// a substring. A Sliced String is described as a pointer to the parent,
// the offset from the start of the parent string and the length. Using
// a Sliced String therefore requires unpacking of the parent string and
// adding the offset to the start address. A substring of a Sliced String
// are not nested since the double indirection is simplified when creating
// such a substring.
// Currently missing features are:
// - handling externalized parent strings
// - external strings as parent
// - truncating sliced string to enable otherwise unneeded parent to be GC'ed.
class SlicedString: public String {
// ...
};

Also beware of your quick test results. Since you do nothing with the y variable, the slicing, and even the whole loop, might get eliminated as dead code by the optimiser. If you are doing benchmarks, run them on practical, real-world data.

Time complexity of slice in javascript v8 runtime

(V8 developer here.)

Array.prototype.slice is O(n), where n is the number of elements in the slice.

String.prototype.slice is O(1), thanks to our implementation of SlicedStrings, which just store a pointer, offset, and length referring to the original string, avoiding copying the characters (except when the slices are tiny, in which case copying a handful of characters is actually cheaper and smaller than storing a reference; that's still O(1)).

The key difference is that strings are immutable, and arrays are not. When you do str1 = "Hello World"; str2 = str1.slice(2, 5);, since there is no way to modify str1's contents afterwards, str2 doesn't need to ensure that it's unaffected by any such modification.

When you do a = [1, 2, 3, 4]; b = a.slice(1, 3); a[1] = "changed"; console.log(b[0]);, then you expect to see 2, not "changed". That's why b has to be an actual copy. (In theory, a copy-on-write approach would be possible, but V8 doesn't do that for array slices.)

"Shallow copy" means that nested objects will not be copied. Example:

let nested = {property: "value"};
var a = [nested];
var b = a.slice(0, 1);
a[0].property = "new value";
console.log(a === b); // false, `b` is a copy
console.log(a[0] === b[0]); // true, `nested` was not copied
console.log(b[0] === nested); // true
console.log(b[0].property); // "new value"

Time complexity of Java's substring()

New answer

As of update 6 within Java 7's lifetime, the behaviour of substring changed to create a copy - so every String refers to a char[] which is not shared with any other object, as far as I'm aware. So at that point, substring() became an O(n) operation, where n is the number of characters in the substring.

Old answer: pre-Java 7

Undocumented - but in practice O(1) if you assume no garbage collection is required, etc.

It simply builds a new String object referring to the same underlying char[] but with different offset and count values. So the cost is the time taken to perform validation and construct a single new (reasonably small) object. That's O(1) as far as it's sensible to talk about the complexity of operations which can vary in time based on garbage collection, CPU caches etc. In particular, it doesn't directly depend on the length of the original string or the substring.
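
The pre-Java-7 scheme can be sketched in Python with a hypothetical view class that keeps a reference to the parent's character data and stores only an offset and a count, so taking a substring copies nothing:

```python
class SubstringView:
    """Hypothetical sketch of pre-Java-7 substring(): O(1), shares the
    parent's character data, stores only an offset and a count."""

    def __init__(self, buf, offset=0, count=None):
        self._buf = buf
        self._offset = offset
        self._count = len(buf) - offset if count is None else count

    def substring(self, start, end):
        # No character data is copied; only a new (offset, count) pair.
        return SubstringView(self._buf, self._offset + start, end - start)

    def __str__(self):
        return self._buf[self._offset:self._offset + self._count]

s = SubstringView("Hello World")
assert str(s.substring(6, 11)) == "World"
```

Note the trade-off this sketch shares with the old Java behaviour: the view keeps the whole parent string alive, which is exactly why Java later switched to copying.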

Space Complexity of Python List Slices

Python is strict about preserving observable side effects. As far as the language is concerned, it would be incorrect to perform any of your operations in place.

Your operation

 arr[2:] = arr[2:][::-1]

consists of three separate slice operations:

  • arr[2:] creates a new list from all elements (but the first two) of arr.
  • ...[::-1] creates a new, reversed list from all elements of ....
  • arr[2:] = ... replaces all elements (but the first two) of arr with ....

Each slice operation basically amounts to a primitive O(n) copy operation. Since only references are copied, the size or type of elements does not matter.
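
That a list slice copies references, not the elements themselves, is easy to verify:

```python
a = [{"x": 1}, {"x": 2}, {"x": 3}]
b = a[1:]            # a new list object: an O(n) copy of references
assert b is not a
assert b[0] is a[1]  # the element objects themselves are shared
b[0]["x"] = 99
assert a[1]["x"] == 99  # mutating a shared element is visible through both
```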

In your complete example, that amounts to three O(n) slice operations in an O(k) loop:

ans = [i+1 for i in range(n)]   # O(n)
for i in range(k):              # O(k)
    ans[i:] = ans[i:][::-1]     # O(n)

In total, the time complexity amounts to O(nk). Space complexity is only O(n), since the temporary slices are reclaimable immediately. You basically end up with the initial list, plus some temporary copies.


Will it be different or same than when ans is a string, something like ans = '12345...n'?

The complexity of the operations does not change - they are still the same primitive operations.

Practically, the CPython implementation cheats with certain objects. For example, += mutates in place for strings with a reference count of 1, even though strings are immutable. However, this does not apply to your slice usage.

In general, relying on inbuilt optimisations is a bad idea with Python.


If you are worried about space, start with writing lean code. For example, ans[i:][::-1] can be written as the single slice ans[:i-1:-1] (with ans[::-1] covering the i == 0 case). That alone halves the temporary space required.
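
A quick sanity check that a single slice can produce the reversed suffix (valid for i > 0; i == 0 needs the plain ans[::-1] form):

```python
ans = [1, 2, 3, 4, 5]
for i in range(1, len(ans)):
    # One slice instead of two: reversed suffix starting at index i
    assert ans[i:][::-1] == ans[:i-1:-1]
assert ans[0:][::-1] == ans[::-1]  # i == 0: the single-slice form above breaks
```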

What is the time complexity of removing the first character in a Python string?

Python's str slices s[1:] in O(n) time: str is immutable, and any slicing operation creates a whole new string and copies all of the sliced data. So for a string of length N, this operation is O(N).

To do any slicing in O(1) time you need to use special implementations, like Numpy has for its arrays. A Numpy array keeps a pointer to the allocated buffer plus extra fields describing the view (conceptually a begin, an end, and a stride).

Before slicing, a Numpy array with N elements starts with begin=0, end=N. After slicing, e.g. a[1:], unlike str, the Numpy array doesn't copy any memory; it just changes begin/end/stride. The memory pointer and allocation size remain the same. Thus after a[1:] we have begin=1, end=N; after one more a[1:] we have begin=2, end=N.
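
The view semantics are observable through the array's .base attribute and through mutation:

```python
import numpy as np

a = np.arange(5)
b = a[1:]              # O(1): a view, no element data is copied
assert b.base is a     # b borrows a's buffer
b[0] = 99
assert a[1] == 99      # writing through the view is visible in the parent
```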

For strings, you can convert them once at program start to Numpy arrays of uint32 elements (because a Unicode code point currently fits in 32 bits), then do all the operations inside your program on Numpy arrays only. Numpy also has many string operations implemented for its arrays.

You can do slicing and many other operations very fast, much faster than working with Python's str, and all slicing operations are done in O(1) time. When you're done with the Numpy operations, convert back to Python's str.

Next follows an example. It first converts a Python str to a Numpy array, then performs slicing (done in O(1) time on Numpy arrays), and finally converts the Numpy array back to a Python str:


import numpy as np

s = 'abcde'
# Convert string to Numpy array
a = np.frombuffer(s.encode('utf-32-le'), dtype=np.uint32)
# Do any slicing operations in O(1) time
a = a[1:]
a = a[1:]
# Convert Numpy array back to string
s = a.tobytes().decode('utf-32-le')
print(s)

Output:

cde

What is the time complexity of this python function which includes slice operations?

Time and space should both be O(n*k).

Looking up a dictionary key of size k is O(k) because you have to hash the key, which requires reading all k of its characters. While we often treat dictionary and set lookup as amortized constant time, we can't use that simplification when the dictionary key size is one of the parameters.

Since you do these lookups O(n) times, the time complexity is O(n*k).

Since all the keys have to be stored in the dictionary, and the worst case is that there are no duplicate slices, the dictionary may have to hold n keys of size k each, i.e. O(n*k) space.
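
A hypothetical sketch of this bound: storing every length-k slice of a string in a set costs O(k) per insertion (slice copy plus hash), O(n*k) time overall, and up to n distinct keys of size k in space.

```python
def count_distinct_slices(s, k):
    # Each insertion copies and hashes a k-character slice: O(k).
    # There are O(n) slices, so the total time is O(n*k), and in the
    # worst case the set holds n distinct keys of size k: O(n*k) space.
    seen = set()
    for i in range(len(s) - k + 1):
        seen.add(s[i:i+k])
    return len(seen)

assert count_distinct_slices("abcabc", 3) == 3  # 'abc', 'bca', 'cab'
```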


