Time Complexity of String Concatenation in Python

Yes, in your case*1 string concatenation requires all characters to be copied; this is an O(N+M) operation (where N and M are the sizes of the input strings). Performing M such appends of the same word therefore trends towards O(M^2) total time.

You can avoid this quadratic behaviour by using str.join():

word = ''.join(list_of_words)

which only takes O(N) (where N is the total length of the output). Or, if you are repeating a single character, you can use:

word = m * char

You are prepending characters here, but appending to a list and reversing it at the end (or using a collections.deque() object to get O(1) prepending behaviour) keeps the whole build at O(N), easily beating your O(N^2) approach; see the sketch below.
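
A minimal sketch of both alternatives, assuming the characters arrive one at a time in the order you would have prepended them (the variable name characters is hypothetical):

from collections import deque

characters = ['c', 'b', 'a']  # hypothetical input, normally prepended one at a time

# Option 1: append to a list, reverse once, join -- O(N) overall.
parts = []
for ch in characters:
    parts.append(ch)
parts.reverse()
word = ''.join(parts)

# Option 2: a deque supports O(1) appendleft(), so no reversal is needed.
d = deque()
for ch in characters:
    d.appendleft(ch)
word = ''.join(d)

print(word)  # 'abc' either way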


*1 As of Python 2.4, the CPython implementation avoids creating a new string object when using strA += strB or strA = strA + strB, but this optimisation is both fragile and not portable. Since you use strA = strB + strA (prepending), the optimisation doesn't apply.

Time Complexity with Strings in Python

Semantically the line concat += word creates a new string containing the concatenation of concat and word and then re-assigns concat to refer to that new string. This would make the runtime of your loop quadratic because creating this new string takes time linear in the lengths of both concat and word.

However, in the actual implementation CPython optimizes this to mutate the contents of concat instead of creating a new string if and only if concat is the only reference to the string. Appending to a string in-place takes time that is only linear in the length of word (amortized). This makes the runtime of your loop linear. But, as I said, this optimization can only be applied if concat is the only reference to the string (otherwise the value of other variables would change as well, which would break the semantics). So introducing another reference to the string with temp = concat prevents the optimization.
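
A rough way to see the difference yourself; the loop bodies below are an illustration based on the description above, and the workload is arbitrary:

import timeit

def append_only(words):
    # concat stays the only reference, so CPython can grow the string in place.
    concat = ''
    for word in words:
        concat += word
    return concat

def append_with_alias(words):
    # temp keeps a second reference alive, which disables the in-place
    # optimization, so every += copies the whole string: quadratic overall.
    concat = ''
    for word in words:
        temp = concat
        concat += word
    return concat

words = ['x' * 10] * 20000  # arbitrary workload
print(timeit.timeit(lambda: append_only(words), number=1))
print(timeit.timeit(lambda: append_with_alias(words), number=1))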

Is the time complexity of string += a the same as string = string + a?

They have the same time complexity.

In the general case, as defined by the Python language (independent of any implementation): they both have the same time complexity of O(n), where n is the length of the string s.

In practice, with the CPython implementation: They can in some cases both be O(1), because of an optimization that the interpreter can do when it detects that s is the only reference to the string in question.

Demo (using Python 3.10.1):

O(1) (optimization in play):

String of length 10⁹, using +=:

$ python -m timeit --setup='s = "s" * 10**9' 's += "a"'
5000000 loops, best of 5: 96.6 nsec per loop

String of length 10⁹, using +:

$ python -m timeit --setup='s = "s" * 10**9' 's = s + "a"'
5000000 loops, best of 5: 95.5 nsec per loop

String of length 1, using +=:

$ python -m timeit --setup='s = "s"' 's += "a"'
5000000 loops, best of 5: 97 nsec per loop

String of length 1, using +:

$ python -m timeit --setup='s = "s"' 's = s + "a"'
5000000 loops, best of 5: 97.9 nsec per loop

O(n) (optimization doesn't apply):

String of length 10⁹, optimization doesn't apply, using +=:

$ python -m timeit --setup='s = "s" * 10**9; b = s' 's += "a"'
1 loop, best of 5: 440 msec per loop

String of length 10⁹, optimization doesn't apply, using +:

$ python -m timeit --setup='s = "s" * 10**9; b = s' 's = s + "a"'
1 loop, best of 5: 445 msec per loop

String of length 1, optimization doesn't apply, using +=:

$ python -m timeit --setup='s = "s"; b = s' 's += "a"'
5000000 loops, best of 5: 85.8 nsec per loop

String of length 1, optimization doesn't apply, using +:

$ python -m timeit --setup='s = "s"; b = s' 's = s + "a"'
5000000 loops, best of 5: 84.8 nsec per loop

More info about the time complexity of string concatenation:
https://stackoverflow.com/a/37133870/9835872
https://stackoverflow.com/a/34008199/9835872

Is the time-complexity of iterative string append actually O(n^2), or O(n)?

In CPython, the standard implementation of Python, an implementation detail makes this usually O(n). It lives in the helper that the bytecode evaluation loop calls when + or += has two string operands: if Python detects that the left argument has no other references, it calls realloc to attempt to resize the string in place and avoid a copy. This is not something you should ever rely on, because it's an implementation detail and because, if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use +=.
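
If you want linear behaviour without relying on that implementation detail, collect the pieces and join them once at the end (or write into an io.StringIO buffer); a small sketch, with chunks standing in for whatever you are accumulating:

import io

chunks = ['spam'] * 1000  # hypothetical pieces to accumulate

# Portable O(n): build a list, join once.
result = ''.join(chunks)

# Equivalent alternative using an in-memory text buffer.
buf = io.StringIO()
for chunk in chunks:
    buf.write(chunk)
result = buf.getvalue()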

How slow is Python's string concatenation vs. str.join?

From: Efficient String Concatenation

Method 1:

def method1():
    out_str = ''
    for num in range(loop_count):
        out_str += 'num'
    return out_str

Method 4:

def method4():
    str_list = []
    for num in range(loop_count):
        str_list.append('num')
    return ''.join(str_list)

Now I realise they are not strictly representative, and the 4th method has to append everything to a list before joining it, but it's a fair indication.

String join is significantly faster than concatenation.

Why? Strings are immutable and can't be changed in place. To alter one, a new representation needs to be created (a concatenation of the two).

[Chart: runtime comparison of the concatenation methods from the linked article]
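
If you want to reproduce the comparison yourself, a self-contained harness along these lines works (the loop count and repeat count are assumptions; the original article's exact setup isn't shown here):

from timeit import timeit

loop_count = 100_000  # assumed value for illustration

def method1():
    # Repeated += concatenation.
    out_str = ''
    for num in range(loop_count):
        out_str += 'num'
    return out_str

def method4():
    # Collect the pieces in a list, join once at the end.
    str_list = []
    for num in range(loop_count):
        str_list.append('num')
    return ''.join(str_list)

print('concatenation (+=):', timeit(method1, number=10))
print("''.join:           ", timeit(method4, number=10))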

Reversing a String: Is this O(log n) time complexity?

No, it is O(n log n).

String concatenation is linear. Each + in the expression left_half + right_half takes O(l+r) time, where l = len(left_half) and r = len(right_half).

  • You concatenate two length n/2 strings 1 time.
  • You concatenate two length n/4 strings 2 times.
  • You concatenate two length n/8 strings 4 times.
  • ...
  • You concatenate two length 4 strings n/8 times.
  • You concatenate two length 2 strings n/4 times.
  • You concatenate two length 1 strings n/2 times.

Each step takes O(n) and there are O(log n) steps, leading to an overall time complexity of O(n log n).

Footnote: The string slices num[n:] and num[:n] also have linear complexity. Creating a slice of length k is O(k). Accounting for these costs doesn't change the overall analysis.
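
For reference, a minimal sketch of the kind of divide-and-conquer reversal being analysed, reconstructed from the description above (the original function isn't shown here):

def reverse(num: str) -> str:
    # Base case: a string of length 0 or 1 is its own reverse.
    if len(num) <= 1:
        return num
    n = len(num) // 2
    # Each level of recursion does O(len(num)) work in the slices and the
    # concatenation, and there are O(log n) levels: O(n log n) overall.
    return reverse(num[n:]) + reverse(num[:n])

print(reverse('abcdef'))  # 'fedcba'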

Most Efficient Method to Concatenate Strings in Python

Why don't you try it out? You can use timeit.timeit() to run a statement many times and return the overall duration.

Here, we use s to set up the variables a and b (this setup is not included in the measured time), and then run the various options 10 million times.

>>> from timeit import timeit
>>>
>>> n = 10 * 1000 * 1000
>>> s = "a = 'start'; b = ' end'"
>>>
>>> timeit("c = a + b", setup=s, number=n)
0.4452877212315798
>>>
>>> timeit("c = f'{a}{b}'", setup=s, number=n)
0.5252049304544926
>>>
>>> timeit("c = '%s%s'.format(a, b)", setup=s, number=n)
0.6849184390157461
>>>
>>> timeit("c = ''.join((a, b))", setup=s, number=n)
0.8546998891979456
>>>
>>> timeit("c = '%s%s' % (a, b)", setup=s, number=n)
1.1699129864573479
>>>
>>> timeit("c = '{0}{1}'.format(a, b)", setup=s, number=n)
1.5954962372779846

This shows that unless your application's bottleneck is string concatenation, it's probably not worth being too concerned about which method you use...

  • The best case is ~0.45 seconds for 10 million iterations, or about 45ns per operation.
  • The worst case is ~1.59 seconds for 10 million iterations, or about 159ns per operation.

Even if you're performing literally millions of these operations, picking the fastest method over the slowest saves only about a second in total.

Note that your results may vary quite drastically depending on the lengths (and number) of the strings you're concatenating, and the hardware you're running on.
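
For example, you could rerun the comparison with longer strings to see how the ranking shifts; the setup values below are arbitrary:

from timeit import timeit

# Hypothetical setup with much longer strings; adjust to match your workload.
s = "a = 'start' * 1000; b = ' end' * 1000"
n = 1_000_000

print(timeit("c = a + b", setup=s, number=n))
print(timeit("c = ''.join((a, b))", setup=s, number=n))
print(timeit("c = f'{a}{b}'", setup=s, number=n))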


