Time complexity of string concatenation in Python
Yes, in your case*1 string concatenation requires all characters to be copied, this is a O(N+M) operation (where N and M are the sizes of the input strings). M appends of the same word will trend to O(M^2) time therefor.
You can avoid this quadratic behaviour by using str.join()
:
word = ''.join(list_of_words)
which only takes O(N) (where N is the total length of the output). Or, if you are repeating a single character, you can use:word = m * char
You are prepending characters, but building a list first, then reversing it (or using a collections.deque()
object to get O(1) prepending behaviour) would still be O(n) complexity, easily beating your O(N^2) choice here.*1 As of Python 2.4, the CPython implementation avoids creating a new string object when using
strA += strB
or strA = strA + strB
, but this optimisation is both fragile and not portable. Since you use strA = strB + strA
(prepending) the optimisation doesn't apply. Time Complexity with Strings in Python
Semantically the line concat += word
creates a new string containing the concatenation of concat
and word
and then re-assigns concat
to refer to that new string. This would make the runtime of your loop quadratic because creating this new string takes time linear in the lengths of both concat
and word
.
However, in the actual implementation CPython optimizes this to mutate the contents of concat
instead of creating a new string if and only if concat
is the only reference to the string. Appending to a string in-place takes time that is only linear in the length of word
(amortized). This makes the runtime of your loop linear. But, as I said, this optimization can only be applied if concat
is the only reference to the string (otherwise the value of other variables would change as well, which would break the semantics). So introducing another reference to the string with temp = concat
prevents the optimization.
Is the time complexity of string += a the same as string = string + a?
They have the same time complexity.
In the general Python standard defined case: They both have the same time complexity of O(n), where n is the length of string s
.
In practice, with the CPython implementation: They can in some cases both be O(1), because of an optimization that the interpreter can do when it detects that s
is the only reference to the string in question.
Demo (using Python 3.10.1
):
O(1) (optimization in play):
String of length 10⁹, using +=
:
$ python -m timeit --setup='s = "s" * 10**9' 's += "a"'
5000000 loops, best of 5: 96.6 nsec per loop
String of length 10⁹, using +
:$ python -m timeit --setup='s = "s" * 10**9' 's = s + "a"'
5000000 loops, best of 5: 95.5 nsec per loop
String of length 1, using +=
:$ python -m timeit --setup='s = "s"' 's += "a"'
5000000 loops, best of 5: 97 nsec per loop
String of length 1, using +
:$ python -m timeit --setup='s = "s"' 's = s + "a"'
5000000 loops, best of 5: 97.9 nsec per loop
O(n) (optimization doesn't apply):String of length 10⁹, optimization doesn't apply, using +=
:
$ python -m timeit --setup='s = "s" * 10**9; b = s' 's += "a"'
1 loop, best of 5: 440 msec per loop
String of length 10⁹, optimization doesn't apply, using +
:$ python -m timeit --setup='s = "s" * 10**9; b = s' 's = s + "a"'
1 loop, best of 5: 445 msec per loop
String of length 1, optimization doesn't apply, using +=
:$ python -m timeit --setup='s = "s"; b = s' 's += "a"'
5000000 loops, best of 5: 85.8 nsec per loop
String of length 1, optimization doesn't apply, using +
:$ python -m timeit --setup='s = "s"; b = s' 's = s + "a"'
5000000 loops, best of 5: 84.8 nsec per loop
More info about the time complexity of string concatenation:https://stackoverflow.com/a/37133870/9835872
https://stackoverflow.com/a/34008199/9835872
Is the time-complexity of iterative string append actually O(n^2), or O(n)?
In CPython, the standard implementation of Python, there's an implementation detail that makes this usually O(n), implemented in the code the bytecode evaluation loop calls for +
or +=
with two string operands. If Python detects that the left argument has no other references, it calls realloc
to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc
ends up needing to move the string frequently, performance degrades to O(n^2) anyway.
Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use +=
.
How slow is Python's string concatenation vs. str.join?
From: Efficient String Concatenation
Method 1:
def method1():
out_str = ''
for num in xrange(loop_count):
out_str += 'num'
return out_str
Method 4: def method4():
str_list = []
for num in xrange(loop_count):
str_list.append('num')
return ''.join(str_list)
Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.String join is significantly faster then concatenation.
Why? Strings are immutable and can't be changed in place. To alter one, a new representation needs to be created (a concatenation of the two).
Reversing a String: Is this O(log n) time complexity?
No, it is O(n log n).
String concatenation is linear. Each +
in the expression left_half + right_half
takes O(l+r) time, where l = len(left_half)
and r = len(right_half)
.
- You concatenate two length n/2 strings 1 time.
- You concatenate two length n/4 strings 2 times.
- You concatenate two length n/8 strings 4 times.
- ...
- You concatenate two length 4 strings n/8 times.
- You concatenate two length 2 strings n/4 times.
- You concatenate two length 1 strings n/2 times.
Footnote: The string slices num[n:]
and num[:n]
also have linear complexity. Creating a slice of length k is O(k). Accounting for these costs doesn't change the overall analysis.
Most Efficient Method to Concatenate Strings in Python
Why don't you try it out? You can use timeit.timeit() to run a statement many times and return the overall duration.
Here, we use s
to setup the variables a
and b
(not included in the overall time), and then run the various options 10 million times.
>>> from timeit import timeit
>>>
>>> n = 10 * 1000 * 1000
>>> s = "a = 'start'; b = ' end'"
>>>
>>> timeit("c = a + b", setup=s, number=n)
0.4452877212315798
>>>
>>> timeit("c = f'{a}{b}'", setup=s, number=n)
0.5252049304544926
>>>
>>> timeit("c = '%s%s'.format(a, b)", setup=s, number=n)
0.6849184390157461
>>>>
>>> timeit("c = ''.join((a, b))", setup=s, number=n)
0.8546998891979456
>>>
>>> timeit("c = '%s%s' % (a, b)", setup=s, number=n)
1.1699129864573479
>>>
>>> timeit("c = '{0}{1}'.format(a, b)", setup=s, number=n)
1.5954962372779846
This shows that unless your application's bottleneck is string concatenation, it's probably not worth being too concerned about...- The best case is ~0.45 seconds for 10 million iterations, or about 45ns per operation.
- The worst case is ~1.59 seconds for 10 million iterations, or about 159ns per operation.
Note that your results may vary quite drastically depending on the lengths (and number) of the strings you're concatenating, and the hardware you're running on.
Related Topics
Python: Give Start and End of Week Data from a Given Date
How to Read Class Attributes in the Same Order as Declared
How Do Threads Work in Python, and What Are Common Python-Threading Specific Pitfalls
Complete Scan of Dynamodb with Boto3
Destructuring-Bind Dictionary Contents
Typeerror: Got Multiple Values for Argument
How to Set Opacity of Background Colour of Graph with Matplotlib
Cartesian Product of a Dictionary of Lists
How to Fix Selenium Webdriverexception: the Browser Appears to Have Exited Before We Could Connect
Django: Adding "Nulls Last" to Query
In Python, Why Is List[] Automatically Global
How to Pipe Input to Python Line by Line from Linux Program
How to Get a Gcp Bearer Token Programmatically with Python