How Slow Is Python's String Concatenation vs. str.join?

How slow is Python's string concatenation vs. str.join?

From: Efficient String Concatenation

Method 1:

def method1():
    out_str = ''
    for num in xrange(loop_count):  # Python 2 xrange; loop_count defined elsewhere in the benchmark
        out_str += 'num'
    return out_str

Method 4:

def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append('num')
    return ''.join(str_list)

Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.

String join is significantly faster than concatenation.
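To reproduce the comparison on Python 3, here is a minimal runnable sketch (not part of the original benchmark; the loop_count value and the timing harness are illustrative additions):

import timeit

loop_count = 100000  # illustrative value, not from the original benchmark

def method1():
    out_str = ''
    for num in range(loop_count):
        out_str += 'num'            # may copy everything built so far
    return out_str

def method4():
    str_list = []
    for num in range(loop_count):
        str_list.append(num)        # collect pieces first
    return ''.join(str(n) for n in str_list)

print("method1 (+=):  ", timeit.timeit(method1, number=10))
print("method4 (join):", timeit.timeit(method4, number=10))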

Why? Strings are immutable and can't be changed in place. To alter one, a new string object has to be created (here, the concatenation of the two).
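A quick illustration of that immutability (a toy example, not from the original post):

s = 'abc'
try:
    s[0] = 'x'            # strings are immutable: item assignment fails
except TypeError as e:
    print(e)              # 'str' object does not support item assignment
new_s = s + 'def'         # concatenation builds a brand-new string instead
print(s, new_s)           # 'abc' itself is unchanged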


Python string 'join' is faster (?) than '+', but what's wrong here?

I have figured out the answer from the answers posted here by experts. Python string concatenation (and timing measurements) depends on these (as far as I've seen):

  • Number of concatenations
  • Average length of strings
  • Number of function calls

I have built a new test that relates these factors. Thanks to Peter S Magnusson, sepp2k, hughdbrown, David Wolever and others for indicating important points I had missed earlier. I may also have missed something in this code, so I would highly appreciate any replies pointing out errors, suggestions, criticisms, etc. After all, I am here to learn. Here is my new code:

from timeit import timeit

noc = 100
tocat = "a"

def f_call():
    pass

def loop_only():
    for i in range(noc):
        pass

def concat_method():
    s = ''
    for i in range(noc):
        s = s + tocat

def list_append():
    s = []
    for i in range(noc):
        s.append(tocat)
    ''.join(s)

def list_append_opt():
    s = []
    zap = s.append          # avoid re-looking-up the bound method each iteration
    for i in range(noc):
        zap(tocat)
    ''.join(s)

def list_comp():
    ''.join(tocat for i in range(noc))

# "Buildup" functions measure just the initialization cost of each approach.
def concat_method_buildup():
    s = ''

def list_append_buildup():
    s = []

def list_append_opt_buildup():
    s = []
    zap = s.append

def function_time(f):
    return timeit(f, number=1000) * 1000

f_callt = function_time(f_call)

def measure(ftuple, n, tc):
    global noc, tocat
    noc = n
    tocat = tc
    loopt = function_time(loop_only) - f_callt
    buildup_time = function_time(ftuple[1]) - f_callt if ftuple[1] else 0
    total_time = function_time(ftuple[0])
    return total_time, total_time - f_callt - buildup_time - loopt * ftuple[2]

functions = {'Concat Method\t\t': (concat_method, concat_method_buildup, True),
             'List append\t\t\t': (list_append, list_append_buildup, True),
             'Optimized list append': (list_append_opt, list_append_opt_buildup, True),
             'List comp\t\t\t': (list_comp, 0, False)}

for i in range(5):
    # column labels match the inner loop: 'a' * 10**j for j in 0..2
    print("\n\n%d concatenation\t\t\t\t1'a'\t\t\t\t 10'a'\t\t\t100'a'" % 10**i)
    print('-' * 80)
    for (f, ft) in functions.items():
        print(f, "\t|", end="\t")
        for j in range(3):
            t = measure(ft, 10**i, 'a' * 10**j)
            print("%.3f %.3f |" % t, end="\t")
        print()

And here is what I got. [In each time column, two (scaled) times are shown: the first is the total function execution time, and the second is the actual(?) concatenation time, after subtracting the function-call time, the function buildup time (initialization), and the iteration time. Here I am considering the case where the work can't be done without a loop (say, because there are more statements inside it).]

1 concatenation               1'a'                10'a'               100'a'
----------------------------------------------------------------------------
List comp             |   2.310   2.168 |   2.298   2.156 |   2.304   2.162
Optimized list append |   1.069   0.439 |   1.098   0.456 |   1.071   0.413
Concat Method         |   0.552   0.034 |   0.541   0.025 |   0.565   0.048
List append           |   1.099   0.557 |   1.099   0.552 |   1.094   0.552

10 concatenations             1'a'                10'a'               100'a'
----------------------------------------------------------------------------
List comp             |   3.366   3.224 |   3.473   3.331 |   4.058   3.916
Optimized list append |   2.778   2.003 |   2.956   2.186 |   3.417   2.639
Concat Method         |   1.602   0.943 |   1.910   1.259 |   3.381   2.724
List append           |   3.290   2.612 |   3.378   2.699 |   3.959   3.282

100 concatenations            1'a'                10'a'               100'a'
----------------------------------------------------------------------------
List comp             |  15.900  15.758 |  17.086  16.944 |  20.260  20.118
Optimized list append |  15.178  12.585 |  16.203  13.527 |  19.336  16.703
Concat Method         |  10.937   8.482 |  25.731  23.263 |  29.390  26.934
List append           |  20.515  18.031 |  21.599  19.115 |  24.487  22.003

1000 concatenations           1'a'                10'a'               100'a'
----------------------------------------------------------------------------
List comp             | 134.507 134.365 | 143.913 143.771 | 201.062 200.920
Optimized list append | 112.018  77.525 | 121.487  87.419 | 151.063 117.059
Concat Method         | 214.329 180.093 | 290.380 256.515 | 324.572 290.720
List append           | 167.625 133.619 | 176.241 142.267 | 205.259 171.313

10000 concatenations            1'a'                  10'a'                 100'a'
----------------------------------------------------------------------------------
List comp             | 1309.702 1309.560 | 1404.191 1404.049 | 2912.483 2912.341
Optimized list append | 1042.271  668.696 | 1134.404  761.036 | 2628.882 2255.804
Concat Method         | 2310.204 1941.096 | 2923.805 2550.803 |    STUCK    STUCK
List append           | 1624.795 1251.589 | 1717.501 1345.137 | 3182.347 2809.233

To sum all this up, here are the decisions I have made for myself:

  1. If you have a list of strings available, the string 'join' method is best
    and fastest.
  2. If you can use a list comprehension, that's the easiest, and fast as well.
  3. If you need 1 to 10 concatenations (on average) with lengths 1 to 100,
    list append and '+' take almost the same time (note that the times are
    scaled).
  4. Optimized list append seems very good in most situations.
  5. When the number of concatenations or the string length rises, '+' starts
    to take significantly more and more time. Note that for 10000
    concatenations with 100'a', my PC got stuck!
  6. If you always use list append and 'join', you are safe at all times
    (as pointed out by Alex Martelli).
  7. But in some situations, say where you need to take user input and print
    "Hello user's world!", it is simplest to use '+'. I think building a list
    and joining it for this case, like x = input("Enter user name:") and then
    x.join(["Hello ", "'s world!"]), is uglier than "Hello %s's world!" % x
    or "Hello " + x + "'s world".
  8. Python 3.1 has improved concatenation performance. But in some
    implementations, like Jython, '+' is less efficient.
  9. Premature optimization is the root of all evil (as the experts say).
    Most of the time you do not need optimization, so don't waste time
    aspiring to it (unless you are writing a big or computational project
    where every micro/millisecond counts).
  10. Use this information and write in whatever way you like, taking your
    circumstances into consideration.
  11. If you really need optimization, use a profiler, find the bottlenecks
    and try to optimize those.

Finally, I am trying to learn Python more deeply, so it would not be unusual for there to be mistakes (errors) in my observations. Comment on this and tell me if I am taking a wrong route. Thanks to all for participating.

Is list join really faster than string concatenation in python?

From: Efficient String Concatenation in Python

Method 1: 'a' + 'b' + 'c'

Method 6: a = ''.join(['a', 'b', 'c'])

20,000 integers were concatenated into a string 86 kB long:


             Concatenations per second    Process size (kB)
Method 1     3,770                        2,424
Method 6     119,800                      3,000

Conclusion: YES, str.join() is significantly faster than typical concatenation (str1 + str2).

Why is ''.join() faster than += in Python?

Let's say you have this code to build up a string from three strings:

x = 'foo'
x += 'bar' # 'foobar'
x += 'baz' # 'foobarbaz'

In this case, Python first needs to allocate and create 'foobar' before it can allocate and create 'foobarbaz'.

So for each += that gets called, the entire contents of the string and whatever is getting added to it need to be copied into an entirely new memory buffer. In other words, if you have N strings to be joined, you need to allocate approximately N temporary strings and the first substring gets copied ~N times. The last substring only gets copied once, but on average, each substring gets copied ~N/2 times.
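As a toy count of that arithmetic (an added illustration): building a string from n one-character pieces with += copies roughly n*n/2 characters in total, because step i re-copies everything built so far.

n = 4
# Step i (for i = 2..n) copies the i-1 characters built so far plus 1 new one.
copied = sum(i for i in range(2, n + 1))
print(copied)        # 9 characters copied for n = 4; grows like n*n/2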

With .join, Python can play a number of tricks, since the intermediate strings do not need to be created. CPython figures out how much memory it needs up front and then allocates a correctly-sized buffer. It then copies each piece into the new buffer, which means that each piece is copied only once.
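A minimal timing sketch of the two growth patterns (an added example; absolute numbers will vary by machine and Python version, and CPython's in-place optimization can soften the += penalty):

import timeit

def build_with_concat(n):
    s = ""
    for _ in range(n):
        s += "a"          # each step may copy everything built so far
    return s

def build_with_join(n):
    return "".join("a" for _ in range(n))   # each piece copied exactly once

for n in (1000, 100000):
    t_concat = timeit.timeit(lambda: build_with_concat(n), number=10)
    t_join = timeit.timeit(lambda: build_with_join(n), number=10)
    print("n=%d: += %.4fs, join %.4fs" % (n, t_concat, t_join))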


There are other viable approaches that could lead to better performance for += in some cases: for example, if the internal string representation is actually a rope, or if the runtime is smart enough to somehow figure out that the temporary strings are of no use to the program and optimize them away.

However, CPython certainly does not do these optimizations reliably (though it may for a few corner cases) and since it is the most common implementation in use, many best-practices are based on what works well for CPython. Having a standardized set of norms also makes it easier for other implementations to focus their optimization efforts as well.

Why is concatenating strings running faster than joining them?

The time difference you're seeing comes from creating the list to be passed to join. And while you can get a small speedup from using a tuple instead, it's still going to be slower than just concatenating with + when there are only a few short strings.

It would be different if you had an iterable of strings to start with, rather than an object with strings as attributes. Then you could call join directly on the iterable, rather than needing to build a new one for each call.
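For instance (an illustrative fragment): if the strings already live in an iterable, join consumes it directly and no per-call list needs to be built:

rows = ("alpha", "beta", "gamma")   # already an iterable of strings
text = "\n".join(rows)              # no intermediate list constructed by us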

Here's some testing I did with the timeit module:

import timeit

short_strings = ["foo", "bar", "baz"]
long_strings = [s * 1000 for s in short_strings]

def concat(a, b, c):
    return a + b + c

def concat_from_list(lst):
    return lst[0] + lst[1] + lst[2]

def join(a, b, c):
    return "".join([a, b, c])

def join_tuple(a, b, c):
    return "".join((a, b, c))

def join_from_list(lst):
    return "".join(lst)

def test():
    print("Short strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*short_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(short_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*short_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*short_strings))))
    print("{:20}{}\n".format("join_from_list:",
                             timeit.timeit(lambda: join_from_list(short_strings))))
    print("Long Strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*long_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(long_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*long_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*long_strings))))
    print("{:20}{}".format("join_from_list:",
                           timeit.timeit(lambda: join_from_list(long_strings))))

Output:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
>>> test()
Short strings
concat: 0.5453461176251436
concat_from_list: 0.5185697357936024
join: 0.7099379456477868
join_tuple: 0.5900842397209949
join_from_list: 0.4177281794285359

Long Strings
concat: 2.002303591571888
concat_from_list: 1.8898819841869416
join: 1.5672863477837913
join_tuple: 1.4343144915087596
join_from_list: 1.231374639083505

So, joining from an already existing list is always fastest. Concatenating with + is faster for individual items if they are short, but for long strings it is always slowest. I suspect the difference shown between concat and concat_from_list comes from the unpacking of the lists in the function call in the test code.

Why is it so slow when assigning a concatenated string to a variable in python?

In general, the Python language standard makes no guarantees here; in fact, as defined, strings are immutable and what you're doing should bite you either way, as you've written a form of Schlemiel the Painter's algorithm.

But in the first case, as an implementation detail, CPython (the reference interpreter) will help you out, and concatenate a string in place (technically violating the immutability guarantee) under some fairly specific conditions that allow it to adhere to the spirit of the immutability rules. The most important condition is that the string being concatenated must be referenced in only one place (if it wasn't, the other reference would change in place, violating the appearance of str being immutable). By assigning str2 = str1 after each concatenation, you guarantee there are two references when you concatenate, so a new str must be made by every concatenation to preserve the apparent immutability of strings. That means more memory allocation and deallocation, more (and progressively increasing) memory copies, etc.
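One way to observe the reference count that this optimization keys on (an illustrative sketch; note that sys.getrefcount always reports one extra reference for its own argument):

import sys

s = "".join(["spam"] * 1000)   # built at runtime, so not interned or constant-folded
print(sys.getrefcount(s))      # 2: the name s, plus getrefcount's own argument
t = s
print(sys.getrefcount(s))      # 3: the extra name t is what forces a full copy on s += ...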

Note that relying on this optimization is explicitly discouraged in PEP 8, the Python style guide:

  • Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

    For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

The note about "only works for some types" is important. This optimization only applies to str; in Python 2 it doesn't work on unicode (even though Python 3's str is based on the implementation of Python 2's unicode), and in Python 3 it doesn't work on bytes (which are similar to Python 2's str under the hood).
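A small sketch of that type dependence on Python 3 (an added example; exact timings vary, but the bytes loop degrades much faster as n grows):

import timeit

def grow_str(n=10000):
    s = ""
    for _ in range(n):
        s += "a"        # CPython can often resize this in place (refcount == 1)
    return s

def grow_bytes(n=10000):
    b = b""
    for _ in range(n):
        b += b"a"       # no such optimization for bytes: a fresh copy every time
    return b

print("str  :", timeit.timeit(grow_str, number=10))
print("bytes:", timeit.timeit(grow_bytes, number=10))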

Which is the preferred way to concatenate a string in Python?

The best way of appending a string to a string variable is to use + or +=, because it's readable and fast. The two are equally fast; which one you choose is a matter of taste, though += is the more common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875
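The answer quotes only timings, not the harness; here is a hypothetical reconstruction (a sketch, not the author's actual script) of how such a comparison could be run:

import timeit

setup = "a = ''; b = 'x' * 30"
print("a = a + b:", timeit.timeit("a = a + b", setup=setup, number=100000))
print("a += b:   ", timeit.timeit("a += b", setup=setup, number=100000))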

However, those who recommend building lists, appending to them, and then joining those lists, do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true, in some cases. Here, for example, is one million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, it turns out that even when the resulting string is a million characters long, appending to the string was still faster.

Now let's try with appending a thousand character long string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string, therefore, ends up being about 100MB long. That was pretty slow; appending to a list was much faster. But that timing doesn't include the final join. So how long would that take?

''.join(a):
0.43739795684814453

Oops. It turns out that even in this case, append/join is slower.

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
a.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren't; why would you have a string that's 100MB in memory?).

But the real clincher is Python 2.3, where I won't even show you the timings, because it's so slow that it hasn't finished yet. These tests suddenly take minutes. Except for the append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But by 2.4 it wasn't anymore (or at least not in Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. :-)

(Update: it turns out, when I did the testing more carefully, that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding.)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don't use a technique that's supposed "faster" unless you first measure it.

Therefore the "best" version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it's actually clearer. Especially when whatever you should concatenate together should be separated by spaces or commas or newlines.
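For example (an illustrative snippet), join reads better as soon as a separator is involved:

names = ["ann", "bob", "carol"]
line = ", ".join(names)             # 'ann, bob, carol' -- intent is obvious

# The += version needs bookkeeping to avoid a trailing separator:
line2 = ""
for i, name in enumerate(names):
    line2 += name
    if i < len(names) - 1:
        line2 += ", "
assert line == line2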

String concatenation much faster in Python than Go

TL;DR summary: basically you're testing the two implementations' allocators / garbage collectors and heavily weighting the scale on the Python side (by chance, as it were, but this is something the Python folks optimized at some point).

To expand my comments into a real answer:

  • Both Go and Python have counted strings, i.e., strings are implemented as a two-element header containing a length (byte count or, for Python 3 strings, Unicode character count) and a data pointer.

  • Both Go and Python are garbage-collected (GCed) languages. That is, in both languages, you can allocate memory without having to worry about freeing it yourself: the system takes care of that automatically.

  • But the underlying implementations differ, quite a bit, in one important way: the version of Python you are using has a reference-counting GC; the Go system you are using does not.

With a reference count, the inner bits of the Python string handler can do this. I'll express it as Go (or at least pseudo-Go) although the actual Python implementation is in C and I have not made all the details line up properly:

// add (append) new string t to existing string s
func add_to_string(s, t string_header) string_header {
    need = s.len + t.len
    if s.refcount == 1 { // can modify string in-place
        data = s.data
        if cap(data) >= need {
            copy_into(data+s.len, t.data, t.len)
            return s
        }
    }
    // s is shared or s.cap < need

    new_s := make_new_string(roundup(need))
    // important: new_s has extra space for the next call to add_to_string

    copy_into(new_s.data, s.data, s.len)
    copy_into(new_s.data+s.len, t.data, t.len)
    s.refcount--
    if s.refcount == 0 {
        gc_release_string(s)
    }
    return new_s
}

By over-allocating—rounding up the need value so that cap(new_s) is large—we get about log2(n) calls to the allocator, where n is the number of times you do s += "a". With n being 1000000 (one million), that's about 20 times that we actually have to invoke the make_new_string function and release (for gc purposes because the collector uses refcounts as a first pass) the old string s.
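A toy sketch of that arithmetic in Python (an added illustration, assuming plain doubling for simplicity):

# Count reallocations when capacity is doubled each time it runs out.
n = 1000000
cap, size, allocs = 1, 0, 0
for _ in range(n):
    size += 1
    if size > cap:
        cap *= 2
        allocs += 1
print(allocs)    # about log2(n): ~20 reallocations for a million appends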

[Edit: your source archaeology led to commit 2c9c7a5f33d, which suggests less than doubling but still a multiplicative increase. To other readers, see comment.]

The current Go implementation allocates strings without a separate capacity header field (see reflect.StringHeader and note the big caveat that says "don't depend on this, it might be different in future implementations"). Between the lack of a refcount—we can't tell in the runtime routine that adds two strings, that the target has only one reference—and the inability to observe the equivalent of cap(s) (or cap(s.data)), the Go runtime has to create a new string every time. That's one million memory allocations.

To show that the Python code really does use the refcount, take your original Python:

s = ""

for i in range(1000000):
s += "a"

and add a second variable t like this:

s = ""
t = s

for i in range(1000000):
s += "a"
t = s

The difference in execution time is impressive:

$ time python test2.py
0.68 real 0.65 user 0.03 sys
$ time python test3.py
34.60 real 34.08 user 0.51 sys

The modified Python program still beats Go (1.13.5) on this same system:

$ time ./test2
67.32 real 103.27 user 13.60 sys

and I have not poked any further into the details, but I suspect the Go GC is running more aggressively than the Python one. The Go GC is very different internally, requiring write barriers and occasional "stop the world" behavior (of all goroutines that are not doing the GC work). The refcounting nature of the Python GC allows it to never stop: even with a refcount of 2, the refcount on t drops to 1, and then the next assignment to t drops it to zero, releasing the memory block for re-use on the next trip through the main loop. So it's probably picking up the same memory block over and over again.

(If my memory is correct, Python's "over-allocate strings and check the refcount to allow expand-in-place" trick was not in all versions of Python. It may have first been added around Python 2.4 or so. This memory is extremely vague and a quick Google search did not turn up any evidence one way or the other. [Edit: Python 2.7.4, apparently.])


