Python String 'Join' Is Faster (?) Than '+', But What's Wrong Here

I have figured out the answer from the answers posted here by experts. Python string concatenation (and timing measurements of it) depends on these factors (as far as I've seen):

  • Number of concatenations
  • Average length of strings
  • Number of function calls

I have built new code that relates these factors. Thanks to Peter S Magnusson, sepp2k, hughdbrown, David Wolever and others for indicating important points I had missed earlier. Also, in this code I might have missed something, so I highly appreciate any replies pointing out errors, suggestions, criticisms etc. After all, I am here for learning. Here is my new code:

from timeit import timeit

noc = 100      # number of concatenations
tocat = "a"    # string to concatenate

def f_call():
    pass

def loop_only():
    for i in range(noc):
        pass

def concat_method():
    s = ''
    for i in range(noc):
        s = s + tocat

def list_append():
    s = []
    for i in range(noc):
        s.append(tocat)
    ''.join(s)

def list_append_opt():
    s = []
    zap = s.append    # cache the bound method to skip the attribute lookup
    for i in range(noc):
        zap(tocat)
    ''.join(s)

def list_comp():
    ''.join(tocat for i in range(noc))

def concat_method_buildup():
    s = ''

def list_append_buildup():
    s = []

def list_append_opt_buildup():
    s = []
    zap = s.append

def function_time(f):
    return timeit(f, number=1000) * 1000

f_callt = function_time(f_call)

def measure(ftuple, n, tc):
    global noc, tocat
    noc = n
    tocat = tc
    loopt = function_time(loop_only) - f_callt
    buildup_time = function_time(ftuple[1]) - f_callt if ftuple[1] else 0
    total_time = function_time(ftuple[0])
    # second value: total time minus call, buildup and (where present) loop overhead
    return total_time, total_time - f_callt - buildup_time - loopt * ftuple[2]

functions = {'Concat Method\t\t':      (concat_method, concat_method_buildup, True),
             'List append\t\t\t':      (list_append, list_append_buildup, True),
             'Optimized list append':  (list_append_opt, list_append_opt_buildup, True),
             'List comp\t\t\t':        (list_comp, 0, False)}

for i in range(5):
    # column labels match 'a' * 10**j for j in 0..2 (the original post printed
    # 10'a'/100'a'/1000'a' here, which contradicted the output below)
    print("\n\n%d concatenation\t\t\t\t1'a'\t\t\t\t10'a'\t\t\t100'a'" % 10**i)
    print('-' * 80)
    for f, ft in functions.items():
        print(f, "\t|", end="\t")
        for j in range(3):
            t = measure(ft, 10**i, 'a' * 10**j)
            print("%.3f %.3f |" % t, end="\t")
        print()

And here is what I have got. [In each time column, two scaled times are shown: the first is the total function execution time, and the second is the actual(?) concatenation time, i.e. with the function call time, the function buildup (initialization) time, and the iteration time subtracted. Here I am considering the case where it can't be done without a loop (say, with more statements inside).]

1 concatenation              1'a'                10'a'               100'a'
-----------------------------------------------------------------------------------
List comp             |    2.310    2.168 |    2.298    2.156 |    2.304    2.162
Optimized list append |    1.069    0.439 |    1.098    0.456 |    1.071    0.413
Concat Method         |    0.552    0.034 |    0.541    0.025 |    0.565    0.048
List append           |    1.099    0.557 |    1.099    0.552 |    1.094    0.552

10 concatenations            1'a'                10'a'               100'a'
-----------------------------------------------------------------------------------
List comp             |    3.366    3.224 |    3.473    3.331 |    4.058    3.916
Optimized list append |    2.778    2.003 |    2.956    2.186 |    3.417    2.639
Concat Method         |    1.602    0.943 |    1.910    1.259 |    3.381    2.724
List append           |    3.290    2.612 |    3.378    2.699 |    3.959    3.282

100 concatenations           1'a'                10'a'               100'a'
-----------------------------------------------------------------------------------
List comp             |   15.900   15.758 |   17.086   16.944 |   20.260   20.118
Optimized list append |   15.178   12.585 |   16.203   13.527 |   19.336   16.703
Concat Method         |   10.937    8.482 |   25.731   23.263 |   29.390   26.934
List append           |   20.515   18.031 |   21.599   19.115 |   24.487   22.003

1000 concatenations          1'a'                10'a'               100'a'
-----------------------------------------------------------------------------------
List comp             |  134.507  134.365 |  143.913  143.771 |  201.062  200.920
Optimized list append |  112.018   77.525 |  121.487   87.419 |  151.063  117.059
Concat Method         |  214.329  180.093 |  290.380  256.515 |  324.572  290.720
List append           |  167.625  133.619 |  176.241  142.267 |  205.259  171.313

10000 concatenations         1'a'                10'a'               100'a'
-----------------------------------------------------------------------------------
List comp             | 1309.702 1309.560 | 1404.191 1404.049 | 2912.483 2912.341
Optimized list append | 1042.271  668.696 | 1134.404  761.036 | 2628.882 2255.804
Concat Method         | 2310.204 1941.096 | 2923.805 2550.803 |    STUCK    STUCK
List append           | 1624.795 1251.589 | 1717.501 1345.137 | 3182.347 2809.233

To sum up all of this, I have made these decisions for myself:

  1. If you have a list of strings available, the string 'join' method is best and fastest.
  2. If you can use a list comprehension, that's the easiest, and fast as well.
  3. If you need 1 to 10 concatenations (on average) with strings of length 1 to 100, list append and '+' take almost the same time (note that the times are scaled).
  4. Optimized list append seems very good in most situations.
  5. When the number of concatenations or the string length rises, '+' starts to take significantly more and more time. Note that for 10000 concatenations with 100'a', my PC got stuck!
  6. If you always use list append and 'join', you are safe all the time (as pointed out by Alex Martelli).
  7. But in some situations, say where you need to take user input and print "Hello user's world!", it is simplest to use '+'. I think building a list and joining it, like x = input("Enter user name:") and then x.join(["Hello ", "'s world!"]), is uglier than "Hello %s's world!" % x or "Hello " + x + "'s world!" (the variants are written out as code just after this list).
  8. Python 3.1 has improved concatenation performance. But in some implementations, like Jython, '+' is less efficient.
  9. Premature optimization is the root of all evil (as the experts say). Most of the time you do not need optimization, so don't waste time striving for it (unless you are writing a big or computational project where every micro/millisecond counts).
  10. Use this information and write in whatever way you like, taking the circumstances into consideration.
  11. If you really need optimization, use a profiler, find the bottlenecks and try to optimize those.
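
For reference, here is the readability comparison from point 7 written out as code (the f-string line is an extra shown for comparison; it needs Python 3.6+):

x = input("Enter user name:")

print("Hello " + x + "'s world!")        # plain '+': simplest here
print("Hello %s's world!" % x)           # %-formatting
print(x.join(["Hello ", "'s world!"]))   # join, with x as the separator: ugliest
print(f"Hello {x}'s world!")             # f-string (Python 3.6+)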

Finally, I am trying to learn Python more deeply, so it would not be unusual for there to be mistakes (errors) in my observations. Please comment on this and correct me if I am taking a wrong route. Thanks to all for participating.

Why is ''.join() faster than += in Python?

Let's say you have this code to build up a string from three strings:

x = 'foo'
x += 'bar' # 'foobar'
x += 'baz' # 'foobarbaz'

In this case, Python first needs to allocate and create 'foobar' before it can allocate and create 'foobarbaz'.

So for each += that gets called, the entire contents of the string and whatever is getting added to it need to be copied into an entirely new memory buffer. In other words, if you have N strings to be joined, you need to allocate approximately N temporary strings and the first substring gets copied ~N times. The last substring only gets copied once, but on average, each substring gets copied ~N/2 times.

With .join, Python can play a number of tricks since the intermediate strings do not need to be created. CPython figures out how much memory it needs up front and then allocates a correctly-sized buffer. It then copies each piece into the new buffer, which means that each piece is copied only once.
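
As a back-of-the-envelope check of that argument, here is a hypothetical cost model (the function names and the equal-length assumption are mine, and it ignores any in-place optimization the interpreter might do):

def bytes_copied_plus(n, piece_len):
    # the i-th += builds a new string of (i + 1) pieces and copies all of it
    return sum((i + 1) * piece_len for i in range(n))

def bytes_copied_join(n, piece_len):
    # join copies each piece exactly once into a presized buffer
    return n * piece_len

for n in (10, 100, 1000):
    print(n, bytes_copied_plus(n, 10), bytes_copied_join(n, 10))

The += column grows quadratically while the join column grows linearly, which matches the ~N/2 average copies per substring described above.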


There are other viable approaches which could lead to better performance for += in some cases, e.g. if the internal string representation is a rope, or if the runtime is smart enough to figure out that the temporary strings are of no use to the program and optimize them away.

However, CPython certainly does not do these optimizations reliably (though it may for a few corner cases) and since it is the most common implementation in use, many best-practices are based on what works well for CPython. Having a standardized set of norms also makes it easier for other implementations to focus their optimization efforts as well.

Python string join performance

It is true you should not use '+'. Your example is quite special. Try the same code with:

s1 = '*' * 100000
s2 = '+' * 100000

Then the second version (str.join) is much faster.
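
The question's original loop isn't reproduced in this excerpt, so here is a minimal sketch (variable names are mine) for timing the two approaches once many such large chunks are accumulated:

from timeit import timeit

s1 = '*' * 100000
s2 = '+' * 100000
parts = [s1, s2] * 50            # 100 large chunks

def concat_plus():
    out = ''
    for p in parts:
        out += p                 # may reallocate and copy on every step
    return out

def concat_join():
    return ''.join(parts)        # one allocation, each chunk copied once

print('+    :', timeit(concat_plus, number=10))
print('join :', timeit(concat_join, number=10))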

How slow is Python's string concatenation vs. str.join?

From: Efficient String Concatenation

Method 1:

def method1():
    out_str = ''
    for num in xrange(loop_count):   # xrange: Python 2 (use range on Python 3)
        out_str += 'num'
    return out_str

Method 4:

def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append('num')
    return ''.join(str_list)
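
These snippets come from a Python 2-era article and assume a loop_count defined elsewhere; here is a minimal harness to try them yourself (the loop_count value is my assumption, and under Python 3 you would change xrange to range first):

from timeit import timeit

loop_count = 100000   # assumed value; the original article varies this

print("method1 (+=):   %.3f s" % timeit(method1, number=10))
print("method4 (join): %.3f s" % timeit(method4, number=10))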

Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.

String join is significantly faster than concatenation.

Why? Strings are immutable and can't be changed in place. To alter one, a new representation needs to be created (a concatenation of the two).


Is list join really faster than string concatenation in python?

From Efficient String Concatenation in Python

Method 1 : 'a' + 'b' + 'c'

Method 6 : a = ''.join(['a', 'b', 'c'])

20,000 integers were concatenated into a string 86kb long :


                Concatenations per second     Process size (kB)
Method 1                  3,770                     2,424
Method 6                119,800                     3,000

Conclusion: YES, str.join() is significantly faster than typical concatenation (str1 + str2).

Why is concatenating strings running faster than joining them?

The time difference you're seeing comes from creating the list to be passed to join. And while you can get a small speedup from using a tuple instead, it's still going to be slower than just concatenating with + when there are only a few short strings.

It would be different if you had an iterable of strings to start with, rather than an object with strings as attributes. Then you could call join directly on the iterable, rather than needing to build a new one for each call.

Here's some testing I did with the timeit module:

import timeit

short_strings = ["foo", "bar", "baz"]
long_strings = [s * 1000 for s in short_strings]

def concat(a, b, c):
    return a + b + c

def concat_from_list(lst):
    return lst[0] + lst[1] + lst[2]

def join(a, b, c):
    return "".join([a, b, c])

def join_tuple(a, b, c):
    return "".join((a, b, c))

def join_from_list(lst):
    return "".join(lst)

def test():
    print("Short strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*short_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(short_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*short_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*short_strings))))
    print("{:20}{}\n".format("join_from_list:",
                             timeit.timeit(lambda: join_from_list(short_strings))))
    print("Long Strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*long_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(long_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*long_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*long_strings))))
    print("{:20}{}".format("join_from_list:",
                           timeit.timeit(lambda: join_from_list(long_strings))))

Output:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
>>> test()
Short strings
concat:             0.5453461176251436
concat_from_list:   0.5185697357936024
join:               0.7099379456477868
join_tuple:         0.5900842397209949
join_from_list:     0.4177281794285359

Long Strings
concat:             2.002303591571888
concat_from_list:   1.8898819841869416
join:               1.5672863477837913
join_tuple:         1.4343144915087596
join_from_list:     1.231374639083505

So, joining from an already existing list is always fastest. Concatenating with + is faster for individual items if they are short, but for long strings it is always slowest. I suspect the difference shown between concat and concat_from_list comes from the unpacking of the lists in the function call in the test code.

Why is 'join' faster than normal concatenation?

The code in a join function knows upfront all the strings it’s being asked to concatenate and how large those strings are, and hence it can calculate the final string length before beginning the operation.

Hence it needs only allocate memory for the final string once and then it can place each source string (and delimiter) in the correct place in memory.

On the other hand, a single += operation on a string has no choice but to allocate enough memory for the concatenation of just two strings. Each subsequent += must do the same, allocating memory that the next += will discard, and each time the ever-growing string is copied from one place in memory to another.
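
To make that concrete, here is a toy pure-Python sketch of join's measure-allocate-copy strategy (the real str.join is C code inside CPython; this version assumes ASCII-only input and is only an illustration):

def join_sketch(sep, pieces):
    # 1. measure: compute the exact final length up front
    total = sum(len(p) for p in pieces) + len(sep) * max(len(pieces) - 1, 0)
    buf = bytearray(total)               # 2. allocate once, correctly sized
    pos = 0
    for i, p in enumerate(pieces):       # 3. copy each piece into place, once
        if i:
            buf[pos:pos + len(sep)] = sep.encode()
            pos += len(sep)
        buf[pos:pos + len(p)] = p.encode()
        pos += len(p)
    return buf.decode()

assert join_sketch('-', ['a', 'bb', 'ccc']) == '-'.join(['a', 'bb', 'ccc'])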

What is the most efficient string concatenation method in Python?

If you know all the components ahead of time, use literal string interpolation, also known as f-strings or formatted string literals, introduced in Python 3.6.

Given the test case from mkoistinen's answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders, and their execution times on my computer (Python 3.6 on Linux, as timed by IPython and the timeit module), are:

  • f'http://{domain}/{lang}/{path}' - 0.151 µs

  • 'http://%s/%s/%s' % (domain, lang, path) - 0.321 µs

  • 'http://' + domain + '/' + lang + '/' + path - 0.356 µs

  • ''.join(('http://', domain, '/', lang, '/', path)) - 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus the shortest and the most beautiful code possible is also fastest.
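
If you want to reproduce these numbers, a sketch along these lines works (absolute timings will of course differ by machine and Python version; Python 3.6+ is required for the f-string):

from timeit import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

candidates = {
    'f-string': lambda: f'http://{domain}/{lang}/{path}',
    '% format': lambda: 'http://%s/%s/%s' % (domain, lang, path),
    '+ concat': lambda: 'http://' + domain + '/' + lang + '/' + path,
    'join':     lambda: ''.join(('http://', domain, '/', lang, '/', path)),
}

for name, fn in candidates.items():
    print(name, timeit(fn, number=1_000_000))   # total seconds for a million calls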


The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; there it takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.

(In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible - actually the generated byte code is pretty much equivalent to the ''.join() case with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.)


