Most Optimized Way of Concatenation in Strings

Most optimized way of concatenation in strings

Here is a small test suite:

#include <iostream>
#include <string>
#include <chrono>
#include <sstream>

int main ()
{
typedef std::chrono::high_resolution_clock clock;
typedef std::chrono::duration<float, std::milli> mil;
std::string l_czTempStr;
std::string s1="Test data1";
auto t0 = clock::now();
#if VER==1
for (int i = 0; i < 100000; ++i)
{
l_czTempStr = s1 + "Test data2" + "Test data3";
}
#elif VER==2
for (int i = 0; i < 100000; ++i)
{
l_czTempStr = "Test data1";
l_czTempStr += "Test data2";
l_czTempStr += "Test data3";
}
#elif VER==3
for (int i = 0; i < 100000; ++i)
{
l_czTempStr = "Test data1";
l_czTempStr.append("Test data2");
l_czTempStr.append("Test data3");
}
#elif VER==4
for (int i = 0; i < 100000; ++i)
{
std::ostringstream oss;
oss << "Test data1";
oss << "Test data2";
oss << "Test data3";
l_czTempStr = oss.str();
}
#endif
auto t1 = clock::now();
std::cout << l_czTempStr << '\n';
std::cout << mil(t1-t0).count() << "ms\n";
}

On coliru:

Compile with the following:

clang++ -std=c++11 -O3 -DVER=1 -Wall -pedantic -pthread main.cpp

21.6463ms

-DVER=2

6.61773ms

-DVER=3

6.7855ms

-DVER=4

102.015ms

It looks like 2), += is the winner.

(Also compiling with and without -pthread seems to affect the timings)

Most efficient way to concatenate Strings

String.concat is faster than the + operator if you are concatenating two strings... Although this can be fixed at any time and may even have been fixed in java 8 as far as I know.

The thing you missed in the first post you referenced is that the author is concatenating exactly two strings, and the fast methods are the ones where the size of the new character array is calculated in advance as str1.length() + str2.length(), so the underlying character array only needs to be allocated once.

Using StringBuilder() without specifying the final size, which is also how + works internally, will often need to do more allocations and copying of the underlying array.

If you need to concatenate a bunch of strings together, then you should use a StringBuilder. If it's practical, then precompute the final size so that the underlying array only needs to be allocated once.

What is the most efficient string concatenation method in Python?

If you know all components beforehand once, use the literal string interpolation, also known as f-strings or formatted strings, introduced in Python 3.6.

Given the test case from mkoistinen's answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders and their execution time on my computer using Python 3.6 on Linux as timed by IPython and the timeit module are

  • f'http://{domain}/{lang}/{path}' - 0.151 µs

  • 'http://%s/%s/%s' % (domain, lang, path) - 0.321 µs

  • 'http://' + domain + '/' + lang + '/' + path - 0.356 µs

  • ''.join(('http://', domain, '/', lang, '/', path)) - 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus the shortest and the most beautiful code possible is also fastest.


The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; and that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.

(In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible - actually the generated byte code is pretty much equivalent to the ''.join() case with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.)

Most Efficient Method to Concatenate Strings in Python

Why don't you try it out? You can use timeit.timeit() to run a statement many times and return the overall duration.

Here, we use s to setup the variables a and b (not included in the overall time), and then run the various options 10 million times.

>>> from timeit import timeit
>>>
>>> n = 10 * 1000 * 1000
>>> s = "a = 'start'; b = ' end'"
>>>
>>> timeit("c = a + b", setup=s, number=n)
0.4452877212315798
>>>
>>> timeit("c = f'{a}{b}'", setup=s, number=n)
0.5252049304544926
>>>
>>> timeit("c = '%s%s'.format(a, b)", setup=s, number=n)
0.6849184390157461
>>>>
>>> timeit("c = ''.join((a, b))", setup=s, number=n)
0.8546998891979456
>>>
>>> timeit("c = '%s%s' % (a, b)", setup=s, number=n)
1.1699129864573479
>>>
>>> timeit("c = '{0}{1}'.format(a, b)", setup=s, number=n)
1.5954962372779846

This shows that unless your application's bottleneck is string concatenation, it's probably not worth being too concerned about...

  • The best case is ~0.45 seconds for 10 million iterations, or about 45ns per operation.
  • The worst case is ~1.59 seconds for 10 million iterations, or about 159ns per operation.

If you're performing literally millions of operations, you'll see a speed improvement of about 1 second.

Note that your results may vary quite drastically depending on the lengths (and number) of the strings you're concatenating, and the hardware you're running on.

Most efficient way to concatenate strings?

The StringBuilder.Append() method is much better than using the + operator. But I've found that, when executing 1000 concatenations or less, String.Join() is even more efficient than StringBuilder.

StringBuilder sb = new StringBuilder();
sb.Append(someString);

The only problem with String.Join is that you have to concatenate the strings with a common delimiter.

Edit: as @ryanversaw pointed out, you can make the delimiter string.Empty.

string key = String.Join("_", new String[] 
{ "Customers_Contacts", customerID, database, SessionID });

Which is the preferred way to concatenate a string in Python?

The best way of appending a string to a string variable is to use + or +=. This is because it's readable and fast. They are also just as fast, which one you choose is a matter of taste, the latter one is the most common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875

However, those who recommend having lists and appending to them and then joining those lists, do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true, in some cases. Here, for example, is one
million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, turns out that even when the resulting string is a million characters long, appending was still faster.

Now let's try with appending a thousand character long string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string, therefore, ends up being about 100MB long. That was pretty slow, appending to a list was much faster. That that timing doesn't include the final a.join(). So how long would that take?

a.join(a):
0.43739795684814453

Oups. Turns out even in this case, append/join is slower.

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
a.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren't, what would you have a string that's 100MB in memory?)

But the real clincher is Python 2.3. Where I won't even show you the timings, because it's so slow that it hasn't finished yet. These tests suddenly take minutes. Except for the append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn't anymore (or at least Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. :-)

(Update: Turns out when I did the testing more carefully that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don't use a technique that's supposed "faster" unless you first measure it.

Therefore the "best" version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it's actually clearer. Especially when whatever you should concatenate together should be separated by spaces or commas or newlines.

C: What is the best and fastest way to concatenate strings

Joel Spolsky, in his Back to Basics article, describes the problem of inefficient string concatenation with strcat as the Shlemiel the painter's algorithm (read the article, it's quite good). As an example of inefficient code, he gives this example, which runs in O(n2) time:

char bigString[1000];     /* I never know how much to allocate... */
bigString[0] = '\0';
strcat(bigString,"John, ");
strcat(bigString,"Paul, ");
strcat(bigString,"George, ");
strcat(bigString,"Joel ");

It's not really a problem to walk over the first string the first time; since we've already got to walk over the second string, the runtime of one strcat is linear in the length of the result. Multiple strcats is problematic though, because we walk over the previously concatenated results again and again. He provides this alternative:

How do we fix this? A few smart C programmers implemented their own
mystrcat as follows:

char* mystrcat( char* dest, char* src )
{
while (*dest) dest++;
while (*dest++ = *src++);
return --dest;
}

What have we done here? At very little extra cost we're returning a
pointer to the end of the new, longer string. That way the code that
calls this function can decide to append further without rescanning
the string:

char bigString[1000];     /* I never know how much to allocate... */
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");

This is, of course, linear in performance, not n-squared, so it
doesn't suffer from degradation when you have a lot of stuff to
concatenate.

Of course, this is what you can do if you want to use standard C strings. The alternative that you're describing of caching the length of the string and using a special concatenation function (e.g., calling strcat with slightly different arguments) is sort of a variation on Pascal strings, which Joel also mentioned:

The designers of Pascal were aware of this problem and "fixed" it by
storing a byte count in the first byte of the string. These are called
Pascal Strings. They can contain zeros and are not null terminated.
Because a byte can only store numbers between 0 and 255, Pascal
strings are limited to 255 bytes in length, but because they are not
null terminated they occupy the same amount of memory as ASCIZ
strings. The great thing about Pascal strings is that you never have
to have a loop just to figure out the length of your string. Finding
the length of a string in Pascal is one assembly instruction instead
of a whole loop. It is monumentally faster.

For a long time, if you wanted to put a Pascal string literal in your
C code, you had to write:

char* str = "\006Hello!";

Yep, you had to count the bytes by hand, yourself, and hardcode it
into the first byte of your string. Lazy programmers would do this,
and have slow programs:

char* str = "*Hello!";
str[0] = strlen(str) - 1;

Most efficient way to concatenate strings in JavaScript?

Seems based on benchmarks at JSPerf that using += is the fastest method, though not necessarily in every browser.

For building strings in the DOM, it seems to be better to concatenate the string first and then add to the DOM, rather then iteratively add it to the dom. You should benchmark your own case though.

(Thanks @zAlbee for correction)



Related Topics



Leave a reply



Submit