Any Reason Not to Use '+' to Concatenate Two Strings

Any reason not to use '+' to concatenate two strings?

There is nothing wrong with concatenating two strings with +. Indeed, it's easier to read than ''.join([a, b]).

You are right, though, that concatenating more than two strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However, this has nothing to do with using a loop: even a + b + c + ... is O(n^2), because each concatenation produces a new string.

CPython 2.4 and above try to mitigate this, but it's still advisable to use join when concatenating more than two strings.
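A minimal sketch contrasting the two approaches (the function names are hypothetical). Both produce the same string; the difference is that repeated + may copy the growing result on every step (CPython 2.4+ can sometimes extend it in place, as noted above), while str.join works out the total size once and copies each part a single time.

```python
def concat_with_plus(parts):
    # Each + may create a new string and copy everything accumulated so far.
    result = ""
    for part in parts:
        result = result + part
    return result


def concat_with_join(parts):
    # str.join sizes the result once and copies each part into it.
    return "".join(parts)


words = ["spam", "eggs", "ham"]
print(concat_with_plus(words))  # spameggsham
print(concat_with_join(words))  # spameggsham
```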

Is it always a bad idea to use + to concatenate strings

I would use + if you are concatenating manually:

String word = "Hello";
word += " World!";

However, if you are concatenating inside a loop, I would suggest StringBuilder:

StringBuilder sb = new StringBuilder();
for (My my : myList) {
    sb.append(my.getX());
}
String result = sb.toString();

Which is the preferred way to concatenate a string in Python?

The best way to append a string to a string variable is to use + or +=, because it's readable and fast. The two are equally fast; which one you choose is a matter of taste, though += is the more common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875
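The benchmark code behind these numbers isn't shown, so the following is only a sketch of how such a measurement might be set up with the standard timeit module. The setup string and iteration count are assumptions; absolute numbers will differ by machine and Python version.

```python
import timeit

# Assumed setup: start from an empty string and append a one-character string.
SETUP = "a = ''; b = 'x'"
NUMBER = 1_000_000  # assumed iteration count

# timeit runs the setup and the statement inside one function,
# so rebinding `a` on every iteration works as expected.
plus_time = timeit.timeit("a = a + b", setup=SETUP, number=NUMBER)
iadd_time = timeit.timeit("a += b", setup=SETUP, number=NUMBER)

print(f"a = a + b: {plus_time:.4f}s")
print(f"a += b:    {iadd_time:.4f}s")
```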

However, those who recommend appending strings to a list and then joining that list do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true in some cases. Here, for example, is one million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, it turns out that even when the resulting string is a million characters long, appending to a string was still faster.

Now let's try appending a thousand-character string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The final string therefore ends up being about 100 MB long. That was pretty slow; appending to a list was much faster. But that timing doesn't include the final ''.join(a). So how long would that take?

''.join(a):
0.43739795684814453

Oops. It turns out that even in this case, append/join is slower.
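A sketch of a fairer comparison, in which the list-based version is timed together with its final join. The function names, the reduced repetition count and the number of timing runs are assumptions chosen so the example runs quickly; they are not the settings behind the figures above.

```python
import timeit


def build_with_iadd(piece, count):
    # Grow a single string with +=.
    s = ""
    for _ in range(count):
        s += piece
    return s


def build_with_append_join(piece, count):
    # Collect pieces in a list; the final join is part of the measured cost.
    parts = []
    for _ in range(count):
        parts.append(piece)
    return "".join(parts)


PIECE = "x" * 1000  # a thousand-character string
COUNT = 1000        # kept small here; the text above used 100,000

for func in (build_with_iadd, build_with_append_join):
    seconds = timeit.timeit(lambda: func(PIECE, COUNT), number=10)
    print(f"{func.__name__}: {seconds:.4f}s")
```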

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
''.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren't; why would you have a string that's 100 MB in memory?).

But the real clincher is Python 2.3, where I won't even show you the timings, because it's so slow that it hasn't finished yet. These tests suddenly take minutes, except for the append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn't anymore (or at least Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. :-)

(Update: It turns out, when I did the testing more carefully, that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding.)

However, this is CPython. Other implementations may have other concerns. And this is yet another reason why premature optimization is the root of all evil. Don't use a technique that's supposedly "faster" unless you first measure it.

Therefore the "best" version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it's actually clearer, especially when the pieces you are concatenating should be separated by spaces, commas or newlines.
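A small illustration of that readability point, using a hypothetical list of items: with + the separator has to be handled by hand, whereas join places it between elements automatically.

```python
items = ["apples", "pears", "plums"]

# With +=, we must avoid a leading (or trailing) separator ourselves.
manual = ""
for i, item in enumerate(items):
    if i > 0:
        manual += ", "
    manual += item

# With join, the separator only goes between elements.
joined = ", ".join(items)

print(joined)  # apples, pears, plums
```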

Why does this not concatenate two strings?

Your code fails for multiple reasons:

  • you use a temporary array to make a copy of the source string: this array tmp has a fixed length of 10 bytes, which is too small if the source string is longer than 10 bytes. In that case you will have undefined behavior when you write beyond the end of this array.
  • there is really no need for this temporary array anyway.
  • the final loop stops at lengths - 1, hence you stop before the last byte of the copy.
  • you copy all bytes to the same position dest[lengthd + 1].
  • you finally set the null terminator at the same position again.
  • you never changed the null terminator at dest[lengthd] so the function appears to have no effect on dest.
  • the tests in main() cannot produce the output you posted, probably because of a typo in "%s\\n".
  • avoid using identifiers starting with an _.

Here is a modified version:

#include <stdio.h>
#include <string.h>

char *my_strcat(char *dest, char *src) {
    int i = 0;
    int k = 0;

    /* find the offset of the null terminator in dest */
    while (dest[i] != '\0') {
        i++;
    }
    /* copy the bytes from the src string there */
    while (src[k] != '\0') {
        dest[i] = src[k];
        i++;
        k++;
    }
    /* set the null terminator */
    dest[i] = '\0';
    /* return the pointer to the destination array */
    return dest;
}

int main(void) {
    char s1[98] = "Hello ";
    char s2[] = "World!";
    char *ptr;
    printf("%s\n", s1);
    printf("%s\n", s2);
    ptr = my_strcat(s1, s2);
    printf("%s\n", s1);
    printf("%s\n", s2);
    printf("%s\n", ptr);
    return 0;
}

Note that the source string is not modified, so src should be declared as const char *; the offsets should have type size_t; and they can be incremented as a side effect of the assignment:

char *my_strcat(char *dest, const char *src) {
    size_t i = 0;
    size_t k = 0;

    /* find the offset of the null terminator in dest */
    while (dest[i] != '\0') {
        i++;
    }
    /* copy the bytes from the src string there */
    while (src[k] != '\0') {
        dest[i++] = src[k++];
    }
    /* set the null terminator */
    dest[i] = '\0';
    /* return the pointer to the destination array */
    return dest;
}

You can also use pointers instead of offsets:

char *my_strcat(char *dest, const char *src) {
    /* use a working pointer to preserve dest for the return value */
    char *p = dest;

    /* find the null terminator in dest */
    while (*p != '\0') {
        p++;
    }
    /* copy the bytes from the src string there */
    while (*src != '\0') {
        *p++ = *src++;
    }
    /* set the null terminator */
    *p = '\0';
    /* return the pointer to the destination array */
    return dest;
}

One final change: you can combine reading the source byte, copying to the destination and testing for the null terminator, which will have been copied already:

char *my_strcat(char *dest, const char *src) {
    /* use a working pointer to preserve dest for the return value */
    char *p = dest;

    /* find the null terminator in dest */
    while (*p != '\0') {
        p++;
    }
    /* copy the bytes from the src string there, including its null terminator */
    while ((*p++ = *src++) != '\0') {
        /* nothing */
    }
    /* the null terminator was copied from the source string */
    /* return the pointer to the destination array */
    return dest;
}

Reference is not created while using + operator to concat two strings

It is all in the documentation.

For String.concat, the javadoc states this:

If the length of the argument string is 0, then this String object is returned.

For the + operator, JLS 15.18.1 states:

The result of string concatenation is a reference to a String object that is the concatenation of the two operand strings. The characters of the left-hand operand precede the characters of the right-hand operand in the newly created string.

The String object is newly created (§12.5) unless the expression is a constant expression (§15.29).

As you can see, the results will be different when the second string has length zero and the expression is not a constant expression.

That is what happens in your example.


You also said:

But while using + operator a new reference will be created in the string pool constant.

This is not directly relevant to your question, but ... actually, no it won't be created there. It will create a reference to a regular (not interned) String object in the heap. (It would only be in the class file's constant pool ... and hence the string pool ... if it was a constant expression; see JLS 15.29)

Note that the string pool and the classfile constant pool are different things.


Can I add a couple of things:

  • You probably shouldn't be using String.concat. The + operator is more concise, and the JIT compiler should know how to optimize away the creation of unnecessary intermediate strings ... in the few cases where you might consider using concat for performance reasons.

  • It is a bad idea to exploit the fact that no new object is created so that you can use == rather than equals(Object). Your code will be fragile. Just always use equals for comparing String and the primitive wrapper types. It is simpler and safer.

In short, the fact that you are even asking this question suggests that you are going down a blind alley. Knowledge of this edge-case difference between concat and + is ... pointless ... unless you are planning to enter a quiz show for Java geeks.

String Concatenation to Prevent Movement of Two Strings In Memory

For just two strings, big-O is irrelevant, as there is nothing you can do to improve it. a + b is fine; you can't amortize the growth, so you can just concatenate them, performing one allocation for the new string, and two copies into that single allocation, one from each source.

For a larger number of strings, the standard Python approach is to make a tuple (for many strings known at once) or list (for a set of strings built up piecemeal) and call ''.join(seq) on it. str.join computes the combined lengths of all the inputs, preallocates the buffer for the result, and copies each component into that buffer, one after another, keeping the runtime O(n).
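To make that two-pass strategy concrete, here is a rough Python sketch of it for bytes objects, using a preallocated bytearray. It only illustrates the idea: join_preallocated is a hypothetical name, and CPython's real str.join is written in C and handles details this sketch ignores.

```python
def join_preallocated(parts, sep=b""):
    parts = list(parts)  # two passes over the input are needed
    if not parts:
        return b""
    # Pass 1: compute the exact size of the result.
    total = sum(len(p) for p in parts) + len(sep) * (len(parts) - 1)
    buf = bytearray(total)  # one allocation, zero-filled
    # Pass 2: copy each separator and part into its slot.
    offset = 0
    for i, part in enumerate(parts):
        if i:
            buf[offset:offset + len(sep)] = sep
            offset += len(sep)
        buf[offset:offset + len(part)] = part
        offset += len(part)
    # Converting to immutable bytes makes one extra copy in this sketch.
    return bytes(buf)


print(join_preallocated([b"ab", b"c", b"def"], b"-"))  # b'ab-c-def'
```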

You could use io.StringIO to achieve a similar effect, but it's not necessary, and under the hood, it has to do similar work to building up a list and joining it.
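For comparison, a minimal sketch of the io.StringIO route (build_with_stringio is a hypothetical name). It gives the same result as collecting the parts in a list and calling ''.join on it.

```python
import io


def build_with_stringio(parts):
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)    # append to the in-memory text buffer
    return buffer.getvalue()  # produce the final str


print(build_with_stringio(["a", "b", "c"]))  # abc
```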

Why is it not possible to concatenate two Strings in Rust without taking a reference to one of them?

The answer is in two parts.


The first part is that + involves using an Add trait implementation. It is implemented only for:

impl<'a> Add<&'a str> for String

Therefore, string concatenation only works when:

  • the Left Hand Side is a String
  • the Right Hand Side is coercible to a &str

Note: unlike many other languages, addition will consume the Left Hand Side argument.


The second part, therefore, is what kind of arguments can be used when a &str is expected?

Obviously, a &str can be used as is:

let hello = "Hello ".to_string();
let hello_world = hello + "world!";

Otherwise, a reference to a type implementing Deref<Target = str> will work, and it turns out that String does, so &String works:

let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;

And what of other implementations? They all have issues.

  • impl<'a> Add<String> for &'a str requires prepending, which is not as efficient as appending
  • impl Add<String> for String needlessly consumes two arguments when one is sufficient
  • impl<'a, 'b> Add<&'a str> for &'b str hides an unconditional memory allocation

In the end, the asymmetric choice is explained by Rust's philosophy of being as explicit as possible.

Or to be more explicit, we can explain the choice by checking the algorithmic complexity of the operation. Assuming that the left-hand side has size M and the right-hand side has size N, then:

  • impl<'a> Add<&'a str> for String is O(N) (amortized)
  • impl<'a> Add<String> for &'a str is O(M+N)
  • impl<'a, 'b> Add<&'a str> for &'b str is O(M+N)
  • impl Add<String> for String is O(N) (amortized)... but requires allocating/cloning the right-hand side for nothing.

Why concatenating two Strings and passing it as argument does not create a new String Object?

Because of the statement with the operative word "prior":

prior to the println statement

Eight String objects were created before (prior to) the println, as you described. Another two are created by the println: " " and "spring winter spring summer".

String s1 = "spring ";                // "spring" created, reference s1 changed
String s2 = s1 + "summer ";           // "summer", "spring summer" created, "summer" not saved, reference s2 changed
s1.concat("fall ");                   // "fall", "spring fall" created but not saved
s2.concat(s1);                        // "spring summer spring" created but not saved
s1 += "winter ";                      // "winter", "spring winter" created, reference s1 changed
System.out.println(s1 + " " + s2);    // " ", "spring winter spring summer" created, " " not saved

NOTE: "created" doesn't mean created at this point in the code, just that this piece of code will ask that it be created.

Is python += string concatenation bad practice?

Is it bad practice?

It's reasonable to assume that it isn't bad practice for this example because:

  • The author doesn't give any reason. Maybe it's just disliked by him/her.
  • Python documentation doesn't mention it's bad practice (from what I've seen).
  • foo += 'ooo' is just as readable (in my opinion) and is approximately 100 times faster than foo = ''.join([foo, 'ooo']).

When should one be used over the other?

Concatenation of strings has the disadvantage of needing to create a new string and allocate new memory for every concatenation. This is time-consuming, but isn't that big of a deal with few, small strings. When you know the number of strings to concatenate and don't need more than maybe 2-4 concatenations, I'd go for it.


When joining strings, Python only has to allocate new memory for the final string, which is much more efficient, but could take longer to compute. Also, because strings are immutable, it's often more practical to keep a list of strings that you mutate dynamically, and only convert it to a string when needed.
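A short sketch of that pattern, with made-up content: the pieces live in a mutable list that can be edited freely, and the conversion to a single string happens once at the end.

```python
# Collect pieces in a mutable list...
lines = []
for n in range(1, 4):
    lines.append(f"line {n}")
lines.insert(0, "header")  # lists can be edited in place before joining

# ...and build the immutable string only when it is needed.
text = "\n".join(lines)
print(text)
```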

It's often convenient to create strings with str.join() since it takes an iterable. For example:

letters = ", ".join("abcdefghij")
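Since str.join accepts any iterable of strings, a string itself (iterated character by character) or a generator expression works too. The second line below is an added illustration, not from the original answer.

```python
letters = ", ".join("abcdefghij")
print(letters)  # a, b, c, d, e, f, g, h, i, j

# A generator expression works as well, as long as it yields str objects.
squares = " ".join(str(n * n) for n in range(5))
print(squares)  # 0 1 4 9 16
```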

To conclude

In most cases it makes more sense to use str.join() but there are times when concatenation is just as viable. Using any form of string concatenation for huge or many strings would be bad practice just as using str.join() would be bad practice for short and few strings, in my own opinion.

I believe that the author was just trying to create a rule of thumb to make it easier to identify when to use what, without going into too much detail or making it complicated.


