How to Append One String to Another in Python

How do I append one string to another in Python?

If you only have one reference to a string and you concatenate another string to the end, CPython now special cases this and tries to extend the string in place.

The end result is that the operation is amortized O(n).

e.g.

s = ""
for i in range(n):
    s += str(i)

used to be O(n^2), but now it is O(n).
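The in-place trick is invisible from Python code: a string that is still reachable through a second name is never altered. A minimal sketch of that behaviour (the names `s` and `alias` are made up for illustration):

```python
s = "abc"
alias = s          # a second reference to the same string object

s += "def"         # rebinds s to the longer string

print(s)           # abcdef
print(alias)       # abc (unchanged; strings stay immutable)
```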

From the source (bytesobject.c):

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}

/* The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
Note that if there's not enough memory to resize the string, the original
string object at *pv is deallocated, *pv is set to NULL, an "out of
memory" exception is set, and -1 is returned. Else (on success) 0 is
returned, and the value in *pv may or may not be the same as on input.
As always, an extra byte is allocated for a trailing \0 byte (newsize
does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1; /* invalidate cached hash value */
    return 0;
}

It's easy enough to verify empirically.


$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
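The runs above use Python 2 (`xrange`). A rough Python 3 equivalent, as a sketch only: the helper names `build_by_append` and `time_append` are invented here, and the numbers it prints will vary by machine and interpreter.

```python
import timeit

def build_by_append(n):
    """Build an n-character string by appending one character at a time."""
    s = ""
    for _ in range(n):
        s += "a"
    return s

def time_append(n, repeats=3):
    """Best-of-`repeats` wall-clock time (seconds) for one build of size n."""
    return min(timeit.repeat(lambda: build_by_append(n), number=1, repeat=repeats))

if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>7}: {time_append(n) * 1e6:10.1f} usec")
```

If the time grows by roughly a factor of ten per row, the loop is behaving linearly on that interpreter.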

It's important to note, however, that this optimisation isn't part of the Python spec. As far as I know, it exists only in the CPython implementation. The same empirical testing on PyPy or Jython, for example, might show the older O(n**2) performance.


$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop

So far so good, but then,


$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop

Ouch, even worse than quadratic. So PyPy is doing something that works well with short strings but performs poorly for larger strings.

Which is the preferred way to concatenate a string in Python?

The best way of appending a string to a string variable is to use + or +=, because it's readable and fast. The two are equally fast, so which one you choose is a matter of taste; the latter is the more common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875

However, those who recommend building a list, appending to it, and then joining it do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true in some cases. Here, for example, is one million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, turns out that even when the resulting string is a million characters long, appending to the string was still faster.

Now let's try appending a thousand-character string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string therefore ends up being about 100MB long. That was pretty slow; appending to a list was much faster. But that timing doesn't include the final a.join(). So how long would that take?

a.join(a):
0.43739795684814453

Oops. Turns out even in this case, append/join is slower.
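To repeat this kind of comparison on your own interpreter, something like the following sketch could be used. The function names and the sizes are assumptions chosen for illustration (smaller than the 100MB case above), and the timings it prints are not meant to match the figures quoted here.

```python
import timeit

def concat_plus(piece, count):
    """Append `piece` to a string `count` times with +=."""
    result = ""
    for _ in range(count):
        result += piece
    return result

def concat_join(piece, count):
    """Collect the pieces in a list, then join them once at the end."""
    parts = []
    for _ in range(count):
        parts.append(piece)
    return "".join(parts)

if __name__ == "__main__":
    piece, count = "x" * 1000, 10_000   # assumed sizes, smaller than in the text
    assert concat_plus(piece, count) == concat_join(piece, count)
    for func in (concat_plus, concat_join):
        seconds = timeit.timeit(lambda: func(piece, count), number=10)
        print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```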

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
a.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren't; why would you have a string that's 100MB in memory?).

But the real clincher is Python 2.3, where I won't even show you the timings, because it's so slow that it hasn't finished yet. These tests suddenly take minutes, except for append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn't anymore (or at least Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. :-)

(Update: Turns out when I did the testing more carefully that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don't use a technique that's supposedly "faster" unless you first measure it.

Therefore the "best" version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it's actually clearer. Especially when whatever you should concatenate together should be separated by spaces or commas or newlines.
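A small illustration of the readability point, using made-up data: with += the separator needs special handling, while `str.join` places it between the items for you.

```python
words = ["alpha", "beta", "gamma"]   # illustrative data

# With += the separator needs special handling for the first item.
line = ""
for word in words:
    if line:
        line += ", "
    line += word

# str.join puts the separator between items.
joined = ", ".join(words)

print(line)     # alpha, beta, gamma
print(joined)   # alpha, beta, gamma
```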

Why doesn't .append() method work on strings, don't they behave like lists?

That may be how an algorithms and data structures class presents it, since such classes deal with abstract algorithmic languages rather than real programming languages. In Python, a string is a string and a list is a list; they're different types of object. You can "append" to a string using string concatenation (which is basically an addition operation on strings):

string_name = "hello"  
string_name = string_name + " world"
print(string_name) # => "hello world"

Or a shorthand concatenation:

string_name = "hello"
string_name += " world"
print(string_name) # => "hello world"

Lists and strings both belong to a category called iterables. Iterables are, as their name suggests, objects you can iterate through with a for ... in loop, but that doesn't mean they're the same type of object:

for i in '123': pass        # valid, using a string
for i in [1, 2, 3]: pass    # valid, using a list
for i in (1, 2, 3): pass    # valid, using a tuple
for i in 1, 2, 3: pass      # valid, using an implicit tuple
# all valid, all different types
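One way to see the difference for yourself is to check each object against `collections.abc.Iterable` and look for an `append` method. This is only a sketch; the `samples` and `report` names are invented for the example.

```python
from collections.abc import Iterable

samples = ["123", [1, 2, 3], (1, 2, 3)]

# For each sample: its type name, whether it is iterable, whether it has .append
report = [
    (type(obj).__name__, isinstance(obj, Iterable), hasattr(obj, "append"))
    for obj in samples
]

for row in report:
    print(*row)
# str True False
# list True True
# tuple True False
```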

I strongly recommend that you read the Python documentation and/or work through the Python Tutorial.

From Docs Glossary:

iterable

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.
More about iterables.
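The glossary entry can be tried out by hand. The sketch below calls `iter()` and `next()` directly on a short string, which is roughly what a for loop does for you (variable names are illustrative):

```python
letters = iter("ab")      # what a for loop does behind the scenes

first = next(letters)     # 'a'
second = next(letters)    # 'b'

try:
    next(letters)
    exhausted = False
except StopIteration:
    exhausted = True      # the iterator is good for one pass only

print(first, second, exhausted)   # a b True
```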

How to append a string from one row to another, based on another duplicate value

Is this what you wanted?

customers = df["Customer"].unique().tolist()
contacts = []

for customer in customers:
    contacts.append(df.loc[df["Customer"] == customer, "Contact"].tolist())

df = df.drop_duplicates("Customer", keep="first")
df["new"] = contacts

output

Out[10]: 
ID Customer ... Contact new
0 1234 Customer A ... NaN [nan, nan, nan, nan]
4 1233 Customer B ... NaN [nan, nan, abc@email.com]
7 1235 Customer C ... abc@email.com [abc@email.com, abc@email.com]

[3 rows x 6 columns]

How to add a string in a certain position?

No. Python strings are immutable.

>>> s='355879ACB6'
>>> s[4:4] = '-'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

It is, however, possible to create a new string that has the inserted character:

>>> s[:4] + '-' + s[4:]
'3558-79ACB6'
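If you need this in more than one place, the slicing can be wrapped in a small helper. `insert_at` is a hypothetical name, not a built-in:

```python
def insert_at(text, index, piece):
    """Return a new string with `piece` inserted before position `index`."""
    return text[:index] + piece + text[index:]

s = "355879ACB6"
print(insert_at(s, 4, "-"))   # 3558-79ACB6
print(s)                      # 355879ACB6 (the original is untouched)
```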

Appending the same string to a list of strings in Python

The simplest way to do this is with a list comprehension:

[s + mystring for s in mylist]

Notice that I avoided using builtin names like list because that shadows or hides the builtin names, which is very much not good.

Also, if you do not actually need a list, but just need an iterator, a generator expression can be more efficient (although it does not likely matter on short lists):

(s + mystring for s in mylist)

These are very powerful, flexible, and concise. Every good Python programmer should learn to wield them.
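A short run with sample values (the contents of `mylist` and `mystring` are made up) shows the difference between the two forms: the list is built immediately, while the generator produces items on demand and can be consumed only once.

```python
mylist = ["foo", "bar", "baz"]   # illustrative values
mystring = "_suffix"

as_list = [s + mystring for s in mylist]    # built immediately
lazy = (s + mystring for s in mylist)       # produces items on demand

print(as_list)      # ['foo_suffix', 'bar_suffix', 'baz_suffix']
print(next(lazy))   # foo_suffix
print(list(lazy))   # ['bar_suffix', 'baz_suffix'] (first item already consumed)
```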

In python, how to append a string without quotation marks next to a string with the single quotation marks?

Formatted string literals in Python are strings that are prefixed by f or F and whose contents have a special semantics. The f is not a character that is part of the string; you can think of it as a modifier for the string that follows. You cannot construct such a literal by simply concatenating the f character with some string.
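To make that concrete, here is a tiny sketch. The value of `pageNo` is arbitrary; concatenating the letter f only produces an ordinary string that happens to start with "f".

```python
pageNo = 3

glued = "f" + "{pageNo}"    # an ordinary string that starts with the letter f
literal = f"{pageNo}"       # a real f-string, evaluated where it is written

print(glued)     # f{pageNo}
print(literal)   # 3
```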

There are several ways to achieve your goal. The first (as other answers suggest) uses the simpler approach. If however you insist on separating the URL and dynamically generating and using the formatted string literal, see the second. The third uses a format string (identical to the url in the second).

  1. Use the formatted string literal directly:

    def getData(pageNo):
        new_link = f'https://www.amazon.com/Best-Sellers-Amazon-Launchpad/zgbs/boost/ref=zg_bs_pg_{pageNo}?_encoding=UTF8&pg={pageNo}'
        # rest of the code
  2. Use eval:

    url = 'https://www.amazon.com/Best-Sellers-Amazon-Launchpad/zgbs/boost/ref=zg_bs_pg_{pageNo}?_encoding=UTF8&pg={pageNo}'

    def getData(url, pageNo):
        new_link = eval("f'" + url + "'")
        # rest of the code
  3. Use .format:

    url = 'https://www.amazon.com/Best-Sellers-Amazon-Launchpad/zgbs/boost/ref=zg_bs_pg_{pageNo}?_encoding=UTF8&pg={pageNo}'

    def getData(url, pageNo):
        new_link = url.format(pageNo=pageNo)
        # rest of the code
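As a quick check that the `.format` approach yields the same string as a directly written f-string, here is a sketch with a shortened, made-up example.com URL and a hypothetical `build_link` helper. It suggests that eval is rarely needed for this.

```python
# Hypothetical, shortened URL template; the placeholder name matches the text.
url = "https://example.com/list?pg={pageNo}"

def build_link(template, pageNo):
    """Fill the {pageNo} placeholder without eval."""
    return template.format(pageNo=pageNo)

pageNo = 2
assert build_link(url, pageNo) == f"https://example.com/list?pg={pageNo}"
print(build_link(url, pageNo))   # https://example.com/list?pg=2
```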

