Why Are Strings Immutable in Many Programming Languages

Why are strings immutable in many programming languages?

Immutable types are a good thing generally:

They work better for concurrency (you don't need to lock something that can't change!)
They reduce errors: mutable objects are vulnerable to being changed when you don't expect it which can introduce all kinds of strange bugs ("action at a distance")
They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around....)

As a result, it's a pretty reasonable language design choice to make strings immutable.

Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.

There are a couple of minor downsides for immutable types:

Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.

Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.

Pros and cons of immutable strings

One reason why immutable strings are good is that it makes Unicode support easier. Modern Unicode can no longer fit efficiently into a fixed-size data cell, which kills the one-to-one correspondence between string index and memory address which gives mutable strings their advantage.

In the past, most Western applications used single-byte characters (various ASCII-based codings, or EBCDIC...), so you could usually handle them efficiently by treating strings as byte buffers (as in traditional C applications).

When Unicode was fairly new, there wasn't much requirement for anything outside the first 16 bits, so Java used double-byte characters for its Strings (and StringBuffers). This used twice the memory, and ignored any problems that might occur from Unicode extensions beyond 16 bits, but it was convenient at the time.

Now Unicode is not so new, and while the most-used characters still fit in 16 bits, you can't really get away with pretending the Basic Multilingual Plane is all that exists. If you want to honestly claim Unicode support, you need either variable-length characters or even larger (32-bit?) character cells.

With variable-length characters, you can no longer index into an arbitrary-length string in O(1) time -- barring additional information, you need to count from the beginning to figure out what the N'th character is. This also kills the main advantage of mutable string buffers: the ability to seamlessly modify substrings in place.

Fortunately, most string manipulation doesn't actually need this modify-in-place ability. Lexing, parsing, and search all proceed on a sequential, iterative basis, from beginning to end. General search-and-replace was never in-place to begin with, since the replacement string doesn't have to be the same length as the original.

Concatenating large numbers of substrings doesn't actually need modify-in-place to be efficient, either. You do need to be more careful about it, though, since (as others have pointed out) a naive concatenation loop can easily be O(N^2) by allocating a new string for each of N partial substrings...

One way to avoid naive concatenation is to provide a mutable StringBuffer or ConcatBuffer object designed to do concatenation efficiently. Another way would be to include an immutable string constructor that takes an iterator into a sequence of strings to be (efficiently) concatenated.

But, more generally, it is possible to write an immutable string library that efficiently concatenates by reference. This kind of string is often called a "rope" or "cord" to suggest that it is at least a bit more heavyweight than the basic strings it's composed of, but for concatenation purposes it is much more efficient, since it doesn't need to recopy the data at all!

The above Wikipedia link says that "rope" datastructures are O(log N) to concatenate, but the seminal paper "Purely Functional Data Structures" by Okasaki shows how to do concatenation in O(1) time.

Why are Python strings immutable? Best practices for using them

When you receive a string, you'll be sure that it stays the same. Suppose that you'd construct a Foo as below with a string argument, and would then modify the string; then the Foo's name would suddenly change:

class Foo(object):
    def __init__(self, name):
        self.name = name

name = "Hello"
foo = Foo(name)
name[0] = "J"

With mutable strings, you'd have to make copies all the time to prevent bad things from happening.

It also allows the convenience that a single character is no different from a string of length one, so all string operators apply to characters as well.

And lastly, if strings weren't immutable, you couldn't reliably use them as keys in a dict, since their hash value might suddenly change.

As for programming with immutable strings, just get used to treating them the same way you treat numbers: as values, not as objects. Changing the first letter of name would be

name = "J" + name[1:]

Why can't strings be mutable in Java and .NET?

According to Effective Java, chapter 4, page 73, 2nd edition:

"There are many good reasons for this: Immutable classes are easier to
design, implement, and use than mutable classes. They are less prone
to error and are more secure.

[...]

"Immutable objects are simple. An immutable object can be in
exactly one state, the state in which it was created. If you make sure
that all constructors establish class invariants, then it is
guaranteed that these invariants will remain true for all time, with
no effort on your part.

[...]

Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads
accessing them concurrently. This is far and away the easiest approach
to achieving thread safety. In fact, no thread can ever observe any
effect of another thread on an immutable object. Therefore,
immutable objects can be shared freely

[...]

Other small points from the same chapter:

Not only can you share immutable objects, but you can share their internals.

[...]

Immutable objects make great building blocks for other objects, whether mutable or immutable.

[...]

The only real disadvantage of immutable classes is that they require a separate object for each distinct value.

String is immutable. What exactly is the meaning?

Before proceeding further with the fuss of immutability, let's just take a look into the String class and its functionality a little before coming to any conclusion.

This is how String works:

String str = "knowledge";

This, as usual, creates a string containing "knowledge" and assigns it a reference str. Simple enough? Lets perform some more functions:

 String s = str;     // assigns a new reference to the same string "knowledge"

Lets see how the below statement works:

  str = str.concat(" base");

This appends a string " base" to str. But wait, how is this possible, since String objects are immutable? Well to your surprise, it is.

When the above statement is executed, the VM takes the value of String str, i.e. "knowledge" and appends " base", giving us the value "knowledge base". Now, since Strings are immutable, the VM can't assign this value to str, so it creates a new String object, gives it a value "knowledge base", and gives it a reference str.

An important point to note here is that, while the String object is immutable, its reference variable is not. So that's why, in the above example, the reference was made to refer to a newly formed String object.

At this point in the example above, we have two String objects: the first one we created with value "knowledge", pointed to by s, and the second one "knowledge base", pointed to by str. But, technically, we have three String objects, the third one being the literal "base" in the concat statement.

Important Facts about String and Memory usage

What if we didn't have another reference s to "knowledge"? We would have lost that String. However, it still would have existed, but would be considered lost due to having no references.
Look at one more example below

String s1 = "java";
s1.concat(" rules");
System.out.println("s1 refers to "+s1);  // Yes, s1 still refers to "java"

What's happening:

The first line is pretty straightforward: create a new String "java" and refer s1 to it.
Next, the VM creates another new String "java rules", but nothing
refers to it. So, the second String is instantly lost. We can't reach
it.

The reference variable s1 still refers to the original String "java".

Almost every method, applied to a String object in order to modify it, creates new String object. So, where do these String objects go? Well, these exist in memory, and one of the key goals of any programming language is to make efficient use of memory.

As applications grow, it's very common for String literals to occupy large area of memory, which can even cause redundancy. So, in order to make Java more efficient, the JVM sets aside a special area of memory called the "String constant pool".

When the compiler sees a String literal, it looks for the String in the pool. If a match is found, the reference to the new literal is directed to the existing String and no new String object is created. The existing String simply has one more reference. Here comes the point of making String objects immutable:

In the String constant pool, a String object is likely to have one or many references. If several references point to same String without even knowing it, it would be bad if one of the references modified that String value. That's why String objects are immutable.

Well, now you could say, what if someone overrides the functionality of String class? That's the reason that the String class is marked final so that nobody can override the behavior of its methods.

Why Are Strings Immutable in Many Programming Languages