Why Did Matz Choose to Make Strings Mutable by Default in Ruby

Why did Matz choose to make Strings mutable by default in Ruby?

This is in line with Ruby's design, as you note. Immutable strings are more efficient than mutable strings - there is less copying, since strings can be re-used - but they make work harder for the programmer. It is intuitive to think of strings as mutable - you build them up by concatenation. To square efficiency with that intuition, Java silently translates concatenation (via +) of two strings into the use of a StringBuilder (originally StringBuffer) object, and I'm sure there are other such hacks. Ruby chooses instead to make strings mutable by default, at the expense of performance.

Ruby also has a number of destructive methods such as String#upcase! that rely on strings being mutable.
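
A quick sketch of the bang/non-bang convention these methods follow:

s = "hello"
s.upcase   # => "HELLO" - returns a new, upcased copy; s is unchanged
s          # => "hello"
s.upcase!  # => "HELLO" - mutates s in place
s          # => "HELLO"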

Another possible reason is that Ruby is inspired by Perl, and Perl happens to use mutable strings.

Ruby has Symbols as well as frozen Strings, both of which are immutable. As an added bonus, symbols are guaranteed to be unique per string value.
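
A quick demonstration of both properties (the "foo".equal?("foo") result assumes the frozen_string_literal magic comment is not in effect, since that deduplicates literals):

:foo.equal?(:foo)     # => true - every :foo is the same object
"foo".equal?("foo")   # => false - each literal creates a new object
"foo".freeze.frozen?  # => true - a frozen String is immutable
:foo.to_s             # => "foo" - converting between the two is easy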

Are strings in Ruby mutable?

Yes, strings in Ruby, unlike in Python, are mutable.

s += "hello" is not appending "hello" to s - an entirely new string object gets created. To append to a string 'in place', use <<, like in:

s = "hello"
s << " world"
s # hello world
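
You can verify this with object_id, which identifies the underlying object:

s = "hello"
s.object_id        # some value identifying the object
s += " world"
s.object_id        # a different value: += built a brand-new string

t = "hello"
id = t.object_id
t << " world"
t.object_id == id  # => true: << mutated the same object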

Why are strings immutable in many programming languages?

Immutable types are a good thing generally:

  • They work better for concurrency (you don't need to lock something that can't change!)
  • They reduce errors: mutable objects are vulnerable to being changed when you don't expect it, which can introduce all kinds of strange bugs ("action at a distance")
  • They can be safely shared (i.e. multiple references to the same object), which can reduce memory consumption and improve cache utilisation.
  • Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you had to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around), as the sketch after this list illustrates.
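
A minimal Ruby sketch of the sharing point (the constant name is just for illustration):

# Once frozen, a string can be handed out freely: no caller can
# mutate it, so no defensive copies are ever needed.
GREETING = "hello world".freeze

a = GREETING        # O(1) "copy": just another reference
a.equal?(GREETING)  # => true - same object, no extra memory

b = GREETING.dup    # an explicit O(n) copy, only needed if you
b.equal?(GREETING)  # => false - want a mutable version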

As a result, it's a pretty reasonable language design choice to make strings immutable.

Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. There are some enlightening talks on this subject that are very much worth a look if you are interested in the benefits of immutability.

There are a couple of minor downsides for immutable types:

  • Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
  • A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.

Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.

Why are symbols not frozen strings?

This answer is drastically different from my original answer, but I ran into a couple of interesting threads on the Ruby mailing list (both good reads).

So, at one point in 2006, Matz implemented the Symbol class as Symbol < String. Then the Symbol class was stripped down to remove any mutability, so a Symbol was in fact an immutable String.

However, it was reverted. The reason given was:

Even though it is highly against DuckTyping, people tend to use case
on classes, and Symbol < String often cause serious problems.
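
To illustrate the problem (a hypothetical sketch of what would have happened under Symbol < String): case dispatches via Module#===, i.e. a kind_of? check, so a String branch listed first would also have captured every Symbol:

def describe(value)
  case value
  when String then "a string"  # under Symbol < String this would match symbols too,
  when Symbol then "a symbol"  # making this branch unreachable
  else "something else"
  end
end

describe("foo")  # => "a string"
describe(:foo)   # => "a symbol" - but only because Symbol is not a String today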

So the answer to your question is still: a Symbol is like a String, but it isn't.

The problem isn't that a Symbol shouldn't be a String, but that historically it wasn't.

Why is a string key for a hash frozen?

In short it's just Ruby trying to be nice.

When a key is entered in a Hash, a special number is calculated, using the hash method of the key. The Hash object uses this number to retrieve the key. For instance, if you ask what the value of h['a'] is, the Hash calls the hash method of string 'a' and checks whether it has a value stored under that number. The problem arises when someone (you) mutates the string object, so that the string 'a' is now something else, say 'aa'. The stored entry is still filed under the number for 'a', so looking up 'aa' computes a different number and finds nothing.

The most common types of keys for hashes are strings, symbols and integers. Symbols and integers are immutable, but strings are not. Ruby tries to protect you from the confusing behaviour described above by duplicating (dup) and freezing string keys. I guess it's not done for other types because there could be nasty performance side effects (think of large arrays).
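
You can see both behaviours directly (a small sketch; the rehash step is what Ruby spares you from with string keys):

key = "a"
h = { key => 1 }
h.keys.first.frozen?      # => true - Ruby stored a frozen copy
h.keys.first.equal?(key)  # => false - your original was dup'ed

key << "a"                # mutating your copy is now harmless
h["a"]                    # => 1

arr = [1]                 # other mutable keys get no such protection
g = { arr => :found }
arr << 2
g[[1, 2]]                 # => nil - the stored hash number is stale
g.rehash                  # recompute hash numbers for all keys
g[[1, 2]]                 # => :found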

Why isn't there a String#shift()?

Strings stopped acting as enumerable objects as of Ruby 1.9, because it was considered too confusing to decide what a string would be a list of:

  • A list of characters / codepoints?
  • A list of bytes?
  • A list of lines?
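
Instead, Ruby 1.9+ makes you pick a view explicitly, and a shift-like operation is easy to express on whichever view you meant:

s = "héllo\nworld"

s.each_char.first(2)  # => ["h", "é"]           characters / codepoints
s.bytes.first(3)      # => [104, 195, 169]      raw bytes ("é" is two bytes in UTF-8)
s.lines               # => ["héllo\n", "world"] lines

chars = s.chars       # a String#shift equivalent on the character view:
chars.shift           # => "h"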

Pros and cons of immutable strings

One reason why immutable strings are good is that they make Unicode support easier. Modern Unicode can no longer fit efficiently into a fixed-size data cell, which kills the one-to-one correspondence between string index and memory address that gives mutable strings their advantage.


In the past, most Western applications used single-byte characters (various ASCII-based encodings, or EBCDIC...), so you could usually handle them efficiently by treating strings as byte buffers (as in traditional C applications).

When Unicode was fairly new, there wasn't much requirement for anything outside the first 16 bits, so Java used double-byte characters for its Strings (and StringBuffers). This used twice the memory, and ignored any problems that might occur from Unicode extensions beyond 16 bits, but it was convenient at the time.

Now Unicode is not so new, and while the most-used characters still fit in 16 bits, you can't really get away with pretending the Basic Multilingual Plane is all that exists. If you want to honestly claim Unicode support, you need either variable-length characters or even larger (32-bit?) character cells.

With variable-length characters, you can no longer index into an arbitrary-length string in O(1) time -- barring additional information, you need to count from the beginning to figure out what the N'th character is. This also kills the main advantage of mutable string buffers: the ability to seamlessly modify substrings in place.
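
Ruby illustrates the trade-off nicely (assuming a UTF-8 source encoding):

s = "naïve"        # 5 characters, but "ï" takes two bytes in UTF-8
s.length           # => 5
s.bytesize         # => 6
s[2]               # => "ï" - character indexing; may have to scan from the start
s.byteslice(2, 2)  # => "ï" - byte indexing is O(1), but you must already
                   #    know the byte offsets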

Fortunately, most string manipulation doesn't actually need this modify-in-place ability. Lexing, parsing, and search all proceed on a sequential, iterative basis, from beginning to end. General search-and-replace was never in-place to begin with, since the replacement string doesn't have to be the same length as the original.


Concatenating large numbers of substrings doesn't actually need modify-in-place to be efficient, either. You do need to be more careful about it, though, since (as others have pointed out) a naive concatenation loop can easily be O(N^2) by allocating a new string for each of N partial substrings...

One way to avoid naive concatenation is to provide a mutable StringBuffer or ConcatBuffer object designed to do concatenation efficiently. Another way would be to include an immutable string constructor that takes an iterator into a sequence of strings to be (efficiently) concatenated.
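
In Ruby terms (mutable strings make the buffer approach trivial; parts is just sample data):

parts = Array.new(10_000) { "chunk" }

slow = ""
parts.each { |p| slow += p }  # O(N^2): every += recopies all data so far

buf = String.new              # a mutable buffer, StringBuilder-style
parts.each { |p| buf << p }   # O(N): append in place

joined = parts.join           # O(N): one constructor over the whole sequence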

But, more generally, it is possible to write an immutable string library that efficiently concatenates by reference. This kind of string is often called a "rope" or "cord" to suggest that it is at least a bit more heavyweight than the basic strings it's composed of, but for concatenation purposes it is much more efficient, since it doesn't need to recopy the data at all!
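
A toy rope in Ruby might look like this (an illustrative sketch only; real ropes keep their trees balanced):

class Rope
  attr_reader :length

  def initialize(left, right)
    @left, @right = left, right
    @length = left.length + right.length
  end

  def +(other)
    Rope.new(self, other)  # O(1): just a new node, no data copied
  end

  def to_s
    "#{@left}#{@right}"    # O(n) flattening, deferred until actually needed
  end
end

r = Rope.new("hello, ", "world") + "!"
r.length  # => 13
r.to_s    # => "hello, world!"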

The Wikipedia article on ropes says that "rope" data structures are O(log N) to concatenate, but the seminal work "Purely Functional Data Structures" by Okasaki shows how to do concatenation in O(1) time.


