Why Can't Strings Be Mutable in Java and .NET

According to Effective Java, chapter 4, page 73, 2nd edition:

"There are many good reasons for this: Immutable classes are easier to
design, implement, and use than mutable classes. They are less prone
to error and are more secure."

[...]

"Immutable objects are simple. An immutable object can be in
exactly one state, the state in which it was created. If you make sure
that all constructors establish class invariants, then it is
guaranteed that these invariants will remain true for all time, with
no effort on your part."

[...]

"Immutable objects are inherently thread-safe; they require no
synchronization. They cannot be corrupted by multiple threads
accessing them concurrently. This is far and away the easiest approach
to achieving thread safety. In fact, no thread can ever observe any
effect of another thread on an immutable object. Therefore,
immutable objects can be shared freely."

[...]

Other small points from the same chapter:

Not only can you share immutable objects, but you can share their internals.

[...]

Immutable objects make great building blocks for other objects, whether mutable or immutable.

[...]

The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
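
As a minimal sketch of the points quoted above, consider a small immutable class. IntRange and its methods are hypothetical names invented for this illustration and are not taken from Effective Java. The constructor is the only place where the invariant is checked, and two threads read the same instance without any synchronization.

```java
// Hypothetical example class, for illustration only.
final class IntRange {
    private final int low;
    private final int high;

    IntRange(int low, int high) {
        // The invariant low <= high is established here and can never be broken later.
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        this.low = low;
        this.high = high;
    }

    int low() { return low; }

    int high() { return high; }

    // "Changing" a value returns a new object; the receiver is left untouched.
    IntRange withHigh(int newHigh) {
        return new IntRange(low, newHigh);
    }
}

class IntRangeDemo {
    public static void main(String[] args) throws InterruptedException {
        IntRange shared = new IntRange(1, 10);

        // Both threads read the same instance; no locking is needed.
        Runnable reader = () -> System.out.println(shared.low() + ".." + shared.high());
        Thread a = new Thread(reader);
        Thread b = new Thread(reader);
        a.start();
        b.start();
        a.join();
        b.join();

        IntRange wider = shared.withHigh(20);
        // The original still reports 10; only the new object reports 20.
        System.out.println(shared.high() + " " + wider.high());
    }
}
```

The price is the one named in the last quote: withHigh allocates a separate object for each distinct value.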

Why are strings immutable in many programming languages?

Immutable types are a good thing generally:

  • They work better for concurrency (you don't need to lock something that can't change!)
  • They reduce errors: mutable objects are vulnerable to being changed when you don't expect it, which can introduce all kinds of strange bugs ("action at a distance")
  • They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
  • Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around).

As a result, it's a pretty reasonable language design choice to make strings immutable.
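
The difference between sharing and defensive copying can be sketched as follows. Both classes are hypothetical and exist only for this illustration: one stores its text in a mutable char[] and therefore has to copy on the way in and out, the other stores an immutable String and simply shares the reference.

```java
import java.util.Arrays;

// Hypothetical class: backed by a mutable array, so every hand-off needs an O(n) copy.
final class ArrayBackedName {
    private final char[] chars;

    ArrayBackedName(char[] chars) {
        this.chars = Arrays.copyOf(chars, chars.length); // defensive copy in
    }

    char[] chars() {
        return Arrays.copyOf(chars, chars.length);       // defensive copy out
    }
}

// Hypothetical class: backed by an immutable String, so the reference is shared in O(1).
final class StringBackedName {
    private final String value;

    StringBackedName(String value) {
        this.value = value; // no copy needed
    }

    String value() {
        return value;       // no copy needed
    }
}

class SharingDemo {
    public static void main(String[] args) {
        char[] raw = {'a', 'b'};
        ArrayBackedName copied = new ArrayBackedName(raw);
        raw[0] = 'z'; // the caller mutates its own array afterwards
        System.out.println(copied.chars()[0]); // still 'a', thanks to the copy

        String text = "ab";
        StringBackedName sharedName = new StringBackedName(text);
        System.out.println(sharedName.value() == text); // same object, safely shared
    }
}
```

Without the two Arrays.copyOf calls, the first class would be exposed to exactly the "action at a distance" bugs listed above.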

Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.

There are a couple of minor downsides for immutable types:

  • Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
  • A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.

Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.
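
The concatenation cost mentioned above can be illustrated with a minimal sketch. The class and method names are made up for this example; it only contrasts the two approaches and makes no claim about actual timings.

```java
class ConcatDemo {
    // Each += builds a new String, copying everything accumulated so far.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer and produces a String once at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"im", "mut", "able"};
        System.out.println(joinWithConcat(parts));  // immutable
        System.out.println(joinWithBuilder(parts)); // immutable
    }
}
```

Both methods return the same text; they differ only in how many intermediate objects are created along the way.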

Why can mutable strings cause security issues?

In general, it's easier to write and review sensitive code when values don't change, because
there are fewer interleavings of operations that might affect the result.

Imagine code like

  void doSomethingImportant(String name) {
      if (!isAlphaNumeric(name)) { throw new IllegalArgumentException(); }
      Object o = lookupThingy(name);
      // No chance of SQL-Injection because name is alpha-numeric.
      connection.executeStatement("INSERT INTO MyTable (column) VALUES ('" + name + "')");
  }

The code does some checks to prevent an escalation of authority, but this only holds if isAlphaNumeric(name) is still true at the moment executeStatement is called.

If the first two statements were re-ordered then this would not be a problem, so the insecurity arises, in part, from a bad interleaving. But other code might call this function and assume that name is not changed by it, and so might have to perform and re-perform validity checks.

If String is not immutable, then it might have been changed by lookupThingy. To be sure the security check works, there is a much larger amount of code that has to perform correctly for this code to be secure against SQL injection.

Not only is the amount of code that has to perform correctly larger, but a maintainer who makes local changes to one function might affect the security of other functions far away. Non-local effects make code maintenance hard. Maintaining security properties is always dicey since security vulnerabilities are rarely obvious, so mutability can lead to degradation of security over time.
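
To make the scenario concrete, here is a minimal sketch of what could go wrong if the string type were mutable. StringBuilder stands in for a hypothetical mutable String, and the body of lookupThingy is invented for this illustration; the original snippet does not say what that method does.

```java
class MutableNameDemo {
    static boolean isAlphaNumeric(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false;
            }
        }
        return s.length() > 0;
    }

    // Invented stand-in: a callee that quietly mutates its argument.
    static void lookupThingy(StringBuilder name) {
        name.append("'); DROP TABLE MyTable; --");
    }

    // Builds the statement text as in the snippet above, but with a mutable "string".
    static String buildStatement(StringBuilder name) {
        if (!isAlphaNumeric(name)) {
            throw new IllegalArgumentException();
        }
        lookupThingy(name); // after this call the check above no longer describes name
        return "INSERT INTO MyTable (column) VALUES ('" + name + "')";
    }

    public static void main(String[] args) {
        // The input passes validation, yet the resulting text contains injected SQL.
        System.out.println(buildStatement(new StringBuilder("alice")));
    }
}
```

With an immutable String parameter, lookupThingy has no way to change what the caller validated, so the check and the use are guaranteed to see the same value.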


Why is this a reason to design strings as immutable?

This is separate from why it is bad security-wise.

It is widely believed that programs written in languages with readily-available immutable string types do fewer unnecessary buffer copies than ones that do not. Unnecessary buffer copies eat up memory, cause GC churn, and can cause simple operations on large inputs to perform much worse than on small inputs.

It is also widely believed that it is easier to write correct programs when using immutable strings, because you are unlikely to fail to defensively copy a buffer.

Why is the .NET String immutable?

  1. Instances of immutable types are inherently thread-safe: since no thread can modify them, the risk of a thread modifying them in a way that interferes with another is removed (the reference itself is a different matter).
  2. Similarly, the fact that aliasing can't produce changes (if x and y both refer to the same object a change to x entails a change to y) allows for considerable compiler optimisations.
  3. Memory-saving optimisations are also possible. Interning and atomising being the most obvious examples, though we can do other versions of the same principle. I once produced a memory saving of about half a GB by comparing immutable objects and replacing references to duplicates so that they all pointed to the same instance (time-consuming, but a minute's extra start-up to save a massive amount of memory was a performance win in the case in question). With mutable objects that can't be done.
  4. No side-effects can come from passing an immutable type as a parameter to a method unless it is out or ref (since that changes the reference, not the object). A programmer therefore knows that if string x = "abc" at the start of a method, and that doesn't change in the body of the method, then x == "abc" at the end of the method.
  5. Conceptually, the semantics are more like those of value types; in particular, equality is based on state rather than identity. This means that "abc" == "ab" + "c". While this doesn't require immutability, the fact that a reference to such a string will always equal "abc" throughout its lifetime (which does require immutability) makes it much easier to use strings correctly as keys, where maintaining equality with previous values is vital (and strings are indeed commonly used as keys).
  6. Conceptually, it can make more sense to be immutable. If we add a month onto Christmas, we haven't changed Christmas, we have produced a new date in late January. It makes sense therefore that Christmas.AddMonths(1) produces a new DateTime rather than changing a mutable one. (Another example: if I, as a mutable object, change my name, what has changed is which name I am using; "Jon" remains immutable and other Jons will be unaffected.)
  7. Copying is fast and simple: to create a clone, just return this. Since the copy can't be changed anyway, pretending something is its own copy is safe.
  8. Internal state can be safely shared between objects. For example, if you were implementing a list backed by an array, a start index and a count, then the most expensive part of creating a sub-range would be copying the objects. However, if the list were immutable, the sub-range object could reference the same array, with only the start index and count having to change, which reduces construction time considerably.
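
The last point in the list, sharing internal state between a list and its sub-range, can be sketched as below. The sketch is in Java rather than C#, and ImmutableIntList is a hypothetical class written for this illustration, not an existing library type.

```java
// Hypothetical immutable list backed by an array, a start index and a count.
final class ImmutableIntList {
    private final int[] items; // never modified after construction
    private final int start;
    private final int count;

    ImmutableIntList(int[] source) {
        // Copy once, so that callers cannot mutate our backing array later.
        this(source.clone(), 0, source.length);
    }

    private ImmutableIntList(int[] items, int start, int count) {
        this.items = items;
        this.start = start;
        this.count = count;
    }

    int size() {
        return count;
    }

    int get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException();
        }
        return items[start + index];
    }

    // O(1): the sub-range shares the backing array; only start and count differ.
    ImmutableIntList subList(int from, int to) {
        if (from < 0 || to > count || from > to) {
            throw new IndexOutOfBoundsException();
        }
        return new ImmutableIntList(items, start + from, to - from);
    }
}

class SubRangeDemo {
    public static void main(String[] args) {
        ImmutableIntList all = new ImmutableIntList(new int[] {10, 20, 30, 40, 50});
        ImmutableIntList middle = all.subList(1, 4);
        System.out.println(middle.size() + " " + middle.get(0)); // 3 20
    }
}
```

Sharing is only safe here because nothing can write to items after construction; a mutable list would have to copy the range instead.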

In all, for objects which don't have undergoing change as part of their purpose, there can be many advantages in being immutable. The main disadvantage is in requiring extra constructions, though even here it's often overstated (remember, you have to do several appends before StringBuilder becomes more efficient than the equivalent series of concatenations, with their inherent construction).

It would be a disadvantage if mutability were part of the purpose of an object (who'd want to be modeled by an Employee object whose salary could never ever change?), though sometimes even then it can be useful. In many web and other stateless applications, code doing read operations is separate from code doing updates, and using different objects may be natural. I wouldn't make an object immutable and then force that pattern, but if I already had that pattern I might make my "read" objects immutable for the performance and correctness-guarantee gain.

Copy-on-write is a middle ground. Here the "real" class holds a reference to a "state" class. State classes are shared on copy operations, but if you change the state, a new copy of the state class is created. This is more often used with C++ than C#, which is why std::string, in implementations that use copy-on-write (mostly older, pre-C++11 ones), enjoys some, but not all, of the advantages of immutable types while remaining mutable.
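
A minimal copy-on-write sketch in Java might look like this. CowText is a hypothetical class invented for this illustration, it uses a simple "shared" flag rather than reference counting, and it is not thread-safe as written.

```java
// Hypothetical copy-on-write text handle, for illustration only.
final class CowText {
    private char[] data;    // the "state"; may be shared with other handles
    private boolean shared; // true if another handle may point at the same array

    CowText(String initial) {
        this.data = initial.toCharArray();
    }

    private CowText(char[] data) {
        this.data = data;
        this.shared = true;
    }

    // O(1): both handles now point at the same array.
    CowText copy() {
        this.shared = true;
        return new CowText(data);
    }

    void setCharAt(int index, char c) {
        if (shared) {
            // First write after a copy: detach from the shared state.
            data = data.clone();
            shared = false;
        }
        data[index] = c;
    }

    boolean sharesStateWith(CowText other) {
        return data == other.data;
    }

    @Override
    public String toString() {
        return new String(data);
    }
}

class CowDemo {
    public static void main(String[] args) {
        CowText a = new CowText("hello");
        CowText b = a.copy();
        System.out.println(a.sharesStateWith(b)); // true: nothing copied yet
        b.setCharAt(0, 'j');
        System.out.println(a + " " + b);          // hello jello
        System.out.println(a.sharesStateWith(b)); // false: b detached on write
    }
}
```

The flag is deliberately conservative: a handle that was once shared copies on its next write even if the other handle has already detached.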

Why is only string immutable and not other data types?

Many other languages provide a similar design for strings: Java with StringBuffer and StringBuilder, Scala with StringBuilder, and Python (in Python 2) with MutableString, though there are other, better solutions in Python. In C++ strings are mutable, so there is no need for a builder.

The reasons why builders exist for strings are:

  1. Many languages define string as immutable (any change requires a new object in memory)
  2. Strings tend to be large, much larger than ints
  3. [1] and [2] combined cause inefficiency

The reasons why a builder doesn't exist for int are:

  1. It is a simple data structure by itself
  2. Most CPUs have optimised instructions to deal with simple numbers (add, take away, etc)
  3. Most CPUs would efficiently process [2] instructions in just one or a few cycles, using registers or fast CPU cache
  4. [2] and [3] combined remove the need for optimisation
  5. There is little need to mutate an int per se; however, if you need to, you can use BitConverter or binary shift operations

If Java Strings are immutable and StringBuilder is mutable, why do they waste the same amount of memory in my code?

It is hard to figure out what you are actually asking here, but the application is behaving exactly as I would expect.

Strings are immutable and the garbage collector doesn't take them out, is that right?

Both mutable and immutable objects may be garbage collected in Java.

The actual criterion that determines whether an object is actually garbage collected is its reachability. In simple terms, when the garbage collector figures out that the application can no longer use an object, the object will be deleted.

In both of your applications, objects of roughly the same size are being created once every 10 milliseconds. In each iteration, a new object is being created and its reference is being assigned to s, replacing the previous reference. This makes the previous object unreachable, and eligible for garbage collection. At some point, the Java VM decides to run the garbage collector. This gets rid of all of the unreachable objects, and the application continues.
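
Reachability can be observed, within limits, using a WeakReference. The sketch below is an assumption-laden illustration and not the asker's program: the class and method names are invented, and System.gc() is only a hint, so the second method may report either outcome depending on the JVM.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // While a strong reference exists, the object cannot be collected.
    static boolean heldWhileReferenced() {
        StringBuilder s = new StringBuilder("first");
        WeakReference<StringBuilder> watcher = new WeakReference<>(s);
        System.gc(); // only a hint to the JVM
        boolean stillThere = watcher.get() != null;
        // Using s afterwards keeps it strongly reachable during the check above.
        return stillThere && s.length() == 5;
    }

    // After the variable is reassigned, the old object becomes eligible for collection.
    static String afterReassignment() {
        StringBuilder s = new StringBuilder("first");
        WeakReference<StringBuilder> watcher = new WeakReference<>(s);
        s = new StringBuilder("second"); // "first" is no longer strongly reachable
        System.gc(); // only a hint; collection is not guaranteed at this point
        return watcher.get() == null ? "collected" : "not collected yet";
    }

    public static void main(String[] args) {
        System.out.println(heldWhileReferenced());
        System.out.println(afterReassignment());
    }
}
```

The same reasoning applies whether s holds a String or a StringBuilder: mutability plays no part in whether the old object can be reclaimed.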

I read that common Strings are not collected ever by the garbage collector, is that false?

This is false on two counts:

  • Strings created by new String(...), String.substring(...) (see note 1) and so on are no different from any other Java object.

  • Strings that are interned (by calling String.intern()) are stored in the string pool, which is held in the PermGen heap (see note 2). However, even the PermGen heap is garbage collected, albeit on longer timescales than the heap in which objects are normally created.

(Once upon a time, the PermGen heap was not garbage collected, but that was changed a long time ago.)

As @MichaelBorgwardt correctly identified, you were confusing string objects (in general) with string objects that correspond to string literals. The latter are interned automatically, and end up in the string pool. However, they may still be subject to garbage collection. This can happen if the parent class is unloaded and nothing else references the literal.


1 - In Java 6 and earlier, there is a difference between strings created using new String and using String.substring. In the latter case, the original string and the substring would share the backing array that holds the string's characters. This changed in Java 7 (as of update 6): String.substring now creates a new backing array.

2 - From Java 7 onwards, the string pool is just a (hidden) data structure in the normal heap. From Java 8 onwards, the PermGen heap no longer exists.
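
The distinction between ordinary string objects and interned literals described in this answer can be sketched as follows. The class and method names are invented for this illustration.

```java
class InternDemo {
    // A string created with new is a distinct object from the literal.
    static boolean sameObjectAsLiteral() {
        String literal = "immutable";
        String fresh = new String("immutable");
        return literal == fresh;
    }

    // Interning the new string yields the pooled instance that the literal refers to.
    static boolean sameObjectAfterIntern() {
        String literal = "immutable";
        String fresh = new String("immutable");
        return literal == fresh.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAsLiteral());   // false
        System.out.println(sameObjectAfterIntern()); // true
    }
}
```

Only the literal and the result of intern() refer to the pooled instance; the object created by new String(...) is an ordinary heap object and is collected like any other once it becomes unreachable.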

Strings vs classes when both are reference types

One of the reasons strings were made immutable, even though they are reference types, was to make them look and behave like primitive types (e.g., int, double, float).

That's also the reason why strings are the only reference type that can be represented as a literal (e.g., "some string"). Lots of other languages take the same approach, like Java for example.

Why does `String.Trim()` not trim the object itself?

s.Trim() creates a new trimmed version of the original string and returns it instead of storing the new version in s. So, what you have to do is to store the trimmed instance in your variable:

  s = s.Trim();

This pattern is followed in all the string methods and extension methods.

The fact that string is immutable is not what drives the decision to use this pattern; immutability concerns how strings are kept in memory. These methods could have been designed to create the new modified string instance in memory and point the variable to the new instance.

It's also good to remember that if you need to make lots of modifications to a string, it's much better to use a StringBuilder, which behaves like a "mutable" string and is much more efficient for this kind of operation.
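
The answer above is about C#, but Java's String.trim() follows the same pattern, so a Java analogue can serve as a minimal sketch. The class and method names are invented for this illustration.

```java
class TrimDemo {
    // Calling trim() and discarding the result leaves the string as it was.
    static String ignoreResult(String s) {
        s.trim();
        return s;
    }

    // Assigning the result back is what actually changes what the variable refers to.
    static String keepResult(String s) {
        s = s.trim();
        return s;
    }

    // StringBuilder, by contrast, is modified in place.
    static String appendInPlace(StringBuilder sb) {
        sb.append("def");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + ignoreResult("  padded  ") + "]"); // [  padded  ]
        System.out.println("[" + keepResult("  padded  ") + "]");   // [padded]
        System.out.println(appendInPlace(new StringBuilder("abc"))); // abcdef
    }
}
```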


