String Interning in .Net Framework - What Are the Benefits and When to Use Interning

Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.

The micro-optimisation benefits of interning strings manually are minimal, so it is generally not recommended.

This probably describes it:

using System;

class Program
{
    const string SomeString = "Some String"; // compile-time constant, so it is interned

    static void Main(string[] args)
    {
        var s1 = SomeString; // refers to the interned instance
        var s2 = SomeString; // refers to the same interned instance
        var s = "String";
        var s3 = "Some " + s; // concatenated at runtime, so not interned

        Console.WriteLine(s1 == s2); // True; same reference, so the comparison short-circuits
        Console.WriteLine(s1 == s3); // True; different references, so the characters are compared
    }
}

Is string interning really useful?

No, Java and .NET don't do it "automatically with all strings". They (well, Java and C#) do it with constant string expressions, as represented in bytecode/IL, and on demand via the String.intern (Java) and String.Intern (.NET) methods. The exact situation in .NET is interesting, but basically the C# compiler will guarantee that every reference to an equal string constant within an assembly ends up referring to the same string object. That can be done efficiently at type initialization time, and can save a bunch of memory.

It doesn't happen every time a new string is created.
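
A minimal sketch of that difference, assuming a runtime with default settings. The class and method names (`InterningDemo`, `LiteralsShareInstance`, `RuntimeStringIsDistinct`) are made up for illustration:

```csharp
using System;

static class InterningDemo
{
    // Two identical literals in the same assembly refer to one string object.
    public static bool LiteralsShareInstance()
    {
        string a = "interned literal";
        string b = "interned literal";
        return object.ReferenceEquals(a, b);
    }

    // A string built at runtime is a fresh object, even when its value
    // matches a literal that is already in the intern pool.
    public static bool RuntimeStringIsDistinct()
    {
        string literal = "interned literal";
        string built = new string(literal.ToCharArray());
        return literal == built && !object.ReferenceEquals(literal, built);
    }
}
```

Both methods should return true: the literals share one instance, while the string built from a character array has an equal value but a different reference.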

(On the string immutability front, I for one am extremely glad that strings are immutable. I don't want to have to take a copy every time I receive a parameter etc, thank you very much. I haven't seen it make string processing tasks harder, either...)

And as others have pointed out, looking up a string in a hash table isn't generally an O(n) operation, unless you're incredibly unlucky with hash collisions...

Personally I don't use string interning in user-land code; if I want some sort of cache of strings I'll create a HashSet<string> or something similar. That can be useful in various situations where you expect to come across the same strings several times (e.g. XML element names) but with a simple collection you don't pollute a system-wide cache.
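
One way such a private cache could look is sketched below. `StringPool` and `GetOrAdd` are hypothetical names, and a `Dictionary<string, string>` is used here instead of a `HashSet<string>` only because it makes it easy to get the stored instance back on every framework version:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a private pool that returns one canonical instance
// per distinct value without touching the runtime-wide intern table.
sealed class StringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count
    {
        get { return _pool.Count; }
    }

    public string GetOrAdd(string value)
    {
        if (value == null) return null;

        string canonical;
        if (_pool.TryGetValue(value, out canonical))
            return canonical; // seen before: hand back the pooled instance

        _pool.Add(value, value); // first occurrence becomes the canonical one
        return value;
    }
}
```

Because the pool is an ordinary object, its strings become collectible as soon as the pool itself is unreachable, which is not the case for entries added to the runtime's intern table.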

When is it a good idea to intern strings manually in .NET code?

I have done this in deserialization/materialization code when there is a good chance of repeated values (almost an enum, but not quite). When deserializing thousands of records this can give a significant memory benefit. However, in such cases you might prefer to use a separate intern cache, to avoid saturating the shared one (or maybe the shared one is fine; it depends on the scenario).

But the key point there is: a scenario where you are likely to have lots and lots of different string instances with the same value. Deserialization is a big candidate there. It should also be noted that there is some CPU overhead in checking the interned cache (progressively more overhead as you add data), so this should only be done if there is a chance that the constructed objects are going to live beyond gen-0; if they are always going to be collected quickly anyway then it isn't worth swapping them for interned versions.
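
As a rough illustration of the deserialization case, the sketch below materializes records from lines of the form "id,status". The `Order` type, the `OrderReader` class and the input format are all assumptions invented for this example; the shared pool is used via `string.Intern`, but a private cache could be substituted at the same spot:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type with a low-cardinality field ("almost an enum").
sealed class Order
{
    public int Id;
    public string Status;
}

static class OrderReader
{
    // Parses lines of the form "id,status". Each Split call allocates a new
    // string for the status, so string.Intern swaps it for a shared instance.
    public static List<Order> Read(IEnumerable<string> lines)
    {
        var orders = new List<Order>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(',');
            orders.Add(new Order
            {
                Id = int.Parse(parts[0]),
                Status = string.Intern(parts[1])
            });
        }
        return orders;
    }
}
```

With thousands of records and only a handful of distinct status values, every `Order` ends up pointing at one of a few shared strings, and the per-line copies produced by `Split` can be collected in gen-0.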

Is caching strings something to worry about?

I am also curious, is the compiler smart enough to automatically turn some of these strings into cached variables?

Yes it is. If you have identical string literals in a single assembly, C# will, as an optimization, only create one instance of that string at runtime and cache it. This is called "string interning". This blog post by Eric Lippert explains the general principle and the implications it can have for performance / garbage collection.

As a result of string interning, both of your examples result in just one string allocation at runtime, assuming all code in those examples exists within a single assembly.

Therefore which pattern you follow is a matter of style. For examples used in documentation and education, it's typically easier to read if the string is "in line" with the rest of the code, like Example 2. If you're writing a large, complex program where the exact same string must be used in multiple places, I would prefer Example 1. This avoids having to retype the string - possibly mistyping it - at each place it is used, gives a meaningful name to the string within your code, and allows you to edit only one line if the string ever needs to be changed in a future version.

Finally, about the project you mention. If the strings you are concerned about are defined at compile-time (i.e., they are string literals like "My String"), then C#'s automatic string interning may suffice. If you have a lot of strings that you don't know until runtime, but which you want to avoid re-allocating, you may want to declare something to hold those string objects for later use by your code. What kind of data structure you use depends on your use case.

You can also force the runtime to intern a string by calling String.Intern(String). This method puts the given string into the string interning cache (if it is not already there) and returns the string from the cache. Note that you will still need to allocate the string somehow to pass it into this method, but the lifetime of that instance may be a lot shorter and therefore make your project perform better.
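
A small sketch of that behaviour, using only `String.Intern` and `String.IsInterned` from the framework. The class name, method names and the key value are invented for the example:

```csharp
using System;
using System.Text;

static class InternCallDemo
{
    // Builds the same value twice at runtime, then interns both copies.
    // Returns true when the two interned references are the same object.
    public static bool InternReturnsSharedInstance()
    {
        string first = new StringBuilder().Append("run").Append("time-key-1").ToString();
        string second = new StringBuilder().Append("runtime").Append("-key-1").ToString();

        string internedFirst = string.Intern(first);
        string internedSecond = string.Intern(second);

        return !object.ReferenceEquals(first, second)
            && object.ReferenceEquals(internedFirst, internedSecond);
    }

    // String.IsInterned returns the pooled instance, or null when the value
    // has not been interned.
    public static bool IsInPool(string value)
    {
        return string.IsInterned(value) != null;
    }
}
```

Note that `first` and `second` still had to be allocated before they could be passed to `string.Intern`; only the references kept afterwards point at the shared instance.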

Passing string as parameter

void Update()
{
    Method("String");
}

In this example "String" will be interned once and there will be only one object in memory.
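
To check this, a sketch along the following lines records the reference received on each call. `LiteralArgumentDemo` and `RepeatedCallsShareOneInstance` are hypothetical names, and `Method` here simply stores its argument:

```csharp
using System;
using System.Collections.Generic;

static class LiteralArgumentDemo
{
    private static readonly List<string> Received = new List<string>();

    // Stand-in for the real method: it only remembers what it was given.
    static void Method(string value)
    {
        Received.Add(value);
    }

    static void Update()
    {
        Method("String"); // the literal is loaded from the intern pool
    }

    // Calls Update repeatedly and reports whether every call passed
    // the very same string object.
    public static bool RepeatedCallsShareOneInstance(int calls)
    {
        Received.Clear();
        for (int i = 0; i < calls; i++)
            Update();

        foreach (string s in Received)
        {
            if (!object.ReferenceEquals(s, Received[0]))
                return false;
        }
        return Received.Count == calls;
    }
}
```

No new string is allocated per call; only a reference to the single interned object is passed.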

Which should I use for empty string and why?

String.Empty is what is normally recommended.

Why? Because it conveys meaning, you are very explicitly saying that you want the string to be empty.

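Additionally, "" is a string literal that refers to a string object (a single object that is reused across the application, because string literals in .NET are interned), whereas String.Empty is a static field that refers to an already existing empty string and does not create a new object. In practice the difference is negligible.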


