Where Do Java and .NET String Literals Reside

Where do Java and .NET string literals reside?

Strings in .NET are reference types, so they are always on the heap (even when they are interned). You can verify this using a debugger such as WinDbg.

If you have the class below

using System;

class SomeType {
    public void Foo() {
        string s = "hello world";
        Console.WriteLine(s);
        Console.WriteLine("press enter");
        Console.ReadLine();
    }
}

And you call Foo() on an instance, you can use WinDbg to inspect the heap.

The reference will most likely be stored in a register for a small program, so the easiest way to find the reference to the specific string is to run !dso (dump stack objects). This gives us the address of the string in question:

0:000> !dso
OS Thread Id: 0x1660 (0)
ESP/REG Object Name
002bf0a4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
002bf0b4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
002bf0e8 025d4e5c System.Byte[]
002bf0ec 025d4c0c System.IO.__ConsoleStream
002bf110 025d4c3c System.IO.StreamReader
002bf114 025d4c3c System.IO.StreamReader
002bf12c 025d5180 System.IO.TextReader+SyncTextReader
002bf130 025d4c3c System.IO.StreamReader
002bf140 025d5180 System.IO.TextReader+SyncTextReader
002bf14c 025d5180 System.IO.TextReader+SyncTextReader
002bf15c 025d2d04 System.String hello world // THIS IS THE ONE
002bf224 025d2ccc System.Object[] (System.String[])
002bf3d0 025d2ccc System.Object[] (System.String[])
002bf3f8 025d2ccc System.Object[] (System.String[])

Now use !gcgen to find out which generation the instance is in:

0:000> !gcgen 025d2d04 
Gen 0

It's in generation zero - i.e. it has just been allocated. Who's rooting it?

0:000> !gcroot 025d2d04 
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread 1660
ESP:2bf15c:Root:025d2d04(System.String)
Scan Thread 2 OSTHread 16b4
DOMAIN(000E4840):HANDLE(Pinned):6513f4:Root:035d2020(System.Object[])->
025d2d04(System.String)

The ESP root is the stack slot that belongs to our Foo() method, but notice that an object[] roots the string as well. That's the intern table. Let's take a look.

0:000> !dumparray 035d2020
Name: System.Object[]
MethodTable: 006984c4
EEClass: 00698444
Size: 528(0x210) bytes
Array: Rank 1, Number of elements 128, Type CLASS
Element Methodtable: 00696d3c
[0] 025d1360
[1] 025d137c
[2] 025d139c
[3] 025d13b0
[4] 025d13d0
[5] 025d1400
[6] 025d1424
...
[36] 025d2d04 // THIS IS OUR STRING
...
[126] null
[127] null

I reduced the output somewhat, but you get the idea.

In conclusion: strings are on the heap - even when they are interned. The intern table holds a reference to the instance on the heap, so interned strings are not collected during GC because the intern table roots them.

string literal and memory representation

What you really appear to be asking is

is the interned string stored as a member of the string class?

I haven't found a definitive source to confirm this, but I'm pretty certain that the interned strings are maintained by the CLR outside of the memory scope of the string class. This is indicated by the source for String.Intern, which does not touch any state of the string class. It obtains the current application domain through Thread.GetDomain() and calls GetOrInternString on it:

public static String Intern(String str) {
    if (str == null) {
        throw new ArgumentNullException("str");
    }
    return Thread.GetDomain().GetOrInternString(str);
}

If interning affected the memory footprint of the string class, I'd expect the Intern function to call methods within the string class rather than call out to the domain returned by Thread.GetDomain().

However, this is an implementation detail (since it's not part of the ECMA-335 standard that I can find) and may be different between different implementations of the spec.
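
To make the idea of a table that lives outside the string type more concrete, here is a minimal sketch in Java. It is purely illustrative: the class InternTableSketch and its getOrIntern method are made-up names, and this is not how the CLR or the JVM actually implements interning. It only shows that such a table can hold references to ordinary heap objects without changing the string objects themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not the CLR's or the JVM's real implementation.
public class InternTableSketch {
    // The table lives outside the string type. It only stores references to
    // ordinary heap objects, which also keeps those objects reachable.
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance for the given contents, registering the
    // argument itself if no equal string has been seen before.
    public String getOrIntern(String s) {
        if (s == null) {
            throw new IllegalArgumentException("s must not be null");
        }
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        InternTableSketch interned = new InternTableSketch();
        String a = new String("hello world"); // a distinct heap object
        String b = new String("hello world"); // another distinct heap object

        String first = interned.getOrIntern(a);
        String second = interned.getOrIntern(b);

        System.out.println(first == a);  // true: a became the canonical instance
        System.out.println(second == a); // true: equal contents map to a
        System.out.println(second == b); // false: b itself was never registered
    }
}
```

Nothing about the strings a and b changes when they are passed to the table, which matches the observation that interning does not need to affect the string class.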

String Immutability

The compiler has special treatment for string concatenation, which is why the second example is only ever one string. And "interning" means that even if you run this line 20000 times there is still only 1 string.

As for testing the results, the easiest way (in this case) is probably to look at the compiled IL in Reflector:

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] string s)
    L_0000: ldstr "goodbye cruel world!"
    L_0005: stloc.0
    L_0006: ldloc.0
    L_0007: call void [mscorlib]System.Console::WriteLine(string)
    L_000c: ret
}

As you can see (ldstr), the compiler has done this for you already.
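
The "only one string no matter how often the line runs" effect can also be observed from code rather than from disassembly. The sketch below uses Java instead of C#, where javac folds a concatenation of literals in the same way. The class and method names are made up for illustration. It counts distinct object identities, not distinct contents.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class OneLiteralInstance {
    // The concatenation of literals is a compile-time constant, so javac
    // emits a single string constant for it.
    static String literal() {
        return "goodbye " + "cruel " + "world!";
    }

    // Counts how many distinct objects (by identity) n calls produce.
    static int distinctInstances(int n) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < n; i++) {
            seen.add(literal());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(20000)); // 1
    }
}
```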

Do string literals get optimised by the compiler?

EDIT: While I strongly suspect the statement below is true for all C# compiler implementations, I'm not sure it's actually guaranteed in the spec. Section 2.4.4.5 of the spec talks about literals referring to the same string instance, but it doesn't mention other constant string expressions. I suspect this is an oversight in the spec - I'll email Mads and Eric about it.


It's not just string literals. It's any string constant. So for example, consider:

public const string X = "X";
public const string Y = "Y";
public const string XY = "XY";

void Foo()
{
    string z = X + Y;
}

The compiler realises that the concatenation here (for z) is between two constant strings, and so the result is also a constant string. Therefore the initial value of z will be the same reference as the value of XY, because they're compile-time constants with the same value.

EDIT: The reply from Mads and Eric suggested that in the Microsoft C# compiler string constants and string literals are usually treated the same way - but that other implementations may differ.
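
For comparison, Java behaves the same way for constant expressions, and there the behaviour can be checked directly with a reference comparison. The following is a small sketch with made-up class and method names. It contrasts a concatenation of constants with a concatenation that involves a non-constant operand.

```java
public class ConstantFolding {
    public static final String X = "X";
    public static final String Y = "Y";
    public static final String XY = "XY";

    // X + Y is a compile-time constant expression, so it refers to the
    // same pooled instance as XY.
    static boolean constantConcatIsSameInstance() {
        String z = X + Y;
        return z == XY;
    }

    // A non-final local variable is not a constant, so this concatenation
    // happens at run time and produces a new object.
    static boolean runtimeConcatIsSameInstance() {
        String x = X;
        String z = x + Y;
        return z == XY;
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsSameInstance()); // true
        System.out.println(runtimeConcatIsSameInstance());  // false
    }
}
```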

If == compares references in Java, why does it evaluate to true with these Strings?

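The program will print Equal (at least with Sun's HotSpot JVM and Sun's javac). Here it is demonstrated: http://ideone.com/8UrRrk

This is because string literals are stored in a string pool, so literals with the same contents share a single String instance. The Java Language Specification requires this for string literals (section 3.10.5), so the result does not depend on the particular compiler or JVM.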

Further reading:

  • What is String literal pool?
  • String interning

This however:

public class Salmon {
    public static void main(String[] args) {

        String str1 = "Str1";
        String str2 = new String("Str1");

        if (str1 == str2) {
            System.out.println("Equal");
        } else {
            System.out.println("Not equal");
        }
    }
}

Will print Not equal since new is guaranteed to introduce a fresh reference.

So, rule of thumb: Always compare strings using the equals method.
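
To put the three possibilities side by side, here is a short sketch (the class and helper names are made up) that compares a literal with a freshly allocated string by reference, by contents, and after calling intern():

```java
public class CompareStrings {
    // Reference comparison: true only when both arguments are the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Str1";
        String fresh = new String("Str1");

        System.out.println(sameObject(literal, fresh));          // false: two different objects
        System.out.println(literal.equals(fresh));               // true: same characters
        System.out.println(sameObject(literal, fresh.intern())); // true: intern() returns the pooled instance
    }
}
```

Only equals gives the answer most code actually wants, which is why it is the safe default.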

Strings in Java : equals vs ==

First of all String.toString is a no-op:

/**
 * This object (which is already a string!) is itself returned.
 *
 * @return the string itself.
 */
public String toString() {
    return this;
}

Second, String constants are interned, so behind the scenes s1 and s2 refer to the same String instance.
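
Both points can be checked with a few lines. The question's original code is not shown here, so the values assigned to s1 and s2 below are placeholders, and the class and method names are made up:

```java
public class ToStringIdentity {
    // True when toString() hands back the very same object it was called on.
    static boolean toStringReturnsSameObject(String s) {
        return s.toString() == s;
    }

    public static void main(String[] args) {
        String s1 = "abc"; // placeholder value
        String s2 = "abc"; // placeholder value

        System.out.println(s1 == s2);                       // true: both literals refer to the pooled instance
        System.out.println(s1.toString() == s2.toString()); // true: toString() returns this
        System.out.println(toStringReturnsSameObject(new String("abc"))); // true, even for a non-pooled string
    }
}
```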

What does the @ prefix do on string literals in C#

@ is not related to any method.

It means that you don't need to escape special characters (such as backslashes) in the string that follows the symbol:

@"c:\temp"

is equal to

"c:\\temp"

Such a string is called a 'verbatim' or @-quoted string. See MSDN.


