Where do Java and .NET string literals reside?
Strings in .NET are reference types, so they are always on the heap (even when they are interned). You can verify this using a debugger such as WinDbg.
If you have the class below
using System;

class SomeType {
    public void Foo() {
        string s = "hello world";
        Console.WriteLine(s);
        Console.WriteLine("press enter");
        Console.ReadLine(); // keeps the process alive so a debugger can attach
    }
}
And you call Foo() on an instance, you can use WinDbg to inspect the heap.
The reference will most likely be stored in a register for such a small program, so the easiest way to find the reference to the specific string is to run !dso (dump stack objects). This gives us the address of the string in question:
0:000> !dso
OS Thread Id: 0x1660 (0)
ESP/REG Object Name
002bf0a4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
002bf0b4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
002bf0e8 025d4e5c System.Byte[]
002bf0ec 025d4c0c System.IO.__ConsoleStream
002bf110 025d4c3c System.IO.StreamReader
002bf114 025d4c3c System.IO.StreamReader
002bf12c 025d5180 System.IO.TextReader+SyncTextReader
002bf130 025d4c3c System.IO.StreamReader
002bf140 025d5180 System.IO.TextReader+SyncTextReader
002bf14c 025d5180 System.IO.TextReader+SyncTextReader
002bf15c 025d2d04 System.String hello world // THIS IS THE ONE
002bf224 025d2ccc System.Object[] (System.String[])
002bf3d0 025d2ccc System.Object[] (System.String[])
002bf3f8 025d2ccc System.Object[] (System.String[])
Now use !gcgen to find out which generation the instance is in:
0:000> !gcgen 025d2d04
Gen 0
It's in generation zero - i.e. it has just been allocated. Who's rooting it?
0:000> !gcroot 025d2d04
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread 1660
ESP:2bf15c:Root:025d2d04(System.String)
Scan Thread 2 OSTHread 16b4
DOMAIN(000E4840):HANDLE(Pinned):6513f4:Root:035d2020(System.Object[])->
025d2d04(System.String)
The ESP entry is the stack for our Foo() method, but notice that we have an object[] as well. That's the intern table. Let's take a look.
0:000> !dumparray 035d2020
Name: System.Object[]
MethodTable: 006984c4
EEClass: 00698444
Size: 528(0x210) bytes
Array: Rank 1, Number of elements 128, Type CLASS
Element Methodtable: 00696d3c
[0] 025d1360
[1] 025d137c
[2] 025d139c
[3] 025d13b0
[4] 025d13d0
[5] 025d1400
[6] 025d1424
...
[36] 025d2d04 // THIS IS OUR STRING
...
[126] null
[127] null
I reduced the output somewhat, but you get the idea.
In conclusion: strings are on the heap - even when they are interned. The intern table holds a reference to the instance on the heap, so interned strings are not collected during GC: the intern table roots them.
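To make the idea of "a table of references to ordinary heap objects" concrete, here is a minimal conceptual sketch in Java. It is not how the CLR (or the JVM) actually implements its intern table; the class `InternTableSketch` and its methods are hypothetical names invented for this illustration. It only shows that the table stores references and hands back one canonical instance per distinct value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: NOT the real CLR or JVM intern table.
final class InternTableSketch {
    // The table holds references; the String objects themselves are ordinary heap objects.
    // While the table is reachable, every string it references stays reachable too.
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance for this value, registering s if it is the first one seen.
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        InternTableSketch pool = new InternTableSketch();
        String a = new String("hello world"); // a distinct heap object
        String b = new String("hello world"); // another distinct heap object

        System.out.println(a == b);                           // false
        System.out.println(pool.intern(a) == pool.intern(b)); // true
        System.out.println(pool.intern(b) == a);              // true
    }
}
```

Because the table itself keeps a reference to each canonical string, those strings remain reachable for as long as the table is, which is the rooting effect seen in the !gcroot output.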
string literal and memory representation
What you really appear to be asking is
is the interned string stored as a member of the string class?
I haven't found a definitive source to confirm this, but I'm pretty certain that the interned strings are maintained by the CLR outside of the memory scope of the string class. This is indicated by the source for String.Intern, which calls out to a method on the object returned by Thread.GetDomain() (the current AppDomain):
public static String Intern(String str) {
    if (str==null) {
        throw new ArgumentNullException("str");
    }
    return Thread.GetDomain().GetOrInternString(str);
}
If interning affected the memory footprint of the string class, I'd expect the Intern function to call methods within the string class rather than call out through Thread.
However, this is an implementation detail (since it's not part of the ECMA-335 standard that I can find) and may be different between different implementations of the spec.
String Immutability
The compiler has special treatment for string concatenation, which is why the second example is only ever one string. And "interning" means that even if you run this line 20000 times there is still only 1 string.
As for testing the results, the easiest way (in this case) is probably to look at the compiled IL in Reflector:
.method private hidebysig static void Main() cil managed
{
.entrypoint
.maxstack 1
.locals init (
[0] string s)
L_0000: ldstr "goodbye cruel world!"
L_0005: stloc.0
L_0006: ldloc.0
L_0007: call void [mscorlib]System.Console::WriteLine(string)
L_000c: ret
}
As you can see (ldstr), the compiler has done this for you already.
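The "run it 20000 times and there is still only one string" claim can be checked directly. The sketch below uses Java rather than C#, on the assumption that an analogous demonstration is useful here: the Java Language Specification requires compile-time constant string expressions to be interned, so the behaviour mirrors what ldstr shows for .NET. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class LiteralLoopDemo {
    // Counts how many distinct String objects the same constant expression yields.
    static int distinctInstances(int iterations) {
        // An identity-based set distinguishes objects by reference, not by equals().
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < iterations; i++) {
            // A constant expression: the compiler folds it into a single literal.
            String s = "goodbye " + "cruel " + "world!";
            seen.add(s);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(20000)); // 1
    }
}
```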
Do string literals get optimised by the compiler?
EDIT: While I strongly suspect the statement below is true for all C# compiler implementations, I'm not sure it's actually guaranteed in the spec. Section 2.4.4.5 of the spec talks about literals referring to the same string instance, but it doesn't mention other constant string expressions. I suspect this is an oversight in the spec - I'll email Mads and Eric about it.
It's not just string literals. It's any string constant. So for example, consider:
public const string X = "X";
public const string Y = "Y";
public const string XY = "XY";
void Foo()
{
    string z = X + Y;
}
The compiler realises that the concatenation here (for z) is between two constant strings, and so the result is also a constant string. Therefore the initial value of z will be the same reference as the value of XY, because they're compile-time constants with the same value.
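For comparison, here is a sketch of the same experiment in Java (not C#), where the behaviour is spelled out by the language specification: a concatenation of constant variables is itself a constant expression and is interned, whereas a concatenation with a non-constant operand is performed at run time and produces a new object. The class and method names are hypothetical and only serve the illustration.

```java
final class ConstantConcatDemo {
    static final String X = "X";
    static final String Y = "Y";
    static final String XY = "XY";

    // X + Y is a compile-time constant expression, so it is interned like a literal.
    static boolean constantsShareInstance() {
        String z = X + Y;
        return z == XY;
    }

    // With a non-constant operand the concatenation happens at run time
    // and creates a new String object.
    static boolean runtimeConcatSharesInstance(String left) {
        String z = left + Y;
        return z == XY;
    }

    public static void main(String[] args) {
        System.out.println(constantsShareInstance());         // true
        System.out.println(runtimeConcatSharesInstance("X")); // false
    }
}
```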
EDIT: The reply from Mads and Eric suggested that in the Microsoft C# compiler string constants and string literals are usually treated the same way - but that other implementations may differ.
If == compares references in Java, why does it evaluate to true with these Strings?
The program will print Equal (at least when compiled with Sun's javac and run on the Sun HotSpot VM). Here it is demonstrated on http://ideone.com/8UrRrk
This is because string literals are stored in a string pool, so two identical literals refer to the same String instance.
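The program from the question is not reproduced here. A minimal sketch of the kind of comparison it presumably made (this is an assumption, and the class and method names are hypothetical) could look like this:

```java
final class LiteralPoolDemo {
    // Two identical literals are resolved to the same pooled String instance.
    static boolean literalsShareInstance() {
        String str1 = "Str1";
        String str2 = "Str1";
        return str1 == str2; // reference comparison
    }

    public static void main(String[] args) {
        if (literalsShareInstance()) {
            System.out.println("Equal");
        } else {
            System.out.println("Not equal");
        }
    }
}
```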
Further reading:
- What is String literal pool?
- String interning
This however:
public class Salmon {
    public static void main(String[] args) {
        String str1 = "Str1";
        String str2 = new String("Str1");
        if (str1 == str2) {
            System.out.println("Equal");
        } else {
            System.out.println("Not equal");
        }
    }
}
will print Not equal, since new is guaranteed to introduce a fresh reference.
So, rule of thumb: always compare strings using the equals method.
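As a short sketch of that rule of thumb (the helper names are hypothetical), the following contrasts reference comparison with equals, and shows that intern() maps a freshly created string back to the pooled instance:

```java
final class EqualsVsIdentityDemo {
    // Compares references: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Compares contents: true if both strings hold the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String str1 = "Str1";
        String str2 = new String("Str1");

        System.out.println(sameReference(str1, str2));          // false
        System.out.println(sameContent(str1, str2));            // true
        System.out.println(sameReference(str1, str2.intern())); // true
    }
}
```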
Strings in Java : equals vs ==
First of all, String.toString is a no-op:
/**
 * This object (which is already a string!) is itself returned.
 *
 * @return the string itself.
 */
public String toString() {
    return this;
}
Second of all, String constants are interned, so behind the scenes s1 and s2 refer to the same String instance.
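The first point is easy to confirm with a small check. This is only a sketch; the class and method names are made up for the illustration:

```java
final class ToStringIdentityDemo {
    // String.toString() returns the receiver itself, so the references are identical.
    static boolean toStringReturnsSelf(String s) {
        return s.toString() == s;
    }

    public static void main(String[] args) {
        System.out.println(toStringReturnsSelf("abc"));             // true
        System.out.println(toStringReturnsSelf(new String("abc"))); // true
    }
}
```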
What does the @ prefix do on string literals in C#
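@ is not related to any method.
It means that you don't need to escape special characters in the string that follows the symbol: backslashes are taken literally, and only a double quote needs escaping (by doubling it). For example,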
@"c:\temp"
is equal to
"c:\\temp"
Such a string is called 'verbatim' or @-quoted. See MSDN.