What determines which strings are interned and when?
String interning is implementation-specific and shouldn't be relied upon; use equality testing (equals) if you want to check whether two strings have the same contents.
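As a minimal sketch of that advice (the class and method names here are made up for illustration), comparing with equals gives the same answer however the strings were created, while == depends on object identity:

```java
public class EqualityCheck {
    // Compares by content; does not depend on whether either string is interned.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    // Compares references; the result depends on how the strings were created.
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = new String("hello"); // new always allocates a separate object

        System.out.println(sameContent(literal, built));  // true
        System.out.println(sameInstance(literal, built)); // false
    }
}
```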
When are Java Strings interned?
The optimization happens (or at least can happen) in both places:
- If two references to the same string constant appear in the same class, I'd expect the class file to only contain one constant pool entry. This isn't strictly required in order to ensure that there's only one String object created in the JVM, but it's an obvious optimization to make. This isn't actually interning as such - just constant optimization.
- When classes are loaded, the string pool for the class is added to the intern pool. This is "real" interning.
(I have a vague recollection that one of the bits of work for Java 7 around "small jar files" included a single string pool for the whole jar file... but I could be very wrong.)
EDIT: Section 5.1 of the JVM spec, "The Runtime Constant Pool" goes into details of this:
To derive a string literal, the Java virtual machine examines the sequence of characters given by the CONSTANT_String_info structure.
If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode characters identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.
Otherwise, a new instance of class String is created containing the sequence of Unicode characters given by the CONSTANT_String_info structure; that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
The return of String.intern() explained
s2.intern() would return the instance referenced by s2 only if the String pool didn't contain a String whose value is "java" prior to that call. The JDK classes intern some Strings before your code is executed. "java" must be one of them. Therefore, s2.intern() returns the previously interned instance instead of s2.
On the other hand, the JDK classes did not intern any String whose value is equal to "Cattie & Doggie", so s1.intern() returns s1.
I am not aware of any list of pre-interned Strings. Such a list will most likely be considered an implementation detail, which may vary on different JDK implementations and JDK versions, and should not be relied on.
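A rough way to see this behaviour without depending on which values the JDK happens to pre-intern is to use a value nothing else could have interned. This is only a sketch: the class name is hypothetical, the random UUID suffix is assumed to be unique in the pool, and the results shown assume a Java 7+ HotSpot JVM.

```java
import java.util.UUID;

public class InternReturnValue {
    // True when intern() hands back the very instance it was called on,
    // which happens only if no equal string was in the pool beforehand.
    static boolean internReturnsSelf(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        // Built at run time from a random value, so assumed not to be in the pool yet.
        String fresh = "unique-" + UUID.randomUUID();
        System.out.println(internReturnsSelf(fresh)); // true

        // An equal but distinct object: the pool already holds `fresh`.
        String copy = new String(fresh);
        System.out.println(internReturnsSelf(copy));  // false
        System.out.println(copy.intern() == fresh);   // true
    }
}
```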
Java string intern and literal
They have the same end result, but they are not the same (they'll produce different bytecode; the new String("foo").intern() version actually goes through those steps, producing a new string object, then interning it).
Two relevant quotes from String#intern:
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
All literal strings and string-valued constant expressions are interned.
So the end result is the same: A variable referencing the interned string "foo".
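To illustrate, here is a small sketch (class and method names are hypothetical) in which both routes yield the same pooled instance, and a constant expression does too:

```java
public class LiteralVsIntern {
    static String viaLiteral() {
        return "foo"; // resolved from the class's constant pool
    }

    static String viaIntern() {
        // Allocates a temporary String, then returns the pooled instance.
        return new String("foo").intern();
    }

    public static void main(String[] args) {
        System.out.println(viaLiteral() == viaIntern()); // true

        // "f" + "oo" is a constant expression, folded at compile time and interned.
        System.out.println(("f" + "oo") == viaLiteral()); // true
    }
}
```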
When should we use intern method of String on String literals
Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values.
Since interning is automatic for String literals, the intern() method is to be used on Strings constructed with new String().
Using your example:
String s1 = "Rakesh";
String s2 = "Rakesh";
String s3 = "Rakesh".intern();
String s4 = new String("Rakesh");
String s5 = new String("Rakesh").intern();
if ( s1 == s2 ){
    System.out.println("s1 and s2 are same"); // 1.
}
if ( s1 == s3 ){
    System.out.println("s1 and s3 are same" ); // 2.
}
if ( s1 == s4 ){
    System.out.println("s1 and s4 are same" ); // 3.
}
if ( s1 == s5 ){
    System.out.println("s1 and s5 are same" ); // 4.
}
will return:
s1 and s2 are same
s1 and s3 are same
s1 and s5 are same
In every case except s4, whose value was explicitly created with the new operator and never passed through intern, the variable refers to the single immutable instance held in the JVM's string constant pool.
Refer to JavaTechniques "String Equality and Interning" for more information.
Does String interning cause a String to be both in heap and in native memory?
If we consider your example, yes, ref1 is still in the heap, but that is because both ref1 and ref2 point to the same instance. You initialise ref1 with a string literal, and string literals are automatically interned as described here:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
So, no double memory usage (if you don't consider the string being present in the separate memory area that holds the content of the class ConstantPool and all the class structure information).
To explain in a bit more detail how interning actually works, see this example:
public class Intern{
    public static void main(String... args){
        String str1="TestStr";
        String str2="TestStr";
        System.out.println("1. "+(str1==str2));
        String str3=str1.intern();
        System.out.println("2. "+(str1==str3));
        String str4=new String("TestStr");
        System.out.println("3. "+(str1==str4));
        String str5=str4.intern();
        System.out.println("4. "+(str4==str5));
        System.out.println("5. "+(str1==str5));
    }
}
You'll get this output:
1. true
Strings loaded from the constant pool are automatically interned into the string pool, so the result is true: both references point to the same interned object.
2. true
str3 refers to a string instance that was already interned.
3. false
str4 is a new instance, nothing to do with the previous ones.
4. false
The throwaway str4 instance is not the same object as the one that has been in the string pool since the beginning.
5. true
str5 points to our interned string as expected.
It's important to note that before Java 7 (Oracle implementation) interned strings were stored in PermGen (which no longer exists as of Java 8); since Java 7 they have been stored in the heap. So on an older JVM release, peculiar memory issues could appear when using interning heavily.
For additional info on how interned Strings are managed in different releases, check this nice post.
How does string interning work in Java 7+?
There's a thing called the String Memory Pool (the string pool) in Java. When you declare:
String str1="abc";
the literal is taken from that pool (which, since Java 7, itself lives in the main heap) rather than being allocated as a separate object. But when you write:
String str2=new String("abc");
it creates a separate, full-fledged object on the heap. If you then write:
String str3 = "abc";
no further object is created in the pool: the JVM checks whether this literal already exists there and, if it does, reuses that instance. But writing:
String str4 = new String("abc");
will again create a new object on the heap.
The key point is that a new object is created on the heap every time you write:
new String("abc");
But if you keep assigning Strings directly from literals, without using the keyword new, the reference just comes from the memory pool (the string is created there first if it is not already present).
The intern() method checks whether an equal string is present in the memory pool; if it is not, it adds it, and either way it returns a reference to the pooled string. So the reference returned by intern() points to the pool's canonical object rather than to a separately allocated copy (also, note that the memory pool only contains unique strings).
What is Java String interning?
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()
Basically, calling String.intern() on a series of strings ensures that all strings with the same contents share the same memory. So if you have a list of names where 'john' appears 1000 times, by interning you ensure only one 'john' is actually allocated memory.
This can be useful to reduce the memory requirements of your program. But be aware that the cache is maintained by the JVM in the permanent memory pool, which is usually limited in size compared to the heap, so you should not use intern if you don't have many duplicate values.
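The 'john' case can be sketched roughly as follows. The class and helper names are invented for the illustration, and distinct objects are counted by identity with an IdentityHashMap-backed set (assuming Java 8 or later for the type inference):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDedup {
    // Builds n separate String objects that all have the same contents.
    static List<String> duplicates(String value, int n) {
        List<String> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(new String(value)); // a distinct object each time
        }
        return result;
    }

    // Replaces every element with its interned (canonical) instance.
    static List<String> internAll(List<String> strings) {
        List<String> result = new ArrayList<>(strings.size());
        for (String s : strings) {
            result.add(s.intern());
        }
        return result;
    }

    // Counts distinct String objects by identity, not by content.
    static int distinctInstances(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    public static void main(String[] args) {
        List<String> names = duplicates("john", 1000);
        System.out.println(distinctInstances(names));            // 1000
        System.out.println(distinctInstances(internAll(names))); // 1
    }
}
```

Once the original list is dropped, the 1000 separate copies become eligible for garbage collection and only the pooled instance stays referenced.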
More on memory constraints of using intern()
On one hand, it is true that you can remove String duplicates by internalizing them. The problem is that the internalized strings go to the Permanent Generation, which is an area of the JVM that is reserved for non-user objects, like Classes, Methods and other internal JVM objects. The size of this area is limited, and is usually much smaller than the heap. Calling intern() on a String has the effect of moving it out from the heap into the permanent generation, and you risk running out of PermGen space.
-- From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html
From JDK 7 (I mean in HotSpot), something has changed.
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
-- From Java SE 7 Features and Enhancements
Update: Interned strings are stored in main heap from Java 7 onwards. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes
String interning in .Net Framework - What are the benefits and when to use interning
Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.
The micro-optimisation benefits of interning strings manually are minimal, hence it is generally not recommended.
This probably describes it:
using System;

class Program
{
    const string SomeString = "Some String"; // compile-time constant: gets interned

    static void Main(string[] args)
    {
        var s1 = SomeString; // refers to the interned string
        var s2 = SomeString; // refers to the same interned string
        var s = "String";
        var s3 = "Some " + s; // built at run time: no interning

        Console.WriteLine(s1 == s2); // True: same reference, so no character comparison is needed
        Console.WriteLine(s1 == s3); // True: different references, so the contents are compared
    }
}