Interned strings avoid duplicate strings. There is only one copy of each
String that has been interned, no matter how many references point to it. Since
Strings are immutable, if two different methods "incidentally" use the
same String, they can share a copy of it, even if they concocted that String by
totally independent means (e.g. one might use the string "sin" in the context of
Moses and another in the context of trigonometry). The process of converting
duplicated strings to shared ones is called interning. String.intern()
returns a reference to the canonical master String. You can compare interned
Strings with a simple == (which compares references)
instead of .equals (which compares the characters of
the String one by one). Because Strings are immutable, the intern process is, in
principle, free to save further space, for example, by not creating a separate
string literal for "pot" when it exists as a substring of
some other literal such as "hippopotamus".
There are two reasons for interning Strings:
-
To save space, by removing String literal duplicates.
-
To speed up String equality compares.
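A small sketch of both points, using only String.intern() and StringBuilder; the class name InternDemo is made up for illustration. The strings are built at run time, so they start out as two distinct objects with equal contents. After interning, both collapse to one canonical instance, the same one the compiler uses for the literal, and can then be compared with ==.

```java
public class InternDemo
{
    /** Builds "hippopotamus" at run time, so it is a new object, not the literal. */
    static String build()
    {
        return new StringBuilder( "hippo" ).append( "potamus" ).toString();
    }

    public static void main( String[] args )
    {
        String a = build();
        String b = build();

        // Two separate objects holding the same characters.
        System.out.println( a == b );        // false
        System.out.println( a.equals( b ) ); // true

        // intern() returns the canonical copy, so both map to a single object,
        // which is also the object used for the literal "hippopotamus".
        String ia = a.intern();
        String ib = b.intern();
        System.out.println( ia == ib );             // true
        System.out.println( ia == "hippopotamus" ); // true
    }
}
```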
Interning and String.substring
In Oracle/OpenJDK JDKs before Java 7 update 6, when you use String.substring the JVM
allocates a new String descriptor, but it just points into the character array of
the original String. It does not need to allocate space for the substring's
characters and does not copy any of them. String.substring does not intern the
result. The original base string cannot be garbage collected as long as there
are any live references to substrings inside it. Since Java 7 update 6,
substring copies the characters it needs into a fresh array, so a substring no
longer pins its base string in memory.
Empty strings resulting from String.substring are not guaranteed to be
interned either. On the older JDKs that share the base array, the resulting
empty substring can therefore encumber a long base string indefinitely,
preventing it from being garbage collected.
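public class Empty
{
    public static void main ( String [] args )
    {
        String s = "a very long string";
        // Empty substring: not guaranteed to be the interned "" literal.
        String e1 = s.substring( 0, 0 );
        // Replace any empty result with the interned "" literal.
        String e2 = ( e1.length() == 0 ) ? "" : e1;
        // Prints false on older JDKs; newer JDKs may return "" itself and print true.
        System.out.println( e1 == "" );
        // Always prints true: e2 is the interned literal.
        System.out.println( e2 == "" );
    }
}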
Interning and the void String
To ensure you don't accidentally encumber base strings, and to avoid the
confusion of using a mixture of blank (i.e. x.length()
!= 0 && x.trim().length() == 0, e.g. " "),
empty (i.e. x.length() == 0,
e.g. "") and null
(i.e. x == null) to represent the void
string, you may want to collapse all three into a single representation: the
interned empty string "".
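A minimal sketch of such a normalizer follows. The class name VoidStrings and the method name condense are hypothetical, not from the original; the sketch assumes that null, empty and blank strings should all become the interned literal "", and that non-blank strings are returned unchanged.

```java
public class VoidStrings
{
    /**
     * Hypothetical helper: collapses null, empty and blank strings to the
     * single interned literal "", so callers only ever see one representation
     * of the void string.
     */
    public static String condense( String s )
    {
        if ( s == null || s.trim().length() == 0 )
        {
            return ""; // the interned literal, never a substring of some base string
        }
        return s; // non-blank strings are returned unchanged
    }

    public static void main( String[] args )
    {
        String fromSubstring = "a very long string".substring( 0, 0 );
        System.out.println( condense( null ) == "" );          // true
        System.out.println( condense( "   " ) == "" );         // true
        System.out.println( condense( fromSubstring ) == "" ); // true
        System.out.println( condense( "sin" ) );               // sin
    }
}
```

Because every void string ends up as the same interned object, a test such as condense( x ) == "" is safe, and no empty substring is kept alive.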
The Intern Gotcha
All String literals present at compile time, along with compile-time constant
String expressions such as "hel" + "lo", are automatically interned. It is
only Strings generated on the fly as the program runs that might not be interned.
A nasty side effect of this behaviour is that a program will work fine for some
simple cases, but fail on complex ones. The problem arises if you use ==
to test for String equality where you should have used .equals.
The wrong code will still work much of the time because many of the Strings being
compared are literals, which are automatically interned.
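The sketch below (hypothetical class name GotchaDemo) shows why the bug hides. A comparison with == succeeds when both sides are literals or constant expressions, but fails for an equal String assembled at run time; the helper runtimeHello is an illustrative stand-in for user input or file data. Using .equals, or interning explicitly, gives the right answer in both cases.

```java
public class GotchaDemo
{
    /** Stands in for any String produced at run time (user input, file data, ...). */
    static String runtimeHello()
    {
        String part = "hel"; // not final, so part + "lo" is not a constant expression
        return part + "lo";  // per the JLS, a newly created String
    }

    public static void main( String[] args )
    {
        String literal = "hello";
        String constant = "hel" + "lo"; // constant expression: folded and interned by the compiler
        String runtime = runtimeHello();

        System.out.println( literal == constant );         // true: both are the interned literal
        System.out.println( literal == runtime );          // false: a different object
        System.out.println( literal.equals( runtime ) );   // true: same characters
        System.out.println( literal == runtime.intern() ); // true: intern() returns the canonical copy
    }
}
```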
Intern and new String(String)
Newbies often say foolish things like
String s = new String( "hello" );
instead of
String s = "hello";
This is the opposite of interning. You are deliberately creating a duplicate,
unique "hello" String object. There are
two legitimate uses for doing that:
-
To provide a unique String synchronization object.
-
To unencumber the huge base String in which a substring is embedded. On JDKs
where substring shares its base string's characters (before Java 7 update 6),
making a copy with new String(String) lets the original string be garbage
collected. It can pay to use new String(String)
if you have only a few short substrings into a common mother base string. Then
garbage collection can let go of the mother string. If you have a large number
of substrings so that the entire mother string is represented in some substring,
then there is no point in doing that. It is more efficient to just reference
into the common mother string with the substring.
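A short sketch of both uses, with illustrative names (NewStringDemo, keepPrefix). The lock field is a private monitor that no other code can share by accident, unlike a literal, which is interned and visible to every class using the same text. The keepPrefix copy only matters on JDKs before Java 7 update 6; on later JDKs substring already copies, so the extra new String is harmless but redundant.

```java
public class NewStringDemo
{
    // A literal such as "lock" is interned and shared with every class that
    // uses the same text; new String gives this class its own unique monitor.
    private final Object lock = new String( "lock" );

    private int counter;

    /** Increments under the private lock. */
    void increment()
    {
        synchronized ( lock )
        {
            counter++;
        }
    }

    int count()
    {
        synchronized ( lock )
        {
            return counter;
        }
    }

    /**
     * Keeps a short prefix of a huge string. On JDKs before 7u6 the copy lets
     * the huge base string be collected; on later JDKs substring already copies.
     */
    static String keepPrefix( String huge, int n )
    {
        return new String( huge.substring( 0, n ) );
    }

    public static void main( String[] args )
    {
        String copy = new String( "hello" );
        System.out.println( copy == "hello" );          // false: a distinct object
        System.out.println( copy.equals( "hello" ) );   // true
        System.out.println( copy.intern() == "hello" ); // true: back to the canonical copy

        NewStringDemo demo = new NewStringDemo();
        demo.increment();
        System.out.println( demo.count() );                    // 1
        System.out.println( keepPrefix( "hippopotamus", 5 ) ); // hippo
    }
}
```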
Is new String(String) compelled to create a brand new
String object? You
might imagine a clever JVM that always interned every new
String or that simply passed back the original reference, treating it as
a no-op. The language specification says that it is in fact compelled: new
String must return a new, unique reference. However, the JVM can still
avoid making a physical copy of the characters by letting the new String object
share the original's underlying character array, which is what OpenJDK's
String(String) constructor does.
This brings up yet another related question. Is s == s.substring(0)
compelled to be false? No. The String.substring documentation makes no such
promise, and OpenJDK's implementation simply returns s itself when the whole
string is requested, so s == s.substring(0) is true there.
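A quick check of both behaviors; the class name IdentityDemo is illustrative. The new String result is guaranteed by the language specification to be a distinct object, while the substring(0) result merely happens to be the same object on OpenJDK-based runtimes, which is an assumption about the runtime rather than a rule.

```java
public class IdentityDemo
{
    public static void main( String[] args )
    {
        String s = "hippopotamus";

        // Guaranteed by the JLS: a class instance creation expression yields a new object.
        String copy = new String( s );
        System.out.println( copy == s );        // false
        System.out.println( copy.equals( s ) ); // true

        // Not guaranteed either way; OpenJDK returns s itself here.
        String sub = s.substring( 0 );
        System.out.println( sub == s );         // true on OpenJDK
        System.out.println( sub.equals( s ) );  // true
    }
}
```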
One other place you will see new String used legitimately is:
String password = new String( jpassword.getPassword() );
JPasswordField.getPassword returns a char[], not a String,
so this is not the silliness it first appears to be. It returns a char[] to permit
you to blank the array after use in high-security
situations.
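A minimal sketch of the blanking pattern, without Swing. The class PasswordDemo and the method checkAndWipe are hypothetical; the caller is assumed to have obtained the char[] from something like jpassword.getPassword(). The array is wiped with Arrays.fill in a finally block. Note that a String made from the array, as in the line above, cannot be wiped this way, because Strings are immutable.

```java
import java.util.Arrays;

public class PasswordDemo
{
    /**
     * Hypothetical check that works on the char[] directly, never creating a
     * String, then blanks the array whether or not the check succeeds.
     */
    static boolean checkAndWipe( char[] pw, char[] expected )
    {
        try
        {
            return Arrays.equals( pw, expected );
        }
        finally
        {
            Arrays.fill( pw, '\0' ); // blank the secret so it does not linger in memory
        }
    }

    public static void main( String[] args )
    {
        char[] pw = { 's', 'e', 'c', 'r', 'e', 't' }; // stands in for getPassword()
        char[] expected = { 's', 'e', 'c', 'r', 'e', 't' };
        System.out.println( checkAndWipe( pw, expected ) );    // true
        System.out.println( Arrays.equals( pw, new char[ 6 ] ) ); // true: all '\0' now
    }
}
```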
Intern and Garbage Collection
In the early JDKs, any string you interned could never be garbage collected
because the JVM had to keep a reference to it in its hash table
so it could check each incoming string to see if it already had it in the pool.
With JDK 1.2 came weak references. Now interned strings that are no longer
referenced anywhere else can be garbage collected.
Overflow
java.lang.OutOfMemoryError: string intern table overflow
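means you have too many interned strings. Some older JVMs may limit you to 64K
Strings, which leaves perhaps 50,000 for your application. The IBM Java 1.1.8
JRE has this limit. Note that this is an Error, not an Exception, so if you want
to catch it you must catch OutOfMemoryError (or Error or Throwable); a
catch ( Exception e ) clause will not see it. A simple program such as InternTest,
which interns a large number of distinct Strings, will show whether your JVM has
such a limit.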
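The original InternTest source is not reproduced in this text. The following is a hypothetical stand-in with a different name, InternStress, not the original program. It interns a configurable number of distinct Strings (default 100,000, an arbitrary choice), keeps them reachable so they stay in the pool, and catches OutOfMemoryError, which a catch of Exception would miss. On a modern JVM it should simply report success.

```java
import java.util.ArrayList;
import java.util.List;

public class InternStress
{
    /**
     * Interns count distinct Strings, keeping them reachable.
     * Returns how many were interned before any OutOfMemoryError.
     */
    static int internMany( int count )
    {
        List<String> keep = new ArrayList<>( count );
        try
        {
            for ( int i = 0; i < count; i++ )
            {
                // Each String is built at run time, so intern() must add a new pool entry.
                keep.add( ( "s" + i ).intern() );
            }
        }
        catch ( OutOfMemoryError e )
        {
            // An Error, not an Exception: catch ( Exception e ) would not see it.
            System.out.println( "Failed after " + keep.size() + " Strings: " + e.getMessage() );
        }
        return keep.size();
    }

    public static void main( String[] args )
    {
        int count = args.length > 0 ? Integer.parseInt( args[ 0 ] ) : 100_000;
        int done = internMany( count );
        System.out.println( "Interned " + done + " of " + count + " distinct Strings" );
    }
}
```

Pass a larger count on the command line, e.g. java InternStress 1000000, to push an older JVM toward its limit.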