29 December 2011

CLR Optimizations - String Interning

In my previous post, I explained how the immutable nature of strings can hurt the performance of your application. In this post, I am going to explain how the CLR optimizes string handling through a technique called string interning.

To begin with, let's take a look at the simple program below.

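using System;

class Program
{
    static void Main()
    {
        String s1 = "tiger";
        String s2 = "tiger";

        // Compare the values of s1 and s2.
        bool valuesEqual = String.Equals(s1, s2);

        // Compare the references of s1 and s2.
        bool referenceEqual = Object.ReferenceEquals(s1, s2);

        Console.WriteLine("valuesEqual = " + valuesEqual);
        Console.WriteLine("referenceEqual = " + referenceEqual);
    }
}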

When you execute this program, valuesEqual will be true, which is expected since both s1 and s2 contain the same value "tiger". What about the value of the referenceEqual variable? There is a twist here. You would expect referenceEqual to be false because s1 and s2 look like completely different objects, and hence their references should be different. But wait, the value of referenceEqual will also be true!

To proceed further, just change the value of s2 to something else, say "lion", and run your program. Now valuesEqual is false, which is expected, and referenceEqual is false too. If you are wondering why referenceEqual was true in the first case, the answer is that it was the result of string interning, an optimization technique adopted by the CLR for string handling. So, let's understand what string interning is.

When you run your application (say, the program above), the CLR creates an internal hash table, known as the intern pool. Initially the hash table is empty. Then string s1 is created on the heap with the value "tiger", and an entry is added to the hash table whose key is "tiger" and whose value is the reference to the string object created on the heap.




You can see that s1 is created on the heap and the address (reference) of the object, say 0x100, is stored in the hash table under the key "tiger". Then the CLR reaches the second statement, String s2 = "tiger";. Instead of creating a new string object on the heap, it first searches the hash table for the key "tiger", and it will definitely find an entry. This means that a string "tiger" already exists on the heap at reference 0x100. Hence, the CLR simply stores the reference from the hash table in s2. This way, the creation of a new object is avoided, which saves memory.



Later, if s2 is assigned a different value, say "lion", the CLR will first search for the key "lion" in the hash table. It will not find an entry, so a new String object will be created on the heap for "lion". A new entry will also be added to the hash table, with "lion" as the key and the reference to the new object on the heap as the value.
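The lookup-then-insert behaviour described above can be imitated with an ordinary dictionary. The sketch below is only a toy model to make the idea concrete; it is not how the CLR implements its own table, and the names ToyInternPool and ToyInternPoolDemo are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of an intern table: key = string value,
// value = reference to the first string object seen with that value.
public static class ToyInternPool
{
    private static readonly Dictionary<string, string> table =
        new Dictionary<string, string>();

    public static string Intern(string value)
    {
        string existing;
        if (table.TryGetValue(value, out existing))
        {
            // An equal string is already known: hand back its reference.
            return existing;
        }

        // First time we see this value: remember this object.
        table.Add(value, value);
        return value;
    }

    public static int Count
    {
        get { return table.Count; }
    }
}

public static class ToyInternPoolDemo
{
    public static void Main()
    {
        // new string(char[]) allocates a separate object on each call.
        string a = new string(new[] { 't', 'i', 'g', 'e', 'r' });
        string b = new string(new[] { 't', 'i', 'g', 'e', 'r' });
        Console.WriteLine(Object.ReferenceEquals(a, b));   // False

        string s1 = ToyInternPool.Intern(a);
        string s2 = ToyInternPool.Intern(b);
        Console.WriteLine(Object.ReferenceEquals(s1, s2)); // True

        // A value that is not in the table yet gets its own entry.
        ToyInternPool.Intern("lion");
        Console.WriteLine(ToyInternPool.Count);            // 2
    }
}
```

Both lookups for "tiger" return the object that was stored first, so only one "tiger" entry ever lives in the table, while "lion" gets a second entry of its own.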



Pretty interesting, right? By adopting the string interning mechanism, the CLR efficiently controls the creation of strings. If you think this feature is useful and want to take advantage of it, refer to the MSDN documentation for the String class's static methods Intern and IsInterned.
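As a rough illustration of those two methods, here is a minimal sketch. It assumes that a string built at run time (here with StringBuilder) is a separate object that is not added to the intern table automatically; the class name InternDemo and the helper BuildAtRuntime are made up for this example.

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Hypothetical helper: builds a fresh string object at run time.
    public static string BuildAtRuntime(string value)
    {
        return new StringBuilder().Append(value).ToString();
    }

    public static void Main()
    {
        string literal = "tiger";
        string built = BuildAtRuntime("tiger");

        // Same value, but 'built' is a separate object on the heap.
        Console.WriteLine(String.Equals(literal, built));
        Console.WriteLine(Object.ReferenceEquals(literal, built));

        // String.Intern returns the reference held in the intern table
        // for this value, adding the string first if it is not there yet.
        string interned = String.Intern(built);
        Console.WriteLine(Object.ReferenceEquals(literal, interned));

        // String.IsInterned returns null when no equal string is in the table.
        string unique = BuildAtRuntime("tiger-" + Guid.NewGuid());
        Console.WriteLine(String.IsInterned(unique) == null);
    }
}
```

Calling String.Intern twice with equal values always gives back the same reference, which is what makes a cheap reference comparison possible afterwards.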

Apart from the advantages described above, this mechanism also has some disadvantages. The additional overhead of creating and maintaining the hash table, along with repeated hash table lookups, can hurt the performance of your application. If you think string interning hurts your application's performance, you can ask for this feature to be turned off by applying the assembly-level attribute CompilationRelaxationsAttribute with the value CompilationRelaxations.NoStringInterning. But there is a catch here. Even if you supply this attribute, the CLR may ignore it and still use string interning. So, just be aware of this.
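For reference, applying the attribute could look like the sketch below. The class name NoInterningDemo is made up for this example, and no expected output is shown because, as noted above, the CLR is free to ignore the request.

```csharp
using System;
using System.Runtime.CompilerServices;

// Tells the runtime that this assembly does not require string literals to be interned.
[assembly: CompilationRelaxations(CompilationRelaxations.NoStringInterning)]

public static class NoInterningDemo
{
    public static bool LiteralsShareReference()
    {
        string s1 = "tiger";
        string s2 = "tiger";
        return Object.ReferenceEquals(s1, s2);
    }

    public static void Main()
    {
        // The result depends on whether the CLR honours the attribute.
        Console.WriteLine(LiteralsShareReference());
    }
}
```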
