Recently nullable reference types have become trendy. Meanwhile, the good old nullable value types are still here and actively used. How well do you remember the nuances of working with them? Let's jog your memory or test your knowledge by reading this article. Examples of C# and IL code, references to the CLI specification, and CoreCLR code are provided. Let's start with an interesting case.
Note. If you are interested in nullable reference types, you can read several articles by my colleagues: "Nullable Reference types in C# 8.0 and static analysis", "Nullable Reference will not protect you, and here is the proof".
Take a look at the sample code below and say what will be printed to the console. And, just as importantly, why. Let's agree right away that you will answer as is: without compiler hints, documentation, reading literature, or anything like that. :)
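A minimal sketch of the kind of sample the discussion refers to. The variable names a, aObj, b and bObj come from the text; the class and method names, and the assumption that b is created with new int?(), are illustrative.

```csharp
using System;

public static class NullableQuiz
{
    // Returns the value the reader is asked to predict.
    public static bool Run()
    {
        int? a = null;
        object aObj = a;

        int? b = new int?();
        object bObj = b;

        return Object.ReferenceEquals(aObj, bObj);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // True or False?
    }
}
```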
Well, let's do some thinking. Let's walk through the main lines of reasoning that may come to mind.
1. Assume that int? is a reference type.
Let's suppose that int? is a reference type. In this case, null will be stored in a, and it will also be stored in aObj after assignment. A reference to an object will be stored in b. It will also be stored in bObj after assignment. As a result, Object.ReferenceEquals will take null and a non-null reference to the object as arguments, so...
Needless to say, the answer is False!
2. Assume that int? is a value type.
Or maybe you doubt that int? is a reference type? And you are sure of this, despite the int? a = null expression? Well, let's come at it from the other side and start from the fact that int? is a value type.
In this case, the expression int? a = null looks a bit strange, but let's assume that C# got some extra syntactic sugar. It turns out that a stores an object. So does b. When the aObj and bObj variables are initialized, the objects stored in a and b will be boxed, resulting in different references being stored in aObj and bObj. So, in the end, Object.ReferenceEquals takes references to different objects as arguments, therefore...
Needless to say, the answer is False!
3. We assume that Nullable<T> is used here.
Let's say you didn't like the options above, because you know perfectly well that there is no int?, but there is a value type Nullable<T>, and in this case Nullable<int> will be used. You also realize that a and b will actually hold identical objects. You also remember that storing values in aObj and bObj will result in boxing. In the end, we'll get references to different objects. Since Object.ReferenceEquals gets references to different objects...
Needless to say, the answer is False!
4. ;)
For those who started from value types - if a suspicion crept into your mind about comparing references, you can view the documentation for Object.ReferenceEquals at docs.microsoft.com. In particular, it also touches on the topic of value types and boxing/unboxing. It describes the case where instances of value types are passed directly to the method, whereas we did the boxing separately, but the main point is the same.
When comparing value types, if objA and objB are value types, they are boxed before they are passed to the ReferenceEquals method. This means that if both objA and objB represent the same instance of a value type, the ReferenceEquals method nevertheless returns false, as the following example shows.
Here we could have ended the article, but the thing is that... the correct answer is True.
Well, let's figure it out.
Investigation
There are two ways - simple and interesting.
Simple way
int? is Nullable<int>. Open the documentation on Nullable<T> and look at the "Boxing and Unboxing" section. Well, that's all: the behavior is described there. But if you want more details, welcome to the interesting path. ;)
Interesting way
The documentation won't be enough on this path. It describes the behavior, but does not answer the question "why?".
What actually are int? and null in the given context? Why does it work like this? Are different instructions used in the IL code or not? Is the behavior different at the CLR level? Is it another kind of magic?
Let's start by analyzing the int? entity to recall the basics, and gradually get to the analysis of the initial case. Since C# is a rather "sugary" language, we will sometimes refer to the IL code to get to the bottom of things (yes, C# documentation is not our cup of tea today).
int?, Nullable<T>
Here we will look at the basics of nullable value types in general: what they are, what they are compiled into in IL, etc. The answer to the question from the case at the very beginning of the article is discussed in the next section.
Let's look at the following code fragment:
Although the initialization of these variables looks different in C#, the same IL code will be generated for all of them.
As you can see, in C# everything is heartily flavored with syntactic sugar for our greater good. But in fact:
- int? is a value type.
- int? is the same as Nullable<int>. The IL code works with Nullable<int32>.
- int? aVal = null is the same as Nullable<int> aVal = new Nullable<int>(). In IL, this is compiled to an initobj instruction that performs default initialization at the loaded address.
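The points above can be checked at the C# level with a short sketch. The four declarations mirror the kinds of initialization the list talks about; all names except aVal are assumptions made for illustration.

```csharp
using System;

public static class DefaultInit
{
    // Each variable is default-initialized in a different-looking way.
    public static bool[] HasValues()
    {
        int? aVal = null;
        int? bVal = new int?();
        Nullable<int> cVal = null;
        Nullable<int> dVal = new Nullable<int>();

        return new[] { aVal.HasValue, bVal.HasValue, cVal.HasValue, dVal.HasValue };
    }

    public static void Main()
    {
        // Every entry is False: none of the four variables holds a value.
        foreach (bool hasValue in HasValues())
        {
            Console.WriteLine(hasValue);
        }
    }
}
```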
Let's consider this code:
We're done with the default initialization - we saw the related IL code above. What happens here when we want to initialize aVal with the value 62?
Look at the IL code:
Again, nothing complicated - the address of aVal is pushed onto the evaluation stack, along with the value 62. Then the constructor with the signature Nullable<T>(T) is called. In other words, the following two statements will be completely identical:
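A small sketch of that equivalence, checked at the C# level rather than in IL. The class, method and second variable names are illustrative and not taken from the original.

```csharp
using System;

public static class ValueInit
{
    // Builds the same Nullable<int> in two ways and compares the results.
    public static bool SameState()
    {
        int? aVal = 62;                      // sugar
        int? bVal = new Nullable<int>(62);   // explicit constructor call

        return aVal.HasValue == bVal.HasValue && aVal.Value == bVal.Value;
    }

    public static void Main()
    {
        Console.WriteLine(SameState()); // True
    }
}
```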
You can also see this after checking out the IL code again:
And what about the checks? What does this code represent?
That's right, for better understanding, we will again refer to the corresponding IL code.
As you may have guessed, there is actually no null - all that happens is accessing the Nullable<T>.HasValue property. In other words, the same logic in C# can be written more explicitly in terms of the entities used, as follows.
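A hedged sketch of the two spellings side by side. The method names are hypothetical; the point is only that both checks give the same answer for an empty and a non-empty value.

```csharp
using System;

public static class NullChecks
{
    // The "sugary" form: looks like a comparison with null.
    public static bool IsNullSugar(int? a)
    {
        return a == null;
    }

    // The explicit form: no null involved, only the HasValue property.
    public static bool IsNullExplicit(Nullable<int> a)
    {
        return !a.HasValue;
    }

    public static void Main()
    {
        Console.WriteLine(IsNullSugar(null) == IsNullExplicit(null)); // True
        Console.WriteLine(IsNullSugar(62) == IsNullExplicit(62));     // True
    }
}
```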
IL code:
Let's recap.
- Nullable value types are implemented using the Nullable<T> type;
- int? is actually a constructed type of the unbound generic value type Nullable<T>;
- int? a = null is the initialization of an object of Nullable<int> type with the default value, no null is actually present here;
- if (a == null) - again, there is no null, the Nullable<T>.HasValue property is accessed instead.
The source code of the Nullable<T> type can be viewed, for example, on GitHub in the dotnet/runtime repository (a direct link to the source code file). There's not much code there, so check it out just for kicks. From there, you can learn (or recall) the following facts.
For convenience, the Nullable<T> type defines:
- implicit conversion operator from T to Nullable<T>;
- explicit conversion operator from Nullable<T> to T.
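A short sketch of the two conversions in use. The helper names are assumptions for illustration. Note that the explicit conversion reads the Value property, so it throws InvalidOperationException when there is no value.

```csharp
using System;

public static class Conversions
{
    // Implicit conversion: T -> Nullable<T>, no cast needed.
    public static int? Wrap(int value)
    {
        int? wrapped = value;
        return wrapped;
    }

    // Explicit conversion: Nullable<T> -> T, a cast is required.
    public static int Unwrap(int? wrapped)
    {
        return (int)wrapped;
    }

    // The explicit conversion fails for an empty Nullable<T>.
    public static bool UnwrapEmptyThrows()
    {
        try
        {
            Unwrap(null);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Unwrap(Wrap(62)));   // 62
        Console.WriteLine(UnwrapEmptyThrows()); // True
    }
}
```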
The core logic is implemented by two fields (and corresponding properties):
- T value - the value itself, which Nullable<T> wraps;
- bool hasValue - the flag indicating "whether the wrapper contains a value". It's in quotation marks, since in fact Nullable<T> always contains a value of type T.
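The remark that a value of type T is always present can be observed through the public GetValueOrDefault methods. This is a sketch with illustrative helper names; it relies only on the documented behavior that an empty Nullable<T> yields default(T).

```csharp
using System;

public static class EmptyNullable
{
    // For an empty int?, the parameterless overload yields default(int).
    public static int StoredValueOfEmpty()
    {
        int? a = null;
        return a.GetValueOrDefault();
    }

    // The overload with an argument substitutes a fallback when HasValue is false.
    public static int WithFallback(int? a, int fallback)
    {
        return a.GetValueOrDefault(fallback);
    }

    public static void Main()
    {
        Console.WriteLine(StoredValueOfEmpty());   // 0
        Console.WriteLine(WithFallback(null, 5));  // 5
        Console.WriteLine(WithFallback(62, 5));    // 62
    }
}
```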
Now that we've refreshed our memory about nullable value types, let's see what's going on with the boxing.
Nullable<T> boxing
Let me remind you that when boxing an object of a value type, a new object will be created on the heap. The following code snippet illustrates this behavior:
The result of comparing the references is expected to be false. This is because there are 2 boxing operations, which create 2 objects whose references are stored in obj1 and obj2.
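A minimal sketch of this case. The names obj1 and obj2 come from the text; aVal and the class name are assumptions. The boxed objects are distinct references even though their contents are equal.

```csharp
using System;

public static class IntBoxing
{
    // Boxes the same int twice and compares the resulting references.
    public static bool SameReference()
    {
        int aVal = 62;
        object obj1 = aVal; // first boxing, first heap object
        object obj2 = aVal; // second boxing, second heap object

        return Object.ReferenceEquals(obj1, obj2);
    }

    // Value equality still holds for the two boxed objects.
    public static bool SameValue()
    {
        int aVal = 62;
        object obj1 = aVal;
        object obj2 = aVal;

        return obj1.Equals(obj2);
    }

    public static void Main()
    {
        Console.WriteLine(SameReference()); // False
        Console.WriteLine(SameValue());     // True
    }
}
```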
Now let's change int to Nullable<int>.
The result is expectedly false.
And now, instead of 62, we write the default value.
Aaand... the result is unexpectedly true. One might object that we have the same 2 boxing operations, two created objects and references to two different objects, but the result is true!
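Both Nullable<int> variants can be put side by side in a short sketch. The helper method and class names are illustrative; only the value passed in differs between the two calls.

```csharp
using System;

public static class NullableBoxing
{
    // Boxes the same Nullable<int> twice and compares the resulting references.
    public static bool SameReference(Nullable<int> aVal)
    {
        object obj1 = aVal;
        object obj2 = aVal;

        return Object.ReferenceEquals(obj1, obj2);
    }

    public static void Main()
    {
        Console.WriteLine(SameReference(62));                  // False
        Console.WriteLine(SameReference(new Nullable<int>())); // True
    }
}
```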
Yeah, it's probably sugar again, and something has changed at the IL code level! Let's see.
Example N1.
C# code:
IL code:
Example N2.
C# code:
IL code:
Example N3.
C# code:
IL code:
As we can see, in all cases boxing happens in the same way - values of local variables are pushed onto the evaluation stack (ldloc instruction). After that, the boxing itself is performed by the box instruction, which specifies the type being boxed.
Next we refer to the Common Language Infrastructure specification, see the description of the box instruction, and find an interesting note regarding nullable types:
If typeTok is a value type, the box instruction converts val to its boxed form. ... If it is a nullable type, this is done by inspecting val's HasValue property; if it is false, a null reference is pushed onto the stack; otherwise, the result of boxing val's Value property is pushed onto the stack.
This leads to several conclusions that dot the i's:
- the state of the Nullable<T> object is taken into account (the HasValue flag we discussed earlier is checked). If Nullable<T> does not contain a value (HasValue is false), the result of boxing is null;
- if Nullable<T> contains a value (HasValue is true), it is not a Nullable<T> object that is boxed, but an instance of type T that is stored in the value field of Nullable<T>;
- specific logic for handling Nullable<T> boxing is not implemented at the C# level or even at the IL level - it is implemented in the CLR.
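These conclusions can be observed from C# without looking at IL. In the sketch below (names are illustrative), the runtime type of a boxed non-empty int? is int, and a boxed value can be unboxed back into int?, including the null case.

```csharp
using System;

public static class BoxedNullable
{
    // The runtime type of a boxed non-empty int? is int, not Nullable<int>.
    public static Type BoxedType()
    {
        int? aVal = 62;
        object boxed = aVal;
        return boxed.GetType();
    }

    // Boxes an int? and unboxes the result back into an int?.
    public static int? RoundTrip(int? source)
    {
        object boxed = source;
        return (int?)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(BoxedType() == typeof(int)); // True
        Console.WriteLine(RoundTrip(62).HasValue);     // True
        Console.WriteLine(RoundTrip(null).HasValue);   // False
    }
}
```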
Let's go back to the examples with Nullable<T> that we touched upon above.
First:
The state of the instance before the boxing:
- T -> int;
- value -> 62;
- hasValue -> true.
The value 62 is boxed twice. As we remember, in this case instances of the int type are boxed, not Nullable<int>. Two new objects are created, two references to different objects are obtained, and comparing them yields false.
Second:
The state of the instance before the boxing:
- T -> int;
- value -> default (in this case, 0 - the default value for int);
- hasValue -> false.
Since hasValue is false, no objects are created. The boxing operation returns null, which is stored in the variables obj1 and obj2. Comparing these values is expected to return true.
In the original example, which was at the very beginning of the article, exactly the same thing happens:
For the sake of interest, let's look at the CoreCLR source code from the dotnet/runtime repository mentioned earlier. We are interested in the file object.cpp, specifically the Nullable::Box method, which contains the logic we need:
Here we have everything we discussed earlier. If we don't store a value, we return NULL:
Otherwise we initiate the boxing:
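As a rough C# model of the logic just described (this is not the CoreCLR code itself, and BoxLikeClr is a hypothetical helper name), the two branches look like this:

```csharp
using System;

public static class BoxModel
{
    // Hypothetical model of Nullable<T> boxing: null for an empty value,
    // otherwise a boxed T (never a boxed Nullable<T>).
    public static object BoxLikeClr<T>(T? source) where T : struct
    {
        if (!source.HasValue)
        {
            return null;
        }

        return source.Value; // boxes the underlying T
    }

    public static void Main()
    {
        Console.WriteLine(BoxLikeClr<int>(null) == null); // True
        Console.WriteLine(BoxLikeClr<int>(62) is int);    // True
    }
}
```

The model only mirrors the observable result; the real work happens inside the runtime when the box instruction is executed.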
Conclusion
You're welcome to show the example from the beginning of the article to your colleagues and friends just for kicks. Will they give the correct answer and justify it? If not, share this article with them. If they do it - well, kudos to them!
I hope it was a small but exciting adventure. :)
P.S. Someone might have a question: how did we happen to dig that deep into this topic? We were writing a new diagnostic rule in PVS-Studio related to Object.ReferenceEquals being called with arguments, one of which is of a value type. Suddenly it turned out that Nullable<T> has an unexpected subtlety in its boxing behavior. We looked at the IL code - there was nothing special about the box instruction. We checked the CLI specification - and gotcha! The case promised to be rather exceptional and noteworthy, so here's the article right in front of you.
P.P.S. By the way, recently I have been spending more time on Twitter, where I post interesting code snippets, retweet news from the .NET world, and so on. Feel free to look through it and follow me if you want (link to the profile).
Previously published at https://viva64.com/en/b/0772/