Type inference is a common feature among mainstream programming languages. The functional ones, like ML and Haskell, are pioneers in exploring this programming paradigm where the declaration of a variable or a function may omit a type annotation. Today, even classic imperative languages such as C++ offer, to a certain extent, a kind of type inference.
C# is not an exception to the ever-increasing popularity of inference-enabled languages: since C# 3.0, developers may declare implicitly typed variables with the
var
keyword (in C++ 11, a similar feature was introducedāāāre-targeting the then-existing auto
keyword). This functionality, however, falls under a provision: a declaration must be accompanied by an initializer.Although one would not expect to see ML and Haskellās āwhole-program type inferenceā in C#, could this initializer requirement be relaxed? Why does it exist, altogether?
- Is it due to any limitation or just a matter of taste?
- Could the variable's type be inferred from arbitrary expressions?
- If yes, what are trade-offs from a language-design perspective?
In this article, I will describe the foundation of a standard type inference technique and ramble about the questions above.
Non-initializing Declarations
What do you answer when someone asks: why is it necessary, in C#, that a local
var
-declared variable is accompanied with an initializer? The accurate reply to this question is: because the language specification says so; it is not because the type would not be inferable otherwise. Consider function f
.The twoĀ statementsĀ above (aĀ declaration-Ā and anĀ expression-statement) are not fundamentally different from the singleĀ statementĀ
var num = 42;
. In fact, if permitted, theĀ Intermediate LanguageĀ (IL) code produced for the former would probably be equal to that produced for the latter. Also, one may easily observe thatĀ int
Ā is a valid replacement forĀ var
Ā inĀ f
.To set the stage for our discussion, let us quote ā with minor adjustments for pedagogical reasons ā the relevant fragment of theĀ C# specificationĀ to whichĀ varĀ is subject:
āWhen the local variable type is specified as varā¦ the declaration is an implicitly typed local variable declaration, whose type is inferred from the type of the associated initializer expression.ā
Another characteristic of C# is that a variable must beĀ definitely assignedĀ before we attempt to obtain its value. Let us also quote the language specificationĀ fragmentĀ that states such mandatory assignment.
āFor an initially unassigned variable to be considered definitely assignedā¦ , an assignment to the variable must occur in every possible execution pathā¦ā
It is not my intent to enter the pedantic realm of language specifications. Yet, based on our (informal) observation thatĀ
var num; num = 42;
Ā is equivalent toĀ var num = 42;
, and the fact that a variable must be assigned before beingĀ used, could we slightly stretch the functionality provided byĀ var
?Hindley-Milner, Constraints, and Unification
In our small and trivial functionĀ
f
, it is clear thatĀ int
Ā is a valid type for the program variableĀ num
. But, in a general setup, reasoning about the typing relations among a set of expressions demands a systemic approach. The de facto type inference discipline adopted in typical functional languages is that ofĀ Hindley-Milnerās type system.Here, I will describe (at a superficial level) the core ideas behind a constraint-basedĀ variationĀ of Hindley-Milner, alongside a simplified version of a popularĀ constraint-generationĀ andĀ constraint-solvingĀ technique that builds on theĀ unification frameworkĀ to implement such a type system. The setup I present is more elaborate than what would be essential to our discussion aboutĀ
var
Ā ā to make it comprehensible in the event one would like to further extend it.A Constraint Language
A primary component of aĀ constraint-basedĀ inference technique is a constraint language. A constraint language is composed by its own syntactic elements, and by the types of the (programming) language which we are modelling; in our case, C# is theĀ modelledĀ language. We refer to a C# type (e.g., anĀ
int
Ā orĀ System.DateTime
) by š, and to aĀ type variableĀ , which is a type too, by š¶.TheĀ BNFĀ grammar of our constraint language is as follows. C is a constraint.
C ::=Ā defĀ fĀ : š¶Ā inĀ C |Ā āš¶.Ā C | CĀ ^Ā C | šĀ =Ā š
The first constraint form,Ā def, introduces a function identified byĀ f, whose type is š¶. (A function type is denoted with the classicalĀ arrow notation.) The second form,Ā ā, is anĀ existentialĀ constraint which indicates that š¶ āholdsā some type that may be employed in a constraint. TheĀ ^Ā operator, appearing in the third constraint form, isĀ logical conjunctionĀ ā think of it as theĀ & operatorĀ in C#. The last form is a constraint that expresses type equality; it is on this one that we will focus. Moreover, our constraint language is equipped with two builtin operators,Ā decl_type(š), which designates the declared type of a šariable, andĀ type_of(šµ), which designates the type of a šµiteral.
A constraint language must also have its semantics defined. But, except for the equality of types, I will not do so in this article ā that would entail a longer discussionā¦ and one that is expendable for this general presentation. Having said that, the semantic definition that we have for šĀ =Ā š is straightforward: that ofĀ mathematical equality.
Constraint Generation and Constraint Solving
Supplied with a constraint language, we mayĀ generateĀ the constraint associated with a program. This task is usually accomplished in aĀ syntax-directedĀ manner, i.e., we traverse the programāsĀ Abstract Syntax TreeĀ (AST) and, while doing so, generate a constraint that isĀ soundĀ in respect to theĀ modelledĀ language. Below is how a constraint forĀ
f
Ā could look like:defĀ
f
Ā :Ā string
Ā āĀ void
Ā in āš¶1.Ā decl_type(num
)Ā =Ā š¶1Ā ^Ā š¶1Ā =Ā type_of(42
)A constraint implies the typing of a program. But for it to be really useful,Ā type variablesĀ must be instantiated. This is achieved byĀ solvingĀ the constraint. Let us do that for the one generated forĀ
f
.- defĀ is intrinsic to scope; there are various ways to deal with it, e.g., throughĀ De Bruijn Indices. We assume that a function comprises a single naming environment and conflicts never happen. Also,Ā
Ā is not called recursively, none of theĀ expressionsĀ in its body mention the parameterĀf
, itsĀname
Ā return is uninteresting, and it contains noĀ local functionsĀ inside it. These allow us to reduce the previous constraint as follows.void
āš¶1.Ā decl_type(
)Ā =Ā š¶1Ā ^Ā š¶1Ā =Ā type_of(num
)42
- Because theĀ type variableĀ š¶1 does notĀ occur freeĀ in the constraint, we may eliminate the existential quantification.
decl_type(
)Ā =Ā š¶1Ā ^Ā š¶1Ā =Ā type_of(num
)42
- We now evaluate operator calls.Ā type_of(
) evaluates toĀ42
; forĀ decl_type(int
), we introduce aĀ freshĀ type variable for each program variable, and store the corresponding mappings in a set ā this set is referred to asĀ š.num
š¶2Ā =Ā š¶1Ā ^Ā š¶1Ā =Ā
Ā whereĀ šĀ = { (int
, š¶2) }num
At this point, our constraint is in aĀ normal formatĀ such that it is eligible for a final solving stage by an off-the-shelf framework.
Unification and Typing
What remains from the partly-solved constraint of functionĀ
f
Ā is two equalities. A solution to those is aĀ substitution, fromĀ type variablesĀ to types, that will ideally render the constraint in questionĀ satisfiable. This is exactly whatĀ unificationĀ provides us with. Specifically, as the result of a successful unification, we have a āspecialā substitution that is referred to as theĀ most general unifierĀ (mgu), which, in turn, suggests aĀ principal type.Provided with an implementation of unification, this is how to proceed.
- Initially, we unify the first constraint of the conjunction, š¶2Ā =Ā š¶1. The return is a substitution of š¶2 for š¶1, written as š = [š¶2 ā¦ š¶1].
- Next, š must beĀ appliedĀ both on the remaining constraint, š¶1Ā =Ā
, and onĀ š. The application of a substitution is denoted withĀ juxtaposition, therefore: šC = [š¶2 ā¦ š¶1]š¶1Ā =Āint
Ā and ššĀ = [š¶2 ā¦ š¶1]{ (Āint
, š¶2) }.num
a) The former operation has no effect, for that š¶2 does not appear further.
b) The latter, šš, results inĀ šĀ = { (
, š¶1) }.num
- Lastly, we move on to the constraint that follows, š¶1Ā =Ā
, and repeat the process from the beginning.int
This iteration goes on until the entire constraint is processed or an error happens, e.g., due to an attempt of unifying incompatible types likes an
int
Ā against aĀ string
. Our example terminates without any error, after the second iteration, when [š¶1 ā¦Ā int
] is computed andĀ appliedĀ onĀ š, leaving it as:šĀ = { (num,Ā int) }
The mapping above tells us thatĀ
int
Ā is a solution as the type of program variableĀ num
. If the constraint that we generated forĀ f
Ā is indeedĀ sound, then replacingĀ var
Ā forĀ int
Ā in the original source-code must yield a well typed C# program, according to the languageās typingĀ judgements.In aĀ formal semantics, typing judgements are defined by means of inference rulesĀ like this one, for a basic (symmetric) assignment.
The above rule says that, under the judgement of theĀ typing environment šŖ, whenever the type of an assignment is š, then the type of its left-hand-sideĀ expressionĀ and its right-hand-sideĀ expressionĀ are š as well. However, in aĀ constraint-basedĀ setting, the constraint ā or š in our formulation ā must be accounted as part of the environment too, enriching the rule as follows.
Back to the C# Universe
We have seen that, from a technical standpoint, it is possible to stretch the functionality of C#āsĀ
var
Ā a bit further. But at what ācostā? There are trade-offs to be aware of. To have a sense of those, let us pretend that we are extending C# into a fictitious language named C##. (We shall not āresolveā the double hash, since aĀ D languageĀ already exists ā pun intended.)Of course, I will not go into every possible language-design aspect here; I would not know all of them myself, as a matter of fact. So the upcoming sections are just a glimpse of the highlights.
The Dynamic Typing Confusion
Suppose that, in C##, we decided to accept the syntax of our originalĀ
f
function,Ā as-is. We know that the compiler would have no problem to infer the type ofĀ num
Ā asĀ int
, but would developers find that an intelligible approach? Consider this functionĀ g
, a slightly incrementedĀ f
.The error above may be obvious to some, but, apparently, not to all. This affirmation is in view of the fact that the introduction of theĀ
dynamic
keyword in C# 4.0 prompted aĀ confusion between said new keyword and var
. The developerās intention was perhaps to writeĀ g
Ā like this:In spite of the contrast betweenĀ
var
Ā andĀ dynamic
, as a language designer, one wants to avoid potential confusion. Therefore, it might indeed be sensible to forbid non-initializingĀ var
-declarations in C##. Howeverā¦ what if we demand, for a non-initializing implicitly typed variable, an extra level of consciousness from the developer? For example, that the keywordsĀ var
Ā andĀ static
Ā are combined, as in this C## version ofĀ g
.A collateral coolness of the above approach is that it reuses an existing keyword. This change is subtle. Whether or not it would effectively eliminate doubts for the general audience, I cannot tell, though. Toward āconsistencyā, we could also enableĀ
var dynamic
Ā as a synonym of the plainĀ dynamic
.Type Conversions, Subtyping, and Generics
The foremost advantage of requiring that aĀ
var
-declared variable is accompanied by anĀ initializerĀ is that there is no ambiguity in deciding what the variable's type is. By giving away this obligation, yet retaining the requirement that variables must be assigned prior to theirĀ use, our inference needs to account for further possibilities when typing a valid program.- Inferring a type that renders an individual assignment correct, but which fails to type another assignment to the same implicitly typed variable.
- Inferring multiple unequal types that render all assignments to the implicitly typed variable correct.
This situation is exemplified by functionĀ
conv
.The type of the literalĀ
42
Ā isĀ int
, but we may not typeĀ v
Ā as such; otherwise, the subsequence assignment would trigger a compilation error, given that the type of literalĀ 1.6180F
Ā isĀ float
, but noĀ coercion from float
Ā toĀ int
Ā is available. One correct alternative is to typeĀ v
Ā asĀ float
ā keep in mind that C# allows developers to defineĀ implicit/explicit type conversions.In the presence ofĀ subtype polymorphism, the situation is similar.
Now, a compilation error would be triggered if we inferred, based on the first assignment, the type ofĀ
w
Ā asĀ C
. To avoid this error, one may naively think that always inferring aĀ top typeĀ is a solution to this situation; leading toĀ A
Ā as the typeĀ w
, and, likewise,Ā double
Ā as the type ofĀ v
Ā in function conv
. But that strategy does not work in general, because there might be furtherĀ usesĀ of the implicitly typed variable that impose a specific type: a callĀ w.b()
Ā inĀ sub
, or a callĀ whoops(v)
Ā inĀ conv
, whereĀ whoops
Ā isĀ void whoops(float p) {}
.There are other language constructs that I do not mention in these basic examples that would also demand special care in the non-initializing behavior ofĀ
var static
-declared variables in C##. A notorious challenge isĀ parametric polymorphism; put into that bucketĀ covariance and contravarianceĀ as well, features that areĀ supported by C#ās generics.It is certainly possible to deal with many (maybe all) of the difficulties I just presented in a nice way. However, advances in our type inference technique and constraint language would be necessary ā alongside eventual syntax hints. A few to consider are:
- Type schemesĀ andĀ let-polymorphism, for fine-grained control over generalization and instantiationĀ of types.
- Constraints in the form of type inequalities, i.e., š ā¤ š, whose implied semantics may of a subtype: š <: š . Its interpretation, in turn, depend, among other matters, on whether the system isĀ nominalĀ orĀ structural.
Is it Worth It?
After all this rambling, the natural question is: would it be worth it or anyhow beneficial to incorporate, in C#, the feature we have been discussing about? The language specification could be amended in the following (loose) way:
āWhen the local variable type is specified as varā¦ the declaration is an implicitly typed local variable declaration, whose type is inferred from the type of the associated initializer expression;Ā if said variable is specified as var static, its type is inferred from assignment expressions involving it.ā
That being said ā and as a disclaimer ā , I do not feel in position to make a thorough evaluation of this āproposalā. Apart from the technicalities, a primary aspect to consider is the actual/practical benefit (if any) implied in such a feature. To my understanding (I am not affiliated to Microsoft),Ā
var
was originally conceived forĀ anonymous types. Yet, I believe that extending its functionality to non-anonymous types was likely a choice of convenience. Then, under the subjectiveness of what is convenientā¦, one may just as well wish to write the code as in functionĀ z
Ā below.Thank you for reading! I hope you have enjoyed the text ā if you did, I would be curious to hear your feedback; and even more grateful if you share this article further.
Read behind a paywall at https://medium.com/p/stretching-the-reach-of-implicitly-typed-variables-in-c-16882318a92