Is there any object-oriented static typed language with variables with few types? - programming-languages

I like reading about programming theories, so could you tell me if there is any object-oriented static typed language that allow variables to have a few types?
Example in pesudocode:
var value: BigInteger | Double | Nil
I think about way of calling methods on this object. If object value have type BigInteger | Double language could allow user to call only shared methods (lake plus, minus) but when the type is BigInteger | Double | Nil then object of Nil hasn't methods plus and minus, so we can't do anything usefull with this object, because it has only few shared methods (like toString).
So is there any idea how should work calling methods on variable with few types in static typed object-oriented language?

What you are describing is an intersection type. They do exist in Java, for example, but they only arise within the type-checker as the result of capture conversion and type-inference. You cannot write one yourself.
I don't know of any language which uses them directly, but they are often used to describe or analyze type systems of languages, espececially languages which don't actually have a type system. For example, Diamondback Ruby, which is a static type system and type-inferencer for the dynamically typed Ruby programming language, uses both union and intersection types.
Note that the syntax you are using is generally used to denote union types, which are the dual of intersection types. Intersection types are generally written A & B & C.

I am not aware of any language that does this... sadly, I'd love to play around with it (but first, they should adopt type inference and parametric polymorphism ;) ).
Although it is alreapossible: Relatively elegantly in a structural type system (type a is a subtype of type b if a has everything b has), simply by specifying a type for value that is a structural subtype of BigInteger and of Double and of Nil and slightly less elegantly in a nominative type system (type a is a subtype of type b if and only if it inherits from it, directly or indirectly) by specifying a common ancestor of all three (if all else fails, object). Of course we'd need to go recursive - what is the type of toString? And what's the typ of (Integer | Double | BigInteger).+?!? This is far from trivial (in fact, looking for a solution made my head hurt a bit). I can't say if it is impossible, but no mainly-OO-language's type system is anywhere sophisticated enough for a possible solution.
The bottom line is: It'd be really cool if some whizz came along and sorted out the issues it raises. Propably not worth the effort...
Edit: Do you know algebraic data types? They are similar to your idea (but much older ;) ) in that an algebraic data type is composed of several types and can therefore contain e.g. a BigInteger, a Double and Nil - the actual value is one of these and a tag (as in tagged union) says which. But to use the value stored in an algebraic data type, you have to use pattern matching to extract it safely. This concept is very powerful, and still "simple" enough to be understood tools - e.g. type inference and static typechecking work.

It has not much to do with OO but (as far as I understand it) what you describe looks much like polymorphism as implemented by C++.

Yes, OCaml has these in the form of polymorphic variants:
type my_var = Integer of int | Float of float;;
let x = Integer(10);;
let y = Float(3.14);;

Pike has them, as does Magpie, an optionally-typed language I'm working on. Google's Closure compiler for Javascript allows you to annotate types in Javascript using |.
They crop up frequently in languages that bridge static and dynamic typing because a lot of expressions in a dynamic language can yield one of a couple of types:
var a = 123;
if (foo) { a = "string"; }
bar(a);
The statically-determined type being passed to bar() is Number | String.

I'm not so sure if we really have a complete definition of what a static typed language is but I also hope that the language you describe wouldn't qualify as one.
One of my concerns is that if you add type T1 and T2 to be a part of your BigInteger | Double | Nil, how would they know about each other and how to handle the operations you defined? Now I realize you never said that the language would allow expanding the "implicit" conversion definition.
Come to think of it, C# does something that resembles this in its string handling
string s = -42 + '+' + "+" + -0.1 / -0.1 + "=" + (7 ^ 5) +
" is " + true + " and not " + AddressFamily.Unknown;
=> "1+1=2 is True and not Unknown"
string str = 1 + 2 + "!=" + 1 + 2;
=> "3!=12"
And I do not like it.

Related

Understanding recurive let expression in lambda calculus with Haskell, OCaml and nix language

I'm trying to understand how recursive set operate internally by comparing similar feature in another functional programming languages and concepts.
I can find it in wiki. In that, I need to know Y combinator, fixed point. I can get it briefly in wiki.
Then, now I start to apply this in Haskell.
Haskell
It is easy. But I want to know behind the scenes.
*Main> let x = y; y = 10; in x
10
When you write a = f b in a lazy functional language like Haskell or Nix, the meaning is stronger than just assignment. a and f b will be the same thing. This is usually called a binding.
I'll focus on a Nix example, because you're asking about recursive sets specifically.
A simple attribute set
Let's look at the initialization of an attribute set first. When the Nix interpreter is asked to evaluate this file
{ a = 1 + 1; b = true; }
it parses it and returns a data structure like this
{ a = <thunk 1>; b = <thunk 2>; }
where a thunk is a reference to the relevant syntax tree node and a reference to the "environment", which behaves like a dictionary from identifiers to their values, although implemented more efficiently.
Perhaps the reason we're evaluating this file is because you requested nix-build, which will not just ask for the value of a file, but also traverse the attribute set when it sees that it is one. So nix-build will ask for the value of a, which will be computed from its thunk. When the computation is complete, the memory that held the thunk is assigned the actual value, type = tInt, value.integer = 2.
A recursive attribute set
Nix has a special syntax that combines the functionality of attribute set construction syntax ({ }) and let-binding syntax. This is avoids some repetition when you're constructing attribute sets with some shared values.
For example
let b = 1 + 1;
in { b = b; a = b + 5; }
can be expressed as
rec { b = 1 + 1; a = b + 5; }
Evaluation works in a similar manner.
At first the evaluator returns a representation of the attribute set with all thunks, but this time the thunks reference an new environment that includes all the attributes, on top of the existing lexical scope.
Note that all these representations can be constructed while performing a minimal amount of work.
nix-build traverses attrsets in alphabetic order, so it will evaluate a first. It's a thunk that references the a + syntax node and an environment with b in it. Evaluating this requires evaluating the b syntax node (an ExprVar), which references the environment, where we find the 1 + 1 thunk, which is changed to a tInt of 2 as before.
As you can see, this process of creating thunks but only evaluating them when needed is indeed lazy and allows us to have various language constructs with their own scoping rules.
Haskell implementations usually follow a similar pattern, but may compile the code rather than interpret a syntax tree, and resolve all variable references to constant memory offsets completely. Nix tries to do this to some degree, but it must be able to fall back on strings because of the inadvisable with keyword that makes the scope dynamic.
I guess several things by myself.
In eagar evaluation language, I must declare before use it. So the order of declaration is simple.
int x = 10;
int y = x;
Just for Nix language
In wiki, there isn't any concept comparision with Haskell though let ... in is compared with Haskell.
lexical scope
all variables are lexically scoped.
mutual recursion
https://en.wikipedia.org/wiki/Let_expression#Mutually_recursive_let_expression

Alloy - Dealing with unbounded universal quantifiers

Good afternoon,
I've been experiencing an issue with Alloy when dealing with unbounded universal quantifiers. As explained in Daniel Jackson's book 'Software Abstractions' (Section 5.3 'Unbounded Universal Quantifiers'), Alloy has a subtle limitation regarding universal quantifiers and assertion checking. Alloy produces spurious counterexamples in some cases, such as the next one to check that sets are closed under union (shown in the aforementioned book):
sig Set {
elements: set Element
}
sig Element {}
assert Closed {
all s0, s1: Set | some s2: Set |
s2.elements = s0.elements + s1.elements
}
check Closed for 3
Producing a counterexample such as:
Set = {(S0),(S1)}
Element = {(E0),(E1)}
s0 = {(S0)}
s1 = {(S1)}
elements = {(S0,E0), (S1,E1)}
where the analyser didn't populate Set with enough values (a missing Set atom, S2, containing the union of S0 and S1).
Two solutions to this general problem are suggested then in the book:
1) Declaring a generator axiom to force Alloy to generate all possible instances.
For example:
fact SetGenerator{
some s: Set | no s.elements
all s: Set, e: Element |
some s': Set | s'.elements = s.elements + e
}
This solution, however, produces a scope explosion and may also lead to inconsistencies.
2) Omitting the generator axiom and using the bounded-universal form for constraints. That is, quantified variable's bounding expression doesn't mention the names of generated signatures. However, not every assertion can be expressed in such a form.
My question is: is there any specific rule to choose any of these solutions? It isn't clear to me from the book.
Thanks.
No, there's no specific rule (or at least none that I've come up with). In practice, this doesn't arise very often, so I would deal with each case as it comes up. Do you have a particular example in mind?
Also, bear in mind that sometimes you can formulate your problem with a higher order quantifier (ie a quantifier over a set or relation) and in that case you can use Alloy*, an extension of Alloy that supports higher order analysis.

meaning of Alloy predicate in relational join

Consider the following simple variant of the Address Book example
sig Name, Addr {}
sig Book { addr : Name -> Addr } // no lone on Addr
pred show(b:Book) { some n : Name | #addr[b,n] > 1 }
run show for exactly 2 Book, exactly 2 Addr, exactly 2 Name
In some model instances, I can get the following results in the evaluator
all b:Book | show[b]
--> yields false
some b:Book | show[b]
--> yields true
show[Book]
--> yields true
If show was a relation, then one might expect to get an answer like: { true, false }. Given that it is a predicate, a single Boolean value is returned. I would have expected show[Book] to be a shorthand for the universally quantified expression above it. Instead, it seems to be using existential quantification to fold the results. Anyone know what might be the rational for this, or have another explanation for the meaning of show[Book]?
(I'm not sure I have the correct words for this, so bear with me if this seems fuzzy.)
Bear in mind that all expressions in Alloy that denote individuals denote sets of individuals, and that there is no distinction available in the language between 'individual X' and 'the singleton set whose member is the individual X'. ([Later addendum:] In the terms more usually used: the general rule in Alloy's logic is that all values are relations. Binary relations are sets of pairs, n-ary relations sets of n-tuples, sets are unary relations, and scalars are singleton sets. See the discussion in sec. 3.2.2 of Software Abstractions, or the slide "Everything's a relation" in the Alloy Analyzer 4 tutorial by Greg Dennis and Rob Seater.)
Given the declaration you give of the 'show' predicate, it's easy to expect that the argument of 'show' should be a single Book -- or more correctly, a singleton set of Book --, and then to expect further that if the argument is not actually a singleton set (as in the expression show[Book] here) then the system will coerce it to being a singleton set, or interpret it with some sort of implicit existential or universal quantification. But in the declaration pred show(b:Book) ..., the expression b:Book just names an object b which will be a set of objects in the signature Book. (To require that b be a singleton set, write pred show(one b: Book) ....) The expression which constitutes the body of show is evaluated for b = Book just as readily as for b = Book$0.
The appearance of existential quantification is a consequence of the way the dot operator at the heart of the expression addr[b,n] (or equivalently n.(b.addr) is defined. Actually, if you experiment you'll find that show[Book] is true whenever there is any name for which the set of all books contains a mapping to two different addresses, even in cases where an existential interpretation would fail. Try adding this to your model, for example:
pred hmmmm { show[Book] and no b: Book | show[b] }
run hmmmm for exactly 2 Book, exactly 2 Addr, exactly 2 Name

Union types and Intersection types

What are the various use cases for union types and intersection types? There has been lately a lot of buzz about these type system features, yet somehow I have never felt need for either of these!
Union Types
To quote Robert Harper, "Practical Foundations for Programming
Languages", ch 15:
Most data structures involve
alternatives such as the distinction
between a leaf and an interior node in
a tree, or a choice in the outermost
form of a piece of abstract syntax.
Importantly, the choice determines the
structure of the value. For example,
nodes have children, but leaves do
not, and so forth. These concepts are
expressed by sum types, specifically
the binary sum, which offers a choice
of two things, and the nullary sum,
which offers a choice of no things.
Booleans
The simplest sum type is the Boolean,
data Bool = True
| False
Booleans have only two valid values, T or F. So instead of representing them as numbers, we can instead use a sum type to more accurately encode the fact there are only two possible values.
Enumerations
Enumerations are examples of more general sum types: ones with many, but finite, alternative values.
Sum types and null pointers
The best practically motivating example for sum types is discriminating between valid results and error values returned by functions, by distinguishing the failure case.
For example, null pointers and end-of-file characters are hackish encodings of the sum type:
data Maybe a = Nothing
| Just a
where we can distinguish between valid and invalid values by using the Nothing or Just tag to annotate each value with its status.
By using sum types in this way we can rule out null pointer errors entirely, which is a pretty decent motivating example. Null pointers are entirely due to the inability of older languages to express sum types easily.
Intersection Types
Intersection types are much newer, and their applications are not as widely understood. However, Benjamin Pierce's thesis ("Programming with Intersection Types
and Bounded Polymorphism") gives a good overview:
The most intriguing and potentially
useful property of intersection types
is their ability to express an
essentially unbounded (though of
course finite) amount of information
about the components of a program.
For
example, the addition function (+) can be
given the type Int -> Int -> Int ^ Real -> Real -> Real, capturing both the
general fact that the sum of two real
numbers is always a real and the more
specialized fact that the sum of two
integers is always an integer. A
compiler for a language with
intersection types might even provide
two different object-code sequences
for the two versions of (+), one using a
floating point addition instruction and
one using integer addition. For each
instance of+ in a program, the
compiler can decide whether both
arguments are integers and generate
the more efficient object code sequence
in this case.
This kind of finitary
polymorphism or coherent overloading
is so expressive, that ... the set of
all valid typings for a program
amounts to a complete characterization
of the program’s behavior
They let us encode a lot of information in the type, explaining via type theory what multiple inheritance means, giving types to type classes,
Union types are useful for typing dynamic languages or otherwise allowing more flexibility in the types passed around than most static languages allow. For example, consider this:
var a;
if (condition) {
a = "string";
} else {
a = 123;
}
If you have union types, it's easy to type a as int | string.
One use for intersection types is to describe an object that implements multiple interfaces. For example, C# allows multiple interface constraints on generics:
interface IFoo {
void Foo();
}
interface IBar {
void Bar();
}
void Method<T>(T arg) where T : IFoo, IBar {
arg.Foo();
arg.Bar();
}
Here, arg's type is the intersection of IFoo and IBar. Using that, the type-checker knows both Foo() and Bar() are valid methods on it.
If you want a more practice-oriented answer:
With union and recursive types you can encode regular tree types and therefore XML types.
With intersection types you can type BOTH overloaded functions and refinement types (what in a previous post is called coherent overloading)
So for instance you can write the function add (that overloads integer sum and string concatenation) as follows
let add ( (Int,Int)->Int ; (String,String)->String )
| (x & Int, y & Int) -> x+y
| (x & String, y & String) -> x#y ;;
Which has the intersection type
(Int,Int)->Int & (String,String)->String
But you can also refine the type above and type the function above as
(Pos,Pos) -> Pos &
(Neg,Neg) -> Neg &
(Int,Int)->Int &
(String,String)->String.
where Pos and Neg are positive and negative integer types.
The code above is executable in the language CDuce ( http://www.cduce.org ) whose type system includes union, intersections, and negation types (it is mainly targeted at XML transformations).
If you want to try it and you are on Linux, then it is probably included in your distribution (apt-get install cduce or yum install cduce should do the work) and you can use its toplevel (a la OCaml) to play with union and intersection types. On the CDuce site you will find a lot of practical examples of use of union and intersection types. And since there is a complete integration with OCaml libraries (you can import OCaml libraries in CDuce and export CDuce modules to OCaml) you can also check the correspondence with ML sum types (see here).
Here you are a complex example that mix union and intersection types (explained in the page "http://www.cduce.org/tutorial_overloading.html#val"), but to understand it you need to understand regular expression pattern matching, which requires some effort.
type Person = FPerson | MPerson
type FPerson = <person gender = "F">[ Name Children ]
type MPerson = <person gender = "M">[ Name Children ]
type Children = <children>[ Person* ]
type Name = <name>[ PCDATA ]
type Man = <man name=String>[ Sons Daughters ]
type Woman = <woman name=String>[ Sons Daughters ]
type Sons = <sons>[ Man* ]
type Daughters = <daughters>[ Woman* ]
let fun split (MPerson -> Man ; FPerson -> Woman)
<person gender=g>[ <name>n <children>[(mc::MPerson | fc::FPerson)*] ] ->
(* the above pattern collects all the MPerson in mc, and all the FPerson in fc *)
let tag = match g with "F" -> `woman | "M" -> `man in
let s = map mc with x -> split x in
let d = map fc with x -> split x in
<(tag) name=n>[ <sons>s <daughters>d ] ;;
In a nutshell it transforms values of type Person into values of type (Man | Women) (where the vertical bar denotes a union type) but keeping the correspondence between genres: split is a function with intersection type
MPerson -> Man & FPerson -> Woman
For instance with union types one could describe json domain model without introducing actual new classes but using only type aliases.
type JObject = Map[String, JValue]
type JArray = List[JValue]
type JValue = String | Number | Bool | Null | JObject | JArray
type Json = JObject | JArray
def stringify(json: JValue): String = json match {
case String | Number | Bool | Null => json.toString()
case JObject => "{" + json.map(x y => x + ": " + stringify(y)).mkStr(", ") + "}"
case JArray => "[" + json.map(stringify).mkStr(", ") + "]"
}

Best explanation for languages without null

Every so often when programmers are complaining about null errors/exceptions someone asks what we do without null.
I have some basic idea of the coolness of option types, but I don't have the knowledge or languages skill to best express it. What is a great explanation of the following written in a way approachable to the average programmer that we could point that person towards?
The undesirability of having references/pointers be nullable by default
How option types work including strategies to ease checking null cases such as
pattern matching and
monadic comprehensions
Alternative solution such as message eating nil
(other aspects I missed)
I think the succinct summary of why null is undesirable is that meaningless states should not be representable.
Suppose I'm modeling a door. It can be in one of three states: open, shut but unlocked, and shut and locked. Now I could model it along the lines of
class Door
private bool isShut
private bool isLocked
and it is clear how to map my three states into these two boolean variables. But this leaves a fourth, undesired state available: isShut==false && isLocked==true. Because the types I have selected as my representation admit this state, I must expend mental effort to ensure that the class never gets into this state (perhaps by explicitly coding an invariant). In contrast, if I were using a language with algebraic data types or checked enumerations that lets me define
type DoorState =
| Open | ShutAndUnlocked | ShutAndLocked
then I could define
class Door
private DoorState state
and there are no more worries. The type system will ensure that there are only three possible states for an instance of class Door to be in. This is what type systems are good at - explicitly ruling out a whole class of errors at compile-time.
The problem with null is that every reference type gets this extra state in its space that is typically undesired. A string variable could be any sequence of characters, or it could be this crazy extra null value that doesn't map into my problem domain. A Triangle object has three Points, which themselves have X and Y values, but unfortunately the Points or the Triangle itself might be this crazy null value that is meaningless to the graphing domain I'm working in. Etc.
When you do intend to model a possibly-non-existent value, then you should opt into it explicitly. If the way I intend to model people is that every Person has a FirstName and a LastName, but only some people have MiddleNames, then I would like to say something like
class Person
private string FirstName
private Option<string> MiddleName
private string LastName
where string here is assumed to be a non-nullable type. Then there are no tricky invariants to establish and no unexpected NullReferenceExceptions when trying to compute the length of someone's name. The type system ensures that any code dealing with the MiddleName accounts for the possibility of it being None, whereas any code dealing with the FirstName can safely assume there is a value there.
So for example, using the type above, we could author this silly function:
let TotalNumCharsInPersonsName(p:Person) =
let middleLen = match p.MiddleName with
| None -> 0
| Some(s) -> s.Length
p.FirstName.Length + middleLen + p.LastName.Length
with no worries. In contrast, in a language with nullable references for types like string, then assuming
class Person
private string FirstName
private string MiddleName
private string LastName
you end up authoring stuff like
let TotalNumCharsInPersonsName(p:Person) =
p.FirstName.Length + p.MiddleName.Length + p.LastName.Length
which blows up if the incoming Person object does not have the invariant of everything being non-null, or
let TotalNumCharsInPersonsName(p:Person) =
(if p.FirstName=null then 0 else p.FirstName.Length)
+ (if p.MiddleName=null then 0 else p.MiddleName.Length)
+ (if p.LastName=null then 0 else p.LastName.Length)
or maybe
let TotalNumCharsInPersonsName(p:Person) =
p.FirstName.Length
+ (if p.MiddleName=null then 0 else p.MiddleName.Length)
+ p.LastName.Length
assuming that p ensures first/last are there but middle can be null, or maybe you do checks that throw different types of exceptions, or who knows what. All these crazy implementation choices and things to think about crop up because there's this stupid representable-value that you don't want or need.
Null typically adds needless complexity. Complexity is the enemy of all software, and you should strive to reduce complexity whenever reasonable.
(Note well that there is more complexity to even these simple examples. Even if a FirstName cannot be null, a string can represent "" (the empty string), which is probably also not a person name that we intend to model. As such, even with non-nullable strings, it still might be the case that we are "representing meaningless values". Again, you could choose to battle this either via invariants and conditional code at runtime, or by using the type system (e.g. to have a NonEmptyString type). The latter is perhaps ill-advised ("good" types are often "closed" over a set of common operations, and e.g. NonEmptyString is not closed over .SubString(0,0)), but it demonstrates more points in the design space. At the end of the day, in any given type system, there is some complexity it will be very good at getting rid of, and other complexity that is just intrinsically harder to get rid of. The key for this topic is that in nearly every type system, the change from "nullable references by default" to "non-nullable references by default" is nearly always a simple change that makes the type system a great deal better at battling complexity and ruling out certain types of errors and meaningless states. So it is pretty crazy that so many languages keep repeating this error again and again.)
The nice thing about option types isn't that they're optional. It is that all other types aren't.
Sometimes, we need to be able to represent a kind of "null" state. Sometimes we have to represent a "no value" option as well as the other possible values a variable may take. So a language that flat out disallows this is going to be a bit crippled.
But often, we don't need it, and allowing such a "null" state only leads to ambiguity and confusion: every time I access a reference type variable in .NET, I have to consider that it might be null.
Often, it will never actually be null, because the programmer structures the code so that it can never happen. But the compiler can't verify that, and every single time you see it, you have to ask yourself "can this be null? Do I need to check for null here?"
Ideally, in the many cases where null doesn't make sense, it shouldn't be allowed.
That's tricky to achieve in .NET, where nearly everything can be null. You have to rely on the author of the code you're calling to be 100% disciplined and consistent and have clearly documented what can and cannot be null, or you have to be paranoid and check everything.
However, if types aren't nullable by default, then you don't need to check whether or not they're null. You know they can never be null, because the compiler/type checker enforces that for you.
And then we just need a back door for the rare cases where we do need to handle a null state. Then an "option" type can be used. Then we allow null in the cases where we've made a conscious decision that we need to be able to represent the "no value" case, and in every other case, we know that the value will never be null.
As others have mentioned, in C# or Java for example, null can mean one of two things:
the variable is uninitialized. This should, ideally, never happen. A variable shouldn't exist unless it is initialized.
the variable contains some "optional" data: it needs to be able to represent the case where there is no data. This is sometimes necessary. Perhaps you're trying to find an object in a list, and you don't know in advance whether or not it's there. Then we need to be able to represent that "no object was found".
The second meaning has to be preserved, but the first one should be eliminated entirely. And even the second meaning should not be the default. It's something we can opt in to if and when we need it. But when we don't need something to be optional, we want the type checker to guarantee that it will never be null.
All of the answers so far focus on why null is a bad thing, and how it's kinda handy if a language can guarantee that certain values will never be null.
They then go on to suggest that it would be a pretty neat idea if you enforce non-nullability for all values, which can be done if you add a concept like Option or Maybe to represent types that may not always have a defined value. This is the approach taken by Haskell.
It's all good stuff! But it doesn't preclude the use of explicitly nullable / non-null types to achieve the same effect. Why, then, is Option still a good thing? After all, Scala supports nullable values (is has to, so it can work with Java libraries) but supports Options as well.
Q. So what are the benefits beyond being able to remove nulls from a language entirely?
A. Composition
If you make a naive translation from null-aware code
def fullNameLength(p:Person) = {
val middleLen =
if (null == p.middleName)
p.middleName.length
else
0
p.firstName.length + middleLen + p.lastName.length
}
to option-aware code
def fullNameLength(p:Person) = {
val middleLen = p.middleName match {
case Some(x) => x.length
case _ => 0
}
p.firstName.length + middleLen + p.lastName.length
}
there's not much difference! But it's also a terrible way to use Options... This approach is much cleaner:
def fullNameLength(p:Person) = {
val middleLen = p.middleName map {_.length} getOrElse 0
p.firstName.length + middleLen + p.lastName.length
}
Or even:
def fullNameLength(p:Person) =
p.firstName.length +
p.middleName.map{length}.getOrElse(0) +
p.lastName.length
When you start dealing with List of Options, it gets even better. Imagine that the List people is itself optional:
people flatMap(_ find (_.firstName == "joe")) map (fullNameLength)
How does this work?
//convert an Option[List[Person]] to an Option[S]
//where the function f takes a List[Person] and returns an S
people map f
//find a person named "Joe" in a List[Person].
//returns Some[Person], or None if "Joe" isn't in the list
validPeopleList find (_.firstName == "joe")
//returns None if people is None
//Some(None) if people is valid but doesn't contain Joe
//Some[Some[Person]] if Joe is found
people map (_ find (_.firstName == "joe"))
//flatten it to return None if people is None or Joe isn't found
//Some[Person] if Joe is found
people flatMap (_ find (_.firstName == "joe"))
//return Some(length) if the list isn't None and Joe is found
//otherwise return None
people flatMap (_ find (_.firstName == "joe")) map (fullNameLength)
The corresponding code with null checks (or even elvis ?: operators) would be painfully long. The real trick here is the flatMap operation, which allows for the nested comprehension of Options and collections in a way that nullable values can never achieve.
Since people seem to be missing it: null is ambiguous.
Alice's date-of-birth is null. What does it mean?
Bob's date-of-death is null. What does that mean?
A "reasonable" interpretation might be that Alice's date-of-birth exists but is unknown, whereas Bob's date-of-death does not exist (Bob is still alive). But why did we get to different answers?
Another problem: null is an edge case.
Is null = null?
Is nan = nan?
Is inf = inf?
Is +0 = -0?
Is +0/0 = -0/0?
The answers are usually "yes", "no", "yes", "yes", "no", "yes" respectively. Crazy "mathematicians" call NaN "nullity" and say it compares equal to itself. SQL treats nulls as not equal to anything (so they behave like NaNs). One wonders what happens when you try to store ±∞, ±0, and NaNs into the same database column (there are 253 NaNs, half of which are "negative").
To make matters worse, databases differ in how they treat NULL, and most of them aren't consistent (see NULL Handling in SQLite for an overview). It's pretty horrible.
And now for the obligatory story:
I recently designed a (sqlite3) database table with five columns a NOT NULL, b, id_a, id_b NOT NULL, timestamp. Because it's a generic schema designed to solve a generic problem for fairly arbitrary apps, there are two uniqueness constraints:
UNIQUE(a, b, id_a)
UNIQUE(a, b, id_b)
id_a only exists for compatibility with an existing app design (partly because I haven't come up with a better solution), and is not used in the new app. Because of the way NULL works in SQL, I can insert (1, 2, NULL, 3, t) and (1, 2, NULL, 4, t) and not violate the first uniqueness constraint (because (1, 2, NULL) != (1, 2, NULL)).
This works specifically because of how NULL works in a uniqueness constraint on most databases (presumably so it's easier to model "real-world" situations, e.g. no two people can have the same Social Security Number, but not all people have one).
FWIW, without first invoking undefined behaviour, C++ references cannot "point to" null, and it's not possible to construct a class with uninitialized reference member variables (if an exception is thrown, construction fails).
Sidenote: Occasionally you might want mutually-exclusive pointers (i.e. only one of them can be non-NULL), e.g. in a hypothetical iOS type DialogState = NotShown | ShowingActionSheet UIActionSheet | ShowingAlertView UIAlertView | Dismissed. Instead, I'm forced to do stuff like assert((bool)actionSheet + (bool)alertView == 1).
The undesirability of having having references/pointers be nullable by default.
I don't think this is the main issue with nulls, the main issue with nulls is that they can mean two things:
The reference/pointer is uninitialized: the problem here is the same as mutability in general. For one, it makes it more difficult to analyze your code.
The variable being null actually means something: this is the case which Option types actually formalize.
Languages which support Option types typically also forbid or discourage the use of uninitialized variables as well.
How option types work including strategies to ease checking null cases such as pattern matching.
In order to be effective, Option types need to be supported directly in the language. Otherwise it takes a lot of boiler-plate code to simulate them. Pattern-matching and type-inference are two keys language features making Option types easy to work with. For example:
In F#:
//first we create the option list, and then filter out all None Option types and
//map all Some Option types to their values. See how type-inference shines.
let optionList = [Some(1); Some(2); None; Some(3); None]
optionList |> List.choose id //evaluates to [1;2;3]
//here is a simple pattern-matching example
//which prints "1;2;None;3;None;".
//notice how value is extracted from op during the match
optionList
|> List.iter (function Some(value) -> printf "%i;" value | None -> printf "None;")
However, in a language like Java without direct support for Option types, we'd have something like:
//here we perform the same filter/map operation as in the F# example.
List<Option<Integer>> optionList = Arrays.asList(new Some<Integer>(1),new Some<Integer>(2),new None<Integer>(),new Some<Integer>(3),new None<Integer>());
List<Integer> filteredList = new ArrayList<Integer>();
for(Option<Integer> op : list)
if(op instanceof Some)
filteredList.add(((Some<Integer>)op).getValue());
Alternative solution such as message eating nil
Objective-C's "message eating nil" is not so much a solution as an attempt to lighten the head-ache of null checking. Basically, instead of throwing a runtime exception when trying to invoke a method on a null object, the expression instead evaluates to null itself. Suspending disbelief, it's as if each instance method begins with if (this == null) return null;. But then there is information loss: you don't know whether the method returned null because it is valid return value, or because the object is actually null. It's a lot like exception swallowing, and doesn't make any progress addressing the issues with null outlined before.
Assembly brought us addresses also known as untyped pointers. C mapped them directly as typed pointers but introduced Algol's null as a unique pointer value, compatible with all typed pointers. The big issue with null in C is that since every pointer can be null, one never can use a pointer safely without a manual check.
In higher-level languages, having null is awkward since it really conveys two distinct notions:
Telling that something is undefined.
Telling that something is optional.
Having undefined variables is pretty much useless, and yields to undefined behavior whenever they occur. I suppose everybody will agree that having things undefined should be avoided at all costs.
The second case is optionality and is best provided explicitly, for instance with an option type.
Let's say we're in a transport company and we need to create an application to help create a schedule for our drivers. For each driver, we store a few informations such as: the driving licences they have and the phone number to call in case of emergency.
In C we could have:
struct PhoneNumber { ... };
struct MotorbikeLicence { ... };
struct CarLicence { ... };
struct TruckLicence { ... };
struct Driver {
char name[32]; /* Null terminated */
struct PhoneNumber * emergency_phone_number;
struct MotorbikeLicence * motorbike_licence;
struct CarLicence * car_licence;
struct TruckLicence * truck_licence;
};
As you observe, in any processing over our list of drivers we'll have to check for null pointers. The compiler won't help you, the safety of the program relies on your shoulders.
In OCaml, the same code would look like this:
type phone_number = { ... }
type motorbike_licence = { ... }
type car_licence = { ... }
type truck_licence = { ... }
type driver = {
name: string;
emergency_phone_number: phone_number option;
motorbike_licence: motorbike_licence option;
car_licence: car_licence option;
truck_licence: truck_licence option;
}
Let's now say that we want to print the names of all the drivers along with their truck licence numbers.
In C:
#include <stdio.h>
void print_driver_with_truck_licence_number(struct Driver * driver) {
/* Check may be redundant but better be safe than sorry */
if (driver != NULL) {
printf("driver %s has ", driver->name);
if (driver->truck_licence != NULL) {
printf("truck licence %04d-%04d-%08d\n",
driver->truck_licence->area_code
driver->truck_licence->year
driver->truck_licence->num_in_year);
} else {
printf("no truck licence\n");
}
}
}
void print_drivers_with_truck_licence_numbers(struct Driver ** drivers, int nb) {
if (drivers != NULL && nb >= 0) {
int i;
for (i = 0; i < nb; ++i) {
struct Driver * driver = drivers[i];
if (driver) {
print_driver_with_truck_licence_number(driver);
} else {
/* Huh ? We got a null inside the array, meaning it probably got
corrupt somehow, what do we do ? Ignore ? Assert ? */
}
}
} else {
/* Caller provided us with erroneous input, what do we do ?
Ignore ? Assert ? */
}
}
In OCaml that would be:
open Printf
(* Here we are guaranteed to have a driver instance *)
let print_driver_with_truck_licence_number driver =
printf "driver %s has " driver.name;
match driver.truck_licence with
| None ->
printf "no truck licence\n"
| Some licence ->
(* Here we are guaranteed to have a licence *)
printf "truck licence %04d-%04d-%08d\n"
licence.area_code
licence.year
licence.num_in_year
(* Here we are guaranteed to have a valid list of drivers *)
let print_drivers_with_truck_licence_numbers drivers =
List.iter print_driver_with_truck_licence_number drivers
As you can see in this trivial example, there is nothing complicated in the safe version:
It's terser.
You get much better guarantees and no null check is required at all.
The compiler ensured that you correctly dealt with the option
Whereas in C, you could just have forgotten a null check and boom...
Note : these code samples where not compiled, but I hope you got the ideas.
Microsoft Research has a intersting project called
Spec#
It is a C# extension with not-null type and some mechanism to check your objects against not being null, although, IMHO, applying the design by contract principle may be more appropriate and more helpful for many troublesome situations caused by null references.
Robert Nystrom offers a nice article here:
http://journal.stuffwithstuff.com/2010/08/23/void-null-maybe-and-nothing/
describing his thought process when adding support for absence and failure to his Magpie programming language.
Coming from .NET background, I always thought null had a point, its useful. Until I came to know of structs and how easy it was working with them avoiding a lot of boilerplate code. Tony Hoare speaking at QCon London in 2009, apologized for inventing the null reference. To quote him:
I call it my billion-dollar mistake. It was the invention of the null
reference in 1965. At that time, I was designing the first
comprehensive type system for references in an object oriented
language (ALGOL W). My goal was to ensure that all use of references
should be absolutely safe, with checking performed automatically by
the compiler. But I couldn't resist the temptation to put in a null
reference, simply because it was so easy to implement. This has led to
innumerable errors, vulnerabilities, and system crashes, which have
probably caused a billion dollars of pain and damage in the last forty
years. In recent years, a number of program analysers like PREfix and
PREfast in Microsoft have been used to check references, and give
warnings if there is a risk they may be non-null. More recent
programming languages like Spec# have introduced declarations for
non-null references. This is the solution, which I rejected in 1965.
See this question too at programmers
I've always looked at Null (or nil) as being the absence of a value.
Sometimes you want this, sometimes you don't. It depends on the domain you are working with. If the absence is meaningful: no middle name, then your application can act accordingly. On the other hand if the null value should not be there: The first name is null, then the developer gets the proverbial 2 a.m. phone call.
I've also seen code overloaded and over-complicated with checks for null. To me this means one of two things:
a) a bug higher up in the application tree
b) bad/incomplete design
On the positive side - Null is probably one of the more useful notions for checking if something is absent, and languages without the concept of null will endup over-complicating things when it's time to do data validation. In this case, if a new variable is not initialized, said languagues will usually set variables to an empty string, 0, or an empty collection. However, if an empty string or 0 or empty collection are valid values for your application -- then you have a problem.
Sometimes this circumvented by inventing special/weird values for fields to represent an uninitialized state. But then what happens when the special value is entered by a well-intentioned user? And let's not get into the mess this will make of data validation routines.
If the language supported the null concept all the concerns would vanish.
Vector languages can sometimes get away with not having a null.
The empty vector serves as a typed null in this case.

Resources