Why don't we have two nulls? - programming-languages

I've often wondered why languages with a null representing "no value" don't differentiate between the passive "I don't know what the value is" and the more assertive "There is no value".
There have been several cases where I'd have liked to differentiate between the two (especially when working with user-input and databases).
I imagine the following, where we name the two states unknown and null:
var apple;
while (apple is unknown)
{
    askForApple();
}
if (apple is null)
{
    sulk();
}
else
{
    eatApple(apple);
}
Obviously, we can get away without it by manually storing the state somewhere else, but we can do that for nulls too.
So, if we can have one null, why can't we have two?

Isn't it bad enough that we have one null?

In my programming, I recently adopted the practice of differentiating "language null" and "domain null".
The "language null" is the special value that is provided by the programming language to express that a variable has "no value". It is needed as a dummy value in data structures, parameter lists, and return values.
The "domain null" is any number of objects that implement the NullObject design pattern. Actually, you have one distinct domain null for each domain context.
It is fairly common for programmers to use the language null as a catch-all domain null, but I have found that it tends to make code more procedural (less object oriented) and the intent harder to discern.
Every time you want a null, ask yourself: is that a language null, or a domain null?
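A minimal sketch of what a "domain null" could look like, written in Python. The Apple and NoApple classes and the lunch function are hypothetical names invented for this illustration, not part of any library:

```python
class Apple:
    """A real apple from the (hypothetical) fruit domain."""

    def __init__(self, variety: str) -> None:
        self.variety = variety

    def is_present(self) -> bool:
        return True

    def describe(self) -> str:
        return f"a {self.variety} apple"


class NoApple(Apple):
    """Domain null: stands in for 'there is no apple' without using None."""

    def __init__(self) -> None:
        super().__init__(variety="")

    def is_present(self) -> bool:
        return False

    def describe(self) -> str:
        return "no apple"


def lunch(apple: Apple) -> str:
    # No None check needed: every Apple, real or null, answers describe().
    return f"Lunch is {apple.describe()}."
```

Callers that do care about the difference can still ask is_present(), while the language null (None) stays reserved for "this variable holds nothing at all".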

In most programming languages null means "empty" or "undefined". "Unknown" on the other hand is something different. In essence "unknown" describes the state of the object. This state must have come from somewhere in your program.
Have a look at the Null Object pattern. It may help you with what you are trying to achieve.

JavaScript actually has both null and undefined (http://www.w3schools.com/jsref/jsref_undefined.asp), but many other languages don't.

It would be easy enough to create a static constant indicating unknown, for the rare cases when you'd need such a thing.
var apple = Apple.Unknown;
while (apple == Apple.Unknown) {} // etc
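The same idea can be sketched in Python with a module-level sentinel object. UNKNOWN, _Unknown and resolve_apple are hypothetical names chosen for this illustration; the replies are fed in through an iterator so the example stays self-contained:

```python
class _Unknown:
    """Sentinel type meaning 'we have not been told yet'; distinct from None."""

    def __repr__(self) -> str:
        return "UNKNOWN"


UNKNOWN = _Unknown()


def resolve_apple(answers):
    """Keep asking until the answer is known, then report what happened.

    `answers` is an iterator of replies: UNKNOWN (no reply yet),
    None (definitely no apple) or a string naming the apple.
    """
    apple = UNKNOWN
    while apple is UNKNOWN:
        apple = next(answers)
    if apple is None:
        return "sulk"
    return f"eat {apple}"
```

For example, resolve_apple(iter([UNKNOWN, None])) returns "sulk", while resolve_apple(iter([UNKNOWN, "Fuji"])) returns "eat Fuji". Comparing with `is` rather than `==` keeps the sentinel from being confused with any ordinary value.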

Existence of value:
Python: 'variableName' in vars() (dict.has_key() was removed in Python 3)
PHP: isset($variable)
JavaScript: typeof variable !== 'undefined'
Perl: defined $variable (comparing against undef with != does not test definedness)
Of course, when a variable is undefined, it's not NULL.

Why stop at two?
When I took databases in college, we were told that somebody (sorry, don't remember the name of the researcher or paper) had looked at a bunch of db schemas and found that null had something like 17 different meanings: "don't know yet", "can't be known", "doesn't apply", "none", "empty", "action not taken", "field not used", and so on.

In Haskell you can define something like this:
data MaybeEither a b = Object a
                     | Unknown b
                     | Null
                     deriving Eq
main = let x = Object 5 in
       if x == (Unknown [2]) then putStrLn ":-("
       else putStrLn ":-)"
The idea being that Unknown values hold some data of type b that can transform them into known values (how you'd do that depends on the concrete types a and b).
The observant reader will note that I'm just combining Maybe and Either into one data type :)

The Null type is a subtype of all reference types: you can use null in place of a reference to any type of object, which severely weakens the type system. Its creator, Tony Hoare, has called it his "billion-dollar mistake", and it exists only because checking whether an address is zero is easy to implement.

As to why we don't have two nulls, is it down to the fact that, historically in C, NULL was a simple #define and not a distinct part of the language at all?

Within PHP Strict you need to do an isset() check for set variables (or else it throws a warning)
if(!isset($apple))
{
askForApple();
}
if(isset($apple) && empty($apple))
{
sulk();
}
else
{
eatApple();
}

In .NET languages, you can use nullable types, which address this problem for value types.
The problem remains, however, for reference types. Since there's no such thing as pointers in .NET (at least in 'safe' blocks), "object? o" won't compile (this holds for C# versions before 8.0, which added nullable reference type annotations).

Note that null is an acceptable, yet known, condition. An unknown state is a different thing IMO. My conversation with Dan in the comments section of the top post will clarify my position. Thanks, Dan!
What you probably want to query is whether the object was initialized or not.
ActionScript has such a thing (null and undefined), with some restrictions, however.
See documentation:
void data type
The void data type contains only one value, undefined. In previous versions of ActionScript, undefined was the default value for instances of the Object class. In ActionScript 3.0, the default value for Object instances is null. If you attempt to assign the value undefined to an instance of the Object class, Flash Player or Adobe AIR will convert the value to null. You can only assign a value of undefined to variables that are untyped. Untyped variables are variables that either lack any type annotation, or use the asterisk (*) symbol for type annotation. You can use void only as a return type annotation.

Some people will argue that we should be rid of null altogether, which seems fairly valid. After all, why stop at two nulls? Why not three or four and so on, each representing a "no value" state?
Imagine this, with refused, null, invalid:
var apple;
while (apple is refused)
{
askForApple();
}
if (apple is null)
{
sulk();
}
else if(apple is invalid)
{
discard();
}
else
{
eatApple(apple);
}

It's been tried: Visual Basic 6 had Nothing, Null, and Empty. It led to such poor code that it featured at #12 in the legendary Thirteen Ways to Loathe VB article in Dr. Dobb's.
Use the Null Object pattern instead, as others have suggested.

The problem is that in a strongly typed language these extra nulls are expected to hold specific information about the type.
Basically your extra null is meta-information of a sort, meta-information that can depend on type.
Some value types have this extra information, for instance many numeric types have a NaN constant.
In a dynamically typed language you have to account for the difference between a reference without a value (null) and a variable where the type could be anything (unknown or undefined).
So, for instance, in statically typed C# a variable of type String can be null, because it is a reference type. A variable of type Int32 cannot be null, because it is a value type. We always know the type.
In dynamically typed JavaScript a variable's type can be left undefined, in which case the distinction between a null reference and an undefined value is needed.

Some people are one step ahead of you already. ;)

boolean getAnswer() throws Mu

Given how long it took Western philosophy to figure out how it was possible to talk about the concept of "nothing"... Yeah, I'm not too surprised the distinction got overlooked for a while.

I think having one NULL is the lowest common denominator for dealing with the basic pattern
if thing is not NULL
    work with it
else
    do something else
In the "do something else" part, there is a wide range of possibilities, from "okay, forget it" to trying to get "thing" somewhere else. If you don't simply ignore something that's NULL, you probably need to know why "thing" was NULL. Having multiple types of NULL would help you answer that question, but the possible answers are numerous, as hinted in the other answers here. The missing thing could simply be a bug, it could be an error when trying to get it, it may not be available right now, and so on. Deciding which cases apply to your code, and therefore which ones you have to handle, is domain specific. So it's better to use an application-defined mechanism to encode these reasons instead of looking for a language feature that tries to deal with all of them.
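One way such an application-defined mechanism might look, sketched in Python. The Missing enum, the Lookup record and the three reasons listed are assumptions made up for this example; a real application would pick the reasons its own domain needs:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Missing(Enum):
    """Hypothetical, application-chosen reasons for having no value."""
    NOT_ASKED = auto()
    REFUSED = auto()
    LOOKUP_FAILED = auto()


@dataclass(frozen=True)
class Lookup:
    """Either a value, or the reason why there is none."""
    value: Optional[str] = None
    missing: Optional[Missing] = None


def handle(result: Lookup) -> str:
    # The "work with it" branch of the basic pattern.
    if result.missing is None:
        return f"work with {result.value}"
    # The "do something else" branch, now able to see why the value is absent.
    if result.missing is Missing.NOT_ASKED:
        return "ask the user"
    if result.missing is Missing.REFUSED:
        return "forget it"
    return "retry the lookup"
```

The set of reasons lives in the application, so adding a fourth or fifth case is an ordinary code change rather than a language feature.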

It's because Null is an artifact of the language you're using, not a programmer convenience. It describes a naturally occurring state of the object in the context in which it is being used.

If you are using .NET 3.0+ and need something else, you might try the Maybe Monad. You could create whatever "Maybe" types you need and, using LINQ syntax, process accordingly.

AppleInformation appleInfo;
while (appleInfo is null)
{
askForAppleInfo();
}
Apple apple = appleInfo.apple;
if (apple is null)
{
sulk();
}
else
{
eatApple(apple);
}
First you check if you have the apple info, later you check if there is an apple or not. You don't need different native language support for that, just use the right classes.

For me null represents lack of value, and I try to use it only to represent that. Of course you can give null as many meanings as you like, just like you can use 0 or -1 to represent errors instead of their numerical values. Giving various meanings to one representation could be ambiguous though, so I wouldn't recommend it.
Your examples can be coded like apple.isRefused() or !apple.isValid() with little work; you should define beforehand what is an invalid apple anyway, so I don't see the gain of having more keywords.

You can always create an object and assign it to some static field to get a 2nd null.
For example, this is used in collections that allow elements to be null. Internally they use a private static final Object UNSET = new Object() which is used as the unset value and thus allows you to store nulls in the collection. (As I recall, Java's collection framework calls this object TOMBSTONE instead of UNSET. Or was this Smalltalk's collection framework?)
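A small Python sketch of that technique. The Slots class and the _UNSET marker are hypothetical names for this illustration and are not taken from any real collection library:

```python
_UNSET = object()  # private marker; never handed out to callers


class Slots:
    """Fixed-size container whose slots may legitimately hold None."""

    def __init__(self, size: int) -> None:
        self._items = [_UNSET] * size

    def put(self, index: int, value) -> None:
        self._items[index] = value

    def is_set(self, index: int) -> bool:
        # Identity comparison: only the private marker counts as "unset".
        return self._items[index] is not _UNSET

    def get(self, index: int):
        item = self._items[index]
        if item is _UNSET:
            raise KeyError(index)
        return item
```

After put(0, None), slot 0 counts as set and get(0) returns None, while an untouched slot raises KeyError. Because the marker is private, no caller can store it by accident.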

VB6
Nothing => "There is no value."
Null => "I don't know what the value is" - Same as DBNull.Value in .NET

Two nulls would be the wrongest answer around. If one null is not enough, you need infinity nulls.
Null could mean:
'Uninitialized'
'User didn't specify'
'Not Applicable here, The color of a car before it has been painted'
'Unity: This domain has zero bits of information.'
'Empty: this correctly holds no data in this case, for example the last time the tires were rotated on a new car'
'Multiple, Cascading nulls: for instance, the extension of a quantity price when no quantity may be specified times a quantity which wasn't specified by the user anyway'
And your particular domain may need many other kinds of 'out of band' values. Really, these values are in the domain, and need to have a well defined meaning in each case. (ergo, infinity really is zero)

Related

Shall I set an empty string computed string attribute for Terraform resource?

context: I'm adding a new resource to TF Provider.
I've got an API that optionally returns a string attribute, so I represent it as:
"foo": {
Type: schema.TypeString,
Computed: true,
Optional: true,
},
Question: if the API returns no value or an empty string for response.foo, shall I still set an empty string for the foo attribute in my resource schema, or shall I not set any value (i.e., leave it null)?

(Hello! I'm the same person who wrote the answer you included in your screenshot.)
If both approaches -- returning null or returning an empty string -- were equally viable from a technical standpoint then I would typically prefer to use null to represent the absence of a value, since that is clearly distinct from an empty string which for some situations would otherwise be a valid present value for the attribute.
However, since it seems like you are using the old SDK ("SDKv2") here, you will probably be constrained from a technical standpoint: SDKv2 was designed for Terraform v0.11 and earlier and so it predates the idea of attributes being null and so there is no way in its API to specify that. You may be able to "trick" the SDK into effectively returning null by not calling d.Set("foo", ...) at all in your Create function, but there is no API provided to unset an attribute and so once you've set it to something non-null there would typically be no way to get it to go back to being null again.
Given that, I'd suggest it better to be consistent and always use "" when using the old SDK, because that way users of the provider won't have to deal with the inconsistency of the value sometimes being null and sometimes being "" in this case.
When using the modern Terraform Plugin Framework this limitation doesn't apply, because that framework was designed with the modern Terraform language in mind. You aren't using that framework and so this part of the answer probably won't help you right now, but I'm mentioning it just in case someone else finds this answer in future who might already be using or be considering use of the new framework.

Using Roslyn, if I have an IdentifierNameSyntax, can I find the member type it refers to (field, property, method...)

I am attempting to use the Roslyn SDK and StackExchange.Precompilation (thank you!) to implement aspect-oriented programming in C#6. My specific problem right now is, starting with an IdentifierNameSyntax instance, I want to find the "member type" (method, property, field, var, etc.) that the identifier refers to. (How) can this be done?
Background:
The first proof-of-concept I am working on is some good old design-by-contract. I have a NonNullAttribute which can be applied to parameters, properties, or method return values. Along with the attribute there is a class implementing the StackExchange.Precompilation.ICompileModule interface, which on compilation will insert null checks on the marked parameters or return values.
This is the same idea as PostSharp's NonNullAttribute, but the transformation is being done on one of Roslyn's syntax trees, not on an already compiled assembly. It is also similar to Code Contracts, but with a declarative attribute approach, and again operating on syntax trees not IL.
For example, this source code:
[return: NonNull]
public string Capitalize([NonNull] string text) {
return text.ToUpper();
}
will be transformed into this during precompilation:
[return: NonNull]
public string Capitalize([NonNull] string text) {
if (Object.Equals(text, null))
throw new ArgumentNullException(nameof(text));
var result = text.ToUpper();
if (Object.Equals(result, null))
throw new PostconditionFailedException("Result cannot be null.");
return result;
}
(PostconditionFailedException is a custom exception I made to complement ArgumentException for return values. If there is already something like this in the framework please let me know.)
For properties with this attribute, there would be a similar transformation, but with preconditions and postconditions implemented separately in the set and get accessors, respectively.
The specific reason I need to find the "member type" of an identifier here is for an optimization on implementing postconditions. Note in the post-compilation sample above, the value that would have been returned is stored in a local variable, checked, and then the local is returned. This storage is necessary for transforming return statements that evaluate a method or complex expression, but if the returned expression is just a field or local variable reference, creating that temporary storage local is wasteful.
So, when the return statement is being scanned, I first check if the statement is of the form ReturnKeyword-IdentifierSyntaxToken-SemicolonToken. If so, I then need to check what that identifier refers to, so I avoid that local variable allocation if the referent is a field or var.
Update
For more context, check out the project this is in reference to on GitHub.

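You'll need to use SemanticModel.GetSymbolInfo to determine the symbol an identifier binds to. The Kind property of the resulting SymbolInfo.Symbol (SymbolKind.Field, SymbolKind.Local, SymbolKind.Property, SymbolKind.Method, and so on) then tells you what sort of member it is.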

Use SemanticModel.GetTypeInfo(node).Type to obtain the type symbol of the expression, and use it to explore the type.

what does getType do in antlr4?

This question is with reference to the Cymbol code from the book (~ page 143) :
int t = ctx.type().start.getType(); // in DefPhase.enterFunctionDecl()
Symbol.Type type = CheckSymbols.getType(t);
What does each component return: "ctx.type()", "start", "getType()" ? The book does not contain any explanation about these names.
I can "kind of" understand that "ctx.type()" refers to the "type" rule, and "getType()" returns the number associated with it. But what exactly does the "start" do?
Also, to generalize this question: what is the mechanism to get the value/structure returned by a rule - especially in the context of usage in a listener?
I can see that for an ID, it is:
String name = ctx.ID().getText();
And as in above, for an enumeration of keywords it is via "start.getType()". Any other special kinds of access that I should be aware of?

Let's take the problem apart step by step. Obviously, ctx is an instance of CymbolParser.FunctionDeclContext. On pages 98-99 you can see how the grammar and ParseTree are implemented (at least to get a feeling for it; for the real implementation please see the .g4 file).
Take a look at the figure of the AST on page 99: you can see that the FunctionDeclContext node has several children, one labeled type. Intuitively you can see that it corresponds to the function's return type. This is the node you retrieve when calling CymbolParser.FunctionDeclContext::type. The return type is probably something like TypeContext.
Note that methods without 'get' at the beginning are usually child getters, e.g. you can access the block by calling CymbolParser.FunctionDeclContext::block.
So you now have the type context of the function declaration you were passed. You can read either start or stop on any context to get the first or last Token defining the context. Simply put, start gets you "the first word". In this case, the first Token is of course the function's return type itself, e.g. int.
And the last call, Token::getType, returns the integer token type of that Token.
You can find more information at API reference webpages - Context, Token. But the best way of understanding the behavior is reading through the generated ANTLR classes such as <GrammarName>Parser etc. And to be complete, I attach a link to the book.

Why assign a reference to a struct in go?

I'm having a look at the code at this page:
http://golang.org/pkg/net/http/
And there's one thing I don't understand - at some point, a new structure is created and initialized like this:
client := &http.Client{
CheckRedirect: redirectPolicyFunc,
}
Why use & when creating this structure?
I've also read this blog post and structs are initialized like this:
r := Rectangle{}
What is the difference between both and how should I know which one to use?

The difference is in the type of your variable.
client := &http.Client{
makes client of type *http.Client
while
client := http.Client{
builds a http.Client.

The top one is returning a pointer. It is a Go idiom instead of using new. The second one is just a value object. If you need a pointer use the top.
Check the Effective Go document for more about this:
http://golang.org/doc/effective_go.html#allocation_new

In object-oriented programming, in order for an object to have dynamic lifetime (i.e. not tied to the current function call), it needs to be dynamically allocated in a place other than the current stack frame, thus you manipulate the object through a pointer. This is such a common pattern that in many object-oriented languages, including Java, Python, Ruby, Objective-C, Smalltalk, JavaScript, and others, you can only deal with pointers to objects, never with an "object as a value" itself. (Some languages though, like C++, do allow you to have "objects as values"; it comes with the RAII idiom which adds some complexity.)
Go is not an object-oriented language, but its ability to define custom types and define methods that operate on those custom types can be made to work very much like classes and methods. Returning a pointer to the type from the "constructor" function allows the "object" to have a dynamic lifetime.

When we use a reference, we use a single item throughout the program's runtime, even if we assign it to a new variable or pass it through a function. But when we use a value, we make new copies of the individual items.
(Reference is not the right word according to Go convention; "address of value" would be more appropriate here: https://golang.org/ref/spec#Package_initialization )
An example will make it much clearer, I hope.
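package main

import "fmt"

type Employee struct {
    ID      int
    Name    string
    Address string
}

func main() {
    // andy is a *Employee, so brad := andy copies the pointer, not the struct.
    andy := &Employee{}
    andy.Name = "Andy"
    brad := andy
    brad.Name = "Brad"
    fmt.Println(andy.Name)
}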
The result of this code block would be:
Brad
We made a new variable from andy, but it still refers to the same data. Now suppose we use a value instead of a reference and keep the rest of the code the same.
// from
andy := &Employee{}
// to
andy := Employee{}
This time the result would be:
Andy
This time the two variables are separate items and no longer refer to the same data.
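For readers coming from a language where objects are always handled through references, the same contrast can be sketched in Python. This is only an analogy, not Go semantics: plain assignment shares the object (like copying a *Employee), and an explicit copy.copy plays the role of Go's value assignment. The function names are hypothetical:

```python
import copy
from dataclasses import dataclass


@dataclass
class Employee:
    id: int = 0
    name: str = ""
    address: str = ""


def rename_alias() -> str:
    andy = Employee(name="Andy")
    brad = andy              # both names refer to one Employee object
    brad.name = "Brad"
    return andy.name         # the change is visible through andy


def rename_copy() -> str:
    andy = Employee(name="Andy")
    brad = copy.copy(andy)   # brad is a separate Employee object
    brad.name = "Brad"
    return andy.name         # andy is untouched
```

Here rename_alias() returns "Brad" and rename_copy() returns "Andy", matching the two Go outputs above.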

should it be allowed to change the method signature in a non statically typed language

Hypothetical and academic question.
pseudo-code:
class Book {
    read(theReader)
}
class BookWithMemory extends Book {
    read(theReader, aTimestamp = null)
}
Assuming:
an interface (if supported) would prohibit it
default values for parameters are supported
Notes:
PHP triggers a strict standards error for this.

I'm not surprised that PHP strict mode complains about such an override. It's very easy for a similar situation to arise unintentionally, in which part of a class hierarchy was edited to use a new signature and one or a few classes have fallen out of sync.
To avoid the ambiguity, name the new method something different (for this example, maybe readAt?), and override read to call readAt in the new class. This makes the intent plain to the interpreter as well as anyone reading the code.
The actual behavior in such a case is language-dependent -- more specifically, it depends on how much of the signature makes up the method selector, and how parameters are passed.
If the name alone is the selector (as in PHP or Perl), then it's down to how the language handles mismatched method parameter lists. If default arguments are processed at the call site based on the static type of the receiver instead of at the callee's entry point, when called through a base class reference you'd end up with an undefined argument value instead of your specified default, similarly to what would happen if there was no default specified.
If the number of parameters (with or without their types) is part of the method selector (as in Erlang or E), as is common in dynamic languages that run on the JVM or CLR, you have two different methods. Create a new overload taking additional arguments, and override the base method with one that calls the new overload with default argument values.
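The renaming advice above (add readAt, and override read to call it) might look like this in Python, a language where the name alone selects the method. The method name read_at and the last_read attribute are assumptions made for this sketch:

```python
class Book:
    def read(self, reader: str) -> str:
        return f"{reader} reads the book"


class BookWithMemory(Book):
    def __init__(self) -> None:
        self.last_read = None

    def read_at(self, reader: str, timestamp=None) -> str:
        # New behaviour lives under a new name, so read() keeps its signature.
        self.last_read = timestamp
        return super().read(reader)

    def read(self, reader: str) -> str:
        # Same signature as Book.read; delegates with the default timestamp.
        return self.read_at(reader)
```

Code that only knows about Book can keep calling read(reader), while callers aware of the subclass can pass a timestamp through read_at.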

If I am reading the question correctly, it seems very language specific (as in, it is not applicable to all dynamic languages), as I know you can do this in Ruby.
class Book
  def read(book)
    puts book
  end
end
class BookWithMemory < Book
  def read(book, aTimeStamp = nil)
    super book
    puts aTimeStamp
  end
end
I am not sure about dynamic languages besides Ruby. This seems like a pretty subjective question as well, as at least two languages were designed on either side of the issue (method overloading vs not: Ruby vs PHP).
