Haskell usage of null - haskell

I'm reading through Real World Haskell and in my copy, on page 59, it states:
In Haskell we don't have the equivalent of null. We could use Maybe... Instead we've decided to use a no-argument Empty constructor
Then, in the next section, on Errors, there is the Haskell function:
mySecond xs = if null (tail xs)
then error "list too short"
else head (tail xs)
Now, I don't understand what the "null" in this function definition is referring to, since it was stated clearly that there is no equivalent to (Java's) null in Haskell.
Any help appreciated.

null is a function which simply tests if a list is empty. It doesn't have anything to do with nullable values in other languages.
null :: [a] -> Bool
null [] = True
null (_:_) = False

Not really an answer to your question, but I feel it fits here to give some technical background of what the book is talking about:
Most language implementations nowadays (C++ is the most important counterexample1) store objects not right "in place" where they are used (i.e. in the function-call stack), but in some random place on the heap. All that's stored on the stack is a pointer/reference to the actual object.
One consequence of this is that you can, with mutability, easily switch a reference to point to another object, without needing to meddle with that object itself. Also, you don't even need to have an object, instead you can consider the reference a "promise" that there will be an object by the time somebody derefers it. Now if you've learned some Haskell that will heavily remind you of lazy evaluation, and indeed it's at the ground of how lazyness is implemented. This allows not only Haskell's infinite lists etc., also you can structuce both code and data more nicely – in Haskell we call it "tying the knot", and particularly in OO languages it's quite detrimental to the way you construct objects.
Imperative languages can't do this "promise" thing nicely automatic as Haskell can: lazy evaluation without referential transparency would lead to awfully unpredictable side-effects. So when you promise a value, you need to keep track of where it'll be soonest used, and make sure you manally construct the promised object before that point. Obviously, it's a big disaster to use a reference that just points to some random place in memory (as can easily happen in C), so it's now standard to point to a conventional spot that can't be a valid memory location: this is the null pointer. That way, you get at least a predictable error message, but an error it is nevertheless; Tony Hoare therefore now calls this idea of his "billion dollar mistake".
So null references are evil, Haskell cleverly avoids them. Or does it?
Actually, it can be perfectly reasonable to not have an object at all, namely when you explicitly allow empty data structures. The classic example are linked lists, and thus Haskell has a specialised null function that checks just this: is the list empty, or is there a head I can safely refer to?
Still – this isn't really idiomatic, for Haskell has a far more general and convenient way of savely resolving such "options" scenarios: pattern matching. The idiomatic version of your example would be
mySecond' (_:x2:_) = x2
mySecond' _ = error "list too short"
Note that this is not just more consise, but also safer than your code: you check null (tail xs), but if xs itself is empty then tail will already throw an error.
1C++ certainly allows storing stuff on the heap too, but experienced programmers like to avoid it for performance reasons. On the other hand, Java stores "built-in" types (you can recognise them by lowercase names, e.g. int) on the stack, to avoid redirection cost.
2Consider the words pointer and reference as synonyms, though some languages prefer either or give them slightly different meanings.

Related

Is the function `a -> <pattern> -> Bool` possible?

Would it be possible to have a function that takes some value and a pattern in order to check if both match?
Let's call this hypothetical function matches and with it we could rewrite the following function...
isSingleton :: [a] -> Bool
isSingleton [_] = True
isSingleton _ = False
...like so...
isSingleton xs = xs `matches` [_]
Would this be theoretically possible? If yes, how? And if not, why?
Well, you can't really use [_] as an expression – the compiler doesn't allow it. There are various bodges that could be applied to make it kind-of-possible:
-fdefer-typed-holes makes GHC ignore the _ during compilation. That doesn't really make it a legal expression, it just means the error will be raised at runtime instead. (Generally a very bad idea, should only be used for trying out something while another piece of your code isn't complete yet.)
You could define _' = undefined to have similar syntax that is accepted as an expression. Of course, that still means there will be a runtime error.
A runtime error could be caught, but only in the IO monad. That means basically don't do anything that requires it. But of course, technically speaking you could, and then wrap it in unsafePerformIO. Disgusting, but I won't say there aren't situations where this sort of thing is a necessary evil.
With that, you could already implement a function that would be able to determine that either it definitely doesn't match [_] (if == returns False without an error being raise), or possibly matches (in case the error is raised before a constructor discrepancy is found). That would be enough to determine that [] does not match [_], but not to determine that [1,_,3] does not match [1,_,0].
Again this could be botched around with a type class that first converts the values to a proper ⊥-less tree structure (using the unsafePerformIO catch to determine how deep to descent), but... please, let's stop the thought process here. We're clearly working against the language instead of with it with all of that.
What could of course be done is to do it in Template Haskell, but that would fix you to compile-time pattern expressions, I doubt that's in the spirit of the question. Or you could generate a type that explicitly expresses “original type with holes in it”, so the holes could properly be distinguished without any unsafe IO shenanigans. That would become quite verbose though.
In practice, I would just not try passing around first-class patterns, but instead pass around boolean-valued functions. Granted, functions are more general than patterns, but not in a way that would pose any practical problems.
Something that's more closely analogous to patterns are Prisms in the lens ecosystem. Indeed, there is the is combinator that does what you're asking for. However, lenses/prisms can be again fiddly to construct, I'm not even sure a prism corresponding to [_] can be built using the _Cons and _Empty primitives.
tl;dr it's not worth it.
Patterns aren't first-class in Haskell; you cannot write a function that receives a pattern as an argument.
However, anywhere you would call your hypothetical matches function, you can instead make a bespoke function that tests the particular pattern you're interested in, and use that function instead. So you don't really need first class patterns, they just might save a bit of boilerplate.
The extension LambdaCase more or less allows you to write a function that just does a pattern match with minimal syntactic overhead, although there is no special-case syntax available for the specific purpose of returning a Bool saying whether a pattern matches (such special-purpose syntax would allow you to to avoid explicitly writing that you want to map the pattern to True and having to add a catch-all alternative case to map to False).
For example:
isSingleton = \case [_] -> True
_ -> False
For something like isSingleton (which already is a single pattern match encapsulated into a function) there's not much benefit in doing this over just implementing it directly. But in a more complex function whether you want to call x `matches` <pattern> (or pass (`matches` <pattern>) to another function), it might be an alternative that keeps the pattern inline.
Honestly I'd probably just define functions like isSingleton for each pattern I wanted this for though (possibly just in a where clause).

How does the presence of the "error" function bear on the purity of Haskell?

I've always wondered how the Haskell exception system fits in with the whole "Pure functional language" thing. For example see the below GHCi session.
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> head []
*** Exception: Prelude.head: empty list
Prelude> :t head
head :: [a] -> a
Prelude> :t error
error :: [Char] -> a
Prelude> error "ranch"
*** Exception: ranch
CallStack (from HasCallStack):
error, called at <interactive>:4:1 in interactive:Ghci1
Prelude>
The type of head is [a] -> a. But when you call it on the special case of an empty list, you get an exception instead. But this exception is not accounted for in the type signature.
If I remember correctly it's a similar story when there is a failure during pattern matching. It doesn't matter what the type signature says, if you haven't accounted for every possible pattern, you run the risk of throwing an exception.
I don't have a single, concise question to ask, but my head is swimming. What was the motivation for adding this strange exception system to an otherwise pure and elegant language? Is it still pure but I'm just missing something? If I want to take advantage of this exception feature, how would I go about doing it (ie how do I catch and handle exceptions? is there anything else I can do with them?) For example, if ever I write code that uses the "head" function, surely I should take precautions for the case where an empty list somehow smuggles itself in.
You are confusing two concepts: purity and totality.
Purity says that functions have no side effects.
Totality says that every function terminates and produces a value.
Haskell is pure, but is not total.
Outside of IO, nontermination (e.g., let loop = loop in loop) and exceptions (e.g., error "urk!") are the same – nonterminating and exceptional terms, when forced, do not evaluate to a value. The designers of Haskell wanted a Turing-complete language, which – as per the halting problem – means that they forwent totality. And once you have nontermination, I suppose you might as well have exceptions, too – defining error msg = error msg and having calls to error do nothing forever is much less satisfying in practice than actually seeing the error message you want in finite time!
In general, though, you're right – partial functions (those which are not defined for every input value, like head) are ugly. Modern Haskell generally prefers writing total functions instead by returning Maybe or Either values, e.g.
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:_) = Just x
errHead :: [a] -> Either String a
errHead [] = Left "Prelude.head: empty list"
errHead (x:_) = Right x
In this case, the Functor, Applicative, Monad, MonadError, Foldable, Traversable, etc., machinery makes combining these total functions and working with their results easy.
Should you actually come across an exception in your code – for instance, you might use error to check a complicated invariant in your code that you think you've enforced, but you have a bug – you can catch it in IO. Which returns to the question of why it's OK to interact with exceptions in IO – doesn't that make the language impure? The answer is the same as that to the question of why we can do I/O in IO, or work with mutable variables – evaluating a value of type IO A doesn't produce the side effects that it describes, it's just an action that describes what a program could do. (There are better descriptions of this elsewhere on the internet; exceptions aren't any different than other effects.)
(Also, note that there is a separate-but-related exception system in IO, which is used when e.g. trying to read a file that isn't there. People are often OK with this exception system, in moderation, because since you're in IO you're already working with impure code.)
For example, if ever I write code that uses the "head" function, surely I should take precautions for the case where an empty list somehow smuggles itself in.
A simpler solution: don't use head. There are plenty of replacements: listToMaybe from Data.Maybe, the various alternative implementations in the safe package, etc. The partial functions [1] in the base libraries -- specially ones as easy to replace as head -- are little more than historical cruft, and should be either ignored or replaced by safe variants, such as those in the aforementioned safe package. For further arguments, here is an entirely reasonable rant about partial functions.
If I want to take advantage of this exception feature, how would I go about doing it (ie how do I catch and handle exceptions? is there anything else I can do with them?)
Exceptions of the sort thrown by error can only be caught in the IO monad. If you are writing pure functions you won't want to force your users to run them in the IO monad merely for catching exceptions. Therefore, if you ever use error in a pure function, assume the error will not be caught [2]. Ideally you shouldn't use error in pure code at all, but if you are somehow compelled to do so, at least make sure to write an informative error message (that is, not "Prelude.head: empty list") so that your users know what is going on when the program crashes.
If I remember correctly it's a similar story when there is a failure during pattern matching. It doesn't matter what the type signature says, if you haven't accounted for every possible pattern, you run the risk of throwing an exception.
Indeed. The only difference from using head to writing the incomplete pattern match (\(x:_) -> x) by yourself explicitly is that in the latter case the compiler will at least warn you if you use -Wall, while with head even that is swept under the rug.
I've always wondered how the Haskell exception system fits in with the whole "Pure functional language" thing.
Technically speaking, partial functions don't affect purity (which doesn't make them any less nasty, of course). From a theoretical point of view, head [] is just as undefined as things like foo = let x = x in x. (The keyword for further reading into such subtleties is "bottom".)
[1]: Partial functions are functions that, just like head, are not defined for some values of the argument types they are supposed to take.
[2]: It is worth mentioning that exceptions in IO are a whole different issue, as you can't trivially avoid e.g. a file read failure just by using better functions. There are quite a few approaches towards handling such scenarios in a sensible way. If you are curious about the issue, here is one "highly opinionated" article about it that is illustrative of the relevant tools and trade-offs.
Haskell does not require that your functions be total, and doesn't track when they're not. (Total functions are those that have a well defined output for every possible value of their input type)
Even without exceptions or pattern match failures, you can have a function that doesn't define output for some inputs by just going on forever. An example is length (repeat 1). This continues to compute forever, but never actually throws an error.
The way Haskell semantics "copes" with this is to declare that there is an "extra" value in every single type; the so called "bottom value", and declare that any computation that doesn't properly complete and produce a normal value of its type actually produces the bottom value. It's represented by the mathematical symbol ⊥ (only when talking about Haskell; there isn't really any way in Haskell to directly refer to this value, but undefined is often also used since that is a Haskell name that is bound to an error-raising computation, and so semantically produces the bottom value).
This is a theoretical wart in the system, since it gives you the ability to create a 'value' of any type (albeit not a very useful one), and a lot of the reasoning about bits of code being correct based on types actually relies on the assumption that you can't do exactly that (if you're into the Curry-Howard isomorphism between pure functional programs and formal logic, the existence of ⊥ gives you the ability to "prove" logical contradictions, and thus to prove absolutely anything at all).
But in practice it seems to work out that all the reasoning done by pretending that ⊥ doesn't exist in Haskell still generally works well enough to be useful when you're writing "well-behaved" code that doesn't use ⊥ very much.
The main reason for tolerating this situation in Haskell is ease-of-use as a programming language rather than a system of formal logic or mathematics. It's impossible to make a compiler that could actually tell of arbitrary Haskell-like code whether or not each function is total or partial (see the Halting Problem). So a language that wanted to enforce totality would have to either remove a lot of the things you can do, or require you to jump through lots of hoops to demonstrate that your code always terminates, or both. The Haskell designers didn't want to do that.
So given that Haskell as a language is resigned to partiality and ⊥, it may as well give you things like error as a convenience. After all, you could always write a error :: String -> a function by just not terminating; getting an immediate printout of the error message rather than having the program just spin forever is a lot more useful to practicing programmers, even if those are both equivalent in the theory of Haskell semantics!
Similarly, the original designers of Haskell decided that implicitly adding a catch-all case to every pattern match that just errors out would be more convenient than forcing programmers to add the error case explicitly every time they expect a part of their code to only ever see certain cases. (Although a lot of Haskell programmers, including me, work with the incomplete-pattern-match warning and almost always treat it as an error and fix their code, and so would probably prefer the original Haskell designers went the other way on this one).
TLDR; exceptions from error and pattern match failure are there for convenience, because they don't make the system any more broken than it already has to be, without being quite a different system than Haskell.
You can program by throwing and catch exceptions if you really want, including catching the exceptions from error or pattern match failure, by using the facilities from Control.Exception.
In order to not break the purity of the system you can raise exceptions from anywhere (because the system always has to deal with the possibility of a function not properly terminating and producing a value; "raising an exception" is just another way in which that can happen), but exceptions can only be caught by constructs in IO. Because the formal semantics of IO permit basically anything to happen (because it has to interface with the real world and there aren't really any hard restrictions we can impose on that from the definition of Haskell), we can also relax most of the rules we usually need for pure functions in Haskell and still have something that technically fits in the model of Haskell code being pure.
I haven't used this very much at all (usually I prefer to keep my error handling using things that are more well-defined in terms of Haskell's semantic model than the operational model of what IO does, which can be as simple as Maybe or Either), but you can read about it if you want to.

Why is the default catch-all not mandatory in a Haskell case statement?

Haskell has a reputation for being a safe language. One that generally speaking pushes more of the possible programming errors to compile-time errors, and less and run-time.
One example of this is the if expression. The else in the if is always mandatory. You need to cover off both possibilities. This is great because you have thought and covered off all possibilities of what will happen at runtime.
Now Haskell has a case expression. (This bears some similarity to switch statements in other OO and imperative languages - but Haskell adds a lot of richness in the type system).
describeList :: [a]
describeList xs = "The list is " ++ case cs of [] -> "empty."
[x] -> "a singleton list."
xs -> "a longer list."
But with the case expression, the default 'catch-all' is not mandatory.
To me this sounds like it would lead to runtime errors.
My question is: Why is the default catch-all not mandatory in a Haskell case statement?
Haskell is certainly safer than many other languages, especially mainstream ones, but it is far from 100% safe. Other unsafe features in Haskell include the ability to write infinite loops, like let x = x in x, or unsafePerformIO, or error, or undefined. I guess you could say it's just a tradeoff between convenience and safety.
In more complex code than your simple example, sometimes there's some cases that you know can't occur, and so you leave them out of your case expression, but Haskell's type system isn't strong enough for the compiler to be able know that those missing cases are impossible. (For example you might be computing some mathematical function that can only return the numbers 3, 4, or 5, but that fact is due to some difficult theorem that you happen to know about and which the compiler is not aware of.)
For what it's worth, you can pass -fwarn-incomplete-patterns to the ghc compiler to get a warning whenever at compile time you're missing a branch in a case statement, if you like. That's included in the -W option which turns on several commonly desirable compiler warnings.
A catch-all is not always needed. If one handles all cases explicitly (the pattern matching is exhaustive), a catch-all case would never be reached.
I can think of a serious drawback of a mandatory catch-all case: if one extends an ADT (algebraic data type), dependant code would still compile and the programmer is not reminded of that extra case they are supposed to handle.

Modify an argument in a pure function

I am aware that I should not do this
I am asking for dirty hacks to do something nasty
Goal
I wish to modify an argument in a pure function.
Thus achieving effect of pass by reference.
Illustration
For example, I have got the following function.
fn :: a -> [a] -> Bool
fn a as = elem a as
I wish to modify the arguments passed in. Such as the list as.
as' = [1, 2, 3, 4]
pointTo as as'
It is a common misconception that Haskell just deliberately prevents you from doing stuff it deems unsafe, though most other languages allow it. That's the case, for instance, with C++' const modifiers: while they are a valuable guarantee that something which is supposed to stay constant isn't messed with by mistake, it's generally assumed that the overall compiled result doesn't really change when using them; you still get basically impure functions, implemented as some assembler instructions over a stack memory frame.
What's already less known is that even in C++, these const modifiers allow the compiler to apply certain optimisations that can completely break the results when the references are modified (as possible with const_cast – which in Haskell would have at least “unsafe” in its name!); only the mutable keyword gives the guarantee that hidden modifications don't mess up the sematics you'd expect.
In Haskell, the same general issue applies, but much stronger. A Haskell function is not a bunch of assembler instructions operating on a stack frame and memory locations pointed to from the stack. Sometimes the compiler may actually generate something like that, but in general the standard is very careful not to suggest any particular implementation details. So actually, functions may get inlined, stream-fused, rule-replaced etc. to something where it totally wouldn't make sense to even think about “modifying the arguments”, because perhaps the arguments never even exist as such. The thing that Haskell guarantees is that, in a much higher-level sense, the semantics are preserved; the semantics of how you'd think about a function mathematically. In mathematics, it also doesn't make sense at all, conceptually, to talk about “modified arguments”.

Why Haskell uses bottom instead of null in partial functions?

I'm reading about Haskell denotational semantics (http://en.wikibooks.org/wiki/Haskell/Denotational_semantics) and I fail to see why, in a type, bottom "value" is placed at another level compared to "normal" values, eg why it can't be pattern matched.
I believe that pattern patching bottom would cause trouble as bottom denotes also non-terminating computations, but why should non-terminating computations and errors be treated the same? (I'm assuming calling a partial function with unsupported argument can be considered as an error).
What useful properties would be lost, if all Haskell types included a pattern-matchable Java-null-like value instead of bottom?
In other words: why wouldn't it be wise to make all Haskell functions total by lifting all types with null value?
(Do non-terminating computations need a special type at all?)
You can't get rid of non-termination without restricting the turing-completeness of your language, and by the halting problem, we can't generally detect non-termination and replace it by a value.
So every turing complete language has bottom.
The only difference between Haskell and Java is then that Java has bottom and null. Haskell doesn't have the latter, which is handy because then we don't have to check for nulls!
Put another way, since bottom is inescapable (in the turing complete world), then what's the point of also making everything nullable too, other than inviting bugs?
Also note that while some functions in the Prelude are partial for historic reasons, modern Haskell style leans towards writing total functions nearly everywhere and using an explicit Maybe return type in functions such as head that would otherwise be partial.
My quibbling in the comments not withstanding, I think sclv answers the first part of your question, but as to
What useful properties would be lost, if all Haskell types included a pattern-matchable Java-null-like value instead of bottom?
In other words: why wouldn't it be wise to make all Haskell functions total by lifting all types with null value?
Here you appear to be drawing a distinction between non-termination and exception. So, although it is impossible (because of the halting problem) to pattern match on non-termination, why not be able to pattern match on exception?
To which I reply with a question of my own: what about functions that never throw an exception? Haskell has total functions after all. I shouldn't have to pattern match to ensure that something is non-exceptional, if it is known to be non-exceptional. Haskell, being a bondage and discipline language, would naturally want to communicate this difference in the types. Perhaps by writing
Integer
for the type of Integers that are known to be not exceptional and
?Integer
for the type of Integers that might be an exception instead. The answer is we do this already: Haskell has a type in the prelude
data Maybe a = Just a | Nothing
which can be read as "either an a or nothing at all." We can pattern match on Maybe so this proposal doesn't give us anything. (We also have types like Either for richer kinds of "computations that might go wrong" as well as fancy monad syntax/combinators to make these easy to work with).
So then, why have exceptions at all? In Haskell we can't "catch" exceptions except in the IO monad. If we can simulate exceptions perfectly with Maybe and Either why have exceptions in the language?
There are a couple of answers to this, but the core is that Haskell exceptions are imprecise. An exception might arise because your program ran out of memory, or the thread you were executing got killed by another thread, or a whole host of other non-predictable reasons. Further, generally with exceptions we care which exception we get out. So what should the following expression result in?
(error "error 1") + (error "error 2") :: Integer
this expression should clearly result in an exception, but which exception? (+) specialized to Integer is strict in both arguments, so that isn't going to help. We could just decide that it was the first value, but then in general we would have
x + y =/= y + x
which would limit our options for equational reasoning. Haskell provides a notion of exceptions with imprecise behavior, and this is important since the pure part of the language has perfectly precise behavior and that can be limiting.

Resources