Why Haskell uses bottom instead of null in partial functions?

Why Haskell uses bottom instead of null in partial functions? - haskell

I'm reading about Haskell denotational semantics (http://en.wikibooks.org/wiki/Haskell/Denotational_semantics) and I fail to see why, in a type, bottom "value" is placed at another level compared to "normal" values, eg why it can't be pattern matched.
I believe that pattern patching bottom would cause trouble as bottom denotes also non-terminating computations, but why should non-terminating computations and errors be treated the same? (I'm assuming calling a partial function with unsupported argument can be considered as an error).
What useful properties would be lost, if all Haskell types included a pattern-matchable Java-null-like value instead of bottom?
In other words: why wouldn't it be wise to make all Haskell functions total by lifting all types with null value?
(Do non-terminating computations need a special type at all?)

You can't get rid of non-termination without restricting the turing-completeness of your language, and by the halting problem, we can't generally detect non-termination and replace it by a value.
So every turing complete language has bottom.
The only difference between Haskell and Java is then that Java has bottom and null. Haskell doesn't have the latter, which is handy because then we don't have to check for nulls!
Put another way, since bottom is inescapable (in the turing complete world), then what's the point of also making everything nullable too, other than inviting bugs?
Also note that while some functions in the Prelude are partial for historic reasons, modern Haskell style leans towards writing total functions nearly everywhere and using an explicit Maybe return type in functions such as head that would otherwise be partial.

My quibbling in the comments not withstanding, I think sclv answers the first part of your question, but as to
What useful properties would be lost, if all Haskell types included a pattern-matchable Java-null-like value instead of bottom?
In other words: why wouldn't it be wise to make all Haskell functions total by lifting all types with null value?
Here you appear to be drawing a distinction between non-termination and exception. So, although it is impossible (because of the halting problem) to pattern match on non-termination, why not be able to pattern match on exception?
To which I reply with a question of my own: what about functions that never throw an exception? Haskell has total functions after all. I shouldn't have to pattern match to ensure that something is non-exceptional, if it is known to be non-exceptional. Haskell, being a bondage and discipline language, would naturally want to communicate this difference in the types. Perhaps by writing
Integer
for the type of Integers that are known to be not exceptional and
?Integer
for the type of Integers that might be an exception instead. The answer is we do this already: Haskell has a type in the prelude
data Maybe a = Just a | Nothing
which can be read as "either an a or nothing at all." We can pattern match on Maybe so this proposal doesn't give us anything. (We also have types like Either for richer kinds of "computations that might go wrong" as well as fancy monad syntax/combinators to make these easy to work with).
So then, why have exceptions at all? In Haskell we can't "catch" exceptions except in the IO monad. If we can simulate exceptions perfectly with Maybe and Either why have exceptions in the language?
There are a couple of answers to this, but the core is that Haskell exceptions are imprecise. An exception might arise because your program ran out of memory, or the thread you were executing got killed by another thread, or a whole host of other non-predictable reasons. Further, generally with exceptions we care which exception we get out. So what should the following expression result in?
(error "error 1") + (error "error 2") :: Integer
this expression should clearly result in an exception, but which exception? (+) specialized to Integer is strict in both arguments, so that isn't going to help. We could just decide that it was the first value, but then in general we would have
x + y =/= y + x
which would limit our options for equational reasoning. Haskell provides a notion of exceptions with imprecise behavior, and this is important since the pure part of the language has perfectly precise behavior and that can be limiting.

Related

Is Monad just a functional way of Error handling?

I am reading "Programming in Haskell" book and trying to correlate ideas of haskell to my knowledge in C#. Please correct me if I am wrong.
I feel like monads enforces the programmer to write code which can handle exceptions. So we will explicitly mention the error handling in the type system like
Optional(Int) functionName(Int a, Int b)
The return type is Optional(Int) but not an Int, So who so ever uses the library that has this kind of return types will understand that error handling is happening and the result will be like None(explains Something went wrong) and Some(explains we got some result).
Any code can result to Happy Path(where we get some results) and Sad Path(where errors occurs). Making this paths explicitly in Type system is what monad is. That's my understanding of it. Please correct me.
Monads are like a bridge between Pure Functional Programming and Impure code (that results in side effects).
Apart from this I want to make sure my understanding on Exception handling (VS) Option Types.
Exception Handling tries to do the operations without having a deeper look on to the inputs. Exception Handling's are Heavy since the Call stack has to unwind till it get to the Catch || Rescue || Handling Code.
And the Functional way of dealing things is check the inputs before doing the operations and return the "None" as result if the inputs does't match the required criteria. Option types are light weight of handling errors.

Monad is just an interface (in Haskell terms, typeclass) that a type can implement, along with a contract specifying some restrictions on how the interface should behave.
It's not that different from how a C# type T can implement, say, the IComparable<T> interface. However, the Monad interface is quite abstract and the functions can do surprisingly different things for different types (but always respecting the same laws, and the same "flavor" of composition).
Instead of seeing Monad as a functional way of error handling, it's better to go the other way: inventing a type like Optional that represents errors / absence of values, and start devising useful functions on that type. For example, a function that produces an "inhabited" Optional from an existing value, a function that composes two Optional-returning functions to minimize repetitive code, a function that changes the value inside the Optional if it exists, and so on. All of them functions that can be useful on their own.
After we have the type and and a bevy of useful functions, we might ask ourselves:
Does the type itself fit the requirements for Monad? It must have a type parameter, for example.
Do some (not necessarily all) of the useful functions we have discovered for the type fit the Monad interface? Not only they must fit the signatures, they must fit the contract.
In the affirmative case, good news! We can define a Monad instance for the type, and now we are able to use a whole lot of monad-generic functions for free!
But even if the Monad typeclass doesn't exist in our language, we can keep in mind that the type and some of the functions defined on it behave like a Monad. From the documentation of the thenCompose method of the CompletableFuture Java class:
This method is analogous to Optional.flatMap and Stream.flatMap.
This allows us to "transfer intuitions" between seemingly unrelated classes, even if we can't write monad-generic code because the shared interface doesn't exist.

Monads aren't 'just a functional way of error handling', so you are, indeed, wrong.
It would be pointless to turn this answer into a monad tutorial, so I'm not going to try. It takes some time to understand what a monad is, and the best advice I can give is to keep working with the concept until it clicks. Eventually it will.
The type described in the OP looks equivalent (isomorphic) to Haskell's more standard Maybe type, which is indeed a monad. In a pinch, it can be used for error handling, but more often you'd use another monad called Either (or types isomorphic to it), since it's better suited for that task.
A monad, though, can be many other things. Lists are monads, as are functions themselves (via the Reader monad). Trees are monads as well. These have nothing to do with error handling.
When it comes to exceptions in Haskell, I take the view that it's a legacy feature. I'd never design my Haskell code around exceptions, since exceptions aren't visible via the type system. When a function may fail to return a result, I'd let it return a Maybe, an Either, or another type isomorphic to that. This will, indeed, force the caller to handle not only the happy path, but also any failures that may occur.

Is Monad just a functional way of Error handling?
No it is not. The most prominent use for monads is handling side-effecting computations in Haskell. They can also be used for error handling, but the reason they are called "monads" and not "error handlers" is that they provide a common abstraction around several seemingly different things. For instance, in Haskell, join is a synonym for concat and =<< is an infix synonym for concatMap.
Monads are like a bridge between Pure Functional Programming and Impure code (that results in side effects).
This is a subtle point. In a lazy language such as Haskell, side effects refer to computations which may indeed be pure, such as FFI calls that require memory to be freed upon completion. Monads provide a way to track (and perform) side effects. "Pure" simply means "function returns the same value when called with the same value".

How does the presence of the "error" function bear on the purity of Haskell?

I've always wondered how the Haskell exception system fits in with the whole "Pure functional language" thing. For example see the below GHCi session.
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> head []
*** Exception: Prelude.head: empty list
Prelude> :t head
head :: [a] -> a
Prelude> :t error
error :: [Char] -> a
Prelude> error "ranch"
*** Exception: ranch
CallStack (from HasCallStack):
error, called at <interactive>:4:1 in interactive:Ghci1
Prelude>
The type of head is [a] -> a. But when you call it on the special case of an empty list, you get an exception instead. But this exception is not accounted for in the type signature.
If I remember correctly it's a similar story when there is a failure during pattern matching. It doesn't matter what the type signature says, if you haven't accounted for every possible pattern, you run the risk of throwing an exception.
I don't have a single, concise question to ask, but my head is swimming. What was the motivation for adding this strange exception system to an otherwise pure and elegant language? Is it still pure but I'm just missing something? If I want to take advantage of this exception feature, how would I go about doing it (ie how do I catch and handle exceptions? is there anything else I can do with them?) For example, if ever I write code that uses the "head" function, surely I should take precautions for the case where an empty list somehow smuggles itself in.

You are confusing two concepts: purity and totality.
Purity says that functions have no side effects.
Totality says that every function terminates and produces a value.
Haskell is pure, but is not total.
Outside of IO, nontermination (e.g., let loop = loop in loop) and exceptions (e.g., error "urk!") are the same – nonterminating and exceptional terms, when forced, do not evaluate to a value. The designers of Haskell wanted a Turing-complete language, which – as per the halting problem – means that they forwent totality. And once you have nontermination, I suppose you might as well have exceptions, too – defining error msg = error msg and having calls to error do nothing forever is much less satisfying in practice than actually seeing the error message you want in finite time!
In general, though, you're right – partial functions (those which are not defined for every input value, like head) are ugly. Modern Haskell generally prefers writing total functions instead by returning Maybe or Either values, e.g.
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:_) = Just x
errHead :: [a] -> Either String a
errHead [] = Left "Prelude.head: empty list"
errHead (x:_) = Right x
In this case, the Functor, Applicative, Monad, MonadError, Foldable, Traversable, etc., machinery makes combining these total functions and working with their results easy.
Should you actually come across an exception in your code – for instance, you might use error to check a complicated invariant in your code that you think you've enforced, but you have a bug – you can catch it in IO. Which returns to the question of why it's OK to interact with exceptions in IO – doesn't that make the language impure? The answer is the same as that to the question of why we can do I/O in IO, or work with mutable variables – evaluating a value of type IO A doesn't produce the side effects that it describes, it's just an action that describes what a program could do. (There are better descriptions of this elsewhere on the internet; exceptions aren't any different than other effects.)
(Also, note that there is a separate-but-related exception system in IO, which is used when e.g. trying to read a file that isn't there. People are often OK with this exception system, in moderation, because since you're in IO you're already working with impure code.)

For example, if ever I write code that uses the "head" function, surely I should take precautions for the case where an empty list somehow smuggles itself in.
A simpler solution: don't use head. There are plenty of replacements: listToMaybe from Data.Maybe, the various alternative implementations in the safe package, etc. The partial functions [1] in the base libraries -- specially ones as easy to replace as head -- are little more than historical cruft, and should be either ignored or replaced by safe variants, such as those in the aforementioned safe package. For further arguments, here is an entirely reasonable rant about partial functions.
If I want to take advantage of this exception feature, how would I go about doing it (ie how do I catch and handle exceptions? is there anything else I can do with them?)
Exceptions of the sort thrown by error can only be caught in the IO monad. If you are writing pure functions you won't want to force your users to run them in the IO monad merely for catching exceptions. Therefore, if you ever use error in a pure function, assume the error will not be caught [2]. Ideally you shouldn't use error in pure code at all, but if you are somehow compelled to do so, at least make sure to write an informative error message (that is, not "Prelude.head: empty list") so that your users know what is going on when the program crashes.
If I remember correctly it's a similar story when there is a failure during pattern matching. It doesn't matter what the type signature says, if you haven't accounted for every possible pattern, you run the risk of throwing an exception.
Indeed. The only difference from using head to writing the incomplete pattern match (\(x:_) -> x) by yourself explicitly is that in the latter case the compiler will at least warn you if you use -Wall, while with head even that is swept under the rug.
I've always wondered how the Haskell exception system fits in with the whole "Pure functional language" thing.
Technically speaking, partial functions don't affect purity (which doesn't make them any less nasty, of course). From a theoretical point of view, head [] is just as undefined as things like foo = let x = x in x. (The keyword for further reading into such subtleties is "bottom".)
[1]: Partial functions are functions that, just like head, are not defined for some values of the argument types they are supposed to take.
[2]: It is worth mentioning that exceptions in IO are a whole different issue, as you can't trivially avoid e.g. a file read failure just by using better functions. There are quite a few approaches towards handling such scenarios in a sensible way. If you are curious about the issue, here is one "highly opinionated" article about it that is illustrative of the relevant tools and trade-offs.

Haskell does not require that your functions be total, and doesn't track when they're not. (Total functions are those that have a well defined output for every possible value of their input type)
Even without exceptions or pattern match failures, you can have a function that doesn't define output for some inputs by just going on forever. An example is length (repeat 1). This continues to compute forever, but never actually throws an error.
The way Haskell semantics "copes" with this is to declare that there is an "extra" value in every single type; the so called "bottom value", and declare that any computation that doesn't properly complete and produce a normal value of its type actually produces the bottom value. It's represented by the mathematical symbol ⊥ (only when talking about Haskell; there isn't really any way in Haskell to directly refer to this value, but undefined is often also used since that is a Haskell name that is bound to an error-raising computation, and so semantically produces the bottom value).
This is a theoretical wart in the system, since it gives you the ability to create a 'value' of any type (albeit not a very useful one), and a lot of the reasoning about bits of code being correct based on types actually relies on the assumption that you can't do exactly that (if you're into the Curry-Howard isomorphism between pure functional programs and formal logic, the existence of ⊥ gives you the ability to "prove" logical contradictions, and thus to prove absolutely anything at all).
But in practice it seems to work out that all the reasoning done by pretending that ⊥ doesn't exist in Haskell still generally works well enough to be useful when you're writing "well-behaved" code that doesn't use ⊥ very much.
The main reason for tolerating this situation in Haskell is ease-of-use as a programming language rather than a system of formal logic or mathematics. It's impossible to make a compiler that could actually tell of arbitrary Haskell-like code whether or not each function is total or partial (see the Halting Problem). So a language that wanted to enforce totality would have to either remove a lot of the things you can do, or require you to jump through lots of hoops to demonstrate that your code always terminates, or both. The Haskell designers didn't want to do that.
So given that Haskell as a language is resigned to partiality and ⊥, it may as well give you things like error as a convenience. After all, you could always write a error :: String -> a function by just not terminating; getting an immediate printout of the error message rather than having the program just spin forever is a lot more useful to practicing programmers, even if those are both equivalent in the theory of Haskell semantics!
Similarly, the original designers of Haskell decided that implicitly adding a catch-all case to every pattern match that just errors out would be more convenient than forcing programmers to add the error case explicitly every time they expect a part of their code to only ever see certain cases. (Although a lot of Haskell programmers, including me, work with the incomplete-pattern-match warning and almost always treat it as an error and fix their code, and so would probably prefer the original Haskell designers went the other way on this one).
TLDR; exceptions from error and pattern match failure are there for convenience, because they don't make the system any more broken than it already has to be, without being quite a different system than Haskell.
You can program by throwing and catch exceptions if you really want, including catching the exceptions from error or pattern match failure, by using the facilities from Control.Exception.
In order to not break the purity of the system you can raise exceptions from anywhere (because the system always has to deal with the possibility of a function not properly terminating and producing a value; "raising an exception" is just another way in which that can happen), but exceptions can only be caught by constructs in IO. Because the formal semantics of IO permit basically anything to happen (because it has to interface with the real world and there aren't really any hard restrictions we can impose on that from the definition of Haskell), we can also relax most of the rules we usually need for pure functions in Haskell and still have something that technically fits in the model of Haskell code being pure.
I haven't used this very much at all (usually I prefer to keep my error handling using things that are more well-defined in terms of Haskell's semantic model than the operational model of what IO does, which can be as simple as Maybe or Either), but you can read about it if you want to.

Why is the default catch-all not mandatory in a Haskell case statement?

Haskell has a reputation for being a safe language. One that generally speaking pushes more of the possible programming errors to compile-time errors, and less and run-time.
One example of this is the if expression. The else in the if is always mandatory. You need to cover off both possibilities. This is great because you have thought and covered off all possibilities of what will happen at runtime.
Now Haskell has a case expression. (This bears some similarity to switch statements in other OO and imperative languages - but Haskell adds a lot of richness in the type system).
describeList :: [a]
describeList xs = "The list is " ++ case cs of [] -> "empty."
[x] -> "a singleton list."
xs -> "a longer list."
But with the case expression, the default 'catch-all' is not mandatory.
To me this sounds like it would lead to runtime errors.
My question is: Why is the default catch-all not mandatory in a Haskell case statement?

Haskell is certainly safer than many other languages, especially mainstream ones, but it is far from 100% safe. Other unsafe features in Haskell include the ability to write infinite loops, like let x = x in x, or unsafePerformIO, or error, or undefined. I guess you could say it's just a tradeoff between convenience and safety.
In more complex code than your simple example, sometimes there's some cases that you know can't occur, and so you leave them out of your case expression, but Haskell's type system isn't strong enough for the compiler to be able know that those missing cases are impossible. (For example you might be computing some mathematical function that can only return the numbers 3, 4, or 5, but that fact is due to some difficult theorem that you happen to know about and which the compiler is not aware of.)
For what it's worth, you can pass -fwarn-incomplete-patterns to the ghc compiler to get a warning whenever at compile time you're missing a branch in a case statement, if you like. That's included in the -W option which turns on several commonly desirable compiler warnings.

A catch-all is not always needed. If one handles all cases explicitly (the pattern matching is exhaustive), a catch-all case would never be reached.
I can think of a serious drawback of a mandatory catch-all case: if one extends an ADT (algebraic data type), dependant code would still compile and the programmer is not reminded of that extra case they are supposed to handle.

What are Haskell's strictness points?

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated. So when must something be evaluated? There are points where Haskell must be strict. I call these "strictness points", although this particular term isn't as widespread as I had thought. According to me:
Reduction (or evaluation) in Haskell only occurs at strictness points.
So the question is: what, precisely, are Haskell's strictness points? My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points, but I don't really know why I know that.
(Also, if they're not called "strictness points", what are they called?)
I imagine a good answer will include some discussion about WHNF and so on. I also imagine it might touch on lambda calculus.
Edit: additional thoughts about this question.
As I've reflected on this question, I think it would be clearer to add something to the definition of a strictness point. Strictness points can have varying contexts and varying depth (or strictness). Falling back to my definition that "reduction in Haskell only occurs at strictness points", let us add to that definition this clause: "a strictness point is only triggered when its surrounding context is evaluated or reduced."
So, let me try to get you started on the kind of answer I want. main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. Main's depth is maximal: it must be fully evaluated. Main is usually composed of IO actions, which are also strictness points, whose context is main.
Now you try: discuss seq and pattern matching in these terms. Explain the nuances of function application: how is it strict? How is it not? What about deepseq? let and case statements? unsafePerformIO? Debug.Trace? Top-level definitions? Strict data types? Bang patterns? Etc. How many of these items can be described in terms of just seq or pattern matching?

A good place to start is by understanding this paper: A Natural Semantics for Lazy Evalution (Launchbury). That will tell you when expressions are evaluated for a small language similar to GHC's Core. Then the remaining question is how to map full Haskell to Core, and most of that translation is given by the Haskell report itself. In GHC we call this process "desugaring", because it removes syntactic sugar.
Well, that's not the whole story, because GHC includes a whole raft of optimisations between desugaring and code generation, and many of these transformations will rearrange the Core so that things get evaluated at different times (strictness analysis in particular will cause things to be evaluated earlier). So to really understand how your
program will be evaluated, you need to look at the Core produced by GHC.
Perhaps this answer seems a bit abstract to you (I didn't specifically mention bang patterns or seq), but you asked for something precise, and this is about the best we can do.

I would probably recast this question as, Under what circumstances will Haskell evaluate an expression? (Perhaps tack on a "to weak head normal form.")
To a first approximation, we can specify this as follows:
Executing IO actions will evaluate any expressions that they “need.” (So you need to know if the IO action is executed, e.g. it's name is main, or it is called from main AND you need to know what the action needs.)
An expression that is being evaluated (hey, that's a recursive definition!) will evaluate any expressions it needs.
From your intuitive list, main and IO actions fall into the first category, and seq and pattern matching fall into the second category. But I think that the first category is more in line with your idea of "strictness point", because that is in fact how we cause evaluation in Haskell to become observable effects for users.
Giving all of the details specifically is a large task, since Haskell is a large language. It's also quite subtle, because Concurrent Haskell may evaluate things speculatively, even though we end up not using the result in the end: this is a third breed of things that cause evaluation. The second category is quite well studied: you want to look at the strictness of the functions involved. The first category too can be thought to be a sort of "strictness", though this is a little dodgy because evaluate x and seq x $ return () are actually different things! You can treat it properly if you give some sort of semantics to the IO monad (explicitly passing a RealWorld# token works for simple cases), but I don't know if there's a name for this sort of stratified strictness analysis in general.

C has the concept of sequence points, which are guarantees for particular operations that one operand will be evaluated before the other. I think that's the closest existing concept, but the essentially equivalent term strictness point (or possibly force point) is more in line with Haskell thinking.
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict (So trying a pattern match forces evaluation to happen at least far enough to accept or reject the match.
…
Programmers can also use the seq primitive to force an expression to evaluate regardless of whether the result will ever be used.
$! is defined in terms of seq.
—Lazy vs. non-strict.
So your thinking about !/$! and seq is essentially right, but pattern matching is subject to subtler rules. You can always use ~ to force lazy pattern matching, of course. An interesting point from that same article:
The strictness analyzer also looks for cases where sub-expressions are always required by the outer expression, and converts those into eager evaluation. It can do this because the semantics (in terms of "bottom") don't change.
Let's continue down the rabbit hole and look at the docs for optimisations performed by GHC:
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
—GHC Optimisations: Strictness Analysis.
In other words, strict code may be generated anywhere as an optimisation, because creating thunks is unnecessarily expensive when the data will always be needed (and/or may only be used once).
…no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form…
—Wikibooks Haskell: Laziness
(A term is in head normal form if there is no beta-redex in head position1. A redex is a head redex if it is preceded only by lambda abstractors of non-redexes 2.) So when you start to force a thunk, you're working in WHNF; when there are no more thunks left to force, you're in normal form. Another interesting point:
…if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it…
Which naturally implies that, indeed, any IO action performed from main does force evaluation, which should be obvious considering that Haskell programs do, in fact, do things. Anything that needs to go through the sequence defined in main must be in normal form and is therefore subject to strict evaluation.
C. A. McCann got it right in the comments, though: the only thing that's special about main is that main is defined as special; pattern matching on the constructor is sufficient to ensure the sequence imposed by the IO monad. In that respect only seq and pattern-matching are fundamental.

Haskell is AFAIK not a pure lazy language, but rather a non-strict language. This means that it does not necessarily evaluate terms at the last possible moment.
A good source for haskell's model of "lazyness" can be found here: http://en.wikibooks.org/wiki/Haskell/Laziness
Basically, it is important to understand the difference between a thunk and the weak header normal form WHNF.
My understanding is that haskell pulls computations through backwards as compared to imperative languages. What this means is that in the absence of "seq" and bang patterns, it will ultimately be some kind of side effect that forces the evaluation of a thunk, which may cause prior evaluations in turn (true lazyness).
As this would lead to a horrible space leak, the compiler then figures out how and when to evaluate thunks ahead of time to save space. The programmer can then support this process by giving strictness annotations (en.wikibooks.org/wiki/Haskell/Strictness , www.haskell.org/haskellwiki/Performance/Strictness) to further reduce space usage in form of nested thunks.
I am not an expert in the operational semantics of haskell, so I will just leave the link as a resource.
Some more resources:
http://www.haskell.org/haskellwiki/Performance/Laziness
http://www.haskell.org/haskellwiki/Haskell/Lazy_Evaluation

Lazy doesn't mean do nothing. Whenever your program pattern matches a case expression, it evaluates something -- just enough anyway. Otherwise it can't figure out which RHS to use. Don't see any case expressions in your code? Don't worry, the compiler is translating your code to a stripped down form of Haskell where they are hard to avoid using.
For a beginner, a basic rule of thumb is let is lazy, case is less lazy.

This is not a full answer aiming for karma, but just a piece of the puzzle -- to the extent that this is about semantics, bear in mind that there are multiple evaluation strategies that provide the same semantics. One good example here -- and the project also speaks to how we typically think of Haskell semantics -- was the Eager Haskell project, which radically altered evaluation strategies while maintaining the same semantics: http://csg.csail.mit.edu/pubs/haskell.html

The Glasgow Haskell compiler translates your code into a Lambda-calculus-like language called core. In this language, something is going to be evaluated, whenever you pattern match it by a case-statement. Thus if a function is called, the outermost constructor and only it (if there are no forced fields) is going to be evaluated. Anything else is canned in a thunk. (Thunks are introduced by let bindings).
Of course this is not exactly what happens in the real language. The compiler convert Haskell into Core in a very sophisticated way, making as many things as possibly lazy and anything that is always needed lazy. Additionally, there are unboxed values and tuples that are always strict.
If you try to evaluate a function by hand, you can basically think:
Try to evaluate the outermost constructor of the return.
If anything else is needed to get the result (but only if it's really needed) is also going to be evaluated. The order doesn't matters.
In case of IO you have to evaluate the results of all statements from the first to the last in that. This is a bit more complicated, since the IO monad does some tricks to force evaluation in a specific order.

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated.
No.
Haskell is not a lazy language
Haskell is a language in which evaluation order doesn't matter because there are no side effects.
It's not quite true that evaluation order doesn't matter, because the language allows for infinite loops. If you aren't careful, it's possible to get stuck in a cul-de-sac where you evaluate a subexpression forever when a different evaluation order would have led to termination in finite time. So it's more accurate to say:
Haskell implementations must evaluate the program in a way that terminates if there is any evaluation order that terminates. Only if every possible evaluation order fails to terminate can the implementation fail to terminate.
This still leaves implementations with a huge freedom in how they evaluate the program.
A Haskell program is a single expression, namely let {all top-level bindings} in Main.main. Evaluation can be understood as a sequence of reduction (small-)steps which change the expression (which represents the current state of the executing program).
You can divide reduction steps into two categories: those that are provably necessary (provably will be part of any terminating sequence of reductions), and those that aren't. You can divide the provably necessary reductions somewhat vaguely into two subcategories: those that are "obviously" necessary, and those that require some nontrivial analysis to prove them necessary.
Performing only obviously necessary reductions is what's called "lazy evaluation". I don't know whether a purely lazy evaluating implementation of Haskell has ever existed. Hugs may have been one. GHC definitely isn't.
GHC performs reduction steps at compile time that aren't provably necessary; for example, it will replace 1+2::Int with 3::Int even if it can't prove that the result will be used.
GHC may also perform not-provably-necessary reductions at run time in some circumstances. For example, when generating code to evaluate f (x+y), if x and y are of type Int and their values will be known at run time, but f can't be proven to use its argument, there is no reason not to compute x+y before calling f. It uses less heap space and less code space and is probably faster even if the argument isn't used. However, I don't know whether GHC actually takes these sorts of optimization opportunities.
GHC definitely performs evaluation steps at run time that are proven necessary only by fairly complex cross-module analyses. This is extremely common and may represent the bulk of the evaluation of realistic programs. Lazy evaluation is a last-resort fallback evaluation strategy; it isn't what happens as a rule.
There was an "optimistic evaluation" branch of GHC that did much more speculative evaluation at run time. It was abandoned because of its complexity and the ongoing maintenance burden, not because it didn't perform well. If Haskell was as popular as Python or C++ then I'm sure there would be implementations with much more sophisticated runtime evaluation strategies, maintained by deep-pocketed corporations. Non-lazy evaluation isn't a change to the language, it's just an engineering challenge.
Reduction is driven by top-level I/O, and nothing else
You can model interaction with the outside world by means of special side-effectful reduction rules like: "If the current program is of the form getChar >>= <expr>, then get a character from standard input and reduce the program to <expr> applied to the character you got."
The entire goal of the run time system is to evaluate the program until it has one of these side-effecting forms, then do the side effect, then repeat until the program has some form that implies termination, such as return ().
There are no other rules about what is reduced when. There are only rules about what can reduce to what.
For example, the only rules for if expressions are that if True then <expr1> else <expr2> can be reduced to <expr1>, if False then <expr1> else <expr2> can be reduced to <expr2>, and if <exc> then <expr1> else <expr2>, where <exc> is an exceptional value, can be reduced to an exceptional value.
If the expression representing the current state of your program is an if expression, you have no choice but to perform reductions on the condition until it's True or False or <exc>, because that's the only way you'll ever get rid of the if expression and have any hope of reaching a state that matches one of the I/O rules. But the language specification doesn't tell you to do that in so many words.
These sorts of implicit ordering constraints are the only way that evaluation can be "forced" to happen. This is a frequent source of confusion for beginners. For example, people sometimes try to make foldl more strict by writing foldl (\x y -> x `seq` x+y) instead of foldl (+). This doesn't work, and nothing like it can ever work, because no expression can make itself evaluate. The evaluation can only "come from above". seq is not special in any way in this regard.
Reduction happens everywhere
Reduction (or evaluation) in Haskell only occurs at strictness points. [...] My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points [...].
I don't see how to make sense of that statement. Every part of the program has some meaning, and that meaning is defined by reduction rules, so reduction happens everywhere.
To reduce a function application <expr1> <expr2>, you have to evaluate <expr1> until it has a form like (\x -> <expr1'>) or (getChar >>=) or something else that matches a rule. But for some reason function application doesn't tend to show up on lists of expressions that allegedly "force evaluation", while case always does.
You can see this misconception in a quote from the Haskell wiki, found in another answer:
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict
I don't understand what could qualify as a "purely lazy language" to whoever wrote that, except, perhaps, a language in which every program hangs because the runtime never does anything. If pattern matching is a feature of your language then you've got to actually do it at some point. To do it, you have to evaluate the scrutinee enough to determine whether it matches the pattern. That's the laziest way to match a pattern that is possible in principle.
~-prefixed patterns are often called "lazy" by programmers, but the language spec calls them "irrefutable". Their defining property is that they always match. Because they always match, you don't have to evaluate the scrutinee to determine whether they match or not, so a lazy implementation won't. The difference between regular and irrefutable patterns is what expressions they match, not what evaluation strategy you're supposed to use. The spec says nothing about evaluation strategies.
main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. [...] Main is usually composed of IO actions, which are also strictness points, whose context is main.
I'm not convinced that any of that has any meaning.
Main's depth is maximal: it must be fully evaluated.
No, main only has to be evaluated "shallowly", to make I/O actions appear at the top level. main is the entire program, and the program isn't completely evaluated on every run because not all code is relevant to every run (in general).
discuss seq and pattern matching in these terms.
I already talked about pattern matching. seq can be defined by rules that are similar to case and application: for example, (\x -> <expr1>) `seq` <expr2> reduces to <expr2>. This "forces evaluation" in the same way that case and application do. WHNF is just a name for what these expressions "force evaluation" to.
Explain the nuances of function application: how is it strict? How is it not?
It's strict in its left expression just like case is strict in its scrutinee. It's also strict in the function body after substitution just like case is strict in the RHS of the selected alternative after substitution.
What about deepseq?
It's just a library function, not a builtin.
Incidentally, deepseq is semantically weird. It should take only one argument. I think that whoever invented it just blindly copied seq with no understanding of why seq needs two arguments. I count deepseq's name and specification as evidence that a poor understanding of Haskell evaluation is common even among experienced Haskell programmers.
let and case statements?
I talked about case. let, after desugaring and type checking, is just a way of writing an arbitrary expression graph in tree form. Here's a paper about it.
unsafePerformIO?
To an extent it can be defined by reduction rules. For example, case unsafePerformIO <expr> of <alts> reduces to unsafePerformIO (<expr> >>= \x -> return (case x of <alts>)), and at the top level only, unsafePerformIO <expr> reduces to <expr>.
This doesn't do any memoization. You could try to simulate memoization by rewriting every unsafePerformIO expression to explicitly memoize itself, and creating the associated IORefs... somewhere. But you could never reproduce GHC's memoization behavior, because it depends on unpredictable details of the optimization process, and because it isn't even type safe (as shown by the infamous polymorphic IORef example in the GHC documentation).
Debug.Trace?
Debug.Trace.trace is just a simple wrapper around unsafePerformIO.
Top-level definitions?
Top-level variable bindings are the same as nested let bindings. data, class, import, and such are a whole different ball game.
Strict data types? Bang patterns?
Just sugar for seq.

What's the alternative to exceptions in a deep Haskell recursion?

I'm trying to learn Haskell by writing little programs... so I'm currently writing a lexer/parser for simple expressions. (Yes I could use Alex/Happy... but I want to learn the core language first).
My parser is essentially a set of recursive functions that build a Tree. In the case of syntax errors, I'd normally throw an exception (i.e. if I were writing in C#), but that seems to be discouraged in Haskell.
So what's the alternative? I don't really want to test for error states in every single bit of the parser. I want to either end up with a valid tree of nodes, or an error state with details.

Haskell provides the Maybe and Either types for this. Since you want to return an error state, Either seems to be what you want.

For a computation that might fail with details on why it failes, there's the type Either a b, e.g. Either ErrorDetails ParseTree, so your result could be Right theParseTree or Left ErrorDetails. You can pattern-match these constructors in your recursive function and if there's an error, you pass it on; otherwise you go on as usual.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string