Does Prolog use lazy evaluation like Haskell? [duplicate]

Does Prolog use lazy evaluation like Haskell? [duplicate] - haskell

Because Prolog uses chronological backtracking(from the Prolog Wikipedia page) even after an answer is found(in this example where there can only be one solution), would this justify Prolog as using eager evaluation?
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
With the following output:
?- sibling(sally, erica).
true ;
false.

To summarize the discussion with #WillNess below, yes, Prolog is strict. However, Prolog's execution model and semantics are substantially different from the languages that are usually labelled strict or non-strict. For more about this, see below.
I'm not sure the question really applies to Prolog, because it doesn't really have the kind of implicit evaluation ordering that other languages have. Where this really comes into play in a language like Haskell, you might have an expression like:
f (g x) (h y)
In a strict language like ML, there is a defined evaluation order: g x will be evaluated, then h y, and f (g x) (h y) last. In a language like Haskell, g x and h y will only be evaluated as required ("non-strict" is more accurate than "lazy"). But in Prolog,
f(g(X), h(Y))
does not have the same meaning, because it isn't using a function notation. The query would be broken down into three parts, g(X, A), h(Y, B), and f(A,B,C), and those constituents can be placed in any order. The evaluation strategy is strict in the sense that what comes earlier in a sequence will be evaluated before what comes next, but it is non-strict in the sense that there is no requirement that variables be instantiated to ground terms before evaluation can proceed. Unification is perfectly content to complete without having given you values for every variable. I am bringing this up because you have to break down a complex, nested expression in another language into several expressions in Prolog.
Backtracking has nothing to do with it, as far as I can tell. I don't think backtracking to the nearest choice point and resuming from there precludes a non-strict evaluation method, it just happens that Prolog's is strict.

That Prolog pauses after giving each of the several correct answers to a problem has nothing to do with laziness; it is a part of its user interaction protocol. Each answer is calculated eagerly.
Sometimes there will be only one answer but Prolog doesn't know that in advance, so it waits for us to press ; to continue search, in hopes of finding another solution. Sometimes it is able to deduce it in advance and will just stop right away, but only sometimes.
update:
Prolog does no evaluation on its own. All terms are unevaluated, as if "quoted" in Lisp.
Prolog will unfold your predicate definitions as written and is perfectly happy to keep your data structures full of unevaluated uninstantiated holes, if so entailed by your predicate definitions.
Haskell does not need any values, a user does, when requesting an output.
Similarly, Prolog produces solutions one-by-one, as per the user requests.
Prolog can even be seen to be lazier than Haskell where all arithmetic is strict, i.e. immediate, whereas in Prolog you have to explicitly request the arithmetic evaluation, with is/2.
So perhaps the question is ill-posed. Prolog's operations model is just too different. There are no "results" nor "functions", for one; but viewed from another angle, everything is a result, and predicates are "multi"-functions.

As it stands, the question is not correct in what it states. Chronological backtracking does not mean that Prolog will necessarily backtrack "in an example where there can be only one solution".
Consider this:
foo(a, 1).
foo(b, 2).
foo(c, 3).
?- foo(b, X).
X = 2.
?- foo(X, 2).
X = b.
So this is an example that does have only one solution and Prolog recognizes that, and does not attempt to backtrack. There are cases in which you can implement a solution to a problem in a way that Prolog will not recognize that there is only one logical solution, but this is due to the implementation and is not inherent to Prolog's execution model.
You should read up on Prolog's execution model. From the Wikipedia article which you seem to cite, "Operationally, Prolog's execution strategy can be thought of as a generalization of function calls in other languages, one difference being that multiple clause heads can match a given call. In that case, [emphasis mine] the system creates a choice-point, unifies the goal with the clause head of the first alternative, and continues with the goals of that first alternative." Read Sterling and Shapiro's "The Art of Prolog" for a far more complete discussion of the subject.

from Wikipedia I got
In eager evaluation, an expression is evaluated as soon as it is bound to a variable.
Then I think there are 2 levels - at user level (our predicates) Prolog is not eager.
But it is at 'system' level, because variables are implemented as efficiently as possible.
Indeed, attributed variables are implemented to be lazy, and are rather 'orthogonal' to 'logic' Prolog variables.

Related

Haskell pattern match "diverge" and ⊥

I'm trying to understand the Haskell 2010 Report section 3.17.2 "Informal Semantics of Pattern Matching". Most of it, relating to a pattern match succeeding or failing seems straightforward, however I'm having difficulty understanding the case which is described as the pattern match "diverging".
I'm semi-persuaded it means that the match algorithm does not "converge" to an answer (hence the match function never returns). But if doesn't return, then, how can it return a value, as suggested by the parenthetical "i.e. return ⊥"? And what does it mean to "return ⊥" anyway? How one handle that outcome?
Item 5 has the particularly confusing (to me) point "If the value is ⊥, the match diverges". Is this just saying that a value of ⊥ produces a match result of ⊥? (Setting aside that I don't know what that outcome means!)
Any illumination, possibly with an example, would be appreciated!
Addendum after a couple of lengthy answers:
Thanks Tikhon and all for your efforts.
It seems my confusion comes from there being two different realms of explanation: The realm of Haskell features and behaviors, and the realm of mathematics/semantics, and in Haskell literature these two are intermingled in an attempt to explain the former in terms of the latter, without sufficient signposts (to me) as to which elements belong to which.
Evidently "bottom" ⊥ is in the semantics domain, and does not exist as a value within Haskell (ie: you can't type it in, you never get a result that prints out as " ⊥").
So, where the explanation says a function "returns ⊥", this refers to a function that does any of a number of inconvenient things, like not terminate, throw an exception, or return "undefined". Is that right?
Further, those who commented that ⊥ actually is a value that can be passed around, are really thinking of bindings to ordinary functions that haven't yet actually been called upon to evaluate ("unexploded bombs" so to speak) and might never, due to laziness, right?

The value is ⊥, usually pronounced "bottom". It is a value in the semantic sense--it is not a normal Haskell value per se. It represents computations that do not produce a normal Haskell value: exceptions and infinite loops, for example.
Semantics is about defining the "meaning" of a program. In Haskell, we usually talk about denotational semantics, where the value is a mathematical object of some sort. The most trivial example would be that the expression 10 (but also the expression 9 + 1) have denotations of the number 10 (rather than the Haskell value 10). We usually write that ⟦9 + 1⟧ = 10 meaning that the denotation of the Haskell expression 9 + 1 is the number 10.
However, what do we do with an expression like let x = x in x? There is no Haskell value for this expression. If you tried to evaluate it, it would simply never finish. Moreover, it is not obvious what mathematical object this corresponds to. However, in order to reason about programs, we need to give some denotation for it. So, essentially, we just make up a value for all these computations, and we call the value ⊥ (bottom).
So ⊥ is just a way to define what a computation that doesn't return "means".
We also define other computations like undefined and error "some message" as ⊥ because they also do not have obvious normal values. So throwing an exception corresponds to ⊥. This is exactly what happens with a failed pattern match.
The usual way of thinking about this is that every Haskell type is "lifted"--it contains ⊥. That is, Bool corresponds to {⊥, True, False} rather than just {True, False}. This represents the fact that Haskell programs are not guaranteed to terminate and can have exceptions. This is also true when you define your own type--the type contains every value you defined for it as well as ⊥.
Interestingly, since Haskell is non-strict, ⊥ can exist in normal code. So you could have a value like Just ⊥, and if you never evaluate it, everything will work fine. A good example of this is const: const 1 ⊥ evaluates to 1. This works for failed pattern matches as well:
const 1 (let Just x = Nothing in x) -- 1
You should read the section on denotational semantics in the Haskell WikiBook. It's a very approachable introduction to the subject, which I personally find very fascinating.

Denotational semantics
So, briefly denotational semantics, which is where ⊥ lives, is a mapping from Haskell values to some other space of values. You do this to give meaning to programs in a more formal manner than just talking about what programs should do—you say that they must respect their denotational semantics.
So for Haskell, you often think about how Haskell expressions denote mathematical values. You often see Strachey brackets ⟦·⟧ to denote the "semantic mapping" from Haskell to Math. Finally, we want our semantic brackets to be compatible with semantic operations. For instance
⟦x + y⟧ = ⟦x⟧ + ⟦y⟧
where on the left side + is the Haskell function (+) :: Num a => a -> a -> a and on the right side it's the binary operation in a commutative group. While is cool, because then we know that we can use the properties from the semantic map to know how our Haskell functions should work. To wit, let's write the commutative property "in Math"
⟦x⟧ + ⟦y⟧ == ⟦y⟧ + ⟦x⟧
= ⟦x + y⟧ == ⟦y + x⟧
= ⟦x + y == y + x⟧
where the third step also indicates that the Haskell (==) :: Eq a => a -> a -> a ought to have the properties of a mathematical equivalence relationship.
Well, except...
Anyway, that's all well and good until we remember that computers are finite things and Maths don't much care about that (unless you're using intuitionistic logic, and then you get Coq). So, we have to take note of places where our semantics don't follow Math quite right. Here are three examples
⟦undefined⟧ = ??
⟦error "undefined"⟧ = ??
⟦let x = x in x⟧ = ??
This is where ⊥ comes into play. We just assert that so far as the denotational semantics of Haskell are concerned each of those examples might as well mean (the newly introduced Mathematical/semantic concept of) ⊥. What are the Mathematical properties of ⊥? Well, this is where we start to really dive into what the semantic domain is and start talking about monotonicity of functions and CPOs and the like. Essentially, though, ⊥ is a mathematical object which plays roughly the same game as non-termination does. To the point of view of the semantic model, ⊥ is toxic and it infects expressions with its toxic-nondeterminism.
But it's not a Haskell-the-language concept, just a Semantic-domain-of-the-language-Haskell thing. In Haskell we have undefined, error and infinite looping. This is important.
Extra-semantic behavior (side note)
So the semantics of ⟦undefined⟧ = ⟦error "undefined"⟧ = ⟦let x = x in x⟧ = ⊥ are clear once we understand the mathematical meanings of ⊥, but it's also clear that those each have different effects "in reality". This is sort of like "undefined behavior" of C... it's behavior that's undefined so far as the semantic domain is concerned. You might call it semantically unobservable.
So how does pattern matching return ⊥?
So what does it mean "semantically" to return ⊥? Well, ⊥ is a perfectly valid semantic value which has the infection property which models non-determinism (or asynchronous error throwing). From the semantic point of view it's a perfectly valid value which can be returned as is.
From the implementation point of view, you have a number of choices, each of which map to the same semantic value. undefined isn't quite right, nor is entering an infinite loop, so if you're going to pick a semantically undefined behavior you might as well pick one that's useful and throw an error
*** Exception: <interactive>:2:5-14: Non-exhaustive patterns in function cheers

What are super combinators and constant applicative forms?

I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.

These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. One a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
where thisIsACaf = [1..10::Int]
thisIsNotACaf = [1..x::Int]
thisIsAlsoNotACaf :: Num a => [a]
thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.
thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]

Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.

Difference between logic programming and functional programming

I have been reading many articles trying to understand the difference between functional and logic programming, but the only deduction I have been able to make so far is that logic programming defines programs through mathematical expressions. But such a thing is not associated with logic programming.
I would really appreciate some light being shed on the difference between functional and logic programming.

I wouldn't say that logic programming defines programs through mathematical expressions; that sounds more like functional programming. Logic programming uses logic expressions (well, eventually logic is math).
In my opinion, the major difference between functional and logic programming is the "building blocks": functional programming uses functions while logic programming uses predicates. A predicate is not a function; it does not have a return value. Depending on the value of it's arguments it may be true or false; if some values are undefined it will try to find the values that would make the predicate true.
Prolog in particular uses a special form of logic clauses named Horn clauses that belong to first order logic; Hilog uses clauses of higher order logic.
When you write a prolog predicate you are defining a horn clause:
foo :- bar1, bar2, bar3. means that foo is true if bar1, bar2 and bar3 is true.
note that I did not say if and only if; you can have multiple clauses for one predicate:
foo:-
bar1.
foo:-
bar2.
means that foo is true if bar1 is true or if bar2 is true
Some say that logic programming is a superset of functional programming since each function could be expressed as a predicate:
foo(x,y) -> x+y.
could be written as
foo(X, Y, ReturnValue):-
ReturnValue is X+Y.
but I think that such statements are a bit misleading
Another difference between logic and functional is backtracking. In functional programming once you enter the body of the function you cannot fail and move to the next definition. For example you can write
abs(x) ->
if x>0 x else -x
or even use guards:
abs(x) x>0 -> x;
abs(x) x=<0 -> -x.
but you cannot write
abs(x) ->
x>0,
x;
abs(x) ->
-x.
on the other hand, in Prolog you could write
abs(X, R):-
X>0,
R is X.
abs(X, R):-
R is -X.
if then you call abs(-3, R), Prolog would try the first clause, and fail when the execution reaches the -3 > 0 point but you wont get an error; Prolog will try the second clause and return R = 3.
I do not think that it is impossible for a functional language to implement something similar (but I haven't used such a language).
All in all, although both paradigms are considered declarative, they are quite different; so different that comparing them feels like comparing functional and imperative styles. I would suggest to try a bit of logic programming; it should be a mind-boggling experience. However, you should try to understand the philosophy and not simply write programs; Prolog allows you to write in functional or even imperative style (with monstrous results).

In a nutshell:
In functional programming, your program is a set of function definitions. The return value for each function is evaluated as a mathematical expression, possibly making use of passed arguments and other defined functions. For example, you can define a factorial function, which returns a factorial of a given number:
factorial 0 = 1 // a factorial of 0 is 1
factorial n = n * factorial (n - 1) // a factorial of n is n times factorial of n - 1
In logic programming, your program is a set of predicates. Predicates are usually defined as sets of clauses, where each clause can be defined using mathematical expressions, other defined predicates, and propositional calculus. For example, you can define a 'factorial' predicate, which holds whenever second argument is a factorial of first:
factorial(0, 1). // it is true that a factorial of 0 is 1
factorial(X, Y) :- // it is true that a factorial of X is Y, when all following are true:
X1 is X - 1, // there is a X1, equal to X - 1,
factorial(X1, Z), // and it is true that factorial of X1 is Z,
Y is Z * X. // and Y is Z * X
Both styles allow using mathematical expressions in the programs.

First, there are a lot of commonalities between functional and logic programming. That is, a lot of notions developed in one community can also be used in the other. Both paradigms started with rather crude implementations and strive towards purity.
But you want to know the differences.
So I will take Haskell on the one side and Prolog with constraints on the other. Practically all current Prolog systems offer constraints of some sort, like B, Ciao, ECLiPSe, GNU, IF, Scryer, SICStus, SWI, YAP, XSB. For the sake of the argument, I will use a very simple constraint dif/2 meaning inequality, which was present even in the very first Prolog implementation - so I will not use anything more advanced than that.
What functional programming is lacking
The most fundamental difference revolves around the notion of a variable. In functional programming a variable denotes a concrete value. This value must not be entirely defined, but only those parts that are defined can be used in computations. Consider in Haskell:
> let v = iterate (tail) [1..3]
> v
[[1,2,3],[2,3],[3],[],*** Exception: Prelude.tail: empty list
After the 4th element, the value is undefined. Nevertheless, you can use the first 4 elements safely:
> take 4 v
[[1,2,3],[2,3],[3],[]]
Note that the syntax in functional programs is cleverly restricted to avoid that a variable is left undefined.
In logic programming, a variable does not need to refer to a concrete value. So, if we want a list of 3 elements, we might say:
?- length(Xs,3).
Xs = [_A,_B,_C].
In this answer, the elements of the list are variables. All possible instances of these variables are valid solutions. Like Xs = [1,2,3]. Now, lets say that the first element should be different to the remaining elements:
?- length(Xs,3), Xs = [X|Ys], maplist(dif(X), Ys).
Xs = [X,_A,_B], Ys = [_A,_B], dif(X,_B), dif(X,_A).
Later on, we might demand that the elements in Xs are all equal. Before I write it out, I will try it alone:
?- maplist(=(_),Xs).
Xs = []
; Xs = [_A]
; Xs = [_A,_A]
; Xs = [_A,_A,_A]
; Xs = [_A,_A,_A,_A]
; ... .
See that the answers contain always the same variable? Now, I can combine both queries:
?- length(Xs,3), Xs = [X|Ys], maplist(dif(X), Ys), maplist(=(_),Xs).
false.
So what we have shown here is that there is no 3 element list where the first element is different to the other elements and all elements are equal.
This generality has permitted to develop several constraint languages which are offered as libraries to Prolog systems, the most prominent are CLPFD and CHR.
There is no straight forward way to get similar functionality in functional programming. You can emulate things, but the emulation isn't quite the same.
What logic programming is lacking
But there are many things that are lacking in logic programming that make functional programming so interesting. In particular:
Higher-order programming: Functional programming has here a very long tradition and has developed a rich set of idioms. For Prolog, the first proposals date back to the early 1980s, but it is still not very common. At least ISO Prolog has now the homologue to apply called call/2, call/3 ....
Lambdas: Again, it is possible to extend logic programming in that direction, the most prominent system is Lambda Prolog. More recently, lambdas have been developed also for ISO Prolog.
Type systems: There have been attempts, like Mercury, but it has not caught on that much. And there is no system with functionality comparable to type classes.
Purity: Haskell is entirely pure, a function Integer -> Integer is a function. No fine print lurking around. And still you can perform side effects. Comparable approaches are very slowly evolving.
There are many areas where functional and logic programming more or less overlap. For example backtracking and lazyness and list comprehensions, lazy evaluation and freeze/2, when/2, block. DCGs and monads. I will leave discussing these issues to others...

Logic programming and functional programming use different "metaphors" for computation. This often affects how you think about producing a solution, and sometimes means that different algorithms come naturally to a functional programmer than a logic programmer.
Both are based on mathematical foundations that provide more benefits for "pure" code; code that doesn't operate with side effects. There are languages for both paradigms that enforce purity, as well as languages that allow unconstrained side effects, but culturally the programmers for such languages tend to still value purity.
I'm going to consider append, a fairly basic operation in both logical and functional programming, for appending a list on to the end of another list.
In functional programming, we might consider append to be something like this:
append [] ys = ys
append (x:xs) ys = x : append xs ys
While in logic programming, we might consider append to be something like this:
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
These implement the same algorithm, and even work basically the same way, but they "mean" something very different.
The functional append defines the list that results from appending ys onto the end of xs. We think of append as a function from two lists to another list, and the runtime system is designed to calculate the result of the function when we invoke it on two lists.
The logical append defines a relationship between three lists, which is true if the third list is the elements of the first list followed by the elements of the second list. We think of append as a predicate that is either true or false for any 3 given lists, and the runtime system is designed to find values that will make this predicate true when we invoke it with some arguments bound to specific lists and some left unbound.
The thing that makes logical append different is you can use it to compute the list that results from appending one list onto another, but you can also use it to compute the list you'd need to append onto the end of another to get a third list (or whether no such list exists), or to compute the list to which you need to append another to get a third list, or to give you two possible lists that can be appended together to get a given third (and to explore all possible ways of doing this).
While equivalent in that you can do anything you can do in one in the other, they lead to different ways of thinking about your programming task. To implement something in functional programming, you think about how to produce your result from the results of other function calls (which you may also have to implement). To implement something in logic programming, you think about what relationships between your arguments (some of which are input and some of which are output, and not necessarily the same ones from call to call) will imply the desired relationship.

Prolog, being a logical language, gives you free backtracking, it's pretty noticeable.
To elaborate, and I precise that I'm in no way expert in any of the paradigms, it looks to me like logical programming is way better when it comes to solving things. Because that's precisely what the language does (that appears clearly when backtracking is needed for example).

I think the difference is this:
imperative programming=modelling actions
function programming=modelling reasoning
logic programming =modelling knowledge
choose what fits your mind best

functional programming:
when 6PM, light on.
logic programming:
when dark, light on.

What are Haskell's strictness points?

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated. So when must something be evaluated? There are points where Haskell must be strict. I call these "strictness points", although this particular term isn't as widespread as I had thought. According to me:
Reduction (or evaluation) in Haskell only occurs at strictness points.
So the question is: what, precisely, are Haskell's strictness points? My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points, but I don't really know why I know that.
(Also, if they're not called "strictness points", what are they called?)
I imagine a good answer will include some discussion about WHNF and so on. I also imagine it might touch on lambda calculus.
Edit: additional thoughts about this question.
As I've reflected on this question, I think it would be clearer to add something to the definition of a strictness point. Strictness points can have varying contexts and varying depth (or strictness). Falling back to my definition that "reduction in Haskell only occurs at strictness points", let us add to that definition this clause: "a strictness point is only triggered when its surrounding context is evaluated or reduced."
So, let me try to get you started on the kind of answer I want. main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. Main's depth is maximal: it must be fully evaluated. Main is usually composed of IO actions, which are also strictness points, whose context is main.
Now you try: discuss seq and pattern matching in these terms. Explain the nuances of function application: how is it strict? How is it not? What about deepseq? let and case statements? unsafePerformIO? Debug.Trace? Top-level definitions? Strict data types? Bang patterns? Etc. How many of these items can be described in terms of just seq or pattern matching?

A good place to start is by understanding this paper: A Natural Semantics for Lazy Evalution (Launchbury). That will tell you when expressions are evaluated for a small language similar to GHC's Core. Then the remaining question is how to map full Haskell to Core, and most of that translation is given by the Haskell report itself. In GHC we call this process "desugaring", because it removes syntactic sugar.
Well, that's not the whole story, because GHC includes a whole raft of optimisations between desugaring and code generation, and many of these transformations will rearrange the Core so that things get evaluated at different times (strictness analysis in particular will cause things to be evaluated earlier). So to really understand how your
program will be evaluated, you need to look at the Core produced by GHC.
Perhaps this answer seems a bit abstract to you (I didn't specifically mention bang patterns or seq), but you asked for something precise, and this is about the best we can do.

I would probably recast this question as, Under what circumstances will Haskell evaluate an expression? (Perhaps tack on a "to weak head normal form.")
To a first approximation, we can specify this as follows:
Executing IO actions will evaluate any expressions that they “need.” (So you need to know if the IO action is executed, e.g. it's name is main, or it is called from main AND you need to know what the action needs.)
An expression that is being evaluated (hey, that's a recursive definition!) will evaluate any expressions it needs.
From your intuitive list, main and IO actions fall into the first category, and seq and pattern matching fall into the second category. But I think that the first category is more in line with your idea of "strictness point", because that is in fact how we cause evaluation in Haskell to become observable effects for users.
Giving all of the details specifically is a large task, since Haskell is a large language. It's also quite subtle, because Concurrent Haskell may evaluate things speculatively, even though we end up not using the result in the end: this is a third breed of things that cause evaluation. The second category is quite well studied: you want to look at the strictness of the functions involved. The first category too can be thought to be a sort of "strictness", though this is a little dodgy because evaluate x and seq x $ return () are actually different things! You can treat it properly if you give some sort of semantics to the IO monad (explicitly passing a RealWorld# token works for simple cases), but I don't know if there's a name for this sort of stratified strictness analysis in general.

C has the concept of sequence points, which are guarantees for particular operations that one operand will be evaluated before the other. I think that's the closest existing concept, but the essentially equivalent term strictness point (or possibly force point) is more in line with Haskell thinking.
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict (So trying a pattern match forces evaluation to happen at least far enough to accept or reject the match.
…
Programmers can also use the seq primitive to force an expression to evaluate regardless of whether the result will ever be used.
$! is defined in terms of seq.
—Lazy vs. non-strict.
So your thinking about !/$! and seq is essentially right, but pattern matching is subject to subtler rules. You can always use ~ to force lazy pattern matching, of course. An interesting point from that same article:
The strictness analyzer also looks for cases where sub-expressions are always required by the outer expression, and converts those into eager evaluation. It can do this because the semantics (in terms of "bottom") don't change.
Let's continue down the rabbit hole and look at the docs for optimisations performed by GHC:
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
—GHC Optimisations: Strictness Analysis.
In other words, strict code may be generated anywhere as an optimisation, because creating thunks is unnecessarily expensive when the data will always be needed (and/or may only be used once).
…no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form…
—Wikibooks Haskell: Laziness
(A term is in head normal form if there is no beta-redex in head position1. A redex is a head redex if it is preceded only by lambda abstractors of non-redexes 2.) So when you start to force a thunk, you're working in WHNF; when there are no more thunks left to force, you're in normal form. Another interesting point:
…if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it…
Which naturally implies that, indeed, any IO action performed from main does force evaluation, which should be obvious considering that Haskell programs do, in fact, do things. Anything that needs to go through the sequence defined in main must be in normal form and is therefore subject to strict evaluation.
C. A. McCann got it right in the comments, though: the only thing that's special about main is that main is defined as special; pattern matching on the constructor is sufficient to ensure the sequence imposed by the IO monad. In that respect only seq and pattern-matching are fundamental.

Haskell is AFAIK not a pure lazy language, but rather a non-strict language. This means that it does not necessarily evaluate terms at the last possible moment.
A good source for haskell's model of "lazyness" can be found here: http://en.wikibooks.org/wiki/Haskell/Laziness
Basically, it is important to understand the difference between a thunk and the weak header normal form WHNF.
My understanding is that haskell pulls computations through backwards as compared to imperative languages. What this means is that in the absence of "seq" and bang patterns, it will ultimately be some kind of side effect that forces the evaluation of a thunk, which may cause prior evaluations in turn (true lazyness).
As this would lead to a horrible space leak, the compiler then figures out how and when to evaluate thunks ahead of time to save space. The programmer can then support this process by giving strictness annotations (en.wikibooks.org/wiki/Haskell/Strictness , www.haskell.org/haskellwiki/Performance/Strictness) to further reduce space usage in form of nested thunks.
I am not an expert in the operational semantics of haskell, so I will just leave the link as a resource.
Some more resources:
http://www.haskell.org/haskellwiki/Performance/Laziness
http://www.haskell.org/haskellwiki/Haskell/Lazy_Evaluation

Lazy doesn't mean do nothing. Whenever your program pattern matches a case expression, it evaluates something -- just enough anyway. Otherwise it can't figure out which RHS to use. Don't see any case expressions in your code? Don't worry, the compiler is translating your code to a stripped down form of Haskell where they are hard to avoid using.
For a beginner, a basic rule of thumb is let is lazy, case is less lazy.

This is not a full answer aiming for karma, but just a piece of the puzzle -- to the extent that this is about semantics, bear in mind that there are multiple evaluation strategies that provide the same semantics. One good example here -- and the project also speaks to how we typically think of Haskell semantics -- was the Eager Haskell project, which radically altered evaluation strategies while maintaining the same semantics: http://csg.csail.mit.edu/pubs/haskell.html

The Glasgow Haskell compiler translates your code into a Lambda-calculus-like language called core. In this language, something is going to be evaluated, whenever you pattern match it by a case-statement. Thus if a function is called, the outermost constructor and only it (if there are no forced fields) is going to be evaluated. Anything else is canned in a thunk. (Thunks are introduced by let bindings).
Of course this is not exactly what happens in the real language. The compiler convert Haskell into Core in a very sophisticated way, making as many things as possibly lazy and anything that is always needed lazy. Additionally, there are unboxed values and tuples that are always strict.
If you try to evaluate a function by hand, you can basically think:
Try to evaluate the outermost constructor of the return.
If anything else is needed to get the result (but only if it's really needed) is also going to be evaluated. The order doesn't matters.
In case of IO you have to evaluate the results of all statements from the first to the last in that. This is a bit more complicated, since the IO monad does some tricks to force evaluation in a specific order.

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated.
No.
Haskell is not a lazy language
Haskell is a language in which evaluation order doesn't matter because there are no side effects.
It's not quite true that evaluation order doesn't matter, because the language allows for infinite loops. If you aren't careful, it's possible to get stuck in a cul-de-sac where you evaluate a subexpression forever when a different evaluation order would have led to termination in finite time. So it's more accurate to say:
Haskell implementations must evaluate the program in a way that terminates if there is any evaluation order that terminates. Only if every possible evaluation order fails to terminate can the implementation fail to terminate.
This still leaves implementations with a huge freedom in how they evaluate the program.
A Haskell program is a single expression, namely let {all top-level bindings} in Main.main. Evaluation can be understood as a sequence of reduction (small-)steps which change the expression (which represents the current state of the executing program).
You can divide reduction steps into two categories: those that are provably necessary (provably will be part of any terminating sequence of reductions), and those that aren't. You can divide the provably necessary reductions somewhat vaguely into two subcategories: those that are "obviously" necessary, and those that require some nontrivial analysis to prove them necessary.
Performing only obviously necessary reductions is what's called "lazy evaluation". I don't know whether a purely lazy evaluating implementation of Haskell has ever existed. Hugs may have been one. GHC definitely isn't.
GHC performs reduction steps at compile time that aren't provably necessary; for example, it will replace 1+2::Int with 3::Int even if it can't prove that the result will be used.
GHC may also perform not-provably-necessary reductions at run time in some circumstances. For example, when generating code to evaluate f (x+y), if x and y are of type Int and their values will be known at run time, but f can't be proven to use its argument, there is no reason not to compute x+y before calling f. It uses less heap space and less code space and is probably faster even if the argument isn't used. However, I don't know whether GHC actually takes these sorts of optimization opportunities.
GHC definitely performs evaluation steps at run time that are proven necessary only by fairly complex cross-module analyses. This is extremely common and may represent the bulk of the evaluation of realistic programs. Lazy evaluation is a last-resort fallback evaluation strategy; it isn't what happens as a rule.
There was an "optimistic evaluation" branch of GHC that did much more speculative evaluation at run time. It was abandoned because of its complexity and the ongoing maintenance burden, not because it didn't perform well. If Haskell was as popular as Python or C++ then I'm sure there would be implementations with much more sophisticated runtime evaluation strategies, maintained by deep-pocketed corporations. Non-lazy evaluation isn't a change to the language, it's just an engineering challenge.
Reduction is driven by top-level I/O, and nothing else
You can model interaction with the outside world by means of special side-effectful reduction rules like: "If the current program is of the form getChar >>= <expr>, then get a character from standard input and reduce the program to <expr> applied to the character you got."
The entire goal of the run time system is to evaluate the program until it has one of these side-effecting forms, then do the side effect, then repeat until the program has some form that implies termination, such as return ().
There are no other rules about what is reduced when. There are only rules about what can reduce to what.
For example, the only rules for if expressions are that if True then <expr1> else <expr2> can be reduced to <expr1>, if False then <expr1> else <expr2> can be reduced to <expr2>, and if <exc> then <expr1> else <expr2>, where <exc> is an exceptional value, can be reduced to an exceptional value.
If the expression representing the current state of your program is an if expression, you have no choice but to perform reductions on the condition until it's True or False or <exc>, because that's the only way you'll ever get rid of the if expression and have any hope of reaching a state that matches one of the I/O rules. But the language specification doesn't tell you to do that in so many words.
These sorts of implicit ordering constraints are the only way that evaluation can be "forced" to happen. This is a frequent source of confusion for beginners. For example, people sometimes try to make foldl more strict by writing foldl (\x y -> x `seq` x+y) instead of foldl (+). This doesn't work, and nothing like it can ever work, because no expression can make itself evaluate. The evaluation can only "come from above". seq is not special in any way in this regard.
Reduction happens everywhere
Reduction (or evaluation) in Haskell only occurs at strictness points. [...] My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points [...].
I don't see how to make sense of that statement. Every part of the program has some meaning, and that meaning is defined by reduction rules, so reduction happens everywhere.
To reduce a function application <expr1> <expr2>, you have to evaluate <expr1> until it has a form like (\x -> <expr1'>) or (getChar >>=) or something else that matches a rule. But for some reason function application doesn't tend to show up on lists of expressions that allegedly "force evaluation", while case always does.
You can see this misconception in a quote from the Haskell wiki, found in another answer:
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict
I don't understand what could qualify as a "purely lazy language" to whoever wrote that, except, perhaps, a language in which every program hangs because the runtime never does anything. If pattern matching is a feature of your language then you've got to actually do it at some point. To do it, you have to evaluate the scrutinee enough to determine whether it matches the pattern. That's the laziest way to match a pattern that is possible in principle.
~-prefixed patterns are often called "lazy" by programmers, but the language spec calls them "irrefutable". Their defining property is that they always match. Because they always match, you don't have to evaluate the scrutinee to determine whether they match or not, so a lazy implementation won't. The difference between regular and irrefutable patterns is what expressions they match, not what evaluation strategy you're supposed to use. The spec says nothing about evaluation strategies.
main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. [...] Main is usually composed of IO actions, which are also strictness points, whose context is main.
I'm not convinced that any of that has any meaning.
Main's depth is maximal: it must be fully evaluated.
No, main only has to be evaluated "shallowly", to make I/O actions appear at the top level. main is the entire program, and the program isn't completely evaluated on every run because not all code is relevant to every run (in general).
discuss seq and pattern matching in these terms.
I already talked about pattern matching. seq can be defined by rules that are similar to case and application: for example, (\x -> <expr1>) `seq` <expr2> reduces to <expr2>. This "forces evaluation" in the same way that case and application do. WHNF is just a name for what these expressions "force evaluation" to.
Explain the nuances of function application: how is it strict? How is it not?
It's strict in its left expression just like case is strict in its scrutinee. It's also strict in the function body after substitution just like case is strict in the RHS of the selected alternative after substitution.
What about deepseq?
It's just a library function, not a builtin.
Incidentally, deepseq is semantically weird. It should take only one argument. I think that whoever invented it just blindly copied seq with no understanding of why seq needs two arguments. I count deepseq's name and specification as evidence that a poor understanding of Haskell evaluation is common even among experienced Haskell programmers.
let and case statements?
I talked about case. let, after desugaring and type checking, is just a way of writing an arbitrary expression graph in tree form. Here's a paper about it.
unsafePerformIO?
To an extent it can be defined by reduction rules. For example, case unsafePerformIO <expr> of <alts> reduces to unsafePerformIO (<expr> >>= \x -> return (case x of <alts>)), and at the top level only, unsafePerformIO <expr> reduces to <expr>.
This doesn't do any memoization. You could try to simulate memoization by rewriting every unsafePerformIO expression to explicitly memoize itself, and creating the associated IORefs... somewhere. But you could never reproduce GHC's memoization behavior, because it depends on unpredictable details of the optimization process, and because it isn't even type safe (as shown by the infamous polymorphic IORef example in the GHC documentation).
Debug.Trace?
Debug.Trace.trace is just a simple wrapper around unsafePerformIO.
Top-level definitions?
Top-level variable bindings are the same as nested let bindings. data, class, import, and such are a whole different ball game.
Strict data types? Bang patterns?
Just sugar for seq.

Why Haskell doesn't have a single element tuple?

I'm wondering why Haskell doesn't have a single element tuple. Is it just because nobody needed it so far, or any rational reasons? I found an interesting thread in a comment at the Real World Haskell's website http://book.realworldhaskell.org/read/types-and-functions.html#funcstypes.composite, and people guessed various reasons like:
No good syntax sugar.
It is useless.
You can think that a normal value like (1) is actually a single element tuple.
But does anyone know the reason except a guess?

There's a lib for that!
http://hackage.haskell.org/packages/archive/OneTuple/0.2.1/doc/html/Data-Tuple-OneTuple.html
Actually, we have a OneTuple we use all the time. It's called Identity, and is now used as the base of standard pure monads in the new mtl:
http://hackage.haskell.org/packages/archive/transformers/0.2.2.0/doc/html/Data-Functor-Identity.html
And it has an important use! By virtue of providing a type constructor of kind * -> *, it can be made an instance (a trival one, granted, though not the most trivial) of Monad, Functor, etc., which lets us use it as a base for transformer stacks.

The exact reason is because it's totally unnecessary. Why would you need a one-tuple if you can just have its value?
The syntax also tends to be a bit clunky. In Python, you can have one-tuples, but you need a trailing comma to distinguish it from a parenthesized expression:
onetuple = (3,)
All in all, there's no reason for it. I'm sure there's no "official" reason because the designers of Haskell probably never even considered a single element tuple because it has no use.
I don't know if you were looking for some reasons beyond the obvious, but in this case the obvious answer is the right one.

My answer is not exactly about Haskell semantics, but about the theoretical mathematical elegance of making a value the same as its one-tuple. (So this answer should not be taken as an explanation of the standard behavior expected of a Haskell implementation, because it isn't intended as such.)
In programming languages and computation models where all functions are curried, such as lambda-calculus and combinatory logic, every function has exactly one input argument and one output/return value. No more, no less.
When we want a particular function f to have more than one input argument – say 3 –, we simulate it under this curried regime by creating a 1-argument function that returns a 2-argument function. Thus, f x y z = ((f x) y) z, and f x would return a 2-argument function.
Likewise, sometimes we might want to return more than one value from a function. It is not literally possible under this semantics, but we can simulate it by returning a tuple. We can generalize this.
If, for uniformity, we constrain the only return value of any function to be an (n-)tuple, we are able to harmonize some interesting features of the unit value and of supposedly non-tuple return values with the features of tuples in general, as follows.
Let's adopt as the general syntax of n-tuples the following schema, where ci is the component with the index i:
Notice that n-tuples have delimiting parentheses in this syntax.
Under this schema, how would we represent a 0-tuple? Since it has no components, this degenerate case would be represented like this: ( ). This syntax precisely coincides with the syntax we adopt to represent the unit value. So, we are tempted to make the unit value the same as a 0-tuple.
What about a 1-tuple? It would have this representation: . Here a syntactical ambiguity would immediately arise: parentheses would be used in the language both as 1-tuple delimiters and as mere grouping of values or expressions. So, in a context where (v) appears, a compiler or interpreter would be unsure whether this is a 1-tuple with a component whose value is v, or just an isolated value v inside superfluous parentheses.
A way to solve this ambiguity is to force a value to be the same as the 1-tuple that would have it as its only component. Not much would be sacrificed, since the only non-empty projection we can perform on a 1-tuple is to obtain its only value.
For this to be consistently enforced, the syntactical consequence is that we would have to relax a bit our former requirement that delimiting parentheses are mandatory for all n-tuples: now they would be optional for 1-tuples, and mandatory for all other values of n. (Or we could require all values to be delimited by parentheses, but this would be inconvenient for practical use.)
In summary, under the interpretation that a 1-tuple is the same as its only component value, we could, by making syntactic puns with parentheses, consider all return values of functions in our programming language or computing model as n-tuples: the 0-tuple in the case of the unit type, 1-tuples in the case of ordinary/"atomic" values which we usually don't think of as tuples, and pairs/triples/quadruples/... for other kinds of tuples. This heterodox interpretation is mathematically parsimonious and uniform, is expressive enough to simulate functions with multiple input arguments and multiple return values, and is not incompatible with Haskell (in the sense that no harm is done if the programmer assumes this unofficial interpretation).
This was an argument by syntactic puns. Whether you are satisfied or not with it, we can do even better. A more principled argument can be taken from the mathematical theory of relations, by exploring the Cartesian product operation.
An (n-adic) relation is extensionally defined as a uniform set of (n-)tuples. (This characterization is fundamental to relational database theory and is therefore important knowledge for professional computer programmers.)
A dyadic relation – a set of pairs (2-tuples) – is a subset of the Cartesian product of 2 sets:
. For a homogeneous relation: .
A triadic relation – a set of triples (3-tuples) – is a subset of the Cartesian product of 3 sets:
. For a homogeneous relation: .
A monadic relation – a set of monuples (1-tuples) – is a subset of the Cartesian product of 1 set:
(by the usual mathematical convention).
As we can see, a monadic relation is just a set of atomic values! This means that a set of 1-tuples is a set of atomic values. Therefore, it is convenient to consider that 1-tuples and atomic values are the same thing.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string