Is everything in Haskell stored in thunks, even simple values?

Is everything in Haskell stored in thunks, even simple values? - haskell

What do the thunks for the following value/expression/function look like in the Haskell heap?
val = 5 -- is `val` a pointer to a box containing 5?
add x y = x + y
result = add 2 val
main = print $ result
Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.

Official answer
It's none of your business. Strictly implementation detail of your compiler.
Short answer
Yes.
Longer answer
To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.
For example, for '''add x y = x + y''', a compiler might generate code that works with thunks for x and y and constructs a thunk as a result.
But consider the following:
foo :: Int -> Int -> Int
foo x y = x * x + y * y
Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.
Advanced answer
This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster:
http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:
bottom :: Int
bottom = div 1 0
This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.
Consider now the add function:
add :: Int -> Int -> Int
add x y = x+y
A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create an (evaluated) thunk for the result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).
However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:
add :: Int -> Int -> Int
add (I# x) (I# y) = case# (x +# y) of z -> I# z
Internally, basic types like Int is seen as datatype with a single constructor. The type Int# is the "raw" machine type for integers and +# is the primitive addition on raw types.
Operations on raw types are implemented directly on bit-patterns (e.g. registers) --- not thunks. Boxing and unboxing are then translated as constructor application and pattern matching.
The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.

It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.
Optimizing Haskell compilers (GHC) perform Strictness analysis to figure out, which values will always be computed.
In the beginning, the compiler has to assume, that none of a function's arguments are ever used. Then it goes over the body of the function and tries to find functions applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.
In your example, we have the function (+) that is strict in both it's arguments. Thus the compiler knows that both x and y are always required to be evaluated at this point.
Now it just so happens, that the expression x+y is always necessary to compute the function's result, therefore the compiler can store the information that the function add is strict in both x and y.
The generated code for add* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.
Another example:
mkList x y =
if x then y : []
else []
This function will take x in evaluated form (as a boolean) and y as a thunk. The expression x needs to be evaluated in every possible execution path through mkList, thus we can have the caller evaluate it. The expression y, on the other hand, is never used in any function application that is strict in it's arguments. The cons-function : never looks at y it just stores it in a list. Thus y needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.
mkList False undefined -- absolutely legal
*: add is of course polymorphic and the exact type of x and y depends on the instantiation.

Short answer: Yes.
Long answer:
val = 5
This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):
val = undefined
If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.
For your second example, let me change it a little:
div x y = x / y
This value has to be stored in a thunk as well, because imagine some code like this:
average list =
if null list
then 0
else div (sum list) (length list)
If div was strict here, it would be evaluated even when the list is null (aka. empty), meaning that writing the function like this wouldn't work, because it wouldn't have a chance to return 0 when given the empty list, even though this is what we would want in this case.
Your final example is just a variation of example 1, and it has to be lazy for the same reasons.
All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.

I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:
When you are really desperate for speed, and you want to get right down to the “raw bits.” Please see GHC Primitives for some information about using unboxed types.
This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments. Strict and unpacked constructor fields can also help a lot. Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.
One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.
As mentioned, it's non-portable, so you need a GHC language extension. See here for their docs.

Related

What is the difference between normal and lambda Haskell functions?

I'm a beginner to Haskell and I've been following the e-book Get Programming with Haskell
I'm learning about closures with Lambda functions but I fail to see the difference in the following code:
genIfEven :: Integral p => p -> (p -> p) -> p
genIfEven x = (\f -> isEven f x)
genIfEven2 :: Integral p => (p -> p) -> p -> p
genIfEven2 f x = isEven f x
It would be great if anyone could explain what the precise difference here is

At a basic level1, there isn't really a difference between "normal" functions and ones created with lambda syntax. What made you think there was a difference to ask what it is? (In the particular example you've shown, the functions take their parameters in a different order, but are otherwise the same; either of them could be defined with lambda syntax or "normal" syntax)
Functions are first class values in Haskell. Which means you can pass them to other functions, return them as results, store and retrieve them in data structures, etc, etc. Just like you can with numbers, or strings, or any other value.
Just like with numbers, strings, etc, it's helpful to have syntax for denoting a function value, because you might want to make a simple one right in the middle of other code. It would be pretty horrible if you, say, needed to pass x + 1 to some function and you couldn't just write the literal 1 for the number one, you had to instead go elsewhere in the file and add a one = 1 binding so that you could come back and write x + one. In exactly the same way, you might need to pass to some other function a function for adding 1; it would be annoying to go elsewhere in the file and add a separate definition plusOne x = x + 1, so lambda syntax gives us a way of writing "function literals": \x -> x + 1.2
Considering "normal" function definition syntax, like this:
incrementAllBy _ [] = []
incrementAllBy n (x:xs) = (x + n) : xs
Here we don't have any bit of source code that just represents the function value that incrementAllBy is a name for. The function is implied in this syntax, spread over (possibly) multiple "rules" that say what value our function returns given that it is applied to arguments of a certain form. This syntax also fundamentally forces us to bind the function to a name. All of that is in contrast to lambda syntax which just directly represents the function itself, with no bundled case analysis or name.
However they are both merely different ways of writing down functions. Once defined, there is no difference between the functions, whichever syntax you used to express them.
You mentioned that you were learning about closures. It's not really clear how that relates to the question, but I can guess it's being a bit confusing.
I'm going to say something slightly controversial here: you don't need to learn about closures.3
A closure is what is involved in making something like incrementAllBy n xs = map (\x -> x + n) xs work. The function created here \x -> x + n depends on n, which is a parameter so it can be different every time incrementAllBy is called, and there can be multiple such calls running at the same time. So this \x -> x + n function can't end up as just a chunk of compiled code at a particular address in the program's binary, the way top-level functions are. The structure in memory that is passed to map has to either store a copy of n or store a reference to it. Such a structure is called a "closure", and is said to have "closed over" n, or "captured" it.
In Haskell, you don't need to know any of that. I view the expression \n -> x + n as simply creating a new function value, depending on the value n (and also the value +, which is a first-class value too!) that happens to be in scope. I don't think you need to think about this any differently than you would think about the expression x + n creating a new numeric value depending on a local n. Closures in Haskell only matter when you're trying to understand how the language is implemented, not when you're programming in Haskell.
Closures do matter in imperative languages. There the question of whether (the equivalent of) \x -> x + n stores a reference to n or a copy of n (and when the copy is taken, and how deeply) is critical to understanding how code using this function works, because n isn't just a way of referring to a value, it's a variable that has (potentially) different values over time.
But in Haskell I don't really think we should teach beginners about the term or concept of closures. It vastly over-complicates "you can make new functions out of existing values, even ones that are only locally in scope".
So if you have been given these two functions as examples to try and illustrate the concept of closures, and it isn't making sense to you what difference this "closure" makes, you can probably ignore the whole issue and move on to something more important.
1 Sometimes which choice of "equivalent" syntax you use to write your code does affect operational behaviour, like performance. Usually (but not always) the effect is negligible. As a beginner, I highly recommend ignoring such issues for now, so I haven't mentioned them. It's much easier to learn the principles involved in reasoning about how your code might be executed once you've got a thorough understanding of what all the language elements are supposed to mean.
It can sometimes also affect the way GHC infers types (mostly not actually the fact that they're lambdas, but if you bind function names without syntactic parameters as in plusOne = \x -> x + 1 you can trip up the monomorphism restriction, but that's another topic covered in many Stack Overflow questions, so I won't address it here).
2 In this case you could also use an operator section to write an even simpler function literal as (+1).
3 Now I'm going to teach you about closures so I can explain why you don't need to know about closures. :P

There is no difference whatsoever, except one: lambda expressions don't need a name. For that reason, they are sometimes called "anonymous functions" in other languages.
If you plan to use a function often, you'll want to give it a name, if you only need it once, a lambda will usually do, as you can define it at the site where you use it.
You can, of course, name an anonymous function after it is born:
genIfEven2 = \f x -> isEven f x
That would be completely equivalent to your definition.

Primitive types as expanded classes in Haskell?

In Eiffel one is allowed to use an expanded class which doesn't allocate from the heap. From a developer's perspective one rarely has to think about conversion from Int to Float as it is automatic. My question is this: Why did Haskell not choose a similar approach to modeling Num. Specifically, lets consider the Int instance. Here is the rationale for my question:
[1..3] = [1,2,3]
[1..3.5] = [1.0,2.0,3.0,4.0] -- rounds up
The second list was something that I was not expecting because there are by definition infinite floating point numbers between any two integers. Of course once we test the sequence it is clear that it is returning the floor of the floating point number rounded up. One of the reasons these conversions are needed is allow us to compute mean of a set of Integers for example.
In Eiffel the number type hierarchy is a bit more programmer friendly and the conversion happens as needed: for example creating a sequence can still be a set of Ints that result in a floating point mean. This has a readability advantage.
Is there a reason that expanded class was not implemented in Haskell?Any references will greatly help.
#ony: the point about parallel strategies: wont we face the same issue when using primitives? The manual does discourage using primitives and that makes sense to me in general where ever we can use primitives we probably need to use the abstract type. The issue I faced when trying to a mean of numbers is the missing Fractional Int instance and as to why does 5/3 not promote to a floating point instead of having to create floating point array to achieve the same result. There must be a reason as to why Fractional instance of Int and Integer is not defined? That could help me understand the rationale better.
#leftroundabout: the question is not about expanded classes per se but the convenience that such a feature can offer although that feature alone is not sufficient to handle the type promotion to float from an int for example as mentioned in my response to #ony. Lets take the classic example of a mean and try to define it as
> [Int] :: Double
let mean xs = sum xs / length xs (--not valid haskell code)
[Int] :: Double
let mean = sum xs / fromIntegral (length xs)
I would have liked it if I did not have to call the fromIntegral to get the mean function to work and that ties to the missing Fractional Int. Although the explanation seems to make sense, it has to, what I dont understand is if I am clear that I expect a double and I state it in my type signature is that not sufficient to do the appropriate conversion?

[a..b] is shorthand for enumFromTo a b, a method of the Enum typeclass. It begins at a and succs until the first time b is exceeded.
[a,b..c] is shorthand for enumFromThenTo a b c is similar to enumFromTo except instead of succing it adds the difference b-a each time. By default this difference is computed by roundtripping through Int so fractional differences may or may not be respected. That said, Double works as you'd expect
Prelude> [0.0, 0.5.. 10]
[0.0,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0]
[a..] is shorthand for enumFrom a which just succs forever.
[a,b..] is shorthand for enumFromThen a b which just adds (b-a) forever.

As for behaviour #J.Abrahamson already replied. That's definition enumFromThenTo.
As for design...
Actually GHC have Float# that represents unboxed type (can be allocated anywhere, but value is strict).
Since Haskell is a lazy language it assumes that most of the values are not required initially, until they actually referred with a primitive with a strict arguments.
Consider length [2..10]. In this case without optimization Haskell may even avoid generation of numbers and simply build up a list (without values). Probably more useful example takeWhile (<100) [x*(x-1) | x <- [2..]].
But you shouldn't think that we have overhead here since you are writing in language that abstracts away all that stuff with thumbs (except of strict notation). Haskell compiler have to take this as a work for itself. I.e. when compiler will be able to tell that all elements of list will be referenced (transformed to normal form) and it decides to process it within one stack of returns it can allocate it on stack.
Also with such approach you can gain more out of your code by using multiple CPU cores. Imagine that using Strategies your list being processed on a different cores and thus they should share common data on heap (not on stack).

Can compilers deduce/prove mathematically?

I'm starting to learn functional programming language like Haskell, ML and most of the exercises will show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for( i in [1..10] )
sum += i
So that leads me to think why can't compiler know that this is Arithmetic Progression and use O(1) formula to calculate?
Especially for pure FP languages without side effect?
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?

Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only tries to do it when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where the a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that in the specific context of the library are both worth the effort and safe
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.

The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, before everything else, avoid to change the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Note notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
... j ...
}
or something similar, as that pattern does often occur in code working on array.

The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge required. For example, in this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sumand inline it. It could be required that (+) be commutative, but remember that +is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some non trivial properties about reverse.
But the point is:
You don't want to perform the compiler to do those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.

What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)

As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int
instance Num NInt where
N a + _ = N a
N b * _ = N b
... -- etc
and now sum . reverse :: Num [a] -> a does not equal sum :: Num [a] -> a since I can specialize each to [NInt] -> NInt where sum . reverse == sum clearly does not hold.
This is one general tension that exists around optimizing "complex" operations—you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimization which do exist are usually monomorphic and related to the structure of programs---it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
-> Commutative _+_
-> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream-fusion does). We also must have enough structure around to create something like the promise Commutative _+_---this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library else you would need to create that yourself. Finally, the richness of type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build-in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.

Fun fact: Given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this trivial question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.

What requirements does non-strict semantics of Haskell have on the evaluation strategy?

The Haskell language specification states that it is a non-strict language, but nothing about the evaluation strategy (like when and how an expression is evaluated, and to what level). It does mention the word "evaluate" several times when talking about pattern matching.
I have read a wonderful tutorial about lazy evaluation and weak head normal form, but it is just an implementation strategy of some compiler, which I should not depend on when writing codes.
I come from a strict language background and I just don't feel right if I don't understand how my codes are executed. I wonder why the language specification does not define the evaluation strategy.
I hope someone can enlighten me. Thanks!

I would argue that trying to care about evaluation order is counter productive in Haskell. Not only is the language designed to make evaluation order irrelevant but the evaluation order can jump all over the place in strange and confusing ways. Additionally, the implementation has substantial freedom to execute things differently[1] or to vastly restructure your program[2] if the end result is still the same so different parts of the program might be evaluated in different ways.
The only real restriction is what evaluation strategies you can't use. For example, you can't always use strict evaluation because that would cause valid programs to crash or enter infinite loops.
const 17 undefined
take 10 (repeat 17)
That said, if you really care, one valid strategy you could possibly use to implement all of Haskell is lazy evaluation with thunks. Each value is represented in a box that either contains the value or a thunk subroutine that can be used to compute the value when you finally need to use it.
So when you write
let xs = 1 : []
You would be kind of doing this:
x --> {thunk}
If you never inspect the contents of x then the thunk stays unevaluated. However, if you ever do some pattern matching on the thunk then you need to evaluate the thunk to see what branch you take:
case xs of
[] -> ...
(y:ys) -> ...
Now, x's thunk is evaluated and the resulting value is stored in the box in case you ever need it. This is to avoid needing to recompute the thunk. Note that in the following diagram I'm using Cons to stand for the : list constructor
x ---> {Cons {thunk} {thunk}}
^ ^
| |
y ys
Of course, just the presence of the pattern math isn't enough you need first be forced to evaluate that pattern match in the first place. Ultimately, this boils down to your main function needing to print a value or something like that.
Another thing to point out is that we didn't immediately evaluate the contents of the Cons constructor when we first evaluate it. You can check this by running a program that doesn't use the contents of the list:
length [undefined, undefined]
Of course, when we actually use y for something then its corresponding thunk gets evaluated.
Additionally, you can use bang patterns to mark constructor fields as strict so they get immediately evaluated when the constructor gets evaluated (I don't remember if you need to turn an extension for this though)
data LazyBox = LazyBox Int
data StrictBox = StrictBox !Int
case (LazyBox undefined) of (LazyBox _) -> 17 -- Gives 17
case (StrictBox undefined) of (StrictBox _) -> 17 -- *** Exception: Prelude.undefined
[1]: One important optimization that GHC does is using strict evaluation in the sections that the strictness analyzers determines to be strict.
[2]: One of the most radical examples would be deforestation.

What does sharing refer to in the implementation of a functional programming language

Sharing means that temporary data is stored if it is going to be used multiple times. That is, a function evaluates it's arguments only once. An example would be:
let x = sin x in x * x
What other features contribute to sharing and how would they interact with the need for practical programs to perform IO?

Sharing is about a kind of equality: pointer equality. In Haskell's value land (semantic interpretation) there is no such thing as sharing. Values can only be equal if they have instances of Eq and then "equality" is defined to be the binary relation (==).
Sharing thus escapes the semantic interpretation by referring to this underlying pointer equality which exists by virtue of implementation instead of semantics. This is a useful side effect sometimes, though. Unfortunately, since Haskell is defined by its semantic interpretation, use of sharing is undefined behavior. It's tied to a particular implementation. A few libraries use sharing and thus have behavior tied to GHC.
There is one built-in sharing mechanism, though. This is exposed by the StableName interface. We generate StableName a objects using makeStableName :: a -> IO (StableName a) and have an instance Eq (StableName a)—thus StableName introduces some kind of equality for any type.
StableName equality is almost pointer equality. To quote the Haddock documentation
If sn1 :: StableName and sn2 :: StableName and sn1 == sn2 then sn1 and sn2 were created by calls to makeStableName on the same object.
Note that this is just an if statement, not an if and only if. The fact that two things can be "pointer equivalent" but still have different stable names sometimes is (a) forced by the flexibility Haskell leaves to the GC and (b) a loophole that allows StableNames to exist in any Haskell implementation even if there's no such thing as "pointer equality" in the implementation whatsoever.
These StableNames still have no semantic meaning, but since they're introduced in the IO monad that's "OK". Instead, they might be said to implement an (ironically) unstable subset of the smallest (most specific) equality relation possible on any type.

The clearest example of sharing in functional programming comes from Clean, which is based on graph rewriting. There, a computation refers to a DAG, so we can view the expression (sin x) * (sin x) as
(*)
/ \
sin x sin x
Graph rewriting systems have an explicit notion of sharing, so we could express that computation as
(*)
/ \
\ /
sin x
pointing the multiplication node to the same node, thereby sharing the computation of sin x. Term rewriting systems do not have so explicit a notion of sharing, but the optimization is still relevant. In GHC, one can sometimes express sharing with local variables, e.g. rewriting
f x = (sin x) * (sin x)
into
f x = sinx * sinx
where sinx = sin x
But since the two are semantically equivalent, the compiler is free to implement both the same way, with or without sharing. By my understanding, the GHC will generally preserve sharing introduced with local variables and sometimes introduce it (adding sharing to the first), but with no formal expression of sharing in term rewriting systems either behaviour is implementation dependent (see tel's comment and answer).
Sharing touches IO because side-effecting values cannot be shared. If we consider an impure language, there is a difference between
(string-append (read-line)
(read-line))
and
(let ((s (read-line)))
(string-append s s))
The first executes the IO action twice, requesting two lines from the user and appending them; the second "shares" the IO action, executing it once and appending it to itself. In general, sharing a pure computation reduces execution time without changing the result of the program, while sharing a side-effecting value (one that may change over time, or interacts with the user) changes the result. For the compiler to automatically share computations, it needs to know that they are pure, and thus that reducing the number of evaluations does not matter. Clean does this with uniqueness types; an IO action has the type attribute UNQ, which tells the compiler that it should not be shared. Haskell does the same somewhat differently with the IO monad.

Your example is not an example of sharing -- it just multiplies a value with itself (and then throws the original value away).
Sharing is the case where some part of a data structure occurs twice in a larger data structure, or in different data structures. For example:
p = (1, 2)
pp = (p, p) -- p is shared between the first and second component of pp
xs = [1, 2, 3]
ys = 0::1::xs
zs = 5::xs -- ys and zs share the same tail
In memory, such sharing will result in a DAG structure.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string