Why does Haskell's seq function evaluate its first argument to WHNF? - haskell

I think that seq is hard to understand because it evaluates its first argument to WHNF.
If seq evaluated its first argument to normal form, it would be easier to understand.
What is the benefit of using WHNF over NF?
Why did the Haskell committee choose WHNF for seq?

First, Haskell's API has no preference between WHNF and NF: it provides both seq and deepseq.
So maybe your question is why WHNF exists at all? Simply put, it is the least evaluation possible. A full evaluation goes through these stages:
unevaluated (thunk)
WHNF: outermost constructor known
deeper evaluations
NF: the value is completely calculated
However, Haskell is lazy: it tries to do as little computation as possible to reach its final result. For example, to compute the length of a list l, Haskell must know whether l is [] or _ : t, and then recursively ask the same question about t. All it needs is the number of (:) constructors before the final []. This is, by definition, a chain of successive WHNFs, each of which only exposes the outermost constructor of a value.
The outermost constructor is the minimal information you must know about a value in order to actually do something with it, such as selecting a branch in a case expression, as length does just above. That is the advantage of WHNF over NF.
Note that for simple types like Int, each value is its own constructor (2, -7, 15, ...), so WHNF = NF for them.
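As a small illustration of that boundary, here is a hedged sketch using Debug.Trace to make element evaluation visible (the list xs and its trace messages are purely illustrative, and deepseq is assumed to come from the deepseq package):
import Control.DeepSeq (deepseq)
import Debug.Trace (trace)

xs :: [Int]
xs = [trace "first element computed" 1, trace "second element computed" 2]

main :: IO ()
main = do
  -- seq only needs the outermost (:) constructor, so neither element is
  -- computed and nothing is traced here.
  xs `seq` putStrLn "after seq"
  -- deepseq needs NF, so both elements are computed and both trace
  -- messages appear before this line prints.
  xs `deepseq` putStrLn "after deepseq"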
Finally, you rarely need to care about any of this: it is the compiler's job to execute your program as fast as possible. In the evaluation of foldl (+) 0 [1..100], if you try to figure out when the list [1..100] reaches NF you will be completely wrong. The compiler transforms that fold into something like this tail-recursive loop
let sum ret i = if i == 100
                then 100 + ret
                else sum (ret + i) (i + 1)
in  sum 0 1
which means it doesn't evaluate the list at all. Unfortunately there are situations where the compiler cannot know that a value will end up in NF anyway by the time the program produces its final result. Then it is a waste of time and RAM to keep it unevaluated (building a thunk has a cost): better to deepseq it right from the start. Or your compiler may have performance bugs and you must help it with seq or deepseq. A profiler will tell you which parts of your code are slow.
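A common concrete case of "help it with seq" is a left fold whose lazy accumulator piles up thunks; Data.List.foldl' is essentially foldl with a seq on the accumulator at every step. A minimal sketch (the function names here are mine):
import Data.List (foldl')

-- foldl delays every (+), building a long chain of thunks before anything
-- is added; foldl' forces the accumulator to WHNF at each step, which for
-- Int is the same as NF.
lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0
strictSum = foldl' (+) 0

main :: IO ()
main = print (strictSum [1..100])  -- 5050, computed in constant space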

Related

Does Haskell discard intermediary results during lazy evaluation?

If I define the Fibonacci sequence recursively:
fibo_lazy_list = 0 : 1 : zipWith (+) fibo_lazy_list (tail fibo_lazy_list)
Then ask for the first element above a given value, say:
print $ find (>100) fibo_lazy_list
I understand that Haskell evaluates only the elements which are required to get the printed results. But are they all kept in memory until the print ? Since only two elements of the list are required to compute the last one, does Haskell release the left-most elements or does the list keep growing in memory ?
It depends.
This is actually one of the trickiest things to get right in real-world Haskell code: avoiding memory leaks caused by holding on to data that was only supposed to be intermediate, but turns out to still be a dependency of some not-yet-evaluated thunk and therefore can't be garbage-collected.
In your example, the leading elements of fibo_lazy_list (BTW, please use camelCase rather than underscore_case in Haskell) will not be garbage-collected as long as fibo_lazy_list is referred to by something that could still be evaluated. As soon as it goes out of scope, that is no longer possible. So if you write it like this
print $ let fibo_lazy_list = 0 : 1 : zipWith (+) fibo_lazy_list (tail fibo_lazy_list)
        in  find (>100) fibo_lazy_list
then you can be pretty confident that the unused elements will be garbage collected, possibly before the one to be printed is even found.
If however fiboLazyList is defined at the top level and is a CAF (as it will be if its type is not polymorphic)
fiboLazyList :: [Integer]
fiboLazyList = 0 : 1 : zipWith (+) fiboLazyList (tail fiboLazyList)
main :: IO ()
main = do
  ...
  print $ find (>100) fiboLazyList
  ...
then you should expect all the leading elements to stay in memory even after the >100 element has been found.
Compiler optimisation may help here, and so can strictness annotations. But as I said, this is a bit of a pain in Haskell.
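For completeness, a hedged sketch of the "keep it local" approach from above (firstFibAbove is a name I made up; and note that with -O, GHC's full-laziness pass may still float the list out to the top level, so this is a heuristic rather than a guarantee):
import Data.List (find)

-- The list is defined inside the function, so nothing outside retains its
-- head and the already-consumed prefix can be garbage-collected.
firstFibAbove :: Integer -> Maybe Integer
firstFibAbove limit = find (> limit) fibs
  where
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = print (firstFibAbove 100)  -- Just 144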

GHC: Are there consistent rules for memoization for calls with fixed values?

In my quest to understand and harness GHC's automatic memoization, I've hit a wall: when pure functions are called with fixed values like fib 42, they are sometimes fast and sometimes slow when called again. It varies depending on whether they're called plainly, like fib 42, or implicitly through some arithmetic, e.g. (\x -> fib (x - 1)) 43. The cases seem to have no rhyme or reason, so I'll present them with the intention of asking what logic lies behind the behavior.
Consider a slow Fibonacci implementation, which makes it obvious when the memoization is working:
slow_fib :: Int -> Integer
slow_fib n = if n < 2 then 1 else (slow_fib (n - 1)) + (slow_fib (n - 2))
I tested three basic questions to see if GHC (version 8.2.2) will memoize calls with fixed args:
Can slow_fib access previous top-level calls to slow_fib?
Are previous results memoized for later non-trivial (e.g. math) top-level expressions?
Are previous results memoized for later identical top-level expressions?
The answers seem to be:
No
No
Yes [??]
The fact that the last case works is very confusing to me: if I can re-print the results, for example, then I should expect to be able to add them. Here's the code that shows this:
main = do
  -- 1. all three of these are slow, even though `slow_fib 37` is
  -- just the sum of the other two results. Definitely no memoization.
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37
  -- 2. also slow, definitely no memoization as well.
  putStrLn $ show $ (slow_fib 35) + (slow_fib 36) + (slow_fib 37)
  putStrLn $ show $ (slow_fib 35) + 1
  -- 3. all three of these are instant. Huh?
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37
Yet stranger, doing math on the results works when it is embedded in a recursive function, as in this Fibonacci variant that starts at fib 40:
let fib_plus_40 n = if n <= 0
                    then slow_fib 40
                    else (fib_plus_40 (n - 1)) + (fib_plus_40 (n - 2))
Shown by the following:
main = do
  -- slow as expected
  putStrLn $ show $ fib_plus_40 0
  -- instant. Why?!
  putStrLn $ show $ fib_plus_40 1
I can't find any reasoning for this in any of the explanations of GHC memoization, which typically hinge on explicitly named variables (e.g. here, here, and here). This is why I expected fib_plus_40 to fail to memoize.
To elaborate in case it wasn't clear from #amalloy's answer, the problem is that you're conflating two things here -- the implicit memoization-like-behavior (what people mean when they talk about Haskell's "automatic memoization", though it is not true memoization!) that results directly from thunk-based lazy evaluation, and a compiler optimization technique that's basically a form of common subexpression elimination. The former is predictable, more or less; the latter is at the whim of the compiler.
Recall that real memoization is a property of the implementation of a function: the function "remembers" results calculated for certain combinations of arguments, and may reuse those results instead of recalculating them from scratch when called multiple times with the same arguments. When GHC generates code for functions, it does not automatically generate code to perform this kind of memoization.
Instead, the code GHC generates to implement function application is unusual. Rather than actually applying the function to its arguments to produce the final result as a value, a "result" is immediately constructed in the form of a thunk, which you can view as a suspended function call or a "promise" to deliver a value at a later time.
When, at some future point, the actual value is needed, the thunk is forced (which actually causes the original function call to take place), and the thunk is updated with the value. If that same value is needed again later, the value is already available, so the thunk doesn't need to be forced a second time. This is the "automatic memoization". Note that it takes place at the "result" level rather than the "function" level -- the result of a function application remembers its value; a function does not remember the results it previously produced.
Now, normally the concept of the result of a function application remembering its value would be ridiculous. In strict languages, we don't worry that after x = sqrt(10), reusing x will cause multiple sqrt calls because x hasn't "memoized" its own value. That is, in strict languages, all function application results are "automatically memoized" in the same sense they are in Haskell.
The difference is lazy evaluation, which allows us to write something like:
stuff = map expensiveComputation [1..10000]
which returns a thunk immediately without performing any expensive computations. Afterwards:
f n = stuff !! n
magically creates a memoized function, not because GHC generates code in the implementation of f to somehow memoize the call f 1000, but because f 1000 forces (a bunch of list constructor thunks and then) a single expensiveComputation whose return value is "memoized" as the value at index 1000 in the list stuff -- it was a thunk, but after being forced, it remembers its own value, just like any value in a strict language would.
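A hedged sketch of that behaviour, with a stand-in expensiveComputation that announces (via Debug.Trace) when it actually runs:
import Debug.Trace (trace)

-- A stand-in for some expensive function; the trace shows when it runs.
expensiveComputation :: Int -> Int
expensiveComputation n = trace ("computing " ++ show n) (n * n)

stuff :: [Int]
stuff = map expensiveComputation [1..10000]

main :: IO ()
main = do
  print (stuff !! 1000)  -- traces "computing 1001": the thunk is forced now
  print (stuff !! 1000)  -- no trace: the list cell already holds its value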
So, given your definition of slow_fib, none of your examples are actually making use of Haskell's automatic memoization, in the usual sense people mean. Any speedups you're seeing are the result of various compiler optimizations that are (or aren't) recognizing common subexpressions or inlining / unwrapping short loops.
To write a memoized fib, you need to do it as explicitly as you would in a strict language, by creating a data structure to hold the memoized values, though lazy evaluation and mutually recursive definitions can sometimes make it seem like it's "automatic":
import qualified Data.Vector as V
import Data.Vector (Vector,(!))
fibv :: Vector Integer
fibv = V.generate 1000000 getfib
  where getfib 0 = 1
        getfib 1 = 1
        getfib i = fibv ! (i-1) + fibv ! (i-2)
fib :: Int -> Integer
fib n = fibv ! n
All of the examples you link at the end exploit the same technique: instead of implementing function f directly, they first introduce a list whose contents are all the calls to f that could ever be made. That list is computed only once, lazily; and then a simple lookup in that list is used as the implementation of the user-facing function. So, they are not relying on any caching from GHC.
Your question is different: you hope that calling some function will be automatically cached for you, and in general that does not happen. The real question is why any of your results are fast. I'm not sure, but I think it has to do with Constant Applicative Forms (CAFs), which GHC may share between multiple use sites, at its discretion.
The most relevant feature of a CAF here is the "Constant" part: GHC will only introduce such a cache for an expression whose value is constant throughout the entire run of the program, not just for some particular scope. So, you can be sure that f x <> f x will never reuse the result of f x (at least not due to CAF folding; maybe GHC can find some other excuse to memoize this for some functions, but typically it does not).
The two things in your program that are not CAFs are the implementation of slow_fib and the recursive case of fib_plus_40. GHC definitely cannot introduce any caching of the results of those expressions. The base case of fib_plus_40 is a CAF, as are all of the expressions and subexpressions in main. So GHC can choose to cache/share any of those subexpressions, or not share them, as it pleases. Perhaps it sees that slow_fib 40 is "obviously" simple enough to save, but it's not so sure about whether the slow_fib 35 expressions in main should be shared. Meanwhile, it sounds like it does decide to share the IO action putStrLn $ show $ slow_fib 35 for whatever reason. Seems like a weird choice to you and me, but we're not compilers.
The moral here is that you cannot count on this at all: if you want to ensure you compute a value only once, you need to save it in a variable somewhere, and refer to that variable instead of recomputing it.
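A minimal sketch of that moral, reusing slow_fib from the question (the binding f35 is mine):
main :: IO ()
main = do
  let f35 = slow_fib 35   -- a thunk; forced at most once, on first use
  print f35
  print (f35 + 1)         -- reuses the already-computed value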
To confirm this, I took luqui's advice and looked at the -ddump-simpl output. Here are some snippets showing the explicit caching:
-- RHS size: {terms: 2, types: 0, coercions: 0}
lvl1_r4ER :: Integer
[GblId, Str=DmdType]
lvl1_r4ER = $wslow_fib_r4EP 40#
Rec {
-- RHS size: {terms: 21, types: 4, coercions: 0}
Main.main_fib_plus_40 [Occ=LoopBreaker] :: Integer -> Integer
[GblId, Arity=1, Str=DmdType <S,U>]
Main.main_fib_plus_40 =
\ (n_a1DF :: Integer) ->
case integer-gmp-1.0.0.1:GHC.Integer.Type.leInteger#
n_a1DF Main.main7
of wild_a2aQ { __DEFAULT ->
case GHC.Prim.tagToEnum# @ Bool wild_a2aQ of _ [Occ=Dead] {
False ->
integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
(Main.main_fib_plus_40
(integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
n_a1DF Main.main4))
(Main.main_fib_plus_40
(integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
n_a1DF lvl_r4EQ));
True -> lvl1_r4ER
}
}
end Rec }
This doesn't tell us why GHC chooses to introduce this cache - remember, it's allowed to do what it wants. But it does confirm the mechanism: it introduces a variable to hold the repeated calculation. I can't show you Core for your longer main involving smaller numbers, because when I compile it I get even more sharing: the expressions in section 2 are cached for me as well.

Does a function in Haskell always evaluate its return value?

I'm trying to better understand Haskell's laziness, such as when it evaluates an argument to a function.
From this source:
But when a call to const is evaluated (that’s the situation we are interested in, here, after all), its return value is evaluated too ... This is a good general principle: a function obviously is strict in its return value, because when a function application needs to be evaluated, it needs to evaluate, in the body of the function, what gets returned. Starting from there, you can know what must be evaluated by looking at what the return value depends on invariably. Your function will be strict in these arguments, and lazy in the others.
So a function in Haskell always evaluates its own return value? If I have:
foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs
head (foo [1..]) -- = 4
According to the above paragraph, map (* 2) xs must be evaluated. Intuitively, I would think that means applying the map to the entire list, resulting in an infinite loop.
But, I can successfully take the head of the result. I know that : is lazy in Haskell, so does this mean that evaluating map (* 2) xs just means constructing something else that isn't fully evaluated yet?
What does it mean to evaluate a function applied to an infinite list? If the return value of a function is always evaluated when the function is evaluated, can a function ever actually return a thunk?
Edit:
bar x y = x
var = bar (product [1..]) 1
This code doesn't hang. When I create var, does it not evaluate its body? Or does it bind var to product [1..] without evaluating it? If the latter, bar is not returning its body in WHNF, right? So did it really 'evaluate' x? How can bar be strict in x if it doesn't hang on computing product [1..]?
First of all, Haskell does not specify when evaluation happens so the question can only be given a definite answer for specific implementations.
The following is true for all non-parallel implementations that I know of, like ghc, hbc, nhc, hugs, etc (all G-machine based, btw).
BTW, something to remember is that when you hear "evaluate" for Haskell it normally means "evaluate to WHNF".
Unlike in strict languages, you have to distinguish between two "callers" of a function: the first is where the call occurs lexically, and the second is where the value is demanded. For a strict language these two always coincide, but not for a lazy one.
Let's take your example and complicate it a little:
foo [] = []
foo (_:xs) = map (* 2) xs
bar x = (foo [1..], x)
main = print (head (fst (bar 42)))
The foo function occurs in bar. Evaluating bar will return a pair, and the first component of the pair is a thunk corresponding to foo [1..]. So bar is what would be the caller in a strict language, but in the case of a lazy language it doesn't call foo at all, instead it just builds the closure.
Now, in the main function we actually need the value of head (fst (bar 42)), since we have to print it. So the head function will actually be called. The head function is defined by pattern matching, so it needs the value of its argument; thus fst is called. It too is defined by pattern matching and needs its argument, so bar is called, and bar returns a pair, of which fst evaluates and returns the first component.
And now, finally, foo is "called"; and by called I mean that the thunk is evaluated (entered, as it's sometimes called in TIM terminology), because its value is needed. The only reason the actual code for foo runs is that we want a value, so foo had better return a value (i.e., a WHNF). The foo function will evaluate its argument and end up in the second branch. Here it will tail call into the code for map. The map function is defined by pattern matching, and it will evaluate its argument, which is a cons. So map will return {(*2) y} : {map (*2) ys}, where I have used {} to indicate a closure being built. As you can see, map just returns a cons cell with the head being a closure and the tail being a closure.
To understand the operational semantics of Haskell better I suggest you look at some paper describing how to translate Haskell to some abstract machine, like the G-machine.
I always found that the term "evaluate," which I had learned in other contexts (e.g., Scheme programming), got me confused when I tried to apply it to Haskell, and that I made a breakthrough when I started to think of Haskell in terms of forcing expressions instead of "evaluating" them. Some key differences:
"Evaluation," as I learned the term before, strongly connotes mapping expressions to values that are themselves not expressions. (One common technical term here is "denotations.")
In Haskell, the process of forcing is IMHO most easily understood as expression rewriting. You start with an expression, and you repeatedly rewrite it according to certain rules until you get an equivalent expression that satisfies a certain property.
In Haskell the "certain property" has the unfriendly name weak head normal form ("WHNF"), which for our purposes just means that the expression is either a nullary data constructor or an application of a data constructor. (Lambdas and partially applied functions are also in WHNF, but we can ignore them here.)
Let's translate that to a very rough set of informal rules. To force an expression expr:
If expr is a nullary constructor or a constructor application, the result of forcing it is expr itself. (It's already in WHNF.)
If expr is a function application f arg, then the result of forcing it is obtained this way:
Find the definition of f.
Can you pattern match this definition against the expression arg? If not, then force arg and try again with the result of that.
Substitute the pattern match variables in the body of f with the parts of (the possibly rewritten) arg that correspond to them, and force the resulting expression.
One way of thinking of this is that when you force an expression, you're trying to rewrite it minimally to reduce it to an equivalent expression in WHNF.
Let's apply this to your example:
foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs
-- We want to force this expression:
head (foo [1..])
We will need definitions for head and map:
head [] = undefined
head (x:_) = x
map _ [] = []
map f (x:xs) = f x : map f xs
-- Not real code, but a rule we'll be using for forcing infinite ranges.
[n..] ==> n : [(n+1)..]
So now:
head (foo [1..]) ==> head (foo (1 : [2..]))      -- using the forcing rule for [n..]
                 ==> head (map (*2) [2..])       -- using the definition of foo (the first element is dropped)
                 ==> head (map (*2) (2 : [3..])) -- using the forcing rule for [n..]
                 ==> head (2*2 : map (*2) [3..]) -- using the definition of map
                 ==> 2*2                         -- using the definition of head
                 ==> 4                           -- using the definition of *
I believe the idea must be that in a lazy language if you're evaluating a function application, it must be because you need the result of the application for something. So whatever reason caused the function application to be reduced in the first place is going to continue to need to reduce the returned result. If we didn't need the function's result we wouldn't be evaluating the call in the first place, the whole application would be left as a thunk.
A key point is that the standard "lazy evaluation" order is demand-driven. You only evaluate what you need. Evaluating more risks violating the language spec's definition of "non-strict semantics" and looping or failing for some programs that should be able to terminate; lazy evaluation has the interesting property that if any evaluation order can cause a particular program to terminate, so can lazy evaluation.1
But if we only evaluate what we need, what does "need" mean? Generally it means either
a pattern match needs to know what constructor a particular value is (e.g. I can't know what branch to take in your definition of foo without knowing whether the argument is [] or _:xs)
a primitive operation needs to know the entire value (e.g. the arithmetic circuits in the CPU can't add or compare thunks; I need to fully evaluate two Int values to call such operations)
the outer driver that executes the main IO action needs to know what the next thing to execute is
So say we've got this program:
foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs
main :: IO ()
main = print (head (foo [1..]))
To execute main, the IO driver has to evaluate the thunk print (head (foo [1..])) to work out that it's print applied to the thunk head (foo [1..]). print needs to evaluate its argument in order to print it, so now we need to evaluate that thunk.
head starts by pattern matching its argument, so now we need to evaluate foo [1..], but only to WHNF - just enough to tell whether the outermost list constructor is [] or :.
foo starts by pattern matching on its argument. So we need to evaluate [1..], also only to WHNF. That's basically 1 : [2..], which is enough to see which branch to take in foo.2
The : case of foo (with xs bound to the thunk [2..]) evaluates to the thunk map (*2) [2..].
So foo is evaluated, and didn't evaluate its body. However, we only did that because head was pattern matching to see if we had [] or x : _. We still don't know that, so we must immediately continue to evaluate the result of foo.
This is what the article means when it says functions are strict in their result. Given that a call to foo is evaluated at all, its result will also be evaluated (and so, anything needed to evaluate the result will also be evaluated).
But how far it needs to be evaluated depends on the calling context. head is only pattern matching on the result of foo, so it only needs a result to WHNF. We can get an infinite list to WHNF (we already did so, with 1 : [2..]), so we don't necessarily get in an infinite loop when evaluating a call to foo. But if head were some sort of primitive operation implemented outside of Haskell that needed to be passed a completely evaluated list, then we'd be evaluating foo [1..] completely, and thus would never finish in order to come back to head.
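To make that last contrast concrete, here is a hypothetical strictHead built with deepseq (nothing in the answer uses this; it only illustrates a consumer that demands NF, and it assumes the deepseq package is available):
import Control.DeepSeq (deepseq)

foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs

-- A strawman "head" that insists on the whole list being in NF first.
strictHead :: [Integer] -> Integer
strictHead ys = ys `deepseq` head ys

main :: IO ()
main = do
  print (head (foo [1..]))          -- terminates: WHNF of the list suffices
  -- print (strictHead (foo [1..])) -- would never terminate: it demands NF
                                    -- of an infinite list before taking head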
So, just to complete my example, we're evaluating map (2 *) [2..].
map pattern matches its second argument, so we need to evaluate [2..] as far as 2 : [3..]. That's enough for map to return the thunk (2 *) 2 : map (2 *) [3..], which is in WHNF. And so it's done, we can finally return to head.
head ((2 *) 2 : map (2 *) [3..]) doesn't need to inspect either side of the :, it just needs to know that there is one so it can return the left side. So it just returns the unevaluated thunk (2 *) 2.
Again though, we only evaluated the call to head this far because print needed to know what its result is, so although head doesn't evaluate its result, its result is always evaluated whenever the call to head is.
(2 *) 2 evaluates to 4, print converts that into the string "4" (via show), and the line gets printed to the output. That was the entire main IO action, so the program is done.
1 Implementations of Haskell, such as GHC, do not always use "standard lazy evaluation", and the language spec does not require it. If the compiler can prove that something will always be needed, or cannot loop/error, then it's safe to evaluate it even when lazy evaluation wouldn't (yet) do so. This can often be faster so GHC optimizations do actually do this.
2 I'm skipping over a few details here, like that print does have some non-primitive implementation we could step inside and lazily evaluate, and that [1..] could be further expanded to the functions that actually implement that syntax.
Not necessarily. Haskell is lazy, meaning that it only evaluates when it needs to. This has some interesting effects. If we take the below code, for example:
-- File: lazinessTest.hs
(>?) :: a -> b -> b
a >? b = b
main = (putStrLn "Something") >? (putStrLn "Something else")
This is the output of the program:
$ ./lazinessTest
Something else
This indicates that putStrLn "Something" is never evaluated. But it's still being passed to the function, in the form of a 'thunk'. These 'thunks' are unevaluated values that, rather than being concrete values, are like a breadcrumb-trail of how to compute the value. This is how Haskell laziness works.
In our case, two 'thunks' are passed to >?, but only one is passed out, meaning that only one is evaluated in the end. This also applies to const, where the second argument can be safely ignored and is therefore never computed. As for map, laziness means the rest of the list is never even looked at: only what is actually demanded gets computed, which in your case is the second element of the original list.
However, it's best to leave the thinking about laziness to the compiler and keep coding, unless you're dealing with IO, in which case you really, really should think about laziness, because you can easily go wrong, as I've just demonstrated.
There are lots and lots of online articles on the Haskell wiki to look at, if you want more detail.
A function's result can be an ordinary value:
head (x:_) = x
or an exception/error:
head _ = error "Head: List is empty!"
or bottom (⊥):
a = a
b = last [1 ..]

How often is an expanding list evaluated?

Is fib evaluated from the start for each element of cumfib?
fib = (1:1: zipWith (+) fib (tail fib))
cumfib = [ sum $ take i fib | i<-[1..]]
Or are the first i elements cached and reused for element (i+1) of cumfib?
I am more or less guessing that fib is used within the same lambda expression and hence is calculated only once.
Furthermore, does the implementation of fib matter for how often the i-th Fibonacci number is evaluated? My actual problem concerns prime numbers rather than Fibonacci numbers, which I wish to 'cache' in order to easily compute the prime factors of some number n. However, I only use
takeWhile (\x-> x*x<n) primes
of the primes. Since I evaluate the factors for small n first and later for bigger n, this subset of primes increases, and hence I wonder, how often is primes evaluated if I do:
primes = ... some way of calculating primes ...
helpHandlePrimes ... = ... using primes ...
handlePrimes = ... using primes and helpHandlePrimes ...
Please let me know whether primes evaluates once, multiple times, or whether this cannot be determined from how I formulated the question.
A let-bound term is usually shared within its scope. In particular, a top-level term in a module is shared in the entire program. However, you have to be careful about the type of the term. If the term is a function, then sharing means that just the lambda abstraction is shared, so the function isn't memoized. An overloaded term is internally represented as a function, and therefore sharing is rather meaningless for an overloaded term as well.
So if you have a monomorphic list of numbers, then it's going to be shared. By default, a list such as fib as you've given will be monomorphic, because of the "monomorphism restriction" (actually here's a case where it's useful). However, these days it's in fashion to disable the monomorphism restriction, so in any case I recommend giving an explicit type signature such as
fib :: [Integer]
to be sure and make it clear to everyone that you're expecting this to be a monomorphic list.
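For example, a hedged sketch of the same point for the primes case (the trial-division sieve here is only a stand-in for "some way of calculating primes", not the asker's actual definition):
-- With the explicit monomorphic signature this is a CAF, computed at most
-- once per program run and shared by every use site.
primes :: [Integer]
primes = sieve [2..]
  where
    sieve []     = []
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

-- Without the signature the definition could be generalised to
-- (Integral a) => [a]; an overloaded value is really a function of a class
-- dictionary, so each use site might rebuild the list from scratch.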
I'd like to add that, written this way, cumfib needlessly re-computes the sum of the first i elements of fib for every i. It can be defined more efficiently as
cumfib = tail $ scanl (+) 0 fib

Does this Haskell example effectively demonstrate laziness?

I'm new to Haskell, and I'm writing a paper on it for my Programming Languages class. I want to demonstrate Haskell's laziness with some sample code, but I'm not sure whether what I'm seeing is actually laziness.
doubleMe xs = [x*2 | x <- xs]
In ghci:
let lst = [1..10]
import Debug.Trace
trace (show lst) doubleMe (trace (show lst) doubleMe (trace (show lst) doubleMe(lst)))
Output:
[1,2,3,4,5,6,7,8,9,10]
[1,2,3,4,5,6,7,8,9,10]
[1,2,3,4,5,6,7,8,9,10]
[8,16,24,32,40,48,56,64,72,80]
Thanks for your time and help!
Your usage of trace here isn't particularly insightful, in fact it isn't insightful at all. All you do is print out the same list three times over the course of the evaluation, which tells you nothing about the actual state of the program. What actually happens here is that each trace is forced, at every doubling step, before the calculation even starts (namely, when the result list is requested to weak head normal form). Which is pretty much the same as what you would get in a language with fully strict evaluation.
To see some laziness, you could do something like
Prelude Debug.Trace> let doubleLsTracing xs = [trace("{Now doubling "++show x++"}")$ x*2 | x<-xs]
Prelude Debug.Trace> take 5 $ doubleLsTracing [1 .. 10]
{Now doubling 1}
{Now doubling 2}
{Now doubling 3}
{Now doubling 4}
{Now doubling 5}
[2,4,6,8,10]
where you can see that only five numbers are doubled, because only five results were requested; even though the list that doubleLsTracing was given has 10 entries.
Note that trace is generally not a great tool to monitor "flow of execution", it's merely a hack to allow "looking into" local variables to see what's going on in some function.
Infinite streams are always a good example. You cannot obtain them in other languages without special constructs - but in Haskell they are very natural.
One example is the fibonacci stream:
fib = 0 : 1 : zipWith (+) fib (tail fib)
take 10 fib => [0,1,1,2,3,5,8,13,21,34]
Another one is obtaining the stream of prime numbers by using the trial division method:
primes = sieve [2..]
  where sieve (x:xs) = x : filter (not . (== 0) . (`mod` x)) (sieve xs)
take 10 primes => [2,3,5,7,11,13,17,19,23,29]
Also, implementing backtracking in Haskell is very simple, giving you the ability to obtain the list of solutions lazily, on demand:
http://rosettacode.org/wiki/N-queens_problem#Haskell
A more complex example, showing how you can implement minimum, is the one found here:
Lazy Evaluation and Time Complexity
It basically shows how, using Haskell's laziness, you can obtain a very elegant definition of the minimum function (which finds the smallest element of a list):
minimum = head . sort
You could demonstrate Haskell's laziness by contrived examples. But I think it is far better to show how laziness helps you develop solutions to common problems that exhibit far greater modularity than in other languages.
The short answer is, "no". leftaroundabout explains that pretty well in his answer.
My suggestion is to:
Read and understand the definition of lazy evaluation.
Write a function where one of the arguments can diverge, an example that cannot work in your favorite strict (non-lazy) language (C, Python, Java). For example, sumIfFirstArgIsNonZero(x, y), which returns x + y if x != 0 and 0 otherwise.
For bonus points, define your own function ifThenElse that doesn't use Haskell's built-in if-then-else syntax, and explain why writing new control-flow structures in lazy languages is easy. (Both suggestions are sketched in the code below.)
That should be easier than trying to wrap your head around infinite data streams or tying-the-knot tricks.
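Here is a minimal sketch of those two suggestions (sumIfFirstArgIsNonZero and ifThenElse are just the names proposed above):
sumIfFirstArgIsNonZero :: Integer -> Integer -> Integer
sumIfFirstArgIsNonZero x y = if x /= 0 then x + y else 0

ifThenElse :: Bool -> a -> a -> a
ifThenElse True  t _ = t
ifThenElse False _ f = f

main :: IO ()
main = do
  -- The second argument diverges, but it is never needed, so this prints 0.
  print (sumIfFirstArgIsNonZero 0 (last [1..]))
  -- A user-defined control structure: the branch not taken is never forced.
  putStrLn (ifThenElse True "taken branch" (error "never evaluated"))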
The main point of laziness is that values that aren't needed aren't computed -- so to demonstrate this, you'll have to show things not being evaluated. Your example isn't the best to demonstrate laziness since all values are eventually computed.
Here's a small example, as a starting point:
someValueThatNeverTerminates = undefined -- for example, a divide-by-zero error
main = do
  let (greeting, _) = ("Hello terminating world!", someValueThatNeverTerminates)
  putStrLn greeting
This says hello immediately -- if it weren't lazy, the whole thing would break halfway through.
