Haskell tester for evaluation / apply steps taken by the parser? [duplicate] - haskell

Prelude> let a = 3
Prelude> :sprint a
a = _
Prelude> let c = "ab"
Prelude> :sprint c
c = _
Why does it always print a _? I don't quite get the semantics of the :sprint command.

Haskell is a lazy language. It doesn't evaluate results until they are "needed".
Now, just printing a value causes all of it to be "needed". In other words, if you type an expression in GHCi, it will try to print out the result, which causes it all to be evaluated. Usually that's what you want.
The sprint command (which is a GHCi feature, not part of the Haskell language) allows you to see how much of a value has been evaluated at this point.
For example:
Prelude> let xs = [1..]
Prelude> :sprint xs
xs = _
So, we just declared xs, and it's currently unevaluated. Now let's print out the first element:
Prelude> head xs
1
Prelude> :sprint xs
xs = 1 : _
Now GHCi has evaluated the head of the list, but nothing more.
Prelude> take 10 xs
[1,2,3,4,5,6,7,8,9,10]
Prelude> :sprint xs
xs = 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 : _
Now the first 10 elements are evaluated, but more remain. (Since xs is an infinite list, that's not surprising.)
You can construct other expressions and evaluate them a bit at a time to see what's going on. This is really part of the GHCi debugger, which lets you step through your code one bit at a time. Especially if your code is getting caught in an infinite loop, you don't want to print anything, because that might lock up GHCi. But you still want to see what's going on... hence sprint, which lets you see what's evaluated so far.
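The same laziness can be observed outside GHCi. The following is only a sketch: the name noisy is made up for illustration, and it uses trace from Debug.Trace to log to stderr each time an element is actually forced.

```haskell
import Debug.Trace (trace)

-- Hypothetical helper: an infinite list whose elements log to stderr
-- at the moment they are forced.
noisy :: [Integer]
noisy = map (\n -> trace ("forcing element " ++ show n) n) [1 ..]

main :: IO ()
main = do
  print (head noisy)    -- forces only the first element
  print (take 3 noisy)  -- forces elements 2 and 3; element 1 is already evaluated
```

Because noisy is a shared top-level value, the element forced by head is not recomputed by take, which is the same sharing that :sprint makes visible.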

I'm a bit late, but I had a similar issue:
λ: let xs = [1,2,3]
xs :: Num t => [t]
λ: :sprint xs
xs = _
λ: print xs
λ: :sprint xs
xs = _
This issue is specific to polymorphic values. If you have -XNoMonomorphismRestriction enabled, GHCi will never really evaluate/force xs itself; it will only evaluate/force specializations:
λ: :set -XMonomorphismRestriction
λ: let xs = [1,2,3]
xs :: [Integer]
λ: print xs
λ: :sprint xs
xs = [1,2,3]
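A rough way to see why a polymorphic binding stays unevaluated, sketched under the assumption that the program is run without optimisation (for example with runghc). The names polyValue and monoValue are hypothetical. A binding with a Num constraint behaves like a function of its type-class dictionary, so each use at a concrete type may recompute it, whereas a monomorphic binding is an ordinary shared thunk.

```haskell
import Debug.Trace (trace)

-- Polymorphic: conceptually a function from a Num dictionary to a value,
-- so there is no single thunk that could be overwritten with a result.
polyValue :: Num a => a
polyValue = trace "polyValue computed" 42

-- Monomorphic: one shared thunk, evaluated at most once.
monoValue :: Integer
monoValue = trace "monoValue computed" 42

main :: IO ()
main = do
  print (polyValue :: Integer)
  print (polyValue :: Integer)  -- without optimisation, the trace may fire again
  print monoValue
  print monoValue               -- the trace does not fire a second time
```

How often the polymorphic trace fires can depend on optimisation settings, so treat the comments as the expected unoptimised behaviour rather than a guarantee.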

Haskell is lazy. It doesn't evaluate things until they are needed.
The GHCi sprint command (not part of Haskell, just a debugging command of the interpreter) prints the value of an expression without forcing evaluation.
When you write
let a = 3
you bind a new name a to the right-hand side expression, but Haskell won't evaluate that thing yet. Therefore, when you sprint it, it prints _ as the value to indicate that the expression has not yet been evaluated.
Try this:
let a = 3
:sprint a -- a has not been evaluated yet
print a -- forces evaluation of a
:sprint a -- now a has been evaluated

Related

Haskell last string

I'm trying to make a function that takes the last character from a string and adds it as the first character. In a string, can I write (xs:x) so that x is the last character?
xs is just a naming convention for lists in Haskell (which you should use!). (x:xs) is a pattern match on the (:) constructor; how you name the variables is up to you, e.g. (this:makesnosense) is also valid.
Also remember that a String is just another list, so your question is equal to: "How can I make the last element of a list the first one."
This would be one way to solve it:
lastToFirst :: [a] -> [a]
lastToFirst [] = []
lastToFirst [x] = [x]
lastToFirst xs = last xs : init xs
I'm trying to make a function that takes away the last character from a string and add it to be the first character.
In Haskell, list operator ':' is asymmetric. If the left operand is of type α, the right operand must be of type [α]. Hence, a pattern such as xs:x is just using misleading variable names. The operator is right-associative, so that x0:x1:xs means x0:(x1:xs).
Unlike Python lists, which are basically arrays, Haskell lists are just forward-chained linked lists. Classic imperative languages often maintain both a pointer to the head of a linked list and to its tail, but the main point of the tail pointer is to be able to append new elements at the tail of the list.
As Haskell lists are immutable, the tail pointer would be mostly useless, and so Haskell only maintains a pointer to the head of a list.
This means there is no cheap way to access the last element. The only way is to traverse the whole list, starting from the head. Furthermore, immutability implies that the only way to generate the [1,2,3] list from the [1,2,3,4] list is by duplicating the first 3 elements, which again requires a full traversal.
So an expression such as last xs : init xs, if compiled naïvely, implies 2 costly traversals of the input list.
The best one can hope is to leverage the duplication work to grab the last element at no extra cost, thus solving the problem in a single traversal. This can be done, for example, by recursion:
makeLastFirst :: [a] -> [a]
makeLastFirst [] = [] -- empty input list
makeLastFirst [end] = [end] -- just the last element
makeLastFirst (x0:(x1:xs)) = let (end:ys) = makeLastFirst (x1:xs)
                             in end : (x0:ys)
where the recursive clause takes care of keeping the input tail element at the head of the output list.
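A variant of the same single-traversal idea, given here only as a sketch: the hypothetical helper go returns the last element together with a copy of everything before it, which avoids the partial (end:ys) pattern. The names rotateRight, naive and go are illustrative, not from any library.

```haskell
-- Reference version: two traversals (last, then init).
naive :: [a] -> [a]
naive [] = []
naive xs = last xs : init xs

-- Single traversal: `go` walks the list once and hands back the last
-- element paired with a copy of all the elements that preceded it.
rotateRight :: [a] -> [a]
rotateRight []     = []
rotateRight (x:xs) = lastElem : front
  where
    (lastElem, front) = go x xs
    go y []     = (y, [])
    go y (z:zs) = let (l, rest) = go z zs in (l, y : rest)

-- Checks that both versions agree on a few sample inputs.
main :: IO ()
main = print (all (\s -> rotateRight s == naive s) ["", "a", "ab", "Mercury"])
```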
Watching the gears turn:
One can visualize the recursive process by importing module Debug.Trace and using its trace function. The expression trace msg value evaluates to just value, but has the side effect of printing the msg string. Yes, side effects are normally forbidden in Haskell, but function trace has special privileges.
So we can write a more talkative version of our function:
import Debug.Trace
traceMakeLastFirst :: Show a => [a] -> [a]
traceMakeLastFirst [] = [] -- empty input list
traceMakeLastFirst [end] = [end] -- just the last element
traceMakeLastFirst (x0:(x1:xs)) = let (end:ys) = traceMakeLastFirst (x1:xs)
                                      result = end : (x0:ys)
                                  in trace (show result) result
Testing under the ghci interpreter:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> :load q66927560.hs
...
Ok, one module loaded.
λ>
λ> traceMakeLastFirst ""
""
λ>
λ> traceMakeLastFirst "a"
"a"
λ>
λ> makeLastFirst "Mercury"
"yMercur"
λ>
λ> traceMakeLastFirst "Mercury"
""yr"
"yur"
"ycur"
"yrcur"
"yercur"
"yMercur"
yMercur"
λ>
-- makeLastFirst "abcd" == "dabc"
-- makeLastFirst "hello" == "ohell"
-- makeLastFirst "orange" == "eorang"
makeLastFirst :: [a] -> [a]
makeLastFirst [] = [] -- head would fail on an empty list
makeLastFirst lst = [ head (reverse lst) ] ++ (init lst)

Why does function concat use foldr? Why not foldl'

Most resources recommend using foldl', so what is the reason for using foldr in concat instead of foldl'?
EDIT I talk about laziness and productivity in this answer, and in my excitement I forgot a very important point that jpmariner focuses on in their answer: left-associating (++) is quadratic time!
foldl' is appropriate when your accumulator is a strict type, like most small types such as Int, or even large spine-strict data structures like Data.Map. If the accumulator is strict, then the entire list must be consumed before any output can be given. foldl' uses tail recursion to avoid blowing up the stack in these cases, but foldr doesn't and will perform badly. On the other hand, foldl' must consume the entire list in this way.
foldl f z [] = z
foldl f z [1] = f z 1
foldl f z [1,2] = f (f z 1) 2
foldl f z [1,2,3] = f (f (f z 1) 2) 3
The final element of the list is required to evaluate the outermost application, so there is no way to partially consume the list. If we expand this with (++), we will see:
foldl (++) [] [[1,2],[3,4],[5,6]]
= (([] ++ [1,2]) ++ [3,4]) ++ [5,6]
       ^^
= ([1,2] ++ [3,4]) ++ [5,6]
= ((1 : [2]) ++ [3,4]) ++ [5,6]
             ^^
= (1 : ([2] ++ [3,4])) ++ [5,6]
                       ^^
= 1 : (([2] ++ [3,4]) ++ [5,6])
(I admit this looks a little magical if you don't have a good feel for cons lists; it's worth getting dirty with the details though)
See how we have to evaluate every (++) (marked with ^^ when they are evaluated) on the way down before the 1 bubbles out to the front? The 1 is "hiding" under function applications until then.
foldr, on the other hand, is good for non-strict accumulators like lists, because it allows the accumulator to yield information before the entire list is consumed, which can bring many classically linear-space algorithms down to constant space! This also means that if your list is infinite, foldr is your only choice, unless your goal is to heat your room using your CPU.
foldr f z [] = z
foldr f z [1] = f 1 z
foldr f z [1,2] = f 1 (f 2 z)
foldr f z [1,2,3] = f 1 (f 2 (f 3 z))
foldr f z [1..] = f 1 (f 2 (f 3 (f 4 (f 5 ...
We have no trouble expressing the outermost applications without having to see the entire list. Expanding foldr the same way we did foldl:
foldr (++) [] [[1,2],[3,4],[5,6]]
= [1,2] ++ ([3,4] ++ ([5,6] ++ []))
= (1 : [2]) ++ ([3,4] ++ ([5,6] ++ []))
            ^^
= 1 : ([2] ++ ([3,4] ++ ([5,6] ++ [])))
1 is yielded immediately without having to evaluate any of the (++)s but the first one. Because none of those (++)s are evaluated, and Haskell is lazy, they don't even have to be generated until more of the output list is consumed, meaning concat can run in constant space for a function like this
concat [ [1..n] | n <- [1..] ]
which in a strict language would require intermediate lists of arbitrary length.
If these reductions look a little too magical, and if you want to go deeper, I suggest examining the source of (++) and doing some simple manual reductions against its definition to get a feel for it. (Just remember [1,2,3,4] is notation for 1 : (2 : (3 : (4 : [])))).
In general, the following seems to be a strong rule of thumb for efficiency: use foldl' when your accumulator is a strict data structure, and foldr when it's not. And if you see a friend using regular foldl and don't stop them, what kind of friend are you?
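A small sketch of that rule of thumb, with the illustrative names total and firstFive: a strict numeric accumulator goes through foldl', while a lazily consumed list result goes through foldr, which here even copes with an infinite input.

```haskell
import Data.List (foldl')

-- Strict accumulator (an Integer): foldl' consumes the whole list
-- without building up a chain of suspended additions.
total :: Integer
total = foldl' (+) 0 [1 .. 100000]

-- Lazy accumulator (a list): foldr can yield the front of the result
-- before looking at the rest of the (infinite) input.
firstFive :: [Integer]
firstFive = take 5 (foldr (++) [] [[n, n * 10] | n <- [1 ..]])

main :: IO ()
main = do
  print total
  print firstFive
```

Replacing foldr with foldl' in firstFive would never terminate, since a left fold has to reach the end of the input before producing anything.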
cause of using foldr in concat instead of foldl' ?
What if the result gets fully evaluated ?
If you consider [1,2,3] ++ [6,7,8] within an imperative programming mindset, all you have to do is redirect the next pointer at node 3 towards node 6, assuming of course you may alter your left side operand.
This being Haskell, you may NOT alter your left side operand, unless the optimizer is able to prove that ++ is the sole user of its left side operand.
Short of such a proof, other Haskell expressions pointing to node 1 have every right to assume that node 1 is forever at the beginning of a list of length 3. In Haskell, the properties of a pure expression cannot be altered during its lifetime.
So, in the general case, operator ++ has to do its job by duplicating its left side operand, and the duplicate of node 3 may then be set to point to node 6. On the other hand, the right side operand can be taken as is.
So if you fold the concat expression starting from the right, each component of the concatenation must be duplicated exactly once. But if you fold the expression starting from the left, you are facing a lot of repetitive duplication work.
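The duplication described above is visible in the textbook definition of list append. The function below is a local re-implementation named append for illustration; the library's (++) behaves the same way semantically, though its actual source may differ in details.

```haskell
-- Every cons cell of the left operand is rebuilt (x : ...),
-- while the right operand ys is reused as is.
append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys

main :: IO ()
main = print (append [1, 2, 3] [6, 7, 8 :: Integer])
```

The recursion is driven entirely by the left operand, so the cost of one append is proportional to the length of the left list only.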
Let's try to check that quantitatively. To ensure that no optimizer will get in the way by proving anything, we'll just use the ghci interpreter. Its strong point is interactivity, not optimization.
So let's introduce the various candidates to ghci, and switch statistics mode on:
$ ghci
λ>
λ> import qualified Data.List as L
λ> myConcat0 = L.foldr (++) []
λ> myConcat1 = L.foldl (++) []
λ> myConcat2 = L.foldl' (++) []
λ>
λ> :set +s
λ>
We'll force full evaluation by using lists of numbers and printing their sum.
First, let's get baseline performance by folding from the right:
λ>
λ> sum $ concat [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,104 bytes)
λ>
λ> sum $ myConcat0 [ [x] | x <- [1..10000::Integer] ]
50005000
(0.01 secs, 3,513,144 bytes)
λ>
Second, let's fold from the left, to see whether that improves matters or not.
λ>
λ> sum $ myConcat1 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.26 secs, 4,296,646,240 bytes)
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..10000::Integer] ]
50005000
(1.28 secs, 4,295,918,560 bytes)
λ>
So folding from the left allocates much more transient memory and takes much more time, probably because of this repetitive duplication work.
As a last check, let's double the problem size:
λ>
λ> sum $ myConcat2 [ [x] | x <- [1..20000::Integer] ]
200010000
(5.91 secs, 17,514,447,616 bytes)
λ>
We see that doubling the problem size causes the resource consumptions to get multiplied by about 4. Folding from the left has quadratic cost in the case of concat.
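The factor of 4 can be reproduced with a simplified cost model. This is a sketch under one stated assumption: xs ++ ys copies exactly length xs cells. The names copiedLeft and copiedRight are hypothetical, and the model counts copied cells rather than measuring real time or allocation.

```haskell
import Data.List (foldl')

-- Left-nested: (([] ++ a) ++ b) ++ c recopies the accumulated prefix
-- at every step, so the cost of each step is the length built so far.
copiedLeft :: [[a]] -> Int
copiedLeft = snd . foldl' step (0, 0)
  where
    step (accLen, cost) xs =
      let accLen' = accLen + length xs
          cost'   = cost + accLen
      in accLen' `seq` cost' `seq` (accLen', cost')

-- Right-nested: a ++ (b ++ (c ++ [])) copies each component exactly once.
copiedRight :: [[a]] -> Int
copiedRight = sum . map length

main :: IO ()
main = do
  let singletons n = [[x] | x <- [1 .. n :: Int]]
  print (copiedLeft (singletons 10000), copiedRight (singletons 10000))
  print (copiedLeft (singletons 20000), copiedRight (singletons 20000))
```

For n singleton lists the left-nested count is n(n-1)/2, so doubling n roughly quadruples it, while the right-nested count is just n.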
Looking at the excellent answer by luqui, we see that two concerns:
the need to be able to access the beginning of the result list lazily
the need to avoid quadratic cost for full evaluation
both point the same way, namely in favor of folding from the right.
Hence the Haskell library's concat function uses foldr.
Addendum:
After running some tests using GHC v8.6.5 with -O3 option instead of ghci, it appears that my preconceived idea of the optimizer messing up with the measurements was erroneous.
Even with -O3, for a problem size of 20,000, the foldr-based concat function is about 500 times faster than the foldl'-based one.
So either the optimizer fails to prove that it is OK to alter/reuse the left operand, or it just does not try at all.

Strict list evaluation in GHCi

Consider the program:
l = [0..10]
l' = map (+1) [0..10]
Running it with GHCi, and typing :sprint l and :sprint l' will reveal both lists to be unevaluated. However, after running length l and length l' and then again using sprint:
l = [0,1,2,3,4,5,6,7,8,9,10]
and
l' = [_,_,_,_,_,_,_,_,_,_,_]
I've made similar experiments and tried binding variables to lists in GHCi with let, however only in the case of l (defined as above in a program top-level) is the list always completely evaluated.
These behaviours all point to an optimisation feature, however I was wondering if there is a more elaborate answer (strategy) 'under-the-hood'.
The elements of the original [0..10] lists were evaluated in both cases. What was left unevaluated in the l' case were the results of applying (+1) to the list elements. In contrast, here is what happens if we map the function strictly:
GHCi> import Control.Monad
GHCi> l'' = (+1) <$!> [0 :: Integer ..10]
GHCi> :sprint l''
l'' = _
GHCi> length l''
11
GHCi> :sprint l''
l'' = [1,2,3,4,5,6,7,8,9,10,11]
(Note that I am specialising the integer literals, so that the absence of the monomorphism restriction in the GHCi prompt doesn't lead to different results from what you get upon loading the code from a file.)
It is worth noting that enumFromTo for Integer (which is what using the range boils down to), as implemented by base, evaluates the elements in order to know when to stop generating them. That's to say it is not length that forces the list elements, as we'd hope from looking at its definition:
length :: [a] -> Int
length xs = lenAcc xs 0
lenAcc :: [a] -> Int -> Int
lenAcc [] n = n
lenAcc (_:ys) n = lenAcc ys (n+1)
To get a better feeling for how length behaves here, we might try repeating your experiment with a list generated by using replicate (which, like length, doesn't look at the elements) on a not fully evaluated value:
GHCi> n = 2 * (7 :: Integer) -- let-bindings are lazy.
GHCi> :sprint n
n = _
GHCi> l''' = replicate 3 n
GHCi> :sprint l'''
l''' = _
GHCi> length l'''
3
GHCi> :sprint l'''
l''' = [_,_,_]
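That length only walks the spine can also be checked in a compiled program. This is a minimal sketch with a made-up list named elements whose entries would all crash if they were ever forced.

```haskell
-- Every element is a bomb: forcing any of them would raise an exception.
elements :: [Integer]
elements = [undefined, error "never forced", undefined]

main :: IO ()
main = do
  print (length elements)                              -- walks the spine only
  print (length (replicate 3 (undefined :: Integer)))  -- same for replicate
```

Both calls complete normally, because neither length nor replicate inspects the elements themselves.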

Understanding `runEval` - Not WHNF?

Given:
Prelude> import Control.Parallel.Strategies
Prelude> import Control.Parallel
Prelude> let fact n = if (n <= 0) then 1 else n * fact (n-1) :: Integer
Prelude> let xs = map (runEval . (\x -> return x :: Eval Integer) . fact) [1..100]
Prelude> let ys = map fact [1..100]
Prelude> :sprint xs
xs = _
Prelude> :sprint ys
ys = _
As I understand, xs is in Weak Head Normal Form. Why is that? Didn't the runEval have any effect on bringing the value/computation to Normal Form?
The reason is that let just binds a name with an expression but it doesn't trigger any evaluation of the expression.
To understand better, let me use a more simple example
Main> let x = error "foobar!" in 1
1
As you can see, the error "foobar!", that should throw exception, is just ignored. The reason is that x is not used and thus Haskell doesn't evaluate it. You need something to trigger the evaluation of x
Main> let x = error "foobar!" in x `seq` 1
*** Exception: foobar!
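The same two cases as a standalone program, sketched with functions from base only. The names ignored and forced are illustrative; evaluate and try from Control.Exception are used so that the exception is caught and reported instead of aborting the program.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- The binding is never used, so the error is never raised.
ignored :: Integer
ignored = let _x = error "foobar!" :: Integer in 1

-- seq demands the binding before returning 1, so the error surfaces.
forced :: Integer
forced = let x = error "foobar!" :: Integer in x `seq` 1

main :: IO ()
main = do
  print ignored
  r <- try (evaluate forced) :: IO (Either ErrorCall Integer)
  putStrLn (either (const "forcing x raised the error") show r)
```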
Going back to your example, note that Eval x specifies how to evaluate an x, not when it will be evaluated in your program.
Have a look at this wiki article on Lazyness for more.
