Why doesn't map sqrt [1..] result in infinite recursion?
How can I better understand Haskell?
sqrtSums :: Int
sqrtSums = length ( takeWhile (<1000) (scanl1 (+) (map sqrt[1..]))) + 1
Laziness turns lists into streams
Lists in Haskell behave as if they had a built-in iterator or stream interface, because the entire language uses lazy evaluation by default: results are only calculated when they're needed by the calling code.
In your example,
sqrtSums = length ( takeWhile (<1000) (scanl1 (+) (map sqrt[1..]))) + 1
it's as if length keeps asking takeWhile for another element,
which asks scanl1 for another element,
which asks map for another element,
which asks [1..] for another element.
Once takeWhile gets something that's not <1000, it doesn't ask scanl1 for any more elements, so [1..] never gets fully evaluated.
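We can even watch this demand-driven chain happen by instrumenting the list with Debug.Trace (a sketch of my own; the name sqrtSumsTraced and the tracing are not part of the original code). The log stops after element 131 is demanded, because by then the running sum has passed 1000:

import Debug.Trace (trace)

-- Same pipeline, but logs each element of [1..] that is actually demanded.
sqrtSumsTraced :: Int
sqrtSumsTraced =
    length (takeWhile (< 1000)
             (scanl1 (+) (map (\x -> trace ("demanding " ++ show x) (sqrt x)) [1..]))) + 1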
Thunks
An unevaluated expression is called a thunk, and getting answers out of thunks is called reducing them. For example, the thunk [1..] first gets reduced to 1:[2..]. In a lot of programming languages, by writing the expression, you force the compiler/runtime to calculate it, but not in Haskell. I could write ignore x = 3 and do ignore (1/0) - I'd get 3 without causing an error, because 1/0 doesn't need to be calculated to produce the 3 - it just doesn't appear in the right hand side that I'm trying to produce.
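Here is a quick sketch of that ignore example (the main wrapper is mine, and I've used 1 `div` 0 so that forcing the argument really would throw an exception):

ignore :: a -> Int
ignore x = 3

main :: IO ()
main = do
    print (ignore (1 `div` 0 :: Int))   -- prints 3; the division is never forced
    print (ignore undefined)            -- also 3; the thunk is never reduced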
Similarly, you don't need to produce any elements of your list beyond the 131st, because by then the running sum has exceeded 1000: takeWhile stops and closes its output list, at which point length returns 130 and sqrtSums produces 131.
Haskell evaluates expressions lazily. This means that evaluation only occurs when it is demanded. In this example takeWhile (< 1000) repeatedly demands answers from scanl1 (+) (map sqrt [1..]) but stops after one of them exceeds 1000. The moment this starts happening Haskell ceases to evaluate more of the (truly infinite) list.
We can see this in the small by cutting away some pieces from this example
>>> takeWhile (< 10) [1..]
[1,2,3,4,5,6,7,8,9]
Here we have an expression that represents an infinite list ([1..]), but takeWhile ensures that the total expression only demands some of those countless values. Without the takeWhile, Haskell will try to print the entire infinite list:
>>> [1..]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24Interrupted.
But again we notice that Haskell demands each element one by one, only as it needs them in order to print. In a strict language we'd run out of RAM trying to represent the infinite list internally before printing the very first answer.
I read the old Russian translation of the Learn You a Haskell for Great Good! book. I see that the current English version (online) is newer, so I look at it from time to time as well.
The quote:
When you put together two lists (even if you append a singleton list
to a list, for instance: [1,2,3] ++ [4]), internally, Haskell has to
walk through the whole list on the left side of ++. That's not a
problem when dealing with lists that aren't too big. But putting
something at the end of a list that's fifty million entries long is
going to take a while. However, putting something at the beginning of
a list using the : operator (also called the cons operator) is
instantaneous.
I assumed that Haskell has to walk through the whole list to get its last item for the foldr, foldr1, scanr and scanr1 functions. I also assumed that Haskell would do the same to get the previous element (and so on for each item).
But I see I was mistaken:
Update:
I tried this code and I see similar processing times for both cases:
data' = [1 .. 10000000]
sum'r = foldr1 (\x acc -> x + acc ) data'
sum'l = foldl1 (\acc x -> x + acc ) data'
Are Haskell lists bidirectional? I assume that to get the last item of a list, Haskell first has to iterate over every item and remember the one it needs (the last item, for example), so that it can later get the previous item of the bidirectional list (for lazy evaluation). Am I right?
It's tricky since Haskell is lazy.
Evaluating head ([1..1000000]++[1..1000000]) will return immediately, with 1. The lists will never be fully created in memory: only the first element of the first list will be.
If you instead demand the full list [1..1000000]++[1..1000000] then ++ will indeed have to create a two-million long list.
foldr may or may not evaluate the full list. It depends on whether the function we fold with is lazy, in particular whether it is lazy in its second argument. For example, here's map f xs written using foldr:
foldr (\y ys -> f y : ys) [] xs
This is as efficient as map f xs: list cells are produced on demand, in a streaming fashion. If we need only the first ten elements of the resulting list, then we indeed create only the first ten cells; foldr will not be applied to the rest of the list. If we need the full resulting list, then foldr will be run over the full list.
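A minimal check of that streaming behaviour (mapViaFoldr is just a name I picked): taking ten elements from an infinite list terminates, because the fold only builds the cells that are demanded.

-- map written with foldr, as above
mapViaFoldr :: (a -> b) -> [a] -> [b]
mapViaFoldr f xs = foldr (\y ys -> f y : ys) [] xs

>>> take 10 (mapViaFoldr (^2) [1..])
[1,4,9,16,25,36,49,64,81,100]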
Also note that xs++ys can be defined similarly in terms of foldr:
foldr (:) ys xs
and has similar performance properties.
By comparison, foldl instead always runs over the whole list.
In the example you mention we have longList ++ [something], appending to the end of the list. This only costs constant time if all we demand is the first element of the resulting list. But if we really need the last element we added, then appending will need to run over the whole list. This is why appending at the end is considered O(n) instead of O(1).
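A small GHCi illustration of that cost difference (not from the original answer):

>>> head ([1..1000000] ++ [0])   -- only the first cons cell of the left list is built
1
>>> last ([1..1000000] ++ [0])   -- demands the far end, so the whole left list is walked
0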
In the last update, the question is about computing the sum with foldr vs foldl, using the (+) operator. In such a case, since (+) is strict (it needs both arguments to compute its result), both folds will need to scan the whole list, and the performance can be comparable. Indeed, they would compute, respectively
1 + (2 + (3 + (4 + ...       -- foldr
((((1 + 2) + 3) + 4) + ...   -- foldl
By comparison foldl' would be more memory efficient, since it starts reducing the above sum before building the above giant expression. That is, it would compute 1+2 first (3), then 3+3 (6), then 6 + 4 (10),... keeping in memory only the last result (a single integer) while the list is being scanned.
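For instance, with the list from the update above, a strict left fold keeps only a single accumulator in memory (sum'strict is my own name for it):

import Data.List (foldl')

-- Forces the accumulator at every step instead of building a huge chain of (+) thunks.
sum'strict :: Integer
sum'strict = foldl' (+) 0 [1 .. 10000000]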
To the OP: the topic of laziness is not easy to grasp the first time, and it is quite vast; you've just met a ton of different examples with subtle but significant performance differences. It's hard to explain everything succinctly, so I'd recommend focusing on small examples and digesting those first.
How lazy is Haskell?
Why does the following not know when to stop?
sum ([n^2 | n <- [1..], odd (n^2), n^2 < 100])
This isn't about how lazy Haskell is, but rather about whether there is any way it could possibly know that n^2 < 100 will never be true again once it has been false. It doesn't.
These parts of a list comprehension are filter expressions, not stop conditions.
Just because 11^2 is more than 100 doesn't mean 13^2 is more than 100. Well... okay, it does, but how is GHC supposed to figure that out? It's a compiler, not a prover of arbitrary mathematical truths.
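If you do want the computation to stop, one option (my own sketch, not part of the original answer) is to move the bound out of the comprehension and into a takeWhile, which really is a stop condition:

>>> sum (takeWhile (< 100) [n^2 | n <- [1..], odd (n^2)])
165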
The expression you've given effectively desugars into:
sum $ do
    n <- [1..]
    _ <- if odd (n^2) then [()] else []
    _ <- if n^2 < 100 then [()] else []
    return (n^2)
If you have never seen the List monad then this probably seems like a weird way to use do but it turns ultimately into:
sum $ concatMap (\n -> if odd (n^2) && (n^2 < 100) then [n^2] else []) [1..]
where concatMap is in the Prelude (in earlier versions you can define it as (concat .) . map but now it applies to any Foldable, not just lists, and is therefore closer to (concat .) . fmap).
Now the key thing here is that when Haskell sees this function it stops analyzing! It is a theorem of computer science that the only general way to prove properties about arbitrary general functions is to run them -- Haskell does not peek inside, and does not give concatMap any way to peek inside, a function to try to determine if it will eventually yield [] for all further inputs!
Computers are dumb, and that is good: the smarter a program is, the harder it is to model it in your head. concatMap is a very dumb function, it just applies its function-argument to every element of the list and puts them all together with concat, and that's all it does. List comprehensions are a very dumb syntax replacement for the do notation of the list monad, that's all they do. The do notation is just a very dumb syntax replacement for the function >>= in the Monad typeclass, which for lists is concatMap. Because all of these things are very dumb, you can understand everything that goes on very easily.
I'm playing around with the language to start learning it, and I am puzzled beyond my wits about how a recursive definition works.
For example, let's take the sequence of the Triangular numbers (TN n = sum [1..n])
The solution provided was:
triangularNumbers = scanl1 (+) [1..]
So far, so good.
But the solution I did come up with was:
triangularNumbers = zipWith (+) [1..] $ 0 : triangularNumbers
Which is also correct.
Now my question is: how does this translate to a lower level implementation? What happens exactly behind the scene when such a recursive definition is met?
Here is a simple recursive function that gives you the nth Triangular number:
triag 0 = 0
triag n = n + triag (n-1)
Your solution triag' = zipWith (+) [1..] $ 0 : triag' is something fancier: it's corecursive. Instead of calculating the nth number by reducing it to the values at smaller inputs, you define the whole infinite sequence of triangular numbers by recursively specifying the next value, given an initial segment.
How does Haskell handle such corecursion? When it encounters your definition, no calculation is actually performed, it is deferred until results are needed for further computations. When you access a particular element of your list triag', Haskell starts computing the elements of the list based on the definition, up to the element that gets accessed. For more detail, I found this article on lazy evaluation helpful. In summary, lazy evaluation is great to have, unless you need to predict memory usage.
Here is a similar SO question, with a step-by step explanation of the evaluation of fibs = 0 : 1 : zipWith (+) fibs (tail fibs), a corecursive definition of the Fibonacci sequence.
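Both corecursive definitions are easy to try out directly (triag' and fibs are the definitions discussed above; the take calls are mine):

triag' :: [Integer]
triag' = zipWith (+) [1..] (0 : triag')

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

>>> take 10 triag'
[1,3,6,10,15,21,28,36,45,55]
>>> take 10 fibs
[0,1,1,2,3,5,8,13,21,34]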
Given:
take 5 (filter p xs)
say filter p xs would return 1K matches; would Haskell only filter out 5 matches, without producing a large intermediate result?
It will scan xs only as much as needed to produce 5 matches, evaluating p only on this prefix of xs.
To be more precise, it can actually perform less computation, depending on how the result is used. For instance,
main = do
    let p x = (x==3) || (x>=1000000)
        list1 = [0..1000000000]
        list2 = take 5 (filter p list1)
    print (head list2)
will only scan list1 until 3 is found, and no more, despite take asking for five elements. This is because head demands only the first of those five, so laziness causes only that one to be evaluated.
"Would return 1K matches" under what circumstances?
Haskell doesn't work by first evaluating filter p xs (as you would in an ordinary call-by-value language like Java or Ruby). It works by evaluating take 5 first (in this case). take 5 will evaluate enough of filter p xs to end up with a result, and not evaluate the rest.
Yes: it will only evaluate as much as it needs, and it will not produce a large intermediate result. If it did, something like the following would not work:
take 5 (filter (> 10) [1..])
This feature is called lazy evaluation.
I am being asked to write a Haskell function that computes something like
1^2 + 2^2 + 3^2 ...
While I find it quite easy to implement with list comprehensions
sum [ k^2 | k <- [1..100]]
or maps
sum (map (\x -> x*x) [1..100])
I'm having a hard time working out how to achieve it with foldl.
If I am not wrong, one needs no fewer than 3 parameters in a recursive function to achieve this:
The current position (1... up to n)
The current sum
Where to stop
Even if I define this function, it will still return a tuple, not a number (like I need it to!).
Could anyone be kind enough to give me some clues on what I may be missing?
Thanks
If you look at the definition of sum, it's just sum = foldl (+) 0. So if you replace sum with foldl (+) 0 in either of your solutions, you have a solution using foldl.
You can also get rid of the need for list comprehensions or map by using foldl with a function that adds the square of its second argument to its first argument.
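For example (a sketch along those lines; squareSum1 and squareSum2 are names I made up, and both evaluate to 338350):

squareSum1 :: Int
squareSum1 = foldl (+) 0 [ k^2 | k <- [1..100] ]      -- sum replaced by foldl (+) 0

squareSum2 :: Int
squareSum2 = foldl (\acc x -> acc + x*x) 0 [1..100]   -- no comprehension or map needed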
I'm not sure where your considerations about recursive functions figure into this. If you're using foldl, you don't need to use recursion (except in so far that foldl is implemented using recursion).
However, you are wrong that a recursive function would need three arguments: a recursive function summing the squares of the elements of a list is most straightforwardly implemented by taking a list and adding the square of its head to the result of calling the function on its tail, with the base case squareSum [] = 0. This has nothing to do with foldl, though.
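Spelled out, that recursive version looks like this (a direct transcription of the description above):

squareSum :: [Int] -> Int
squareSum []     = 0
squareSum (x:xs) = x*x + squareSum xs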
The "current position" (actually the next item in the list, just like in your map and list comprehension versions) and where to stop are implicit in the list being folded over. The current sum is the "accumulator" parameter of the fold. So, fill in the blank:
foldl (\runningSum nextNumber -> ____) 0 [1..100]