Haskell: infinite lists - how lazy is Haskell? [duplicate] - haskell

How lazy is Haskell?
Why does the following not know when to stop?
sum ([n^2 | n <- [1..], odd (n^2), n^2 < 100])

This isn't about how lazy it is, but rather about whether there is any way it can possibly know that n^2 < 100 will never be true again once it has been false. There isn't.
These parts of a list comprehension are filter expressions, not stop conditions.
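If the intent is to stop at the first square that reaches 100, one option is to move that condition out of the comprehension and into takeWhile, which does end the list at the first element that fails the test. A minimal sketch (the names oddSquaresBelow100 and total are made up for illustration; it relies on the squares being increasing, which we know but the compiler does not):

```haskell
-- Hypothetical names, for illustration only.
-- The guard `odd (n^2)` still filters; `takeWhile (< 100)` is the stop
-- condition: it ends the list at the first square that is not below 100.
oddSquaresBelow100 :: [Integer]
oddSquaresBelow100 = takeWhile (< 100) [n^2 | n <- [1..], odd (n^2)]

total :: Integer
total = sum oddSquaresBelow100
```

Here total terminates, because sum only ever sees the finite prefix that takeWhile lets through.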

Just because 11^2 is more than 100 doesn't mean 13^2 is more than 100. Well... okay, it does, but how is GHC supposed to figure that out? It's a compiler, not a prover of arbitrary mathematical truths.

The expression you've given effectively desugars into:
sum $ do
  n <- [1..]
  _ <- if odd (n^2) then [()] else []
  _ <- if n^2 < 100 then [()] else []
  return (n^2)
If you have never seen the List monad, this probably looks like a weird way to use do, but it ultimately turns into:
sum $ concatMap (\n -> if odd (n^2) && (n^2 < 100) then [n^2] else []) [1..]
where concatMap is in the Prelude (in earlier versions you can define it as (concat .) . map but now it applies to any Foldable, not just lists, and is therefore closer to (concat .) . fmap).
Now the key thing here is that when Haskell sees this function it stops analyzing! By Rice's theorem, no algorithm can decide a non-trivial property of the behaviour of arbitrary functions; in general, all you can do is run them. Haskell does not peek inside the function, and gives concatMap no way to peek inside it, to determine whether it will yield [] for all further inputs!
Computers are dumb, and that is good: the smarter a program is, the harder it is to model it in your head. concatMap is a very dumb function, it just applies its function-argument to every element of the list and puts them all together with concat, and that's all it does. List comprehensions are a very dumb syntax replacement for the do notation of the list monad, that's all they do. The do notation is just a very dumb syntax replacement for the function >>= in the Monad typeclass, which for lists is concatMap. Because all of these things are very dumb, you can understand everything that goes on very easily.
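To see that the three spellings really are the same computation, they can be compared on a finite range, where each one terminates. This is only a sketch: the range [1..20] and the names viaComprehension, viaDo and viaConcatMap are assumptions made for the comparison.

```haskell
-- Three spellings of the same computation, restricted to a finite range
-- so that each one terminates. The names are made up for illustration.
viaComprehension :: [Integer]
viaComprehension = [n^2 | n <- [1..20], odd (n^2), n^2 < 100]

viaDo :: [Integer]
viaDo = do
  n <- [1..20]
  _ <- if odd (n^2) then [()] else []
  _ <- if n^2 < 100 then [()] else []
  return (n^2)

viaConcatMap :: [Integer]
viaConcatMap =
  concatMap (\n -> if odd (n^2) && n^2 < 100 then [n^2] else []) [1..20]
```

With [1..] in place of [1..20], each version produces the same five elements and then keeps testing further values of n forever, which is why sum never returns.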

Related

foldr1 and infinite list on Haskell

Reading about folds on this wonderful book I have a question regarding foldr1 and the head' implementation proposed there, the code in question is:
head' = foldr1 (\x _ -> x)
This code works on infinite lists, whereas foldl1 doesn't. A good visual explanation of why is this answer.
I do not quite understand why it works, though, considering that foldr1 uses the last element as the accumulator. For example:
foldr1 (\x _ -> x) [1..]
I think this works because of lazy evaluation: even though foldr starts from the last element of the list (which is infinite), the function makes no use of any intermediate result and just returns the first element.
So, is the compiler smart enough to know that, because only x is used inside the lambda, it can just return the first element of the list, even though it should start from the end?
By contrast, doing
scanr1 (\x _ -> x) [1..]
will print all elements of the infinite list without ending, which I suppose is what foldr is doing too, except that the compiler is smart enough not to evaluate it and just returns the head.
Thanks in advance.
Update
I found a really good answer that helped me understand how foldr works more deeply:
https://stackoverflow.com/a/63177677/1612432
foldr1 is using the last element as an initial accumulator value, but the combining function (\x _ -> x) is lazy in its second argument.
So provided the list is non-empty (let alone infinite), the "accumulator" value is never needed, thus never demanded.
foldr does not mean it should start from the right, just that the operations are grouped / associated / parenthesized on the right. If the combining function is strict in its 2nd argument that will entail indeed starting the calculations from the right, but if not -- then not.
So no, this is not about the compiler being smart; this is about Haskell's lazy semantics, which demand this. foldr is defined so that
foldr g z [x1,x2,...,xn] = g x1 (foldr g z [x2,...,xn])
and
foldr1 g xs = foldr g (last xs) (init xs)
and that's that.
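The equation for foldr above can be written out by hand to watch the laziness at work. This is a sketch, not the library source: myFoldr and firstOf are hypothetical names, and the error call stands in for the "accumulator" that is never needed on a non-empty list.

```haskell
-- A hand-written foldr with the same equations as given above
-- (myFoldr and firstOf are hypothetical names for this sketch).
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr g z (x:xs) = g x (myFoldr g z xs)

-- The lambda ignores its second argument, so the recursive call
-- `myFoldr g z xs` stays an unevaluated thunk and is never run.
firstOf :: [a] -> a
firstOf = myFoldr (\x _ -> x) (error "empty list")
```

Evaluating firstOf [1..] unfolds one step to (\x _ -> x) 1 (myFoldr ...), and the lambda returns 1 without looking at the rest. A combining function that is strict in its second argument, such as (+), would force the whole list instead and never finish on [1..].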

Haskell count of all elements in list of lists

In Haskell, given a list of lists, where each sublist contains any number of integers, how can I write a function that returns the total number of elements in all the lists?
For example if my list is:
[[1,2,3],[4,3],[2,1],[5]]
The function would return 8, since there are 8 total elements in the list of lists. I know you can use length [] to get the length of a normal list, but how do I do this with a list of lists? I would assume the solution to be recursive, but could use some help, since I am new to the language.
Three ways:
Get the length of each inner list, and sum them all:
GHCi> sum (fmap length [[1,2,3],[4,3],[2,1],[5]])
8
(Note this is equivalent to Thomas English's answer: map is fmap specialised to lists.)
Flatten the list of lists, and then get the length:
GHCi> length (concat [[1,2,3],[4,3],[2,1],[5]])
8
Use the Compose wrapper, which will make length drill through the two layers of lists.
GHCi> import Data.Functor.Compose
GHCi> length (Compose [[1,2,3],[4,3],[2,1],[5]])
8
(While explaining exactly what is going on here is a little bit tricky -- in a nutshell, we are exploiting that Compose has a Foldable instance -- behind the scenes it boils down to something very much like the first solution.)
I would assume the solution to be recursive
Indeed. It's just that the additional recursion is performed by the other functions we use (fmap for lists, sum, concat, etc.), and so we don't have to write the recursive algorithms explicitly.
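For comparison, here is what the recursion looks like when written out explicitly. It is a sketch under the assumption that a plain Int count is wanted; countAll is a made-up name.

```haskell
-- Explicit recursion over the outer list (countAll is a hypothetical name).
-- An empty outer list contributes nothing; otherwise add the length of
-- the first sublist to the count for the remaining sublists.
countAll :: [[a]] -> Int
countAll []       = 0
countAll (xs:xss) = length xs + countAll xss
```

This does the same work as sum (fmap length xss); the library functions just package the recursion.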
You should check out how to use the 'map' function. Learn You a Haskell is a good resource to learn more!
mylist = [[1,2,3],[4,3],[2,1],[5]]
-- Get the length of each sublist with map
sublist_lengths = map length mylist
-- sublist_lengths = [3, 2, 2, 1]
result = sum sublist_lengths
One additional (pedantic) solution using folds:
foldr ((+) . foldr ((+) . const 1) 0) 0
-- or more simply:
foldr ((+) . length) 0
This incredibly ugly fold is equivalent to:
sum [1 | xs <- xss, x <- xs]
which is certainly easier to read.
So all you need is to treat each list in the list separately. What tools can do that? As Adam Smith demonstrates, foldr is probably the tool of choice; however, fmap looks good too and may be shorter.
What other tools are there? One of my favorites, the list comprehension.
The basic list comprehension lets you process each element of a list in turn.
For yours:
yourList = [[1,2,3],[4,3],[2,1],[5]]
[length l | l <- yourList] -- gets the length of each list and
sum [length l | l <- yourList] -- adds up all the lengths produced

Haskell filter function laziness

Given:
take 5 (filter p xs)
Say filter p xs would return 1K matches: would Haskell find only the first 5 matches, without producing a large intermediate result?
It will scan xs only as much as needed to produce 5 matches, evaluating p only on this prefix of xs.
To be more precise, it can actually perform less computation, depending on how the result is used. For instance,
main = do
  let p x = (x==3) || (x>=1000000)
      list1 = [0..1000000000]
      list2 = take 5 (filter p list1)
  print (head list2)
will only scan list1 until 3 is found, and no more, despite take asking for five elements. This is because head demands only the first of these five, so laziness causes just that one to be evaluated.
"Would return 1K matches" under what circumstances?
Haskell doesn't work by first evaluating filter p xs (as you would in an ordinary call-by-value language like Java or Ruby). It works by evaluating take 5 first (in this case). take 5 will evaluate enough of filter p xs to end up with a result, and not evaluate the rest.
Yes, it filters only until it has 5 matches and does not build the large intermediate result.
If it did, something like the following would never terminate:
take 5 (filter (> 10) [1..])
This feature is called lazy evaluation.
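One way to check that filter really stops early is to plant an element that would throw if it were ever examined. The sketch below does that with undefined; the names firstTwoEvens and firstFiveAboveTen are made up for illustration.

```haskell
-- The fifth element is `undefined`: evaluating it would throw.
-- Because take stops asking after two matches, filter never reaches it.
firstTwoEvens :: [Int]
firstTwoEvens = take 2 (filter even [1, 2, 3, 4, undefined])

-- The same reasoning makes this work on an infinite list.
firstFiveAboveTen :: [Integer]
firstFiveAboveTen = take 5 (filter (> 10) [1..])
```

Asking for a third even number from the first list would force the undefined element and fail, which shows that the amount of work is driven by the consumer.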

Haskell function example, why infinite list doesnt stop

Why doesn't map sqrt [1..] give an infinite recursion?
How can I better understand Haskell?
sqrtSums :: Int
sqrtSums = length ( takeWhile (<1000) (scanl1 (+) (map sqrt[1..]))) + 1
Laziness turns lists into streams
Lists in Haskell behave as if they have a built-in iterator or stream interface, because the entire language uses lazy evaluation by default, which means only calculating results when they're needed by the calling function.
In your example,
sqrtSums = length ( takeWhile (<1000) (scanl1 (+) (map sqrt[1..]))) + 1
it's as if length keeps asking takeWhile for another element,
which asks scanl1 for another element,
which asks map for another element,
which asks [1..] for another element.
Once takeWhile gets something that's not <1000, it doesn't ask scanl1 for any more elements, so [1..] never gets fully evaluated.
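The chain of "asking for another element" is easier to follow when each stage has its own name. The following sketch splits the one-liner into stages; the stage names roots, runningSums and belowThousand are made up, and the element type is assumed to be Double.

```haskell
-- Every stage except the last describes an infinite list; nothing is
-- computed until length starts demanding elements.
roots :: [Double]
roots = map sqrt [1..]

-- Running totals: sqrt 1, sqrt 1 + sqrt 2, ...
runningSums :: [Double]
runningSums = scanl1 (+) roots

-- Stops at the first running total that is not below 1000.
belowThousand :: [Double]
belowThousand = takeWhile (< 1000) runningSums

sqrtSums :: Int
sqrtSums = length belowThousand + 1
```

Only belowThousand is finite, and it is the only stage that length ever walks to the end.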
Thunks
An unevaluated expression is called a thunk, and getting answers out of thunks is called reducing them. For example, the thunk [1..] first gets reduced to 1:[2..]. In a lot of programming languages, by writing the expression, you force the compiler/runtime to calculate it, but not in Haskell. I could write ignore x = 3 and do ignore (1 `div` 0) - I'd get 3 without causing an error, because 1 `div` 0 doesn't need to be calculated to produce the 3 - it just doesn't appear in the right hand side that I'm trying to produce.
Similarly, you don't need to produce any elements in your list beyond the 131st, because by then the sum has exceeded 1000 and takeWhile ends its result with the empty list [], at which point length returns 130 and sqrtSums produces 131.
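The ignore example can be tried directly. This sketch passes arguments that are certain to raise an exception if forced, namely integer division by zero and undefined (floating-point 1/0 would evaluate to Infinity rather than fail, so it would not show the point as clearly). The name safeResults is made up.

```haskell
-- ignore never looks at its argument, so the argument stays an
-- unevaluated thunk.
ignore :: a -> Int
ignore _ = 3

-- Forcing (1 `div` 0) would raise a divide-by-zero exception, and
-- forcing undefined would raise an error; neither is ever forced here.
safeResults :: [Int]
safeResults = [ignore (1 `div` 0 :: Int), ignore (undefined :: String)]
```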
Haskell evaluates expressions lazily. This means that evaluation only occurs when it is demanded. In this example takeWhile (< 1000) repeatedly demands answers from scanl1 (+) (map sqrt [1..]) but stops after one of them exceeds 1000. The moment this starts happening Haskell ceases to evaluate more of the (truly infinite) list.
We can see this in the small by cutting away some pieces from this example
>>> takeWhile (< 10) [1..]
[1,2,3,4,5,6,7,8,9]
Here we have an expression that represents an infinite list ([1..]) but takeWhile is ensuring that the total expression only demands some of those countless values. Without the takeWhile Haskell will try to print the entire infinite list
>>> [1..]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24Interrupted.
But again we notice that Haskell demands each element one-by-one only as it needs them in order to print. In a strict language we'd run out of RAM trying to represent the infinite list internally prior to printing the very first answer.

Functionally solving questions: how to use Haskell?

I am trying to solve one of the problems in H99:
Split a list into two parts; the length of the first part is given.
Do not use any predefined predicates.
Example:
> (split '(a b c d e f g h i k) 3)
( (A B C) (D E F G H I K))
And I can quickly come up with a solution:
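split' :: [a] -> Int -> Int -> [a] -> [[a]]
-- list exhausted: return what was collected instead of dropping it
split' [] _ _ w = [w, []]
-- when the count is reached, x belongs to the second part, so keep it
split' (x:xs) y z w = if y == z then [w, x:xs] else split' xs y (z+1) (w++[x])
split :: [a] -> Int -> [[a]]
split x y = split' x y 0 []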
My question is: what I am doing is more or less rewriting the loop version in a recursive form. Is this the right way to do things in Haskell? Isn't it just the same as imperative programming?
EDIT: Also, how do you generally avoid the extra function here?
It's convenient that you can often convert an imperative solution to Haskell, but you're right, you do usually want to find a more natural recursive statement. For this one in particular, reasoning in terms of base case and inductive case can be very helpful. So what's your base case? Why, when the split location is 0:
split x 0 = ([], x)
The inductive case can be built on that by prepending the first element of the list onto the result of splitting with n-1:
split (x:xs) n = (x:left, right)
  where (left, right) = split xs (n-1)
This may not perform wonderfully (it's probably not as bad as you'd think) but it illustrates my thought process when I first encounter a problem and want to approach it functionally.
Edit: Another solution relying more heavily on the Prelude might be:
split l n = (take n l, drop n l)
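The base case and inductive case from this answer can be put together into one definition that loads as is. In this sketch the clause for a list that runs out before n reaches 0 is an addition, and the name splitList is made up to keep it apart from the other versions.

```haskell
-- Base case and inductive case as described above, plus a clause for a
-- list that runs out before n reaches 0 (an addition in this sketch).
splitList :: [a] -> Int -> ([a], [a])
splitList xs     0 = ([], xs)
splitList []     _ = ([], [])
splitList (x:xs) n = (x : left, right)
  where (left, right) = splitList xs (n - 1)
```

For non-negative n this agrees with the Prelude's splitAt, which takes the count first: splitList xs n gives the same pair as splitAt n xs.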
It's not really the same as imperative programming: each function call avoids side effects, and they're just simple expressions. But I have a suggestion for your code:
split :: Int -> [a] -> ([a], [a])
split p xs = go p ([], xs)
  where go 0 (xs, ys) = (reverse xs, ys)
        go n (xs, y:ys) = go (n-1) (y : xs, ys)
Note how we've declared that we're returning exactly two things, ([a], [a]), instead of a list of things (which is a bit misleading), and that we've confined the tail-recursive helper to local scope.
I'm also using pattern matching, which is a more idiomatic way to write recursive functions in Haskell: when go is called with a zero, the first case runs. It's generally more pleasant to write recursive functions that count down rather than up, since you can use pattern matching rather than if expressions.
Finally, this is more efficient: ++ is linear in the length of its first argument, so your function is quadratic overall rather than linear. This method is also tail recursive, unlike Daniel's solution, which is important for handling large lists.
TLDR: Both versions are functional style, avoiding mutation, using recursion instead of loops. But the version I've presented is a little more Haskell-ish and slightly faster.
A word on tail recursion
This solution uses tail recursion, which isn't always essential in Haskell. In this case it helps when you use the whole of the resulting lists, but at other times it is actually a bad thing. For example, map isn't tail recursive, but if it were, you couldn't use it over infinite lists!
In this case we can use tail recursion, since an integer is always finite. But if we only use the first element of the list, Daniel's solution is much faster, since it produces the list lazily. On the other hand, if we use the whole list, my solution is much faster.
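The difference in laziness between the two styles can be made visible with a list whose tail is undefined. This is a sketch only: splitLazy and splitAcc are hypothetical names for the two approaches discussed here, both taking the count first, and the clauses for an exhausted list are additions.

```haskell
-- Lazy version: the first component is produced cell by cell.
splitLazy :: Int -> [a] -> ([a], [a])
splitLazy _ []     = ([], [])
splitLazy 0 xs     = ([], xs)
splitLazy n (x:xs) = (x : left, right)
  where (left, right) = splitLazy (n - 1) xs

-- Accumulator version: nothing is returned until the count reaches 0
-- (or the list runs out).
splitAcc :: Int -> [a] -> ([a], [a])
splitAcc n xs = go n ([], xs)
  where
    go 0 (acc, rest)   = (reverse acc, rest)
    go _ (acc, [])     = (reverse acc, [])
    go k (acc, y:rest) = go (k - 1) (y : acc, rest)

-- Only the first element is demanded, so the lazy version never
-- touches the undefined tail.
lazyFirst :: [Int]
lazyFirst = take 1 (fst (splitLazy 10 (1 : undefined)))
```

The same expression written with splitAcc would have to walk past the first element before returning anything, and so would hit the undefined tail. On fully consumed finite input the two functions return the same pair.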
split'::[a]->Int->([a],[a])
split' [] _ = ([],[])
split' xs 0 = ([],xs)
split' (x:xs) n = (x:(fst splitResult),snd splitResult)
  where splitResult = split' xs (n-1)
It seems you have already shown an example of a better solution.
I would recommend you read SICP. You will then come to the conclusion that the extra function is normal. There is also a widely used approach of hiding such functions in a local scope. The book may seem boring to you, but its early chapters will get you used to the functional approach to solving problems.
There are tasks for which the recursive approach is more necessary. But if, for example, you use tail recursion (which is so often praised without cause), you will notice that it is just ordinary iteration, often with an "extra function" that hides the iteration variable (the word "variable" is not very appropriate here; "argument" is more likely).
