Project Euler 2: Haskell implementation confusion

I am using GHCi to try to solve problem 2 on Project Euler.
http://projecteuler.net/problem=2
I defined the infinite list fibs as:
Prelude> let fibs = 1 : 2 : zipWith(+) fibs (tail fibs)
I tried using list comprehensions in the following manner:
Prelude> [x | x<-fibs, x `mod` 2 == 0, x<4000000]
[1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025,121393,196418,317811,514229,832040,1346269,2178309,3524578
Prelude> sum $ [x | x <- [1..], x `mod` 2 == 0, x<4000000]
But the shell hangs on the second command. I am confused as to why the list comprehension can build the list but the sum function cannot process it.
I found that a working solution is
Prelude> sum $ filter even $ takeWhile (<= 4000000) fibs
But once again I am confused as to why that works when the list comprehension method doesn't.

When evaluating
[x | x<-fibs, x `mod` 2 == 0, x<4000000]
your program/the Haskell compiler does not know that this list is actually finite. When you call a function that consumes a list entirely such as sum, it just keeps generating Fibonacci numbers, testing them for x<4000000 (and failing every time after a certain point).
In fact, the Haskell compiler cannot know in the general case whether a comprehension expression denotes a finite list; the problem is undecidable.
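A minimal sketch of the difference, assuming the fibs definition from the question. The names evenFibsGuard and evenFibsBounded are made up for illustration. Asking for a finite prefix of the guard version is fine, but only the takeWhile version ever reaches the end of the list:

```haskell
-- Infinite Fibonacci list, as defined in the question.
fibs :: [Integer]
fibs = 1 : 2 : zipWith (+) fibs (tail fibs)

-- Guard version (hypothetical name): the guards only filter,
-- so evaluation never reaches the end of the list.
evenFibsGuard :: [Integer]
evenFibsGuard = [x | x <- fibs, even x, x < 4000000]

-- Bounded version (hypothetical name): takeWhile ends the list
-- at the first element above the limit.
evenFibsBounded :: [Integer]
evenFibsBounded = filter even (takeWhile (<= 4000000) fibs)

main :: IO ()
main = do
  -- Safe: only the first five elements of the guard version are demanded.
  print (take 5 evenFibsGuard)
  -- Safe: the bounded list is finite, so sum terminates.
  print (sum evenFibsBounded)
```

Calling sum evenFibsGuard instead would hang, for the reason described above.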

It hangs on the first command as well, doesn't it? :)
A list comprehension with a test is equivalent to a filter expression, not a takeWhile expression:
sum $ [x | x <- [1..], x `mod` 2 == 0, x<4000000] ===
sum $ filter (<4000000) $ filter even $ [1..]
This clearly describes a non-terminating computation.
Haskell comprehensions have no equivalent of Prolog's cut (i.e. "!"). In Prolog, we are able to stop a computation "from within the test":
sum(I,Acc,Res):- I >= 4000000, !, Res = Acc ;
Acc2 is Acc+I, sum(I+1,Acc2,Res).
but in Haskell we must emulate the cut by using takeWhile, or take.
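As a small sketch of that equivalence on a finite list (the list xs and the names viaGuard, viaFilter and viaTakeWhile are made up for illustration): the guard and filter keep every matching element, while takeWhile stops at the first failure.

```haskell
-- A short, unsorted sample list (an assumption for illustration).
xs :: [Int]
xs = [1, 5, 2, 7, 3]

-- A comprehension guard keeps every element that passes the test...
viaGuard :: [Int]
viaGuard = [x | x <- xs, x < 4]

-- ...which is exactly what filter does.
viaFilter :: [Int]
viaFilter = filter (< 4) xs

-- takeWhile stops at the first element that fails the test.
viaTakeWhile :: [Int]
viaTakeWhile = takeWhile (< 4) xs

main :: IO ()
main = do
  print viaGuard      -- [1,2,3]
  print viaFilter     -- [1,2,3]
  print viaTakeWhile  -- [1]
```

On an infinite input only the takeWhile form can ever produce the end of the list.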

Replace [1..] with fibs, and your second variant will be equivalent to the first.
EDIT: oh sorry, it won’t. But it will be “less” wrong :) I should be paying more attention…

Related

Primes in Haskell

I'm learning Haskell, and I've tried to generate an infinite list of primes, but I can't understand what my function is doing wrong.
The function:
prime = 2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..]
I think it's the init prime, but the strange thing is that even if I set an upper bound to the range (5..10 for example), the function loops forever and never gets any result for prime !! 2
Can you please tell me what I'm doing wrong?

Well, for one let's look at what init does for a finite list:
init [1] == []
init [1,2] == [1]
init [1,2,3] == [1,2]
ok, so it gives us all but the last element of the list.
So what's init prime? Well, prime without the last element. Hopefully, if we implemented prime correctly, it shouldn't have a last element (because there are infinitely many primes!), but more importantly we don't need to care yet because we don't have the full list for now anyway. We only care about the first couple of elements after all, so for us it's pretty much the same as just prime itself.
Now, looking at all: What does this do? Well, it takes a list and a predicate and tells us if all the elements of the list satisfy the predicate:
all (<5) [1..4] == True
all even [1..4] == False
it even works with infinite lists!
all (<5) [1..] == False
so what's going on here? Well, here's the thing: It does work with infinite lists... but only if we can actually evaluate the list up to the first element of the list that violates the predicate! Let's see if this holds true here:
all (\y -> (mod 5 y) > 0) (init prime)
so to find out if 5 is a prime number, we'd have to check if there's a number in prime minus the last element of prime that divides it. Let's see if we can do that.
Now let's look at the definition of prime, we get
all (\y -> (mod 5 y) > 0) (2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..])
So to determine whether 5 is a prime number, we only have to check if it's:
divisible by 2 - it's not, let's continue
divisible by 3 - still no
divisible by ...? Well, we're in the process of checking what the 3rd prime is so we don't know yet...
and there's the crux of the problem. With this logic, to determine the third prime number you need to know the third prime number! Of course logically, we actually don't want to check this at all, rather we only need to check if any of the smaller prime numbers are divisors of the current candidate.
So how do we go about doing that? Well, we'll have to change our logic unfortunately. One thing we can do is try to remember how many primes we already have, and only take as many as we need for our comparison:
prime = 2 : 3 : morePrimes 2 [5..]
morePrimes n (x:xs)
  | all (\y -> mod x y > 0) (take n prime) = x : morePrimes (n+1) xs
  | otherwise = morePrimes n xs
So how does this work? Well, it basically does what we were just talking about: we remember in n how many primes we already have (starting at 2, because we know we have at least [2,3]). We then check whether our next candidate is divisible by any of the n primes we already know by using take n. If it is not, we know it's our next prime and we need to increment n; otherwise we just carry on.
There's also the more well known form inspired by (although not quite the same as) the Sieve of Eratosthenes:
prime = sieve [2..] where
  sieve (p:xs) = p : sieve (filter (\x -> mod x p > 0) xs)
so how does this work? Well, again with a similar idea: We know that the next prime number needs to be non-divisible by any previous prime number. So what do we do? Well, starting at 2 we know that the first element in the list is a prime number. We then throw away every number divisible by that prime number using filter. And afterwards, the next item in the list is going to be a prime number again (because we didn't throw it away), so we can repeat the process.
Neither of these is a one-liner like the one you were hoping for, though.
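Here is a runnable sketch of both versions side by side. The names primeCounted and primeSieve are made up so that the two definitions can live in one file, and the empty-list cases are added only to keep the pattern matches total:

```haskell
-- Counting version: n tracks how many primes are already known.
primeCounted :: [Integer]
primeCounted = 2 : 3 : morePrimes 2 [5 ..]

morePrimes :: Int -> [Integer] -> [Integer]
morePrimes _ [] = []
morePrimes n (x : xs)
  -- Only the first n primes are inspected, so the list never waits on itself.
  | all (\y -> x `mod` y > 0) (take n primeCounted) = x : morePrimes (n + 1) xs
  | otherwise = morePrimes n xs

-- Sieve-like version: drop the multiples of each prime as it is found.
primeSieve :: [Integer]
primeSieve = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve (filter (\x -> x `mod` p > 0) xs)
    sieve [] = []

main :: IO ()
main = do
  print (take 10 primeCounted)
  print (take 10 primeSieve)
```

Both calls should print the same ten primes, from 2 up to 29.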

If the code in the other answer is restructured under the identity
[take n primes | n <- [0..]] == inits primes
eventually we get
import Data.List
-- [ ([], 2), ([2], 3), ([2,3], 5), ... ]
primes = 2 : [ c | (ps, p) <- zip (inits primes) primes,
                   c <- take 1 [c | c <- [p+1..],
                                    and [mod c p > 0 | p <- ps]]]
Further improving it algorithmically, it becomes
primes = 2 : [ c | (ps, r:q:_) <- zip (inits primes)                  -- []     [3,4,...]
                                      (tails $ 3 : map (^2) primes),  -- [2]    [4,9,...]
                   c <- [r..q-1], and [mod c p > 0 | p <- ps]]        -- [2,3]  [9,25,...]
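As a sanity check, here is a sketch that puts both definitions in one file and compares them against a naive trial-division list. The names primesA, primesB and reference are made up for illustration, and the inner variables are renamed to avoid shadowing:

```haskell
import Data.List (inits, tails)

-- First form: pair each prime p with the primes found before it,
-- then take the first candidate above p that none of them divides.
primesA :: [Integer]
primesA = 2 : [ c | (ps, p) <- zip (inits primesA) primesA
                  , c <- take 1 [ k | k <- [p + 1 ..]
                                    , and [k `mod` d > 0 | d <- ps] ] ]

-- Second form: test whole segments between consecutive squares of primes.
primesB :: [Integer]
primesB = 2 : [ c | (ps, r : q : _) <- zip (inits primesB)
                                           (tails (3 : map (^ 2) primesB))
                  , c <- [r .. q - 1]
                  , and [c `mod` d > 0 | d <- ps] ]

-- Naive reference list, used only for comparison (not from the answer).
reference :: [Integer]
reference = [n | n <- [2 ..], all (\d -> n `mod` d > 0) [2 .. n - 1]]

main :: IO ()
main = do
  print (take 30 primesA == take 30 reference)
  print (take 30 primesB == take 30 reference)
```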

How do I merge the results of two functions into a list in Haskell?

I want to create an infinite list with fibonacci in a good spot. My original thought was just to call and then truncate the list returned by and then stick them together. I don't know how to just take the first element from the list. Would someone be able to point me in the right direction?

I think you're basically looking for indexing the fibonaccis at every prime:
result = [fibs !! p | p <- primes]
However, that's not really efficient, repeatedly scanning through the fibs. Better do it by repeatedly dropping a bit - the distance between every two primes - from the list, and taking the heads of each of the so-produced lists:
result = map head $ tail $ scanl (flip drop) fibs $ zipWith (-) primes (0:primes)
To use one-based indexing, it's
result = [fibs !! (p-1) | p <- primes]
result = map head $ tail $ scanl (flip drop) fibs $ zipWith (-) primes (1:primes)
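A runnable sketch of the zero-based variants. The question does not show its own fibs and primes, so the definitions below are assumptions, and the names byIndex and byDrop are made up for illustration:

```haskell
-- Assumed Fibonacci list starting at 0 (the question's own definition is not shown).
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Assumed simple prime list; Int is used because (!!) and drop take Int.
primes :: [Int]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]
    sieve [] = []

-- Direct indexing: rescans fibs from the start for every prime.
byIndex :: [Integer]
byIndex = [fibs !! p | p <- primes]

-- Incremental version: drop only the gap between consecutive primes each time.
byDrop :: [Integer]
byDrop = map head $ tail $ scanl (flip drop) fibs $ zipWith (-) primes (0 : primes)

main :: IO ()
main = do
  print (take 5 byIndex)
  print (take 5 byDrop)
```

With these assumed definitions both lines should print the Fibonacci numbers at indices 2, 3, 5, 7 and 11.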

Haskell Does Not Evaluate Lazily takeWhile

isqrt :: Integer -> Integer
isqrt = floor . sqrt . fromIntegral
primes :: [Integer]
primes = sieve [2..] where
  sieve (p:ps) = p : sieve [x | x <- ps, x `mod` p > 0]
primeFactors :: Integer -> [Integer]
primeFactors n = takeWhile (< n) [x | x <- primes, n `mod` x == 0]
Here is my code. I think you can guess what I am trying to do: produce a list of the prime factors of a given number using an infinite list of prime numbers. But this code does not evaluate lazily.
When I use ghci, load the file with :l mycode.hs and enter primeFactors 24, the output is [2, 3 (with the cursor constantly flashing there), and no further Prelude> prompt appears. I think there is a problem there. What am I doing wrong?
Thanks.

takeWhile never terminates for composite arguments. If n is composite, it has no prime factors >= n, so takeWhile will just sit there.
Apply takeWhile to the primes list and then filter the result with n `mod` x == 0, like this:
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]
(<= is used instead of < for maximum correctness, so that prime factors of a prime number would consist of that number).
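Put together with the primes definition from the question, a self-contained sketch of the corrected version might look like this (the empty-list case of sieve is added only to keep the pattern match total):

```haskell
-- Infinite list of primes, as in the question.
primes :: [Integer]
primes = sieve [2 ..]
  where
    sieve (p : ps) = p : sieve [x | x <- ps, x `mod` p > 0]
    sieve [] = []

-- Bound the input first, then test divisibility on the finite prefix.
primeFactors :: Integer -> [Integer]
primeFactors n = [x | x <- takeWhile (<= n) primes, n `mod` x == 0]

main :: IO ()
main = do
  print (primeFactors 24)  -- [2,3]
  print (primeFactors 7)   -- [7]
```

Because takeWhile now sees the primes themselves rather than the filtered factors, it is guaranteed to meet an element above n and end the list.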

Have an illustration of what happens:
http://sketchtoy.com/67338195

Your problem isn't directly takeWhile, but rather the list comprehension.
[x | x <- primes, n `mod` x == 0]
For n = 24, we get 24 `mod` 2 == 0 and 24 `mod` 3 == 0, so the value of this list comprehension starts with 2 : 3 : .... But consider the ... part.
The list comprehension has to keep pulling values from primes and checking 24 `mod` x == 0. Since there are no more prime factors of 24 nothing will ever pass that test and get emitted as the third value of the list comprehension. But since there's always another prime to test, it will never stop and conclude that the remaining tail of the list is empty.
Because this is lazily evaluated, if you only ever ask for the first two elements of this list then you're fine. But if your program ever needs the third one (or even just to know whether or not there is a third element), then the list comprehension will just spin forever trying to come up with one.
takeWhile (< 24) keeps pulling elements from its argument until it finds one that is not < 24. 2 and 3 both pass that test, so takeWhile (< 24) does need to know what the third element of the list comprehension is.
But it's not really a problem with takeWhile; the problem is that you've written a list comprehension to find all of the prime factors (and nothing else), and then tried to use a filter on the results of that to cut off the infinite exploration of all the higher primes that can't possibly be factors. That doesn't really make sense if you stop to think about it; by definition anything that isn't a prime factor can't be an element of that list, so you can't filter out the non-factors larger than n from that list. Instead you need to filter the input to that list comprehension so that it doesn't try to explore an infinite space, as @n.m's answer shows.

Haskell filter function laziness

Given:
take 5 (filter p xs)
Say filter p xs would return 1K matches: would Haskell compute only the first 5 matches, without producing a large intermediate result?

It will scan xs only as much as needed to produce 5 matches, evaluating p only on this prefix of xs.
To be more precise, it can actually perform less computation, depending on how the result is used. For instance,
main = do
  let p x = (x==3) || (x>=1000000)
      list1 = [0..1000000000]
      list2 = take 5 (filter p list1)
  print (head list2)
will only scan list1 until 3 is found, and no more, despite take asking for five elements. This is because head demands only the first of these five, so laziness causes just that one to be evaluated.

"Would return 1K matches" under what circumstances?
Haskell doesn't work by first evaluating filter p xs (as you would in an ordinary call-by-value language like Java or Ruby). It works by evaluating take 5 first (in this case). take 5 will evaluate enough of filter p xs to end up with a result, and not evaluate the rest.

Yes, it will only compute the 5 matches; it will not produce a large intermediate result.
If it did, something like the following would not work:
take 5 (filter (> 10) [1..])
This feature is called Lazy evaluation.
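One way to see this on a finite example is to put undefined behind the part of the list that should be enough. This is a sketch; the names xs, firstFive and firstBig are made up for illustration. If evaluation ever went past the defined prefix, the program would crash instead of printing a result:

```haskell
-- Ten real elements followed by a tail that crashes if it is ever inspected.
xs :: [Int]
xs = [1 .. 10] ++ undefined

-- The fifth even number is 10, so the undefined tail is never touched.
firstFive :: [Int]
firstFive = take 5 (filter even xs)

-- head demands only the first match, so even less of xs is scanned.
firstBig :: Int
firstBig = head (take 5 (filter (> 3) xs))

main :: IO ()
main = do
  print firstFive  -- [2,4,6,8,10]
  print firstBig   -- 4
```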

Haskell lazy evaluation

If I call the following Haskell code
find_first_occurrence :: (Eq a) => a -> [a] -> Int
find_first_occurrence elem list = (snd . head) [x | x <- zip list [0..], fst x == elem]
with the arguments
'X' "abcdXkjdkljklfjdlfksjdljjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"
how much of the zipped list [('a',0), ('b',1), ] is going to be built?
UPDATE:
I tried to run
find_first_occurrence 10 [1..]
and it returns 9 almost instantly, so I guess it does use lazy evaluation at least for simple cases? The answer is also computed "instantly" when I run
let f n = 100 - n
find_first_occurrence 10 (map f [1..])

Short answer: it will be built only up to the element you're searching for. This means that only in the worst case you'll need to build the whole list, that is when no element satisfies the conditions.
Long answer: let me explain why with a pair of examples:
ghci> head [a | (a,b) <- zip [1..] [1..], a > 10]
11
In this case, zip should produce an infinite list, however the laziness enables Haskell to build it only up to (11,11): as you can see, the execution does not diverge but actually gives us the correct answer.
Now, let me consider another issue:
ghci> find_first_occurrence 1 [0, 0, 1 `div` 0, 1]
*** Exception: divide by zero
ghci> find_first_occurrence 1 [0, 1, 1 `div` 0, 0]
1
it :: Int
(0.02 secs, 1577136 bytes)
Since the whole zipped list is not built, Haskell will not evaluate every expression occurring in the list either. So when the element we are looking for comes before 1 `div` 0, the function is evaluated correctly without raising an exception: the division by zero never occurs.

All of it.
Since StackOverflow won't let me post such a short answer: you can't get away with doing less work than looking through the whole list if the thing you're looking for isn't there.
Edit: The question now asks something much more interesting. The short answer is that we will build the list:
('a',0):('b',1):('c',2):('d',3):('X',4):<thunk>
(Actually, this answer is just the slightest bit subtle. Your type signature uses the monomorphic return type Int, which is strict in basically all operations, so all the numbers in the tuples above will be fully evaluated. There are certainly implementations of Num for which you would get something with more thunks, though.)

You can easily answer such a question by introducing undefineds here and there. In our case it is sufficient to change our inputs:
find_first_occurrence 'X' ("abcdX" ++ undefined)
You can see that it produces the result, which means that it does not even look beyond the 'X' it found (otherwise it would have thrown an Exception). Obviously, the zipped list can not be built without looking at the original list.
Another (possibly less reliable) way to analyse your laziness is to use the trace function from Debug.Trace:
> let find_first_occurrence elem list = (snd . head) [x | x <- map (\i -> trace (show i) i) $ zip list [0..], fst x == elem]
> find_first_occurrence 'X' "abcdXkjdkljklfjdlfksjdljjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"
Prints
('a',0)
('b',1)
('c',2)
('d',3)
('X',4)
4
