Is it possible to exit a generator?


Consider the following:
list = [1,3..]
generate n = [compute y | y <- list , (compute y) < n ]
compute a = ... whatever ...
Is it possible to exit the generator before getting to the last element of my list (e.g. once compute y > 20)?
I want to save computing power. I only need the elements smaller than n.
I'm new to Haskell. A simple answer might be the best answer.

The wonderful thing about Haskell is that it's lazy. If you said
> let x = generate 100000
then Haskell doesn't immediately calculate generate 100000; it just creates a promise to calculate it when needed (we normally call this a thunk).
If you only want elements until compute y > 20, you can do
> takeWhile (<= 20) (generate 100000)
This is the same semantics that let you do something like
> let nums = [1..] :: [Integer]
This makes a lazy reference to all Integer values from 1 to infinity. You can then do things like
> take 10 $ map (* 10) $ drop 12345 $ map (\x -> x ^ 2 + x ^ 3 + x ^ 4) $ filter even nums
[3717428823832552480,3718633373599415160,3719838216073150080,3721043351301172120,3722248779330900000,3723454500209756280,3724660513985167360,3725866820704563480,3727073420415378720,3728280313165051000]
And while this seems like a lot of work, it only calculates the bare minimum necessary to return the 10 elements you requested. The argument to take 10 in this example is still an infinite list, where we first grabbed all the evens, then mapped an algebraic expression over them, then dropped the first 12345 elements, then multiplied all remaining (infinitely many) elements by 10. Working with infinite structures in Haskell is very common and often advantageous.
As a side note, your current definition of generate may compute compute y twice for each element; you'd want something more like
generate n = [compute_y | y <- list, let compute_y = compute y, compute_y < n]
This way compute y is only calculated once and the value is shared between your filter compute_y < n and the left hand side of the | in the comprehension. Also be aware that when you have a condition in a comprehension, this gets translated to a filter, not a takeWhile:
> -- filter applies the predicate to all elements in the list
> filter (\x -> x `mod` 5 == 0) [5,10,15,21,25]
[5,10,15,25]
> -- takeWhile pulls values until the predicate returns False
> takeWhile (\x -> x `mod` 5 == 0) [5,10,15,21,25]
[5,10,15]
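A minimal sketch of the direct fix the question asks for, assuming compute is increasing (so once one value reaches n, every later value does too): use takeWhile instead of a comprehension condition, so pulling from the infinite list stops at the first failing value. The compute below (y * y) is a hypothetical stand-in for the question's elided function.

```haskell
-- Hypothetical stand-in for the question's elided compute; assumed increasing.
compute :: Integer -> Integer
compute y = y * y

list :: [Integer]
list = [1, 3 ..]

-- map computes each compute y once; takeWhile stops at the first value >= n,
-- so this terminates even though list is infinite (a filter would not).
generate :: Integer -> [Integer]
generate n = takeWhile (< n) (map compute list)

main :: IO ()
main = print (generate 100)
```

If compute is not increasing, takeWhile can stop too early, and the filter-based version (wrapped in an outer takeWhile or take) is the safer choice.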

Related

Haskell Listing the first 10 numbers starting from 1 which are divisible by all the numbers from 2 to 15

--for number divisible by 15 we can get it easily
take 10 [x | x <- [1..] , x `mod` 15 == 0 ]
--but for all how do I use the all option
take 10 [x | x <- [1..] , x `mod` [2..15] == 0 ]
take 10 [x | x <- [1..] , all x `mod` [2..15] == 0 ]
I want to understand how to use all in this particular case.
I have read the Haskell documentation, but I am new to this language, coming from Python, so I am unable to figure out the logic.

First, you can write a function that checks whether a number is divisible by every number in a list such as [2..15].
modByNumbers x ns = all (\n -> x `mod` n == 0) ns
Then you can use it like the mod function:
take 10 [x | x <- [1..] , x `modByNumbers` [2..15] ]
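The two lines above, assembled into one runnable module as a sketch; the Int type signatures and the name firstTen are additions. The comprehension filters [1..] lazily, and take 10 stops it after the tenth match.

```haskell
-- True when x is divisible by every number in ns.
modByNumbers :: Int -> [Int] -> Bool
modByNumbers x ns = all (\n -> x `mod` n == 0) ns

-- The first ten positive integers divisible by every number in [2..15].
firstTen :: [Int]
firstTen = take 10 [x | x <- [1 ..], x `modByNumbers` [2 .. 15]]

main :: IO ()
main = print firstTen
```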

Alternatively, using math, we know that the smallest number divisible by all numbers from 2 to n is the product, over every prime p up to n, of p raised to floor(logBase p n).
A basic isPrime function:
isPrime n = length [ x | x <- [2..n], n `mod` x == 0] == 1
Using that to get all of the primes less than 15:
p = [fromIntegral x :: Float | x <- [2..15], isPrime x]
-- [2.0,3.0,5.0,7.0,11.0,13.0]
Now we can get the exponents:
e = [fromIntegral (floor $ logBase x 15) :: Float | x <- p]
-- [3.0,2.0,1.0,1.0,1.0,1.0]
If we zip these together with (**):
z = zipWith (**) p e
-- [8.0,9.0,5.0,7.0,11.0,13.0]
And then find the product of these we get the smallest number divisible by all numbers between 2 and 15.
smallest = product z
-- 360360.0
And now to get the rest we just need to multiply that by the numbers from 1 to 15.
map round $ take 10 [smallest * x | x <- [1..15]]
-- [360360,720720,1081080,1441440,1801800,2162160,2522520,2882880,3243240,3603600]
This has the advantage of running substantially faster.
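The same idea can be expressed in exact integer arithmetic by swapping in the Prelude's lcm (least common multiple) instead of primes and logarithms, which avoids Float rounding for larger bounds. A sketch; the function names are hypothetical:

```haskell
-- Smallest positive integer divisible by every number in [2..k]:
-- fold lcm over the range (exact Integer arithmetic, no Float rounding).
smallestDivisibleUpTo :: Integer -> Integer
smallestDivisibleUpTo k = foldr lcm 1 [2 .. k]

-- Every number divisible by all of [2..k] is a multiple of that lcm.
firstMultiples :: Int -> Integer -> [Integer]
firstMultiples count k = take count [m * i | i <- [1 ..]]
  where
    m = smallestDivisibleUpTo k

main :: IO ()
main = print (firstMultiples 10 15)
```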

Decompose the problem.
You already know how to take the first 10 elements of a list, so set that aside and forget about it. There are infinitely many numbers divisible by all of [2,15]; your remaining task is to list them all.
There are infinitely many natural numbers (unconstrained), and you already know how to list them all ([1..]), so your remaining task is to transform that list into the "sub-list" whose elements are divisible by all of [2,15].
You already know how to transform a list into the "sub-list" satisfying some constraint (predicate :: X -> Bool). You're using a list comprehension in your posted code, but I think the rest of this is going to be easier if you use filter instead. Either way, your remaining task is to represent "is divisible by all of [2,15]" as a predicate.
You already know how to check if a number x is divisible by another number y. Now for something new: you want to abstract that as a predicate on x, and you want to parameterize that predicate by y. I'm sure you could get this part on your own if asked:
divisibleBy :: Int -> (Int -> Bool)
divisibleBy y x = 0 == (x `mod` y)
You already know how to represent [2,15] as [2..15]; we can turn that into a list of predicates using fmap divisibleBy. (Or map, worry about that difference tomorrow.) Your remaining task is to turn a list of predicates into a predicate.
You have a couple of options, but you already found all :: (a -> Bool) -> [a] -> Bool, so I'll suggest all ($ x).
Once you've put all these pieces together into something that works, you'll probably be able to boil it back down into something that looks a little bit like what you first wrote.
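Assembled as a sketch: divisibleBy is the answer's own helper, fmap turns [2..15] into a list of predicates, and all ($ x) asks whether every predicate holds for x. The names divisibleByAll and answers are hypothetical.

```haskell
-- divisibleBy y is the predicate "x is divisible by y".
divisibleBy :: Int -> (Int -> Bool)
divisibleBy y x = 0 == (x `mod` y)

-- Turn a list of predicates into one predicate: every predicate must hold.
-- ($ x) applies each predicate to x.
divisibleByAll :: [Int] -> Int -> Bool
divisibleByAll ys x = all ($ x) (fmap divisibleBy ys)

-- Infinitely many answers, filtered lazily from the naturals.
answers :: [Int]
answers = filter (divisibleByAll [2 .. 15]) [1 ..]

main :: IO ()
main = print (take 10 answers)
```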

Primes in Haskell

I'm learning Haskell, and I've tried to generate an infinite list of primes, but I can't understand what my function is doing wrong.
The function:
prime = 2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..]
I think it's the init prime, but the strange thing is that even if I set an upper bound on the range ([5..10] for example), the function loops forever and never gets any result for prime !! 2
Can you please tell me what I'm doing wrong?

Well, for one let's look at what init does for a finite list:
init [1] == []
init [1,2] == [1]
init [1,2,3] == [1,2]
ok, so it gives us all but the last element of the list.
So what's init prime? Well, prime without the last element. If we implemented prime correctly, it shouldn't have a last element (because there are infinitely many primes!), but more importantly we don't need to care about that yet, because we don't have the full list anyway: we only care about the first couple of elements, so for us it's pretty much the same as prime itself.
Now, looking at all: what does this do? It takes a predicate and a list, and tells us whether all the elements of the list satisfy the predicate:
all (<5) [1..4] == True
all even [1..4] == False
it even works with infinite lists!
all (<5) [1..] == False
so what's going on here? Well, here's the thing: It does work with infinite lists... but only if we can actually evaluate the list up to the first element of the list that violates the predicate! Let's see if this holds true here:
all (\y -> (mod 5 y) > 0) (init prime)
so to find out if 5 is a prime number, we'd have to check if there's a number in prime minus the last element of prime that divides it. Let's see if we can do that.
Now, substituting the definition of prime, we get
all (\y -> (mod 5 y) > 0) (2:3:filter (\x -> all (\y -> (mod x y) > 0) (init prime)) [5..])
So to determine whether 5 is a prime number, we only have to check if it's:
divisible by 2 - it's not, let's continue
divisible by 3 - still no
divisible by ...? Well, we're in the process of checking what the 3rd prime is so we don't know yet...
and there's the crux of the problem. With this logic, to determine the third prime number you need to know the third prime number! Of course logically, we actually don't want to check this at all, rather we only need to check if any of the smaller prime numbers are divisors of the current candidate.
So how do we go about doing that? Well, we'll have to change our logic unfortunately. One thing we can do is try to remember how many primes we already have, and only take as many as we need for our comparison:
prime = 2 : 3 : morePrimes 2 [5..]
morePrimes n (x:xs)
  | all (\y -> mod x y > 0) (take n prime) = x : morePrimes (n+1) xs
  | otherwise = morePrimes n xs
So how does this work? It does what we were just talking about: we remember in n how many primes we already have (starting at 2, because we know we have at least [2,3]). We then check whether our next candidate is divisible by any of the n primes we already know, using take n. If it is not, we know it's our next prime and we increment n; otherwise we just carry on.
There's also the more well known form inspired by (although not quite the same as) the Sieve of Eratosthenes:
prime = sieve [2..] where
  sieve (p:xs) = p : sieve (filter (\x -> mod x p > 0) xs)
So how does this work? Again with a similar idea: we know that the next prime number must not be divisible by any previous prime number. So, starting at 2, we know that the first element in the list is a prime number. We then throw away every number divisible by that prime using filter. Afterwards, the next item in the list is going to be a prime number again (because we didn't throw it away), so we can repeat the process.
Neither of these are one liners like the one you were hoping for though.
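For comparison, something close to a one-liner works if each candidate is tested only against primes up to its square root. This is a standard trial-division refinement swapped in here, not code from the answers: takeWhile stops reading primes as soon as p * p exceeds the candidate, and all of those primes have already been produced, which avoids the self-reference problem described above.

```haskell
-- Each candidate x is checked only against primes p with p * p <= x.
-- takeWhile never needs a prime that has not been produced yet,
-- so the self-referential list stays productive.
primes :: [Int]
primes = 2 : filter isPrime [3 ..]
  where
    isPrime x = all (\p -> x `mod` p /= 0) (takeWhile (\p -> p * p <= x) primes)

main :: IO ()
main = print (take 10 primes)
```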

If the code in the other answer is restructured under the identity
[take n primes | n <- [0..]] == inits primes
eventually we get
import Data.List
-- [ ([], 2), ([2], 3), ([2,3], 5), ... ]
primes = 2 : [ c | (ps, p) <- zip (inits primes) primes,
                   c <- take 1 [c | c <- [p+1..],
                                    and [mod c p > 0 | p <- ps]]]
Further improving it algorithmically, it becomes
primes = 2 : [ c | (ps, r:q:_) <- zip (inits primes) -- [] [3,4,...]
                                      (tails $ 3 : map (^2) primes), -- [2] [4,9,...]
                   c <- [r..q-1], and [mod c p > 0 | p <- ps]]       -- [2,3] [9,25,...]

Get the minimum value

I have a function 'getMin' which should return the minimum value. This function uses another function 'urvalFun' in order to determine this minimum value. Here is a demonstration of 'urvalFun':
get1:: (String,String,Double ) -> String
get1 (x,_,_ ) = x
get2 :: (String,String,Double)-> String
get2 (_,x,_) = x
get3:: (String,String,Double)->Double
get3 (_,_,x) = x
distDiff:: String-> String->[(String,String,Double)] ->Double
distDiff a b diMat = sum [z |(x,y,z)<- diMat, (x == a && y /= b)
                                        || (y == a && x /= b) ]
urvalFun:: Int -> (String,String,Double)->[(String,String,Double) ]->Double
urvalFun size triple diMat = ((fromIntegral size)-2)*(get3 triple)
    - ( (distDiff (get1 triple) (get2 triple) diMat ) + (distDiff (get2 triple)
    (get1 triple) diMat ))
It's not necessary to understand this chunk of code; the only important thing is that if I evaluate:
urvalFun 3 ("a","b",0.304) [("a","b",0.304),("a","d",0.52),("a","e",0.824)]
this will evaluate to -1.03999
urvalFun 3 ("a","d",0.52) [("a","b",0.304),("a","d",0.52),("a","e",0.824)]
this will evaluate to -0.60799
urvalFun 3 ("a","e",0.824) [("a","b",0.304),("a","d",0.52),("a","e",0.824)]
this will evaluate to 1.1e-16
Now we know that calling urvalFun with ("a","b",0.304) and [("a","b",0.304),("a","d",0.52),("a","e",0.824)] gives the smallest value. I want to create a function 'getMin' which returns this minimum value (with the same vector as in the above examples as a parameter), as shown above. The problem is that it doesn't work, and I have no clue why.
getMin:: [(String,String,Double)]->Double
getMin diMat = inner 0 diMat 2000
  where
    inner 3 diMat min = min
    inner n diMat min
      |current > min = inner (n + 1) (diMatMinus ++ [(head diMat) ]) min
      |otherwise = inner (n+1) (diMatMinus ++ [(head diMat) ]) current
    current = urvalFun (length diMat) (head diMat) diMat
    diMatMinus = tail diMat
try to evaluate for example
getMin [("a","e",0.824),("a","d",0.52),("a","b",0.304)]
which evaluates to -1.1e-16,
which is not what I intended, because I want this to return -1.03999.
Could someone help me out here?
(This code is a little bit ad hoc, but it is under construction; I am just doing some tests right now.)
Notice that I have rearranged the vector so that the triple ("a","b",0.304) is the last element in the vector.

First you need a way to get the minimum element to pass to urvalFun. This can be done with minimumBy.
Observe the following
λ> import Data.List (minimumBy)
λ> let ls = [("a","d",0.52),("a","e",0.824),("a","b",0.304)]
λ> let min = minimumBy (\(_,_,c) (_,_,c') -> compare c c') ls
λ> urvalFun 3 min ls
-1.0399999999999998
Or maybe this is what you intended:
λ> minimum [ urvalFun 3 x ls | x <- ls]
-1.0399999999999998
If you also want to vary n from 0 to 3 or something, then this can be further modified. I'd suggest stating in English what you want your function to do.

You want to find the value in a given list which minimizes your function.
I'll work with a simpler example than yours: finding the string with minimum length. You'll be able to adapt this to your case by replacing the length function.
> import Data.Foldable (minimumBy)
> import Data.Ord (comparing)
> let list = ["hello", "world!", "here", "is", "fine"]
> minimumBy (comparing length) list
"is"
The above works, but it calls length every time it performs a comparison, i.e. roughly twice per element of the list. It is also possible to avoid this, and use length only once per element, by precomputing it:
> snd $ minimumBy (comparing fst) [ (length x, x) | x <- list ]
We first pair each string with its length, take the minimum length-string pair according to the length only, and finally take only the string in such pair.
By the way, I'd recommend you avoid functions such as get1 to access tuples: they are not idiomatic in Haskell code, which usually exploits pattern matching for the same effect: for instance
urvalFun:: Int -> (String,String,Double)->[(String,String,Double) ]->Double
urvalFun size (s1,s2,d) diMat =
  (fromIntegral size - 2)*d - (distDiff s1 s2 diMat + distDiff s2 s1 diMat)
looks more readable.
head and tail are also particularly dangerous since they are partial: they will crash your program if you ever use them on an empty list, without providing much explanation.
In general, when working with lists, using indexes, or counter variables to count up to the length of the list, is typically not needed and not idiomatic. One can often simply resort to pattern matching and recursion, or even to standard fold/map functions.
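As an illustration of that advice (not code from either answer), here is a hypothetical minimumOn written with pattern matching and plain recursion. Like the snd/minimumBy trick above, it computes each key once per element, and it returns Maybe so that the empty list is handled instead of crashing.

```haskell
-- Hypothetical helper: the element with the smallest key, if any.
-- The current best key is carried along, so key is called once per element.
-- Pattern matching on [] makes the empty case explicit (no head/tail).
minimumOn :: Ord k => (a -> k) -> [a] -> Maybe a
minimumOn _ [] = Nothing
minimumOn key (x : xs) = Just (go (key x, x) xs)
  where
    go (_, best) [] = best
    go (kb, best) (y : ys)
      | ky < kb = go (ky, y) ys
      | otherwise = go (kb, best) ys
      where
        ky = key y

main :: IO ()
main = print (minimumOn length ["hello", "world!", "here", "is", "fine"])
```

On ties it keeps the earliest element, because it only replaces the current best on a strictly smaller key.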

Note that for a list of size 3 your getMin function may be written:
getMin' [x,y,z]
  = minimum [ urvalFun 3 x [x,y,z]
            , urvalFun 3 y [y,z,x]
            , urvalFun 3 z [z,x,y]
            ]
You can create the sequence of arguments to the urvalFun function with zip and a helper function:
rotate (x:xs) = xs ++ [x]
zip "xyzw" (iterate rotate "xyzw")
= [ ('x', "xyzw"),
('y', "yzwx"),
('z', "zwxy"),
('w', "wxyz") ]
Thus:
getMin tuples = minimum (zipWith go tuples (iterate rotate tuples))
  where n = length tuples
        go x xs = urvalFun n x xs
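A self-contained sketch that combines the question's distDiff, a pattern-matching urvalFun, and the "minimum over every triple" idea from the answers. Because urvalFun only uses diMat through distDiff, which sums over the whole list, the order of diMat does not change the result (apart from floating-point rounding in the sum), so this version passes the list unchanged instead of rotating it. The type alias Triple is an addition for readability.

```haskell
type Triple = (String, String, Double)

-- Sum of the distances on entries that touch a but not b (from the question).
distDiff :: String -> String -> [Triple] -> Double
distDiff a b diMat =
  sum [z | (x, y, z) <- diMat, (x == a && y /= b) || (y == a && x /= b)]

-- Same formula as the question's urvalFun, using pattern matching
-- instead of get1/get2/get3.
urvalFun :: Int -> Triple -> [Triple] -> Double
urvalFun size (s1, s2, d) diMat =
  (fromIntegral size - 2) * d - (distDiff s1 s2 diMat + distDiff s2 s1 diMat)

-- Smallest urvalFun value over every triple in the list.
-- Note: minimum, like head, fails on an empty list.
getMin :: [Triple] -> Double
getMin diMat = minimum [urvalFun n t diMat | t <- diMat]
  where
    n = length diMat

main :: IO ()
main = print (getMin [("a", "e", 0.824), ("a", "d", 0.52), ("a", "b", 0.304)])
```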

Keep getting stack overflow

I am repeatedly getting a stack overflow on my solution to Project Euler #7 and I have no idea why.
Here is my code:
import System.Environment
checkPrime :: Int -> Bool
checkPrime n = not $ testList n [2..n `div` 2]
--testList :: Int -> [Int] -> Bool
testList _ [] = False
testList n xs
  | (n `rem` (head xs) == 0) = True
  | otherwise = testList n (tail xs)
primesTill n = sum [1 | x <- [2..n], checkPrime x]
nthPrime n = nthPrime' n 2
nthPrime' n x
  | (primesTill x == n) = x
  | otherwise = nthPrime' n x+1
main = print (nthPrime 10001)

Resolving the stack overflow
As @bheklilr mentioned in his comment, the stack overflow is caused by operator precedence in the otherwise branch of the nthPrime' function:
nthPrime' n x+1
Will be interpreted as
(nthPrime' n x)+1
Because this expression is called recursively, your call of nthPrime' n 2 will expand into
(nthPrime' n 2)+1+1+1+1+1+1+1+1 ...
but the second parameter never gets incremented, and your program collects a mass of unevaluated thunks. The additions can only be evaluated once their left operand is reduced to an Int, but that operand is itself an endless recursion, so this never happens. All the pending +1s are kept on the stack; when there is no more space left, you get a stack overflow error.
To solve this problem you need to put parentheses around x+1, so your recursive call looks like this:
nthPrime' n (x+1)
Now x+1 is passed as the second argument of the recursive call.
This should solve your stackoverflow problem, you can try it out with a smaller number e.g. 101 and you'll get the desired result.
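A tiny illustration of the precedence rule behind this bug: function application binds tighter than any infix operator, so f a b+1 means (f a b) + 1. The function scale is hypothetical and exists only to make the difference visible.

```haskell
-- Hypothetical two-argument function used only to show how the call parses.
scale :: Int -> Int -> Int
scale k x = k * x

-- Application binds tighter than +, so this is (scale 2 3) + 1 = 7.
withoutParens :: Int
withoutParens = scale 2 3+1

-- Parentheses make 3+1 the second argument: scale 2 4 = 8.
withParens :: Int
withParens = scale 2 (3+1)

main :: IO ()
main = print (withoutParens, withParens)
```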
Runtime optimization
If you test your program with the original value 10001 you may realize that it still won't finish in a reasonable amount of time.
I won't go into the details of fancy algorithms for this problem; if you're interested in them, you can easily find them online.
Instead, I'll show you where the problem in your code is and a simple solution.
The bottleneck is your nthPrime function:
primesTill n = sum [1 | x <- [2..n], checkPrime x]
nthPrime n = nthPrime' n 2
nthPrime' n x
  | (primesTill x == n) = x
  | otherwise = nthPrime' n (x+1)
This function checks whether the number of primes between 2 and x is equal to n. The idea is correct, but it makes the running time grow far faster than necessary. The problem is that you recalculate primesTill x on every iteration. To count the primes up to x, you test every number from 2 to x and then sum up the results. In the next step, for x+1, you forget everything you know about the numbers between 2 and x and test them all again, and only as the last step do you test whether x+1 is prime. Then you repeat this (forget everything and test all numbers again) until you are finished.
Wouldn't it be great if the computer could remember the primes it has already found?
There are many ways to do this; I'll use a simple infinite list. If you are interested in other approaches, you can search for the terms memoization or dynamic programming.
We start with the list comprehension you used in primesTill:
[1 | x <- [2..n], checkPrime x]
This calculates all primes between 2 and n, but immediately forgets the prime number and replaces it with 1, so the first step will be to keep the actual numbers.
[x | x <- [2..n], checkPrime x]
This gives us a list of all prime numbers between 2 and n. If we had a sufficiently large list of prime numbers we could use the index function !! to get the 10001st prime number. So we need to set n to a really really big number, to be sure that the filtered list is long enough?
Lazy evaluation to the rescue!
Lazy evaluation in Haskell allows us to build an infinite list that is only evaluated as much as needed. If we don't supply an upper bound to a list generator, it will build such an infinite list for us.
[x | x <- [2..], checkPrime x]
Now we have an infinite list of all prime numbers.
We can bind it to a name, e.g. primes, and use it to define nthPrime:
primes = [x | x <- [2..], checkPrime x]
nthPrime n = primes !! (n - 1)  -- !! counts from 0, so the 10001st prime is at index 10000
Now you can compile it with ghc -O2, run it and the result will be promptly delivered to you.
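The pieces above, assembled into one program as a sketch. One change goes beyond the answer and is flagged as an assumption: checkPrime here only tests divisors up to the square root of n (a standard trial-division shortcut swapped in for the original n `div` 2 bound). nthPrime uses n - 1 because !! counts from 0.

```haskell
-- Trial division up to the square root: if n has a divisor d > 1,
-- it has one with d * d <= n. (Swapped in for the original n `div` 2 bound.)
checkPrime :: Int -> Bool
checkPrime n = n >= 2 && all (\d -> n `rem` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])

-- Infinite, lazily evaluated list of primes; as a top-level value it is
-- computed once and shared, so later lookups reuse earlier work.
primes :: [Int]
primes = [x | x <- [2 ..], checkPrime x]

-- The n-th prime, counting from 1 (!! itself counts from 0).
nthPrime :: Int -> Int
nthPrime n = primes !! (n - 1)

main :: IO ()
main = print (nthPrime 10001)
```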

Iterating a function and analysing the result in haskell

Ok, referring back to my previous question, I am still working on learning Haskell and solving the current problem of finding the longest chain from the following iteration:
chain n | n == 0 = error "What are you on about?"
        | n == 1 = [1]
        | rem n 2 == 0 = n : chain (n `div` 2)
        | otherwise = n : chain (3 * n + 1)
I have this bit sorted, but I need to find the longest chain from a starting number below 1,000,000. So how do I make it do each starting number up to 1,000,000 and then print the one with the longest chain length.
I can do it for one example with:
Main> length (chain n)
I assume I need the output as an array and then use the maximum function to find the value largest chain length and then see how far along it is in the array of answers.
Is this a good way to go about finding a solution or is there a better way (perhaps with better efficiency)?

You are right about the maximum part. To get the list (that's what Haskell's []s are, arrays are different structures) you need to use the map higher-order function, like this:
chainLength n = length (chain n)
lengths = map chainLength [1..1000000]
Essentially, map takes as arguments a function and a list. It applies the function to each element in the list and returns the list of the results.
Since you will be needing the number whose chain has that length, you may want to change the chainLength function to return the number as well, like this:
chainLength n = (n, length (chain n))
That way you will have a list of pairs, with each number and its chain length.
Now you need to get the pair with the largest second component. That's where the maximumBy function comes in. It works just like maximum but takes a function as a parameter to select how to compare the values; in this case, by the second component of the pair. This comparison function takes two pairs and returns a value of type Ordering. This type has only three possible values: LT, EQ, GT, for less than, equal, and greater than, respectively.
So, we need a function that given two pairs tells us how the second components compare to each other:
compareSnd (_, y1) (_, y2) = compare y1 y2
-- Or, if you import Data.Function, you can write it like this (thanks alexey_r):
compareSnd = compare `on` snd -- reads nicely
I used the default compare function that compares numbers (well, not just numbers).
Now we only need to get the maximum using this function (maximumBy is exported by Data.List):
longestChain = maximumBy compareSnd lengths
That gets you a pair of the number with the longest chain and the corresponding length. Feel free to apply fst and snd as you please.
Note that this could be written much more concisely using zip and composition, but since you tagged the question as newbie, I thought it better to break it down like this.
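A more concise version might look like this sketch: zip pairs each start with its chain length, and maximumBy (comparing snd) picks the pair with the longest chain. The name longestChainUpTo is hypothetical, and Int is assumed to be 64-bit (the default on 64-bit GHC), because intermediate chain values for starts below 1,000,000 can exceed the 32-bit range.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Collatz chain from n down to 1 (n is assumed positive).
chain :: Int -> [Int]
chain 1 = [1]
chain n
  | even n = n : chain (n `div` 2)
  | otherwise = n : chain (3 * n + 1)

-- (start, chain length) for the start in [1 .. bound] with the longest chain.
longestChainUpTo :: Int -> (Int, Int)
longestChainUpTo bound =
  maximumBy (comparing snd) (zip starts (map (length . chain) starts))
  where
    starts = [1 .. bound]

main :: IO ()
main = print (longestChainUpTo 100)
```

Calling longestChainUpTo 999999 answers the original question.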

SPOILER (solving the problem for positive integers under 100):
module Test where
import Data.List -- this contains maximumBy
chain n
  | n == 0 = error "What are you on about?"
  | n == 1 = [1]
  | rem n 2 == 0 = n : chain (n `div` 2)
  | otherwise = n : chain (3 * n + 1)
chains = map (\x -> (x,chain x)) [1..100]
cmpSnd (a,b) (c,d)
  | length b > length d = GT
  | length b == length d = EQ
  | otherwise = LT
solve = (fst . maximumBy cmpSnd) chains
The chains function makes use of map. It applies a function to every element of a list of a values, so
map succ [1,2]
is the same as
[succ 1,succ 2]
The cmpSnd function is a comparison function on the lengths of the chains. I could not find a ready-made one in the libraries, so I created it (with Data.Ord, comparing (length . snd) does the same job). GT means "the first value is greater than the second"; the rest is trivial.
solve takes the maximum of the list (using the comparison function we defined earlier). This will be a pair of an integer and a list; solve returns only the integer (because of the fst).
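A comment: your chain function is not tail-recursive. Here each step is guarded by (:) and the list is consumed lazily by length, so it usually runs fine, but for strict computations over very long chains (for example, counting steps directly) you should add an explicit accumulator and make it tail-recursive to avoid a stack overflow.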

Something like
fst $ maximumBy (comparing (length . snd)) $ zip [1..1000000] $ map chain [1..1000000]
(untested; needs maximumBy from Data.List and comparing from Data.Ord)
i.e. don't work out how far along the longest chain is in the list of chain lengths, but carry around the seed values with the chains instead.

I studied Haskell years ago, so I don't remember it that well. On the other hand I've tested this code and it works. You will get the max chain and the number that generates it. But as fiships has stated before, it will overflow for big values.
chain :: Int -> [Int]
chain n
  | n == 0 = []
  | n == 1 = [1]
  | rem n 2 == 0 = n : chain (n `div` 2)
  | otherwise = n : chain (3 * n + 1)
length_chain :: Int -> Int
length_chain n = length (chain n)
max_pos :: (Int,Int) -> Int -> [Int] -> (Int,Int)
max_pos (m,p) _ [] = (m,p)
max_pos (m,p) a (x:xs)
  | x > m = max_pos (x,a) (a+1) xs
  | otherwise = max_pos (m,p) (a+1) xs
Call it like this:
Main> max_pos (0,0) 1 (map length_chain [1..10000])
(262,6171)
