Recursion and parallelism in Haskell

I'm trying to understand how parallelism in Haskell works, and I've found the following example in the Control.Parallel docs.
import Control.Parallel
-- Equation for the upper hemisphere of the unit circle
circle :: Double -> Double
circle x = sqrt (abs(1 - x^2))
-- Calculate the area of a right-handed Riemann rectangle
area :: Double -> Double -> Double
area x1 x2 = (x2 - x1) * circle x2
-- Recursively add the areas of the Riemann rectangles
parEstimate :: [Double] -> Double
parEstimate (x:[]) = 0
parEstimate (x:y:[]) = area x y
parEstimate (x:y:xs) =
  smaller `par` (larger `pseq` smaller + larger)
  where smaller = area x y
        larger = parEstimate (y:xs)
But I couldn't find an explanation of how this recursion works: parEstimate (x:y:xs), because all the examples I've found use lists of only two elements.
That's why I cannot figure out how to run this function. This is how I do it:
main = print (parEstimate [1.0, 2.0])
but I'm not sure if it's correct.
I would also like to implement a function that calculates a definite integral, based on this example.

The recursion, essentially, is a simple fold-like recursion scheme; if this were purely sequential, you might write it as
seqEstimate :: [Double] -> Double
seqEstimate (x:[]) = 0
seqEstimate (x:y:[]) = area x y
seqEstimate (x:y:xs) = smaller + larger
  where smaller = area x y
        larger = seqEstimate (y:xs)
(In fact, you would probably just use zipWith instead: seqEstimate xs = sum (zipWith area xs (tail xs)).)
The parallelized version is similar. This time, though, par is used to indicate that the left-hand side (smaller) can be evaluated in parallel with the right-hand side (pseq larger (smaller + larger)). Whether or not the runtime actually evaluates them in parallel, and regardless of whether smaller completes before or after larger, the sum smaller + larger will still be computed correctly.
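For what it's worth, your main is valid; parEstimate [1.0, 2.0] simply hits the two-element base case. To actually see the parallelism pay off, give it many sample points and build with the threaded runtime. A minimal sketch; the sample list and step size here are my own choice, not from the docs:
-- Assumes the definitions of circle, area and parEstimate above.
-- A right-handed Riemann sum over [-1, 1] with step 0.001; the result
-- should be close to the semicircle's true area, pi/2 ~ 1.5708.
main :: IO ()
main = print (parEstimate [-1.0, -0.999 .. 1.0])
Compile with ghc -O2 -threaded -rtsopts and run with ./Main +RTS -N so that the sparks created by par can actually be scheduled across cores.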

Perplexing behaviour when approximating the derivative in Haskell

I have defined a typeclass Differentiable to be implemented by any type which can operate on infinitesimals.
Here is an example:
class Fractional a => Differentiable a where
  dif :: (a -> a) -> (a -> a)
  difs :: (a -> a) -> [a -> a]
  difs = iterate dif

instance Differentiable Double where
  dif f x = (f (x + dx) - f x) / dx
    where dx = 0.000001
func :: Double -> Double
func = exp
I have also defined a simple Double -> Double function to differentiate.
But when I test this in GHCi, this happens:
... $ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
Prelude> :l testing
[1 of 1] Compiling Main ( testing.hs, interpreted )
Ok, one module loaded.
*Main> :t func
func :: Double -> Double
*Main> derivatives = difs func
*Main> :t derivatives
derivatives :: [Double -> Double]
*Main> terms = map (\f -> f 0) derivatives
*Main> :t terms
terms :: [Double]
*Main> take 5 terms
[1.0,1.0000004999621837,1.000088900582341,-222.0446049250313,4.440892098500626e8]
*Main>
The approximations to the nth derivative of e^x|x=0 are:
[1.0,1.0000004999621837,1.000088900582341,-222.0446049250313,4.440892098500626e8]
The first and 2nd derivatives are perfectly reasonable approximations given the setup, but suddenly, the third derivative of func at 0 is... -222.0446049250313! HOW!!?
The method you're using here is a finite difference method of 1st-order accuracy.
Layman's translation: it works, but is pretty rubbish numerically speaking. Specifically, because it's only 1st-order accurate, you need really small steps to get good accuracy even with exact real arithmetic. You did choose a small step size, so that's fine, but a small step size brings in another problem: rounding errors. You need to take the difference f (x+δx) - f x with small δx, meaning the difference is small whereas the individual values may be large. That always invites floating-point inaccuracy – consider for example
Prelude> (1 + pi*1e-13) - 1
3.141931159689193e-13
That might not actually hurt that much, but since you then need to divide by δx you boost up the error.
This issue just gets worse/compounded as you go to higher derivatives, because now each of f' x and f' (x+δx) already has a (non-identical!) boosted error on it, so taking the difference and boosting again is a clear recipe for disaster.
The simplest way to remediate the problem is to switch to a 2nd-order accurate method, the obvious being central difference. Then you can make the step a lot bigger, and thus largely avoid rounding issues:
Prelude> let dif f x = (f (x + δx) - f(x - δx)) / (2*δx) where δx = 1e-3
Prelude> take 8 $ ($0) <$> iterate dif exp
[1.0,1.0000001666666813,1.0000003333454632,1.0000004990740052,0.9999917560676863,0.9957312752106873,8.673617379884035,7806.255641895632]
You see the first couple of derivatives are good now, but eventually this too becomes unstable – and that will happen with any FD method as you iterate it. But iterating FD is not really a good approach anyway: note that every evaluation of the n-th derivative requires 2 evaluations of the (n−1)-th, so the complexity is exponential in the derivative degree.
A better approach to approximating the n-th derivative of an opaque function is to fit an n-th order polynomial to it and differentiate that symbolically/automatically. Or, if the function is not opaque, differentiate it symbolically/automatically in the first place.
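To give a flavour of the automatic route, here is a minimal forward-mode AD sketch using dual numbers (entirely my own illustration, not part of the question). Each number carries its derivative along, so the first derivative comes out exactly, with no step size and hence no cancellation; higher derivatives need nested duals or a derivative tower, as provided by e.g. the ad package.
data Dual = Dual Double Double   -- a value paired with its derivative

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' - Dual b b' = Dual (a - b) (a' - b')
  Dual a a' * Dual b b' = Dual (a * b) (a' * b + a * b')  -- product rule
  fromInteger n         = Dual (fromInteger n) 0
  abs (Dual a a')       = Dual (abs a) (a' * signum a)
  signum (Dual a _)     = Dual (signum a) 0

dExp :: Dual -> Dual             -- chain rule for exp, as an example primitive
dExp (Dual a a') = Dual (exp a) (a' * exp a)

diffAt :: (Dual -> Dual) -> Double -> Double
diffAt f x = case f (Dual x 1) of Dual _ d -> d

-- diffAt dExp 0 evaluates to exactly 1.0, no finite differences involved.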
tl;dr: the dx denominator gets small exponentially quickly, which means that even small errors in the numerator get blown out of proportion.
Let's do some equational reasoning on the first "bad" approximation, the third derivative.
dif (dif (dif exp))
= { definition of dif }
dif (dif (\x -> (exp (x+dx) - exp x)/dx))
= { definition of dif }
dif (\y -> ((\x -> (exp (x+dx) - exp x)/dx) (y+dx)
            - (\x -> (exp (x+dx) - exp x)/dx) y
           )/dx)
= { questionable algebra }
dif (\y -> (exp (y + 2*dx) - 2*exp (y + dx) + exp y)/dx^2)
= { alpha }
dif (\x -> (exp (x + 2*dx) - 2*exp (x + dx) + exp x)/dx^2)
= { definition of dif and questionable algebra }
\x -> (exp (x + 3*dx) - 3*exp (x + 2*dx) + 3*exp (x + dx) - exp x)/dx^3
Hopefully by now you can see the pattern we're getting into: as we take more and more derivatives, the error in the numerator gets worse (because we are computing exp farther and farther away from the original point; for example, x + 3*dx is three times as far away) while the sensitivity to error in the denominator gets higher (because we are dividing by dx^n for the nth derivative). By the third derivative, these two factors become untenable:
> exp (3*dx) - 3*exp (2*dx) + 3*exp (dx) - exp 0
-4.440892098500626e-16
> dx^3
9.999999999999999e-19
So you can see that, although the error in the numerator is only about 5e-16, the sensitivity to error in the denominator is so high that you start to see nonsensical answers.

Process large lists in Haskell into a single value

I have 2 lists of Int of equal size (roughly 10,000 elements): say x and y. I need to compute the product of the expression x_i/(x_i+y_i) over each corresponding pair of elements from the lists, where x_i and y_i are the i-th elements of x and y.
My approaches work fine on small test cases, but GHCi hangs for the larger lists. Any insight as to the cause and solution would be appreciated.
I tried to do this with fold, zipping the lists first:
getP :: [(Int, Int)] -> Double
getP zippedCounts = foldr (\(x,y) acc -> let intX = fromIntegral x
                                             intY = fromIntegral y
                                             intSum = intX + intY
                                         in acc * (intX / intSum))
                          1.0
                          zippedCounts
I also tried recursion:
getP lst [] = 1.0
getP [] lst = 1.0
getP (h1:t1) (h2:t2) = ((fromIntegral h1) / ((fromIntegral h1) + (fromIntegral h2))) * (getP t1 t2)
As well as list comprehension:
getP lst1 lst2 = (product [((fromIntegral x) / ((fromIntegral x) + (fromIntegral y)))|x <- lst1, y <- lst2])
All three solutions have space leaks, maybe that's what causes the unresponsiveness.
In Haskell, when reducing a big list to a single summary value, it is very easy to inadvertently cause space leaks if we never "look into" the intermediate values of the computation. We can end up with a gigantic tree of unevaluated thunks hiding behind a seemingly inoffensive single Double value.
The foldr example leaks because foldr never forces its accumulator to weak head normal form. Use the strict left fold foldl' instead (you will need to reorder some function arguments). foldl' should ensure that the intermediate Double values remain "small" and thunks don't accumulate.
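For instance (a sketch, assuming the pairs are zipped before being passed in, as in your foldr attempt):
import Data.List (foldl')

getP :: [(Int, Int)] -> Double
getP = foldl' step 1.0
  where
    step acc (x, y) = let dx = fromIntegral x
                          dy = fromIntegral y
                      in acc * (dx / (dx + dy))
Because foldl' forces the accumulator at every step, the running product is always a plain Double, never a chain of pending multiplications.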
The explicit recursion example is dangerous because it is not tail-recursive, and for large lists it can cause a stack overflow (we repeatedly push values onto the stack, waiting for the next recursive call to complete). A solution is to make the function tail-recursive by passing the intermediate result as an extra parameter, and to put a bang pattern on that parameter to ensure thunks don't accumulate.
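That fix might look like this (a sketch; the accumulator argument and the bang pattern are the additions):
{-# LANGUAGE BangPatterns #-}

getP :: [Int] -> [Int] -> Double
getP = go 1.0
  where
    go !acc (h1:t1) (h2:t2) =
      go (acc * (fromIntegral h1 / (fromIntegral h1 + fromIntegral h2))) t1 t2
    go !acc _ _ = acc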
The product example leaks because, unfortunately, neither sum nor product is guaranteed to be strict in its accumulator. For large lists, it's better to use foldl' instead. (There's also a bug, as has been mentioned in the comments: the comprehension pairs every x with every y instead of zipping the lists.)
You could try zipWith followed by product:
getP :: [Int] -> [Int] -> Double
getP xs ys = product $ zipWith func xs ys
  where
    func x y = let dx = fromIntegral x
                   dy = fromIntegral y
               in dx / (dx + dy)
I would avoid explicit recursion and stick to library functions for speed. You could also use certain GHC flags (e.g. -O2) to speed up the compiled code.

How to tell if a number is a square number with recursion?

I solved the following exercise, but I'm not a fan of the solution:
Write the function isPerfectSquare using recursion, to tell if an
Int is a perfect square
isPerfectSquare 1 -> Should return True
isPerfectSquare 3 -> Should return False
The num+1 part is there for the cases isPerfectSquare 0 and isPerfectSquare 1; it's one of the parts I don't like one bit. This is my solution:
perfectSquare 0 1 = [0] ++ perfectSquare 1 3
perfectSquare current diff = [current] ++ perfectSquare (current + diff) (diff + 2)
isPerfectSquare num = any (==num) (take (num+1) (perfectSquare 0 1))
What is a more elegant solution to this problem? Of course we can't use sqrt or floating-point operations.
@luqui you mean like this?
pow n = n*n

perfectSquare pRoot pSquare | pow pRoot == pSquare = True
                            | pow pRoot > pSquare  = perfectSquare (pRoot-1) pSquare
                            | otherwise            = False

isPerfectSquare number = perfectSquare number number
I can't believe I didn't see it xD thanks a lot! I must be really tired
You can perform a sort of binary search on the implicit list of squares. There is one prerequisite, though: we first need an upper bound. We can use the number itself, since for every natural number the square is at least as large as the value being squared.
So it could look like:
isPerfectSquare n = search 0 n
  where
    search i k
      | i > k     = False
      | j2 > n    = search i (j-1)
      | j2 < n    = search (j+1) k
      | otherwise = True
      where
        j  = div (i+k) 2
        j2 = j * j
This algorithm verifies whether a number n is a perfect square in O(log n) steps, assuming the integer operations run in constant time (for example, if the number of bits is fixed).
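A quick sanity check in GHCi (my own test, not from the answer):
Prelude> filter isPerfectSquare [0..50]
[0,1,4,9,16,25,36,49]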
Wikipedia suggests using Newton's method. Here's how that would look. We'll start with some boilerplate. ensure is a little combinator I've used fairly frequently. It's written to be very general, but I've included a short comment that should be pretty explanatory for how we'll plan to use it.
import Control.Applicative
import Control.Monad
ensure :: Alternative f => (a -> Bool) -> a -> f a
ensure p x = x <$ guard (p x)
-- ensure p x | p x       = Just x
--            | otherwise = Nothing
Here's the implementation of the formula given by Wikipedia for taking one step in Newton's method. x is our current guess about the square root, and n is the number we're taking the square root of.
stepApprox :: Integer -> Integer -> Integer
stepApprox x n = (x + n `div` x) `div` 2
Now we can call this stepping function recursively until we reach the floor of the square root. Since we're using integer division, the right termination condition is to watch for the next step of the approximation to be equal to, or one greater than, the current step. This is the only recursive function.
iterateStepApprox :: Integer -> Integer -> Integer
iterateStepApprox x n = case x' - x of
    0 -> x
    1 -> x
    _ -> iterateStepApprox x' n
  where x' = stepApprox x n
To wrap the whole development up in a nice API, to check if a number is a square we can just check that the floor of its square root squares to it. We also need to pick a starting approximation, but we don't have to be super smart -- Newton's method converges very quickly for square roots. We'll pick half the number (rounded up) as our approximation. To avoid division by zero and other nonsense, we'll make zero and negative numbers special cases.
isqrt :: Integer -> Maybe Integer
isqrt n | n < 0 = Nothing
isqrt 0 = Just 0
isqrt n = ensure (\x -> x*x == n) (iterateStepApprox ((n+1)`div`2) n)
Now we're done! It's pretty fast even for large numbers:
> :set +s
> isqrt (10^10000) == Just (10^5000)
True
(0.58 secs, 182,610,408 bytes)
Yours would spend rather longer than the universe has left computing that. It is also marginally faster than the binary search algorithm in my tests. (Of course, not hand-rolling it yourself is several orders of magnitude faster still, probably in part because it uses a better, but more complicated, algorithm based on Karatsuba multiplication.)
If the function is recursive then it is most likely primitive recursive, as are 90% of all recursive functions. For these, folds are fast and effective. The programmer's time matters too, so keeping things simple and correct is important.
Now, that said, it might be fruitful to consider the textual output of functions like sqrt. sqrt returns a floating-point number, and if a number is a perfect square, its shown form ends in ".0". Reversing the string puts "0." at the front, where it is easy to check.
This function takes a number and returns a Bool:
fps n = (take 2 . reverse . show $ n / sqrt n) == "0."
fps 10000.00001
False
fps 10000
True

Unresolved top level overloading

The task is to find all two-digit numbers representable as the sum of the square roots of two natural numbers.
I try this:
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod` 1 == 0, sqrt (y) `mod` 1 == 0]
Result:
Unresolved top-level overloading Binding : func
Outstanding context : (Integral b, Floating b)
How can I fix this?
This happens because of a conflict between these two types:
sqrt :: Floating a => a -> a
mod :: Integral a => a -> a -> a
Because you write mod (sqrt x) 1, and sqrt is constrained to return the same type as it takes, the compiler is left trying to find a type for x that simultaneously satisfies the Floating constraint of sqrt and the Integral constraint of mod. There are no types in the base library that satisfy both constraints.
A quick fix is to use mod' :: Real a => a -> a -> a:
import Data.Fixed
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod'` 1 == 0, sqrt (y) `mod'` 1 == 0]
However, from the error you posted, it looks like you may not be using GHC, and mod' is probably a GHC-ism. In that case you could copy the definition (and the definition of the helper function div') from here.
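For reference, the definitions in Data.Fixed are short enough to copy; paraphrasing the base library source:
div' :: (Real a, Integral b) => a -> a -> b
div' n d = floor (toRational n / toRational d)

mod' :: Real a => a -> a -> a
mod' n d = n - fromInteger (div' n d) * d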
But I recommend a more involved fix. The key observation is that if x = sqrt y, then x*x = y, so we can avoid calling sqrt at all. Instead of iterating over numbers and checking if they have a clean sqrt, we can iterate over square roots; their squares will definitely have clean square roots. A straightforward application of this refactoring might look like this:
sqrts = takeWhile (\n -> n*n <= 99)
      . dropWhile (\n -> n*n < 10)
      $ [0..]
func = [x + y | x <- sqrts, y <- sqrts]
Of course, func is a terrible name (it's not even a function!), and sqrts is a constant we could compute ourselves, and is so short we should probably just inline it. So we might then simplify to:
numberSums = [x + y | x <- [4..9], y <- [4..9]]
At this point, I would be wondering whether I really wanted to write this at all, preferring just
numberSums = [8..18]
which, unlike the previous iteration, doesn't have any duplicates. It has lost all of the explanatory power of why this is an interesting constant, though, so you would definitely want a comment.
-- sums of pairs of numbers, each of whose squares lies in the range [10..99]
numberSums = [8..18]
This would be my final version.
Also, although the definitions above were not parameterized by the range in which to search for perfect squares, all the proposed refactorings can still be applied when that range is a parameter; checking that you have understood each change makes a good exercise.
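As a starting point for that exercise, the first refactoring might be parameterized like this (the name sqrtSums is my own); carrying the later simplifications through is left to the reader:
-- sums of pairs of integer square roots whose squares lie in [lo .. hi]
sqrtSums :: Integer -> Integer -> [Integer]
sqrtSums lo hi = [x + y | x <- roots, y <- roots]
  where roots = takeWhile (\n -> n*n <= hi)
              . dropWhile (\n -> n*n < lo)
              $ [0..]
-- sqrtSums 10 99 yields the same sums as func above (duplicates included)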

Computing the probability of an offspring having at least one dominant allele

I am trying to solve the 'Mendel's First Law' problem on http://rosalind.info/
I have tried several different approaches, but I just can't get my solution to return the same answer as the sample problem on their page. I know their sample output is correct though.
Here is what I have:
traitProb :: Int -> Int -> Int -> Double
traitProb k m n = getProb list
  where
    list = cartProd genotypes genotypes
    genotypes = replicate k Dominant ++ replicate m Heterozygous ++ replicate n Recessive
    getProb = sum . map (flip (/) total . getMultiplier)
    total = fromIntegral $ length list
    getMultiplier (Dominant, Dominant) = 1.0
    getMultiplier (Recessive, Dominant) = 1.0
    getMultiplier (Dominant, Recessive) = 1.0
    getMultiplier (Dominant, Heterozygous) = 1.0
    getMultiplier (Heterozygous, Dominant) = 1.0
    getMultiplier (Heterozygous, Heterozygous) = 0.75
    getMultiplier (Heterozygous, Recessive) = 0.5
    getMultiplier (Recessive, Heterozygous) = 0.5
    getMultiplier (Recessive, Recessive) = 0.0
I am not sure whether the code is wrong, or my method of computing the probability is wrong. Essentially the idea is to get a list of all possible parents, and then based on whether they are Homozygous Dominant, Recessive or Heterozygous, compute the probability of each pair of parents producing a child with at least one dominant allele. Then divide each result by the total number of pairs of parents. After that I just sum the list. But my answer is wrong by a little bit.
Can anyone point me in the right direction?
EDIT: cartProd is the 'cartesian product' of the two lists passed to it, if you will.
cartProd :: [a] -> [a] -> [(a, a)]
cartProd xs ys = [ (x, y) | x <- xs, y <- ys ]
I suggest making a slight change in your thinking by doing the calculation in three steps:
What is the probability of getting genotype X for the first parent? (Also, how many different choices are there for X?)
What is the probability of getting genotype Y for the second parent?
Given the genotypes X and Y of the parents, what is the probability of a child displaying the dominant genotype?
For each (X, Y) pair, multiply the probabilities from steps 1-3; then sum over all pairs.
When I drew the tree diagram by hand, I found it easier to calculate the probability of a child NOT having the dominant allele. There are fewer choices to sum and then you can subtract this sum from 1.
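To make the complement idea concrete, here is a minimal sketch (my own code, not the poster's, assuming the Rosalind convention that k, m and n count homozygous dominant, heterozygous and homozygous recessive individuals):
traitProb :: Int -> Int -> Int -> Double
traitProb k m n = 1 - pRecessiveChild
  where
    m' = fromIntegral m
    n' = fromIntegral n
    t  = fromIntegral (k + m + n)
    pairs = t * (t - 1)                  -- ordered pairs of distinct parents
    pRecessiveChild =
        (m' * (m' - 1) / pairs) * 0.25   -- Aa x Aa: child is aa with prob 1/4
      + (2 * m' * n' / pairs) * 0.5      -- Aa x aa (either order): prob 1/2
      + (n' * (n' - 1) / pairs) * 1.0    -- aa x aa: child is always aa
-- traitProb 2 2 2 evaluates to 0.7833333..., matching the Rosalind sample.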
