Averaging ages across a list with foldl (Haskell)

I am doing some arbitrary operations in Haskell as I learn, and have been playing with a list of animals with certain properties, including age.
This is my script:
module Animals where
data Animal = CatThing String Int
            | DogThing String Int
            deriving Show
animalList :: [Animal]
animalList = [CatThing "Spot" 2, DogThing "Rex" 5]
-- write a function that returns the string component given an animal
getName :: Animal -> String
getName (CatThing name _) = name
getName (DogThing name _) = name
-- get the age of an animal (uses "map")
getAge :: Animal -> Int
getAge (CatThing _ age) = age
getAge (DogThing _ age) = age
-- sum age
sumAge :: Int -> [Int] -> Int
sumAge _ [b, c] = foldl (+) 0 [b, c]
-- average age
???
I am stuck on how to sum using foldl'. I know there is a sum function built in, but I am really trying to practice folds, so am trying to do it that way.
Does anyone have suggestions on how to proceed?

The code for your sum looks fine. I'd use foldl' (from Data.List) instead of foldl so you don't risk a stack overflow, and I'd replace the [b, c] pattern, which only matches lists of exactly two elements, with a generic variable or, even better, write it point-free so it is both cleaner and more general:
import Data.List (foldl')
sumAge :: [Double] -> Double
sumAge = foldl' (+) 0
As for the average, you just sum and divide by the length (converted with fromIntegral, since length returns an Int):
averageAge :: [Double] -> Double
averageAge ls = sumAge ls / fromIntegral (length ls)
PS. If your ages are Ints, sumAge still works once its signature is changed to [Int] -> Int, but the average needs to convert both operands:
averageInt :: [Int] -> Double
averageInt ls = fromIntegral (sum ls) / fromIntegral (length ls)
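Putting the pieces together for the Animal type from the question, here is a minimal sketch that folds over the animals directly and computes the average in a single pass. The pair accumulator and the Maybe result for the empty list are assumptions of this sketch, not something the question asked for:

```haskell
import Data.List (foldl')

data Animal = CatThing String Int
            | DogThing String Int
            deriving Show

getAge :: Animal -> Int
getAge (CatThing _ age) = age
getAge (DogThing _ age) = age

-- Strict left fold over the animals, adding up their ages.
sumAges :: [Animal] -> Int
sumAges = foldl' (\acc animal -> acc + getAge animal) 0

-- One pass over the list: accumulate (running total, count), then divide.
-- Returns Nothing for an empty list to avoid dividing by zero.
averageAge :: [Animal] -> Maybe Double
averageAge animals
  | count == 0 = Nothing
  | otherwise  = Just (fromIntegral total / fromIntegral count)
  where
    (total, count) = foldl' step (0 :: Int, 0 :: Int) animals
    -- Force both components so the pair does not accumulate thunks.
    step (t, c) animal =
      let t' = t + getAge animal
          c' = c + 1
      in t' `seq` c' `seq` (t', c')
```

With the list from the question, averageAge [CatThing "Spot" 2, DogThing "Rex" 5] evaluates to Just 3.5.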

TL;DR version
Sum: sumAges animals = foldl (\age animal -> age + (getAge animal)) 0 animals
Average:
import Data.Sequence(foldlWithIndex, fromList)
average numbers = foldlWithIndex (\a i x -> let k = fromIntegral i in (k*a + x) / (k + 1)) 0 . fromList $ numbers
Long version
If you have an interest in math, it may help to understand the design of fold functions as equivalent to discovering sequence formulas by induction.
Sum
For sum, since you have s[i+1] = s[i] + x[i+1], you can simply use addition like you did, although you may have to convert before you add:
sumAges :: [Animal] -> Int
sumAges animals = foldl (\age animal -> age + (getAge animal)) 0 animals
sumAgesPointFree :: [Animal] -> Int
sumAgesPointFree = foldl (flip $ (+) . getAge) 0
Average
For example, one way to calculate the average of a list using a single fold function is to use a recursive mathematical version of calculating the rolling average of a sequence: m[i+1] = (i * m[i] + x[i+1]) / (i + 1). You can see this in how you calculate the average of lists of varying sizes:
{-
Not Haskell, just attempting mathematical notation without knowing MathML in Markdown.
m: mean or average
x: an element of a list or sequence
[]: subscript
-}
m[1] = x[1]
m[2] = (x[1] + x[2]) / 2 = (m[1] + x[2]) / 2 -- m[1] = x[1], so substitute
m[3] = (x[1] + x[2] + x[3]) / 3 -- (a+b)/n = a/n + b/n, so distribute
= (x[1] + x[2]) / 3 + x[3] / 3 -- a = n/n * a, n not in {0, Infinity}
= 2/2 * (x[1] + x[2]) / 3 + x[3] / 3 -- n/n * 1/a = n/a * 1/n
= 2/3 * (x[1] + x[2]) / 2 + x[3] / 3 -- m[2] = (x[1] + x[2])/2, so substitute
= 2/3 * m[2] + x[3] / 3
= 2*m[2] / 3 + x[3] / 3
= (2*m[2] + x[3]) / 3
...
m[i+1] = (i * m[i] + x[i+1]) / (i+1)
However, since this function would require the element index as a parameter, due to the List structure's lack of (convenient) indexing, the Sequence type from the Data.Sequence module may work out better than a List, especially considering the Data.Sequence module has this really nice foldlWithIndex function:
module Average(average) where
import Data.Sequence(foldlWithIndex, fromList)
average :: Fractional a => [a] -> a
average = foldlWithIndex averageByPrevious 0 . fromList
  where
    averageByPrevious previous index current = (coefficient*previous + current) / (coefficient + 1)
      where coefficient = fromIntegral index
Then you can simply run average list where list is some list you want to find the rolling average of. This is one way to calculate the average of a list using a single fold without adding a large performance overhead as you would by running multiple O(n) functions over the same list, even considering laziness as a benefit to the performance of multiple calls.
NOTE: I will admit, this is not easy to read, so average xs = sum xs / fromIntegral (length xs), as @Lorenzo said, will work much better if legibility is more important than performance here.
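The same recurrence, m[i+1] = (i * m[i] + x[i+1]) / (i + 1), can also be folded over a plain list by pairing each element with its index first. This is only a sketch: the name runningMean and the choice to return 0 for an empty list are assumptions made for illustration.

```haskell
import Data.List (foldl')

-- Running mean via m[i+1] = (i * m[i] + x[i+1]) / (i + 1).
-- zip [0 ..] supplies the index i that a bare list fold lacks.
runningMean :: [Double] -> Double
runningMean xs = foldl' step 0 (zip [0 ..] xs)
  where
    -- m is the mean of the first i elements, x is the next element
    step m (i, x) = (i * m + x) / (i + 1)
```

For example, runningMean [1, 2, 3, 4] passes through the intermediate means 1, 1.5 and 2 before arriving at 2.5.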

Related

Summing a finite prefix of an infinite series

The number π can be calculated with the following infinite series sum:
π = Σ_{k=0}^{∞} 2^(k+1) (k!)^2 / (2k+1)!
I want to define a Haskell function roughlyPI that, given a natural number k, calculates the series sum from 0 to the k value.
Example: roughlyPI 1000 (or whatever) => 3.1415926535897922
What I did was this (in VS Code):
roughlyPI :: Double -> Double
roughlyPI 0 = 2
roughlyPI n = e1/e2 + (roughlyPI (n-1))
  where
    e1 = 2**(n+1)*(factorial n)**2
    e2 = factorial (2*n +1)
    factorial 0 = 1
    factorial n = n * factorial (n-1)
but it doesn't really work....
*Main> roughlyPI 100
NaN
I don't know what's wrong. I'm new to Haskell, by the way.
All I really want is to be able to type in a number that will give me PI at the end. It can't be that hard...
As mentioned in the comments, we need to avoid large divisions and instead intersperse smaller divisions within the factorials. We use Double for representing PI but even Double has its limits. For instance 1 / 0 == Infinity and (1 / 0) / (1 / 0) == Infinity / Infinity == NaN.
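The overflow behaviour described here is easy to reproduce in isolation. The following is a small sketch; the names overflowed, undefinedRatio and factorialD are made up for illustration, and factorialD mirrors the Double-based factorial from the question:

```haskell
-- 2 ** 1024 exceeds the largest finite Double (about 1.8e308),
-- so the result is Infinity.
overflowed :: Double
overflowed = 2 ** 1024

-- Infinity / Infinity has no meaningful value, so it is NaN.
undefinedRatio :: Double
undefinedRatio = overflowed / overflowed

-- Factorial computed in Double arithmetic, as in the question.
factorialD :: Double -> Double
factorialD 0 = 1
factorialD n = n * factorialD (n - 1)
```

Here factorialD 170 is still finite while factorialD 171 is already Infinity, so in the question's code the numerator and denominator both overflow once n is large enough, and their quotient becomes NaN.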
Luckily, we can use algebra to simplify the formula and hopefully delay the blowup of our Doubles. By dividing within our factorial the numbers don't grow too unwieldy too quickly.
This solution will calculate roughlyPI 1000, but it fails on 1023 with NaN because 2 ^ 1024 :: Double == Infinity. Note how each iteration of fac has a division as well as a multiplication to help keep the numbers from blowing up. If you are trying to approximate PI with a computer, I believe there are better algorithms, but I tried to keep it as conceptually close to your attempt as possible.
roughlyPI :: Integer -> Double
roughlyPI 0 = 2
roughlyPI k = e + roughlyPI (k - 1)
  where
    k' = fromIntegral k
    e = 2 ** (k' + 1) * fac k / (2 * k' + 1)
      where
        fac 1 = 1 / (k' + 1)
        fac p = (fromIntegral p / (k' + fromIntegral p)) * fac (p - 1)
We can do better than having a blowup of Double after 1000 by doing computations with Rationals then converting to Double with realToFrac (credit to @leftaroundabout):
import Data.Ratio ((%))
roughlyPI' :: Integer -> Double
roughlyPI' = realToFrac . go
  where
    go 0 = 2
    go k = e + go (k - 1)
      where
        e = 2 ^ (k + 1) * fac k / (2 * fromIntegral k + 1)
          where
            fac 1 = 1 % (k + 1)
            fac p = (p % (k + p)) * fac (p - 1)
For further reference see Wikipedia page on approximations of PI
P.S. Sorry for the bulky equations; Stack Overflow does not support LaTeX.
First note that your code actually works:
*Main> roughlyPI 91
3.1415926535897922
The problem, as was already said, is that when you try to make the approximation better, the factorial terms become too big to be representable in double-precision floats. The simplest – albeit somewhat brute-force – way to fix that is to do all the computation in rational arithmetic instead. Because numerical operations in Haskell are polymorphic, this works with almost the same code as you have, only the ** operator can't be used since that allows fractional exponents (which are in general irrational). Instead, you should use integer exponents, which is anyway the conceptually right thing. That requires a few fromIntegral:
roughlyPI :: Integer -> Rational
roughlyPI 0 = 2
roughlyPI n = e1/e2 + (roughlyPI (n-1))
  where
    e1 = 2^(n+1)*fromIntegral (factorial n^2)
    e2 = fromIntegral . factorial $ 2*n + 1
    factorial 0 = 1
    factorial n = n * factorial (n-1)
This now works also for much higher degrees of approximation, although it takes a long time to carry around the giant fractions involved:
*Main> realToFrac $ roughlyPI 1000
3.141592653589793
The way to go in such cases is to calculate the ratio of consecutive terms and calculate the terms by rolling multiplications of the ratios:
-- 1. -------------
pi1 n = Sum { k = 0 .. n } T(k)
where
T(k) = 2^(k+1)(k!)^2 / (2k+1)!
-- 2. -------------
ts2 = [ 2^(k+1)*(k!)^2 / (2k+1)! | k <- [0..] ]
pis2 = scanl1 (+) ts2
pi2 n = pis2 !! n
-- 3. -------------
T(k) = 2^(k+1)(k!)^2 / (2k+1)!
T(k+1) = 2^(k+2)((k+1)!)^2 / (2(k+1)+1)!
= T(k) 2 (k+1)^2 / (2k+2) (2k+3)
= T(k) (k+1)^2 / ( k+1) (2k+3)
= T(k) (k+1) / (k+1 + k+2)
= T(k) / (1 + (k+2)/(k+1))
= T(k) / (2 + 1 /(k+1))
-- 4. -------------
ts4 = scanl (/) 2 [ 2 + 1/(k+1) | k <- [0..]] :: [Double]
pis4 = scanl1 (+) ts4
pi4 n = pis4 !! n
This way we share and reuse the calculations as much as possible. This leads to the most efficient code, hopefully leading to the smallest cumulative numerical error. The formula also turned out to be exceptionally simple, and could even be simplified further as ts5 = scanl (/) 2 [ 2 + recip k | k <- [1..]].
Trying it out:
> pis2 = scanl1 (+) $ [ fromIntegral (2^(k+1))*fromIntegral (product[1..k])^2 /
fromIntegral (product[1..(2*k+1)]) | k <- [0..] ] :: [Double]
> take 8 $ drop 30 pis2
[3.1415926533011587,3.141592653447635,3.141592653519746,3.1415926535552634,
3.141592653572765,3.1415926535813923,3.141592653585647,3.141592653587746]
> take 8 $ drop 90 pis2
[3.1415926535897922,3.1415926535897922,NaN,NaN,NaN,NaN,NaN,NaN]
> take 8 $ drop 30 pis4
[3.1415926533011587,3.141592653447635,3.141592653519746,3.1415926535552634,
3.141592653572765,3.1415926535813923,3.141592653585647,3.141592653587746]
> take 8 $ drop 90 pis4
[3.1415926535897922,3.1415926535897922,3.1415926535897922,3.1415926535897922,
3.1415926535897922,3.1415926535897922,3.1415926535897922,3.1415926535897922]
> pis4 !! 1000
3.1415926535897922
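For reference, the ratio-based approach fits in a few self-contained lines. This is a sketch using the simplified ts5 form of the term list; the names terms and partialSums are illustrative, and roughlyPI takes an Int index here because (!!) requires one:

```haskell
-- Terms T(k) of the series: T(0) = 2, and each following term is the
-- previous one divided by 2 + 1/k for k = 1, 2, ...
terms :: [Double]
terms = scanl (/) 2 [2 + recip k | k <- [1 ..]]

-- Partial sums; element n is T(0) + T(1) + ... + T(n).
partialSums :: [Double]
partialSums = scanl1 (+) terms

-- Sum of the series from 0 to n.
roughlyPI :: Int -> Double
roughlyPI n = partialSums !! n
```

Because no factorial or power of two is ever computed on its own, the intermediate values shrink steadily instead of overflowing.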

Solve the equation a * b = c, where a, b and c are natural numbers

I have some natural number c. I want to find all pairs of natural numbers a and b, where a < b, such that a * b = c.
I have a solution:
solve c = do solveHelper [1..c] c where
  solveHelper xs c = do
    x <- xs
    (division, modulo ) <- return (c `divMod` x)
    True <- return (modulo == 0)
    True <- return (x <= division)
    return (x, division)
Example:
*Main> solve 10
[(1,10),(2,5)]
Is there a way to accelerate my code, or a better algorithm I should use?
You can do much, much better. The basic idea is this: first, factorize the number; then enumerate the partitions of the factorization. The product of each partition is a solution. There are fast factorization algorithms out there, but even the naive one is quite an improvement on your code; so:
factorize :: Integer -> [Integer]
factorize n
  | n < 1 = error "no. =("
  | otherwise = go 2 n
  where
    go p n | p * p > n = [n]
    go p n = case quotRem n p of
      (q, 0) -> p:go p q
      _ -> go (p+1) n
I will use the very nice multiset-comb package to compute partitions of the set of factors. It doesn't support the usual Foldable/Traversable stuff out of the box, so we have to roll our own product operation -- but in fact this can be a bit more efficient than using the product that the standard interface would give us anyway.
import Math.Combinatorics.Multiset
productMS :: Multiset Integer -> Integer
productMS (MS cs) = product [n^p | (n, p) <- cs]
divisors :: Integer -> [(Integer, Integer)]
divisors n =
  [ (a, b)
  | (aMS, bMS) <- splits (fromList (factorize n))
  , let a = productMS aMS; b = productMS bMS
  , a <= b
  ]
For unfair timings, we can compare in ghci:
*Main> :set +s
*Main> length $ solve (product [1..10])
135
(3.55 secs, 2,884,836,952 bytes)
*Main> length $ divisors (product [1..10])
135
(0.00 secs, 4,612,104 bytes)
*Main> length $ solve (product [1..15])
^CInterrupted. [after several minutes, I gave up]
*Main> length $ divisors (product [1..15])
2016
(0.03 secs, 33,823,168 bytes)
Here solve is your solution, divisors is mine. For a fair comparison, we should compile; I used this program:
main = print . last . solve . product $ [1..11]
(And similar with divisors in place of solve.) I compiled with -O2; yours used 1.367s total, mine 0.002s total.
There's one optimization you don't use: you don't have to try every value from 1 to c.
Since a < b and a * b = c, we have a * a < c, meaning you only have to try numbers from 1 up to sqrt c. Or, if you don't want to compute the square root of c, you can stop as soon as a * a >= c.
To do so, you can replace [1..c] by (takeWhile (\x -> x * x < c) [1..]).
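As a sketch of that change, here is the search rewritten as a list comprehension over the shortened candidate range. Note one assumption: it follows the strict a < b from the question text, so a pair with a == b (for a perfect square c) is not returned, unlike the original code, which tests x <= division.

```haskell
-- All pairs (a, b) with a < b and a * b == c, trying only candidates
-- a with a * a < c instead of every number up to c.
solve :: Integer -> [(Integer, Integer)]
solve c =
  [ (a, b)
  | a <- takeWhile (\x -> x * x < c) [1 ..]
  , let (b, r) = c `divMod` a
  , r == 0
  ]
```

For c = 10 this tries only the candidates 1, 2 and 3 and still returns [(1,10),(2,5)].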

Numerical issue with `foldl` and `foldr` in Haskell

I have the following Haskell script, which computes the function f(x) = (2 - 2^3/12) - (x - x^3/12):
calc x = (x - (x ^ 3) / 12)
calc2 x = (calc 2) - (calc x)
calcList1 :: [Float] -> Float
calcList1 l = foldl (+) 0.0 (map calc2 l)
calcList2 :: [Float] -> Float
calcList2 l = foldr (+) 0.0 (map calc2 l)
test1 :: Float -> Float
test1 step = (calcList1 l) - (calcList2 l)
  where
    l = [0.0,step..2.0]
The functions calcList1 and calcList2 apply calc2 to each element of the list and then use foldl and foldr, respectively, to sum the results. I was expecting both functions to return the same answer, but they do not.
*Main> test1 0.1
9.536743e-7
*Main> test1 0.01
2.2888184e-5
*Main> test1 0.001
2.4414063e-4
*Main> test1 0.0001
-3.7109375e-2
*Main>
Now I am confused. I can't see why numerical issues have to be involved here. A fold only determines how the elements are collected, and the result should be the same in both cases, right?
In general, the order in which floating-point values are added is important. An entry point for your own research could be http://en.wikipedia.org/wiki/Loss_of_significance . To summarize the basic caveat, in an oversimplified form:
Due to the limited number of significant bits, you have to assume something like
100000000000000000.0 + 1.0 = 100000000000000000.0
in floating-point computations. Consequently, when computing
100000000000000000.0
+ 1.0
- 100000000000000000.0
the result will be 0.0 - and thus, be different from
100000000000000000.0
- 100000000000000000.0
+ 1.0
where the result will be 1.0.
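The same effect shows up directly with foldl and foldr, because they group the additions differently. This is a minimal sketch with made-up names (values, leftSum, rightSum), using Double rather than the Float from the question:

```haskell
values :: [Double]
values = [1e17, -1e17, 1]

-- foldl groups as ((0 + 1e17) + (-1e17)) + 1:
-- the large values cancel first, so the 1 survives.
leftSum :: Double
leftSum = foldl (+) 0 values

-- foldr groups as 1e17 + ((-1e17) + (1 + 0)):
-- the 1 is absorbed by -1e17 before the cancellation happens.
rightSum :: Double
rightSum = foldr (+) 0 values
```

Here leftSum is 1.0 while rightSum is 0.0, even though both folds add exactly the same three numbers.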

Haskell reverse Integer with recursion

I want to reverse an Integer in Haskell with recursion. I have a small issue.
Here is the code :
reverseInt :: Integer -> Integer
reverseInt n
  | n>0 = (mod n 10)*10 + reverseInt(div n 10)
  | otherwise = 0
Example 345
I use as input 345 and I want to output 543
In my program it will do....
reverseInt 345
345>0
mod 345 10 -> 5
reverseInt 34
34
34>0
mod 34 10 -> 4
reverseInt 3
3>0
mod 3 10 -> 3
reverseInt 0
0=0 (ends)
And at the end it returns the sum of them... 5+4+3 = 12.
So, each time before it adds the next digit, I want it to multiply the sum so far by 10. So it will go...
5
5*10 + 4
54*10 + 3
543
Here's a relatively simple one:
reverseInt :: Int -> Int
reverseInt 0 = 0
reverseInt n = firstDigit + 10 * (reverseInt $ n - firstDigit * 10^place)
  where
    n' = fromIntegral n
    place = (floor . logBase 10) n'
    firstDigit = n `div` 10^place
Basically,
You take the logBase 10 of your input integer, to give you in what place it is (10s, 100s, 1000s...)
Because the previous calculation gives you a floating point number, of which we do not need the decimals, we use the floor function to truncate everything after the decimal.
We determine the first digit of the number by doing n `div` 10^place. For example, if we had 543, we'd find place to be 2, so firstDigit = 543 `div` 100 = 5 (integer division).
We use this value, and add it to 10 * the reverse of the 'rest' of the integer, in this case, 43.
Edit: Perhaps an even more concise and understandable version might be:
reverseInt :: Int -> Int
reverseInt 0 = 0
reverseInt n = mod n 10 * 10^place + reverseInt (div n 10)
  where
    n' = fromIntegral n
    place = (floor . logBase 10) n'
This time, instead of recursing through the first digit, we're recursing through the last one and using place to give it the right number of zeroes.
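The step-by-step accumulation sketched in the question (5, then 5*10 + 4, then 54*10 + 3) can also be written with an explicit accumulator, which avoids logBase and floating point altogether. This is a sketch; the helper name go and the decision to return the accumulator for non-positive input are assumptions:

```haskell
-- Reverse the decimal digits of a non-negative integer.
reverseInt :: Integer -> Integer
reverseInt = go 0
  where
    -- acc holds the digits reversed so far; n holds the digits still to do
    go acc n
      | n <= 0    = acc
      | otherwise = go (acc * 10 + n `mod` 10) (n `div` 10)
```

For 345 the accumulator takes the values 0, 5, 54 and 543. Trailing zeros of the input are dropped, so reverseInt 1000 is 1.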
reverseInt :: Integer -> Integer
reverseInt n = snd $ rev n
  where
    rev x
      | x>0 = let (a,b) = rev(div x 10)
              in ((a*10), (mod x 10)*a + b)
      | otherwise = (1,0)
Explanation left to reader :)
I don't know a convenient way to find how many times you should multiply (mod n 10) by 10 in your 3rd line. I like the solution with unfoldr more:
import Data.List
listify = unfoldr (\ x -> case x of
                     _ | x <= 0 -> Nothing
                     _ -> Just(mod x 10, div x 10) )
reverse_n n = foldl (\ acc x -> acc*10+x) 0 (listify n)
In the listify function we generate the list of digits of the integer in reverse order, and after that we build the result by simply folding over the list.
Or just convert it to a string, reverse it and convert it back to an integer:
reverseInt :: Integer -> Integer
reverseInt = read . reverse . show
More (not necessarily recursion based) answers for great good!
reverseInt 0 = 0
reverseInt x = foldl (\x y -> 10*x + y) 0 $ numToList x
  where
    numToList x = if x == 0 then [] else (x `rem` 10) : numToList (x `div` 10)
This is basically the composition of two functions: numToList (converts a given integer to a list of its digits, least significant first: 123 -> [3,2,1]) and listToNum (does the opposite).
The numToList function works by repeatedly getting the lowest unit of the number (using rem, Haskell's remainder function), and then chops it off (using div, Haskell's integer division function). Once the number is 0, the empty list is returned and the result concatenates into the final list. Keep in mind that this list is in reverse order!
The listToNum function (not seen) is quite a sexy piece of code:
foldl (\x y -> 10*x + y) 0 xs
This starts from the left and moves to the right, multiplying the current value at each step by 10 and then adding the next number to it.
I know the answer has already been given, but it's always nice to see alternative solutions :)
The first function is recursive to convert the integer to a list. It was originally reversing but the re-conversion function reversed easier so I took it out of the first. The functions can be run separately. The first outputs a tuple pair. The second takes a tuple pair. The second is not recursive nor did it need to be.
di 0 ls = (ls,sum ls); di n ls = di nn $ d:ls where (nn,d) = divMod n 10
di 3456789 []
([3,4,5,6,7,8,9],42)
rec (ls,n) = (sum [y*(10^x)|(x,y) <- zip [0..] ls ],n)
Run both as
rec $ di 3456789 []
(9876543,42)

Fibonacci's Closed-form expression in Haskell

What would Fibonacci's closed-form expression look like in Haskell?
Here's a straightforward translation of the formula to Haskell:
fib n = round $ (phi^n - (1 - phi)^n) / sqrt 5
  where phi = (1 + sqrt 5) / 2
This gives correct values only up to n = 75, because it uses Double precision floating-point arithmetic.
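To see where the floating-point version stops agreeing with the exact sequence, it can be compared against a list of exact Fibonacci numbers. This is only a sketch: fibDouble restates the formula above with an explicit Double annotation, and fibs and firstMismatch are names introduced for illustration. The exact index found may depend on how the platform rounds intermediate results, so no value is quoted here.

```haskell
-- Closed form evaluated in Double arithmetic, rounded to an Integer.
fibDouble :: Int -> Integer
fibDouble n = round ((phi ^ n - (1 - phi) ^ n) / sqrt 5)
  where
    phi = (1 + sqrt 5) / 2 :: Double

-- Exact Fibonacci numbers, computed with Integer additions only.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Smallest n for which the closed form and the exact value differ.
firstMismatch :: Int
firstMismatch = length (takeWhile id (zipWith (==) (map fibDouble [0 ..]) fibs))
```

The search is guaranteed to stop: once a Fibonacci number is odd and larger than 2^53, no Double can represent it exactly.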
However, we can avoid floating-point arithmetic by working with numbers of the form a + b * sqrt 5! Let's make a data type for them:
data Ext = Ext !Integer !Integer
  deriving (Eq, Show)
instance Num Ext where
  fromInteger a = Ext a 0
  negate (Ext a b) = Ext (-a) (-b)
  (Ext a b) + (Ext c d) = Ext (a+c) (b+d)
  (Ext a b) * (Ext c d) = Ext (a*c + 5*b*d) (a*d + b*c) -- easy to work out on paper
  -- remaining instance methods are not needed
We get exponentiation for free since it is implemented in terms of the Num methods. Now, we have to rearrange the formula slightly to use this.
fib n = divide $ twoPhi^n - (2-twoPhi)^n
  where twoPhi = Ext 1 1
        divide (Ext 0 b) = b `div` 2^n -- effectively divides by 2^n * sqrt 5
This gives an exact answer.
Daniel Fischer points out that we can use the formula phi^n = fib(n-1) + fib(n)*phi and work with numbers of the form a + b * phi (i.e. ℤ[φ]). This avoids the clumsy division step, and uses only one exponentiation. This gives a much nicer implementation:
data ZPhi = ZPhi !Integer !Integer
  deriving (Eq, Show)
instance Num ZPhi where
  fromInteger n = ZPhi n 0
  negate (ZPhi a b) = ZPhi (-a) (-b)
  (ZPhi a b) + (ZPhi c d) = ZPhi (a+c) (b+d)
  (ZPhi a b) * (ZPhi c d) = ZPhi (a*c+b*d) (a*d+b*c+b*d)
fib n = let ZPhi _ x = phi^n in x
  where phi = ZPhi 0 1
Trivially, Binet's formula, from the Haskell wiki page is given in Haskell as:
fib n = round $ phi ^ n / sq5
  where
    sq5 = sqrt 5
    phi = (1 + sq5) / 2
Which includes sharing of the result of the square root. For example:
*Main> fib 1000
4346655768693891486263750038675
5014010958388901725051132915256
4761122929200525397202952340604
5745805780073202508613097599871
6977051839168242483814062805283
3118210513272735180508820756626
59534523370463746326528
For arbitrary integers, you'll need to be a bit more careful about the conversion to floating point values.
Note that Binet's value differs from the recursive formula by quite a bit at this point:
*Main> let fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
*Main> fibs !! 1000
4346655768693745643568852767504
0625802564660517371780402481729
0895365554179490518904038798400
7925516929592259308032263477520
9689623239873322471161642996440
9065331879382989696499285160037
04476137795166849228875
You may need more precision :-)
