I have written both Gauss-Seidel and Conjugate Gradient iterative algorithms for solving matrices in Haskell (but this question is about the methods, not so much the language). My understanding was that both of these algorithms should have similar convergence characteristics and that the CG method should be faster in most cases. I have run many tests on symmetric positive definite matrices from http://math.nist.gov/MatrixMarket/ and I can almost never get the CG algorithm to converge, while GS almost always does.
I cannot find any symmetric positive definite matrices with an accompanying right-hand-side vector for testing purposes online, so I have been arbitrarily creating my own RHS (maybe this is part of the problem?). I can get the CG method to converge if I use (transpose A) * A instead of A in Ax = b, which just forces the matrix to be symmetric.
I have included the CG code here. It will obviously not compile as-is; if someone needs it functioning to help, I will post it all. It works correctly for the simple example here (Similar question) that came from (Pseudocode and example).
Is there something I'm missing regarding Conjugate Gradient vs. Gauss-Seidel convergence criteria? Can anyone point me in the right direction to get this working? Thanks.
conjGrad :: (Floating a, Ord a, Show a) => a -> SpMCR a -> SpVCR a -> SpVCR a -> (SpVCR a, Int)
conjGrad tol mA b x0 = loop x0 r0 r0 rs0 1
  where
    r0  = b - (mulMV mA x0)
    rs0 = dot r0 r0
    loop x r p rs i
      | (varLog "residual = " $ sqrt rs') < tol = (x', i)
      | otherwise = loop x' r' p' rs' (i + 1)
      where
        mAp   = mulMV mA p
        alpha = rs / (dot p mAp)
        x'    = x + (alpha .* p)
        r'    = r - (alpha .* mAp)
        rs'   = dot r' r'
        beta  = rs' / rs
        p'    = r' + (beta .* p)

(.*) :: (Num a) => a -> SpVCR a -> SpVCR a
(.*) s v = fmap (s *) v
EDIT: Sure enough, I failed to account for the fact that the MM file format only stores the lower triangle of a symmetric matrix. Thanks. Now the algorithm converges but seems to take more iterations than it should. My understanding was that, in exact arithmetic, CG should always converge in at most as many iterations as the matrix order. Would the fact that we're working with floating point (Double) make such a big difference (1.5 to 2 times the matrix order being the iterations required to reasonably converge)?
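For anyone hitting the same thing, a minimal sketch of what "accounting for the symmetric storage" means, assuming the file is read into plain (row, column, value) triples (the actual reader code isn't shown here):
expandSymmetric :: [(Int, Int, Double)] -> [(Int, Int, Double)]
-- Matrix Market symmetric files store only the lower triangle, so mirror
-- every off-diagonal entry to recover the full matrix.
expandSymmetric entries = entries ++ [ (c, r, v) | (r, c, v) <- entries, r /= c ]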
Follow Up: For anyone who might stumble upon this, it turns out most of my problem was related to the matrices that I was using for the tests. It seems they were rather ill-conditioned for solving using the CG algorithm. Simple preconditioning helped in some cases.
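By "simple preconditioning" I mean something along the lines of the following Jacobi (diagonally) preconditioned variant of the loop above. This is only a sketch: diagonal and invScale are hypothetical helpers (extract the diagonal of the sparse matrix as a vector, and divide a vector elementwise by it) that are not part of the code shown in the question.
pcgJacobi :: (Floating a, Ord a) => a -> SpMCR a -> SpVCR a -> SpVCR a -> (SpVCR a, Int)
pcgJacobi tol mA b x0 = loop x0 r0 z0 (dot r0 z0) 1
  where
    dInv = diagonal mA            -- hypothetical: the diagonal of mA as a vector
    prec r = invScale r dInv      -- hypothetical: elementwise division by the diagonal
    r0 = b - (mulMV mA x0)
    z0 = prec r0
    loop x r p rz i
      | sqrt (dot r' r') < tol = (x', i)
      | otherwise = loop x' r' p' rz' (i + 1)
      where
        mAp   = mulMV mA p
        alpha = rz / (dot p mAp)
        x'    = x + (alpha .* p)
        r'    = r - (alpha .* mAp)
        z'    = prec r'
        rz'   = dot r' z'
        beta  = rz' / rz
        p'    = z' + (beta .* p)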
You can answer your second question by using an exact real arithmetic library such as CReal from here: http://hackage.haskell.org/package/numbers, or by getting rid of your logging (which I think is what introduces the Floating constraint) and just using the rationals from Data.Ratio.
This will of course be terribly slow. But it should let you investigate the impact of floating point approximation on convergence.
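For instance, here is a sketch of the same loop with the logging removed and the convergence test done on the squared residual, so that only Fractional and Ord are needed and you can instantiate it at Rational (this assumes your SpMCR/SpVCR types and mulMV, dot and (.*) are polymorphic enough to work at Rational; tol2 is the tolerance squared):
conjGradExact :: (Fractional a, Ord a) => a -> SpMCR a -> SpVCR a -> SpVCR a -> (SpVCR a, Int)
conjGradExact tol2 mA b x0 = loop x0 r0 r0 (dot r0 r0) 1
  where
    r0 = b - (mulMV mA x0)
    loop x r p rs i
      | rs' <= tol2 = (x', i)   -- compare the squared residual, so no sqrt is needed
      | otherwise = loop x' r' p' rs' (i + 1)
      where
        mAp   = mulMV mA p
        alpha = rs / (dot p mAp)
        x'    = x + (alpha .* p)
        r'    = r - (alpha .* mAp)
        rs'   = dot r' r'
        beta  = rs' / rs
        p'    = r' + (beta .* p)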
I'm trying to understand histomorphisms from this blog on recursion schemes. I'm running into a problem with the example that solves the change-making problem, as described in the blog.
The change-making problem takes the denominations for a currency and tries to find the minimum number of coins required to make a given sum of money. The code below is taken from the blog and should compute the answer.
{-# LANGUAGE DeriveFunctor #-}
module Main where

import Control.Arrow ( (>>>) )
import Data.List ( partition )
import Prelude hiding (lookup)

newtype Term f = In {out :: f (Term f)}

data Attr f a = Attr
  { attribute :: a
  , hole :: f (Attr f a)
  }

type CVAlgebra f a = f (Attr f a) -> a

histo :: Functor f => CVAlgebra f a -> Term f -> a
histo h = out >>> fmap worker >>> h
  where
    worker t = Attr (histo h t) (fmap worker (out t))

type Cent = Int

coins :: [Cent]
coins = [50, 25, 10, 5, 1]

data Nat a
  = Zero
  | Next a
  deriving (Functor)

-- Convert from a natural number to its foldable equivalent, and vice versa.
expand :: Int -> Term Nat
expand 0 = In Zero
expand n = In (Next (expand (n - 1)))

compress :: Nat (Attr Nat a) -> Int
compress Zero = 0
compress (Next (Attr _ x)) = 1 + compress x

change :: Cent -> Int
change amt = histo go (expand amt)
  where
    go :: Nat (Attr Nat Int) -> Int
    go Zero = 1
    go curr@(Next attr) =
      let given = compress curr
          validCoins = filter (<= given) coins
          remaining = map (given -) validCoins
          (zeroes, toProcess) = partition (== 0) remaining
          results = sum (map (lookup attr) toProcess)
       in length zeroes + results

lookup :: Attr Nat a -> Int -> a
lookup cache 0 = attribute cache
lookup cache n = lookup inner (n - 1) where (Next inner) = hole cache
Now if you evaluate change 10 it will give you 3.
Which is... incorrect because you can make 10 using 1 coin of value 10.
So I considered that maybe it's solving the coin change counting problem, which finds the number of ways in which you can make the given sum of money. For example, you can make 10 in 4 ways: {1, 1, ... 10 times}, {1, 1, 1, 1, 1, 5}, {5, 5}, {10}.
So what is wrong with this piece of code? Where is it going wrong in solving the problem?
TLDR
The above piece of code from this blog on recursion schemes is finding neither the minimum number of coins nor the number of ways to change a sum of money. Why is it not working?
I put some more thought into encoding this problem with recursion schemes. Maybe there's a good way to solve the unordered problem (i.e., treating 5c + 1c as the same as 1c + 5c) using a histomorphism to cache the recursive calls, but I don't know what it is. Instead, I looked for a way to use recursion schemes to implement the dynamic-programming algorithm, where the search tree is probed in a specific order so that you're sure you never visit any node more than once.
The tool that I used is the hylomorphism, which comes up a bit later in the article series you're reading. It composes an unfold (anamorphism) with a fold (catamorphism). A hylomorphism uses ana to build up an intermediate structure, and then cata to tear it down into a final result. In this case, the intermediate structure I used describes a subproblem. It has two constructors: either the subproblem is solved already, or there is some amount of money left to make change for, and a pool of coin denominations to use:
data ChangePuzzle a = Solved Int
                    | Pending {spend, forget :: a}
                    deriving Functor

type Cent = Int
type ChangePuzzleArgs = ([Cent], Cent)
We need a coalgebra that turns a single problem into subproblems:
divide :: Coalgebra ChangePuzzle ChangePuzzleArgs
divide (_, 0) = Solved 1
divide ([], _) = Solved 0
divide (coins@(x:xs), n) | n < 0 = Solved 0
                         | otherwise = Pending (coins, n - x) (xs, n)
I hope the first three cases are obvious. The last case is the only one with multiple subproblems. We can either use one coin of the first listed denomination, and continue to make change for that smaller amount, or we can leave the amount the same but reduce the list of coin denominations we're willing to use.
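To see the branching concretely, you can add Show to the deriving clause (deriving (Functor, Show); it's not in the file below) and poke at divide in GHCi, here with a trimmed coin list to keep the output short:
*Main> divide ([10, 5, 1], 6)
Pending {spend = ([10,5,1],-4), forget = ([5,1],6)}
*Main> divide ([10, 5, 1], -4)
Solved 0
*Main> divide ([5, 1], 6)
Pending {spend = ([5,1],1), forget = ([1],6)}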
The algebra for combining subproblem results is much simpler: we simply add them up.
conquer :: Algebra ChangePuzzle Int
conquer (Solved n) = n
conquer (Pending a b) = a + b
I originally tried to write conquer = sum (with the appropriate Foldable instance), but this is incorrect. We're not summing up the a types in the subproblem; rather, all the interesting values are in the Int field of the Solved constructor, and sum doesn't look at those because they're not of type a.
Finally, we let recursion schemes do the actual recursion for us with a simple hylo call:
waysToMakeChange :: ChangePuzzleArgs -> Int
waysToMakeChange = hylo conquer divide
And we can confirm it works in GHCI:
*Main> waysToMakeChange (coins, 10)
4
*Main> waysToMakeChange (coins, 100)
292
Whether you think this is worth the effort is up to you. Recursion schemes have saved us very little work here, as this problem is easy to solve by hand. But you may find reifying the intermediate states makes the recursive structure explicit, instead of implicit in the call graph. Anyway it's an interesting exercise if you want to practice recursion schemes in preparation for more complicated tasks.
The full, working file is included below for convenience.
{-# LANGUAGE DeriveFunctor #-}

import Control.Arrow ( (>>>), (<<<) )

newtype Term f = In {out :: f (Term f)}

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

cata :: (Functor f) => Algebra f a -> Term f -> a
cata fn = out >>> fmap (cata fn) >>> fn

ana :: (Functor f) => Coalgebra f a -> a -> Term f
ana f = In <<< fmap (ana f) <<< f

hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo alg coalg = ana coalg >>> cata alg

data ChangePuzzle a = Solved Int
                    | Pending {spend, forget :: a}
                    deriving Functor

type Cent = Int
type ChangePuzzleArgs = ([Cent], Cent)

coins :: [Cent]
coins = [50, 25, 10, 5, 1]

divide :: Coalgebra ChangePuzzle ChangePuzzleArgs
divide (_, 0) = Solved 1
divide ([], _) = Solved 0
divide (coins@(x:xs), n) | n < 0 = Solved 0
                         | otherwise = Pending (coins, n - x) (xs, n)

conquer :: Algebra ChangePuzzle Int
conquer (Solved n) = n
conquer (Pending a b) = a + b

waysToMakeChange :: ChangePuzzleArgs -> Int
waysToMakeChange = hylo conquer divide
The initial confusion with the blog post was because it was pointing to a different problem in the Wikipedia link.
Taking another look at change, it's trying to find the number of "ordered" ways of making change for a given value. This means that the ordering of coins matters. The correct value of change 10 should be 9.
Coming back to the problem, the main issue is with the implementation of the lookup function. The key point to note is that lookup is backwards, i.e., to calculate the contribution of a denomination to the sum, the denomination itself should be passed as the argument to lookup, and not its difference from the given value.
-- To find the contribution of 5 to the number of ways we can
-- change 15, we should pass the cache of 15 and 5 as the
-- parameters. The cache will then be unrolled 5 times to
-- get the value from the cache of 10.
lookup :: Attr Nat a -- ^ cache
       -> Int        -- ^ how much to roll back
       -> a
lookup cache 1 = attribute cache
lookup cache n = lookup inner (n - 1) where (Next inner) = hole cache
The complete solution is described in this issue by @howsiwei.
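For concreteness, here is a sketch of how go can be adjusted to match the corrected lookup above (the names exact and toLook are just illustrative; the linked issue has the complete version). A coin equal to the remaining amount is counted directly, and every smaller coin c contributes the cached count for given - c:
go :: Nat (Attr Nat Int) -> Int
go Zero = 1
go curr@(Next attr) =
  let given           = compress curr
      validCoins      = filter (<= given) coins
      (exact, toLook) = partition (== given) validCoins
      -- lookup attr c now rolls back c steps, i.e. it reads the
      -- cached count for the amount (given - c)
      results         = sum (map (lookup attr) toLook)
   in length exact + results
With this, change 10 evaluates to 9 as expected.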
Edit: Based on discussion in the comments, this can be solved using histomorphisms, but with a few challenges. The cache and functor types will need to be more complex to hold more state. Namely:
- The cache will need to keep a list of permitted denominations for a particular amount; this will allow us to eliminate overlap.
- The harder challenge is to come up with a functor that can order all the information. Nat will not be sufficient, because it cannot distinguish between different values of a complex cache type.
I see two problems with this program. One of them I know how to fix, but the other apparently requires more knowledge of recursion schemes than I have.
The one I can fix is that it's looking up the wrong values in its cache. When given = 10, of course validCoins = [10,5,1], and so we find (zeroes, toProcess) = ([0], [5,9]). So far so good: we can give a dime directly, or give a nickel and then make change for the remaining five cents, or we can give a penny and change the remaining nine cents. But then when we write lookup attr 9, we're saying "look 9 steps into history to when curr = 1", where what we meant was "look 1 step into history to when curr = 9". As a result we drastically undercount in pretty much all cases: even change 100 is only 16, while a Google search claims the right result is 292 (I haven't verified this today by implementing it myself).
There are a few equivalent ways to fix this; the smallest diff would be to replace
results = sum (map (lookup attr) toProcess)
with
results = sum (map (lookup attr . (given -)) toProcess)
The second problem is: the values in the cache are wrong. As I mentioned in a comment on the question, this counts different orderings of the same denominations as separate answers to the question. After I fix the first problem, the lowest input where this second problem manifests is 7, with the incorrect result change 7 = 3. If you try change 100 I don't know how long it takes to compute: much longer than it should, probably a very long time. But even a modest value like change 30 yields a number that's much larger than it should be.
I don't see a way to fix this without a substantial algorithm rework. Traditional dynamic-programming solutions to this problem involve producing the solutions in a specific order so you can avoid double-counting. i.e., they first decide how many dimes to use (here, 0 or 1), then compute how to make change for the remaining amounts without using any dimes. I don't know how to work that idea in here - your cache key would need to be larger, including both the target amount and also the allowed set of coins.
I have defined a typeclass Differentiable to be implemented by any type which can operate on infinitesimals.
Here is an example:
class Fractional a => Differentiable a where
  dif :: (a -> a) -> (a -> a)
  difs :: (a -> a) -> [a -> a]
  difs = iterate dif

instance Differentiable Double where
  dif f x = (f (x + dx) - f x) / dx
    where dx = 0.000001
func :: Double -> Double
func = exp
I have also defined a simple Double -> Double function to differentiate.
But when I test this in the ghc this happens:
... $ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
Prelude> :l testing
[1 of 1] Compiling Main ( testing.hs, interpreted )
Ok, one module loaded.
*Main> :t func
func :: Double -> Double
*Main> derivatives = difs func
*Main> :t derivatives
derivatives :: [Double -> Double]
*Main> terms = map (\f -> f 0) derivatives
*Main> :t terms
terms :: [Double]
*Main> take 5 terms
[1.0,1.0000004999621837,1.000088900582341,-222.0446049250313,4.440892098500626e8]
*Main>
The approximations to the nth derivative of e^x|x=0 are:
[1.0,1.0000004999621837,1.000088900582341,-222.0446049250313,4.440892098500626e8]
The first and second derivatives are perfectly reasonable approximations given the setup, but suddenly the third derivative of func at 0 is... -222.0446049250313! HOW!!?
The method you're using here is a finite difference method of 1st-order accuracy.
Layman's translation: it works, but is pretty rubbish numerically speaking. Specifically, because it's only 1st-order accurate, you need really small steps to get good accuracy even with exact real arithmetic. You did choose a small step size, so that's fine, but a small step size brings in another problem: rounding errors. You need to take the difference f (x+δx) - f x with small δx, meaning the difference is small whereas the individual values may be large. That always brings up floating-point inaccuracy – consider for example
Prelude> (1 + pi*1e-13) - 1
3.141931159689193e-13
That might not actually hurt that much, but since you then need to divide by δx you boost up the error.
This issue just gets worse/compounded as you go to the higher derivatives, because now each of f' x and f' (x+δx) already has a (non-identical!) boosted error on it, so taking the difference and boosting again is a clear recipe for disaster.
The simplest way to remediate the problem is to switch to a 2nd-order accurate method, the obvious being central difference. Then you can make the step a lot bigger, and thus largely avoid rounding issues:
Prelude> let dif f x = (f (x + δx) - f (x - δx)) / (2*δx) where δx = 1e-3
Prelude> take 8 $ ($0) <$> iterate dif exp
[1.0,1.0000001666666813,1.0000003333454632,1.0000004990740052,0.9999917560676863,0.9957312752106873,8.673617379884035,7806.255641895632]
You see the first couple of derivatives are good now, but then eventually it also becomes unstable – and this will happen with any FD method as you iterate it. But that's anyway not really a good approach: note that every evaluation of the n-th derivative requires 2 evaluations of the n−1-th. So, the complexity is exponential in the derivative degree.
A better approach to approximate the n-th derivative of an opaque function is to fit an n-th order polynomial to it and differentiate this symbolically/automatically. Or, if the function is not opaque, differentiate itself symbolically/automatically.
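For the non-opaque case, here is a quick sketch of the automatic route using the ad package from Hackage (assuming I remember its Numeric.AD interface correctly: diffs computes the whole tower of derivatives, with no step size and hence no cancellation):
import Numeric.AD (diffs)

-- Every derivative of exp at 0 is exactly 1, and forward-mode AD reproduces
-- that, unlike the iterated finite-difference scheme.
main :: IO ()
main = print (take 5 (diffs exp 0 :: [Double]))
-- expected: [1.0,1.0,1.0,1.0,1.0]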
tl;dr: the dx^n denominator gets small exponentially quickly as n grows, which means that even small errors in the numerator get blown out of proportion.
Let's do some equational reasoning on the first "bad" approximation, the third derivative.
dif (dif (dif exp))
= { definition of dif }
dif (dif (\x -> (exp (x+dx) - exp x)/dx))
= { definition of dif }
dif (\y -> ((\x -> (exp (x+dx) - exp x)/dx) (y+dx)
            - (\x -> (exp (x+dx) - exp x)/dx) y
           )/dx)
= { questionable algebra }
dif (\y -> (exp (y + 2*dx) - 2*exp (y + dx) + exp y)/dx^2)
= { alpha }
dif (\x -> (exp (x + 2*dx) - 2*exp (x + dx) + exp x)/dx^2)
= { definition of dif and questionable algebra }
\x -> (exp (x + 3*dx) - 3*exp (x + 2*dx) + 3*exp (x + dx) - exp x)/dx^3
Hopefully by now you can see the pattern we're getting into: as we take more and more derivatives, the error in the numerator gets worse (because we are computing exp farther and farther away from the original point; x + 3*dx is three times as far away, for example) while the sensitivity to error in the denominator gets higher (because we are dividing by dx^n for the nth derivative). By the third derivative, these two factors become untenable:
> exp (3*dx) - 3*exp (2*dx) + 3*exp (dx) - exp 0
-4.440892098500626e-16
> dx^3
9.999999999999999e-19
So you can see that, although the error in the numerator is only about 5e-16, the sensitivity to error in the denominator is so high that you start to see nonsensical answers.
I am trying to develop a custom math library in Haskell that encodes expressions for a project. I'm wondering whether it is possible to determine (with high accuracy) if a mathematical series from n to infinity converges or diverges. I have a sigma-sum function available in my library, which I initially thought I could use to determine whether a series converges by finding the difference between the sigma sum from some n up to, say, 1000 and the sum from the same n up to something much higher, like 100000. However, I realized this wouldn't work, because any constant times a convergent series is also convergent, and that constant winds up increasing the difference of the sums I was using to decide convergence.
This is what I have:
isConverging :: String -> a -> Expr a -> Bool
isConverging x a expr =
  abs ( (eval (Map.fromList [(x, a)]) $ sigma x a 1000 expr)
      - (eval (Map.fromList [(x, a)]) $ sigma x a 100000 expr)
      ) <= 1.0e-3
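To make the failure mode concrete, here is a small standalone sketch that approximates the same test with plain list sums, bypassing the Expr and eval machinery (which isn't shown here):
-- Is the chunk of the series between the two cutoffs smaller than the tolerance?
tailSum :: (Double -> Double) -> Int -> Int -> Double
tailSum f lo hi = sum [f (fromIntegral n) | n <- [lo .. hi]]

naiveTest :: (Double -> Double) -> Bool
naiveTest f = abs (tailSum f 1001 100000) <= 1.0e-3

main :: IO ()
main = do
  print (naiveTest (\n -> 1 / n ^ 3))     -- True: convergent, correctly flagged
  print (naiveTest (\n -> 1000 / n ^ 2))  -- False: convergent, but the constant
                                          -- factor pushes the tail above the tolerance
  print (naiveTest (\n -> 1 / n))         -- False: divergent, correctly flagged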
As a result, I'm wondering if there is a better way to compute if a series likely converges. Any help would be much appreciated.
I'm trying to use Yampa for some basic system simulation like I'd do in Simulink. In this case I want to simulate a spring and damper system, introduced by this simulink tutorial. I've written the following signal functions to represent the system:
system = time >>> force >>> displacement
force = constant (m * g)
displacement = feedback (-) (velocity >>> integral) (gain $ k / m) 0
velocity = feedback (-) integral (gain $ c / m) 0
Where the feedback function creates a basic feedback loop and is implemented like this:
feedback op a b b0 = loopPre b0 inner
  where inner = arr (uncurry op) >>> a >>> (identity &&& b)
Oh, and:
gain x = arr (*x)
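For completeness, the constants and the driver aren't shown above; here is roughly how I run the system, with placeholder parameter values (not the real ones) alongside the definitions above, with FRP.Yampa and Control.Arrow imported:
import FRP.Yampa

m, g, k, c :: Double
m = 10      -- mass (placeholder value)
g = 9.81    -- gravitational acceleration
k = 40      -- spring constant (placeholder value)
c = 5       -- damping coefficient (placeholder value)

-- Sample the displacement every 10 ms for 10 s and print the trace.
main :: IO ()
main = mapM_ print (embed system (deltaEncode 0.01 (replicate 1000 ())))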
With sensible positive constants, I get a wildly unstable system:
Is there something obviously wrong in the way I'm constructing feedback loops or applying the integration?
Change integral to imIntegral 0
displacement = feedback (-) (velocity >>> imIntegral 0) (gain $ k / m) 0
velocity = feedback (-) (imIntegral 0) (gain $ c / m) 0
From spring.hs:
Using Simulink:
Something funny is happening in the integral function, changing to imIntegral 0 gives the same curve as in matlab.
My guess is that integral is delayed by one sample, since it doesn't have a starting value, which changes the behaviour of the loop.
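A quick way to check that guess is to integrate a constant signal with both combinators and compare the traces (a sketch, assuming FRP.Yampa's embed and deltaEncode):
import FRP.Yampa
import Control.Arrow ((>>>))

-- Integrate the constant 1 over a few 0.1 s steps with each variant and
-- print both traces; any one-sample offset shows up immediately.
main :: IO ()
main = do
  print (embed (constant 1 >>> integral)     (deltaEncode 0.1 (replicate 5 ())))
  print (embed (constant 1 >>> imIntegral 0) (deltaEncode 0.1 (replicate 5 ())))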
I'm trying to generate random masses for hypothetical planets in Haskell. I want to produce these masses by sampling a bimodal distribution (ideally the superposition of two normal distributions: one corresponding to small planets and one corresponding to gas giants). I've looked at the statistics package, which provides a quantile function that can turn a uniformly distributed Double into a Double drawn from any of a number of distributions, but there doesn't seem to be any support for composing distributions.
This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later. Eventually I might replace the normal distribution with real data from sky surveys.
I'm considering implementing rejection sampling myself, which can handle arbitrary distributions fairly simply, but it seems rather inefficient, and it certainly wouldn't be a good idea to implement it if a solution exists already as a library.
Is there a Haskell library that supports sampling from composed or explicitly specified distributions? Or an existing Haskell implementation of rejection sampling? Alternatively, is there an explicit formula for the inverse of the CDF of the sum of two normal distributions?
In the case of a simple mixture of distributions, you can get an efficient sampler via the 'hack' you first mentioned:
This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later.
This is actually a case of Gibbs sampling, which is very prevalent in statistics. It's very flexible, and if you know the number of mixtures you're using, it will probably be hard to beat. Choose one individual distribution from the entire ensemble to sample from, and then sample from that conditional distribution. Rinse and repeat.
Here's a simple, unoptimized Haskell implementation for a mixture-of-Gaussians Gibbs sampler. It's pretty basic, but you get the idea:
import System.Random
import Control.Monad.State

type ModeList = [(Double, Double)] -- A list of mean/stdev pairs, for each mode.

-- Generate a Gaussian (0, 1) variate.
boxMuller :: StdGen -> (Double, StdGen)
boxMuller gen = (sqrt (-2 * log u1) * cos (2 * pi * u2), gen'')
  where (u1, gen')  = randomR (0, 1) gen
        (u2, gen'') = randomR (0, 1) gen'

sampler :: ModeList -> State StdGen Double
sampler modeInfo = do
  gen <- get
  let n = length modeInfo
      (z0, g0) = boxMuller gen
      (c, g1) = randomR (0, n - 1) g0 -- Sample from the components.
      (cmu, csig) = modeInfo !! c
  put g1
  return $ cmu + csig * z0 -- Sample from the conditional distribution.
Here's an example run: sampling 100 times from a one-dimensional mixture of two Gaussians. The modes are at x = -3 and x = 2.5, and each mixture component has its own separate variance. You could add as many modes as you want here.
main = do
  let gen = mkStdGen 42
      modeInfo = [(2.5, 1.0), (-3, 1.5)]
      samples = (`evalState` gen) . replicateM 100 $ sampler modeInfo
  print samples
Here's a smoothed density plot of those 100 samples (using R and ggplot2):
A more general purpose algorithm would be a rejection or importance sampler, and in the case of more complicated distributions you're probably going to want to hand-roll an appropriate MCMC routine. Here is a good introduction to Monte Carlo and MCMC.
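For reference, here is what a bare-bones rejection sampler for the same two-Gaussian mixture could look like (only a sketch: the target is truncated to [-10, 10], and 0.5 is just a crude constant upper bound on the density over that interval):
import System.Random

-- Mixture density: equal-weight combination of N(2.5, 1.0) and N(-3, 1.5).
mixtureDensity :: Double -> Double
mixtureDensity x = 0.5 * gauss 2.5 1.0 x + 0.5 * gauss (-3) 1.5 x
  where gauss mu sig t = exp (-0.5 * ((t - mu) / sig) ^ 2) / (sig * sqrt (2 * pi))

-- Propose x uniformly on [-10, 10], propose a height under the constant
-- envelope, and accept if the point falls under the density curve.
rejectionSample :: StdGen -> (Double, StdGen)
rejectionSample gen
  | u * envelope <= mixtureDensity x = (x, gen'')
  | otherwise                        = rejectionSample gen''
  where
    (x, gen')  = randomR (-10, 10) gen
    (u, gen'') = randomR (0, 1) gen'
    envelope   = 0.5

main :: IO ()
main = print (fst (rejectionSample (mkStdGen 42)))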
Hmmmm. The best way I'm familiar with is to adapt the MonadRandom package to get a "probability monad", borrowing some tools from http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution :
getRandomStrictlyBetween :: (Ord a, Random a, MonadRandom m) =>
  (a, a) -> m a
getRandomStrictlyBetween (lo, hi) = do
  x <- getRandomR (lo, hi)
  -- x is uniformly randomly chosen from the *closed* interval
  if lo < x && x < hi then return x else getRandomStrictlyBetween (lo, hi)
normalValue :: MonadRandom m => m Double
normalValue = do
  u <- getRandomStrictlyBetween (0, 1)
  v <- getRandomStrictlyBetween (0, 2 * pi)
  return (sqrt (-2 * log u) * cos v) -- according to Wikipedia
and then you can derive more or less arbitrary distributions; for example, to get the distribution of a random variable that is y with probability p and z with probability (1 - p), you just write
do alpha <- getRandom -- double chosen from [0, 1)
   if alpha < p then y else z
of which bimodal distributions appear to be a special case. To sample from these distributions, just do evalRandIO distribution to sample in the IO monad.
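Putting the pieces together, a bimodal sketch might look like this (it assumes the Control.Monad.Random imports and the normalValue above; the mode parameters and the 50/50 mixing probability are made up, so adjust them to taste):
bimodal :: MonadRandom m => m Double
bimodal = do
  alpha <- getRandom            -- uniform in [0, 1): picks the mode
  x <- normalValue              -- standard normal draw
  return $ if alpha < (0.5 :: Double)
             then 1 + 0.5 * x   -- first mode (made-up mean 1, stdev 0.5)
             else 100 + 20 * x  -- second mode (made-up mean 100, stdev 20)
Sampling is then just evalRandIO bimodal, or evalRand with a fixed generator if you want reproducibility.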