Haskell insights - haskell

I have found myself in dire need of your insights.
Here's my object of interest:
class Mergable m where
    merge :: m -> m -> Maybe m
    mergeList :: [m] -> [m]
    mergeList [] = []
    mergeList [x] = [x]
    mergeList (x:y:t) = r1 ++ mergeList (r2 ++ t)
      where
        (r1,r2) = case (x `merge` y) of
            Just m -> ([ ], [m])
            Nothing -> ([x], [y])
But I'll come back to it later. For now I prepared some examples:
data AffineTransform = Identity
                     | Translation Float Float
                     | Rotation Float
                     | Scaling Float Float
                     | Affine Matrix3x3
instance Monoid AffineTransform where
    mempty = Identity
    Identity `mappend` x = x
    x `mappend` Identity = x
    (Translation dx1 dy1) `mappend` (Translation dx2 dy2) = Translation (dx1+dx2) (dy1+dy2)
    (Rotation theta1) `mappend` (Rotation theta2) = Rotation (theta1+theta2)
    (Scaling sx1 sy1) `mappend` (Scaling sx2 sy2) = Scaling (sx1*sx2) (sy1*sy2)
    -- last resort: compose transforms from different subgroups
    -- using an "expensive" matrix multiplication
    x `mappend` y = Affine (toMatrix x `mult3x3` toMatrix y)
So now I can do:
toMatrix $ Rotation theta1 `mappend` Translation dx1 dy1 `mappend` Translation dx2 dy2 `mappend` Rotation theta2
or more briefly:
(toMatrix . mconcat) [Rotation theta1, Translation dx1 dy1, Translation dx2 dy2, Rotation theta2]
or more generally:
(toMatrix . (fold[r|r'|l|l'] mappend)) [Rotatio...], etc
In the above examples the first rotation and translation are combined (expensively) into a matrix; that matrix is then combined with the next translation (again by multiplication), and a further multiplication produces the final result. Yet, by associativity, the two translations in the middle could be combined cheaply first, for a total of two multiplications instead of three.
Anyhow, along comes my Mergable class to the rescue:
instance Mergable AffineTransform where
    x `merge` Identity = Just x
    Identity `merge` x = Just x
    x@(Translation _ _) `merge` y@(Translation _ _) = Just $ x `mappend` y
    x@(Rotation _) `merge` y@(Rotation _) = Just $ x `mappend` y
    x@(Scaling _ _) `merge` y@(Scaling _ _) = Just $ x `mappend` y
    _ `merge` _ = Nothing
so now (toMatrix . mconcat . mergeList) ~ (toMatrix . mconcat), as it should:
mergeList [Rotation theta1, Translation dx1 dy1, Translation dx2 dy2, Rotation theta2] == [Rotation theta1, Translation (dx1+dx2) (dy1+dy2), Rotation theta2]
Other examples I have in mind are more involved (code-wise) so I will just state the ideas.
Let's say I have some
data Message = ...
and a
dispatch :: [Message] -> IO a
where dispatch takes a message from the list and, depending on its type, opens an appropriate channel (file, stream, etc), writes that message, closes the channel and continues with the next message. So if opening and closing channels is an "expensive" operation, simply composing (dispatch . mergeList) can help improve performance with minimal effort.
Other times I have used it to handle events in GUI applications, like merging mouse moves, key presses, commands in an undo-redo system, etc.
The general pattern is that I take two items from the list and check whether they are "mergeable" in some way. If so, I try to merge the result with the next item in the list; otherwise I leave the first item as it was and continue with the next pair (now that I think of it, it's a bit like generalized run-length encoding).
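The run-length-encoding analogy can be made concrete. The following is only an illustrative sketch: `Run` and `rle` are hypothetical names that appear nowhere else in this question, and `mergeList` is written as a standalone function (with the same pairwise strategy as the class method) to keep the example short.

```haskell
-- Minimal restatement of the class (merge only).
class Mergable m where
  merge :: m -> m -> Maybe m

-- Same pairwise strategy as the class method: on success, retry the merged
-- value against the next item; otherwise emit x and move on to y.
mergeList :: Mergable m => [m] -> [m]
mergeList (x:y:t) = case merge x y of
  Just m  -> mergeList (m : t)
  Nothing -> x : mergeList (y : t)
mergeList xs = xs

-- Hypothetical example type: a character together with a repeat count.
data Run = Run Char Int deriving (Eq, Show)

instance Mergable Run where
  merge (Run c n) (Run d m)
    | c == d    = Just (Run c (n + m))
    | otherwise = Nothing

-- Run-length encoding falls out of mergeList.
rle :: String -> [Run]
rle = mergeList . map (\c -> Run c 1)
```

With these definitions, `rle "aaabcc"` evaluates to `[Run 'a' 3,Run 'b' 1,Run 'c' 2]`.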
My problem is that I can't shake the feeling that I'm reinventing the wheel and there has to be a similar structure in Haskell that I could use. If that's not the case then:
1) How do I generalize it to containers other than lists?
2) Can you spot any other structures Mergable is an instance of? (particularly Arrows if applicable, I have trouble wrapping my head around them)
3) Any insights on how strict/lazy mergeList should be and how to present it to the user?
4) Optimization tips? Stackoverflow? Anything else?
Thanks!

I don't think there is anything like this already in a library. Hoogle and Hayoo don't turn up anything suitable.
Mergeable (I think it's spelt that way) looks like a generalisation of Monoid. Not an Arrow, sorry.
Sometimes you need to merge preserving order. Sometimes you don't need to preserve order when you merge.
I might do something like
newtype MergedInOrder a = MergedInOrder [a] -- without exporting the constructor
mergeInOrder :: Mergeable a => [a] -> MergedInOrder a
mergeInOrder = MergedInOrder . foldr f []
  where f x [] = [x]
        f x xs@(y : ys) = case merge x y of
                            Just z -> z : ys
                            Nothing -> x : xs
and similar newtypes for unordered lists, that take advantage of and do not require an Ord instance, respectively.
These newtypes have obvious Monoid instances.
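One way such an instance might look for the order-preserving case is sketched below. This is an assumption about what "obvious" means here, not code from a library: appending re-runs the merge over the concatenation so that the elements at the seam get a chance to combine. Whether the monoid laws hold depends on how `merge` behaves for the element type. The `Event` type is a hypothetical example, and the `Semigroup` instance is there because recent versions of base require it for `Monoid`.

```haskell
class Mergeable a where
  merge :: a -> a -> Maybe a

newtype MergedInOrder a = MergedInOrder [a] deriving (Eq, Show)

mergeInOrder :: Mergeable a => [a] -> MergedInOrder a
mergeInOrder = MergedInOrder . foldr f []
  where
    f x [] = [x]
    f x xs@(y : ys) = case merge x y of
      Just z  -> z : ys
      Nothing -> x : xs

-- Appending re-merges the concatenation, so the last element of the left
-- list can combine with the first element of the right list.
instance Mergeable a => Semigroup (MergedInOrder a) where
  MergedInOrder xs <> MergedInOrder ys = mergeInOrder (xs ++ ys)

instance Mergeable a => Monoid (MergedInOrder a) where
  mempty = MergedInOrder []

-- Hypothetical example: consecutive mouse moves collapse to the latest one.
data Event = MouseMove Int Int | KeyPress Char deriving (Eq, Show)

instance Mergeable Event where
  merge (MouseMove _ _) later@(MouseMove _ _) = Just later
  merge _ _ = Nothing
```

For example, `mergeInOrder [KeyPress 'a', MouseMove 1 1] <> mergeInOrder [MouseMove 2 2, KeyPress 'b']` gives `MergedInOrder [KeyPress 'a',MouseMove 2 2,KeyPress 'b']`.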
I don't think we can write code to merge arbitrary containers of Mergeables, I think it would have to be done explicitly for each container.

Here was my first thought. Notice "deriving Ord". Otherwise this first section is almost exactly the same as some of the code you presented:
import Data.Monoid
import Data.List
data AffineTransform = Identity
                     | Translation Float Float
                     | Rotation Float
                     | Scaling Float Float
                     | Affine Matrix3x3
                     deriving (Eq, Show, Ord)
-- some dummy definitions to satisfy the typechecker
data Matrix3x3 = Matrix3x3
               deriving (Eq, Show, Ord)
toMatrix :: AffineTransform -> Matrix3x3
toMatrix _ = Matrix3x3
mult3x3 :: Matrix3x3 -> Matrix3x3 -> Matrix3x3
mult3x3 _ _ = Matrix3x3
instance Monoid AffineTransform where
    mempty = Identity
    Identity `mappend` x = x
    x `mappend` Identity = x
    (Translation dx1 dy1) `mappend` (Translation dx2 dy2) =
        Translation (dx1+dx2) (dy1+dy2)
    (Rotation theta1) `mappend` (Rotation theta2) = Rotation (theta1+theta2)
    (Scaling sx1 sy1) `mappend` (Scaling sx2 sy2) = Scaling (sx1*sx2) (sy1*sy2)
    -- last resort: compose transforms from different subgroups
    -- using an "expensive" matrix multiplication
    x `mappend` y = Affine (toMatrix x `mult3x3` toMatrix y)
And now, the kicker:
mergeList :: [AffineTransform] -> [AffineTransform]
mergeList = map mconcat . groupBy sameConstructor . sort
  where sameConstructor Identity Identity = True
        sameConstructor (Translation _ _) (Translation _ _) = True
        sameConstructor (Rotation _) (Rotation _) = True
        sameConstructor (Scaling _ _) (Scaling _ _) = True
        sameConstructor (Affine _) (Affine _) = True
        sameConstructor _ _ = False
Assuming that translations, rotations, and scalings are orthogonal, why not reorder the list and group up all of those same operations together? (Is that a bad assumption?) That is the Haskell pattern that I saw: the good ol' group . sort trick. If you really want, you could pull sameConstructor out of mergeList:
mergeList :: (Monoid a, Ord a) => (a -> a -> Bool) -> [a] -> [a]
mergeList f = map mconcat . groupBy f . sort
P.S. if that was a bad assumption, then you could still do something like
mergeList = map mconcat . groupBy canMerge
But it seems to me that there is unusual overlap between merge and mappend the way you defined them.
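To illustrate the order-preserving variant (groupBy without sort), here is a small self-contained sketch. It deliberately uses a cut-down, hypothetical `Transform` type without the `Identity` and `Affine` cases, and a plain `combine` function in place of `mconcat`, so that it runs without the matrix code; `canMerge` is assumed to mean "same constructor".

```haskell
import Data.List (groupBy)

-- Hypothetical cut-down transform type (no Identity or Affine cases).
data Transform = Translation Float Float
               | Rotation Float
               | Scaling Float Float
               deriving (Eq, Show)

-- Two transforms can be merged cheaply when they share a constructor.
canMerge :: Transform -> Transform -> Bool
canMerge (Translation _ _) (Translation _ _) = True
canMerge (Rotation _)      (Rotation _)      = True
canMerge (Scaling _ _)     (Scaling _ _)     = True
canMerge _                 _                 = False

-- Cheap combination within one constructor.
combine :: Transform -> Transform -> Transform
combine (Translation a b) (Translation c d) = Translation (a + c) (b + d)
combine (Rotation a)      (Rotation b)      = Rotation (a + b)
combine (Scaling a b)     (Scaling c d)     = Scaling (a * c) (b * d)
combine x                 _                 = x -- not reached for groups built by canMerge

-- Merge only adjacent runs, so the order of the list is preserved.
-- groupBy never yields an empty group, so foldr1 is safe here.
mergeRuns :: [Transform] -> [Transform]
mergeRuns = map (foldr1 combine) . groupBy canMerge
```

Here `mergeRuns [Rotation 1, Translation 1 2, Translation 3 4, Rotation 2]` yields `[Rotation 1.0,Translation 4.0 6.0,Rotation 2.0]`: the two rotations stay apart because they are not adjacent.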

Related

Generic transformations on a set of a given datatype

If I have a datatype representing a subset of propositional logic such as
data Prop = Lit String
          | Neg Prop
          | And Prop Prop
          | Or Prop Prop
Are there then easy ways to do generic transformations on [[Prop]]? E.g.
replace [[And a b, c]] with [[a, b, c]]
replace [[Or a b, c]] with [[a], [b], [c]], or
removing occurrences of sublists containing both Neg a and a, e.g. turning [[Neg a, x, a], [b]] into [[b]]
This feels like something close to what e.g. uniplate does, but “two levels up”.
I assume that your second rule is wrong, and you really meant to say either:
replace [[Or a b],[c]] with [[a],[b],[c]]
or else:
replace [[Or a b, c]] with [[a,c],[b,c]]
In other words, I assume you're trying to convert a Prop into an alternate representation [[Prop]] where the first-level list is an "or" and the second-level lists are "and"s, with all terms being either literals or Neg-literals. So, you're trying to imagine how you could apply a bunch of generic structural rules to make transformations like:
[[And a (Or b c)]]
[[a, Or b c]] -- apply "And" rule
[[a,b],[a,c]] -- apply some kind of "Or" distribution rule
If so, having generic transformations isn't much use. With your current datatype, you can only apply these transformations to top-level expressions anyway. For example, there's no obvious way to apply an Or rule here:
[[And a (And b (Or c d))]]
without first applying And rules a couple of times. If you change your data type to add, say, an L2 [[Prop]] constructor, so you can transform the above expression to:
[[And a (And b (L2 [[c],[d]]))]] -- apply "Or" rule
it's not clear what that buys you.
Ultimately, I don't think this is the right approach...
You have a perfectly adequate representation of your propositional logic in the Prop data type; and you have a desired final representation. Instead of trying to translate your Prop representation into the final representation using piecemeal generic transformations, transform your Prop representation using standard recursive Prop-to-Prop transformations into a canonical Prop form, and do the translation as the final step.
Here, a reasonable canonical form is:
Or e1 (Or e2 (... (Or e3 e4)))
where each ek is of form:
And t1 (And t2 (... (And t3 t4)))
and each tk is either a Lit _ or a Neg (Lit _). Obviously, this canonical form can be translated pretty easily into the desired final representation as a [[Prop]].
I've included a possible solution below. I don't see that much opportunity for simplifying things via generic transformations. Most of the pattern matching seems to be doing non-trivial work.
Possible Solution
After a bit of preamble:
import Data.List
data Prop = Lit String
          | Neg Prop
          | And Prop Prop
          | Or Prop Prop
          deriving (Eq)
then one way to translate an arbitrary Prop into this canonical form is to first push all the Negs down to the literal terms:
pushNeg :: Prop -> Prop
pushNeg = push False
  where
    -- de Morgan's laws
    push neg (And x y) = (if neg then Or else And) (push neg x) (push neg y)
    push neg (Or x y) = (if neg then And else Or) (push neg x) (push neg y)
    -- handle Neg and Lit
    push neg (Neg y) = push (not neg) y
    push neg (Lit l) = if neg then Neg (Lit l) else Lit l
then push all the Ands down on top of them. This is tougher to get right, but I think the following is correct, even though it does a bit of unnecessary work in some cases:
pushAnd :: Prop -> Prop
pushAnd (Or x y) = Or (pushAnd x) (pushAnd y)
pushAnd (And x y)
  = let x' = pushAnd x
    in case x' of
         Or u v -> Or (pushAnd (And u y)) (pushAnd (And v y))
         _ -> let y' = pushAnd y
              in case y' of
                   Or u v -> Or (pushAnd (And x' u)) (pushAnd (And x' v))
                   _ -> And x' y'
pushAnd x = x
and then recursively make all the And and Or clauses right-associative:
rassoc :: Prop -> Prop
rassoc (Or (Or x y) z) = rassoc (Or x (Or y z))
rassoc (Or x z) = Or (rassoc x) (rassoc z)
rassoc (And (And x y) z) = rassoc (And x (And y z))
rassoc (And x z) = And x (rassoc z)
rassoc x = x
and finally convert the canonical form to its final representation (dropping the inconsistent clauses and duplicate terms while we're at it):
translate :: Prop -> [[Prop]]
translate = nub . map nub . filter consistent . doOr
  where
    doOr x = case x of
      Or x y -> doAnd x : doOr y
      x -> doAnd x : []
    doAnd x = case x of
      And x y -> x : doAnd y
      x -> x : []
    consistent lits =
      let (falses, trues) = partition isNeg lits
          falses' = map (\(Neg (Lit l)) -> l) falses
          trues' = map (\ (Lit l) -> l) trues
      in null (intersect falses' trues')
    isNeg (Neg x) = True
    isNeg _ = False
The whole pipeline is:
final :: Prop -> [[Prop]]
final = translate . rassoc . pushAnd . pushNeg
and here's some test code:
a = Lit "a"
b = Lit "b"
c = Lit "c"
d = Lit "d"
e = Lit "e"
-- Show instance, but only for `final` forms
instance Show Prop where
  show (Lit x) = x
  show (Neg (Lit x)) = '~':x
main :: IO ()
main = do print $ final (Neg a)
          print $ final (Or a b)
          print $ final (Or a a)
          print $ final (And a b)
          print $ final (And (Or (And (Or a b) c) d) e)
          print $ final (And (Or (Or a b) c) (Neg (And a (Or b d))))
which outputs:
[[~a]]
[[a],[b]]
[[a]]
[[a,b]]
[[a,c,e],[b,c,e],[d,e]]
[[a,~b,~d],[b,~a],[c,~a],[c,~b,~d]]
There's still some opportunity for further simplification, as:
final (And a (Or a b))
gives final form [[a],[a,b]] instead of just [[a]].
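One possible extra pass for this remaining case is absorption: in an "or of ands", any clause that contains all the terms of another clause is redundant. The sketch below is an assumption about how that could be done, not part of the pipeline above. `absorb` is a hypothetical helper, written generically over any `Eq` element type so that it stands alone; it could be composed as `absorb . final`.

```haskell
-- Drop every clause that is a superset of some other clause (absorption).
-- When two clauses contain exactly the same terms, the earlier one is kept.
absorb :: Eq a => [[a]] -> [[a]]
absorb clauses = [ c | (i, c) <- indexed, not (any (absorbs i c) indexed) ]
  where
    indexed = zip [0 :: Int ..] clauses
    subset xs ys = all (`elem` ys) xs
    -- clause d (at position j) makes clause c (at position i) redundant
    absorbs i c (j, d) =
      i /= j && d `subset` c && (not (c `subset` d) || j < i)
```

For instance, `absorb [["a"],["a","b"]]` returns `[["a"]]`. The pass is quadratic in the number of clauses, which is in keeping with the `nub`-based code above.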

Implementing recurrence relations on State monads (in Haskell or Scala)

I am working on a new implementation of the operators in http://www.thalesians.com/archive/public/academic/finance/papers/Zumbach_2000.pdf
EDIT: clearer explanation here: https://www.olseninvest.com/customer/pdf/paper/001207-emaOfEma.pdf
Briefly, it's a whole bunch of cool time series operators based on the recurrence relation of the exponential moving average, where each application of the ema() operator takes the new value and the previous result of the ema. I can't seem to do latex on this stack exchange, but anyway my problem now is a software problem.
I implemented this in Scala by hiding a var deep inside the thunks that create EMA functions. This all works, but it's super tricky, because calling ema(5) and then ema(5) again will naturally lead to a different result. I'd like to try redoing all of this using State Monads, but I'm quickly getting lost in the weeds.
For example, I have the following simplified EMA State monad in Haskell:
import Control.Monad.State
type EMAState = Double
type Tau = Double
ema :: Tau -> Double -> State EMAState Double
ema tau x = state $ \y ->
    let alpha = 1 / tau
        mu = exp(-alpha)
        mu' = 1 - mu
        y' = (mu * y) + (mu' * x)
    in (y', y')
which I can readily test in GHCI:
*Main Control.Monad.State> runState (ema 5 10) 0
(1.8126924692201818,1.8126924692201818)
applying the input 10 to a 5-period EMA initialized to 0. This is all well and good, using forM I can apply multiple input values etc. Now, the next step is to implement an "iterated EMA", which is an EMA applied to itself N times.
iEMA[n](x) = EMA(iEMA[n-1](x))
Each of these intermediate EMAs will need to have its own state, aka previous result, to correctly calculate the vector of iterated EMAs. So what I am looking for is something like this (I think):
iema :: Int -> Tau -> Double -> State [EMAState] [Double]
Which is essentially a daisy chain of EMAs:
iEMA[3](x) = EMA(EMA(EMA(x,s1),s2),s3) = (x, [s1,s2,s3]) -> ([y1,y2,y3], [s1',s2',s3'])
And if all I care about is the 3rd iterated EMA ...
... -> (y3, [s1', s2', s3'])
The paper moves on from there, creating ever more complex operators built on iterated EMAs and averages of them etc, so I want to be able to functionally and purely compose these stateful operators building ever more complex states, but still quite simple input and output.
I really feel like this is what functional programming is good at, but I don't yet have the expertise to see how to put together these State monads in the correct way. Could someone please point me in the right direction with these iterated recurrence operators?
EDIT:
A couple of helpful folks have suggested repeated application of the same ema operator to the input data, but this is not sufficient. Each ema operator needs to maintain its own previous value. Here's an example:
tau 5
mu 0.818730753
muprime 0.181269247
ema1 ema2 ema3
x 0 0 0 <- States_0
1 0.1812 0.03285 0.00595 <- States_1
5 1.0547 0.21809 0.04441 <- States_2
The x column is the raw input; ema1 uses the value to its left for input and the value above it for recurrence/state. ema2 uses the value to its left for input (not x!) and the value above it for state. It's an ema (ema (x) ). Ditto ema3 = ema (ema (ema (x) ) ). What I would like to do, which I think must be possible, is, given an ema state monad, compose the ema3 state monad, or even better, the [ema] state monad, with each subsequent ema operating on the output of the previous.
Let's build the handy old Mealy machine
data Mealy i o where
  Mealy :: (i -> s -> (o, s)) -> s -> Mealy i o
which has all kinds of instances
instance Arrow Mealy
instance ArrowChoice Mealy
instance ArrowApply Mealy
instance Strong Mealy
instance Choice Mealy
instance Profunctor Mealy
instance Category * Mealy
instance Monad (Mealy a)
instance Functor (Mealy a)
instance Applicative (Mealy a)
instance Pointed (Mealy a)
We can use it to build recurrence relations
recur :: (a -> a -> a) -> a -> Mealy a a
recur f a0 = Mealy (\inp prior -> let post = f inp prior in (post, post)) a0
we can iterate them with our Category instance
iter :: Int -> Mealy a a -> Mealy a a
iter 0 _ = id
iter 1 m = m
iter n m = m >>> iter (n-1) m
and then, with all this machinery, we can create an infinite stream of iterated Mealy machines
data Stream a = Stream a (Stream a) deriving Functor
instance Functor Stream
instance Applicative Stream
instance Foldable Stream
instance Traversable Stream
ints :: Stream Int
ints = go 0 where go n = Stream n (go $ n + 1)
jet :: Mealy a a -> Stream (Mealy a a)
jet m = fmap (`iter` m) ints
All of these together give us, essentially, your desired structure. But it's a little difficult to interact with directly. We'll give it its own instances to help
newtype MealyJet i o = MealyJet { runMealyJet :: Stream (Mealy i o) }
instance Profunctor MealyJet
instance Applicative (MealyJet i)
instance Category MealyJet where
id = MealyJet (pure id) -- technically this should be `jet id`, but it's equal to pure
MealyJet f . MealyJet g = MealyJet (liftA2 (.) f g)
viewMealyJet :: MealyJet i o -> Mealy i (Stream o)
viewMealyJet (MealyJet m) = sequenceA m
And now, we can write these EMAs as needed
type Tau = Double
ema :: Tau -> Mealy Double Double
ema tau = recur step 0 -- initial state 0, as in the question's example
  where
    step fresh prior =
      let alpha = 1 / tau
          mu = exp (negate alpha)
          mu' = 1 - mu
      in (mu * prior) + (mu' * fresh)
emaJet :: Tau -> MealyJet Double Double
emaJet = MealyJet . jet . ema
emaComp :: MealyJet Double Double
emaComp = emaJet 1 >>> emaJet 2 >>> emaJet 3 >>> emaJet 4 >>> emaJet 5
fiveStack :: Mealy Double (Stream Double)
fiveStack = viewMealyJet emaComp
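Since the instances above are only listed, not defined, here is a much smaller self-contained sketch of the core idea: a Mealy machine with a hidden state, plus sequential composition that pairs up the states so each stage keeps its own history. It uses an existential data declaration rather than GADT syntax, and `compose`, `runMealy` and `ema3` are hypothetical helper names standing in for the Category machinery; the initial state 0 is taken from the question's example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A Mealy machine: a step function plus a hidden current state.
data Mealy i o = forall s. Mealy (i -> s -> (o, s)) s

-- Feed the output of the first machine into the second. The combined
-- state is the pair of both states, so each stage keeps its own history.
compose :: Mealy a b -> Mealy b c -> Mealy a c
compose (Mealy f s0) (Mealy g t0) = Mealy h (s0, t0)
  where
    h a (s, t) =
      let (b, s') = f a s
          (c, t') = g b t
      in (c, (s', t'))

-- Run a machine over a list of inputs, collecting the outputs.
runMealy :: Mealy i o -> [i] -> [o]
runMealy (Mealy f s0) = go s0
  where
    go _ []       = []
    go s (x : xs) = let (o, s') = f x s in o : go s' xs

-- A single EMA stage, starting from state 0.
ema :: Double -> Mealy Double Double
ema tau = Mealy step 0
  where
    mu = exp (negate (1 / tau))
    step x y = let y' = mu * y + (1 - mu) * x in (y', y')

-- Three stages chained, each with its own previous value.
ema3 :: Double -> Mealy Double Double
ema3 tau = ema tau `compose` ema tau `compose` ema tau
```

Running `runMealy (ema3 5) [1, 5]` produces approximately `[0.00596, 0.04441]`, matching the ema3 column of the table in the question.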
Updated answer...
Define:
combine :: [ a -> State s a ] -> a -> State [s] a
combine fs a = state $ \ys ->
    let zs = zipWith (\f y a -> runState (f a) y) fs ys
        pairs = chain a zs
        as' = map fst pairs
        a' = last as' -- we are only returning one result in this case
        ys' = map snd pairs
    in (a', ys')
chain :: a -> [ a -> (a,s) ] -> [ (a,s) ]
chain a [] = []
chain a (f:fs) = let (a',s) = f a
                 in (a',s) : chain a' fs
ema3 t = combine $ replicate 3 (ema t)
ghci> runState (ema3 5 1) [0,0,0]
(5.956242778945897e-3,[0.18126924692201818,3.2858539879675595e-2,5.956242778945897e-3])
ghci> runState (do ema3 5 1; ema3 5 5) [0,0,0]
(4.441089130249448e-2,[1.0547569416524334,0.21809729359983737,4.441089130249448e-2])
The combine is easily modified to return all of the results - just return as' instead of a'.
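For comparison, the same daisy chain can be written without the State monad at all. This is a swapped-in technique, not the code above: the sketch uses `mapAccumL` from base to thread each stage's output into the next stage while updating the list of per-stage states. `emaStep`, `iemaStep` and `iemaRun` are hypothetical names. Because each EMA's new state equals its output, the returned list serves as both.

```haskell
import Data.List (mapAccumL)

-- One EMA update: previous value y, new input x.
emaStep :: Double -> Double -> Double -> Double
emaStep tau y x = mu * y + (1 - mu) * x
  where mu = exp (negate (1 / tau))

-- One time step through the whole chain. The accumulator carries the
-- signal from stage to stage; the result is the list of new states,
-- which are also the outputs [y1, y2, ...].
iemaStep :: Double -> [Double] -> Double -> [Double]
iemaStep tau states x = snd (mapAccumL stage x states)
  where
    stage input y = let y' = emaStep tau y input in (y', y')

-- Run the chain over a series of inputs, returning the states after each one.
iemaRun :: Double -> [Double] -> [Double] -> [[Double]]
iemaRun tau states0 = drop 1 . scanl (iemaStep tau) states0
```

`iemaRun 5 [0,0,0] [1,5]` reproduces, up to rounding, the two state lists printed by the ghci sessions above.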
Original answer:
combine :: (a -> State s b) -> (b -> State t c) -> (a -> State (s,t) c)
combine f g a = state $ \(s,t) ->
    let (b,s') = runState (f a) s
        (c,t') = runState (g b) t
    in (c,(s',t'))
Then:
ema3 tau = ema tau `combine` ema tau `combine` ema tau
and ema3 has type:
ema3 :: Tau -> Double -> State ((EMAState, EMAState), EMAState) Double
For instance:
ghci> runState (ema3 5 1) ((0,0),0)
(5.956242778945897e-3,((0.18126924692201818,3.2858539879675595e-2),5.956242778945897e-3))
Note that the state type of ema3 is ((Double,Double),Double) and not a 3-tuple or list.
In your example you run (ema3 5) first with input x = 1 and then with input x = 5
with initial state ((0,0),0):
ghci> runState (do ema3 5 1; ema3 5 5) ((0,0),0)
(4.441089130249448e-2,((1.0547569416524334,0.21809729359983737),4.441089130249448e-2))
and that gives you the second row in the table.
I may not be fully understanding your use case, but possibly you are looking for something like this:
ema' _ [] = get
ema' tau (x:xs) = do
    y <- get
    let alpha = 1 / tau
        mu = exp $ negate alpha
        mu' = 1 - mu
        y' = (mu * y) + (mu' * x)
    put y'
    ema' tau xs
It is like your original function except it accepts a list of x values, and it recursively executes for each one, updating y each time. When none are left, it returns the value of y as the answer.
It can be run like so:
*Main> evalState (ema' 5 [10]) 0
1.8126924692201818
*Main> evalState (ema' 5 [10, 10]) 0
3.2967995396436076
*Main> evalState (ema' 5 [10, 10, 10]) 0
4.511883639059737
When using the State monad, you don't need to wrap your functions in the state $ \y -> ... business. You can simply enclose your monadic code in a do block and use put and get to access the state. In this case, for each recursive execution of the function, I grab the last y with get, and then use put after doing math to update the state.
I think that in your version, you are including the State monad without actually getting anything for it (since you don't use put or get).
Also, the State monad may be overkill for this; you could accomplish the same thing using a fold over a list of x values.
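A minimal sketch of that fold-based alternative, using the same update rule as above; `emaStep` and `emaFold` are hypothetical names:

```haskell
import Data.List (foldl')

-- One EMA update: previous value y, new input x.
emaStep :: Double -> Double -> Double -> Double
emaStep tau y x = mu * y + (1 - mu) * x
  where mu = exp (negate (1 / tau))

-- Fold the update over all inputs, starting from the initial value y0.
emaFold :: Double -> Double -> [Double] -> Double
emaFold tau y0 = foldl' (emaStep tau) y0
```

`emaFold 5 0 [10, 10]` gives about 3.2968, in line with the evalState run above.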
Update based on the comments...
Three iterations of ema can be written using the monadic bind operator >>= like this:
ema3 tau x = ema tau x >>= ema tau >>= ema tau
or using the Kleisli arrow:
ema3 tau = ema tau >=> ema tau >=> ema tau
As a diagram the computation flows like this:
y1 /---------\
| | |
v | v
x --> EMA --> EMA --> EMA --> x' = y3'
tau tau tau
| ^ |
| | v
\----------/ y3'
(Original answer)
This is not a complete answer, but perhaps the OP can comment
on whether this is going in the right direction.
Here is what I understand the computation looks like:
y1 y2 y3
| | |
v v v
x --> EMA --> EMA --> EMA --> x'
tau1 tau2 tau3
| | |
v v v
y1' y2' y3'
The question is whether there is an elegant way to
express this as a composition of EMA blocks, e.g.
something like:
ema tau1 >o> ema tau2 >o> ema tau3
for some operator >o>.

weird type issue in haskell giving me issues (Par Monad)

For reference my code.
import Control.Monad.Par
makeGridx:: (Enum a,Num a)=>a->a->a->[a]
makeGridx start end h = [start,(start+h)..end]
makeGridt:: (Enum a, Num a)=>a->a->a->[a]
makeGridt start end h = [start,(start+h)..end]
generateBaseLine:: (Eq a,Num a)=>(a->a)-> [a] -> [(a,a,a)]
generateBaseLine f (x:xs) = if (null xs)
then [(x,0,0)]
else if(x==0)
then (x,0,0) : (generateBaseLine f xs)
else (x,0,(f x)) : (generateBaseLine f xs)
--fdm :: (Enum a,Num a) =>a->a->a->a->a->a->a->(a->a)->[(a,a,a)]
--fdm alpha startt endt startx endx dx dt bbFunction = start alpha (makeGridx startx endx dx) (makeGridt startt endt dt) (generateBaseLine bbFunction (makeGridx startx endx dx)) dx dt
--start:: Num a=>a->[a]->[a]->[(a,a,a)]->a->a->[(a,a,a)]
--start alpha (x:xs) (t:ts) (phi:phis) dx dt = (startPar alpha (x:xs) (ts) (phi:phis) dx dt [] [])
startPar:: Num a =>a->[a]->[a]->[(a,a,a)]->a->a->[(a,a,a)]
startPar alpha (x:xs) (t:ts) (phi1:(ph2:(ph3:phis))) dx dt = (phi1:(ph2:(ph3:phis))) ++ (buildPhiListIds alpha (x:xs) (t:ts) (phi1:(ph2:(ph3:phis))) dx dt [] [])
buildPhiListIds:: Num a=> a->[a]->[a]->[(a,a,a)]->a->a->[Par (IVar (a, a, a))]->[a]->[(a,a,a)]
buildPhiListIds alpha (x:xs) (t:ts) (phi1:(ph2:(ph3:phis))) dx dt phiIds newX = do
one<-third phi1
two<-third ph2
three<-third ph3
newSolId<- spawn( return (newPhi (x:xs) t (one,two,three,dx,dt,alpha) ))
buildPhiListIds alpha xs (t:ts) (ph2:(ph3:phis)) dx dt (phiIds ++ [newSolId]) (newX ++ [x])
buildPhiListIds alpha (0:xs) (t:ts) (phi1:(ph2:(ph3:phis))) dx dt phiIds newX = do
newSolId<-spawn (return (newPhi (0:xs) t (1,2,3,4,5,6)))
buildPhiListIds alpha xs (t:ts) (phi1:(ph2:(ph3:phis))) dx dt (phiIds ++ [newSolId]) (newX ++ [0])
buildPhiListIds alpha [] (t:ts) (phi1:(ph2:(ph3:phis))) dx dt phiIds newX = do
(getSolutions (getTuples(getSolutions phiIds))) ++ (buildPhiListIds alpha newX ts (getSolutions (getTuples(getSolutions phiIds))) dx dt [] [])
buildPhiListIds _ _ [] _ _ _ _ _ = []
getTuples::[IVar a]->[Par a]
getTuples (x:xs) = (get x) : (getSolutions xs)
getTuples [] = []
getSolutions:: [Par a]->[a]
getSolutions (x:xs) = (runPar x):(getTuples xs)
getSolutions [] = []
third (_,_,x)=x
ex f g x = runPar $ do
fx <- spawn (return (f x))
gx <- spawn (return (g x))
a <- get fx
b <- get gx
return (a,b)
newPhi:: (Eq a,Fractional a)=> [a]->a->(a,a,a,a,a,a)->(a,a,a)
newPhi (0:xs) t (phiL,phiC,phiR,dx,dt,alpha)= (0,t,0)
newPhi (x:[]) t (phiL,phiC,phiR,dx,dt,alpha)= (x,t,0)
newPhi (x:xs) t (phiL,phiC,phiR,dx,dt,alpha)= (x,t,(phiC + (alpha * (dt/(dx^2)))*(phiR -(2*phiC) + phiL)))
I get a bunch of errors, but one very much perplexes me.
heateqpar.hs:28:156:
Couldn't match type `Par' with `[]'
Expected type: [IVar (a1, a1, a1)]
Actual type: Par (IVar (a1, a1, a1))
In a stmt of a 'do' block:
newSolId <- spawn
(return (newPhi (x : xs) t (one, two, three, dx, dt, alpha))) ::
Par (IVar (a, a, a))
In the expression:
do { one <- third phi1;
two <- third ph2;
three <- third ph3;
newSolId <- spawn
(return (newPhi (x : xs) t (one, two, three, dx, dt, alpha))) ::
Par (IVar (a, a, a));
.... }
In an equation for `buildPhiListIds':
buildPhiListIds
alpha
(x : xs)
(t : ts)
(phi1 : (ph2 : (ph3 : phis)))
dx
dt
phiIds
newX
= do { one <- third phi1;
two <- third ph2;
three <- third ph3;
.... }
The actual type is what I want it to be, but for some reason the compiler is trying to enforce an expected type that isn't the return type of spawn. It looks as though my type declaration is enforcing this; however, I have declared the type as follows:
buildPhiListIds:: Num a=> a->[a]->[a]->[(a,a,a)]->a->a->[Par (IVar (a, a, a))]->[a]->[(a,a,a)]
I see no type specifically of [IVar (a1, a1, a1)], which is really confusing me. If someone could lead me on the right road, it would be very much appreciated.
I get a bunch of errors, but one very much perplexes me.
In a do expression, every monadic action must belong to the same monad. The return type of buildPhiListIds is [something], so the result of do has type [something]. Therefore, all your actions should be in the list monad, not in the Par monad. Now look at spawn again:
spawn :: NFData a => Par a -> Par (IVar a)
Compare what I mentioned above with your error: "Couldn't match type `Par' with `[]'". Aha! It expects a list, but you're using something of wrong type (Par)!
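The "one monad per do block" rule can be seen with nothing but base. The sketch below uses the list and Maybe monads instead of Par (so it runs without the monad-par package); `pairs` and `firstPair` are hypothetical names.

```haskell
import Data.Maybe (listToMaybe)

-- Every statement is a list action, so the whole block has a list type.
pairs :: [(Int, Char)]
pairs = do
  n <- [1, 2]
  c <- "ab"
  return (n, c)

-- Here the block's type is Maybe, so every statement must be a Maybe action.
-- Writing `c <- "ab"` in this block would be rejected with a
-- "Couldn't match" error, much like the one in the question; the list has
-- to be converted into a Maybe explicitly first.
firstPair :: Maybe (Int, Char)
firstPair = do
  n <- Just 1
  c <- listToMaybe "ab"
  return (n, c)
```

`pairs` evaluates to `[(1,'a'),(1,'b'),(2,'a'),(2,'b')]` and `firstPair` to `Just (1,'a')`.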
Now, extrapolating from your previous questions I suppose that you're new to Haskell and the concept of monads. There are many tutorials about them, including chapters in RWH or in LYAH, so I won't provide one in this answer (they're actually rather easy, don't be intimidated by the number of tutorials). Either way, your current usage is completely off.
That being said, you should refactor buildPhiListIds to have the following type:
buildPhiListIds:: Num a => ... -> Par [(a,a,a)]
Also, your definitions of getTuples and getSolutions don't make much sense. The following are much simpler and probably achieve what you actually want:
getTuples :: [IVar a] -> [Par a]
getTuples = map get
getSolutions :: [Par a] -> [a]
getSolutions = runPar . sequence
Also, you should try to keep the calls to runPar to a minimum:
The runPar function itself is relatively expensive [...]. So when using the Par monad, you should usually try to thread the Par monad around to all the places that need parallelism to avoid needing multiple runPar calls. [...] In particular, nested calls to runPar (where a runPar is evaluated during the course of executing another Par computation) usually give poor results.
I suggest you write some simpler programs that actually compile, until you are comfortable with both monads in general and Par in particular.

Optimising Haskell data reading from file

I am trying to implement Kosaraju's graph algorithm, on a 3.5m line file where each row is two (space separated) Ints representing a graph edge. To start I need to create a summary data structure that has the node and lists of its incoming and outgoing edges. The code below achieves that, but takes over a minute, whereas I can see from posts on the MOOC forum that people using other languages are completing in <<10s. (getLines is taking 10s compared to under 1s in benchmarks I read about.)
I'm new to Haskell and have implemented an accumulation method using foldl' (the ' was a breakthrough in making it terminate at all), but it feels rather imperative in style, and I'm hoping that that's the reason why it is running slow. Moreover, I'm currently planning to use a similar pattern to conduct the depth-first-search, and I fear it will all just become too slow.
I have found this presentation and blog that talk about these sorts of issues, but at too expert a level.
import System.IO
import Control.Monad
import Data.Map.Strict as Map
import Data.List as L
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored (Edges, Edges) deriving (Show)
type Graph1 = Map NodeName Node
getLines :: FilePath -> IO [[Int]]
getLines = liftM (fmap (fmap read . words) . lines) . readFile
getLines' :: FilePath -> IO [(Int,Int)]
getLines' = liftM (fmap (tuplify2 . fmap read . words) . lines) . readFile
tuplify2 :: [a] -> (a,a)
tuplify2 [x,y] = (x,y)
main = do
list <- getLines "testdata.txt" -- [String]
--list <- getLines "SCC.txt" -- [String]
let
list' = createGraph list
return list'
createGraph :: [[Int]] -> Graph1
createGraph xs = L.foldl' build Map.empty xs
where
build :: Graph1-> [Int] -> Graph1
build = \acc (x:y:_) ->
let tmpAcc = case Map.lookup x acc of
Nothing -> Map.insert x (Node False ([y],[])) acc
Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False ((y:fwd), bck))) x acc
in case Map.lookup y tmpAcc of
Nothing -> Map.insert y (Node False ([],[x])) tmpAcc
Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False (fwd, (x:bck)))) y tmpAcc
Using maps:
Use IntMap or HashMap when possible. Both are significantly faster for Int keys than Map. HashMap is usually faster than IntMap but uses more RAM and has a less rich library.
Don't do unnecessary lookups. The containers package has a large number of specialized functions. With alter the number of lookups can be halved compared to the createGraph implementation in the question.
Example for createGraph:
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
createGraph :: [(Int, Int)] -> Graph1
createGraph xs = foldl' build IM.empty xs
  where
    addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
    addFwd y _ = Just (Node False [y] [])
    addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
    addBwd x _ = Just (Node False [] [x])
    build :: Graph1 -> (Int, Int) -> Graph1
    build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
Using vectors:
Consider the efficient construction functions (the accumulators, unfolds, generate, iterate, constructN, etc.). These may use mutation behind the scenes but are considerably more convenient to use than actual mutable vectors.
In the more general case, use the laziness of boxed vectors to enable self-reference when constructing a vector.
Use unboxed vectors when possible.
Use unsafe functions when you're absolutely sure about the bounds.
Only use mutable vectors when there aren't pure alternatives. In that case, prefer the ST monad to IO. Also, avoid creating many mutable heap objects (i. e. prefer mutable vectors to immutable vectors of mutable references).
Example for createGraph:
import qualified Data.Vector as V
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = V.Vector Node
createGraph :: Int -> [(Int, Int)] -> Graph1
createGraph maxIndex edges = graph'' where
    -- indices run from 0 to maxIndex inclusive, so the vector needs maxIndex + 1 slots
    graph = V.replicate (maxIndex + 1) (Node False [] [])
    graph' = V.accum (\(Node e f b) x -> Node e (x:f) b) graph edges
    graph'' = V.accum (\(Node e f b) x -> Node e f (x:b)) graph' (map (\(a, b) -> (b, a)) edges)
Note that if there are gaps in the range of the node indices, then it'd be wise to either
Contiguously relabel the indices before doing anything else.
Introduce an empty constructor to Node to signify a missing index.
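The first option, contiguous relabelling, might look like the sketch below. It assumes the containers package is available (it ships with GHC), and `relabel` is a hypothetical helper: it collects the distinct node names, numbers them 0, 1, 2, ... in ascending order, and rewrites the edge list accordingly, returning the mapping so that results can be translated back.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Renumber the nodes that actually occur to 0..n-1 (in ascending order of
-- their original names) and return the rewritten edges plus the mapping
-- from old name to new index.
relabel :: [(Int, Int)] -> ([(Int, Int)], M.Map Int Int)
relabel edges = (map rename edges, index)
  where
    nodes = S.fromList (concatMap (\(a, b) -> [a, b]) edges)
    index = M.fromList (zip (S.toAscList nodes) [0 ..])
    -- every endpoint is in `index` by construction, so (M.!) cannot fail
    rename (a, b) = (index M.! a, index M.! b)
```

For example, `fst (relabel [(10,30),(30,20)])` is `[(0,2),(2,1)]`; the size of the returned map then gives the vector length to allocate.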
Faster I/O:
Use the IO functions from Data.Text or Data.ByteString. In both cases there are also efficient functions for breaking input into lines or words.
Example:
import qualified Data.ByteString.Char8 as BS
import System.IO
getLines :: FilePath -> IO [(Int, Int)]
getLines path = do
  lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
  let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
  return [(a, b) | [a, b] <- pairs]
Benchmarking:
Always do it, unlike me in this answer. Use criterion.
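For what it's worth, a criterion benchmark for this kind of code can be quite small. The following is only a sketch: buildSuccessors is a simplified stand-in for createGraph (successor lists only), and ringEdges generates synthetic input rather than reading the real data file. Both names are made up for this example.

```haskell
import Criterion.Main (Benchmark, bench, bgroup, defaultMain, whnf)
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM

-- Hypothetical stand-in for the graph builder: successor lists only.
buildSuccessors :: [(Int, Int)] -> IM.IntMap [Int]
buildSuccessors = foldl' add IM.empty
  where
    add acc (x, y) = IM.insertWith (++) x [y] acc

-- Synthetic input (an assumption, not the real data): a ring of n nodes.
ringEdges :: Int -> [(Int, Int)]
ringEdges n = [(i, (i + 1) `mod` n) | i <- [0 .. n - 1]]

-- whnf applies the function to the argument on every iteration and
-- forces the result to weak head normal form.
benchmarks :: [Benchmark]
benchmarks =
  [ bgroup "createGraph"
      [ bench "ring 100k" $ whnf buildSuccessors (ringEdges 100000) ]
  ]

-- Call this from main to run the benchmarks.
runBenchmarks :: IO ()
runBenchmarks = defaultMain benchmarks
```

Compile with -O2 before trusting any of the numbers.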
Based pretty much on András' suggestions, I've reduced a 113 second task down to 24 seconds (measured by stopwatch, as I can't quite get Criterion to do anything yet), and then down to 10 by compiling with -O2! I've attended some courses this last year that talked about the challenge of optimising for large datasets, but this was the first time I faced a question that actually involved one, and it was as non-trivial as my instructors suggested. This is what I have now:
import System.IO
import Control.Monad
import Criterion.Main (bench, bgroup, defaultMain, whnfIO)
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
import qualified Data.ByteString.Char8 as BS
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
-- DFS uses a stack to store next points to explore, a list can do this
type Stack = [(NodeName, NodeName)]
getBytes :: FilePath -> IO [(Int, Int)]
getBytes path = do
  lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
  let pairs = (map . map) (maybe (error "Can't read integers") fst . BS.readInt) lines
  return [(a,b) | [a,b] <- pairs]
main :: IO ()
main = do
  --list <- getLines' "testdata.txt" -- [String]
  list <- getBytes "SCC.txt" -- [(Int, Int)]
  let list' = createGraph' list
  putStrLn $ show $ list' IM.! 66
  -- return list'
-- Criterion benchmark: call bmark instead of main to run it.
-- whnfIO is used because bmark' is an IO action, not a pure function.
bmark :: IO ()
bmark = defaultMain
  [ bgroup "1"
      [ bench "Sim test" $ whnfIO (bmark' "SCC.txt") ]
  ]
bmark' :: FilePath -> IO ()
bmark' path = do
  list <- getBytes path
  let list' = createGraph' list
  putStrLn $ show $ list' IM.! 2
createGraph' :: [(Int, Int)] -> Graph1
createGraph' xs = foldl' build IM.empty xs
  where
    addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
    addFwd y _ = Just (Node False [y] [])
    addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
    addBwd x _ = Just (Node False [] [x])
    build :: Graph1 -> (Int, Int) -> Graph1
    build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
And now on with the rest of the exercise....
This is not really an answer; I would rather comment on András Kovács's post, if I had those 50 points...
I have implemented the loading of the graph with both IntMap and MVector, in an attempt to benchmark mutability vs. immutability.
Both programs use Attoparsec for the parsing. There is surely a more economical way to do it, but Attoparsec is relatively fast considering its high level of abstraction (the parser fits on one line). The guideline is to avoid String and read: read is partial and slow, and [Char] is slow and not memory efficient unless properly fused.
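To give an idea of what such a one-line parser can look like, here is a sketch (not the code from the Gist). It assumes the attoparsec and bytestring packages and an input format of two non-negative integers per line, separated by a single space; the names edgesP and parseEdges are made up for this example.

```haskell
import Data.Attoparsec.ByteString.Char8
  (Parser, char, decimal, endOfInput, endOfLine, many', parseOnly)
import qualified Data.ByteString.Char8 as BS

-- One edge per line: "<from> <to>\n" (assumed format).
edgesP :: Parser [(Int, Int)]
edgesP = many' ((,) <$> decimal <* char ' ' <*> decimal <* endOfLine)

-- Unlike read, a malformed input yields a Left instead of an exception.
parseEdges :: BS.ByteString -> Either String [(Int, Int)]
parseEdges = parseOnly (edgesP <* endOfInput)
```

Usage would be something like parseEdges `fmap` BS.readFile path.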
As András Kovács noted, IntMap is better than Map for Int keys. My code provides another example of alter usage. If the node identifier mapping is dense, you may also want to use Vector and Array. They allow O(1) indexing by the identifier.
The mutable version grows the MVector exponentially on demand. This avoids having to specify an upper bound on node identifiers, but introduces more complexity (the reference to the vector may change).
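The growth strategy can be sketched as follows. This is a simplified illustration and not the code from the Gist: it assumes the vector package, non-negative node identifiers, and stores successor lists only. The names ensure and successors are made up for this example.

```haskell
import Control.Monad (foldM, forM_)
import Control.Monad.ST (ST, runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Make sure index i is valid, doubling the capacity until it fits.
-- MV.grow returns a new vector (hence "the reference may change"),
-- and the added slots must be initialised before they are read.
ensure :: MV.MVector s [Int] -> Int -> ST s (MV.MVector s [Int])
ensure vec i
  | i < len = return vec
  | otherwise = do
      let newLen = until (> i) (* 2) (max 1 len)
      vec' <- MV.grow vec (newLen - len)
      forM_ [len .. newLen - 1] $ \j -> MV.write vec' j []
      return vec'
  where
    len = MV.length vec

-- Build successor lists without knowing the largest identifier upfront.
successors :: [(Int, Int)] -> V.Vector [Int]
successors edges = runST $ do
  start <- MV.replicate 1 []
  final <- foldM addEdge start edges
  V.freeze final
  where
    addEdge vec (a, b) = do
      vec' <- ensure vec (max a b)
      MV.modify vec' (b :) a
      return vec'
```

Note that the resulting vector may be longer than the largest identifier plus one, because the capacity is always a power of two here.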
I benchmarked with a file of 5M edges with identifiers in the range [0..2^16]. The MVector version is ~2x faster than the IntMap code (12s vs 25s on my computer).
The code is here [Gist].
I will edit when more profiling is done on my side.

Applicative functors analysis

I've been trying to learn about static analysis of applicative functors. Many sources say that an advantage of using them over monads is the susceptibility to static analysis.
However, the only example I can find of actually performing static analysis is too complicated for me to understand. Are there any simpler examples of this?
Specifically, I want to know if I can perform static analysis on recursive applications. For example, something like:
y = f <$> x <*> y <*> z
When analyzing the above code, is it possible to detect that it is recursive on y? Or does referential transparency still prevent this from being possible?
Applicative functors allow static analysis at runtime. This is better explained by a simpler example.
Imagine you want to calculate a value, but also want to track what dependencies that value has. E.g. you may use IO a to calculate the value, and have a list of Strings for the dependencies:
data Input a = Input { dependencies :: [String], runInput :: IO a }
Now we can easily make this an instance of Functor and Applicative. The functor instance is trivial. As it doesn't introduce any new dependencies, you just need to map over the runInput value:
instance Functor (Input) where
  fmap f (Input deps runInput) = Input deps (fmap f runInput)
The Applicative instance is more complicated. The pure function will just return a value with no dependencies. The <*> combinator will concatenate the two lists of dependencies (removing duplicates), and combine the two actions:
instance Applicative Input where
  pure = Input [] . return
  -- nub comes from Data.List
  (Input deps1 getF) <*> (Input deps2 runInput) = Input (nub $ deps1 ++ deps2) (getF <*> runInput)
With that, we can also make an Input a an instance of Num if Num a:
instance (Num a) => Num (Input a) where
  (+) = liftA2 (+)
  (*) = liftA2 (*)
  abs = liftA abs
  signum = liftA signum
  negate = liftA negate -- without this, (-) and negate would loop
  fromInteger = pure . fromInteger
Next, let's make a couple of Inputs:
getTime :: Input UTCTime
getTime = Input { dependencies = ["Time"], runInput = getCurrentTime }
-- | Ideally this would fetch it from somewhere
stockPriceOf :: String -> Input Double
stockPriceOf stock = Input { dependencies = ["Stock ( " ++ stock ++ " )"], runInput = action }
  where
    action = case stock of
      "Apple" -> return 500
      "Toyota" -> return 20
      _ -> ioError (userError ("unknown stock: " ++ stock))
Finally, let's make a value that uses some inputs:
portfolioValue :: Input Double
portfolioValue = stockPriceOf "Apple" * 10 + stockPriceOf "Toyota" * 20
This is a pretty cool value. Firstly, we can find the dependencies of portfolioValue as a pure value:
> :t dependencies portfolioValue
dependencies portfolioValue :: [String]
> dependencies portfolioValue
["Stock ( Apple )","Stock ( Toyota )"]
That is the static analysis that Applicative allows - we know the dependencies without having to execute the action.
We can still get the value of the action though:
> runInput portfolioValue >>= print
5400.0
Now, why can't we do the same with Monad? The reason is Monad can express choice, in that one action can determine what the next action will be.
Imagine there was a Monad interface for Input, and you had the following code:
mostPopularStock :: Input String
mostPopularStock = Input { dependencies = ["Popular Stock"], runInput = readFromWebMostPopularStock }
newPortfolio = do
  stock <- mostPopularStock
  stockPriceOf "Apple" * 40 + stockPriceOf stock * 10
Now, how can we calculate the dependencies of newPortfolio? It turns out we can't do it without using IO! It will depend on the most popular stock, and the only way to know that is to run the IO action. Therefore it isn't possible to statically track dependencies when the type uses Monad, but it is completely possible with just Applicative. This is a good example of why less power often means more usefulness: as Applicative doesn't allow choice, dependencies can be calculated statically.
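The same point can be seen in a stripped-down form with Const from base, which keeps only the "static" part of a computation. The sketch below is an illustration separate from the Input type above; the names Deps, need and portfolioDeps are made up for it. Const [String] has an Applicative instance (it just appends the lists), but it cannot be given a lawful Monad instance, because >>= would need a result value to pass to the next step and there is none.

```haskell
import Data.Functor.Const (Const (..))

-- A computation reduced to its dependency list; the result type a is phantom.
type Deps = Const [String]

-- Declare a single dependency.
need :: String -> Deps a
need name = Const [name]

-- Combining with <*> collects the dependencies of both sides
-- without running anything.
portfolioDeps :: Deps Double
portfolioDeps = (+) <$> need "Stock ( Apple )" <*> need "Stock ( Toyota )"

-- getConst portfolioDeps == ["Stock ( Apple )","Stock ( Toyota )"]
```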
Edit: As for checking whether y is recursive on itself, such a check is possible with applicative functors if you are willing to annotate your function names.
data TrackedComp a = TrackedComp { deps :: [String], recursive :: Bool, run :: a}
instance (Show a) => Show (TrackedComp a) where
  show comp = "TrackedComp " ++ show (run comp)
instance Functor (TrackedComp) where
  fmap f (TrackedComp deps rec1 run) = TrackedComp deps rec1 (f run)
instance Applicative TrackedComp where
  pure = TrackedComp [] False
  (TrackedComp deps1 rec1 getF) <*> (TrackedComp deps2 rec2 value) =
    TrackedComp (combine deps1 deps2) (rec1 || rec2) (getF value)
-- | combine [1,1,1] [2,2,2] = [1,2,1,2,1,2]
combine :: [a] -> [a] -> [a]
combine x [] = x
combine [] y = y
combine (x:xs) (y:ys) = x : y : combine xs ys
instance (Num a) => Num (TrackedComp a) where
  (+) = liftA2 (+)
  (*) = liftA2 (*)
  abs = liftA abs
  signum = liftA signum
  negate = liftA negate -- without this, (-) and negate would loop
  fromInteger = pure . fromInteger
newComp :: String -> TrackedComp a -> TrackedComp a
newComp name tracked = TrackedComp (name : deps tracked) isRecursive (run tracked)
  where
    isRecursive = (name `elem` deps tracked) || recursive tracked
y :: TrackedComp [Int]
y = newComp "y" $ liftA2 (:) x z
x :: TrackedComp Int
x = newComp "x" $ 38
z :: TrackedComp [Int]
z = newComp "z" $ liftA2 (:) 3 y
> recursive x
False
> recursive y
True
> take 10 $ run y
[38,3,38,3,38,3,38,3,38,3]
Yes, applicative functors allow more analysis than monads. But no, you can't observe the recursion. I've written a paper about parsing which explains the problem in detail:
https://lirias.kuleuven.be/bitstream/123456789/352570/1/gc-jfp.pdf
The paper then discusses an alternative encoding of recursion which does allow analysis and has some other advantages and some downsides. Other related work is:
https://lirias.kuleuven.be/bitstream/123456789/376843/1/p97-devriese.pdf
And more related work can be found in the related work sections of those papers...

Resources