I'm learning Monad Transformers and decided to use them to write an interpreter for a simple language (with loop constructs) similar to Brainfuck. I would like to terminate the interpreter after a certain number of statements.
This simple language consists of a single memory cell capable of holding an Int and 5 instructions: Input, Output, Increment, Decrement and Loop. A loop terminates when the value in memory is zero. Input is read from a list and, similarly, output is written to another list. Increment and Decrement do +1 and -1 to the memory respectively.
I'm using a World type to keep track of the input and output streams and the memory, Sum Int to count the number of instructions evaluated, and Except World to terminate evaluation after a certain number of statements.
module Transformers where
import qualified Data.Map as Map
import Data.Maybe
import Control.Monad.State.Lazy
import Control.Monad.Writer.Lazy
import Control.Monad.Except
data Term = Input
| Output
| Increment
| Decrement
| Loop [Term]
deriving (Show)
data World = World {
inp :: [Int],
out :: [Int],
mem :: Int
} deriving Show
op_limit = 5
loop
:: [Term]
-> StateT World (WriterT (Sum Int) (Except World)) ()
-> StateT World (WriterT (Sum Int) (Except World)) ()
loop terms sp = sp >> do
s <- get
if mem s == 0 then put s else loop terms (foldM (\_ t -> eval t) () terms)
limit :: StateT World (WriterT (Sum Int) (Except World)) ()
limit = do
(s, count) <- listen get
when (count >= op_limit) $ throwError s
tick :: StateT World (WriterT (Sum Int) (Except World)) ()
tick = tell 1
eval :: Term -> StateT World (WriterT (Sum Int) (Except World)) ()
eval Input =
limit >> tick >> modify (\s -> s { inp = tail (inp s), mem = head (inp s) })
eval Output = limit >> tick >> modify (\s -> s { out = mem s : out s })
eval Increment = limit >> tick >> modify (\s -> s { mem = mem s + 1 })
eval Decrement = limit >> tick >> modify (\s -> s { mem = mem s - 1 })
eval (Loop terms) = loop terms (void get)
type Instructions = [Term]
interp :: Instructions -> World -> Either World (World, Sum Int)
interp insts w =
let sp = foldM (\_ inst -> eval inst) () insts
in runExcept (runWriterT (execStateT sp w))
Example run in ghci:
*Transformers> interp [Loop [Output, Decrement]] $ World [] [] 5
Right (World {inp = [], out = [1,2,3,4,5], mem = 0},Sum {getSum = 10})
The limit action should, based on count, decide to either fail with the current state or do nothing. But I noticed that count in (s, count) <- listen get is always zero. I don't understand why this is happening. Please help me understand where I went wrong.
Is my ordering of transformers in the stack correct? Are there any rules (informal) to decide the layering?
Computations inside the Writer monad can't have access to their own accumulator. What's more: the accumulator is never forced while the computation runs, not even to WHNF. This applies to both the strict and lazy variants of Writer—the strict variant is strict in a sense unrelated to the accumulator. This unavoidable laziness in the accumulator can be a source of space leaks if the computation runs for too long.
Your limit function is not branching on the value of the "mainline" WriterT accumulator. The get action (you are using mtl) simply reads the state from the StateT layer, and performs no effects in the other layers: it adds mempty to its WriterT accumulator and throws no error.
Then, the listen extracts the Writer accumulator of the get action (only of the get, not of the whole computation) and adds it to the "mainline" accumulator. But this extracted value (the one returned in the tuple) will always be mempty, that is, Sum 0!
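To see this in isolation, here is a tiny self-contained example of mine showing that listen only reports the output of the action it wraps, not the accumulator built up so far:
import Control.Monad.Writer

demo :: Writer (Sum Int) (Sum Int)
demo = do
  tell 5                      -- goes into the "mainline" accumulator
  (_, w) <- listen (pure ())  -- pure () tells nothing, so w is Sum 0
  pure w

-- runWriter demo == (Sum 0, Sum 5)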
Instead of WriterT, you could put the counter in the StateT state, as #chi has mentioned. You could also use AccumT, which is very similar to WriterT but lets you inspect the accumulator (it also lets you force it to WHNF using bang patterns).
AccumT doesn't seem to have a corresponding mtl typeclass though, so you'll need to sprinkle a few lifts in order to use it.
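For instance, a rough sketch (the primed names are mine, not the original code) of folding the counter into the StateT state:
import Control.Monad.State
import Control.Monad.Except

data World' = World'
  { inp'   :: [Int]
  , out'   :: [Int]
  , mem'   :: Int
  , count' :: Int       -- the instruction counter now lives in the state
  } deriving Show

type Eval a = StateT World' (Except World') a

opLimit :: Int
opLimit = 5

tick' :: Eval ()
tick' = modify (\s -> s { count' = count' s + 1 })

limit' :: Eval ()
limit' = do
  s <- get
  when (count' s >= opLimit) (throwError s)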
Related
I'm trying to work out if it's possible to write an abstraction for the following situation. Suppose I have a type a with a function a -> m Bool, e.g. MVar Bool and readMVar. To abstract this concept out I create a newtype wrapper for the type and its function:
newtype MPredicate m a = MPredicate (a,a -> m Bool)
I can define a fairly simple operation like so:
doUnless :: (Monad m) => MPredicate m a -> m () -> m ()
doUnless (MPredicate (a,mg)) g = mg a >>= \b -> unless b g
main = do
b <- newMVar False
let mpred = MPredicate (b,readMVar)
doUnless mpred (print "foo")
In this case doUnless would print "foo". Aside: I'm not sure whether a type class might be more appropriate to use instead of a newtype.
Now take the code below, which outputs an incrementing number then waits a second and repeats. It does this until it receives a "turn off" instruction via the MVar.
foobar :: MVar Bool -> IO ()
foobar mvb = foobar' 0
where
foobar' :: Int -> IO ()
foobar' x = readMVar mvb >>= \b -> unless b $ do
let x' = x + 1
print x'
threadDelay 1000000
foobar' x'
goTillEnter :: MVar Bool -> IO ()
goTillEnter mv = do
_ <- getLine
_ <- takeMVar mv
putMVar mv True
main = do
mvb <- newMVar False
forkIO $ foobar mvb
goTillEnter mvb
Is it possible to refactor foobar so that it uses MPredicate and doUnless?
Ignoring the actual implementation of foobar' I can think of a simplistic way of doing something similar:
cycleUnless :: x -> (x -> x) -> MPredicate m a -> m ()
cycleUnless x g mp = let g' x' = doUnless mp (g' $ g x')
in g' $ g x
Aside: I feel like fix could be used to make the above neater, though I still have trouble working out how to use it
But cycleUnless won't work on foobar because the type of foobar' is actually Int -> IO () (from the use of print x').
I'd also like to take this abstraction further, so that it can work while threading a monad around. With stateful monads it becomes even harder. E.g.
-- EDIT: Updated the below to show an example of how the code is used
{- ^^ some parent function which has the MVar ^^ -}
cycleST :: (forall s. ST s (STArray s Int Int)) -> IO ()
cycleST sta = readMVar mvb >>= \b -> unless b $ do
n <- readMVar someMVar
i <- readMVar someOtherMVar
let sta' = do
arr <- sta
x <- readArray arr n
writeArray arr n (x + i)
return arr
y = runSTArray sta'
print y
cycleST sta'
I have something similar to the above working with RankNTypes. Now there's the additional problem of trying to thread through the existential s, which is not likely to type check if threaded through an abstraction like cycleUnless.
Additionally, this is simplified to make the question easier to answer. I also use a set of semaphores built from MVar [MVar ()] similar to the skip channel example in the MVar module. If I can solve the above problem I plan to generalize the semaphores as well.
Ultimately this isn't some blocking problem. I have 3 components of the application operating in a cycle off the same MVar Bool but doing fairly different asynchronous tasks. In each one I have written a custom function that performs the appropriate cycle.
I'm trying to learn the "don't write large programs" approach. What I'd like to do is refactor chunks of code into their own mini libraries so that I'm not building a large program but assembling lots of small ones. But so far this particular abstraction is escaping me.
Any thoughts on how I might go about this are very much appreciated!
You want to cleanly combine a stateful action having side effects, a delay, and an independent stopping condition.
The iterative monad transformer from the free package can be useful in these cases.
This monad transformer lets you describe a (possibly nonending) computation as a series of discrete steps. And what's better, it lets you interleave "stepped" computations using mplus. The combined computation stops when any of the individual computations stops.
Some preliminary imports:
import Data.Bool
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Iter (delay,untilJust,IterT,retract,cutoff)
import Control.Concurrent
Your foobar function could be understood as a "sum" of three things:
A computation that does nothing but read from the MVar at each step, and finishes when the MVar is True.
untilTrue :: (MonadIO m) => MVar Bool -> IterT m ()
untilTrue = untilJust . liftM guard . liftIO . readMVar
An infinite computation that takes a delay at each step.
delays :: (MonadIO m) => Int -> IterT m a
delays = forever . delay . liftIO . threadDelay
An infinite computation that prints an increasing series of numbers.
foobar' :: (MonadIO m) => Int -> IterT m a
foobar' x = do
let x' = x + 1
liftIO (print x')
delay (foobar' x')
With this in place, we can write foobar as:
foobar :: (MonadIO m) => MVar Bool -> m ()
foobar v = retract (delays 1000000 `mplus` untilTrue v `mplus` foobar' 0)
The neat thing about this is that you can change or remove the "stopping condition" and the delay very easily.
Some clarifications:
The delay function is not a delay in IO, it just tells the iterative monad transformer to "put the argument in a separate step".
retract brings you back from the iterative monad transformer to the base monad. It's like saying "I don't care about the steps, just run the computation". You can combine retract with cutoff if you want to limit the maximum number of iterations.
untilJust converts a value m (Maybe a) of the base monad into an IterT m a by retrying in each step until a Just is returned. Of course, this risks non-termination!
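For instance, combining retract with cutoff as mentioned above, here is a sketch of mine (building on the definitions above) that limits the combined computation to at most 10 steps:
-- Nothing means the computation was cut off; Just () means it finished on
-- its own (here, because the MVar became True).
foobarLimited :: (MonadIO m) => MVar Bool -> m (Maybe ())
foobarLimited v =
  retract . cutoff 10 $ delays 1000000 `mplus` untilTrue v `mplus` foobar' 0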
MPredicate is rather superfluous here; m Bool can be used instead. The monad-loops package contains plenty of control structures with m Bool conditions. whileM_ in particular is applicable here, although we need to include a State monad for the Int that we're threading around:
import Control.Monad.State
import Control.Monad.Loops
import Control.Applicative
foobar :: MVar Bool -> IO ()
foobar mvb = (`evalStateT` (0 :: Int)) $
whileM_ (not <$> lift (readMVar mvb)) $ do
modify (+1)
lift . print =<< get
lift $ threadDelay 1000000
Alternatively, we can use a monadic version of unless. For some reason monad-loops doesn't export such a function, so let's write it:
unlessM :: Monad m => m Bool -> m () -> m ()
unlessM mb action = do
b <- mb
unless b action
It's somewhat more convenient and more modular in a monadic setting, since we can always go from a pure Bool to m Bool, but not vice versa.
foobar :: MVar Bool -> IO ()
foobar mvb = go 0
where
go :: Int -> IO ()
go x = unlessM (readMVar mvb) $ do
let x' = x + 1
print x'
threadDelay 1000000
go x'
You mentioned fix; sometimes people indeed use it for ad-hoc monadic loops, for example:
printUntil0 :: IO ()
printUntil0 = do
putStrLn "hello"
fix $ \loop -> do
n <- fmap read getLine :: IO Int
print n
when (n /= 0) loop
putStrLn "bye"
With some juggling it's possible to use fix with multi-argument functions. In the case of foobar:
foobar :: MVar Bool -> IO ()
foobar mvb = ($(0 :: Int)) $ fix $ \loop x -> do
unlessM (readMVar mvb) $ do
let x' = x + 1
print x'
threadDelay 1000000
loop x'
I'm not sure what your MPredicate is doing.
First, instead of newtyping a tuple, it's probably better to use a normal algebraic data type
data MPredicate a m = MPredicate a (a -> m Bool)
Second, the way you use it, MPredicate is equivalent to m Bool.
Haskell is lazy, therefore there is no need to pass a function and its argument (even though
it's useful in strict languages). Just pass the result, and the function will be called when needed.
I mean, instead of passing (x, f) around, just pass f x.
Of course, if you are not trying to delay the evaluation and really need the argument or the function, as well as the result, at some point, then a tuple is fine.
Anyway, in the case where your MPredicate is only there to delay the function evaluation, MPredicate reduces to m Bool and doUnless to a monadic version of unless.
Your first example is then strictly equivalent to:
main = do
    b <- newMVar False
    readMVar b >>= \v -> unless v (print "foo")
Now, if you want to loop a monadic action until a condition is reached (or similar) you should have a look at the monad-loops package. What you are looking for is probably untilM_ or one of its variants.
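For example, a sketch of mine using monad-loops (the names here are my own):
import Control.Concurrent
import Control.Monad.Loops (untilM_)

-- Repeat the body until the MVar reads True; note that untilM_ checks the
-- condition after each run of the body.
loopUntilSet :: MVar Bool -> IO ()
loopUntilSet mvb =
  (putStrLn "working" >> threadDelay 1000000) `untilM_` readMVar mvb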
I have a conduit pipeline processing a long file. I want to print a progress report for the user every 1000 records, so I've written this:
-- | Every n records, perform the IO action.
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = skipN n 1
where
skipN c t = do
mv <- await
case mv of
Nothing -> return ()
Just v ->
if c <= 1
then do
liftIO $ act t v
yield v
skipN n (succ t)
else do
yield v
skipN (pred c) (succ t)
No matter what action I call this with, it leaks memory, even if I just tell it to print a full stop.
As far as I can see the function is tail recursive and both counters are regularly forced (I tried putting "seq c" and "seq t" in, to no avail). Any clue?
If I put in an "awaitForever" that prints a report for every record then it works fine.
Update 1: This occurs only when compiled with -O2. Profiling indicates that the leaking memory is allocated in the recursive "skipN" function and being retained by "SYSTEM" (whatever that means).
Update 2: I've managed to cure it, at least in the context of my current program. I've replaced the function above with this. Note that "proc" is of type "Int -> Int -> Maybe i -> m ()": to use it you call "await" and pass it the result. For some reason swapping over the "await" and "yield" solved the problem. So now it awaits the next input before yielding the previous result.
-- | Every n records, perform the monadic action.
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = await >>= proc 1 n
where
proc c t = seq c $ seq t $ maybe (return ()) $ \v ->
if c <= 1
then {-# SCC "progress.then" #-} do
liftIO $ act t v
v1 <- await
yield v
proc n (succ t) v1
else {-# SCC "progress.else" #-} do
v1 <- await
yield v
proc (pred c) (succ t) v1
So if you have a memory leak in a Conduit, try swapping the yield and await actions.
This isn't an answer but it is some complete code I hacked up for testing. I don't know conduit at all, so it may not be the best conduit code. I've forced everything that seems like it needs to be forced, but it still leaks.
{-# LANGUAGE BangPatterns #-}
import Data.Conduit
import Data.Conduit.List
import Control.Monad.IO.Class
-- | Every n records, perform the IO action.
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = skipN n 1
where
skipN !c !t = do
mv <- await
case mv of
Nothing -> return ()
Just !v ->
if (c :: Int) <= 1
then do
liftIO $ act t v
yield v
skipN n (succ t)
else do
yield v
skipN (pred c) (succ t)
main :: IO ()
main = unfold (\b -> b `seq` Just (b, b+1)) 1
$= progress 100000 (\_ b -> print b)
$$ fold (\_ _ -> ()) ()
On the other hand,
main = unfold (\b -> b `seq` Just (b, b+1)) 1 $$ fold (\_ _ -> ()) ()
does not leak, so something in progress does indeed seem to be the problem. I can't see what.
EDIT: The leak only occurs with ghci! If I compile a binary and run it there is no leak (I should have tested this earlier ...)
I think Tom's answer is the right one; I'm starting this as a separate answer as it will likely introduce some new discussion (and because it's too long for just a comment). In my testing, replacing the print b in Tom's example with return () gets rid of the memory leak. This made me think that the problem is in fact with print, not conduit. To test this theory, I wrote a simple helper function in C (placed in helper.c):
#include <stdio.h>
void helper(int c)
{
printf("%d\n", c);
}
Then I foreign imported this function in the Haskell code:
foreign import ccall "helper" helper :: Int -> IO ()
and I replaced the call to print with a call to helper. The output from the program is identical, but I see no leak, and a max residency of 32kb vs 62kb (I also modified the code to stop at 10m records for better comparison).
I see similar behavior when I cut out conduit entirely, e.g.:
main :: IO ()
main = forM_ [1..10000000] $ \i ->
when (i `mod` 100000 == 0) (helper i)
I'm not convinced, however, that this is really a bug in print or Handle. My testing never showed the leak reaching any substantial memory usage, so it could just be that a buffer is growing towards a limit. I'd have to do more research to understand this better, but I wanted to first see if this analysis meshes with what others are seeing.
I know it's two years later, but I suspect what's happening is that full laziness is lifting part of the body that follows the await to before the await, and this is causing a space leak. It looks similar to the case in the "Increasing Sharing" section of my blog post on this very topic.
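For a feel of the effect, here is a toy example of mine (not the conduit code) of the kind of sharing full laziness can introduce:
-- The list below does not depend on y, so with -O (full laziness is on by
-- default) GHC may float it to the top level, where it is shared and retained
-- across calls to g instead of being rebuilt and collected each time.
g :: Int -> Int
g y = length [1 .. 1000000 :: Int] + y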
Being quite new to Haskell, I'm currently trying to improve my skills by writing an interpreter for a simple imperative toy language.
One of the expressions in this language is input, which reads a single integer from standard input. However, when I assign the value of this expression to a variable and then use this variable later, it seems to me that I actually stored the computation of reading a value rather than the read value itself. This means that e.g. the statements
x = input;
y = x + x;
will cause the interpreter to invoke the input procedure three times rather than once.
Internally in the evaluator module, I use a Map to store the values of variables. Because I need to deal with IO, this gets wrapped in an IO monad, as immortalized in the following minimal example:
import qualified Data.Map as Map
type State = Map.Map String Int
type Op = Int -> Int -> Int
input :: String -> IO State -> IO State
input x state = do line <- getLine
st <- state
return $ Map.insert x (read line) st
get :: String -> IO State -> IO Int
get x state = do st <- state
return $ case Map.lookup x st of
Just i -> i
eval :: String -> Op -> String -> IO State -> IO Int
eval l op r state = do i <- get l state
j <- get r state
return $ op i j
main :: IO ()
main = do let state = return Map.empty
let state' = input "x" state
val <- eval "x" (+) "x" state'
putStrLn . show $ val
The second line in the main function simulates the assignment of x, while the third line simulates the evaluation of the binary + operator.
My question is: How do I get around this, such that the code above only inputs once? I suspect that it is the IO-wrapping that causes the problem, but as we're dealing with IO I see no way out of that..?
Remember that IO State is not an actual state, but instead the specification for an IO machine which eventually produces a State. Let's consider input as an IO-machine transformer
input :: String -> IO State -> IO State
input x state = do line <- getLine
st <- state
return $ Map.insert x (read line) st
Here, provided a machine for producing a state, we create a bigger machine which takes that passed state and adds a read from an input line. Again, to be clear, input name st is an IO-machine which is a slight modification of the IO-machine st.
Let's now examine get
get :: String -> IO State -> IO Int
get x state = do st <- state
return $ case Map.lookup x st of
Just i -> i
Here we have another IO-machine transformer. Given a name and an IO-machine which produces a State, get will produce an IO-machine which returns a number. Note again that get name st is fixed to always use the state produced by the (fixed, input) IO-machine st.
Let's combine these pieces in eval
eval :: String -> Op -> String -> IO State -> IO Int
eval l op r state = do i <- get l state
j <- get r state
return $ op i j
Here we call get l and get r each on the same IO-machine state and thus produce two (completely independent) IO-machines get l state and get r state. We then evaluate their IO effects one after another and return the op-combination of their results.
Let's examine the kinds of IO-machines built in main. In the first line we produce a trivial IO-machine, called state, written return Map.empty. This IO-machine, each time it's run, performs no side effects in order to return a fresh, blank Map.Map.
In the second line, we produce a new kind of IO-machine called state'. This IO-machine is based off of the state IO-machine, but it also requests an input line. Thus, to be clear, each time state' runs, a fresh Map.Map is generated and then an input line is read and parsed as an Int, which is stored at "x".
It should be clear where this is going, but now when we examine the third line we see that we pass state', the IO-machine, into eval. Previously we stated that eval runs its input IO-machine twice, once for each name, and then combines the results. By this point it should be clear what's happening.
All together, we build a certain kind of machine which draws input and reads it as an integer, assigning it to a name in a blank Map.Map. We then build this IO-machine into a larger one which uses the first IO-machine twice, in two separate invocations, in order to collect data and combine it with an Op.
Finally, we run this eval machine using do notation (the (<-) arrow indicates running the machine). Clearly it should collect two separate lines.
So what do we really want to do? Well, we need to simulate ambient state in the IO monad, not just pass around Map.Maps. This is easy to do by using an IORef.
import Data.IORef
input :: IORef State -> String -> IO ()
input ref name = do
line <- getLine
modifyIORef ref (Map.insert name (read line))
eval :: IORef State -> Op -> String -> String -> IO Int
eval ref op l r = do
stateSnapshot <- readIORef ref
let Just i = Map.lookup l stateSnapshot
Just j = Map.lookup r stateSnapshot
return (op i j)
main = do
st <- newIORef Map.empty -- create a blank state, embedded into IO, not a value
input st "x" -- request input *once*
val <- eval st (+) "x" "x" -- compute the op
putStrLn . show $ val
It's fine to wrap your actions such as getLine in IO, but to me it looks like your problem is that you're trying to pass your state around in the IO monad. Instead, I think this is the time for you to get introduced to monad transformers and how they'll let you layer the IO and State monads to get the functionality of both in one.
Monad transformers are a pretty complex topic and it'll take a while to get to where you're comfortable with them (I'm still learning new things all the time about them), but they're a very useful tool when you need to layer multiple monads. You'll need the mtl library to follow this example.
First, imports
import qualified Data.Map as Map
import Control.Monad.State
Then types
type Op = Int -> Int -> Int
-- Renamed to not conflict with Control.Monad.State.State
type AppState = Map.Map String Int
type Interpreter a = StateT AppState IO a
Here Interpreter is the Monad in which we'll build our interpreter. We also need a way to run the interpreter
-- A utility function for kicking off an interpreter
runInterpreter :: Interpreter a -> IO a
runInterpreter interp = evalStateT interp Map.empty
I figured defaulting to Map.empty was sufficient.
Now, we can build our interpreter actions in our new monad. First we start with input. Instead of returning our new state, we just modify what is current in our map:
input :: String -> Interpreter ()
input x = do
-- IO actions have to be passed to liftIO
line <- liftIO getLine
-- modify is a member of the MonadState typeclass, which StateT implements
modify (Map.insert x (read line))
I had to rename get so that it didn't conflict with get from Control.Monad.State, but it does basically the same thing as before, it just takes our map and looks up that variable in it.
-- Had to rename to not conflict with Control.Monad.State.get
-- Also returns Maybe Int because it's safer
getVar :: String -> Interpreter (Maybe Int)
getVar x = do
-- get is a member of MonadState
vars <- get
return $ Map.lookup x vars
-- or
-- getVar x = fmap (Map.lookup x) get
Next, eval now just looks up each variable in our map, then uses liftM2 to keep the return value as Maybe Int. I prefer the safety of Maybe, but you can rewrite it if you prefer
eval :: String -> Op -> String -> Interpreter (Maybe Int)
eval l op r = do
i <- getVar l
j <- getVar r
-- liftM2 op :: Maybe Int -> Maybe Int -> Maybe Int
return $ liftM2 op i j
Finally, we write our sample program. It stores user input to the variable "x", adds it to itself, and prints out the result.
-- Now we can write our actions in our own monad
program :: Interpreter ()
program = do
input "x"
y <- eval "x" (+) "x"
case y of
Just y' -> liftIO $ putStrLn $ "y = " ++ show y'
Nothing -> liftIO $ putStrLn "Error!"
-- main is kept very simple
main :: IO ()
main = runInterpreter program
The basic idea is that there is a "base" monad, here IO, and these actions are "lifted" up to the "parent" monad, here StateT AppState. There is a typeclass implementation for the different state operations get, put, and modify in the MonadState typeclass, which StateT implements, and in order to lift IO actions there's a pre-made liftIO function that "lifts" IO actions to the parent monad. Now we don't have to worry about passing around our state explicitly, we can still perform IO, and it has even simplified the code!
I would recommend reading the Real World Haskell chapter on monad transformers to get a better feel for them. There are other useful ones as well, such as ErrorT for handling errors, ReaderT for static configuration, WriterT for aggregating results (usually used for logging), and many others. These can be layered into what is called a transformer stack, and it's not too difficult to make your own either.
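As an illustration of such a stack, here is a small example of my own (using ExceptT, the modern replacement for ErrorT; the Config, App and runApp names are made up for this sketch):
import Control.Monad.Reader
import Control.Monad.Except
import Control.Monad.State

data Config = Config { verbose :: Bool }

-- Reader for static configuration, Except for errors, State for mutable
-- interpreter state, IO at the base.
type App a = ReaderT Config (ExceptT String (StateT Int IO)) a

-- Peel the layers off in the reverse order of the type.
runApp :: Config -> Int -> App a -> IO (Either String a, Int)
runApp cfg s0 app = runStateT (runExceptT (runReaderT app cfg)) s0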
Instead of passing an IO State, you can pass State and then use higher-level functions to deal with IO. You can go further and make get and eval free from side-effects:
input :: String -> State -> IO State
input x state = do
line <- getLine
return $ Map.insert x (read line) state
get :: String -> State -> Int
get x state = case Map.lookup x state of
Just i -> i
eval :: String -> Op -> String -> State -> Int
eval l op r state = let i = get l state
j = get r state
in op i j
main :: IO ()
main = do
let state = Map.empty
state' <- input "x" state
let val = eval "x" (+) "x" state'
putStrLn . show $ val
If you're actually building an interpreter, you'll presumably have a list of instructions to execute at some point.
This is my rough translation of your code (although I'm only a beginner myself)
import Data.Map (Map, empty, insert, (!))
import Control.Monad (foldM)
type ValMap = Map String Int
instrRead :: String -> ValMap -> IO ValMap
instrRead varname mem = do
putStr "Enter an int: "
line <- getLine
let intval = (read line)::Int
return $ insert varname intval mem
instrAdd :: String -> String -> String -> ValMap -> IO ValMap
instrAdd varname l r mem = do
return $ insert varname result mem
where result = (mem ! l) + (mem ! r)
apply :: ValMap -> (ValMap -> IO ValMap) -> IO ValMap
apply mem instr = instr mem
main = do
let mem0 = empty
let instructions = [ instrRead "x", instrAdd "y" "x" "x" ]
final <- foldM apply mem0 instructions
print (final ! "y")
putStrLn "done"
The foldM call applies a function (apply) to a start value (mem0) and a list (instructions), like an ordinary fold, but does so within a monad.
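To make the mechanics concrete, for this two-instruction list the fold unrolls roughly like this (a sketch of mine relying on the definitions above):
-- foldM :: Monad m => (b -> a -> m b) -> b -> [a] -> m b (specialised to lists)
final' :: IO ValMap
final' = do
  m1 <- apply empty (instrRead "x")
  apply m1 (instrAdd "y" "x" "x")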
I have a computation that is best described as iterative mutations on a vector; the final result is the final state of the vector.
The "idiomatic" approach to making this functional, I think, is to simply pass on a new vector object along whenever it is "modified". So your iterative method would be operate_on_vector :: Vector -> Vector, which takes in a vector and outputs the modified vector, which is then fed through the method again.
This method is pretty straightforward and I had no problems implementing it, even being new to Haskell.
Alternatively, one could encapsulate all of this in a State monad and pass along a constantly re-created and modified vector as the state value.
However, I suffer a huge, huge performance cost, as these calculations are pretty intensive, the iterations many (on the order of millions) and the data vectors can get pretty large (on the order of at least thousands of primitives). Re-creating a new vector in memory at every step of the iteration seems pretty costly, garbage collection or not.
Then I considered how IO works -- it can be seen as basically like State, except the state value is the "World", which is constantly changing.
Maybe I could use something that is like IO to "operate" on a "world"? And the "world" would be the vector in-memory? Sort of like a database query, but everything is in memory.
For example, with IO you could do
do
putStrLn "enter something"
something <- getLine
putStrLn $ "you entered " ++ something
which can be seen as "performing" putStrLn and "modifying" the World object, returning a new World object and feeding it into the next function, which queries the world object for a string that is the result of the modification, and then returns another world object after another modification.
Is there anything like that that can do this for mutable vectors?
do
putInVec 0 9 -- index 0, value 9
val <- getFromVec 0
putInVec 0 (val + 1)
, with "impure" "mutable" vectors, instead of passing along a new modified vector at each step.
I believe you can do this using a mutable vector and a thin wrapper over the Reader + ST (or IO) monad.
It can look like this:
type MyVector = IOVector $x -- Use your own elements type here instead of $x
newtype VectorIO a = VectorIO (ReaderT MyVector IO a) deriving (Functor, Applicative, Monad, MonadReader MyVector, MonadIO)
-- You will need GeneralizedNewtypeDeriving extension here
-- Run your computation over an existing vector
runComputation :: MyVector -> VectorIO a -> IO MyVector
runComputation vector (VectorIO action) = runReaderT action vector >> return vector
-- Run your computation over a new vector of the specified length
runNewComputation :: Int -> VectorIO a -> IO MyVector
runNewComputation n action = do
vector <- new n
runComputation vector action
putInVec :: Int -> $x -> VectorIO ()
putInVec idx val = do
v <- ask
liftIO $ write v idx val
getFromVec :: Int -> VectorIO $x
getFromVec idx = do
v <- ask
liftIO $ read v idx
That's really all. You can use VectorIO monad to perform your computations, just like you wanted in your example. If you do not want IO but want pure computations, you can use ST monad; modifications to the code above will be trivial.
Update
Here is an ST-based version:
{-# LANGUAGE GeneralizedNewtypeDeriving, FlexibleInstances, MultiParamTypeClasses, Rank2Types #-}
module Main where
import Control.Monad
import Control.Monad.Trans.Class
import Control.Monad.Reader
import Control.Monad.Reader.Class
import Control.Monad.ST
import Data.Vector as V
import Data.Vector.Mutable as MV
-- Your type of the elements
type E = Int
-- Mutable vector which will be used as a context
type MyVector s = MV.STVector s E
-- Immutable vector compatible with MyVector in its type
type MyPureVector = V.Vector E
-- Simple monad stack consisting of a reader with the mutable vector as a context
-- and of an ST action
newtype VectorST s a = VectorST (ReaderT (MyVector s) (ST s) a) deriving (Functor, Applicative, Monad)
-- Make the VectorST a reader monad
instance MonadReader (MyVector s) (VectorST s) where
ask = VectorST $ ask
local f (VectorST a) = VectorST $ local f a
reader = VectorST . reader
-- Lift an ST action to a VectorST action
liftST :: ST s a -> VectorST s a
liftST = VectorST . lift
-- Run your computation over an existing vector
runComputation :: MyVector s -> VectorST s a -> ST s (MyVector s)
runComputation vector (VectorST action) = runReaderT action vector >> return vector
-- Run your computation over a new vector of the specified length
runNewComputation :: Int -> VectorST s a -> ST s (MyVector s)
runNewComputation n action = do
vector <- MV.new n
runComputation vector action
-- Run a computation on a new mutable vector and then freeze it to an immutable one
runComputationPure :: Int -> (forall s. VectorST s a) -> MyPureVector
runComputationPure n action = runST $ do
vector <- runNewComputation n action
V.unsafeFreeze vector
-- Put an element into the current vector
putInVec :: Int -> E -> VectorST s ()
putInVec idx val = do
v <- ask
liftST $ MV.write v idx val
-- Retrieve an element from the current vector
getFromVec :: Int -> VectorST s E
getFromVec idx = do
v <- ask
liftST $ MV.read v idx
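For completeness, here is a small usage sketch of my own for the ST version (assuming the module above is in scope; example, prog and this main are not part of the original answer):
-- Build a 3-element vector, fill it, and freeze it into an immutable vector.
example :: MyPureVector
example = runComputationPure 3 prog
  where
    prog :: VectorST s ()
    prog = do
      putInVec 0 9
      x <- getFromVec 0
      putInVec 1 (x + 1)
      putInVec 2 42

main :: IO ()
main = print example -- the frozen vector holding 9, 10 and 42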
I am building some moderately large DIMACS files, however with the method used below the memory usage is rather large compared to the size of the files generated, and on some of the larger files I need to generate I run in to out of memory problems.
import Control.Monad.State.Strict
import Control.Monad.Writer.Strict
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import qualified Text.Show.ByteString as BS
import Data.List
main = printDIMACS "test.cnf" test
test = do
xs <- freshs 100000
forM_ (zip xs (tail xs))
(\(x,y) -> addAll [[negate x, negate y],[x,y]])
type Var = Int
type Clause = [Var]
data DIMACSS = DS{
nextFresh :: Int,
numClauses :: Int
} deriving (Show)
type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a
freshs :: Int -> DIMACSM [Var]
freshs i = do
next <- gets nextFresh
let toRet = [next..next+i-1]
modify (\s -> s{nextFresh = next+i})
return toRet
fresh :: DIMACSM Int
fresh = do
i <- gets nextFresh
modify (\s -> s{nextFresh = i+1})
return i
addAll :: [Clause] -> DIMACSM ()
addAll c = do
tell
(B.concat .
intersperse (B.pack " 0\n") .
map (B.unwords . map BS.show) $ c)
tell (B.pack " 0\n")
modify (\s -> s{numClauses = numClauses s + length c})
add h = addAll [h]
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
(s,b) = runWriter (execStateT f (DS 1 0))
i = nextFresh s - 1
j = numClauses s
I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?
If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them as such, either using laziness or a more explicit approach such as conduits, enumerators, pipes or the like.
The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:
The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.
The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: run it twice, first throwing away the clauses and just calculating the numbers, print the header line, then run the generator again, this time printing the clauses.
(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)
Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.
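A sketch of that change, reusing the question's types (this is my reading of the suggestion):
add :: Clause -> DIMACSM ()
add c = do
  tell (B.unwords (map BS.show c))
  tell (B.pack " 0\n")
  modify (\s -> s{numClauses = numClauses s + 1})

addAll :: [Clause] -> DIMACSM ()
addAll = mapM_ add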
As explained in Joachim Breitner's answer, the problem was that DIMACSM was not lazy enough, both because the strict versions of the monads were used and because the number of variables and clauses are needed before the ByteString can be written to the file. The solution is to use the lazy versions of the monads and execute them twice. It turns out that it is also necessary to have WriterT be the outer monad:
import Control.Monad.State
import Control.Monad.Writer
...
type DIMACSM a = WriterT B.ByteString (State DIMACSS) a
...
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
s = execState (execWriterT f) (DS 1 0)
b = evalState (execWriterT f) (DS 1 0)
i = nextFresh s - 1
j = numClauses s