We're working on a model filesystem that uses a state monad internally. We have a type class with operations like these:
class Monad m => FS m where
isDirectory :: Path -> m Bool
children :: Path -> m [Path]
...
We're working on a little interactive interpreter that will offer commands like cd, ls, cat, and so on. An operation in the interpreter can be written this way:
fsop :: FS m => Operation -> m Response
The definitions of Operation and Response aren't important; if you like, take them to be strings.
The problem I am trying to solve is to write a top-level loop in the I/O monad that interprets filesystem Operations and prints responses. If IO were an instance of FS (that is, if we were working directly with the IO monad), life would be simple: we could write
loop :: Path -> IO ()
loop currentDir = do
op <- getLine
case read op of
ChangeDir d -> loop d -- should test 'isDirectory d', but let's not
Ls -> do { files <- children currentDir
; mapM_ putStrLn files
; loop currentDir }
Exit -> return ()
But that's not what I want. I want to use Control.Monad.State:
newtype Filesystem a = Filesystem (State (Data.Map.Map Path Contents) a)
and to declare
instance Monad Filesystem ...
instance FS Filesystem ...
Using the FS abstraction, I can write a single-step function that should work with any instance, and indeed the following code compiles:
step :: FS fs => Path -> Operation -> fs (Path, Response)
step currentDir op =
case op of
ChangeDir d -> return (d, "")
Ls -> do { files <- children currentDir
; return (currentDir, unlines files) }
At this point I am totally stuck. What I want to do is write an interactive loop in the IO monad, which can read Operations and print Responses, but which works on a state monad that is not necessarily IO. (One of the reasons for having a model that is not in IO is so that we can test QuickCheck properties.)
I feel like this has to be a standard problem—an interactive read-eval-print loop on top of a stateful abstraction that is not IO—but I must be missing something breathtakingly obvious because I can't seem to figure it out. I've looked online but have not been enlightened.
Any help writing an interactive, IO-performing computation that can call step would be greatly appreciated.
What about using monad transformers? They are the more or less standard way to combine monads. Here is a simplistic example:
import Control.Monad.State
import Control.Monad.IO.Class (liftIO)

type Foo a = StateT String IO a

replT :: Foo ()
replT = do
  str <- liftIO getLine
  state <- get
  liftIO $ putStrLn ("current state: " ++ state)
  liftIO $ putStrLn ("setting state: " ++ str)
  put str
  replT
Below are results of running replT from within ghci.
*Main> runStateT replT "Initial state"
asd
current state: Initial state
setting state: asd
zxc
current state: asd
setting state: zxc
asdasd
There are three monad transformer libraries: mtl, transformers, and monadLib. I can't recommend one over the others, since I don't use them much.
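To tie this back to the FS class from the question, one possible sketch (my own, untested against the real types) makes the transformer stack itself an FS instance, so the pure model state and IO can coexist. Path and Contents are stand-ins here, since the question leaves them abstract:
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Map as M
import Control.Monad.State
import Control.Monad.IO.Class (liftIO)

type Path = String            -- assumed placeholder
type Contents = [Path]        -- assumed placeholder

class Monad m => FS m where
  isDirectory :: Path -> m Bool
  children    :: Path -> m [Path]

-- FS instance for a state transformer over IO: the model lives in the Map,
-- while liftIO is still available for reading commands and printing.
instance FS (StateT (M.Map Path Contents) IO) where
  isDirectory p = gets (M.member p)
  children p    = gets (M.findWithDefault [] p)

lsAndPrint :: Path -> StateT (M.Map Path Contents) IO ()
lsAndPrint dir = do
  files <- children dir
  liftIO (mapM_ putStrLn files)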
Disclaimer: I can't promise the following is a good way to go about it, but working through it sounds fun. Let's take it for a spin, shall we?
A few obligatory imports
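(The imports themselves didn't survive in this copy; judging from the names used below, they were presumably along these lines.)
import qualified Data.Map as M
import Data.Maybe (fromMaybe)
import Control.Arrow (first)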
First, let's toss some data types out there. I'm going to fill in some details and tweak things a bit, in order to define a simple "file system" that we can actually interact with.
type Path = String
type Response = Maybe String
type Contents = [String]
data Operation = Cd Path
| Ls
| MkDir Path
| Quit
deriving (Read, Show)
Next, we'll do something a bit edgy... strip out all the monads. What? This is madness! Perhaps, but sometimes all the hidden plumbing that >>= provides hides things just a bit too much.
For the file system itself, we'll just store the current working directory and a map from paths to their children. We'll also need a handful of functions to interact with it.
data Filesystem = Filesystem { wd :: Path, files :: M.Map Path Contents }
deriving Show
newFS = Filesystem "/" (M.singleton "/" [])
isDirectory p fs = M.member p $ files fs
children p fs = fromMaybe [] . M.lookup p $ files fs
cd p fs = fs { wd = p }
create p fs = let newPath = wd fs ++ p ++ "/"
addPath = M.insert newPath [] . M.adjust (p:) (wd fs)
in (newPath, fs { files = addPath $ files fs })
Now for a monad-less version of the step function. It needs to take an Operation and a Filesystem, and return a Response and a (possibly modified) Filesystem:
step :: Operation -> Filesystem -> (Response, Filesystem)
step (Cd d) fs = (Just "Ok\n", cd d fs)
step (MkDir d) fs = first (\d -> Just $ "Created " ++ d ++ "\n") $ create d fs
step Ls fs = let files = children (wd fs) fs
in (Just $ unlines files, fs)
step Quit fs = (Nothing, fs)
...hmm, that type signature already looks a lot like the guts of a State monad. Oh well, just ignore it for now, and charge blindly onward.
Now, what we want is a function that will provide a general-purpose interface to a Filesystem interpreter. Particularly, we want the interface to be at least somewhat self-contained so that whatever uses the interface doesn't have to step through manually, yet we want the interface to be sufficiently oblivious to the code using it that we can wire it up to the IO monad, some other Monad, or even no monad at all.
What this tells us primarily is that we'll need to interleave the external code with the interpreter in some fashion, rather than having either part be in control. Now, Haskell is a functional language, so that means that using lots of higher-order functions is good, right? Sounds plausible to me, so here's the strategy we'll use: If a function doesn't know what to do next, we'll hand it another function that we assume does. Repeat until everybody knows what's going on. A flawless plan, no?
The heart of it all is the step function, so we'll start by just calling that.
interp1 :: Operation -> Filesystem -> (Response, Filesystem)
interp1 op fs = step op fs
...well, it's a start. I guess. But wait, where is the Operation coming from? We need the external code to provide that, but we can't just ask for it without getting all mixed up with unsavory characters like IO. So we get another function to do the dirty work for us:
interp2 :: ((Operation -> (Response, Filesystem)) -> t) -> Filesystem -> t
interp2 inp fs = inp (\op -> step op fs)
Of course, now all we have is some stupid t that we don't even know what it is. We know it has to have a Response and a Filesystem in it somewhere, but we can't do anything with it, so we'll hand it back to another function, along with some instructions on how to proceed... which will of course involve passing in yet more functions. It's functions all the way down, you know.
interp3 :: ((Operation -> (Response, Filesystem)) -> a)
-> (a -> ((Response, Filesystem) -> b) -> c)
-> (Filesystem -> b)
-> (String -> Filesystem -> b)
-> Filesystem
-> c
interp3 inp check done out fs = check (inp (\op -> step op fs)) test
where test (Nothing, fs) = done fs
test (Just s, fs) = out s fs
...well that's pretty ugly. But don't worry, all is going according to plan. We can make a couple observations next:
The type a only exists between inp and check, so in hindsight, we might as well combine them ahead of time and just pass the composed function to the interpreter.
When we call done, it ought to mean exactly what it says on the tin. So the return type for done should be the same as the whole interpreter, meaning b and c ought to be the same type.
Now, if done ends the whole thing, what's out? As the name none-too-subtly implies, it's providing output to the external code, but where does it go after that? It needs to loop back into the interpreter somehow, and we might note that our interpreter is not yet recursive. The way forward is clear--the interpreter, like Jormungand, thus seizes its own tail; looping back around indefinitely till the interpretation finishes (or until Ragnarök, whichever comes first).
interp4 :: ((Operation -> (Response, Filesystem))
-> ((Response, Filesystem) -> r) -> r)
-> (Filesystem -> r)
-> (String -> Filesystem -> (Filesystem -> r) -> r)
-> Filesystem
-> r
interp4 checkInp done out fs = checkInp (\op -> step op fs) test
where loop = interp4 checkInp done out
test (Nothing, fs) = done fs
test (Just s, fs) = out s fs loop
...oh, did I mention that it works now? No, seriously!
Here's some IO code to use the interface:
ioIn f k = putStr "> " >> (k . f =<< readLn)
ioDone fs = putStrLn "Done" >> return fs
ioOut x fs k = putStr x >> k fs
ioInterp :: IO Filesystem
ioInterp = interp4 ioIn ioDone ioOut newFS
And here's code that runs a list of commands, producing a list of output strings:
scriptIn f k (x:xs) = k (f x) xs
scriptDone fs xs = ["Done\n"]
scriptOut r fs k xs = r : k fs xs
scriptInterp :: [Operation] -> [String]
scriptInterp = interp4 scriptIn scriptDone scriptOut newFS
Examples of running both in GHCi here, if just the code doesn't tickle your imagination sufficiently.
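For a rough idea, here is one hand-evaluated run of the script version, worked out from the definitions above (a sketch, not a captured GHCi session):
example :: Bool
example = scriptInterp [MkDir "foo", Ls, Quit]
          == ["Created /foo/\n", "foo\n", "Done\n"]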
Well, that's that. Or is it? Frankly, that interpreter is code only a mother could love. Is there something that would tie it all together elegantly? Something to reveal the underlying structure of the code?
...okay, so it's pretty obvious where this leads. The overall design of functions tail-calling each other in circles looks an awful lot like continuation-passing style, and not once but twice in the interpreter's type signature can be found the characteristic pattern (foo -> r) -> r, better known as the continuation monad.
Unfortunately, even after all that, continuations make my head hurt and I'm not sure how best to disentangle the very ad-hoc structure of the interpreter into a computation running in a MonadCont.
I can think of two solutions here:
1) Use a monad transformer library. I can't improve on Shimuuar's reply, except in some details on the libraries. Transformers by itself doesn't provide the necessary instances; you would need to use transformers and either monads-tf or monads-fd, which offer implementations based on type families and fundeps, respectively. I prefer monads-tf if you go this route. The API is almost identical to that of mtl. I don't have experience with MonadLib, but it looks quite good also.
2) Write your main loop in IO, and for each loop iteration call runState to evaluate the state monad. Something like the following:
loop path state = do
op <- readOp
let ((newpath, resp), newstate) = runState (step path op) state
print resp
loop newpath newstate
This should work, but it's far less idiomatic than using monad transformers.
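For concreteness, here is roughly how that loop could look against the question's own newtype (a sketch reusing Filesystem, step, Path, and Contents from the question, plus the Control.Monad.State import; readOp is a hypothetical helper that parses an Operation from stdin):
runFilesystem :: Filesystem a -> Data.Map.Map Path Contents -> (a, Data.Map.Map Path Contents)
runFilesystem (Filesystem m) = runState m

loop :: Path -> Data.Map.Map Path Contents -> IO ()
loop path st = do
  op <- readOp                         -- readOp :: IO Operation (assumed)
  let ((newPath, resp), st') = runFilesystem (step path op) st
  putStrLn resp
  loop newPath st'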
Require your instances of FS to be instances of MonadIO, not just Monad:
class MonadIO m => FS m where ...
Then you will have the liftIO method available to lift IO actions into your FS monad:
liftIO :: MonadIO m => IO a -> m a
so you can write, inside the FS monad, things like
op <- liftIO getLine
alongside direct calls such as children currentDir. Of course, that means each FS instance
will also need a MonadIO instance (i.e. an implementation of liftIO), but for
this application (without having seen the actual details)
it sounds like that should be simple.
My application needs to have several resources open during runtime. I am achieving this by mapping over openFile once at the start of the application.
let filePaths = ["/dev/ttyUSB0", "/dev/ttyUSB1", "/dev/ttyUSB2"]
fileHandles <- mapM (`openFile` ReadWriteMode) filePaths
This code is not safe, as it might work for the first 2 file paths but then throw an exception when opening the 3rd one. In that case I need to close the 2 file handles that were already opened, so I can exit the function without leaking resources. I have looked at the functions and patterns from Control.Exception but have not found anything that helps in this case. I have not looked at ResourceT yet. Does it help in this situation?
I think i am looking for a function signature that is similar to this:
safeMapM :: [a] -> (a -> IO b) -> (b -> IO ()) -> IO [b]
where (b -> IO ()) is the cleanup function that is called on the already-produced results when some exception occurs.
Solutions I could think of that are probably not good:
Wrapping each element in a Maybe: exceptions could be caught and turned into a Nothing, so the mapM could always finish, and afterwards I could check for Nothing/exceptions and close all successfully opened file handles via their Just Handle.
Using a fold instead of a map: when an exception occurs at the current element, I can close the file handles of all previous elements of the fold and then rethrow the exception to stop the fold from continuing. (A sketch of this idea appears below.)
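For the record, here is a hedged sketch of that second idea using bracketOnError from Control.Exception (the name openAllOrNone and the structure are mine, not from the question):
import Control.Exception (bracketOnError)
import System.IO

-- Open handles one at a time; if any later open (or the rest of the recursion)
-- throws, each bracketOnError closes its handle as the exception unwinds, so
-- everything acquired so far is released. On success nothing is closed, and
-- the caller owns all the handles.
openAllOrNone :: [FilePath] -> IOMode -> IO [Handle]
openAllOrNone paths mode = go paths
  where
    go []       = pure []
    go (p : ps) = bracketOnError (openFile p mode) hClose $ \h ->
                    (h :) <$> go ps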
If I understand correctly, the problem is how to ensure safe closing of all handles when an exception happens.
For a single file, the usual way of ensuring safety is withFile. The complication here is that you want to open a sequence of files.
Perhaps we could write this auxiliary function that performs nested allocations of withFile and passes the list of Handles to a callback in the innermost level:
-- (needs: import System.IO)
nestedWithFile :: [FilePath] -> IOMode -> ([Handle] -> IO r) -> IO r
nestedWithFile filePaths mode callback = go [] filePaths
  where
    go acc [] =
      callback acc -- innermost invocation, protected by all the enclosing withFile calls
    go acc (p : ps) =
      withFile p mode (\handle -> go (acc ++ [handle]) ps)
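Usage would look something like this (a sketch against the file paths from the question, assuming System.IO is imported; the hPutStrLn is just a placeholder for real work):
main :: IO ()
main = do
  let filePaths = ["/dev/ttyUSB0", "/dev/ttyUSB1", "/dev/ttyUSB2"]
  nestedWithFile filePaths ReadWriteMode $ \fileHandles ->
    -- all the handles are open here; they are closed when this callback returns
    mapM_ (\h -> hPutStrLn h "hello") fileHandles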
Another way of doing it begins by realizing that we are doing something with a replicateM flavor: we are performing an "effect" n times, and returning a list with the results. But what would be the "effect" (that is, the Applicative) here? It seems to be "protecting the allocation of a resource with a wrapper function that ensures release".
This kind of effect seems to require some control over the "rest of the computation", because when the "rest of the computation" finishes in any way, the finalizer must still be run. This points us to the continuation monad transformer, ContT:
import Control.Monad
import Control.Monad.Trans.Cont
import System.IO
openFile' :: FilePath -> ContT r IO Handle
openFile' filePath = ContT (withFile filePath ReadWriteMode)
openSameFileSeveralTimes :: Int -> FilePath -> ContT r IO [Handle]
openSameFileSeveralTimes count filePath = replicateM count (openFile' filePath)
-- The handles are freed when the ([Handle] -> IO r) callback exits
useHandles :: ContT r IO [Handle] -> ([Handle] -> IO r) -> IO r
useHandles = runContT
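To match the question more closely (a list of distinct paths rather than the same file several times), traverse works just as well. A small sketch, reusing openFile' from above:
-- Open each path once; all handles are released when the callback given to
-- useHandles (i.e. runContT) returns.
openFiles :: [FilePath] -> ContT r IO [Handle]
openFiles = traverse openFile'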
The continuation transformer might be a bit too general for this purpose. There are libraries like managed which follow the same basic mechanism but are more focused on resource handling.
I'm trying to get performance increases in a program I have that parses XML. The program can parse multiple XML files so I thought that I could make this run in parallel, but all my attempts have resulted in lower performance!
For XML parsing, I am using HXT.
I have a run function defined like this:
run printTasks xs = pExec xs >>= return . concat >>= doPrint printTasks 1
'pExec' is given a list of file names and is defined as:
pExec xs = do
ex <- mapM exec xs
as <- ex `usingIO` parList rdeepseq
return as
where 'exec' is defined as:
exec = runX . process
ThreadScope shows only one thread ever being used (until the very end).
Can anyone explain why I have failed so miserably to parallelise this code?
In case it helps:
exec :: FilePath -> [CV_scene]
pExec :: [FilePath] -> IO [[CV_scene]]
data CV_scene = Scene [CV_layer] Time deriving (Show)
data CV_layer = Layer [DirtyRect] SourceCrop deriving (Show)
data Rect = Rect Int Int Int Int deriving (Show) -- Left Top Width Height
instance NFData CV_scene where
rnf = foldScene reduceScene
where reduceScene l t = rnf (seq t l)
instance NFData CV_layer where
rnf = foldLayer reduceLayer
where reduceLayer d s = rnf (seq s d)
instance NFData Rect where
rnf = foldRect reduceRect
where reduceRect l t w h = rnf [l,t,w,h]
type SourceCrop = Rect
type DirtyRect = Rect
type Time = Int64
Thanks in advance for your help!
First, it looks like you mislabeled the signature of exec, which should probably be:
exec :: FilePath -> IO [CV_scene]
Now for the important part. I've commented inline on what I think you think is going on.
pExec xs = do
-- A. Parse the file found at each location via exec.
ex <- mapM exec xs
-- B. Force the lazy parsing in parallel.
as <- ex `usingIO` parList rdeepseq
return as
Note that line A does not happen in parallel, which you might think is okay since it will just set up the parsing thunks that are forced in parallel in B. This is a fair assumption, and a clever use of laziness, but the results call that into question for me.
I suspect that the implementation of exec forces most of the parsing before line B is even reached, so that the deep seq doesn't do much. That fits pretty well with my experience of parsing, and the profiling supports that explanation.
Without the ability to test your code, I can only make the following suggestions. First try separating the parsing of the file from the IO and put the parsing in the parallel execution strategy. In that case lines A and B become something like:
ex <- mapM readFile xs
as <- map exec' ex `usingIO` parList rdeepseq
with exec' being the portion of exec that parses the file contents once they have been read from disk:
exec' :: String -> [CV_scene]
Also, you may not even need rdeepseq after this change.
As an alternative, you can do the IO and parsing in parallel using Software Transactional Memory. STM approaches are normally used for separate IO threads which act more like services, rather than pure computations. But if for some reason you can't get the strategies-based approach to work, this might be worth a try.
{-# LANGUAGE TupleSections #-}
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TChan -- (from the stm package)
import Control.Concurrent (forkIO)

pExec'' :: [FilePath] -> IO [[CV_scene]]
pExec'' xs = do
  -- A. create [(FilePath, TChan [CV_scene])]
  tcx <- mapM (\x -> (x,) <$> newTChanIO) xs
  -- B. do the reading/parsing in separate threads
  mapM_ (forkIO . exec'') tcx
  -- C. collect the results
  mapM (atomically . readTChan . snd) tcx

exec'' :: (FilePath, TChan [CV_scene]) -> IO ()
exec'' (x, tch) = do
  -- D. the original exec function
  cv <- exec x
  -- E. put the result on the channel (a FIFO buffer)
  atomically $ writeTChan tch cv
Good luck!
Could you please help me with the Turtle library?
I want to write a simple program that calculates disk space usage.
Here is the code:
getFileSize :: FilePath -> IO Size
getFileSize f = do
status <- stat f
return $ fileSize status
main = sh $ do
let sizes = fmap getFileSize $ find (suffix ".hs") "."
So now I have a binding sizes of type Shell (IO Size). But I can't just sum it with a sum fold, because there is an IO Size in there. If it were something like [IO Size] I could pull the IO monad out by using sequence to transform it to IO [Size]. But I can't do this with the Shell monad, since it is not Traversable. So I wrote something like this:
import qualified Control.Foldl as F
main = sh $ do
let sizes = fmap getFileSize $ find (suffix ".hs") "."
lst <- fold sizes F.list
let cont = sequence lst
sz <- liftIO $ cont
liftIO $ putStrLn (show (sum sz))
First I folded Shell (IO Size) to [IO Size] and then to IO [Size], to sum the list afterwards.
But I wonder if there is a more canonical or elegant solution, because here I created two lists to accomplish my task, and I thought the Shell monad was meant for manipulating entities in constant space. Maybe there is some fold to make IO (Shell Size) from Shell (IO Size)?
Thanks.
You have an IO action, and you really want a Shell action. The usual way to handle that is with the liftIO method, which is available because Shell is an instance of MonadIO.
file <- find (suffix ".hs") "."
size <- liftIO $ getFileSize file
or even
size <- liftIO . getFileSize =<< find (suffix ".hs") "."
Fortunately, the size-related functions in Turtle.Prelude (like the stat you're already using) work directly in any MonadIO instance such as Shell, so you don't need to call liftIO yourself.
Now you actually have to sum these up, but you can do that with fold and sum.
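Putting those two pieces together, a minimal sketch (reusing the question's stat/fileSize approach and relying on Size's Num instance; not the only way to write it):
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as F

main :: IO ()
main = do
  -- build a Shell Size directly (stat runs in Shell via MonadIO),
  -- then sum it in a single fold, in constant space
  total <- fold (fmap fileSize . stat =<< find (suffix ".hs") ".") F.sum
  print total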
I would recommend that you avoid breaking open the Shell type itself. That should be reserved for adding totally new functionality to the API. That certainly isn't necessary in this case.
Actually, I've managed to get rid of the IO here by using a helper transformation:
sio :: Shell (IO a) -> Shell a
sio s = Shell (\(FoldShell step begin done) ->
let step' x a = do
a' <- a
step x a'
in
_foldShell s (FoldShell step' begin done))
But now I wonder whether there is any simpler solution to this task...
I have a concern regarding how far the introduction of IO trickles through a program. Say a function deep within my program is altered to include some IO; how do I isolate this change so that I don't have to also change every function on the path to it to IO?
For instance, in a simplified example:
a :: String -> String
a s = (b s) ++ "!"
b :: String -> String
b s = '!':(fetch s)
fetch :: String -> String
fetch s = reverse s
main = putStrLn $ a "hello"
(fetch here could more realistically be reading a value from a static Map to give as its result)
But say if due to some business logic change, I needed to lookup the value returned by fetch in some database (which I can exemplify here with a call to getLine):
fetch :: String -> IO String
fetch s = do
x <- getLine
return $ s ++ x
So my question is, how to prevent having to rewrite every function call in this chain?
a :: String -> IO String
a s = fmap (\x -> x ++ "!") (b s)
b :: String -> IO String
b s = fmap (\x -> '!':x) (fetch s)
fetch :: String -> IO String
fetch s = do
x <- getLine
return $ s ++ x
main = a "hello" >>= putStrLn
I can see that refactoring this would be much simpler if the functions themselves did not depend on each other. That is fine for a simple example:
a :: String -> String
a s = s ++ "!"
b :: String -> String
b s = '!':s
fetch :: String -> IO String
fetch s = do
x <- getLine
return $ s ++ x
doit :: String -> IO String
doit s = fmap (a . b) (fetch s)
main = doit "hello" >>= putStrLn
but I don't know if that is necessarily practical in more complicated programs.
The only way I've found thus far to really isolate an IO addition like this is to use unsafePerformIO, but, by its very name, I don't want to do that if I can help it. Is there some other way to isolate this change? If the refactoring is substantial, I would start to feel inclined to avoid having to do it (especially under deadlines, etc).
Thanks for any advice!
Here are a few methods I use.
Reduce dependencies on effects by inverting control. (One of the methods you described in your question.) That is, execute the effects outside and pass the results (or functions with those results partially applied) into pure code. Instead of having main → a → b → fetch, have main → fetch and then main → a → b:
a :: String -> String
a f = b f ++ "!"
b :: String -> String
b f = '!' : f
fetch :: String -> IO String
fetch s = do
x <- getLine
return $ s ++ x
main = do
f <- fetch "hello"
putStrLn $ a f
For more complex cases of this, where you need to thread an argument to do this sort of “dependency injection” through many levels, Reader/ReaderT lets you abstract over the boilerplate.
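A tiny sketch of what that Reader version might look like for this example (the bodies here are my guesses at the intended shapes, not code from the answer):
import Control.Monad.Reader

fetch :: String -> IO String
fetch s = do
  x <- getLine
  return (s ++ x)

a :: Reader String String
a = (++ "!") <$> b

b :: Reader String String
b = asks ('!' :)

main :: IO ()
main = do
  f <- fetch "hello"
  putStrLn (runReader a f)   -- the effect stays in main; a and b stay pure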
Write pure code that you expect might need effects in monadic style from the start. (Polymorphic over the choice of monad.) Then if you do eventually need effects in that code, you don’t need to change the implementation, only the signature.
a :: (Monad m) => String -> m String
a s = (++ "!") <$> b s
b :: (Monad m) => String -> m String
b s = ('!' :) <$> fetch s
fetch :: (Monad m) => String -> m String
fetch s = pure (reverse s)
Since this code works for any m with a Monad instance (or in fact just Applicative), you can run it directly in IO, or purely with the “dummy” monad Identity:
main = putStrLn =<< a "hello"
main = putStrLn $ runIdentity $ a "hello"
Then as you need more effects, you can use “mtl style” (as #dfeuer’s answer describes) to enable effects on an as-needed basis, or if you’re using the same monad stack everywhere, just replace m with that concrete type, e.g.:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Fetch a = Fetch { unFetch :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO)

a :: String -> Fetch String
a s = (++ "!") <$> b s
b :: String -> Fetch String
b s = ('!' :) <$> fetch s
fetch :: String -> Fetch String
fetch s = do
x <- liftIO getLine
return $ s ++ x
main = putStrLn =<< unFetch (a "hello")
The advantage of mtl style is that you can have multiple different implementations of your effects. That makes things like testing & mocking easy, since you can reuse the logic but run it with different “handlers” for production & testing. In fact, you can get even more flexibility (at the cost of some runtime performance) using an algebraic effects library such as freer-effects, which not only lets the caller change how each effect is handled, but also the order in which they’re handled.
Roll up your sleeves and do the refactoring. The compiler will tell you everywhere that needs to be updated anyway. After enough times doing this, you’ll naturally end up recognising when you’re writing code that will require this refactoring later, so you’ll consider effects from the beginning and not run into the problem.
You’re quite right to doubt unsafePerformIO! It’s not just unsafe because it breaks referential transparency, it’s unsafe because it can break type, memory, and concurrency safety as well—you can use it to coerce any type to any other, cause a segfault, or cause deadlocks and concurrency errors that would ordinarily be impossible. You’re telling the compiler that some code is pure, so it’s going to assume it can do all the transformations it does with pure code—such as duplicating, reordering, or even dropping it, which may completely change the correctness and performance of your code.
The main legitimate use cases for unsafePerformIO are things like using the FFI to wrap foreign code (that you know is pure), or doing GHC-specific performance hacks; stay away from it otherwise, since it’s not meant as an “escape hatch” for ordinary code.
First off, the refactoring doesn't tend to be as bad as you might imagine. Once you make the first change, the type checker will point you to the next few, and so on. But suppose you have a reason to suspect from the start that you might need some extra capability to make a function go. A common way to do this (called mtl-style, after the monad transformer library) is to express your needs in a constraint.
class Monad m => MonadFetch m where
fetch :: String -> m String
a :: MonadFetch m => String -> m String
a s = fmap (\x -> x ++ "!") (b s)
b :: MonadFetch m => String -> m String
b s = fmap (\x -> '!':x) (fetch s)
instance MonadFetch IO where
-- fetch :: String -> IO String
fetch s = do
x <- getLine
return $ s ++ x
instance MonadFetch Identity where
-- fetch :: String -> Identity String
fetch = Identity . reverse
You're no longer tied to a particular monad: you just need one that can fetch. Code operating on an arbitrary MonadFetch instance is pure, except that it can fetch.
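For example (a sketch assuming the definitions above are in scope, plus import Data.Functor.Identity):
-- Run the same logic purely or in IO, depending on which instance is picked:
mainPure :: IO ()
mainPure = putStrLn (runIdentity (a "hello"))   -- Identity instance: prints "!olleh!"

mainIO :: IO ()
mainIO = putStrLn =<< a "hello"                 -- IO instance: fetch reads a line first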
I am building some moderately large DIMACS files, but with the method used below the memory usage is rather large compared to the size of the files generated, and on some of the larger files I need to generate I run into out-of-memory problems.
import Control.Monad.State.Strict
import Control.Monad.Writer.Strict
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import qualified Text.Show.ByteString as BS
import Data.List
main = printDIMACS "test.cnf" test
test = do
xs <- freshs 100000
forM_ (zip xs (tail xs))
(\(x,y) -> addAll [[negate x, negate y],[x,y]])
type Var = Int
type Clause = [Var]
data DIMACSS = DS{
nextFresh :: Int,
numClauses :: Int
} deriving (Show)
type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a
freshs :: Int -> DIMACSM [Var]
freshs i = do
next <- gets nextFresh
let toRet = [next..next+i-1]
modify (\s -> s{nextFresh = next+i})
return toRet
fresh :: DIMACSM Int
fresh = do
i <- gets nextFresh
modify (\s -> s{nextFresh = i+1})
return i
addAll :: [Clause] -> DIMACSM ()
addAll c = do
tell
(B.concat .
intersperse (B.pack " 0\n") .
map (B.unwords . map BS.show) $ c)
tell (B.pack " 0\n")
modify (\s -> s{numClauses = numClauses s + length c})
add h = addAll [h]
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
(s,b) = runWriter (execStateT f (DS 1 0))
i = nextFresh s - 1
j = numClauses s
I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?
If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them at the end, either using laziness or a more explicit approach such as conduits, enumerators, pipes or the like.
The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:
The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.
The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: run it twice, first throwing away the clauses and just calculating the numbers, print the header line, then run the generator again, this time printing the clauses.
(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)
Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.
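A sketch of that last suggestion, reusing the DIMACSM monad and helpers exactly as defined in the question (B, BS, tell, modify):
add :: Clause -> DIMACSM ()
add c = do
  tell (B.unwords (map BS.show c))   -- write one clause...
  tell (B.pack " 0\n")               -- ...and terminate it immediately, so c need not be retained
  modify (\s -> s { numClauses = numClauses s + 1 })

addAll :: [Clause] -> DIMACSM ()
addAll = mapM_ add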
As explained in Joachim Breitner's answer, the problem was that DIMACSM was not lazy enough, both because the strict versions of the monads were used and because the number of variables and clauses is needed before the ByteString can be written to the file. The solution is to use the lazy versions of the monads and execute the computation twice. It turns out that it is also necessary to have WriterT be the outer monad:
import Control.Monad.State
import Control.Monad.Writer
...
type DIMACSM a = WriterT B.ByteString (State DIMACSS) a
...
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
s = execState (execWriterT f) (DS 1 0)
b = evalState (execWriterT f) (DS 1 0)
i = nextFresh s - 1
j = numClauses s