I am trying to test a logic of custom data type. It receives a Map Int String as a parameter and then I need to add an element into the Map inside the object.
Type declaration and insertion function look like this:
import qualified Data.IntMap.Strict as M
import Data.UUID (UUID)
import Control.Monad.State
import System.Random
type StrMap = M.IntMap String
type MType = State StdGen
data MyType = MyType {
uuid :: UUID,
strs :: StrMap
} deriving (Show)
create :: StrMap -> MType MyType
create pm = do
state <- get
let (uuid, newState) = random state
put newState
return $ MyType uuid pm
strsSize :: MyType -> Int
strsSize e = M.size $ strs e
addStr :: MyType -> String -> MyType
addStr e p = e { strs = M.insert (strsSize e) p $ strs e }
It is important to have sequential keys in the Map, so having [0, 1, 3] is not acceptable.
I was trying to test it using HSpec with QuickCheck:
main :: IO ()
main = hspec spec
spec :: Spec
spec = describe "Creation and update" $ do
QuickCheck.prop "Check map addition" $ do
\xs str -> monadicIO $ do
state <- run(getStdGen)
let (result, newState) = runState (create xs) state
run(setStdGen newState)
let result' = addStr result str
assert $ (strsSize result) + 1 == strsSize result' -- fails here
The problem is that QuickCheck generates random keys and I am not sure how do I force it to generate a sequential keys for the Map. The problem with absense of the sequense is that function addStr may override values in case of repetetive keys, which is not desirable behavior.
UPDATE
Thanks for all the help! After a long discussion and some kind of a thinking I ended up with the following solution:
spec :: Spec
spec = describe "Creation and update" $ do
QuickCheck.prop "Check map addition" $ do
\xs str -> not (null xs) Property.==> monadicIO $ do
state <- run(getStdGen)
let mp = M.fromList $ zip [0..(length xs)] xs
let (result, newState) = runState (create mp) state
run(setStdGen newState)
let result' = addStr result str
assert $ (strsSize result) + 1 == strsSize result'
Basically, I had to generate some random set of strings and them convert in into a map manually. It is probably not the most elegant solution, but it works as needed.
Instead of using QuickCheck to generate arbitrary data that satisfies some complex invariant, which can be difficult, you can use QuickCheck to generate fully arbitrary data from which you can then construct data that satisfies the invariant (by some method external to the system being tested which you trust to be correct).
The invariant in this case is given as "keys must be contiguous", but is actually "keys must be contiguous and start from 0". This is sufficient, but more than necessary. The minimal invariant required by addStr is "the map must not contain a key that is the size of the map", since that is the key we intend to insert. By simplifying the constraint, we also make it easier to satisfy: we can generate an arbitrary map (which may contain the bad key) and then delete the bad key, giving a satisfactory map.
I'll also note that the UUID (and thus the mechanism for generating it, which requires State and perhaps IO) is irrelevant to the property being tested. This means we can construct the MyType with any UUID we have lying around (like the nil UUID provided by the package) and avoid the monadic stuff:
spec :: Spec
spec = describe "Creation and update" $ do
QuickCheck.prop "Check map addition" $ do
\strmap -> -- we don't actually care what the String being inserted is for this test
let myType = MyType UUID.nil (M.delete (M.size strmap) strmap) -- Enforce the invariant
in assert $ strsSize (addStr myType "") = strsSize myType + 1
If you wanted to, you could also make an instance of Arbitrary for MyType that does something like this, or something that satisfies the stronger invariant (which may be required for other tests). I'll leave that as an exercise for you, but feel free to ask more questions if you get stuck trying it.
Related
I am new to Haskell and need help. I am trying to build a new data type that has to be somehow unique, so I decided to use UUID as a unique identifier:
data MyType = MyType {
uuid :: UUID,
elements :: AnotherType
}
in this way, I can do following:
instance Eq MyType where
x == y = uuid x == uuid y
x /= y = not (x == y)
The problem is that all known (to me) UUID generators produce IO UUID, but I need to use it in a pure code as mentioned above. Could you please suggest if there is any way to extract UUID out of IO UUID, or maybe be there is a better way to do what I need in Haskell? Thanks.
UPDATE
Thanks for all the great suggestions and the code example. From what is posted here I can say you cannot break a referential transparency, but there are smart ways how to solve the problem without breaking it and, probably the most optimal one, is listed in the answer below.
There is also one alternative approach that I was able to explore myself based on provided recommendations with the usage of State Monad:
type M = State StdGen
type AnotherType = String
data MyType = MyType {
uuid :: UUID,
elements :: AnotherType
} deriving (Show)
mytype :: AnotherType -> M MyType
mytype x = do
gen <- get
let (val, gen') = random gen
put gen'
return $ MyType val x
main :: IO ()
main = do
state <- getStdGen
let (result, newState) = runState (mytype "Foo") state
putStrLn $ show result
let (result', newState') = runState (mytype "Bar") newState
setStdGen newState'
putStrLn $ show result'
Not sure if it is the most elegant implementation, but it works.
If you're looking at the functions in the uuid package, then UUID has a Random instance. This means that it's possible to generate a sequence of random UUIDs in pure code using standard functions from System.Random using a seed:
import System.Random
import Data.UUID
someUUIDs :: [UUID]
someUUIDs =
let seed = 123
g0 = mkStdGen seed -- RNG from seed
(u1, g1) = random g0
(u2, g2) = random g1
(u3, g3) = random g2
in [u1,u2,u3]
Note that someUUIDs creates the same three "unique" UUIDs every time it's called because the seed is hard-coded.
As with all pure Haskell code, unless you cheat (using unsafe functions), you can't expect to generate a sequence of actually unique UUIDs without explicitly passing some state (in this case, a StdGen RNG) between calls to random.
The usual solution to avoid the ugly boilerplate of passing the generator around is to run at least part of your code within a monad that can maintain the needed state. Some people like to use the MonadRandom package, though you can also use the regular State monad with a StdGen somewhere in the state. The main advantages of MonadRandom over State is that you get some dedicated syntax (getRandom) and can create a monad stack that includes both RandomT and StateT so you can separate your RNG state from the rest of your application state.
Using MonadRandom, you might write an application like:
import Control.Monad.Random.Strict
import System.Random
import Data.UUID
-- monad for the application
type M = Rand StdGen
-- get a generator and run the application in "M"
main :: IO ()
main = do
g <- getStdGen -- get a timestamp-seeded generator
let log = evalRand app g -- run the (pure) application in the monad
putStr log
-- the "pure" application, running in monad "M"
app :: M String
app = do
foo <- myType "foo"
bar <- myType "bar"
-- do some processing
return $ unlines ["Results:", show foo, show bar]
type AnotherType = String
data MyType = MyType {
uuid :: UUID,
elements :: AnotherType
} deriving (Show)
-- smart constructor for MyType with unique UUID
myType :: AnotherType -> M MyType
myType x = MyType <$> getRandom <*> pure x
Note that substantial parts of the application will need to be written in monadic syntax and run in the application M monad. This isn't a big restriction -- most non-trivial applications are going to be written in some monad.
I am new to Haskell, so maybe I am missing some fundamental concepts here (or maybe failed to find the appropriate extension). I was wondering if there was a way to optimize or further abstract the following scenario. This code seems very redundant.
Let's say I have the following data classes:
data Person = Person
{ personName :: !String
, personAge :: !Int
} deriving Show
data Dog = Dog
{ dogName :: !String
, dogAge :: !Int
} deriving Show
Let's say I have a service and I'm only concerned with outputing records as strings. In reality, the strings will probably be JSON and the records fetched from the DB, but let's take a simpler case. I basically need a URL token to fetch an appropriate object (say, the string "dog" will get me a Dog, or even just the Haskell "show" string, without expressly declaring it as (value)::Dog).
I have attempted to implement this in several ways...the only thing that seems to work is the following:
data Creature = DogC Dog
| PersonC Person
deriving Show
fromString :: String -> Maybe Creature
fromString "dog" = Just $ DogC $ Dog "muffin" 8
fromString "person" = Just $ PersonC $ Person "John" 22
fromString _ = Nothing
main :: IO ()
main = do
putStrLn $ show $ fromString "dog"
I'm not entirely fond of the new type, nor the list of fromString declarations. And to benefit from the original data declarations, I would probably need to write a similarly tedious expression (eg, "fromCreature") to revert Creature back into my original types. This information might change, so I would probably need TH for a few of the declarations...
Is there a way around some of this? I fiddled with GADTs and classes, but both seemed to be dependent on type- rather than value- based polymorphism (A string identifier tends to cause issues with ambiguous instances). It would be nice to map the constructor to a string (Say, with Data.Map), but constructors often have different kinds.
Update
So, I went with an approach that isn't exactly relevant to the question I had asked, but it may be useful to someone. I did want to maintain some record types, but most didn't add much value and were getting in my way. The steps I had followed went something like:
Use a different/lower-level DB driver, that returns workable types (eg, [ColumnDef] and [[SQLValue]] instead of tuples and records...).
Create ToJSON instances for SQLValue -- most of the types were covered, except a few ByteString types, and I had to handle the conversion of SQLNull to Null. To maintain compatibility with some record types, my default handler looked like: toJSON = genericToJSON defaultOptions { sumEncoding = UnTaggedValue} The untagged value should allow one to read the JSON into defined data types (eg, Dog / Person ) if desired....
Given that column name is accessible from ColumnDef, I wrote an expression that zips [ColumnDef] and [SqlValue] to a list of Aeson-compatible key-value pairs, eg: toJsPairs :: [ColumnDef] -> [SqlValue] -> [(Text,Value)]
Then, I wrote an expression to fetch the JSON from a table name, which more or less serves as my "universal dispatcher." It references a list of authorized tables, so it's less crazy than it might sound.
The code looked a bit like this (using mysql-haskell).
{-# LANGUAGE OverloadedStrings #-}
import qualified Control.Applicative as App
import Database.MySQL.Base
import qualified System.IO.Streams as Streams
import Data.Aeson (FromJSON, ToJSON)
import Data.Aeson.Encode.Pretty (encodePretty)
import Data.Aeson.Types
import Data.Text.Encoding
import Data.String (fromString)
import Data.ByteString.Internal
import qualified Data.ByteString.Lazy.Internal as BLI
import Data.HashMap.Strict (fromList)
appConnectInfo = defaultConnectInfo {
ciUser = "some_user"
, ciPassword = "some_password"
, ciDatabase = "some_db"
}
instance FromJSON ByteString where
parseJSON (String s) = pure $ encodeUtf8 s
parseJSON _ = App.empty
instance ToJSON ByteString where
toJSON = String . decodeUtf8
instance ToJSON MySQLValue where
toJSON (MySQLNull) = Null
toJSON x = genericToJSON defaultOptions
{ sumEncoding = UntaggedValue } x
-- This expression should fail on dimensional mismatch.
-- It's stupidly lenient, but really dimensional mismatch should
-- never occur...
toJsPairs :: [ColumnDef] -> [MySQLValue] -> [(Text,Value)]
toJsPairs [] _ = []
toJsPairs _ [] = []
toJsPairs (x:xs) (y:ys) = (txt x, toJSON y):toJsPairs xs ys
where
-- Implement any modifications to the key names here
txt = decodeUtf8.columnName
listRecords :: String -> IO BLI.ByteString
listRecords tbl = do
conn <- connect appConnectInfo
-- This is clearly an injection vulnerability.
-- Implemented, however, the values for 'tbl' are intensely
-- vetted. This is just an example.
(defs, is) <- query_ conn $ fromString ( "SELECT * FROM `" ++ tbl ++ "` LIMIT 100")
rcrds <- Streams.toList is
return $ encodePretty $ map (jsnobj defs) rcrds
where
jsnobj :: [ColumnDef] -> [MySQLValue] -> Value
jsnobj defs x = Object $ fromList $ toJsPairs defs x
If what you want to consume at the end is json value - it might make sense to
represent result as json value using aeson library:
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson
import GHC.Generics
data Dog = Dog Int String deriving (Show, Generic)
data Cat = Cat Int String deriving (Show, Generic)
-- here I'm using instance derived with generics, but you can write one by
-- hands
instance ToJSON Dog
instance ToJSON Cat
-- actions to get stuff from db
getDog :: Monad m => Int -> m Dog
getDog i = return (Dog i (show i))
getCat :: Monad m => Int -> m Cat
getCat i = return (Cat i (show i))
-- dispatcher - picks which action to use
getAnimal :: Monad m => String -> Int -> m (Maybe Value)
getAnimal "dog" i = Just . toJSON <$> getDog i
getAnimal "cat" i = Just . toJSON <$> getCat i
getAnimal _ _ = return Nothing
main :: IO ()
main = do
getAnimal "dog" 2 >>= print
getAnimal "cat" 3 >>= print
getAnimal "chupakabra" 12 >>= print
High energy magic version
class Monad m => MonadAnimal m where
-- basically you want something that fetches extra argumets from HTTP or
-- whatevere, perform DB query and so on.
class Animal a where
animalName :: Proxy a -> String
animalGetter :: MonadAnimal m => m a
locateAnimals :: MonadAnimal m => Q [(String, m Value)]
locateAnimals -- implement using TH (reify function is your friend). It should look for
-- all the animal instances in scope and make a list from them with serialized
-- fetcher.
-- with that in place dispatcher should be easy to implement
I need to backup some data to access it later.
At the interface level, I have two functions:
put: backs up data and returns a backup_Id.
get: retrieves data given a backup_Id.
My current code requires me to supply these two functions with the backup parameter.
import Data.Maybe
data Data = Data String deriving Show
type Backup = [(String,Data)]
put :: Backup -> String -> IO Backup
put boilerPlate a =
do let id = "id" ++ show(length (boilerPlate))
putStrLn $ id ++": " ++ a
return ((id,(Data a)):boilerPlate)
get :: Backup -> String -> Maybe Data
get boilerPlate id = lookup id (boilerPlate)
It works OK.
In the following sample, two values are backed up. The second one is retrieved.
main :: IO ()
main = do
let bp0 = []
bp1 <- put bp0 "a"
bp2 <- put bp1 "b"
let result = get bp2 "id1"
putStrLn $ "Looking for id1: " ++ show (fromJust(result))
But I need to simplify the signatures of put and get by getting rid of all the backup parameters.
I need something that looks like this:
main = do
put "a"
put "b"
let result = get "id1"
What is the simplest way to achieve this?
Here's an example using StateT. Note that the function names are changed because State and StateT already have get and put functions.
module Main where
import Control.Monad.State
data Data = Data String deriving Show
type Backup = [(String,Data)]
save :: String -> StateT Backup IO ()
save a = do
backup <- get
let id = "id" ++ ((show . length) backup)
liftIO $ putStrLn $ id ++ ": " ++ a
put ((id, Data a):backup)
retrieve :: String -> StateT Backup IO (Maybe Data)
retrieve id = do
backup <- get
return $ lookup id backup
run :: IO (Maybe Data)
run = flip evalStateT [] $ do
save "a"
save "b"
retrieve "id1"
main :: IO ()
main = do
result <- run
print result
The State monad threads a 'mutable' value through a computation. StateT combines State with other monads; in this case, allowing the use of IO.
As dfeuer mentioned, it is possible to make save and retrieve a bit more general with these types:
save :: (MonadState Backup m, MonadIO m) => String -> m ()
retrieve :: (MonadState Backup m, MonadIO m) => String -> m (Maybe Data)
(This also requires {-# LANGUAGE FlexibleContexts #-}) The advantage of this approach is that it allows our functions to work with any monad that provides the Backup state and IO. In particular, we can add effects to the monad and the functions will still work.
All this monad / monad transformer stuff can be pretty confusing at first, but it's actually pretty elegant once you get used to it. The advantage is that you can easily see what kind of effects are required in each function. That being said, I don't want you to think that there are things that Haskell can't do, so here's another way to achieve your goal which does away with the state monad in favor of a mutable reference.
module Main where
import Data.IORef
data Data = Data String deriving Show
type Backup = [(String,Data)]
mkSave :: IORef Backup -> String -> IO ()
mkSave r a = do
backup <- readIORef r
let id = "id" ++ ((show . length) backup)
putStrLn $ id ++ ": " ++ a
writeIORef r ((id, Data a):backup)
mkRetrieve :: IORef Backup -> String -> IO (Maybe Data)
mkRetrieve r id = do
backup <- readIORef r
return $ lookup id backup
main :: IO ()
main = do
ref <- newIORef []
let save = mkSave ref
retrieve = mkRetrieve ref
save "a"
save "b"
result <- retrieve "id0"
print result
Just be warned that this isn't usually the recommended approach.
Being quite new to Haskell, I'm currently trying to improve my skills by writing an interpreter for a simple imperative toy language.
One of the expressions in this language is input, which reads a single integer from standard input. However, when I assign the value of this expression to a variable and then use this variable later, it seems ot me that I actually stored the computation of reading a value rather the read value itself. This means that e.g. the statements
x = input;
y = x + x;
will cause the interpreter to invoke the input procedure three times rather than one.
Internally in the evaluator module, I use a Map to store the values of variables. Because I need to deal with IO, this gets wrapped in an IO monad, as immortalized in the following minimal example:
import qualified Data.Map as Map
type State = Map.Map String Int
type Op = Int -> Int -> Int
input :: String -> IO State -> IO State
input x state = do line <- getLine
st <- state
return $ Map.insert x (read line) st
get :: String -> IO State -> IO Int
get x state = do st <- state
return $ case Map.lookup x st of
Just i -> i
eval :: String -> Op -> String -> IO State -> IO Int
eval l op r state = do i <- get l state
j <- get r state
return $ op i j
main :: IO ()
main = do let state = return Map.empty
let state' = input "x" state
val <- eval "x" (+) "x" state'
putStrLn . show $ val
The second line in the main function simulates the assignment of x, while the third line simulates the evaluation of the binary + operator.
My question is: How do I get around this, such that the code above only inputs once? I suspect that it is the IO-wrapping that causes the problem, but as we're dealing with IO I see no way out of that..?
Remember that IO State is not an actual state, but instead the specification for an IO machine which eventually produces a State. Let's consider input as an IO-machine transformer
input :: String -> IO State -> IO State
input x state = do line <- getLine
st <- state
return $ Map.insert x (read line) st
Here, provided a machine for producing a state, we create a bigger machine which takes that passed state and adding a read from an input line. Again, to be clear, input name st is an IO-machine which is a slight modification of the IO-machine st.
Let's now examine get
get :: String -> IO State -> IO Int
get x state = do st <- state
return $ case Map.lookup x st of
Just i -> i
Here we have another IO-machine transformer. Given a name and an IO-machine which produces a State, get will produce an IO-machine which returns a number. Note again that get name st is fixed to always use the state produced by the (fixed, input) IO-machine st.
Let's combine these pieces in eval
eval :: String -> Op -> String -> IO State -> IO Int
eval l op r state = do i <- get l state
j <- get r state
return $ op i j
Here we call get l and get r each on the same IO-machine state and thus produce two (completely independent) IO-machines get l state and get r state. We then evaluate their IO effects one after another and return the op-combination of their results.
Let's examine the kinds of IO-machines built in main. In the first line we produce a trivial IO-machine, called state, written return Map.empty. This IO-machine, each time it's run, performs no side effects in order to return a fresh, blank Map.Map.
In the second line, we produce a new kind of IO-machine called state'. This IO-machine is based off of the state IO-machine, but it also requests an input line. Thus, to be clear, each time state' runs, a fresh Map.Map is generated and then an input line is read to read some Int, stored at "x".
It should be clear where this is going, but now when we examine the third line we see that we pass state', the IO-machine, into eval. Previously we stated that eval runs its input IO-machine twice, once for each name, and then combines the results. By this point it should be clear what's happening.
All together, we build a certain kind of machine which draws input and reads it as an integer, assigning it to a name in a blank Map.Map. We then build this IO-machine into a larger one which uses the first IO-machine twice, in two separate invocations, in order to collect data and combine it with an Op.
Finally, we run this eval machine using do notation (the (<-) arrow indicates running the machine). Clearly it should collect two separate lines.
So what do we really want to do? Well, we need to simulate ambient state in the IO monad, not just pass around Map.Maps. This is easy to do by using an IORef.
import Data.IORef
input :: IORef State -> String -> IO ()
input ref name = do
line <- getLine
modifyIORef ref (Map.insert name (read line))
eval :: IORef State -> Op -> String -> String -> IO Int
eval ref op l r = do
stateSnapshot <- readIORef ref
let Just i = Map.lookup l stateSnapshot
Just j = Map.lookup l stateSnapshot
return (op i j)
main = do
st <- newIORef Map.empty -- create a blank state, embedded into IO, not a value
input st "x" -- request input *once*
val <- eval st (+) "x" "x" -- compute the op
putStrLn . show $ val
It's fine to wrap your actions such as getLine in IO, but to me it looks like your problem is that you're trying to pass your state in the IO monad. Instead, I think this is probably time you get introduced to monad transformers and how they'll let you layer the IO and State monads to get the functionality of both in one.
Monad transformers are a pretty complex topic and it'll take a while to get to where you're comfortable with them (I'm still learning new things all the time about them), but they're a very useful tool when you need to layer multiple monads. You'll need the mtl library to follow this example.
First, imports
import qualified Data.Map as Map
import Control.Monad.State
Then types
type Op = Int -> Int -> Int
-- Renamed to not conflict with Control.Monad.State.State
type AppState = Map.Map String Int
type Interpreter a = StateT AppState IO a
Here Interpreter is the Monad in which we'll build our interpreter. We also need a way to run the interpreter
-- A utility function for kicking off an interpreter
runInterpreter :: Interpreter a -> IO a
runInterpreter interp = evalStateT interp Map.empty
I figured defaulting to Map.empty was sufficient.
Now, we can build our interpreter actions in our new monad. First we start with input. Instead of returning our new state, we just modify what is current in our map:
input :: String -> Interpreter ()
input x = do
-- IO actions have to be passed to liftIO
line <- liftIO getLine
-- modify is a member of the MonadState typeclass, which StateT implements
modify (Map.insert x (read line))
I had to rename get so that it didn't conflict with get from Control.Monad.State, but it does basically the same thing as before, it just takes our map and looks up that variable in it.
-- Had to rename to not conflict with Control.Monad.State.get
-- Also returns Maybe Int because it's safer
getVar :: String -> Interpreter (Maybe Int)
getVar x = do
-- get is a member of MonadState
vars <- get
return $ Map.lookup x vars
-- or
-- get x = fmap (Map.lookup x) get
Next, eval now just looks up each variable in our map, then uses liftM2 to keep the return value as Maybe Int. I prefer the safety of Maybe, but you can rewrite it if you prefer
eval :: String -> Op -> String -> Interpreter (Maybe Int)
eval l op r = do
i <- getVar l
j <- getVar r
-- liftM2 op :: Maybe Int -> Maybe Int -> Maybe Int
return $ liftM2 op i j
Finally, we write our sample program. It stores user input to the variable "x", adds it to itself, and prints out the result.
-- Now we can write our actions in our own monad
program :: Interpreter ()
program = do
input "x"
y <- eval "x" (+) "x"
case y of
Just y' -> liftIO $ putStrLn $ "y = " ++ show y'
Nothing -> liftIO $ putStrLn "Error!"
-- main is kept very simple
main :: IO ()
main = runInterpreter program
The basic idea is that there is a "base" monad, here IO, and these actions are "lifted" up to the "parent" monad, here StateT AppState. There is a typeclass implementation for the different state operations get, put, and modify in the MonadState typeclass, which StateT implements, and in order to lift IO actions there's a pre-made liftIO function that "lifts" IO actions to the parent monad. Now we don't have to worry about passing around our state explicitly, we can still perform IO, and it has even simplified the code!
I would recommend reading the Real World Haskell chapter on monad transformers to get a better feel for them. There are other useful ones as well, such as ErrorT for handling errors, ReaderT for static configuration, WriterT for aggregating results (usually used for logging), and many others. These can be layered into what is called a transformer stack, and it's not too difficult to make your own either.
Instead of passing an IO State, you can pass State and then use higher-level functions to deal with IO. You can go further and make get and eval free from side-effects:
input :: String -> State -> IO State
input x state = do
line <- getLine
return $ Map.insert x (read line) state
get :: String -> State -> Int
get x state = case Map.lookup x state of
Just i -> i
eval :: String -> Op -> String -> State -> Int
eval l op r state = let i = get l state
j = get r state
in op i j
main :: IO ()
main = do
let state = Map.empty
state' <- input "x" state
let val = eval "x" (+) "x" state'
putStrLn . show $ val
If you're actually building an interpreter, you'll presumably have a list of instructions to execute at some point.
This is my rough translation of your code (although I'm only a beginner myself)
import Data.Map (Map, empty, insert, (!))
import Control.Monad (foldM)
type ValMap = Map String Int
instrRead :: String -> ValMap -> IO ValMap
instrRead varname mem = do
putStr "Enter an int: "
line <- getLine
let intval = (read line)::Int
return $ insert varname intval mem
instrAdd :: String -> String -> String -> ValMap -> IO ValMap
instrAdd varname l r mem = do
return $ insert varname result mem
where result = (mem ! l) + (mem ! r)
apply :: ValMap -> (ValMap -> IO ValMap) -> IO ValMap
apply mem instr = instr mem
main = do
let mem0 = empty
let instructions = [ instrRead "x", instrAdd "y" "x" "x" ]
final <- foldM apply mem0 instructions
print (final ! "y")
putStrLn "done"
The foldM applies a function (apply) to a start value (mem0) and a list (instructions) but does so within a monad.
I am building some moderately large DIMACS files, however with the method used below the memory usage is rather large compared to the size of the files generated, and on some of the larger files I need to generate I run in to out of memory problems.
import Control.Monad.State.Strict
import Control.Monad.Writer.Strict
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import qualified Text.Show.ByteString as BS
import Data.List
main = printDIMACS "test.cnf" test
test = do
xs <- freshs 100000
forM_ (zip xs (tail xs))
(\(x,y) -> addAll [[negate x, negate y],[x,y]])
type Var = Int
type Clause = [Var]
data DIMACSS = DS{
nextFresh :: Int,
numClauses :: Int
} deriving (Show)
type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a
freshs :: Int -> DIMACSM [Var]
freshs i = do
next <- gets nextFresh
let toRet = [next..next+i-1]
modify (\s -> s{nextFresh = next+i})
return toRet
fresh :: DIMACSM Int
fresh = do
i <- gets nextFresh
modify (\s -> s{nextFresh = i+1})
return i
addAll :: [Clause] -> DIMACSM ()
addAll c = do
tell
(B.concat .
intersperse (B.pack " 0\n") .
map (B.unwords . map BS.show) $ c)
tell (B.pack " 0\n")
modify (\s -> s{numClauses = numClauses s + length c})
add h = addAll [h]
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
(s,b) = runWriter (execStateT f (DS 1 0))
i = nextFresh s - 1
j = numClauses s
I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?
If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them as such, either using lazyness or a more explicit approach such as conduits, enumerators, pipes or the like.
The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:
The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.
The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: Run it twice, first throwing away the clauses and just calculating the numbers, print the header line, run the generator again; now printing the clauses.
(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)
Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.
As explained in Joachim Breitner's answer the problem was that DIMACSM was not lazy enough, both because the strict versions of the monads was used and because the number of variables and clauses are needed before the ByteString can be written to the file. The solution is to use the lazy versions of the Monads and execute them twice. It turns out that it is also necessary to have WriterT be the outer monad:
import Control.Monad.State
import Control.Monad.Writer
...
type DIMACSM a = WriterT B.ByteString (State DIMACSS) a
...
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
s = execState (execWriterT f) (DS 1 0)
b = evalState (execWriterT f) (DS 1 0)
i = nextFresh s - 1
j = numClauses s