Is there a good way to QuickCheck Happstack.State methods? - haskell

I have a set of Happstack.State MACID methods that I want to test using QuickCheck, but I'm having trouble figuring out the most elegant way to accomplish that. The problems I'm running into are:
The only way to evaluate an Ev monad computation is in the IO monad via query or update.
There's no way to create a purely in-memory MACID store; this is by design. Therefore, running things in the IO monad means there are temporary files to clean up after each test.
There's no way to initialize a new MACID store except with the initialValue for the state; it can't be generated via Arbitrary unless I expose an access method that replaces the state wholesale.
Working around all of the above means writing methods that only use features of MonadReader or MonadState (and running the test inside Reader or State instead of Ev. This means forgoing the use of getRandom or getEventClockTime and the like inside the method definitions.
The only options I can see are:
Run the methods in a throw-away on-disk MACID store, cleaning up after each test and settling for starting from initialValue each time.
Write the methods to have most of the code run in a MonadReader or MonadState (which is more easily testable), and rely on a small amount of non-QuickCheck-able glue around it that calls getRandom or getEventClockTime as necessary.
Is there a better solution that I'm overlooking?

You might checkout out the quickcheck properties that are included with happstack-state:
http://patch-tag.com/r/mae/happstack/snapshot/current/content/pretty/happstack-state/tests/Happstack/State/Tests
If you are just doing testing, and you want a throw-away data store, then you can use the memory saver, which just stores the state, event files, and checkpoints in RAM. If you lose power, then all your state would be lost. That is fine for tests, but not for a real live server. That message you linked to was talk about real live servers, not just testing.
That won't help with the initialValue issue, but it does make option 1 easier since you don't have to do any disk cleanup.
To replace the initialValue, you would need to create your own method that replaces the current state wholesale.
something like:
newState :: YourState -> Update YourState ()
newState st = put st
or something.
jeremy

If you write your functions as polymorphic over MonadState (or MonadReader for queries) it can be a lot easier to set up a test harness with runState/runReader.
The happstack TH code generators are fine with signatures like that, from what I remember.

Related

looking for a cached function call mechanism

I'm calling a database (EventStore) that recommend using the same connection for the entire life span of your app. I want to implement a cached call for that, but the only thing I'm finding is memoization caching in that way (lib io-memoize) :
import Database.EventStore
import System.IO.Memoize
getCachedEventStoreConnection :: Settings -> ConnectionType -> IO (IO (Connection))
getCachedEventStoreConnection settings connectionType = once $ connect settings connectionType
What I would like is more a signature like that :
getCachedEventStoreConnection :: Settings -> ConnectionType -> IO Connection
otherwise I'm obliged to keep that IO (IO (Connection)) as a "global fct" that I'm passing everywhere which is bad for modularity...
otherwise I'm obliged to keep that IO (IO (Connection)) as a "global fct" that I'm passing everywhere which is bad for modularity.
Unfortunately, caching calls doesn't help eliminate function arguments: instead of passing around the result of a cached call, you must pass around the cache instead. There's no getting around it; part of the hair shirt you wear when choosing Haskell is that all the data a function wants to use must be made explicit in its type, so if part of your application needs a database Connection, there's nothing for it but to pass a Connection to that part of your application (and, by extension, all its callers).
There is some sugar like ReaderT you can sprinkle around to make things a bit more convenient, making it appear as though you're not passing around function arguments, but at the end of the day that's exactly what they're doing under the hood.
However, I reject your claim that this is bad for modularity. If you did have an implicit cache, this would break modularity: you would not be able to lift that function out of this application into a library and use it in many applications without also lifting the cache out. That is, the cache and any operations that use it become coupled -- one must lift them all or none, the exact opposite of modularity.* If the database connection is a function argument instead of an implicit cache, it can be lifted independently of lifting the chunk of code that creates the connection once at app startup.
* And suppose you do lift out all the operations and the implicit cache into a library. Now two downstream libraries depend on and use yours; do you get two caches, that must be separately initialized and therefore is maybe less efficient, or do you get one shared cache which therefore bleeds effects from the one library into the other and therefore is maybe less correct? A difficult choice -- one that should have to be made carefully and explicitly by the downstream users, not by the library with the cache in it.

Learning Happstack and Monad Transformers

So I have a project that I think is simple enough to learn with, but complex enough to be interesting that I would like to write using the Happstack library. At it's most fundamental level, this project would just be a fancy file server with some domain-specific REST methods (or whatever, I don't really care if it's truly RESTful or not) for searching and getting said files and metadata. Since I'm also trying to really learn monad transformers right now, I decided this would be the perfect project to learn with. However, I'm running into some difficulties getting it started, particularly with how to construct my transformer stack.
Right now, I'm only worried about a few things: config, error reporting, and state, and logging, so I started with
newtype MyApp a = MyApp {
runMyApp :: ReaderT Config (ErrorT String (StateT AppState IO)) a
} deriving (...)
Since I'm always going to be in IO, I can really easily use hslogger with this to take care of my logging. But I also knew I needed to use ServerPartT in order to interact with Happstack, thus
runMyApp :: ReaderT Config (ErrorT String (StateT AppState (ServerPartT IO))) a
I can get this to run, see requests, etc, but the problem I've run into is that this needs FilterMonad implemented for it in order to use methods like dir, path, and ok, but I have no idea how to implement it for this type. I just need it to pass the filters down to the underlying monad. Can someone give me some pointers on how to get this obviously crucial type class implemented? Or, if I'm just doing something terribly wrong, just steer me in the right direction. I've only been looking at Happstack for a few days, and transformers are still quite new to me. I think I understand them enough to be dangerous, but I don't know enough about them that I could implement one on my own. Any help you can provide is greatly appreciated!
FULL CODE
(X-posted from /r/haskell)
The easiest thing for you to do would be to get rid of ErrorT from your stack. If you look here you can see that Happstack defines built-in passthrough instances of FilterMonad for StateT and ReaderT, but none for ErrorT. If you really want to keep ErrorT in your stack, then you need to write a passthrough instance for it. It will probably look a lot like the one for ReaderT.
instance (FilterMonad res m) => FilterMonad res (ReaderT r m) where
setFilter f = lift $ setFilter f
composeFilter = lift . composeFilter
getFilter = mapReaderT getFilter
I tend to think that you shouldn't have ErrorT baked into your application monad because you won't always be dealing with computations that can fail. If you do need to handle failure in a section of your code, you can always drop into it easily just by wrapping runErrorT around the section and then using ErrorT, lift, and return as needed. Also, extra layer in your transformer stack adds a performance tax on every bind. So while the composability of monad transformers is really nice, you generally want to use them sparingly when performance is a significant consideration.
Also, I would recommend using EitherT instead of ErrorT. That way you can make use of the fantastic errors package. It has a lot of really common convenience functions like hush, just, etc.
Also, if you want to see a real example of what you're trying to do, check out Snap's Handler monad. Your MyApp monad is exactly the problem that snaplets were designed to solve. Handler has some extra complexity because it was designed to solve the problem in a generalized way so that Snap users wouldn't need to build this common transformer stack themselves. But if you look at the underlying implementation you can see that at its core it is really just the Reader and State monads condensed into one.

Undo records in database

I am creating a system that needs to store all the functions and parameters a user has run in a database. No records are ever deleted, but I need to be able to recreate the minimal function sequence and parameter set for deterministic regeneration.
The users interaction is very minimal, they are not programming - input interaction is handled in C++ is passed through the FFI as data to accumulate into lists and callback to process the current buffer of data. The function triggers a series of decisions on how to wire a processing graph of sets of data within the database, and functions they are input to. The graph is acyclic. This graph is initially run and values are visualized for the user. Later portions of the graph will be recombined to generate new graphs.
Haskell internal construction of these graphs is created from analysis of data in the database and simple random choices amongst combinations. I'd like to be able to just store a seed of a random generator, the module and parameter id to which it applies.
I think this may be best framed as storing the functions of a EDSL in a database, where only the highlevel interaction is stored but is fully deterministic.
I am not interested in storing the values, but rather the function graph of the action.
Each table refers to different function. Each record has a date and a task ID to group all the functions of specific actions to gether. The parameters reference a Table ID and Record ID. If a composed function is internally doing something like generating a random number, the seed for that number should be automatically stored.
I am using GHC stage 1 with no GHCI and Persistent SQlite.
I am still new to Haskell and am looking to find out what approach and packages would be appropriate for tackling this problem in a functional manner.
If you want to do this for source-level functions, such as:
myFoo x y = x + y
you are pretty much out of luck, unless you want to go hacking around in the compiler. However, you could define your own notion of function that does support this, with some suitable annotations. Let's call this notion a UserAction a, where a is the return type of the action. In order to compose computations in UserAction, it should be a Monad. Not thinking too awfully hard, my first impression would be to use this stack of monad transformers:
type UserAction = WriterT [LogEntry] (ReaderT FuncIdentifier IO)
The WriterT [LogEntry] component says that a UserAction, when run, produces a sequence of LogEntrys [1], which contain the information you want to write to the database; something like:
data LogEntry = Call FuncIdentifier FuncIdentifier
It's okay to put off storing the random seed, task identifier, etc. for now -- that can be incorporated into this design by adding information to LogEntry.
The ReaderT FuncIdentifier component says that a UserAction depends on a FuncIdentifier; namely, the identifier of the function that is calling it.
FuncIdentifier could be implemented by something as simple as
type FuncIdentifier = String
or you use something with more structure, if you like.
The IO component says that UserActions can do arbitrary input and output to files, the console, spawn threads, the whole lot. If your actions don't need this, don't use it (use Identity instead). But since you mentioned generating random numbers, I figured you did not have pure computations in mind[2].
Then you would annotate each action you want to record logs for with a function like this:
userAction :: FuncIdentifier -> UserAction a -> UserAction a
which would be used like so:
randRange :: (Integer, Integer) -> UserAction Integer
randRange (low,hi) = userAction "randRange" $ do
-- implementation
userAction would record the call and set up its callees to record their calls; e.g. something like:
userAction func action = do
caller <- ask
-- record the current call
tell [Call caller func]
-- Call the body of this action, passing the current identifier as its caller.
local (const func) action
From the top level, run the desired action and after it has finished, collect up all the LogEntrys and write them to the database.
If you need the calls to be written in real time as the code is executing, a different UserAction monad would be needed; but you could still present the same interface.
This approach uses some intermediate Haskell concepts such as monad transformers. I suggest going on IRC to irc.freenode.net #haskell channel to ask for guidance on filling out the details of this implementation sketch. They are a kind bunch and will happily help you learn :-).
[1] In practice you will not want to use [LogEntry] but rather DList LogEntry for performance. But the change is easy, and I suggest you go with [LogEntry] until you get more comfortable with Haskell, then switch over to DList.
[2] Random number generation can be done purely, but it takes further brain-rewiring which this sketch already has plenty of, so I suggest just treating it as an IO effect for the purpose of getting going.

How do you structure a stateful module in Haskell?

I'm looking to write a generic module that allows Haskell programs to interact with Cassandra. The module will need to maintain its own state. For example, it will have a connection pool and a list of callbacks to be invoked when a new record is saved. How should I structure the code so that this module can maintain its state? Here are some of the approaches I've been considering. Am I on the right track? (I'm new to Haskell and still learning the best ways to think functionally.)
Option 1:
The module runs in a (StateT s IO) monad, where s is the global state for the entire program using the Cassandra module. Of course, since the Cassandra module could be used by multiple programs, the details of what's in s should be invisible to the Cassandra module. The module would have to export a type class that allowed it to extract the CassandraState from s and push a new CassandraState back into s. Then, any program using the module would have to make its main state a member of this type class.
Option 2:
The module runs in a (StateT CassandraState IO) monad. Every time someone calls an action in the module, they would have to extract the CassandraState from wherever they have it stashed off, invoke the action with runState, and take the resulting state and stash it off again (wherever).
Option 3:
Don't put the Cassandra module's functions in a StateT monad at all. Instead, have the caller explicitly pass in CassandraState's when needed. The problem with option 2 is that not all of the functions in the module will modify the state. For example, obtaining a connection will modify the state and will require the caller to stash off the resulting state. But, saving a new record needs to read the state (to get the callbacks), but it doesn't need to change the state. Option 2 doesn't give the caller any hint that connect changes the state while create doesn't.
But, if I move away from using the StateT monad and just have functions that take in states as parameters and return either simple values or tuples of simple values and new states, then it's really obvious to the caller when the state needs to be saved off. (Under the covers in my module, I'd take the incoming states and build them into a (StateT CassandraState IO) monad, but the details of this would be hidden from the caller. So, to the caller, the interface is very explicit, but under the covers, it's just Option 2.)
Option 4:
Something else?
This problem must come up quite often when building reusable modules. Is there some sort of standard way to solve it?
(By the way, if someone knows a better way to interact with Cassandra from Haskell than using Thrift, please let me know! Maybe I don't have to write this at all. :-)
Something like the HDBC model would be to have an explicit CassandraConnection data type. It has an MVar inside with some mutable state. Since all your actions are in IO anyway I'd imagine, they can just take the CassandraConnection as an argument to these actions. The user then can pack that connection into a state or reader monad, or thread it explicitly, or do whatever they want.
Internally you can use a monad or not -- that's really your call. However, I favor APIs that when possible don't force users into any particular monad unless truly necessary.
So this is a sort of version of option 3. But the user shouldn't really care whether or not they're changing the connection state -- at that level you can really hide the details from them.
I'd go with Option 2. Users of your module shouldn't use runState directly; instead, you should provide an opaque Cassandra type with an instance of the Monad typeclass and some runCassandra :: Cassandra a -> IO a operation to "escape" Cassandra. The operations exported by your module should all run in the Cassandra monad (e.g. doSomethingInterestingInCassandra :: Int -> Bool -> Cassandra Char), and their definition can access the wrapped CassandraState.
If your users need some additional state for their application, they can always wrap a monad transformer around Cassandra, e.g. StateT MyState Cassandra.

Use of Haskell state monad a code smell?

God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?
I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
empyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
-> Graph -- code in the 'then' branch
-> Graph -- code in the 'else' branch
-> Graph -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
An fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.
Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?
With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree, this could be the generated code, or all comments, or whatever else your interested in.
Inherited attributes are input to your syntax tree, this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates haskell code (.hs files) from the provided .ag files.
Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attributes grammars generate for you, for example:
data AbstractSyntax = Literal Int | Block AbstractSyntax
| Comment String AbstractSyntax
compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _ = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
in (code, s : comments')
generateCode :: Int -> Code
labelCode :: Label -> Code -> Code
It's possible that you may want an applicative functor instead of a
monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html
I don't think using the State Monad is a code smell when it used to model state.
If you need to thread state through your functions,
you can do this explicitly, taking the the state as an argument and returning it in each function.
The State Monad offers a good abstraction: it passes the state along for you and
provides lots of useful function to combine functions that require state.
In this case, using the State Monad (or Applicatives) is not a code smell.
However, if you use the State Monad to emulate an imperative style of programming
while a functional solution would suffice, you are just making things complicated.
In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
Well, don't use monads. The power of functional programming is function purity and their reuse. There's this paper a professor of mine once wrote and he's one of the guys who helped build Haskell.
The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.
let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.

Resources