I'm looking to write a generic module that allows Haskell programs to interact with Cassandra. The module will need to maintain its own state. For example, it will have a connection pool and a list of callbacks to be invoked when a new record is saved. How should I structure the code so that this module can maintain its state? Here are some of the approaches I've been considering. Am I on the right track? (I'm new to Haskell and still learning the best ways to think functionally.)
Option 1:
The module runs in a (StateT s IO) monad, where s is the global state for the entire program using the Cassandra module. Of course, since the Cassandra module could be used by multiple programs, the details of what's in s should be invisible to the Cassandra module. The module would have to export a type class that allowed it to extract the CassandraState from s and push a new CassandraState back into s. Then, any program using the module would have to make its main state a member of this type class.
Option 2:
The module runs in a (StateT CassandraState IO) monad. Every time someone calls an action in the module, they would have to extract the CassandraState from wherever they have it stashed, invoke the action with runStateT, and take the resulting state and stash it again (wherever).
Option 3:
Don't put the Cassandra module's functions in a StateT monad at all. Instead, have the caller explicitly pass in CassandraState values when needed. The problem with option 2 is that not all of the functions in the module will modify the state. For example, obtaining a connection will modify the state and will require the caller to stash the resulting state. Saving a new record, on the other hand, needs to read the state (to get the callbacks) but doesn't need to change it. Option 2 doesn't give the caller any hint that connect changes the state while create doesn't.
But, if I move away from using the StateT monad and just have functions that take in states as parameters and return either simple values or tuples of simple values and new states, then it's really obvious to the caller when the state needs to be saved off. (Under the covers in my module, I'd take the incoming states and build them into a (StateT CassandraState IO) monad, but the details of this would be hidden from the caller. So, to the caller, the interface is very explicit, but under the covers, it's just Option 2.)
Option 4:
Something else?
This problem must come up quite often when building reusable modules. Is there some sort of standard way to solve it?
(By the way, if someone knows a better way to interact with Cassandra from Haskell than using Thrift, please let me know! Maybe I don't have to write this at all. :-)
Something like the HDBC model would be to have an explicit CassandraConnection data type. It has an MVar inside with some mutable state. Since all your actions are in IO anyway I'd imagine, they can just take the CassandraConnection as an argument to these actions. The user then can pack that connection into a state or reader monad, or thread it explicitly, or do whatever they want.
Internally you can use a monad or not -- that's really your call. However, I favor APIs that when possible don't force users into any particular monad unless truly necessary.
So this is a sort of version of option 3. But the user shouldn't really care whether or not they're changing the connection state -- at that level you can really hide the details from them.
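A minimal sketch of that handle-based design, using only base. The state fields and the function names (newConnection, onSave, acquire, create) are made up for illustration and the "pool" is just a list of slot numbers; a real module would hold Thrift connections instead.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar, readMVar)

-- Hypothetical internal state; the field names are illustrative only.
data CassandraState = CassandraState
  { freeSlots :: [Int]               -- stand-in for pooled connections
  , callbacks :: [String -> IO ()]   -- run whenever a record is saved
  }

-- Opaque handle: a real module would export the type but not the constructor.
newtype CassandraConnection = CassandraConnection (MVar CassandraState)

newConnection :: Int -> IO CassandraConnection
newConnection n = CassandraConnection <$> newMVar (CassandraState [1 .. n] [])

-- Mutates the state, but the caller never has to stash a new state value.
onSave :: CassandraConnection -> (String -> IO ()) -> IO ()
onSave (CassandraConnection var) cb =
  modifyMVar_ var $ \st -> pure st { callbacks = cb : callbacks st }

-- Takes a slot from the pool if one is free.
acquire :: CassandraConnection -> IO (Maybe Int)
acquire (CassandraConnection var) = modifyMVar var $ \st ->
  pure $ case freeSlots st of
    []       -> (st, Nothing)
    (s : ss) -> (st { freeSlots = ss }, Just s)

-- Only reads the state; a real version would also write to the server.
create :: CassandraConnection -> String -> IO ()
create (CassandraConnection var) record = do
  st <- readMVar var
  mapM_ ($ record) (callbacks st)

main :: IO ()
main = do
  conn <- newConnection 2
  onSave conn (\r -> putStrLn ("saved: " ++ r))
  slot <- acquire conn
  print slot          -- Just 1
  create conn "row-1" -- prints "saved: row-1"
```

Whether a call mutates the pool or only reads it is invisible in the types here, which is the point: the caller just holds on to one CassandraConnection.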
I'd go with Option 2. Users of your module shouldn't use runStateT directly; instead, you should provide an opaque Cassandra type with an instance of the Monad typeclass and some runCassandra :: Cassandra a -> IO a operation to "escape" Cassandra. The operations exported by your module should all run in the Cassandra monad (e.g. doSomethingInterestingInCassandra :: Int -> Bool -> Cassandra Char), and their definitions can access the wrapped CassandraState.
If your users need some additional state for their application, they can always wrap a monad transformer around Cassandra, e.g. StateT MyState Cassandra.
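Here is a rough sketch of what such an opaque monad could look like, assuming the transformers package (which ships with GHC). The contents of CassandraState and the operations connect, onSave and create are placeholders and not a real Cassandra API.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, gets, modify')

-- Hypothetical state; a real module would keep a connection pool here.
data CassandraState = CassandraState
  { openConnections :: Int
  , callbacks       :: [String -> IO ()]
  }

-- Export the type but not the constructor, so callers cannot reach the StateT.
newtype Cassandra a = Cassandra (StateT CassandraState IO a)
  deriving (Functor, Applicative, Monad, MonadIO)

-- The only way out of the monad; the state never leaks to the caller.
runCassandra :: Cassandra a -> IO a
runCassandra (Cassandra m) = evalStateT m (CassandraState 0 [])

-- Modifies the wrapped state and returns the number of open connections.
connect :: Cassandra Int
connect = Cassandra $ do
  modify' (\st -> st { openConnections = openConnections st + 1 })
  gets openConnections

onSave :: (String -> IO ()) -> Cassandra ()
onSave cb = Cassandra $ modify' (\st -> st { callbacks = cb : callbacks st })

-- Only reads the wrapped state, then runs the callbacks.
create :: String -> Cassandra ()
create record = do
  cbs <- Cassandra (gets callbacks)
  liftIO (mapM_ ($ record) cbs)

main :: IO ()
main = do
  n <- runCassandra $ do
    onSave (\r -> putStrLn ("saved: " ++ r))
    _ <- connect
    create "row-1"   -- prints "saved: row-1"
    connect
  print n            -- 2
```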
I'm calling a database (EventStore) that recommends using the same connection for the entire lifespan of the app. I want to implement a cached call for that, but the only thing I've found is memoization-style caching like this (using the io-memoize library):
import Database.EventStore
import System.IO.Memoize
getCachedEventStoreConnection :: Settings -> ConnectionType -> IO (IO (Connection))
getCachedEventStoreConnection settings connectionType = once $ connect settings connectionType
What I would like is a signature more like this:
getCachedEventStoreConnection :: Settings -> ConnectionType -> IO Connection
Otherwise I'm obliged to keep that IO (IO (Connection)) as a "global function" that I pass everywhere, which is bad for modularity...
otherwise I'm obliged to keep that IO (IO (Connection)) as a "global fct" that I'm passing everywhere which is bad for modularity.
Unfortunately, caching calls doesn't help eliminate function arguments: instead of passing around the result of a cached call, you must pass around the cache instead. There's no getting around it; part of the hair shirt you wear when choosing Haskell is that all the data a function wants to use must be made explicit in its type, so if part of your application needs a database Connection, there's nothing for it but to pass a Connection to that part of your application (and, by extension, all its callers).
There is some sugar like ReaderT you can sprinkle around to make things a bit more convenient, making it appear as though you're not passing around function arguments, but at the end of the day that's exactly what they're doing under the hood.
However, I reject your claim that this is bad for modularity. If you did have an implicit cache, this would break modularity: you would not be able to lift that function out of this application into a library and use it in many applications without also lifting the cache out. That is, the cache and any operations that use it become coupled -- one must lift them all or none, the exact opposite of modularity.* If the database connection is a function argument instead of an implicit cache, it can be lifted independently of lifting the chunk of code that creates the connection once at app startup.
* And suppose you do lift out all the operations and the implicit cache into a library. Now two downstream libraries depend on and use yours; do you get two caches, which must be separately initialized and are therefore maybe less efficient, or do you get one shared cache, which bleeds effects from one library into the other and is therefore maybe less correct? A difficult choice -- one that should be made carefully and explicitly by the downstream users, not by the library with the cache in it.
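To make the ReaderT remark concrete, here is a small sketch assuming the transformers package. Connection, connect, sendEvent and eventsSent are stand-ins invented for the example (the "connection" is only a counter); they are not the EventStore client API.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a real database connection.
newtype Connection = Connection (IORef Int)   -- counts events sent

-- Stand-in for the expensive call that should happen once per application.
connect :: IO Connection
connect = Connection <$> newIORef 0

-- Every action in App can get at the connection without naming it as an argument.
type App = ReaderT Connection IO

sendEvent :: String -> App ()
sendEvent _payload = do
  Connection counter <- ask
  liftIO (modifyIORef' counter (+ 1))

eventsSent :: App Int
eventsSent = do
  Connection counter <- ask
  liftIO (readIORef counter)

main :: IO ()
main = do
  conn <- connect   -- created exactly once, at startup
  n <- runReaderT (sendEvent "a" >> sendEvent "b" >> eventsSent) conn
  print n           -- 2
```

Under the hood App a is just Connection -> IO a, so sendEvent can still be lifted into a library independently of the code that calls connect.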
I am writing a game in Haskell in which the player and the AI take actions in turns. Until now, the AIs worked by generating actions using all the information about the game, i.e. they were functions of the form GameHistory -> GameState -> Action.
This way these functions generate some information they need from the history each time they are called. It would be a lot easier to write AIs if they could have some kind of "internal state" which persists between their turns (i.e. calls to the corresponding function). How could one implement something like that? (By the way, I should also take into account that internal states of different kinds of AIs could have different types.)
What you're looking for might be something like:
newtype AI = AI { runAI :: GameState -> (AI, Action) }
i.e. you'll return your actor's new state along with the action. You might make use of the State monad here. You might also be interested in reading about automata. If you need to serialize your AI (to store it in a database, say) then you might need to do something different.
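As a small illustration of how that newtype hides the internal state, here is a sketch with two AIs whose private states have different types (an Int and a Bool) but which share the single AI type. GameState, Action, counterAI, cautiousAI and playTurns are all made up for the example.

```haskell
-- Hypothetical game types; GameState here is just the AI's hit points.
type GameState = Int
data Action = Attack | Defend deriving (Show, Eq)

newtype AI = AI { runAI :: GameState -> (AI, Action) }

-- Internal state: a turn counter. Attacks on every third turn.
counterAI :: Int -> AI
counterAI n = AI $ \_gs ->
  let action = if n `mod` 3 == 2 then Attack else Defend
  in (counterAI (n + 1), action)

-- Internal state: whether hit points were low on the previous turn.
cautiousAI :: Bool -> AI
cautiousAI wasHurt = AI $ \hp ->
  (cautiousAI (hp < 5), if wasHurt then Defend else Attack)

-- Feed successive game states to an AI, threading its new state each turn.
playTurns :: AI -> [GameState] -> [Action]
playTurns _ [] = []
playTurns ai (gs : rest) =
  let (ai', act) = runAI ai gs
  in act : playTurns ai' rest

main :: IO ()
main = do
  print (playTurns (counterAI 0) [10, 10, 10, 10])   -- [Defend,Defend,Attack,Defend]
  print (playTurns (cautiousAI False) [10, 3, 10])   -- [Attack,Attack,Defend]
```

The state lives in the closure returned each turn, which is also why this representation cannot be serialized directly.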
I have an interface (WX) which is based on Reactive Banana.
Now I have several questions about how to manage state:
Should I consider the state as the Behaviors that I define in the code?
If the state also depends on external "events", not only GUI-related ones, would it be better to use an IORef?
Or can I use the State monad? All the examples I have seen so far define the network in IO. Does it make sense to stack the State monad on top, and if so, how? With Moment?
Should I consider the state as the Behaviors that I define in the code?
For most scenarios you will indeed want to use Behaviors for state. In a GUI application you will often want to update your state in response to interface events. In addition, and crucially, the state must persist between occurrences of the events, and State doesn't allow that. More specifically, the standard way to react to an event occurrence by doing something other than updating a Behavior is through the reactimate function:
reactimate :: Frameworks t => Event t (IO ()) -> Moment t ()
The action to be performed is of type IO (). While it is possible to use runStateT to run a StateT s IO computation using reactimate, the computation will be self-contained, and you won't have the state it used available to be passed elsewhere. This problem does not arise when using Events to update Behaviors through the reactive-banana FRP interface: the Behaviors remain there until you need to use them again.
If the state also depends on external "events", not only GUI-related ones, would it be better to use an IORef?
Not necessarily. In many cases you can use the tools in Reactive.Banana.Frameworks such as fromAddHandler and newEvent to create Events that are fired when external I/O actions happen. That way you can integrate such actions into your event network. One typical example would be a timer: reactive-banana has no built-in notion of time, but you can introduce a tick event that is fired through an I/O action that happens at regular intervals.
That said, in some cases you might still want to use...
... IORefs (or other sorts of mutable variables, such as MVars), if you have to use a library with an interface that, for whatever reason, restricts your ability to freely react to events using Behaviors and reactimate. A while ago there was a very nice question about such a scenario involving hArduino. The two answers there show different, yet similar in spirit, ways to have a useful event network in unfavourable circumstances.
... StateT if you have some stateful algorithm that is self-contained and whose results won't be used elsewhere in your event network, so that you can run it with runStateT and stick it in a reactimate call. Silly example: an IO () action in reactimate along these lines:
displayMessageBox . show =<< evalStateT someStateComputation initialState
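To show what "self-contained" means for that last case, here is a sketch of such an IO () action outside of any GUI, assuming the transformers package. someStateComputation is a toy that sums a few numbers, and putStrLn stands in for displayMessageBox; in a real program the handler action would be what gets passed to reactimate.

```haskell
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)

-- Hypothetical stateful algorithm: keeps a running total while summing 1..4.
someStateComputation :: StateT Int IO Int
someStateComputation = do
  mapM_ step [1 .. 4]
  get
  where
    step :: Int -> StateT Int IO ()
    step k = do
      total <- get
      put (total + k)

-- The state starts at 0, is used internally, and is thrown away afterwards;
-- only the final result escapes into IO.
handler :: IO ()
handler = putStrLn . show =<< evalStateT someStateComputation 0

main :: IO ()
main = do
  handler   -- prints 10
  handler   -- prints 10 again: nothing persists between the two runs
```

The second call printing the same value is exactly the limitation described above: the StateT state does not survive from one event occurrence to the next.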
I am creating a system that needs to store all the functions and parameters a user has run in a database. No records are ever deleted, but I need to be able to recreate the minimal function sequence and parameter set for deterministic regeneration.
The user's interaction is minimal; they are not programming. Input is handled in C++ and passed through the FFI as data that accumulates into lists, along with a callback that processes the current buffer of data. The function triggers a series of decisions on how to wire a processing graph of data sets within the database and the functions they feed into. The graph is acyclic. It is initially run and its values are visualized for the user. Later, portions of the graph will be recombined to generate new graphs.
Internally, the Haskell code constructs these graphs from analysis of data in the database and simple random choices among combinations. I'd like to be able to store just the seed of a random generator, together with the module and parameter ID to which it applies.
I think this may be best framed as storing the functions of an EDSL in a database, where only the high-level interaction is stored but is fully deterministic.
I am not interested in storing the values, but rather the function graph of the action.
Each table refers to a different function. Each record has a date and a task ID to group all the functions of a specific action together. The parameters reference a Table ID and Record ID. If a composed function is internally doing something like generating a random number, the seed for that number should be automatically stored.
I am using GHC stage 1 with no GHCi and Persistent SQLite.
I am still new to Haskell and am looking to find out what approach and packages would be appropriate for tackling this problem in a functional manner.
If you want to do this for source-level functions, such as:
myFoo x y = x + y
you are pretty much out of luck, unless you want to go hacking around in the compiler. However, you could define your own notion of function that does support this, with some suitable annotations. Let's call this notion a UserAction a, where a is the return type of the action. In order to compose computations in UserAction, it should be a Monad. Not thinking too awfully hard, my first impression would be to use this stack of monad transformers:
type UserAction = WriterT [LogEntry] (ReaderT FuncIdentifier IO)
The WriterT [LogEntry] component says that a UserAction, when run, produces a sequence of LogEntrys [1], which contain the information you want to write to the database; something like:
data LogEntry = Call FuncIdentifier FuncIdentifier
It's okay to put off storing the random seed, task identifier, etc. for now -- that can be incorporated into this design by adding information to LogEntry.
The ReaderT FuncIdentifier component says that a UserAction depends on a FuncIdentifier; namely, the identifier of the function that is calling it.
FuncIdentifier could be implemented by something as simple as
type FuncIdentifier = String
or you could use something with more structure, if you like.
The IO component says that UserActions can do arbitrary input and output to files, the console, spawn threads, the whole lot. If your actions don't need this, don't use it (use Identity instead). But since you mentioned generating random numbers, I figured you did not have pure computations in mind[2].
Then you would annotate each action you want to record logs for with a function like this:
userAction :: FuncIdentifier -> UserAction a -> UserAction a
which would be used like so:
randRange :: (Integer, Integer) -> UserAction Integer
randRange (low,hi) = userAction "randRange" $ do
  -- implementation
userAction would record the call and set up its callees to record their calls; e.g. something like:
userAction func action = do
  caller <- ask
  -- record the current call
  tell [Call caller func]
  -- Call the body of this action, passing the current identifier as its caller.
  local (const func) action
From the top level, run the desired action and after it has finished, collect up all the LogEntrys and write them to the database.
If you need the calls to be written in real time as the code is executing, a different UserAction monad would be needed; but you could still present the same interface.
This approach uses some intermediate Haskell concepts such as monad transformers. I suggest going on IRC to irc.freenode.net #haskell channel to ask for guidance on filling out the details of this implementation sketch. They are a kind bunch and will happily help you learn :-).
[1] In practice you will not want to use [LogEntry] but rather DList LogEntry for performance. But the change is easy, and I suggest you go with [LogEntry] until you get more comfortable with Haskell, then switch over to DList.
[2] Random number generation can be done purely, but it takes further brain-rewiring which this sketch already has plenty of, so I suggest just treating it as an IO effect for the purpose of getting going.
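Putting the pieces of the sketch together into something that compiles, assuming the mtl package: the actions midpoint and report, the helper runUserAction and the "top-level" identifier are invented for the example, and printing the log stands in for writing it to the database.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, ask, local, runReaderT)
import Control.Monad.Writer (WriterT, runWriterT, tell)

type FuncIdentifier = String

-- Call caller callee
data LogEntry = Call FuncIdentifier FuncIdentifier
  deriving (Show, Eq)

type UserAction = WriterT [LogEntry] (ReaderT FuncIdentifier IO)

userAction :: FuncIdentifier -> UserAction a -> UserAction a
userAction func action = do
  caller <- ask
  -- record the current call
  tell [Call caller func]
  -- run the body with this function as the caller of anything it invokes
  local (const func) action

-- Hypothetical annotated actions.
midpoint :: (Integer, Integer) -> UserAction Integer
midpoint (low, hi) = userAction "midpoint" $ pure ((low + hi) `div` 2)

report :: UserAction ()
report = userAction "report" $ do
  m <- midpoint (0, 10)
  liftIO (putStrLn ("midpoint: " ++ show m))

-- Run from the top level and hand back the collected log.
runUserAction :: UserAction a -> IO (a, [LogEntry])
runUserAction action = runReaderT (runWriterT action) "top-level"

main :: IO ()
main = do
  ((), entries) <- runUserAction report   -- prints "midpoint: 5"
  mapM_ print entries
  -- Call "top-level" "report"
  -- Call "report" "midpoint"
```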
I have a set of Happstack.State MACID methods that I want to test using QuickCheck, but I'm having trouble figuring out the most elegant way to accomplish that. The problems I'm running into are:
The only way to evaluate an Ev monad computation is in the IO monad via query or update.
There's no way to create a purely in-memory MACID store; this is by design. Therefore, running things in the IO monad means there are temporary files to clean up after each test.
There's no way to initialize a new MACID store except with the initialValue for the state; it can't be generated via Arbitrary unless I expose an access method that replaces the state wholesale.
Working around all of the above means writing methods that only use features of MonadReader or MonadState (and running the test inside Reader or State instead of Ev). This means forgoing the use of getRandom or getEventClockTime and the like inside the method definitions.
The only options I can see are:
Run the methods in a throw-away on-disk MACID store, cleaning up after each test and settling for starting from initialValue each time.
Write the methods to have most of the code run in a MonadReader or MonadState (which is more easily testable), and rely on a small amount of non-QuickCheck-able glue around it that calls getRandom or getEventClockTime as necessary.
Is there a better solution that I'm overlooking?
You might check out the QuickCheck properties that are included with happstack-state:
http://patch-tag.com/r/mae/happstack/snapshot/current/content/pretty/happstack-state/tests/Happstack/State/Tests
If you are just doing testing, and you want a throw-away data store, then you can use the memory saver, which just stores the state, event files, and checkpoints in RAM. If you lose power, then all your state would be lost. That is fine for tests, but not for a real live server. That message you linked to was talking about real live servers, not just testing.
That won't help with the initialValue issue, but it does make option 1 easier since you don't have to do any disk cleanup.
To replace the initialValue, you would need to create your own method that replaces the current state wholesale.
something like:
newState :: YourState -> Update YourState ()
newState st = put st
or something.
jeremy
If you write your functions as polymorphic over MonadState (or MonadReader for queries) it can be a lot easier to set up a test harness with runState/runReader.
The happstack TH code generators are fine with signatures like that, from what I remember.
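A sketch of that idea, assuming the mtl package and leaving Happstack out entirely: Counter, increment and reset are hypothetical methods, and the property is checked by hand on a few inputs here, though its type ([Int] -> Bool) is the kind of thing one would hand to QuickCheck.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State (MonadState, get, modify, put, runState)

-- Hypothetical application state.
newtype Counter = Counter Int deriving (Show, Eq)

-- Written against MonadState only, so it runs in any monad providing that
-- interface, including plain State in a test.
increment :: MonadState Counter m => Int -> m ()
increment n = modify (\(Counter c) -> Counter (c + n))

-- Returns the old value and resets the counter to zero.
reset :: MonadState Counter m => m Int
reset = do
  Counter c <- get
  put (Counter 0)
  pure c

-- Property: after any sequence of increments, reset returns their sum and
-- leaves the counter at zero. No IO and no on-disk store are involved.
prop_resetReturnsTotal :: [Int] -> Bool
prop_resetReturnsTotal ns =
  runState (mapM_ increment ns >> reset) (Counter 0) == (sum ns, Counter 0)

main :: IO ()
main = print (all prop_resetReturnsTotal [[], [1, 2, 3], [5, -5, 7]])   -- True
```

Because the starting state is an ordinary argument to runState, it can also be generated arbitrarily instead of always starting from initialValue.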