looking for a cached function call mechanism - haskell

I'm calling a database (EventStore) that recommend using the same connection for the entire life span of your app. I want to implement a cached call for that, but the only thing I'm finding is memoization caching in that way (lib io-memoize) :
import Database.EventStore
import System.IO.Memoize
getCachedEventStoreConnection :: Settings -> ConnectionType -> IO (IO (Connection))
getCachedEventStoreConnection settings connectionType = once $ connect settings connectionType
What I would like is more a signature like that :
getCachedEventStoreConnection :: Settings -> ConnectionType -> IO Connection
otherwise I'm obliged to keep that IO (IO (Connection)) as a "global fct" that I'm passing everywhere which is bad for modularity...

otherwise I'm obliged to keep that IO (IO (Connection)) as a "global fct" that I'm passing everywhere which is bad for modularity.
Unfortunately, caching calls doesn't help eliminate function arguments: instead of passing around the result of a cached call, you must pass around the cache instead. There's no getting around it; part of the hair shirt you wear when choosing Haskell is that all the data a function wants to use must be made explicit in its type, so if part of your application needs a database Connection, there's nothing for it but to pass a Connection to that part of your application (and, by extension, all its callers).
There is some sugar like ReaderT you can sprinkle around to make things a bit more convenient, making it appear as though you're not passing around function arguments, but at the end of the day that's exactly what they're doing under the hood.
However, I reject your claim that this is bad for modularity. If you did have an implicit cache, this would break modularity: you would not be able to lift that function out of this application into a library and use it in many applications without also lifting the cache out. That is, the cache and any operations that use it become coupled -- one must lift them all or none, the exact opposite of modularity.* If the database connection is a function argument instead of an implicit cache, it can be lifted independently of lifting the chunk of code that creates the connection once at app startup.
* And suppose you do lift out all the operations and the implicit cache into a library. Now two downstream libraries depend on and use yours; do you get two caches, that must be separately initialized and therefore is maybe less efficient, or do you get one shared cache which therefore bleeds effects from the one library into the other and therefore is maybe less correct? A difficult choice -- one that should have to be made carefully and explicitly by the downstream users, not by the library with the cache in it.

Related

Getting value out of context or leaving them in?

Assume code below. Is there a quicker way to get the contextual values out of findSerial rather than writing a function like outOfContext?
The underlying question is: does one usually stick within context and use Functors, Applicatives, Monoids and Monads to get the job done, or is it better to take it out of context and apply the usual non-contextual computation methods. In brief: don't want to learn Haskell all wrong, since it takes time enough as it does.
import qualified Data.Map as Map
type SerialNumber = (String, Int)
serialList :: Map.Map String SerialNumber
serialList = Map.fromList [("belt drive",("BD",0001))
,("chain drive",("CD",0002))
,("drive pulley",("DP",0003))
,("drive sprocket",("DS",0004))
]
findSerial :: Ord k => k -> Map.Map k a -> Maybe a
findSerial input = Map.lookup input
outOfContext (Just (a, b)) = (a, b)
Assuming I understand it correctly, I think your question essentially boils down to “Is it idiomatic in Haskell to write and use partial functions?” (which your outOfContext function is, since it’s just a specialized form of the built-in partial function fromJust). The answer to that question is a resounding no. Partial functions are avoided whenever possible, and code that uses them can usually be refactored into code that doesn’t.
The reason partial functions are avoided is that they voluntarily compromise the effectiveness of the type system. In Haskell, when a function has type X -> Y, it is generally assumed that providing it an X will actually produce a Y, and that it will not do something else entirely (i.e. crash). If you have a function that doesn’t always succeed, reflecting that information in the type by writing X -> Maybe Y forces the caller to somehow handle the Nothing case, and it can either handle it directly or defer the failure further to its caller (by also producing a Maybe). This is great, since it means that programs that typecheck won’t crash at runtime. The program might still have logical errors, but knowing before even running the program that it won’t blow up is still pretty nice.
Partial functions throw this guarantee out the window. Any program that uses a partial function will crash at runtime if the function’s preconditions are accidentally violated, and since those preconditions are not reflected in the type system, the compiler cannot statically enforce them. A program might be logically correct at the time of its writing, but without enforcing that correctness with the type system, further modification, extension, or refactoring could easily introduce a bug by mistake.
For example, a programmer might write the expression
if isJust n then fromJust n else 0
which will certainly never crash at runtime, since fromJust’s precondition is always checked before it is called. However, the type system cannot enforce this, and a further refactoring might swap the branches of the if, or it might move the fromJust n to a different part of the program entirely and accidentally omit the isJust check. The program will still compile, but it may fail at runtime.
In contrast, if the programmer avoids partial functions, using explicit pattern-matching with case or total functions like maybe and fromMaybe, they can replace the tricky conditional above with something like
fromMaybe 0 n
which is not only clearer, but ensures any accidental misuse will simply fail to typecheck, and the potential bug will be detected much earlier.
For some concrete examples of how the type system can be a powerful ally if you stick exclusively to total functions, as well as some good food for thought about different ways to encode type safety for your domain into Haskell’s type system, I highly recommend reading Matt Parsons’s wonderful blog post, Type Safety Back and Forth, which explores these ideas in more depth. It additionally highlights how using Maybe as a catch-all representation of failure can be awkward, and it shows how the type system can be used to enforce preconditions to avoid needing to propagate Maybe throughout an entire system.

What is the analogue of ConcurrentHashMap in Haskell?

update: please, bear in mind, I'm just started learning Haskell
Let's say we're writing an application with the following general functionality:
when starting, it gathers some data from an external source;
this data are a set of complex structures which contain lists,
arrays, ints, string, etc.;
when running, the application serves web API (servlets) that provides
access to the data.
Now, if the application would be written in Java, we could use static ConcurrentHashMap object where the data could be stored (representing Java classes). So that, during start, the app could fill the map with data, and then servlets could access it providing some API to the clients.
If the application would be written in Erlang, we could use ETS/DETS for storing the data (as native Erlang structures).
Now the question: what is the proper Haskell way for implementing such design?
It shouldn't be DB, it should be some sort of a lightweight in-memory something, that could store complex structures (Haskell native structures), and that could be accessible from different threads (servlets, talking by Java-world entities). In Haskell: no static global vars as in Java, no ETS and OTP as in Erlang, - so how to do it the right way (with no using external solutions like Redis)?
Thanks
update: another important part of the question - since Haskell doesn't (?) have 'global static' variables, then what would be the right way for implementing this 'global accessible' data keeping object (say, it is "stm-containers")? Should I initialize it somewhere in the 'main' function and then just pass it to every REST API handler? Or is there any other more correct way?
It's not clear from your question whether the client API will provide ways of mutating the data.
If not (i.e., the API will only be about querying), then any immutable data-structure will suffice, since one beauty of immutable data is that it can be accessed from multiple threads safely with you being sure that it can't change. No need for the overhead of locks or other strategies for working with concurrency. You'll simply construct the immutable data during the initialisation and then just query it. For this consider a package like "unordered-containers".
If your API will also be mutating the data, then you will need mutable data-structures, which are optimised for concurrency. "stm-containers" is one package, which provides those.
First off, I'm going to assume you mean it needs to be available to multiple threads, not multiple processes. (The difference being that threads share memory, processes do not.) If that assumption is wrong, much of your question doesn't make sense.
So, the first important point: Haskell has mutable data structures. They can easily be shared between threads. Here's a small example:
import Control.Concurrent
import Control.Monad
main :: IO ()
main = do
v <- newMVar 0 :: IO (MVar Int)
forkIO . forever $ do
x <- takeMVar v
putMVar v $! x + 1
forM_ [1..10] $ \_ -> do
x <- readMVar v
threadDelay 100
print x
Note the use of ($!) when putting the value in the MVar. MVars don't enforce that their contents are evaluated. There's some subtlety in making sure everything works properly. You will get lots of space leaks until you understand Haskell's evaluation model. That's part of why this sort of thing is usually done in a library that handles all those details.
Given this, the first pass approach is to just store a map of some sort in an MVar. Unless it's under a lot of contention, that actually has pretty good performance properties.
When it is under contention, you have a good fallback secondary approach, especially when using a hash map. That's striping. Instead of storing one map in one MVar, use N maps in N MVars. The first step in a lookup is using the hash to determine which of the N MVars to look in.
There are fancy lock-free algorithms, which could be implemented using finer-grained mutable values. But in general, they are a lot of engineering effort for a few percent improvement in performance that doesn't really matter in most use cases.

Undo records in database

I am creating a system that needs to store all the functions and parameters a user has run in a database. No records are ever deleted, but I need to be able to recreate the minimal function sequence and parameter set for deterministic regeneration.
The users interaction is very minimal, they are not programming - input interaction is handled in C++ is passed through the FFI as data to accumulate into lists and callback to process the current buffer of data. The function triggers a series of decisions on how to wire a processing graph of sets of data within the database, and functions they are input to. The graph is acyclic. This graph is initially run and values are visualized for the user. Later portions of the graph will be recombined to generate new graphs.
Haskell internal construction of these graphs is created from analysis of data in the database and simple random choices amongst combinations. I'd like to be able to just store a seed of a random generator, the module and parameter id to which it applies.
I think this may be best framed as storing the functions of a EDSL in a database, where only the highlevel interaction is stored but is fully deterministic.
I am not interested in storing the values, but rather the function graph of the action.
Each table refers to different function. Each record has a date and a task ID to group all the functions of specific actions to gether. The parameters reference a Table ID and Record ID. If a composed function is internally doing something like generating a random number, the seed for that number should be automatically stored.
I am using GHC stage 1 with no GHCI and Persistent SQlite.
I am still new to Haskell and am looking to find out what approach and packages would be appropriate for tackling this problem in a functional manner.
If you want to do this for source-level functions, such as:
myFoo x y = x + y
you are pretty much out of luck, unless you want to go hacking around in the compiler. However, you could define your own notion of function that does support this, with some suitable annotations. Let's call this notion a UserAction a, where a is the return type of the action. In order to compose computations in UserAction, it should be a Monad. Not thinking too awfully hard, my first impression would be to use this stack of monad transformers:
type UserAction = WriterT [LogEntry] (ReaderT FuncIdentifier IO)
The WriterT [LogEntry] component says that a UserAction, when run, produces a sequence of LogEntrys [1], which contain the information you want to write to the database; something like:
data LogEntry = Call FuncIdentifier FuncIdentifier
It's okay to put off storing the random seed, task identifier, etc. for now -- that can be incorporated into this design by adding information to LogEntry.
The ReaderT FuncIdentifier component says that a UserAction depends on a FuncIdentifier; namely, the identifier of the function that is calling it.
FuncIdentifier could be implemented by something as simple as
type FuncIdentifier = String
or you use something with more structure, if you like.
The IO component says that UserActions can do arbitrary input and output to files, the console, spawn threads, the whole lot. If your actions don't need this, don't use it (use Identity instead). But since you mentioned generating random numbers, I figured you did not have pure computations in mind[2].
Then you would annotate each action you want to record logs for with a function like this:
userAction :: FuncIdentifier -> UserAction a -> UserAction a
which would be used like so:
randRange :: (Integer, Integer) -> UserAction Integer
randRange (low,hi) = userAction "randRange" $ do
-- implementation
userAction would record the call and set up its callees to record their calls; e.g. something like:
userAction func action = do
caller <- ask
-- record the current call
tell [Call caller func]
-- Call the body of this action, passing the current identifier as its caller.
local (const func) action
From the top level, run the desired action and after it has finished, collect up all the LogEntrys and write them to the database.
If you need the calls to be written in real time as the code is executing, a different UserAction monad would be needed; but you could still present the same interface.
This approach uses some intermediate Haskell concepts such as monad transformers. I suggest going on IRC to irc.freenode.net #haskell channel to ask for guidance on filling out the details of this implementation sketch. They are a kind bunch and will happily help you learn :-).
[1] In practice you will not want to use [LogEntry] but rather DList LogEntry for performance. But the change is easy, and I suggest you go with [LogEntry] until you get more comfortable with Haskell, then switch over to DList.
[2] Random number generation can be done purely, but it takes further brain-rewiring which this sketch already has plenty of, so I suggest just treating it as an IO effect for the purpose of getting going.

How do you structure a stateful module in Haskell?

I'm looking to write a generic module that allows Haskell programs to interact with Cassandra. The module will need to maintain its own state. For example, it will have a connection pool and a list of callbacks to be invoked when a new record is saved. How should I structure the code so that this module can maintain its state? Here are some of the approaches I've been considering. Am I on the right track? (I'm new to Haskell and still learning the best ways to think functionally.)
Option 1:
The module runs in a (StateT s IO) monad, where s is the global state for the entire program using the Cassandra module. Of course, since the Cassandra module could be used by multiple programs, the details of what's in s should be invisible to the Cassandra module. The module would have to export a type class that allowed it to extract the CassandraState from s and push a new CassandraState back into s. Then, any program using the module would have to make its main state a member of this type class.
Option 2:
The module runs in a (StateT CassandraState IO) monad. Every time someone calls an action in the module, they would have to extract the CassandraState from wherever they have it stashed off, invoke the action with runState, and take the resulting state and stash it off again (wherever).
Option 3:
Don't put the Cassandra module's functions in a StateT monad at all. Instead, have the caller explicitly pass in CassandraState's when needed. The problem with option 2 is that not all of the functions in the module will modify the state. For example, obtaining a connection will modify the state and will require the caller to stash off the resulting state. But, saving a new record needs to read the state (to get the callbacks), but it doesn't need to change the state. Option 2 doesn't give the caller any hint that connect changes the state while create doesn't.
But, if I move away from using the StateT monad and just have functions that take in states as parameters and return either simple values or tuples of simple values and new states, then it's really obvious to the caller when the state needs to be saved off. (Under the covers in my module, I'd take the incoming states and build them into a (StateT CassandraState IO) monad, but the details of this would be hidden from the caller. So, to the caller, the interface is very explicit, but under the covers, it's just Option 2.)
Option 4:
Something else?
This problem must come up quite often when building reusable modules. Is there some sort of standard way to solve it?
(By the way, if someone knows a better way to interact with Cassandra from Haskell than using Thrift, please let me know! Maybe I don't have to write this at all. :-)
Something like the HDBC model would be to have an explicit CassandraConnection data type. It has an MVar inside with some mutable state. Since all your actions are in IO anyway I'd imagine, they can just take the CassandraConnection as an argument to these actions. The user then can pack that connection into a state or reader monad, or thread it explicitly, or do whatever they want.
Internally you can use a monad or not -- that's really your call. However, I favor APIs that when possible don't force users into any particular monad unless truly necessary.
So this is a sort of version of option 3. But the user shouldn't really care whether or not they're changing the connection state -- at that level you can really hide the details from them.
I'd go with Option 2. Users of your module shouldn't use runState directly; instead, you should provide an opaque Cassandra type with an instance of the Monad typeclass and some runCassandra :: Cassandra a -> IO a operation to "escape" Cassandra. The operations exported by your module should all run in the Cassandra monad (e.g. doSomethingInterestingInCassandra :: Int -> Bool -> Cassandra Char), and their definition can access the wrapped CassandraState.
If your users need some additional state for their application, they can always wrap a monad transformer around Cassandra, e.g. StateT MyState Cassandra.

Is there a good way to QuickCheck Happstack.State methods?

I have a set of Happstack.State MACID methods that I want to test using QuickCheck, but I'm having trouble figuring out the most elegant way to accomplish that. The problems I'm running into are:
The only way to evaluate an Ev monad computation is in the IO monad via query or update.
There's no way to create a purely in-memory MACID store; this is by design. Therefore, running things in the IO monad means there are temporary files to clean up after each test.
There's no way to initialize a new MACID store except with the initialValue for the state; it can't be generated via Arbitrary unless I expose an access method that replaces the state wholesale.
Working around all of the above means writing methods that only use features of MonadReader or MonadState (and running the test inside Reader or State instead of Ev. This means forgoing the use of getRandom or getEventClockTime and the like inside the method definitions.
The only options I can see are:
Run the methods in a throw-away on-disk MACID store, cleaning up after each test and settling for starting from initialValue each time.
Write the methods to have most of the code run in a MonadReader or MonadState (which is more easily testable), and rely on a small amount of non-QuickCheck-able glue around it that calls getRandom or getEventClockTime as necessary.
Is there a better solution that I'm overlooking?
You might checkout out the quickcheck properties that are included with happstack-state:
http://patch-tag.com/r/mae/happstack/snapshot/current/content/pretty/happstack-state/tests/Happstack/State/Tests
If you are just doing testing, and you want a throw-away data store, then you can use the memory saver, which just stores the state, event files, and checkpoints in RAM. If you lose power, then all your state would be lost. That is fine for tests, but not for a real live server. That message you linked to was talk about real live servers, not just testing.
That won't help with the initialValue issue, but it does make option 1 easier since you don't have to do any disk cleanup.
To replace the initialValue, you would need to create your own method that replaces the current state wholesale.
something like:
newState :: YourState -> Update YourState ()
newState st = put st
or something.
jeremy
If you write your functions as polymorphic over MonadState (or MonadReader for queries) it can be a lot easier to set up a test harness with runState/runReader.
The happstack TH code generators are fine with signatures like that, from what I remember.

Resources