I am creating a system that needs to store all the functions and parameters a user has run in a database. No records are ever deleted, but I need to be able to recreate the minimal function sequence and parameter set for deterministic regeneration.
The user's interaction is very minimal; they are not programming. Input interaction is handled in C++ and passed through the FFI as data, to be accumulated into lists, with a callback to process the current buffer of data. The function triggers a series of decisions about how to wire a processing graph over sets of data within the database and the functions they are input to. The graph is acyclic. This graph is initially run and the values are visualized for the user. Later, portions of the graph will be recombined to generate new graphs.
The internal construction of these graphs in Haskell comes from analysis of data in the database plus simple random choices among combinations. I'd like to be able to store just the seed of a random generator, together with the module and parameter ID to which it applies.
I think this may be best framed as storing the functions of an EDSL in a database, where only the high-level interaction is stored, but it is fully deterministic.
I am not interested in storing the values, but rather the function graph of the action.
Each table refers to a different function. Each record has a date and a task ID to group together all the functions of a specific action. The parameters reference a table ID and a record ID. If a composed function internally does something like generating a random number, the seed for that number should be stored automatically.
I am using GHC stage 1 with no GHCi, and Persistent with SQLite.
I am still new to Haskell and am looking to find out what approach and packages would be appropriate for tackling this problem in a functional manner.
If you want to do this for source-level functions, such as:
myFoo x y = x + y
you are pretty much out of luck, unless you want to go hacking around in the compiler. However, you could define your own notion of function that does support this, with some suitable annotations. Let's call this notion a UserAction a, where a is the return type of the action. In order to compose computations in UserAction, it should be a Monad. Not thinking too awfully hard, my first impression would be to use this stack of monad transformers:
type UserAction = WriterT [LogEntry] (ReaderT FuncIdentifier IO)
The WriterT [LogEntry] component says that a UserAction, when run, produces a sequence of LogEntrys [1], which contain the information you want to write to the database; something like:
data LogEntry = Call FuncIdentifier FuncIdentifier
It's okay to put off storing the random seed, task identifier, etc. for now -- that can be incorporated into this design by adding information to LogEntry.
The ReaderT FuncIdentifier component says that a UserAction depends on a FuncIdentifier; namely, the identifier of the function that is calling it.
FuncIdentifier could be implemented by something as simple as
type FuncIdentifier = String
or you could use something with more structure, if you like.
The IO component says that UserActions can do arbitrary input and output to files, the console, spawn threads, the whole lot. If your actions don't need this, don't use it (use Identity instead). But since you mentioned generating random numbers, I figured you did not have pure computations in mind[2].
Then you would annotate each action you want to record logs for with a function like this:
userAction :: FuncIdentifier -> UserAction a -> UserAction a
which would be used like so:
randRange :: (Integer, Integer) -> UserAction Integer
randRange (low, hi) = userAction "randRange" $ do
  -- implementation
userAction would record the call and set up its callees to record their calls; e.g. something like:
userAction func action = do
  caller <- ask
  -- record the current call
  tell [Call caller func]
  -- call the body of this action, passing the current identifier as its caller
  local (const func) action
From the top level, run the desired action and after it has finished, collect up all the LogEntrys and write them to the database.
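Assembled into one self-contained sketch (using mtl's Control.Monad.Reader and Control.Monad.Writer; the randRange body here is a placeholder, since a real implementation would draw from a PRNG and log its seed):

```haskell
import Control.Monad.Reader (ReaderT, runReaderT, ask, local)
import Control.Monad.Writer (WriterT, runWriterT, tell)

type FuncIdentifier = String

data LogEntry = Call FuncIdentifier FuncIdentifier
  deriving (Eq, Show)

type UserAction = WriterT [LogEntry] (ReaderT FuncIdentifier IO)

userAction :: FuncIdentifier -> UserAction a -> UserAction a
userAction func action = do
  caller <- ask                -- who called us?
  tell [Call caller func]      -- record the call edge
  local (const func) action    -- our callees see us as their caller

randRange :: (Integer, Integer) -> UserAction Integer
randRange (low, _hi) = userAction "randRange" $
  return low  -- placeholder body

-- Run an action from the top level and collect all LogEntrys,
-- ready to be written to the database.
runUserAction :: UserAction a -> IO (a, [LogEntry])
runUserAction act = runReaderT (runWriterT act) "main"
```

Running runUserAction (randRange (1, 10)) yields the result together with [Call "main" "randRange"], which the top level can then persist.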
If you need the calls to be written in real time as the code is executing, a different UserAction monad would be needed; but you could still present the same interface.
This approach uses some intermediate Haskell concepts such as monad transformers. I suggest going on IRC to irc.freenode.net #haskell channel to ask for guidance on filling out the details of this implementation sketch. They are a kind bunch and will happily help you learn :-).
[1] In practice you will not want to use [LogEntry] but rather DList LogEntry for performance. But the change is easy, and I suggest you go with [LogEntry] until you get more comfortable with Haskell, then switch over to DList.
[2] Random number generation can be done purely, but it takes further brain-rewiring which this sketch already has plenty of, so I suggest just treating it as an IO effect for the purpose of getting going.
Is there an agreed-upon best practice for aggregating and handling typed errors across many layers of functions in a larger Haskell application?
From introductory texts and the Haskell Wiki, I gather that pure functions should be total; that is, errors should be values in their co-domain. Runtime exceptions cannot be completely avoided, but should be confined to IO and asynchronous computations.
How do I structure error handling in pure, synchronous functions? The standard advice is to use Either as return type, and then define an algebraic data type (ADT) for the errors a function might result in. For example:
data OrderError
= NoLineItems
| DeliveryInPast
| DeliveryMethodUnavailable
mkOrder :: OrderDate -> Customer -> [LineItem] -> DeliveryInfo -> Either OrderError Order
However, once I try to compose multiple error-producing functions together, each with their own error type, how do I compose the error types? I would like to aggregate all errors up to the UI layer of the application, where the errors are interpreted, potentially mapped to locale-specific error messages, and then presented to the user in a uniform way. Of course, this error presentation should not interfere with the functions in the domain ring of the application, which should be pure business logic.
I don't want to define an uber-type - one large ADT that contains all possible errors in the application; because that would mean (a) that all domain-level code would need to depend on this type, which destroys all modularity, and (b) this would create error types that are too large for any given function.
Alternatively, I could define a new error type in each combining function, and then map the individual errors to the combined error type: Say funA comes with error-ADT ErrorA, and funB with ErrorB. If funC, with error type ErrorC, applies both funA and funB, funC needs to map all error-cases from ErrorA and ErrorB to new cases that are all part of ErrorC. This seems to be a lot of boilerplate.
A third option could be that funC wraps the errors from funA and funB:
data ErrorC
= SomeErrorOfFunC
| ErrorsFromFunB ErrorB
| ErrorsFromFunA ErrorA
In this way, mapping gets easier, but the error handling in the UI-ring needs to know about the exact nesting of functions in the inner rings of the application. If I refactor the domain ring, I do need to touch the error unwrapping function in the UI.
I did find a similar question, but the answer using Control.Monad.Exception seems to suggest runtime exceptions rather than error return types. A detailed treatment of the problem seems to be this one by Matt Parsons. Yet the solution involves several GHC extensions, type-level programming, and lenses, which is quite a lot to digest for a newbie like me, who simply wants to write a decent application with proper "by the book" error handling using Haskell's expressive type system.
I have heard that PureScript's extensible records would allow combining error enums more easily. But in Haskell? Is there a straightforward best practice? If so, where can I find documentation or a tutorial on how to do it?
For your aggregatable error type, I suggest that you look at the validation package: a data type like Either, but with an accumulating Applicative.
The library is exactly one module, consisting of only a handful of definitions. The Validation type within the package is essentially (though not literally):
type Validation e a = Either (NonEmpty e) a
What's worth pointing out is that accumulation of errors is achieved through the applicative combinators, namely liftA2, liftA3, and the like. You cannot accumulate errors inside a monad comprehension, i.e. do notation:
user :: Int -> Validation DomainError User
userId :: User -> Int
post :: Int -> Validation DomainError Post
userAndPost = do
  u <- user 1
  p <- post . userId $ u
  return (u, p)
The applicative version on the other hand, may yield two errors:
userAndPostA2 = liftA2 (,) (user 1) (post 1)
The monad version of userAndPost above can never produce two errors for both the user and the post not being found; it's always one or the other. Applicatives, while theoretically recognized as being less powerful than monads, have unique advantages in some practices. Another edge an applicative has over a monad is concurrency. Taking the examples above once again, one can easily see why monads inside a comprehension can never be executed concurrently: fetching the post depends on the user ID from the fetched user, so the execution of one action depends on the result of the other, whereas the two actions in the applicative version are independent.
As for your concern about breaking code modularity by defining a single disjoint union type DomainError for all domain-level errors: I'd venture to say there is no better way to model it, provided that said domain-specific error type is only constructed and passed around by functions in the domain layer. Once, say, the HTTP layer calls a function from the domain layer, it would then need to translate the error from the domain layer into one of its own, e.g. via a mapping function similar to:
eDomainToHTTP :: DomainError -> HTTPError
eDomainToHTTP InvalidCredentials = Forbidden403
eDomainToHTTP UserNotFound = NotFound404
eDomainToHTTP PostNotFound = NotFound404
With one such function, you can easily transform any input -> Validation DomainError output to input -> Validation HTTPError output, thus preserving encapsulation and modularity within your codebase.
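To make the accumulation concrete, here is a minimal, hand-rolled stand-in for the library's type (the real package's Validation differs in detail but its Applicative has the same shape; DomainError, user, and post are made-up examples, with User and Post simplified to Int so the sketch is self-contained):

```haskell
import Control.Applicative (liftA2)
import Data.List.NonEmpty (NonEmpty (..))

-- Hand-rolled stand-in for the validation package's type.
data Validation e a = Failure (NonEmpty e) | Success a
  deriving Show

instance Functor (Validation e) where
  fmap _ (Failure es) = Failure es
  fmap f (Success a)  = Success (f a)

instance Applicative (Validation e) where
  pure = Success
  Failure es <*> Failure es' = Failure (es <> es')  -- both sides' errors survive
  Failure es <*> Success _   = Failure es
  Success _  <*> Failure es  = Failure es
  Success f  <*> Success a   = Success (f a)

data DomainError = UserNotFound | PostNotFound
  deriving (Eq, Show)

user :: Int -> Validation DomainError Int
user 1 = Success 1
user _ = Failure (UserNotFound :| [])

post :: Int -> Validation DomainError Int
post 1 = Success 1
post _ = Failure (PostNotFound :| [])
```

With this, liftA2 (,) (user 0) (post 0) evaluates to Failure (UserNotFound :| [PostNotFound]): both failures are reported, which no Monad-based composition could achieve.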
Is there a convention so that I know when to expect runX versus getX typeclass functions?
It's purely a matter of how the author preferred to think about what they're representing. And it's often more about the "abstract concept" being represented than about the actual data structure being used to represent it.
If you have some type X and think of an X value as a computation that could be run to get a value, then you'd have a runX function. If you think of it as more like a container, then you'd have a getX function. (There are other possible interpretations that could lead to runX or getX or something else; these are just two commonly recurring ways of thinking about values.)
Of course, when we're using first-class Haskell values to represent things (and functions are perfectly good values), a lot of the time you could interpret something as either a computation or a container reasonably well. Consider State for representing stateful computations; surely that has to be interpreted as a computation, right? We say runState :: State s a -> s -> (a, s) because we think of it as "running" the State s a, needing an s as additional input. But we could just as easily think of it as "getting" an s -> (a, s) out of the State s a, treating State more like a container.
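That dual reading can be seen directly (a small example using Control.Monad.Trans.State from transformers; tick is a made-up stateful action):

```haskell
import Control.Monad.Trans.State (State, runState, get, put)

-- A tiny stateful computation: return the counter and bump it.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  return n

-- "Run" reading: run the computation with an initial state.
runExample :: (Int, Int)
runExample = runState tick 0  -- (0, 1)

-- "Get" reading: pull the plain function out of the container.
asFunction :: Int -> (Int, Int)
asFunction = runState tick
```

Both definitions use the very same runState; only the mental model differs.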
So the choice between runX and getX isn't really meaningful in any profound sense, but it tells you how the author was thinking about X (and perhaps how they think you should think about it).
Const is so-named in analogy to the function const (which takes an argument to produce the "constant function" that takes another input, ignores it, and returns whatever the first input to const was). But it's thought of as operating at the type level; Const takes a type and generates a "type-level function" that ignores whatever type it is applied to and then is isomorphic to the first type Const was applied to. Isomorphic rather than equal because to create a new type that could have different instances, it needs to have a constructor. At the value level, in order to be an isomorphism you need to be able to get a Const a b from an a (that's the Const constructor), and get the a back out of a Const a b. Since "being isomorphic to a" is all the properties we need it to have there's no real need to think of it as doing anything other than being a simple container of a, so we have getConst.
Identity seems similarly obvious as "just a container" and we have runIdentity. But one of the main motivations for having Identity is to think of Identity a as being a "monadic computation" in the same way that State s a, Reader e a, etc values are. So to continue the analogy we think of Identity as a "do-nothing" computation we run, rather than a simple wrapper container that we get a value out of. It would be perfectly valid to think of Identity as a container (the simplest possible one), but that wasn't the interpretation the authors chose to focus on.
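Side by side, the two namings look like this (both types live in base, under Data.Functor.Const and Data.Functor.Identity):

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Const read as a container: get the a back out of Const a b.
containerView :: Int
containerView = getConst (Const 3 :: Const Int Char)

-- Identity read as a computation: "run" the do-nothing action.
computationView :: Int
computationView = runIdentity $ do
  pure (2 + 3)
```

Either could have been named the other way; the chosen names just signal the intended interpretation.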
I am writing a game in Haskell in which the player and the AI take actions in turns. Until now, the AIs worked by generating actions using all the information about the game, i.e. they were functions of the form GameHistory -> GameState -> Action.
This way, these functions regenerate the information they need from the history each time they are called. It would be a lot easier to write AIs if they could have some kind of "internal state" which persists between their turns (i.e. calls to the corresponding function). How could one implement something like that? (By the way, I should also take into account that the internal states of different kinds of AIs could have different types.)
What you're looking for might be something like:
newtype AI = AI { runAI :: GameState -> (AI, Action) }
i.e. you'll return your actor's new state along with the action. You might make use of the State monad here. You might also be interested in reading about automata. If you need to serialize your AI (to store it in a database, say) then you might need to do something different.
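A toy instantiation of that newtype (GameState and Action here are stand-ins for your real types): an AI that keeps a private turn counter and attacks every third turn.

```haskell
type GameState = Int  -- stand-in for the real game state

data Action = Wait | Attack
  deriving (Eq, Show)

newtype AI = AI { runAI :: GameState -> (AI, Action) }

-- The counter lives only inside the closure, so different AIs can
-- carry internal state of entirely different types this way.
counter :: Int -> AI
counter n = AI $ \_gs ->
  let n'  = n + 1
      act = if n' `mod` 3 == 0 then Attack else Wait
  in (counter n', act)

-- Step an AI several times, threading its successor through.
steps :: Int -> AI -> GameState -> [Action]
steps 0 _  _  = []
steps k ai gs =
  let (ai', a) = runAI ai gs
  in a : steps (k - 1) ai' gs
```

steps 3 (counter 0) 0 produces [Wait, Wait, Attack]: the state persists across turns without ever being visible in the AI type.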
I'm looking to write a generic module that allows Haskell programs to interact with Cassandra. The module will need to maintain its own state. For example, it will have a connection pool and a list of callbacks to be invoked when a new record is saved. How should I structure the code so that this module can maintain its state? Here are some of the approaches I've been considering. Am I on the right track? (I'm new to Haskell and still learning the best ways to think functionally.)
Option 1:
The module runs in a (StateT s IO) monad, where s is the global state for the entire program using the Cassandra module. Of course, since the Cassandra module could be used by multiple programs, the details of what's in s should be invisible to the Cassandra module. The module would have to export a type class that allowed it to extract the CassandraState from s and push a new CassandraState back into s. Then, any program using the module would have to make its main state a member of this type class.
Option 2:
The module runs in a (StateT CassandraState IO) monad. Every time someone calls an action in the module, they would have to extract the CassandraState from wherever they have it stashed off, invoke the action with runState, and take the resulting state and stash it off again (wherever).
Option 3:
Don't put the Cassandra module's functions in a StateT monad at all. Instead, have the caller explicitly pass in CassandraState's when needed. The problem with option 2 is that not all of the functions in the module will modify the state. For example, obtaining a connection will modify the state and will require the caller to stash off the resulting state. But, saving a new record needs to read the state (to get the callbacks), but it doesn't need to change the state. Option 2 doesn't give the caller any hint that connect changes the state while create doesn't.
But, if I move away from using the StateT monad and just have functions that take in states as parameters and return either simple values or tuples of simple values and new states, then it's really obvious to the caller when the state needs to be saved off. (Under the covers in my module, I'd take the incoming states and build them into a (StateT CassandraState IO) monad, but the details of this would be hidden from the caller. So, to the caller, the interface is very explicit, but under the covers, it's just Option 2.)
Option 4:
Something else?
This problem must come up quite often when building reusable modules. Is there some sort of standard way to solve it?
(By the way, if someone knows a better way to interact with Cassandra from Haskell than using Thrift, please let me know! Maybe I don't have to write this at all. :-)
Something like the HDBC model would be to have an explicit CassandraConnection data type. It has an MVar inside with some mutable state. Since all your actions are in IO anyway I'd imagine, they can just take the CassandraConnection as an argument to these actions. The user then can pack that connection into a state or reader monad, or thread it explicitly, or do whatever they want.
Internally you can use a monad or not -- that's really your call. However, I favor APIs that when possible don't force users into any particular monad unless truly necessary.
So this is a sort of version of option 3. But the user shouldn't really care whether or not they're changing the connection state -- at that level you can really hide the details from them.
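A sketch of that shape (CassandraConnection and its operations are hypothetical; a record counter stands in for the real connection pool and callbacks):

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar_, readMVar)

-- Hypothetical handle: an MVar holding whatever mutable state the
-- module needs (here, just a count of saved records).
newtype CassandraConnection = CassandraConnection (MVar Int)

connect :: IO CassandraConnection
connect = CassandraConnection <$> newMVar 0

-- All actions are in IO anyway and take the handle explicitly;
-- callers never see whether an action touched the internal state.
saveRecord :: CassandraConnection -> String -> IO ()
saveRecord (CassandraConnection ref) _record =
  modifyMVar_ ref (pure . (+ 1))

savedCount :: CassandraConnection -> IO Int
savedCount (CassandraConnection ref) = readMVar ref
```

The caller is free to stash the handle in a Reader monad, a record of services, or just pass it around.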
I'd go with Option 2. Users of your module shouldn't use runState directly; instead, you should provide an opaque Cassandra type with an instance of the Monad typeclass and some runCassandra :: Cassandra a -> IO a operation to "escape" Cassandra. The operations exported by your module should all run in the Cassandra monad (e.g. doSomethingInterestingInCassandra :: Int -> Bool -> Cassandra Char), and their definition can access the wrapped CassandraState.
If your users need some additional state for their application, they can always wrap a monad transformer around Cassandra, e.g. StateT MyState Cassandra.
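Sketched out (all names hypothetical; GeneralizedNewtypeDeriving spares writing the Monad instance by hand, and the real operations would of course talk to the database):

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.State (StateT, evalStateT, get, modify)

-- Hypothetical internal state.
newtype CassandraState = CassandraState { savedRecords :: Int }

-- Opaque to users: export the type and runCassandra, not the constructor.
newtype Cassandra a = Cassandra (StateT CassandraState IO a)
  deriving (Functor, Applicative, Monad)

runCassandra :: Cassandra a -> IO a
runCassandra (Cassandra m) = evalStateT m (CassandraState 0)

-- Example operations; the wrapped state is only touched internally.
saveRecord :: String -> Cassandra ()
saveRecord _record =
  Cassandra (modify (\s -> s { savedRecords = savedRecords s + 1 }))

savedSoFar :: Cassandra Int
savedSoFar = Cassandra (savedRecords <$> get)
```

Users compose operations in the Cassandra monad and only ever call runCassandra at the boundary.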
God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?
I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract -- don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
emptyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
-> Graph -- code in the 'then' branch
-> Graph -- code in the 'else' branch
-> Graph -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
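To see how the hiding works, here is a deliberately tiny model of that design (RealGraph reduced to a list of instruction strings; the real GHC types are far richer):

```haskell
import Control.Monad.Trans.State (State, evalState, get, put)

type Label     = Int
type RealGraph = [String]  -- toy stand-in for the real graph type
type Graph     = RealGraph -> State [Label] RealGraph

-- Prepend a label in front of the successor graph.
mkLabel :: Label -> Graph
mkLabel l succ = pure (("L" ++ show l ++ ":") : succ)

-- Draw a fresh label from the supply, invisibly to the caller.
withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel f succ = do
  supply <- get
  put (tail supply)
  f (head supply) succ

-- Only at the very end do we provide the supply and run the monad.
runGraph :: Graph -> RealGraph
runGraph g = evalState (g []) [0 ..]
```

runGraph (withFreshLabel mkLabel) yields ["L0:"]; the translator composes Graphs purely and never sees the label supply.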
I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly, in one place, all of the small amount of code that can change the state, and to reason more easily about how it can be changed, again making it easier to deal with.
Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?
With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree: this could be the generated code, or all the comments, or whatever else you're interested in.
Inherited attributes are inputs to your syntax tree: this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.
Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attribute grammars generate for you, for example:
data AbstractSyntax = Literal Int
                    | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x)     _      = (generateCode x, [])
compile (Block ast)     (l:ls) = let (code', comments) = compile ast ls
                                 in (labelCode l code', comments)
compile (Comment s ast) ls     = let (code, comments') = compile ast ls
                                 in (code, s : comments')
generateCode :: Int -> Code
labelCode :: Label -> Code -> Code
It's possible that you may want an applicative functor instead of a monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html
I don't think using the State monad is a code smell when it is used to model state. If you need to thread state through your functions, you can do this explicitly, taking the state as an argument and returning it from each function. The State monad offers a good abstraction: it passes the state along for you and provides lots of useful functions for combining functions that require state. In this case, using the State monad (or Applicatives) is not a code smell.

However, if you use the State monad to emulate an imperative style of programming where a functional solution would suffice, you are just making things complicated.
In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, and tree-traversal code in particular can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
Well, don't use monads. The power of functional programming is function purity and reuse. There's a paper a professor of mine once wrote, and he's one of the people who helped build Haskell.
The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.
Let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.