In Haskell, how to make read-only parameters that depend on ST

Context:
When considering the signature of a function in a typical imperative language, some parameters might be denoted as mutable references, others as immutable references, and still others as simple pure constants.
I am trying to understand how to reproduce this in Haskell, most importantly the mutable/immutable handling of variables that depend on the state.
There are several approaches to manage state in Haskell.
One approach seems to be via State/StateT/MonadState which fit well with monad transformers.
Among stateful function parameters, if I want to make it explicit that one should be regarded as immutable inside the function body, I believe the answers to this question: Making Read-Only functions for a State in Haskell explain well how to do it, using Reader or MonadReader.
Another approach to managing state (the one I am more interested in here) is ST. I like ST better because it allows managing more than just one memory cell at a time, and it appears to be more performant than State.
The problem now is that I don't know how to properly manage a distinction between mutable/immutable stateful variables in ST. The Reader way does not seem to apply in that case. I have been looking at the STMonadTrans package which seems to help make ST fit with monad transformers, but I am not sure how to use it.
Question: Do you have a simple example of a function f that creates a mutable variable x with newSTRef and passes x to a function g in an immutable way, that is, in such a way that g can read x but not modify x? If not, is there a workaround?
Remark 1: A workaround could be to freeze the mutable variables before passing them, to make them pure. However, in my case that is not an acceptable solution, because freezing can be either expensive or unsafe, and complex structures such as vectors of vectors cannot be frozen quickly. unsafeCoerce is not acceptable either. I am looking for a safe, zero-runtime-cost solution.
Remark 2: Someone said I could just read the reference before going into the function, but that is an oversimplified answer to my simplified question. In a more general context, it may not be possible to readSTRef the variable x before going into the function g, because x is something more complex, like a set of mutable arrays.
I am still asking my question in that simple way to try to figure out how to do the general thing on a simple example.
Thanks

I haven't read the novel that is the comments section, but a pattern like
newtype ReadOnly s a = ReadOnly (ST s a)
makeReadOnly :: STRef s a -> ReadOnly s a
makeReadOnly = ReadOnly . readSTRef
has served me well. This is a classic trick: if you want a data type to support some operations, just define the data type to be a record of the operations you want to support. In this case, there is only one.
(As a bonus, it can be seen from this that "read only variables" are highly composable!)
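For concreteness, here is a minimal sketch of how the f and g from the question might use this pattern (runReadOnly is a hypothetical accessor added for the example; it only exposes the read action):
import Control.Monad.ST
import Data.STRef

newtype ReadOnly s a = ReadOnly (ST s a)

runReadOnly :: ReadOnly s a -> ST s a
runReadOnly (ReadOnly m) = m

makeReadOnly :: STRef s a -> ReadOnly s a
makeReadOnly = ReadOnly . readSTRef

-- g can observe x, but it holds no STRef, so it cannot write to it
g :: ReadOnly s Int -> ST s Int
g x = do
  v <- runReadOnly x
  pure (v + 1)

f :: Int
f = runST $ do
  ref <- newSTRef 41
  g (makeReadOnly ref)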

Related

Getting value out of context or leaving them in?

Assume the code below. Is there a quicker way to get the contextual values out of findSerial rather than writing a function like outOfContext?
The underlying question is: does one usually stick within the context and use Functors, Applicatives, Monoids and Monads to get the job done, or is it better to take the value out of its context and apply the usual non-contextual computation methods? In brief: I don't want to learn Haskell all wrong, since it takes enough time as it is.
import qualified Data.Map as Map

type SerialNumber = (String, Int)

serialList :: Map.Map String SerialNumber
serialList = Map.fromList [ ("belt drive",     ("BD", 0001))
                          , ("chain drive",    ("CD", 0002))
                          , ("drive pulley",   ("DP", 0003))
                          , ("drive sprocket", ("DS", 0004))
                          ]

findSerial :: Ord k => k -> Map.Map k a -> Maybe a
findSerial input = Map.lookup input

outOfContext (Just (a, b)) = (a, b)
Assuming I understand it correctly, I think your question essentially boils down to “Is it idiomatic in Haskell to write and use partial functions?” (which your outOfContext function is, since it’s just a specialized form of the built-in partial function fromJust). The answer to that question is a resounding no. Partial functions are avoided whenever possible, and code that uses them can usually be refactored into code that doesn’t.
The reason partial functions are avoided is that they voluntarily compromise the effectiveness of the type system. In Haskell, when a function has type X -> Y, it is generally assumed that providing it an X will actually produce a Y, and that it will not do something else entirely (i.e. crash). If you have a function that doesn’t always succeed, reflecting that information in the type by writing X -> Maybe Y forces the caller to somehow handle the Nothing case, and it can either handle it directly or defer the failure further to its caller (by also producing a Maybe). This is great, since it means that programs that typecheck won’t crash at runtime. The program might still have logical errors, but knowing before even running the program that it won’t blow up is still pretty nice.
Partial functions throw this guarantee out the window. Any program that uses a partial function will crash at runtime if the function’s preconditions are accidentally violated, and since those preconditions are not reflected in the type system, the compiler cannot statically enforce them. A program might be logically correct at the time of its writing, but without enforcing that correctness with the type system, further modification, extension, or refactoring could easily introduce a bug by mistake.
For example, a programmer might write the expression
if isJust n then fromJust n else 0
which will certainly never crash at runtime, since fromJust’s precondition is always checked before it is called. However, the type system cannot enforce this, and a further refactoring might swap the branches of the if, or it might move the fromJust n to a different part of the program entirely and accidentally omit the isJust check. The program will still compile, but it may fail at runtime.
In contrast, if the programmer avoids partial functions, using explicit pattern-matching with case or total functions like maybe and fromMaybe, they can replace the tricky conditional above with something like
fromMaybe 0 n
which is not only clearer, but ensures any accidental misuse will simply fail to typecheck, and the potential bug will be detected much earlier.
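Applied to the code above, a total replacement for outOfContext might look like this sketch (the default value ("??", 0) is just an assumed placeholder):
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

-- total: every lookup produces a result, so no crash is possible
findSerialWithDefault :: Ord k => k -> Map.Map k (String, Int) -> (String, Int)
findSerialWithDefault input = fromMaybe ("??", 0) . Map.lookup input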
For some concrete examples of how the type system can be a powerful ally if you stick exclusively to total functions, as well as some good food for thought about different ways to encode type safety for your domain into Haskell’s type system, I highly recommend reading Matt Parsons’s wonderful blog post, Type Safety Back and Forth, which explores these ideas in more depth. It additionally highlights how using Maybe as a catch-all representation of failure can be awkward, and it shows how the type system can be used to enforce preconditions to avoid needing to propagate Maybe throughout an entire system.

Modify an argument in a pure function

I am aware that I should not do this. I am asking for dirty hacks to do something nasty.
Goal
I wish to modify an argument in a pure function.
Thus achieving effect of pass by reference.
Illustration
For example, I have got the following function.
fn :: Eq a => a -> [a] -> Bool
fn a as = elem a as
I wish to modify the arguments passed in, such as the list as.
as' = [1, 2, 3, 4]
pointTo as as'
It is a common misconception that Haskell just deliberately prevents you from doing stuff it deems unsafe, even though most other languages allow it. That's the case, for instance, with C++'s const modifiers: while they are a valuable guarantee that something which is supposed to stay constant isn't messed with by mistake, it's generally assumed that the overall compiled result doesn't really change when using them; you still get basically impure functions, implemented as some assembler instructions over a stack memory frame.
What's less well known is that even in C++, these const modifiers allow the compiler to apply certain optimisations that can completely break the results when the references are modified (as is possible with const_cast – which in Haskell would have at least “unsafe” in its name!); only the mutable keyword gives the guarantee that hidden modifications don't mess up the semantics you'd expect.
In Haskell, the same general issue applies, but much stronger. A Haskell function is not a bunch of assembler instructions operating on a stack frame and memory locations pointed to from the stack. Sometimes the compiler may actually generate something like that, but in general the standard is very careful not to suggest any particular implementation details. So actually, functions may get inlined, stream-fused, rule-replaced etc. to something where it totally wouldn't make sense to even think about “modifying the arguments”, because perhaps the arguments never even exist as such. The thing that Haskell guarantees is that, in a much higher-level sense, the semantics are preserved; the semantics of how you'd think about a function mathematically. In mathematics, it also doesn't make sense at all, conceptually, to talk about “modified arguments”.

Haskell constructor aliases

Is there a way to have something equivalent to creating "constructor aliases" in Haskell? I'm thinking similar to type aliases where you can give the type a different name but it still behaves in every way as the aliased type.
My use case is a system where I have an assigned time as a property of some objects I'm modelling, so UTCTime. Some of these could be "variable" times, meaning it might not yet be assigned a time or the time it does have is "movable". So Maybe UTCTime.
But only some of the objects have variable times. Others have fixed times that the system has to take as a constant; a time variable currently assigned to a particular time is not handled the same way as a fixed time. Which now suggests Either UTCTime (Maybe UTCTime); it's either a fixed time or a variable time that might be unassigned.
The generic types seem to fit what I'm trying to model really well, so using them feels natural. But while it's obvious what Either UTCTime (Maybe UTCTime) is, it's not particularly obvious what it means, so some descriptive special-case names would be nice.
A simple type Timeslot = Either UTCTime (Maybe UTCTime) would definitely clean up my type signatures a lot, but that does nothing for the constructors. I can use something like bound = Just to get a name for constructing values, but not for pattern matching.
At the other end, I can define a custom ADT with whatever names I want, but then I lose all the predefined functionality of the Either and Maybe types. Or rather, I'll be applying transformations back and forth all the time (which I suppose is no worse than the situation with using newtype wrappers for things, only without the efficiency guarantee, though I doubt this would be a bottleneck anyway). And I suppose that to understand code using generic Either and Maybe functions to manipulate my Timeslot values, I'll need to know how the standard constructors map to whatever names I want to use anyway, and the conversion functions would supply a handy compiler-enforced definition of that mapping. So maybe this is a good approach after all.
I'm pretty sure I know Haskell well enough to say that there is no such thing as constructor-aliasing, but I'm curious whether there's some hack I don't know about, or some other good way of handling this situation.
Despite the drawbacks you mentioned, I strongly suggest simply creating a fresh ADT for your type; for example
data TimeVariable = Constant UTCTime | Assigned UTCTime | Unassigned
I offer these arguments:
Having descriptive constructors will make your code -- both construction and pattern matching -- significantly more readable. Compare Unassigned and Right Nothing. Now add six months and do the same comparison.
I suspect that as your application grows, you will find that this type needs to expand. Adding another constructor or another field to an existing constructor is much easier with a custom ADT, and it makes it very easy to identify code locations that need to be updated to deal with the new type.
Probably there will not be quite as many sensible operations on this type as there are in the standard library for munging Either and Maybe values -- so I bet you won't be duplicating nearly as much code as you think. And though you may be duplicating some code, giving your functions descriptive names is valuable for the same readability and refactoring reasons that giving your constructors descriptive names is.
I have personally written some code where all my sums were Either and all my products were (,). It was horrible. I could never remember which side of a sum meant which thing; when reading old code I had to constantly remind myself what conceptual type each value was supposed to be (e.g. Right doesn't tell you whether you're using Right here as part of a time variable or part of some other thing that you were too lazy to make an ADT for); I had to constantly mentally expand type aliases; etc. Learn from my pain. ;-)
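To illustrate the readability argument with a hypothetical function over the TimeVariable suggested above (nextDeadline is invented for this example):
nextDeadline :: TimeVariable -> Maybe UTCTime
nextDeadline (Constant t) = Just t
nextDeadline (Assigned t) = Just t
nextDeadline Unassigned   = Nothing
Compare this with the equivalent pile of Left and Right Nothing cases.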
Pattern synonyms might get merged into GHC: http://ghc.haskell.org/trac/ghc/ticket/5144. In the meantime there is also -XViewPatterns, which lets you write things like:
{-# LANGUAGE ViewPatterns #-}
type Timeslot = Either UTCTime (Maybe UTCTime)
fieldA :: Timeslot -> Maybe UTCTime  -- the fixed time, if any
fieldA = either Just (const Nothing)
fieldB :: Timeslot -> Maybe UTCTime  -- the assigned variable time, if any
fieldB = either (const Nothing) id
f (fieldA -> Just time) = ...
f (fieldB -> Just time) = ...
f _ = ...
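(Pattern synonyms did eventually land in GHC 7.8. As a sketch for later readers, assuming a reasonably recent GHC and invented constructor names, they cover exactly this use case, for both construction and pattern matching:)
{-# LANGUAGE PatternSynonyms #-}
import Data.Time.Clock (UTCTime)

type Timeslot = Either UTCTime (Maybe UTCTime)

pattern Fixed :: UTCTime -> Timeslot
pattern Fixed t = Left t

pattern Bound :: UTCTime -> Timeslot
pattern Bound t = Right (Just t)

pattern Unbound :: Timeslot
pattern Unbound = Right Nothing

-- bidirectional synonyms work for both constructing and matching:
describe :: Timeslot -> String
describe (Fixed _) = "fixed time"
describe (Bound _) = "movable, currently assigned"
describe Unbound   = "movable, unassigned"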

Where to apply Behavior (and other types) in FRP

I'm working on a program using reactive-banana, and I'm wondering how to structure my types with the basic FRP building blocks.
For instance, here's a simplified example from my real program: say my system is composed primarily of widgets — in my program, pieces of text that vary over time.
I could have
newtype Widget = Widget { widgetText :: Behavior String }
but I could also have
newtype Widget = Widget { widgetText :: String }
and use Behavior Widget when I want to talk about time-varying behaviour. This seems to make things "simpler", and means I can use Behavior operations more directly, rather than having to unpack and repack Widgets to do it.
On the other hand, the former seems to avoid duplication in the code that actually defines widgets, since almost all of the widgets vary over time, and I find myself defining even the few that don't with Behavior, since it lets me combine them with the others in a more consistent manner.
As another example, with both representations, it makes sense to have a Monoid instance (and I want to have one in my program), but the implementation for the latter seems more natural (since it's just a trivial lifting of the list monoid to the newtype).
(My actual program uses Discrete rather than Behavior, but I don't think that's relevant.)
Similarly, should I use Behavior (Coord,Coord) or (Behavior Coord, Behavior Coord) to represent a 2D point? In this case, the former seems like the obvious choice; but when it's a five-element record representing something like an entity in a game, the choice seems less clear.
In essence, all these problems reduce down to:
When using FRP, at what layer should I apply the Behavior type?
(The same question applies to Event too, although to a lesser degree.)
The rules I use when developing FRP applications, are:
Isolate the "thing that changes" as much as possible.
Group "things that change simultaneously" into one Behavior/Event.
The reason for (1) is that it becomes easier to create and compose abstract operations if the data types you use are as primitive as possible. Another benefit is that instances such as Monoid can be reused for the raw types, as you described.
Note that you can use Lenses to easily modify the "contents" of a datatype as if they were raw values, so that extra "wrapping/unwrapping" isn't a problem, mostly. (See this recent tutorial for an introduction to this particular Lens implementation; there are others)
The reason for (2) is that it just removes unnecessary overhead. If two things change simultaneously, they "have the same behavior", so they should be modeled as such.
Ergo/tl;dr: You should use newtype Widget = Widget { widgetText :: Behavior String } because of (1), and you should use Behavior (Coord, Coord) because of (2) (since both coordinates usually change simultaneously).
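As a sketch of the Monoid point under rule (1): with the Behavior-inside representation, the String monoid lifts through Behavior's Applicative instance (assuming a reactive-banana-style Behavior; the instances here are illustrative, not from the library):
import Reactive.Banana (Behavior)

newtype Widget = Widget { widgetText :: Behavior String }

instance Semigroup Widget where
  Widget a <> Widget b = Widget ((<>) <$> a <*> b)

instance Monoid Widget where
  mempty = Widget (pure "")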
I agree with dflemstr's advice to
Isolate the "thing that changes" as much as possible.
Group "things that change simultaneously" into one Behavior/Event.
and would like to offer additional reasons for these rules of thumb.
The question boils down to the following: you want to represent a pair (tuple) of values that change in time and the question is whether to use
a. (Behavior x, Behavior y) - a pair of behaviors
b. Behavior (x,y) - a behavior of pairs
Reasons for preferring one over the other are
a over b.
In a push-driven implementation, the change of a behavior will trigger a recalculation of all behaviors that depend on it.
Now, consider a behavior whose value depends only on the first component x of the pair. In variant a, a change of the second component y will not recompute this behavior. But in variant b, the behavior will be recalculated, even though its value does not depend on the second component at all. In other words, it's a question of fine-grained vs. coarse-grained dependencies.
This is an argument for advice 1. Of course, this is not of much importance when both behaviors tend to change simultaneously, which yields advice 2.
Of course, the library should offer a way to express fine-grained dependencies even in variant b. As of reactive-banana version 0.4.3, this is not possible, but don't worry about that for now; my push-driven implementation is going to mature in future versions.
b over a.
Seeing that reactive-banana version 0.4.3 does not offer dynamic event switching yet, there are certain programs that you can only write if you put all components in a single behavior. The canonical example would be a program that features a variable number of counters, i.e. an extension of the TwoCounter.hs example. You have to represent it as a time-varying list of values
counters :: Behavior [Int]
because there is no way to keep track of a dynamic collection of behaviors yet. That said, the next version of reactive-banana will include dynamic event switching.
Also, you can always convert from variant a to variant b without any trouble
uncurry (liftA2 (,)) :: (Behavior a, Behavior b) -> Behavior (a,b)

Use of Haskell state monad a code smell?

God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?
I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
emptyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph                     -- code in the 'then' branch
             -> Graph                     -- code in the 'else' branch
             -> Graph                     -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
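To make the hiding concrete, here is a stripped-down, illustrative sketch (not GHC's actual code) of how withFreshLabel can consume the label supply without the client ever seeing the monad:
import Control.Monad.State

type Label = Int
data RealGraph = RealGraph               -- stands in for the real graph type

type Graph = RealGraph -> State [Label] RealGraph

withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel body successor = do
  supply <- get
  case supply of
    []     -> error "label supply exhausted"
    l : ls -> do
      put ls                             -- consume one unique label
      body l successor                   -- client code never touches the state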
I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the internals of any particular state were limited to their own particular module, and each module knew nothing about even the existence of state in other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; everything else uses one of these few functions. This lets you see clearly, and in one place, the small amount of code that can change the state, and reason more easily about how it can be changed, again making it easier to deal with.
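As a tiny illustration of that last point, consider a hand-rolled parser type (a sketch, not a real library):
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- the only primitive that manipulates the parse state directly
item :: Parser Char
item = Parser $ \s -> case s of
  []     -> Nothing
  c : cs -> Just (c, cs)

-- everything else is defined in terms of the primitive
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case runParser item s of
  Just (c, rest) | ok c -> Just (c, rest)
  _                     -> Nothing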
Have you looked at attribute grammars (AG)? (There is more info on Wikipedia and in an article in the Monad Reader.)
With AG you can add attributes to a syntax tree. These attributes are separated into synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree; this could be the generated code, or all the comments, or whatever else you're interested in.
Inherited attributes are input to your syntax tree; this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.
Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attribute grammars generate for you, for example:
data AbstractSyntax = Literal Int
                    | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x)     _      = (generateCode x, [])
compile (Block ast)     (l:ls) = let (code', comments) = compile ast ls
                                 in (labelCode l code', comments)
compile (Comment s ast) ls     = let (code, comments') = compile ast ls
                                 in (code, s : comments')

generateCode :: Int -> Code
labelCode :: Label -> Code -> Code
It's possible that you may want an applicative functor instead of a monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html
I don't think using the State monad is a code smell when it is used to model state. If you need to thread state through your functions, you can do this explicitly, taking the state as an argument and returning it from each function. The State monad offers a good abstraction: it passes the state along for you and provides lots of useful functions for combining functions that require state. In this case, using the State monad (or Applicatives) is not a code smell.
However, if you use the State monad to emulate an imperative style of programming where a functional solution would suffice, you are just making things complicated.
In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
Well, don't use monads. The power of functional programming lies in the purity of functions and their reuse. There's a paper a professor of mine once wrote; he's one of the guys who helped build Haskell.
The paper is called "Why functional programming matters"; I suggest you read through it. It's a good read.
Let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.
