Haskell: handling cyclic dependencies while tying the knot

While writing a programming language that will feature local type inference (i.e. it will be capable of inferring types with the exception of function parameters, like Scala), I've run into a problem with cyclic dependencies.
I perform type-checking/inference by exploring the AST recursively, and lazily mapping each optionally-typed node to a type-checked node. Because the type of any node may depend on the types of other nodes within the AST, I've tied the knot so that I can refer to the types of other nodes while inferring/checking the type of the current node (I keep the typed-AST within the environment of a Reader monad).
This works perfectly well in the typical case, but breaks down with cyclic dependencies, as the program follows the loop endlessly in search of a known type.
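To make the failure mode concrete, here is a minimal sketch (with hypothetical toy types, far smaller than my actual formulation) of the knot-tied mapping and why a cycle diverges:

import Data.Map (Map)
import qualified Data.Map as Map

data Expr = Lit Int | Ref String | Add Expr Expr

data Ty = TInt deriving Show

inferAll :: Map String Expr -> Map String Ty
inferAll env = tys
  where
    -- the knot: 'tys' is defined in terms of itself, so inferring one
    -- binding can demand the (lazily computed) type of another
    tys = Map.map infer env

    infer (Lit _)   = TInt
    infer (Ref x)   = tys Map.! x
    infer (Add a b) = case (infer a, infer b) of
                        (TInt, TInt) -> TInt

With mutually recursive bindings, e.g. Map.fromList [("x", Ref "y"), ("y", Ref "x")], forcing either type follows the Ref chain forever.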
The usual solution to this sort of problem (as far as I know) is to maintain a collection of explored nodes, but I cannot think of a referentially transparent way of doing this while tying the knot, because I do not know in advance the order in which the nodes will be visited/evaluated; that order depends on the graph of their dependencies on one another.
As such, it seems I need to maintain a local, mutable collection of explored nodes. In order to do so, I tried the following:
Using the State monad, which failed because it seems that each sub-computation receives its own copy of the state, so no information about already explored nodes can be shared between different branches of the computation.
Using the IO monad with IORefs, which precluded me from tying the knot as a result of its strictness.
Using unsafePerformIO with IORefs, which introduced problems with mutations occurring out of order or not at all.
Using the ST monad with STRefs, which introduced the same problems with strictness as the IO monad.
Finally, I came up with a solution using the ST monad, in which I force lazy evaluation while mapping over the AST using unsafeInterleaveST, which works, but feels fragile.
Is there a more idiomatic and/or referentially transparent solution that isn't obscenely lengthy or complicated? I would have included a full code sample, but my simplest real formulation of this problem is ~250 lines.

Related

Use type classes to implement dependency inversion in a Haskell application?

One major architectural goal when designing large applications is to reduce coupling and dependencies. By dependencies, I mean source-code dependencies, when one function or data type uses another function or another type. A high-level architecture guideline seems to be the Ports & Adapters architecture, with slight variations also referred to as Onion Architecture, Hexagonal Architecture, or Clean Architecture: Types and functions that model the domain of the application are at the center, then come use cases that provide useful services on the basis of the domain, and in the outermost ring are technical aspects like persistence, networking and UI.
The dependency rule says that dependencies must point inwards only. E.g., persistence may depend on functions and types from use cases, and use cases may depend on functions and types from the domain. But the domain is not allowed to depend on the outer rings. How should I implement this kind of architecture in Haskell? To make it concrete: how can I implement a use case module that does not depend on (i.e. import) functions and types from a persistence module, even though it needs to retrieve and store data?
Say I want to implement an order-placement use case via a function U.placeOrder :: D.Customer -> [D.LineItem] -> IO U.OrderPlacementResult, which creates an order from line items and attempts to persist the order. Here, U indicates the use case module and D the domain module. The function returns an IO action because it somehow needs to persist the order. However, the persistence itself is in the outermost architectural ring, implemented in some module P; so the above function must not depend on anything exported from P.
I can imagine two generic solutions:
Higher order functions: The function U.placeOrder takes an additional function argument, say U.OrderDto -> U.PersistenceResult. This function is implemented in the persistence (P) module, but it depends on types of the U module, whereas the U module does not need to declare a dependency on P.
Type classes: The U module defines a Persistence type class that declares the above function. The P module depends on this type class and provides an instance for it.
Variant 1 is quite explicit but not very general. Potentially it results in functions with many arguments. Variant 2 is less verbose (see, for example, here). However, Variant 2 results in many unprincipled type classes, something considered bad practice in most modern Haskell textbooks and tutorials.
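To make the two variants concrete, here is a minimal sketch, with hypothetical stub types standing in for the D and U modules:

-- hypothetical stub types standing in for the D and U modules
data Customer = Customer
data LineItem = LineItem
data Order = Order
data PersistenceResult = Persisted | PersistFailed
data OrderPlacementResult = Placed | Rejected

-- Variant 1: higher-order function; P supplies the first argument
placeOrderHOF :: (Order -> IO PersistenceResult)
              -> Customer -> [LineItem] -> IO OrderPlacementResult
placeOrderHOF persist _customer _items = do
  r <- persist Order
  pure (case r of Persisted -> Placed; PersistFailed -> Rejected)

-- Variant 2: a type class declared in U; P provides the instance
class Monad m => Persistence m where
  persistOrder :: Order -> m PersistenceResult

placeOrderTC :: Persistence m => Customer -> [LineItem] -> m OrderPlacementResult
placeOrderTC _customer _items = do
  r <- persistOrder Order
  pure (case r of Persisted -> Placed; PersistFailed -> Rejected)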
So, I am left with two questions:
Am I missing other alternatives?
Which approach is the generally recommended one, if any?
There are, indeed, other alternatives (see below).
While you can use partial application as dependency injection, I don't consider it a proper functional architecture, because it makes everything impure.
With your current example, it doesn't seem to matter too much, because U.placeOrder is already impure, but in general, you'd want your Haskell code to consist of as much referentially transparent code as possible.
You sometimes see a suggestion involving the Reader monad, where the 'dependencies' are passed to the function as the reader context instead of as straight function arguments, but as far as I can tell, these are just (isomorphic?) variations of the same idea, with the same problems.
Better alternatives are functional core, imperative shell, and free monads. There may be other alternatives as well, but these are the ones I'm aware of.
Functional core, imperative shell
You can often factor your code so that your domain model is defined as a set of pure functions. This is often easier to do in languages like Haskell and F# because you can use sum types to communicate decisions. The U.placeOrder function might, for example, look like this:
U.placeOrder :: D.Customer -> [D.LineItem] -> U.OrderPlacementDecision
Notice that this is a pure function, where U.OrderPlacementDecision might be a sum type that enumerates all the possible outcomes of the use case.
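For example, it might look something like this (hypothetical constructors and payload types):

data Order = Order                                  -- stub
data RejectionReason = OutOfStock | InvalidCustomer -- stub

data OrderPlacementDecision
  = PlaceOrder Order            -- persist this order
  | RejectOrder RejectionReason -- refuse, and say why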
That's your functional core. You'd then compose your imperative shell (e.g. your main function) in an impureim sandwich:
main :: IO ()
main = do
  -- impure: fetch the inputs (hypothetical helpers in the persistence
  -- module, or some other place)
  customer  <- fetchCustomer
  lineItems <- fetchLineItems
  -- pure: the functional core makes the decision
  let decision = U.placeOrder customer lineItems
  -- impure again: act on the decision
  _ <- persist decision
  return ()
(I've obviously not tried to type-check that code, but I hope it's sufficiently correct to get the point across; fetchCustomer, fetchLineItems and persist stand in for whatever the outer ring provides.)
Free monads
The functional core, imperative shell is by far the simplest way to achieve the desired architectural outcome, and it's conspicuously often possible to get away with. Still, there are cases where that's not possible. In those cases, you can instead use free monads.
With free monads, you can define data structures that are roughly equivalent to object-oriented interfaces. Like in the functional core, imperative shell case, these data structures are sum types, which means that you can keep your functions pure. You can then run an impure interpreter over the generated expression tree.
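Here's a minimal sketch of the idea using the free package's Control.Monad.Free; the instruction set and all names are hypothetical:

{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free (Free, liftF, iterM)

data Order = Order
data PersistenceResult = Persisted | PersistFailed

-- each constructor plays the role of one "interface method"
data OrderInstr next = PersistOrder Order (PersistenceResult -> next)
  deriving Functor

persistOrder :: Order -> Free OrderInstr PersistenceResult
persistOrder o = liftF (PersistOrder o id)

-- a pure program over the instruction set; no IO in sight
placeOrder :: Free OrderInstr Bool
placeOrder = do
  r <- persistOrder Order
  pure (case r of Persisted -> True; PersistFailed -> False)

-- the impure interpreter lives in the outermost ring
runIO :: Free OrderInstr a -> IO a
runIO = iterM step
  where
    step (PersistOrder _o k) = do
      putStrLn "persisting order"  -- talk to the real database here
      k Persisted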
I've written an article series about how to think about dependency injection in F# and Haskell. I've also recently published an article that (among other things) showcases this technique. Most of my articles are accompanied by GitHub repositories.

Avoid revisiting nodes in an immutable directed graph

This is perhaps related to functional data structures, but I found no tags about this topic.
Say I have a syntax tree type Tree, which is organised as a DAG by simply sharing common sub-expressions. For example,
data Tree = Val Int | Plus Tree Tree
example :: Tree
example = let x = Val 42 in Plus x x
Then, on this syntax tree type, I have a pure function simplify :: Tree -> Tree, which, when given the root node of a Tree, simplifies the whole tree by first simplifying the children of that root node, and then handling the operation of the root node itself.
Since simplify is a pure function, and some nodes are shared, we would like to avoid calling simplify multiple times on those shared nodes.
Here comes the problem. The whole data structure is immutable, and the sharing is transparent to the programmer, so it seems impossible to determine whether or not two nodes are in fact the same node.
The same problem happens when handling the so-called “tying-the-knot” structures. By tying the knot, we produce a finite data representation for an otherwise infinite data structure, e.g. let xs = 1 : xs in xs. Here xs itself is finite, but calling map succ on it does not necessarily produce a finite representation.
These problems can be summarised as follows: when the data is organised in an immutable directed graph, how do we avoid revisiting the same node, duplicating work, or even failing to terminate when the graph happens to be cyclic?
Some ideas that I have thought of:
Extend the Tree type to Tree a, making every node hold an extra a. When generating the graph, associate each node with a unique a value. The memory address would have been the obvious choice, except that the garbage collector may move any heap object at any time (StableNames address exactly this; see the sketch after this list).
For the syntax tree example, we may store an STRef (Maybe Tree) in every node for the simplified version, but this might not be extensible, and it injects an implementation detail of one specific operation into the data structure itself.
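On the memory-address idea from the first bullet: System.Mem.StableName is the standard way to get an identity test that survives the copying collector, at the cost of living in IO. A minimal sketch:

import System.Mem.StableName (makeStableName)

-- equal StableNames imply the same object; unequal names are
-- inconclusive if they were made at different evaluation states
sameNode :: a -> a -> IO Bool
sameNode x y = do
  nx <- makeStableName x
  ny <- makeStableName y
  return (nx == ny)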
This is a problem with a lot of research behind it. In general, you cannot observe sharing in a pure language like Haskell, due to referential transparency. But in practice, you can safely observe sharing so long as you restrict yourself to doing the observing in the IO monad. Andy Gill (one of the legends from the old Glasgow school of FP!) wrote a wonderful paper about this around 10 years ago:
http://ku-fpg.github.io/files/Gill-09-TypeSafeReification.pdf
Very well worth reading, and the bibliography will give you pointers to prior art in this area and many suggested solutions, from "poor-man's morally-safe" approaches to fully monadic knot-tying techniques. In my mind, Andy's solution and the corresponding data-reify package on Hackage:
https://hackage.haskell.org/package/data-reify
are the most practical solutions to this problem. And I can tell from experience that they work really well in practice.
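To give a flavour of data-reify, observing the sharing in the example Tree looks roughly like this (from memory; check the package docs for exact details):

{-# LANGUAGE TypeFamilies #-}
import Data.Reify

data Tree = Val Int | Plus Tree Tree

-- the "open" version of Tree: recursion replaced by a type parameter
data TreeF r = ValF Int | PlusF r r deriving Show

instance MuRef Tree where
  type DeRef Tree = TreeF
  mapDeRef _ (Val n)    = pure (ValF n)
  mapDeRef f (Plus a b) = PlusF <$> f a <*> f b

main :: IO ()
main = do
  let x = Val 42
  g <- reifyGraph (Plus x x)
  print g  -- the shared node appears once, referenced by a unique key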

Haskell compiler magic: what requires special treatment from the compiler?

When trying to learn Haskell, one of the difficulties that arises is knowing when something requires special magic from the compiler. One example that comes to mind is the seq function, which can't be defined in ordinary Haskell: you can't write a seq2 function that behaves exactly like the built-in seq. Consequently, when teaching someone about seq, you need to mention that seq is special, a symbol the compiler treats specially.
Another example would be the do-notation which only works with instances of the Monad class.
Sometimes it's not obvious. For instance, continuations: does the compiler know about Control.Monad.Cont, or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler, even though continuations are a very strange kind of beast.
Language extensions aside, what other compiler magic should Haskell learners be aware of?
Nearly all the GHC primitives that cannot be implemented in userland are in the ghc-prim package (it even has a module called GHC.Magic!), so browsing it will give you a good sense.
Note that you should not use this package in userland code unless you know exactly what you are doing. Most of the usable stuff from it is exported in downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees as to how it will act from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function that works for all possible types (and avoid optimising it away even though it looks like a no-op).
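For instance, a seq for one concrete type is ordinary Haskell; note that the constructors must be enumerated, since a catch-all pattern would not force anything:

data Color = Red | Green | Blue

-- forces its first argument to WHNF, just like seq, but only for Color
seqColor :: Color -> b -> b
seqColor Red   b = b
seqColor Green b = b
seqColor Blue  b = b

Only the compiler can write the one seq :: a -> b -> b that does this for every type at once.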
Obviously the entire IO monad is deeply magic, as is everything to do with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
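To see just how ordinary it is, here is a minimal self-contained State monad:

newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= f = State $ \s ->
    let (a, s') = g s in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

No compiler support needed anywhere.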
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to LANGUAGE RebindableSyntax, which lets you change what functions it binds to.) Also the deriving stuff; the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for newtype works for any class though. (It's just copying an instance from one type to another identical copy of that type.)
Arrays are hard-wired, much as in every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.

How do I serialize or save to a file a Thunk?

In Haskell, you can have infinite lists, because it doesn't compute them completely; it uses thunks. I am wondering if there is a way to serialize or otherwise save to a file a piece of data's thunk. For example, let us say you have the list [0..]. Then you do some processing on it (I am mostly interested in tail and (:), but it should support filter and map as well). Here is an example of the sort of thing I am looking for:
serial :: SerialThunk a => a -> SerThunk
serialized = serial ([0..] :: [Int])
main = writeToFile "foo.txt" serialized
And
deserial :: SerialThunk a => SerThunk -> a
main = do
  deserialized <- readFromFile "foo.txt" :: IO [Int]
  print $ take 10 deserialized
No. There is no way to serialize a thunk in Haskell. Once code is compiled it is typically represented as assembly (this is what GHC does, for example) and there is no way to recover a serializable description of the function, let alone of the function and environment that together make up the thunk.
Yes. You could build custom solutions, such as describing and serializing a Haskell expression. Deserialization and execution could happen by way of interpretation (e.g. using the hint package).
Maybe. Someone (you?) could make a compiler, or modify an existing one, to maintain more information in a platform-agnostic manner such that things could be serialized without the user manually leveraging hint. I imagine this is an area under exploration by the Cloud Haskell (aka distributed-haskell) developers.
Why? I have also wanted an ability to serialize functions so that I could pass closures around in a flexible manner. Most of the time, though, that flexibility isn't actually needed and instead people want to pass certain types of computations that can be easily expressed as a custom data type and interpretation function.
packman: "Evaluation-orthogonal serialisation of Haskell data, as a library" (thanks to a reddit link) -- is exactly what we have been looking for!
...this serialisation is orthogonal to evaluation: the argument is
serialised in its current state of evaluation, it might be
entirely unevaluated (a thunk) or only partially evaluated (containing
thunks).
...The library enables sending and receiving data between different nodes
of a distributed Haskell system. This is where the code originated:
the Eden runtime system.
...Apart from this obvious application, the functionality can be used to
optimise programs by memoisation (across different program runs), and
to checkpoint program execution in selected places. Both uses are
exemplified in the slide set linked above.
...Another limitation is that serialised data can only be used by the
very same binary. This is however common for many approaches to
distributed programming using functional languages.
...
Cloud Haskell supports serialization of function closures. http://www.haskell.org/haskellwiki/Cloud_Haskell
Apart from the work in Cloud Haskell and HdpH on "closures", and apart from the answers stating that thunks are not analyzable at runtime, I've found that:
:sprint in GHCi seems to have access to the internal thunk representation. Perhaps GHCi works with some special, non-optimized code. So in principle one could use this representation and the implementation of :sprint if one wanted to serialize thunks, couldn't one?
http://hackage.haskell.org/package/ghc-heap-view-0.5.3/docs/GHC-HeapView.html -- "With this module, you can investigate the heap representation of Haskell values, i.e. to investigate sharing and lazy evaluation."
I'd be very curious to know what kind of working solutions for serializing closures can be made out of this stuff...

Use of Haskell state monad a code smell?

God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?
I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
emptyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph                     -- code in the 'then' branch
             -> Graph                     -- code in the 'else' branch
             -> Graph                     -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
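To sketch how the hiding works against that second definition (simplified stubs, not GHC's actual code):

import Control.Monad.State (State, get, put)

type Label = Int
data RealGraph = RealGraph  -- stub for illustration

type Graph = RealGraph -> State [Label] RealGraph

withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel f successor = do
  labels <- get
  case labels of
    l:ls -> put ls >> f l successor  -- the client sees a label, never the state
    []   -> error "label supply exhausted"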
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly, and in one place, all of the small amount of code that can change the state, and to reason more easily about how it can change, again making it easier to deal with.
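To illustrate, here is a toy parser where the state (the remaining input) threads through every combinator, yet only one primitive manipulates it directly:

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- the one primitive that touches the state
item :: Parser Char
item = Parser $ \s -> case s of
  c:cs -> Just (c, cs)
  []   -> Nothing

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s')  <- pf s
    (a, s'') <- pa s'
    Just (f a, s'')

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> do
    (a, s') <- p s
    runParser (f a) s'

-- everything else is built on 'item' and never sees the state
satisfy :: (Char -> Bool) -> Parser Char
satisfy good = do
  c <- item
  if good c then pure c else Parser (const Nothing)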
Have you looked at attribute grammars (AG)? (More info on Wikipedia, and there's an article in the Monad Reader.)
With AG you can add attributes to a syntax tree. These attributes are separated into synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree; this could be the generated code, or all comments, or whatever else you're interested in.
Inherited attributes are inputs to your syntax tree; this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.
Though if you're still learning Haskell, maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attribute grammars generate for you, for example:
data AbstractSyntax = Literal Int
                    | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x)     _      = (generateCode x, [])
compile (Block ast)     (l:ls) = let (code', comments) = compile ast ls
                                 in (labelCode l code', comments)
compile (Comment s ast) ls     = let (code, comments') = compile ast ls
                                 in (code, s : comments')
generateCode :: Int -> Code
labelCode :: Label -> Code -> Code
It's possible that you may want an applicative functor instead of a monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html
I don't think using the State monad is a code smell when it is used to model state.
If you need to thread state through your functions, you can do this explicitly, taking the state as an argument and returning it from each function. The State monad offers a good abstraction: it passes the state along for you, and provides lots of useful functions for combining functions that require state. In this case, using the State monad (or Applicatives) is not a code smell.
However, if you use the State monad to emulate an imperative style of programming when a functional solution would suffice, you are just making things complicated.
In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, and tree-traversal code in particular can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
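From memory, using value-supply looks roughly like this (check the Data.Supply docs for the exact API before relying on it):

import Data.Supply (Supply, newSupply, supplyValue, split3)

data Tree = Leaf | Node Tree Tree
data Labelled = LLeaf Int | LNode Int Labelled Labelled
  deriving Show

-- splitting yields independent supplies, so labelling stays pure
label :: Supply Int -> Tree -> Labelled
label s Leaf       = LLeaf (supplyValue s)
label s (Node l r) = let (s0, s1, s2) = split3 s
                     in LNode (supplyValue s0) (label s1 l) (label s2 r)

main :: IO ()
main = do
  s <- newSupply 0 (+ 1)
  print (label s (Node Leaf (Node Leaf Leaf)))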
Well, don't use monads. The power of functional programming lies in function purity and reuse. There's a paper a professor of mine once wrote; he's one of the people who helped build Haskell.
The paper is called "Why Functional Programming Matters", and I suggest you read through it. It's a good read.
Let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign values to variables and change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.

Resources