The haskell-src-exts package has functions for pretty printing a Haskell AST. What I want to do is change its behavior on certain constructors, in my case the way SCC pragmas are printed. So everything else should be printed the default way, only SCCs are handled differently. Is it possible to do it without copying the source file and editing it, which is what I'm doing now?
Well, the library has done one thing right, using a type class for Pretty. The challenge then is how to select a different instance for the constructors you want to print differently. Ideally, you would just newtype the AST node you care about, and somehow substitute that into the AST.
Now, the problem here is that the Haskell AST exported by the library has its type structure fixed. It doesn't, for example, use two-level types, which would let you substitute newtypes for parts of the tree. So you would have to redefine the type of the AST all the way down to the node whose type you wish to change.
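To make the "two-level types" idea concrete, here is a minimal sketch on a toy expression type. None of these names (Fix, ExprF, cata, defaultAlg, customAlg) come from haskell-src-exts; they are hypothetical, and the override is done by swapping the printing function rather than by selecting a different Pretty instance through a newtype.

```haskell
-- Toy illustration only: these are NOT the haskell-src-exts types.
-- The recursion is factored out of the node type, so the same tree can be
-- printed with different per-node functions.
newtype Fix f = Fix (f (Fix f))

-- One layer of a tiny expression language; 'r' marks the recursive positions.
data ExprF r
  = Lit Int
  | Add r r
  | SCC String r

instance Functor ExprF where
  fmap _ (Lit n)   = Lit n
  fmap f (Add a b) = Add (f a) (f b)
  fmap f (SCC s e) = SCC s (f e)

-- Generic bottom-up fold over the fixed point.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix x) = alg (fmap (cata alg) x)

-- The "default" printer for one layer.
defaultAlg :: ExprF String -> String
defaultAlg (Lit n)      = show n
defaultAlg (Add a b)    = "(" ++ a ++ " + " ++ b ++ ")"
defaultAlg (SCC name e) = "{-# SCC \"" ++ name ++ "\" #-} " ++ e

-- Override only the SCC case and delegate everything else.
customAlg :: ExprF String -> String
customAlg (SCC name e) = "scc<" ++ name ++ ">(" ++ e ++ ")"
customAlg other        = defaultAlg other

example :: Fix ExprF
example = Fix (Add (Fix (Lit 1)) (Fix (SCC "hot" (Fix (Lit 2)))))
```

With this shape, cata defaultAlg example and cata customAlg example differ only in how the SCC node is rendered. The library's AST is not built this way, which is exactly the limitation described above.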
The following introduction is provided to ensure you understand how I reached the problem (to not fall prey to the XY problem):
I am working on a program which turns a parser written in a Parsec-like DSL into an actual LL(1) parser (and, in the future, similarly for LALR(1) or others).
It basically works as follows:
1. The DSL consists of functions which together build up a GADT. The result is a datatype which might contain cycles ('tying the knot'-style).
2. Data.Reify is used to turn this into a graph representation ('untying' the knot).
3. Perform the necessary transformations on this graph to turn it into an LL(1) parsing table.
4. Construct the parser which will use this table.
Because we want to be able to use the data that is recognized while parsing to construct some kind of result, we need to pass functions along through steps (1) to (4).
In steps (1), (2) and (3) we can get away with using an existential datatype. It's only when actually running the parser that I found myself requiring Data.Dynamic and its dynApp to combine the results. We know that the types line up (since in step (1) the GADT construction is type-checked), but I could not figure out how to use the existential types in any other way (as each of the parsing steps might have a very different type).
The current procedure thus 'works' but requires Dynamic. Also, the whole parser, while based on a written function definition, will be constructed at runtime.
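For readers unfamiliar with this use of Data.Dynamic, here is a minimal sketch of combining dynamically typed results at runtime. The names applyAll, semanticAction and result are hypothetical and not taken from my actual code; the sketch also uses dynApply (which returns a Maybe) rather than dynApp (which throws on a type mismatch).

```haskell
import Control.Monad (foldM)
import Data.Dynamic (Dynamic, dynApply, fromDynamic, toDyn)

-- Apply a dynamically typed function to a list of dynamically typed
-- arguments, one at a time. Any type mismatch yields Nothing.
applyAll :: Dynamic -> [Dynamic] -> Maybe Dynamic
applyAll = foldM dynApply

-- A stand-in for a semantic action attached to a grammar rule.
semanticAction :: Dynamic
semanticAction = toDyn ((\x y -> x + y) :: Int -> Int -> Int)

-- Recover a statically typed value at the very end.
result :: Maybe Int
result = applyAll semanticAction [toDyn (1 :: Int), toDyn (2 :: Int)] >>= fromDynamic
```

Here result evaluates to Just 3, while passing an argument of the wrong type (say, a String) gives Nothing instead of a compile-time error. That runtime check is the price of going through Dynamic.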
Enter Template Haskell: since the parser function is defined in a different module, it ought to be possible to construct the parser at compile time.
However, there is no Lift instance for Dynamic!
Furthermore, directly lifting the existential types instead (i.e. requiring a Lift constraint on them) does not work either, as these are almost always functions!
How can we lift a GADT containing either Dynamics or Typeable a => a's into a TemplateHaskell quotation?
Or is there another approach to be able to handle this situation?
I am trying to work with GHC core data types.
I am able to compile my Haskell source to core representation with type Bind CoreBndr.
As we know there is no default Show instance for this data type.
There is a way to pretty print this representation but it has way too much noise associated with it.
I want to treat GHC core as any other algebraic data type and write functions with it.
It would be much easier if we had a Show instance of GHC core.
Has anybody already written a show instance which I can reuse?
As an aside, how does the community write and verify programs that deal with GHC core?
A naive implementation of Show in GHC is probably not what you want. The reason is that GHC has recursion among many of its internal data types. For instance, between TyCon, AlgTyConRhs, and DataCon we have:
TyCon has AlgTyCon, which contains AlgTyConRhs.
AlgTyConRhs contains data_cons :: [DataCon] as one of its record fields.
DataCon contains dcRepTyCon :: TyCon as one of its fields.
And thus we come full circle. Because of how Show works, recursion like this will create infinite output if you ever attempt to print it.
In order to get a "nice" custom representation with data constructors and everything showing, you would have to write it yourself. This is actually somewhat challenging, since you have to consider and debug cases of recursion like this that default pretty printers have solved.
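As a rough illustration of the problem and one way around it, here is a sketch using two toy types that merely mimic the cycle described above. They are hypothetical stand-ins, not GHC's real TyCon and DataCon, and the printers break the cycle by showing only the name of whatever lies across the back-edge.

```haskell
import Data.List (intercalate)

-- Toy stand-ins for the mutually recursive GHC types (NOT the real ones).
data TyCon = TyCon
  { tyConName :: String
  , dataCons  :: [DataCon]
  }

data DataCon = DataCon
  { dataConName :: String
  , repTyCon    :: TyCon   -- points back at the owning type constructor
  }

-- A knot-tied value: boolTyCon refers to its constructors and vice versa.
-- A derived Show instance would never finish printing it.
boolTyCon :: TyCon
boolTyCon = TyCon "Bool" [falseCon, trueCon]

falseCon, trueCon :: DataCon
falseCon = DataCon "False" boolTyCon
trueCon  = DataCon "True" boolTyCon

-- Hand-written printers that stop at the back-edge by printing names only.
showTyCon :: TyCon -> String
showTyCon tc =
  "TyCon " ++ tyConName tc ++ " [" ++ intercalate ", " (map dataConName (dataCons tc)) ++ "]"

showDataCon :: DataCon -> String
showDataCon dc =
  "DataCon " ++ dataConName dc ++ " (of " ++ tyConName (repTyCon dc) ++ ")"
```

For example, showTyCon boolTyCon gives "TyCon Bool [False, True]". A real printer for GHC core would need the same kind of decision for every cycle among its types.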
Say I have a String (or Text or whatever) containing valid Haskell code. Is there a way to convert it into a [Dec] with Template Haskell?
I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways.
This would be great to have since it would allow different "backends" for TH. For example you could use the AST from haskell-src-exts which supports more Haskell syntax than TH does.
"I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways."
Why would you think that? That isn’t the case, the TH AST is converted to GHC’s internal AST directly; it never gets converted back to text at any point in that process. (If it did, that would be pretty strange.)
Still, it would be somewhat nice if Template Haskell exposed a way to parse Haskell source to expressions, types, and declarations, basically exposing the parsers behind various e, t, and d quoters that are built in to Template Haskell. Unfortunately, it does not, and I don’t believe there are currently any plans to change that.
Currently, you need to go through haskell-src-exts instead. This is somewhat less than ideal, since there are differences between haskell-src-exts’s parser and GHC’s, but it’s as good as you’re currently going to get. To lessen the pain, there is a package called haskell-src-meta that bridges haskell-src-exts and template-haskell.
For your use case, you can use the parseDecs function from Language.Haskell.Meta.Parse, which has the type String -> Either String [Dec], which is what you’re looking for.
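A minimal sketch of what that looks like, assuming the haskell-src-meta and template-haskell packages are available. The names source, parsed and render are made up for this example; only parseDecs and pprint come from the libraries.

```haskell
import Language.Haskell.Meta.Parse (parseDecs)
import Language.Haskell.TH (Dec, pprint)

-- Some Haskell source held in an ordinary String.
source :: String
source = unlines
  [ "double :: Int -> Int"
  , "double x = x * 2"
  ]

-- Left carries the parse error message, Right the Template Haskell declarations.
parsed :: Either String [Dec]
parsed = parseDecs source

-- Render the outcome for inspection, e.g. in GHCi.
render :: Either String [Dec] -> String
render (Left err)   = "parse error: " ++ err
render (Right decs) = pprint decs
```

Inside a splice you would typically turn the Left case into a compile-time failure (for instance with fail) and return the declarations from the Right case.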
When trying to learn Haskell, one of the difficulties that arises is telling when something requires special magic from the compiler. One example that comes to mind is the seq function, which can't be defined in ordinary Haskell, i.e. you can't write a seq2 function that behaves exactly like the built-in seq. Consequently, when teaching someone about seq, you need to mention that it is a special symbol for the compiler.
Another example would be the do-notation which only works with instances of the Monad class.
Sometimes it's not obvious. Take continuations, for instance. Does the compiler know about Control.Monad.Cont, or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler, even if continuations are a very strange kind of beast.
Language extensions set aside, what other compiler magic should Haskell learners be aware of?
Nearly all the GHC primitives that cannot be implemented in userland are in the ghc-prim package. (It even has a module called GHC.Magic!)
So browsing it will give a good sense.
Note that you should not use this module in userland code unless you know exactly what you are doing. Most of the usable stuff from it is exported in downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees as to how it will act from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function for all possible types [and avoid optimising it away even though it looks no-op].
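To illustrate the "specific type" half of that claim, here is a small sketch. The names seqBool and seqList are made up for this example; each forces its first argument to weak head normal form simply by pattern matching on it.

```haskell
-- seq for Bool: matching on the constructor forces the first argument.
seqBool :: Bool -> b -> b
seqBool True  b = b
seqBool False b = b

-- seq for lists: only the outermost constructor is forced,
-- not the elements and not the tail.
seqList :: [a] -> b -> b
seqList []      b = b
seqList (_ : _) b = b
```

Both diverge when the first argument is undefined, just like seq does. What you cannot write this way is a single function of type a -> b -> b that works for every a, because there are no constructors to match on for an unknown type.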
Obviously the entire IO monad is deeply magic, as is everything to do with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
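Since the question asked about continuations specifically, here is a minimal sketch of a Cont monad written from scratch in plain Haskell. It is a simplified version for illustration, not the definition used by the transformers or mtl packages, and safeDiv is a made-up usage example.

```haskell
import Control.Monad (when)

-- A computation that, given "the rest of the program" (a -> r), produces r.
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont c) = Cont (\k -> c (k . f))

instance Applicative (Cont r) where
  pure a = Cont (\k -> k a)
  Cont cf <*> Cont ca = Cont (\k -> cf (\f -> ca (k . f)))

instance Monad (Cont r) where
  Cont c >>= f = Cont (\k -> c (\a -> runCont (f a) k))

-- Capture the current continuation; calling 'exit' abandons the rest of
-- the block and jumps straight to that continuation.
callCC :: ((a -> Cont r b) -> Cont r a) -> Cont r a
callCC f = Cont (\k -> runCont (f (\a -> Cont (\_ -> k a))) k)

-- Early exit on division by zero.
safeDiv :: Int -> Int -> Cont r (Maybe Int)
safeDiv x y = callCC $ \exit -> do
  when (y == 0) (exit Nothing)
  pure (Just (x `div` y))
```

Running runCont (safeDiv 10 2) id gives Just 5 and runCont (safeDiv 1 0) id gives Nothing. Nothing here needs compiler support beyond ordinary functions and type classes.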
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to the RebindableSyntax language extension, which lets you change which functions they desugar to.) The same goes for deriving; the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for a newtype works for any class, though, given the GeneralizedNewtypeDeriving extension. (It's just copying an instance from one type to another identical copy of that type.)
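A small sketch of the newtype case, assuming GHC with the GeneralizedNewtypeDeriving extension enabled. Meters and total are made-up names for illustration.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Show, Eq and Ord are among the classes the compiler can always derive.
-- Num is not, but for a newtype the instance can be reused from the
-- wrapped type (Double here).
newtype Meters = Meters Double
  deriving (Show, Eq, Ord, Num)

-- The numeric literal and (+) both go through the reused Num instance.
total :: Meters
total = Meters 1.5 + 2
```

Here total equals Meters 3.5, with the arithmetic taken directly from Double.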
Arrays are hard-wired. Much like every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.
In Haskell, is it possible to define a data type within a function scope?
For example, I am writing a function f :: [(Char, Int)] -> [(Char, String)]. In the implementation of the function, I will build a tree from the input list, and then traverse the tree to build the output list. One solution is to define a new Tree data type specific to my problem, along with two helper functions, one to translate the input list into the Tree and the other to walk the Tree and build the output list.
Now the two helper functions can easily be pulled into the scope of f by putting them into a where clause, but what about the interim nonce type Tree? It seems ugly to pollute the namespace by defining it outside the function scope, but I don't know how else to do it.
For context, it turns out I am computing a Huffman encoding. I am not particularly interested in finding an alternative algorithm at this point, as I suspect that it will often be useful in Haskell to define helper data types between helper functions, so I am interested in general approaches to this.
No, it's impossible.
In Haskell, modules and qualified imports are supposed to resolve all namespacing problems, such as yours or the notorious collision of record field names.
So you want a type to be visible only to a certain function? Put that type and that function in a new module and (optionally) export just the function.
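Applied to the Huffman example from the question, that could look roughly like the sketch below. The module name Huffman, the function name encode and the particular tree-building strategy are my own assumptions; the point is only that Tree is missing from the export list.

```haskell
module Huffman (encode) where

import Data.List (insertBy, sortBy)
import Data.Ord (comparing)

-- Tree is not exported, so importers of this module cannot see it.
data Tree
  = Leaf Int Char
  | Node Int Tree Tree

weight :: Tree -> Int
weight (Leaf w _)   = w
weight (Node w _ _) = w

-- Map each character to its code, given character frequencies.
encode :: [(Char, Int)] -> [(Char, String)]
encode []       = []
encode [(c, _)] = [(c, "0")]   -- a lone symbol still needs a one-bit code
encode freqs    = walk (build leaves) ""
  where
    leaves = sortBy (comparing weight) [Leaf w c | (c, w) <- freqs]

    -- Repeatedly merge the two lightest trees, keeping the list sorted.
    build [t]            = t
    build (a : b : rest) =
      build (insertBy (comparing weight) (Node (weight a + weight b) a b) rest)
    build []             = error "unreachable: encode handles the empty case"

    -- Collect codes; the path is accumulated in reverse.
    walk (Leaf _ c)   code = [(c, reverse code)]
    walk (Node _ l r) code = walk l ('0' : code) ++ walk r ('1' : code)
```

Client code writes import Huffman (encode) and never learns that a Tree type exists, so the top-level namespace of the importing module stays clean.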
Sure, by following this convention you will end up with more modules than usual, but if you think about it, this is not actually that different from many other languages. E.g., in Java it's conventional to put each class in a separate file, no matter how small the class is.
I have to mention, though, that by no means everyone in the community actually follows this convention. You can often see cryptic names used to work around the problem instead. Personally, I don't find that approach very clean and would rather recommend using modules.