Can we use the GHC API or something else to load AST expressions (similar to the Exp type from haskell-src-exts) instead of textual source modules? This way we could save the time spent on code generation and parsing.
I don't think the GHC API exposes an AST interface (could be wrong though), but Template Haskell does. If you build expressions using the Language.Haskell.TH Exp structure, you can create functions/declarations and make use of them by the $(someTHFunction) syntax.
A fairly major caveat is that TH only runs at compile time, so you would need to pre-generate everything. If you want to use TH at run time, I think you'd need to pretty-print the Template Haskell AST, then use the GHC API on the resulting string.
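To make the "build the AST, then pretty-print it" idea concrete, here is a minimal sketch. The names incrementExp and renderIncrement are made up for illustration; only the Exp constructors, mkName and pprint come from the template-haskell package.

```haskell
import Language.Haskell.TH

-- | AST for the lambda @\x -> x + 1@, built by hand instead of being parsed.
-- (Hypothetical example value, not part of any library.)
incrementExp :: Exp
incrementExp =
  LamE [VarP x]
       (InfixE (Just (VarE x)) (VarE (mkName "+")) (Just (LitE (IntegerL 1))))
  where
    x = mkName "x"

-- | Render the AST back to source text. 'pprint' is a pure function,
-- so this step works at run time without any splice.
renderIncrement :: String
renderIncrement = pprint incrementExp
```

The rendered string is what you would hand to a run-time evaluator. At compile time the same value could instead be spliced from another module as $(return incrementExp).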
Say I have a String (or Text or whatever) containing valid Haskell code. Is there a way to convert it into a [Dec] with Template Haskell?
I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways.
This would be great to have since it would allow different "backends" for TH. For example you could use the AST from haskell-src-exts which supports more Haskell syntax than TH does.
"I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways."
Why would you think that? That isn’t the case, the TH AST is converted to GHC’s internal AST directly; it never gets converted back to text at any point in that process. (If it did, that would be pretty strange.)
Still, it would be somewhat nice if Template Haskell exposed a way to parse Haskell source to expressions, types, and declarations, basically exposing the parsers behind various e, t, and d quoters that are built in to Template Haskell. Unfortunately, it does not, and I don’t believe there are currently any plans to change that.
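For readers who have not used the built-in quoters: the sketch below shows what the e and d quoters give you, namely TH AST values produced from Haskell syntax that GHC parses at compile time. The names quotedExp, quotedDecs and renderQuoted are hypothetical, and I am assuming the TemplateHaskellQuotes extension is available (GHC 8.0 or later).

```haskell
{-# LANGUAGE TemplateHaskellQuotes #-}
import Language.Haskell.TH

-- | Expression quote: GHC parses the text between the brackets at compile
-- time and produces code that builds the corresponding 'Exp'.
quotedExp :: Q Exp
quotedExp = [| \x -> x + (1 :: Int) |]

-- | Declaration quote: same idea, but yielding a list of 'Dec'.
quotedDecs :: Q [Dec]
quotedDecs = [d| double :: Int -> Int; double n = n * 2 |]

-- | Run both quotes in IO and pretty-print the ASTs they build.
renderQuoted :: IO String
renderQuoted = do
  e  <- runQ quotedExp
  ds <- runQ quotedDecs
  return (unlines [pprint e, pprint ds])
```

Note that the source text here is fixed at compile time, which is exactly the limitation discussed above: there is no way to feed these quoters a String that only exists at run time.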
Currently, you need to go through haskell-src-exts instead. This is somewhat less than ideal, since there are differences between haskell-src-exts’s parser and GHC’s, but it’s as good as you’re currently going to get. To lessen the pain, there is a package called haskell-src-meta that bridges haskell-src-exts and template-haskell.
For your use case, you can use the parseDecs function from Language.Haskell.Meta.Parse, which has the type String -> Either String [Dec], which is what you’re looking for.
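A small sketch of how parseDecs might be used, assuming the haskell-src-meta package is installed. The helper names parseAndRender and countDecs are made up for this example; parseDecs is the function mentioned above and pprint comes from template-haskell.

```haskell
import Language.Haskell.Meta.Parse (parseDecs)
import Language.Haskell.TH (pprint)

-- | Parse a string of declarations and render each resulting 'Dec' back
-- to text. A 'Left' carries the parser's error message unchanged.
parseAndRender :: String -> Either String String
parseAndRender src = fmap (unlines . map pprint) (parseDecs src)

-- | Number of top-level declarations found in the input, if it parses.
countDecs :: String -> Either String Int
countDecs = fmap length . parseDecs
```

Since the result is an ordinary [Dec], inside a splice you could return it from the Q monad directly, reporting the Left case with fail.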
I have some Haskell code, and would like a Fay script to be able to access it. The problem is the Haskell code uses monads. Fay doesn't support arbitrary monads. How do I get my Haskell code to work with Fay? Namely, the Fay script needs to be able to access functions from the Haskell script. What do I do?
I may not quite understand what you are asking.
You have some Haskell that isn't valid Fay, so if you want to run it as Fay code you would need to replace the unsupported features, for example by using monomorphic functions in place of the missing Monad instances (note that you can use RebindableSyntax here).
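Here is a rough sketch of the RebindableSyntax trick, written and checked as plain GHC Haskell. Whether Fay accepts every detail of it is an assumption on my part. The functions safeDiv and compute are hypothetical; the point is only that do-notation picks up the monomorphic (>>=), (>>) and return defined in the module instead of the Monad class methods.

```haskell
{-# LANGUAGE RebindableSyntax #-}
import Prelude hiding ((>>=), (>>), return)

-- Monomorphic replacements for the Monad methods, fixed to Maybe.
(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
Nothing >>= _ = Nothing
Just x  >>= f = f x

(>>) :: Maybe a -> Maybe b -> Maybe b
m >> k = m >>= \_ -> k

return :: a -> Maybe a
return = Just

-- Hypothetical example function: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- The do block below desugars to the (>>=) and return defined above,
-- so no Monad instance is involved.
compute :: Int -> Int -> Int -> Maybe Int
compute a b c = do
  q <- safeDiv a b
  r <- safeDiv q c
  return (q + r)
```

The cost of this approach is that each module can only use do-notation for the one "monad" whose operators it has in scope.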
If you compile the Haskell code with GHC there is no reasonable way to interface with the functions from Fay. You would need to invoke an external process from Fay inside node.js or similar.
I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.
If I were to do this in Java, I would use a combination of JTB and JavaCC, which builds an AST. Visitors are then used to manipulate the tree. But, given the vast libraries for parsing in Haskell (parsec, happy, alex etc.), I'm a bit confused about choosing the appropriate library.
So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?
Well, in Haskell there are 2 main ways of parsing something: parser combinators or a parser generator. Since you already have a BNF I'd suggest the latter.
A good one is happy, usually paired with alex for the lexer. GHC's parser IIRC is written using these, so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = JavaClass {
    className :: Name,
    interfaces :: [Name],
    contents :: [ClassContents]
    -- ...
}
data ClassContents = M Method
| F Field
| IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
| WhateverOtherForms
| YouWillParse
Once you have this you'll have the entire AST represented as one TopLevel or a list of them, depending on how many classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb (scrap your boilerplate) that let you write very concise tree traversals and modifications. lens is also an option. At a minimum check out Data.Traversable and Data.Foldable.
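As a taste of what Data.Foldable and Data.Traversable buy you, here is a minimal sketch on a toy expression type. Expr, variables, rename and checkScope are all hypothetical names invented for the example; the only assumption is that your AST can be parameterised over something (here, the variable type) so the classes can be derived.

```haskell
{-# LANGUAGE DeriveTraversable #-}
import Data.Foldable (toList)

-- | Toy expression tree, parameterised over the type of variable names.
data Expr v
  = Lit Int
  | Var v
  | Add (Expr v) (Expr v)
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- | All variables in the tree, left to right (from the derived Foldable).
variables :: Expr v -> [v]
variables = toList

-- | Rewrite every variable without touching the tree shape (Functor).
rename :: (v -> w) -> Expr v -> Expr w
rename = fmap

-- | Check every variable against an environment, failing on the first
-- one that is not in scope (from the derived Traversable).
checkScope :: [String] -> Expr String -> Either String (Expr String)
checkScope env = traverse check
  where
    check v
      | v `elem` env = Right v
      | otherwise    = Left ("unbound variable: " ++ v)
```

This only reaches the parameterised positions, which is why the generic-programming libraries mentioned above are still useful for rewrites on arbitrary node types.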
To modify the tree, you can do something as simple as
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c{contents = filter (not . isInnerClass) $ contents c}
  -- the c{...} form is called a record update
  where isInnerClass (IC _) = True
        isInnerClass _ = False
and then you could potentially use something like syb to write
everywhere (mkT ignoreInnerClasses) toplevel
which will traverse everything and apply ignoreInnerClasses to all JavaClasses. This is possible to do in lens and many other libraries too, but syb is very easy to read.
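To show roughly what such a syb traversal does under the hood, here is a self-contained sketch that uses only Data.Data from base. everywhereBU and mkT' are hypothetical stand-ins for syb's everywhere and mkT (with syb installed you would import those from Data.Generics instead), and the data types are a cut-down, made-up version of the ones above.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, Typeable, cast, gmapT)

type Name = String

-- Simplified, hypothetical AST: methods and fields carry only a name.
data ClassContents = M Name | F Name | IC JavaClass
  deriving (Show, Eq, Data)

data JavaClass = JavaClass { className :: Name, contents :: [ClassContents] }
  deriving (Show, Eq, Data)

-- | Apply a generic transformation to every node, children first.
everywhereBU :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhereBU f x = f (gmapT (everywhereBU f) x)

-- | Lift a function on one type to all types: it acts as the identity
-- wherever the node type does not match.
mkT' :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT' f = case cast f of
  Just g  -> g
  Nothing -> id

-- | Drop inner classes from a single class via a record update.
dropInnerClasses :: JavaClass -> JavaClass
dropInnerClasses c = c { contents = filter (not . isInner) (contents c) }
  where
    isInner (IC _) = True
    isInner _      = False

-- | Drop inner classes from every class found anywhere in the input.
dropAllInnerClasses :: Data a => a -> a
dropAllInnerClasses = everywhereBU (mkT' dropInnerClasses)
```

Because the traversal is generic, dropAllInnerClasses works equally on a single JavaClass, a list of them, or any larger Data structure that contains them.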
I've never used bnfc-meta (suggested by #phg), but I would strongly recommend you look into BNFC (on hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it will automatically generate an AST, parser, and pretty-printer for the grammar.
How suitable BNFC is depends upon the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I did have some success hacking up context-sensitive extensions, but that code's likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.
Alex + Happy.
There are many approaches to modify/investigate the parsed terms (ASTs). The keyword to search for is "datatype-generic" programming. But beware: it is a complex topic ...
http://people.cs.uu.nl/andres/Rec/MutualRec.pdf
http://www.cs.uu.nl/wiki/GenericProgramming/Multirec
It has a generic implementation of the zipper available here:
http://hackage.haskell.org/packages/archive/zipper/0.3/doc/html/Generics-MultiRec-Zipper.html
Also check out https://github.com/pascalh/Astview
You might also check out the Haskell Compiler Series which is nice as an introduction to using alex and happy to parse a subset of Java: http://bjbell.wordpress.com/haskell-compiler-series/.
Since your grammar can be expressed in BNF, it is context-free, and grammars for programming-language subsets like yours usually fall (possibly after minor rewriting) into the class of grammars that are efficiently parseable with a shift-reduce parser (LALR grammars). Such efficient parsers can be generated by the parser generator yacc/bison (C, C++), or its Haskell equivalent "Happy".
That's why I would use "Happy" in your case. It takes grammar rules in BNF form and generates a parser from it directly. The resulting parser will accept the language that is described by your grammar rules and produce an AST (abstract syntax tree). The Happy user guide is quite nice and gets you started quickly:
http://www.haskell.org/happy/doc/html/
To transform the resulting AST, generic programming is a good idea. Here is a classical explanation on how to do this in Haskell in a practical fashion, from scratch:
http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
I have used exactly this to build a compiler for a small domain specific language, and it was a simple and concise solution.
I'm writing an anti-quoter in Haskell and I need a Parsec combinator that parses a valid Haskell variable identifier.
Is there one already implemented in the quasiquoting libraries or do I need to write my own?
I'm hoping I don't need to copy/paste the ident implementation found in http://www.haskell.org/haskellwiki/Quasiquotation.
It's unlikely that anything in the implementation of Template Haskell itself contains a Parsec parser for anything, because GHC does not use Parsec for parsing--note that it's not in the list of packages tied to GHC in various ways.
However, the module Text.Parsec.Token gives a means of describing full token parsers for languages, and the Text.Parsec.Language module includes some predefined token parsers, including one for Haskell tokens.
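A minimal sketch of reusing the predefined Haskell token parser for identifiers. The names haskellIdent and parseIdent are made up; identifier and haskell are the pieces exported by Text.Parsec.Token and Text.Parsec.Language.

```haskell
import Text.Parsec (ParseError, eof, parse)
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskell)
import qualified Text.Parsec.Token as Tok

-- | Identifier parser taken from the predefined Haskell token parser.
-- It is a lexeme parser: it skips trailing whitespace and rejects
-- reserved words such as "let".
haskellIdent :: Parser String
haskellIdent = Tok.identifier haskell

-- | Parse a whole string as a single identifier.
parseIdent :: String -> Either ParseError String
parseIdent = parse (haskellIdent <* eof) "<input>"
```

One caveat, as far as I can tell from the language definition: the identifier rule starts with any letter, so it also accepts capitalised names. For an anti-quoter that must only accept variable identifiers you would probably want an extra check on the first character.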
Beyond that, you could also look at the haskell-src-exts package, which is a parser for Haskell source files.
The haskell-src-exts package has functions for pretty printing a Haskell AST. What I want to do is change its behavior on certain constructors, in my case the way SCC pragmas are printed. So everything else should be printed the default way, only SCCs are handled differently. Is it possible to do it without copying the source file and editing it, which is what I'm doing now?
Well, the library has done one thing right, using a type class for Pretty. The challenge then is how to select a different instance for the constructors you want to print differently. Ideally, you would just newtype the AST node you care about, and somehow substitute that into the AST.
Now, the problem here is that the Haskell AST exported by the library has its type structure fixed. It doesn't, e.g. use two-level types, which would let you substitute newtypes for parts of the tree. So you would have to redefine the type of the AST down to the node you wish to change the type of.
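To illustrate the problem without pulling in haskell-src-exts, here is a toy model. Everything in it (the Pretty class, Pragma, Expr, TerseSCC, ExprP) is hypothetical and only mimics the shape of the situation; it is not the library's actual API.

```haskell
-- | Stand-in for the library's pretty-printing class.
class Pretty a where
  pretty :: a -> String

-- Fixed AST, as a library would export it.
data Pragma = SCC String
data Expr = Var String | Ann Pragma Expr

instance Pretty Pragma where
  pretty (SCC name) = "{-# SCC \"" ++ name ++ "\" #-}"

instance Pretty Expr where
  pretty (Var v)   = v
  pretty (Ann p e) = pretty p ++ " " ++ pretty e

-- | A newtype gives the pragma an alternative instance ...
newtype TerseSCC = TerseSCC Pragma

instance Pretty TerseSCC where
  pretty (TerseSCC (SCC name)) = "<" ++ name ++ ">"

-- ... but Expr's field type is fixed to Pragma, so a TerseSCC cannot be
-- stored inside an Expr. If the tree were parameterised over the pragma
-- type instead, the substitution would type-check:
data ExprP p = VarP String | AnnP p (ExprP p)

instance Pretty p => Pretty (ExprP p) where
  pretty (VarP v)    = v
  pretty (AnnP p e)  = pretty p ++ " " ++ pretty e
```

With the fixed Expr, pretty (Ann (SCC "f") (Var "x")) always uses the default pragma instance, whereas the parameterised ExprP TerseSCC picks up the terse one.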