Haskell parser combinator for Haskell identifier

I'm writing an anti-quoter in Haskell and I need a Parsec combinator that parses a valid Haskell variable identifier.
Is there one already implemented in the quasiquoting libraries or do I need to write my own?
I'm hoping I don't need to copy/paste the ident implementation found in http://www.haskell.org/haskellwiki/Quasiquotation.

It's unlikely that the implementation of Template Haskell itself contains a Parsec parser for anything, because GHC does not use Parsec for parsing; note that parsec is not among the packages tied to GHC in various ways.
However, the module Text.Parsec.Token gives a means of describing full token parsers for languages, and the Text.Parsec.Language module includes some predefined token parsers, including one for Haskell tokens.
Beyond that, you could also look at the haskell-src-exts package, which is a parser for Haskell source files.
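For the identifier case specifically, a minimal sketch (assuming the parsec package; the function name and test input are illustrative): the predefined Haskell token parser already handles identifier syntax, rejects reserved words, and skips trailing whitespace.

import Text.Parsec
import Text.Parsec.Language (haskell)
import qualified Text.Parsec.Token as Token

-- Parse a Haskell identifier using the predefined Haskell token parser.
-- Note: this also accepts constructor names; add a lowercase-first
-- check if you need variable identifiers only.
haskellIdentifier :: Parsec String () String
haskellIdentifier = Token.identifier haskell

main :: IO ()
main = print (parse haskellIdentifier "" "fooBar'")  -- Right "fooBar'"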

Related

Is it possible to emit raw source code with Template Haskell?

Say I have a String (or Text or whatever) containing valid Haskell code. Is there a way to convert it into a [Dec] with Template Haskell?
I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways.
This would be great to have since it would allow different "backends" for TH. For example you could use the AST from haskell-src-exts which supports more Haskell syntax than TH does.
Regarding "I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways": why would you think that? That isn't the case; the TH AST is converted to GHC's internal AST directly, and it never gets converted back to text at any point in that process. (If it did, that would be pretty strange.)
Still, it would be somewhat nice if Template Haskell exposed a way to parse Haskell source to expressions, types, and declarations, basically exposing the parsers behind various e, t, and d quoters that are built in to Template Haskell. Unfortunately, it does not, and I don’t believe there are currently any plans to change that.
Currently, you need to go through haskell-src-exts instead. This is somewhat less than ideal, since there are differences between haskell-src-exts's parser and GHC's, but it's as good as you're currently going to get. To lessen the pain, there is a package called haskell-src-meta that bridges haskell-src-exts and template-haskell.
For your use case, you can use the parseDecs function from Language.Haskell.Meta.Parse, which has the type String -> Either String [Dec], which is what you’re looking for.
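A minimal sketch of using it (decsFromString is an illustrative wrapper, not part of the library):

import Language.Haskell.Meta.Parse (parseDecs)
import Language.Haskell.TH (Dec, Q)

-- Turn a String of Haskell source into declarations usable
-- from a Template Haskell splice.
decsFromString :: String -> Q [Dec]
decsFromString src =
  case parseDecs src of
    Left err   -> fail err
    Right decs -> return decs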

Connect legacy Haskell code and Fay code

I have some Haskell code, and would like a Fay script to be able to access it. The problem is the Haskell code uses monads. Fay doesn't support arbitrary monads. How do I get my Haskell code to work with Fay? Namely, the Fay script needs to be able to access functions from the Haskell script. What do I do?
I may not quite understand what you are asking.
You have some Haskell that isn't valid Fay, so if you want to run it as Fay code you would need to replace the unsupported features, for example by using monomorphic functions to replace the missing Monad instances (note that you can use RebindableSyntax here).
If you compile the Haskell code with GHC there is no reasonable way to interface with the functions from Fay. You would need to invoke an external process from Fay inside node.js or similar.
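To illustrate the first option, here is a sketch of the RebindableSyntax trick in plain Haskell (the Step type is made up for illustration): with that extension, do-notation desugars to whatever (>>=), (>>) and return are in scope, so monomorphic definitions can stand in for a missing Monad instance.

{-# LANGUAGE RebindableSyntax #-}
import Prelude hiding ((>>=), (>>), return)

-- A made-up, monomorphic state-like type: no Monad instance needed.
newtype Step a = Step { runStep :: Int -> (a, Int) }

(>>=) :: Step a -> (a -> Step b) -> Step b
Step m >>= f = Step (\s -> let (a, s') = m s in runStep (f a) s')

(>>) :: Step a -> Step b -> Step b
m >> k = m >>= \_ -> k

return :: a -> Step a
return a = Step (\s -> (a, s))

tick :: Step Int
tick = Step (\s -> (s, s + 1))

-- do-notation works even though Step is not a Monad.
example :: Step Int
example = do
  a <- tick
  b <- tick
  return (a + b)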

How to build Abstract Syntax Trees from grammar specification in Haskell?

I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.
If I were to do this in Java, I would use a combination of JTB and JavaCC, which builds an AST. Visitors are then used to manipulate the tree. But given the vast array of parsing libraries in Haskell (parsec, happy, alex etc.), I'm a bit confused about choosing the appropriate one.
So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?
Well, in Haskell there are two main ways of parsing something: parser combinators or a parser generator. Since you already have a BNF, I'd suggest the latter.
A good pairing is alex and happy: alex generates the lexer and happy the parser. IIRC GHC's own parser is written using happy, so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = JavaClass
  { className  :: Name
  , interfaces :: [Name]
  , contents   :: [ClassContents]
  -- ... and so on
  }

data ClassContents = M Method
                   | F Field
                   | IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
              | WhateverOtherForms
              | YouWillParse
Once you have this you'll have the entire AST represented as one TopLevel or a list of them, depending on how many classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb (scrap your boilerplate) that let you write very concise tree traversals and modifications. lens is also an option. At a minimum check out Data.Traversable and Data.Foldable.
To modify the tree, you can do something as simple as
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c{contents = filter (not . isInner) $ contents c}
-- ^^^ that is called a record update
  where isInner (IC _) = True
        isInner _      = False
and then you could potentially use something like syb to write
everywhere (mkT ignoreInnerClasses) toplevel
which will traverse everything and apply ignoreInnerClasses to every JavaClass. This is possible in lens and many other libraries too, but syb is very easy to read.
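Putting it together, a self-contained sketch with a simplified AST (Name taken to be String, method/field bodies elided) shows how little syb needs:

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, Typeable)
import Data.Generics (everywhere, mkT)

type Name = String

data JavaClass = JavaClass
  { className :: Name
  , contents  :: [ClassContents]
  } deriving (Show, Data, Typeable)

data ClassContents = M Name          -- method, simplified
                   | F Name          -- field, simplified
                   | IC JavaClass    -- inner class
  deriving (Show, Data, Typeable)

ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c{contents = filter (not . isInner) $ contents c}
  where isInner (IC _) = True
        isInner _      = False

-- Traverse a whole tree and drop inner classes everywhere.
stripAll :: JavaClass -> JavaClass
stripAll = everywhere (mkT ignoreInnerClasses)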
I've never used bnfc-meta (suggested by @phg), but I would strongly recommend you look into BNFC (on Hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it will automatically generate an AST, parser, and pretty-printer for the grammar.
How suitable BNFC is depends upon the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I had some success hacking up context-sensitive extensions, but that code's likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.
Alex + Happy.
There are many approaches to modify/investigate the parsed terms (ASTs). The keyword to search for is "datatype-generic" programming. But beware: it is a complex topic ...
http://people.cs.uu.nl/andres/Rec/MutualRec.pdf
http://www.cs.uu.nl/wiki/GenericProgramming/Multirec
It has a generic implementation of the zipper available here:
http://hackage.haskell.org/packages/archive/zipper/0.3/doc/html/Generics-MultiRec-Zipper.html
Also checkout https://github.com/pascalh/Astview
You might also check out the Haskell Compiler Series which is nice as an introduction to using alex and happy to parse a subset of Java: http://bjbell.wordpress.com/haskell-compiler-series/.
Since your grammar can be expressed in BNF, it is likely in the class of grammars that can be parsed efficiently by a shift-reduce parser (LALR grammars). Such efficient parsers can be generated by the parser generator yacc/bison (C, C++), or by its Haskell equivalent, Happy.
That's why I would use Happy in your case. It takes grammar rules in BNF form and generates a parser from them directly. The resulting parser will accept the language described by your grammar rules and produce an AST (abstract syntax tree). The Happy user guide is quite nice and gets you started quickly:
http://www.haskell.org/happy/doc/html/
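To make the BNF-to-parser step concrete, here is a hypothetical Happy grammar file (module name, token type and rules are all invented for illustration) that parses additive integer expressions into a tiny AST:

{
module ExprParser (parseExpr, Expr(..), Token(..)) where
}

%name parseExpr
%tokentype { Token }
%error { parseError }

%token
  int { TInt $$ }
  '+' { TPlus }

%%

Expr : Expr '+' Term  { Add $1 $3 }
     | Term           { $1 }

Term : int            { Lit $1 }

{
data Token = TInt Int | TPlus

data Expr = Add Expr Expr | Lit Int
  deriving Show

parseError :: [Token] -> a
parseError _ = error "parse error"
}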
To transform the resulting AST, generic programming is a good idea. Here is a classical explanation on how to do this in Haskell in a practical fashion, from scratch:
http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
I have used exactly this to build a compiler for a small domain specific language, and it was a simple and concise solution.

Dynamic loading of Haskell abstract syntax expression

Can we use the GHC API or something else to load not textual source modules but AST expressions, similar to the haskell-src-exts Exp type? This way we could save the time spent on code generation and parsing.
I don't think the GHC API exposes an AST interface (could be wrong though), but Template Haskell does. If you build expressions using the Language.Haskell.TH Exp structure, you can create functions/declarations and make use of them by the $(someTHFunction) syntax.
A fairly major caveat is that TH only runs at compile time, so you would need to pre-generate everything. If you want to use TH at run-time, I think you'd need to pretty-print the Template Haskell AST, then use the GHC API on the resulting string.
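For the compile-time route, a minimal sketch (incr is an illustrative name; the expression is built by hand rather than parsed from text):

{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- incr = \x -> x + 1, constructed as a raw TH Exp and spliced in.
incr :: Int -> Int
incr = $(return (LamE [VarP (mkName "x")]
                      (InfixE (Just (VarE (mkName "x")))
                              (VarE '(+))
                              (Just (LitE (IntegerL 1))))))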

Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?

I'm new to Haskell and Parsec. After reading Chapter 16 Using Parsec of Real World Haskell, a question appeared in my mind: Why and when is Parsec better than other parser generators like Yacc/Bison/Antlr?
My understanding is that Parsec creates a nice DSL for writing parsers, and Haskell makes it very easy and expressive. But parsing is such a standard/popular problem that it deserves its own language, one that outputs to multiple target languages. So when shall we use Parsec instead of, say, generating Haskell code from Bison/Antlr?
This question might go a little beyond technology, and into the realm of industry practice. When writing a parser from scratch, what's the benefit of picking up Haskell/Parsec compared to Bison/Antlr or something similar?
BTW: my question is quite similar to this one but wasn't answered satisfactorily there.
One of the main differences between the tools you listed is that ANTLR, Bison and their friends are parser generators, whereas Parsec is a parser combinator library.
A parser generator reads in a description of a grammar and spits out a parser. It is generally not possible to combine existing grammars into a new grammar, and it is certainly not possible to combine two existing generated parsers into a new parser.
A parser combinator OTOH does nothing but combine existing parsers into new parsers. Usually, a parser combinator library ships with a couple of trivial built-in parsers that can parse the empty string or a single character, and with a set of combinators that take one or more parsers and return a new one that, for example, parses the sequence of the original parsers (e.g. you can combine a d parser and an o parser to form a do parser), the alternation of the original parsers (e.g. a 0 parser and a 1 parser into a 0|1 parser), or applies the original parser multiple times (repetition).
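In a Haskell combinator library like Parsec, those three combinations might look like this (a toy sketch of the "do" and "0|1" examples just described):

import Text.Parsec
import Text.Parsec.String (Parser)

-- Sequencing: combine a 'd' parser and an 'o' parser into a "do" parser.
doKeyword :: Parser String
doKeyword = do
  d <- char 'd'
  o <- char 'o'
  return [d, o]

-- Alternation: a 0|1 parser from two single-character parsers.
bit :: Parser Char
bit = char '0' <|> char '1'

-- Repetition: one or more bits.
bits :: Parser String
bits = many1 bit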
What this means is that you could, for example, take an existing parser for Java and an existing parser for HTML and combine them into a parser for JSP.
Most parser generators don't support this, or only support it in a limited way. Parser combinators OTOH only support this and nothing else.
You might want to see this question as well as the linked one in your question.
Which Haskell parsing technology is most pleasant to use, and why?
In Haskell the competition is between Parsec (and other parser combinators) and the parser generator Happy. I'd pick Happy if I already had an LR grammar to work from: parser combinators take grammars in LL form, the translation from LR to LL takes some effort, and a combinator parser will usually be significantly slower. If I don't have a grammar I'll use Parsec; it is more flexible (powerful) than Happy, and it's more fun to work "in Haskell" than to generate code with Happy and Alex. If you use Happy for parsing you almost always need to use Alex for lexing.
For industry practice, it would be odd to decide to use Haskell just to get Parsec. For parsing, most of the current crop of languages will have at least a parser generator and probably something more flexible like a port of Parsec or a PEG system.
Ira Baxter's answer to the linked question was spot-on about a parser getting you merely to the foothold of the Himalayas for writing a translator, but being part of a translator is only one of the uses for a parser, so there are still many domains where fairly minimalist systems like ANTLR, Happy and Parsec are satisfactory.
Following on from stephen's answer, I think that one of the most common alternatives to Parsec, if you want to stick with parser combinators, is attoparsec. The main difference is that attoparsec was written with more of a bias towards speed, and makes trade-offs accordingly. For example, Parsec does some book-keeping to try to return helpful error messages if a parse fails, which attoparsec doesn't do to the same extent. Also, I think that attoparsec is specialised to one input stream/token type, whereas Parsec abstracts from the input type so that it can parse streams of type String, ByteString, Text, etc. without problem.
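A small sketch of the attoparsec flavour (Data.Attoparsec.Text is the Text-specialised module; pair is an invented example):

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseOnly, decimal, char)

-- Parse two comma-separated integers, e.g. "3,4".
pair :: Parser (Int, Int)
pair = do
  a <- decimal
  _ <- char ','
  b <- decimal
  return (a, b)

main :: IO ()
main = print (parseOnly pair "3,4")  -- Right (3,4)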
