Is there any way to make parsec report "shift-reduce" conflicts? - haskell

I'm playing around with parsec and realized that I had an ambiguous grammar. Obviously that's an error on my part, but I'm sort of used to yacc-style parser generators letting me know I'm being dumb. Parsec just eats characters in the order you give it parsers (yeah, I know about try).
Is there any way to make parsec tell me when my grammar isn't left-factored? Programs that do work for me are great.
Thanks!
(I know that shift-reduce has to do with a different kind of parser technology. I simply mean to describe ambiguous grammars.)

I am not a Parsec expert, so I'm likely to be corrected, but I don't think this is possible, for the simple reason that Parsec knows nothing about your grammar.
Or put another way, while your grammar may be ambiguous, your Parsec parser is not, and a program has no way of determining that some other arrangement of parsec combinators, which produces a different output for equivalent input, is also a valid representation of an unspecified grammar.
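To make the point about ordered choice concrete, here is a minimal sketch using Parsec's own combinators. The parser names (keyword, keywordTry, keywordFactored, accepts) are hypothetical and chosen only for illustration. It shows two alternatives that share a prefix, which Parsec accepts without complaint, and two ways of dealing with it by hand.

```haskell
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- Two alternatives sharing the prefix "le": not left-factored.
-- Parsec builds this parser without any warning.
keyword :: Parser String
keyword = string "let" <|> string "lex"

-- Same alternatives, but the first branch may backtrack on failure.
keywordTry :: Parser String
keywordTry = try (string "let") <|> string "lex"

-- Left-factored by hand: parse the common prefix once, then branch.
keywordFactored :: Parser String
keywordFactored = do
  prefix <- string "le"
  rest   <- string "t" <|> string "x"
  pure (prefix ++ rest)

-- Helper (hypothetical): does the parser succeed on the given input?
accepts :: Parser a -> String -> Bool
accepts p input = either (const False) (const True) (parse p "" input)
```

With the stock string combinator, accepts keyword "lex" should be False, because the first branch consumes "le" before failing and the second branch is never tried, while the other two parsers should accept the same input. Nothing in the library flags the first version; you only find out by testing.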
Since you do have a grammar, you might prefer to use happy and alex, which will give you a much more lex/yacc-like experience.
An interesting project might be to adapt the BNFC to produce an AST of parsec combinators to represent a grammar, but I suspect this would be a non-trivial task.

Related

Haskell: use or uses in Getter

In Control.Lens we have Getter, which can access nested structures. Getter has use and uses, but it's not clear to me how they work. So it'd be great if someone could provide some simple examples in which use or uses is utilised.
Why do I need to know this? Because I'm reading an implementation in Haskell in which "uses" and "use" appear. In particular, it says:
inRange <- uses fsCurrentCoinRangeUpperBound (coinIndex <=)
If the above code is just for comparing (<=) two values, then why do we need "uses" at all, there?
As I tried to make clear in my answer to your other question, "Use cases of makePrisms with examples", there is a lot of prerequisite knowledge required before you can understand this.
First up, you have to understand Lens quite well. Judging from your other question you're just beginning them. This is great! They're amazingly cool and it's excellent to tackle such things.
However, I'd urge a good deal of caution here. One of the dangers of Haskell is that it's so powerful, and can be so expressive and terse, that it is tempting to skip things.
If you skipped understanding algebraic data types very well, for example, you can easily read code and think you have an understanding of it when you don't at all. This can then lead to compounded confusion, and you'll feel like you don't understand any of it, which actually might be true, but that feeling is not a good feeling to have when learning Haskell.
I don't want you to feel like that.
So I encourage you to learn Lens, but if you don't have the requisite knowledge for Lens, then I encourage you to go get that first. It's not too hard to understand this stuff to a degree, but the way Lens is written is not trivial or easy to approach for programmers who aren't quite familiar with at least simple types, parameterized types, Algebraic Data Types, Typeclasses, the Functor typeclass, and to really understand it, you need to understand several instances of Functor.
As well, if you're trying to understand use and uses, which only make sense when dealing with State values, then I'd suggest it's almost impossible to understand what's happening without knowing what State is, as well as what Lens does and is.
use and uses take a lens (or getter) and read the part of the current state it focuses on, from inside a State (or any MonadState) computation; uses additionally applies a function to the value it reads. So you really do need to understand what do syntax is doing, and therefore the Monad typeclass to a degree, as well as how State / MonadState work from that perspective.
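A minimal sketch of use and uses, assuming the lens and mtl packages. The record FundState, its field, and the lens upperBound are hypothetical stand-ins for the fsCurrentCoinRangeUpperBound in the question, whose real definition isn't shown there.

```haskell
import Control.Lens (Lens', lens, use, uses)
import Control.Monad.State (State, evalState)

-- Hypothetical state record, standing in for the one in the question.
newtype FundState = FundState { _upperBound :: Int }

-- Hand-written lens onto the _upperBound field.
upperBound :: Lens' FundState Int
upperBound = lens _upperBound (\s b -> s { _upperBound = b })

-- 'use' reads the focused part of the current state.
readBound :: State FundState Int
readBound = use upperBound

-- 'uses' reads it and applies a function in a single step.
inRange :: Int -> State FundState Bool
inRange coinIndex = uses upperBound (coinIndex <=)

-- The same computation spelled out with 'use' and 'fmap'.
inRange' :: Int -> State FundState Bool
inRange' coinIndex = fmap (coinIndex <=) (use upperBound)

-- Run a stateful check against a given starting state.
check :: Int -> Int -> Bool
check bound coinIndex = evalState (inRange coinIndex) (FundState bound)
```

So in the quoted line, uses is needed because the upper bound is not an ordinary value in scope: it lives inside the monadic state, and uses fetches it and compares it in one go.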
If any of these preliminaries are skipped, you'll be confused.
I hope this helps! And I wish you well.

Is it possible to emit raw source code with Template Haskell?

Say I have a String (or Text or whatever) containing valid Haskell code. Is there a way to convert it into a [Dec] with Template Haskell?
I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways.
This would be great to have since it would allow different "backends" for TH. For example you could use the AST from haskell-src-exts which supports more Haskell syntax than TH does.
"I'm pretty sure the AST doesn't directly go to GHC so there's going to be a printing and then a parsing stage anyways."
Why would you think that? That isn’t the case, the TH AST is converted to GHC’s internal AST directly; it never gets converted back to text at any point in that process. (If it did, that would be pretty strange.)
Still, it would be somewhat nice if Template Haskell exposed a way to parse Haskell source to expressions, types, and declarations, basically exposing the parsers behind various e, t, and d quoters that are built in to Template Haskell. Unfortunately, it does not, and I don’t believe there are currently any plans to change that.
Currently, you need to go through haskell-src-exts instead. This is somewhat less than ideal, since there are differences between haskell-src-exts’s parser and GHC’s, but it’s as good as you’re currently going to get. To lessen the pain, there is a package called haskell-src-meta that bridges haskell-src-exts and template-haskell.
For your use case, you can use the parseDecs function from Language.Haskell.Meta.Parse, which has the type String -> Either String [Dec], which is what you’re looking for.
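A small sketch of how parseDecs might be used, assuming the haskell-src-meta and template-haskell packages are installed. The source string and the names source, parsed and rendered are hypothetical.

```haskell
import Language.Haskell.Meta.Parse (parseDecs)
import Language.Haskell.TH (Dec, pprint)

-- Hypothetical source text; any valid top-level declarations would do.
source :: String
source = unlines
  [ "double :: Int -> Int"
  , "double x = x * 2"
  ]

-- Left carries the parser's error message, Right the TH declarations.
parsed :: Either String [Dec]
parsed = parseDecs source

-- Render the declarations back to text, just for inspection.
rendered :: String
rendered = either ("parse error: " ++) pprint parsed
```

To actually splice the result, something along the lines of $(either fail pure (parseDecs source)) at the top level of another module should work; the stage restriction means source has to be imported from a different module than the one containing the splice.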

Haskell 'showInt' not in scope: why not?

I'm trying to understand Philip Wadler's "Essence of Functional Programming", and I seem to be held back by his assertion that "No knowledge of Haskell is necessary to understand this paper." Maybe not, but his examples sure require some.
Specifically, I'm trying to understand his example interpreter. When I try to compile this using GHC, or load it using :load, it complains "not in scope: showint. Perhaps you meant showInt". When I replace the token with showInt, it says "Not in scope: showInt".
I'd really like to believe Dr. Wadler when he says all I need to know is contained in his paper.
I'd really like to get it working under GHCI, so I can try various expressions under the interpreter. I'm new to Haskell, and was duly warned about the opaqueness of its error messages, but this seems designed to perplex!
The showInt function is part of the Numeric module, so you have to import Numeric to have it in scope. I guess the typo hint system knows about modules you haven't imported.
showInt also doesn't return a string directly but instead a String -> String function. I think this functionality is used for showing things composed of multiple parts more efficiently, but here it'd just be a pain and your code won't typecheck as is.
Instead, you can replace showint with just show and let the compiler figure it out. show is Haskell's toString and is overloaded for any type it's reasonable to convert to a string.
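A short sketch of the difference, using only base. The function names viaShowInt, viaShow and showPair are hypothetical.

```haskell
import Numeric (showInt)

-- showInt yields a ShowS (String -> String); apply it to "" to get a String.
-- Note that showInt raises an error for negative numbers.
viaShowInt :: Int -> String
viaShowInt n = showInt n ""

-- show works for any type with a Show instance, negative numbers included.
viaShow :: Int -> String
viaShow = show

-- ShowS values compose with (.), which is what the String -> String
-- result type is for: pieces are prepended without repeated (++).
showPair :: Int -> Int -> String
showPair a b =
  (showChar '(' . showInt a . showString ", " . showInt b . showChar ')') ""
```

For example, viaShowInt 42 and viaShow 42 both give "42", and showPair 1 2 gives "(1, 2)".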

How to build Abstract Syntax Trees from grammar specification in Haskell?

I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.
If I were to do this in Java, I would use a combination of JTB and JavaCC, which builds an AST. Visitors are then used to manipulate the tree. But, given the vast libraries for parsing in Haskell (parsec, happy, alex etc.), I'm a bit confused about choosing the appropriate library.
So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?
Well, in Haskell there are 2 main ways of parsing something: parser combinators or a parser generator. Since you already have a BNF I'd suggest the latter.
A good combination is alex (a lexer generator) together with happy (a parser generator). GHC's own lexer and parser are, IIRC, written using these, so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = JavaClass {
    className  :: Name,
    interfaces :: [Name],
    contents   :: [ClassContents]
    -- ...
  }
data ClassContents = M Method
                   | F Field
                   | IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
              | WhateverOtherForms
              | YouWillParse
Once you have this you'll have the entire AST represented as one TopLevel or a list of them, depending on how many classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb (scrap your boilerplate) that let you write very concise tree traversals and modifications. lens is also an option. At a minimum check out Data.Traversable and Data.Foldable.
To modify the tree, you can do something as simple as
ignoreInnerClass :: JavaClass -> JavaClass
ignoreInnerClass c = c{contents = filter (not . isInnerClass) $ contents c}
  -- ^^^ that is called a record update
  where isInnerClass (IC _) = True
        isInnerClass _      = False
and then you could potentially use something like syb to write
everywhere (mkT ignoreInnerClass) toplevel
which will traverse everything and apply ignoreInnerClass to all JavaClasses. This is possible to do in lens and many other libraries too, but syb is very easy to read.
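Put together, the idea might look like the following self-contained sketch. It assumes the syb package for everywhere and mkT, and it uses a deliberately cut-down, hypothetical AST (methods and fields reduced to names, inner classes represented as nested JavaClass values) rather than the full set of declarations above.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data)
import Data.Generics (everywhere, mkT)

type Name = String

-- Cut-down, hypothetical AST: just enough structure to traverse.
data JavaClass = JavaClass
  { className :: Name
  , contents  :: [ClassContents]
  } deriving (Show, Eq, Data)

data ClassContents
  = M Name          -- method, reduced to its name
  | F Name          -- field, reduced to its name
  | IC JavaClass    -- inner class
  deriving (Show, Eq, Data)

newtype TopLevel = JC JavaClass
  deriving (Show, Eq, Data)

-- Drop inner classes from a single class via a record update.
ignoreInnerClass :: JavaClass -> JavaClass
ignoreInnerClass c = c { contents = filter (not . isInnerClass) (contents c) }
  where
    isInnerClass (IC _) = True
    isInnerClass _      = False

-- Apply the rewrite to every JavaClass found anywhere in the tree.
stripInnerClasses :: [TopLevel] -> [TopLevel]
stripInnerClasses = everywhere (mkT ignoreInnerClass)
```

mkT turns the type-specific function into one that can be applied to any node and does nothing on nodes of other types, and everywhere walks the tree bottom-up, which is why no hand-written traversal of TopLevel is needed.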
I've never used bnfc-meta (suggested by @phg), but I would strongly recommend you look into BNFC (on hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it will automatically generate an AST, parser, and pretty-printer for the grammar.
How suitable BNFC is depends upon the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I did have some success hacking up context-sensitive extensions, but that code's likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.
Alex + Happy.
There are many approaches to modify/investigate the parsed terms (ASTs). The keyword to search for is "datatype-generic" programming. But beware: it is a complex topic ...
http://people.cs.uu.nl/andres/Rec/MutualRec.pdf
http://www.cs.uu.nl/wiki/GenericProgramming/Multirec
It has a generic implementation of the zipper available here:
http://hackage.haskell.org/packages/archive/zipper/0.3/doc/html/Generics-MultiRec-Zipper.html
Also check out https://github.com/pascalh/Astview
You might also check out the Haskell Compiler Series which is nice as an introduction to using alex and happy to parse a subset of Java: http://bjbell.wordpress.com/haskell-compiler-series/.
Since your grammar is expressed in BNF, it is context-free, and if it is also LALR(1), as most programming-language grammars are or can be made to be, it can be parsed efficiently with a shift-reduce parser. Such efficient parsers can be generated by the parser generator yacc/bison (C, C++), or its Haskell equivalent "Happy".
That's why I would use "Happy" in your case. It takes grammar rules in BNF form and generates a parser from it directly. The resulting parser will accept the language that is described by your grammar rules and produce an AST (abstract syntax tree). The Happy user guide is quite nice and gets you started quickly:
http://www.haskell.org/happy/doc/html/
To transform the resulting AST, generic programming is a good idea. Here is a classical explanation on how to do this in Haskell in a practical fashion, from scratch:
http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
I have used exactly this to build a compiler for a small domain specific language, and it was a simple and concise solution.

Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?

I'm new to Haskell and Parsec. After reading Chapter 16 Using Parsec of Real World Haskell, a question appeared in my mind: Why and when is Parsec better than other parser generators like Yacc/Bison/Antlr?
My understanding is that Parsec creates a nice DSL of writing parsers and Haskell makes it very easy and expressive. But parsing is such a standard/popular technology that deserves its own language, which outputs to multiple target languages. So when shall we use Parsec instead of, say, generating Haskell code from Bison/Antlr?
This question might go a little beyond technology, and into the realm of industry practice. When writing a parser from scratch, what's the benefit of picking up Haskell/Parsec compared to Bison/Antlr or something similar?
BTW: my question is quite similar to this one but wasn't answered satisfactorily there.
One of the main differences between the tools you listed is that ANTLR, Bison and their friends are parser generators, whereas Parsec is a parser combinator library.
A parser generator reads in a description of a grammar and spits out a parser. It is generally not possible to combine existing grammars into a new grammar, and it is certainly not possible to combine two existing generated parsers into a new parser.
A parser combinator OTOH does nothing but combine existing parsers into new parsers. Usually, a parser combinator library ships with a couple of trivial built-in parsers that can parse the empty string or a single character, and it ships with a set of combinators that take 1 or more parsers and return a new one that, for example, parses the sequence of the original parsers (e.g. you can combine a d parser and an o parser to form a do parser), the alternation of the original parsers (e.g. a 0 parser and a 1 parser to a 0|1 parser), or applies the original parser multiple times (repetition).
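The three kinds of combination described above can be sketched with Parsec as follows. The parser names (doParser, bit, bits, doThenBits, runDoThenBits) are hypothetical.

```haskell
import Text.Parsec (char, many1, parse, (<|>))
import Text.Parsec.String (Parser)

-- Sequence: a 'd' parser followed by an 'o' parser gives a "do" parser.
doParser :: Parser String
doParser = do
  d <- char 'd'
  o <- char 'o'
  pure [d, o]

-- Alternation: a '0' parser or a '1' parser gives a 0|1 parser.
bit :: Parser Char
bit = char '0' <|> char '1'

-- Repetition: one or more bits.
bits :: Parser String
bits = many1 bit

-- Combined parsers are ordinary values, so they combine further.
doThenBits :: Parser (String, String)
doThenBits = (,) <$> doParser <*> bits

-- Helper: run the combined parser, discarding the error message on failure.
runDoThenBits :: String -> Maybe (String, String)
runDoThenBits = either (const Nothing) Just . parse doThenBits ""
```

For instance, runDoThenBits "do0110" gives Just ("do", "0110"). Because doParser and bits are plain Haskell values, reusing them in a larger parser needs no regeneration step.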
What this means is that you could, for example, take an existing parser for Java and an existing parser for HTML and combine them into a parser for JSP.
Most parser generators don't support this, or only support it in a limited way. Parser combinators OTOH only support this and nothing else.
You might want to see this question as well as the linked one in your question.
Which Haskell parsing technology is most pleasant to use, and why?
In Haskell the competition is between Parsec (and other parser combinators) and the parser generator Happy. I'd pick Happy if I already had an LR grammar to work from: parser combinators take grammars in LL form, the translation from LR to LL takes some effort, and a combinator parser will usually be significantly slower. If I don't have a grammar I'll use Parsec; it is more flexible (powerful) than Happy, and it's more fun to work "in Haskell" than to generate code with Happy and Alex. If you use Happy for parsing you almost always need to use Alex for lexing.
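As a small illustration of the LR-to-LL translation effort, here is a sketch of one common case, a left-recursive rule, handled with Parsec's chainl1. The grammar and the names number, expr and evalExpr are hypothetical.

```haskell
import Text.Parsec (chainl1, char, digit, many1, parse)
import Text.Parsec.String (Parser)

number :: Parser Int
number = read <$> many1 digit

-- LR-style rule:   expr : expr '-' number | number
-- Transcribed literally, the parser would call itself before consuming
-- any input and never terminate:
--   expr = (do l <- expr; _ <- char '-'; r <- number; pure (l - r)) <|> number

-- LL-friendly version: chainl1 parses number ('-' number)* and folds
-- the results to the left, preserving left associativity.
expr :: Parser Int
expr = number `chainl1` ((-) <$ char '-')

-- Helper: evaluate an input string, discarding the error message on failure.
evalExpr :: String -> Maybe Int
evalExpr = either (const Nothing) Just . parse expr ""
```

With this definition evalExpr "10-3-2" gives Just 5, i.e. (10 - 3) - 2. A generator such as Happy would accept the left-recursive rule as written, which is the convenience the paragraph above is referring to.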
For industry practice, it would be odd to decide to use Haskell just to get Parsec. For parsing, most of the current crop of languages will have at least a parser generator and probably something more flexible like a port of Parsec or a PEG system.
Ira Baxter's answer to the linked question was spot-on about a parser getting you merely to the foothills of the Himalayas for writing a translator, but being part of a translator is only one of the uses for a parser, so there are still many domains where fairly minimalist systems like ANTLR, Happy and Parsec are satisfactory.
Following on from stephen's answer, I think that one of the most common alternatives to Parsec, if you want to stick with parser combinators, is attoparsec. The main difference is that attoparsec was written with more of a bias towards speed, and makes trade-offs accordingly. For example, Parsec does some book-keeping to try to return helpful error messages if a parse fails, which attoparsec doesn't do to the same extent. Also, I think that attoparsec is specialised to one input stream/token type, whereas Parsec abstracts from the input type so that it can parse streams of type String, ByteString, Text, etc. without problem.

Resources