Text
Text.Parsec
Text.Parsec.ByteString
Text.Parsec.ByteString.Lazy
Text.Parsec.Char
Text.Parsec.Combinator
Text.Parsec.Error
Text.Parsec.Expr
Text.Parsec.Language
Text.Parsec.Perm
Text.Parsec.Pos
Text.Parsec.Prim
Text.Parsec.String
Text.Parsec.Token
ParserCombinators
Text.ParserCombinators.Parsec
Text.ParserCombinators.Parsec.Char
Text.ParserCombinators.Parsec.Combinator
Text.ParserCombinators.Parsec.Error
Text.ParserCombinators.Parsec.Expr
Text.ParserCombinators.Parsec.Language
Text.ParserCombinators.Parsec.Perm
Text.ParserCombinators.Parsec.Pos
Text.ParserCombinators.Parsec.Prim
Text.ParserCombinators.Parsec.Token
Are they the same?
At the moment there are two widely used major versions of Parsec: Parsec 2 and
Parsec 3.
My advice is simply to use the latest release of Parsec 3. But if you want to
make a conscientious choice, read on.
New in Parsec 3
Monad Transformer
Parsec 3 introduces a monad transformer, ParsecT, which can be used to combine
parsing with other monadic effects.
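For example, here is a minimal sketch (the CountingParser and word names are just illustrative) that layers a parser over the State monad and uses lift to count the words it parses:
import Text.Parsec
import Control.Monad.Trans.Class (lift)
import Control.Monad.State (State, modify, runState)

-- Illustrative sketch: a parser whose underlying monad is State Int,
-- where the state counts how many words have been parsed so far.
type CountingParser = ParsecT String () (State Int)

word :: CountingParser String
word = do
  w <- many1 letter
  lift (modify (+ 1))  -- run an effect in the underlying monad
  return w

wordsP :: CountingParser [String]
wordsP = word `sepBy` char ' '

main :: IO ()
main = print (runState (runParserT wordsP () "<input>" "hello parsec world") 0)
-- (Right ["hello","parsec","world"],3)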
Streams
Although Parsec 2 lets you choose the token type (which is useful when you
want to separate lexical analysis from parsing), the tokens are always
arranged into lists. Lists may not be the most efficient data structure in which
to store large texts.
Parsec 3 can work with arbitrary streams -- data structures with a list-like
interface. You can define your own streams, but Parsec 3 also includes a popular
and efficient Stream implementation based on ByteString (for Char-based
parsing), exposed through the modules Text.Parsec.ByteString and
Text.Parsec.ByteString.Lazy.
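For instance, a minimal sketch of Char-based parsing over a strict ByteString (the digits parser is just an illustrative name):
import Text.Parsec
import Text.Parsec.ByteString (Parser)  -- Parser specialised to strict ByteString
import qualified Data.ByteString.Char8 as B

-- Illustrative sketch: an ordinary Char parser, but the input stream is a ByteString.
digits :: Parser String
digits = many1 digit

main :: IO ()
main = print (parse digits "<bytes>" (B.pack "12345"))
-- Right "12345"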
Reasons to prefer Parsec 2
Fewer extensions required
Advanced features provided by Parsec 3 do not come for free; to implement them
several language extensions are required.
Neither version is Haskell 2010 (i.e. both use extensions), but Parsec 2 uses
fewer extensions than Parsec 3, so the chances that any given compiler can
compile Parsec 2 are higher than for Parsec 3.
At the time of writing, both versions work with GHC, while Parsec 2 is also
reported to build with JHC and is included as one of JHC's standard libraries.
Performance
Originally (i.e. as of version 3.0) Parsec 3 was considerably slower than
Parsec 2. However, work has been done on improving Parsec 3's performance,
and as of version 3.1 Parsec 3 is only slightly slower than Parsec 2
(benchmarks: 1, 2).
Compatibility layer
It has been possible to "reimplement" all of the Parsec 2 API in Parsec 3. This
compatibility layer is provided by the Parsec 3 package under the module hierarchy
Text.ParserCombinators.Parsec (the same hierarchy which is used by Parsec 2),
while the new Parsec 3 API is available under the Text.Parsec hierarchy.
This means that you can use Parsec 3 as a drop-in replacement for Parsec 2.
I believe the Text.ParserCombinators.Parsec hierarchy is a backwards-compatible layer for Parsec 2, implemented in terms of the newer API.
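As a rough illustration of the compatibility layer (the greeting parser below is just an illustrative name), code written against the Parsec 2 module names builds unchanged with the parsec 3 package:
-- Parsec 2 style import; with parsec 3 installed this resolves to the
-- compatibility module and the code compiles unchanged.
import Text.ParserCombinators.Parsec

greeting :: Parser String
greeting = many1 letter

main :: IO ()
main = print (parse greeting "<test>" "hello")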
Related
As Nikita Volkov mentioned in his question Data.Text vs String, I also wondered why I have to deal with the different string types type String = [Char] and Data.Text in Haskell. In my code I use the pack and unpack functions really often.
My question: Is there a way to have an automatic conversion between both string types so that I can avoid writing pack and unpack so often?
In other programming languages like Python or JavaScript there is, for example, an automatic conversion between integers and floats when it is needed. Can I achieve something like this in Haskell as well? I know that the mentioned languages are weakly typed, but I have heard that C++ has a similar feature.
Note: I already know about the language extension {-# LANGUAGE OverloadedStrings #-}. But as I understand it, this language extension only applies to string literals written as "...". I want an automatic conversion for strings that I get from other functions or take as arguments in function definitions.
Extended question: the question Haskell: Text or Bytestring also covers the difference between Data.Text and Data.ByteString. Is there a way to have an automatic conversion between the three string types String, Data.Text and Data.ByteString?
No.
Haskell doesn't have implicit coercions for technical, philosophical, and almost religious reasons.
As an aside, converting between these representations isn't free, and most people don't like the idea of hidden and potentially expensive computations lurking around. Additionally, since String is a lazy list, coercing one to a Text value might not terminate.
We can convert literals to Text automatically with OverloadedStrings, which desugars a string literal "foo" to fromString "foo"; fromString for Text just calls pack.
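Sketched out, the two definitions below are equivalent under OverloadedStrings (greeting is just an illustrative name):
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Data.String (fromString)

-- The literal is desugared to fromString "foo"; Text's IsString
-- instance implements fromString as pack.
greeting :: Text
greeting = "foo"

greeting' :: Text
greeting' = fromString "foo"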
The better question might be to ask why you're coercing so much. Is there some reason why you need to unpack Text values so often? If you're constantly changing them to Strings, it defeats the purpose a bit.
Almost Yes: Data.String.Conversions
Haskell libraries make use of different types, so there are many situations in which there is no choice but to use conversion heavily, distasteful as it is - rewriting the libraries doesn't count as a real choice.
I see two concrete problems, either of which is potentially a significant problem for Haskell adoption:
coding ends up requiring specific implementation knowledge of the libraries you want to use, which is a big issue for a high-level language;
performance on simple tasks is bad, which is a big issue for a generalist language.
Abstracting from the specific types
In my experience, the first problem is the time spent guessing the package name holding the right function for plumbing between libraries that basically operate on the same data.
For that problem there is a really handy solution: the Data.String.Conversions package, provided you are comfortable with UTF-8 as your default encoding.
This package provides a single conversion function, cs, between a number of different types.
String
Data.ByteString.ByteString
Data.ByteString.Lazy.ByteString
Data.Text.Text
Data.Text.Lazy.Text
So you just import Data.String.Conversions and use cs, which will infer the right version of the conversion function according to the input and output types.
Example:
import Data.Aeson (decode)
import Data.Text (Text)
import Data.ByteString.Lazy (ByteString)
import Data.String.Conversions (cs)

-- cs converts the Text input to the lazy ByteString that decode expects;
-- MyStructure is assumed to have a FromJSON instance.
decodeTextStoredJson' :: Text -> Maybe MyStructure
decodeTextStoredJson' x = decode (cs x :: ByteString)
NB: In GHCi you generally do not have a context that fixes the target type, so you direct the conversion by explicitly stating the type of the result, as with read:
let z = cs x :: ByteString
Performance and the cry for a "true" solution
I am not aware of any true solution as of yet - but we can already guess the direction:
it is legitimate to require conversion, because the data does not change;
best performance is achieved by not converting data from one type to another for administrative purposes;
coercion is evil - coercive, even.
So the direction must be to make these types not different, i.e. to reconcile them under (or over) an archetype from which they would all derive, allowing composition of functions using different derivations without the need to convert.
Note: I absolutely cannot evaluate the feasibility / potential drawbacks of this idea. There may be some very sound show-stoppers.
Using Control.Applicative is very useful with Parsec, but you always need to hide <|> and similar operators, as they conflict with Parsec's own:
import Control.Applicative hiding ((<|>), many, optional)
import Text.Parsec.Combinator
import Text.Parsec
Alternatively, as Antal S-Z points out, you can hide the Parsec version. However, as far as I can tell, this seems like an unnecessary restriction.
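With the import above in place, a small sketch of Applicative-style parsing looks like this (the pair parser is just an illustrative name):
import Control.Applicative hiding ((<|>), many, optional)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Applicative style: build the result with <$> and <*> instead of do-notation.
pair :: Parser (Char, Char)
pair = (,) <$> letter <*> digit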
Why did parsec not simply implement these operators from Applicative?
It's for historical reasons. The Parsec library predates the discovery of applicative functors, so it wasn't designed with them in mind. And I guess no one has taken the time to update Parsec to use Control.Applicative. There is no deep fundamental reason for not doing it.
I'm writing an anti-quoter in Haskell and I need a Parsec combinator that parses a valid Haskell variable identifier.
Is there one already implemented in the quasiquoting libraries or do I need to write my own?
I'm hoping I don't need to copy/paste the ident implementation found in http://www.haskell.org/haskellwiki/Quasiquotation.
It's unlikely that anything in the implementation of Template Haskell itself contains a Parsec parser, because GHC does not use Parsec for parsing - note that parsec is not in the list of packages tied to GHC in various ways.
However, the module Text.Parsec.Token gives a means of describing full token parsers for languages, and the Text.Parsec.Language module includes some predefined token parsers, including one for Haskell tokens.
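For example, a sketch of a parser for Haskell-style identifiers built from the predefined Haskell language definition (lexer and haskellIdentifier are just illustrative names):
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (haskellDef)

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser haskellDef

-- Parses a valid Haskell identifier and rejects reserved words.
haskellIdentifier :: Parser String
haskellIdentifier = Tok.identifier lexer

main :: IO ()
main = print (parse haskellIdentifier "<test>" "fooBar'")
-- Right "fooBar'"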
Beyond that, you could also look at the haskell-src-exts package, which is a parser for Haskell source files.
I'm new to Haskell and Parsec. After reading Chapter 16 Using Parsec of Real World Haskell, a question appeared in my mind: Why and when is Parsec better than other parser generators like Yacc/Bison/Antlr?
My understanding is that Parsec creates a nice DSL for writing parsers, and Haskell makes it very easy and expressive. But parsing is such a standard/popular technology that it deserves its own language, one which outputs to multiple target languages. So when should we use Parsec instead of, say, generating Haskell code from Bison/Antlr?
This question might go a little beyond technology, and into the realm of industry practice. When writing a parser from scratch, what's the benefit of picking up Haskell/Parsec compared to Bison/Antlr or something similar?
BTW: my question is quite similar to this one but wasn't answered satisfactorily there.
One of the main differences between the tools you listed, is that ANTLR, Bison and their friends are parser generators, whereas Parsec is a parser combinator library.
A parser generator reads in a description of a grammar and spits out a parser. It is generally not possible to combine existing grammars into a new grammar, and it is certainly not possible to combine two existing generated parsers into a new parser.
A parser combinator, OTOH, does nothing but combine existing parsers into new parsers. Usually, a parser combinator library ships with a couple of trivial built-in parsers that can parse the empty string or a single character, and with a set of combinators that take one or more parsers and return a new one that, for example, parses the sequence of the original parsers (e.g. you can combine a d parser and an o parser to form a do parser), the alternation of the original parsers (e.g. a 0 parser and a 1 parser to form a 0|1 parser), or applies the original parser multiple times (repetition).
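A minimal sketch of this in Parsec (the parser names are just illustrative):
import Text.Parsec
import Text.Parsec.String (Parser)

-- Sequencing: combine a 'd' parser and an 'o' parser into a "do" parser.
doKeyword :: Parser String
doKeyword = do
  d <- char 'd'
  o <- char 'o'
  return [d, o]

-- Alternation: a '0' parser and a '1' parser combine into a 0|1 parser.
bit :: Parser Char
bit = char '0' <|> char '1'

-- Repetition: apply the bit parser one or more times.
bits :: Parser String
bits = many1 bit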
What this means is that you could, for example, take an existing parser for Java and an existing parser for HTML and combine them into a parser for JSP.
Most parser generators don't support this, or only support it in a limited way. Parser combinators OTOH only support this and nothing else.
You might want to see this question as well as the linked one in your question.
Which Haskell parsing technology is most pleasant to use, and why?
In Haskell the competition is between Parsec (and other parser combinator libraries) and the parser generator Happy. I'd pick Happy if I already had an LR grammar to work from - parser combinators take grammars in LL form, the translation from LR to LL takes some effort, and a combinator parser will usually be significantly slower. If I don't have a grammar I'll use Parsec; it is more flexible (powerful) than Happy, and it's more fun to work "in Haskell" than to generate code with Happy and Alex. If you use Happy for parsing you almost always need to use Alex for lexing.
For industry practice, it would be odd to decide to use Haskell just to get Parsec. For parsing, most of the current crop of languages will have at least a parser generator and probably something more flexible like a port of Parsec or a PEG system.
Ira Baxter's answer to the linked question was spot-on about a parser getting you merely to the foothills of the Himalayas when writing a translator, but being part of a translator is only one of the uses for a parser, so there are still many domains where fairly minimalist systems like ANTLR, Happy and Parsec are satisfactory.
Following on from stephen's answer, I think that one of the most common alternatives to Parsec, if you want to stick with parser combinators, is attoparsec. The main difference is that attoparsec was written with more of a bias towards speed, and makes trade-offs accordingly. For example, Parsec does some book-keeping to try to return helpful error messages if a parse fails, which attoparsec doesn't do to the same extent. Also, I think that attoparsec is specialised to one input stream/token type, whereas Parsec abstracts from the input type so that it can parse streams of type String, ByteString, Text, etc. without problem.
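As a rough sketch of the attoparsec style (the pair parser is just an illustrative name), parsing is specialised to ByteString and driven with parseOnly:
import Data.Attoparsec.ByteString.Char8 (Parser, parseOnly, decimal, char)
import qualified Data.ByteString.Char8 as B

-- Illustrative sketch: parse two decimal numbers separated by a comma, e.g. "12,34".
pair :: Parser (Int, Int)
pair = do
  a <- decimal
  _ <- char ','
  b <- decimal
  return (a, b)

main :: IO ()
main = print (parseOnly pair (B.pack "12,34"))
-- Right (12,34)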