Using Control.Applicative is very useful with Parsec, but you always need to hide <|> and similar operators, as they conflict with Parsec's own:
import Control.Applicative hiding ((<|>), many, optional)
import Text.Parsec.Combinator
import Text.Parsec
Alternatively, as Antal S-Z points out, you can hide the Parsec version. However, as far as I can tell, this seems like an unnecessary restriction.
Why did parsec not simply implement these operators from Applicative?
It's for historical reasons. The Parsec library predates the discovery of applicative functors, so it wasn't designed with them in mind. And I guess no one has taken the time to update Parsec to use Control.Applicative. There is no deep fundamental reason for not doing it.
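With the hiding import in place, a parser can be written in pure applicative style. A minimal sketch (the pair parser and its input are invented for illustration):

```haskell
import Control.Applicative hiding ((<|>), many, optional)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse a pair like "(12,34)" using only applicative combinators.
pair :: Parser (Int, Int)
pair = (,) <$> (char '(' *> number) <*> (char ',' *> number <* char ')')
  where
    number = read <$> many1 digit

main :: IO ()
main = print (parse pair "" "(12,34)")  -- Right (12,34)
```

Here <$>, <*>, *> and <* come from base, while many1 and char are Parsec's own, so nothing conflicts.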
Related
Almost every module in our code base has imports such as:
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.Text as Text
I would like to define a local prelude so that Map, Set and Text are available to the modules importing that prelude. Apparently there is no way to do that in Haskell. So I am wondering how people solve this problem in large Haskell code bases.
I'm going to answer this question, interpreted as literally as possible:
How do people solve this problem in large Haskell code bases?
Answer: they write
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.Text as Text
at the top of each module which needs Map, Set, and Text.
In my experience, managing imports is not a significant part of the difficulty of working with large codebases. The effort of jumping to the import list and adding a line for Data.Map when you discover you need it is absolutely swamped by the effort of finding the right place in the codebase to make changes, knowing the full breadth of the codebase so you don't duplicate efforts, and finding ways to test small chunks of a large application in isolation.
Compared to the proposed alternative in the other answer (CPP), this way also has some technical advantages:
Less project lead-in time. The fewer surprises there are for the humans who join onto your project, the quicker they can get up and running and be independently useful.
Better tool support. If I see Foo.bar as an identifier somewhere, I can use my text editor's regex search to find out what import line made the Foo namespace available, without extra machinery to follow #included files. If I want to find all the files that depend on Some.Fancy.Module, I can learn that by grepping for Some.Fancy.Module. Build systems that do change detection don't need to know about the extra .h file when computing which files to watch. And so forth.
Fewer spurious rebuilds. If you have more imports than you actually use, this can cause GHC to rebuild your module even when it need not be rebuilt.
One solution is to define the import list in a CPP header.
N.B.: This answer is just to show what is technically possible; Daniel Wagner's answer is generally the better alternative.
For a package-level example:
my-pkg/
my-pkg.cabal
include/imports.h
src/MyModule.hs
...
include/imports.h:
import Control.Applicative
import Data.Maybe
import Data.Char
In my-pkg.cabal, components (library, executable, test, ...) have an include-dirs field (which in turn corresponds to a GHC option):
library
...
include-dirs: include
Then you can use that header in any module:
{-# LANGUAGE CPP #-}
module MyModule where
#include "imports.h"
-- your code here
mymaybe = maybe
I want to use the <|> operator from Text.Parsec but I am also importing Control.Applicative which also contains a <|> operator. How do I make sure that the former operator shadows the latter assuming I don't want to use an ugly looking qualified import?
Options you have, in descending order of recommendation:
Switch to megaparsec, which is built up front around the existing Haskell functor / transformer classes, rather than defining that kind of stuff anew.
Hide the unnecessarily specific Parsec.<|>. You don't need it.
import Text.Parsec hiding ((<|>))
import Control.Applicative
Hide Applicative.<|> and use only the stuff from Control.Applicative that's not concerned with alternatives.
import Text.Parsec
import Control.Applicative hiding (Alternative(..))
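Option 2 above, sketched as a complete module (the bool parser is an invented example). Because ParsecT has an Alternative instance, the <|> from Control.Applicative behaves exactly like Parsec's own:

```haskell
import Control.Applicative
import Text.Parsec hiding ((<|>), many, optional)
import Text.Parsec.String (Parser)

-- Applicative's <|> dispatches to ParsecT's Alternative instance,
-- so hiding Parsec's version loses nothing.
bool :: Parser Bool
bool = True <$ string "true" <|> False <$ string "false"

main :: IO ()
main = print (parse bool "" "true")  -- Right True
```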
The proper way to do this would be either a qualified import or an import with hiding:
import qualified Control.Applicative as CA
import Text.Parsec
or if you don't want to use qualified
import Control.Applicative hiding ((<|>))
import Text.Parsec
Shadowing would not work because both imports live in the same namespace, and therefore GHC cannot infer the correct choice of function, i.e. the one you meant.
To my knowledge, shadowing only works in function blocks, which introduce a new scope.
Moreover, reasoning about import order has its perils in a lazily evaluated language; even though I am no expert, I would guess that whether and where a library's definitions are used could affect the order of imports.
Update:
If there were name shadowing at the level of imports, then the order of imports would matter. That is something you are not used to in a lazily evaluated language, where the order of evaluation of pure functions is usually arbitrary (even though imports are resolved at compile time, as @TikhonJelvis pointed out).
As Nikita Volkov mentioned in his question Data.Text vs String, I also wondered why I have to deal with the different string implementations, type String = [Char] and Data.Text, in Haskell. In my code I use the pack and unpack functions really often.
My question: Is there a way to have an automatic conversion between both string types so that I can avoid writing pack and unpack so often?
In other programming languages like Python or JavaScript there is, for example, an automatic conversion between integers and floats when it is needed. Can I achieve something like this in Haskell as well? I know that the mentioned languages are weakly typed, but I heard that C++ has a similar feature.
Note: I already know the language extension {-# LANGUAGE OverloadedStrings #-}. But as I understand it, this language extension just applies to string literals written as "...". I want to have an automatic conversion for strings that I get from other functions or that I have as arguments in function definitions.
Extended question: The question Haskell: Text or Bytestring also covers the difference between Data.Text and Data.ByteString. Is there a way to have automatic conversion among the three string types String, Data.Text and Data.ByteString?
No.
Haskell doesn't have implicit coercions for technical, philosophical, and almost religious reasons.
As noted in the comments, converting between these representations isn't free, and most people don't like the idea of hidden and potentially expensive computations lurking around. Additionally, since String is a lazy list, coercing one to a Text value might not terminate.
We can convert literals to Text automatically with OverloadedStrings, which desugars a string literal "foo" to fromString "foo"; fromString for Text just calls pack.
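The desugaring described above only helps for literals; values coming from elsewhere still need explicit pack and unpack at the boundaries. A small sketch (the greet function is an invented example):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- The literal "Hello, " becomes Text via fromString (i.e. pack);
-- the non-literal argument must already be Text.
greet :: T.Text -> T.Text
greet name = "Hello, " <> name

main :: IO ()
main = putStrLn (T.unpack (greet (T.pack "world")))  -- Hello, world
```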
The better question might be to ask why you're coercing so much. Why do you need to unpack Text values so often? If you're constantly changing them to strings, it defeats the purpose a bit.
Almost Yes: Data.String.Conversions
Haskell libraries make use of different types, so there are many situations in which there is no choice but to heavily use conversion, distasteful as it is - rewriting libraries doesn't count as a real choice.
I see two concrete problems, each of which is potentially a significant obstacle to Haskell adoption:
coding ends up requiring implementation-specific knowledge of the libraries you want to use, which is a big issue for a high-level language;
performance on simple tasks is bad, which is a big issue for a generalist language.
Abstracting from the specific types
In my experience, the first problem is the time spent guessing the package name holding the right function for plumbing between libraries that basically operate on the same data.
To that problem there is a really handy solution: the Data.String.Conversions package, provided you are comfortable with UTF-8 as your default encoding.
This package provides a single cs conversion function between a number of different types.
String
Data.ByteString.ByteString
Data.ByteString.Lazy.ByteString
Data.Text.Text
Data.Text.Lazy.Text
So you just import Data.String.Conversions, and use cs which will infer the right version of the conversion function according to input and output types.
Example:
import Data.Aeson (decode)
import Data.Text (Text)
import Data.ByteString.Lazy (ByteString)
import Data.String.Conversions (cs)
decodeTextStoredJson' :: Text -> Maybe MyStructure
decodeTextStoredJson' x = decode (cs x)
NB: In GHCi you generally do not have a context that determines the target type, so you direct the conversion by explicitly stating the type of the result, as with read:
let z = cs x :: ByteString
Performance and the cry for a "true" solution
I am not aware of any true solution as of yet, but we can already guess the direction:
it is legitimate to require conversion, because the data does not change;
best performance is achieved by not converting data from one type to another for administrative purposes;
coercion is evil - coercive, even.
So the direction must be to make these types not different, i.e. to reconcile them under (or over) an archetype from which they would all derive, allowing composition of functions using different derivations without the need to convert.
Note: I absolutely cannot evaluate the feasibility / potential drawbacks of this idea. There may be some very sound showstoppers.
I have been using Haskell for quite a while. The more I use it, the more I fall in love with the language. I simply cannot believe I have spent almost 15 years of my life using other languages.
However, I am slowly but steadily growing fed up with Haskell's standard libraries. My main pet peeve is the "not polymorphic enough" definitions (Prelude.map, Control.Monad.forM_, etc.). I have a lot of Haskell source code files whose first lines look like
{-# LANGUAGE NoMonomorphismRestriction #-}
module Whatever where
import Control.Monad.Error hiding (forM_, mapM_)
import Control.Monad.State hiding (forM_, mapM_)
import Data.Foldable (forM_, mapM_)
{- ... -}
In order to avoid constantly hoogling which definitions I should hide, I would like to have a single or a small amount of source code files that wrap this import boilerplate into manageable units.
So...
Has anyone else tried doing this before?
If the answer to the previous question is "Yes", have they posted the resulting boilerplate-wrapping source code files?
It is not as clear cut as you imagine it to be. I will list all the disadvantages I can think of off the top of my head:
First, there is no limit to how general these functions can get. For example, right now I am writing a library for indexed types that subsumes ordinary types. Every function you mentioned has a more general indexed equivalent. Do I expect everybody to switch to my library for everything? No.
Here's another example. The mapM function defines a higher order functor that satisfies the functor laws in the Kleisli category:
mapM return = return
mapM (f >=> g) = mapM f >=> mapM g
So I could argue that your traversable generalization is the wrong one and instead we should generalize it as just being an instance of a higher order functor class.
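The two laws above can be spot-checked on concrete values in the Maybe monad. This is an illustrative check with invented functions f and g, not a proof:

```haskell
import Control.Monad ((>=>))

-- Two arbitrary Kleisli arrows in the Maybe monad, chosen for the check.
f, g :: Int -> Maybe Int
f x = if x > 0 then Just (x * 2) else Nothing
g x = Just (x + 1)

main :: IO ()
main = do
  let xs = [1, 2, 3]
  -- Identity law: mapM return = return
  print (mapM return xs == (return xs :: Maybe [Int]))
  -- Composition law: mapM (f >=> g) = mapM f >=> mapM g
  print (mapM (f >=> g) xs == (mapM f >=> mapM g) xs)
```

Both checks print True on this input.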
Also, check out the category-extras package for some examples of these higher order classes and functions which subsume all your examples.
There is also the issue of performance. Many of these more specialized functions have really finely tuned implementations that dramatically help performance. Sometimes classes expose ways to admit more performant versions, but sometimes they don't.
There is also the issue of typeclass overload. I actually prefer to minimize use of typeclasses unless they have sound laws derived from theory rather than convenience. Also, typeclasses generally play poorly with the monomorphism restriction and I enjoy writing functions without signatures for my application code.
There is also the issue of taste. A lot of people simply don't agree what is the best Haskell style. We still can't even agree on the Prelude. Speaking of which, there have been many attempts to write new Preludes, but nobody can agree on what is best so we all default back to the Haskell98 one anyway.
However, I think the overall spirit of improving things is good and the worst enemy of progress is satisfaction, but don't assume there will be a clear-cut right way to do everything.
Text
Text.Parsec
Text.Parsec.ByteString
Text.Parsec.ByteString.Lazy
Text.Parsec.Char
Text.Parsec.Combinator
Text.Parsec.Error
Text.Parsec.Expr
Text.Parsec.Language
Text.Parsec.Perm
Text.Parsec.Pos
Text.Parsec.Prim
Text.Parsec.String
Text.Parsec.Token
ParserCombinators
Text.ParserCombinators.Parsec
Text.ParserCombinators.Parsec.Char
Text.ParserCombinators.Parsec.Combinator
Text.ParserCombinators.Parsec.Error
Text.ParserCombinators.Parsec.Expr
Text.ParserCombinators.Parsec.Language
Text.ParserCombinators.Parsec.Perm
Text.ParserCombinators.Parsec.Pos
Text.ParserCombinators.Parsec.Prim
Text.ParserCombinators.Parsec.Token
Are they the same?
At the moment there are two widely used major versions of Parsec: Parsec 2 and
Parsec 3.
My advice is simply to use the latest release of Parsec 3. But if you want to
make a conscientious choice, read on.
New in Parsec 3
Monad Transformer
Parsec 3 introduces a monad transformer, ParsecT, which can be used to combine
parsing with other monadic effects.
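A small sketch of what ParsecT enables, with invented names: a parser layered over State that counts the characters it consumes in the underlying monad.

```haskell
import Control.Monad.State (State, modify, runState)
import Control.Monad.Trans.Class (lift)
import Text.Parsec

-- Count every letter consumed in an underlying State Int monad.
countedLetters :: ParsecT String () (State Int) String
countedLetters = many (letter <* lift (modify (+ 1)))

main :: IO ()
main = do
  -- runParserT returns the parse result inside the inner monad,
  -- so runState peels off the State layer afterwards.
  let (result, count) = runState (runParserT countedLetters () "" "abc") 0
  print result  -- Right "abc"
  print count   -- 3
```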
Streams
Although Parsec 2 lets you choose the token type (which is useful when you want to separate lexical analysis from parsing), the tokens are always arranged into lists. Lists may not be the most efficient data structure in which to store large texts.
Parsec 3 can work with arbitrary streams -- data structures with a list-like
interface. You can define your own streams, but Parsec 3 also includes a popular
and efficient Stream implementation based on ByteString (for Char-based
parsing), exposed through the modules Text.Parsec.ByteString and
Text.Parsec.ByteString.Lazy.
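Because the Stream instance for strict ByteString ships with Text.Parsec.ByteString, ordinary combinators work on ByteString input unchanged. A minimal sketch:

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Parsec
import Text.Parsec.ByteString (Parser)  -- Parser = Parsec ByteString ()

-- The same combinators as for String parsing, now over a ByteString stream.
digits :: Parser String
digits = many1 digit

main :: IO ()
main = print (parse digits "" (B.pack "123"))  -- Right "123"
```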
Reasons to prefer Parsec 2
Fewer extensions required
Advanced features provided by Parsec 3 do not come for free; to implement them
several language extensions are required.
Neither of the two versions is Haskell-2010 (i.e. both use extensions), but
Parsec 2 uses fewer extensions than Parsec 3, so chances that any given compiler
can compile Parsec 2 are higher than those for Parsec 3.
At this point both versions work with GHC, while Parsec 2 is also reported to
build with JHC and is included as one of JHC's standard libraries.
Performance
Originally (i.e. as of version 3.0) Parsec 3 was considerably slower than
Parsec 2. However, work has been done on improving Parsec 3's performance,
and as of version 3.1 Parsec 3 is only slightly slower than Parsec 2
(benchmarks: 1, 2).
Compatibility layer
It has been possible to "reimplement" all of the Parsec 2 API in Parsec 3. This
compatibility layer is provided by the Parsec 3 package under the module hierarchy
Text.ParserCombinators.Parsec (the same hierarchy which is used by Parsec 2),
while the new Parsec 3 API is available under the Text.Parsec hierarchy.
This means that you can use Parsec 3 as a drop-in replacement for Parsec 2.
I believe the latter is a backwards-compatible layer for Parsec 2, implemented in terms of the newer API.