Implementing Read typeclass where parsing strings includes "$" - haskell

I've been playing with Haskell for about a month. For my first "real" Haskell project I'm writing a parts-of-speech tagger. As part of this project I have a type called Tag that represents a parts-of-speech tag, implemented as follows:
data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS ...
The above is a long list of standardized parts-of-speech tags which I've intentionally truncated. However, in this standard set of tags there are two that end in a dollar sign ($): PRP$ and NNP$. Because I can't have data constructors with $ in their name, I've elected to rename them PRPS and NNPS.
This is all well and good, but I'd like to read tags from strings in a lexicon and convert them to my Tag type. Trying this fails:
instance Read Tag where
readsPrec _ input =
(\inp -> [((NNPS), rest) | ("NNP$", rest) <- lex inp]) input
The Haskell lexer chokes on the $. Any ideas how to pull this off?
Implementing Show was fairly straightforward. It would be great if there were some similar strategy for Read.
instance Show Tag where
showsPrec _ NNPS = showString "NNP$"
showsPrec _ PRPS = showString "PRP$"
showsPrec _ tag = shows tag

You're abusing Read here.
Show and Read are meant to print and parse valid Haskell values, to enable debugging, etc. This doesn't always work perfectly (e.g. if you import Data.Map qualified and then call show on a Map value, the call to fromList isn't qualified) but it's a valid starting point.
If you want to print or parse your values to match some specific format, then use a pretty-printing library for the former and an actual parsing library (e.g. uu-parsinglib, polyparse, parsec, etc.) for the latter. They typically have much nicer support for parsing than ReadS (though ReadP in GHC isn't too bad).
You may argue that this isn't necessary because it's just a quick'n'dirty hack, but quick'n'dirty hacks have a tendency to linger. Do yourself a favour and do it right the first time: it means there's less to rewrite when you want to do it "properly" later on.
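Here is a minimal sketch of the parsing-library route, using ReadP from base so no extra packages are needed. The four-constructor Tag, tagTable and parseTag are made up for illustration; the real tag set is much longer, and a dedicated library such as parsec would give better error messages.

```haskell
import Text.ParserCombinators.ReadP (ReadP, choice, eof, readP_to_S, string)

-- Hypothetical, heavily truncated tag set for illustration only.
data Tag = NNP | NNPS | PRP | PRPS
  deriving (Show, Eq)

-- Lexicon spelling of each tag, including the ones ending in '$'.
tagTable :: [(String, Tag)]
tagTable =
  [ ("NNP$", NNPS)
  , ("PRP$", PRPS)
  , ("NNP", NNP)
  , ("PRP", PRP)
  ]

-- One literal-string parser per table entry; choice tries all of them.
tagP :: ReadP Tag
tagP = choice [tag <$ string spelling | (spelling, tag) <- tagTable]

-- Accept the input only if exactly one tag matches the whole string.
parseTag :: String -> Maybe Tag
parseTag input =
  case readP_to_S (tagP <* eof) input of
    [(tag, _)] -> Just tag
    _          -> Nothing
```

With this, parseTag "NNP$" gives Just NNPS and an unknown spelling gives Nothing, while the derived Show instance stays untouched for debugging.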

Don't use the Haskell lexer then. In GHC the read functions are built on the ReadP and ReadPrec parser combinators, which work much like Parsec; the Real World Haskell book has an excellent introduction to Parsec.
Here's some code that seems to work,
import Text.Read
import Text.ParserCombinators.ReadP hiding (choice)
import Text.ParserCombinators.ReadPrec hiding (choice)
data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS deriving (Show)
strValMap = map (\(x, y) -> lift $ string x >> return y)
instance Read Tag where
readPrec = choice $ strValMap [
("CC", CC),
("CD", CD),
("JJ$", JJS)
]
Just run it with:
(read "JJ$") :: Tag
The code is pretty self-explanatory. The parser string x matches the literal x, and if it succeeds, y is returned. We use choice to select among all of these. It will backtrack appropriately, so if you add a CCC constructor, then CC partially matching "CCC" will fail later, and it will backtrack to CCC. Of course, if you don't need this, you can use the left-biased <++ combinator instead, which commits to the first alternative that succeeds.
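For what it's worth, the lex-based attempt in the question cannot work because lex follows Haskell's own lexical rules, so it never hands back NNP$ as a single token. A quick check (lexedTag is just an illustrative name):

```haskell
-- lex splits its input according to Haskell's lexical syntax:
-- NNP is an identifier and $ is an operator symbol, so they are
-- returned as separate tokens.
lexedTag :: [(String, String)]
lexedTag = lex "NNP$"

-- The first token is "NNP" with "$" left over, which is why the
-- pattern ("NNP$", rest) <- lex inp can never match.
```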

Related

How to combine Megaparsec with Text.Read (derived Read instance)

I want to use the derived instances of Read in the megaparsec module.
How can I use 'Text.Read.read' or 'Text.Read.readEither' in a 'Parser a' ?
It needs not to be fast, but easy to maintain and to extend.
The megaparsec module is for testing my application via CLI, so many different datatypes must be parsed.
It shall work in the following way:
import Text.Megaparsec
readableDatatype :: Read a => Parser a
readableDatatype =
-- This is wrong, but describes how it shall work
-- liftA read chunkToTokens
expr' :: Parser UserControlExpr
expr' = timeExpr
<|> timeEventExpr
<|> digiInExpr
<|> quitExpr
digiInExpr :: Parser UserControlExpr
digiInExpr = do
cmdword "digiIn"
inElement <- (readableDatatype :: Parser TI_I)
return $ UserDigiIn inElement
What do I have to write so that the three functions typecheck, especially readableDatatype?
You can use getInput :: MonadParsec e s m => m s and setInput :: MonadParsec e s m => s -> m () together with reads :: Read a => String -> [(a, String)] for that. getInput and setInput just get and set the input stream the parser is working on and reads takes a string and returns a list of possible parses together with the remaining unconsumed portions of the input. We also need to tell the parser the new offset in the input, otherwise error locations are wrong. We can do that using getOffset and setOffset.
-- For equality constraint (~)
{-# LANGUAGE TypeFamilies #-}
import Text.Megaparsec
import Text.Read (reads)
readableDatatype :: (Read a, MonadParsec e s m, s ~ String) => m a
readableDatatype = do
input <- getInput
offset <- getOffset
choice $
(\(a, input') -> a <$ setInput input'
<* setOffset (offset + length input - length input'))
<$> reads input
If your input is something other than String you will have to convert between that and String after getInput and before setInput.
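To make the role of reads concrete, here is a small base-only sketch that does not touch megaparsec at all. Direction, parsePrefix and consumed are hypothetical names for illustration; the point is only that reads returns the parsed value together with the unconsumed rest, which is what the offset arithmetic above relies on.

```haskell
-- Hypothetical datatype with a derived Read instance.
data Direction = DirA | DirB
  deriving (Show, Read, Eq)

-- Parse a value from the front of the input and return the rest.
-- Ambiguous or failed parses are both reported as Nothing.
parsePrefix :: Read a => String -> Maybe (a, String)
parsePrefix input =
  case reads input of
    [(value, rest)] -> Just (value, rest)
    _               -> Nothing

-- Number of characters the Read parser consumed, computed the same
-- way as the offset update in readableDatatype.
consumed :: String -> String -> Int
consumed input rest = length input - length rest
```

For example, parsePrefix "DirA Deactivated" :: Maybe (Direction, String) yields the value DirA and leaves " Deactivated" (including the leading space) for the surrounding parser.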
The following is about performance concerns, so not really relevant to your problem, but maybe it is educational and it may be useful to others who may need a solution with good performance.
Converting the whole input between String and some other type all the time during parsing is a rather big performance bottleneck for larger input. Furthermore using length to calculate the new offset here is not very performant either.
To solve both of these problems, we need some way to know how much of the input was actually consumed by the Read-parser, so that we can just drop that part from the original input instead of having to convert the whole unconsumed part back to the original input type. But the Read class does not have that. One could try to parse incrementally longer prefixes of the input, which may be faster in cases where the parses done using Read are short compared to the length of the entire input. You could also use unsafePerformIO to write to an IORef how much of the input was actually forced by the Read-parser, which would be the fastest but not so pretty solution.
I implemented the latter here. Feel free to use it, but be aware that it is not very well tested. It does however solve all the problems with the above approach.
That did it. Thank you! In the meantime I made a "conservative" solution to the problem by defining the constructors as strings and parsing them without using read. That has the advantage that you get megaparsec's impressive error messages, which tell you what symbols are missing.
Example with read:
1:8:
|
1 | digiIn TI_I_Signal1 DirA Dectivated
| ^
unknown parse error
(only an 'a' was missing in "Deactivated")
Example with a hand-written parser for the datatype:
1:19:
|
1 | digiIn TI_I_Signal1 Dectivated
| ^^^^^^^^
unexpected "Dectivat"
expecting "active", "inactive", '0', or '1'
I think I will use your code block in future datatypes.
Thank you very much!

Nearest equivalent to Prolog atom or Lisp symbol in Haskell

I'm trying to write a simple program for manipulating expressions in propositional calculus and would like a nice syntax for proposition variables (e.g. 'P or something).
Strings get the job done, but the syntax is misleading in this context and they permit inappropriate operations like ++.
Syntactically, I'd like to be able to write down something that does not "look quoted" visually (something like 'P is okay, though). In terms of the supported operations, I'd like to be able to determine whether two symbols are equal and to convert them into a string matching their name via show. I'd also like these things to be open (ADTs with only nullary constructors are similar in principle to symbols, but require all variants to be declared in advance).
Here's a toy example using strings where something symbol-like would be more appropriate.
type Var = String
data Proposition =
Primitive Var |
Negated Proposition |
Implication Proposition Proposition
instance Show Proposition where
show (Primitive p) = p
show (Negated n) = "!" ++ show n
show (Implication ant cons) =
"(" ++ show ant ++ "->" ++ show cons ++ ")"
main = putStrLn $ show $ Implication (Primitive "A") (Primitive "B")
Typically the way this is done in Haskell is by parameterizing over the type of symbols. So your example would become:
data Proposition a =
Primitive a |
Negated (Proposition a) |
Implication (Proposition a) (Proposition a)
which then leaves it up to the user to decide the best representation for their symbols. This has advantages over LISP-like symbols: symbols intended for different purposes will not be mixed up, and data structures involving symbols now admit transformations over all the symbols, which are more useful than you might realize. For example, Functor changes between symbol representations, and Monad models substitution.
(=<<) :: (a -> Proposition b) -> Proposition a -> Proposition b
^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^
substitute each free var with an expression in this expression
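To spell out the "Monad models substitution" remark, here is a sketch of the instances for the parameterized type above. The substitute helper is a hypothetical name added for illustration; everything uses only base.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad (ap)

data Proposition a
  = Primitive a
  | Negated (Proposition a)
  | Implication (Proposition a) (Proposition a)
  deriving (Show, Eq, Functor)

instance Applicative Proposition where
  pure  = Primitive
  (<*>) = ap

-- Bind replaces every variable with the proposition the function
-- returns for it, leaving the connectives in place.
instance Monad Proposition where
  Primitive a     >>= f = f a
  Negated p       >>= f = Negated (p >>= f)
  Implication p q >>= f = Implication (p >>= f) (q >>= f)

-- Replace each occurrence of one variable with a whole proposition.
substitute :: Eq a => a -> Proposition a -> Proposition a -> Proposition a
substitute var replacement prop =
  prop >>= \v -> if v == var then replacement else Primitive v
```

For instance, substituting Negated (Primitive "C") for "A" in Implication (Primitive "A") (Primitive "B") rewrites only the antecedent.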
You can get a form of type-safe openness too:
implyOpen :: Proposition a -> Proposition b -> Proposition (Either a b)
implyOpen p q = Implication (Left <$> p) (Right <$> q)
Another fun trick is using a non-regular recursive type to model variable bindings in a type-safe way.
data Proposition a =
... |
ForAll (Proposition (Maybe a))
Here we have added one "free variable" to the inner proposition -- Primitive Nothing is the variable being quantified over. It may seem awkward at first, but when you get to coding it's bomb, because the types make it very hard to get it wrong.
bound is an excellent package for modelling expression languages based on this idea (and a few other tricks).

Haskell: Runtime Data Type Iteration?

A friend and I have been working on a system for automatically importing C functions into GNU Guile, but need to use Haskell's C parser because no other parser seems sufficient or as accessible (let me know if we're wrong about that).
The trouble is coming up when we try to produce Scheme data from the parsed AST. We need to produce text that can be directly imported by scheme (S-Expressions, not M-Expressions), so below is an example...
toscm $ Just (SomeType (AnotherType "test" 5) (YetAnother "hello!"))
=> "(Just (SomeType (AnotherType \"test\" 5) (YetAnother \"hello!\")))"
The output of Haskell's C parser has type (Either ParseError CTranslUnit). CTranslUnit is a specialization (CTranslationUnit NodeInfo). The CTranslationUnit's contents have ever more contents going deeper than is really any fun at all.
Before realizing that, I tried the following...
class Schemable a where
toscm :: a -> String
{- and then an (omitted) ridiculously (1000+ lines) long chain
of horrible instance declarations for every single type
that can be part of the C AST -}
I figured there really must be a better way to do this, but I haven't been able to find one I understand. GHC.Generics seems like it might have a solution, but the semantics of some of its internal types baffles me.
What should I do??
Update: I've been looking into Scrap Your Boilerplate, and while it definitely looks good for finding substructures, it doesn't provide a way for me to generically convert data to strings. The ideal would be if the data with the constructor Maybe (CoolType "awesome" (Something "funny")) could be processed with a function that would give me access to the names of the constructors, and allow me to recurse on the values of the arguments to that constructor. e.g...
data Constructorized = Constructed String [Constructorized]
| RawValue <anything>
constructorize :: <anything> -> Constructorized
constructorize a = <???>
toscm :: Constructorized -> String
toscm (Constructed c v) = "(" ++ c ++ " " ++ (intercalate " " (map toscm v)) ++ ")"
toscm (RawValue v) = show v
I guess the train of thought I'm on is: Show seems to be able to recurse into every single type that derives it, no problem. How are the functions for that generated? Shouldn't we be able to make our own Show that generates similar functions, with a slightly different output?
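One possible direction, offered as a sketch rather than a tested solution for the C AST: if the types involved have Data instances (an assumption you would need to check for the parser's AST), then toConstr and gmapQ from Data.Data give exactly the "constructor name plus recursive arguments" view described above. The types SomeType, AnotherType and YetAnother and the name toScmGeneric are hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, cast, gmapQ, showConstr, toConstr)

-- Hypothetical types standing in for the real AST.
data AnotherType = AnotherType String Int deriving (Data)
data YetAnother  = YetAnother String deriving (Data)
data SomeType    = SomeType AnotherType YetAnother deriving (Data)

-- Render any Data value as an S-expression-like string.
toScmGeneric :: Data a => a -> String
toScmGeneric x
  -- Strings are special-cased; otherwise they would be rendered
  -- as nested applications of the list constructor.
  | Just s <- cast x = show (s :: String)
  | otherwise =
      case gmapQ toScmGeneric x of
        -- No fields: a nullary constructor or a primitive like Int.
        []   -> showConstr (toConstr x)
        -- Fields present: parenthesize constructor and arguments.
        args -> "(" ++ unwords (showConstr (toConstr x) : args) ++ ")"
```

Other primitive or container types may need their own special cases along the lines of the String one.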

Accessing the "default show" in Haskell?

Say you have a data-structure (borrowed from this question):
data Greek = Alpha | Beta | Gamma | Delta | Eta | Number Int
Now one can make it an instance of Show by appending deriving Show to that declaration.
Say however we wish to show Number Int as:
instance Show Greek where
show (Number x) = show x
-- ...
The problem is that one must specify all other parts of the Greek data as well like:
show Alpha = "Alpha"
show Beta = "Beta"
For this small example that's of course doable. But if the number of options is long, it requires a large amount of work.
I'm wondering whether it is possible to access the "default show" implementation and call it with a wildcard. For instance:
instance Show Greek where
show (Number x) = show x
show x = defaultShow x
You thus "implement" the specific patterns that differ from the default approach and the remaining patterns are resolved by the "fallback mechanism".
Something a bit similar to method overriding with a reference to super.method in object oriented programming.
As @phg pointed out in a comment above, this can also be done with the help of generic-deriving:
{-# LANGUAGE DeriveGeneric #-}
module Main where
import Generics.Deriving.Base (Generic)
import Generics.Deriving.Show (GShow, gshow)
data Greek = Alpha | Beta | Gamma | Delta | Eta | Number Int
deriving (Generic)
instance GShow Greek
instance Show Greek where
show (Number n) = "n:" ++ show n
show l = gshow l
main :: IO ()
main = do
print (Number 8)
print Alpha
You can sorta accomplish this using Data and Typeable. It is a hack of course, and this example only works for "enumerated" types as in your example.
I'm sure we could get more elaborate with how we do this, but to cover your given example:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Typeable
data Greek = Alpha | Beta | Gamma | Delta | Eta | Number Int
deriving (Data,Typeable)
instance Show Greek where
show (Number n) = show n
show x = show $ toConstr x
This approach as I've implemented it cannot handle nested data structures or anything else remotely fancy, but again, this is an ugly hack. If you really must use this approach, dig around in the Data.Data module; I'm sure you could piece something together.
Here is a blog post giving a quick introduction to the packages: http://chrisdone.com/posts/data-typeable
The proper way to go about this would be to use a newtype wrapper. I realize that this isn't the most convenient solution though, especially when using GHCi, but it incurs no additional overhead, and is less likely to break in unexpected ways as your program grows.
data Greek = Alpha | Beta | Gamma | Delta | Eta | Number Int
deriving (Show)
newtype SpecialPrint = SpecialPrint Greek
instance Show SpecialPrint where
show (SpecialPrint (Number x)) = "Number: " ++ show x
show (SpecialPrint x) = show x
main = do
print (SpecialPrint Alpha)
print (SpecialPrint $ Number 1)
No, that's not possible AFAIK.
Further, custom instances of Show deserve a second thought, because Show and Read instances should be mutually compatible.
For just converting to human (or whoever) readable strings, use your own function or own typeclass. This will also achieve what you want:
Assuming you have a Presentable typeclass with a method present, and also the default Show instance, you can write:
instance Presentable Greek where
present (Number x) = show x
present x = show x
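Spelled out in full, that could look like the sketch below. The Presentable class is not a standard class; its name and its single method are taken from the description above and are otherwise an assumption.

```haskell
data Greek = Alpha | Beta | Gamma | Delta | Eta | Number Int
  deriving (Show)

-- Hypothetical class for human-readable output, separate from Show.
class Presentable a where
  present :: a -> String

instance Presentable Greek where
  -- Special case: show only the wrapped number.
  present (Number x) = show x
  -- Everything else falls back to the derived Show instance.
  present x          = show x
```

This keeps the derived Show (and any derived Read) consistent with each other, while present carries the custom formatting.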

Haskell pattern matching symmetric cases

Suppose I have a haskell expression like:
foo (Nothing, Just a) = bar a
foo (Just a, Nothing) = bar a
Is there any haskell syntax to collapse those cases, so I can match either pattern and specify bar a as the response for both? Or is that about as succinct as I can get it?
If your code is more complex than your example, you might want to do something like this, using the Alternative instance for Maybe and the PatternGuards extension (part of Haskell2010).
{-# LANGUAGE PatternGuards #-}
import Control.Applicative
foo (x, y) | Just a <- y <|> x = bar a
In case you are not familiar with it, <|> picks the left-most Just if there is one and returns Nothing otherwise, causing the pattern guard to fail.
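A self-contained version to play with, where bar and the fallback clause are made up for illustration. Note one difference from the two original clauses: the guard also fires when both components are Just, in which case the second component wins.

```haskell
import Control.Applicative ((<|>))

-- Hypothetical stand-in for the real bar.
bar :: Int -> String
bar a = "got " ++ show a

foo :: (Maybe Int, Maybe Int) -> String
foo (x, y)
  -- y <|> x is the first Just among y and x, if any.
  | Just a <- y <|> x = bar a
-- Both components are Nothing.
foo _ = "nothing"
```

If the both-Just case must be rejected, the two explicit clauses from the question are still the safer choice.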
That's as succinct as it gets in Haskell. In ML there is a syntax for what you want (by writing multiple patterns, which bind the same variables, next to each other separated by | with the body after the last pattern), but in Haskell there is not.
You can use -XViewPatterns, to add arbitrary functions to collapse your two cases into a single pattern.
Your pattern is now a function p that yields the thing you want to match:
foo (p -> (Just a, Nothing)) = bar a
much simpler!
We have to define p though, as:
p (Nothing, a@(Just _)) = (a, Nothing)
p a@(Just _, Nothing) = a
p a = a
or however you wish to normalize the data before viewing.
References: The GHC User's Guide chapter on View Patterns
