How to trigger error during manyAccum parser - haskell

I would like to write a Parsec parser that would parse a sequence of numbers, returning a sorted list, but failing on encountering a duplicate number (the error should point to the position of the duplicate). For the sorted part I can simply use:
manySorted = manyAccum insert
But how can I trigger a Parsec error if the number is already in the list? It doesn't seem that manyAccum allows that, and I couldn't figure out how to write my own clone of manyAccum that would (the implementation uses unParser, which doesn't seem to be exposed outside the Parsec library).

You could try to obtain the current parser's position by
sourcePos :: Monad m => ParsecT s u m SourcePos
sourcePos = statePos `liftM` getParserState
Then accumulate the positions together with parsed values so that you can report the position of the original.

If you're sure you want the parser to be doing this, you can use the fail method with an error message to fail out of the parser. (Complete with an error location, and trying other <|> possibilities, and so on.)
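A minimal sketch of the fail approach, assuming space-separated non-negative integers. The names number and sortedUnique are hypothetical, and the loop is written by hand rather than with manyAccum. Each number is first inspected with lookAhead, so that fail is raised before the duplicate is consumed and the reported position is the start of the duplicate:

```haskell
import Data.List (insert)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token parser: one non-negative integer plus trailing spaces.
number :: Parser Int
number = read <$> many1 digit <* spaces

-- Hypothetical replacement for `manyAccum insert`: keeps the accumulator
-- sorted and fails as soon as the next number is already in it.
sortedUnique :: Parser [Int]
sortedUnique = go []
  where
    go acc = do
      -- Peek at the next number without consuming any input.
      next <- optionMaybe (lookAhead number)
      case next of
        Nothing -> return acc
        Just n
          -- Nothing has been consumed yet, so the error position is the
          -- start of the duplicate number.
          | n `elem` acc -> fail ("duplicate number " ++ show n)
          | otherwise    -> number >> go (insert n acc)
```

With this sketch, parse (sortedUnique <* eof) "" "3 1 2" should yield the sorted list, while "3 1 3" should fail at the second 3. The elem check is linear in the list length; a Data.Set accumulator would be the obvious swap for long inputs.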

Related

In Parsec, Fixing a broken token before a parser fails

Background:
I am using a combination of Alex and Parsec to parse an indentation-sensitive language. A line more indented than the one above it may either be data for that line, or a continuation of the commands on that line. Alex has no way to know for sure, so it simply returns a Block{...} token.
The parser knows when a data block is expected and passes the data into the Abstract Syntax Tree to be used by the interpreter.
The problem arises when the parser encounters a Block token when it wanted something else. I have a function that will convert a block token into a list of Tokens called lex_block but I cannot seem to apply it to the right place in the input stream.
Question:
Given a parser combinator that fails at some token in the input stream, is there a way to modify the token that caused the failure and either continue evaluating the new input stream or at least try again?
I would like to write a function retry as follows:
> retry :: (s -> Maybe s) -> ParsecT s u m a -> ParsecT s u m a
retry f p acts like parser p, except that if p fails the function f is used to alter the input stream at the point p failed and p is retried. When f returns Nothing the failure is passed on.
I have tried to write this using getInput, setInput and try, but I cannot figure out how to get the input stream at the point the parser combinator failed rather than the point where it started. I think that if I could discover where in the input stream the failure happened, I could make this work.

attoparsec: succeeding on part of the input instead of failing

I have an attoparsec parser and tests for it. What annoys me is that if I comment out part of the parser and run the tests, the parser doesn't return Left "parse error at line ..."; instead I get Right [].
Note that I'm using parseOnly to make it clear that there'll be no more input.
Otherwise it's nice to get the partially parsed input, it can definitely be useful and I'm glad to have it. However I'd like to be informed that the whole input was not consumed. Maybe to get a character offset of the last consumed letter, or if that's what it takes, at least an option to be returned Left.
If it's relevant, the parser can be found there.
If I comment for instance the line:
<|> PlainText <$> choice (string <$> ["[", "]", "*", "`"])
And run the tests, I get for instance:
1) notes parsing tests parses notes properly
simple test
expected: Right [NormalLine [PlainText "one line* # hello world"]]
but got: Right []
This is from that test.
Depending on if consuming the whole input should be the property of parseNoteDocument or just the tests, I'd extend one or the other with endOfInput or atEnd.
I'd suggest to define a proper Parser for your documents, like
parseNoteDocument' :: Parser NoteDocument
parseNoteDocument' = many parseLine
and then define parseNoteDocument in terms of it. Then you can use parseNoteDocument' in the tests by defining a helper that parses a given piece of text using
parseNoteDocument' <* endOfInput
to ensure that the whole input is consumed.
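The difference can be seen in a small stand-alone sketch. The digits parser below is a hypothetical stand-in for the document parser, not the code from the question:

```haskell
import Control.Applicative (many)
import Data.Attoparsec.Text (Parser, digit, endOfInput, parseOnly)
import qualified Data.Text as T

-- Hypothetical stand-in for the real document parser.
digits :: Parser String
digits = many digit

-- Succeeds on any prefix it can parse and silently ignores the rest.
lenient :: String -> Either String String
lenient = parseOnly digits . T.pack

-- Requires the whole input to be consumed, otherwise returns Left.
strict :: String -> Either String String
strict = parseOnly (digits <* endOfInput) . T.pack
```

Here lenient "12ab" returns Right "12", whereas strict "12ab" returns a Left because endOfInput fails on the leftover "ab".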

Syntax rules for Haskell infix datatype constructors

I'm trying to make a Haskell datatype a bit like a python dictionary, a ruby hash or a javascript object, in which a string is linked to a value, like so:
data Entry t = Entry String t
type Dictionary t = [Entry t]
The above code works fine. However, I would like a slightly nicer constructor, so I tried defining it like this:
data Entry t = String ~> t
This failed. I tried this:
data Entry t = [Char] ~> t
Again, it failed. I know that ~ has special meaning in Haskell, and GHCi still permits the operator ~>, but I still tried one other way:
data Entry t = [Char] & t
And yet another failure due to parse error. I find this confusing because, for some inexplicable reason, this works:
data Entry t = String :> t
Does this mean that there are certain rules for what characters may occur in infix type constructors, or is it a case of misinterpretation? I'm not a newbie in Haskell, and I'm aware that it would be more idiomatic to use the first constructor, but this one's stumping me, and it seems to be an important part of Haskell that I'm missing.
Any operator that starts with a colon (:) is a type constructor or a data constructor, and an infix data constructor has to start with a colon; the built-in type constructor (->) is an exception to this naming rule. If you want the tilde, you could use :~>, but you're not going to get away with a constructor name that doesn't start with a colon. Source
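A short sketch of the :~> variant, reusing the Entry and Dictionary names from the question. The sample value and the lookupEntry helper are hypothetical additions for illustration:

```haskell
import Data.Maybe (listToMaybe)

-- Infix data constructor: legal because the operator starts with a colon.
data Entry t = String :~> t
  deriving (Show, Eq)

type Dictionary t = [Entry t]

-- Hypothetical sample data.
sample :: Dictionary Int
sample = ["one" :~> 1, "two" :~> 2]

-- Hypothetical helper: the value of the first entry with a matching key.
lookupEntry :: String -> Dictionary t -> Maybe t
lookupEntry key dict = listToMaybe [v | k :~> v <- dict, k == key]
```

The constructor can be used infix both when building values and when pattern matching, as in the list comprehension above.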

Convert one full String to ints and words as an interpreter in Haskell

I am trying to write a Forth interpreter in Haskell. There are many sub-problems involved, but I am trying to accomplish the most basic step first, and I have been at it for some time with different approaches. The simple input case I am trying to get to is "25 12 +" -> [37]. I am not worried that lists in Forth are backwards from Haskell, but I do want to accommodate extending the input language down the road, so I am using Maybe: if there is an error, I will just return Nothing.
I first tried to break the input string into a list of "words" using Prelude's words function. From there I used Prelude's reads function to turn it into a list of tuples (Int,String). So this works great, up until I get to a command "word", such as the char + in the sample problem.
So how do I parse/interpret the string's command to something I can use?
Do I create a new data structure that has all the Forth commands or special characters? (assuming this, how do I convert it from the string format to that data type?)
Need anything else, just ask. I appreciate the help thinking this through.
read is essentially a very simple string parser. Rather than adapting it, you might want to consider learning to use a parser combinator library such as Parsec.
There are a bunch of different tutorials about parser combinators so you'll probably need to do a bit of reading before they 'click.' However, the first example in this tutorial is quite closely related to your problem.
import Text.Parsec
import Text.Parsec.String
play :: String -> Either ParseError Integer
play s = parse pmain "parameter" s
pmain :: Parser Integer
pmain = do
  x <- pnum `chainl1` pplus
  eof
  return x
pnum = read `fmap` many1 digit
pplus = char '+' >> return (+)
It's a simple parser that evaluates arbitrarily long lists:
*Main> play "1+2+3+4+5"
Right 15
It also produces useful parse errors:
*Main> play "1+2+3+4+5~"
Left "parameter" (line 1, column 10):
unexpected '~'
expecting digit, "+" or end of input
If you can understand this simple parser, you should be able to work out how to adapt it to your particular problem (referring to the list of generic combinators in the documentation for Text.Parsec.Combinator). It will take a little longer at first than using read, but using a proper parsing library will make it much easier to achieve the ultimate goal of parsing Forth's whole grammar.
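As one possible adaptation to the Forth case in the question, here is a minimal sketch that uses Parsec only to turn the input into tokens and then evaluates them on a stack. The Token type and the names tokenP, program, step and interpret are all hypothetical, and only + and * are handled:

```haskell
import Control.Monad (foldM)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token type: a number literal or one of two commands.
data Token = Num Int | Add | Mul
  deriving (Show, Eq)

tokenP :: Parser Token
tokenP = (Num . read <$> many1 digit)
     <|> (Add <$ char '+')
     <|> (Mul <$ char '*')

-- A program is a whitespace-separated sequence of tokens.
program :: Parser [Token]
program = spaces *> many (tokenP <* spaces) <* eof

-- One evaluation step; Nothing signals a stack underflow.
step :: [Int] -> Token -> Maybe [Int]
step stack (Num n)      = Just (n : stack)
step (y : x : rest) Add = Just (x + y : rest)
step (y : x : rest) Mul = Just (x * y : rest)
step _ _                = Nothing

-- Parse, then fold the tokens over an initially empty stack.
interpret :: String -> Maybe [Int]
interpret src = either (const Nothing) (foldM step []) (parse program "forth" src)
```

With these definitions, interpret "25 12 +" evaluates to Just [37]. Collapsing the ParseError into Nothing matches the question's use of Maybe; keeping the Either would preserve the error position.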

Safely serialize Text to Bytestring

The text package does not provide an instance of Binary to encode/decode text. I have read about and understand the reasoning behind this (namely, Text should be encoding-agnostic). However, I need a Binary instance of Text. I've found that there is a package called text-binary that does this. However, the instance is as follows:
instance Binary T.Text where
  put = put . T.encodeUtf8
  get = T.decodeUtf8 <$> get
Pretty good, except that decodeUtf8 is a partial function, so I'd rather be using decodeUtf8' and passing the failure through the Get monad. I can't figure out how to fail correctly with the Get monad. Just from looking around in Data.Binary.Get, I see this:
data Decoder a = Fail !B.ByteString {-# UNPACK #-} !ByteOffset String
  | Partial (Maybe B.ByteString -> Decoder a)
  | Done !B.ByteString {-# UNPACK #-} !ByteOffset a
Which seems to indicate that there is a way to do what I want. I just can't see how the library authors intend for it to be used. I appreciate any insights that a more learned mind than my own has to offer.
Well, though we tend to disregard it, the Monad class still has that yucky fail method:
get = do
  utf8'd <- get
  -- decodeUtf8' returns Either UnicodeException Text, not Maybe
  case T.decodeUtf8' utf8'd of
    Right t -> return t
    Left _ -> fail "No parse for UTF-8 Text"
I'm not sure if it should still be considered "correct" to use fail, but it seems to be the obvious thing for a case like this. I suppose even if it's removed from the Monad class, there'll be some other class MonadPlus m => MonadFail m where fail :: String -> m a which Get will be an instance of.
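A self-contained sketch of the same idea. The newtype Utf8Text and the helper decodeText are hypothetical; the wrapper is there because recent versions of the text package ship their own Binary instance for Text, so declaring another one directly on Text would clash:

```haskell
import Data.Binary (Binary (..), Get, decodeOrFail, encode)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical wrapper, used to avoid a duplicate instance on Text itself.
newtype Utf8Text = Utf8Text T.Text
  deriving (Eq, Show)

instance Binary Utf8Text where
  put (Utf8Text t) = put (TE.encodeUtf8 t)
  get = do
    bytes <- get :: Get B.ByteString
    -- decodeUtf8' is total; a decoding error becomes a failure in Get.
    case TE.decodeUtf8' bytes of
      Right t  -> return (Utf8Text t)
      Left err -> fail ("invalid UTF-8: " ++ show err)

-- Hypothetical helper: run the decoder and keep only the message or value.
decodeText :: BL.ByteString -> Either String T.Text
decodeText input = case decodeOrFail input of
  Left (_, _, msg)         -> Left msg
  Right (_, _, Utf8Text t) -> Right t
```

Decoding the encoding of a valid Utf8Text round-trips to Right, while decodeText (encode (B.pack [0xff])) gives a Left, because the payload is a length-prefixed ByteString that is not valid UTF-8.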

Resources