program mode determined from arguments count [optparse-applicative] [duplicate] - haskell

I'm trying to use the optparse-applicative library in a program which should perform a different action depending on the number of arguments.
For example, the argument parsing for a program which calculates perimeters:
    module TestOpts where

    import Options.Applicative

    type Length = Double

    data PerimeterCommand
        = GeneralQuadranglePerimeter Length Length Length Length
        | RectanglePerimeter Length Length

    parsePerimeterCommand :: Parser PerimeterCommand
    parsePerimeterCommand = parseQuadPerimeter <|> parseRectPerimeter

    parseQuadPerimeter = GeneralQuadranglePerimeter <$>
        parseLength "SIDE1" <*>
        parseLength "SIDE2" <*>
        parseLength "SIDE3" <*>
        parseLength "SIDE4"

    parseRectPerimeter = RectanglePerimeter <$>
        parseLength "WIDTH" <*> parseLength "HEIGHT"

    parseLength name = argument auto (metavar name)
Only the first argument to <|> will ever successfully parse. I think some kind of argument backtracking is required, similar to Parsec's try combinator.
Any ideas on how to parse alternative sets of arguments, when the first alternative may consume some arguments of the next alternative?

Please note: this answer was written by the optparse-applicative author, Paolo Capriotti.
You can't do this with optparse-applicative directly. The main feature
of optparse-applicative is that options can be parsed in any order. If
you want to work mainly with arguments (which are positional), you are
better off having two levels of parsers: use many argument in
optparse-applicative, then pass the resulting array to a normal parser
(say using Parsec). If you only have positional arguments, then
optparse-applicative won't buy you very much, and you could just parse
the arguments manually with Parsec.
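A minimal sketch of the second-level step described above, assuming the positional arguments have already been collected into a list (for instance with many (argument auto (metavar "LENGTH")) at the optparse-applicative level). It swaps the suggested Parsec pass for a plain pattern match on the list, which is enough when only the number of arguments matters. The names classify and perimeter and the error text are illustrative and not part of any library.

```haskell
type Length = Double

data PerimeterCommand
  = GeneralQuadranglePerimeter Length Length Length Length
  | RectanglePerimeter Length Length
  deriving (Show, Eq)

-- Second-level parser: pick the command from the number of lengths supplied.
-- `classify` is a hypothetical helper, not an optparse-applicative function.
classify :: [Length] -> Either String PerimeterCommand
classify [w, h]       = Right (RectanglePerimeter w h)
classify [a, b, c, d] = Right (GeneralQuadranglePerimeter a b c d)
classify xs           = Left ("expected 2 or 4 lengths, got " ++ show (length xs))

-- The action the program would perform once the mode is known.
perimeter :: PerimeterCommand -> Length
perimeter (RectanglePerimeter w h)             = 2 * (w + h)
perimeter (GeneralQuadranglePerimeter a b c d) = a + b + c + d
```

Because the mode is chosen only after every argument has been collected, no alternative can consume arguments that another one needed, so no backtracking is required.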

Related

understanding trifecta parser <|> and try

While reading the Haskell book I came across trifecta.
I'm trying to wrap my head around it, but I still can't understand <|>.
I have the following questions:
In simple words, is (<|>) a monadic choice?
p = a <|> b -- use parser a; if it fails, use b?
If so, why does the following parser fail?
    import Control.Applicative
    import Data.Ratio ((%))
    import Text.Trifecta

    parseFraction :: Parser Rational
    parseFraction = do
        numerator <- decimal
        char '/'
        denominator <- decimal
        case denominator of
            0 -> fail "denominator cannot be zero"
            _ -> return (numerator % denominator)

    type RationalOrDecimal = Either Rational Integer

    parseRationalOrDecimal = (Left <$> parseFraction) <|> (Right <$> decimal)

    main = do
        let p f i = parseString f mempty i
        print $ p (some (skipMany (oneOf "\n") *> parseRationalOrDecimal <* skipMany (oneOf "\n"))) "10"
In a perfect world, if parseFraction fails then <|> should fall back to decimal, but that is not what happens.
When I use try, however, it works.
What am I missing?
Why do we need try when <|> should run the second parser after the first one fails?
    parseRationalOrDecimal = try (Left <$> parseFraction) <|> (Right <$> decimal)
The reason is that parseFraction consumes input before failing, so it is considered to be the correct branch of the choice. Let me give you an example.
Let's say you are writing a Python parser and you have to decide whether a declaration is a class or a function (keyword def). You write:
    parseExpression = word "def" <|> word "class" -- DISCLAIMER: using a fictitious library
If the user writes def or class it will match, but if the user writes det, it will try the first branch, match de, and then fail because t was found where f was expected. It will not bother to try the next parser, because the error is considered to be in the first branch. It would make little sense to try the class parser, since the error is most likely in the first branch.
In your case, parseFraction matches some digits and then fails because / isn't found, so the decimal parser is never tried.
This is a design decision. Some other libraries use a different convention (e.g. Attoparsec always backtracks on failure), and some functions are documented as not consuming input (e.g. notFollowedBy).
Notice that there is a trade-off here.
First: if <|> behaved as you expect, the following
    parse parseRationalOrDecimal "123456789A"
would first parse all the digits until "A" is found, and then parse all the digits again until "A" is found, doing the same computation twice just to return a failure.
Second: if you care more about error messages, the current behaviour is more convenient. Following the Python example, imagine:
    parseExpression = word "def" <|> word "class" <|> word "import" <|> word "type" <|> word "from"
If the user types "frmo", the parser will go to the last branch and raise an error like expected "from" but "frmo" was found. Whereas if all alternatives had to be checked, the error would be something more like expected one of "def", "class", "import", "type" or "from", which is further from the actual typo.
As I said, it is a library design decision. I am just trying to convince you that there are good reasons not to try all alternatives automatically, and to use try when you explicitly want to do so.
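The consume-then-fail rule can be reproduced in a few lines. This sketch uses Parsec rather than trifecta, on the assumption that both apply the same rule (an alternative that fails after consuming input is not retried); the names number, fraction, noBacktrack, withBacktrack and run are illustrative.

```haskell
import Data.Ratio ((%))
import Text.Parsec (char, digit, many1, parse, parserFail, try, (<|>))
import Text.Parsec.String (Parser)

number :: Parser Integer
number = read <$> many1 digit

fraction :: Parser Rational
fraction = do
  n <- number
  _ <- char '/'
  d <- number
  if d == 0
    then parserFail "denominator cannot be zero"
    else return (n % d)

-- On "10", fraction consumes the digits before failing on the missing '/',
-- so (<|>) does not try the right-hand alternative and the parse fails.
noBacktrack :: Parser (Either Rational Integer)
noBacktrack = (Left <$> fraction) <|> (Right <$> number)

-- try rewinds the input when fraction fails, so number gets its turn.
withBacktrack :: Parser (Either Rational Integer)
withBacktrack = try (Left <$> fraction) <|> (Right <$> number)

-- Hypothetical helper that discards the error message, for easy comparison.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

With these definitions, run noBacktrack "10" is Nothing while run withBacktrack "10" is Just (Right 10); both accept "1/2".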

Split string on multiple delimiters of any length in Haskell

I am attempting a Haskell coding challenge where, given a certain string with a prefix indicating which substrings are delimiting markers, a list needs to be built from the input.
I have already solved the problem for multiple single-length delimiters, but I am stuck with the problem where the delimiters can be any length. I use splitOneOf from Data.List.Split, but this works for character (length 1) delimiters only.
For example, given
input ";,\n1;2,3,4;10",
delimiters are ';' and ','
splitting the input on the above delivers
output [1,2,3,4,10]
The problem I'm facing has two parts:
Firstly, a single delimiter of any length, e.g.
"****\n1****2****3****4****10" should result in the list [1,2,3,4,10].
Secondly, more than one delimiter can be specified, e.g.
input "[***][||]\n1***2||3||4***10",
delimiters are "***" and "||"
splitting the input on the above delivers
output [1,2,3,4,10]
My code for retrieving the delimiter in the case of character delimiters:
    import Data.List.Split (splitOn, splitOneOf)

    -- This gives the delimiters as a list of characters, i.e. a String.
    getDelimiter :: String -> [Char]
    getDelimiter text = head . splitOn "\n" $ text

    -- Drop "[delimiters]\n" from the input.
    body :: String -> String
    body text = drop ((length . getDelimiter $ text) + 1) text

    -- Returns a tuple with fst being the delimiters, snd the body of the input.
    doc :: String -> (String, String)
    doc text = (getDelimiter text, body text)

    -- Given the delimiters and the body of the input, return a list of strings.
    numbers :: (String, String) -> [String]
    numbers (delim, rest) = splitOneOf delim rest

    -- Input ",##\n1,2#3#4" gives output ["1","2","3","4"].
    getList :: String -> [String]
    getList text = numbers . doc $ text
So my question is, how do I do the processing for when the delimiters are e.g. "***" and "||"?
Any hints are welcome, especially in a functional programming context.
If you don't mind making multiple passes over the input string, you can use splitOn from Data.List.Split, and gradually split the input string using one delimiter at a time.
You can write this fairly succinctly using foldl':
import Data.List
import Data.List.Split
splitOnAnyOf :: Eq a => [[a]] -> [a] -> [[a]]
splitOnAnyOf ds xs = foldl' (\ys d -> ys >>= splitOn d) [xs] ds
Here, the accumulator for the fold operation is a list of strings, or more generally [[a]], so you have to 'lift' xs into a list, using [xs].
Then you fold over the delimiters ds - not the input string to be parsed. For each delimiter d, you split the accumulated list of strings with splitOn, and concatenate them. You could also have used concatMap, but here I arbitrarily chose to use the more general >>= (bind) operator.
This seems to do what is required in the OP:
*Q49228467> splitOnAnyOf [";", ","] "1;2,3,4;10"
["1","2","3","4","10"]
*Q49228467> splitOnAnyOf ["***", "||"] "1***2||3||4***10"
["1","2","3","4","10"]
Since this makes multiple passes over temporary lists, it's most likely not the fastest implementation you can make, but if you don't have too many delimiters, or extremely long lists, this may be good enough.
This problem has two kinds of solutions: the simple and the efficient. I will not cover the efficient one (because it is not simple), though I will hint at it.
But first, the part where you extract the delimiters and the body of the input may be simplified with Data.List.break:
    delims = splitOn "/" . fst . break (== '\n')  -- Presuming the delimiters are separated by a slash.
    body   = drop 1 . snd . break (== '\n')       -- drop 1 discards the newline itself.
Either way, we may reduce this problem to finding the positions of all the given patterns in a given string. (By "string" I do not mean the Haskell String. Rather, I mean an arbitrarily long sequence (or even an infinite stream) of any symbols for which an equality relation is defined, which is typed in Haskell as Eq a => [a]. I hope this is not too confusing.) As soon as we have the positions, we may slice the string to our hearts' content. If we want to deal with an infinite stream, we must obtain the positions incrementally and yield the results as we go, which is a restriction that must be kept in mind. Haskell is equipped well enough to handle the stream case as well as the finite string.
A simple approach is to cast isPrefixOf on the string, for each of the patterns.
If some of them matches, we replace it with a Nothing.
Otherwise we mark the first symbol as Just and move to the next position.
Thus, we will have replaced all the different delimiters by a single one: Nothing. We may then readily slice the string by it.
This is fairly idiomatic, and I will present the code for your judgement shortly. The problem with this approach is that it is inefficient: when a pattern fails to match, we would rather advance by more than one symbol.
It would be more efficient to base our work on the research that has been done on finding patterns in a string; this problem is well known and there are great, intricate algorithms that solve it much faster. These algorithms are designed to work with a single pattern, so some work must be put into adapting them to our case; however, I believe they are adaptable. The simplest and oldest of such algorithms is KMP, and it is already encoded in Haskell. You may wish to take up arms and generalize it: a quick path to some amount of fame.
Here is the code:
    module SplitSubstr where

    -- stackoverflow.com/questions/49228467

    import Data.List (unfoldr, isPrefixOf, elemIndex)
    import Data.List.Split (splitWhen) -- Package `split`.
    import Data.Maybe (catMaybes, isNothing)

    -- | Split a (possibly infinite) string at the occurrences of any of the given delimiters.
    --
    -- λ take 10 $ splitOnSubstrs ["||", "***"] "la||la***fa"
    -- ["la","la","fa"]
    --
    -- λ take 10 $ splitOnSubstrs ["||", "***"] (cycle "la||la***fa||")
    -- ["la","la","fa","la","la","fa","la","la","fa","la"]
    --
    splitOnSubstrs :: [String] -> String -> [String]
    splitOnSubstrs delims
        = fmap catMaybes        -- At this point, there will be only `Just` elements left.
        . splitWhen isNothing   -- Now we may split at nothings.
        . unfoldr f             -- Replace the occurrences of delimiters with a `Nothing`.
      where
        -- | This is the base case. It will terminate the `unfoldr` process.
        f [ ] = Nothing

        -- | This is the recursive case. It is divided into 2 cases:
        -- * One of the delimiters may match. We will then replace it with a Nothing.
        -- * Otherwise, we will `Just` return the current element.
        --
        -- Notice that, if there are several patterns that match at this point, we will use the first one.
        -- You may sort the patterns by length to always match the longest or the shortest. If you desire
        -- more complicated behaviour, you must plug a more involved logic here. In any case, the index
        -- should point to one of the patterns that matched.
        --
        --                   vvvvvvvvvvvvvv
        f body@(x:xs) = case elemIndex True $ (`isPrefixOf` body) <$> delims of
            Just index -> return (Nothing, drop (length $ delims !! index) body)
            Nothing    -> return (Just x, xs)
It might happen that you will not find this code straightforward. Specifically, the unfoldr part is somewhat dense, so I will add a few words about it.
unfoldr f is an embodiment of a recursion scheme. f is a function that may chip a part off the body: f :: (body -> Maybe (chip, body)).
As long as it keeps chipping, unfoldr keeps applying it to the body. This is called the recursive case.
Once it fails (returning Nothing), unfoldr stops and hands you all the chips it has collected. This is called the base case.
In our case, f takes symbols from the string, and fails once the string is empty.
That's it. I hope you send me a postcard when you receive a Turing award for a fast splitting algorithm.
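To see the unfoldr scheme described above in isolation, here is a small sketch that is unrelated to delimiters: it chips fixed-size pieces off a list. The name chunks is illustrative, and the sketch assumes n is positive (with n <= 0 the chip would be empty and the unfold would never finish).

```haskell
import Data.List (unfoldr)

-- Chip off up to n elements at a time; stop once nothing is left.
-- Assumes n > 0.
chunks :: Int -> [a] -> [[a]]
chunks n = unfoldr step
  where
    step [] = Nothing              -- base case: ends the unfold
    step xs = Just (splitAt n xs)  -- recursive case: (chip, remaining body)
```

For instance, chunks 2 [1, 2, 3, 4, 5] evaluates to [[1,2],[3,4],[5]]. Because unfoldr produces its result lazily, take 3 (chunks 2 [1 ..]) also terminates.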

How to make a custom Attoparsec parser combinator that returns a Vector instead of a list?

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Control.Applicative(many)
import Data.Word
parseManyNumbers :: Parser [Int] -- I'd like many to return a Vector instead
parseManyNumbers = many (decimal <* skipSpace)
main :: IO ()
main = print $ parseOnly parseManyNumbers "131 45 68 214"
The above is just an example, but I need to parse a large number of primitive values in Haskell and need to use arrays instead of lists. This is possible in F#'s FParsec, so I went as far as looking at Attoparsec's source, but I can't figure out a way to do it. In fact, I can't even figure out where many from Control.Applicative is defined in the base library. I thought it would be there, as that is where the documentation on Hackage points, but no such luck.
Also, I am having trouble deciding what data structure to use here as I can't find something as convenient as a resizable array in Haskell, but I would rather not use inefficient tree based structures.
An option to me would be to skip Attoparsec and implement an entire parser inside the ST monad, but I would rather avoid it except as a very last resort.
There is a growable vector implementation in Haskell, based on the AMT (array mapped trie) algorithm: "persistent-vector". Unfortunately, the library isn't widely known in the community so far. To give you an idea of the algorithm's performance, though, it is the one that drives the standard vector implementations in Scala and Clojure.
I suggest you implement your parser around that data structure, modelled on the list-specialized implementations. Here are those functions, by the way:
    -- Default method definitions from the Alternative class (Control.Applicative).

    -- | One or more.
    some :: f a -> f [a]
    some v = some_v
      where
        many_v = some_v <|> pure []
        some_v = (fmap (:) v) <*> many_v

    -- | Zero or more.
    many :: f a -> f [a]
    many v = many_v
      where
        many_v = some_v <|> pure []
        some_v = (fmap (:) v) <*> many_v
Some ideas:
Data Structures
I think the most practical data structure to use for the list of Ints is something like [Vector Int]. If each component Vector is sufficiently long (e.g. has length 1k) you'll get good space economy. You'll have to write your own "list operations" to traverse it, but you'll avoid the re-copying of data that returning a single Vector Int would require.
Also consider using a Dequeue instead of a list.
Stateful Parsing
Unlike Parsec, Attoparsec does not provide for user state. However, you might be able to make use of the runScanner function:
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
(It also returns the parsed ByteString which in your case may be problematic since it will be very large. Perhaps you can write an alternate version which doesn't do this.)
Using unsafeFreeze and unsafeThaw you can incrementally fill in a Vector. Your s data structure might look something like:
    data MyState = MyState
        { inNumber :: Bool            -- True if seen a digit
        , val      :: Int             -- value of int being parsed
        , vecs     :: [ Vector Int ]  -- past parsed vectors
        , v        :: Vector Int      -- current vector we are filling
        , vsize    :: Int             -- number of items filled in current vector
        }
Maybe instead of a [Vector Int] you use a Dequeue (Vector Int).
I imagine, however, that this approach will be slow since your parsing function will get called for every single character.
Represent the list as a single token
Parsec can be used to parse a stream of tokens, so how about writing your own tokenizer and letting Parsec create the AST?
The key idea is to represent these large sequences of Ints as a single token. This gives you a lot more latitude in how you parse them.
Defer Conversion
Instead of converting the numbers to Ints at parse time, just have parseManyNumbers return a ByteString and defer the conversion until you actually need the values. This might enable you to avoid reifying the values as an actual list.
Vectors are arrays, under the hood. The tricky thing about arrays is that they are fixed-length. You pre-allocate an array of a certain length, and the only way of extending it is to copy the elements into a larger array.
This makes linked lists simply better at representing variable-length sequences. (It's also why list implementations in imperative languages amortise the cost of copying by allocating arrays with extra space and copying only when the space runs out.) If you don't know in advance how many elements there are going to be, your best bet is to use a list (and perhaps copy the list into a Vector afterwards using fromList, if you need to). That's why many returns a list: it runs the parser as many times as it can with no prior knowledge of how many that'll be.
On the other hand, if you happen to know how many numbers you're parsing, then a Vector could be more efficient. Perhaps you know a priori that there are always n numbers, or perhaps the protocol specifies before the start of the sequence how many numbers there'll be. Then you can use replicateM to allocate and populate the vector efficiently.
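Both routes from the last two paragraphs can be sketched with Attoparsec and the vector package. The count-prefixed input format (a leading number saying how many values follow) is an assumption made for illustration, as are the names countedNumbers and manyNumbers. Whether replicateM is measurably faster than going through a list is something to benchmark on real data.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (many)
import Data.Attoparsec.Text (Parser, decimal, parseOnly, skipSpace)
import qualified Data.Vector.Unboxed as VU

-- Known length: the input starts with a count (assumed format),
-- so the vector can be filled by running the element parser n times.
countedNumbers :: Parser (VU.Vector Int)
countedNumbers = do
  n <- decimal <* skipSpace
  VU.replicateM n (decimal <* skipSpace)

-- Unknown length: collect a list with many, then copy it into a vector.
manyNumbers :: Parser (VU.Vector Int)
manyNumbers = VU.fromList <$> many (decimal <* skipSpace)
```

For example, parseOnly countedNumbers "4 131 45 68 214" and parseOnly manyNumbers "131 45 68 214" both yield a Right holding the same four-element unboxed vector.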

How to access nth element in a Haskell tuple

I have this:
get3th (_,_,a,_,_,_) = a
which works fine in GHCi, but when I compile it with GHC it gives an error. If I want to write a function that gets the nth element of a tuple and runs under GHC, what should I do?
My whole program is below; what should I do with it?
    get3th (_,_,a,_,_,_) = a

    main = do
        mytuple <- getLine
        print $ get3th mytuple
Your problem is that getLine gives you a String, but you want a tuple of some kind. You can fix your problem by converting the String to a tuple – for example by using the built-in read function. The third line here tries to parse the String into a six-tuple of Ints.
    main = do
        mystring <- getLine
        let mytuple = read mystring :: (Int, Int, Int, Int, Int, Int)
        print $ get3th mytuple
Note however that while this is useful for learning about types and such, you should never write this kind of code in practise. There are at least two warning signs:
You have a tuple with more than three or so elements. Such a tuple is very rarely needed and can often be replaced by a list, a vector or a custom data type. Tuples are rarely used more than temporarily to bring two kinds of data into one value. If you start using tuples often, think about whether or not you can create your own data type instead.
Using read to read a structure is not a good idea. read will explode your program with a terrible error message at any tiny little mistake, and that's usually not what you want. If you need to parse structures, it's a good idea to use a real parser. read can be enough for simple integers and such, but no more than that.
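As a middle ground between read and a full parser, readMaybe from Text.Read returns Nothing on malformed input instead of crashing. This is a minimal sketch; the type synonym Six and the functions parseSix and describe are illustrative names.

```haskell
import Text.Read (readMaybe)

type Six = (Int, Int, Int, Int, Int, Int)

get3th :: Six -> Int
get3th (_, _, c, _, _, _) = c

-- Nothing on malformed input instead of a runtime exception.
parseSix :: String -> Maybe Six
parseSix = readMaybe

-- Turn a line of input into a message for the user.
describe :: String -> String
describe s = case parseSix s of
  Just t  -> "third element: " ++ show (get3th t)
  Nothing -> "could not parse a six-tuple from: " ++ show s
```

A main built on this could be as short as getLine >>= putStrLn . describe, and it would report a bad line rather than abort.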
The type of getLine is IO String, so your program won't type check: you are supplying a String where a tuple is expected.
Your program will work if an argument of the proper type is supplied, e.g.:
    main = do
        print $ get3th (1, 2, 3, 4, 5, 6)
It seems to me that your confusion is between tuples and lists. That is an understandable confusion when you first meet Haskell, as many other languages have only one similar construct. Tuples use round parens: (1,2). The type of a tuple depends on both the number of values and the type of each value, and the values may have different types. So (Int, Int) is a different type from (Int, Float), although both are 2-tuples (pairs). There are some functions in the Prelude which are polymorphic over pairs, e.g. fst :: (a,b) -> a, which takes the first element. fst is easy to define using pattern matching, like your own function:
fst (a,b) = a
Note that fst (1,2) evaluates to 1, but fst (1,2,3) is ill-typed and won't compile.
Lists, on the other hand, can be of any length, including zero, and still be the same type; but each element must be of the same type. Lists use square brackets: [1,2,3]. The type of a list with elements of type a is written [a]. Lists are constructed by prepending values onto the empty list [], so a list with one element can be written [x], but this is syntactic sugar for x:[], where : is the cons operator, which prepends a value to the head of the list. Just as tuples can be pattern matched, you can use the empty list and the cons operator to pattern match:
head :: [a] -> a
head (x:xs) = x
The pattern match means x is of type a and xs is of type [a], and it is the former we want for head. (This is a prelude function and there is an analogous function tail.)
Note that head is a partial function as we cannot define what it does in the case of the empty list. Calling it on an empty list will result in a runtime error as you can check for yourself in GHCi. A safer option is to use the Maybe type.
safeHead :: [a] -> Maybe a
safeHead (x:xs) = Just x
safeHead [] = Nothing
String in Haskell is simply a synonym for [Char]. So all of these list functions can be used on strings, and getLine returns a String.
Now, in your case you want the 3rd element. There are a couple of ways you could do this, you could call tail a few times then call head, or you could pattern match like (a:b:c:xs). But there is another utility function in the prelude, (!!) which gets the nth element. (Writing this function is a very good beginner exercise). So your program can be written
    main = do
        myString <- getLine
        print $ myString !! 2 -- zero indexed
Testing gives
Prelude> main
test
's'
So remember: tuples use () and are strictly of a given length, but can have members of different types; whereas lists use [], can be any length, but each element must be the same type. And Strings are really lists of characters.
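Since (!!) fails at runtime when the index is out of range, the beginner exercise mentioned above can be taken one step further with Maybe, in the same spirit as safeHead. The name nth is illustrative.

```haskell
-- Zero-based lookup that returns Nothing when the index is out of range.
nth :: Int -> [a] -> Maybe a
nth _ [] = Nothing
nth n (x : xs)
  | n < 0     = Nothing
  | n == 0    = Just x
  | otherwise = nth (n - 1) xs
```

With this, nth 2 "test" is Just 's', while nth 10 "test" is Nothing rather than an exception.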
EDIT
As an aside, I thought I'd mention that there is a neater way of writing this main function if you are interested.
    main = getLine >>= print . (!! 2)
