I've seen some approaches to parsing program arguments, but they seem too complicated. I need a simple solution, though I'd also like (preferably, not necessarily) to be able to refer to the arguments by name. So if a command line looks like this:
./MyApp --arg1Name arg1Value --arg2Name arg2Value
then I'd like to treat them as args["arg1Name"] and args["arg2Name"] to get their values. I know that's not valid Haskell code, though. What I have now is not much:
main = do
[args] <- getArgs
I repeat, I'd like a simple solution, preferably without involving any third-party Haskell libraries.
optparse-applicative is great for argument parsing, and very easy to use! Writing your own argument parser will be much more difficult to get right, change, extend, or otherwise manage than if you take 10 minutes to write a parser with optparse-applicative.
Start by importing the Options.Applicative module.
import Options.Applicative
Next create a data type for your command-line configuration.
data Configuration = Configuration
{ foo :: String
, bar :: Int
}
Now for the workhorse, we create a parser by using the combinators exported from optparse-applicative. Read the documentation on Options.Applicative.Builder for the full experience.
configuration :: Parser Configuration
configuration = Configuration
<$> strOption
( long "foo"
<> metavar "ARG1"
)
<*> option auto
( long "bar"
<> metavar "ARG2"
)
Now we can execute our Parser Configuration in an IO action to get at our command-line data.
main :: IO ()
main = do
config <- execParser (info configuration fullDesc)
putStrLn (show (bar config) ++ foo config)
And we're done! You can easily extend this parser: support a --help argument that prints usage documentation (make a new parser with helper <*> configuration and pass it to info), add default values for certain arguments (include a <> value "default" clause in the arguments to strOption or option), support flags or sub-parsers, or generate tab-completion data.
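For example, here is a minimal sketch of two of those extensions; the parser name configurationWithDefaults and the default value "hello" are just illustrations:

configurationWithDefaults :: Parser Configuration
configurationWithDefaults = Configuration
    <$> strOption
        ( long "foo"
       <> metavar "ARG1"
       <> value "hello"  -- used when --foo is omitted
        )
    <*> option auto
        ( long "bar"
       <> metavar "ARG2"
        )

main :: IO ()
main = do
    -- helper adds a --help flag that prints the generated usage text
    config <- execParser (info (helper <*> configurationWithDefaults) fullDesc)
    putStrLn (show (bar config) ++ foo config)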
Libraries are a force multiplier! The investment you make in learning the basics of a good library will pay dividends in what you're able to accomplish, and tasks will often be easier (and quicker!) with the proper tool than with a "fast" solution thrown together out of duct tape.
How about just parsing them in pairs and adding them to a map:
simpleArgsMap :: [String] -> Map String String
simpleArgsMap [] = Map.empty
simpleArgsMap (('-':'-':name):value:rest)
= Map.insert name value (simpleArgsMap rest)
simpleArgsMap xs = error $ "Couldn't parse arguments: " ++ show xs
A simple wrapper program to show this working:
module Args where
import System.Environment ( getArgs )
import Control.Applicative ( (<$>) )
import qualified Data.Map as Map
import Data.Map ( Map )
main = do
argsMap <- simpleArgsMap <$> getArgs
print $ Map.lookup "foo" argsMap
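For a quick sanity check, you can load the module in GHCi and call the parser directly; the result should look something like this (the fromList rendering is just Data.Map's Show output):

> simpleArgsMap ["--foo", "1", "--bar", "2"]
fromList [("bar","2"),("foo","1")]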
I would like to get the value of a parameter from the input. I have a program that reads its input, and I want to be able to run it like ./test --params value and, in my program, get that value as an Int.
main = do
args <- getArgs
print args
Without using getOpt()
Thanks
Huh. You're bound to reimplement at least parts of getOpt, probably badly, but sure, let's assume your code is actually so special that it needs hand-written argument parsing.
Since getArgs :: IO [String], we already have the input tokenized by spaces, which is neat. However, in your case you specifically want --params value, and you want to read value as an Int.
There are numerous problems to solve here:
there might not be --params in the list at all
or there might be multiple instances of it
it might have no following token
or the following token might be another --otherparam
or the following token might not parse as Int
All of the above (and more) can happen, because the input is completely unsanitized.
Solving all of the cases brings us back to using getOpt, so let's assume that there's exactly one --params in the list, and that it's followed by something that parses as an Int.
import System.Environment (getArgs)
main = do
    args <- getArgs
    -- skip to "--params", drop it, then read the following token as an Int
    let intArg = (read :: String -> Int) . head . tail . dropWhile (/= "--params") $ args
    print intArg
If any of those assumptions is broken, this code will fail in numerous ways. Each of the problems requires a careful decision about a failure path. You might want to abort execution, provide a default value, you might want to use exceptions or a Maybe access API. Ultimately, you'll figure out that this is a solved problem and simply use getOpt:
import System.Console.GetOpt
import System.Environment (getArgs)
data Arg = Params Int deriving Show
params :: String -> Arg
params = Params . read
options :: [OptDescr Arg]
options = [ Option ['p'] ["params"] (ReqArg params "VALUE") "Pass your params"]
main = do
argv <- getArgs
case getOpt Permute options argv of
(o,_no,[]) -> print o
(_,_,errs) -> ioError (userError (concat errs ++ usageInfo header options))
where header = "Usage:"
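With that in place, an invocation like ./test --params 42 (or -p 42) should print [Params 42], while an unrecognized option or a missing argument falls through to the error branch and prints the collected errors followed by the usageInfo text. (Note that the conversion still uses read, so a non-numeric value will only blow up when the Params value is forced by print.)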
You can do this using the following function:
getParam :: [String] -> Maybe String
getParam [] = Nothing
getParam ("--param":next:_) = Just next
getParam (_:xs) = getParam xs
And you can use it as follows:
main = do
args <- getArgs
let param = getParam args
print param
If you’re interested in the details, getParam works by recursion:
The first line is a type signature stating that getParam takes a list of strings as its only argument, and it returns either a string or nothing (that’s what Maybe String means).
The second line states that if there are no arguments, it returns nothing.
The third line states that if the first argument is --param, match the next argument (by assigning it to the identifier next) and return it (albeit wrapped in Just; look up the ‘Maybe data type’ if you want to know more).
The fourth line states that if neither of the previous cases have matched, discard the first item in the list and try again.
There is one slight problem with this implementation of getParam: it returns a String, but you want an Int. You can fix this by using the read function, which can convert a String to many other types, including Int. You could insert read in either of two places: replace Just next by Just (read next) (so that getParam itself returns an Int), or keep getParam returning a String and convert outside it, e.g. with fmap read (getParam args) (since the result is wrapped in Maybe, read has to be mapped over it).
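To make that concrete, here is a small sketch of both placements (getParamInt is just an illustrative name; readMaybe from Text.Read would be the safer choice if you don't want read to crash on bad input):

-- Variant 1: convert inside the helper, so it returns Maybe Int.
getParamInt :: [String] -> Maybe Int
getParamInt [] = Nothing
getParamInt ("--param":next:_) = Just (read next)
getParamInt (_:xs) = getParamInt xs

-- Variant 2: keep getParam as it is and map read over its Maybe result.
main :: IO ()
main = do
    args <- getArgs
    print (fmap read (getParam args) :: Maybe Int)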
Similar question (Is there a way to see the list of functions in a module, in GHCI?), though not the result that I seek.
Is there a way to get a list of what is exported by a module?
Of course in GHCi you can import it then type Some.Module., hit tab for auto-completion and it will show what I seek. But I want to capture that stuff. Roughly speaking, String -> [String].
Purpose? Suppose that I have a source file with a naked import Some.Module. Question: What belongs to Some.Module in that file? A simple way would be to output the list of what the module exports, feed that to grep and return the contenders, without the need to load that source file in GHCi (might be complicated or not possible). And everything becomes a lot clearer.
If there's a smarter approach to that, I'm listening. I heard of solutions involving GOA and lambdabot. No idea if applicable or how to make use of this.
As @HTNW mentioned in a comment, if you can run ghc on your actual file, you can use -ddump-minimal-imports. Otherwise, if you want to get the list of exports from another module (assuming you're using GHC), the easiest way is probably to look at the .hi interface files. ghc has some built-in support for printing a human-readable representation of an interface file once you know the path to one, but as the wiki page notes, "This textual format is not particularly designed for machine parsing". You can also access the information you want via the GHC API; a small example of doing that follows.
We start with a bunch of random imports for doing IO & from the GHC api:
import Control.Monad.IO.Class
import System.IO
import System.Environment
import GHC
import GHC.Paths (libdir)
import DynFlags
import Outputable
import Name
import Pretty (Mode(..))
With that bureaucracy out of the way, main starts by firing up the GHC Monad:
main :: IO ()
main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
runGhc (Just libdir) $ do
We're not actually generating any code so we can set hscTarget = HscNothing during the DynFlags setup boilerplate:
dflags <- getSessionDynFlags
let dflags' = dflags { hscTarget = HscNothing }
setSessionDynFlags dflags'
With that out of the way we can find the module we want from the package database (using the first command-line argument as the name):
mn <- head <$> (liftIO $ getArgs)
m <- lookupModule (mkModuleName mn) Nothing
We can use getModuleInfo to get a module info structure:
mmi <- getModuleInfo m
case mmi of
Nothing -> liftIO $ putStrLn "Could not find module interface"
If we did find the interface, everything we need for this is in the modInfoExports. If we needed more, we could also get the actual ModIface:
Just mi -> mapM_ (printExport dflags') (modInfoExports mi)
Actually printing out an export is a bit tedious, as it requires working with Names; a simple printExport might just use the pretty-printing functions, though these are intended more for human-readable output than for machine-readable output:
printExport :: DynFlags -> Name -> Ghc ()
printExport dflags n =
liftIO $ printSDocLn PageMode dflags stdout (defaultUserStyle dflags)
$ pprNameUnqualified n
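Assuming the snippets above are assembled into a single file and built against a GHC whose API still exposes the DynFlags and Outputable modules under those names (and with the ghc-paths package installed), running the resulting program with a module name such as Data.List as its only argument should print that module's exports, one per line.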
A particularly simple way for interactive use is :browse. Load up a ghci that has access to the appropriate package, then
> :browse Some.Module
class Some.Module.Foo a where
Some.Module.foo :: a -> a
{-# MINIMAL foo #-}
Some.Module.bar :: Int
All the qualification can get a bit much, especially if there are many functions that operate on types defined in the same module. To reduce the clutter, you can bring Some.Module into scope first:
> :m + Some.Module
> :browse Some.Module
class Foo a where
foo :: a -> a
{-# MINIMAL foo #-}
bar :: Int
I need to convert my command-line arguments (which arrive as strings) to Ints. I would like to assign each one to a variable and use them in my function. I have the following:
import System.Environment
getIntArg :: IO Int
getIntArg = fmap (read . head) getArgs
main = do
n <- getIntArg
print n
However, it doesn't loop through all the arguments; it only prints one. Also, how would I assign each one to a variable to use?
I'm new to FP.
If you want to map all the arguments to Ints, you should use
getIntArgs :: IO [Int]
getIntArgs = fmap (map read) $ getArgs
or a bit shorter:
getIntArgs = map read <$> getArgs
instead.
This way you can do
main = do
ns <- getIntArgs
print ns
and then get the ones you are interested in with (!!), or, if you know/expect at least a fixed number of Int arguments, with
(n1:n2:n3:_) <- getIntArgs
but of course you should check the length first - also, this will fail if the user passes something that cannot be parsed as an Int.
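If you do want to catch unparsable input without a library, a minimal sketch using readMaybe from Text.Read (which ships with base) could look like this:

import Text.Read (readMaybe)

-- Nothing if any argument fails to parse as an Int, Just the list otherwise.
-- getArgs comes from System.Environment, as above.
getIntArgsSafe :: IO (Maybe [Int])
getIntArgsSafe = fmap (mapM readMaybe) getArgs

-- a main using the safe version instead of getIntArgs
main = do
    mns <- getIntArgsSafe
    case mns of
        Just ns -> print ns
        Nothing -> putStrLn "please pass only integer arguments"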
So if you don't want to reinvent the wheel (which might be OK if you want to learn, or only need a quick solution), you may want to look around and use an existing package - for example parseargs - to do this for you.
I've so far avoided ever needing unsafePerformIO, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who cannot compile Haskell programs. (The details are unimportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large, almost static data file used throughout many programs.)
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one-time read of the data files, this low-level call is effectively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
dataDir="<path to files>"
datafiles::[FilePath]
datafiles =
unsafePerformIO $
unsafeInterleaveIO $
map (dataDir </>)
<$> filter (not . ("." `isPrefixOf`))
<$> getDirectoryContents dataDir
fileData::[String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO call, to make it lazy. IMHO, once you use unsafeInterleaveIO, you might as well use unsafePerformIO, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String. Why should each main even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData into Reader [String], so that they can just ask for the results as needed (or some Reader Config, where Config holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
import Control.Monad.Reader (Reader, ask, runReader)

type AppResult = String

fileData :: IO [String]
fileData = undefined -- read the files

myApp :: String -> Reader [String] AppResult
myApp s = do
    files <- ask
    return undefined -- do whatever with s and files

main :: IO ()
main = do
    config <- fileData
    print $ runReader (myApp "test") config
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It will offer a value p to the interpreter and continue the computation after receiving a value b:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ \(Compose (path,next)) -> do
putStrLn $ "Enter the contents for file " <> path <> ":"
B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
r <- interpret $ do
r1 <- someComp <$> lazyLoad "file1"
r2 <- return takesAwhile
if (r2 == 1)
then return r1
else someComp <$> lazyLoad "file2"
putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile and only then request another line.
If you want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte so that each function only needs to know about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Clients and Servers from Pipes.Core instead of the free monad.
I'm trying to write a renamer for a compiler that I'm writing in Haskell.
The renamer scans an AST looking for symbol DEFs, which it enters into a symbol table, and symbol USEs, which it resolves by looking in the symbol table.
In this language, uses can come before or after defs, so it would seem that a 2 pass strategy is required; one pass to find all the defs and build the symbol table, and a second to resolve all the uses.
However, since Haskell is lazy (like me), I figure I can tie-the-knot and pass the renamer the final symbol table before it is actually built. This is fine as long as I promise to actually build it. In an imperative programming language, this would be like sending a message back in time. This does work in Haskell, but care must be taken to not introduce a temporal paradox.
Here's a terse example:
module Main where
import Control.Monad.Error
import Control.Monad.RWS
import Data.Maybe ( catMaybes )
import qualified Data.Map as Map
import Data.Map ( Map )
type Symtab = Map String Int
type RenameM = ErrorT String (RWS Symtab String Symtab)
data Cmd = Def String Int
| Use String
renameM :: [Cmd] -> RenameM [(String, Int)]
renameM = liftM catMaybes . mapM rename1M
rename1M :: Cmd -> RenameM (Maybe (String, Int))
rename1M (Def name value) = do
modify $ \symtab -> Map.insert name value symtab
return Nothing
rename1M (Use name) = return . liftM ((,) name) . Map.lookup name =<< ask
--rename1M (Use name) =
-- maybe (return Nothing) (return . Just . (,) name) . Map.lookup name =<< ask
--rename1M (Use name) =
-- maybe (throwError $ "Cannot locate " ++ name) (return . Just . (,) name) . Map.lookup name =<< ask
rename :: [Cmd] -> IO ()
rename cmds = do
let (result, symtab, log) = runRWS (runErrorT $ renameM cmds) symtab Map.empty
print result
main :: IO ()
main = do
rename [ Use "foo"
, Def "bar" 2
, Use "bar"
, Def "foo" 1
]
This is the line where the knot is tied:
let (result, symtab, log) = runRWS (runErrorT $ renameM cmds) symtab Map.empty
The running symbol table is stored in the MonadState of the RWS, and the final symbol table is stored in the MonadReader.
In the above example, I have 3 versions of rename1M for Uses (2 are commented out). In this first form, it works fine.
If you comment out the first rename1M Use, and uncomment the second, the program does not terminate. However, it is, in spirit, no different than the first form. The difference is that it has two returns instead of one, so the Maybe returned from Map.lookup must be evaluated to see which path to take.
The third form is the one that I really want. I want to throw an error if I can't find a symbol. But this version also does not terminate. Here, the temporal paradox is obvious; the decision about whether the symbol will be in the table can affect whether it will be in the table...
So, my question is, is there an elegant way to do what the third version does (throw an error) without running into the paradox? Send the errors on the MonadWriter without allowing the lookup to change the path? Two passes?
Do you really have to interrupt execution when an error occurs? An alternative approach would be to log errors. After tying the knot, you can check whether the list of errors is empty. I've taken this approach in the past.
-- I've wrapped a writer in a writer transformer. You'll probably want to implement it differently to avoid ambiguity
-- related to writer methods.
type RenameM = WriterT [RenameError] (RWS Symtab String Symtab)
rename1M (Use name) = do
symtab_entry <- asks (Map.lookup name)
-- Write a list of zero or more errors. Evaluation of the list is not forced until all processing is done.
tell $ if isJust symtab_entry then [] else missingSymbol name
return $ Just (name, fromMaybe (error "lookup failed") symtab_entry)
rename cmds = do
let ((result, errors), symtab, log) = runRWS (runWriterT $ renameM cmds) symtab Map.empty
-- After tying the knot, check for errors
if null errors then print result else print errors
This does not produce laziness-related nontermination problems because the contents of the symbol table are not affected by whether or not a lookup succeeded.
I don't have a well thought out answer, but one quick thought. Your single pass over the AST takes all the Def and produces a (Map Symbol _), and I wonder if the same AST pass can take all the Use and produce a (Set Symbol) as well as the lazy lookup.
Afterwards you can quite safely compare the Symbols in the keys of the Map with the Symbols in the Set. If the Set has anything not in the Map then you can report all of those Symbols as errors. If any Def'd Symbols are not in the Set then you can warn about unused Symbols.
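A minimal sketch of that final comparison, assuming the single pass has already produced a Map of definitions and a Set of uses (the function names below are made up for illustration):

import qualified Data.Map as Map
import Data.Map (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Used but never defined: report these as "cannot locate" errors.
undefinedUses :: Ord sym => Map sym a -> Set sym -> Set sym
undefinedUses defs uses = uses `Set.difference` Map.keysSet defs

-- Defined but never used: candidates for "unused symbol" warnings.
unusedDefs :: Ord sym => Map sym a -> Set sym -> Set sym
unusedDefs defs uses = Map.keysSet defs `Set.difference` uses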