A noob Haskell question.
I have had fun writing a few parsers with both Parsec and AttoParsec. I now want to gather information during the parsing process (basically build a symbol table) and using the WriterT monad transformer seems like a good option.
I have this working now in this simple example:
module NewParse where
import qualified Text.ParserCombinators.Parsec as A
import Control.Monad.Trans (lift)
import Control.Monad.Writer (WriterT, tell, runWriterT)
type WParser a = WriterT [String] A.Parser a
data StoryToken = StoryToken Char deriving (Show)
getToken :: WParser StoryToken
getToken = do
tell ["hello from Writer T"]
c <- lift A.anyChar
return $ StoryToken c
test = A.parse (runWriterT getToken) "story" "#"
It works great. Calling test in ghci gives me this:
*NewParse> test
Right (StoryToken '#',["hello"])
What I'm kind of dreading when applying this to my parser code having to lift the parser monad for every every call it. I'll be calling tell relatively infrequently but parser functions a lot. The code is going to get a lot uglier.
Is there a way of easily exposing certain functions from Parsec so I don't have to use lift in the code?
A solution that worked for me was creating my own function
anyChar = lift A.anyChar
Which is cool but would require creating similar shadow functions for every function I use from parsec.
And I don't mind doing that but just wondered if there was a better, less boiler-platey way of achieving the same thing.
Any help gratefully received :)
Parsec allows for users to carry state through parsers just use
addSym :: String -> Parsec String [String] ()
addSym = void . updateState . (:) -- append a string to the symbol list
type MyParser = Parsec String [String]
This will handle backtracking correctly.
Related
I have a DSL, and a parser for it written in Haskell with the Parsec package. Now I want to deprecate a specific language feature of the DSL. In the next release, I want the parser to accept both the new and the old syntax, but I want the parser to spawn a deprecation message. I could not find how to do this. Is this possible, and if so, how can this be done?
Instead of emitting messages during parsing, it would be better to return extra information at the end of parsing: whether or not deprecated syntax was encountered.
The ParsecT type admits a type parameter for state set by the user during parsing:
ParsecT s u m a is a parser with stream type s, user state type u, underlying monad m and return type a. Parsec is strict in the user state.
The user state can be set with putState and modifyState. It can be obtained using getState.
Most parsec combinators are polymorphic on the user state. Most combinators for your own DSL should be, as well. But parsers for deprecated parts of the syntax should set a "flag" in your user state.
Something like this:
import Text.Parsec
import Text.Parsec.Char
import Data.Functor.Identity
type Parser = ParsecT [Char] Bool Identity -- using a Bool state
myParser :: Parser Char
myParser =
try (do char 'a'
putState True
char 'b')
<|>
try (do char 'a'
char 'c')
main :: IO ()
main = do
print $ runParser ((,) <$> myParser <*> getState) False "" "ab"
print $ runParser ((,) <$> myParser <*> getState) False "" "ac"
-- results:
-- Right ('b',True)
-- Right ('c',False)
Of course, instead of a simple boolean flag, it would be better to put more information into the state.
Notice that state set by a sub-parser is "forgotten" if the sub-parser backtracks. That is the correct behavior for our purposes: otherwise, we would get "false positives" triggered by branches that are ultimately discarded.
A common alternative to parsec is megaparsec. The latter doesn't allow for user-defined state in the parser type itself, but it can be emulated using a StateT transformer over the ParsecT type.
In the code below I manage a game, which owns a list of links.
At each step of the game, I change the game state updating the list of links modified.
As I am learning the State monad, I was trying to apply the State monad technique to this use case.
Nonetheless, at each turn, I need to get a piece of info from IO, using getLine
this gives such a code
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE RecordWildCards #-}
import Control.Monad
import Control.Monad.State.Strict
import qualified Data.List as List
import qualified Control.Monad.IO.Class as IOClass
type Node = Int
type Link = (Node,Node)
type Links = [Link]
type Gateway = Node
type Gateways = [Gateway]
data Game = Game { nbNodes :: Int, links :: Links, gateways :: Gateways }
computeNextTurn :: State Game Link
computeNextTurn = do
g <- get
inputLine <- IOClass.liftIO getLine -- this line causes problem
let node = read inputLine :: Int
let game#(Game _ ls gs) = g
let linkToSever = computeLinkToSever game node
let ls' = List.delete linkToSever ls
let game' = game{links = ls'}
put game'
return linkToSever
computeAllTurns :: State Game Links
computeAllTurns = do
linkToSever <- computeNextTurn
nextGames <- computeAllTurns
return (linkToSever : nextGames)
computeLinkToSever :: Game -> Node -> Link
computeLinkToSever _ _ = (0,1) -- just a dummy value
-- this function doesnt compute really anything for the moment
-- but it will depend on the value of node i got from IO
However I get an error at compilation:
No instance for (MonadIO Data.Functor.Identity.Identity)
arising from a use of liftIO
and I get the same style of error, if I try to use liftM and lift.
I have read some questions that are suggesting StateT and ST, which I don't grasp yet.
I am wondering if my current techique with a simple State is doomed to fail, and that indeed I can not use State, but StateT / ST ?
Or is there a possible operation to simply get the value from getLine, inside the State monad ?
As #bheklilr said in his comment, you can't use IO from State. The reason for that, basically, is that State (which is just shorthand for StateT over Identity) is no magic, so it's not going to be able to use anything more than
What you can already do in its base monad, Identity
The new operations provided by State itself
However, that first point also hints at the solution: if you change the base monad from Identity to some other monad m, then you get the capability to use the effects provided by m. In your case, by setting m to IO, you're good to go.
Note that if you have parts of your computation that don't need to do IO, but require access to your state, you can still express this fact by making their type something like
foo :: (Monad m) => Bar -> StateT Game m Baz
You can then compose foo with computations in StateT Game IO, but its type also makes it apparent that it can't possibly do any IO (or anything else base monad-specific).
You also mentioned ST in your question as possible solution. ST is not a monad transformer and thus doesn't allow you to import effects from some base monad.
I want to provide a function that replaces each occurrence of # in a string with a different random number. In a non-pure language, it's trivial. However, how should it be designed in a pure language? I don't want to use unsafePerformIO, as it rather looks like a hack and not a proper design.
Should this function require a random generator as one of its parameters? And if so, would that generator have to be passed through the whole stack of invocations? Are there other possible approaches? Should I use the State monad, here? I would appreciate a toy example demonstrating a viable approach...
You would, in fact, use a variant of the state monad to pass the random generator around behind the scenes. The Rand type in Control.Monad.Random helps with this. The API is a bit confusing, but more because it's polymorphic over the type of random generator you use than because it has to be functional. This extra bit of scaffolding is useful, however, because you can easily reuse your existing code with different random generators which lets you test different algorithms as well as explicitly controlling whether the generator is deterministic (good for testing) or seeded with outside data (in IO).
Here's a simple example of Rand in action. The RandomGen g => in the type signature tells us that we can use any type of random generator for it. We have to explicitly annotate n as an Int because otherwise GHC only knows that it has to be some numeric type that can be generated and turned into a string, which can be one of multiple possible options (like Double).
randomReplace :: RandomGen g => String -> Rand g String
randomReplace = foldM go ""
where go str '#' = do
n :: Int <- getRandomR (0, 10)
return (str ++ show n)
go str chr = return $ str ++ [chr]
To run this, we need to get a random generator from somewhere and pass it into evalRand. The easiest way to do this is to get the global system generator which we can do in IO:
main :: IO ()
main = do gen <- getStdGen
print $ evalRand (randomReplace "ab#c#") gen
This is such a common pattern that the library provides an evalRandIO function which does it for you:
main :: IO ()
main = do res <- evalRandIO $ randomReplace "ab#c#"
print res
In the end, the code is a bit more explicit about having a random generator and passing it around, but it's still reasonably easy to follow. For more involved code, you could also use RandT, which allows you to extend other monads (like IO) with the ability to generate random values, letting you relegate all the plumbing and setup to one part of your code.
It's just a monadic mapping
import Control.Applicative
import Control.Monad.Random
import Data.Char
randomReplace :: RandomGen g => String -> Rand g String
randomReplace = mapM f where
f '#' = intToDigit <$> getRandomR (0, 10)
f c = return c
main = evalRandIO (randomReplace "#abc#def#") >>= print
I've so far avoided ever needing unsafePerformIO, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who can not compile Haskell programs. (The details are uninportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large almost static data file used throughout many programs).
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one time read of the data files, this low level call is effetively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
dataDir="<path to files>"
datafiles::[FilePath]
datafiles =
unsafePerformIO $
unsafeInterleaveIO $
map (dataDir </>)
<$> filter (not . ("." `isPrefixOf`))
<$> getDirectoryContents dataDir
fileData::[String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO call, to make it lazy. IMHO, once you use unsafeInterleaveIO, you might as well use unsafePerformIO, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String. Why should each main even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData into Reader [String], so that they can just ask for the results as needed (or some Reader Config, where Config holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
type AppResult = String
fileData :: IO [String]
fileData = undefined -- read the files
myApp :: String -> Reader [String] AppResult
myApp s = do
files <- ask
return undefined -- do whatever with s and config
main = do
config <- fileData
return $ runReader (myApp "test") config
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It will offer a value p do the interpreter and continue the computation after receiving a value b:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ \(Compose (path,next)) -> do
putStrLn $ "Enter the contents for file " <> path <> ":"
B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
r <- interpret $ do
r1 <- someComp <$> lazyLoad "file1"
r2 <- return takesAwhile
if (r2 == 1)
then return r1
else someComp <$> lazyLoad "file2"
putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile and only then request another line.
If want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte so that each function only needs to know about about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Clients and Servers from Pipes.Core instead of the free monad.
I've been suggested csv-conduit as a good Haskell package to work with CSV files. I want to learn how it works, but the documentation is too terse for a newbie Haskell programmer.
Is there a way for me to figure out how it works by trial-and-error in GHCi?
More specifically, should I load modules and files from GHCi or should I write a simple HS file to load them and then move around interactively?
I mentioned csv-conduit, but I'm opened to using any CSV package. I just need to get my hands on one and fool around with it, until I feel at ease (much like I would do in IDLE).
Take a look at the following function: readCSVFile :: :: (MonadResource m, CSV ByteString a) => CSVSettings -> FilePath -> m [a]
Its relatively simple to call, as we just need a CSVSettings, such as defCSVSettings, and a FilePath (aka String), "file.csv" or something.
Thus, after the call, we get (MonadResource m, CSV ByteString a). We can resolve this one at a time to figure out an appropriate type for this. We are performing IO in this operation, so for MonadResource m, m should just be ResourceT IO, which happens to be an instance of MonadBaseControl IO as required by runResourceT. This is a conduit specific thing.
For the CSV ByteString a, we need to find what instances of CSV. To do so, go to http://hackage.haskell.org/packages/archive/csv-conduit/0.2.1.1/doc/html/Data-CSV-Conduit.html#t:CSV (where the documentation for the package is in my opinion somewhat obnoxiously all stuffed into the typeclass...) and click on Instances to see what available instances we have of the form CSV ByteString a. The two options are CSV ByteString ByteString and CSV ByteString Text.
Of the two of these, Text is preferable because it handles unicode and CSV is unlikely to be containing binary data. ByteString is more or less similar to a [Word8] while Text is more similar to [Char] which is probably what you want. Hence, a should be Text (although ByteString will still work).
This means the result of the function call is ResourceT IO [Row Text]. We can't do much with this, but because ResourceT is a monad transformer, we can easily "pop" off the monad transformation layer with the function runResourceT. Thus,
readFile :: FilePath -> IO [Row Text]
readFile = runResourceT . readCSVFile defCSVSettings
which is easily usable within, say, main to get at the [Row Text] which you can then iterate over with a map or a fold to get your hands on the individual rows.
To run this sort of thing in GHCI you absolutely have to specifically point out the type. The reason is that the result class instance is not dependent on any of the parameters; thus, for any set of CSVSettings and FilePath, readCSVFile could return any number of different types as long as they as m is an instance of MonadResource m and a is an instance of CSV ByteString a. Thus, we have to explicitly point out to GHCi which type you want.
Have you tried Text.CSV? It might be more appropriate if you're just starting out with Haskell, as it's much simpler.
As for exploring new modules, you can just load it into GHCi, there's no need to write an additional file.
This works with the latest version of the csv-conduit package (version 0.6.3). Note the signature of readCsv without which I couldn't compile.
{-# LANGUAGE OverloadedStrings #-}
import Data.CSV.Conduit
import Data.Text (Text)
import qualified Data.Vector as V
import qualified Data.ByteString as B
csvset :: Char -> CSVSettings
csvset c = CSVSettings {csvSep = c, csvQuoteChar = Just '"'}
readCsv :: String -> Char -> IO (V.Vector (Row Text))
readCsv fp del = readCSVFile (csvset del) fp
main = readCsv "C:\\mydir\\myfile.csv" ';'