Running Haskell HXT outside of IO? - haskell

All the examples I've seen so far using the Haskell XML toolkit, HXT, uses runX to execute the parser. runX runs inside the IO monad. Is there a way of using this XML parser outside of IO? Seems to be a pure operation to me, don't understand why I'm forced to be inside IO.

You can use HXT's xread along with runLA to parse an XML string outside of IO.
xread has the following type:
xread :: ArrowXml a => a String XmlTree
This means you can compose it with any arrow of type (ArrowXml a) => a XmlTree Whatever to get an a String Whatever.
runLA is like runX, but for things of type LA:
runLA :: LA a b -> a -> [b]
LA is an instance of ArrowXml.
To put this all together, the following version of my answer to your previous question uses HXT to parse a string containing well-formed XML without any IO involved:
{-# LANGUAGE Arrows #-}
module Main where
import qualified Data.Map as M
import Text.XML.HXT.Arrow
classes :: (ArrowXml a) => a XmlTree (M.Map String String)
classes = listA (divs >>> pairs) >>> arr M.fromList
where
divs = getChildren >>> hasName "div"
pairs = proc div -> do
cls <- getAttrValue "class" -< div
val <- deep getText -< div
returnA -< (cls, val)
getValues :: (ArrowXml a) => [String] -> a XmlTree (String, Maybe String)
getValues cs = classes >>> arr (zip cs . lookupValues cs) >>> unlistA
where lookupValues cs m = map (flip M.lookup m) cs
xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\
\<div class='c3'>123</div><div class='c4'>234</div></div>"
values :: [(String, Maybe String)]
values = runLA (xread >>> getValues ["c1", "c2", "c3", "c4"]) xml
main = print values
classes and getValues are similar to the previous version, with a few minor changes to suit the expected input and output. The main difference is that here we use xread and runLA instead of readString and runX.
It would be nice to be able to read something like a lazy ByteString in a similar manner, but as far as I know this isn't currently possible with HXT.
A couple of other things: you can parse strings in this way without IO, but it's probably better to use runX whenever you can: it gives you more control over the configuration of the parser, error messages, etc.
Also: I tried to make the code in the example straightforward and easy to extend, but the combinators in Control.Arrow and Control.Arrow.ArrowList make it possible to work with arrows much more concisely if you like. The following is an equivalent definition of classes, for example:
classes = (getChildren >>> hasName "div" >>> pairs) >. M.fromList
where pairs = getAttrValue "class" &&& deep getText

Travis Brown's answer was very helpful. I just want to add my own solution here, which I think is a bit more general (using the same functions, just ignoring the problem-specific issues).
I was previously unpickling with:
upIO :: XmlPickler a => String -> IO [a]
upIO str = runX $ readString [] str >>> arrL (maybeToList . unpickleDoc xpickle)
which I was able to change to this:
upPure :: XmlPickler a => String -> [a]
upPure str = runLA (xreadDoc >>> arrL (maybeToList . unpickleDoc xpickle)) str
I completely agree with him that doing this gives you less control over the configuration of the parser etc, which is unfortunate.

Related

How to convert a haskell List into a monadic function that uses list values for operations?

I am having trouble wrapping my head around making to work a conversion of a list into a monadic function that uses values of the list.
For example, I have a list [("dir1/content1", "1"), ("dir1/content11", "11"), ("dir2/content2", "2"), ("dir2/content21", "21")] that I want to be converted into a monadic function that is mapped to a following do statement:
do
mkBlob ("dir1/content1", "1")
mkBlob ("dir1/content11", "11")
mkBlob ("dir2/content2", "2")
mkBlob ("dir2/content21", "21")
I imagine it to be a function similar to this:
contentToTree [] = return
contentToTree (x:xs) = (mkBlob x) =<< (contentToTree xs)
But this does not work, failing with an error:
• Couldn't match expected type ‘() -> TreeT LgRepo m ()’
with actual type ‘TreeT LgRepo m ()’
• Possible cause: ‘(>>=)’ is applied to too many arguments
In the expression: (mkBlob x) >>= (contentToTree xs)
In an equation for ‘contentToTree’:
contentToTree (x : xs) = (mkBlob x) >>= (contentToTree xs)
• Relevant bindings include
contentToTree :: [(TreeFilePath, String)] -> () -> TreeT LgRepo m ()
I do not quite understand how to make it work.
Here is my relevant code:
import Data.Either
import Git
import Data.Map
import Conduit
import qualified Data.List as L
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as BL
import Control.Monad (join)
type FileName = String
data Content = Content {
content :: Either (Map FileName Content) String
} deriving (Eq, Show)
contentToPaths :: String -> Content -> [(TreeFilePath, String)]
contentToPaths path (Content content) = case content of
Left m -> join $ L.map (\(k, v) -> (contentToPaths (if L.null path then k else path ++ "/" ++ k) v)) $ Data.Map.toList m
Right c -> [(BS.pack path, c)]
mkBlob :: MonadGit r m => (TreeFilePath, String) -> TreeT r m ()
mkBlob (path, content) = putBlob path
=<< lift (createBlob $ BlobStream $
sourceLazy $ BL.fromChunks [BS.pack content])
sampleContent = Content $ Left $ fromList [
("dir1", Content $ Left $ fromList [
("content1", Content $ Right "1"),
("content11", Content $ Right "11")
]),
("dir2", Content $ Left $ fromList [
("content2", Content $ Right "2"),
("content21", Content $ Right "21")
])
]
Would be grateful for any tips or help.
You have:
A list of values of some type a (in this case a ~ (String, String)). So, xs :: [a]
A function f from a to some type b in a monadic context, m b. Since you're ignoring the return value, we can imagine b ~ (). So, f :: Monad m => a -> m ().
You want to perform the operation, yielding some monadic context and an unimportant value, m (). So overall, we want some function doStuffWithList :: Monad m => [a] -> (a -> m ()) -> m (). We can search Hoogle for this type, and it yields some results. Unfortunately, as we've chosen to order the arguments, the first several results are little-used functions from other packages. If you scroll further, you start to find stuff in base - very promising. As it turns out, the function you are looking for is traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f (). With that, we can replace your do-block with just:
traverse_ mkBlob [ ("dir1/content1", "1")
, ("dir1/content11", "11")
, ("dir2/content2", "2")
, ("dir2/content21", "21")
]
As it happens there are many names for this function, some for historical reasons and some for stylistic reasons. mapM_, forM_, and for_ are all the same and all in base, so you could use any of these. But the M_ versions are out of favor these days because really you only need Applicative, not Monad; and the for versions take their arguments in an order that's convenient for lambdas but inconvenient for named functions. So, traverse_ is the one I'd suggest.
Assuming mkBlob is a function that looks like
mkBlob :: (String, String) -> M ()
where M is some specific monad, then you have the list
xs = [("dir1/content1", "1"), ("dir1/content11", "11"), ("dir2/content2", "2"), ("dir2/content21", "21")]
whose type is xs :: [(String, String)]. The first thing we need is to run the mkBlob function on each element, i.e. via map.
map mkBlob xs :: [M ()]
Now, we have a list of monadic actions, so we can use sequence to run them in sequence.
sequence (map mkBlob xs) :: M [()]
The resulting [()] value is all but useless, so we can use void to get rid of it
void . sequence . map mkBlob $ xs :: M ()
Now, void . sequence is called sequence_ in Haskell (since this pattern is fairly common), and sequence . map is called mapM. Putting the two together, the function you want is called mapM_.
mapM_ mkBlob xs :: M ()

Efficiently search for a single element in a large Pandoc

Unless I'm missing something it seems that there are only two ways to "traverse" a Pandoc data-structure:
Manually pattern-matching on Block and Inline constructors
Via the Walkable type-class and related utility function
Using the Walkable type-class, is there an efficient way to search for the first matching element (preferably in a breadth-first manner), and stop the traversal as soon as its found? It seems to me that all functions around the Walkable type-class are going to traverse the entire data structure no matter what.
If not, I guess the only way is to pattern-match the Block and Inline constructors and build this on my own.
The other answer points out the useful query function. I'd add that there's a package of pandoc lenses. You asked about breadth-first traversal too, so here's both.
import Data.Semigroup (First (..))
dfsFirstLink :: Walkable Inline b => b -> Maybe Text
dfsFirstLink = fmap getFirst . query (preview $ _Link . _2 . _1 . to First)
bfsFirstLink :: Walkable Inline b => b -> Maybe Text
bfsFirstLink = fmap getFirst . getConst . traverseOf (levels query . folded) (Const . preview (_Link . _2 . _1 . to First))
-- Construct a walkable value where dfs != bfs
p :: Pandoc
p = Pandoc mempty [Plain [Note [Plain [Link mempty [] ("a","b")]]],Plain [Link mempty [] ("c","d")]]
>> dfsFirstLink p
Just "a"
>> bfsFirstLink p
Just "c"
Though unfortunately some ad-hoc experiments suggest it may not be as lazy as one might hope.
The Walkable typeclass contains a function called query with the following type signature:
query :: Monoid c => (a -> c) -> b -> c
In Data.Semigroup, there's a type called First, with a semigroup instance where the accumulating behavior is to return the "leftmost value".
This can be combined with the Monoid on Maybe, which turns any Semigroup into a Monoid with mempty of Nothing, to give the behavior you want.
For example, adapting a function from Inline -> Maybe String, to Pandoc -> Maybe String, can be done like so:
import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Semigroup
findUrl :: Inline -> Maybe String
findUrl (Link _ _ target) = Just $ fst target
findUrl _ = Nothing
findFirstUrl :: Pandoc -> Maybe String
findFirstUrl = (fmap getFirst) . (query findUrl')
where
findUrl' :: Inline -> Maybe (First String)
findUrl' = (fmap First) . findUrl
With regards to your concern that this will traverse the entire data structure: Haskell is lazy; it shouldn't traverse any further than it needs to.
As pointed out in the comments, it's also possible to write this by specializing query to the List Monoid:
import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Maybe (listToMaybe)
findUrl :: Inline -> [String]
findUrl (Link _ _ target) = [fst target]
findUrl _ = []
findFirstUrl :: Pandoc -> Maybe String
findFirstUrl = listToMaybe . (query findUrl)

Reading from file list of chars or list of ints

I have a question. There is any solution for reading from file list of tuples ? Depends on content ?
I know that if i need to read integers i do something like that:
toTuple :: [String] -> [(Int,Int)]
toTuple = map (\y -> read y ::(Int,Int))
But in file i can have tuples this kind (int,int) or (char, int). Is any way to do this nice ?
I was trying to do this at first in finding sign " ' " . If it was, then reading chars, but it doesn't work for some reason.
[Edit]
To function to tuple, i give strings with tuples, before that i splits lines by space sign.
INPUT EXAMPLE:
Case 1 : ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
Case 2 : ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Just try both and choose the successful:
import Text.Read
import Control.Applicative
choose :: Maybe a -> Maybe b -> Maybe (Either a b)
choose x y = fmap Left x <|> fmap Right y
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
main = do
-- Just (Left [(1,2),(1,3),(3,4),(1,4)])
print $ toTuple ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
-- Just (Right [('a',2),('b',3),('g',8),('h',2),('r',4)])
print $ toTuple ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Here is a far more efficient (and unsafe) version:
readListWithMaybe :: Read a => String -> [String] -> Maybe [a]
readListWithMaybe s ss = fmap (: map read ss) (readMaybe s)
toTuple :: [String] -> Either [(Int, Int)] [(Char, Int)]
toTuple [] = Left []
toTuple (s:ss) = fromJust $ readListWithMaybe s ss `choose` readListWithMaybe s ss
In the first definition of toTuple
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
readListMaybe is too strict:
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
mapM is defined in terms of sequence which is defined in terms of (>>=) which is strict for the Maybe monad. And also the reference to ss is keeped for too long. The second version doesn't have these problems.
As I said it may be a good idea to consider using a parsing library, if the task at hand gets a bit more complicated.
First of all you have the benefit of getting error messages and if you decide to switch to a self declared data Type it is still easily applicable (with slight modifications of course).
Also switching from ByteString to Text (which are both preferable to working with String anyways) is just a matter of (un)commenting 4 lines
Here is some example if you have not had the pleasure to work with it.
I'll explain it some time later today - for I have to leave now.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Attoparsec.ByteString.Char8
import Data.ByteString.Char8 as X
-- import Data.Attoparsec.Text
-- import Data.Text as X
main :: IO ()
main = do print <$> toTuples $ X.unlines ["(1,2)","(1,3)","(3,4)","(1,4)"]
print <$> toTuples $ X.unlines ["('a',2)","('h',2)","('r',4)"]
print <$> toTuples $ X.unlines ["('a',2)","(1,3)","(1,4)"] --works
print <$> toTuples $ "('a',2)" -- yields Right [Right ('a',2)]!!
print <$> toTuples $ "(\"a\",2)" -- yields Right []!!
toTuples = parseOnly (myparser `sepBy` skipSpace :: Parser [Either (Int,Int) (Char,Int)])
where myparser :: Parser (Either (Int,Int) (Char,Int))
myparser = eitherP (tupleP decimal decimal)
(tupleP charP decimal)
charP = do char '\''
c <- notChar '\''
char '\''
return c
tupleP :: Parser a -> Parser b -> Parser (a, b)
tupleP a b = do char '('
a' <- a
skipSpace
char ','
skipSpace
b' <- b
char ')'
return (a',b')
Edit: Explanation
Parser is a monad, so it comes with do-notation which enables us to write the tupleP function in this very convenient form. Same goes for charP - we describe what to parse in the primitives given by the attoparsec library
and it reads something like
first expect a quote
then something that is not allowed to be a quote
and another quote
return the not quote thingy
if you can write down the parser informally you're most likely halfway through writing the haskell code, the only thing left to do is find the primitives in the library or write some auxilary function like tupleP.
A nice thing is that Parsers (being monads) compose nicely so we get our desired parser eitherP (tupleP ..) (tupleP ..).
The only magic that happens in the print <$>.. lines is that Either is a functor and every function using <$> or fmap uses the Right side of the Eithers.
Last thing to note is sepBy returns a list - so in the case where the parsing fails we still get an empty list as a result, if you want to see the failing use sepBy1 instead!

Haskell map does not iterate over the whole list

I'm trying to learn the basics of Haskell while developing a filter for Pandoc to recursively include additional markdown files.
Based on the scripting guide I was able to create a somewhat working filter. This looks for CodeBlocks with the include class and tries to include the ASTs of the referenced files.
```include
section-1.md
section-2.md
#pleasedontincludeme.md
```
The whole filter and the input sources could be found in the following repository: steindani/pandoc-include (or see below)
One could run pandoc with the filter and see the output in markdown format using the following command: pandoc -t json input.md | runhaskell IncludeFilter.hs | pandoc --from json --to markdown
I've noticed that the map function (at line 38) — although gets the list of files to include — only calls the function for the first element. And this is not the only strange behavior. The included file could also have an include block that is processed and the referenced file is included; but it won't go deeper, the include blocks of the last file are ignored.
Why does not the map function iterate over the whole list? Why does it stop after 2 levels of hierarchy?
Please note that I'm just starting to learn Haskell, I'm sure I made mistakes, but I'm happy to learn.
Thank you
Full source code:
module Text.Pandoc.Include where
import Control.Monad
import Data.List.Split
import Text.Pandoc.JSON
import Text.Pandoc
import Text.Pandoc.Error
stripPandoc :: Either PandocError Pandoc -> [Block]
stripPandoc p =
case p of
Left _ -> [Null]
Right (Pandoc _ blocks) -> blocks
ioReadMarkdown :: String -> IO(Either PandocError Pandoc)
ioReadMarkdown content = return (readMarkdown def content)
getContent :: String -> IO [Block]
getContent file = do
c <- readFile file
p <- ioReadMarkdown c
return (stripPandoc p)
doInclude :: Block -> IO [Block]
doInclude cb#(CodeBlock (_, classes, _) list) =
if "include" `elem` classes
then do
files <- return $ wordsBy (=='\n') list
contents <- return $ map getContent files
result <- return $ msum contents
result
else
return [cb]
doInclude x = return [x]
main :: IO ()
main = toJSONFilter doInclude
I can spot the following error in your doInclude function:
doInclude :: Block -> IO [Block]
doInclude cb#(CodeBlock (_, classes, _) list) =
if "include" `elem` classes
then do
let files = wordsBy (=='\n') list
let contents = map getContent files
let result = msum contents -- HERE
result
else
return [cb]
doInclude x = return [x]
Since the type of the result of this whole function is IO [Block], we can work backward:
result has type IO [Block]
contents has type [IO [Block]]
msum is being used with type [IO [Block]] -> IO [Block]
And that third part is the problem—somehow in your program, there is a non-standard MonadPlus instance being loaded for IO, and I bet that what it does on msum contents is this:
Execute the first action
If that succeeds, produce the same result as that and discard the rest of the list. (This is the cause of the behavior you observe.)
If it fails with an exception, try the rest of the list.
This isn't a standard MonadPlus instance so it's coming from one of the libraries that you're importing. I don't know which.
A general recommendation here would be:
Split your program into smaller functions
Write type signatures for those functions
Because the problem here seems to be that msum is being used with a different type than the one you expect. Normally this would produce a type error, but here you got unlucky and it interacted with a strange type class instance in some library.
From the comments, your intent with msum contents was to create an IO action that executes all of the subactions in sequence, and collects their result as a list. Well, the MonadPlus class isn't normally defined for IO, and when it is it does something else. So the correct function to use here is sequence:
-- Simplified version, the real one is more general:
sequence :: Monad m => [m a] -> m [a]
sequence [] = return []
sequence (ma:mas) = do
a <- ma
as <- mas
return (a:as)
That gets you from [IO [Block]] to IO [[Block]]. To eliminate the double nested lists then you just use fmap to apply concat inside IO.

Count `Pat`s in a Module

I need to count the number of Pat in a haskell Module. I know the simplest way is to pattern match on each level of the AST, which will result in a huge function that looks like the entire AST. I believe there's some way to take advantage of typeclasses like Functor or the State Monad to lean on some existing function that walks the tree (like prettyPrint) and trace a counter along, but I'm not sure how it works exactly.
It's very easy using uniplate:
import Data.Data
import Data.Generics.Uniplate.Data
import Control.Monad
import Language.Haskell.Exts
findPats :: Data a => a -> [Pat]
findPats = universeBi
test = do
content <- readFile "Simple.hs"
case parseModule content of
ParseFailed _ e -> error e
ParseOk a -> do
forM_ (findPats a) $ \p -> do
putStrLn $ "got a pat: " ++ show p
Essentially it's just the universeBi function.

Resources