Test if it exists a node HXT - haskell

Is there a way to test a node (Attribute Value) and use it with my if-condition ?
For example:
import Text.XML.HXT.Core
import System.Environment --para uso do getArgs
import Data.List.Split (splitOn)
data Class = Class { name ::StringĀ }
deriving (Show,Eq)
main = do
[src]<- getArgs
teams <- runX(readDocument [ withValidate no] src >>> getClass)
print teams
--Test
test = if (True) then getAttrValue "rdf:about" else getAttrValue "rdf:ID"
atTag tag = deep (isElem >>> hasName tag)
getClass = atTag "owl:Class" >>>
proc l -> do
className <- test -< l
returnA -< Class { name = splitOn "#" className !! 1}
On that example, i would like to test an attribute value and if it exists it return my then-condition otherwise the else-condition !
I saw the API of XMLArrow and it exists some function which will be able to do it (for example, isAttrib or hasAttrib) But it doesn't return a boolean ...
So ... I thought on other ways for solving it, but i think there must be a simpler solution to solve that...
Can someone gives me a hint please ?

You can use the functions of the module Control.Arrow.ArrowIf of the hxt package. Here you find the function ifA, a lifted version of the if-else-statement. For example the code
if (True) then getAttrValue "rdf:about" else getAttrValue "rdf:ID"
should be written as
ifA (constA True) (getAttrValue "rdf:about") (getAttrValue "rdf:ID")
Depending of what you want to archive, you should use derived functions of ifA like guards.

Related

Catching and halting on syntax exceptions during parsing with HXT

Suppose an XML based language where the node attribute animal is illegal. Consider:
{-# LANGUAGE Arrows, RankNTypes #-}
module Lib ( parseXml ) where
import Control.Arrow
import Text.XML.HXT.Core
parseXml = runLA (xread >>> isElem >>> myParser) content
where
content = unlines
[ "<pet animal='cat'>felix</pet>"
, "<pet>milo</pet>"
, "<pet animal='rat'>tom</pet>" ]
myParser :: (ArrowXml a) => a XmlTree [String]
myParser = proc xml -> do
x <- isElem -< xml
pets <- (getText <<< getChildren <<< neg (hasAttr "animal")) >. id -< x
returnA -< pets
The result of evaluating parseXml is:
parseXml
[[],["milo"],[]]
Which is not what is intended. The parser has silently ignored the 1st and 3rd XML nodes since they do not conform to the myParser, specifically neg (hasAttr "animal"), but I'd like the behaviour to not silently ignore them, but instead halt parsing at the 1st XML node.
How can I change this code such that HXT throw an error if it ever encounters our syntax violation that an XML node can never have an "animal" attribute? That is, when attempting to parse the 1st XML node it returns a Left value e.g.
parseXml
Left (ParseError "'animal' attribute is not permitted")
Try err and its neighbors with when.

Introspecting function names in a GHC module

Is there a way in the Glasgow Haskell Compiler to introspect the names of all functions in a module?
I am trying to create an automatic database migration system that, given the names of migration modules, introspects the names of the functions inside and calls them one at a time.
Something like
doMigrations("Migrations.M_2015")
doMigrations("Migrations.M_2016")
-- ...
where Migration.M_2015 contains
module Migration.M_2015
where
migration_2015_01_02 :: DbConnection -> Status
migration_2015_01_02 connection =
-- ...
Each doMigration will reflect the names of the migration functions in its module and only call those that have not been run before (names saved in a DB table). This will only be called at application start-up, so performance is not a big issue. The reflection can occur at either compile-time or run-time.
In order to do this, you need to use the GHC API -- which is included in the ghc package (which is hidden) -- and is poorly documented.
I attach here a simple program which will print out the list of top level items exported in a module. This should serve as a starting point. This is a little command line utility which takes two arguments -- a module name and the word "class", "data", "function". So, for example:
test Prelude function
will print a list of functions exported by the module (those that are not constructors or defined in a class).
In order to compile this (assuming it is in test.hs) you will need to do:
ghc -package ghc test
in order to make the GHC API packages available.
Here's the code:
import Data.List ( (\\) )
import Data.Maybe (fromJust, catMaybes)
import System.Environment (getArgs)
-- the GHC API stuff
import GHC
import GHC.Paths (libdir)
import ConLike ( ConLike(..) )
import Outputable (showPpr, showSDocUnqual)
import Var (tyVarName)
showU dfs = showSDocUnqual dfs . pprParenSymName
main = do
(mn : ty : _) <- getArgs
a <- runGhc (Just libdir) $ do
dflags <- getSessionDynFlags
setSessionDynFlags dflags
mm <- lookupModule (mkModuleName mn) Nothing
mi <- fmap fromJust $ getModuleInfo mm
res <- fmap catMaybes $ mapM lookupName (modInfoExports mi)
return $ case ty of
"class" -> [showU dflags c' | c#(ATyCon c') <- res, isClassTyCon c']
"data" -> [showU dflags c' | c#(ATyCon c') <- res, (not . isClassTyCon) c']
"function" -> let cf = map getName $ concat [(classMethods . fromJust . tyConClass_maybe) c' | c#(ATyCon c') <- res, isClassTyCon c']
df = map getName $ concat [ tyConDataCons c' | c#(ATyCon c') <- res, (not . isClassTyCon) c']
ff = [ getName c | c#(AnId{}) <- res] \\ cf
fd = [ getName c | c#(AConLike (RealDataCon{})) <- res] \\ df
in [showU dflags x | x <- ff ++ fd]
_ -> ["need to specify: class, data, or function"]
print a
The list of classes and defined data are pretty straightforward. The list of defined functions includes functions defined in classes and constructors. The above code, for functions, excludes these ( with \\ cf and \\ df ).
a is the generated list of function (or class or data) names.
The code which would invoke these functions would be the subject of a different question (and answer).
lookupModule is the function which loads the module for analysis.
The combination of getModuleInfo and modInfoExports get the list of "stuff" which includes the list of functions exported from the module.
The rest of the code is about getting those names in a usable form.

Logical OR in HXT without duplicating results

I'm having a little trouble with HXT: I am trying to locate all the nodes in a document that match some criteria, and I'm trying to combine the lenses/XPaths as predicates in an OR-like fashion, using Control.Arrow.<+>, as this guide suggests. However, when I try to "run" the arrow on my document, I am getting duplicate results. Is there an easy way to remove the duplicates, or to combine the tests in a more meaningful way? Here is my code:
run :: App -> IO ()
run a = do
inputContents <- readFile (input a)
let doc = readString [withParseHTML yes, withWarnings no] inputContents
links <- runX . xshow $ doc >>> indentDoc //> cssLinks
mapM_ putStrLn links
cssLinks = links >>> (rels <+> hrefs <+> types)
where
links = hasName "link"
rels = hasAttrValue "rel" (isInfixOf "stylesheet")
hrefs = hasAttrValue "href" (endswith ".css")
types = hasAttrValue "type" (== "text/css")
Yet every time I run this (on any web page), I get duplicated results / nodes. I noticed that <+> is part of the ArrowPlus typeclass, which mimics a monoid, and ArrowXML is an instance of both ArrowList and ArrowTree, which gives me a lot to work with. Would I have to construct ArrowIf predicates? Any help with this would be wonderful :)
You may get the arrow result as a [XmlTree], then apply List.nub, then get the string rep.
import "hxt" Text.XML.HXT.DOM.ShowXml as SX
...
links <- runX $ doc >>> indentDoc //> cssLinks
-- first remove duplicates (List.nub) then apply SX.xshow
putStrLn (SX.xshow . L.nub $ links)

split a word using HXT

I would like to know, how am i able to split a word with HXT ?
For example :
I have that ->
{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
import Text.XML.HXT.Core
import System.Environment --para uso do getArgs
data Class = Class { name ::StringĀ }
deriving (Show,Eq)
main = do
[src]<- getArgs
teams <- runX(readDocument [ withValidate no] src >>> getClass)
print teams
atTag tag = deep (isElem >>> hasName tag)
getClass = atTag "owl:Class" >>>
proc l -> do
className <- getAttrValue "rdf:about" -< l
returnA -< Class { name = className }
And i want to split the word ClassName !
Because the result of that programs (teams), gives me a set of hyperlink website (http:// ......) ! (Due to the XML file )
Can anyone give me some hints to solve it, please ?
Thank you !
You can use the function splitOn of the package split:
{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
import Text.XML.HXT.Core
import Data.List.Split (splitOn)
...
getClass = atTag "owl:Class" >>>
proc l -> do
className <- getAttrValue "rdf:about" -< l
returnA -< Class { name = splitOn "#" className !! 1 }
Example in ghci:
> import Data.List.Split
> let className = "http://www.xfront.com/owl/ontologies/camera/#Window"
> splitOn "#" className !! 1
Loading package split-0.2.2 ... linking ... done.
"Window"
The above code just works, if there is just one "#" in all of your URLs. If they are more complex, you shall have a look an the package Parsec.

Running Haskell HXT outside of IO?

All the examples I've seen so far using the Haskell XML toolkit, HXT, uses runX to execute the parser. runX runs inside the IO monad. Is there a way of using this XML parser outside of IO? Seems to be a pure operation to me, don't understand why I'm forced to be inside IO.
You can use HXT's xread along with runLA to parse an XML string outside of IO.
xread has the following type:
xread :: ArrowXml a => a String XmlTree
This means you can compose it with any arrow of type (ArrowXml a) => a XmlTree Whatever to get an a String Whatever.
runLA is like runX, but for things of type LA:
runLA :: LA a b -> a -> [b]
LA is an instance of ArrowXml.
To put this all together, the following version of my answer to your previous question uses HXT to parse a string containing well-formed XML without any IO involved:
{-# LANGUAGE Arrows #-}
module Main where
import qualified Data.Map as M
import Text.XML.HXT.Arrow
classes :: (ArrowXml a) => a XmlTree (M.Map String String)
classes = listA (divs >>> pairs) >>> arr M.fromList
where
divs = getChildren >>> hasName "div"
pairs = proc div -> do
cls <- getAttrValue "class" -< div
val <- deep getText -< div
returnA -< (cls, val)
getValues :: (ArrowXml a) => [String] -> a XmlTree (String, Maybe String)
getValues cs = classes >>> arr (zip cs . lookupValues cs) >>> unlistA
where lookupValues cs m = map (flip M.lookup m) cs
xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\
\<div class='c3'>123</div><div class='c4'>234</div></div>"
values :: [(String, Maybe String)]
values = runLA (xread >>> getValues ["c1", "c2", "c3", "c4"]) xml
main = print values
classes and getValues are similar to the previous version, with a few minor changes to suit the expected input and output. The main difference is that here we use xread and runLA instead of readString and runX.
It would be nice to be able to read something like a lazy ByteString in a similar manner, but as far as I know this isn't currently possible with HXT.
A couple of other things: you can parse strings in this way without IO, but it's probably better to use runX whenever you can: it gives you more control over the configuration of the parser, error messages, etc.
Also: I tried to make the code in the example straightforward and easy to extend, but the combinators in Control.Arrow and Control.Arrow.ArrowList make it possible to work with arrows much more concisely if you like. The following is an equivalent definition of classes, for example:
classes = (getChildren >>> hasName "div" >>> pairs) >. M.fromList
where pairs = getAttrValue "class" &&& deep getText
Travis Brown's answer was very helpful. I just want to add my own solution here, which I think is a bit more general (using the same functions, just ignoring the problem-specific issues).
I was previously unpickling with:
upIO :: XmlPickler a => String -> IO [a]
upIO str = runX $ readString [] str >>> arrL (maybeToList . unpickleDoc xpickle)
which I was able to change to this:
upPure :: XmlPickler a => String -> [a]
upPure str = runLA (xreadDoc >>> arrL (maybeToList . unpickleDoc xpickle)) str
I completely agree with him that doing this gives you less control over the configuration of the parser etc, which is unfortunate.

Resources