I've written a simple XML parser in Haskell.
The function convertXML recieves contents of a XML file and returns a list of extracted values that are further processed.
One attribute of XML tag contains also an URL of a product image and I would like to extend the function to also download it if the tag is found.
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> [String]
convertXML xml = productToCSV products
where
productToCSV [] = []
productToCSV (x:xs) = (getFields x) ++ (productToCSV
(elChildren x)) ++ (productToCSV xs)
getFields elm = case (qName . elName) elm of
"product" -> [attrField "uid", attrField "code"]
"name" -> [trim $ strContent elm]
"annotation" -> [trim $ strContent elm]
"text" -> [trim $ strContent elm]
"category" -> [attrField "uid", attrField "name"]
"manufacturer" -> [attrField "uid",
attrField "name"]
"file" -> [getImgName]
_ -> []
where
attrField fldName = trim . fromJust $
findAttr (unqual fldName) elm
getImgName = if (map toUpper $ attrField "type") == "FULL"
then
-- here I need some IO code
-- to download an image
-- fetchFile :: String -> IO String
attrField "file"
else []
products = findElements (unqual "product") productsTree
productsTree = fromJust $ findElement (unqual "products") xmlTree
xmlTree = fromJust $ parseXMLDoc xml
Any idea how to insert an IO code in the getImgName function or do I have to completely rewrite convertXML function to an impure version ?
UPDATE II
Final version of convertXML function. Hybrid pure/impure but clean way suggested by Carl. Second parameter of returned pair is an IO action that runs images downloading and saving to disk and wraps list of local paths where are images stored.
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [String])
convertXML xml = productToCSV products (return [])
where
productToCSV :: [Element] -> IO String -> ([String], IO [String])
productToCSV [] _ = ([], return [])
productToCSV (x:xs) (ys) = storeFields (getFields x)
( storeFields (productToCSV (elChildren x) (return []))
(productToCSV xs ys) )
getFields elm = case (qName . elName) elm of
"product" -> ([attrField "uid", attrField "code"], return [])
"name" -> ([trim $ strContent elm], return [])
"annotation" -> ([trim $ strContent elm], return [])
"text" -> ([trim $ strContent elm], return [])
"category" -> ([attrField "uid", attrField "name"], return [])
"manufacturer" -> ([attrField "uid",
attrField "name"], return [])
"file" -> getImg
_ -> ([], return [])
where
attrField fldName = trim . fromJust $
findAttr (unqual fldName) elm
getImg = if (map toUpper $ attrField "type") == "FULL"
then
( [attrField "file"], fetchFile url >>=
saveFile localPath >>
return [localPath] )
else ([], return [])
where
fName = attrField "file"
localPath = imagesDir ++ "/" ++ fName
url = attrField "folderUrl" ++ "/" ++ fName
storeFields (x1s, y1s) (x2s, y2s) = (x1s ++ x2s, liftM2 (++) y1s y2s)
products = findElements (unqual "product") productsTree
productsTree = fromJust $ findElement (unqual "products") xmlTree
xmlTree = fromJust $ parseXMLDoc xml
The better approach would be to have the function return the list of files to download as part of the result:
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], [URL])
and download them in a separate function.
The entire point of the type system in Haskell is that you can't do IO except with IO actions - values of type IO a. There are ways to violate this, but they run the risk of behaving entirely unlike what you'd expect, due to interactions with optimizations and lazy evaluation. So until you understand why IO works the way it does, don't try to make it work differently.
But a very important consequence of this design is that IO actions are first class. With a bit of cleverness, you could write your function as this:
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [Image])
The second item in the pair would be an IO action that, when executed, would give a list of the images present. That would avoid the need to have image loading code outside of convertXML, and it would allow you to do IO only if you actually needed the images.
I basically see to approaches:
let the function give out a list of found images too and process them with an impure function afterwards. Laziness will do the rest.
Make the whole beast impure
I generally like the first approach more. d
Related
So I'm writing a program that checks for every line of a .txt file whether it is a palindrome or not,
import System.IO
main :: IO()
main = do {
content <- readFile "palindrom.txt";
print content;
print (lines content);
singleWord (head (lines content));
return ();
}
palindrom :: [Char] -> Bool
palindrom a = a == reverse a
singleWord :: [Char] -> IO()
singleWord a = do {
print (length a);
print (show (palindrom a));
}
But instead of singleWord (head (lines content)) I need to run the singleWord through the entire list.
The problem is that with map or normal list comprehension I always get a ton of varying errors all to do with lines content (which should be an array of Strings or IO Strings) apparently always being the type I don't want (I've tried messing around with type declarations on that forever, but it keeps being the wrong type, or the right one but in an extra array-layer or whatever).
My last attempt is to walk through the array with recursion, with this little extra code:
walkthrough [] = []
walkthrough x = do { singleWord head x; walkthrough (tail x) }
which I can't typecast correctly no matter what.
It's supposed to replace the singleWord (head (lines content)) in main, and if I try anything with typeclassing, like
walkthrough :: [[Char]] -> [[Char]]
walkthrough [] = ["Hi"]
walkthrough x = do { singleWord head x; walkthrough (tail x) }
I get
Couldn't match type `IO' with `[]'
Expected type: [()]
Actual type: IO ()
or some other stuff that won't fit together.
You're looking for a function called mapM_.
main :: IO ()
main = do {
content <- readFile "palindrom.txt";
mapM_ singleWord (lines content);
};
palindrome :: [Char] -> Bool
palindrome a = (a == reverse a)
singleWord :: [Char] -> IO()
singleWord a = do {
let {
adverb = (if palindrome a then " " else " not ");
};
putStrLn (a ++ " is" ++ adverb ++ "a palindrome.");
};
That should've been
walkthrough [] = return () -- this is the final action
walkthrough x = do { singleWord (head x) -- here you missed the parens
; walkthrough (tail x) }
or better yet,
walkthrough [] = return ()
walkthrough (x:xs) = do { singleWord x -- can't make that mistake now!
; walkthrough xs}
and call it as walkthrough (lines content) in your main do block.
As others have pointed out, walkthrough is the same as mapM_ singleWord.
You could also write it with a list comprehension,
walkthrough xs = sequence_ [ singleWord x | x <- xs]
sequence_ :: Monad m => [m a] -> m () turns a list of actions into a sequence of actions discarding their results and producing the () in the end: sequence_ = foldr (>>) (return ()). And sequence_ (map f xs) === mapM_ f xs, so it all ties up in the end.
Use mapM_ singleWord (lines content). For the sake of simplicity, think of mapM_ as.
mapM_ :: (a -> IO ()) -> [a] -> IO ()
I want to lazily read user input and do something with it line by line. But if user ends a line with , (comma) followed by any number of spaces (including zero), I want give him opportunity to finish his input on the next line.
And here is what I've got:
import System.IO
import Data.Char
chop :: String -> [String]
chop = f . map (++ "\n") . lines
where f [] = []
f [x] = [x]
f (x : y : xs) = if (p . tr) x
then f ((x ++ y) : xs)
else x : f (y : xs)
p x = (not . null) x && ((== ',') . last) x
tr xs | all isSpace xs = ""
tr (x : xs) = x :tr xs
main :: IO ()
main =
do putStrLn "Welcome to hell, version 0.1.3!"
putPrompt
mapM_ process . takeWhile (/= "quit\n") . chop =<< getContents
where process str = putStr str >> putPrompt
putPrompt = putStr ">>> " >> hFlush stdout
Sorry, it doesn't work at all. Bloody mess.
P.S. I want to preserve \n characters on end of every chunk. Currently I add them manually with map (++ "\n") after lines.
How about changing the type of chop a little:
readMultiLine :: IO [String]
readMultiLine = do
ln <- getLine
if (endswith (rstrip ln) ",") then
liftM (ln:) readMultiLine
else
return [ln]
Now you know that if the last list is not empty, then the user didn't finish typing (the last input ended with ',').
Of course, either import Data.String.Utils, or write your own. Could be as simple as:
endswith xs ys = (length xs >= length ys)
&& (and $ zipWith (==) (reverse xs) (reverse ys))
rstrip = reverse . dropWhile isSpace . reverse
But I missed the point at first. Here's the actual thing.
unfoldM :: (Monad m) => (a -> Maybe (m b, m a)) -> a -> m [b]
unfoldM f z = case f z of
Nothing -> return []
Just (x, y) -> liftM2 (:) x $ y >>= unfoldM f
main = unfoldM (\x -> if (x == ["quit"]) then Nothing
else Just (print x, readMultiLine)) =<< readMultiLine
The reason is, you need to be able to insert the "action" to be done on input between reading one multi-line input and the next. Here print x is the action inserted between two readMultiLine
Since you have questions about getContents, let me add. Even though getContents provides a lazy String, its effectful changes to the world are ordered with the subsequent effects of processing the list. But the processing of the list attempts to insert effects between effects of reading particular list items. To do that, you need a function that exposes the chain of effects, so you can insert your own effects between them.
You can do this using pipes, preserving the laziness of the user's input
import Data.Char (isSpace)
import Pipes
import qualified Pipes.Prelude as Pipes
endsWithComma :: String -> Bool
endsWithComma str =
case (dropWhile isSpace $ reverse str) of
',':_ -> True
_ -> False
finish :: Monad m => Pipe String String m ()
finish = do
str <- await
yield str
if endsWithComma str
then do
str' <- await
yield str'
else finish
user :: Producer String IO ()
user = Pipes.stdinLn >-> finish
You can then hook up the user Producer to any downstream Consumer. For example, to echo the stream back out you can write:
main = runEffect (user >-> Pipes.stdoutLn)
To learn more about pipes you can read the tutorial.
Sorry, I wrote something wrong in a comment and I thought that now that I understood what you were trying to do, I'd give an answer with a little more substance. The core idea is that you're going to need a state buffer while you loop through the string, as far as I can tell. You have f :: [String] -> [String] but you'll need an extra string of buffer before you can solve this puzzle.
So let me assume an answer which looks like:
chop = joinCommas "" . map (++ "\n") . lines
Then the structure of joinCommas is going to look like:
import Data.List (isSuffixOf)
-- override with however you want to handle the ",\n" between lines.
joinLines = (++)
incomplete = isSuffixOf ",\n"
joinCommas :: String -> [String] -> [String]
joinCommas prefix (line : rest)
| incomplete prefix = joinCommas (joinLines prefix line) rest
| otherwise = prefix : joinCommas line rest
joinCommas prefix []
| incomplete prefix = error "Incomplete input"
| otherwise = [prefix]
The prefix stores up lines until it doesn't end with ",\n" at which point it emits the prefix and continues with the rest of the lines. On EOF we process the last line unless that line is incomplete.
As title states, I am trying to find a given string within a given path. Here is what I come up so far:
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
names <- getDirectoryContents topdir
let properNames = filter (`notElem` [".", ".."]) names
paths <- forM properNames $ \name -> do
let path = topdir </> name
isDirectory <- doesDirectoryExist path
if isDirectory
then getRecursiveContents path
else return [path]
return (concat paths)
findInFile:: String -> FilePath -> IO Bool
findInFile needle filePath= do
content <- readFile filePath
return (needle `L.isInfixOf` content)
findInFolder:: (String -> Bool) -> FilePath -> String -> IO [IO Bool]
findInFolder p path needle = do
files <- getRecursiveContents path
return (map (findInFile needle) (filter p files))
find = findInFolder (\p -> takeExtension p `elem` [".py", ".xml", ".html"])
I can :
*Main> findInFile "search_string" "./path/to/a/file"
True
Which is perfect but I cannot do the same search for a folder:
*Main> find "./path/to/a/folder" "search_string"
*Main>
In my file system ./path/to/a/file is located under ./path/to/a/folder. Thus I was expecting the same result.
What am I doing wrong?
Note: getRecursiveContents is from real world haskell.
It does indeed work. The only issue is with how things are printed. When you type some expressions into ghci, it will call print on that expression. If the value has type IO x, it will execute the IO action and print x only if it has a Show instance; otherwise it prints no additional information.
find "./path/to/a/folder" "search_string" produces a list of IO actions, which have no Show instance. You can get the result of find, which is again a list of IO actions, and then execute them:
> x <- find "./path/to/a/folder" "search_string"
> sequence x
> [True, False ...
Likely you wanted to do this originally in your function. Simply make the following changes:
findInFolder:: (String -> Bool) -> FilePath -> String -> IO [Bool]
findInFolder p path needle = do
files <- getRecursiveContents path
mapM (findInFile needle) (filter p files)
Now findInFolder will work as you expect.
How do I dynamically generate forms with a varying number of input fields?
The closest I managed is:
listEditForm :: [String] -> Html -> MForm App App (FormResult Text, Widget)
listEditForm xs = renderDivs $ mconcat [ areq textField (String.fromString x) Nothing | x <- xs]
but this has the result type Text and not [Text] as intended, owning to the coincidence that Text is an instance of Monoid, e.g. it fails with Int.
I have a working alternate attempt, which combines several forms, but somehow it only works for this toy example, while the real attempt fails strangely. Anyhow, I don't think this is the correct approach:
data MapPair = MapPair { mpKey :: T.Text, mpValue :: Maybe T.Text }
editForm mmp = renderTable $ MapPair
<$> areq textField "Key" (mpKey <$> mmp)
<*> aopt textField "Value" (mpValue <$> mmp)
pair2mp (v,k) = MapPair { mpKey = v, mpValue = Just k }
getEditR = do
sess <- getSession
let sesslist = Map.toList $ Map.map (decodeUtf8With lenientDecode) sess
forms <- forM sesslist (\a -> generateFormPost $ editForm $ Just $ pair2mp a)
defaultLayout [whamlet|
<h1>Edit Value Pairs
$forall (widget,enctype) <- forms
<form method=post action=#{EditR} enctype=#{enctype}>
^{widget}
<input type=submit>
|]
postEditR = do
sess <- getSession
let sesslist = Map.toList $ Map.map (decodeUtf8With lenientDecode) sess
forM_ sesslist (\a -> do
((res,_),_) <- runFormPost $ editForm $ Just $ pair2mp a
case res of
(FormSuccess (MapPair {mpKey=mk, mpValue=(Just mv)})) -> setSession mk mv
_ -> return ()
)
defaultLayout [whamlet|ok|]
Duh, it is actually easy using monadic forms (see code below).
My major headache is the extra text fields to make sure that the handler which receives the answer may also infer the corresponding question. Maybe I can hide those text fields, make them uneditable, or find another way around that (but I don't know much about Html yet).
listEditMForm :: [(String,Int)] -> Html -> MForm App App (FormResult [(FormResult Int, FormResult Text)], Widget)
listEditMForm xs extra = do
ifields <- forM xs (\(s,i) -> mreq intField (String.fromString s) (Just i))
tfields <- forM xs (\(s,i) -> mreq textField (String.fromString s) (Just $ pack s))
let (iresults,iviews) = unzip ifields
let (tresults,tviews) = unzip tfields
let results = zip iresults tresults
let views = zip iviews tviews
let widget = [whamlet|
#{extra}
<h1>Multi Field Form
$forall (iv,tv) <- views
Field #
#{fvLabel iv}: #
^{fvInput tv} #
^{fvInput iv}
<div>
|]
return ((FormSuccess results), widget)
There are also still some ugly things that I have no clue about, like always wrapping the result always in an outermost FormSuccess constructor, but I guess that really depends on each use-case (e.g. a single FormFailure or FormMissing should probably make the whole form fail/missing as well, but maybe in some case this is not wanted.)
All the zipping and unzipping can probably be done more neatly, but I guess in my case I just create a combined field textintField. I think I know how to do it, but it would be neat if there were a function to combine fields.
The tricky thing with having a dynamic number of fields is that the number of rows/fields need to be known when the form is parsed in the handler.
Let's say we have a regular form that looks like this:
type Form m a b =
(MonadHandler m, m ~ HandlerFor App) =>
Maybe a ->
Html ->
MForm m (FormResult b, Widget)
nameAndAgeForm :: Form m (Text, Int) (Text, Int)
nameAndAgeForm mPair extra = do
let nameSettings =
FieldSettings
{ fsLabel = "name"
, fsTooltip = Nothing
, fsId = Nothing
, fsName = Nothing
, fsAttrs = []
}
(nameResult, nameField) <- mreq textField nameSettings (fst <$> mPair)
let ageSettings =
FieldSettings
{ fsLabel = "age"
, fsTooltip = Nothing
, fsId = Nothing
, fsName = Nothing
, fsAttrs = []
}
(ageResult, ageField) <- mreq intField ageSettings (snd <$> mPair)
let result = (,) <$> nameResult <*> ageResult
let widget = [whamlet|age: ^{fvInput nameField}, age: ^{fvInput ageField}^{extra}|]
pure (result, widget)
NOTE it's important that fsName = Nothing in all of the fields or they will collide with themselves when we try to repeat the form in a list.
We can turn it into a form of lists of pairs with a function with that has the following signature Form m a b -> Form m [a] [b].
We can write such a function if we use a trick to get around the problem that the number of fields must be known when parsing. We can send the number of rows as the first field to be parsed.
listifyForm :: Form m a b -> Form m [a] [b]
listifyForm form items csrf = do
let countSettings =
FieldSettings
{ fsLabel = "rowCount"
, fsTooltip = Nothing
, fsId = Nothing
, fsName = Just "listifiedFormRowCount"
, fsAttrs = []
}
(rowCountResult, rowCountField) <- mreq hiddenField countSettings (length <$> items)
case (rowCountResult, items) of
(FormSuccess rowCount, _) -> constructForms rowCountField $ replicate rowCount Nothing
(FormMissing, Just items') -> constructForms rowCountField $ Just <$> items'
(FormFailure err, _) -> pure (FormFailure err, [whamlet|Something went wrong with the form. Do all the fields have unique ID's?|])
(FormMissing, _) -> pure (FormMissing, [whamlet|Something went wrong with the form|])
where
constructForms rowCountField mItems =
fmap ([whamlet|^{csrf}^{fvInput rowCountField}|] <>) . bimap sequenceA mconcat . unzip
<$> traverse (flip form mempty) mItems
Now we can convert the nameAndAgeForm into a nameAndAgeListForm:
nameAndAgeListForm :: Form m [(Text, Int)] [(Text, Int)]
nameAndAgeListForm = listifyForm nameAndAgeForm
That can then be called like this in the handler that displays the form:
((_, namesAndAgesWidget), _) <- runFormPost $ nameAndAgeListForm $ Just [("Alice", 12), ("Bob", 34)]
And like this in the handler that handles the input:
((result, _), _) <- runFormPost $ nameAndAgeListForm Nothing
I have the following code:
import System.Environment
import System.Directory
import System.IO
import Data.List
dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("add", add)
, ("view", view)
, ("remove", remove)
, ("bump", bump)
]
main = do
(command:args) <- getArgs
let result = lookup command dispatch
if result == Nothing then
errorExit
else do
let (Just action) = result
action args
errorExit :: IO ()
errorExit = do
putStrLn "Incorrect command"
add :: [String] -> IO ()
add [fileName, todoItem] = appendFile fileName (todoItem ++ "\n")
view :: [String] -> IO ()
view [fileName] = do
contents <- readFile fileName
let todoTasks = lines contents
numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [0..] todoTasks
putStr $ unlines numberedTasks
remove :: [String] -> IO ()
remove [fileName, numberString] = do
handle <- openFile fileName ReadMode
(tempName, tempHandle) <- openTempFile "." "temp"
contents <- hGetContents handle
let number = read numberString
todoTasks = lines contents
newTodoItems = delete (todoTasks !! number) todoTasks
hPutStr tempHandle $ unlines newTodoItems
hClose handle
hClose tempHandle
removeFile fileName
renameFile tempName fileName
bump :: [String] -> IO ()
bump [fileName, numberString] = do
handle <- openFile fileName ReadMode
(tempName, tempHandle) <- openTempFile "." "temp"
contents <- hGetContents handle
let number = read numberString
todoTasks = lines contents
bumpedItem = todoTasks !! number
newTodoItems = [bumpedItem] ++ delete bumpedItem todoTasks
hPutStr tempHandle $ unlines newTodoItems
hClose handle
hClose tempHandle
removeFile fileName
renameFile tempName fileName
Trying to compile it gives me the following error:
$ ghc --make todo
[1 of 1] Compiling Main ( todo.hs, todo.o )
todo.hs:16:15:
No instance for (Eq ([[Char]] -> IO ()))
arising from a use of `=='
Possible fix:
add an instance declaration for (Eq ([[Char]] -> IO ()))
In the expression: result == Nothing
In a stmt of a 'do' block:
if result == Nothing then
errorExit
else
do { let (Just action) = ...;
action args }
In the expression:
do { (command : args) <- getArgs;
let result = lookup command dispatch;
if result == Nothing then
errorExit
else
do { let ...;
.... } }
I don't get why is that since lookup returns Maybe a, which I'm surely can compare to Nothing.
The type of the (==) operator is Eq a => a -> a -> Bool. What this means is that you can only compare objects for equality if they're of a type which is an instance of Eq. And functions aren't comparable for equality: how would you write (==) :: (a -> b) -> (a -> b) -> Bool? There's no way to do it.1 And while clearly Nothing == Nothing and Just x /= Nothing, it's the case that Just x == Just y if and only if x == y; thus, there's no way to write (==) for Maybe a unless you can write (==) for a.
There best solution here is to use pattern matching. In general, I don't find myself using that many if statements in my Haskell code. You can instead write:
main = do (command:args) <- getArgs
case lookup command dispatch of
Just action -> action args
Nothing -> errorExit
This is better code for a couple of reasons. First, it's shorter, which is always nice. Second, while you simply can't use (==) here, suppose that dispatch instead held lists. The case statement remains just as efficient (constant time), but comparing Just x and Just y becomes very expensive. Second, you don't have to rebind result with let (Just action) = result; this makes the code shorter and doesn't introduce a potential pattern-match failure (which is bad, although you do know it can't fail here).
1:: In fact, it's impossible to write (==) while preserving referential transparency. In Haskell, f = (\x -> x + x) :: Integer -> Integer and g = (* 2) :: Integer -> Integer ought to be considered equal because f x = g x for all x :: Integer; however, proving that two functions are equal in this way is in general undecidable (since it requires enumerating an infinite number of inputs). And you can't just say that \x -> x + x only equals syntactically identical functions, because then you could distinguish f and g even though they do the same thing.
The Maybe a type has an Eq instance only if a has one - that's why you get No instance for (Eq ([[Char]] -> IO ())) (a function can't be compared to another function).
Maybe the maybe function is what you're looking for. I can't test this at the moment, but it should be something like this:
maybe errorExit (\action -> action args) result
That is, if result is Nothing, return errorExit, but if result is Just action, apply the lambda function on action.