Recursive list function in Haskell

I am currently working on a Haskell program that analyzes a website and tries to find all links (href) that belong to it. I was already able to extract all the links of the main page, but I am struggling with the recursion, since I want to follow the links I already found and repeat the same process on them.
This is what I have so far:
parseHtml = fmap LB.unpack . simpleHttp
filterFunc x y = -- damn long line with a lot of filters
main :: IO ()
main = do
  let site = "https://stackoverflow.com/"
  url <- parseHtml site
  let links = filterFunc site url
  mapM_ print $ take 5 $ links
And this is my output so far:
"https://stackoverflow.com/company/about"
"https://stackoverflow.com/company/work-here"
"https://stackoverflow.com/help"
"https://stackoverflow.com/jobs/directory/developer-jobs"
"https://stackoverflow.com/questions/44691577/stream-versus-iterators-in-set"
I just need a hint on how to further proceed and how to visit the already found links again. Should I work with fold?

Link finding is essentially a graph traversal problem, which can be tricky in Haskell because of functional purity: it's hard to explicitly mark nodes (links) as visited or not through the use of an external history table.
Your typical traversal algorithm might look something like this:
function traverse(current_node) {
  if (current_node.is_visited) {
    return some_data;
  } else {
    current_node.is_visited = true; // Hard in Haskell!
    accumulated_data = ...;
    for (child in current_node.children()) {
      accumulated_data += traverse(child); // Recursion happens here.
    }
    return accumulated_data;
  }
}
Because there is not an easy, direct way to mark a node as visited or not, we can try other solutions. For instance, we might consider something of the sort:
traverse :: ([URL], Data) -> URL -> ([URL], Data)
traverse (history, datum) current = let ... in ([new_history], accumulated_data)
The idea here is as follows: we keep an explicit list of URLs that we have visited. This allows us to quickly return from the current node (URL) if it appears in our history list (perhaps a Set for optimization? :)). In this case, each subsequent call to a child node using traverse would get the new_history list, effectively keeping track of which URLs have been visited and which have not.
One possible way to implement this is using a fold function such as foldl:
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
Here type t a might be [URL], that denotes the children of the current link, and our traverse function conveniently has the type signature (b -> a -> b), where type b = ([URL], Data) and type a = URL.
Can you take it from here and figure out how to combine traverse and foldl?
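For readers who want to check their attempt, here is one possible way the pieces could fit together. It is only a sketch under assumptions: the web is replaced by a small hard-coded Map (Graph, demoGraph and childrenOf are made up for illustration), Data is simply a count of visited pages, and the function is called traverseLinks to avoid clashing with the Prelude's traverse. Data.Map comes from the containers package.

```haskell
import Data.List (foldl')
import qualified Data.Map as Map

type URL = String

-- Assumption: Data is just the number of pages visited so far.
type Data = Int

-- Hypothetical stand-in for "download the page and extract its links".
type Graph = Map.Map URL [URL]

childrenOf :: Graph -> URL -> [URL]
childrenOf graph url = Map.findWithDefault [] url graph

-- Same shape as the traverse signature above, plus the graph to look links up in.
traverseLinks :: Graph -> ([URL], Data) -> URL -> ([URL], Data)
traverseLinks graph (history, datum) current
  | current `elem` history = (history, datum) -- already visited: change nothing
  | otherwise =
      -- mark current as visited, then thread the history through all children
      foldl' (traverseLinks graph) (current : history, datum + 1) (childrenOf graph current)

demoGraph :: Graph
demoGraph =
  Map.fromList
    [ ("/", ["/help", "/about"])
    , ("/help", ["/", "/about"])
    , ("/about", ["/"])
    ]

main :: IO ()
main = do
  let (history, pages) = traverseLinks demoGraph ([], 0) "/"
  print (reverse history) -- visit order, oldest first
  print pages
```

Because the accumulator of foldl' is the (history, data) pair, every child sees the links that its earlier siblings already visited, which is what stops the cycles in demoGraph from looping forever.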

Simply move your link visiting logic into a separate function which takes a link as a parameter, and then recurse on the links, as you intuited.
Depending on what you want to ultimately do with the links, you can for instance simply fold the links with your function.
For example, slightly modifying your code:
parseHtml = fmap LB.unpack . simpleHttp
filterFunc x y = -- damn long line with a lot of filters
visitLink :: String -> IO ()
visitLink site = do
  url <- parseHtml site
  let links = filterFunc site url
  mapM_ print $ take 5 $ links -- or whatever you want to do on your links
  mapM_ visitLink links -- the recursive call
main :: IO ()
main = visitLink "https://stackoverflow.com/"
If, rather than printing the links as you go, you would rather return them, tweak the return type of the visitLink function (for instance String -> IO [String]) and change the last line of visitLink accordingly (for instance fmap join $ mapM visitLink links).
As mentioned in another answer, keep in mind that with such simple code you may visit the same link infinitely many times. Consider storing the links you visit in a suitable data structure (such as a set) that you pass to visitLink.
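A minimal sketch of that last suggestion might look as follows. It makes some assumptions: fetchLinks and fakeWeb are hypothetical stand-ins for parseHtml plus filterFunc (they read from a fixed table instead of the network), and visitLink is changed to take and return the set of visited links. Data.Map and Data.Set come from the containers package.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical replacement for the real download-and-filter step.
fakeWeb :: Map.Map String [String]
fakeWeb =
  Map.fromList
    [ ("a", ["b", "c"])
    , ("b", ["a", "c"])
    , ("c", ["a"])
    ]

fetchLinks :: String -> IO [String]
fetchLinks site = return (Map.findWithDefault [] site fakeWeb)

-- Visit a link unless it is already in the set; return the updated set.
visitLink :: Set.Set String -> String -> IO (Set.Set String)
visitLink visited site
  | site `Set.member` visited = return visited
  | otherwise = do
      links <- fetchLinks site
      -- foldM passes the growing set from one recursive call to the next
      foldM visitLink (Set.insert site visited) links

main :: IO ()
main = do
  visited <- visitLink Set.empty "a"
  print (Set.toList visited)
```

Inserting the site into the set before recursing is what breaks the cycles; foldM plays the role that mapM_ played in the simpler version.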

Related

The "Haskell way" to extract/accumulate results inside a predefined visitor pattern iterator

I'm getting started with Haskell (after many years of C and C++) and have decided to attempt a small database project. I'm using a predefined binding library for a C database library (Database.kyotocabint). I'm struggling to get my head round how to do anything with the iterator interfaces, due to the separation of effects when using a predefined method.
The toy demo to iterate over the data base and print it out (which works fine) is
test7 = do
  db <- openTree "testdatabase/mydb.kct" defaultLoggingOptions (Writer [] [])
  let visitor = \k v -> putStr (show k) >> putStr ":" >> putStrLn (show v) >>
                        return (Left NoOperation)
  iterate db visitor False
  close db
Where iterate and visitor are provided by the library bindings and the relevant types are
iterate :: forall db. WithDB db => db -> VisitorFull -> Writable -> IO ()
visitor :: ByteString -> ByteString -> IO (Either VisitorAction b)
But I can't see how to extract information from inside the iterator rather than process each entry individually, for example to collect all the keys beginning with 'a' in a list, or even just to count the number of entries.
Am I limited because iterate just has the type IO (), so that I can't build in side effects and would have to rebuild this, replacing the library versions? The State monad on paper seems to address this, but the visitor type doesn't seem to allow me to maintain the state over subsequent visitor calls.
What would be the Haskell way to solve this ?
Matthew
Edit - many thanks for the clear answer below, which said both that it's not the Haskell way and also provided a solution. This answer led me to Mutable objects, where I found a clear explanation of the options.

The kyotocabinet library unfortunately does not seem to support your operation. Beyond iterate, it should expose some similar operation which returns something more complex than IO (), say IO a or IO [a] while requiring a more complex visitor function.
Still, since we work inside IO, there is a workaround: we can exploit IORefs to collect results. I want to stress, though, that this is not idiomatic code one would write in Haskell, but something one is forced to use because of the limitations of this library.
Anyway, the code would look something like this (untested):
import Data.IORef (newIORef, modifyIORef, readIORef)
test7 = do
  db <- openTree "testdatabase/mydb.kct" defaultLoggingOptions (Writer [] [])
  w <- newIORef [] -- create mutable var, initialize to []
  let visitor = \k v -> do
        putStrLn (show k ++ ":" ++ show v)
        modifyIORef w ((k,v):) -- prepend (k,v) to the list w
        return (Left NoOperation)
  iterate db visitor False
  result <- readIORef w -- get the whole list
  print result
  close db
Since you come from C++, you might want to compare the code above to the following pseudo-C++:
std::vector<std::pair<int,int>> w;
db.iterate([&](int k, int v) {
    std::cout << k << ", " << v << "\n";
    w.push_back({k,v});
});
// here we can read w, even if db.iterate returns void
Again, this is not something I would consider idiomatic Haskell.
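To try the IORef workaround without the database, here is a self-contained sketch that answers the two concrete requests from the question (keys starting with 'a', and the number of entries). The function iterateAll is a hypothetical stand-in for the library's iterate: it is not part of kyotocabinet, it just walks a list and, like the real one, returns only IO ().

```haskell
import Control.Monad (when)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (isPrefixOf)

-- Hypothetical stand-in for the library's iterate: calls the visitor on
-- every record and gives nothing back.
iterateAll :: [(String, String)] -> (String -> String -> IO ()) -> IO ()
iterateAll records visitor = mapM_ (uncurry visitor) records

-- Collect the keys that start with 'a' and count all entries.
keysWithA :: [(String, String)] -> IO ([String], Int)
keysWithA records = do
  keysRef <- newIORef []
  countRef <- newIORef (0 :: Int)
  let visitor k _ = do
        modifyIORef' countRef (+ 1)
        when ("a" `isPrefixOf` k) $ modifyIORef' keysRef (k :)
  iterateAll records visitor
  keys <- readIORef keysRef
  count <- readIORef countRef
  return (reverse keys, count) -- keys were prepended, so restore visit order

main :: IO ()
main = do
  result <- keysWithA [("apple", "1"), ("banana", "2"), ("avocado", "3")]
  print result
```

The visitor closes over the two references, so the results survive even though the iteration itself has no return value.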

Dynamic Programming with Vectors in Haskell

I'm trying to code a kind of simple web crawler in Haskell just for practice. To my own astonishment, neither the web request itself nor parsing the web site was at all complicated.
I wrote the program in a purely functional style with a recursive function, but after only some forty or fifty web requests, the program eats up all the memory.
So I tried to do the task with dynamic programming, but here I'm totally stuck, which means I have no idea where to begin. In this tiny program I got so many errors that I'm not able to figure out where to start.
This is my current concept:
scanPage :: String -> IO (String,String,[String])
scanPage url = ....
crawler :: String -> IO [(String, Int)]
crawler startUrl = runST $ do
  toVisit <- newSTRef [startUrl] :: ST s (STRef s [String])
  visited <- newSTRef [] :: ST s (STRef s [String])
  result <- newSTRef [] :: ST s (STRef s [(String, Int)])
  -- Iterate over urls to visit
  while (liftM not $ liftM null $ readSTRef toVisit) $ do
    url <- fmap (head) (readSTRef toVisit)
    (moreUrls, value_a, value_b) <- scanPage url
    -- Mark page as visited
    vis <- readSTRef visited
    writeSTRef visited (url : vis)
    -- Add Results
    res <- readSTRef result
    writeSTRef result ((value_a, value_b) : res)
    -- Extend urls to visit
    nextUrls <- readSTRef toVisit
    writeSTRef toVisit (nextUrls ++ (moreUrls \\ vis))
  -- End of while
  return =<< readSTRef result
main = do
  putStrLn =<< fmap show (crawler "http://starturl.com")
I already wrote a lot of programs like this with arrays, which are much more convenient, as I can simply write or read from or to array elements. So I thought I could use mutable vectors for these lists, but they can't grow (at least in the same instance) or shrink. So I ended up with simple lists in STRef.
The first line I can't get to work is the line with the while command. I wrote my own while function like this
while :: (Monad m) => m Bool -> m a -> m ()
while cond action = do
  c <- cond
  when c $ do
    action
    while cond action
because I couldn't find any other while command. I googled for many days for mutable vectors, but was not able to find a single tutorial or even an example that I could use here. Please, can anyone tell me how to write a syntactically correct crawler function? Yes, a purely functional solution would be nicer and more "haskellish", but I still consider myself a beginner and all this monad stuff is still a bit strange to me. I'm willing to learn, but a hint or even an example would be really awesome.
EDIT:
Here comes some pseudocode of my messy code.
toVisitList = startURL
visitedList = []
resultList = []
while (length toVisitList /= 0) {
  url = head toVisitList -- Get the 1st element
  toVisitList -= url -- Remove this url from list
  visitedList += url -- Append url to visitedList
  (moreUrls, val_a, val_b) = scanPage url
  resultList += (val_a, val_b) -- append the result
  toVisitList += (moreUrls - visitedList)
}
return resultList
EDIT:
I still don't have any clue how to turn this pseudocode into real code, especially the while statement. Any hints appreciated.

The natural data structure for your toVisitList is a queue. There are a few implementations of queues around, but for this purpose, the simplest thing is to just use Data.Sequence.Seq. This lets you add things to the end with |> or <>, and to view the beginning with viewl. Consider something like
crawlOnce :: Seq Url -> [Url] -> IO (Either [Url] (Seq Url, [Url]))
crawlOnce toVisitList visitedList uses viewl to look at the front of the list of URLs to visit. If it's empty, it returns Left visitedList. Otherwise, it visits the first URL, appends it to the visited list, and adds the newly discovered URLs to the list to visit, then wraps them up in Right.
There are several reasonable variations. For instance, you could go for a type like ExceptT [Url] (StateT (Seq Url, [Url]) IO) a that "throws" its final result.
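As a rough illustration of the crawlOnce idea, here is one way it could be written. This is a sketch under assumptions, not the only reading of the description: the page scanner is passed in as an extra first argument (so the signature differs slightly from the one above), fakeScan is a hypothetical scanner backed by a fixed table, and a URL that is already in the visited list is skipped rather than scanned again. Data.Sequence and Data.Map come from the containers package.

```haskell
import qualified Data.Map as Map
import Data.Sequence (Seq, ViewL (..), viewl)
import qualified Data.Sequence as Seq

type Url = String

-- Hypothetical scanner: looks links up in a fixed table instead of the network.
fakeScan :: Url -> IO [Url]
fakeScan url = return (Map.findWithDefault [] url table)
  where
    table = Map.fromList [("a", ["b", "c"]), ("b", ["a", "d"]), ("c", ["d"]), ("d", [])]

-- One step of the crawl: Left means the queue is empty and we are done.
crawlOnce :: (Url -> IO [Url]) -> Seq Url -> [Url] -> IO (Either [Url] (Seq Url, [Url]))
crawlOnce scan toVisit visited =
  case viewl toVisit of
    EmptyL -> return (Left visited)
    url :< rest
      | url `elem` visited -> return (Right (rest, visited)) -- queued twice: skip
      | otherwise -> do
          found <- scan url
          let visited' = visited ++ [url]
              new = filter (`notElem` visited') found
          return (Right (rest <> Seq.fromList new, visited'))

-- The "while loop" is plain recursion on the result of crawlOnce.
crawl :: (Url -> IO [Url]) -> Url -> IO [Url]
crawl scan start = go (Seq.singleton start) []
  where
    go toVisit visited = do
      step <- crawlOnce scan toVisit visited
      case step of
        Left done -> return done
        Right (toVisit', visited') -> go toVisit' visited'

main :: IO ()
main = crawl fakeScan "a" >>= print
```

Since new URLs go to the back of the Seq and the front is consumed first, the pages are visited breadth-first. For large crawls a Set would be a better fit for the visited collection than a list.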

reading files with references to other files in haskell

I am trying to expand regular markdown with the ability to have references to other files, such that the content in the referenced files is rendered at the corresponding places in the "master" file.
But the furthest I've come is to implement
createF :: FTree -> IO String
createF Null = return ""
createF (Node f children) = ifNExists f (_id f)
  (do childStrings <- mapM createF children
      withFile (_path f) ReadMode $ \handle ->
        do fc <- lines <$> hGetContents handle
           return $ merge fc childStrings)
ifNExists is just a helper that can be ignored. The real problem happens when reading from the handle: it just returns the empty string. I assume this is due to lazy IO.
I thought that the use of withFile filepath ReadMode $ \handle -> {- do stuff -} hGetContents handle would be the right solution, as I've read that fcontent <- withFile filepath ReadMode hGetContents is a bad idea.
Another thing that confuses me is that the function
createFT :: File -> IO FTree
createFT f = ifNExists f Null
  (withFile (_path f) ReadMode $ \handle ->
    do let thisParse = fparse (_id f :_parents f)
       children <- rights . map (thisParse . trim) . lines <$> hGetContents handle
       c <- mapM createFT children
       return $ Node f c)
works like a charm.
So why does createF return just an empty string?
The whole project and a directory/file to test with can be found on GitHub.
Here are the datatype definitions
type ID = String
data File = File {_id :: ID, _path :: FilePath, _parents :: [ID]}
  deriving (Show)
data FTree = Null
           | Node { _file :: File
                  , _children :: [FTree]} deriving (Show)

As you suspected, lazy IO is probably the problem. Here's the (awful) rule you have to follow to use it properly without going totally nuts:
A withFile computation must not complete until all (lazy) I/O required to fully evaluate its result has been performed.
If something forces I/O after the handle is closed, you are not guaranteed to get an error, even though that would be very nice. Instead, you get completely undefined behavior.
You break this rule with return $ merge fc childStrings, because this value is returned before it's been fully evaluated. What you can do instead is something vaguely like the following (using deepseq from Control.DeepSeq):
let retVal = merge fc childStrings
deepseq retVal $ return retVal
An arguably cleaner alternative is to put all the rest of the code that relies on those results into the withFile argument. The only real reason not to do that is if you do a bunch of other work with the results after you're finished with that file. For example, if you're processing a bunch of different files and accumulating their results, then you want to be sure to close each of them when you're done with it. If you're just reading in one file and then acting on it, you can leave it open till you're finished.
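The rule above can be tried out in isolation with a small sketch like this one. It is built on assumptions: the temporary file and the name readLinesForced are made up for the demonstration, and instead of deepseq it uses evaluate (length contents) from base, which is enough here because walking a String to its end forces every character to be read. System.Directory comes from the directory package that ships with GHC.

```haskell
import Control.Exception (evaluate)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Read a file lazily, but force all of it before withFile closes the handle.
readLinesForced :: FilePath -> IO [String]
readLinesForced path =
  withFile path ReadMode $ \handle -> do
    contents <- hGetContents handle
    -- length walks the whole string, so the file is fully read at this point
    _ <- evaluate (length contents)
    return (lines contents)

main :: IO ()
main = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "lazy-io-demo.txt"
  hPutStr h "first\nsecond\nthird\n"
  hClose h
  ls <- readLinesForced path
  print ls
  removeFile path
```

Removing the evaluate line reproduces the situation from the question: the result would be consumed only after withFile has already closed the handle.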
By the way, I just submitted a feature request to the GHC team to see if they might be willing to make these kinds of programs more likely to fail early with useful error messages.
Update
The feature request was accepted, and such programs are now much more likely to produce useful error messages. See What caused this "delayed read on closed handle" error? for details.

I'd strongly suggest you avoid lazy IO, as it always creates problems like this, as described in What's so bad about Lazy I/O? In your case, you need to keep the file open until it's fully read, but that would mean closing the file somewhere in pure code, when the content is actually consumed.
One possibility would be to use strict ByteStrings and read files using readFile. This would also make many operations more efficient.
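A small sketch of the strict ByteString route, assuming the bytestring and directory packages that ship with GHC; the temporary file and the helper name readLinesStrict are made up for the example.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (hClose, hPutStr, openTempFile)

-- B.readFile is strict: the whole file is in memory, and the file is
-- closed again, by the time it returns.
readLinesStrict :: FilePath -> IO [String]
readLinesStrict path = map B.unpack . B.lines <$> B.readFile path

main :: IO ()
main = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "strict-io-demo.txt"
  hPutStr h "alpha\nbeta\n"
  hClose h
  ls <- readLinesStrict path
  removeFile path -- fine: nothing is left to be read lazily
  print ls
```

Note that the file can be deleted before the lines are printed, which would not be safe with a lazily read String.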
Another option would be to use one of the libraries that address the lazy IO problem (see What are the pros and cons of Enumerators vs. Conduits vs. Pipes?). These libraries allow you to separate content production from its processing or consumption. So you could have a producer that reads input files and produces a stream of some tokens, and a pure consumer (not depending on IO) that consumes the stream and produces some result. For example, in conduit-extra there is a module that converts an attoparsec parser into a consumer.
See also Is there a better way to walk a directory tree?

Understanding when to use let and <-

I have been playing around with Haskell for a bit now but I have not fully grasped how to use third party functions that run inside a Monad. Every time I go back to reading articles about Monads, etc. I get a good understanding but when it comes to applying them to real-world code, I cannot figure why a piece of code does not work. I resort to trial and error and usually get it to compile but I feel I should be able to use them properly the first time without trying to go through my heuristic of changes (try let, <-, liftM, etc.)
So I would like to ask a few questions based on this simple function, which admittedly does a lot of interesting things.
import Text.XML.HXT.Core
import Text.HandsomeSoup
import Data.String.Utils
function h = do
  let url = myUrlBuilder h
      doc = fromUrl url
      res = runX $ doc >>> css "strong" /> getText
  --nres = liftM rmSpaceAndBang (res)
  res
rmSpaceAndBang ps = map (\x-> replace "!" "" (strip x)) ps
The above code compiles. I have purposefully left out the type declarations as what I thought it should be doesn't compile. So here are my questions.
Why can I not do res <- runX ... and return res that way?
Why should res be inside a let statement and not be bound to the result of an action? As I understand it, do x <- a1; a2 is equivalent to a1 >>= \x -> a2. How is that different when you let x = a1?
When I used <- I got the following error and if not for my trial and error approach I would not have been able to figure out that I need to use let here.
Couldn't match type `[]' with `IO'
Expected type: IO String
Actual type: [String]
While I focused on res above, my lack of understanding applies to other let statements in the function as well.
How do I find the return type of res?
I couldn't figure out a way to search hackage for getText (hxt seems too big to look through module by module. Probably will try Google site search next time). In the end, I ended up typing up some parts of the code in GHCi and did :t res. It told me it is [String]. Is there a better way to do this?
Since res is of type [String] I thought I will put [String] as the return type for my function. But GHC says it should be IO [String] (compiles). Why did :t give me the wrong information first?
When functions return IO String, what's the best way to use pure functions on them?
Now that I am stuck inside IO [String] I need to lift everywhere I do string operations. Is there a better way to do this?
Hopefully I will learn enough from this that I will be able to use right syntax without resorting to blindly trying a few combinations.
Update:
The key piece I was missing was the fact that res is not a value but rather an action. So I have two choices: one is my code above, with let res = and res run at the end, and the other is to do res <- and then return res.
The advantage of using res <- is that I can get rid of the liftM, as res is now [String] (see @duplode's answer below).
Thanks!

In your code, res is an IO [String]. I do not doubt that you got [String] through GHCi at first, but I believe you tested it with
>>> res <- runX $ doc >>> css "strong" /> getText
>>> :t res
res :: [String]
Which is not equivalent to your code. The difference is that let just binds your IO [String] action without running it, while <- in a do block runs the action and binds the result, in this case a [String].
Now that I am stuck inside IO [String] I need to lift
everywhere I do string operations. Is there a better way to do this?
Within a do block, sometimes it is more convenient to write:
res <- runX $ doc >>> css "strong" /> getText
return $ rmSpaceAndBang res
Which is strictly equivalent to using liftM (or fmap):
liftM rmSpaceAndBang $ runX $ doc >>> css "strong" /> getText

For a quick answer: let doesn't run anything, it just makes the lhs a synonym for the rhs.
You actually need to use the action as a statement inside the do block for the computation to be executed.
main = do
  let func = print "I need to be called"
  print "I don't need to be called"
  func
outputs:
"I don't need to be called"
"I need to be called"
So res in your code is not a value, it's a monadic action.
Remember that <- desugars to >>=, so it requires a monadic action (a value of type m a) on its rhs.
let has no such requirement.
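The difference can be observed directly with a mutable counter. This is only an illustrative sketch; the names demo and bump are made up and do not appear in the question.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

demo :: IO (Int, Int, Int)
demo = do
  counter <- newIORef (0 :: Int)
  -- let only gives the action a name; nothing is executed here
  let bump = modifyIORef' counter (+ 1) >> readIORef counter
  before <- readIORef counter -- still 0
  first <- bump -- <- runs the action and binds its result: 1
  second <- bump -- running the same action again gives 2
  return (before, first, second)

main :: IO ()
main = demo >>= print -- prints (0,1,2)
```

bump has type IO Int, while first and second have type Int, which mirrors the IO [String] versus [String] confusion in the question.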

How to read data from IO into data-structure and then process the data-structure?

First off, sorry for doing the typical 'where do I begin' thing, but I'm totally lost.
I've been reading the 'Learn You a Haskell for Great Good' site for what feels like an age now (pretty much half a semester). I'm just about to finish the 'Input and Output' chapter, and I still have no clue how to write a multi-line program.
I've seen the do statement, and that you can only use it to combine IO actions into a single function, but I can't see how I'm going to go about writing a realistic application.
Can someone point me in the right direction?
I'm from a C background, and basically I'm using Haskell for one of my modules this semester at uni. I want to compare C++ against Haskell (in many aspects). I'm looking to create a series of searching and sorting programs so that I can comment on how easy they are in the respective languages versus their speed.
However, I'm really starting to lose my faith in using Haskell, as it's been six weeks and I still have no idea how to write a complete application, and the chapters in the site I'm reading seem to be getting longer and longer.
I basically need to create a basic object which will be stored in the structure (which I know how to do). What I'm struggling with more is how to create a program which reads data in from some text file, populates the structure with that data in the first place, and then goes on to process it. As Haskell seems to split IO and other operations, and it won't just let me write multiple lines in a program, I'm looking for something like this:
main = data <- getContent
let allLines = lines data
let myStructure = generateStruct allLines
sort/search/etc
print myStructure
How do I go about this? Are there any good tutorials which will help me get going with realistic programs?
-A

You mentioned seeing do notation; now it's time to learn how to use it. In your example, main is an IO action, so you should be using do syntax or binds:
main = do
  dat <- getContents
  let allLines = lines dat
      myStructure = generateStruct allLines
      sorted = mySort myStructure
      searchResult = mySearch myStructure
  print myStructure
  print sorted
  print searchResult
So now you have a main that gets stdin, turns it into [String] via lines, presumably parses it into a structure and runs sorting and searches on that structure. Notice the interesting code is all pure - mySort, mySearch, and generateStruct don't need to be IO (and can't be, being inside a let binding) so you are actually properly using pure and effectful code together.
I suggest you look at how bind works (>>=) and how do notation desugars into bind. This SO question should help.
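As a rough illustration of that desugaring, the two functions below do the same thing, once in do notation and once with an explicit >>=. The sketch rests on assumptions: generateStruct is given a made-up definition (one Int per line), sort from Data.List stands in for mySort, and the input action is passed as a parameter so the example runs without stdin.

```haskell
import Data.List (sort)

-- Hypothetical definition: the structure is just a list of Ints, one per line.
generateStruct :: [String] -> [Int]
generateStruct = map read

-- do-notation version
sortedDo :: IO String -> IO [Int]
sortedDo source = do
  dat <- source
  let myStructure = generateStruct (lines dat)
  return (sort myStructure)

-- the same thing after desugaring: <- becomes >>= and a lambda,
-- and the let statement becomes a let ... in expression
sortedBind :: IO String -> IO [Int]
sortedBind source =
  source >>= \dat ->
    let myStructure = generateStruct (lines dat)
     in return (sort myStructure)

main :: IO ()
main = do
  let input = return "3\n1\n2\n" :: IO String -- swap in getContents to read stdin
  a <- sortedDo input
  b <- sortedBind input
  print (a, b)
```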
See also Explaining Haskell IO without Monads by Neil Mitchell.

I'll try to start with a simplified example. Let's say this is what we want to do:
Open a file which contains a list of integers and return it.
Sort this list
Let's also reverse the list
Print the result on the screen
Let's also say that we have these functions that we can use:
getContent :: IO [Int]
sort :: [Int] -> [Int]
reverse :: [Int] -> [Int]
show :: Show a => a -> String
putStrLn :: String -> IO ()
Just so we are clear, I'll have a word about these functions:
getContent: I made up this function, but if there were such a function, that would be its signature (you can use getContent = return [3,7,2,1] for testing purposes). I'm sure you've seen such a signature before and at least vaguely understand that, since it does IO, its signature cannot be just getContent :: [Int].
sort: It's a function defined in the Data.List module; usage is simple: sort [3,1,2] returns [1,2,3]
reverse: Also defined in the Data.List module: reverse [1,3,2] returns [2,3,1]
show: no need to import anything, just use it: show 11 returns the string "11"; show [1,2,3] returns the string "[1,2,3]", etc.
putStrLn: takes a string, puts it on the screen and returns IO (). Again, since it does IO, its signature cannot be just putStrLn :: String -> ().
OK, now we have all we need to create our program, the problem now is about connecting these functions together. Let's start with connecting functions:
getContent :: IO [Int] with sort :: [Int] -> [Int]
I think if you get this part, you'll easily get the rest as well. So, the problem is that since getContent returns IO [Int] and not just [Int], you can't just ignore or get rid of the IO part and shove it into sort. That is, this is what you can not do to connect these functions:
sort (getRidOfIO getContent)
Here is where the >>= :: m a -> (a -> m b) -> m b operation comes to the rescue. Now notice that m, a and b are type variables, so if we substitute IO for m, [Int] for a and [Int] for b, we get the signature:
(>>=) :: IO [Int] -> ([Int] -> IO [Int]) -> IO [Int]
Have a look again at those getContent and sort functions and their signatures and try to think about how they'll fit into the >>=. I'm sure you'll notice that you can use getContent directly as the first argument to >>=. So far, what >>= will do is take the [Int] out of getContent and shove it into the function provided as the second argument. But what will be the function in the second argument? We can't use sort :: [Int] -> [Int] directly; the next best thing we can try is
\listOfInts -> sort listOfInts
but that still has signature [Int] -> [Int], so that did not help much. Here is where the other hero comes into play, the
return :: a -> m a.
Again, a and m are type variables; let's substitute them and we will get
return :: [Int] -> IO [Int]
so putting \listOfInts -> sort listOfInts and return together we will get:
(\listOfInts -> return $ sort listOfInts) :: [Int] -> IO [Int]
Which is exactly what we want to put as the second argument to >>=. So let's finally connect getContent and sort using our glue:
getContent >>= (\listOfInts -> return $ sort listOfInts)
which is the same thing as (using the do notation):
do listOfInts <- getContent
   return $ sort listOfInts
There, that is the end of the most terrifying part. And now comes possibly one of the aha moments, try to think about what is the result type of the connection we just made up. I'll spoil it for you,... the type of
getContent >>= (\listOfInts -> return $ sort listOfInts) is IO [Int] again.
Let's summarize: we took something of type IO [Int] and something of type [Int] -> [Int], glued those two things together and got again something of type IO [Int]!
Now go ahead and try exactly the same thing: Take the IO [Int] object we have just created and glue it together (using >>= and return) with reverse :: [Int] -> [Int].
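If you want to compare your attempt afterwards, one possible solution to this exercise is sketched below. The name sortedThenReversed is made up, and the last step (gluing on show and putStrLn) goes slightly beyond what has been described so far.

```haskell
import Data.List (sort)

getContent :: IO [Int]
getContent = return [5, 2, 1, 7]

-- Each step has the same shape: take the list out with >>=, apply a pure
-- function, and wrap the result again with return.
sortedThenReversed :: IO [Int]
sortedThenReversed =
  getContent
    >>= (\listOfInts -> return $ sort listOfInts)
    >>= (\sorted -> return $ reverse sorted)

main :: IO ()
main = sortedThenReversed >>= (\result -> putStrLn (show result))
```

putStrLn already returns IO (), so the final step needs no return to make the types fit.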
I think I wrote way too much, but let me know if anything was not clear or if you need help with the rest.
What I've described so far can look something like this:
import Data.List (sort)
getContent :: IO [Int]
getContent = return [5,2,1,7]
main :: IO ()
main = do
  listOfInts <- getContent
  return $ sort listOfInts
  return () -- This is only to satisfy the signature of main

If it is a question of reading from stdin and writing a result to stdout, with no further intervening user input (as your mention of getContents suggests), then the ancient interact :: (String -> String) -> IO (), or one of its several other versions, e.g. Data.ByteString.interact :: (ByteString -> ByteString) -> IO () or Data.Text.IO.interact :: (Text -> Text) -> IO (), is all that is needed. interact is basically the 'make a little unix tool out of this function' function: it maps pure functions of the right type to executable actions (i.e. values of the type IO ()). All Haskell tutorials should mention it on the third or fourth page, with instructions on compilation.
So if you write
main = interact arthur
arthur :: String -> String
arthur = reverse
and compile with ghc --make -O2 Reverse.hs -o reverse then whatever you pipe to ./reverse will be understood as a list of characters and emerge reversed. Similarly, whatever you pipe to
main = interact (unlines . meredith . lines)
meredith :: [String] -> [String]
meredith = filter (not.null)
will emerge with the empty lines omitted. More interestingly,
main = interact ( unlines . map show . luther . map read . lines)
luther :: [Int] -> [Int]
luther = filter even
will take a stream of characters separated by newlines, read them as Ints, removing the odd ones, and yielding the suitably filtered stream.
main = interact ( (++ "\n") . show . emma . map read . lines)
emma :: [Int] -> Int
emma = sum . map square
  where square x = x * x
will print the sum of the squares of the newline-separated numerals.
In these last two cases, luther and emma the internal 'data structure' is [Int], which is pretty dull, and the function applied to it is idiot simple, of course. The main point is to let one of the forms of interact take care of all of the IO, and thus get images like 'populating a structure' and 'processing it' out of your head. To use interact you need to use composition to make the whole yield some sort of String -> String function. But even here, as in the runt first example arthur:: String -> String you are defining a genuine function in something more like the mathematical sense. Values in the types String and ByteString are just as pure as those in Bool or Int.
In more complicated cases of this basic interact type, your task is thus, first, to think how the desired pure values of the function you will be focussing on can be mapped to String values (here, it's just show for an Int or unlines . map show for a [Int]). interact knows what to "do" with the string. -- And then to figure out how to define a pure mapping from Strings or ByteString (which will contain your 'raw' data) to values in the type or types your principal function takes as arguments. Here I was just using map read . lines resulting in a [Int]. If you are working on some more complicated, say tree structure you'd need a function from [Int] to MyTree Int. A more elaborate function to put in this position would be a Parser, of course.
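The three-part shape described here (String to values, the pure principal function, values back to String) can be spelled out as a sketch like the one below. The names parseInts, process, render and pipeline are made up for illustration, and sort is only a placeholder for whatever the principal function would be.

```haskell
import Data.List (sort)

-- String -> values: one Int per input line
parseInts :: String -> [Int]
parseInts = map read . lines

-- the pure "principal function"; sort is just a placeholder
process :: [Int] -> [Int]
process = sort

-- values -> String: one Int per output line
render :: [Int] -> String
render = unlines . map show

-- the whole program as one pure String -> String function
pipeline :: String -> String
pipeline = render . process . parseInts

main :: IO ()
main = interact pipeline
```

Because pipeline is pure, it can be tried in GHCi without any input at all, for example pipeline "3\n1\n2\n" gives "1\n2\n3\n".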
Then you can go to town, in this sort of case: there is really no reason to think of yourself as 'programming', 'populating' and 'processing' at all. This is where all the cool devices of LYAH kick in. Your duty is to define a mapping within the specific definitional discipline. In the last two cases, these are from [Int] to [Int] and from [Int] to Int, but here is a similar example derived from the excellent, still incomplete, tutorial on the super-excellent Vector package where the initial numerical structure one is dealing with is Vector Int
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector.Unboxed as U
import System.Environment
main = L.interact (L.pack . (++"\n") . show . roman . parse)
  where
    parse :: L.ByteString -> U.Vector Int
    parse bytestr = U.unfoldr step bytestr
    step !s = case L.readInt s of
      Nothing       -> Nothing
      Just (!k, !t) -> Just (k, L.tail t)
    -- now the IO and stringy nonsense is out of the way
    -- so we can calculate properly:
    roman :: U.Vector Int -> Int
    roman = U.sum
Here again roman is moronic, any function from a Vector of Ints to an Int, however complex, can take its place. Writing a better roman will never be a question of "populating" "multi-line programming" "processing" etc., though of course we speak this way; it is just a question of defining a genuine function by composition of the functions in Data.Vector and elsewhere. The sky is the limit, check out that tutorial too.

Resources