I'm trying to learn the basics of Haskell while developing a filter for Pandoc to recursively include additional markdown files.
Based on the scripting guide I was able to create a somewhat working filter. This looks for CodeBlocks with the include class and tries to include the ASTs of the referenced files.
```include
section-1.md
section-2.md
#pleasedontincludeme.md
```
The whole filter and the input sources can be found in the following repository: steindani/pandoc-include (or see below).
You can run pandoc with the filter and see the output in markdown format using the following command: pandoc -t json input.md | runhaskell IncludeFilter.hs | pandoc --from json --to markdown
I've noticed that the map function (at line 38), although it gets the whole list of files to include, only calls the function for the first element. And this is not the only strange behavior. An included file can itself have an include block, which is processed so that the referenced file is included; but it won't go any deeper: the include blocks of that last file are ignored.
Why doesn't the map function iterate over the whole list? Why does it stop after 2 levels of hierarchy?
Please note that I'm just starting to learn Haskell, I'm sure I made mistakes, but I'm happy to learn.
Thank you
Full source code:
module Text.Pandoc.Include where

import Control.Monad
import Data.List.Split
import Text.Pandoc.JSON
import Text.Pandoc
import Text.Pandoc.Error

stripPandoc :: Either PandocError Pandoc -> [Block]
stripPandoc p =
  case p of
    Left _ -> [Null]
    Right (Pandoc _ blocks) -> blocks

ioReadMarkdown :: String -> IO (Either PandocError Pandoc)
ioReadMarkdown content = return (readMarkdown def content)

getContent :: String -> IO [Block]
getContent file = do
  c <- readFile file
  p <- ioReadMarkdown c
  return (stripPandoc p)

doInclude :: Block -> IO [Block]
doInclude cb@(CodeBlock (_, classes, _) list) =
  if "include" `elem` classes
    then do
      files <- return $ wordsBy (=='\n') list
      contents <- return $ map getContent files
      result <- return $ msum contents
      result
    else
      return [cb]
doInclude x = return [x]

main :: IO ()
main = toJSONFilter doInclude
I can spot the following error in your doInclude function:
doInclude :: Block -> IO [Block]
doInclude cb@(CodeBlock (_, classes, _) list) =
  if "include" `elem` classes
    then do
      let files = wordsBy (=='\n') list
      let contents = map getContent files
      let result = msum contents -- HERE
      result
    else
      return [cb]
doInclude x = return [x]
Since the type of the result of this whole function is IO [Block], we can work backward:
result has type IO [Block]
contents has type [IO [Block]]
msum is being used with type [IO [Block]] -> IO [Block]
And that third part is the problem—somehow in your program, there is a non-standard MonadPlus instance being loaded for IO, and I bet that what it does on msum contents is this:
Execute the first action
If that succeeds, produce the same result as that and discard the rest of the list. (This is the cause of the behavior you observe.)
If it fails with an exception, try the rest of the list.
This isn't a standard MonadPlus instance so it's coming from one of the libraries that you're importing. I don't know which.
A general recommendation here would be:
Split your program into smaller functions
Write type signatures for those functions
Because the problem here seems to be that msum is being used with a different type than the one you expect. Normally this would produce a type error, but here you got unlucky and it interacted with a strange type class instance in some library.
From the comments, your intent with msum contents was to create an IO action that executes all of the subactions in sequence, and collects their result as a list. Well, the MonadPlus class isn't normally defined for IO, and when it is it does something else. So the correct function to use here is sequence:
-- Simplified version, the real one is more general:
sequence :: Monad m => [m a] -> m [a]
sequence [] = return []
sequence (ma:mas) = do
  a <- ma
  as <- sequence mas
  return (a:as)
That gets you from [IO [Block]] to IO [[Block]]. To eliminate the double nested lists then you just use fmap to apply concat inside IO.
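Putting those last two steps together, a minimal sketch might look like the following. To stay self-contained it does not import Pandoc: `includeAll` and its `load` parameter are hypothetical names, with `load` standing in for `getContent` and the type variable `a` standing in for `Block`.

```haskell
-- Hypothetical helper: run a loader over every file, in order, and
-- flatten the per-file results into one list.
includeAll :: (FilePath -> IO [a]) -> [FilePath] -> IO [a]
includeAll load files = fmap concat (sequence (map load files))

-- The same thing written with mapM (mapM f = sequence . map f).
includeAll' :: (FilePath -> IO [a]) -> [FilePath] -> IO [a]
includeAll' load files = concat <$> mapM load files
```

In the filter, something like `includeAll getContent files` would then take the place of the `msum contents` line.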
Related
I would like to parse an infinite stream of bytes into an infinite stream of Haskell data. Each byte is read from the network, thus they are wrapped into IO monad.
More concretely I have an infinite stream of type [IO(ByteString)]. On the other hand I have a pure parsing function parse :: [ByteString] -> [Object] (where Object is a Haskell data type)
Is there a way to plug my infinite stream of monad into my parsing function ?
For instance, is it possible to write a function of type [IO(ByteString)] -> IO [ByteString] in order for me to use my function parse in a monad?
The Problem
Generally speaking, in order for IO actions to be properly ordered and behave predictably, each action needs to complete fully before the next action is run. In a do-block, this means that this works:
main = do
  sequence (map putStrLn ["This","action","will","complete"])
  putStrLn "before we get here"
but unfortunately this won't work, if that final IO action was important:
dontRunMe = do
  putStrLn "This is a problem when an action is"
  sequence (repeat (putStrLn "infinite"))
  putStrLn "<not printed>"
So, even though sequence can be specialized to the right type signature:
sequence :: [IO a] -> IO [a]
it doesn't work as expected on an infinite list of IO actions. You'll have no problem defining such a sequence:
badSeq :: IO [Char]
badSeq = sequence (repeat (return '+'))
but any attempt to execute the IO action (e.g., by trying to print the head of the resulting list) will hang:
main = (head <$> badSeq) >>= print
It doesn't matter if you only need a part of the result. You won't get anything out of the IO monad until the entire sequence is done (so "never" if the list is infinite).
The "Lazy IO" Solution
If you want to get data from a partially completed IO action, you need to be explicit about it and make use of a scary-sounding Haskell escape hatch, unsafeInterleaveIO. This function takes an IO action and "defers" it so that it won't actually execute until the value is demanded.
The reason this is unsafe in general is that an IO action that makes sense now, might mean something different if actually executed at a later time point. As a simple example, an IO action that truncates/removes a file has a very different effect if it's executed before versus after updated file contents are written!
Anyway, what you'd want to do here is write a lazy version of sequence:
import System.IO.Unsafe (unsafeInterleaveIO)

lazySequence :: [IO a] -> IO [a]
lazySequence [] = return [] -- oops, not infinite after all
lazySequence (m:ms) = do
  x <- m
  xs <- unsafeInterleaveIO (lazySequence ms)
  return (x:xs)
The key point here is that, when a lazySequence infstream action is executed, it will actually execute only the first action; the remaining actions will be wrapped up in a deferred IO action that won't truly execute until the second and subsequent elements of the returned list are demanded.
This works for fake IO actions:
> take 5 <$> lazySequence (repeat (return '+'))
"+++++"
>
(where if you replaced lazySequence with sequence, it would hang). It also works for real IO actions:
> lns <- lazySequence (repeat getLine)
<waits for first line of input, then returns to prompt>
> print (head lns)
<prints whatever you entered>
> length (head (tail lns)) -- force next element
<waits for second line of input>
<then shows length of your second line before prompt>
>
Anyway, with this definition of lazySequence and types:
parse :: [ByteString] -> [Object]
inputs :: [IO ByteString]
you should have no trouble writing:
outputs :: IO [Object]
outputs = parse <$> lazySequence inputs
and then using it lazily however you want:
main = do
  objs <- outputs
  mapM_ doSomethingWithObj objs
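To check the claim that lazySequence only runs actions as their results are demanded, here is a small sketch that counts executions with an `IORef`. The names `tick` and `demo` are made up for this illustration, and `lazySequence` is repeated so the block stands alone.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef', readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Same definition as above, repeated so the example is self-contained.
lazySequence :: [IO a] -> IO [a]
lazySequence []     = return []
lazySequence (m:ms) = do
  x  <- m
  xs <- unsafeInterleaveIO (lazySequence ms)
  return (x : xs)

-- An action that bumps a counter and returns the new value.
tick :: IORef Int -> IO Int
tick ref = atomicModifyIORef' ref (\n -> (n + 1, n + 1))

-- Consume only the first k results of an infinite stream of ticks and
-- report (sum of those results, number of actions actually executed).
demo :: Int -> IO (Int, Int)
demo k = do
  ref <- newIORef 0
  xs  <- lazySequence (repeat (tick ref))
  let s = sum (take k xs)
  executed <- s `seq` readIORef ref  -- force the sum before reading the counter
  return (s, executed)
```

With this sketch, `demo 3` yields `(6, 3)`: three results were demanded, so only three of the infinitely many actions ran.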
Using Conduit
Even though the above lazy IO mechanism is pretty simple and straightforward, lazy IO has fallen out of favor for production code due to issues with resource management, fragility with respect to space leaks (where a small change to your code blows up the memory footprint), and problems with exception handling.
One solution is the conduit library. Another is pipes. Both are carefully designed streaming libraries that can support infinite streams.
For conduit, if you had a parse function that created one object per byte string, like:
parse1 :: ByteString -> Object
parse1 = ...
then given:
inputs :: [IO ByteString]
inputs = ...
useObject :: Object -> IO ()
useObject = ...
the conduit would look something like:
import Conduit

main :: IO ()
main = runConduit $ mapM_ yieldM inputs
    .| mapC parse1
    .| mapM_C useObject
Given that your parse function has signature:
parse :: [ByteString] -> [Object]
I'm pretty sure you can't integrate this with conduit directly (or at least not in any way that wouldn't toss out all the benefits of using conduit). You'd need to rewrite it to be conduit friendly in how it consumed byte strings and produced objects.
I am trying to make a function that takes a list of strings and executes the command putStrLn or print (I think they are basically equivalent, please correct me if I am wrong as I'm still new to Haskell) to every element and have it printed out on my terminal screen. I was experimenting with the map function and also with lambda/anonymous functions as I already know how to do this recursively but wanted to try a more complex non recursive version. map returned a list of the type IO() which was not what I was going for and my attempts at lambda functions did not go according to plan. The basic code was:
test :: [String] -> something
test x = map (\a->putStrLn a) x -- output for this function would have to be [IO()]
Not entirely sure what the output of the function was supposed to be either which also gave me issues.
I was thinking of making a temp :: String variable and have each String appended to temp and then putStrLn temp but was not sure how to do that entirely. I though using where would be viable but I still ran into issues. I know how to do this in languages like java and C but I am still quite new to Haskell. Any help would be appreciated.
There is a special version of map that works with monadic functions, it's called mapM:
test :: [String] -> IO [()]
test x = mapM putStrLn x
Note that this way the return type of test is a list of units - that's because each call to putStrLn returns a unit, so result of applying it to each element in a list would be a list of units. If you'd rather not deal with this silliness and have the return type be a plain unit, use the special version mapM_:
test :: [String] -> IO ()
test x = mapM_ putStrLn x
I was thinking of making a temp :: String variable and have each String appended to temp and then putStrLn temp
Good idea. A pattern of "render the message" then a separate "emit the message" is often nice to have long term.
test xs = let temp = unlines (map show xs)
          in putStrLn temp
Or just
test xs = putStrLn (unlines (show <$> xs))
Or
test = putStrLn . unlines . map show
Not entirely sure what the output of the function was supposed to be either which also gave me issues.
Well you made a list of IO actions:
test :: [String] -> [IO ()]
test x = map (\a->putStrLn a) x
So with this list of IO actions when do you want to execute them? Now? Just once? The first one many times the rest never? In what order?
Presumably you want to execute them all now. Let's also eta reduce (\a -> putStrLn a) to just putStrLn since that means the same thing:
test :: [String] -> IO ()
test x = sequence_ (map putStrLn x)
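To see that building the list of actions and running it with sequence_ does the same work as mapM_, here is a small sketch that records into an `IORef` instead of printing, so the result can be inspected. `record`, `viaSequence`, and `viaMapM` are names invented for this illustration.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

-- Stand-in for putStrLn: append the string to a log instead of printing it.
record :: IORef [String] -> String -> IO ()
record logRef s = modifyIORef logRef (++ [s])

-- Build the list of actions first, then run them left to right.
viaSequence :: [String] -> IO [String]
viaSequence xs = do
  logRef <- newIORef []
  sequence_ (map (record logRef) xs)
  readIORef logRef

-- mapM_ does both steps at once.
viaMapM :: [String] -> IO [String]
viaMapM xs = do
  logRef <- newIORef []
  mapM_ (record logRef) xs
  readIORef logRef
```

Both functions return the input strings in their original order, which is the order in which the actions were executed.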
Consider the two following variations:
myReadListTailRecursive :: IO [String]
myReadListTailRecursive = go []
  where
    go :: [String] -> IO [String]
    go l = do {
      inp <- getLine;
      if (inp == "") then
        return l;
      else go (inp:l);
    }

myReadListOrdinary :: IO [String]
myReadListOrdinary = do
  inp <- getLine
  if inp == "" then
    return []
  else
    do
      moreInps <- myReadListOrdinary
      return (inp:moreInps)
In ordinary programming languages, one would know that the tail recursive variant is a better choice.
However, going through this answer, it is apparent that haskell's implementation of recursion is not similar to that of using the recursion stack repeatedly.
But because in this case the program in question involves actions, and a strict monad, I am not sure if the same reasoning applies. In fact, I think in the IO case, the tail recursive form is indeed better. I am not sure how to correctly reason about this.
EDIT: David Young pointed out that the outermost call here is to (>>=). Even in that case, does one of these styles have an advantage over the other?
FWIW, I'd go for existing monadic combinators and focus on readability/conciseness. Using unfoldM :: Monad m => m (Maybe a) -> m [a]:
import Control.Monad (liftM, mfilter)
import Control.Monad.Loops (unfoldM)

myReadListTailRecursive :: IO [String]
myReadListTailRecursive = unfoldM go
  where
    go :: IO (Maybe String)
    go = do
      line <- getLine
      return $ case line of
        "" -> Nothing
        s -> Just s
Or using MonadPlus instance of Maybe, with mfilter :: MonadPlus m => (a -> Bool) -> m a -> m a:
myReadListTailRecursive :: IO [String]
myReadListTailRecursive = unfoldM (liftM (mfilter (/= "") . Just) getLine)
Another, more versatile option, might be to use LoopT.
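In case pulling in the monad-loops package is not an option, the idea behind unfoldM is small enough to sketch with base only. `unfoldM'` below is a hypothetical reimplementation written for this example, not the library's code, and `fakeGetLine` is a made-up stand-in for getLine so the loop can be tried without a terminal.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Keep running the step until it yields Nothing, collecting the Just values.
unfoldM' :: Monad m => m (Maybe a) -> m [a]
unfoldM' step = do
  r <- step
  case r of
    Nothing -> return []
    Just x  -> do
      xs <- unfoldM' step
      return (x : xs)

-- Hypothetical stand-in for getLine: pops the next line from an IORef,
-- or returns "" once the supply is exhausted.
fakeGetLine :: IORef [String] -> IO String
fakeGetLine ref = do
  ls <- readIORef ref
  case ls of
    []       -> return ""
    (l:rest) -> writeIORef ref rest >> return l

-- Read "lines" until the first empty one.
readUntilBlank :: IO String -> IO [String]
readUntilBlank getL = unfoldM' (fmap nonEmpty getL)
  where
    nonEmpty "" = Nothing
    nonEmpty s  = Just s

-- Feed four fake lines; reading stops at the empty third one.
demo :: IO [String]
demo = do
  ref <- newIORef ["a", "b", "", "c"]
  readUntilBlank (fakeGetLine ref)
```

Here `demo` returns `["a","b"]`, and `readUntilBlank getLine` would behave like the versions above on real input.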
That’s really not how I would write it, but it’s clear enough what you’re doing. (By the way, if you want to be able to efficiently insert arbitrary output from any function in the chain, without using monads, you might try a Data.ByteString.Builder.)
Your first implementation is very similar to a left fold, and your second very similar to a right fold or map. (You might try actually writing them as such!) The second one has several advantages for I/O. One of the most important, for handling input and output, is that it can be interactive.
You’ll notice that the first builds the entire list from the outside in: in order to determine what the first element of the list is, the program needs to compute the entire structure to get to the innermost thunk, which is return l. The program generates the entire data structure first, then starts to process it. That’s useful when you’re reducing a list, because tail-recursive functions and strict left folds are efficient.
With the second, the outermost thunk contains the head and tail of the list, so you can grab the tail, then call the thunk to generate the second list. This can work with infinite lists, and it can produce and return partial results.
Here’s a contrived example: a program that reads in one integer per line and prints the sums so far.
main :: IO ()
main = interact( display . compute 0 . parse . lines )
  where parse :: [String] -> [Int]
        parse [] = []
        parse (x:xs) = (read x):(parse xs)

        compute :: Int -> [Int] -> [Int]
        compute _ [] = []
        compute accum (x:xs) = let accum' = accum + x
                               in accum':(compute accum' xs)

        display = unlines . map show
If you run this interactively, you’ll get something like:
$ 1
1
$ 2
3
$ 3
6
$ 4
10
But you could also write compute tail-recursively, with an accumulating parameter:
main :: IO ()
main = interact( display . compute [] . parse . lines )
  where parse :: [String] -> [Int]
        parse = map read

        compute :: [Int] -> [Int] -> [Int]
        compute xs [] = reverse xs
        compute [] (y:ys) = compute [y] ys
        compute (x:xs) (y:ys) = compute (x+y:x:xs) ys

        display = unlines . map show
This is an artificial example, but strict left folds are a common pattern. If, however, you write either compute or parse with an accumulating parameter, this is what you get when you try to run interactively, and hit EOF (control-D on Unix, control-Z on Windows) after the number 4:
$ 1
$ 2
$ 3
$ 4
1
3
6
10
This left-folded version needs to compute the entire data structure before it can read any of it. That can’t ever work on an infinite list (When would you reach the base case? How would you even reverse an infinite list if you did?) and an application that can’t respond to user input until it quits is a deal-breaker.
On the other hand, the tail-recursive version can be strict in its accumulating parameter, and will run more efficiently, especially when it’s not being consumed immediately. It doesn’t need to keep any thunks or context around other than its parameters, and it can even re-use the same stack frame. A strict accumulating function, such as Data.List.foldl', is a great choice whenever you’re reducing a list to a value, not building an eagerly-evaluated list of output. Functions such as sum, product or any can’t return any useful intermediate value. They inherently have to finish the computation first, then return the final result.
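The two situations can be put side by side in a short sketch using only standard functions. The names `runningSums` and `total` are chosen here for illustration.

```haskell
import Data.List (foldl')

-- Producing a list: a lazy scan yields each running total as soon as its
-- input is available, so it also works on an infinite list.
runningSums :: [Int] -> [Int]
runningSums = scanl1 (+)

-- Reducing to one value: a strict left fold keeps only the accumulator.
total :: [Int] -> Int
total = foldl' (+) 0
```

For example, `take 4 (runningSums [1..])` gives `[1,3,6,10]` even though the input never ends, while `total [1..100]` gives `5050` but can only answer once it has seen the whole list.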
I'm pretty new to Haskell, and am trying to simply read a file into a list of strings. I'd like one line of the file per element of the list. But I'm running into a type issue that I don't understand. Here's what I've written for my function:
readAllTheLines hdl = (hGetLine hdl):(readAllTheLines hdl)
That compiles fine. I had thought that the file handle needed to be the same one returned from openFile. I attempted to simply show the list from the above function by doing the following:
displayFile path = show (readAllTheLines (openFile path ReadMode))
But when I try to compile it, I get the following error:
filefun.hs:5:43:
Couldn't match expected type 'Handle' with actual type 'IO Handle'
In the return type of a call of 'openFile'
In the first argument of 'readAllTheLines', namely
'(openFile path ReadMode)'
In the first argument of 'show', namely
'(readAllTheLines (openFile path ReadMode))'
So it seems like openFile returns an IO Handle, but hGetLine needs a plain old Handle. Am I misunderstanding the use of these 2 functions? Are they not intended to be used together? Or is there just a piece I'm missing?
Use readFile and lines for a better alternative.
readLines :: FilePath -> IO [String]
readLines = fmap lines . readFile
Coming back to your solution openFile returns IO Handle so you have to run the action to get the Handle. You also have to check if the Handle is at eof before reading something from that. It is much simpler to just use the above solution.
import System.IO

readAllTheLines :: Handle -> IO [String]
readAllTheLines hndl = do
  eof <- hIsEOF hndl
  notEnded eof
  where notEnded False = do
          line <- hGetLine hndl
          rest <- readAllTheLines hndl
          return (line:rest)
        notEnded True = return []

displayFile :: FilePath -> IO [String]
displayFile path = do
  hndl <- openFile path ReadMode
  readAllTheLines hndl
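One thing the handle-based version above leaves open is closing the file: displayFile never calls hClose. A possible variation, sketched here with the made-up names `drainLines` and `readLinesClosing`, is to let withFile from System.IO manage the handle.

```haskell
import System.IO (Handle, IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- Read lines one at a time until end of file. Every hGetLine has run
-- by the time this returns, so the handle can safely be closed afterwards.
drainLines :: Handle -> IO [String]
drainLines h = do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      line <- hGetLine h
      rest <- drainLines h
      return (line : rest)

-- withFile opens the file, runs the action, and closes the handle
-- even if the action throws an exception.
readLinesClosing :: FilePath -> IO [String]
readLinesClosing path = withFile path ReadMode drainLines
```

For simple cases the readFile and lines one-liner is still shorter; this form is mainly useful when you want explicit control over when the file is closed.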
To add on to Satvik's answer, the example below shows how you can use a function to populate an STArray (a mutable array type that lives in the ST monad) in case you need to perform computations on a truly random access data type.
Code Example
Let's say we have the following problem. We have lines in a text file "test.txt", and we need to load it into an array and then display the line found in the center of that file. This kind of computation is exactly the sort of situation where one would want to use a random access array over a sequentially structured list. Granted, in this example, there may not be a huge difference between using a list and an array, but, generally speaking, list accesses will cost O(n) in time whereas array accesses will give you constant time performance.
First, let's create our sample text file:
test.txt
This
is
definitely
a
test.
Given the file above, we can use the following Haskell program (located in the same directory as test.txt) to print out the middle line of text, i.e. the word "definitely."
Main.hs
{-# LANGUAGE BlockArguments #-} -- See footnote 1

import Control.Monad.ST (runST, ST)
import Data.Array.MArray (newArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.Foldable (for_)
import Data.Ix (Ix) -- See footnote 2

populateArray :: (Integral i, Ix i) => STArray s i e -> [e] -> ST s () -- See footnote 3
populateArray stArray es = for_ (zip [0..] es) (uncurry (writeArray stArray))

middleWord' :: (Integral i, Ix i) => i -> STArray s i String -> ST s String
middleWord' arrayLength = flip readArray (arrayLength `div` 2)

middleWord :: [String] -> String
middleWord ws = runST do
  let len = length ws
  array <- newArray (0, len - 1) "" :: ST s (STArray s Int String)
  populateArray array ws
  middleWord' len array

main :: IO ()
main = do
  ws <- words <$> readFile "test.txt"
  putStrLn $ middleWord ws
Explanation
Starting with the top of Main.hs, the ST s monad and its associated function runST allow us to extract pure values from imperative-style computations with in-place updates in a referentially transparent manner. The module Data.Array.MArray exports the MArray typeclass as an interface for instantiating mutable array data types and provides helper functions for creating, reading, and writing MArrays. These functions can be used in conjunction with STArrays since there is an instance of MArray defined for STArray.
The populateArray function is the crux of our example. It uses for_ to "applicatively" loop over a list of tuples of indices and list elements to fill the given STArray with those list elements, producing a value of type () in the ST s monad.
The middleWord' helper function uses readArray to produce a String (wrapped in the ST s monad) that corresponds to the middle element of a given STArray of Strings.
The middleWord function instantiates a new STArray, uses populateArray to fill the array with values from a provided list of strings, and calls middleWord' to obtain the middle string in the array. runST is applied to this whole ST s monadic computation to extract the pure String result.
We finally use our middleWord function in main to find the middle word in the text file "test.txt".
Further Reading
Haskell's STArray is not the only way to work with arrays in Haskell. There are in fact Arrays, IOArrays, DiffArrays and even "unboxed" versions of all of these array types that avoid using the indirection of pointers to simply store "raw" values. There is a page on the Haskell Wikibook on this topic that may be worth some study. Before that, however, looking at the Wikibook page on mutable objects may give you some insight as to why the ST s monad allows us to safely compute pure values from functions that use imperative/destructive operations.
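As a point of comparison with the immutable Array type mentioned above, here is a minimal sketch of the same middle-element lookup without any mutation. It assumes the `array` package that ships with GHC, and `middleWordPure` is a name invented for this example.

```haskell
import Data.Array (Array, listArray, (!))

-- Build an immutable array from the list, then index it in constant time.
-- Assumes a non-empty list; on an empty one the index is out of bounds.
middleWordPure :: [String] -> String
middleWordPure ws = arr ! (len `div` 2)
  where
    len = length ws
    arr :: Array Int String
    arr = listArray (0, len - 1) ws
```

For the five words of test.txt this also selects "definitely". The ST version becomes worthwhile when the array has to be updated in place many times before the final read.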
Footnotes
1 The BlockArguments language extension is what allows us to pass a do block directly to a function without any parentheses or use of the function application operator $.
2 As suggested by the Hackage documentation, Ix is a typeclass mainly meant to be used to specify types for indexing arrays.
3 The use of the Integral and Ix type constraints may be a bit of overkill, but it's used to make our type signatures as general as possible.
First off, sorry for doing the typical thing of 'where do I begin', but I'm totally lost.
I've been reading the 'Learn you a haskell for great good' site for what feels like an age now (pretty much half a semester). I'm just about to finish the 'Input and Output' chapter, and I still have no clue how to write a multi line program.
I've seen the do statement, and that you can only use it to concat IO actions into a single function, but I can't see how I'm gonna go about writing a realistic application.
Can someone point me in the right direction.
I'm from a C background, and basically I'm using haskell for one of my modules this semester at uni, I want to compare C++ against haskell (in many aspects). I'm looking to create a series of searching and sorting programs so that I can comment on how easy they are in the respective languages versus their speed.
However, I'm really starting to lose my faith in using Haskell as it's been six weeks, and I still have no idea how to write a complete application, and the chapters in the site I'm reading seem to be getting longer and longer.
I basically need to create a basic object which will be stored in the structure (which I know how to do), more what I'm struggling with is, how do I create a program which reads data in from some text file, and populates the structure with that data in the first place, then goes on to process it. As haskell seems to split IO and other operations and it won't just let me write multiple lines in a program, I'm looking for something like this:
main = data <- getContent
let allLines = lines data
let myStructure = generateStruct allLines
sort/search/etc
print myStructure
how do I go about this? any good tutorials which will help me get going with realistic programs?
-A
You mentioned seeing do notation; now it's time to learn how to use it. In your example, main is an IO action, so you should be using do syntax or binds:
main = do
  dat <- getContents
  let allLines = lines dat
      myStructure = generateStruct allLines
      sorted = mySort myStructure
      searchResult = mySearch myStructure
  print myStructure
  print sorted
  print searchResult
So now you have a main that gets stdin, turns it into [String] via lines, presumably parses it into a structure and runs sorting and searches on that structure. Notice the interesting code is all pure: mySort, mySearch, and generateStruct don't need to be IO (and can't be, being inside a let binding), so you are actually properly using pure and effectful code together.
I suggest you look at how bind works (>>=) and how do notation desugars into bind. This SO question should help.
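As a rough illustration of that desugaring, the two functions below are meant to be equivalent. `withDo` and `withBind` are hypothetical names, and the `getInts` parameter stands in for whatever action supplies the data.

```haskell
import Data.List (sort)

-- With do notation.
withDo :: IO [Int] -> IO [Int]
withDo getInts = do
  xs <- getInts
  let ys = sort xs
  return (reverse ys)

-- Roughly what the compiler turns it into: each `x <- action` becomes
-- `action >>= \x -> ...`, and each `let` becomes `let ... in ...`.
withBind :: IO [Int] -> IO [Int]
withBind getInts =
  getInts >>= \xs ->
    let ys = sort xs
    in  return (reverse ys)
```

Both return `[3,2,1]` when given `return [3,1,2]` as the input action.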
See also Explaining Haskell IO without Monads by Neil Mitchell.
I'll try to start with a simplified example. Let's say this is what we want to do:
Open a file which contains a list of integers and return it.
Sort this list
Let's also reverse the list
Print the result on the screen
Let's also say that we have these functions that we can use:
getContent :: IO [Int]
sort :: [Int] -> [Int]
reverse :: [Int] -> [Int]
show :: Show a => a -> String
putStrLn :: String -> IO ()
Just so we are clear, I'll have a word about these functions:
getContent: I made up this function, but if there were such a function, that would be its signature (you can use getContent = return [3,7,2,1] for testing purposes). I'm sure you've seen such a signature before and at least vaguely understand that since it does IO its signature cannot be just getContent :: [Int].
sort: It's a function defined in Data.List module, usage is simple: sort [3,1,2] returns [1,2,3]
reverse: Also defined in Data.List module: reverse [1,3,2] returns [2,3,1]
show: don't need to import anything, just use it: show 11 returns the string "11"; show [1,2,3] returns the string "[1,2,3]", etc.
putStrLn: takes a string, puts it on the screen and returns IO (). Again, since it does IO its signature cannot be just putStrLn :: String -> ().
OK, now we have all we need to create our program, the problem now is about connecting these functions together. Let's start with connecting functions:
getContent :: IO [Int] with sort :: [Int] -> [Int]
I think if you get this part, you'll easily get the rest as well. So, the problem is that since getContent returns IO [Int] and not just [Int], you can't just ignore or get rid of the IO part and shove it into sort. That is, this is what you can not do to connect these functions:
sort (getRidOfIO getContent)
Here is where the (>>=) :: Monad m => m a -> (a -> m b) -> m b operation comes to the rescue. Now notice that m, a and b are type variables, so if we substitute IO for m, [Int] for a and [Int] for b, we get the signature:
(>>=) :: IO [Int] -> ([Int] -> IO [Int]) -> IO [Int]
Have a look again at those getContent and sort functions and their signatures and try to think about how they'll fit into the >>=. I'm sure you'll notice that you can use getContent directly as the first argument to >>=. So far what >>= will do is take the [Int] out of getContent and shove it into the function provided as a second argument. But what will be the function in the second argument? We can't use the sort :: [Int] -> [Int] directly, the next best thing we can try is
\listOfInts -> sort listOfInts
but that still has signature [Int] -> [Int] so that did not help much. Here is where the other hero comes into play, the
return :: a -> m a.
Again, a and m are type variables; let's substitute them and we will get
return :: [Int] -> IO [Int]
so adding \listOfInts -> sort listOfInts and return together we will get:
(\listOfInts -> return $ sort listOfInts) :: [Int] -> IO [Int]
Which is exactly what we want to put as a second argument to >>=. So let's finally connect getContent and sort using our glue:
getContent >>= (\listOfInts -> return $ sort listOfInts)
which is the same thing as (using the do notation):
do listOfInts <- getContent
   return $ sort listOfInts
There, that is the end of the most terrifying part. And now comes possibly one of the aha moments, try to think about what is the result type of the connection we just made up. I'll spoil it for you,... the type of
getContent >>= (\listOfInts -> return $ sort listOfInts) is IO [Int] again.
Let's summarize: we took something of type IO [Int] and something of type [Int] -> [Int], glued those two things together and got again something of type IO [Int]!
Now go ahead and try exactly the same thing: Take the IO [Int] object we have just created and glue it together (using >>= and return) with reverse :: [Int] -> [Int].
I think I wrote way too much, but let me know if anything was not clear or if you need help with the rest.
What I've described so far can look something like this:
import Data.List (sort)

getContent :: IO [Int]
getContent = return [5,2,1,7]

main :: IO ()
main = do
  listOfInts <- getContent
  return $ sort listOfInts
  return () -- This is only to satisfy the signature of main
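For the remaining steps (reversing and printing), one possible continuation is sketched below. It reuses the made-up `getContent` from this answer; `sortedReversed` and `program` are names introduced only for this example.

```haskell
import Data.List (sort)

-- Made-up stand-in for reading the integers from a file.
getContent :: IO [Int]
getContent = return [5,2,1,7]

-- Read, sort, and reverse, glued together with >>= and return.
sortedReversed :: IO [Int]
sortedReversed =
  getContent
    >>= (\xs -> return $ sort xs)
    >>= (\xs -> return $ reverse xs)

-- Printing: putStrLn already returns IO (), so no return is needed here.
program :: IO ()
program = sortedReversed >>= (\xs -> putStrLn $ show xs)
```

Running `program` (for instance via `main = program`) prints `[7,5,2,1]`. Once the pattern is familiar, `fmap (reverse . sort) getContent` expresses the first three steps more compactly.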
If it is a question of reading from stdin and writing a result to stdout, with no further intervening user input -- as your mention of getContents suggests -- then the ancient interact :: (String -> String) -> IO (), or the several other versions, e.g. Data.ByteString.interact :: (ByteString -> ByteString) -> IO () or Data.Text.interact :: (Text -> Text) -> IO() are all that are needed. interact is basically the 'make a little unix tool out of this function' function -- it maps pure functions of the right type to executable actions (i.e. values of the type IO().) All Haskell tutorials should mention it on the third or fourth page, with instructions on compilation.
So if you write
main = interact arthur
arthur :: String -> String
arthur = reverse
and compile with ghc --make -O2 Reverse.hs -o reverse then whatever you pipe to ./reverse will be understood as a list of characters and emerge reversed. Similarly, whatever you pipe to
main = interact (unlines . meredith . lines)
meredith :: [String] -> [String]
meredith = filter (not.null)
will emerge with the empty lines omitted. More interestingly,
main = interact ( unlines . map show . luther . map read . lines)
luther :: [Int] -> [Int]
luther = filter even
will take a stream of characters separated by newlines, read them as Ints, removing the odd ones, and yielding the suitably filtered stream.
main = interact ((++ "\n") . show . emma . map read . lines)
emma :: [Int] -> Int
emma = sum . map square
  where square x = x * x
will print the sum of the squares of the newline-separated numerals.
In these last two cases, luther and emma the internal 'data structure' is [Int], which is pretty dull, and the function applied to it is idiot simple, of course. The main point is to let one of the forms of interact take care of all of the IO, and thus get images like 'populating a structure' and 'processing it' out of your head. To use interact you need to use composition to make the whole yield some sort of String -> String function. But even here, as in the runt first example arthur:: String -> String you are defining a genuine function in something more like the mathematical sense. Values in the types String and ByteString are just as pure as those in Bool or Int.
In more complicated cases of this basic interact type, your task is thus, first, to think how the desired pure values of the function you will be focussing on can be mapped to String values (here, it's just show for an Int or unlines . map show for a [Int]). interact knows what to "do" with the string. -- And then to figure out how to define a pure mapping from Strings or ByteString (which will contain your 'raw' data) to values in the type or types your principal function takes as arguments. Here I was just using map read . lines resulting in a [Int]. If you are working on some more complicated, say tree structure you'd need a function from [Int] to MyTree Int. A more elaborate function to put in this position would be a Parser, of course.
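A small sketch of those two mappings, with the parsing side made tolerant of bad input, might look like this. It uses readMaybe from Text.Read in place of read, which is a substitution made here and not something the examples above do; `parseInts`, `render`, and `evensOnly` are hypothetical names.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- String -> structure: keep the lines that parse as Int, skip the rest.
parseInts :: String -> [Int]
parseInts = mapMaybe readMaybe . lines

-- structure -> String: one number per line.
render :: [Int] -> String
render = unlines . map show

-- The whole String -> String function that interact would be given.
evensOnly :: String -> String
evensOnly = render . filter even . parseInts
```

With `main = interact evensOnly`, the input "1\n2\noops\n4\n" comes out as "2\n4\n", whereas a plain `map read` would fail with an exception on the line that is not a number.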
Then you can go to town, in this sort of case: there is really no reason to think of yourself as 'programming', 'populating' and 'processing' at all. This is where all the cool devices of LYAH kick in. Your duty is to define a mapping within the specific definitional discipline. In the last two cases, these are from [Int] to [Int] and from [Int] to Int, but here is a similar example derived from the excellent, still incomplete, tutorial on the super-excellent Vector package where the initial numerical structure one is dealing with is Vector Int
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector.Unboxed as U
import System.Environment

main = L.interact (L.pack . (++"\n") . show . roman . parse)
  where
    parse :: L.ByteString -> U.Vector Int
    parse bytestr = U.unfoldr step bytestr
    step !s = case L.readInt s of
      Nothing -> Nothing
      Just (!k, !t) -> Just (k, L.tail t)

-- now the IO and stringy nonsense is out of the way
-- so we can calculate properly:
roman :: U.Vector Int -> Int
roman = U.sum
Here again roman is moronic, any function from a Vector of Ints to an Int, however complex, can take its place. Writing a better roman will never be a question of "populating" "multi-line programming" "processing" etc., though of course we speak this way; it is just a question of defining a genuine function by composition of the functions in Data.Vector and elsewhere. The sky is the limit, check out that tutorial too.