I'm pretty new to Haskell, and am trying to simply read a file into a list of strings. I'd like one line of the file per element of the list. But I'm running into a type issue that I don't understand. Here's what I've written for my function:
readAllTheLines hdl = (hGetLine hdl):(readAllTheLines hdl)
That compiles fine. I had thought that the file handle needed to be the same one returned from openFile. I attempted to simply show the list from the above function by doing the following:
displayFile path = show (readAllTheLines (openFile path ReadMode))
But when I try to compile it, I get the following error:
filefun.hs:5:43:
Couldn't match expected type 'Handle' with actual type 'IO Handle'
In the return type of a call of 'openFile'
In the first argument of 'readAllTheLines', namely
'(openFile path ReadMode)'
In the first argument of 'show', namely
'(readAllTheLines (openFile path ReadMode))'
So it seems like openFile returns an IO Handle, but hGetLine needs a plain old Handle. Am I misunderstanding the use of these 2 functions? Are they not intended to be used together? Or is there just a piece I'm missing?
Use readFile and lines for a better alternative.
readLines :: FilePath -> IO [String]
readLines = fmap lines . readFile
Coming back to your solution: openFile returns an IO Handle, so you have to run the action to get the Handle. You also have to check whether the Handle is at EOF before reading anything from it. It is much simpler to just use the solution above.
import System.IO
readAllTheLines :: Handle -> IO [String]
readAllTheLines hndl = do
    eof <- hIsEOF hndl
    notEnded eof
  where notEnded False = do
            line <- hGetLine hndl
            rest <- readAllTheLines hndl
            return (line:rest)
        notEnded True = return []
displayFile :: FilePath -> IO [String]
displayFile path = do
    hndl <- openFile path ReadMode
    readAllTheLines hndl
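One thing the handle-based version leaves open is closing the handle. Here is a minimal sketch, assuming you want the handle closed as soon as the lines have been read: withFile from System.IO opens the file, runs an action on the handle, and closes it afterwards. Because hGetLine reads each line strictly, all lines are already in memory by the time the handle is closed. The name readLinesStrict is made up for this example.

```haskell
import System.IO

-- Read lines one at a time until end of file.
readAllTheLines :: Handle -> IO [String]
readAllTheLines hndl = do
    eof <- hIsEOF hndl
    if eof
        then return []
        else do
            line <- hGetLine hndl
            rest <- readAllTheLines hndl
            return (line : rest)

-- Hypothetical helper: open the file, read every line, then close the handle.
readLinesStrict :: FilePath -> IO [String]
readLinesStrict path = withFile path ReadMode readAllTheLines
```

In a do block you would use it as ls <- readLinesStrict "somefile.txt" and then work with ls as an ordinary [String].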
To add on to Satvik's answer, the example below shows how you can use a function to populate an STArray (a mutable array type that is an instance of the MArray typeclass) in case you need to perform computations on a truly random-access data type.
Code Example
Let's say we have the following problem. We have lines in a text file "test.txt", and we need to load them into an array and then display the line found in the center of that file. This kind of computation is exactly the sort of situation where one would want to use a random-access array over a sequentially structured list. Granted, in this example, there may not be a huge difference between using a list and an array, but, generally speaking, list accesses will cost O(n) in time whereas array accesses will give you constant-time performance.
First, let's create our sample text file:
test.txt
This
is
definitely
a
test.
Given the file above, we can use the following Haskell program (located in the same directory as test.txt) to print out the middle line of text, i.e. the word "definitely."
Main.hs
{-# LANGUAGE BlockArguments #-} -- See footnote 1
import Control.Monad.ST (runST, ST)
import Data.Array.MArray (newArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.Foldable (for_)
import Data.Ix (Ix) -- See footnote 2
populateArray :: (Integral i, Ix i) => STArray s i e -> [e] -> ST s () -- See footnote 3
populateArray stArray es = for_ (zip [0..] es) (uncurry (writeArray stArray))
middleWord' :: (Integral i, Ix i) => i -> STArray s i String -> ST s String
middleWord' arrayLength = flip readArray (arrayLength `div` 2)
middleWord :: [String] -> String
middleWord ws = runST do
    let len = length ws
    array <- newArray (0, len - 1) "" :: ST s (STArray s Int String)
    populateArray array ws
    middleWord' len array
main :: IO ()
main = do
    ws <- words <$> readFile "test.txt"
    putStrLn $ middleWord ws
Explanation
Starting with the top of Main.hs, the ST s monad and its associated function runST allow us to extract pure values from imperative-style computations with in-place updates in a referentially transparent manner. The module Data.Array.MArray exports the MArray typeclass as an interface for instantiating mutable array data types and provides helper functions for creating, reading, and writing MArrays. These functions can be used in conjunction with STArrays since there is an instance of MArray defined for STArray.
The populateArray function is the crux of our example. It uses for_ to "applicatively" loop over a list of tuples of indices and list elements to fill the given STArray with those list elements, producing a value of type () in the ST s monad.
The middleWord' helper function uses readArray to produce a String (wrapped in the ST s monad) that corresponds to the middle element of a given STArray of Strings.
The middleWord function instantiates a new STArray, uses populateArray to fill the array with values from a provided list of strings, and calls middleWord' to obtain the middle string in the array. runST is applied to this whole ST s monadic computation to extract the pure String result.
We finally use our middleWord function in main to find the middle word in the text file "test.txt".
Further Reading
Haskell's STArray is not the only way to work with arrays in Haskell. There are in fact Arrays, IOArrays, DiffArrays and even "unboxed" versions of all of these array types that avoid using the indirection of pointers to simply store "raw" values. There is a page on the Haskell Wikibook on this topic that may be worth some study. Before that, however, looking at the Wikibook page on mutable objects may give you some insight as to why the ST s monad allows us to safely compute pure values from functions that use imperative/destructive operations.
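For comparison, here is a rough sketch of the same "middle element" lookup using the immutable Array type from Data.Array, which is enough when the array is built once and only read afterwards. The names toArray and middleElem are made up for this example, and returning a Maybe for the empty case is my own choice rather than something the ST version above does.

```haskell
import Data.Array (Array, bounds, listArray, (!))

-- Build an immutable, zero-indexed array from a list.
toArray :: [a] -> Array Int a
toArray xs = listArray (0, length xs - 1) xs

-- Middle element (the upper middle for even lengths), or Nothing when empty.
middleElem :: Array Int a -> Maybe a
middleElem arr
    | hi < lo   = Nothing
    | otherwise = Just (arr ! (lo + (hi - lo + 1) `div` 2))
  where (lo, hi) = bounds arr
```

With the sample file, middleElem . toArray . lines <$> readFile "test.txt" would be an IO (Maybe String) holding the middle line.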
Footnotes
1 The BlockArguments language extension is what allows us to pass a do block directly to a function without any parentheses or use of the function application operator $.
2 As suggested by the Hackage documentation, Ix is a typeclass mainly meant to be used to specify types for indexing arrays.
3 The use of the Integral and Ix type constraints may be a bit of overkill, but it's used to make our type signatures as general as possible.
Related
I want to provide a function that replaces each occurrence of # in a string with a different random number. In a non-pure language, it's trivial. However, how should it be designed in a pure language? I don't want to use unsafePerformIO, as it rather looks like a hack and not a proper design.
Should this function require a random generator as one of its parameters? And if so, would that generator have to be passed through the whole stack of invocations? Are there other possible approaches? Should I use the State monad, here? I would appreciate a toy example demonstrating a viable approach...
You would, in fact, use a variant of the state monad to pass the random generator around behind the scenes. The Rand type in Control.Monad.Random helps with this. The API is a bit confusing, but more because it's polymorphic over the type of random generator you use than because it has to be functional. This extra bit of scaffolding is useful, however, because you can easily reuse your existing code with different random generators which lets you test different algorithms as well as explicitly controlling whether the generator is deterministic (good for testing) or seeded with outside data (in IO).
Here's a simple example of Rand in action. The RandomGen g => in the type signature tells us that we can use any type of random generator for it. We have to explicitly annotate n as an Int because otherwise GHC only knows that it has to be some numeric type that can be generated and turned into a string, which can be one of multiple possible options (like Double).
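{-# LANGUAGE ScopedTypeVariables #-} -- needed for the "n :: Int <-" annotation below
import Control.Monad (foldM)
import Control.Monad.Random
randomReplace :: RandomGen g => String -> Rand g String
randomReplace = foldM go ""
  where go str '#' = do
          n :: Int <- getRandomR (0, 10)
          return (str ++ show n)
        go str chr = return $ str ++ [chr]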
To run this, we need to get a random generator from somewhere and pass it into evalRand. The easiest way to do this is to get the global system generator which we can do in IO:
main :: IO ()
main = do gen <- getStdGen
          print $ evalRand (randomReplace "ab#c#") gen
This is such a common pattern that the library provides an evalRandIO function which does it for you:
main :: IO ()
main = do res <- evalRandIO $ randomReplace "ab#c#"
          print res
In the end, the code is a bit more explicit about having a random generator and passing it around, but it's still reasonably easy to follow. For more involved code, you could also use RandT, which allows you to extend other monads (like IO) with the ability to generate random values, letting you relegate all the plumbing and setup to one part of your code.
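To make "passing the generator around" concrete, here is a small sketch that threads a generator by hand using only base. The Gen type and nextDigit are a toy linear congruential generator invented for this example (they are not part of System.Random or Control.Monad.Random); a real program would use a proper RandomGen instead. mapAccumL carries the generator state from one character to the next, which is roughly the bookkeeping a state-style monad hides from you.

```haskell
import Data.Char (intToDigit)
import Data.List (mapAccumL)

-- Toy generator state (hypothetical; stands in for a real RandomGen).
newtype Gen = Gen Integer deriving (Eq, Show)

-- One step of a small linear congruential generator:
-- returns a digit in 0..9 together with the next generator state.
nextDigit :: Gen -> (Int, Gen)
nextDigit (Gen s) = (fromIntegral ((s' `div` 65536) `mod` 10), Gen s')
  where s' = (s * 1103515245 + 12345) `mod` 2147483648

-- Replace each '#' with a pseudo-random digit, threading the generator explicitly.
randomReplace :: Gen -> String -> (Gen, String)
randomReplace = mapAccumL step
  where
    step g '#' = let (d, g') = nextDigit g in (g', intToDigit d)
    step g c   = (g, c)
```

The function is pure: the same seed always gives the same string, and the returned Gen can be passed on to the next call.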
It's just a monadic mapping
import Control.Applicative
import Control.Monad.Random
import Data.Char
randomReplace :: RandomGen g => String -> Rand g String
randomReplace = mapM f where
    f '#' = intToDigit <$> getRandomR (0, 9) -- 0..9, so the result is always a decimal digit
    f c = return c
main = evalRandIO (randomReplace "#abc#def#") >>= print
Consider the code below taken from a working example I've built to help me learn Haskell. This code parses a CSV file containing stock quotes downloaded from Yahoo into a nice simple list of bars with which I can then work.
My question: how can I write a function that will take a file name as its parameter and return an OHLCBarList so that the first four lines inside main can be properly encapsulated?
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
I've tried to do this myself but with my limited Haskell knowledge, I'm failing miserably.
import qualified Data.ByteString as BS
type Filename = String
getContentsOfFile :: Filename -> IO BS.ByteString
barParser :: Parser Bar
barParser = do
    time <- timeParser
    char ','
    open <- double
    char ','
    high <- double
    char ','
    low <- double
    char ','
    close <- double
    char ','
    volume <- decimal
    char ','
    return $ Bar Bar1Day time open high low close volume
type OHLCBar = (UTCTime, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]
barsToBarList :: [Either String Bar] -> OHLCBarList
main :: IO ()
main = do
    contents :: C.ByteString <- getContentsOfFile "PriceData/Daily/yhoo1.csv" --PriceData/Daily/Yhoo.csv"
    let lineList :: [C.ByteString] = C.lines contents -- Break the contents into a list of lines
    let bars :: [Either String Bar] = map (parseOnly barParser) lineList -- Using the attoparsec
    let ohlcBarList :: OHLCBarList = barsToBarList bars -- Now I have a nice simple list of tuples with which to work
    --- Now I can do simple operations like
    print $ ohlcBarList !! 0
If you really want your function to have type Filename -> OHLCBarList, it can't be done.* Reading the contents of a file is an IO operation, and Haskell's IO monad is specifically designed so that values in the IO monad can never leave. If this restriction were broken, it would (in general) mess with a lot of things. Instead of doing this, you have two options: make the type of getBarsFromFile be Filename -> IO OHLCBarList — thus essentially copying the first four lines of main — or write a function with type C.ByteString -> OHLCBarList that the output of getContentsOfFile can be piped through to encapsulate lines 2 through 4 of main.
* Technically, it can be done, but you really, really, really shouldn't even try, especially if you're new to Haskell.
Others have explained that the correct type of your function has to be Filename -> IO OHLCBarList. I'd like to try and give you some insight as to why the compiler imposes this draconian measure on you.
Imperative programming is all about managing state: "do certain operations to certain bits of memory in sequence". When they grow large, procedural programs become brittle; we need a way of limiting the scope of state changes. OO programs encapsulate state in classes but the paradigm is not fundamentally different: you can call the same method twice and get different results. The output of the method depends on the (hidden) state of the object.
Functional programming goes all the way and bans mutable state entirely. A Haskell function, when called with certain inputs, will always produce the same output. Simple examples of pure functions are mathematical operators like + and *, or most of the list-processing functions like map. Pure functions are all about the inputs and outputs, not managing internal state.
This allows the compiler to be very smart in optimising your program (for example, it can safely collapse duplicated code for you), and helps the programmer not to make mistakes: you can't put the system in an invalid state if there is none! We like pure functions.
The exception to the rule is IO. Code that performs IO is impure by definition: you could call getLine a hundred times and never get the same result, because it depends on what the user typed. Haskell handles this using the type system: all impure functions are marked with the IO type. IO can be thought of as a dependency on the state of the real world, sort of like World -> (NewWorld, a).
To summarise: pure functions are good because they are easy to reason about; this is why Haskell makes functions pure by default. Any impure code has to be labelled as such with an IO type signature; this tells the compiler and the reader to be careful with this function. So your function which reads from a file (a fundamentally impure action) but returns a pure value can't exist.
Addendum in response to your comment
You can still write pure functions to operate on data that was obtained impurely. Consider the following straw-man:
main :: IO ()
main = do
    putStrLn "Enter the numbers you want me to process, separated by spaces"
    line <- getLine
    let numberStrings = words line
    let numbers = map read numberStrings
    putStrLn $ "The result of the calculation is " ++ (show $ foldr1 (*) numbers + 10)
Lots of code inside IO here. Let's extract some functions:
import Prelude hiding (product) -- avoid a clash with the product defined below
main :: IO ()
main = do
    putStrLn "Enter the numbers you want me to process, separated by spaces"
    result <- fmap processLine getLine -- fmap :: (a -> b) -> IO a -> IO b
                                       -- runs an impure result through a pure function
                                       -- without leaving IO
    putStrLn $ "The result of the calculation is " ++ result
processLine :: String -> String -- look ma, no IO!
processLine = show . calculate . readNumbers
readNumbers :: String -> [Int]
readNumbers = map read . words
calculate :: [Int] -> Int
calculate numbers = product numbers + 10
product :: [Int] -> Int
product = foldr1 (*)
I've pulled logic out of main into pure functions which are easier to read, easier for the compiler to optimise, and more reusable (and so more testable). The program as a whole still lives inside IO because the data is obtained impurely (see the last part of this answer for a more thorough treatment of this argument). Impure data can be piped through pure functions using fmap and other combinators; you should try to put as little logic in main as possible.
Your code does seem to be most of the way there; as others have suggested you could extract lines 2-4 of your main into another function.
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
You cannot do this without getting all sorts of errors about IO stuff because this type for getBarsFromFile misses an IO. Probably that's what the errors about IO stuff are trying to tell you. Did you try understanding and fixing the errors?
In your situation, I would start by abstracting over the second to fourth line of your main in a function:
parseBars :: ByteString -> OHLCBarList
And then I would combine this function with getContentsOfFile to get:
getBarsFromFile :: FilePath -> IO OHLCBarList
This I would call in main.
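As a rough sketch of that split, here is the same shape with deliberately simplified stand-ins, since the real Bar type and barParser are not shown in full. SimpleBar, parseLine, and the two-column "open,close" format are assumptions made up for this example, and plain String with readFile replaces ByteString and attoparsec. The point is only that parseBars stays pure and the IO shows up in exactly one type.

```haskell
-- Hypothetical stand-in for OHLCBar: just (open, close).
type SimpleBar = (Double, Double)

-- Pure part: parse one "open,close" line; Nothing if the line is malformed.
parseLine :: String -> Maybe SimpleBar
parseLine l = case break (== ',') l of
    (o, ',' : c) -> case (reads o, reads c) of
        ([(open, "")], [(close, "")]) -> Just (open, close)
        _                             -> Nothing
    _ -> Nothing

-- Pure part: split the text into lines and keep the ones that parse.
parseBars :: String -> [SimpleBar]
parseBars contents = [bar | Just bar <- map parseLine (lines contents)]

-- Impure part: the only place where IO appears.
getBarsFromFile :: FilePath -> IO [SimpleBar]
getBarsFromFile path = fmap parseBars (readFile path)
```

In main you would then write bars <- getBarsFromFile "somefile.csv" and carry on with bars as an ordinary list.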
I've got a problem with Haskell. I have a text file that looks like this:
5.
7.
[(1,2,3),(4,5,6),(7,8,9),(10,11,12)].
I have no idea how to get the first two numbers (5 and 7 above) and the list from the last line. There is a dot at the end of each line.
I tried to build a parser, but the function 'readFile' returns a value of type IO String. I don't know how to get information out of that type of string.
I'd prefer to work on an array of chars. Maybe there is a function which can convert from 'IO String' to [Char]?
I think you have a fundamental misunderstanding about IO in Haskell. Particularly, you say this:
Maybe there is a function which can convert from 'IO String' to [Char]?
No, there isn't (see footnote 1), and the fact that there is no such function is one of the most important things about Haskell.
Haskell is a very principled language. It tries to maintain a distinction between "pure" functions (which don't have any side effects, and always return the same result when given the same input) and "impure" functions (which have side effects like reading from files, printing to the screen, writing to disk etc). The rules are:
You can use a pure function anywhere (in other pure functions, or in impure functions)
You can only use impure functions inside other impure functions.
The way that code is marked as pure or impure is using the type system. When you see a function signature like
digitToInt :: Char -> Int
you know that this function is pure. If you give it a Char it will return an Int, and moreover it will always return the same Int if you give it the same Char. On the other hand, a function signature like
getLine :: IO String
is impure, because the return type of String is marked with IO. Obviously getLine (which reads a line of user input) will not always return the same String, because it depends on what the user types in. You can't use this function in pure code, because adding even the smallest bit of impurity will pollute the pure code. Once you go IO you can never go back.
You can think of IO as a wrapper. When you see a particular type, for example, x :: IO String, you should interpret that to mean "x is an action that, when performed, does some arbitrary I/O and then returns something of type String" (note that in Haskell, String and [Char] are exactly the same thing).
So how do you ever get access to the values from an IO action? Fortunately, the type of the function main is IO () (it's an action that does some I/O and returns (), which is the same as returning nothing). So you can always use your IO functions inside main. When you execute a Haskell program, what you are doing is running the main function, which causes all the I/O in the program definition to actually be executed - for example, you can read and write from files, ask the user for input, write to stdout etc etc.
You can think of structuring a Haskell program like this:
All code that does I/O gets the IO tag (basically, you put it in a do block)
Code that doesn't need to perform I/O doesn't need to be in a do block - these are the "pure" functions.
Your main function sequences together the I/O actions you've defined in an order that makes the program do what you want it to do (interspersed with the pure functions wherever you like).
When you run main, you cause all of those I/O actions to be executed.
So, given all that, how do you write your program? Well, the function
readFile :: FilePath -> IO String
reads a file as a String. So we can use that to get the contents of the file. The function
lines :: String -> [String]
splits a String on newlines, so now you have a list of Strings, each corresponding to one line of the file. The function
init :: [a] -> [a]
drops the last element from a list (this will get rid of the final . on each line). The function
read :: (Read a) => String -> a
takes a String and turns it into an arbitrary Haskell data type, such as Int or Bool. Combining these functions sensibly will give you your program.
Note that the only time you actually need to do any I/O is when you are reading the file. Therefore that is the only part of the program that needs to use the IO tag. The rest of the program can be written "purely".
It sounds like what you need is the article The IO Monad For People Who Simply Don't Care, which should explain a lot of your questions. Don't be scared by the term "monad" - you don't need to understand what a monad is to write Haskell programs (notice that this paragraph is the only one in my answer that uses the word "monad", although admittedly I have used it four times now...)
Here's the program that (I think) you want to write
run :: IO (Int, Int, [(Int,Int,Int)])
run = do
    contents <- readFile "text.txt" -- use '<-' here so that 'contents' is a String
    let [a,b,c] = lines contents -- split on newlines
    let firstLine = read (init a) -- 'init' drops the trailing period
    let secondLine = read (init b)
    let thirdLine = read (init c) -- this reads a list of Int-tuples
    return (firstLine, secondLine, thirdLine)
To answer npfedwards's comment about applying lines to the output of readFile "text.txt", you need to realize that readFile "text.txt" gives you an IO String, and it's only when you bind it to a variable (using contents <-) that you get access to the underlying String, so that you can apply lines to it.
Remember: once you go IO, you never go back.
1 I am deliberately ignoring unsafePerformIO because, as implied by the name, it is very unsafe! Don't ever use it unless you really know what you are doing.
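One caveat with read and the [a,b,c] pattern is that they throw an error at runtime if the file is not shaped exactly as expected. Here is a hedged sketch of a more forgiving variant, assuming the same three-line format with a trailing dot on each line. It uses readMaybe from Text.Read, which returns Nothing instead of crashing. The names stripDot, parseContents, and loadFile are made up for this example.

```haskell
import Text.Read (readMaybe)

-- Drop one trailing '.' if present (hypothetical helper).
stripDot :: String -> String
stripDot s = case reverse s of
    ('.' : rest) -> reverse rest
    _            -> s

-- Pure part: Nothing if there are not exactly three lines or any line fails to parse.
parseContents :: String -> Maybe (Int, Int, [(Int, Int, Int)])
parseContents contents = case map stripDot (lines contents) of
    [a, b, c] -> (,,) <$> readMaybe a <*> readMaybe b <*> readMaybe c
    _         -> Nothing

-- Impure part: read the file, then hand the String to the pure parser.
loadFile :: FilePath -> IO (Maybe (Int, Int, [(Int, Int, Int)]))
loadFile path = parseContents <$> readFile path
```

As before, only loadFile mentions IO; parseContents can be tried directly in ghci on a string literal.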
As a programming noob, I too was confused by IOs. Just remember that if you go IO you never come out. Chris wrote a great explanation on why. I just thought it might help to give some examples on how to use IO String in a monad. I'll use getLine which reads user input and returns an IO String.
line <- getLine
All this does is bind the user input from getLine to a value named line. If you type this in ghci, and type :type line it will return:
:type line
line :: String
But wait! getLine returns an IO String
:type getLine
getLine :: IO String
So what happened to the IOness from getLine? <- is what happened. <- is your IO friend. It allows you to bring out the value that is tainted by the IO within a monad and use it with your normal functions. Monadic code like this is usually easy to spot because it is written in a do block. Like so:
main = do
    putStrLn "How much do you love Haskell?"
    amount <- getLine
    putStrLn ("You love Haskell this much: " ++ amount)
If you're like me, you'll soon discover that liftIO is your next best monad friend, and that $ helps reduce the number of parentheses you need to write.
So how do you get the information from readFile? Well if readFile's output is IO String like so:
:type readFile
readFile :: FilePath -> IO String
Then all you need is your friendly <-:
yourdata <- readFile "samplefile.txt"
Now if you type that in ghci and check the type of yourdata, you'll notice it's a simple String.
:type yourdata
yourdata :: String
As others have already said, if you have two things, one being readStringFromFile :: IO String (for example, readFile applied to some path) and the other doTheRightThingWithString :: String -> Something, then you don't really need to escape the string from IO, since you can combine the two in various ways:
With fmap for IO (IO is Functor):
fmap doTheRightThingWithString readStringFromFile
With (<$>) for IO (IO is Applicative and (<$>) == fmap):
import Control.Applicative
...
doTheRightThingWithString <$> readStringFromFile
With liftM for IO (liftM == fmap):
import Control.Monad
...
liftM doTheRightThingWithString readStringFromFile
With (>>=) for IO (IO is Monad, fmap == (<$>) == liftM == \f m -> m >>= return . f):
readStringFromFile >>= \string -> return (doTheRightThingWithString string)
readStringFromFile >>= \string -> return $ doTheRightThingWithString string
readStringFromFile >>= return . doTheRightThingWithString
return . doTheRightThingWithString =<< readStringFromFile
With do notation:
do
    ...
    string <- readStringFromFile
    -- ^ you escape String from IO but only inside this do-block
    let result = doTheRightThingWithString string
    ...
    return result
Every time, you will get an IO Something.
Why would you want to do it like that? Well, this way you have pure and referentially transparent programs (functions) in your language. Every function whose type is IO-free is pure and referentially transparent, so for the same arguments it will return the same values. For example, doTheRightThingWithString returns the same Something for the same String. However, readStringFromFile, which is not IO-free, can return a different string every time (because the file can change), so you can't let such an impure value escape from IO.
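To see that these spellings really are interchangeable, here is a small sketch with stand-in definitions. Both definitions are placeholders I made up: readStringFromFile is taken to be an IO String action that just returns a fixed string instead of touching a file, and doTheRightThingWithString counts words.

```haskell
import Control.Monad (liftM)

-- Hypothetical stand-in: pretend this string came from a file.
readStringFromFile :: IO String
readStringFromFile = return "some file contents"

-- Hypothetical pure function: count the words.
doTheRightThingWithString :: String -> Int
doTheRightThingWithString = length . words

-- The same combination written five ways; each one is an IO Int.
viaFmap, viaOperator, viaLiftM, viaBind, viaDo :: IO Int
viaFmap     = fmap doTheRightThingWithString readStringFromFile
viaOperator = doTheRightThingWithString <$> readStringFromFile
viaLiftM    = liftM doTheRightThingWithString readStringFromFile
viaBind     = readStringFromFile >>= return . doTheRightThingWithString
viaDo = do
    string <- readStringFromFile
    return (doTheRightThingWithString string)
```

Whichever form you pick is a matter of taste; the result type is IO Int in every case.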
If you have a parser of this type:
myParser :: String -> Foo
and you read the file using
readFile "thisfile.txt"
then you can read and parse the file using
fmap myParser (readFile "thisfile.txt")
The result of that will have type IO Foo.
The fmap means myParser runs "inside" the IO.
Another way to think of it is that whereas myParser :: String -> Foo, fmap myParser :: IO String -> IO Foo.
I'm trying to spew out randomly generated dice for every roll that the user plays. The user has 3 rolls per turn and he gets to play 5 turns (I haven't implemented this part yet and I would appreciate suggestions).
I'm also wondering how I can display the colors randomly. I have the list of tuples in place, but I reckon I need some function that uses random and that list to match those colors. I'm struggling as to how.
module Main where
import System.IO
import System.Random
import Data.List
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]
{-
randomList :: (RandomGen g) -> Int -> g -> [Integer]
random 0 _ = []
randomList n generator = r : randomList (n-1) newGenerator
where (r, newGenerator) = randomR (1, 6) generator
-}
rand :: Int -> [Int] -> IO ()
rand n rlst = do
    num <- randomRIO (1::Int, 6)
    if n == 0
        then doSomething rlst
        else rand (n-1) (num:rlst)
doSomething x = putStrLn (show (sort x))
main :: IO ()
main = do
    --hSetBuffering stdin LineBuffering
    putStrLn "roll, keep, score?"
    cmd <- getLine
    doYahtzee cmd
    --rand (read cmd) []
doYahtzee :: String -> IO ()
doYahtzee cmd = do
    if cmd == "roll"
        then rand 5 []
        else do print "You won"
There are really a lot of errors sprinkled throughout this code, which suggests to me that you tried to build the whole thing at once. This is a recipe for disaster; you should be building very small things and testing them often in ghci.
Lecture aside, you might find the following facts interesting (in order of the associated errors in your code):
List is deprecated; you should use Data.List instead.
No let is needed for top-level definitions.
Variable names must begin with a lower case letter.
Class prerequisites are separated from a type by =>.
The top-level module block should mainly have definitions; you should associate every where clause (especially the one near randomList) with a definition by either indenting it enough not to be a new line in the module block or keeping it on the same line as the definition you want it to be associated with.
do introduces a block; those things in the block should be indented equally and more than their context.
doYahtzee is declared and used as if it has three arguments, but seems to be defined as if it only has one.
The read function is used to parse a String. Unless you know what it does, using read to parse a String from another String is probably not what you want to do -- especially on user input.
putStrLn only takes one argument, not four, and that argument has to be a String. However, making a guess at what you wanted here, you might like the (!!) and print functions.
dieRoll doesn't seem to be defined anywhere.
It's possible that there are other errors, as well. Stylistically, I recommend that you check out replicateM, randomRs, and forever. You can use hoogle to search for their names and read more about them; in the future, you can also use it to search for functions you wish existed by their type.
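As a sketch of the replicateM suggestion, and of one way to map faces to colours with the list from the question: rollDice takes the "roll one die" action as a parameter, so the snippet itself only needs base and can be tried without a random number library. The names rollDice and colorOf are made up for this example.

```haskell
import Control.Monad (replicateM)
import Data.List (sort)

-- The colour table from the question.
diceColor :: [(String, Int)]
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]

-- Look up the colour name for a face value; Nothing if the face is not in the table.
colorOf :: Int -> Maybe String
colorOf n = lookup n [(face, name) | (name, face) <- diceColor]

-- Run the supplied "roll one die" action n times and return the faces sorted.
rollDice :: Monad m => m Int -> Int -> m [Int]
rollDice roll n = sort <$> replicateM n roll
```

With System.Random imported, rollDice (randomRIO (1, 6)) 5 would produce five random faces in IO, and map colorOf turns them into colour names.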
first off sorry for doing the typical thing of 'where do I begin', but I'm totally lost.
I've been reading the 'Learn you a haskell for great good' site for what feels like an age now (pretty much half a semester). I'm just about to finish the 'Input and Output' chapter, and I still have no clue how to write a multi line program.
I've seen the do statement, and that you can only use it to concat IO actions into a single function, but I can't see how I'm gonna go about writing a realistic application.
Can someone point me in the right direction.
I'm from a C background, and basically I'm using haskell for one of my modules this semester at uni, I want to compare C++ against haskell (in many aspects). I'm looking to create a series of searching and sorting programs so that I can comment on how easy they are in the respective languages versus their speed.
However, I'm really starting to lose my faith in using Haskell as it's been six weeks, and I still have no idea how to write a complete application, and the chapters in the site I'm reading seem to be getting longer and longer.
I basically need to create a basic object which will be stored in the structure (which I know how to do), more what I'm struggling with is, how do I create a program which reads data in from some text file, and populates the structure with that data in the first place, then goes on to process it. As haskell seems to split IO and other operations and it won't just let me write multiple lines in a program, I'm looking for something like this:
main = data <- getContent
let allLines = lines data
let myStructure = generateStruct allLines
sort/search/etc
print myStructure
how do I go about this? any good tutorials which will help me get going with realistic programs?
-A
You mentioned seeing do notation; now it's time to learn how to use it. In your example main is an IO action, so you should be using do syntax or binds:
main = do
    dat <- getContents -- the Prelude action that reads all of stdin
    let allLines = lines dat
        myStructure = generateStruct allLines
        sorted = mySort myStructure
        searchResult = mySearch myStructure
    print myStructure
    print sorted
    print searchResult
So now you have a main that gets stdin, turns it into [String] via lines, presumably parses it into a structure and runs sorting and searches on that structure. Notice the interesting code is all pure: mySort, mySearch, and generateStruct don't need to be IO (and can't be, being inside a let binding), so you are actually properly using pure and effectful code together.
I suggest you look at how bind works (>>=) and how do notation desugars into bind. This SO question should help.
See also Explaining Haskell IO without Monads by Neil Mitchell.
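To make the desugaring concrete, here is a small sketch of the same action written once with do and once with explicit >>=. fetchNumbers is a made-up stand-in that returns a fixed list rather than reading real input, so the snippet stays self-contained.

```haskell
import Data.List (sort)

-- Hypothetical stand-in for reading input; a real program might parse getContents.
fetchNumbers :: IO [Int]
fetchNumbers = return [3, 7, 2, 1]

-- Written with do notation.
withDo :: IO [Int]
withDo = do
    xs <- fetchNumbers
    let sorted = sort xs
    return (reverse sorted)

-- The same action with the do block rewritten by hand into >>= and let ... in.
withBind :: IO [Int]
withBind =
    fetchNumbers >>= \xs ->
        let sorted = sort xs
        in return (reverse sorted)
```

Each "x <- action" line becomes "action >>= \x -> ...", and each let line becomes a let ... in wrapped around the rest of the block.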
I'll try to start with a simplified example. Let's say this is what we want to do:
Open a file which contains a list of integers and return it.
Sort this list
Let's also reverse the list
Print the result on the screen
Let's also say that we have these functions that we can use:
getContent :: IO [Int]
sort :: [Int] -> [Int]
reverse :: [Int] -> [Int]
show :: Show a => a -> String
putStrLn :: String -> IO ()
Just so we are clear, I'll have a word about these functions:
getContent: I made up this function, but if there were such a function, that would be its signature (you can use getContent = return [3,7,2,1] for testing purposes). I'm sure you've seen such a signature before and at least vaguely understand that, since it does IO, its signature cannot be just getContent :: [Int].
sort: It's a function defined in Data.List module, usage is simple: sort [3,1,2] returns [1,2,3]
reverse: Also defined in Data.List module: reverse [1,3,2] returns [2,3,1]
show: don't need to import anything, just use it: show 11 returns the string "11"; show [1,2,3] returns the string "[1,2,3]", etc.
putStrLn: takes a string, puts it on the screen and returns IO (). Again, since it does IO, its signature cannot be just putStrLn :: String -> ().
OK, now we have all we need to create our program, the problem now is about connecting these functions together. Let's start with connecting functions:
getContent :: IO [Int] with sort :: [Int] -> [Int]
I think if you get this part, you'll easily get the rest as well. So, the problem is that since getContent returns IO [Int] and not just [Int], you can't just ignore or get rid of the IO part and shove it into sort. That is, this is what you can not do to connect these functions:
sort (getRidOfIO getContent)
Here is where the (>>=) :: m a -> (a -> m b) -> m b operation comes to the rescue. Notice that m, a and b are type variables, so if we substitute IO for m, [Int] for a and [Int] for b, we get the signature:
(>>=) :: IO [Int] -> ([Int] -> IO [Int]) -> IO [Int]
Have a look again at the getContent and sort functions and their signatures, and try to think about how they'll fit into >>=. I'm sure you'll notice that you can use getContent directly as the first argument to >>=. What >>= will do is take the [Int] out of getContent and shove it into the function provided as the second argument. But what will the function in the second argument be? We can't use sort :: [Int] -> [Int] directly; the next best thing we can try is
\listOfInts -> sort listOfInts
but that still has signature [Int] -> [Int], so that did not help much. Here is where the other hero comes into play:
return :: a -> m a
Again, a and m are type variables; let's substitute them and we will get
return :: [Int] -> IO [Int]
so adding \listOfInts -> sort listOfInts and return together we will get:
(\listOfInts -> return $ sort listOfInts) :: [Int] -> IO [Int]
which is exactly what we want to put as the second argument to >>=. So let's finally connect getContent and sort using our glue:
getContent >>= (\listOfInts -> return $ sort listOfInts)
which is the same thing as (using the do notation):
do listOfInts <- getContent
   return $ sort listOfInts
There, that is the end of the most terrifying part. And now comes possibly one of the aha moments: try to think about what the result type of the connection we just made is. I'll spoil it for you: the type of
getContent >>= (\listOfInts -> return $ sort listOfInts) is IO [Int] again.
Let's summarize: we took something of type IO [Int] and something of type [Int] -> [Int], glued those two things together and again got something of type IO [Int]!
Now go ahead and try exactly the same thing: Take the IO [Int] object we have just created and glue it together (using >>= and return) with reverse :: [Int] -> [Int].
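If you want to check your attempt at that exercise, one possible way to write it is sketched below. The names sorted and sortedThenReversed are made up for illustration, and getContent is the stand-in suggested earlier for testing.

```haskell
import Data.List (sort)

-- Stand-in for the made-up getContent from the text.
getContent :: IO [Int]
getContent = return [3, 7, 2, 1]

-- The connection built in the text: IO [Int] glued to sort.
sorted :: IO [Int]
sorted = getContent >>= (\listOfInts -> return $ sort listOfInts)

-- The same glue again, this time with reverse.
sortedThenReversed :: IO [Int]
sortedThenReversed = sorted >>= (\listOfInts -> return $ reverse listOfInts)

-- putStrLn already returns IO (), so no return is needed for the last step.
main :: IO ()
main = sortedThenReversed >>= (\listOfInts -> putStrLn $ show listOfInts)
-- prints [7,3,2,1]
```

Each step has type IO [Int] again, which is why the same glue can be applied as many times as needed.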
I think I wrote way too much, but let me know if anything was not clear or if you need help with the rest.
What I've described so far can look something like this:
import Data.List (sort)
getContent :: IO [Int]
getContent = return [5,2,1,7]
main :: IO ()
main = do
  listOfInts <- getContent
  return $ sort listOfInts -- the sorted list is computed here but not used yet
  return () -- This is only to satisfy the signature of main
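The program above sorts the list but never uses the result. A possible completion that covers all four steps from the start of this answer is sketched here; process is a hypothetical helper name, and a plain function like it can be applied inside a do block without any return.

```haskell
import Data.List (sort)

getContent :: IO [Int]
getContent = return [5, 2, 1, 7]

-- The pure part of the program: sort, then reverse (hypothetical helper).
process :: [Int] -> [Int]
process = reverse . sort

main :: IO ()
main = do
  listOfInts <- getContent              -- effectful: obtain the list
  putStrLn $ show (process listOfInts)  -- effectful: print the pure result
-- prints [7,5,2,1]
```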
If it is a question of reading from stdin and writing a result to stdout, with no further intervening user input (as your mention of getContents suggests), then the ancient interact :: (String -> String) -> IO (), or one of its other versions, e.g. Data.ByteString.interact :: (ByteString -> ByteString) -> IO () or Data.Text.IO.interact :: (Text -> Text) -> IO (), is all that is needed. interact is basically the 'make a little unix tool out of this function' function: it maps pure functions of the right type to executable actions (i.e. values of the type IO ()). All Haskell tutorials should mention it on the third or fourth page, with instructions on compilation.
So if you write
main = interact arthur
arthur :: String -> String
arthur = reverse
and compile with ghc --make -O2 Reverse.hs -o reverse then whatever you pipe to ./reverse will be understood as a list of characters and emerge reversed. Similarly, whatever you pipe to
main = interact (unlines . meredith . lines)
meredith :: [String] -> [String]
meredith = filter (not.null)
will emerge with the empty lines omitted. More interestingly,
main = interact ( unlines . map show . luther . map read . lines)
luther :: [Int] -> [Int]
luther = filter even
will take a stream of characters separated by newlines, read them as Ints, remove the odd ones, and yield the suitably filtered stream.
main = interact ((++ "\n") . show . emma . map read . lines)
emma :: [Int] -> Int
emma = sum . map square
  where square x = x * x
will print the sum of the squares of the newline-separated numerals.
In these last two cases, luther and emma, the internal 'data structure' is [Int], which is pretty dull, and the function applied to it is idiot simple, of course. The main point is to let one of the forms of interact take care of all of the IO, and thus get images like 'populating a structure' and 'processing it' out of your head. To use interact you need to use composition to make the whole yield some sort of String -> String function. But even here, as in the runt first example arthur :: String -> String, you are defining a genuine function in something more like the mathematical sense. Values in the types String and ByteString are just as pure as those in Bool or Int.
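Because these are ordinary functions, they can be tried on literal strings without touching stdin at all. The sketch below does that; lutherText and emmaText are names invented here for the String -> String functions that would be handed to interact.

```haskell
luther :: [Int] -> [Int]
luther = filter even

emma :: [Int] -> Int
emma = sum . map square
  where square x = x * x

-- The String -> String function that would be passed to interact.
lutherText :: String -> String
lutherText = unlines . map show . luther . map read . lines

-- emma yields a single Int, so its result goes through show, not map show.
emmaText :: String -> String
emmaText = (++ "\n") . show . emma . map read . lines

main :: IO ()
main = do
  putStr (lutherText "1\n2\n3\n4\n")  -- prints 2 and 4 on separate lines
  putStr (emmaText "1\n2\n3\n")       -- prints 14
```

Swapping main for interact lutherText or interact emmaText turns either one back into the little command-line tool.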
In more complicated cases of this basic interact type, your task is thus, first, to think how the desired pure values of the function you will be focussing on can be mapped to String values (here, it's just show for an Int or unlines . map show for a [Int]); interact knows what to "do" with the string. Second, you need to figure out how to define a pure mapping from String or ByteString (which will contain your 'raw' data) to values in the type or types your principal function takes as arguments. Here I was just using map read . lines, resulting in a [Int]. If you are working on some more complicated structure, say a tree, you'd need a function from [Int] to MyTree Int. A more elaborate function to put in this position would be a Parser, of course.
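To make the tree case concrete, here is a minimal sketch under stated assumptions: MyTree is taken to be a plain binary search tree, and its constructors, along with insert, fromList, toList and treeText, are all invented for this illustration rather than taken from any library.

```haskell
-- A hypothetical binary search tree; the constructors are assumptions.
data MyTree a = Leaf | Node (MyTree a) a (MyTree a)

-- Insert a value, ignoring duplicates.
insert :: Ord a => a -> MyTree a -> MyTree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r
  | x > v     = Node l v (insert x r)
  | otherwise = t

-- The pure mapping from [Int] (or any ordered type) into the structure.
fromList :: Ord a => [a] -> MyTree a
fromList = foldr insert Leaf

-- In-order traversal: yields the elements in ascending order.
toList :: MyTree a -> [a]
toList Leaf         = []
toList (Node l v r) = toList l ++ [v] ++ toList r

-- Raw text in, tree in the middle, text out.
treeText :: String -> String
treeText = unlines . map show . toList . fromList . map readInt . lines
  where readInt = read :: String -> Int

main :: IO ()
main = interact treeText
```

For example, treeText "3\n1\n2\n3\n" evaluates to "1\n2\n3\n": the numbers come back sorted and without the duplicate, and no part of the tree code mentions IO.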
Then you can go to town, in this sort of case: there is really no reason to think of yourself as 'programming', 'populating' and 'processing' at all. This is where all the cool devices of LYAH kick in. Your duty is to define a mapping within the specific definitional discipline. In the last two cases, these are from [Int] to [Int] and from [Int] to Int, but here is a similar example derived from the excellent, still incomplete, tutorial on the super-excellent Vector package, where the initial numerical structure one is dealing with is Vector Int:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector.Unboxed as U
import System.Environment
main = L.interact (L.pack . (++"\n") . show . roman . parse)
  where
    parse :: L.ByteString -> U.Vector Int
    parse bytestr = U.unfoldr step bytestr
    step !s = case L.readInt s of
      Nothing -> Nothing
      -- L.drop 1 skips the separator; unlike L.tail it is safe if no newline follows the last number
      Just (!k, !t) -> Just (k, L.drop 1 t)
    -- now the IO and stringy nonsense is out of the way
    -- so we can calculate properly:
    roman :: U.Vector Int -> Int
    roman = U.sum
Here again roman is moronic; any function from a Vector of Ints to an Int, however complex, can take its place. Writing a better roman will never be a question of "populating", "multi-line programming", "processing" etc., though of course we speak this way; it is just a question of defining a genuine function by composition of the functions in Data.Vector and elsewhere. The sky is the limit; check out that tutorial too.