I am looking at a simple IO program from the Haskell Wikibook. The construction presented on that page works just fine, but I'm trying to understand "how".
The writeChar function below takes a filepath (as a string) and a character, and it writes the character to the file at the given path. The function uses a bracket to ensure that the file opens and closes properly. Of the three computations run in the bracket, the "computation to run in-between"---as I understand it---is a lambda function that returns the result of hPutChar h c.
Now, hPutChar itself has a declaration of hPutChar :: Handle -> Char -> IO (). This is where I'm lost. I seem to be passing h as the handle to hPutChar. I would expect a handle somehow to reference the file opened as fp, but instead it appears to be recursively calling the lambda function \h. I don't see how this lambda function calling itself recursively knows to write c to the file at fp.
I would like to understand why the last line of this function shouldn't read (\h -> hPutChar fp c). Attempting to run it that way results in "Couldn't match type ‘[Char]’ with ‘Handle’" which I consider sensible given that hPutChar expects a Handle datatype as opposed to a string.
import Control.Exception
writeChar :: FilePath -> Char -> IO ()
writeChar fp c =
bracket
(openFile fp WriteMode)
hClose
(\h -> hPutChar h c)
Let's have a look at the type of bracket (quoted as it appears in your Haskell Wiki link):
bracket :: IO a -- computation to run first ("acquire resource")
-> (a -> IO b) -- computation to run last ("release resource")
-> (a -> IO c) -- computation to run in-between
-> IO c
In your use case, the first argument, openFile fp WriteMode is an IO Handle value, a computation that produces a handle corresponding to the fp path. The third argument, \h -> hPutChar h c, is a function that takes a handle and returns a computation that writes to it. The idea is that the function you pass as the third argument specifies how the resource produced by the first argument will be used.
There is no recursion going on here. h is indeed a Handle. If you've programmed in C, the rough equivalent is a FILE. The handle consists of a file descriptor, buffers, and whatever else is needed to perform I/O on the attached file/pipe/terminal/whatever. openFile takes a path, opens the requested file (or device), and provides a handle that you can use to manipulate the requested file.
bracket
(openFile fp WriteMode)
hClose
(\h -> hPutChar h c)
This opens the file to produce a handle. That handle is passed to the third function, which binds it to h and passes it to hPutChar to output a character. Then in the end, bracket passes the handle to hClose to close the file.
If exceptions didn't exist, you could implement bracket like this:
bracket
:: IO resource
-> (resource -> IO x)
-> (resource -> IO a)
-> IO a
bracket first last middle = do
resource <- first
result <- middle resource
last resource
pure result
But bracket actually has to install an exception handler to endure that last is called even if an exception occurs.
hPutChar :: Handle -> Char -> IO ()
is a pure Haskell function, which, given two arguments h :: Handle and c :: Char, produces a pure Haskell value of type IO (), an "IO action":
h :: Handle c :: Char
---------------------------------------------
hPutChr h c :: IO ()
This "action" is simply a Haskell value, but when it appears inside the IO monad do block under main, it becomes executed by the Haskell run-time system and then it actually performs the I/O operation of putting a character c into a filesystem's entity referred to by the handle h.
As for the lambda function, the actual unambiguous syntax is
(\ h -> ... )
where the white space between \ and h is optional, and the whole (.......) expression is the lambda expression. So there is no "\h entity":
(\ ... -> ... ) is the lambda-expression syntax;
h in \ h -> is the lambda function's parameter,
... in (\ h -> ... ) is the lambda function's body.
bracket calls the (\h -> hPutChar h c) lambda function with the result produced by (openFile fp WriteMode) I/O computation, which is the handle of the file name referred to by fp, opened according to the mode WriteMode.
Main thing to understand about the Haskell monadic IO is that "computation" is not a function: it is the actual (here, I/O) computation performing the actual file open -- which produces the handle -- which is then used by the run-time system to call the pure Haskell function (\ h -> ...) with it.
This layering (of pure Haskell "world" and the impure I/O "world") is the essence of .... yes, Monad. An I/O computation does something, finds some value, uses it to call a pure Haskell function, which creates a new computation, which is then run, feeds its results into the next pure function, etc. etc.
So we keep our purity in Haskell by only talking about the impure stuff. Talking is not doing.
Or is it?
Related
In The acquire-use-release cycle section from Real World Haskell, the type of bracket is shown:
ghci> :type bracket
bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
Now, from the description of bracket, I understand that an exception might be thrown while the function of type a -> IO c is running. With reference to the book, this exception is caught by the calling function, via handle:
getFileSize path = handle (\_ -> return Nothing) $
bracket (openFile path ReadMode) hClose $ \h -> do
size <- hFileSize h
return (Just size)
I can't help but thinking that when the exception does occur from within bracket's 3rd argument, bracket is not returning an IO c.
How does this go well with purity?
I think the answer might be exactly this, but I'm not sure.
I can't help but thinking that when the exception does occur from within bracket's 3rd argument, bracket is not returning an IO c.
Prelude> fail "gotcha" :: IO Bool
*** Exception: user error (gotcha)
As you note, no Bool (respectively, c) value is produced. That's ok, because the action does not conclude – instead it re-raises the exception. That exception might then either crash the program, or it might be caught again somewhere else in calling code – importantly, whoever catches it will a) not get the result value (“the c”), you never do that in case of an exception; b) doesn't need to worry about closing the file handle, because that has already been done by bracket.
I'm new to Haskell and I'm writing a compiler from JSON to some other format. I don't understand the difference between binding an expression to a variable versus passing it inline, here's two examples (inside a do block) which I expect to work the same, but instead the second one confuses the typechecker and fails:
Works as expected
modul <- parseModule <$> readFile "foo.json"
case (modul) of
(Left s) -> error s
(Right m) -> what (m :: Module) `shouldBe` "module"
Fails typechecking
case (parseModule <$> readFile "foo.json") of
(Left s) -> error s
(Right m) -> what (m :: Module) `shouldBe` "module"
Error:
• Couldn't match type ‘Either a0’ with ‘IO’
Expected type: IO (Either String Module)
Actual type: Either a0 (Either String Module)
parseModule <$> readFile "foo.json" is not a value that you could pattern-match on. That value wouldn't be purely functional, because it depends on a real-world resource (the file).
Instead, it is an IO action which is able to obtain the value when executed (it then takes a “snapshot of the world state, frozen into a pure value”).
Really, there's only one way to execute IO actions: assign them to the name main, compile and run. That may sound crazy prohibitive – you can only have one single action in a program? – but actually it's not, because you can combine multiple actions (to be executed sequentially) into a single action and execute that. This composing of actions is accomplished through the concept of monads. In your case, you basically want to compose the action of type IO (Either String Module) with a function of type Either String Module -> IO (). That's what the >>= operator is for.
main :: IO ()
main = parseModule <$> readFile "foo.json" >>= f
where f modul = case modul of ...
...or in lambda-style,
main = parseModule <$> readFile "foo.json"
>>= \modul -> case modul of
Left s -> error s
Right m -> what m `shouldBe` "module"
Because that kind of binding is needed so often, Haskell has the special do syntax for it, which you've already used in the first snippet. It is simply syntactic sugar for this >>= operator chain, but that monadic bind is essential.
Consider the code below taken from a working example I've built to help me learn Haskell. This code parses a CSV file containing stock quotes downloaded from Yahoo into a nice simple list of bars with which I can then work.
My question: how can I write a function that will take a file name as its parameter and return an OHLCBarList so that the first four lines inside main can be properly encapsulated?
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
I've tried to do this myself but with my limited Haskell knowledge, I'm failing miserably.
import qualified Data.ByteString as BS
type Filename = String
getContentsOfFile :: Filename -> IO BS.ByteString
barParser :: Parser Bar
barParser = do
time <- timeParser
char ','
open <- double
char ','
high <- double
char ','
low <- double
char ','
close <- double
char ','
volume <- decimal
char ','
return $ Bar Bar1Day time open high low close volume
type OHLCBar = (UTCTime, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]
barsToBarList :: [Either String Bar] -> OHLCBarList
main :: IO ()
main = do
contents :: C.ByteString <- getContentsOfFile "PriceData/Daily/yhoo1.csv" --PriceData/Daily/Yhoo.csv"
let lineList :: [C.ByteString] = C.lines contents -- Break the contents into a list of lines
let bars :: [Either String Bar] = map (parseOnly barParser) lineList -- Using the attoparsec
let ohlcBarList :: OHLCBarList = barsToBarList bars -- Now I have a nice simple list of tuples with which to work
--- Now I can do simple operations like
print $ ohlcBarList !! 0
If you really want your function to have type Filename -> OHLCBarList, it can't be done.* Reading the contents of a file is an IO operation, and Haskell's IO monad is specifically designed so that values in the IO monad can never leave. If this restriction were broken, it would (in general) mess with a lot of things. Instead of doing this, you have two options: make the type of getBarsFromFile be Filename -> IO OHLCBarList — thus essentially copying the first four lines of main — or write a function with type C.ByteString -> OHLCBarList that the output of getContentsOfFile can be piped through to encapsulate lines 2 through 4 of main.
* Technically, it can be done, but you really, really, really shouldn't even try, especially if you're new to Haskell.
Others have explained that the correct type of your function has to be Filename -> IO OHLCBarList, I'd like to try and give you some insight as to why the compiler imposes this draconian measure on you.
Imperative programming is all about managing state: "do certain operations to certain bits of memory in sequence". When they grow large, procedural programs become brittle; we need a way of limiting the scope of state changes. OO programs encapsulate state in classes but the paradigm is not fundamentally different: you can call the same method twice and get different results. The output of the method depends on the (hidden) state of the object.
Functional programming goes all the way and bans mutable state entirely. A Haskell function, when called with certain inputs, will always produce the same output. Simple examples of
pure functions are mathematical operators like + and *, or most of the list-processing functions like map. Pure functions are all about the inputs and outputs, not managing internal state.
This allows the compiler to be very smart in optimising your program (for example, it can safely collapse duplicated code for you), and helps the programmer not to make mistakes: you can't put the system in an invalid state if there is none! We like pure functions.
The exception to the rule is IO. Code that performs IO is impure by definition: you could call getLine a hundred times and never get the same result, because it depends on what the user typed. Haskell handles this using the type system: all impure functions are marred with the IO type. IO can be thought of as a dependency on the state of the real world, sort of like World -> (NewWorld, a)
To summarise: pure functions are good because they are easy to reason about; this is why Haskell makes functions pure by default. Any impure code has to be labelled as such with an IO type signature; this tells the compiler and the reader to be careful with this function. So your function which reads from a file (a fundamentally impure action) but returns a pure value can't exist.
Addendum in response to your comment
You can still write pure functions to operate on data that was obtained impurely. Consider the following straw-man:
main :: IO ()
main = do
putStrLn "Enter the numbers you want me to process, separated by spaces"
line <- getLine
let numberStrings = words line
let numbers = map read numberStrings
putStrLn $ "The result of the calculation is " ++ (show $ foldr1 (*) numbers + 10)
Lots of code inside IO here. Let's extract some functions:
main :: IO ()
main = do
putStrLn "Enter the numbers you want me to process, separated by spaces"
result <- fmap processLine getLine -- fmap :: (a -> b) -> IO a -> IO b
-- runs an impure result through a pure function
-- without leaving IO
putStrLn $ "The result of the calculation is " ++ result
processLine :: String -> String -- look ma, no IO!
processLine = show . calculate . readNumbers
readNumbers :: String -> [Int]
readNumbers = map read . words
calculate :: [Int] -> Int
calculate numbers = product numbers + 10
product :: [Int] -> Int
product = foldr1 (*)
I've pulled logic out of main into pure functions which are easier to read, easier for the compiler to optimise, and more reusable (and so more testable). The program as a whole still lives inside IO because the data is obtained impurely (see the last part of this answer for a more thorough treatment of this argument). Impure data can be piped through pure functions using fmap and other combinators; you should try to put as little logic in main as possible.
Your code does seem to be most of the way there; as others have suggested you could extract lines 2-4 of your main into another function.
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
You cannot do this without getting all sorts of errors about IO stuff because this type for getBarsFromFile misses an IO. Probably that's what the errors about IO stuff are trying to tell you. Did you try understanding and fixing the errors?
In your situation, I would start by abstracting over the second to fourth line of your main in a function:
parseBars :: ByteString -> OHLCBarList
And then I would combine this function with getContentsOfFile to get:
getBarsFromFile :: FilePath -> IO OHLCBarList
This I would call in main.
I' ve got a problem with Haskell. I have text file looking like this:
5.
7.
[(1,2,3),(4,5,6),(7,8,9),(10,11,12)].
I haven't any idea how can I get the first 2 numbers (2 and 7 above) and the list from the last line. There are dots on the end of each line.
I tried to build a parser, but function called 'readFile' return the Monad called IO String. I don't know how can I get information from that type of string.
I prefer work on a array of chars. Maybe there is a function which can convert from 'IO String' to [Char]?
I think you have a fundamental misunderstanding about IO in Haskell. Particularly, you say this:
Maybe there is a function which can convert from 'IO String' to [Char]?
No, there isn't1, and the fact that there is no such function is one of the most important things about Haskell.
Haskell is a very principled language. It tries to maintain a distinction between "pure" functions (which don't have any side-effects, and always return the same result when give the same input) and "impure" functions (which have side effects like reading from files, printing to the screen, writing to disk etc). The rules are:
You can use a pure function anywhere (in other pure functions, or in impure functions)
You can only use impure functions inside other impure functions.
The way that code is marked as pure or impure is using the type system. When you see a function signature like
digitToInt :: String -> Int
you know that this function is pure. If you give it a String it will return an Int and moreover it will always return the same Int if you give it the same String. On the other hand, a function signature like
getLine :: IO String
is impure, because the return type of String is marked with IO. Obviously getLine (which reads a line of user input) will not always return the same String, because it depends on what the user types in. You can't use this function in pure code, because adding even the smallest bit of impurity will pollute the pure code. Once you go IO you can never go back.
You can think of IO as a wrapper. When you see a particular type, for example, x :: IO String, you should interpret that to mean "x is an action that, when performed, does some arbitrary I/O and then returns something of type String" (note that in Haskell, String and [Char] are exactly the same thing).
So how do you ever get access to the values from an IO action? Fortunately, the type of the function main is IO () (it's an action that does some I/O and returns (), which is the same as returning nothing). So you can always use your IO functions inside main. When you execute a Haskell program, what you are doing is running the main function, which causes all the I/O in the program definition to actually be executed - for example, you can read and write from files, ask the user for input, write to stdout etc etc.
You can think of structuring a Haskell program like this:
All code that does I/O gets the IO tag (basically, you put it in a do block)
Code that doesn't need to perform I/O doesn't need to be in a do block - these are the "pure" functions.
Your main function sequences together the I/O actions you've defined in an order that makes the program do what you want it to do (interspersed with the pure functions wherever you like).
When you run main, you cause all of those I/O actions to be executed.
So, given all that, how do you write your program? Well, the function
readFile :: FilePath -> IO String
reads a file as a String. So we can use that to get the contents of the file. The function
lines:: String -> [String]
splits a String on newlines, so now you have a list of Strings, each corresponding to one line of the file. The function
init :: [a] -> [a]
Drops the last element from a list (this will get rid of the final . on each line). The function
read :: (Read a) => String -> a
takes a String and turns it into an arbitrary Haskell data type, such as Int or Bool. Combining these functions sensibly will give you your program.
Note that the only time you actually need to do any I/O is when you are reading the file. Therefore that is the only part of the program that needs to use the IO tag. The rest of the program can be written "purely".
It sounds like what you need is the article The IO Monad For People Who Simply Don't Care, which should explain a lot of your questions. Don't be scared by the term "monad" - you don't need to understand what a monad is to write Haskell programs (notice that this paragraph is the only one in my answer that uses the word "monad", although admittedly I have used it four times now...)
Here's the program that (I think) you want to write
run :: IO (Int, Int, [(Int,Int,Int)])
run = do
contents <- readFile "text.txt" -- use '<-' here so that 'contents' is a String
let [a,b,c] = lines contents -- split on newlines
let firstLine = read (init a) -- 'init' drops the trailing period
let secondLine = read (init b)
let thirdLine = read (init c) -- this reads a list of Int-tuples
return (firstLine, secondLine, thirdLine)
To answer npfedwards comment about applying lines to the output of readFile text.txt, you need to realize that readFile text.txt gives you an IO String, and it's only when you bind it to a variable (using contents <-) that you get access to the underlying String, so that you can apply lines to it.
Remember: once you go IO, you never go back.
1 I am deliberately ignoring unsafePerformIO because, as implied by the name, it is very unsafe! Don't ever use it unless you really know what you are doing.
As a programming noob, I too was confused by IOs. Just remember that if you go IO you never come out. Chris wrote a great explanation on why. I just thought it might help to give some examples on how to use IO String in a monad. I'll use getLine which reads user input and returns an IO String.
line <- getLine
All this does is bind the user input from getLine to a value named line. If you type this this in ghci, and type :type line it will return:
:type line
line :: String
But wait! getLine returns an IO String
:type getLine
getLine :: IO String
So what happened to the IOness from getLine? <- is what happened. <- is your IO friend. It allows you to bring out the value that is tainted by the IO within a monad and use it with your normal functions. Monads are easily identified because they begin with do. Like so:
main = do
putStrLn "How much do you love Haskell?"
amount <- getLine
putStrln ("You love Haskell this much: " ++ amount)
If you're like me, you'll soon discover that liftIO is your next best monad friend, and that $ help reduce the number of parenthesis you need to write.
So how do you get the information from readFile? Well if readFile's output is IO String like so:
:type readFile
readFile :: FilePath -> IO String
Then all you need is your friendly <-:
yourdata <- readFile "samplefile.txt"
Now if type that in ghci and check the type of yourdata you'll notice it's a simple String.
:type yourdata
text :: String
As people already say, if you have two functions, one is readStringFromFile :: FilePath -> IO String, and another is doTheRightThingWithString :: String -> Something, then you don't really need to escape a string from IO, since you can combine this two functions in various ways:
With fmap for IO (IO is Functor):
fmap doTheRightThingWithString readStringFromFile
With (<$>) for IO (IO is Applicative and (<$>) == fmap):
import Control.Applicative
...
doTheRightThingWithString <$> readStringFromFile
With liftM for IO (liftM == fmap):
import Control.Monad
...
liftM doTheRightThingWithString readStringFromFile
With (>>=) for IO (IO is Monad, fmap == (<$>) == liftM == \f m -> m >>= return . f):
readStringFromFile >>= \string -> return (doTheRightThingWithString string)
readStringFromFile >>= \string -> return $ doTheRightThingWithString string
readStringFromFile >>= return . doTheRightThingWithString
return . doTheRightThingWithString =<< readStringFromFile
With do notation:
do
...
string <- readStringFromFile
-- ^ you escape String from IO but only inside this do-block
let result = doTheRightThingWithString string
...
return result
Every time you will get IO Something.
Why you would want to do it like that? Well, with this you will have pure and
referentially transparent programs (functions) in your language. This means that every function which type is IO-free is pure and referentially transparent, so that for the same arguments it will returns the same values. For example, doTheRightThingWithString would return the same Something for the same String. However readStringFromFile which is not IO-free can return different strings every time (because file can change), so that you can't escape such unpure value from IO.
If you have a parser of this type:
myParser :: String -> Foo
and you read the file using
readFile "thisfile.txt"
then you can read and parse the file using
fmap myParser (readFile "thisfile.txt")
The result of that will have type IO Foo.
The fmap means myParser runs "inside" the IO.
Another way to think of it is that whereas myParser :: String -> Foo, fmap myParser :: IO String -> IO Foo.
Following a haskell tutorial, the author provides the following implementation of the withFile method:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
handle <- openFile path mode
result <- f handle
hClose handle
return result
But why do we need to wrap the result in a return? Doesn't the supplied function f already return an IO as can be seen by it's type Handle -> IO a?
You're right: f already returns an IO, so if the function were written like this:
withFile' path mode f = do
handle <- openFile path mode
f handle
there would be no need for a return. The problem is hClose handle comes in between, so we have to store the result first:
result <- f handle
and doing <- gets rid of the IO. So return puts it back.
This is one of the tricky little things that confused me when I first tried Haskell. You're misunderstanding the meaning of the <- construct in do-notation. result <- f handle doesn't mean "assign the value of f handle to result"; it means "bind result to a value 'extracted' from the monadic value of f handle" (where the 'extraction' happens in some way that's defined by the particular Monad instance that you're using, in this case the IO monad).
I.e., for some Monad typeclass m, the <- statement takes an expression of type m a in the right hand side and a variable of type a on the left hand side, and binds the variable to a value. Thus in your particular example, with result <- f handle, we have the types f result :: IO a, result :: a and return result :: IO a.
PS do-notation has also a special form of let (without the in keyword in this case!) that works as a pure counterpart to <-. So you could rewrite your example as:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
handle <- openFile path mode
let result = f handle
hClose handle
result
In this case, because the let is a straightforward assignment, the type of result is IO a.