I would like to parse an infinite stream of bytes into an infinite stream of Haskell data. Each byte is read from the network, thus they are wrapped into IO monad.
More concretely I have an infinite stream of type [IO(ByteString)]. On the other hand I have a pure parsing function parse :: [ByteString] -> [Object] (where Object is a Haskell data type)
Is there a way to plug my infinite stream of monad into my parsing function ?
For instance, is it possible to write a function of type [IO(ByteString)] -> IO [ByteString] in order for me to use my function parse in a monad?
The Problem
Generally speaking, in order for IO actions to be properly ordered and behave predictably, each action needs to complete fully before the next action is run. In a do-block, this means that this works:
main = do
sequence (map putStrLn ["This","action","will","complete"])
putStrLn "before we get here"
but unfortunately this won't work, if that final IO action was important:
dontRunMe = do
putStrLn "This is a problem when an action is"
sequence (repeat (putStrLn "infinite"))
putStrLn "<not printed>"
So, even though sequence can be specialized to the right type signature:
sequence :: [IO a] -> IO [a]
it doesn't work as expected on an infinite list of IO actions. You'll have no problem defining such a sequence:
badSeq :: IO [Char]
badSeq = sequence (repeat (return '+'))
but any attempt to execute the IO action (e.g., by trying to print the head of the resulting list) will hang:
main = (head <$> badSeq) >>= print
It doesn't matter if you only need a part of the result. You won't get anything out of the IO monad until the entire sequence is done (so "never" if the list is infinite).
The "Lazy IO" Solution
If you want to get data from a partially completed IO action, you need to be explicit about it and make use of a scary-sounding Haskell escape hatch, unsafeInterleaveIO. This function takes an IO action and "defers" it so that it won't actually execute until the value is demanded.
The reason this is unsafe in general is that an IO action that makes sense now, might mean something different if actually executed at a later time point. As a simple example, an IO action that truncates/removes a file has a very different effect if it's executed before versus after updated file contents are written!
Anyway, what you'd want to do here is write a lazy version of sequence:
import System.IO.Unsafe (unsafeInterleaveIO)
lazySequence :: [IO a] -> IO [a]
lazySequence [] = return [] -- oops, not infinite after all
lazySequence (m:ms) = do
x <- m
xs <- unsafeInterleaveIO (lazySequence ms)
return (x:xs)
The key point here is that, when a lazySequence infstream action is executed, it will actually execute only the first action; the remaining actions will be wrapped up in a deferred IO action that won't truly execute until the second and subsequent elements of the returned list are demanded.
This works for fake IO actions:
> take 5 <$> lazySequence (repeat (return ('+'))
"+++++"
>
(where if you replaced lazySequence with sequence, it would hang). It also works for real IO actions:
> lns <- lazySequence (repeat getLine)
<waits for first line of input, then returns to prompt>
> print (head lns)
<prints whatever you entered>
> length (head (tail lns)) -- force next element
<waits for second line of input>
<then shows length of your second line before prompt>
>
Anyway, with this definition of lazySequence and types:
parse :: [ByteString] -> [Object]
input :: [IO ByteString]
you should have no trouble writing:
outputs :: IO [Object]
outputs = parse <$> lazySequence inputs
and then using it lazily however you want:
main = do
objs <- outputs
mapM_ doSomethingWithObj objs
Using Conduit
Even though the above lazy IO mechanism is pretty simple and straightforward, lazy IO has fallen out of favor for production code due to issues with resource management, fragility with respect to space leaks (where a small change to your code blows up the memory footprint), and problems with exception handling.
One solution is the conduit library. Another is pipes. Both are carefully designed streaming libraries that can support infinite streams.
For conduit, if you had a parse function that created one object per byte string, like:
parse1 :: ByteString -> Object
parse1 = ...
then given:
inputs :: [IO ByteString]
inputs = ...
useObject :: Object -> IO ()
useObject = ...
the conduit would look something like:
import Conduit
main :: IO ()
main = runConduit $ mapM_ yieldM inputs
.| mapC parse1
.| mapM_C useObject
Given that your parse function has signature:
parse :: [ByteString] -> [Object]
I'm pretty sure you can't integrate this with conduit directly (or at least not in any way that wouldn't toss out all the benefits of using conduit). You'd need to rewrite it to be conduit friendly in how it consumed byte strings and produced objects.
Related
I'm trying to write code in source -> transform -> sink style, for example:
let (|>) = flip ($)
repeat 1 |> take 5 |> sum |> print
But would like to do that using IO. I have this impression that my source can be an infinite list of IO actions, and each one gets evaluated once it is needed downstream. Something like this:
-- prints the number of lines entered before "quit" is entered
[getLine..] >>= takeWhile (/= "quit") >>= length >>= print
I think this is possible with the streaming libraries, but can it be done along the lines of what I'm proposing?
Using the repeatM, takeWhile and length_ functions from the streaming library:
import Streaming
import qualified Streaming.Prelude as S
count :: IO ()
count = do r <- S.length_ . S.takeWhile (/= "quit") . S.repeatM $ getLine
print r
This seems to be in that spirit:
let (|>) = flip ($)
let (.>) = flip (.)
getContents >>= lines .> takeWhile (/= "quit") .> length .> print
The issue here is that Monad is not the right abstraction for this, and attempting to do something like this results in a situation where referential transparency is broken.
Firstly, we can do a lazy IO read like so:
module Main where
import System.IO.Unsafe (unsafePerformIO)
import Control.Monad(forM_)
lazyIOSequence :: [IO a] -> IO [a]
lazyIOSequence = pure . go where
go :: [IO a] -> [a]
go (l:ls) = (unsafePerformIO l):(go ls)
main :: IO ()
main = do
l <- lazyIOSequence (repeat getLine)
forM_ l putStrLn
This when run will perform cat. It will read lines and output them. Everything works fine.
But consider changing the main function to this:
main :: IO ()
main = do
l <- lazyIOSequence (map (putStrLn . show) [1..])
putStrLn "Hello World"
This outputs Hello World only, as we didn't need to evaluate any of l. But now consider replacing the last line like the following:
main :: IO ()
main = do
x <- lazyIOSequence (map (putStrLn . show) [1..])
seq (head x) putStrLn "Hello World"
Same program, but the output is now:
1
Hello World
This is bad, we've changed the results of a program just by evaluating a value. This is not supposed to happen in Haskell, when you evaluate something it should just evaluate it, not change the outside world.
So if you restrict your IO actions to something like reading from a file nothing else is reading from, then you might be able to sensibly lazily evaluate things, because when you read from it in relation to all the other IO actions your program is taking doesn't matter. But you don't want to allow this for IO in general, because skipping actions or performing them in a different order can matter (and above, certainly does). Even in the reading a file lazily case, if something else in your program writes to the file, then whether you evaluate that list before or after the write action will affect the output of your program, which again, breaks referential transparency (because evaluation order shouldn't matter).
So for a restricted subset of IO actions, you can sensibly define Functor, Applicative and Monad on a stream type to work in a lazy way, but doing so in the IO Monad in general is a minefield and often just plain incorrect. Instead you want a specialised streaming type, and indeed Conduit defines Functor, Applicative and Monad on a lot of it's types so you can still use all your favourite functions.
consider the following simple IO function:
req :: IO [Integer]
req = do
print "x"
return [1,2,3]
In reality this might be a http request, which returns a list after parsing it's result.
I'm trying to concatenate the results of several calls of that function in a lazy way.
In simple terms, the following should print the 'x' only two times:
fmap (take 4) req'
--> [1, 2, 3, 4]
I thought this might be solved with sequence or mapM, however my approach fails in terms of laziness:
import Control.Monad
req' :: IO [Integer]
req' = fmap concat $ mapM req [1..1000] -- should be infinite..
This yields the right result, however the IO function req is called 1000 times instead of the necessary 2 times. When implementing the above with a map over an infinite list, the evaluation does not terminate at all.
Short version:
You shouldn't do this, look into a streaming IO library such as pipes or conduit instead.
Long version:
You can't. Or at least, you really shouldn't. Allowing lazily evaluated code to have side effects is generally a very bad idea. Not only does it very quickly become hard to reason about wich effects are performed when and how many times, but even worse, effects may not be performed in the order you expect them to! With pure code, this is not a big deal. With side-effecting code, this is a disaster.
Imagine that you want to read a value from a reference and then replace the value with an updated value. In the IO monad, where the order of computation is well defined, this is easy:
main = do
yesterdaysDate <- readIORef ref
writeIORef ref todaysDate
However, if the above code were instead to be evaluated lazily, there would be no guarantee that the reference was read before it was written - or even that both computations would be executed at all. The semantics of the program would depend entirely on if and when we needed the results of the computations. This is one of the reasons for coming up with monads in the first place: to give programmers a way to write code with side effects, which execute in a well-defined and easily understood order.
Now, it is actually possible to lazily concatenate the lists, if you create them using unsafeInterleaveIO:
import System.IO.Unsafe
req :: IO [Integer]
req = unsafeInterleaveIO $ do
print "x"
return [1,2,3]
req' :: IO [Integer]
req' = fmap concat $ mapM (const req) [1..1000]
This will cause each application of req to be deferred until the corresponding sublist is needed. However, lazily performing IO like this may lead to interesting race conditions and resource leaks, and is generally frowned upon. The recommended alternative would be to use a streaming IO library such as conduit or pipes, which are mentioned in the comments.
Here is how you would do something like this with the streaming and pipes libraries. Pipes programs will be somewhat similar those written with conduit especially in this sort of case. conduit uses different names, and pipes and conduit have somewhat fancier types and operators than streaming; but it's really a matter of indifference which you use. streaming is I think fundamentally simpler in this sort of case; the formulation will be structurally similar to the corresponding IO [a] program and indeed frequently simpler. The essential point is that a Stream (Of Integer) IO () is exactly like list of Integers but it is built in that the elements of the list or stream can arise from successive IO actions.
I gave req an argument in the following, since that seemed to be what you had in mind.
import Streaming
import qualified Streaming.Prelude as S
import Streaming.Prelude (for, each)
req :: Integer -> Stream (Of Integer) IO ()
req x = do -- this 'stream' is just a list of Integers arising in IO
liftIO $ putStr "Sending request #" >> print x
each [x..x+2]
req' :: Stream (Of Integer) IO ()
req' = for (S.each [1..]) req -- An infinite succession of requests
-- each yielding three numbers. Here we are not
-- actually using IO to get each but we could.
main = S.print $ S.take 4 req'
-- >>> main
-- Sending request #1
-- 1
-- 2
-- 3
-- Sending request #2
-- 2
To get our four desired values we had to send two "requests"; we of course don't end up applying req to all Integers! S.take doesn't permit any further development of the infinite stream req' it takes as argument; so only the first element from the second request is ever calculated. Then everything shuts down. The fancy signature Stream (Of Int) IO () could be replaced by a synonymn
type List a = Stream (Of a) IO ()
and you would barely notice the difference from Haskell lists, except you don't get the apocalypses you noticed. The extra moveable parts in the actual signature are distracting here, but make it possible to replicate the whole API of Data.List in basically every detail while permitting IO and avoidance of accumulation everywhere. (Without the further moveable parts it is e.g. impossible to write splitAt, partition and chunksOf, and indeed you will find stack overflow is awash with questions how to do these obvious things with e.g. conduit.)
The pipes equivalent is this
import Pipes
import qualified Pipes.Prelude as P
req :: Integer -> Producer Integer IO ()
req x = do
liftIO $ putStr "Sending request #" >> print x
each [x..x+2]
req' = for (each [1..]) req
main = runEffect $ req' >-> P.take 4 >-> P.print
-- >>> main
-- Sending request #1
-- 1
-- 2
-- 3
-- Sending request #2
-- 2
it differs by treating take and print as pipes, rather than as ordinary functions on streams as they are with Data.List. This has charm but is not needed in the present context where the conception of the stream as an effectful list predominates. Intuitively takeing and printing are things we do to a list, even if it is an effectful list as in this case, and the piping and conduiting aspect is a distraction (in bread-and-butter cases it also nearly doubles the time needed for calculation, due to the cost of >-> and .| which is akin to that of say map.)
It might help understanding if we note that req above could have been written
req x = do
liftIO $ putStr "Sending request #" >> print x
yield x -- yield a >> yield b == each [a,b]
yield (x+1)
yield (x+2)
this will be word for word the same in streaming pipes and conduit. yield a >> rest is the same as a:rest The difference is that a yield a line (in a do block) can be preceded by a bit of IO, e.g. a <- liftIO readLn; yield a
In general list mapM replicateM traverse and sequence should be avoided - except for short lists - for the reasons you mention. sequence is at the bottom of them all and it basically has to constitute the whole list before it can proceed. (Note sequence = mapM id; mapM f = sequence . map f) Thus we see
>>> sequence [getChar,getChar,getChar] >>= mapM_ print
abc'a' -- here and below I just type abc, ghci prints 'a' 'b' 'c'
'b'
'c'
but with a streaming library we see stuff like
>>> S.mapM_ print $ S.sequence $ S.each [getChar,getChar,getChar]
a'a'
b'b'
c'c'
Similarly
>>> replicateM 3 getChar >>= mapM_ print
abc'a'
'b'
'c'
is a mess - nothing happens till the whole list is constructed, then each of the collected Chars is printed in succession. But with a streaming library we write the simpler
>>> S.mapM_ print $ S.replicateM 3 getChar
a'a'
b'b'
c'c'
and the outputs are in sync with the inputs. In particular, no more than one character is in memory at a time. replicateM_, mapM_ and sequence_ by contrast don't accumulate lists aren't a problem. It's the others that should prompt one to think of a streaming library, any streaming library. A monad-general sequence can't do any better than this, as you can see by reflecting on
>>> sequence [Just 1, Just 2, Just 3]
Just [1,2,3]
>>> sequence [Just 1, Just 2, Nothing]
Nothing
If the list were a million Maybe Ints long, it would all have to be remembered and left unused while waiting to see if last item is Nothing. Since sequence, mapM, replicateM, traverse and company are monad general, what goes for Maybe goes for IO.
Continuing above, we can similarly collect the list as you seemed to want to do:
main = S.toList_ (S.take 4 req') >>= print
-- >>> main
-- Sending request #1
-- Sending request #2
-- [1,2,3,2]
or, in the pipes version:
main = P.toListM (req' >-> P.take 4) >>= print
-- >>> main
-- Sending request #1
-- Sending request #2
-- [1,2,3,2]
Or to pile on possibilities, we can do IO with each element, while collecting them in a list or vector or whatever
main = do
ls <- S.toList_ $ S.print $ S.copy $ S.take 4 req'
print ls
-- >>> main
-- Sending request #1
-- 1
-- 2
-- 3
-- Sending request #2
-- 2
-- [1,2,3,2]
Here I print the copies and save the 'originals' for a list. The games we are playing here start to come upon the limits of pipes and conduit, though this particular program can be replicated with them.
As far as I know, what you're looking for shouldn't/can't be done using mapM and should probably use some form of streaming. In case it's helpful, an example using io-streams:
import qualified System.IO.Streams as Streams
import qualified System.IO.Streams.Combinators as Streams
req :: IO (Maybe [Integer])
req = do
print "x"
return (Just [1,2,3])
req' :: IO [Integer]
req' = Streams.toList =<< Streams.take 4 =<< Streams.concatLists =<< Streams.makeInputStream req
The working version of your code:
module Foo where
req :: Integer -> IO [Integer]
req _x = do
print "x"
return [1,2,3]
req' :: IO [Integer]
req' = concat <$> mapM req [1..1000]
(Note: I replaced fmap concat with concat <$>.)
When you evalute fmap (take 4) req', the mapM expression's value is needed, which, in turn, needs the value of the [1..1000] list. So, a 1000 element list is generated and mapM applies the req function to each element -- hence, the 1000 'x'-es printed. concat then has to supply a value to the (take 4) section, which produces [1,2,3] repeated 1000 times. Then, and only then, can (take 4) take the first four elements.
All of these computations occur because a value is needed by ghci, if you're at the interpreter's REPL prompt. Otherwise, in an executing program, take 4 is simply stacked in a waiting thunk until its value is actually needed.
Best to think about this as a tree where expressions are pushed onto the root of the tree, replacing the root each time (root becomes a leaf in another expression that needs its value.) When the value at the root of the tree is needed, evaluate from the bottom up.
Now, if you really only wanted req evaluated once and only once because it is truly a constant value, here's the code:
module Foo where
req2 :: IO [Integer]
req2 = do
print "x"
return [1,2,3]
req2' :: IO [Integer]
req2' = concat <$> mapM (const req2) ([1..1000] :: [Integer])
req2' is evaluated only once because it evaluates to a constant (no function parameters guarantees this.) Admittedly, though, that's probably not what you really intended.
This is what the pipes and conduit ecosystems were designed for. Here's an example for pipes.
#!/usr/bin/env stack
--stack runghc --resolver=lts-7.16 --package pipes
module Main where
import Control.Monad (forever)
import Pipes as P
import qualified Pipes.Prelude as P
req :: Producer Int IO ()
req = forever $ do
liftIO $ putStrLn "Making a request."
mapM_ yield [1,2,3]
main :: IO ()
main = P.toListM (req >-> P.take 4) >>= print
Note that normally you don't collapse a result into a list using pipes, but that seems to be your use case.
I have a file number.txt which contains a large number and I read it into an IO String like this:
readNumber = readFile "number.txt" >>= return
In another function I want to create a list of Ints, one Int for each digit…
Lets assume the content of number.txt is:
1234567890
Then I want my function to return [1,2,3,4,5,6,7,8,9,0].
I tried severall versions with map, mapM(_), liftM, and, and, and, but I got several error messages everytime, which I was able to reduce to
Couldn't match expected type `[m0 Char]'
with actual type `IO String'
The last version I have on disk is the following:
module Main where
import Control.Monad
import Data.Char (digitToInt)
main = intify >>= putStrLn . show
readNumber = readFile "number.txt" >>= return
intify = mapM (liftM digitToInt) readNumber
So, as far as I understand the error, I need some function that takes IO [a] and returns [IO a], but I was not able to find such thing with hoogle… Only the other way round seemes to exist
In addition to the other great answers here, it's nice to talk about how to read [IO Char] versus IO [Char]. In particular, you'd call [IO Char] "an (immediate) list of (deferred) IO actions which produce Chars" and IO [Char] "a (deferred) IO action producing a list of Chars".
The important part is the location of "deferred" above---the major difference between a type IO a and a type a is that the former is best thought of as a set of instructions to be executed at runtime which eventually produce an a... while the latter is just that very a.
This phase distinction is key to understanding how IO values work. It's also worth noting that it can be very fluid within a program---functions like fmap or (>>=) allow us to peek behind the phase distinction. As an example, consider the following function
foo :: IO Int -- <-- our final result is an `IO` action
foo = fmap f getChar where -- <-- up here getChar is an `IO Char`, not a real one
f :: Char -> Int
f = Data.Char.ord -- <-- inside here we have a "real" `Char`
Here we build a deferred action (foo) by modifying a deferred action (getChar) by using a function which views a world that only comes into existence after our deferred IO action has run.
So let's tie this knot and get back to the question at hand. Why can't you turn an IO [Char] into an [IO Char] (in any meaningful way)? Well, if you're looking at a piece of code which has access to IO [Char] then the first thing you're going to want to do is sneak inside of that IO action
floob = do chars <- (getChars :: IO [Char])
...
where in the part left as ... we have access to chars :: [Char] because we've "stepped into" the IO action getChars. This means that by this point we've must have already run whatever runtime actions are required to generate that list of characters. We've let the cat out of the monad and we can't get it back in (in any meaningful way) since we can't go back and "unread" each individual character.
(Note: I keep saying "in any meaningful way" because we absolutely can put cats back into monads using return, but this won't let us go back in time and have never let them out in the first place. That ship has sailed.)
So how do we get a type [IO Char]? Well, we have to know (without running any IO) what kind of IO operations we'd like to do. For instance, we could write the following
replicate 10 getChar :: [IO Char]
and immediately do something like
take 5 (replicate 10 getChar)
without ever running an IO action---our list structure is immediately available and not deferred until the runtime has a chance to get to it. But note that we must know exactly the structure of the IO actions we'd like to perform in order to create a type [IO Char]. That said, we could use yet another level of IO to peek at the real world in order to determine the parameters of our action
do len <- (figureOutLengthOfReadWithoutActuallyReading :: IO Int)
return $ replicate len getChar
and this fragment has type IO [IO Char]. To run it we have to step through IO twice, we have to let the runtime perform two IO actions, first to determine the length and then second to actually act on our list of IO Char actions.
sequence :: [IO a] -> IO [a]
The above function, sequence, is a common way to execute some structure containing a, well, sequence of IO actions. We can use that to do our two-phase read
twoPhase :: IO [Char]
twoPhase = do len <- (figureOutLengthOfReadWithoutActuallyReading :: IO Int)
putStrLn ("About to read " ++ show len ++ " characters")
sequence (replicate len getChar)
>>> twoPhase
Determining length of read
About to read 22 characters
let me write 22 charac"let me write 22 charac"
You got some things mixed up:
readNumber = readFile "number.txt" >>= return
the return is unecessary, just leave it out.
Here is a working version:
module Main where
import Data.Char (digitToInt)
main :: IO ()
main = intify >>= print
readNumber :: IO String
readNumber = readFile "number.txt"
intify :: IO [Int]
intify = fmap (map digitToInt) readNumber
Such a function can't exists, because you would be able to evaluate the length of the list without ever invoking any IO.
What is possible is this:
imbue' :: IO [a] -> IO [IO a]
imbue' = fmap $ map return
Which of course generalises to
imbue :: (Functor f, Monad m) => m (f a) -> m (f (m a))
imbue = liftM $ fmap return
You can then do, say,
quun :: IO [Char]
bar :: [IO Char] -> IO Y
main = do
actsList <- imbue quun
y <- bar actsLists
...
Only, the whole thing about using [IO Char] is pointless: it's completely equivalent to the much more straightforward way of working only with lists of "pure values", only using the IO monad "outside"; how to do that is shown in Markus's answer.
Do you really need many different helper functions? Because you may write just
main = do
file <- readFile "number.txt"
let digits = map digitToInt file
print digits
or, if you really need to separate them, try to minimize the amount of IO signatures:
readNumber = readFile "number.txt" --Will be IO String
intify = map digitToInt --Will be String -> [Int], not IO
main = readNumber >>= print . intify
I' ve got a problem with Haskell. I have text file looking like this:
5.
7.
[(1,2,3),(4,5,6),(7,8,9),(10,11,12)].
I haven't any idea how can I get the first 2 numbers (2 and 7 above) and the list from the last line. There are dots on the end of each line.
I tried to build a parser, but function called 'readFile' return the Monad called IO String. I don't know how can I get information from that type of string.
I prefer work on a array of chars. Maybe there is a function which can convert from 'IO String' to [Char]?
I think you have a fundamental misunderstanding about IO in Haskell. Particularly, you say this:
Maybe there is a function which can convert from 'IO String' to [Char]?
No, there isn't1, and the fact that there is no such function is one of the most important things about Haskell.
Haskell is a very principled language. It tries to maintain a distinction between "pure" functions (which don't have any side-effects, and always return the same result when give the same input) and "impure" functions (which have side effects like reading from files, printing to the screen, writing to disk etc). The rules are:
You can use a pure function anywhere (in other pure functions, or in impure functions)
You can only use impure functions inside other impure functions.
The way that code is marked as pure or impure is using the type system. When you see a function signature like
digitToInt :: String -> Int
you know that this function is pure. If you give it a String it will return an Int and moreover it will always return the same Int if you give it the same String. On the other hand, a function signature like
getLine :: IO String
is impure, because the return type of String is marked with IO. Obviously getLine (which reads a line of user input) will not always return the same String, because it depends on what the user types in. You can't use this function in pure code, because adding even the smallest bit of impurity will pollute the pure code. Once you go IO you can never go back.
You can think of IO as a wrapper. When you see a particular type, for example, x :: IO String, you should interpret that to mean "x is an action that, when performed, does some arbitrary I/O and then returns something of type String" (note that in Haskell, String and [Char] are exactly the same thing).
So how do you ever get access to the values from an IO action? Fortunately, the type of the function main is IO () (it's an action that does some I/O and returns (), which is the same as returning nothing). So you can always use your IO functions inside main. When you execute a Haskell program, what you are doing is running the main function, which causes all the I/O in the program definition to actually be executed - for example, you can read and write from files, ask the user for input, write to stdout etc etc.
You can think of structuring a Haskell program like this:
All code that does I/O gets the IO tag (basically, you put it in a do block)
Code that doesn't need to perform I/O doesn't need to be in a do block - these are the "pure" functions.
Your main function sequences together the I/O actions you've defined in an order that makes the program do what you want it to do (interspersed with the pure functions wherever you like).
When you run main, you cause all of those I/O actions to be executed.
So, given all that, how do you write your program? Well, the function
readFile :: FilePath -> IO String
reads a file as a String. So we can use that to get the contents of the file. The function
lines:: String -> [String]
splits a String on newlines, so now you have a list of Strings, each corresponding to one line of the file. The function
init :: [a] -> [a]
Drops the last element from a list (this will get rid of the final . on each line). The function
read :: (Read a) => String -> a
takes a String and turns it into an arbitrary Haskell data type, such as Int or Bool. Combining these functions sensibly will give you your program.
Note that the only time you actually need to do any I/O is when you are reading the file. Therefore that is the only part of the program that needs to use the IO tag. The rest of the program can be written "purely".
It sounds like what you need is the article The IO Monad For People Who Simply Don't Care, which should explain a lot of your questions. Don't be scared by the term "monad" - you don't need to understand what a monad is to write Haskell programs (notice that this paragraph is the only one in my answer that uses the word "monad", although admittedly I have used it four times now...)
Here's the program that (I think) you want to write
run :: IO (Int, Int, [(Int,Int,Int)])
run = do
contents <- readFile "text.txt" -- use '<-' here so that 'contents' is a String
let [a,b,c] = lines contents -- split on newlines
let firstLine = read (init a) -- 'init' drops the trailing period
let secondLine = read (init b)
let thirdLine = read (init c) -- this reads a list of Int-tuples
return (firstLine, secondLine, thirdLine)
To answer npfedwards comment about applying lines to the output of readFile text.txt, you need to realize that readFile text.txt gives you an IO String, and it's only when you bind it to a variable (using contents <-) that you get access to the underlying String, so that you can apply lines to it.
Remember: once you go IO, you never go back.
1 I am deliberately ignoring unsafePerformIO because, as implied by the name, it is very unsafe! Don't ever use it unless you really know what you are doing.
As a programming noob, I too was confused by IOs. Just remember that if you go IO you never come out. Chris wrote a great explanation on why. I just thought it might help to give some examples on how to use IO String in a monad. I'll use getLine which reads user input and returns an IO String.
line <- getLine
All this does is bind the user input from getLine to a value named line. If you type this this in ghci, and type :type line it will return:
:type line
line :: String
But wait! getLine returns an IO String
:type getLine
getLine :: IO String
So what happened to the IOness from getLine? <- is what happened. <- is your IO friend. It allows you to bring out the value that is tainted by the IO within a monad and use it with your normal functions. Monads are easily identified because they begin with do. Like so:
main = do
putStrLn "How much do you love Haskell?"
amount <- getLine
putStrln ("You love Haskell this much: " ++ amount)
If you're like me, you'll soon discover that liftIO is your next best monad friend, and that $ help reduce the number of parenthesis you need to write.
So how do you get the information from readFile? Well if readFile's output is IO String like so:
:type readFile
readFile :: FilePath -> IO String
Then all you need is your friendly <-:
yourdata <- readFile "samplefile.txt"
Now if type that in ghci and check the type of yourdata you'll notice it's a simple String.
:type yourdata
text :: String
As people already say, if you have two functions, one is readStringFromFile :: FilePath -> IO String, and another is doTheRightThingWithString :: String -> Something, then you don't really need to escape a string from IO, since you can combine this two functions in various ways:
With fmap for IO (IO is Functor):
fmap doTheRightThingWithString readStringFromFile
With (<$>) for IO (IO is Applicative and (<$>) == fmap):
import Control.Applicative
...
doTheRightThingWithString <$> readStringFromFile
With liftM for IO (liftM == fmap):
import Control.Monad
...
liftM doTheRightThingWithString readStringFromFile
With (>>=) for IO (IO is Monad, fmap == (<$>) == liftM == \f m -> m >>= return . f):
readStringFromFile >>= \string -> return (doTheRightThingWithString string)
readStringFromFile >>= \string -> return $ doTheRightThingWithString string
readStringFromFile >>= return . doTheRightThingWithString
return . doTheRightThingWithString =<< readStringFromFile
With do notation:
do
...
string <- readStringFromFile
-- ^ you escape String from IO but only inside this do-block
let result = doTheRightThingWithString string
...
return result
Every time you will get IO Something.
Why you would want to do it like that? Well, with this you will have pure and
referentially transparent programs (functions) in your language. This means that every function which type is IO-free is pure and referentially transparent, so that for the same arguments it will returns the same values. For example, doTheRightThingWithString would return the same Something for the same String. However readStringFromFile which is not IO-free can return different strings every time (because file can change), so that you can't escape such unpure value from IO.
If you have a parser of this type:
myParser :: String -> Foo
and you read the file using
readFile "thisfile.txt"
then you can read and parse the file using
fmap myParser (readFile "thisfile.txt")
The result of that will have type IO Foo.
The fmap means myParser runs "inside" the IO.
Another way to think of it is that whereas myParser :: String -> Foo, fmap myParser :: IO String -> IO Foo.
Consider the following code that is supposed to print out random numbers:
import System.Random.Mersenne
main =
do g <- (newMTGen Nothing)
xs <- (randoms g) :: IO [Double]
mapM_ print xs
When run, I get a segmentation fault error. That is unsurprising, since the function 'randoms' produces an infinite list. Suppose I wanted to print out only the first ten values of xs. How could I do that? xs has type IO [Double], and I think I want a variable of type [IO Double]. What operators exist to convert between the two.
If you get a segmentation fault error, and you didn't use the FFI or any functions with unsafe in their name, that's not unsurprising, in any situation! It means there's a bug with either GHC, or a library you're using is doing something unsafe.
Printing out an infinite list of Doubles with mapM_ print is perfectly fine; the list will be processed incrementally and the program should run with constant memory usage. I suspect there is a bug in the System.Random.Mersenne module you're using, or a bug the C library it's based on, or a problem with your computer (such as faulty RAM).1 Note that newMTGen comes with this warning:
Due to the current SFMT library being vastly impure, currently only a single generator is allowed per-program. Attempts to reinitialise it will fail.
You might be better off using the provided global MTGen instead.
That said, you can't convert IO [Double] into [IO Double] in that way; there's no way to know how long the resulting list would be without executing the IO action, which is impossible, since you have a pure result (albeit one that happens to contain IO actions). For infinite lists, you could write:
desequence :: IO [a] -> [IO a]
desequence = desequence' 0
where
desequence n m = fmap (!! n) m : desequence (n+1) m
But every time you execute an action in this list, the IO [a] action would be executed again; it'd just discard the rest of the list.
The reason randoms can work and return an infinite list of random numbers is because it uses lazy IO with unsafeInterleaveIO. (Note that, despite the "unsafe" in the name, this one can't cause segfaults, so something else is afoot.)
1 Other, less likely possibilities include a miscompilation of the C library, or a bug in GHC.
Suppose I wanted to print out only the first ten values of xs. How could I do that?
Just use take:
main =
do g <- (newMTGen Nothing)
xs <- (randoms g) :: IO [Double]
mapM_ print $ take 10 xs
You wrote
xs has type IO [Double]
But actually, randoms g has type IO [Double], but thanks to the do notation, xs has type [Double], you can just apply take 10 to it.
You could also skip the binding using liftM:
main =
do g <- newMTGen Nothing
ys <- liftM (take 10) $ randoms g :: IO [Double]
mapM_ print ys