Consider the following code that is supposed to print out random numbers:
import System.Random.Mersenne
main =
do g <- (newMTGen Nothing)
xs <- (randoms g) :: IO [Double]
mapM_ print xs
When run, I get a segmentation fault error. That is unsurprising, since the function 'randoms' produces an infinite list. Suppose I wanted to print out only the first ten values of xs. How could I do that? xs has type IO [Double], and I think I want a variable of type [IO Double]. What operators exist to convert between the two.
If you get a segmentation fault error, and you didn't use the FFI or any functions with unsafe in their name, that's not unsurprising, in any situation! It means there's a bug with either GHC, or a library you're using is doing something unsafe.
Printing out an infinite list of Doubles with mapM_ print is perfectly fine; the list will be processed incrementally and the program should run with constant memory usage. I suspect there is a bug in the System.Random.Mersenne module you're using, or a bug the C library it's based on, or a problem with your computer (such as faulty RAM).1 Note that newMTGen comes with this warning:
Due to the current SFMT library being vastly impure, currently only a single generator is allowed per-program. Attempts to reinitialise it will fail.
You might be better off using the provided global MTGen instead.
That said, you can't convert IO [Double] into [IO Double] in that way; there's no way to know how long the resulting list would be without executing the IO action, which is impossible, since you have a pure result (albeit one that happens to contain IO actions). For infinite lists, you could write:
desequence :: IO [a] -> [IO a]
desequence = desequence' 0
where
desequence n m = fmap (!! n) m : desequence (n+1) m
But every time you execute an action in this list, the IO [a] action would be executed again; it'd just discard the rest of the list.
The reason randoms can work and return an infinite list of random numbers is because it uses lazy IO with unsafeInterleaveIO. (Note that, despite the "unsafe" in the name, this one can't cause segfaults, so something else is afoot.)
1 Other, less likely possibilities include a miscompilation of the C library, or a bug in GHC.
Suppose I wanted to print out only the first ten values of xs. How could I do that?
Just use take:
main =
do g <- (newMTGen Nothing)
xs <- (randoms g) :: IO [Double]
mapM_ print $ take 10 xs
You wrote
xs has type IO [Double]
But actually, randoms g has type IO [Double], but thanks to the do notation, xs has type [Double], you can just apply take 10 to it.
You could also skip the binding using liftM:
main =
do g <- newMTGen Nothing
ys <- liftM (take 10) $ randoms g :: IO [Double]
mapM_ print ys
Related
I would like to parse an infinite stream of bytes into an infinite stream of Haskell data. Each byte is read from the network, thus they are wrapped into IO monad.
More concretely I have an infinite stream of type [IO(ByteString)]. On the other hand I have a pure parsing function parse :: [ByteString] -> [Object] (where Object is a Haskell data type)
Is there a way to plug my infinite stream of monad into my parsing function ?
For instance, is it possible to write a function of type [IO(ByteString)] -> IO [ByteString] in order for me to use my function parse in a monad?
The Problem
Generally speaking, in order for IO actions to be properly ordered and behave predictably, each action needs to complete fully before the next action is run. In a do-block, this means that this works:
main = do
sequence (map putStrLn ["This","action","will","complete"])
putStrLn "before we get here"
but unfortunately this won't work, if that final IO action was important:
dontRunMe = do
putStrLn "This is a problem when an action is"
sequence (repeat (putStrLn "infinite"))
putStrLn "<not printed>"
So, even though sequence can be specialized to the right type signature:
sequence :: [IO a] -> IO [a]
it doesn't work as expected on an infinite list of IO actions. You'll have no problem defining such a sequence:
badSeq :: IO [Char]
badSeq = sequence (repeat (return '+'))
but any attempt to execute the IO action (e.g., by trying to print the head of the resulting list) will hang:
main = (head <$> badSeq) >>= print
It doesn't matter if you only need a part of the result. You won't get anything out of the IO monad until the entire sequence is done (so "never" if the list is infinite).
The "Lazy IO" Solution
If you want to get data from a partially completed IO action, you need to be explicit about it and make use of a scary-sounding Haskell escape hatch, unsafeInterleaveIO. This function takes an IO action and "defers" it so that it won't actually execute until the value is demanded.
The reason this is unsafe in general is that an IO action that makes sense now, might mean something different if actually executed at a later time point. As a simple example, an IO action that truncates/removes a file has a very different effect if it's executed before versus after updated file contents are written!
Anyway, what you'd want to do here is write a lazy version of sequence:
import System.IO.Unsafe (unsafeInterleaveIO)
lazySequence :: [IO a] -> IO [a]
lazySequence [] = return [] -- oops, not infinite after all
lazySequence (m:ms) = do
x <- m
xs <- unsafeInterleaveIO (lazySequence ms)
return (x:xs)
The key point here is that, when a lazySequence infstream action is executed, it will actually execute only the first action; the remaining actions will be wrapped up in a deferred IO action that won't truly execute until the second and subsequent elements of the returned list are demanded.
This works for fake IO actions:
> take 5 <$> lazySequence (repeat (return ('+'))
"+++++"
>
(where if you replaced lazySequence with sequence, it would hang). It also works for real IO actions:
> lns <- lazySequence (repeat getLine)
<waits for first line of input, then returns to prompt>
> print (head lns)
<prints whatever you entered>
> length (head (tail lns)) -- force next element
<waits for second line of input>
<then shows length of your second line before prompt>
>
Anyway, with this definition of lazySequence and types:
parse :: [ByteString] -> [Object]
input :: [IO ByteString]
you should have no trouble writing:
outputs :: IO [Object]
outputs = parse <$> lazySequence inputs
and then using it lazily however you want:
main = do
objs <- outputs
mapM_ doSomethingWithObj objs
Using Conduit
Even though the above lazy IO mechanism is pretty simple and straightforward, lazy IO has fallen out of favor for production code due to issues with resource management, fragility with respect to space leaks (where a small change to your code blows up the memory footprint), and problems with exception handling.
One solution is the conduit library. Another is pipes. Both are carefully designed streaming libraries that can support infinite streams.
For conduit, if you had a parse function that created one object per byte string, like:
parse1 :: ByteString -> Object
parse1 = ...
then given:
inputs :: [IO ByteString]
inputs = ...
useObject :: Object -> IO ()
useObject = ...
the conduit would look something like:
import Conduit
main :: IO ()
main = runConduit $ mapM_ yieldM inputs
.| mapC parse1
.| mapM_C useObject
Given that your parse function has signature:
parse :: [ByteString] -> [Object]
I'm pretty sure you can't integrate this with conduit directly (or at least not in any way that wouldn't toss out all the benefits of using conduit). You'd need to rewrite it to be conduit friendly in how it consumed byte strings and produced objects.
Consider the two following variations:
myReadListTailRecursive :: IO [String]
myReadListTailRecursive = go []
where
go :: [String] -> IO [String]
go l = do {
inp <- getLine;
if (inp == "") then
return l;
else go (inp:l);
}
myReadListOrdinary :: IO [String]
myReadListOrdinary = do
inp <- getLine
if inp == "" then
return []
else
do
moreInps <- myReadListOrdinary
return (inp:moreInps)
In ordinary programming languages, one would know that the tail recursive variant is a better choice.
However, going through this answer, it is apparent that haskell's implementation of recursion is not similar to that of using the recursion stack repeatedly.
But because in this case the program in question involves actions, and a strict monad, I am not sure if the same reasoning applies. In fact, I think in the IO case, the tail recursive form is indeed better. I am not sure how to correctly reason about this.
EDIT: David Young pointed out that the outermost call here is to (>>=). Even in that case, does one of these styles have an advantage over the other?
FWIW, I'd go for existing monadic combinators and focus on readability/consiseness. Using unfoldM :: Monad m => m (Maybe a) -> m [a]:
import Control.Monad (liftM, mfilter)
import Control.Monad.Loops (unfoldM)
myReadListTailRecursive :: IO [String]
myReadListTailRecursive = unfoldM go
where
go :: IO (Maybe String)
go = do
line <- getLine
return $ case line of
"" -> Nothing
s -> Just s
Or using MonadPlus instance of Maybe, with mfilter :: MonadPlus m => (a -> Bool) -> m a -> m a:
myReadListTailRecursive :: IO [String]
myReadListTailRecursive = unfoldM (liftM (mfilter (/= "") . Just) getLine)
Another, more versatile option, might be to use LoopT.
That’s really not how I would write it, but it’s clear enough what you’re doing. (By the way, if you want to be able to efficiently insert arbitrary output from any function in the chain, without using monads, you might try a Data.ByteString.Builder.)
Your first implementation is very similar to a left fold, and your second very similar to a right fold or map. (You might try actually writing them as such!) The second one has several advantages for I/O. One of the most important, for handling input and output, is that it can be interactive.
You’ll notice that the first builds the entire list from the outside in: in order to determine what the first element of the list is, the program needs to compute the entire structure to get to the innermost thunk, which is return l. The program generates the entire data structure first, then starts to process it. That’s useful when you’re reducing a list, because tail-recursive functions and strict left folds are efficient.
With the second, the outermost thunk contains the head and tail of the list, so you can grab the tail, then call the thunk to generate the second list. This can work with infinite lists, and it can produce and return partial results.
Here’s a contrived example: a program that reads in one integer per line and prints the sums so far.
main :: IO ()
main = interact( display . compute 0 . parse . lines )
where parse :: [String] -> [Int]
parse [] = []
parse (x:xs) = (read x):(parse xs)
compute :: Int -> [Int] -> [Int]
compute _ [] = []
compute accum (x:xs) = let accum' = accum + x
in accum':(compute accum' xs)
display = unlines . map show
If you run this interactively, you’ll get something like:
$ 1
1
$ 2
3
$ 3
6
$ 4
10
But you could also write compute tail-recursively, with an accumulating parameter:
main :: IO ()
main = interact( display . compute [] . parse . lines )
where parse :: [String] -> [Int]
parse = map read
compute :: [Int] -> [Int] -> [Int]
compute xs [] = reverse xs
compute [] (y:ys) = compute [y] ys
compute (x:xs) (y:ys) = compute (x+y:x:xs) ys
display = unlines . map show
This is an artificial example, but strict left folds are a common pattern. If, however, you write either compute or parse with an accumulating parameter, this is what you get when you try to run interactively, and hit EOF (control-D on Unix, control-Z on Windows) after the number 4:
$ 1
$ 2
$ 3
$ 4
1
3
6
10
This left-folded version needs to compute the entire data structure before it can read any of it. That can’t ever work on an infinite list (When would you reach the base case? How would you even reverse an infinite list if you did?) and an application that can’t respond to user input until it quits is a deal-breaker.
On the other hand, the tail-recursive version can be strict in its accumulating parameter, and will run more efficiently, especially when it’s not being consumed immediately. It doesn’t need to keep any thunks or context around other than its parameters, and it can even re-use the same stack frame. A strict accumulating function, such as Data.List.foldl', is a great choice whenver you’re reducing a list to a value, not building an eagerly-evaluated list of output. Functions such as sum, product or any can’t return any useful intermediate value. They inherently have to finish the computation first, then return the final result.
I have a file number.txt which contains a large number and I read it into an IO String like this:
readNumber = readFile "number.txt" >>= return
In another function I want to create a list of Ints, one Int for each digit…
Lets assume the content of number.txt is:
1234567890
Then I want my function to return [1,2,3,4,5,6,7,8,9,0].
I tried severall versions with map, mapM(_), liftM, and, and, and, but I got several error messages everytime, which I was able to reduce to
Couldn't match expected type `[m0 Char]'
with actual type `IO String'
The last version I have on disk is the following:
module Main where
import Control.Monad
import Data.Char (digitToInt)
main = intify >>= putStrLn . show
readNumber = readFile "number.txt" >>= return
intify = mapM (liftM digitToInt) readNumber
So, as far as I understand the error, I need some function that takes IO [a] and returns [IO a], but I was not able to find such thing with hoogle… Only the other way round seemes to exist
In addition to the other great answers here, it's nice to talk about how to read [IO Char] versus IO [Char]. In particular, you'd call [IO Char] "an (immediate) list of (deferred) IO actions which produce Chars" and IO [Char] "a (deferred) IO action producing a list of Chars".
The important part is the location of "deferred" above---the major difference between a type IO a and a type a is that the former is best thought of as a set of instructions to be executed at runtime which eventually produce an a... while the latter is just that very a.
This phase distinction is key to understanding how IO values work. It's also worth noting that it can be very fluid within a program---functions like fmap or (>>=) allow us to peek behind the phase distinction. As an example, consider the following function
foo :: IO Int -- <-- our final result is an `IO` action
foo = fmap f getChar where -- <-- up here getChar is an `IO Char`, not a real one
f :: Char -> Int
f = Data.Char.ord -- <-- inside here we have a "real" `Char`
Here we build a deferred action (foo) by modifying a deferred action (getChar) by using a function which views a world that only comes into existence after our deferred IO action has run.
So let's tie this knot and get back to the question at hand. Why can't you turn an IO [Char] into an [IO Char] (in any meaningful way)? Well, if you're looking at a piece of code which has access to IO [Char] then the first thing you're going to want to do is sneak inside of that IO action
floob = do chars <- (getChars :: IO [Char])
...
where in the part left as ... we have access to chars :: [Char] because we've "stepped into" the IO action getChars. This means that by this point we've must have already run whatever runtime actions are required to generate that list of characters. We've let the cat out of the monad and we can't get it back in (in any meaningful way) since we can't go back and "unread" each individual character.
(Note: I keep saying "in any meaningful way" because we absolutely can put cats back into monads using return, but this won't let us go back in time and have never let them out in the first place. That ship has sailed.)
So how do we get a type [IO Char]? Well, we have to know (without running any IO) what kind of IO operations we'd like to do. For instance, we could write the following
replicate 10 getChar :: [IO Char]
and immediately do something like
take 5 (replicate 10 getChar)
without ever running an IO action---our list structure is immediately available and not deferred until the runtime has a chance to get to it. But note that we must know exactly the structure of the IO actions we'd like to perform in order to create a type [IO Char]. That said, we could use yet another level of IO to peek at the real world in order to determine the parameters of our action
do len <- (figureOutLengthOfReadWithoutActuallyReading :: IO Int)
return $ replicate len getChar
and this fragment has type IO [IO Char]. To run it we have to step through IO twice, we have to let the runtime perform two IO actions, first to determine the length and then second to actually act on our list of IO Char actions.
sequence :: [IO a] -> IO [a]
The above function, sequence, is a common way to execute some structure containing a, well, sequence of IO actions. We can use that to do our two-phase read
twoPhase :: IO [Char]
twoPhase = do len <- (figureOutLengthOfReadWithoutActuallyReading :: IO Int)
putStrLn ("About to read " ++ show len ++ " characters")
sequence (replicate len getChar)
>>> twoPhase
Determining length of read
About to read 22 characters
let me write 22 charac"let me write 22 charac"
You got some things mixed up:
readNumber = readFile "number.txt" >>= return
the return is unecessary, just leave it out.
Here is a working version:
module Main where
import Data.Char (digitToInt)
main :: IO ()
main = intify >>= print
readNumber :: IO String
readNumber = readFile "number.txt"
intify :: IO [Int]
intify = fmap (map digitToInt) readNumber
Such a function can't exists, because you would be able to evaluate the length of the list without ever invoking any IO.
What is possible is this:
imbue' :: IO [a] -> IO [IO a]
imbue' = fmap $ map return
Which of course generalises to
imbue :: (Functor f, Monad m) => m (f a) -> m (f (m a))
imbue = liftM $ fmap return
You can then do, say,
quun :: IO [Char]
bar :: [IO Char] -> IO Y
main = do
actsList <- imbue quun
y <- bar actsLists
...
Only, the whole thing about using [IO Char] is pointless: it's completely equivalent to the much more straightforward way of working only with lists of "pure values", only using the IO monad "outside"; how to do that is shown in Markus's answer.
Do you really need many different helper functions? Because you may write just
main = do
file <- readFile "number.txt"
let digits = map digitToInt file
print digits
or, if you really need to separate them, try to minimize the amount of IO signatures:
readNumber = readFile "number.txt" --Will be IO String
intify = map digitToInt --Will be String -> [Int], not IO
main = readNumber >>= print . intify
writeStr []=putChar ' '
writeStr (x:xs) = (putChar x)(writeStr xs)
Hello, and thanks in advance, i get a type error, it should be a simple answer but i just dont get where the error is coming from.
Your code is a bit strange. If I got it right, you try to print a string. Your method is to put the first string, and than the second. But it's not possible in Haskell to combine two IO actions like this. Have a look in your tutorial again about this, here's hwo it could look like:
writeStr [] = return () -- you had putChar ' ',
writeStr (x:xs) = do putChar x -- but this would print a superfluous whtiespace
writeStr xs
If you want to do several things sequentially, either use the do-keyword or monadic combinators. Its very easy, just like this:
do action1
action2
action3
...
FUZxxl has answered the immediate question, but I'd like to expand on it with some more ways of writing "writeStr" to illustrate more about monads.
As delnan said in the comments, you can also write
writeStr [] = return ()
writeStr (x:xs) = putChar x >> writeStr xs
This is actually the "desugared" version of the "do" notation. The ">>" operator is used to daisy-chain monadic actions together. Its actually a specialised version of the "bind" operator, written ">>=". See this question for more details.
But when you look at this, it seems that all we are doing is applying "putChar" to each element in the argument list. There is already a function in the Prelude called "map" for doing this, so maybe we could write:
writeStr xs = map putChar xs
But when you try that it won't work. The reason becomes evident if you go into GHCi and type this:
:type map putChar "Hello"
[IO ()]
You want a single "IO()" action, but this gives you a list of them. What you need is a function that turns this list of IO actions into a single IO action. Fortunately one exists. The Prelude contains these two functions
sequence :: [IO a] -> IO [a]
sequence_ :: [IO a] -> IO ()
The first one is for when you want the list of results, the second is for cases where you don't, like this one. (Throughout this answer I'm going to be giving the IO-specific type signatures for clarity, but its important to remember that all these functions actually work for any monad.)
So now you can write:
writeStr xs = sequence_ $ map putChar xs
But there is a way of shortening this. Recall the "." operator, which sticks two functions together, and the way Haskell has of "currying" function arguments? We can rewrite the function above as:
writeStr = sequence_ . map putChar
This "point-free" style looks and feels very odd at first; it makes "writeStr" look more like a constant than a function. But it avoids the need to track variable names around the code when you are reading it, and so is often preferred. Its also a lot shorter and more readable when you are putting something complicated as the argument to "map" or similar higher order functions.
But we can go even shorter. The "sequence . map f" pattern is very common, so the "Control.Monad" module defines a couple more functions to embody it:
mapM :: (a -> IO b) -> [a] -> IO [b]
mapM f = sequence . map f
mapM_ :: (a -> IO b) -> [a] -> IO ()
mapM_ f = sequence_ . map f
So you can finally write
writeStr = mapM_ putChar
The following program terminates correctly:
import System.Random
randomList = mapM (\_->getStdRandom (randomR (0, 50000::Int))) [0..5000]
main = do
randomInts <- randomList
print $ take 5 randomInts
Running:
$ runhaskell test.hs
[26156,7258,29057,40002,26339]
However, feeding it with an infinite list, the program never terminates, and when compiled, eventually gives a stack overflow error!
import System.Random
randomList = mapM (\_->getStdRandom (randomR (0, 50000::Int))) [0..]
main = do
randomInts <- randomList
print $ take 5 randomInts
Running,
$ ./test
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
I expected the program to lazily evaluate getStdRandom each time I pick an item off the list, finishing after doing so 5 times. Why is it trying to evaluate the whole list?
Thanks.
Is there a better way to get an infinite list of random numbers? I want to pass this list into a pure function.
EDIT: Some more reading revealed that the function
randomList r = do g <- getStdGen
return $ randomRs r g
is what I was looking for.
EDIT2: after reading camccann's answer, I realized that getStdGen is getting a new seed on every call. Instead, better to use this function as a simple one-shot random list generator:
import System.Random
randomList :: Random a => a -> a -> IO [a]
randomList r g = do s <- newStdGen
return $ randomRs (r,g) s
main = do r <- randomList 0 (50::Int)
print $ take 5 r
But I still don't understand why my mapM call did not terminate. Evidently not related to random numbers, but something to do with mapM maybe.
For example, I found that the following also does not terminate:
randomList = mapM (\_->return 0) [0..]
main = do
randomInts <- randomList
print $ take 50000 randomInts
What gives? By the way, IMHO, the above randomInts function should be in System.Random. It's extremely convenient to be able to very simply generate a random list in the IO monad and pass it into a pure function when needed, I don't see why this should not be in the standard library.
Random numbers in general are not strict, but monadic binding is--the problem here is that mapM has to sequence the entire list. Consider its type signature, (a -> m b) -> [a] -> m [b]; as this implies, what it does is first map the list of type [a] into a list of type [m b], then sequence that list to get a result of type m [b]. So, when you bind the result of applying mapM, e.g. by putting it on the right-hand side of <-, what this means is "map this function over the list, then execute each monadic action, and combine the results back into a single list". If the list is infinite, this of course won't terminate.
If you simply want a stream of random numbers, you need to generate the list without using a monad for each number. I'm not entirely sure why you've used the design you have, but the basic idea is this: Given a seed value, use a pseudo-random number generator to produce a pair of 1) a random number 2) a new seed, then repeat with the new seed. Any given seed will of course provide the same sequence each time. So, you can use the function getStdGen, which will provide a fresh seed in the IO monad; you can then use that seed to create an infinite sequence in completely pure code.
In fact, System.Random provides functions for precisely that purpose, randoms or randomRs instead of random and randomR.
If for some reason you want to do it yourself, what you want is essentially an unfold. The function unfoldr from Data.List has the type signature (b -> Maybe (a, b)) -> b -> [a], which is fairly self-explanatory: Given a value of type b, it applies the function to get either something of type a and a new generator value of type b, or Nothing to indicate the end of the sequence.
You want an infinite list, so will never need to return Nothing. Thus, partially applying randomR to the desired range and composing it with Just gives this:
Just . randomR (0, 50000::Int) :: (RandomGen a) => a -> Maybe (Int, a)
Feeding that into unfoldr gives this:
unfoldr (Just . randomR (0, 50000::Int)) :: (RandomGen a) => a -> [Int]
...which does exactly as it claims: Given an instance of RandomGen, it will produce an infinite (and lazy) list of random numbers generated from that seed.
I would do something more like this, letting randomRs do the work with an initial RandomGen:
#! /usr/bin/env runhaskell
import Control.Monad
import System.Random
randomList :: RandomGen g => g -> [Int]
randomList = randomRs (0, 50000)
main :: IO ()
main = do
randomInts <- liftM randomList newStdGen
print $ take 5 randomInts
As for the laziness, what's happening here is that mapM is (sequence . map)
Its type is: mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
It's mapping the function, giving a [m b] and then needs to execute all those actions to make an m [b]. It's the sequence that'll never get through the infinite list.
This is explained better in the answers to a prior question: Is Haskell's mapM not lazy?