I want to wait until user input terminates with EOF and then output it all whole. Isn't that what getContents supposed to do? The following code outputs each time user hits enter, what am I doing wrong?
import System.IO
main = do
hSetBuffering stdin NoBuffering
contents <- getContents
putStrLn contents
The fundamental problem is that getContents is an instances of Lazy IO. This means that getContents produces a thunk that can be evaluated like a normal Haskell value, and only does the relevant IO when it's forced.
contents is a lazy list that putStr tries to print, which forces the list and causes getContents to read as much as it can. putStr then prints everything that's forced, and continues trying to force the rest of the list until it hits []. As getContents can read more and more of the stream—the exact behavior depends on buffering—putStr can print more and more of it immediately, giving you the behavior you see.
While this behavior is useful for very simple scripts, it ties in Haskell's evaluation order into observable effects—something it was never meant to do. This means that controlling exactly when parts of contents get printed is awkward because you have to break the normal Haskell abstraction and understand exactly how things are getting evaluated.
This leads to some potentially unintuitive behavior. For example, if you try to get the length of the input—and actually use it—the list is forced before you get to printing it, giving you the behavior you want:
main = do
contents <- getContents
let n = length contents
print n
putStr contents
but if you move the print n after the putStr, you go back to the original behavior because n does not get forced until after printing the input (even though n still got defined before putStr was used):
main = do
contents <- getContents
let n = length contents
putStr contents
print n
Normally, this sort of thing is not a problem because it won't change the behavior of your code (although it can affect performance). Lazy IO just brings it into the realm of correctness by piercing the abstraction layer.
This also gives us a hint on how we can fix your issue: we need some way of forcing contents before printing it. As we saw, we can do this with length because length needs to traverse the whole list before computing its result. Instead of printing it, we can use seq which forces the lefthand expression to be evaluated at the same time as the righthand one, but throws away the actual value:
main = do
contents <- getContents
let n = length contents
n `seq` putStr contents
At the same time, this is still a bit ugly because we're using length just to traverse the list, not because we actually care about it. What we would really like is a function that just traverses the list enough to evaluate it, without doing anything else. Happily, this is exactly what deepseq does (for many data structures, not just lists):
import Control.DeepSeq
import System.IO
main = do
contents <- getContents
contents `deepseq` putStr contents
This is a problem of lazy I/O. One simple solution is to use strict I/O, such as via ByteStrings:
import qualified Data.ByteString as S
main :: IO ()
main = S.getContents >>= S.putStr
You can use the replacement functions from the strict package (link):
import qualified System.IO.Strict as S
main = do
contents <- S.getContents
putStrLn contents
Note that for reading there isn't a need to set buffering. Buffering really only helps when writing to files. See this answer (link) for more details.
The definition of the strict version of hGetContents in System.IO.Strict is pretty simple:
hGetContents :: IO.Handle -> IO.IO String
hGetContents h = IO.hGetContents h >>= \s -> length s `seq` return s
I.e., it forces everything to read into memory by calling length on the string returned by the standard/lazy version of hGetContents.
Related
I understand that I should not try to re-read from stdin because of errors about Haskell IO - handle closed
For example, in below:
main = do
x <- getContents
putStrLn $ map id x
x <- getContents --problem line
putStrLn x
the second call x <- getContents will cause the error:
test: <stdin>: hGetContents: illegal operation (handle is closed)
Of course, I can omit the second line to read from getContents.
main = do
x <- getContents
putStrLn $ map id x
putStrLn x
But will this become a performance/memory issue? Will GHC have to keep all of the contents read from stdin in the main memory?
I imagine the first time around when x is consumed, GHC can throw away the portions of x that are already processed. So theoretically, GHC could only use a small amount of constant memory for the processing. But since we are going to use x again (and again), it seems that GHC cannot throw away anything. (Nor can it read again from stdin).
Is my understanding about the memory implications here correct? And if so, is there a fix?
Yes, your understanding is correct: If you reuse x, ghc has to keep it all in memory.
I think a possible fix is to consume it lazily (once).
Let's say you want to output x to several output handles hdls :: [Handle]. The naive approach is:
main :: IO ()
main = do
x <- getContents
forM_ hdls $ \hdl -> do
hPutStr hdl x
This will read stdin into x as the first hPutStr traverses the string (at least for unbuffered handles, hPutStr is simply a loop that calls hPutChar for each character in the string). From then on it'll be kept in memory for all following hdls.
Alternatively:
main :: IO ()
main = do
x <- getContents
forM_ x $ \c -> do
forM_ hdls $ \hdl -> do
hPutChar hdl c
Here we've transposed the loops: Instead of iterating over the handles (and for each handle iterating over the input characters), we iterate over the input characters, and for each character, we print it to each handle.
I haven't tested it, but this form should guarantee that we don't need a lot of memory because each input character c is used once and then discarded.
Haskell IO is often explained in terms of the entire program being a pure function (main) that returns an IO value (often described as an imperative IO program), which is then executed by the runtime.
This mental model works fine for simple examples, but fell over for me as soon as I saw a recursive main in Learn You A Haskell. For example:
main = do
line <- getLine
putStrLn line
main
Or, if you prefer:
main = getLine >>= putStrLn >> main
Since main never terminates, it never actually returns an IO value, yet the program endlessly reads and echoes back lines just fine - so the simple explanation above doesn't quite work. Am I missing something simple or is there a more complete explanation (or is it 'simply' compiler magic) ?
In this case, main is a value of type IO () rather than a function. You can think of it as a sequence of IO a values:
main = getLine >>= putStrLn >> main
This makes it a recursive value, not unlike infinite lists:
foo = 1 : 2 : foo
We can return a value like this without needing to evaluate the whole thing. In fact, it's a reasonably common idiom.
foo will loop forever if you try to use the whole thing. But that's true of main too: unless you use some external method to break out of it, it will never stop looping! But you can start getting elements out of foo, or executing parts of main, without evaluating all of it.
The value main denotes is an infinite program:
main = do
line <- getLine
putStrLn line
line <- getLine
putStrLn line
line <- getLine
putStrLn line
line <- getLine
putStrLn line
line <- getLine
putStrLn line
line <- getLine
putStrLn line
...
But it's represented in memory as a recursive structure that references itself. That representation is finite, unless someone tries to unfold the entire thing to get a non-recursive representation of the entire program - that would never finish.
But just as you can probably figure out how to start executing the infinite program I wrote above without waiting for me to tell you "all" of it, so can Haskell's runtime system figure out how to execute main without unfolding the recursion up-front.
Haskell's lazy evaluation is actually interleaved with the runtime system's execution of the main IO program, so this works even for a function that returns an IO action which recursively invokes the function, like:
main = foo 1
foo :: Integer -> IO ()
foo x = do
print x
foo (x + 1)
Here foo 1 is not a recursive value (it contains foo 2, not foo 1), but it's still an infinite program. However this works just fine, because the program denoted by foo 1 is only generated lazily on-demand; it can be produced as the runtime system's execution of main goes along.
By default Haskell's laziness means that nothing is evaluated until it's needed, and then only "just enough" to get past the current block. Ultimately the source of all the "need" in "until it's needed" comes from the runtime system needing to know what the next step in the main program is so it can execute it. But it's only ever the next step; the rest of the program after that can remain unevaluated until after the next step has been fully executed. So infininte programs can be executed and do useful work so long as it's always only a finite amount of work to generate "one more step".
I implemented withFile in Haskell:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
handle <- openFile path iomode
result <- f handle
hClose handle
return result
When I ran the main provided by Learn You a Haskell, it printed out the content of "girlfriend.txt," as expected:
import System.IO
main = do
withFile' "girlfriend.txt" ReadMode (\handle -> do
contents <- hGetContents handle
putStr contents)
I wasn't sure if my withFile' would've worked with the last 2 lines: (1) close the handle and (2) returning the result as anIO a.
Why didn't the following happen?
result gets lazily bound to f handle
hClose handle closes the file handle
result gets return'd, which results in the actual evaluate of f handle. Since handle was closed, an error gets thrown.
Lazy IO is popularly known as confusing.
It depends on whether putStr executes before hClose or not.
Notice the difference between the first and second uses (the brackets are unnecessary but clarifying in the second example).
ghci> withFile' "temp.hs" ReadMode (hGetContents >=> putStr) -- putStr
import System.IO
import Control.Monad
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
handle <- openFile path iomode
result <- f handle
hClose handle
return result
ghci> (withFile' "temp.hs" ReadMode hGetContents) >>= putStr
ghci>
In both cases, the f passed in gets a chance to run before the handle is closed. Because of lazy evaluation, hGetContents only reads the file if it needs to, i.e. is forced to in order to produce output for some other function.
In the first example, since f is (hGetContents >=> putStr), the full contents of the file must be read in order to execute putStr.
In the second example, nothing needs to be evaluated after hGetContents in order to return result, which is a lazy list. (I can quite happily return (show [1..]) which will only fail to terminate if I choose to use the entire output.) This is seen as a problem for lazy IO, which is fixed by alternatives such as strict IO, pipes or conduit.
Maybe returning the empty string for a file when the handle was closed prematurely is a bug, but certainly running the entirety of f before closing it is not.
Equational reasoning means that you can reason about Haskell code by just inlining and substituting things (with certain caveats, but they don't apply here).
This means that all I need to do to understand your code is to take the withFile' here:
import System.IO
main = do
withFile' "girlfriend.txt" ReadMode (\handle -> do
contents <- hGetContents handle
putStr contents)
... and inline its definition:
main = do
handle <- openFile "girlfriend.txt" ReadMode
contents <- hGetContents handle
result <- putStr contents
hClose handle
return result
Once you inline its definition, it's easier to see what is going on. putStr evaluates the entire contents of the file before you close the handle, so there is no error. Also, result is not what you think it is: it's the return value of putStr, which is just (), not the contents of the file.
Most IO actions are not lazily executed.
IO action execution is different from normal Haskell evaluation of values. IO execution is only ever carried out by the outer driver that is trying to execute all the effects of main; it does so in the correct order implied by the monadic sequencing of IO actions.
The driver's need to know what the next IO action is ultimately triggers all evaluation of lazy values in Haskell; if it were happy with an unevaluated lazy value and moved on to the next thing without fully evaluating and executing it, then it would just leave main unevaluated and no Haskell program could ever do anything.
The Haskell value resulting from executing an IO action may of course be an unevaluated lazy value, but each IO action itself is evaluated and executed by the driver (including all sub-actions sequenced with do blocks or binds).
So result doesn't get lazily bound to f handle completely unevaluated; f handle is evaluated to come up with the sub actions hGetContents handle and putStr contents. These are both fully executed before the outer driver moves on to hClose handle, so everything's okay.
Note however that hGetContents is special. Quoting from the documentation:
Computation hGetContents hdl returns the list of characters corresponding to the unread portion of the channel or file managed by hdl, which is put into an intermediate state, semi-closed. In this state, hdl is effectively closed, but items are read from hdl on demand and accumulated in a special list returned by hGetContents hdl.
Any operation that fails because a handle is closed, also fails if a handle is semi-closed. The only exception is hClose. A semi-closed handle becomes closed:
if hClose is applied to it;
if an I/O error occurs when reading an item from the handle;
or once the entire contents of the handle has been read.
Once a semi-closed handle becomes closed, the contents of the associated list becomes fixed. The contents of this final list is only partially specified: it will contain at least all the items of the stream that were evaluated prior to the handle becoming closed.
So executing hGetContents handle actually results in a partially evaluated list, whose lazy evaluation is tied to further IO operations under the hood. This is impossible to do yourself without using the Unsafe family of operations, since it is essentially bypassing the type system and can result in exactly the sort of problem you were concerned about; if you had attempted the following code:
main = do
text <- withFile' "girlfriend.txt" ReadMode (\handle -> do
contents <- hGetContents handle
return contents)
putStr text
(where the function passed to withFile' tries to return the file contents, and they are passed to putStr after the withFile' call), then the putStr would be executed after hClose, and the file may well not have been fully read before it was closed.
I have a program in haskell that has to read arbitrary lines of input from the user and when the user is finished the accumulated input has to be sent to a function.
In an imperative programming language this would look like this:
content = ''
while True:
line = readLine()
if line == 'q':
break
content += line
func(content)
I find this incredibly difficult to do in haskell so I would like to know if there's an haskell equivalent.
The Haskell equivalent to iteration is recursion. You would also need to work in the IO monad, if you have to read lines of input. The general picture is:
import Control.Monad
main = do
line <- getLine
unless (line == "q") $ do
-- process line
main
If you just want to accumulate all read lines in content, you don't have to do that. Just use getContents which will retrieve (lazily) all user input. Just stop when you see the 'q'. In quite idiomatic Haskell, all reading could be done in a single line of code:
main = mapM_ process . takeWhile (/= "q") . lines =<< getContents
where process line = do -- whatever you like, e.g.
putStrLn line
If you read the first line of code from right to left, it says:
get everything that the user will provide as input (never fear, this is lazy);
split it in lines as it comes;
only take lines as long as they're not equal to "q", stop when you see such a line;
and call process for each line.
If you didn't figure it out already, you need to read carefully a Haskell tutorial!
It's reasonably simple in Haskell. The trickiest part is that you want to accumulate the sequence of user inputs. In an imperative language you use a loop to do this, whereas in Haskell the canonical way is to use a recursive helper function. It would look something like this:
getUserLines :: IO String -- optional type signature
getUserLines = go ""
where go contents = do
line <- getLine
if line == "q"
then return contents
else go (contents ++ line ++ "\n") -- add a newline
This is actually a definition of an IO action which returns a String. Since it is an IO action, you access the returned string using the <- syntax rather than the = assignment syntax. If you want a quick overview, I recommend reading The IO Monad For People Who Simply Don't Care.
You can use this function at the GHCI prompt like this
>>> str <- getUserLines
Hello<Enter> -- user input
World<Enter> -- user input
q<Enter> -- user input
>>> putStrLn str
Hello -- program output
World -- program output
Using pipes-4.0, which is coming out this weekend:
import Pipes
import qualified Pipes.Prelude as P
f :: [String] -> IO ()
f = ??
main = do
contents <- P.toListM (P.stdinLn >-> P.takeWhile (/= "q"))
f contents
That loads all the lines into memory. However, you can also process each line as it is being generated, too:
f :: String -> IO ()
main = runEffect $
for (P.stdinLn >-> P.takeWhile (/= "q")) $ \str -> do
lift (f str)
That will stream the input and never load more than one line into memory.
You could do something like
import Control.Applicative ((<$>))
input <- unlines . takeWhile (/= "q") . lines <$> getContents
Then input would be what the user wrote up until (but not including) the q.
I am trying to parse an input stream where the first line tells me how many lines of data there are. I'm ending up with the following code, and it works, but I think there is a better way. Is there?
main = do
numCases <- getLine
proc $ read numCases
proc :: Integer -> IO ()
proc numCases
| numCases == 0 = return ()
| otherwise = do
str <- getLine
putStrLn $ findNextPalin str
proc (numCases - 1)
Note: The code solves the Sphere problem https://www.spoj.pl/problems/PALIN/ but I didn't think posting the rest of the code would impact the discussion of what to do here.
Use replicate and sequence_.
main, proc :: IO ()
main = do numCases <- getLine
sequence_ $ replicate (read numCases) proc
proc = do str <- getLine
putStrLn $ findNextPalin str
sequence_ takes a list of actions, and runs them one after the other, in sequence. (Then it throws away the results; if you were interested in the return values from the actions, you'd use sequence.)
replicate n x makes a list of length n, with each element being x. So we use it to build up the list of actions we want to run.
Dave Hinton's answer is correct, but as an aside here's another way of writing the same code:
import Control.Applicative
main = (sequence_ . proc) =<< (read <$> getLine)
proc x = replicate x (putStrLn =<< (findNextPalin <$> getLine))
Just to remind everyone that do blocks aren't necessary! Note that in the above, both =<< and <$> stand in for plain old function application. If you ignore both operators, the code reads exactly the same as similarly-structured pure functions would. I've added some gratuitous parentheses to make things more explicit.
Their purpose is that <$> applies a regular function inside a monad, while =<< does the same but then compresses an extra layer of the monad (e.g., turning IO (IO a) into IO a).
The interesting part of looking at code this way is that you can mostly ignore where the monads and such are; typically there's very few ways to place the "function application" operators to make the types work.
You (and the previous answers) should work harder to divide up the IO from the logic. Make main gather the input and separately (purely, if possible) do the work.
import Control.Monad -- not needed, but cleans some things up
main = do
numCases <- liftM read getLine
lines <- replicateM numCases getLine
let results = map findNextPalin lines
mapM_ putStrLn results
When solving SPOJ problems in Haskell, try not to use standard strings at all. ByteStrings are much faster, and I've found you can usually ignore the number of tests and just run a map over everything but the first line, like so:
{-# OPTIONS_GHC -O2 -optc-O2 #-}
import qualified Data.ByteString.Lazy.Char8 as BS
main :: IO ()
main = do
(l:ls) <- BS.lines `fmap` BS.getContents
mapM_ findNextPalin ls
The SPOJ page in the Haskell Wiki gives a lot of good pointers about how to read Ints from ByteStrings, as well as how to deal with a large quantities of input. It'll help you avoid exceeding the time limit.