What caused this "delayed read on closed handle" error? - haskell

I just installed GHC from the latest sources, and now my program gives me an error message about a "delayed read on closed handle". What does this mean?

The fundamental lazy I/O primitive, hGetContents, produces a String lazily—it only reads from the handle as needed to produce the parts of the string your program actually demands. Once the handle has been closed, however, it is no longer possible to read from the handle, and if you try to inspect a part of the string that was not yet read, you will get this exception. For example, suppose you write
import System.IO

main = do
  most <- withFile "myfile" ReadMode
            (\h -> do
              s <- hGetContents h
              let (first12, rest) = splitAt 12 s
              print first12
              return rest)
  putStrLn most
GHC opens myfile and sets it up for lazy reading into the string we've bound to s. It does not actually begin reading from the file. Then it sets up a lazy computation to split the string after 12 characters. Then print forces that computation, and GHC reads in a chunk of myfile at least 12 characters long, and prints out the first twelve. It then closes the file when withFile completes, and attempts to print out the rest. If the file was longer than the chunk GHC buffered, you will get the delayed read exception once it reaches the end of the chunk.
How to avoid this problem
You need to be sure that you've actually read everything you need before closing the file or returning from withFile. If the function you pass to withFile just does some I/O and returns a constant (such as ()), you don't need to worry about this. If you need it to produce a real value from a lazy read, you must be sure to force that value sufficiently before returning. In the example above, you can force the string to "normal form" using a function or operator from the Control.DeepSeq module:
return $!! rest
This ensures that the rest of the string is actually read before withFile closes the file. The $!! approach also works perfectly well if what you return is some value calculated from the file contents, as long as it's an instance of the NFData class. In this case, and many others, it's even better to simply move the rest of the code for processing the file contents into the function passed to withFile, like this:
import System.IO

main = withFile "myfile" ReadMode
         (\h -> do
           s <- hGetContents h
           let (first12, rest) = splitAt 12 s
           print first12
           putStrLn rest)
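Putting the $!! fix into the first example gives a complete program. This is a sketch: the writeFile line is added here only to create the input file so the example runs standalone.

```haskell
import System.IO
import Control.DeepSeq (($!!))

main :: IO ()
main = do
  writeFile "myfile" "0123456789abcdefghij"   -- create the input so this runs standalone
  most <- withFile "myfile" ReadMode
            (\h -> do
              s <- hGetContents h
              let (first12, rest) = splitAt 12 s
              print first12
              return $!! rest)                -- force rest fully before h is closed
  putStrLn most
```

Because rest is forced to normal form inside withFile, the later putStrLn never triggers a delayed read.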
Another function to consider, as an alternative, is readFile. readFile holds the file open until it has finished reading the file. You should only use readFile, however, if you know that you will actually demand the entire contents of the file—otherwise you could leak file descriptors.
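As a sketch, here is the running example rewritten with readFile (again creating the input first so it runs standalone). Note that this version does consume the entire contents, so readFile is appropriate here.

```haskell
main :: IO ()
main = do
  writeFile "myfile" "0123456789abcdefghij"  -- create the input so this runs standalone
  s <- readFile "myfile"                     -- handle stays open while s is consumed
  let (first12, rest) = splitAt 12 s
  print first12
  putStrLn rest                              -- demands the rest; the file closes at EOF
```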
History
According to the Haskell Report, once the handle is closed, the contents of the string become fixed.
In the past, GHC has simply ended the string at the end of whatever was buffered at the time the handle was closed. For example, if you had inspected the first 10 characters of the string before you closed the handle, and GHC had buffered an additional 634 characters, but not reached the end of the file, then you would get a normal string with 644 characters. This was a common source of confusion among new users and an occasional source of bugs in production code.
As of GHC 7.10.1, this behavior is changing. When you close a handle that you are reading from lazily, it now effectively puts an exception at the end of the buffer instead of the usual end of string, "". So if you attempt to inspect the string beyond the point where the file was closed, you will get an error message.

Related

Why doesn't hSetBuffering return a new handle instead of changing the given handle?

In Haskell, we try to write most of our code in an immutable style: rather than changing variables or passed parameters, we create a new value from the old one with the required changes.
import System.IO

main = do
  withFile "something.txt" ReadMode (\handle -> do
    hSetBuffering handle $ BlockBuffering (Just 2048)
    contents <- hGetContents handle
    putStr contents)
Then what is the reason that hSetBuffering, a function that takes a handle and sets its buffering mode, changes the passed handle itself instead of returning a new handle with the required buffering mode?
With regular Haskell values, there is no problem keeping older versions of a value around. However, Handles are references to mutable resources allocated with the operating system, and carry state. After calling a version of hSetBuffering that returned a new Handle, what should happen to earlier versions of the Handle that are still kept around? Should they reflect the change? If the answer is yes, then the new-handle-returning version of hSetBuffering is a bit of a lie.
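A small sketch of that point: a Handle is a mutable reference, so a buffering change made through one name is visible through every alias. (The writeFile line is added only so "something.txt" exists and the example runs standalone.)

```haskell
import System.IO

main :: IO ()
main = do
  writeFile "something.txt" "hello"        -- create a file so the sketch runs
  h <- openFile "something.txt" ReadMode
  let h2 = h                               -- h2 is just another name, not a copy
  hSetBuffering h (BlockBuffering (Just 2048))
  mode <- hGetBuffering h2                 -- the change made via h is visible via h2
  print mode
  hClose h
```

A hypothetical new-handle-returning hSetBuffering would have to either leave h2 stale or silently update it, which is exactly the dilemma described above.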
This new-handle-returning version of hSetBuffering could work if the type system somehow disallowed keeping old versions of the Handle after calling the function. It could do that by enforcing a constraint: functions that receive a Handle as parameter can only use that parameter one single time, and functions that "duplicate" handles like dup :: Handle -> (Handle,Handle) are disallowed.
There is a (not yet accepted) proposal to extend Haskell with the ability to enforce such restrictions. In fact, file operations are one of the motivating examples. From section 2.3 of the paper:
type File
openFile :: FilePath → IOL 1 File
readLine :: File ⊸ IOL 1 (File,Unrestricted ByteString)
closeFile :: File ⊸ IOL ω ()
Under this proposal, we can only have a single version of a File around at any given time. closeFile makes the reference to File unavailable so that we can't close an already closed file. Every read operation takes the previous version of the File and returns a new one along with the read data. And hSetBuffering would have a type like:
hSetBuffering :: BufferingMode -> File ⊸ IOL 1 File

Reading a couple of lines of a pipe obtained with createProcess and then closing it

I have a function that spawns a process as follows:
(_, Just outh, _, ph) <- createProcess $
    (proc "someproc" []) { std_out = CreatePipe }
line <- hGetLine outh
doSomeComputationOn line
-- ... 'outh' is not read anymore from here onwards
So "someproc" is created, and then the parent reads a line from it to obtain some information, and then it forgets about the pipe whose handle is outh.
Now the problem is that if the pipe is not read, "someproc" will block as soon as it is full. So this requires that the parent process reads outh even if it does not do anything with it. So my questions are:
Is this a good way to get the first line of a child process' output and then forget about additional output?
Is there any way in Haskell in which I can automatically discard the input to the pipe (or even redirect it to a file)?
So far the only way I can see to work around this problem is spawning a new thread that constantly tries to read from outh (and just discards the output), which indicates that I'm doing something wrong ...
As additional background, this question is related to this one.
Now the problem is that if the pipe is not read, "someproc" will block as soon as it is full. [...] Is there any way in Haskell in which I can automatically discard the input to the pipe (or even redirect it to a file)?
There is a helper library for process called process-streaming (written by the author of this answer) that tries to do just that: even if the user passes a stream-consuming function that doesn't exhaust a standard stream, it drains the stream automatically under the hood to avoid potential deadlocks.
The library doesn't work directly with handles, but accepts pipe-consuming functions and foldl folds through an adapter type.
An example of reading the first line:
import Data.Text.Lazy
import qualified Pipes.Prelude
import System.Process.Streaming (CreateProcess, shell,
    piped, execute, foldOut, transduce1, withCont)
import System.Process.Streaming.Text (utf8x, foldedLines)

program :: CreateProcess
program = piped $ shell "{ echo aaa ; echo bbbb ; }"

firstLine :: IO (Maybe Text)
firstLine = execute program streams
  where
    streams = foldOut
            . transduce1 utf8x
            . transduce1 foldedLines
            $ withCont Pipes.Prelude.head
The library has a much bigger dependency footprint than process though.
The alternative to use depends on the behavior of the external command.
If you simply want to interrupt that, you can hClose outh. This will close the pipe, and further writes to the pipe by the external command will fail with a "broken pipe" error. Most processes terminate upon receiving this.
If you instead want to read and discard the output, you can do that as well. Perhaps the easiest way is
do c <- hGetContents outh
   evaluate (length c) -- force this to fetch all data; evaluate is from Control.Exception
   doStuff -- now we are sure that the remote end closed its output
which should run in constant space.
If you don't want to wait for the process to end before performing doStuff, wrap everything in forkIO.
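Putting those pieces together, here is a sketch of the read-one-line-then-drain-in-a-thread approach. Since "someproc" is not a real command, sh -c with a seq loop stands in for a chatty child process here.

```haskell
import System.IO
import System.Process
import Control.Concurrent (forkIO)
import Control.Exception (evaluate)

main :: IO ()
main = do
  -- "sh -c ..." stands in for the real "someproc" in this sketch
  (_, Just outh, _, ph) <- createProcess $
      (proc "sh" ["-c", "echo first; seq 1 100000"]) { std_out = CreatePipe }
  line <- hGetLine outh           -- the one line we actually care about
  _ <- forkIO $ do                -- drain the rest so the child never blocks
    rest <- hGetContents outh
    _ <- evaluate (length rest)   -- force the read, then discard
    return ()
  putStrLn line
  _ <- waitForProcess ph
  return ()
```

Without the drain thread, seq's output would fill the pipe buffer and the child would block; with it, waitForProcess returns normally.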

Detecting I/O exceptions in a lazy String from hGetContents?

hGetContents returns a lazy String object that can be used in purely functional code to read from a file handle. If an I/O exception occurs while reading this lazy string, the underlying file handle is closed silently and no additional characters are added to the lazy string.
How can this I/O exception be detected?
As a concrete example, consider the following program:
import System.IO -- for stdin

lengthOfFirstLine :: String -> Int
lengthOfFirstLine "" = 0
lengthOfFirstLine s = (length . head . lines) s

main :: IO ()
main = do
  lazyStdin <- hGetContents stdin
  print (lengthOfFirstLine lazyStdin)
If an exception occurs while reading the first line of the file, this program will print the number of characters until the I/O exception occurs. Instead I want the program to crash with the appropriate I/O exception. How could this program be modified to have that behavior?
Edit: Upon closer inspection of the hGetContents implementation, it appears that the I/O exception is not ignored but rather bubbles up through the calling pure functional code to whatever IO code happened to trigger evaluation, which has the opportunity to then handle it. (I was not previously aware that pure functional code could raise exceptions.) Thus this question is a misunderstanding.
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
Lazy IO is considered a pitfall by many Haskellers, and as such you are generally advised to stay away from it. Your case colorfully describes why.
There is a strict (non-lazy) alternative to the hGetContents function. It works on Text, but Text is also a generally preferred alternative to String. For convenience, there are modern preludes that replace String with Text: basic-prelude and classy-prelude.
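A sketch of the strict approach using Data.Text.IO.hGetContents, which reads the whole input up front, so any I/O error surfaces right at the call site instead of later in pure code. The input file here stands in for stdin so the example runs standalone.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO

lengthOfFirstLine :: T.Text -> Int
lengthOfFirstLine t = case T.lines t of
  []      -> 0
  (l : _) -> T.length l

main :: IO ()
main = do
  writeFile "input.txt" "hello\nworld\n"   -- stand-in for stdin in this sketch
  withFile "input.txt" ReadMode $ \h -> do
    contents <- TIO.hGetContents h         -- strict: reads everything to EOF now
    print (lengthOfFirstLine contents)
```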
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
I was wondering about the same thing, found this old question, and decided to perform an experiment.
I ran this little program in Windows, that listens for a connection and reads from it lazily:
import System.IO
import Network
import Control.Concurrent

main :: IO ()
main = withSocketsDo (do
  socket <- listenOn (PortNumber 19999)
  print "created socket"
  (h,_,_) <- accept socket
  print "accepted connection"
  contents <- hGetContents h
  print contents)
From a Linux machine, I opened a connection using nc:
nc -v mymachine 19999
Connection to mymachine 19999 port [tcp/*] succeeded!
And then used Windows Sysinternal's TCPView utility to forcibly close the connection. The result was:
Main.exe: <socket: 348>: hGetContents: failed (Unknown error)
It appears that I/O exceptions do bubble up.
A further experiment: I added a delay just after the hGetContents call:
...
contents <- hGetContents h
threadDelay (60 * 1000^2)
print contents)
With this change, killing the connection doesn't immediately raise an exception because, thanks to lazy I/O, nothing is actually read until print executes.

Do Haskell files close automatically after readFile?

I want to use the Haskell function
readFile :: FilePath -> IO String
to read the content of a file into a string. In the documentation I have read that "The file is read lazily, on demand, as with getContents."
I am not sure I understand this completely. For example, suppose that I write
s <- readFile "t.txt"
When this action is executed:
1. The file is opened.
2. The characters in s are actually read from the file as soon as (but not sooner than) they are needed to evaluate some expression (e.g. if I evaluate length s, all the content of the file will be read and the file will be closed).
3. As soon as the last character has been read, the file handle associated with this call to readFile is closed (automatically).
Is my third statement correct? So, can I just invoke readFile without closing the file handle myself? Will the handle stay open as long as I have not consumed (visited) the whole result string?
EDIT
Here is some more information regarding my doubts. Suppose I have the following:
foo :: String -> IO String
foo filename = do
  s <- readFile filename
  putStrLn "File has been read."
  return s
When the putStrLn is executed, I would (intuitively) expect that
s contains the whole content of file t.txt,
The handle used to read the file has been closed.
If this is not the case:
What does s contain when putStrLn is executed?
In what state is the file handle when putStrLn is executed?
If when putStrLn is executed s does not contain the whole content of the file, when will this content actually be read, and when will the file be closed?
Is my third statement correct?
Not quite: the file is not closed as soon as the last character has been read, at least not usually. It lingers in the semi-closed state it was in during the read for a few moments; the IO manager/runtime will close it the next time it performs such actions. If you're rapidly opening and reading files, that lag may cause you to run out of file handles if the OS limit isn't too high.
For most use cases (in my limited experience), however, the closing of the file handle is timely enough. [There are people who disagree and view lazy IO as extremely dangerous in all cases. It definitely has pitfalls, but IMO its dangers are often overstated.]
So, can I just invoke readFile without closing the file handle myself?
Yes: when you're using readFile, the file handle is closed automatically when the file contents have been entirely read, or when it is noticed that the file handle is no longer referenced.
Will the handle stay open as long as I have not consumed (visited) the whole result string?
Not quite, readFile puts the file handle in a semi-closed state, described in the docs for hGetContents:
Computation hGetContents hdl returns the list of characters corresponding to the unread portion of the channel or file managed by hdl, which is put into an intermediate state, semi-closed. In this state, hdl is effectively closed, but items are read from hdl on demand and accumulated in a special list returned by hGetContents hdl.
foo :: String -> IO String
foo filename = do
  s <- readFile filename
  putStrLn "File has been read."
  return s
Ah, that's one of the pitfalls of lazy IO on the other end. Here the file is closed before its contents have been read. When foo returns, the file handle isn't referenced anymore and is then closed. The consumer of foo's result will then find that s is an empty string, because when hGetContents tries to actually read from the file, the handle is already closed.
I confused the behaviour of readFile with that of
bracket (openFile file ReadMode) hClose hGetContents
there. readFile only closes the file handle after s is not referenced anymore, so it behaves correctly as expected here.
When the putStrLn is executed, I would (intuitively) expect that
s contains the whole content of file t.txt,
The handle used to read the file has been closed.
No, s does not yet contain anything but a recipe to get some characters from the file handle. The file handle is semi-closed, but not closed. It will be closed when the file contents have been entirely read, or when s goes out of scope.
If this is not the case:
What does s contain when putStrLn is executed?
In what state is the file handle when putStrLn is executed?
If when putStrLn is executed s does not contain the whole content of the file, when will this content actually be read, and when will the file be closed?
The first two questions have been answered; the answer to the third is "the file will be read when the contents are consumed", and it will be closed when the entire contents have been read or when the handle is no longer referenced.
That would be different with the above bracket invocation - bracket guarantees that the final operation, here the hClose will be run even if the other actions throw an exception, therefore its use is often recommended. However, the hClose is run when bracket returns, and then the hGetContents can't get any contents from the now really closed file handle. But readFile would not necessarily close the file handle if an exception occurs.
That is one of the dangers or quirks of lazy IO: files are not read until their contents are demanded, and if you use lazy IO wrongly, that will be too late and you don't get any contents.
It's a trap many (or even most) fall into one time or another, but after having been bitten by it, one quickly learns when IO needs to be non-lazy and do it non-lazily in those cases.
The alternatives (iteratees, enumerators, conduits, pipes, ...) avoid those traps [unless the implementer made a mistake], but are considerably less nice to use in those cases where lazy IO is perfectly fine. On the other hand, they treat the cases where laziness is not desired much better.
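One way to keep bracket's exception safety while avoiding that pitfall is to force the contents while the handle is still open. readWhole below is a hypothetical helper sketching this; the writeFile line just creates the input so the example runs standalone.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

-- Hypothetical helper: exception-safe like bracket, but forces the whole
-- contents before hClose runs, so the caller gets the full string.
readWhole :: FilePath -> IO String
readWhole path = bracket (openFile path ReadMode) hClose $ \h -> do
  s <- hGetContents h
  _ <- evaluate (length s)   -- read everything to EOF while h is still open
  return s

main :: IO ()
main = do
  writeFile "t.txt" "hello world"   -- create the input so this runs standalone
  s <- readWhole "t.txt"
  putStrLn s
```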
When the putStrLn is executed, I would (intuitively) expect that s contains the whole content of file t.txt,
You need to think about the fact you're using lazy IO here. Reading from the file merely creates an unevaluated string computation that, if it is later required, will then read the file.
By using lazy IO you defer your IO until the value is needed.
Once the last character of your file has been read, or all references to the open file are dropped (e.g. your s value), your open file will be closed by the garbage collector.

Forcing evaluation on lazy IO

My program reads a line from a network socket and writes it to disc. Since lines can be really long and Strings had terrible performance, I started using lazy ByteStrings. Now it seems that Haskell goes past hClose on the disc file handle without actually flushing the whole ByteString to disc, so doing:
1. open the file for writing
2. write the ByteString to the file with hPut
3. close the file
4. open the file for reading
usually results in openFile: resource busy (file is locked).
Is it possible to enforce evaluation and wait for the whole byte string to be written before closing the file, so I can be sure that the file is actually closed after that operation?
Try using strict i/o with strict byte strings instead of lazy i/o and lazy byte strings.
If that turns out to be too inefficient, try using conduit or a similar package.
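A sketch of the strict approach: convert the lazy ByteString to a strict one before writing, so everything is flushed before hClose returns and the file can be reopened immediately. The file name out.bin and the payload are arbitrary choices for this example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import System.IO

main :: IO ()
main = do
  let payload = BL.pack (replicate 100000 65)  -- a long lazy ByteString of 'A's
  h <- openFile "out.bin" WriteMode
  BS.hPut h (BL.toStrict payload)              -- strict write: nothing is deferred
  hClose h                                     -- file really is closed here
  s <- BS.readFile "out.bin"                   -- reopen immediately: no lock error
  print (BS.length s)
```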
Since the only other answer is of the "use something else" variety I'm posting my own solution. Using this function after hClose will hang the thread until lazy write is done.
import Control.Exception (try, throwIO)
import System.IO
import System.IO.Error (isAlreadyInUseError)

waitForLazyIO :: FilePath -> IO ()
waitForLazyIO location = do
    t <- try $ openFile location AppendMode
    handle t
  where
    handle (Right v) = hClose v
    handle (Left e)
      -- Add some sleep if you expect the write operation to be slow.
      | isAlreadyInUseError e = waitForLazyIO location
      | otherwise = throwIO e
