Detecting I/O exceptions in a lazy String from hGetContents? - haskell

hGetContents returns a lazy String object that can be used in purely functional code to read from a file handle. If an I/O exception occurs while reading this lazy string, the underlying file handle is closed silently and no additional characters are added to the lazy string.
How can this I/O exception be detected?
As a concrete example, consider the following program:
import System.IO -- for stdin
lengthOfFirstLine :: String -> Int
lengthOfFirstLine "" = 0
lengthOfFirstLine s = (length . head . lines) s
main :: IO ()
main = do
  lazyStdin <- hGetContents stdin
  print (lengthOfFirstLine lazyStdin)
If an exception occurs while reading the first line of the file, this program will print the number of characters received before the I/O exception occurred. Instead I want the program to crash with the appropriate I/O exception. How could this program be modified to have that behavior?
Edit: Upon closer inspection of the hGetContents implementation, it appears that the I/O exception is not ignored but rather bubbles up through the calling pure functional code to whatever IO code happened to trigger evaluation, which has the opportunity to then handle it. (I was not previously aware that pure functional code could raise exceptions.) Thus this question is a misunderstanding.
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
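One way to approximate such a failure without real hardware faults is to splice an exception into a lazy string with unsafeInterleaveIO. The following is only a sketch (brokenContents is a made-up stand-in for hGetContents hitting a read error mid-stream), but it shows the exception surfacing in IO when pure code forces the string:

```haskell
import Control.Exception (ErrorCall (..), evaluate, throwIO, try)
import System.IO.Unsafe (unsafeInterleaveIO)

-- A lazy String whose tail raises an exception when forced,
-- mimicking a low-level read error partway through the stream.
brokenContents :: IO String
brokenContents = do
  rest <- unsafeInterleaveIO (throwIO (ErrorCall "simulated I/O failure"))
  return ("first line\nsecond li" ++ rest)

main :: IO ()
main = do
  s <- brokenContents
  -- Forcing the pure computation surfaces the exception in IO,
  -- where try/catch can handle it.
  r <- try (evaluate (length s)) :: IO (Either ErrorCall Int)
  print r
```

The prefix of the string before the bad thunk remains readable; only forcing past it raises the exception.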

Lazy I/O is considered a pitfall by many Haskellers, and the usual advice is to stay away from it. Your case illustrates why rather colorfully.
There is a strict (non-lazy) alternative to the hGetContents function. It works on Text, and Text is also generally preferred over String. For convenience, there are modern preludes that replace String with Text: basic-prelude and classy-prelude.
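A sketch of what that looks like, using the strict hGetContents from Data.Text.IO (lengthOfFirstLine mirrors the helper from the question):

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (stdin)

lengthOfFirstLine :: T.Text -> Int
lengthOfFirstLine t = case T.lines t of
  []      -> 0
  (l : _) -> T.length l

main :: IO ()
main = do
  -- Strict read: all input is consumed right here, so any I/O
  -- exception is raised at this point, inside IO, where it can
  -- be caught or allowed to crash the program.
  contents <- TIO.hGetContents stdin
  print (lengthOfFirstLine contents)
```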

Aside: It would be best if this exceptional behavior were verified
empirically. Unfortunately it is difficult to simulate a low level I/O
error.
I was wondering about the same thing, found this old question, and decided to perform an experiment.
I ran this little program in Windows, that listens for a connection and reads from it lazily:
import System.IO
import Network
import Control.Concurrent
main :: IO ()
main = withSocketsDo (do
    socket <- listenOn (PortNumber 19999)
    print "created socket"
    (h,_,_) <- accept socket
    print "accepted connection"
    contents <- hGetContents h
    print contents)
From a Linux machine, I opened a connection using nc:
nc -v mymachine 19999
Connection to mymachine 19999 port [tcp/*] succeeded!
And then used Windows Sysinternal's TCPView utility to forcibly close the connection. The result was:
Main.exe: <socket: 348>: hGetContents: failed (Unknown error)
It appears that I/O exceptions do bubble up.
A further experiment: I added a delay just after the hGetContents call:
...
    contents <- hGetContents h
    threadDelay (60 * 1000^2)
    print contents)
With this change, killing the connection doesn't immediately raise an exception because, thanks to lazy I/O, nothing is actually read until print executes.


Reading a couple of lines of a pipe obtained with createProcess and then closing it

I have a function that spawns a process as follows:
(_, Just outh, _, ph) <- createProcess $
  (proc "someproc" []) { std_out = CreatePipe }
line <- hGetLine outh
doSomeComputationOn line
-- ... 'outh' is not read anymore from here onwards
So "someproc" is created, and then the parent reads a line from it to obtain some information, and then it forgets about the pipe whose handle is outh.
Now the problem is that if the pipe is not read, "someproc" will block as soon as it is full. So this requires that the parent process reads outh even if it does not do anything with it. So my questions are:
Is this a good way to get the first line of a child process' output and then forget about additional output?
Is there any way in Haskell in which I can automatically discard the input to the pipe (or even redirect it to a file)?
So far the only way I can see to work around this problem is spawning a new thread that constantly tries to read from outh (and just discards the output), which suggests that I'm doing something wrong...
As additional background, this question is related to this one.
Now the problem is that if the pipe is not read, "someproc" will block as soon as it is full. [...] Is there any way in Haskell in which I can automatically discard the
input to the pipe (or even redirect it to a file)?
There is a helper library for process called process-streaming (written by the author of this answer) that tries to do just that: even if the user passes a stream-consuming function that doesn't exhaust a standard stream, it drains the stream automatically under the hood to avoid potential deadlocks.
The library doesn't work directly with handles, but accepts pipe-consuming functions and foldl folds through an adapter type.
An example of reading the first line:
import Data.Text.Lazy
import qualified Pipes.Prelude
import System.Process.Streaming (CreateProcess, shell,
  piped, execute, foldOut, transduce1, withCont)
import System.Process.Streaming.Text (utf8x, foldedLines)

program :: CreateProcess
program = piped $ shell "{ echo aaa ; echo bbbb ; }"

firstLine :: IO (Maybe Text)
firstLine = execute program streams
  where
    streams = foldOut
            . transduce1 utf8x
            . transduce1 foldedLines
            $ withCont Pipes.Prelude.head
The library has a much bigger dependency footprint than process though.
The alternative to use depends on the behavior of the external command.
If you simply want to interrupt that, you can hClose outh. This will close the pipe, and further writes to the pipe by the external command will fail with a "broken pipe" error. Most processes terminate upon receiving this.
If you instead want to read and discard the output, you can do that as well. Perhaps the easiest way is
do c <- hGetContents outh
   evaluate (length c) -- force this to fetch all data
   doStuff -- now we are sure that the remote end closed its output
which should run in constant space.
If you don't want to wait for the process to end before performing doStuff, wrap everything in forkIO.
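A hedged sketch of that last suggestion (drainInBackground is a hypothetical helper name): after reading the first line with hGetLine, the same handle can be handed to a background thread that forces the rest of the stream and throws it away, so the child never blocks on a full pipe:

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (evaluate)
import Control.Monad (void)
import System.IO (Handle, hGetContents)

-- Drain and discard the rest of a handle in the background.
drainInBackground :: Handle -> IO ()
drainInBackground h = do
  c <- hGetContents h
  -- Forcing the length in a forked thread is what actually
  -- pulls the remaining data out of the pipe.
  void $ forkIO $ void $ evaluate (length c)
```

Because hGetContents reads lazily, nothing is consumed until the forked thread forces length c; the main thread can carry on with its own work immediately.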

What caused this "delayed read on closed handle" error?

I just installed GHC from the latest sources, and now my program gives me an error message about a "delayed read on closed handle". What does this mean?
The fundamental lazy I/O primitive, hGetContents, produces a String lazily—it only reads from the handle as needed to produce the parts of the string your program actually demands. Once the handle has been closed, however, it is no longer possible to read from the handle, and if you try to inspect a part of the string that was not yet read, you will get this exception. For example, suppose you write
main = do
  most <- withFile "myfile" ReadMode
    (\h -> do
      s <- hGetContents h
      let (first12, rest) = splitAt 12 s
      print first12
      return rest)
  putStrLn most
GHC opens myfile and sets it up for lazy reading into the string we've bound to s. It does not actually begin reading from the file. Then it sets up a lazy computation to split the string after 12 characters. Then print forces that computation, and GHC reads in a chunk of myfile at least 12 characters long, and prints out the first twelve. It then closes the file when withFile completes, and attempts to print out the rest. If the file was longer than the chunk GHC buffered, you will get the delayed read exception once it reaches the end of the chunk.
How to avoid this problem
You need to be sure that you've actually read everything you need before closing the file or returning from withFile. If the function you pass to withFile just does some IO and returns a constant (such as ()), you don't need to worry about this. If you need it to produce a real value from a lazy read, you need to be sure to force that value sufficiently before returning. In the example above, you can force the string to "normal form" using a function or operator from the Control.DeepSeq module:
return $!! rest
This ensures that the rest of the string is actually read before withFile closes the file. The $!! approach also works perfectly well if what you return is some value calculated from the file contents, as long as it's an instance of the NFData class. In this case, and many others, it's even better to simply move the rest of the code for processing the file contents into the function passed to withFile, like this:
main = withFile "myfile" ReadMode
  (\h -> do
    s <- hGetContents h
    let (first12, rest) = splitAt 12 s
    print first12
    putStrLn rest)
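For comparison, the return $!! rest fix applied to the first example might look like this (a sketch, assuming the deepseq package is available):

```haskell
import Control.DeepSeq (($!!))
import System.IO

main :: IO ()
main = do
  most <- withFile "myfile" ReadMode
    (\h -> do
      s <- hGetContents h
      let (first12, rest) = splitAt 12 s
      print first12
      -- fully force the remainder before withFile closes the handle
      return $!! rest)
  putStrLn most
```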
Another function to consider, as an alternative, is readFile. readFile holds the file open until it has finished reading the file. You should only use readFile, however, if you know that you will actually demand the entire contents of the file—otherwise you could leak file descriptors.
History
According to the Haskell Report, once the handle is closed, the contents of the string become fixed.
In the past, GHC has simply ended the string at the end of whatever was buffered at the time the handle was closed. For example, if you had inspected the first 10 characters of the string before you closed the handle, and GHC had buffered an additional 634 characters, but not reached the end of the file, then you would get a normal string with 644 characters. This was a common source of confusion among new users and an occasional source of bugs in production code.
As of GHC 7.10.1, this behavior has changed. When you close a handle that you are reading from lazily, GHC now effectively puts an exception at the end of the buffered portion of the string instead of the usual end of string. So if you attempt to inspect the string beyond the point where the file was closed, you will get an error message.

Why should buffering not be used in the following example?

I was reading this tutorial:
http://www.catonmat.net/blog/simple-haskell-tcp-server/
To learn the basics of Haskell's Network module. He wrote a small function called sockHandler:
sockHandler :: Socket -> IO ()
sockHandler sock = do
  (handle, _, _) <- accept sock
  hSetBuffering handle NoBuffering
  forkIO $ commandProcessor handle
  sockHandler sock
That accepts a connection, and forks it to a new thread. While breaking down the code, he says:
"Next we use hSetBuffering to change buffering mode for the client's socket handle to NoBuffering, so we didn't have buffering surprises."
But he doesn't elaborate on that point. What surprises is he talking about? I googled it and saw a few security articles (which I'm guessing are related to caches being intercepted), but nothing seemingly related to the content of this tutorial.
What is the issue? I thought about it, but I don't think I have enough networking experience to fill in the blanks.
Thank you.
For the sake of illustration, suppose the protocol allows the server to query the client for some information, e.g. (silly example follows)
hPutStr sock "Please choose between A or B"
choice <- hGetLine sock
case decode choice of
  Just A  -> handleA
  Just B  -> handleB
  Nothing -> protocolError
Everything looks fine... but the server seems to hang. Why? This is because the message was not really sent over the network by hPutStr, but merely inserted in a local buffer. Hence, the other end never receives the query, so does not reply, causing the server to get stuck in its read.
A solution here would be to insert an hFlush sock before reading. This has to be manually inserted at the "right" points, and is prone to error. A lazier option is to disable buffering entirely -- this is safer, though it can severely impact performance.
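As a sketch of the first option (ask is a made-up helper; it assumes sock is a duplex socket Handle as in the snippet above):

```haskell
import System.IO (Handle, hFlush, hGetLine, hPutStrLn)

-- Send a prompt, flush, then block on the reply.
-- Without the hFlush, a block-buffered handle may never actually
-- transmit the prompt, and hGetLine deadlocks waiting for a reply.
ask :: Handle -> String -> IO String
ask sock prompt = do
  hPutStrLn sock prompt
  hFlush sock
  hGetLine sock
```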

Why is chat server example on haskell.org thread safe?

I'm new to Haskell and I can't figure out what I'm not understanding about this example on the Haskell wiki: http://www.haskell.org/haskellwiki/Implement_a_chat_server
The specific code in question is this:
runConn :: (Socket, SockAddr) -> Chan Msg -> IO ()
runConn (sock, _) chan = do
  let broadcast msg = writeChan chan msg
  hdl <- socketToHandle sock ReadWriteMode
  hSetBuffering hdl NoBuffering
  chan' <- dupChan chan
  -- fork off thread for reading from the duplicated channel
  forkIO $ fix $ \loop -> do
    line <- readChan chan'
    hPutStrLn hdl line
    loop
  -- read lines from socket and echo them back to the user
  fix $ \loop -> do
    line <- liftM init (hGetLine hdl)
    broadcast line
    loop
The code above has one thread writing to the handle hdl at the same time (potentially) as another thread is reading from it. Is this safe?
I suspect the nature of forkIO (being internal to Haskell and not a system thread library or process) is what makes this work, but I'm not sure.
I checked the documentation of forkIO for any mention of IO handles
but found nothing. I also checked the documentation of System.IO but couldn't find any mention of using handles between threads without using locking.
So can someone tell me how I should know when something like this is safe when the docs don't mention anything about thread safety?
It's not the nature of forkIO that makes this work but the nature of the MVar that is used to implement both Chan and Handle.
If you want to understand how Chan works take a look at this section "MVar as building blocks: Unbounded Channels" in chapter 7 of the excellent book "Parallel and Concurrent Programming in Haskell" by Simon Marlow. In the same chapter there is a section about forkIO and MVar that will help you understand how Handle can be implemented in a thread safe way.
Chapter 12 talks specifically about various ways to implement network servers, including a chat server that is implemented using STM instead of Chans.
If it wasn't safe, blocking sockets would be almost impossible to use. If your protocol is asynchronous and you're using blocking sockets, you need a thread blocking on read pretty much all the time. If you then needed to send a message to the other side, how could you do it? Wait for the other side to send you a message?

Forcing evaluation on lazy IO

My program reads a line from a network socket and writes it to disc. Since lines can be really long and strings had terrible performance, I started using lazy byte strings. Now it seems that Haskell will go past hClose on the file handle without actually flushing the whole byte string to disc, so doing:
open file for writing
write byte string to file with hPut
close file
open file for reading
usually results in openFile: resource busy (file is locked).
Is it possible to enforce evaluation and wait for the whole byte string to be written before closing the file, so I can be sure that the file is actually closed after that operation?
Try using strict i/o with strict byte strings instead of lazy i/o and lazy byte strings.
If that turns out to be too inefficient, try using conduit or a similar package.
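A minimal sketch of the strict approach, using the bytestring package's strict Data.ByteString interface (out.dat is a placeholder path):

```haskell
import qualified Data.ByteString as BS
import System.IO (IOMode (WriteMode), withFile)

main :: IO ()
main = do
  -- Strict write: every byte goes out before hPut returns, so the
  -- handle really is flushed and closed by the time withFile exits.
  withFile "out.dat" WriteMode $ \h ->
    BS.hPut h (BS.pack [1 .. 255])
  -- Reopening immediately afterwards is now safe.
  contents <- BS.readFile "out.dat"
  print (BS.length contents)  -- prints 255
```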
Since the only other answer is of the "use something else" variety I'm posting my own solution. Using this function after hClose will hang the thread until lazy write is done.
waitForLazyIO location = do
  t <- liftIO $ try $ openFile location AppendMode
  handle t
  where
    handle (Right v) = hClose v
    handle (Left e)
      -- Add some sleep if you expect the write operation to be slow.
      | isAlreadyInUseError e = waitForLazyIO location
      | otherwise = throwError $ show e