Why should buffering not be used in the following example? - haskell

I was reading this tutorial:
http://www.catonmat.net/blog/simple-haskell-tcp-server/
To learn the basics of Haskell's Network module. He wrote a small function called sockHandler:
sockHandler :: Socket -> IO ()
sockHandler sock = do
(handle, _, _) <- accept sock
hSetBuffering handle NoBuffering
forkIO $ commandProcessor handle
sockHandler sock
That accepts a connection, and forks it to a new thread. While breaking down the code, he says:
"Next we use hSetBuffering to change buffering mode for the client's socket handle to NoBuffering, so we didn't have buffering surprises."
But doesn't elaborate on that point. What surprises is he talking about? I Google'd it, and saw a few security articles (Which I'm guessing is related to the cache being intercepted), but nothing seemingly related to the content of this tutorial.
What is the issue? I thought about it, but I don't think I have enough networking experience to fill in the blanks.
Thank you.

For the sake of illustration, suppose the protocol allows the server to query the client for some information, e.g. (silly example follows)
hPutStr sock "Please choose between A or B"
choice <- hGetLine sock
case decode choice of
Just A -> handleA
Just B -> handleB
Nothing -> protocolError
Everything looks fine... but the server seems to hang. Why? This is because the message was not really sent over the network by hPutStr, but merely inserted in a local buffer. Hence, the other end never receives the query, so does not reply, causing the server to get stuck in its read.
A solution here would be to insert an hFlush sock before reading. This has to be manually inserted at the "right" points, and is prone to error. A lazier option would be to disable buffering entirely -- this is safer, albeit it severely impacts performance.

Related

Which high level interface to use instead of Network for sending and haskell `Strings`

The first line of documentation of the Network module reads:
This module is kept for backwards-compatibility. New users are encouraged to use Network.Socket instead.
The Network library has functions that allow to send and receive Strings in a convenient manner, for instance:
h <- connectTo "localhost" (PortNumber 9090)
-- ...
line <- hGetLine h
Without using this library (and using Network.Socket instead), the code above will become something like:
addrinfos <- getAddrInfo Nothing (Just "") (Just "9090")
let serveraddr = head addrinfos
sock <- socket (addrFamily serveraddr) Stream defaultProtocol
connect sock (addrAddress serveraddr)
-- ...
msg <- recv sock Size
-- What `Size` should be? The line above should be probably repeated until all the data is received.
Which is is quite low level, and requires coding/decoding a String's. So my question is, if all I want to do is to send and receive strings (with some encoding) over a socket what are the alternatives for accomplishing this task, provided that Network is not an option?
EDIT: although pipes are conduits are good options, the project I'm working on makes heavy use of these Network functions, so I ended up developing a library for sending and receiving Text using similar functions as in the Network library.

Why is chat server example on haskell.org thread safe?

I'm new to Haskell and I can't figure out what I'm not understanding about this example on the Haskell wiki: http://www.haskell.org/haskellwiki/Implement_a_chat_server
The specific code in question is this:
runConn :: (Socket, SockAddr) -> Chan Msg -> -> IO ()
runConn (sock, _) chan = do
let broadcast msg = writeChan chan msg
hdl <- socketToHandle sock ReadWriteMode
hSetBuffering hdl NoBuffering
chan' <- dupChan chan
-- fork off thread for reading from the duplicated channel
forkIO $ fix $ \loop -> do
line <- readChan chan'
hPutStrLn hdl line
loop
-- read lines from socket and echo them back to the user
fix $ \loop -> do
line <- liftM init (hGetLine hdl)
broadcast line
loop
The code above has one thread writing to the handle hdl at the same time (potentially) as another thread is reading from it. Is this safe?
I suspect the nature of forkIO (being internal to Haskell and not a system thread library or process) is what makes this work, but I'm not sure.
I checked the documentation of forkIO for any mention of IO handles
but found nothing. I also checked the documentation of System.IO but couldn't find any mention of using handles between threads without using locking.
So can someone tell me how I should know when something like this is safe when the docs don't mention anything about thread safety?
It's not the nature of forkIO that makes this works but the nature of MVar that is used to implement both Chan and Handle.
If you want to understand how Chan works take a look at this section "MVar as building blocks: Unbounded Channels" in chapter 7 of the excellent book "Parallel and Concurrent Programming in Haskell" by Simon Marlow. In the same chapter there is a section about forkIO and MVar that will help you understand how Handle can be implemented in a thread safe way.
Chapter 12 talks specifically about various ways to implement network servers, including a chat server that is implemented using STM instead of Chans.
If it wasn't safe, blocking sockets would be almost impossible to use. If your protocol is asynchronous and you're using blocking sockets, you need a thread blocking on read pretty much all the time. If you then needed to send a message to the other side, how could you do it? Wait for the other side to send you a message?

Detecting I/O exceptions in a lazy String from hGetContents?

hGetContents returns a lazy String object that can be used in purely functional code to read from a file handle. If an I/O exception occurs while reading this lazy string, the underlying file handle is closed silently and no additional characters are added to the lazy string.
How can this I/O exception be detected?
As a concrete example, consider the following program:
import System.IO -- for stdin
lengthOfFirstLine :: String -> Int
lengthOfFirstLine "" = 0
lengthOfFirstLine s = (length . head . lines) s
main :: IO ()
main = do
lazyStdin <- hGetContents stdin
print (lengthOfFirstLine lazyStdin)
If an exception occurs while reading the first line of the file, this program will print the number of characters until the I/O exception occurs. Instead I want the program to crash with the appropriate I/O exception. How could this program be modified to have that behavior?
Edit: Upon closer inspection of the hGetContents implementation, it appears that the I/O exception is not ignored but rather bubbles up through the calling pure functional code to whatever IO code happened to trigger evaluation, which has the opportunity to then handle it. (I was not previously aware that pure functional code could raise exceptions.) Thus this question is a misunderstanding.
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
Lazy IO is considered to be a pitfall by many haskellers and as such is advised to keep away from. Your case colorfully describes why.
There is a non-lazy alternative of hGetContents function. It works on Text, but Text is also a generally preferred alternative to String. For convenience, there are modern preludes, replacing the String with Text: basic-prelude and classy-prelude.
Aside: It would be best if this exceptional behavior were verified
empirically. Unfortunately it is difficult to simulate a low level I/O
error.
I was wondering about the same thing, found this old question, and decided to perform an experiment.
I ran this little program in Windows, that listens for a connection and reads from it lazily:
import System.IO
import Network
import Control.Concurrent
main :: IO ()
main = withSocketsDo (do
socket <- listenOn (PortNumber 19999)
print "created socket"
(h,_,_) <- accept socket
print "accepted connection"
contents <- hGetContents h
print contents)
From a Linux machine, I opened a connection using nc:
nc -v mymachine 19999
Connection to mymachine 19999 port [tcp/*] succeeded!
And then used Windows Sysinternal's TCPView utility to forcibly close the connection. The result was:
Main.exe: <socket: 348>: hGetContents: failed (Unknown error)
It appears that I/O exceptions do bubble up.
A further experiment: I added a delay just after the hGetContents call:
...
contents <- hGetContents h
threadDelay (60 * 1000^2)
print contents)
With this change, killing the connection doesn't immediately raise an exception because, thanks to lazy I/O, nothing is actually read until print executes.

Automatically reconnect a Haskell Network connection in an idiomatic way

I've worked my way through Don Stewart's Roll your own IRC bot tutorial, and am playing around with some extensions to it. My current code is essentially the same as the "The monadic, stateful, exception-handling bot in all its glory"; it's a bit too long to paste here unless someone requests it.
Being a Comcast subscriber, it's particularly important that the bot be able to reconnect after periods of poor connectivity. My approach is to simply time the PING requests from the server, and if it goes without seeing a PING for a certain time, to try reconnecting.
So far, the best solution I've found is to wrap the hGetLine in the listen loop with System.Timeout.timeout. However, this seems to require defining a custom exception so that the catch in main can call main again, rather than return (). It also seems quite fragile to specify a timeout value for each individual hGetLine.
Is there a better solution, perhaps something that wraps an IO a like bracket and catch so that the entire main can handle network timeouts without the overhead of a new exception type?
How about running a separate thread that performs all the reading and writing and takes care of periodically reconnecting the handle?
Something like this
input :: Chan Char
output :: Chan Char
putChar c = writeChan output c
keepAlive = forever $ do
h <- connectToServer
catch
(forever $
do c <- readChan output; timeout 4000 (hPutChar h c); return ())
(\_ -> return ())
The idea is to encapsulate all the difficulty with periodically reconnecting into a separate thread.

Haskell concurrency and Handles

I'm writing a little notification server to push data to a client. The basic architecture looks something like this (pared down pseudo-code):
acceptConnections sock = forever $ do
connection <- accept sock
forkIO (handleConnection connection)
handleConnection connection = do
connectionHandle <- socketToHandle connection ReadWriteMode
handleMessage connectionHandle
hClose connectionHandle
handleMessage connectionHandle = forever $ do
message <- hGetLine connectionHandle
if shouldPushMessage message
then hPutStrLn targetConnection message
else return ()
Where targetConnection (in handleMessage) is from a separate connection and is hanging up handleMessage in a different thread waiting for its buffer to be filled. I would think this would cause a problem as I have 2 threads accessing the same Handle. So, my question is, why isn't this a problem? Or is it, and I just haven't seen it turn into an issue yet? In my actual application, when I grab targetConnection, I do so through a map I access via an MVar, but it's not being safely accessed at the hGetLine call.
Disclaimer: I'm a complete Haskell and multi-threaded newb
Thanks for any explanations/insight!
Handle, as implemented in GHC, is already an MVar wrapping over the underlying IODevice. I didn't quite get what you're doing (not saying it was unclear, I'm a little ill so perhaps I'm slow) but am guessing GHCs built in thread-safe handling of Handle is saving you.

Resources