Why is the chat server example on haskell.org thread-safe?

I'm new to Haskell and I can't figure out what I'm not understanding about this example on the Haskell wiki: http://www.haskell.org/haskellwiki/Implement_a_chat_server
The specific code in question is this:
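```haskell
import Control.Concurrent
import Control.Monad (liftM)
import Data.Function (fix)
import Network.Socket
import System.IO

type Msg = String

runConn :: (Socket, SockAddr) -> Chan Msg -> IO ()
runConn (sock, _) chan = do
  let broadcast msg = writeChan chan msg
  hdl <- socketToHandle sock ReadWriteMode
  hSetBuffering hdl NoBuffering
  chan' <- dupChan chan
  -- fork off a thread for reading from the duplicated channel
  _ <- forkIO $ fix $ \loop -> do
    line <- readChan chan'
    hPutStrLn hdl line
    loop
  -- read lines from the socket and broadcast them (including back to this user)
  fix $ \loop -> do
    line <- liftM init (hGetLine hdl)  -- init drops the last character (the '\r' of a CRLF line)
    broadcast line
    loop
```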
The code above has one thread potentially writing to the handle hdl at the same time as another thread is reading from it. Is this safe?
I suspect that the nature of forkIO (being internal to Haskell rather than a system thread library or process) is what makes this work, but I'm not sure.
I checked the documentation of forkIO for any mention of I/O handles but found nothing. I also checked the documentation of System.IO but couldn't find any mention of sharing handles between threads without locking.
So can someone tell me how I should know when something like this is safe when the docs don't mention anything about thread safety?

It's not the nature of forkIO that makes this work, but the nature of MVar, which is used to implement both Chan and Handle.
If you want to understand how Chan works, take a look at the section "MVar as building blocks: Unbounded Channels" in chapter 7 of the excellent book "Parallel and Concurrent Programming in Haskell" by Simon Marlow. In the same chapter, there is a section about forkIO and MVar that will help you understand how Handle can be implemented in a thread-safe way.
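To make the "MVar as a building block" idea concrete, here is a minimal sketch of an unbounded channel built only from MVars, along the lines of the construction the book describes. The names UChan, newUChan, writeUChan, readUChan and dupUChan are made up for this sketch (to avoid clashing with Control.Concurrent.Chan), and GHC's real Chan differs in details such as exception safety. Each end of the channel is an MVar used as a lock, so concurrent writers (or readers) take turns, and dupUChan gives a second reader its own read pointer, which is what dupChan does for every client in the chat server.

```haskell
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)

-- The stream is a linked list whose "next" pointers are MVars (holes).
type Stream a = MVar (Item a)
data Item a = Item a (Stream a)

-- Read end and write end, each an MVar acting as a lock around a pointer into the stream.
data UChan a = UChan (MVar (Stream a)) (MVar (Stream a))

newUChan :: IO (UChan a)
newUChan = do
  hole <- newEmptyMVar
  readVar <- newMVar hole
  writeVar <- newMVar hole
  return (UChan readVar writeVar)

writeUChan :: UChan a -> a -> IO ()
writeUChan (UChan _ writeVar) val = do
  newHole <- newEmptyMVar
  oldHole <- takeMVar writeVar         -- lock the write end
  putMVar oldHole (Item val newHole)   -- fill the old hole, waking any blocked reader
  putMVar writeVar newHole             -- unlock, now pointing at the new hole

readUChan :: UChan a -> IO a
readUChan (UChan readVar _) = do
  stream <- takeMVar readVar           -- lock the read end
  Item val rest <- readMVar stream     -- readMVar (not takeMVar) so duplicates can read it too
  putMVar readVar rest
  return val

-- A duplicate shares the write end but gets its own read pointer,
-- starting at the current write position.
dupUChan :: UChan a -> IO (UChan a)
dupUChan (UChan _ writeVar) = do
  hole <- readMVar writeVar
  newReadVar <- newMVar hole
  return (UChan newReadVar writeVar)

main :: IO ()
main = do
  c <- newUChan
  d <- dupUChan c
  forM_ "abc" (writeUChan c)
  xs <- replicateM 3 (readUChan c)
  ys <- replicateM 3 (readUChan d)
  print (xs, ys)
```

Both readers see every message written after the duplicate was made, so the program prints `("abc","abc")`.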
Chapter 12 talks specifically about various ways to implement network servers, including a chat server that is implemented using STM instead of Chans.

If it weren't safe, blocking sockets would be almost impossible to use. If your protocol is asynchronous and you're using blocking sockets, you need a thread blocking on read pretty much all the time. If you then needed to send a message to the other side, how could you do it? Wait for the other side to send you a message?
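A minimal sketch of exactly that situation, assuming the network package and a Unix-like system (socketPair with AF_UNIX is not available everywhere): one thread sits blocked in hGetLine on handle ha while the main thread writes to the very same ha. As far as I know, GHC gives a socket handle opened in ReadWriteMode separate read and write sides (a "duplex" handle), so the write does not have to wait for the blocked reader.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Network.Socket
import System.IO

main :: IO ()
main = do
  -- Two connected sockets standing in for the two ends of a network connection.
  (a, b) <- socketPair AF_UNIX Stream defaultProtocol
  ha <- socketToHandle a ReadWriteMode
  hb <- socketToHandle b ReadWriteMode
  mapM_ (`hSetBuffering` LineBuffering) [ha, hb]
  got <- newEmptyMVar
  -- Reader thread: blocks in hGetLine on ha until b sends something.
  _ <- forkIO $ hGetLine ha >>= putMVar got
  threadDelay 100000  -- give the reader time to block (0.1 s)
  -- Meanwhile, write on the same Handle ha that the reader is blocked on.
  hPutStrLn ha "hello from a"
  l <- hGetLine hb
  putStrLn ("b received: " ++ l)
  hPutStrLn hb "reply from b"
  r <- takeMVar got
  putStrLn ("a's reader received: " ++ r)
```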

Related

Why should buffering not be used in the following example?

I was reading this tutorial:
http://www.catonmat.net/blog/simple-haskell-tcp-server/
to learn the basics of Haskell's Network module. In it, he writes a small function called sockHandler:
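```haskell
import Control.Concurrent (forkIO)
import Network (Socket, accept)  -- the tutorial's older high-level Network module
import System.IO

sockHandler :: Socket -> IO ()
sockHandler sock = do
  (handle, _, _) <- accept sock
  hSetBuffering handle NoBuffering
  _ <- forkIO $ commandProcessor handle  -- commandProcessor is defined elsewhere in the tutorial
  sockHandler sock
```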
It accepts a connection and forks a new thread to handle it. While breaking down the code, he says:
"Next we use hSetBuffering to change buffering mode for the client's socket handle to NoBuffering, so we didn't have buffering surprises."
But he doesn't elaborate on that point. What surprises is he talking about? I googled it and saw a few security articles (which I'm guessing relate to the cache being intercepted), but nothing that seemed related to the content of this tutorial.
What is the issue? I thought about it, but I don't think I have enough networking experience to fill in the blanks.
Thank you.
For the sake of illustration, suppose the protocol allows the server to query the client for some information, e.g. (a silly example follows):
```haskell
hPutStr sock "Please choose between A or B"
choice <- hGetLine sock
case decode choice of
  Just A  -> handleA
  Just B  -> handleB
  Nothing -> protocolError
```
Everything looks fine... but the server seems to hang. Why? Because hPutStr did not actually send the message over the network; it merely put it into a local buffer. Hence the other end never receives the query, so it does not reply, and the server gets stuck in its read.
A solution here would be to insert an hFlush sock before reading. However, the flush has to be inserted manually at the "right" points, which is error-prone. A lazier option is to disable buffering entirely: this is safer, although it can severely impact performance.
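The effect can be reproduced without a network. This sketch uses a pipe from System.Process.createPipe as a stand-in for the socket (an assumption made so it runs locally, on a Unix-like system) and hWaitForInput with a 200 ms timeout to check whether the reading side can see the query yet. With block buffering, the text stays in the writer's buffer until hFlush.

```haskell
import System.IO
import System.Process (createPipe)

main :: IO ()
main = do
  (readEnd, writeEnd) <- createPipe
  hSetBuffering writeEnd (BlockBuffering Nothing)
  hPutStrLn writeEnd "Please choose between A or B"
  -- The query is still sitting in writeEnd's buffer, so the "client" sees nothing.
  before <- hWaitForInput readEnd 200  -- wait up to 200 ms
  print before
  hFlush writeEnd                      -- push the buffered bytes into the pipe
  after <- hWaitForInput readEnd 200
  print after
  hGetLine readEnd >>= putStrLn
```

In the server above, the reply-waiting hGetLine plays the role of the first hWaitForInput, except that it has no timeout, so it waits forever.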

Detecting I/O exceptions in a lazy String from hGetContents?

hGetContents returns a lazy String object that can be used in purely functional code to read from a file handle. If an I/O exception occurs while reading this lazy string, the underlying file handle is closed silently and no additional characters are added to the lazy string.
How can this I/O exception be detected?
As a concrete example, consider the following program:
```haskell
import System.IO -- for stdin

lengthOfFirstLine :: String -> Int
lengthOfFirstLine "" = 0
lengthOfFirstLine s = (length . head . lines) s

main :: IO ()
main = do
  lazyStdin <- hGetContents stdin
  print (lengthOfFirstLine lazyStdin)
```
If an exception occurs while reading the first line of the file, this program will print the number of characters until the I/O exception occurs. Instead I want the program to crash with the appropriate I/O exception. How could this program be modified to have that behavior?
Edit: Upon closer inspection of the hGetContents implementation, it appears that the I/O exception is not ignored but rather bubbles up through the calling pure functional code to whatever IO code happened to trigger evaluation, which has the opportunity to then handle it. (I was not previously aware that pure functional code could raise exceptions.) Thus this question is a misunderstanding.
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
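One way to detect the error at a point of your choosing is to force the lazy string inside try, using evaluate: if the exception is re-thrown when the string is demanded, as the edit describes, it surfaces at that call. A minimal sketch (the error message text is arbitrary); it reports the failure and then re-throws it, so the program still crashes with the original I/O exception as the question wants.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

lengthOfFirstLine :: String -> Int
lengthOfFirstLine "" = 0
lengthOfFirstLine s = (length . head . lines) s

main :: IO ()
main = do
  lazyStdin <- hGetContents stdin
  -- evaluate forces the Int result here, which drives the lazy read of the
  -- first line; an I/O error raised during that read is caught by try.
  r <- try (evaluate (lengthOfFirstLine lazyStdin)) :: IO (Either IOException Int)
  case r of
    Left e  -> hPutStrLn stderr ("read failed: " ++ show e) >> ioError e
    Right n -> print n
```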
Lazy I/O is considered a pitfall by many Haskellers, who advise keeping away from it. Your case colorfully illustrates why.
There is a strict alternative to the hGetContents function in Data.Text.IO. It works on Text, but Text is generally preferred over String anyway. For convenience, there are modern preludes that replace String with Text: basic-prelude and classy-prelude.
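For comparison, a sketch of the question's program using the strict Data.Text.IO.hGetContents from the text package. The whole input is read when hGetContents runs, so an I/O error is thrown from that line rather than from wherever the string happens to be consumed. Note that this reads all of stdin into memory, not just the first line.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (stdin)

-- Same idea as the question's function, but on strict Text (and without head).
lengthOfFirstLine :: T.Text -> Int
lengthOfFirstLine t = case T.lines t of
  []      -> 0
  (l : _) -> T.length l

main :: IO ()
main = do
  -- Reads all of stdin right here; any I/O error is thrown by this line.
  strictStdin <- TIO.hGetContents stdin
  print (lengthOfFirstLine strictStdin)
```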
> Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
I was wondering about the same thing, found this old question, and decided to perform an experiment.
I ran this little program on Windows; it listens for a connection and reads from it lazily:
```haskell
import System.IO
import Network  -- older high-level module of the network package (not in recent releases)
import Control.Concurrent

main :: IO ()
main = withSocketsDo (do
  socket <- listenOn (PortNumber 19999)
  print "created socket"
  (h, _, _) <- accept socket
  print "accepted connection"
  contents <- hGetContents h
  print contents)
```
From a Linux machine, I opened a connection using nc:
```
nc -v mymachine 19999
Connection to mymachine 19999 port [tcp/*] succeeded!
```
Then I used Windows Sysinternals' TCPView utility to forcibly close the connection. The result was:
```
Main.exe: <socket: 348>: hGetContents: failed (Unknown error)
```
It appears that I/O exceptions do bubble up.
A further experiment: I added a delay just after the hGetContents call:
```haskell
  ...
  contents <- hGetContents h
  threadDelay (60 * 1000^2)  -- 60 seconds; threadDelay takes microseconds
  print contents)
```
With this change, killing the connection doesn't immediately raise an exception because, thanks to lazy I/O, nothing is actually read until print executes.
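The same laziness can be seen deterministically with a pipe (System.Process.createPipe, used here as an assumed stand-in for the socket): hGetContents returns before anything has been written, and the data is only read when putStrLn demands it. If hGetContents read eagerly, this program would block forever, because the writing happens after the call.

```haskell
import System.IO
import System.Process (createPipe)

main :: IO ()
main = do
  (readEnd, writeEnd) <- createPipe
  contents <- hGetContents readEnd  -- returns immediately; nothing has been read yet
  hPutStr writeEnd "written after hGetContents"
  hClose writeEnd                   -- flush and give the reader an EOF
  putStrLn contents                 -- the actual read (and any I/O error) happens here
```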

Asynchronous UDP server/client as base for IPC in Haskell

I want to put together the basics for asynchronous UDP IPC in Haskell. For this, the sender/receiver should spawn, e.g., a synchronous receive thread (or send thread, depending on which side you view it from) and carry on with other tasks.
This might involve defining a new data type that consists of optional message/data serial numbers and some sort of buffer, so that the send thread can stop sending when it is notified by the receiver that it cannot keep up.
I aim to make this as lightweight and asynchronous as possible.
I have tried a number of things, such as starting a new receive thread for every packet (an approach I took from a paper about multiplayer online games), but this ground almost everything to a halt.
Below is my naive first take on this. Any help with, e.g., creating buffers, creating serial numbers, or a DCCP implementation in Haskell (I could not find one) would be appreciated. I would rather not get into opinionated discussions about UDP vs. TCP etc.
My snippet stops working once something gets out of sync, e.g. when no more data arrives or when less data arrives than expected. As I said, I am looking for some lightweight (featherweight :D) way of synchronizing the send and the receive thread, or for an example of one.
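```haskell
import Control.Concurrent
import Control.Monad (forM_)
import qualified Data.ByteString.Char8 as B
import Network.Socket hiding (recvFrom)
import Network.Socket.ByteString (recvFrom, sendAllTo)

-- Hypothetical values: the question does not show host and port.
host :: HostName
host = "127.0.0.1"

port :: PortNumber
port = 5000

main :: IO ()
main = withSocketsDo $ do
  s <- socket AF_INET Datagram defaultProtocol
  hostAddr <- inet_addr host  -- network 2.x API
  done <- newEmptyMVar
  let p = B.pack "ping"
  thread <- forkIO $ receiveMessages s done
  forM_ [0 .. 10000 :: Int] $ \_ ->
    sendAllTo s p (SockAddrInet port hostAddr)
  takeMVar done               -- blocks forever if even one reply never arrives
  killThread thread
  sClose s                    -- network 2.x name; newer versions use close

-- Expects exactly one reply per ping from the peer at host:port.
receiveMessages :: Socket -> MVar () -> IO ()
receiveMessages sock done = do
  forM_ [0 .. 10000 :: Int] $ \_ -> do
    r <- recvFrom sock 1024
    print r  -- placeholder to make the function complete
  putMVar done ()
```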
If you don't trust your messenger, you can never agree on anything -- not even a single bit like "are we done yet"!
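One practical consequence is that the receiver should not wait for an exact number of packets. This goes beyond the answer above: a sketch of a hypothetical receiveUntilQuiet helper that keeps receiving until nothing has arrived for a while, using System.Timeout.timeout. A Chan stands in for the lossy UDP socket so the example runs without a network; with a real socket, recvOne would be something like recvFrom sock 1024.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, when)
import System.Timeout (timeout)

-- Keep calling recvOne until nothing arrives for quietMicros microseconds,
-- then return what was received instead of blocking forever.
receiveUntilQuiet :: Int -> IO a -> IO [a]
receiveUntilQuiet quietMicros recvOne = go []
  where
    go acc = do
      r <- timeout quietMicros recvOne
      case r of
        Nothing -> return (reverse acc)  -- quiet period elapsed: stop waiting
        Just x  -> go (x : acc)

main :: IO ()
main = do
  wire <- newChan :: IO (Chan Int)  -- stand-in for a lossy UDP socket
  _ <- forkIO $ forM_ [1 .. 10] $ \i ->
    when (i /= 7) $ writeChan wire i  -- packet 7 is "lost"
  got <- receiveUntilQuiet 200000 (readChan wire)
  print (length got, got)
```

The receiver ends with 9 packets and can decide what to do about the missing one (e.g. request a resend by serial number) instead of hanging.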

Automatically reconnect a Haskell Network connection in an idiomatic way

I've worked my way through Don Stewart's Roll your own IRC bot tutorial, and am playing around with some extensions to it. My current code is essentially the same as the "The monadic, stateful, exception-handling bot in all its glory"; it's a bit too long to paste here unless someone requests it.
As a Comcast subscriber, I particularly need the bot to be able to reconnect after periods of poor connectivity. My approach is simply to time the PING requests from the server, and if it goes a certain time without seeing a PING, to try reconnecting.
So far, the best solution I've found is to wrap the hGetLine in the listen loop with System.Timeout.timeout. However, this seems to require defining a custom exception so that the catch in main can call main again, rather than return (). It also seems quite fragile to specify a timeout value for each individual hGetLine.
Is there a better solution, perhaps something that wraps an IO a like bracket and catch so that the entire main can handle network timeouts without the overhead of a new exception type?
How about running a separate thread that performs all the reading and writing and takes care of periodically reconnecting the handle?
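Something like this:
```haskell
import Control.Concurrent (Chan, readChan)
import Control.Exception (IOException, catch, try)
import Control.Monad (forever)
import System.IO
import System.Timeout (timeout)
-- The rest of the bot sends with writeChan output c; connectToServer is not shown.
keepAlive :: IO Handle -> Chan Char -> IO ()
keepAlive connectToServer output = forever $ do
  h <- connectToServer
  pump h `catch` \e -> do
    hPutStrLn stderr ("reconnecting after: " ++ show (e :: IOException))
    try (hClose h) :: IO (Either IOException ())
  where
    -- timeout takes microseconds; a write stuck for 4 s is treated as a dead link
    pump h = forever $ do
      c <- readChan output
      r <- timeout (4 * 1000000) (hPutChar h c)
      maybe (ioError (userError "write timed out")) return r
```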
The idea is to encapsulate all the difficulty with periodically reconnecting into a separate thread.
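The question's original idea (no PING for a while means reconnect) also works without a custom exception type if the read loop simply returns when timeout gives Nothing, and an outer loop reconnects. A minimal sketch: readUntilSilent and runWithReconnect are hypothetical names, the 5-minute limit is an arbitrary example, and in main a pipe stands in for the IRC connection.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (forever)
import System.IO
import System.Process (createPipe)
import System.Timeout (timeout)

-- Read lines until the connection has been silent for quietMicros
-- microseconds (or an I/O error occurs), then return normally.
readUntilSilent :: Int -> Handle -> (String -> IO ()) -> IO ()
readUntilSilent quietMicros h onLine = loop
  where
    loop = do
      r <- try (timeout quietMicros (hGetLine h)) :: IO (Either IOException (Maybe String))
      case r of
        Right (Just l) -> onLine l >> loop
        _              -> return ()  -- silence or a read error: treat the link as dead

-- Reconnect forever, using a caller-supplied connect action.
runWithReconnect :: IO Handle -> (String -> IO ()) -> IO ()
runWithReconnect connect onLine = forever $ do
  h <- connect
  readUntilSilent (300 * 1000000) h onLine  -- e.g. give up after 5 minutes without a line
  _ <- try (hClose h) :: IO (Either IOException ())
  return ()

main :: IO ()
main = do
  (r, w) <- createPipe
  hSetBuffering w LineBuffering
  mapM_ (hPutStrLn w) ["PING :a", "PING :b"]
  readUntilSilent 200000 r putStrLn  -- prints both lines, then gives up after 0.2 s
  putStrLn "silent: would reconnect here"
```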

Haskell concurrency and Handles

I'm writing a little notification server to push data to a client. The basic architecture looks something like this (pared down pseudo-code):
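```haskell
import Control.Concurrent (forkIO)
import Control.Exception (finally)
import Control.Monad (forever, when)
import Network.Socket
import System.IO

-- targetConnection and shouldPushMessage come from elsewhere (not shown).
acceptConnections :: Socket -> IO ()
acceptConnections sock = forever $ do
  (connection, _) <- accept sock
  forkIO (handleConnection connection)

handleConnection :: Socket -> IO ()
handleConnection connection = do
  connectionHandle <- socketToHandle connection ReadWriteMode
  -- handleMessage never returns normally, so close the handle when it throws
  handleMessage connectionHandle `finally` hClose connectionHandle

handleMessage :: Handle -> IO ()
handleMessage connectionHandle = forever $ do
  message <- hGetLine connectionHandle
  when (shouldPushMessage message) $
    hPutStrLn targetConnection message
```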
Here targetConnection (in handleMessage) is the Handle of a separate connection, whose own handleMessage thread is blocked in hGetLine waiting for input. I would think this causes a problem, since two threads access the same Handle. So my question is: why isn't this a problem? Or is it, and I just haven't seen it turn into an issue yet? In my actual application, I look up targetConnection in a map that I access through an MVar, but nothing protects the Handle itself while the other thread is in its hGetLine call.
Disclaimer: I'm a complete newbie to both Haskell and multithreading.
Thanks for any explanations/insight!
Handle, as implemented in GHC, is already an MVar wrapped around the underlying IODevice. I didn't quite get what you're doing (not saying it was unclear; I'm a little ill, so perhaps I'm slow), but I'm guessing GHC's built-in thread-safe handling of Handle is saving you.
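A small experiment consistent with that: ten threads write to one shared Handle (a pipe from System.Process.createPipe, standing in for a socket) with no locking of their own, and no output is lost. The program prints the number of characters and of newlines, which do not depend on how the writes interleave; as far as I know, the Handle's internal lock protects its buffer but is not a promise that each line comes out contiguous.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import System.IO
import System.Process (createPipe)

main :: IO ()
main = do
  (readEnd, writeEnd) <- createPipe
  -- 10 threads share writeEnd without any locking of their own
  dones <- forM [1 .. 10 :: Int] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      forM_ [1 .. 100 :: Int] $ \_ -> hPutStrLn writeEnd "ab"
      putMVar done ()
    return done
  mapM_ takeMVar dones  -- wait for all writers to finish
  hClose writeEnd       -- flush and signal EOF to the reader
  s <- hGetContents readEnd
  -- 10 * 100 * length "ab\n" characters, 1000 of them newlines
  print (length s, length (filter (== '\n') s))
```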

Resources