Haskell - Slow socket connection when threaded - multithreading

I started with Haskell a while ago and am now focusing on networking. I followed some
tutorials and source samples to put together a very simple echo server:
main = withSocketsDo $ do
    forkIO $ acceptor 8080
    print "Server running ... " >> getLine >>= print
tcpSock :: IO Socket
tcpSock = socket AF_INET Stream 0
acceptor :: PortNumber -> IO ()
acceptor port = do
    -- Setup server socket
    sock <- tcpSock
    setSocketOption sock ReuseAddr 1
    bindSocket sock (SockAddrInet port iNADDR_ANY)
    listen sock 50
    -- Start with zero index
    loop sock 0
  where
    loop sock sockId = do
        -- Accept socket
        (nextSock, addr) <- accept sock
        -- Setup the socket for performance
        (_, handle) <- setupClient nextSock
        -- Run client in own thread
        forkIO $ do
            -- Get a stream of bytes
            stream <- BS.hGetContents handle
            -- Echo the first received char
            BS.hPut handle $ BS.take 1 stream
            -- Kill the socket
            SIO.hClose handle
        -- Accept next client
        loop sock (sockId + 1)
setupClient :: Socket -> IO (Socket, SIO.Handle)
setupClient sock = do
    -- Disable nagle
    setSocketOption sock NoDelay 1
    -- Disable buffering
    hdl <- socketToHandle sock SIO.ReadWriteMode
    SIO.hSetBuffering hdl SIO.NoBuffering
    return (sock, hdl)
I benchmarked the server with the ab tool. The code is compiled with -O2 and -threaded, and
the program is started with +RTS -N to use multiple OS threads.
The code creates a new lightweight thread per client. As far as I remember, these threads are
pretty cheap because they are scheduled onto a small pool of real OS threads.
After running the tool, the results are:
ab -n 10000 -c 1000 http://localhost:8080/
~ 500 - 1600 req/sec
Yes, it really does vary between 500 and 1600!
At first I thought: well, not bad. Then I ran the program without "+RTS -N", and the results are
~20000 req/sec almost every time.
Obviously the threading kills the performance pretty badly, but why?
My guess is that the IO manager does a pretty bad job when dealing with a lot of connections.
BTW: I use Ubuntu 13.04 and GHC 7.6. I've also tested the code under Windows 8, which gave me far worse results, but I think the IO manager is tuned for Linux, which makes sense.
Am I doing something really stupid here? I know the example is quite trivial, but something is obviously going wrong.
Regards,
Chris

Okay, I think I semi-solved the problem, though I'm still not sure where the error is/was.
I'm now using the Network module, so the accept routine is handle-based. I tried this because I noticed a memory leak after a couple of tests.
This way I magically solved two problems at once, because now the threading makes NO difference.
I really don't know why this is happening, but the handle-based implementation is simpler and obviously faster and safer.
Maybe this helps other people experiencing the same problem.

Related

Which high-level interface to use instead of Network for sending and receiving Haskell `String`s

The first line of documentation of the Network module reads:
This module is kept for backwards-compatibility. New users are encouraged to use Network.Socket instead.
The Network library has functions that let you send and receive Strings conveniently, for instance:
h <- connectTo "localhost" (PortNumber 9090)
-- ...
line <- hGetLine h
Without using this library (and using Network.Socket instead), the code above will become something like:
addrinfos <- getAddrInfo Nothing (Just "") (Just "9090")
let serveraddr = head addrinfos
sock <- socket (addrFamily serveraddr) Stream defaultProtocol
connect sock (addrAddress serveraddr)
-- ...
msg <- recv sock Size
-- What should `Size` be? The line above probably has to be repeated until all the data is received.
This is quite low level and requires encoding/decoding the Strings by hand. So my question is: if all I want to do is send and receive strings (with some encoding) over a socket, what are the alternatives for accomplishing this task, given that Network is not an option?
EDIT: although pipes and conduits are good options, the project I'm working on makes heavy use of these Network functions, so I ended up developing a library for sending and receiving Text using functions similar to those in the Network library.
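One option that stays within Network.Socket is to turn the connected socket into a Handle with socketToHandle and then use the ordinary System.IO string functions. The following is only a minimal sketch, assuming a line-oriented protocol encoded as UTF-8; connectHandle is a hypothetical helper name, not something the network package provides:

```haskell
import Control.Exception (onException)
import Network.Socket
import System.IO

-- | Hypothetical helper: connect to host/port and return a Handle
-- that can be used with hGetLine, hPutStrLn and friends.
connectHandle :: HostName -> ServiceName -> IO Handle
connectHandle host port = do
    let hints = defaultHints { addrSocketType = Stream }
    addrs <- getAddrInfo (Just hints) (Just host) (Just port)
    case addrs of
        []         -> ioError (userError ("no address found for " ++ host))
        (addr : _) -> do
            sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
            -- Close the socket again if the connection attempt fails.
            connect sock (addrAddress addr) `onException` close sock
            -- From here on the socket must only be used through the Handle.
            h <- socketToHandle sock ReadWriteMode
            hSetEncoding h utf8            -- assumption: the peer speaks UTF-8
            hSetBuffering h LineBuffering  -- flush output at every newline
            return h
```

With such a helper, `h <- connectHandle "localhost" "9090"` followed by `line <- hGetLine h` reads one line as a String, so there is no receive size to choose and no manual decoding.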

Why should buffering not be used in the following example?

I was reading this tutorial to learn the basics of Haskell's Network module:
http://www.catonmat.net/blog/simple-haskell-tcp-server/
The author wrote a small function called sockHandler:
sockHandler :: Socket -> IO ()
sockHandler sock = do
    (handle, _, _) <- accept sock
    hSetBuffering handle NoBuffering
    forkIO $ commandProcessor handle
    sockHandler sock
That accepts a connection, and forks it to a new thread. While breaking down the code, he says:
"Next we use hSetBuffering to change buffering mode for the client's socket handle to NoBuffering, so we didn't have buffering surprises."
But doesn't elaborate on that point. What surprises is he talking about? I Google'd it, and saw a few security articles (Which I'm guessing is related to the cache being intercepted), but nothing seemingly related to the content of this tutorial.
What is the issue? I thought about it, but I don't think I have enough networking experience to fill in the blanks.
Thank you.
For the sake of illustration, suppose the protocol allows the server to query the client for some information, e.g. (silly example follows)
hPutStr sock "Please choose between A or B"
choice <- hGetLine sock
case decode choice of
    Just A -> handleA
    Just B -> handleB
    Nothing -> protocolError
Everything looks fine... but the server seems to hang. Why? This is because the message was not really sent over the network by hPutStr, but merely inserted in a local buffer. Hence, the other end never receives the query, so does not reply, causing the server to get stuck in its read.
A solution here would be to insert an hFlush sock before reading. This has to be inserted manually at the "right" points and is error-prone. A lazier option is to disable buffering entirely -- this is safer, although it can hurt performance considerably.
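The flush-before-read pattern can be wrapped in a small helper so it is not forgotten. This is only a sketch: query and flushOnNewline are hypothetical names, and the handle is assumed to be a duplex handle such as one obtained from a socket.

```haskell
import System.IO

-- | Send a one-line query and wait for a one-line reply.
-- The explicit hFlush pushes the query out of the local buffer
-- before the thread blocks in hGetLine.
query :: Handle -> String -> IO String
query h question = do
    hPutStrLn h question
    hFlush h
    hGetLine h

-- | Middle ground between full buffering and NoBuffering:
-- flush automatically whenever a newline is written.
-- This only helps if every message ends in '\n'; a bare hPutStr
-- without a newline, as in the example above, would still sit in the buffer.
flushOnNewline :: Handle -> IO ()
flushOnNewline h = hSetBuffering h LineBuffering
```

Line buffering keeps most of the benefit of buffering for line-oriented protocols, while the explicit hFlush in query also covers messages that do not end in a newline.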

Can a Haskell or Haskell OS thread waiting on Network.Socket.accept not be killed on Windows?

-- thread A
t <- forkIO $ do
    _ <- accept listener -- blocks
-- thread B
killThread t
This works on Linux (and probably also on OS X and FreeBSD) but not on Windows (I tried -threaded with +RTS -N4 -RTS etc.).
What is the correct way to terminate thread A in this case?
Is there a way to fork thread A in a special mode that would allow termination at the point it blocks on accept?
Would it help if A were forked with forkOS rather than forkIO?
I noticed this deviant Windows behaviour only when a bug report alerted me to it.
Interesting question!
You can't interrupt blocking foreign calls, so I'm somewhat surprised that you're able to interrupt the thread on Linux. Also, forkOS doesn't help -- that just lets foreign code allocate thread-local storage, but has nothing to do with blocking behavior. But recall that accept can be set to non-blocking:
If no pending connections are present on the queue, and the socket is
not marked as nonblocking, accept() blocks the caller until a
connection is present. If the socket is marked nonblocking and no
pending connections are present on the queue, accept() fails with the
error EAGAIN or EWOULDBLOCK.
Which is what is done in the Network library for Posix systems. This then allows the accept to be interrupted.
An interesting note about Windows:
-- On Windows, our sockets are not put in non-blocking mode (non-blocking
-- is not supported for regular file descriptors on Windows, and it would
-- be a pain to support it only for sockets). So there are two cases:
--
--  - the threaded RTS uses safe calls for socket operations to get
--    non-blocking I/O, just like the rest of the I/O library
--
--  - with the non-threaded RTS, only some operations on sockets will be
--    non-blocking. Reads and writes go through the normal async I/O
--    system. accept() uses asyncDoProc so is non-blocking. A handful
--    of others (recvFrom, sendFd, recvFd) will block all threads - if this
--    is a problem, -threaded is the workaround.
Now, accept on Windows, with the -threaded runtime, uses accept_safe (which allows other threads to make progress) -- but it doesn't put the socket into non-blocking mode:
accept sock@(MkSocket s family stype protocol status) = do
  currentStatus <- readMVar status
  okay <- sIsAcceptable sock
  if not okay
    then
      ioError (userError ("accept: can't perform accept on socket (" ++ (show (family,stype,protocol)) ++") in status " ++
          show currentStatus))
    else do
      let sz = sizeOfSockAddrByFamily family
      allocaBytes sz $ \ sockaddr -> do
#if defined(mingw32_HOST_OS) && defined(__GLASGOW_HASKELL__)
        new_sock <-
          if threaded
            then with (fromIntegral sz) $ \ ptr_len ->
                   throwErrnoIfMinus1Retry "Network.Socket.accept" $
                     c_accept_safe s sockaddr ptr_len
            else do
              paramData <- c_newAcceptParams s (fromIntegral sz) sockaddr
              rc <- asyncDoProc c_acceptDoProc paramData
              new_sock <- c_acceptNewSock paramData
              c_free paramData
              when (rc /= 0)
                   (ioError (errnoToIOError "Network.Socket.accept" (Errno (fromIntegral rc)) Nothing Nothing))
              return new_sock
Since 2005, versions of the network package on Windows with -threaded have explicitly used an accept call marked as safe. This allows other threads to make progress, but does not put the socket itself into non-blocking mode, so the calling thread blocks.
To work around it I see two options:
- Work out how to make a non-blocking accept call on Windows, and patch the network library. Look at what e.g. snap or yesod do here, to see if they have already solved it.
- Use some kind of supervisory thread to fake epoll, monitoring the blocked child threads for progress.
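A loose variant of the second option is to let a worker thread make the blocking call while the thread you want to be able to kill waits on an MVar, since takeMVar is interruptible. The sketch below is an assumption about how this could be arranged, not what the network package does; viaWorker is a hypothetical name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- | Hypothetical helper: run a possibly uninterruptible action in a worker
-- thread and wait for its outcome in an interruptible takeMVar.
viaWorker :: IO a -> IO a
viaWorker action = do
    box <- newEmptyMVar
    -- The worker stores either the result or the exception it died with.
    _ <- forkIO (try action >>= putMVar box)
    outcome <- takeMVar box   -- killThread can interrupt the caller here
    case outcome of
        Left e  -> throwIO (e :: SomeException)  -- re-raise in the caller
        Right x -> return x
```

Thread A would then call `viaWorker (accept listener)`, and `killThread t` returns promptly because A is parked in takeMVar. Note that this does not unblock the accept itself: the worker stays inside the foreign call until that call returns, for example when a connection arrives.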

Asynchronous UDP server/client as base for IPC in Haskell

I want to put together the basics for asynchronous UDP IPC in Haskell. For this, the sender/receiver should start e.g. an asynchronous receive (or send, depending on which side you view it from) thread and carry on with other tasks.
This might involve defining a new data type that consists of optional message/data serial numbers and some sort of buffer, so that the send thread can stop sending when the receiver notifies it that it cannot keep up.
I aim to make this as lightweight and asynchronous as possible.
I have tried a number of things, such as starting a new receive thread for every packet (an approach I took from a paper about multiplayer online games), but this ground almost everything to a halt.
Below is my innocent first take on this. Any help on e.g. creating buffers, creating serial numbers, or a DCCP implementation in Haskell (which I could not find) would be appreciated. I would rather not get into opinionated discussions about UDP vs TCP etc.
My snippet stops working once something gets out of sync, e.g. when no more data arrives or when less data arrives than expected. As said, I am looking for some lightweight (featherweight :D) way to sync the send and receive threads, or for an example of such.
main = withSocketsDo $ do
    s <- socket AF_INET Datagram defaultProtocol
    hostAddr <- inet_addr host
    done <- newEmptyMVar
    let p = B.pack "ping"
    thread <- forkIO $ receiveMessages s done
    forM_ [0 .. 10000] $ \i -> do
        sendAllTo s (B.pack "ping") (SockAddrInet port hostAddr)
    takeMVar done
    killThread thread
    sClose s
    return()
receiveMessages :: Socket -> MVar () -> IO ()
receiveMessages socket done = do
    forM_ [0 .. 10000] $ \i -> do
        r <- recvFrom socket 1024
        print (r) --this is a placeholder to make the fun complete
    putMVar done ()
If you don't trust your messenger, you can never agree on anything -- not even a single bit like "are we done yet"!
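Since UDP may drop datagrams, a receiver that waits for a fixed number of packets can block forever. One way to keep the receive loop from hanging is to give up after a period of silence. The following is only a sketch of that idea using timeout from base; receiveUntilQuiet and its parameters are hypothetical names, and the receive action is passed in so the function does not depend on a particular socket API.

```haskell
import System.Timeout (timeout)

-- | Collect up to maxMsgs results from recvOne, giving up as soon as
-- nothing arrives within quietMicros microseconds.
receiveUntilQuiet :: Int -> Int -> IO a -> IO [a]
receiveUntilQuiet quietMicros maxMsgs recvOne = go maxMsgs []
  where
    go n acc
        | n <= 0    = return (reverse acc)
        | otherwise = do
            next <- timeout quietMicros recvOne
            case next of
                -- Silence: assume the sender is done or packets were lost.
                Nothing  -> return (reverse acc)
                Just msg -> go (n - 1) (msg : acc)
```

In the snippet above, the receiving thread could call `receiveUntilQuiet 500000 10001 (recvFrom socket 1024)` and then signal the done MVar whether or not all datagrams arrived. Comparing the length of the result with the number of messages sent gives a first, crude loss measurement.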

Haskell concurrency and Handles

I'm writing a little notification server to push data to a client. The basic architecture looks something like this (pared down pseudo-code):
acceptConnections sock = forever $ do
    connection <- accept sock
    forkIO (handleConnection connection)
handleConnection connection = do
    connectionHandle <- socketToHandle connection ReadWriteMode
    handleMessage connectionHandle
    hClose connectionHandle
handleMessage connectionHandle = forever $ do
    message <- hGetLine connectionHandle
    if shouldPushMessage message
        then hPutStrLn targetConnection message
        else return ()
Where targetConnection (in handleMessage) is from a separate connection and is hanging up handleMessage in a different thread waiting for its buffer to be filled. I would think this would cause a problem as I have 2 threads accessing the same Handle. So, my question is, why isn't this a problem? Or is it, and I just haven't seen it turn into an issue yet? In my actual application, when I grab targetConnection, I do so through a map I access via an MVar, but it's not being safely accessed at the hGetLine call.
Disclaimer: I'm a complete Haskell and multi-threaded newb
Thanks for any explanations/insight!
Handle, as implemented in GHC, is already an MVar wrapped around the underlying IODevice. I didn't quite get what you're doing (not saying it was unclear; I'm a little ill, so perhaps I'm slow), but I'm guessing GHC's built-in thread-safe handling of Handle is saving you.
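The built-in locking protects the Handle's internal state for each operation, but as far as I know it does not promise that a whole message written by one thread stays contiguous when several threads write to the same Handle. If that matters, the writes can be serialized explicitly. A minimal sketch, where SharedHandle, newSharedHandle and sendLine are hypothetical names:

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import System.IO

-- | Hypothetical wrapper: a Handle that is only written to while holding a lock.
newtype SharedHandle = SharedHandle (MVar Handle)

newSharedHandle :: Handle -> IO SharedHandle
newSharedHandle h = SharedHandle <$> newMVar h

-- | Write one complete line and flush it while holding the lock, so lines
-- from different threads cannot be interleaved with each other.
sendLine :: SharedHandle -> String -> IO ()
sendLine (SharedHandle lock) msg =
    withMVar lock $ \h -> do
        hPutStrLn h msg
        hFlush h
```

In the notification server, the map of target connections could store SharedHandle values, and handleMessage would call sendLine instead of hPutStrLn. Reads with hGetLine in the connection's own thread do not need this lock.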
