How to exit a conduit when using mergeSources - haskell

I have a simple forked conduit setup, with two inputs feeding one single output:
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.TMChan
import Data.Conduit.Network
main :: IO ()
main = do
    runTCPClient (clientSettings 3000 "127.0.0.1") $ \server -> do
        runResourceT $ do
            input <- mergeSources [
                transPipe liftIO (appSource server),
                infiniteSource
                ] 2
            input $$ transPipe liftIO (appSink server)
infiniteSource :: MonadIO m => Source m B.ByteString
infiniteSource = do
    liftIO $ threadDelay 10000000
    yield "infinite source"
    infiniteSource
(Here I connect to a TCP socket, combine the socket input with a timed infinite source, and send the result back to the socket.)
This works great until the connection is dropped. Because the second input still exists, the conduit keeps running. (In this case the program does end when the timed input fires and there is no socket to write to, but this isn't always the case in my real example.)
What is the proper way to shut down the full conduit when one of the inputs is closed?
I tried to brute force a crash by adding the following
crashOnEndOfStream :: MonadIO m => Conduit B.ByteString m B.ByteString
crashOnEndOfStream = do
    awaitForever $ yield
    error "the peer connection has disconnected" -- tried with error
    liftIO $ exitWith ExitSuccess                -- also tried with exitWith
But because the input conduit runs in a separate thread, neither attempt shut the executable down (and there is probably a smoother way to shut things down than halting the program anyway).

The Source created by mergeSources keeps a count of unclosed sources and is only closed when that count reaches 0, i.e. when every upstream source is closed. This mechanism and the underlying TBMChan are hidden from user code, so you have no way to change its behavior.
One possible solution is to create the channel and the source manually with some medium-level functions exported by Data.Conduit.TMChan, so you can finalize the source by closing the TBMChannel. I haven't tested the code below since your program is not runnable on my machine.
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.Network
import Data.Conduit.TMChan
main :: IO ()
main = do
    runTCPClient (clientSettings 3000 "127.0.0.1") $ \server -> do
        runResourceT $ do
            -- create the TBMChan
            chan <- liftIO $ newTBMChanIO 2
            let
                -- everything piped to the sink will appear at the source;
                -- with the older stm-conduit API, 'True' asks the sink to close
                -- the channel when it finishes, so whichever producer ends
                -- first also ends chanSource
                chanSink = sinkTBMChan chan True
                chanSource = sourceTBMChan chan
            tid1 <- resourceForkIO $ appSource server $$ chanSink
            tid2 <- resourceForkIO $ infiniteSource $$ chanSink
            chanSource $$ transPipe liftIO (appSink server)
            -- and call 'closeTBMChan chan' when you want to exit.
            -- 'chanSource' will be closed when the underlying TBMChan is closed.
infiniteSource :: MonadIO m => Source m B.ByteString
infiniteSource = do
    liftIO $ threadDelay 10000000
    yield "infinite source"
    infiniteSource
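For comparison, here is a minimal sketch of the same "stop as soon as any input ends" idea against the current conduit API (runConduit and .|). It assumes the stm-chans package for Control.Concurrent.STM.TBMChan and reads the channel with a hand-written source instead of stm-conduit; chanSource and runMergedUntilFirstEnds are hypothetical helper names, not library functions, and the demo uses two local sources in place of the socket.

```haskell
import Conduit
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TBMChan
  (TBMChan, closeTBMChan, newTBMChanIO, readTBMChan, writeTBMChan)
import Control.Exception (finally)
import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Data.Void (Void)

-- Hypothetical helper: yield values from the channel until it is closed and drained.
chanSource :: TBMChan a -> ConduitT () a IO ()
chanSource chan = loop
  where
    loop = do
      mx <- liftIO (atomically (readTBMChan chan))
      case mx of
        Nothing -> return ()          -- closed and empty: end the stream
        Just x  -> yield x >> loop

-- Hypothetical helper: run every source in its own thread, all feeding one
-- bounded channel. The FIRST source to finish (or fail) closes the channel,
-- which ends the merged stream; the remaining feeder threads are then killed.
runMergedUntilFirstEnds :: Int -> [ConduitT () a IO ()] -> ConduitT a Void IO r -> IO r
runMergedUntilFirstEnds bound sources sink = do
  chan <- newTBMChanIO bound
  let feed src =
        runConduit (src .| mapM_C (atomically . writeTBMChan chan))
          `finally` atomically (closeTBMChan chan)
  tids <- mapM (forkIO . feed) sources
  runConduit (chanSource chan .| sink) `finally` mapM_ killThread tids

main :: IO ()
main = do
  let finite   = yieldMany [1, 2, 3 :: Int]  -- stands in for appSource server
      infinite = forever (liftIO (threadDelay 1000000) >> yield 0)
  xs <- runMergedUntilFirstEnds 2 [finite, infinite] sinkList
  print xs
```

The program should print the three values from the finite source as soon as it ends, without waiting for the one-second timer. Writes to a closed TBMChan are silently discarded, so a late value from the infinite source cannot block or leak into the result.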

Related

Haskell server does not reply to client

I tried building a simple client-server program following this tutorial about Haskell's network-conduit library.
This is the client, which concurrently sends a file to the server and receives the answer:
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async (concurrently)
import Data.Functor (void)
import Conduit
import Data.Conduit.Network
main = runTCPClient (clientSettings 4000 "localhost") $ \server ->
    void $ concurrently
        (runConduitRes $ sourceFile "input.txt" .| appSink server)
        (runConduit $ appSource server .| stdoutC)
And this is the server, which counts the occurrences of each word and sends the result back to the client:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 (pack)
import Data.Foldable (toList)
import Data.HashMap.Lazy (empty, insertWith)
import Data.Word8 (isAlphaNum)
import Conduit
import Data.Conduit.Network
import qualified Data.Conduit.Combinators as CC
main = runTCPServer (serverSettings 4000 "*") $ \appData -> do
    hashMap <- runConduit $ appSource appData
        .| CC.splitOnUnboundedE (not . isAlphaNum)
        .| foldMC insertInHashMap empty
    runConduit $ yield (pack $ show $ toList hashMap)
        .| iterMC print
        .| appSink appData
insertInHashMap x v = do
    return (insertWith (+) v 1 x)
The problem is that the server doesn't reach the yield phase until I manually shut down the client, and therefore never answers it. I noticed that if I remove the concurrency from the client and keep only the part that sends data to the server, everything works fine.
So, how can I preserve the receiving part of the client without breaking the flow?
You have a deadlock: the client is waiting for the server to respond before it closes the connection, but the server is unaware that the client is done sending data and is waiting for more. This is basically the problem described at https://cr.yp.to/tcpip/twofd.html:
When the generate-data program finishes, the same fd is still open in the consume-data program, so the kernel has no idea that it should send a FIN.
In your case, the fix needs to go on the client side: call shutdown with ShutdownSend on the socket once the conduit is done sending the contents of input.txt over it.
Here's one way to do so (I'm not sure if there's a nicer one):
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async (concurrently)
import Data.Functor (void)
import Data.Foldable (traverse_)
import Conduit
import Data.Conduit.Network
import Data.Streaming.Network (appRawSocket)
import Network.Socket (shutdown, ShutdownCmd(..))
main = runTCPClient (clientSettings 4000 "localhost") $ \server ->
    void $ concurrently
        ((runConduitRes $ sourceFile "input.txt" .| appSink server) >> doneWriting server)
        (runConduit $ appSource server .| stdoutC)
-- Half-close the connection: the server sees end-of-input (FIN),
-- while this side can still read the reply.
doneWriting :: AppData -> IO ()
doneWriting = traverse_ (`shutdown` ShutdownSend) . appRawSocket
Side note: you don't really need concurrency in the client in this case, since there will never be anything to read from the server until you're done writing to the server. You could just do the reading after the writing and shutdown.
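A minimal sketch of that sequential variant is below. To make it runnable on its own, it swaps the file for an in-memory payload and includes a tiny hypothetical server that reads until end-of-input and replies with the byte count; client, server, and the port numbers are illustrative assumptions, not part of the original programs.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Concurrent (forkIO, threadDelay)
import qualified Data.ByteString.Char8 as BC
import Data.Conduit.Network
import Data.Foldable (traverse_)
import Data.Streaming.Network (appRawSocket)
import Network.Socket (ShutdownCmd (..), shutdown)

-- Hypothetical server: read until the client half-closes, then reply with the byte count.
server :: AppData -> IO ()
server app = do
  n <- runConduit $ appSource app .| lengthCE
  runConduit $ yield (BC.pack (show (n :: Int))) .| appSink app

-- Sequential client: write everything, half-close, then read the reply.
client :: Int -> BC.ByteString -> IO BC.ByteString
client port payload =
  runTCPClient (clientSettings port "127.0.0.1") $ \app -> do
    runConduit $ yield payload .| appSink app               -- 1. send all data
    traverse_ (`shutdown` ShutdownSend) (appRawSocket app)  -- 2. FIN: server sees EOF
    runConduit $ appSource app .| foldC                     -- 3. read until server closes

main :: IO ()
main = do
  _ <- forkIO $ runTCPServer (serverSettings 4001 "127.0.0.1") server
  threadDelay 200000  -- crude: give the listener time to bind
  reply <- client 4001 "hello world"
  BC.putStrLn reply   -- "hello world" is 11 bytes
```

Without step 2 the server's lengthCE would wait forever, which is exactly the deadlock described above.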

Getting "stuck" using STM

I have the following Scotty app which tries to use STM to keep a count of API calls served:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Web.Scotty
import Data.Monoid (mconcat)
import Control.Concurrent.STM
import Control.Monad.IO.Class
main :: IO ()
main = do
    counter <- newTVarIO 0
    scotty 3000 $
        get "/:word" $ do
            liftIO $ atomically $ do
                counter' <- readTVar counter
                writeTVar counter (counter' + 1)
            liftIO $ do
                counter' <- atomically (readTVar counter)
                print counter'
            beam <- param "word"
            html $ mconcat ["<h1>Scotty, ", beam, " me up!</h1>"]
I "load test" the API like this:
ab -c 100 -n 100000 http://127.0.0.1:3000/z
However, the API serves roughly 16 thousand requests and then gets "stuck": ab stops with the error apr_socket_recv: Operation timed out (60).
I think I'm misusing STM, but not sure what I'm doing wrong. Any suggestions?
Quick guess here: 16,000 is about the number of ephemeral TCP ports available to a client (the default ephemeral range on macOS, 49152 to 65535, holds 16,384 ports). Is it possible that connections are not being closed, so ab runs out of ports?
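To check whether STM is involved at all, the counter can be stress-tested in isolation. This is a minimal sketch using only the stm and base packages; incrementAndGet and hammer are hypothetical names. With 100 threads doing 1000 increments each, it should always end at exactly 100000, which points away from the STM code. It also folds the increment and the read into a single transaction, so each caller sees the value its own increment produced (in the question, the read happens in a second transaction and may observe other requests' increments).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVar, readTVarIO)
import Control.Monad (forM, replicateM_)

-- Increment and read in ONE transaction.
incrementAndGet :: TVar Int -> IO Int
incrementAndGet counter = atomically $ do
  modifyTVar' counter (+ 1)
  readTVar counter

-- Hypothetical stress test: `threads` workers, each incrementing `perThread` times.
hammer :: Int -> Int -> IO Int
hammer threads perThread = do
  counter <- newTVarIO 0
  dones <- forM [1 .. threads] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ perThread (incrementAndGet counter)
      putMVar done ()
    return done
  mapM_ takeMVar dones  -- wait for every worker to finish
  readTVarIO counter

main :: IO ()
main = hammer 100 1000 >>= print
```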

Can't pass data via stdin to process spawned with conduit-extra

In my program I start an external process and communicate with it via stdin and stdout. I feed the input through a conduit (producer) sourced from an STM TQueue. It worked like a charm until I decided to bump the LTS version; it worked great with LTS <= 8.24.
Here is the minimized program that reproduces my problem:
#!/usr/bin/env stack
-- stack --resolver lts-10.4 --install-ghc runghc --package conduit-extra --package stm-conduit
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent
import Control.Monad.STM
import Control.Concurrent.STM.TQueue
import Data.Conduit
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import Data.Conduit.Process (CreateProcess (..),
                             proc, sourceProcessWithStreams)
import qualified Data.Conduit.TQueue as CTQ
import qualified Data.ByteString.Char8 as BS
import Data.Monoid ((<>))
main :: IO ()
main = do
    putStrLn "Enter \"exit\" to exit."
    q <- open
    putStrLn "connection opened"
    loop q
  where loop q = do
          s <- BS.getLine
          case s of
            "exit" -> return ()
            req -> do
              atomically $ writeTQueue q req
              loop q
open :: IO (TQueue BS.ByteString)
open = do
    req <- atomically newTQueue
    let chat :: CreateProcess
        chat = proc "cat" []
        input :: Producer IO BS.ByteString
        input = toProducer
              $ CTQ.sourceTQueue req
              -- .| CL.mapM_ (\bs -> BS.putStrLn (("queue: " :: BS.ByteString) <> bs))
        output :: Consumer BS.ByteString IO ()
        output = toConsumer
               $ CL.mapM_ BS.putStrLn
    _ <- forkIO (sourceProcessWithStreams chat input output output >> pure ())
    pure req
With the newer LTS, the problem does not seem to be the communication via the TQueue: uncommenting the line that prints the content of the input conduit shows data from the queue. It looks like the spawned process never receives anything on its stdin.
Furthermore writing to spawned cat stdin from console, like so:
echo "test" > /proc/<pid of spawned cat>/fd/0
produces output in my program.
Am I missing something that changed between versions?
So the issue was that the default behaviour of sinkHandle was changed to not flush after every chunk of data.
I've fixed the issue by first porting to Data.Conduit.Process.Typed and then rolling my own variant of createSink that is using sinkHandleFlush instead of sinkHandle.
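The ported code itself isn't shown. As a rough alternative sketch (not the author's fix), the program below uses streamingProcess from Data.Conduit.Process to receive cat's stdin as a plain Handle and calls hFlush after every chunk, which has the same effect as a flushing sink. openCat is a hypothetical name; responses go into a second TQueue instead of straight to stdout so they can be checked, and the process handle is simply ignored for brevity.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever, unless)
import qualified Data.ByteString.Char8 as BS
import Data.Conduit.Process (ClosedStream (..), proc, streamingProcess)
import System.IO (hFlush)

-- Hypothetical helper: start `cat`. Chunks written to the first queue reach
-- its stdin (flushed one by one); chunks from its stdout land in the second.
openCat :: IO (TQueue BS.ByteString, TQueue BS.ByteString)
openCat = do
  reqQ  <- newTQueueIO
  respQ <- newTQueueIO
  -- stdin as a raw Handle, stdout as a conduit source, stderr closed
  (stdinH, stdoutSrc, ClosedStream, _ph) <- streamingProcess (proc "cat" [])
  _ <- forkIO $ runConduit $
    repeatMC (atomically (readTQueue reqQ))
      .| mapM_C (\bs -> BS.hPut stdinH bs >> hFlush stdinH)  -- flush every chunk
  _ <- forkIO $ runConduit $
    stdoutSrc .| mapM_C (atomically . writeTQueue respQ)
  return (reqQ, respQ)

main :: IO ()
main = do
  putStrLn "Enter \"exit\" to exit."
  (reqQ, respQ) <- openCat
  _ <- forkIO $ forever $ atomically (readTQueue respQ) >>= BS.putStrLn
  let loop = do
        s <- BS.getLine
        unless (s == "exit") $ do
          atomically (writeTQueue reqQ s)
          loop
  loop
```

Removing the hFlush call should bring back the original symptom, since the pipe handle is block-buffered.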

How do I shut down `runTCPServer`?

I'm writing a socket server with runTCPServer from conduit-extra (formerly known as network-conduit). My goal is to interact with my editor using this server --- activate the server from the editor (most likely just by calling external command), use it, and terminate the server when the work is done.
For simplicity, I start with a simple echo server, and let's say I'd like to shut down the whole process when the connection is closed.
So I tried:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Conduit
import Data.Conduit.Network
import Data.ByteString (ByteString)
import Control.Monad.IO.Class (liftIO)
import System.Exit (exitSuccess)
import Control.Exception
defaultPort :: Int
defaultPort = 4567
main :: IO ()
main = runTCPServer (serverSettings defaultPort "*") $ \ appData ->
    appSource appData $$ conduit =$= appSink appData
conduit :: ConduitM ByteString ByteString IO ()
conduit = do
    msg <- await
    case msg of
        Nothing -> liftIO $ do
            putStrLn "Nothing left"
            exitSuccess
            -- I'd like the server to shut down here
        (Just s) -> do
            yield s
            conduit
But this doesn't work -- the program continues to accept new connections. If I am not mistaken, this is because the thread listening to the connection we're dealing with exits with exitSuccess, but the entire process doesn't. So this is totally understandable, but I haven't been able to find a way to exit the whole process.
How do I terminate a server run by runTCPServer? Is runTCPServer something that's supposed to serve forever?
Here's a simple implementation of the idea described in comments:
-- Same module header and imports as the question, plus:
import Control.Concurrent.MVar
main = do
    mv <- newEmptyMVar
    tid <- forkTCPServer (serverSettings defaultPort "*") $ \ appData ->
        appSource appData $$ conduit mv =$= appSink appData
    () <- takeMVar mv -- <-- wait for the done signal
    return ()         -- main returns, so the whole process (server included) exits
conduit :: MVar () -> ConduitM ByteString ByteString IO ()
conduit mv = do
    msg <- await
    case msg of
        Nothing -> liftIO $ do
            putStrLn "Nothing left"
            putMVar mv () -- <-- signal that we're done
        (Just s) -> do
            yield s
            conduit mv
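A fuller, self-contained version of the same MVar idea might look like the sketch below, written against the newer runConduit/.| API. It assumes forkTCPServer from Data.Conduit.Network behaves as used in the answer above; echoThenSignal, serveUntilFirstDisconnect, demo, and the port numbers are hypothetical. Unlike the answer, it also kills the listener thread explicitly, which matters if main keeps running after the signal.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Control.Monad (void)
import Data.ByteString (ByteString)
import Data.Conduit.Network

-- Echo everything, then signal once this client has closed its side.
-- tryPutMVar (rather than putMVar) keeps later disconnects from blocking.
echoThenSignal :: MVar () -> ConduitT ByteString ByteString IO ()
echoThenSignal done = do
  awaitForever yield
  liftIO $ void (tryPutMVar done ())

-- Hypothetical helper: serve until the first client disconnects, then stop listening.
serveUntilFirstDisconnect :: Int -> IO ()
serveUntilFirstDisconnect port = do
  done <- newEmptyMVar
  tid <- forkTCPServer (serverSettings port "127.0.0.1") $ \app ->
    runConduit $ appSource app .| echoThenSignal done .| appSink app
  takeMVar done   -- block until some handler signals
  killThread tid  -- stop accepting new connections

-- Demo: start the server, connect once, read the echo, disconnect.
demo :: Int -> IO (Maybe ByteString)
demo port = do
  finished <- newEmptyMVar
  _ <- forkIO $ serveUntilFirstDisconnect port >> putMVar finished ()
  threadDelay 200000  -- crude: give the listener time to bind
  reply <- runTCPClient (clientSettings port "127.0.0.1") $ \app -> do
    runConduit $ yield "ping" .| appSink app
    runConduit $ appSource app .| headC
  takeMVar finished   -- returns only after the server has shut itself down
  return reply

main :: IO ()
main = demo 4567 >>= print
```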

Cloud Haskell hanging forever when sending messages to ManagedProcess

The Problem
Hello! I'm writing a simple server-worker program in Cloud Haskell. The problem is that when I try to create a ManagedProcess, after the server discovery step, my example hangs forever, even when using callTimeout (which should give up after 100 ms). The code is very simple, but I cannot find anything wrong with it.
I've also posted the question on the mailing list, but as far as I know the SO community, I can get an answer a lot faster here. If I get the answer from the mailing list, I will post it here too.
Source Code
The Worker.hs:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TemplateHaskell #-}
module Main where
import Network.Transport (EndPointAddress(EndPointAddress))
import Control.Distributed.Process hiding (call)
import Control.Distributed.Process.Platform hiding (__remoteTable)
import Control.Distributed.Process.Platform.Async
import Control.Distributed.Process.Platform.ManagedProcess
import Control.Distributed.Process.Platform.Time
import Control.Distributed.Process.Platform.Timer (sleep)
import Control.Distributed.Process.Closure (mkClosure, remotable)
import Network.Transport.TCP (createTransport, defaultTCPParameters)
import Control.Distributed.Process.Node hiding (call)
import Control.Concurrent (threadDelay)
import GHC.Generics (Generic)
import Data.Binary (Binary)
import Data.Typeable (Typeable)
import Data.ByteString.Char8 (pack)
import System.Environment (getArgs)
import qualified Server as Server
main = do
    [host, port, serverAddr] <- getArgs
    Right transport <- createTransport host port defaultTCPParameters
    node <- newLocalNode transport initRemoteTable
    let addr = EndPointAddress (pack serverAddr)
        srvID = NodeId addr
    _ <- forkProcess node $ do
        sid <- discoverServer srvID
        liftIO $ putStrLn "x"
        liftIO $ print sid
        r <- callTimeout sid (Server.Add 5 6) 100 :: Process (Maybe Double)
        liftIO $ putStrLn "x"
        liftIO $ threadDelay (10 * 1000 * 1000)
    threadDelay (10 * 1000 * 1000)
    return ()
discoverServer srvID = do
    whereisRemoteAsync srvID "serverPID"
    reply <- expectTimeout 100 :: Process (Maybe WhereIsReply)
    case reply of
        Just (WhereIsReply _ msid) -> case msid of
            Just sid -> return sid
            Nothing -> discoverServer srvID
        Nothing -> discoverServer srvID
The Server.hs:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TemplateHaskell #-}
module Server where
import Control.Distributed.Process hiding (call)
import Control.Distributed.Process.Platform hiding (__remoteTable)
import Control.Distributed.Process.Platform.Async
import Control.Distributed.Process.Platform.ManagedProcess
import Control.Distributed.Process.Platform.Time
import Control.Distributed.Process.Platform.Timer (sleep)
import Control.Distributed.Process.Closure (mkClosure, remotable)
import Network.Transport.TCP (createTransport, defaultTCPParameters)
import Control.Distributed.Process.Node hiding (call)
import Control.Concurrent (threadDelay)
import GHC.Generics (Generic)
import Data.Binary (Binary)
import Data.Typeable (Typeable)
data Add = Add Double Double
    deriving (Typeable, Generic)
instance Binary Add
launchServer :: Process ProcessId
launchServer = spawnLocal $ serve () (statelessInit Infinity) server >> return () where
    server = statelessProcess { apiHandlers = [ handleCall_ (\(Add x y) -> liftIO (putStrLn "!") >> return (x + y)) ]
                              , unhandledMessagePolicy = Drop
                              }
main = do
    Right transport <- createTransport "127.0.0.1" "8080" defaultTCPParameters
    node <- newLocalNode transport initRemoteTable
    _ <- forkProcess node $ do
        self <- getSelfPid
        register "serverPID" self
        liftIO $ putStrLn "x"
        mid <- launchServer
        liftIO $ putStrLn "y"
        r <- call mid (Add 5 6) :: Process Double
        liftIO $ print r
        liftIO $ putStrLn "z"
        liftIO $ threadDelay (10 * 1000 * 1000)
        liftIO $ putStrLn "z2"
    threadDelay (10 * 1000 * 1000)
    return ()
We can run them as follows:
runhaskell Server.hs
runhaskell Worker.hs 127.0.0.2 8080 127.0.0.1:8080:0
The Results
When we run the programs, we get the following results:
from Server:
x
y
!
11.0 -- this one shows that inside the same process we were able to use the "call" function
z
-- waiting: all the output above comes from tests inside the server; now it waits for external messages
from Worker:
x
pid://127.0.0.1:8080:0:10 -- this is the process id of the server, obtained with whereisRemoteAsync
-- waiting forever on the "callTimeout sid (Server.Add 5 6) 100" code!
As a side note, I've found out that sending messages with send (from Control.Distributed.Process) and receiving them with expect works. But sending them with call (from Control.Distributed.Process.Platform) and trying to receive them with ManagedProcess API handlers hangs the call forever (even with callTimeout)!
Your client is getting an exception, which you are not able to observe easily because you are running your client in a forkProcess. If you want to do that it is fine but then you need to monitor or link to that process. In this case, simply using runProcess would be much simpler. If you do that, you will see you get this exception:
Worker.hs: trying to call fromInteger for a TimeInterval. Cannot guess units
callTimeout does not take an Integer; it takes a TimeInterval, which is constructed with the functions in the Time module. TimeInterval is a pseudo-Num: it does not actually support fromInteger, it seems. I would consider that a bug, or at least bad form (in Haskell), but in any case the way to fix your code is simply
r <- callTimeout sid (Server.Add 5 6) (milliSeconds 100) :: Process (Maybe Double)
To fix the problem with the client calling into the server, you need to register the pid of the server process you spawned rather than the main process you spawn it from - i.e. change
self <- getSelfPid
register "serverPID" self
liftIO $ putStrLn "x"
mid <- launchServer
liftIO $ putStrLn "y"
to
mid <- launchServer
register "serverPID" mid
liftIO $ putStrLn "y"
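On the point about monitoring a forked process: a minimal sketch of how a crash in a spawned process becomes visible through a monitor is below. It assumes the distributed-process and network-transport-inmemory packages (an in-memory transport avoids any TCP setup); observeCrash is a hypothetical name, and error "boom" merely stands in for a failure like the fromInteger exception.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Control.Monad.IO.Class (liftIO)
import Network.Transport.InMemory (createTransport)

-- Spawn a child that crashes, monitor it, and return the reason it died.
observeCrash :: IO DiedReason
observeCrash = do
  transport <- createTransport
  node <- newLocalNode transport initRemoteTable
  result <- newEmptyMVar
  runProcess node $ do
    child <- spawnLocal $ do
      () <- expect   -- wait until the parent has set up the monitor
      error "boom"   -- stand-in for the real failure
    _ <- monitor child
    send child ()
    ProcessMonitorNotification _ _ reason <- expect
    liftIO (putMVar result reason)
  takeMVar result

main :: IO ()
main = observeCrash >>= print
```

The child blocks on expect before failing so that the monitor is registered first; otherwise the notification could report an unknown process instead of the exception.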
