Getting "stuck" using STM - haskell

I have the following Scotty app which tries to use STM to keep a count of API calls served:
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Web.Scotty
import Data.Monoid (mconcat)
import Control.Concurrent.STM
import Control.Monad.IO.Class

main :: IO ()
main = do
  counter <- newTVarIO 0
  scotty 3000 $
    get "/:word" $ do
      liftIO $ atomically $ do
        counter' <- readTVar counter
        writeTVar counter (counter' + 1)
      liftIO $ do
        counter' <- atomically (readTVar counter)
        print counter'
      beam <- param "word"
      html $ mconcat ["<h1>Scotty, ", beam, " me up!</h1>"]
I "load test" the API like this:
ab -c 100 -n 100000 http://127.0.0.1:3000/z
However, the API serves roughly 16 thousand requests and then gets "stuck": ab stops with the error apr_socket_recv: Operation timed out (60).
I think I'm misusing STM, but I'm not sure what I'm doing wrong. Any suggestions?

Quick guess here. 16,000 is about the number of available TCP ports. Is it possible you have not closed any connections and have therefore run out of open ports for ab?
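Independent of the port question, the counter update itself can be tightened into a single strict transaction. A minimal sketch of the handler (using only the names from the question above; modifyTVar' keeps the counter fully evaluated, avoiding a growing chain of (+1) thunks):
get "/:word" $ do
  -- increment and read back in one atomic transaction
  counter' <- liftIO $ atomically $ do
    modifyTVar' counter (+ 1)
    readTVar counter
  liftIO $ print counter'
  beam <- param "word"
  html $ mconcat ["<h1>Scotty, ", beam, " me up!</h1>"]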

Related

Save state in memory [Haskell Server]

I am attempting to create a server that returns two different values from a route depending on whether a user has visited it before. I have the following code:
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Web.Scotty

main = do
  putStrLn "Starting Server..."
  scotty 3000 $ do
    get "/" $ do
      -- if first time
      text "hello!"
      -- if second time
      text "hello, again!"
I have two questions:
1. How can I check if a user has requested the route before?
2. Where and how can I persist application state?
You can use STM to keep a mutable variable in memory:
import Control.Concurrent.STM
import Control.Monad.IO.Class (liftIO)

main = do
  putStrLn "Starting Server..."
  -- emptyVisitorsData stands for whatever "no visitors yet" value your VisitorsData type has
  state <- newTVarIO emptyVisitorsData   -- state :: TVar VisitorsData
  scotty 3000 $ do
    get "/" $ do
      visitorsData <- liftIO $ readTVarIO state
      -- if the visitor's ID/cookie is in visitorsData
      text "hello!"
      -- if new visitor, add them to visitorsData
      liftIO $ atomically $ modifyTVar' state (insertVisitor visitorId)
      -- if second time
      text "hello, again!"
(If you expect to scale this to a more complex server, you'll want to pass the TVar around using the ReaderT pattern.)
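As a concrete (hypothetical) choice for VisitorsData, a strict Map inside the TVar works well; the helper below is only a sketch of what insertVisitor-style bookkeeping could look like:
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map
import Data.Text.Lazy (Text)

type VisitorsData = Map.Map Text Int

-- Record a visit and return how many times this visitor had been seen before.
recordVisit :: TVar VisitorsData -> Text -> IO Int
recordVisit state visitorId = atomically $ do
  visitors <- readTVar state
  writeTVar state (Map.insertWith (+) visitorId 1 visitors)
  pure (Map.findWithDefault 0 visitorId visitors)
The handler can then branch on whether the returned count is zero ("hello!") or positive ("hello, again!").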

Waiting until a file stops being modified

I'm trying to use hinotify and STM to implement a simple idea:
Block the thread of execution until the watched file stops being modified.
Continue once modifications stop, or once the interval between them exceeds some time threshold (debouncing).
Currently, I'm trying to use a TSem to get this working correctly, but I keep running into one of these problems:
the thread doesn't block at all, and I end up removing the hinotify watcher before it even starts, which throws an exception
the thread blocks indefinitely, causing STM to throw an exception
the program prints 3 times (3 concurrent notifications), but only lasts for 1 second and not 10
The code I've written is below, and can be checked out on GitHub to see for yourself.
module Main where

import System.INotify
import System.Environment (getArgs)
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Concurrent.STM.TSem
import Control.Concurrent.STM.TVar
import Control.Monad (forM_)

main :: IO ()
main = do
  [file] <- getArgs
  -- make changes every 1/10th of a second for 10 seconds
  forkIO $ forM_ [0..100] $ \s -> do
    appendFile file $ show s
    threadDelay (second `div` 10)
  debouncer <- atomically $ newTSem 0
  notif <- initINotify
  expectation <- newTVarIO (0 :: Int)
  watcher <- addWatch notif [Modify] file $ \e -> do
    e' <- atomically $ do
      modifyTVar expectation (+1)
      readTVar expectation
    print e
    threadDelay second
    e'' <- readTVarIO expectation
    if e' == e''
      then atomically $ signalTSem debouncer
      else pure ()
  atomically $ waitTSem debouncer
  removeWatch watcher
  killINotify notif

second = 1000000
Do you see anything immediately wrong with what I'm trying to do?
Does it have to be STM? You can achieve your goal with ordinary MVars:
#!/usr/bin/env stack
{- stack
  --resolver lts-7.9
  --install-ghc runghc
  --package hinotify
  --package stm
-}
import System.INotify
import System.Environment (getArgs)
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newMVar, newEmptyMVar, readMVar, swapMVar, putMVar, takeMVar, modifyMVar_)
import Control.Monad (forM_, forever)

main :: IO ()
main = do
  [file] <- getArgs
  mainBlocker <- newEmptyMVar
  tickCounter <- newMVar 0
  -- make changes every 1/10th of a second for 10 seconds
  forkIO $ forM_ [0..100] $ \s -> do
    appendFile file $ show s
    threadDelay (second `div` 10)
  -- set up file watches
  notif <- initINotify
  watcher <- addWatch notif [Modify] file $ \e -> do
    swapMVar tickCounter 10
    print "file has been modified; reset ticks to 10"
  -- 'decreaser' thread
  forkIO $ forever $ do
    threadDelay second
    ticks <- readMVar tickCounter
    print $ "current ticks in decreaser thread: " ++ show ticks
    if ticks <= 0
      then putMVar mainBlocker ()
      else modifyMVar_ tickCounter (\v -> return (v-1))
  takeMVar mainBlocker
  print "exiting..."
  removeWatch watcher
  killINotify notif

second = 1000000
The idea is a 'tick' counter that gets set to 10 whenever the file is modified. A separate thread counts down towards 0 once per second and, when it gets there, releases the main thread's block.
If you use stack you can execute the code as a script like this:
stack theCode.hs fileToBeWatched
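If you do want to stay within STM, one possible pattern (a sketch only, not tested against hinotify) is to bump a generation counter from the watch callback, with atomically (modifyTVar' gen (+1)), and then wait via registerDelay for a window in which the counter stays unchanged:
import Control.Concurrent.STM

-- Block until 'gen' has not changed for 'quietMicros' microseconds.
waitUntilQuiet :: TVar Int -> Int -> IO ()
waitUntilQuiet gen quietMicros = loop
  where
    loop = do
      g0      <- readTVarIO gen
      timeout <- registerDelay quietMicros
      changed <- atomically $ do
        g1 <- readTVar gen
        if g1 /= g0
          then pure True        -- another modification arrived; restart the window
          else do
            done <- readTVar timeout
            check done          -- retry until the quiet window has elapsed
            pure False
      if changed then loop else pure ()
The transaction retries until either the file changes again (which restarts the window) or the timer fires (quiet for long enough), so no TSem is needed.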

How to exit a conduit when using mergeSources

I have a simple forked conduit setup, with two inputs feeding a single output:
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.TMChan
import Data.Conduit.Network

main :: IO ()
main = do
  runTCPClient (clientSettings 3000 "127.0.0.1") $ \server -> do
    runResourceT $ do
      input <- mergeSources
        [ transPipe liftIO (appSource server)
        , infiniteSource
        ] 2
      input $$ transPipe liftIO (appSink server)

infiniteSource :: MonadIO m => Source m B.ByteString
infiniteSource = do
  liftIO $ threadDelay 10000000
  yield "infinite source"
  infiniteSource
(here I connect to a tcp socket, then combine the socket input with a timed infinite source, then respond back to the socket)
This works great until the connection is dropped. Because the second input still exists, the conduit keeps running. (In this case the program does end when the timed input fires and there is no socket to write to, but that isn't always the case in my real example.)
What is the proper way to shut down the full conduit when one of the inputs is closed?
I tried to brute-force a crash by adding the following:
crashOnEndOfStream :: MonadIO m => Conduit B.ByteString m B.ByteString
crashOnEndOfStream = do
  awaitForever $ yield
  error "the peer connection has disconnected"   -- tried with error
  liftIO $ exitWith ExitSuccess                  -- also tried with exitWith
but because the input conduit runs in a thread, the executable was immune to runtime exceptions shutting it down (plus, there is probably a smoother way to shut things down than halting the program).
The Source created by mergeSources keeps a count of unclosed sources. It is only closed when the count reaches 0, i.e. when every upstream source is closed. This mechanism and the underlying TBMChan are hidden from user code, so you have no way to change the behavior.
One possible solution is to create the channel and the source manually with some medium-level functions exported by Data.Conduit.TMChan, so you can finalize the source by closing the TBMChan. I haven't tested the code below since your program is not runnable on my machine.
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.Network
import Data.Conduit.TMChan

main :: IO ()
main = do
  runTCPClient (clientSettings 3000 "127.0.0.1") $ \server -> do
    runResourceT $ do
      -- create the TBMChan
      chan <- liftIO $ newTBMChanIO 2
      let
        -- everything piped to the sink will appear at the source
        chanSink   = sinkTBMChan chan True
        chanSource = sourceTBMChan chan
      tid1 <- resourceForkIO $ appSource server $$ chanSink
      tid2 <- resourceForkIO $ infiniteSource $$ chanSink
      chanSource $$ transPipe liftIO (appSink server)
      -- and call 'closeTBMChan chan' when you want to exit.
      -- 'chanSource' is closed when the underlying TBMChan is closed.

infiniteSource :: MonadIO m => Source m B.ByteString
infiniteSource = do
  liftIO $ threadDelay 10000000
  yield "infinite source"
  infiniteSource
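For example (equally untested), you could close the channel as soon as the socket source finishes, which in turn ends chanSource and therefore the whole downstream pipeline; this needs atomically from Control.Concurrent.STM and closeTBMChan from Control.Concurrent.STM.TBMChan:
      -- replace the first resourceForkIO above with:
      tid1 <- resourceForkIO $ do
        appSource server $$ chanSink
        -- the socket is gone; close the channel so the merged source ends too
        liftIO $ atomically $ closeTBMChan chan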

Cloud Haskell hanging forever when sending messages to ManagedProcess

The Problem
Hello! I'm writing a simple Server - Worker program in Cloud Haskell. The problem is that when I try to create a ManagedProcess, after the server discovery step my example hangs forever, even when using callTimeout (which should break after 100 ms). The code is very simple, but I cannot find anything wrong with it.
I've posted the question on the mailing list as well, but as far as I know the SO community, I can get an answer a lot faster here. If I get an answer from the mailing list, I will post it here too.
Source Code
The Worker.hs:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Network.Transport (EndPointAddress(EndPointAddress))
import Control.Distributed.Process hiding (call)
import Control.Distributed.Process.Platform hiding (__remoteTable)
import Control.Distributed.Process.Platform.Async
import Control.Distributed.Process.Platform.ManagedProcess
import Control.Distributed.Process.Platform.Time
import Control.Distributed.Process.Platform.Timer (sleep)
import Control.Distributed.Process.Closure (mkClosure, remotable)
import Network.Transport.TCP (createTransport, defaultTCPParameters)
import Control.Distributed.Process.Node hiding (call)
import Control.Concurrent (threadDelay)
import GHC.Generics (Generic)
import Data.Binary (Binary)
import Data.Typeable (Typeable)
import Data.ByteString.Char8 (pack)
import System.Environment (getArgs)
import qualified Server as Server

main = do
  [host, port, serverAddr] <- getArgs
  Right transport <- createTransport host port defaultTCPParameters
  node <- newLocalNode transport initRemoteTable
  let addr  = EndPointAddress (pack serverAddr)
      srvID = NodeId addr
  _ <- forkProcess node $ do
    sid <- discoverServer srvID
    liftIO $ putStrLn "x"
    liftIO $ print sid
    r <- callTimeout sid (Server.Add 5 6) 100 :: Process (Maybe Double)
    liftIO $ putStrLn "x"
    liftIO $ threadDelay (10 * 1000 * 1000)
  threadDelay (10 * 1000 * 1000)
  return ()

discoverServer srvID = do
  whereisRemoteAsync srvID "serverPID"
  reply <- expectTimeout 100 :: Process (Maybe WhereIsReply)
  case reply of
    Just (WhereIsReply _ msid) -> case msid of
      Just sid -> return sid
      Nothing  -> discoverServer srvID
    Nothing -> discoverServer srvID
The Server.hs:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TemplateHaskell #-}
module Server where

import Control.Distributed.Process hiding (call)
import Control.Distributed.Process.Platform hiding (__remoteTable)
import Control.Distributed.Process.Platform.Async
import Control.Distributed.Process.Platform.ManagedProcess
import Control.Distributed.Process.Platform.Time
import Control.Distributed.Process.Platform.Timer (sleep)
import Control.Distributed.Process.Closure (mkClosure, remotable)
import Network.Transport.TCP (createTransport, defaultTCPParameters)
import Control.Distributed.Process.Node hiding (call)
import Control.Concurrent (threadDelay)
import GHC.Generics (Generic)
import Data.Binary (Binary)
import Data.Typeable (Typeable)

data Add = Add Double Double
  deriving (Typeable, Generic)

instance Binary Add

launchServer :: Process ProcessId
launchServer = spawnLocal $ serve () (statelessInit Infinity) server >> return ()
  where
    server = statelessProcess
      { apiHandlers = [ handleCall_ (\(Add x y) -> liftIO (putStrLn "!") >> return (x + y)) ]
      , unhandledMessagePolicy = Drop
      }

main = do
  Right transport <- createTransport "127.0.0.1" "8080" defaultTCPParameters
  node <- newLocalNode transport initRemoteTable
  _ <- forkProcess node $ do
    self <- getSelfPid
    register "serverPID" self
    liftIO $ putStrLn "x"
    mid <- launchServer
    liftIO $ putStrLn "y"
    r <- call mid (Add 5 6) :: Process Double
    liftIO $ print r
    liftIO $ putStrLn "z"
    liftIO $ threadDelay (10 * 1000 * 1000)
    liftIO $ putStrLn "z2"
  threadDelay (10 * 1000 * 1000)
  return ()
We can run them as follows:
runhaskell Server.hs
runhaskell Worker.hs 127.0.0.2 8080 127.0.0.1:8080:0
The Results
When we run the programs, we get the following results:
from Server:
x
y
!
11.0 -- this one shows that inside the same process we were able to use the "call" function
z
-- waiting - all the output above were tests from inside the server now it waits for external messages
from Worker:
x
pid://127.0.0.1:8080:0:10 -- this is the process ID of the server, obtained with whereisRemoteAsync
-- waiting forever on the "callTimeout sid (Server.Add 5 6) 100" code!
As a side note: I've found that sending messages with send (from Control.Distributed.Process) and receiving them with expect works. But sending them with call (from Control.Distributed.Process.Platform) and trying to receive them with the ManagedProcess API handlers hangs the call forever (even with callTimeout!).
Your client is getting an exception, which you cannot easily observe because you are running your client in a forkProcess. If you want to do that it is fine, but then you need to monitor or link to that process. In this case, simply using runProcess would be much simpler. If you do that, you will see that you get this exception:
Worker.hs: trying to call fromInteger for a TimeInterval. Cannot guess units
callTimeout does not take an Integer; it takes a TimeInterval, which is constructed with the functions in the Time module. This is a pseudo-Num: it does not actually support fromInteger, it seems. I would consider that a bug, or at least bad form (in Haskell), but in any case the way to fix your code is simply:
r <- callTimeout sid (Server.Add 5 6) (milliSeconds 100) :: Process (Maybe Double)
To fix the problem with the client calling into the server, you need to register the pid of the server process you spawned rather than the main process you spawn it from - i.e. change
self <- getSelfPid
register "serverPID" self
liftIO $ putStrLn "x"
mid <- launchServer
liftIO $ putStrLn "y"
to
mid <- launchServer
register "serverPID" mid
liftIO $ putStrLn "y"
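Putting both fixes together on the worker side, a minimal sketch (assuming the rest of Worker.hs stays as posted) that uses runProcess so any exception is visible:
main :: IO ()
main = do
  [host, port, serverAddr] <- getArgs
  Right transport <- createTransport host port defaultTCPParameters
  node <- newLocalNode transport initRemoteTable
  let srvID = NodeId (EndPointAddress (pack serverAddr))
  -- runProcess blocks until the Process action finishes and lets exceptions
  -- propagate, unlike forkProcess
  runProcess node $ do
    sid <- discoverServer srvID
    liftIO $ print sid
    -- callTimeout wants a TimeInterval, not a bare numeric literal
    r <- callTimeout sid (Server.Add 5 6) (milliSeconds 100) :: Process (Maybe Double)
    liftIO $ print r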

QSem doesn't seem to block threads

I'm writing a simple script to run a bunch of tasks in parallel using the Shelly library, but I want to limit the maximum number of tasks running at any one time. The script takes a file with an input on each line and runs a task for each input. There are a few hundred inputs in the file, and I want to limit it to around 16 processes at a time.
The current script actually limits to 1 (well, tries to) using a QSem with an initial count of 1. I seem to be missing something, though, because when I run it on a test file with 4 inputs I see this:
Starting
Starting
Starting
Starting
Done
Done
Done
Done
So the threads are not blocking on the QSem as I would expect; they're all running simultaneously. I've even gone so far as to implement my own semaphores, both on MVar and TVar, and neither worked the way I expected. I'm obviously missing something fundamental, but what? I've also tried compiling the code and running it as a binary.
#!/usr/bin/env runhaskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, DeriveDataTypeable, OverloadedStrings #-}

import Shelly
import Prelude hiding (FilePath)
import Text.Shakespeare.Text (lt)
import qualified Data.Text.Lazy as LT
import Control.Monad (forM)
import System.Environment (getArgs)
import qualified Control.Concurrent.QSem as QSem
import Control.Concurrent (forkIO, MVar, putMVar, newEmptyMVar, takeMVar)

-- Define max number of simultaneous processes
maxProcesses :: IO QSem.QSem
maxProcesses = QSem.newQSem 1

bkGrnd :: ShIO a -> ShIO (MVar a)
bkGrnd proc = do
  mvar <- liftIO newEmptyMVar
  _ <- liftIO $ forkIO $ do
    -- Block until there are free processes
    sem <- maxProcesses
    QSem.waitQSem sem
    putStrLn "Starting"
    -- Run the shell command
    result <- shelly $ silently proc
    liftIO $ putMVar mvar result
    putStrLn "Done"
    -- Signal that this process is done and another can run.
    QSem.signalQSem sem
  return mvar

main :: IO ()
main = shelly $ silently $ do
  [img, file] <- liftIO $ getArgs
  contents <- readfile $ fromText $ LT.pack file
  -- Run a backgrounded process for each line of input.
  results <- forM (LT.lines contents) $ \line -> bkGrnd $ do
    runStdin <command> <arguments>
  liftIO $ mapM_ takeMVar results
As I said in my comment, each call to bkGrnd creates its own semaphore, allowing every thread to continue without waiting. I would try something like this instead, where the semaphore is created in main and passed to bkGrnd each time.
bkGrnd :: QSem.QSem -> ShIO a -> ShIO (MVar a)
bkGrnd sem proc = do
  mvar <- liftIO newEmptyMVar
  _ <- liftIO $ forkIO $ do
    -- Block until there are free processes
    QSem.waitQSem sem
    --
    -- code continues as before
    --

main :: IO ()
main = shelly $ silently $ do
  [img, file] <- liftIO $ getArgs
  contents <- readfile $ fromText $ LT.pack file
  sem <- liftIO maxProcesses
  -- Run a backgrounded process for each line of input.
  results <- forM (LT.lines contents) $ \line -> bkGrnd sem $ do
    runStdin <command> <arguments>
  liftIO $ mapM_ takeMVar results
You have an answer, but I need to add: QSem and QSemN are not thread-safe if killThread or asynchronous thread death is possible.
My bug report and patch are in GHC Trac ticket #3160. The fixed code is available as a new library called SafeSemaphore, with modules Control.Concurrent.MSem, MSemN, MSampleVar, and a bonus FairRWLock.
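For completeness, here is a sketch of the same background-task pattern on top of SafeSemaphore (assuming its Control.Concurrent.MSem new/with API; check the package docs before relying on it):
import qualified Control.Concurrent.MSem as MSem
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

main :: IO ()
main = do
  sem   <- MSem.new (16 :: Int)                 -- at most 16 tasks at once
  dones <- forM [1 .. 100 :: Int] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      MSem.with sem $                           -- wait, run, and signal, even on exceptions
        putStrLn ("running task " ++ show i)
      putMVar done ()
    return done
  mapM_ takeMVar dones                          -- wait for all tasks to finish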
Isn't it better to write
bkGrnd sem proc = do
  liftIO $ QSem.waitQSem sem
  mvar <- liftIO newEmptyMVar
  _ <- liftIO $ forkIO $ do
    ...
so you don't even forkIO until you get the semaphore?
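Relatedly (not from the answers above, just a sketch), wrapping the wait/signal pair in bracket_ keeps a slot from leaking when the forked action throws, which matters given the earlier caveat about asynchronous exceptions:
import Control.Concurrent.QSem
import Control.Exception (bracket_)

-- Run an IO action while holding one slot of the semaphore; the slot is
-- released even if the action throws.
withSlot :: QSem -> IO a -> IO a
withSlot sem = bracket_ (waitQSem sem) (signalQSem sem)
In the script above, the body of the forkIO in bkGrnd would then become withSlot sem (...).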
