How does hgearman worker work? - haskell

Three weeks ago I asked the question "How does hgearman-client work?". With some help I wrote a simple client application, and I'm now working on the worker side. The worker implementation below compiles and runs without any exception. The only trouble is that W.runWorker gc (return g) is never executed. If I understand it correctly, this is a result of Haskell's laziness and of wrapping the action in return t. But I haven't the faintest idea how to get rid of the issue. Could someone help?
import qualified Control.Monad.State as S
import qualified Data.ByteString.Char8 as B
import qualified Network.Gearman.Client as C
import qualified Network.Gearman.Worker as W
import Network.Gearman.Internal (Function, Port)
import Network.Socket (HostName)
main :: IO ()
main = do
    c <- connect
    case c of
        Left e -> error $ B.unpack e
        Right gc -> do
            (res, _) <- flip S.runStateT gc $ do
                g <- (W.registerWorker name func)
                let t = W.runWorker gc (return g)
                return t >> return ()
            return res
  where
    connect = C.connectGearman (B.pack "i") host port
    host = "localhost"::HostName
    port = 4730::Port
    name = (B.pack "foo")::Function
    func _ = B.pack "bar"
Unfortunately, an attempt to bind t <- W.runWorker ends in a compiler error. If I change the code in this way:
Right gc -> do
    (res, _) <- flip S.runStateT gc $ do
        g <- (W.registerWorker name func)
        t <- W.runWorker gc (return ())
        return t >> return ()
    return res
then compilation fails with the error:
Couldn't match expected type `S.StateT
                                Network.Gearman.Internal.GearmanClient IO a0'
            with actual type `IO GHC.Conc.Sync.ThreadId'
In a stmt of a 'do' block: t <- W.runWorker gc (return ())
In the second argument of `($)', namely
  `do { g <- (W.registerWorker name func);
        t <- W.runWorker gc (return ());
        return t >> return () }'
IO GHC.Conc.Sync.ThreadId is the result of runWorker.

A value of type Gearman a for some a is an action, a recipe to do something. You can bind together such recipes to make bigger recipes, until you have constructed the main recipe which is the one that gets run.
Practically speaking, it means if you are running a do-block that looks like this:
do ...
   foo
   ...
then foo will be run. If you have a do-block that looks like this:
do ...
   ret <- foo
   ...
then foo will be run and the result of running foo will be stored in ret. Those two syntaxes are binds. However if you are running a do-block that looks like this:
do ...
   let ret = foo
   ...
then foo will not be run -- instead you are just asking for the variable ret to be shorthand for foo, so foo and ret are afterwards interchangeable.
So now you can see, in:
do g <- W.registerWorker name func
   let t = W.runWorker gc (return g)
   return t >> return ()
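the second line does not actually run a worker, it just lets t be a shorthand for running a worker. Returning an action does not bind it either. You need to bind:
   t <- W.runWorker gc (return g)
Note that, per the error message in the question, runWorker is a plain IO action while the surrounding do-block runs in StateT GearmanClient IO, so the bind also has to be lifted into that monad, for example with S.liftIO (W.runWorker gc (return g)).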
By the way, I've been looking at the documentation and it looks like registerWorker returns a Gearman (), which means that the result of running the action is (), or "nothing interesting". So g is nothing interesting, you could get rid of it and say
do W.registerWorker name func
   t <- W.runWorker gc (return ())
   return t >> return ()
Presumably in place of that return () in the second line you would put an action that you wanted to run in the worker. Like:
t <- W.runWorker gc $ do
    ... the things you want the worker to do ...
return t >> return ()
And finally that last line: return t >> return (), also written
do return t
   return ()
is exactly the same thing as return (). return x constructs an action with no side-effects, which is only used for the result. Then when you use >> (or don't bind the result in a do block) you run an action only for its side-effects and discard its result. So the first return does nothing whatsoever.
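To see the difference between let and a bind in isolation, here is a minimal sketch that uses only base and has nothing to do with hgearman. The names bump, withLet and withBind are hypothetical, chosen only for this illustration.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical action with a visible side effect: bump a counter.
bump :: IORef Int -> IO ()
bump ref = modifyIORef ref (+ 1)

-- Names the action with let and then returns it as a value.
-- The action itself is never run, so the counter stays at 0.
withLet :: IO Int
withLet = do
    ref <- newIORef 0
    let t = bump ref
    _ <- return t
    readIORef ref

-- Uses the action as a statement of the do-block, which runs it.
-- The counter is 1 afterwards.
withBind :: IO Int
withBind = do
    ref <- newIORef 0
    bump ref
    readIORef ref
```

Running withLet yields 0 and withBind yields 1: only the second version ever executes bump.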

Finally I implemented a gearman worker in Haskell.
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Exception (Exception, IOException, catch, throwIO)
import qualified Data.ByteString.Char8 as B
import Control.Monad.State
import Data.Typeable (Typeable)
import qualified Network.Gearman.Client as C
import qualified Network.Gearman.Worker as W
import Network.Gearman.Internal (Function, GearmanClient, Port)
import Network.Socket (HostName)
import Control.Concurrent
import qualified Control.Monad.State as S
data ConnectException = ConnectException HostName Port IOException
    deriving (Show, Typeable)
instance Exception ConnectException
main :: IO ()
main = do
    c <- connect
    gc <- either (error . B.unpack) return c
    work gc
    return ()
  where
    connect = C.connectGearman (B.pack "worker-id") host port `catch` \e -> throwIO (ConnectException host port e)
    host = "localhost"::HostName
    port = 4730::Port
work :: GearmanClient -> IO ()
work gc = do
    (res, _) <- flip S.runStateT gc $ do
        W.registerWorker (B.pack "reverse"::Function) B.reverse
        S.get >>= (\env -> forever $ S.liftIO (W.runWorker env (return ()) >> threadDelay (1000*1000)))
        return ()
    return res

Related

Data.ConfigFile not using the Bool instance of get

According to https://hackage.haskell.org/package/ConfigFile-1.0.5/docs/Data-ConfigFile.html, the package will convert a value in a config file to a Bool. The following code:
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.ConfigFile as DC
import qualified Control.Monad.Except as CME
-- | The foundation object
data JRState = JRState {
    secureOnly :: Bool -- ^ restrict connections to HTTPS
}
main :: IO ()
main = (CME.runExceptT $ pipe (JRState False)) >>= estate
estate :: Show t => Either t JRState -> IO ()
estate (Right state) = return ()
estate (Left err) = do
    putStrLn $ "<<" ++ show err ++ ">>"
    return ()
pipe :: (CME.MonadError DC.CPError m, CME.MonadIO m) => JRState -> m JRState
pipe site = do
    cp <- CME.join $ CME.liftIO $ return $ DC.readstring DC.emptyCP{DC.optionxform=id} "secureSession = True\n"
    DC.get cp "DEFAULT" "secureSession" >>= return . nubb where
        nubb (Left err) = error err
        nubb (Right value) = site{secureOnly = value}
when run, produces
<<(ParseError "couldn't parse value True from (DEFAULT/secureSession)","genericget")>>
which has obviously come from the putStrLn in estate. But I would expect that the extraction of the value, in pipe and nubb (silly names, I know) would force a Boolean context and thus force the conversion of the True string to a Bool. I've tried 1 and Yes with the same result. What's going on?
Here is a more minimal program with similarly problematic behavior:
import qualified Data.ConfigFile as DC
import qualified Control.Monad.Except as CME
main = CME.runExceptT pipe >>= print
pipe = do
    cp <- DC.readstring DC.emptyCP{DC.optionxform=id} "secureSession = True\n"
    DC.get cp "DEFAULT" "secureSession" >>= nubb
nubb :: Either String Bool -> m Bool
nubb = undefined
When it's stripped down to this bare-bones form, it's obvious what has gone wrong: you are asking DC.get to return an Either String Bool when in fact you should simply be asking it to return a Bool. Simple fix for the stripped-down version is to eliminate the >>= nubb part of that line entirely; it should be easy to translate this fix back into your bigger context.
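The underlying mechanism is that the result type requested by the caller decides which conversion runs. The following is a toy sketch of that idea using only base. FromConfig, fromConfig, asBool and asInt are hypothetical names for illustration and are not part of Data.ConfigFile.

```haskell
import Data.Char (toLower)
import Text.Read (readMaybe)

-- Hypothetical stand-in for a lookup whose result type is chosen by the caller.
class FromConfig a where
    fromConfig :: String -> Either String a

-- Boolean values get a dedicated, lenient parser.
instance FromConfig Bool where
    fromConfig s = case map toLower s of
        "true"  -> Right True
        "yes"   -> Right True
        "1"     -> Right True
        "false" -> Right False
        "no"    -> Right False
        "0"     -> Right False
        _       -> Left ("couldn't parse Bool from " ++ show s)

-- Integers go through Read instead.
instance FromConfig Int where
    fromConfig s =
        maybe (Left ("couldn't parse Int from " ++ show s)) Right (readMaybe s)

-- Same input string, different requested result types.
asBool :: Either String Bool
asBool = fromConfig "True"

asInt :: Either String Int
asInt = fromConfig "True"
```

Here asBool is Right True while asInt is a Left with a parse error, even though both read the same string. In the same way, asking for anything other than a plain Bool at the get call site selects a different conversion than the Boolean one.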

How does ManagedProcess in Cloud Haskell work?

I'm following this tutorial and looking at the test case in source code.
My code uses SimplePool.hs from the source code, and I created the following file (snippet):
sampleTask :: (TimeInterval, String) -> Process String
sampleTask (t, s) = sleep t >> return s
$(remotable ['sampleTask])
jobTest :: MVar (AsyncResult (Either String String)) -> Process ()
jobTest result = do
    pid <- startTestPool 1 -- start the pool of workers here only one worker
    job <- return $ ($(mkClosure 'sampleTask) (seconds 2, "foobar"))
    -- callAsync put job into pool
    p <- callAsync pid job
    a <- wait p
    setResult result a
  where
    setResult :: MVar a -> a -> Process ()
    setResult mvar x = liftIO $ putMVar mvar x
startTestPool :: Int -> Process ProcessId
startTestPool s = spawnLocal $ do
    _ <- runPool s
    return ()
runPool :: Int -> Process (Either (InitResult (Pool String)) TerminateReason)
runPool s =
    -- setting a to String
    let s' = poolServer :: ProcessDefinition (Pool String)
    in simplePool s s'
myRemoteTable :: RemoteTable
myRemoteTable = Control.Distributed.Process.Platform.__remoteTable initRemoteTable
main :: IO ()
main = do
    Right (transport, _) <- createTransportExposeInternals
        "127.0.0.1" "9901" defaultTCPParameters
    localNode <- newLocalNode transport myRemoteTable
    result <- newEmptyMVar
    pid <- forkProcess localNode $ jobTest result
    ans <- takeMVar result
    putStrLn $ show pid
    putStrLn $ show ans
I'm getting this error once I run it:
AsyncFailed (DiedException "exit-from=pid://127.0.0.1:9901:0:6")
Correct me if I'm wrong, but I assume the job did not run correctly, so there must be some problem with the slave process. The line p <- callAsync pid job is, I think, where the task is passed on to the slave process for execution. I looked into the library to find the definition of callAsync. The key line in callAsyncUsing is sendTo sid (CallMessage msg (Pid wpid)), where the function passes the task to the poolServer.
In SimplePool.hs, in acceptTask, the line asyncHandle <- async proc is where I think a new process is spawned to execute the task. So I think maybe the async process didn't finish running because the caller terminated prematurely? Or could it be that the process didn't spawn correctly? Any idea what the best way to debug this is? Also, can someone point me in the right direction for making the poolServer span different nodes/different computers (using Control.Distributed.Process.Platform.Async.AsyncChan?)?
I have modified your code slightly, and this snippet includes the imports, so it compiles. Make sure that you are using the latest SimplePool module, as your code is using simplePool which I cannot find, and your use of runPool is ambiguous.
{-# LANGUAGE TemplateHaskell #-}
import Control.Concurrent.MVar
import Control.Exception (SomeException)
import Control.Distributed.Process hiding (call)
import Control.Distributed.Process.Closure
import Control.Distributed.Process.Node
import Control.Distributed.Process.Platform hiding (__remoteTable)
import Control.Distributed.Process.Platform.Async
import Control.Distributed.Process.Platform.ManagedProcess
import Control.Distributed.Process.Platform.Test
import Control.Distributed.Process.Platform.Time
import Control.Distributed.Process.Platform.Timer
import Control.Distributed.Process.Serializable()
import Network.Transport
import Network.Transport.TCP
import Data.Binary
import Data.Typeable (Typeable)
import SimplePool hiding (runPool)
import qualified SimplePool (runPool)
sampleTask :: (TimeInterval, String) -> Process String
sampleTask (t, s) = sleep t >> return s
$(remotable ['sampleTask])
jobTest :: MVar (AsyncResult (Either String String)) -> Process ()
jobTest result = do
    pid <- startTestPool 1 -- start the pool of workers here only one worker
    let job = $(mkClosure 'sampleTask) (seconds 2, "foobar")
    -- callAsync put job into pool
    p <- callAsync pid job
    a <- wait p
    setResult result a
  where
    setResult :: MVar a -> a -> Process ()
    setResult mvar x = liftIO $ putMVar mvar x
startTestPool :: Int -> Process ProcessId
startTestPool s = spawnLocal $ do
    _ <- runPool s
    return ()
runPool :: Int -> Process (Either (InitResult (Pool String)) TerminateReason)
runPool = SimplePool.runPool
myRemoteTable :: RemoteTable
myRemoteTable = Main.__remoteTable initRemoteTable
main :: IO ()
main = do
    Right (transport, _) <- createTransportExposeInternals
        "127.0.0.1" "9901" defaultTCPParameters
    localNode <- newLocalNode transport myRemoteTable
    result <- newEmptyMVar
    pid <- forkProcess localNode $ jobTest result
    ans <- takeMVar result
    print pid >> print ans
Running this compilable code:
$ ./Example
pid://127.0.0.1:9901:0:3
AsyncDone (Right "foobar")
Please note that the SimplePool sample module from the distributed-process-platform test suite has been promoted to a fully fledged component of the library. Its new location on the latest (development) branch is https://github.com/haskell-distributed/distributed-process-platform/blob/development/src/Control/Distributed/Process/Platform/Task/Queue/BlockingQueue.hs.
Some names/types have changed, so you may need to update your code in order to continue using it.

Pipes and callbacks in Haskell

I'm processing some audio using portaudio. The haskell FFI bindings call a user defined callback whenever there's audio data to be processed. This callback should be handled very quickly and ideally with no I/O. I wanted to save the audio input and return quickly since my application doesn't need to react to the audio in realtime (right now I'm just saving the audio data to a file; later I'll construct a simple speech recognition system).
I like the idea of pipes and thought I could use that library. The problem is that I don't know how to create a Producer that returns data that came in through a callback.
How do I handle my use case?
Here's what I'm working with right now, in case that helps (the datum mvar isn't working right now but I don't like storing all the data in a seq... I'd rather process it as it came instead of just at the end):
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
module Main where
import Codec.Wav
import Sound.PortAudio
import Sound.PortAudio.Base
import Sound.PortAudio.Buffer
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.Types
import Foreign.Storable
import qualified Data.StorableVector as SV
import qualified Data.StorableVector.Base as SVB
import Control.Exception.Base (evaluate)
import Data.Int
import Data.Sequence as Seq
import Control.Concurrent
instance Buffer SV.Vector a where
    fromForeignPtr fp = return . SVB.fromForeignPtr fp
    toForeignPtr = return . (\(a, b, c) -> (a, c)) . SVB.toForeignPtr
-- | Wrap a buffer callback into the generic stream callback type.
buffCBtoRawCB' :: (StreamFormat input, StreamFormat output, Buffer a input, Buffer b output) =>
    BuffStreamCallback input output a b -> StreamCallback input output
buffCBtoRawCB' func = \a b c d e -> do
    fpA <- newForeignPtr_ d -- We will not free, as callback system will do that for us
    fpB <- newForeignPtr_ e -- We will not free, as callback system will do that for us
    storeInp <- fromForeignPtr fpA (fromIntegral $ 1 * c)
    storeOut <- fromForeignPtr fpB (fromIntegral $ 0 * c)
    func a b c storeInp storeOut
callback :: MVar (Seq.Seq [Int32]) -> PaStreamCallbackTimeInfo -> [StreamCallbackFlag] -> CULong
    -> SV.Vector Int32 -> SV.Vector Int32 -> IO StreamResult
callback seqmvar = \timeinfo flags numsamples input output -> do
    putStrLn $ "timeinfo: " ++ show timeinfo ++ "; flags are " ++ show flags ++ " in callback with " ++ show numsamples ++ " samples."
    print input
    -- write data to output
    --mapM_ (uncurry $ pokeElemOff output) $ zip (map fromIntegral [0..(numsamples-1)]) datum
    --print "wrote data"
    input' <- evaluate $ SV.unpack input
    modifyMVar_ seqmvar (\s -> return $ s Seq.|> input')
    case flags of
        [] -> return $ if unPaTime (outputBufferDacTime timeinfo) > 0.2 then Complete else Continue
        _ -> return Complete
done doneMVar = do
    putStrLn "total done dood!"
    putMVar doneMVar True
    return ()
main = do
    let samplerate = 16000
    Nothing <- initialize
    print "initialized"
    m <- newEmptyMVar
    datum <- newMVar Seq.empty
    Right s <- openDefaultStream 1 0 samplerate Nothing (Just $ buffCBtoRawCB' (callback datum)) (Just $ done m)
    startStream s
    _ <- takeMVar m -- wait until our callbacks decide they are done!
    Nothing <- terminate
    print "let's see what we've recorded..."
    stuff <- takeMVar datum
    print stuff
    -- write out wav file
    -- let datum =
    --     audio = Audio { sampleRate = samplerate
    --                   , channelNumber = 1
    --                   , sampleData = datum
    --                   }
    -- exportFile "foo.wav" audio
    print "main done"
The simplest solution is to use MVars to communicate between the callback and Producer. Here's how:
import Control.Proxy
import Control.Concurrent.MVar
fromMVar :: (Proxy p) => MVar (Maybe a) -> () -> Producer p a IO ()
fromMVar mvar () = runIdentityP loop where
    loop = do
        ma <- lift $ takeMVar mvar
        case ma of
            Nothing -> return ()
            Just a -> do
                respond a
                loop
Your stream callback will write Just input to the MVar and your finalization callback will write Nothing to terminate the Producer.
Here's a ghci example demonstrating how it works:
>>> mvar <- newEmptyMVar :: IO (MVar (Maybe Int))
>>> forkIO $ runProxy $ fromMVar mvar >-> printD
>>> putMVar mvar (Just 1)
1
>>> putMVar mvar (Just 2)
2
>>> putMVar mvar Nothing
>>> putMVar mvar (Just 3)
>>>
Edit: The pipes-concurrency library now provides this feature, and it even has a section in the tutorial explaining specifically how to use it to get data out of callbacks.
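The same handoff can be tried without pipes or portaudio. The following is a minimal sketch using only base: one thread plays the role of the callbacks and writes Just values followed by Nothing, while the consumer processes each item as it arrives. The names consume, feed and demo are hypothetical, and feed is only a stand-in for the real stream and finalization callbacks.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Process items one at a time until the producer side writes Nothing.
consume :: (a -> IO ()) -> MVar (Maybe a) -> IO ()
consume handle mvar = loop
  where
    loop = do
        ma <- takeMVar mvar
        case ma of
            Nothing -> return ()
            Just a  -> handle a >> loop

-- Hypothetical stand-in for the callbacks: push each item, then signal the end.
feed :: MVar (Maybe a) -> [a] -> IO ()
feed mvar xs = mapM_ (putMVar mvar . Just) xs >> putMVar mvar Nothing

-- Wire both sides together and record what the consumer saw, in order.
demo :: IO [Int]
demo = do
    mvar <- newEmptyMVar
    seen <- newIORef []
    _ <- forkIO (feed mvar [1, 2, 3])
    consume (\x -> modifyIORef seen (x :)) mvar
    reverse <$> readIORef seen
```

demo returns [1,2,3]. One design point to keep in mind: putMVar blocks while the previous item has not been taken yet, so a slow consumer would stall the callback. If that matters for the audio callback, an unbounded queue such as Control.Concurrent.Chan may be a better fit than a single MVar.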

How can I refactor this without IORefs?

How could I refactor this so that eventually IORefs would not be necessary?
inc :: IORef Int -> IO ()
inc ref = modifyIORef ref (+1)
main = withSocketsDo $ do
    s <- socket AF_INET Datagram defaultProtocol
    c <- newIORef 0
    f <- newIORef 0
    hostAddr <- inet_addr host
    time $ forM [0 .. 10000] $ \i -> do
        sendAllTo s (B.pack "ping") (SockAddrInet port hostAddr)
        (r, _) <- recvFrom s 1024
        if (B.unpack r) == "PING" then (inc c) else (inc f)
    c' <- readIORef c
    print (c')
    sClose s
    return()
What's wrong with using IORefs here? You're in IO anyways with the networking operations. IORefs aren't always the cleanest solution, but they seem to do the job well in this case.
Regardless, for the sake of answering the question, let's remove the IORefs. These references serve as a way of keeping state, so we'll have to come up with an alternate way to keep the stateful information.
The pseudocode for what we want to do is this:
open the connection
10000 times:
    send a message
    receive the response
    (keep track of how many responses are the message "PING")
print how many responses were the message "PING"
The chunk that is indented under 10000 times can be abstracted into its own function. If we are to avoid IORefs, then this function will have to take in a previous state and produce a next state.
main = withSocketsDo $ do
    s <- socket AF_INET Datagram defaultProtocol
    hostAddr <- inet_addr host
    let sendMsg = sendAllTo s (B.pack "ping") (SockAddrInet port hostAddr)
        recvMsg = fst `fmap` recvFrom s 1024
    (c,f) <- ???
    print c
    sClose s
So the question is this: what do we put at the ??? place? We need to define some way to "perform" an IO action, take its result, and modify state with that result somehow. We also need to know how many times to do it.
performRepeatedlyWithState :: a             -- some state
                           -> IO b          -- some IO action that yields a value
                           -> (a -> b -> a) -- some way to produce a new state
                           -> Int           -- how many times to do it
                           -> IO a          -- the resultant state, as an IO action
performRepeatedlyWithState s _ _ 0 = return s
performRepeatedlyWithState someState someAction produceNewState timesToDoIt = do
    actionResult <- someAction
    let newState = produceNewState someState actionResult
    performRepeatedlyWithState newState someAction produceNewState (pred timesToDoIt)
All I did here was write down the type signature that matched what I said above, and produced the relatively obvious implementation. I gave everything a very verbose name to hopefully make it apparent exactly what this function means. Equipped with this simple function, we just need to use it.
let origState = (0,0)
    action = ???
    mkNewState = ???
    times = 10000
(c,f) <- performRepeatedlyWithState origState action mkNewState times
I've filled in the easy parameters here. The original state is (c,f) = (0,0), and we want to perform this 10000 times. (Or is it 10001?) But what should action and mkNewState look like? The action should have type IO b; it's some IO action that produces something.
action = sendMsg >> recvMsg
I bound sendMsg and recvMsg to expressions from your code earlier. The action we want to perform is to send a message, and then receive a message. The value this action produces is the message received.
Now, what should mkNewState look like? It should have the type a -> b -> a, where a is the type of the State, and b is the type of the action result.
mkNewState (c,f) val = if (B.unpack val) == "PING"
                         then (succ c, f)
                         else (c, succ f)
This isn't the cleanest solution, but do you get the general idea? You can replace IORefs by writing a function that recursively calls itself, passing extra parameters along in order to keep track of state. The exact same idea is embodied in the foldM solution suggested on the similar question.
Bang patterns, as Nathan Howell suggests, would be wise, to avoid building up a large thunk of succ (succ (succ ...)) in your state:
mkNewState (!c, !f) val = ...
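Putting the recursion and the strict state update together, here is a minimal sketch that runs without a network. The names repeatWithState, tally, fakeExchange and countReplies are hypothetical, and fakeExchange is only a stand-in for sendMsg >> recvMsg that always answers "PING".

```haskell
{-# LANGUAGE BangPatterns #-}

-- Run an action n times, threading a state through and forcing it each step.
repeatWithState :: a -> IO b -> (a -> b -> a) -> Int -> IO a
repeatWithState s action step n
    | n <= 0    = return s
    | otherwise = do
        r <- action
        let !s' = step s r
        repeatWithState s' action step (n - 1)

-- Count replies equal to "PING" on the left and everything else on the right.
-- The bang patterns keep both counters evaluated.
tally :: (Int, Int) -> String -> (Int, Int)
tally (!c, !f) reply
    | reply == "PING" = (c + 1, f)
    | otherwise       = (c, f + 1)

-- Hypothetical stand-in for "send a message, receive the response".
fakeExchange :: IO String
fakeExchange = return "PING"

countReplies :: Int -> IO (Int, Int)
countReplies = repeatWithState (0, 0) fakeExchange tally
```

With this stub, countReplies 5 yields (5,0). Swapping fakeExchange for the real socket action should leave the rest unchanged, assuming the reply is converted to a String (or tally is adapted to ByteString).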
Building on the earlier comment regarding a stack overflow.
The accumulators 'f' and 'c' in either the IORef or foldM case need to be evaluated to prevent a long chain of thunks from being allocated while you're iterating. One way of forcing evaluation of the thunks is to use a bang pattern. This tells the compiler to evaluate the value, removing the thunk, even though its value is not demanded in the function.
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent
import Control.Monad
import Data.ByteString.Char8
import Data.Foldable (foldlM)
import Data.IORef
import Network.Socket hiding (recvFrom)
import Network.Socket.ByteString (recvFrom, sendAllTo)
main = withSocketsDo $ do
    let host = "127.0.0.1"
        port = 9898
    s <- socket AF_INET Datagram defaultProtocol
    hostAddr <- inet_addr host
    -- explicitly mark both accumulators as strict using bang patterns
    let step (!c, !f) i = do
            sendAllTo s "PING" (SockAddrInet port hostAddr)
            (r, _) <- recvFrom s 1024
            return $ case r of
                -- because c and f are never used, the addition operator below
                -- builds a thunk chain. these can lead to a stack overflow
                -- when the chain is being evaluated by the 'print c' call below.
                "PING" -> (c+1, f)
                _ -> (c, f+1)
    (c, f) <- foldlM step (0, 0) [0..10000]
    print c
    sClose s
    return ()

Concurrent Haskell Actions with Timeout

How could one implement a function in concurrent Haskell that either returns 'a' successfully or, due to a timeout, 'b'?
timed :: Int → IO a → b → IO (Either a b)
timed max act def = do
Best Regards,
Cetin Sert
Note: the signature of timed can be completely or slightly different.
Implementing your desired timed on top of System.Timeout.timeout is easy:
import Control.Monad (liftM)
import System.Timeout (timeout)
timed :: Int -> IO a -> b -> IO (Either b a)
timed us act def = liftM (maybe (Left def) Right) (timeout us act)
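As a usage sketch of the same idea (base only, written with fmap instead of liftM), the two values below exercise both outcomes. quick and slow are hypothetical names, and the durations are arbitrary. Note that System.Timeout.timeout takes its limit in microseconds.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Left def if the action does not finish within us microseconds, Right result otherwise.
timed :: Int -> IO a -> b -> IO (Either b a)
timed us act def = fmap (maybe (Left def) Right) (timeout us act)

-- Finishes well inside its one-second budget.
quick :: IO (Either String Int)
quick = timed 1000000 (return 42) "timed out"

-- Sleeps for one second but is only given ten milliseconds.
slow :: IO (Either String Int)
slow = timed 10000 (threadDelay 1000000 >> return 42) "timed out"
```

quick yields Right 42, and slow yields Left "timed out" after roughly ten milliseconds, because the sleeping action is interrupted when the limit expires.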
By the way, the common implementation of timeout is closer to this ($! is used, like seq, to try to force evaluation of the returned value in the thread rather than only returning a thunk):
import Control.Concurrent (forkIO, threadDelay, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, catch)
import System.IO (hPrint, stderr)
timeout :: Int -> IO a -> IO (Maybe a)
timeout us act = do
    mvar <- newEmptyMVar
    tid1 <- forkIO $ (putMVar mvar . Just $!) =<< act
    tid2 <- forkIO $ threadDelay us >> putMVar mvar Nothing
    res <- takeMVar mvar
    -- the Prelude no longer exports catch, so use Control.Exception.catch with an explicit exception type
    killThread (maybe tid1 (const tid2) res) `catch` \e -> hPrint stderr (e :: SomeException)
    return res
The implementation of System.Timeout.timeout in the libraries is a little more complex, handling more exceptional cases.
import Control.Concurrent (forkIO, threadDelay, myThreadId, killThread)
import Control.Exception (Exception, handleJust, throwTo, bracket)
import Data.Typeable
import Data.Unique (Unique, newUnique)
data Timeout = Timeout Unique deriving Eq
timeoutTc :: TyCon
timeoutTc = mkTyCon "Timeout"
instance Typeable Timeout where { typeOf _ = mkTyConApp timeoutTc [] }
instance Show Timeout where
    show _ = "<<timeout>>"
instance Exception Timeout
timeout n f
    | n < 0 = fmap Just f
    | n == 0 = return Nothing
    | otherwise = do
        pid <- myThreadId
        ex <- fmap Timeout newUnique
        handleJust (\e -> if e == ex then Just () else Nothing)
                   (\_ -> return Nothing)
                   (bracket (forkIO (threadDelay n >> throwTo pid ex))
                            (killThread)
                            (\_ -> fmap Just f))
Here's the first answer I could come up with. I needed this for a port scanner. o_O I forgot the admin password of my router and wanted to check which ports I had previously opened on our home server, so that I could now repurpose and reuse them ^ _ ^" ... This implementation should do the job for the time being.
module Control.Concurrent.Timed (timed) where
import Prelude hiding (take)
import System.IO
import Control.Monad
import System.Process
import System.Timeout
import Control.Concurrent
import System.Environment
timed :: Int → IO a → b → IO (Either b a)
timed max act def = do
    w ← new
    r ← new
    t ← forkIO $ do
        a ← act
        r ≔ Right a
        e ← em w
        case e of
            False → kill =<< take w
            True → return ()
    s ← forkIO $ do
        (w ≔) =<< mine
        wait max
        e ← em r
        case e of
            True → do
                kill t
                r ≔ Left def
            False → return ()
    take r
timed_ :: Int → IO a → a → IO a
timed_ max act def = do
    r ← timed max act def
    return $ case r of
        Right a → a
        Left a → a
(≔) = putMVar
new = newEmptyMVar
wait = threadDelay
em = isEmptyMVar
kill = killThread
mine = myThreadId
take = takeMVar
or just use System.Timeout.timeout -__-"
