unixODBC: invalid handle when handle is allocated in a Haskell thread - multithreading

I am building an ODBC application in Haskell, using foreign calls. When calling odbc functions in forkIO or forkOS thread (ie unbounded or bounded thread) on handles allocated in the same thread, the function returns invalid handle error.
The same call works perfect when made in the main thread.

I found that the problem is caused by a bug in unixODBC, I documented it here: https://sourceforge.net/p/unixodbc/bugs/41/
In a few words, unixODBC uses as handles pointers to memory allocated to keep data for those handles. As handles are 32 bit integers, on 64 bit architectures they are truncated to the last significant half (for x86 processors).
Thus, if the pointer value is less than 2G, everything is ok, but if it is more than 2G (and not 4G due to the fact that the sign bit extends) then unixODBC will not be able to locate the data structure for the handle and will report invalid handle.
When SQLAllocHandle is called from Haskell in the main thread, the allocated pointer has a value less than 2G, so everything works. But when it is called from another thread (liftIO or liftOS), the allocated pointer has a value larger than 2G, so multithreaded ODBC applications using unixODBC are not possible in Haskell unless all the handles allocation is done in the main thread.
The workaround I found is based on the fact that, in the main thread I have a function that waits for the work in the threads to complete. I modified that function to also listen for a channel for handle allocation requests and then allocate the handle and return the response.
This is the sample code I used for workarround:
-- All actions are ReaderT monad actions that share a global environment
-- for threads execution
-- | wait for worker threads to complete the work
waitForWorkToEnd :: (MonadIO m) => ReaderT MyEnvironment m ()
waitForWorkToEnd = do
threadsCountVar <- asks threads_WorkerThreadsVar
allocHandleChan <- asks threads_AllocHandleChan
let waitIO = join $ atomically $ orElse (readTVar threadsCountVar >>= check . (<= 0) >> (return $ return ())) (allocHandleT allocHandleChan >>= \ io -> return (io >> waitIO))
liftIO $ waitIO
liftIO $ log $ fromString $ "all worker threads have finished"
-- | creates an IO action inside a STM monad to allocate a new handler in the current thread
allocHandleT :: (MonadIO m, MonadFail m) => TQueue (SQLSMALLINT, SQLINTEGER, TMVar SQLINTEGER) -> STM (m ())
allocHandleT chan = do
(hType, hParent, retVar) <- readTQueue chan
return $ allocHandle hType hParent >>= liftIO . atomically . (putTMVar retVar)
-- | make a handle alloc request to the main thread and wait for result
allocHandleReq :: (MonadIO m, MonadFail m) => SQLSMALLINT -> SQLINTEGER -> ReaderT MyEnvironment m SQLINTEGER
allocHandleReq htype hparent = do
allocHandleChan <- asks threads_AllocHandleChan
resultVar <- liftIO $ atomically $ newEmptyTMVar
liftIO $ atomically $ writeTQueue allocHandleChan (htype, hparent, resultVar)
liftIO $ atomically $ takeTMVar resultVar
-- allocHandle simply calls SQLAllocHandle and takes care of the diagnostics
-- information; it is part of the sqlcli package you can find it here:
-- https://hub.darcs.net/mihaigiurgeanu/sqlcli

Related

In Haskell, how can I abort a calculation when a web client disconnects

I have a Haskell-based web service that performs a calculation that for some input can take a really long time to finish. ("really long" here means over a minute)
Because performing that calculation takes all the CPU available on the server, I place incoming requests in a queue (well, actually a stack for reasons that have to do with the typical client, but that's besides the point) when they arrive and service them when the currently running calculation finishes.
My problem is that the clients don't always wait long enough, and sometimes time out on their end, disconnect, and try a different server (well, they try again and hit the elb, and usually get a different instance). Also, occasionally the calculation the web client was asking for will become obsolete because of external factors and the web client will be killed.
In those cases I'd really like to be able to detect that the web client has gone away before I pull the next request off the stack and start the (expensive) calculation. Unfortunately, my experience with snap leads me to believe that there's no way in that framework to ask "is the client's TCP connection still connected?" and I haven't found any documentation for other web frameworks that cover the "client disconnected" case.
So is there a Haskell web framework that makes it easy to detect whether a web client has disconnected? Or failing that, is there one that at least makes it possible?
(I understand that it may not be possible to be absolutely certain in all cases whether a TCP client is still there without sending data to the other end; however, when the client actually sends RST packets to the server and the server's framework doesn't let the application code determine that the connection is gone, that's a problem)
Incidentally, though one might suspect that warp's onClose handler would let you do this, this fires only when a response is ready and written to the client so is useless as a way of aborting a calculation in progress. There also seems to be no way to get access to the accepted socket so as to set SO_KEEPALIVE or similar. (There are ways to access the initial listening socket, but not the accepted one)
So I found an answer that works for me and it might work for someone else.
It turns out that you can in fact mess around enough with the internals of Warp to do this, but then what you're left with is a basic version of Warp and if you need things like logging, etc., will need to add other packages on to that.
Also, note that so-called "half-closed" connections (when the client closes their sending end, but is still waiting for data) will be detected as closed, interrupting your calculation. I don't know of any HTTP clients that deal in half-closed connections, but just something to be aware of.
Anyway, what I did was first copy the functions runSettings and runSettingsSocket exposed by Network.Wai.Handler.Warp and Network.Wai.Handler.Warp.Internal and made versions that called a function I supplied instead of WarpI.socketConnection, so that I have the signature:
runSettings' :: Warp.Settings -> (Socket -> IO (IO WarpI.Connection))
-> Wai.Application -> IO ()
This required copying out a few helper methods, like setSocketCloseOnExec and windowsThreadBlockHack. The double-IO signature there might look weird, but it's what you want - the outer IO is run in the main thread (that calls accept) and the inner IO is run in the per-connection thread that is forked after accept returns. The original Warp function runSettings is equivalent to:
\set -> runSettings' set (WarpI.socketConnection >=> return . return)
Then I did:
data ClientDisappeared = ClientDisappeared deriving (Show, Eq, Enum, Ord)
instance Exception ClientDisappeared
runSettingsSignalDisconnect :: Warp.Settings -> Wai.Application -> IO ()
runSettingsSignalDisconnect set =
runSettings' set (WarpI.socketConnection >=> return . wrapConn)
where
-- Fork a 'monitor' thread that does nothing but attempt to
-- perform a read from conn in a loop 1/sec, and wrap the receive
-- methods on conn so that they first consume from the stuff read
-- by the monitoring thread. If the monitoring thread sees
-- end-of-file (signaled by an empty string read), raise
-- ClientDisappered on the per-connection thread.
wrapConn conn = do
tid <- myThreadId
nxtBstr <- newEmptyMVar :: IO (MVar ByteString)
semaphore <- newMVar ()
readerCount <- newIORef (0 :: Int)
monitorThread <- forkIO (monitor tid nxtBstr semaphore readerCount)
return $ conn {
WarpI.connClose = throwTo monitorThread ClientDisappeared
>> WarpI.connClose conn
, WarpI.connRecv = newRecv nxtBstr semaphore readerCount
, WarpI.connRecvBuf = newRecvBuf nxtBstr semaphore readerCount
}
where
newRecv :: MVar ByteString -> MVar () -> IORef Int
-> IO ByteString
newRecv nxtBstr sem readerCount =
bracket_
(atomicModifyIORef' readerCount $ \x -> (succ x, ()))
(atomicModifyIORef' readerCount $ \x -> (pred x, ()))
(withMVar sem $ \_ -> do w <- tryTakeMVar nxtBstr
case w of
Just w' -> return w'
Nothing -> WarpI.connRecv conn
)
newRecvBuf :: MVar ByteString -> MVar () -> IORef Int
-> WarpI.Buffer -> WarpI.BufSize -> IO Bool
newRecvBuf nxtBstr sem readerCount buf bufSize =
bracket_
(atomicModifyIORef' readerCount $ \x -> (succ x, ()))
(atomicModifyIORef' readerCount $ \x -> (pred x, ()))
(withMVar sem $ \_ -> do
(fulfilled, buf', bufSize') <-
if bufSize == 0 then return (False, buf, bufSize)
else
do w <- tryTakeMVar nxtBstr
case w of
Nothing -> return (False, buf, bufSize)
Just w' -> do
let wlen = B.length w'
if wlen > bufSize
then do BU.unsafeUseAsCString w' $ \cw' ->
copyBytes buf (castPtr cw') bufSize
putMVar nxtBstr (B.drop bufSize w')
return (True, buf, 0)
else do BU.unsafeUseAsCString w' $ \cw' ->
copyBytes buf (castPtr cw') wlen
return (wlen == bufSize, plusPtr buf wlen,
bufSize - wlen)
if fulfilled then return True
else WarpI.connRecvBuf conn buf' bufSize'
)
dropClientDisappeared :: ClientDisappeared -> IO ()
dropClientDisappeared _ = return ()
monitor tid nxtBstr sem st =
catch (monitor' tid nxtBstr sem st) dropClientDisappeared
monitor' tid nxtBstr sem st = do
(hitEOF, readerCount) <- withMVar sem $ \_ -> do
w <- tryTakeMVar nxtBstr
case w of
-- No one picked up our bytestring from last time
Just w' -> putMVar nxtBstr w' >> return (False, 0)
Nothing -> do
w <- WarpI.connRecv conn
putMVar nxtBstr w
readerCount <- readIORef st
return (B.null w, readerCount)
if hitEOF && (readerCount == 0)
-- Don't signal if main thread is also trying to read -
-- in that case, main thread will see EOF directly
then throwTo tid ClientDisappeared
else do threadDelay oneSecondInMicros
monitor' tid nxtBstr sem st
oneSecondInMicros = 1000000
Assuming that 'web service' means HTTP(S)-based clients, one option is to use a RESTful approach. Instead of assuming that clients are going to stay connected, the service could accept the request and return 202 Accepted. As the HTTP status code specification outlines:
The request has been accepted for processing, but the processing has not been completed [...]
The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
The server immediately responds with a 202 Accepted response, and also includes a URL that the client can use to poll for status. One option is to put this URL in the response's Location header, but you can also put the URL in a link in the response's body.
The client can poll the status URL for status. Once the calculation finishes, the status resource can provide a link to the finished result.
You can add cache headers to the status resource and final result if you're concerned that the clients will be polling too hard.
REST in Practice outlines the general concepts, while the RESTful Web Services Cookbook has lots of good details.
I'm not saying that you can't do something with either HTTP or TCP/IP (I don't know), but if you can't, then the above is a tried-and-true solution to similar problems.
Obviously, this is completely independent on programming language, but it's been my experience that REST and algebraic data types go well together.

Why does my parallel traversal Haskell program leak memory?

Consider the following Haskell program (I'm doing this mostly for learning purposes):
import qualified Control.Concurrent.MSem as Sem
import System.Environment (getArgs)
import Control.Concurrent (forkIO)
import Control.Monad
-- Traverse with maximum n threads
parallelTraverse :: Foldable a => Int -> (b -> IO()) -> a b -> IO ()
parallelTraverse n action values = do
sem <- Sem.new n
forM_ values $ \value -> Sem.with sem (forkIO $ action value)
main :: IO ()
main = do
args <- getArgs
let nThreads = read . head $ args :: Int
parallelTraverse nThreads print [(1::Int)..]
when I run it, memory quickly climbs to several GB. I tried various combinations to make sure I discard the results of intermediate computations (the print actions). Why is it still leaking space?
First of all, you have an evident mistake in the following piece:
Sem.with sem (forkIO $ action value)
You're addressing the semaphore from the master thread around the "fork" operation instead of the action there. Following is the proper way to implement it:
forkIO (Sem.with sem (action value))
I.e., to address the semaphore from the context of the forked thread.
Secondly, in the following code you're calling the parallelTraverse operation on an infinite list:
parallelTraverse nThreads print [(1::Int)..]
Which results in the infinite forking of threads. And since the forkIO operation is roughly instantaneous for the calling thread, it's pretty much no surprise that you're running out of resources quite soon.
To use the semaphore to limit the number of worker threads the with pattern simply won't do in your case. Instead you should use the explicit combination of wait and signal and not forget to treat the exceptions properly (in case you expect them). E.g.,:
parallelTraverse :: Foldable a => Int -> (b -> IO()) -> a b -> IO ()
parallelTraverse n action values = do
sem <- Sem.new n
forM_ values $ \value -> do
Sem.wait sem
forkIO $ finally (action value) (Sem.signal sem)

Using TChan with Timeout

I have a TChan as input for a thread which should behave like this:
If sombody writes to the TChan within a specific time, the content should be retrieved. If there is nothing written within the specified time, it should unblock and continue with Nothing.
My attempt on this was to use the timeout function from System.Timeout like this:
timeout 1000000 $ atomically $ readTChan pktChannel
This seemed to work but now I discovered, that I am sometimes loosing packets (they are written to the channel, but not read on the other side. In the log I get this:
2014.063.11.53.43.588365 Pushing Recorded Packet: 2 1439
2014.063.11.53.43.592319 Run into timeout
2014.063.11.53.44.593396 Run into timeout
2014.063.11.53.44.593553 Pushing Recorded Packet: 3 1439
2014.063.11.53.44.597177 Sending Recorded Packet: 3 1439
Where "Pushing Recorded Packet" is the writing from the one thread and "Sending Recorded Packet" is the reading from the TChan in the sender thread. The line with Sending Recorded Packet 2 1439 is missing, which would indicate a successful read from the TChan.
It seems that if the timeout is received at the wrong point in time, the channel looses the packet. I suspect that the threadKill function used inside timeout and STM don't play well together.
Is this correct? Does somebody have another solution that does not loose the packet?
Use registerDelay, an STM function, to signal a TVar when the timeout is reached. You can then use the orElse function or the Alternative operator <|> to select between the next TChan value or the timeout.
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import System.Random
-- write random values after a random delay
packetWriter :: Int -> TChan Int -> IO ()
packetWriter maxDelay chan = do
let xs = randomRs (10000 :: Int, maxDelay + 50000) (mkStdGen 24036583)
forM_ xs $ \ x -> do
threadDelay x
atomically $ writeTChan chan x
-- block (retry) until the delay TVar is set to True
fini :: TVar Bool -> STM ()
fini = check <=< readTVar
-- Read the next value from a TChan or timeout
readTChanTimeout :: Int -> TChan a -> IO (Maybe a)
readTChanTimeout timeoutAfter pktChannel = do
delay <- registerDelay timeoutAfter
atomically $
Just <$> readTChan pktChannel
<|> Nothing <$ fini delay
-- | Print packets until a timeout is reached
readLoop :: Show a => Int -> TChan a -> IO ()
readLoop timeoutAfter pktChannel = do
res <- readTChanTimeout timeoutAfter pktChannel
case res of
Nothing -> putStrLn "timeout"
Just val -> do
putStrLn $ "packet: " ++ show val
readLoop timeoutAfter pktChannel
main :: IO ()
main = do
let timeoutAfter = 1000000
-- spin up a packet writer simulation
pktChannel <- newTChanIO
tid <- forkIO $ packetWriter timeoutAfter pktChannel
readLoop timeoutAfter pktChannel
killThread tid
The thumb rule of concurrency is: if adding a sleep in some point inside an IO action matters, your program is not safe.
To understand why the code timeout 1000000 $ atomically $ readTChan pktChannel does not work, consider the following alternative implementation of atomically:
atomically' :: STM a -> IO a
atomically' action = do
result <- atomically action
threadDelay someTimeAmount
return result
The above is equal to atomically, but for an extra innocent delay. Now it is easy to see that if timeout kills the thread during the threadDelay, the atomic action has completed (consuming a message from the channel), yet timeout will return Nothing.
A simple fix to timeout n $ atomically ... could be the following
smartTimeout :: Int -> STM a -> IO (Maybe a)
smartTimeout n action = do
v <- atomically $ newEmptyTMvar
_ <- timeout n $ atomically $ do
result <- action
putTMvar v result
atomically $ tryTakeTMvar v
The above uses an extra transactional variable v to do the trick. The result value of the action is stored into v inside the same atomic block in which the action is run. The return value of timeout is not trusted, since it does not tell us if action was run or not. After that, we check the TMVar v, which will be full if and only if action was run.
Instead of TChan a, use TChan (Maybe a) . Your normal producer (of x) now writes Just x. Fork an extra "ticking" process that writes Nothing to the channel (every x seconds). Then have a reader for the channel, and abort if you get two successive Nothing. This way, you avoid exceptions, which may cause data to get lost in your case (but I am not sure).

Is it possible to run several instances of hint in parallel?

Is there any way to start two hint interpreters and at runtime & subsequently assign smaller computations to either one or the other? When I invoke hint for a small expression (e.g. typed into a website) then, - without reliable testing -, it seems to me as if the time to start/load hint is approximately one second. If the instance is already started that second would be shaved.
The hint seems to have no function where I can start it and keep it nicely pending for later use.
(Auto)Plugins would be a further option of course but I think that is more suitable for modules and less elegant for smaller computations.
The GHC api, which hint is implemented in terms of (the various plugin packages are, too), does not support concurrent use.
You can leave hint running, though. It's an instance of MonadIO.
interpreterLoop :: (MonadIO m, Typeable) a => Chan ((MVar a, String)) -> InterpreterT m ()
interpreterLoop ch = do
(mvar, command) <- liftIO $ readChan ch
a <- interpret command $ argTypeWitness mvar
liftIO $ putMVar mvar a
interpreterLoop ch
where
argTypeWitness :: MVar a -> a
argTypeWitness = undefined -- this value is only used for type checking, never evaluated
runInLoop :: Typeable a => Chan ((MVar a, String)) -> String -> IO a
runInLoop ch command = do
mvar <- newEmptyMVar
writeChan ch (mvar, command)
takeMVar mvar
(I didn't test this, so I may have missed a detail or two, but the basic idea will work.)

Strict evaluation techniques for concurrent channels in Haskell

I'm toying with Haskell threads, and I'm running into the problem of communicating lazily-evaluated values across a channel. For example, with N worker threads and 1 output thread, the workers communicate unevaluated work and the output thread ends up doing the work for them.
I've read about this problem in various documentation and seen various solutions, but I only found one solution that works and the rest do not. Below is some code in which worker threads start some computation that can take a long time. I start the threads in descending order, so that the first thread should take the longest, and the later threads should finish earlier.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan -- .Strict
import Control.Concurrent.MVar
import Control.Exception (finally, evaluate)
import Control.Monad (forM_)
import Control.Parallel.Strategies (using, rdeepseq)
main = (>>=) newChan $ (>>=) (newMVar []) . run
run :: Chan (Maybe String) -> MVar [MVar ()] -> IO ()
run logCh statVars = do
logV <- spawn1 readWriteLoop
say "START"
forM_ [18,17..10] $ spawn . busyWork
await
writeChan logCh Nothing -- poison the logger
takeMVar logV
putStrLn "DONE"
where
say mesg = force mesg >>= writeChan logCh . Just
force s = mapM evaluate s -- works
-- force s = return $ s `using` rdeepseq -- no difference
-- force s = return s -- no-op; try this with strict channel
busyWork = say . show . sum . filter odd . enumFromTo 2 . embiggen
embiggen i = i*i*i*i*i
readWriteLoop = readChan logCh >>= writeReadLoop
writeReadLoop Nothing = return ()
writeReadLoop (Just mesg) = putStrLn mesg >> readWriteLoop
spawn1 action = do
v <- newEmptyMVar
forkIO $ action `finally` putMVar v ()
return v
spawn action = do
v <- spawn1 action
modifyMVar statVars $ \vs -> return (v:vs, ())
await = do
vs <- modifyMVar statVars $ \vs -> return ([], vs)
mapM_ takeMVar vs
Using most techniques, the results are reported in the order spawned; that is, the longest-running computation first. I interpret this to mean that the output thread is doing all the work:
-- results in order spawned (longest-running first = broken)
START
892616806655
503999185040
274877906943
144162977343
72313663743
34464808608
15479341055
6484436675
2499999999
DONE
I thought the answer to this would be strict channels, but they didn't work. I understand that WHNF for strings is insufficient because that would just force the outermost constructor (nil or cons for the first character of the string). The rdeepseq is supposed to fully evaluate, but it makes no difference. The only thing I've found that works is to map Control.Exception.evaluate :: a -> IO a over all the characters in the string. (See the force function comments in the code for several different alternatives.) Here's the result with Control.Exception.evaluate:
-- results in order finished (shortest-running first = correct)
START
2499999999
6484436675
15479341055
34464808608
72313663743
144162977343
274877906943
503999185040
892616806655
DONE
So why don't strict channels or rdeepseq produce this result? Are there other techniques? Am I misinterpreting why the first result is broken?
There are two issues going on here.
The reason the first attempt (using an explicit rnf) doesn't work is that, by using return, you've created a thunk that fully evaluates itself when it is evaluated, but the thunk itself has not being evaluated. Notice that the type of evaluate is a -> IO a: the fact that it returns a value in IO means that evaluate can impose ordering:
return (error "foo") >> return 1 == return 1
evaluate (error "foo") >> return 1 == error "foo"
The upshot is that this code:
force s = evaluate $ s `using` rdeepseq
will work (as in, have the same behavior as mapM_ evaluate s).
The case of using strict channels is a little trickier, but I believe this is due to a bug in strict-concurrency. The expensive computation is actually being run on the worker threads, but it's not doing you much good (you can check for this explicitly by hiding some asynchronous exceptions in your strings and seeing which thread the exception surfaces on).
What's the bug? Let's take a look at the code for strict writeChan:
writeChan :: NFData a => Chan a -> a -> IO ()
writeChan (Chan _read write) val = do
new_hole <- newEmptyMVar
modifyMVar_ write $ \old_hole -> do
putMVar old_hole $! ChItem val new_hole
return new_hole
We see that modifyMVar_ is called on write before we evaluate the thunk. The sequence of operations then is:
writeChan is entered
We takeMVar write (blocking anyone else who wants to write to the channel)
We evaluate the expensive thunk
We put the expensive thunk onto the channel
We putMVar write, unblocking all of the other threads
You don't see this behavior with the evaluate variants, because they perform the evaluation before the lock is acquired.
I’ll send Don mail about this and see if he agrees that this behavior is kind of suboptimal.
Don agrees that this behavior is suboptimal. We're working on a patch.

Resources