Pipeline-like operation using TChan - multithreading

I want to implement a pipeline between two threads. Thread A takes the data, processes it, and sends it to thread B. I use an MVar to check whether the data has been completely processed.
However, I'm getting an exception: *** Exception: thread blocked indefinitely in an STM transaction
Why are my threads blocked? I thought that once the first thread writes to the channel, the second one can read the data as soon as it is available.
fstPipe :: (a -> b) -> TChan b -> MVar () -> [a] -> IO ()
fstPipe f chIn m xs = do
    (mapM_ (\x -> atomically $ writeTChan chIn $ f x) xs) >> putMVar m ()

pipelineDone channel mIn = do
    isDone <- fmap isJust $ tryTakeMVar mIn
    isEmpty <- atomically $ isEmptyTChan channel
    return $ isDone && isEmpty

lastPipe f chIn mIn = iter
  where iter = do
            atomically $ fmap f $ readTChan chIn
            isDone <- pipelineDone chIn mIn
            unless isDone $ iter

pipeline = do
    chIn <- atomically newTChan
    m <- newEmptyMVar
    first <- async $ fstPipe reverse chIn m $ replicate 10 [1..500]
    last <- async $ lastPipe print chIn m
    wait first
    wait last

It seems odd to me to be using STM and semaphores in the same code block... Why not do the entire thing in STM?
In particular, why not a TChan (Maybe x), with Nothing indicating the end of the sequence?
Also, notice that your fstPipe likely just generates a bunch of unevaluated thunks and immediately chucks them into the TChan, without actually computing anything. You probably want a seq or similar in there to force some actual work to happen on that thread.
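The two suggestions in this comment (a TChan (Maybe x) with Nothing as the end marker, and forcing the work on the producer thread) can be sketched roughly as follows. This is only an illustration under some assumptions: the names producer and consumer are made up, forkIO plus an MVar from base stand in for the async package, and evaluate forces each result only to weak head normal form.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Exception (evaluate)

-- Producer (hypothetical name): force each result to WHNF on this thread,
-- send it wrapped in Just, then send Nothing as the end-of-stream marker.
producer :: (a -> b) -> TChan (Maybe b) -> [a] -> IO ()
producer f ch xs = do
  mapM_ (\x -> evaluate (f x) >>= atomically . writeTChan ch . Just) xs
  atomically (writeTChan ch Nothing)

-- Consumer (hypothetical name): block on the channel and stop on Nothing.
-- No separate "done" flag is polled, so there is no window between
-- checking the flag and reading the channel.
consumer :: (b -> IO ()) -> TChan (Maybe b) -> IO ()
consumer act ch = loop
  where
    loop = do
      item <- atomically (readTChan ch)
      case item of
        Nothing -> return ()
        Just y  -> act y >> loop

main :: IO ()
main = do
  ch   <- newTChanIO
  done <- newEmptyMVar
  -- The MVar is used only so that main waits for the consumer to finish.
  _ <- forkIO (consumer (print . length) ch >> putMVar done ())
  producer reverse ch (replicate 10 [1 .. 500 :: Int])
  takeMVar done
```

Because the end of the stream travels through the same channel as the data, the consumer can never observe "not done yet" and then block on a channel that will stay empty.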

I think there's a race condition:
1. fstPipe is stopped just before the putMVar.
2. lastPipe advances, reads everything, and then calls pipelineDone.
3. pipelineDone returns False, since putMVar has not run yet.
4. lastPipe tries to read from the channel.
5. putMVar executes, but it's too late.
Now lastPipe is stuck reading on an empty channel.

Your problem is in the logic of pipelineDone. Currently, you have:
pipelineDone channel mIn = do
    isDone <- fmap isJust $ tryTakeMVar mIn
    isEmpty <- atomically $ isEmptyTChan channel
    return $ isDone && isEmpty
tryTakeMVar takes the contents of the MVar if there is something in it. Once your producer finishes, it writes () into the MVar. Your consumer then tries to take the contents. If it succeeds, the MVar becomes empty, so every subsequent tryTakeMVar returns Nothing. If the channel was not yet empty at that moment, isDone && isEmpty is False on that check and on every later one, so you keep trying to read from the TChan. Once the TChan is empty, the read blocks forever, and GHC reports the deadlock.
You should instead change your pipelineDone implementation to:
pipelineDone channel mIn = do
    stillRunning <- isEmptyMVar mIn
    isEmpty <- atomically $ isEmptyTChan channel
    return $ (not stillRunning) && isEmpty
This simply polls the MVar instead of emptying it.

Related

Haskell IORef usage in concurrent setting [duplicate]

This question already has an answer here:
Haskell computationally intensive thread blocks all other threads
I am having a hard time understanding Haskell's memory model. I got stuck implementing a simple lock-free hash map, and while trying to reproduce the error in a smaller example I realized I have no idea what is going on.
I am only targeting x86_64.
data TicketLock = TicketLock {cur :: !(IORef Int), next :: !(IORef Int)}

testTicket = do
    l <- TicketLock <$> newIORef 0 <*> newIORef 0
    mapConcurrently_ (\x -> replicateM_ 500000 (lockunlock l)) [0 .. 3]
    print "finished"

lockunlock :: TicketLock -> IO ()
lockunlock (TicketLock {..}) = do
    myticket <- atomicModifyIORef' next (\a -> (a + 1, a))
    let wait = do
            x <- readIORef cur -- atomicModifyIORef' cur (\a -> (a, a)) works correctly
            unless (x == myticket) wait
    wait
    print "lock acquired" -- to observe it gets stuck
    atomicModifyIORef' cur (\a -> (a + 1, ())) -- unlock
The test gets stuck after a couple of thousand iterations.
What is going wrong with this code? I half expect it to fail, because I guess IORef operations can be reordered by the compiler, but I can't figure out what exactly is going on. Is it observable in the generated STG or Cmm code?
My real question is: why does adding barriers not fix the problem? I tried adding barriers (from the atomic-primops package) after every line of the lockunlock function, and the problem persists.
I also tried using AtomicCounter from the atomic-primops package. The problem is the same: even though incrCounter is implemented by a fetch-and-add instruction, the test still gets stuck after a couple of thousand iterations, even with barriers all over the code.
data BetterTicketLock = BetterTicketLock {curb :: !(AtomicCounter), nextb :: !(AtomicCounter)}

testTicketb = do
    l <- BetterTicketLock <$> newCounter 1 <*> newCounter 0
    mapConcurrently_ (\x -> replicateM_ 500000 (lockunlockb l)) [0 .. 3]
    print "finished"

lockunlockb :: BetterTicketLock -> IO ()
lockunlockb (BetterTicketLock {..}) = do
    myticket <- incrCounter 1 nextb -- it returns new value not old value
    let wait = do
            x <- incrCounter 0 curb -- I am not even using readCounter which is similar to readIORef
            unless (x == myticket) wait
    wait
    print "lock acquired"
    void $ incrCounter 1 curb
Ideally, I want a way to reason about Haskell programs using acquire-release semantics, as in Rust. How can the following be converted to Haskell?
struct TicketLock { current: AtomicUsize, next: AtomicUsize }

impl TicketLock {
    fn new() -> Self {
        TicketLock { current: AtomicUsize::new(0), next: AtomicUsize::new(0) }
    }

    fn lockunlock(&self) {
        let ticket = self.next.fetch_add(1, Ordering::Relaxed);
        while self.current.load(Ordering::Acquire) != ticket {}
        // critical section
        let _ = self.current.fetch_add(1, Ordering::Release);
    }
}
Edit: For some reason this works as expected without any performance penalty. Is it because of barriers being ignored in tight loops?
lockunlock (TicketLock {..}) s = do
    myticket <- atomicModifyIORef' next (\a -> (a + 1, a))
    let wait = do
            x <- readIORef cur
            unless (x == myticket) $ threadDelay 0 >> wait
    wait
    modifyIORef' s (+ 1)
    atomicModifyIORef' cur (\a -> (a + 1, ())) -- unlock
You're running into a pathological case with the implementation of pre-emptive multitasking in GHC. See the documentation of the Control.Concurrent module:
GHC implements pre-emptive multitasking: the execution of threads are interleaved in a random fashion. More specifically, a thread may be pre-empted whenever it allocates some memory, which unfortunately means that tight loops which do no allocation tend to lock out other threads (this only seems to happen with pathological benchmark-style code, however).
Your wait loop seems to not allocate anything, so it cannot be pre-empted. You can solve this issue by inserting a manual yield:
lockunlock :: TicketLock -> IO ()
lockunlock (TicketLock{..}) = do
    myticket <- atomicModifyIORef' next (\a -> (a + 1, a))
    let wait = do
            x <- readIORef cur
            unless (x == myticket) (yield *> wait)
    wait
    print "lock acquired"
    atomicModifyIORef' cur (\a -> (a + 1, ()))
Alternatively, you can compile with -fno-omit-yields.
I haven't been able to figure out anything more by debugging with the eventlog, because even that seems to be blocked. As K.A. Buhr remarks below, he has written a better answer:
Less obviously, the same thing can happen even if you compile with -threaded and use +RTS -N options. The problem is that an uncooperative thread can starve out the runtime scheduler itself. If at some point the uncooperative thread is the only thread currently scheduled to run, it will become uninterruptible, and the scheduler will never be re-run to consider scheduling additional threads, even if they could run on other O/S threads.
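For anyone who wants to try the yield-based fix in isolation, here is a self-contained sketch that uses only base. The helper names newTicketLock, withTicket and runWorkers are made up for this example, forkIO plus MVars replace mapConcurrently_, and a shared counter replaces the print so the result can be checked. It is not exception-safe, and, like the question, it only has x86_64 in mind; it makes no claim about weaker memory models.

```haskell
import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_, replicateM_, unless)
import Data.IORef

data TicketLock = TicketLock { cur :: !(IORef Int), next :: !(IORef Int) }

newTicketLock :: IO TicketLock
newTicketLock = TicketLock <$> newIORef 0 <*> newIORef 0

-- Run an action while holding the ticket lock (hypothetical helper).
withTicket :: TicketLock -> IO a -> IO a
withTicket l act = do
  -- Draw a ticket: returns the old value of 'next' and increments it.
  myticket <- atomicModifyIORef' (next l) (\a -> (a + 1, a))
  let wait = do
        x <- readIORef (cur l)
        -- 'yield' hands control back to the scheduler on every failed check,
        -- so the spinning thread cannot lock out the current lock holder.
        unless (x == myticket) (yield >> wait)
  wait
  r <- act
  atomicModifyIORef' (cur l) (\a -> (a + 1, ())) -- unlock
  return r

-- Hypothetical driver: nThreads workers each bump a shared counter
-- nIters times under the lock; returns the final counter value.
runWorkers :: Int -> Int -> IO Int
runWorkers nThreads nIters = do
  l      <- newTicketLock
  shared <- newIORef (0 :: Int)
  dones  <- forM [1 .. nThreads] $ \_ -> do
    d <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ nIters (withTicket l (modifyIORef' shared (+ 1)))
      putMVar d ()
    return d
  forM_ dones takeMVar
  readIORef shared

main :: IO ()
main = runWorkers 4 10000 >>= print
```

If the lock works, the printed value equals nThreads * nIters, since the non-atomic modifyIORef' only ever runs inside the critical section.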

How to use Atomics Counter for counting operation order of randomly occurring operations on different threads?

What I'd like to do is something like the code below: every time one of these print actions occurs, it updates the counter, so that the counter value shown by each print reflects the order in which it occurred among all the print actions across the threads. The problem in my example is that if the IORef is read at the same time by several threads, two or more print actions will show the same counter value. From what I've read, the Data.Atomics.Counter library would solve this problem, but I am having a really hard time understanding how to use it to do so. Can anyone show me an example or explain it to me, please?
main = do
    myref <- newIORef 1 :: IO (IORef Int)
    void (forkIO (forever $ do
        ref <- readIORef myref
        print ("hi " ++ show (ref))
        modifyIORef myref (+1)))
    void (forkIO (forever $ do
        ref <- readIORef myref
        print ("hey " ++ show (ref))
        modifyIORef myref (+1)))
    forever $ do
        ref <- readIORef myref
        print ("hello " ++ show (ref))
        modifyIORef myref (+1)
I would use an MVar for this.
inc mvar = forever $ do
    v <- takeMVar mvar
    print v
    putMVar mvar (v+1)

main = do
    mvar <- newMVar 1
    forkIO (inc mvar)
    forkIO (inc mvar)
    inc mvar
It is important that the print occur between takeMVar and putMVar, while the MVar is empty; otherwise another thread may empty the MVar and execute its print.
You could use atomicModifyIORef'. It would look something like:
increment ref = forever $ do
    val <- atomicModifyIORef' ref (\old -> (old + 1, old))
    print val

main = do
    ref <- newIORef 0
    forkIO $ increment ref
    forkIO $ increment ref
    increment ref

How to force main thread to wait for all its child threads to finish in Haskell

In the following Haskell code, how can I force the main thread to wait until all its child threads have finished?
I was not able to use forkFinally as described in the section "Terminating the Program" of the Control.Concurrent documentation (http://hackage.haskell.org/package/base-4.7.0.2/docs/Control-Concurrent.html).
I get the desired result when using TMVar, but I want to do this with TVar.
Please help.
module Main
    where

import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM

type TInt = TVar Int

transTest :: TInt -> Int -> IO ()
transTest n t = do
    atomically $ do
        t1 <- readTVar n
        doSomeJob t
        t2 <- readTVar n
        writeTVar n t

doSomeJob :: Int -> STM ()
doSomeJob t = do
    x <- newTVar 0
    let l = 10^6*t
    forM_ [1..l] (\i -> do
        writeTVar x i )

main :: IO ()
main = do
    n <- newTVarIO 0
    let v = 5
    forkIO (transTest n v)
    let v = 3
    forkIO (transTest n v)
    let v = 7
    forkIO (transTest n v)
    let v = 1
    forkIO (transTest n v)
    r <- atomically $ readTVar n
    putStrLn("Last updated value = " ++ (show r))
What I did in the past was to create a little MVar for each forked thread and then use forkFinally to fork the threads such that at the very end, each thread would put a dummy value into the MVar (i.e. I used the MVar as a synchronisation primitive). I could then call takeMVar on those MVars to wait.
I wrapped it into a little helper function:
forkThread :: IO () -> IO (MVar ())
forkThread proc = do
    handle <- newEmptyMVar
    _ <- forkFinally proc (\_ -> putMVar handle ())
    return handle
Using this, your code could be changed to something like
    -- Fork four threads
    threads <- forM [5, 3, 7, 1] (\v -> forkThread (transTest n v))
    -- Wait for all of them
    mapM_ takeMVar threads
However, that was before I read the (most excellent) book "Parallel and Concurrent Programming in Haskell" by Simon Marlow, which made me aware of the async package. The package provides an abstraction which not only takes care of all these things, so you can write just
    -- Runs 'transTest n {5,3,7,1}' in parallel and waits for all threads
    _ <- mapConcurrently (transTest n) [5, 3, 7, 1]
...it also takes care of things such as (asynchronous) exceptions.
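Since the question asks specifically for a TVar-based solution, here is one possible sketch: a TVar counts the threads that are still running, each thread decrements it from its forkFinally handler, and the main thread blocks in STM until the count reaches zero. The helper name forkAllAndWait is made up, and the worker bodies are simplified stand-ins for transTest.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.STM
import Control.Monad (forM_)

-- Fork every action and block until all of them have finished
-- (hypothetical helper; the count of running threads lives in a TVar).
forkAllAndWait :: [IO ()] -> IO ()
forkAllAndWait actions = do
  running <- newTVarIO (length actions)
  forM_ actions $ \act ->
    -- The handler runs whether the thread ends normally or by exception.
    forkFinally act (\_ -> atomically (modifyTVar' running (subtract 1)))
  -- 'check' retries the transaction until the condition holds, so this
  -- line sleeps until the last thread has decremented the counter.
  atomically (readTVar running >>= \n -> check (n == 0))

main :: IO ()
main = do
  n <- newTVarIO (0 :: Int)
  -- Simplified stand-ins for 'transTest n v'.
  forkAllAndWait [atomically (writeTVar n v) | v <- [5, 3, 7, 1]]
  r <- readTVarIO n
  putStrLn ("Last updated value = " ++ show r)
```

Which value ends up in n still depends on scheduling. The sketch only guarantees that all four writes have happened before the final read.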

suppress runtime error "Irrefutable pattern failed for pattern Data.Maybe.Just b"

I have the following function:
loopMyQ s q m = forever $ do
    q' <- atomically $ readTVar q
    let Just b = PSQ.findMin q' --irrefutable pattern here in case the queue has just been created
        duetime = (PSQ.prio b) + 2.000
    now <- getPOSIXTime
    when (now > duetime) (transMit2 s now q m)
The problem is that when the PSQ has just been created as an empty PSQ, the Just cannot match, and I get an irrefutable pattern error at runtime. This occurs exactly once, since the queue is apparently filled later and Just b then always matches.
I have tried testing whether the queue is empty and acting on that in my function, but this made the whole thing run twice as slow.
Since this apparently does no harm, can the error somehow be suppressed, e.g. with a compiler option, or would I need to catch the exception and ignore it (which might also cost extra time)?
You are probably better off using retry if the queue is empty: the STM action will not be re-run until the TVar holding the queue has been updated!
loopMyQ s q m = forever $ do
    b <- atomically $ do
        q' <- readTVar q
        case PSQ.findMin q' of
            Just b  -> return b
            Nothing -> retry
    let duetime = (PSQ.prio b) + 2.000
    now <- getPOSIXTime
    when (now > duetime) (transMit2 s now q m)
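To see the blocking behaviour of retry in isolation, here is a small self-contained sketch. A plain list in a TVar stands in for the PSQ, and the name waitForFirst is made up for this example.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

-- Block until the TVar holds a non-empty list, then return its head
-- (hypothetical helper; the list stands in for the priority queue).
waitForFirst :: TVar [a] -> IO a
waitForFirst tv = atomically $ do
  xs <- readTVar tv
  case xs of
    []      -> retry      -- suspends until another thread writes to tv
    (x : _) -> return x

main :: IO ()
main = do
  tv <- newTVarIO ([] :: [Int])
  -- A second thread fills the "queue" after 0.1 seconds.
  _ <- forkIO $ do
    threadDelay 100000
    atomically (writeTVar tv [42])
  -- The main thread sleeps inside the transaction instead of polling.
  waitForFirst tv >>= print
```

The waiting thread does not spin while the list is empty: the runtime re-runs the transaction only after a TVar it has read is written to.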
Assuming that once your queue is non-empty, it will never be empty again, one thing you could do is do the expensive version (check for non-empty) only until it becomes non-empty, then switch to the cheap version.
loopMyQ s q m = do
    q' <- atomically $ readTVar q
    case PSQ.findMin q' of
        Nothing -> loopMyQ s q m
        Just b -> do
            body b
            forever $ do
                q' <- atomically $ readTVar q
                let Just b = PSQ.findMin q'
                body b
  where body b = do
            let duetime = 2 + PSQ.prio b
            now <- getPOSIXTime
            when (now > duetime) (transMit2 s now q m)

Limiting memory usage when reading files

I'm a Haskell beginner and thought this would be good exercise. I have an
assignment where I need to read file in a thread A, handle the file lines
in threads B_i, and then output the results in thread C.
I have implemented this far already, but one of the requirements is that we
cannot trust that the entire file fits into memory. I was hoping that lazy
IO and garbage collector would do this for me, but alas the memory usage
keeps rising and rising.
The reader thread (A) reads the file with readFile which is then zipped
with line numbers and wrapped in Just. These zipped lines are then written
to Control.Concurrent.Chan. Each consumer thread B has its own channel.
Each consumer reads their own channel when it has data and if the regex
matches, it's outputted to their own respective output channel wrapped
within Maybe (made of lists).
The printer checks the output channel of each of the B threads. If none of
the results (line) is Nothing, the line is printed. Since at this point
there should be no reference to the older lines, I thought that the garbage
collector would be able to release these lines, but alas I seem to be in
the wrong here.
The .lhs file is in here:
http://gitorious.org/hajautettujen-sovellusten-muodostamistekniikat/hajautettujen-sovellusten-muodostamistekniikat/blobs/master/mgrep.lhs
So the question is, how do I limit the memory usage, or allow the garbage
collector to remove the lines.
Snippets as requested. Hopefully the indentation isn't too badly destroyed :)
data Global = Global {done :: MVar Bool, consumers :: Consumers}
type Done = Bool
type Linenum = Int
type Line = (Linenum, Maybe String)
type Output = MVar [Line]
type Input = Chan Line
type Consumers = MVar (M.Map ThreadId (Done, (Input, Output)))
type State a = ReaderT Global IO a

producer :: [Input] -> FilePath -> State ()
producer c p = do
    liftIO $ Main.log "Starting producer"
    d <- asks done
    f <- liftIO $ readFile p
    mapM_ (\l -> mapM_
        (liftIO . flip writeChan l) c)
        $ zip [1..] $ map Just $ lines f
    liftIO $ modifyMVar_ d (return . not)

printer :: State ()
printer = do
    liftIO $ Main.log "Starting printer"
    c <- (fmap (map (snd . snd) . M.elems)
        (asks consumers >>= liftIO . readMVar))
    uniq' c
  where head' :: Output -> IO Line
        head' ch = fmap head (readMVar ch)
        tail' = mapM_ (liftIO . flip modifyMVar_
            (return . tail))
        cont ch = tail' ch >> uniq' ch
        printMsg ch = readMVar (head ch) >>=
            liftIO . putStrLn . fromJust . snd . head
        cempty :: [Output] -> IO Bool
        cempty ch = fmap (any id)
            (mapM (fmap ((==) 0 . length) . readMVar ) ch)
        {- Return false unless none are Nothing -}
        uniq :: [Output] -> IO Bool
        uniq ch = fmap (any id . map (isNothing . snd))
            (mapM (liftIO . head') ch)
        uniq' :: [Output] -> State ()
        uniq' ch = do
            d <- consumersDone
            e <- liftIO $ cempty ch
            if not e
                then do
                    u <- liftIO $ uniq ch
                    if u then cont ch else do
                        liftIO $ printMsg ch
                        cont ch
                else unless d $ uniq' ch
Concurrent programming offers no defined execution order unless you enforce one yourself with MVars and the like. So it's likely that the producer thread puts all or most of the lines in the Chan before any consumer reads them off and passes them on. Another architecture that should fit the requirements is to have thread A call the lazy readFile and put the result in an MVar. Each consumer thread then takes the MVar, reads a line, and puts the remainder back before proceeding to handle the line. Even then, if the output thread can't keep up, the number of matching lines stored on its Chan can build up arbitrarily.
What you have is a push architecture. To really make it work in constant space, think in terms of demand driven. Find a mechanism such that the output thread signals to the processing threads that they should do something, and such that the processing threads signal to the reader thread that they should do something.
Another way to do this is to have chans of limited size instead -- so the reader thread blocks when the processor threads haven't caught up, and so the processor threads block when the output thread hasn't caught up.
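One way to get channels of limited size is TBQueue from the stm package, where a write blocks while the queue is full. The sketch below shows the idea with a single worker. The names feed and countMatching, the capacity 16 and the generated input are all assumptions made for this example; in the real program the lines would come from the lazily read file.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM

-- Reader side (hypothetical name): push lines into a bounded queue.
-- writeTBQueue blocks while the queue is full, so the reader can never
-- be more than the queue capacity ahead of the worker.
feed :: TBQueue (Maybe String) -> [String] -> IO ()
feed q ls = do
  mapM_ (atomically . writeTBQueue q . Just) ls
  atomically (writeTBQueue q Nothing) -- end-of-input marker

-- Worker side (hypothetical name): drain the queue until Nothing and
-- return how many lines satisfied the predicate.
countMatching :: (String -> Bool) -> TBQueue (Maybe String) -> IO Int
countMatching p q = go 0
  where
    go n = do
      item <- atomically (readTBQueue q)
      case item of
        Nothing -> return n
        Just l  -> let n' = if p l then n + 1 else n
                   in n' `seq` go n'

main :: IO ()
main = do
  q      <- newTBQueueIO 16 -- at most 16 lines in flight (arbitrary choice)
  result <- newEmptyMVar
  _ <- forkIO (countMatching (elem '7') q >>= putMVar result)
  -- Generated input stands in for 'lines <$> readFile path'.
  feed q (map show [1 .. 100000 :: Int])
  takeMVar result >>= print
```

Because the list passed to feed is consumed lazily and each line becomes garbage once the worker has handled it, only about a queue's worth of lines should be live at any time.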
As a whole, the problem in fact reminds me of Tim Bray's widefinder benchmark, although the requirements are somewhat different. In any case, it led to a widespread discussion on the best way to implement multicore grep. The big punchline was that the problem is IO bound, and you want multiple reader threads over mmapped files.
See here for more than you'll ever want to know: http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder

Resources