A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts whatever is happening in the thread, and I need to complete processing a job out of the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?

Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
   ... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
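
Applied to the original question (process jobs from a channel until a signal variable is filled), the same orElse idea might look like the sketch below. It assumes the MVar and Chan are replaced by a TMVar and a TChan; the names nextEvent and drainUntilStopped are made up for illustration.

```haskell
import Control.Concurrent.STM

-- Either the stop signal or the next job, whichever is available.
-- The stop signal is tried first, so it wins when both are ready.
-- If neither is ready, the transaction blocks until one changes.
nextEvent :: TMVar () -> TChan a -> STM (Maybe a)
nextEvent stop jobs =
  (Nothing <$ takeTMVar stop) `orElse` (Just <$> readTChan jobs)

-- Process jobs until the stop signal arrives; returns the results in order.
drainUntilStopped :: TMVar () -> TChan a -> (a -> IO b) -> IO [b]
drainUntilStopped stop jobs process = go []
  where
    go acc = do
      ev <- atomically (nextEvent stop jobs)
      case ev of
        Nothing  -> pure (reverse acc)
        Just job -> do
          r <- process job  -- runs outside the transaction, so a job is never cut short
          go (r : acc)
```

Because the stop signal is only checked between jobs, each job runs to completion, which is the property throwTo could not give.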

How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call – except that orElse composes well, whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM

Related

Can I use IORef as a kind of "mutex" in this way?

I have an operation that must be executed in a mutually exclusive way. In other languages I can do something like (Python-like pseudocode):
with myLock:
    # here lock is acquired
    do mutual exclusive operation
    # here lock is released
Sure, I can do it in Haskell with an MVar: take it and then put it back. But I want to do it with an IORef, like this:
-- somewhere
duringOperation :: IORef Bool
.....
-- operation execution (the \case syntax needs the LambdaCase extension):
mayIStart <- atomicModifyIORef' duringOperation $ \case
  -- tuple is treated as (lock/keep-locked, may I start the operation)
  True -> (True, False)
  False -> (True, True)
if mayIStart then do
  -- here I am doing my operation in a mutually exclusive way
  ...
  -- after completion I reset the duringOperation flag
  atomicWriteIORef duringOperation False
else
  -- something else...
It looks like a typical "atomic" or "synchronized" flag in other languages. But I am not sure how safe it is in Haskell to use an IORef for this purpose. Again, my idea is to do it with IORef. Is it really safe (will the operation really be mutually exclusive)?

Yes, it's safe. That is precisely what the atomic in atomicModifyIORef means. From the Fine Documentation:
Atomically modifies the contents of an IORef.
This function is useful for using IORef in a safe way in a multithreaded program. If you only have one IORef, then using atomicModifyIORef to access and modify it will prevent race conditions.
I believe you do not need to write atomically when releasing the "mutex", i.e. writeIORef duringOperation False should be fine -- believing that the operation is running when it isn't is safe, just less efficient.
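
A self-contained version of the flag idea could be sketched as follows. The helper name tryExclusive is made up for illustration, and the use of finally to release the flag when the action throws is an addition that is not in the question.

```haskell
import Control.Exception (finally)
import Data.IORef

-- Run the action only if the flag is currently clear.
-- Nothing means another caller holds the flag and the action was skipped.
tryExclusive :: IORef Bool -> IO a -> IO (Maybe a)
tryExclusive busy action = do
  -- New flag value is always True; the result says whether we were the one who set it.
  mayStart <- atomicModifyIORef' busy (\held -> (True, not held))
  if mayStart
    then (Just <$> action) `finally` atomicWriteIORef busy False
    else pure Nothing
```

Unlike an MVar, this never blocks: a caller that loses the race simply gets Nothing back and has to decide what to do instead. The sketch also does not mask asynchronous exceptions between setting the flag and installing the finally handler.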

TMVar, but without the buffer?

I'm trying to do communication between Haskell lightweight threads. Threads want to send each other messages for communication and synchronisation.
I was originally using TMVar for this, but I've just realised that the semantics are wrong: a TMVar will store one message in it internally, so posting a message to an empty TMVar won't block. It'll only block if you post a message to a full TMVar.
Can anyone suggest a similar STM IPC construct which:
will cause all writes to block until the message is consumed;
will cause all reads to block until a message is provided?
i.e. a zero-length pipe would be ideal; but I don't think BoundedChan would be happy if I gave it a capacity of 0. (Also, it's not STM.)

If I understand your problem correctly, I don't think you can, since the transactional guarantees mean that transaction A can't read from transaction B's write until transaction B is committed, at which point it can no longer block.
TMVar is the closest you're going to get if you're using STM. With IO, you may be able to build a structure which only completes a write when a reader is available (this structure may already exist, but I'm not aware of it).

I'd suggest to reformulate the two requirements:
will cause all writes to block until the message is consumed;
will cause all reads to block until a message is provided.
The problem is with terms block and consumed/provided. With STM there is no notion of block, there is just retry, which has a different semantics: It restarts the current transaction - it doesn't wait until something happens (this could cause deadlocks). So we can't say "block until ...", we can only say something like "the transaction succeeds only when ...".
Similarly, what does "until a message is consumed/provided" mean? Since transactions are atomic, it can only be "until the transaction that consumed/provided a message succeeded".
So let's try to reformulate:
will cause all writes to retry until a transaction that consumes the message succeeds;
will cause all reads to retry until a transaction that provides a message succeeds.
But now the first point doesn't make sense: If a write retries, there is no message to be consumed, the transaction didn't pause, it's been discarded and started over - possibly producing a different message!
In other words: data can leave an STM transaction only when the transaction succeeds (completes). This is by design - transactions are always atomic from the point of view of the outside world / other transactions - you can never observe the results of only a part of a transaction. You can never observe two transactions interacting.
So a 0-length queue is a bad analogy - it would never allow any data to pass through. At the end of any transaction it would have to be empty, so no data would ever get across.
Nevertheless I believe it'll be possible to reformulate the requirements according to your goals and subsequently find a solution.

You say you would be happy with one side or the other being in IO rather than STM. So then it is not too hard to code this up. Let's start with the version that has receiving in IO. To make this happen, the receiver will have to initiate the handshake.
type SynchronousVar a = TChan (TMVar a)
send :: SynchronousVar a -> a -> STM ()
receive :: SynchronousVar a -> IO a
send svar a = do
  tmvar <- readTChan svar
  putTMVar tmvar a
receive svar = do
  tmvar <- newEmptyTMVarIO
  atomically $ writeTChan svar tmvar
  atomically $ takeTMVar tmvar
A similar protocol can be written that has sending start the handshake.
type SynchronousVar a = TChan (a, TMVar ())
send :: SynchronousVar a -> a -> IO ()
receive :: SynchronousVar a -> STM a
send svar a = do
  tmvar <- newEmptyTMVarIO
  atomically $ writeTChan svar (a, tmvar)
  atomically $ takeTMVar tmvar
receive svar = do
  (a, tmvar) <- readTChan svar
  putTMVar tmvar ()
  return a
Probably, if you really need synchronous communication, this is because you want two-way communication (i.e. the action that's running in IO wants to know something about the thread it's synchronizing with). It is not hard to extend the above protocol to pass off a tad more information about the synchronization (by adding it to the one-tuple in the former case or to the TMVar in the latter case).
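
To see the receiver-initiated handshake run end to end, here is a small sketch. It repeats the first protocol so the block is self-contained, gives send the type STM () (since putTMVar returns ()), and adds a demo function that is purely illustrative.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM

type SynchronousVar a = TChan (TMVar a)

-- Sender side, in STM: retries until some receiver has announced itself.
send :: SynchronousVar a -> a -> STM ()
send svar a = do
  tmvar <- readTChan svar
  putTMVar tmvar a

-- Receiver side, in IO: announce a fresh reply slot, then wait for it to fill.
receive :: SynchronousVar a -> IO a
receive svar = do
  tmvar <- newEmptyTMVarIO
  atomically (writeTChan svar tmvar)
  atomically (takeTMVar tmvar)

-- Illustrative only: one sender thread, one receive on the calling thread.
demo :: IO String
demo = do
  svar <- newTChanIO
  _ <- forkIO (atomically (send svar "hello"))
  receive svar
```

The sender's transaction cannot commit before a receiver has registered, which is the "write blocks until a reader is present" half of the original requirement.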

Implement main server loop in Haskell?

What is the generally accepted way to implement the main loop of a server that needs to wait on a heterogeneous set of events? That is the server should wait (not busywait) until one of the following occurs:
new socket connection
data available on an existing socket
OS signal
third-party library callbacks

I think you're thinking in terms of a C paradigm with a single thread, nonblocking I/O, and a select() call.
You can manage to write something like that in Haskell, but Haskell has much more to offer:
lightweight threads
safe and efficient concurrent data primitives like MVar and Chan
the Big Gun: Software Transactional Memory
I recommend you fork a new thread for every separate point of contact with the outside world, and keep everything coordinated with STM.
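
As a rough sketch of that recommendation: each event source gets its own lightweight thread, and all of them feed one STM queue that the main loop reads. The Event type and the function names are invented for the example; real sources would block on sockets or signals rather than walk a fixed list.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM

-- Hypothetical events; a real server would carry sockets, signal numbers, etc.
data Event = NewConnection Int | DataReady Int String | Shutdown
  deriving (Eq, Show)

-- One thread per event source; each pushes what it sees into the shared queue.
spawnSource :: TQueue Event -> [Event] -> IO ()
spawnSource queue events =
  () <$ forkIO (mapM_ (atomically . writeTQueue queue) events)

-- The main loop blocks (no busy-waiting) until some source produces an event,
-- and returns the events handled before Shutdown, in arrival order.
serverLoop :: TQueue Event -> IO [Event]
serverLoop queue = go []
  where
    go seen = do
      ev <- atomically (readTQueue queue)
      case ev of
        Shutdown -> pure (reverse seen)
        _        -> go (ev : seen)
```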

Use takeMVar and putMVar to synchronize between threads. Each blocks the calling thread until the operation can proceed: takeMVar waits for a full MVar, putMVar for an empty one.
Read the GHC docs.

I'd like to make it clear I think the two solutions posted first are better than this one for the specific problem you have, but here's a way to solve the type of problem you presented.
A simple way round this is to take your definitions like
data SocketConn = ....
data DataAvail = ...
data OSSignal = ...
data Callback = ...
and combine them into a single sum type with one handler:
data ServerEvent = Soc SocketConn | Dat DataAvail | Sig OSSignal | Call Callback
handleEvent :: ServerEvent -> IO ()
handleEvent (Soc s) = ....
handleEvent (Dat d) = ....
handleEvent (Sig o) = ....
handleEvent (Call c) = ....
Like I say, read up on the other answers!

Software Transactional Memory (STM) is the main way to do a multi-way wait.
However, by the looks of things, in your case you probably just want to spawn a separate Haskell thread for each task, and let each such thread block while there's nothing happening.
You wouldn't want to create a thousand OS threads, but a thousand Haskell threads is no trouble at all.
(If these threads need to coordinate from time to time, then again, STM is probably the simplest, most reliable way to do that.)

How can one implement a forking try-catch in Haskell?

I want to write a function
forkos_try :: IO (Maybe α) -> IO (Maybe α)
which takes a command x. x is an imperative operation which first mutates state, and then checks whether that state is messed up or not. (It does not do anything external, which would require some kind of OS-level sandboxing to revert the state.)
if x evaluates to Just y, forkos_try returns Just y.
otherwise, forkos_try rolls back state, and returns Nothing.
Internally, it should fork() into threads parent and child, with x running on child.
if x succeeds, child should keep running (returning x's result) and parent should die
otherwise, parent should keep running (returning Nothing) and child should die
Question: What's the way to write something with equivalent, or more powerful semantics than forkos_try? N.B. -- the state mutated (by x) is in an external library, and cannot be passed between threads. Hence, the semantic of which thread to keep alive is important.
Formally, "keep running" means "execute some continuation rest :: Maybe α -> IO () ". But, that continuation isn't kept anywhere explicit in code.
For my case, I think it will (for the time being) work to write it in a different style, using forkOS (which takes the entire computation child will run), since I can write an explicit expression for rest. But it troubles me that I can't figure out how to do this with the primitive function forkOS -- one would think it would be general enough to support any specific case (which could appear as a high-level API, like forkos_try).
EDIT -- please see the example code with explicit rest if the problem's still not clear [ http://pastebin.com/nJ1NNdda ].
p.s. I haven't written concurrency code in a while; hopefully my knowledge of POSIX fork() is correct! Thanks in advance.

Things are a lot simpler to reason about if you model state explicitly.
someStateFunc :: (s -> Maybe (a, s))
-- inside some other function
case someStateFunc initialState of
  Nothing -> ... -- it failed. stick with initial state
  Just (a, newState) -> ... -- it succeeded. do something with
                            -- the result and new state
With immutable state, "rolling back" is simple: just keep using initialState. And "not rolling back" is also simple: just use newState.
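
For a concrete instance of this pattern, consider the tiny sketch below. The stack state and the names pop and stepOrKeep are invented purely for illustration.

```haskell
-- Hypothetical state: a stack of Ints; popping fails on an empty stack.
pop :: [Int] -> Maybe (Int, [Int])
pop []       = Nothing
pop (x : xs) = Just (x, xs)

-- Run one step. On failure the old state is returned untouched ("rollback");
-- on success the new state is returned alongside the result.
stepOrKeep :: (s -> Maybe (a, s)) -> s -> (Maybe a, s)
stepOrKeep f s0 = case f s0 of
  Nothing      -> (Nothing, s0)
  Just (a, s1) -> (Just a, s1)
```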
So...I'm assuming from your explanation that this "external library" performs some nontrivial IO effects that are nevertheless restricted to a few knowable and reversible operations (modify a file, an IORef, etc). There is no way to reverse some things (launch the missiles, write to stdout, etc), so I see one of two choices for you here:
clone the world, and run the action in a sandbox. If it succeeds, then go ahead and run the action in the Real World.
clone the world, and run the action in the real world. If it fails, then replace the Real World with the snapshot you took earlier.
Of course, both of these are actually the same approach: fork the world. One world runs the action, one world doesn't. If the action succeeds, then that world continues; otherwise, the other world continues. You are proposing to accomplish this by building upon forkOS, which would clone the entire state of the program, but this would not be sufficient to deal with, for example, file modifications. Allow me to suggest instead an approach that is nearer to the simplicity of immutable state:
tryIO :: IO s -> (s -> IO ()) -> IO (Maybe a) -> IO (Maybe a)
tryIO save restore action = do
  initialState <- save
  result <- action
  case result of
    Nothing -> restore initialState >> return Nothing
    Just x -> return (Just x)
Here you must provide some data structure s, and a way to save to and restore from said data structure. This allows you the flexibility to perform any cloning you know to be necessary. (e.g. save could copy a certain file to a temporary location, and then restore could copy it back and delete the temporary file. Or save could copy the value of certain IORefs, and then restore could put the value back.) This approach may not be the most efficient, but it's very straightforward.
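
As a usage sketch of the IORef case, the block below repeats tryIO for self-containment and adds two made-up helpers: withdraw, which mutates first and validates afterwards, and safeWithdraw, which wraps it.

```haskell
import Data.IORef

tryIO :: IO s -> (s -> IO ()) -> IO (Maybe a) -> IO (Maybe a)
tryIO save restore action = do
  initialState <- save
  result <- action
  case result of
    Nothing -> restore initialState >> return Nothing
    Just x  -> return (Just x)

-- Hypothetical action: debit a balance, failing if it would go negative.
withdraw :: IORef Int -> Int -> IO (Maybe Int)
withdraw balance amount = do
  modifyIORef' balance (subtract amount)        -- mutate first...
  new <- readIORef balance
  pure (if new < 0 then Nothing else Just new)  -- ...then validate

-- save reads the IORef, restore writes the saved value back.
safeWithdraw :: IORef Int -> Int -> IO (Maybe Int)
safeWithdraw balance amount =
  tryIO (readIORef balance) (writeIORef balance) (withdraw balance amount)
```

Note that this is not atomic with respect to other threads: between the mutation and the restore, another thread could observe the invalid balance.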

Removing the contents of a Chan or MVar in a single discrete step

I'm writing a discrete simulation where request values from multiple threads accumulate in a centralized queue. Every n milliseconds, a manager wakes up to process requests. When the manager wakes up, it should retrieve all of the contents of the central queue in a single discrete step. While processing these, any client threads attempting to submit to the queue should block. When processing completes, the queue reopens and the manager goes back to sleep.
What's the best way to do this? The retry behavior of STM isn't really what I want. If I use a Chan or MVar, there's no way to prevent clients from enqueuing additional requests during processing. One approach is to use an MVar as a mutex on a Chan holding the queue. Are there other ways to do this?

I'd have to benchmark under your expected contention levels to know exactly what the best solution is, but here's my guess.
Use an MVar containing [Item], whatever your item type is. Initialize the MVar with newMVar []. To add an element to the central list, use modifyMVar_ var (return . (item :)), where var is the MVar and item is what you are adding to the list. Use takeMVar at the start of the processing pass, and putMVar var [] at the end of it.
First, note that this isn't a queue, internally. If you want to handle things in the order they were added, reverse the list after extracting it.
Second, so long as these are the only operations you perform on the MVar, this is free of race conditions. That's because the MVar is initialized full, and each operation is "take the contents of the MVar, put something else in." Operations may block while waiting for the latter part of that, but this can't deadlock and there will be no lost updates.
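
Put together, the scheme described above might look like this minimal sketch. The function names submit and processBatch are invented for the example.

```haskell
import Control.Concurrent.MVar

-- Add one request. Blocks while the manager is holding the MVar empty.
submit :: MVar [a] -> a -> IO ()
submit var item = modifyMVar_ var (return . (item :))

-- Take every pending request in one step, process the batch, then reopen.
processBatch :: MVar [a] -> ([a] -> IO b) -> IO b
processBatch var process = do
  pending <- takeMVar var             -- clients now block inside submit
  result <- process (reverse pending) -- reverse to get oldest first
  putMVar var []                      -- reopen the queue, empty
  return result
```

If the processing function can throw, the MVar would be left empty and clients would block forever; wrapping the body in modifyMVar instead of the takeMVar/putMVar pair is one way to guard against that.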

Isn't there a problem with MVar [a] though? When there are no items to be read we want readers to be blocked, but since the MVar actually contains [] this doesn't happen.
