Haskell: TVar: orElse

Haskell: TVar: orElse - haskell

Is the "else" part of orElse called when a transaction is retried due to another transaction writing to a TVar it had read, or only when retry is explicitly called?

If you have
orElse a b
then b is only run if retry is called explicitly in a. Otherwise orElse would essentially become nondeterministic. (The rerunning of transactions that is done by the STM runtime is transparent and should not effect the outcome of any computation.)

Related

Can I use IORef as a kind of "mutex" in this way?

I have an operation that must be executed in the mutual exclusive way. In other languages I can do something like (Python like pseudocode):
with myLock:
# here lock is acquired
do mutual exclusive operation
# here lock is released
Sure, I can do it in Haskell with MVar: to take it and then to put it back. But I want to do it with IORef as:
-- somewhere
duringOperation :: IORef Bool
.....
-- operation execution:
mayIStart <- atomicModifyIORef' duringOperation $ \case
-- tuple is treated as (lock/keep-locked, may I start the operation)
True -> (True, False)
False -> (True, True)
if mayIStart then do
-- here I am doing my operation in a mutual exclusive way
...
-- after completion I reset duringOperation flag
atomicWriteIORef' _duringOperation False
else
-- something else...
It looks like typical "atomic" or "synchronized" flag in other languages. But I am not sure how it's safe in Haskell and with an usage of IORef for such goal. Again, my idea is to do it with IORef. Is it really safe (the operation will be really mutual exclusive)?

Yes, it's safe. That is precisely what the atomic in atomicModifyIORef means. From the Fine Documentation:
Atomically modifies the contents of an IORef.
This function is useful for using IORef in a safe way in a multithreaded program. If you only have one IORef, then using atomicModifyIORef to access and modify it will prevent race conditions.
I believe you do not need to write atomically when releasing the "mutex", i.e. writeIORef duringOperation False should be fine -- believing that the operation is running when it isn't is safe, just less efficient.

TMVar, but without the buffer?

I'm trying to do communication between Haskell lightweight threads. Threads want to send each other messages for communication and synchronisation.
I was originally using TMVar for this, but I've just realised that the semantics are wrong: a TMVar will store one message in it internally, so positing a message to an empty TMVar won't block. It'll only block if you post a message to a full TMVar.
Can anyone suggest a similar STM IPC construct which:
will cause all writes to block until the message is consumed;
will cause all reads to block until a message is provided?
i.e. a zero-length pipe would be ideal; but I don't think BoundedChan would be happy if I gave it a capacity of 0. (Also, it's not STM.)

If I understand your problem correctly, I don't think you can, since the transactional guarantees mean that transaction A can't read from transaction B's write until transaction B is committed, at which point it can no longer block.
TMVar is the closest you're going to get if you're using STM. With IO, you may be able to build a structure which only completes a write when a reader is available (this structure may already exist, but I'm not aware of it).

I'd suggest to reformulate the two requirements:
will cause all writes to block until the message is consumed;
will cause all reads to block until a message is provided.
The problem is with terms block and consumed/provided. With STM there is no notion of block, there is just retry, which has a different semantics: It restarts the current transaction - it doesn't wait until something happens (this could cause deadlocks). So we can't say "block until ...", we can only say something like "the transaction succeeds only when ...".
Similarly, what does "until a message is consumed/provided" mean? Since transactions are atomic, it can only be "until the transaction that consumed/provided a message succeeded".
So let's try to reformulate:
will cause all writes to retry until a transaction that consumes the message succeeds;
will cause all reads to retry until a transaction that provides a message succeeds.
But now the first point doesn't make sense: If a write retries, there is no message to be consumed, the transaction didn't pause, it's been discarded and started over - possibly producing a different message!
In other words: Any data can ever leave a STM transaction only when it succeeds (completes). This is by design - the transactions are always atomic from the point of view of the outside world / other transactions - you can never observe results of only a part of a transaction. You can never observe two transactions interacting.
So a 0-length queue is a bad analogy - it will never allow to pass any data though. At the end of any transaction, it'll have to have to be empty, so no data will ever pass through.
Nevertheless I believe it'll be possible to reformulate the requirements according to your goals and subsequently find a solution.

You say you would be happy with one side or the other being in IO rather than STM. So then it is not too hard to code this up. Let's start with the version that has receiving in IO. To make this happen, the receiver will have to initiate the handshake.
type SynchronousVar a = TChan (TMVar a)
send :: SynchronousVar a -> a -> STM a
receive :: SynchronousVar a -> IO a
send svar a = do
tmvar <- readTChan svar
putTMVar tmvar a
receive svar = do
tmvar <- newEmptyTMVarIO
atomically $ writeTChan svar tmvar
atomically $ takeTMVar tmvar
A similar protocol can be written that has sending start the handshake.
type SynchronousVar a = TChan (a, TMVar ())
send :: SynchronousVar a -> a -> IO a
receive :: SynchronousVar a -> STM a
send svar a = do
tmvar <- newEmptyTMVarIO
atomically $ writeTChan svar (a, tmvar)
atomically $ takeTMVar tmvar
receive svar = do
(a, tmvar) <- readTChan svar
putTMvar tmvar ()
return a
Probably, if you really need synchronous communication, this is because you want two-way communication (i.e. the action that's running in IO wants to know something about the thread it's synchronizing with). It is not hard to extend the above protocol to pass off a tad more information about the synchronization (by adding it to the one-tuple in the former case or to the TMVar in the latter case).

Haskell STM alwaysSucceeds

There is a function in haskell's stm library with the following type signature:
alwaysSucceeds :: STM a -> STM ()
From what I understand of STM in haskell, there are three ways that something can "go wrong" (using that term loosely) while an STM computation is executing:
The value of an TVar that has been read is changed by another thread.
An user-specified invariant is violated. This seems to usually be triggered by calling retry to make it start over. This effectively makes the thread block and then retry once a TVar in the read set changes.
An exception is thrown. Calling throwSTM causes this. This one differs from the first two because the transaction doesn't get restarted. Instead, the error is propagated and either crashes the program or is caught in the IO monad.
If these are accurate (and if they are not, please tell me), I can't understand what alwaysSucceeds could possibly do. The always function, which appears to be built on top of it, seems like it could be written without alwaysSucceeds as:
--This is probably wrong
always :: STM Bool -> STM ()
always stmBool = stmBool >>= check
The documentation for alwaysSucceeds says:
alwaysSucceeds adds a new invariant that must be true when passed to
alwaysSucceeds, at the end of the current transaction, and at the end
of every subsequent transaction. If it fails at any of those points
then the transaction violating it is aborted and the exception raised
by the invariant is propagated.
But since the argument is of type STM a (polymorphic in a), it can't use the value that the transaction returns for any part of the decision making. So, it seems like it would be looking for the different types of failures that I listed earlier. But what's the point of that? The STM monad already handles the failures. How would wrapping it in this function affect it? And why does the variable of type a get dropped, resulting in STM ()?

The special effect of alwaysSucceeds is not how it checks for failure at the spot it's run (running the "invariant" action alone should do the same thing), but how it reruns the invariant check at the end of transactions.
Basically, this function creates a user-specified invariant as in (2) above, that has to hold not just right now, but also at the end of later transactions.
Note that a "transaction" doesn't refer to every single subaction in the STM monad, but to a combined action that is passed to atomically.
I guess the a is dropped just for convenience so you don't have to convert an action to STM () (e.g. with void) before passing it to alwaysSucceeds. The return value will be useless anyhow for the later repeated checks.

How can one implement a forking try-catch in Haskell?

I want to write a function
forkos_try :: IO (Maybe α) -> IO (Maybe α)
which Takes a command x. x is an imperative operation which first mutates state, and then checks whether that state is messed up or not. (It does not do anything external, which would require some kind of OS-level sandboxing to revert the state.)
if x evaluates to Just y, forkos_try returns Just y.
otherwise, forkos_try rolls back state, and returns Nothing.
Internally, it should fork() into threads parent and child, with x running on child.
if x succeeds, child should keep running (returning x's result) and parent should die
otherwise, parent should keep running (returning Nothing) and child should die
Question: What's the way to write something with equivalent, or more powerful semantics than forkos_try? N.B. -- the state mutated (by x) is in an external library, and cannot be passed between threads. Hence, the semantic of which thread to keep alive is important.
Formally, "keep running" means "execute some continuation rest :: Maybe α -> IO () ". But, that continuation isn't kept anywhere explicit in code.
For my case, I think it will (for the time) work to write it in different style, using forkOS (which takes the entire computation child will run), since I can write an explicit expression for rest. But, it troubles me that I can't figure out how do this with the primitive function forkOS -- one would think it would be general enough to support any specific case (which could appear as a high-level API, like forkos_try).
EDIT -- please see the example code with explicit rest if the problem's still not clear [ http://pastebin.com/nJ1NNdda ].
p.s. I haven't written concurrency code in a while; hopefully my knowledge of POSIX fork() is correct! Thanks in advance.

Things are a lot simpler to reason about if you model state explicitly.
someStateFunc :: (s -> Maybe (a, s))
-- inside some other function
case someStateFunc initialState of
Nothing -> ... -- it failed. stick with initial state
Just (a, newState) -> ... -- it suceeded. do something with
-- the result and new state
With immutable state, "rolling back" is simple: just keep using initialState. And "not rolling back" is also simple: just use newState.
So...I'm assuming from your explanation that this "external library" performs some nontrivial IO effects that are nevertheless restricted to a few knowable and reversible operations (modify a file, an IORef, etc). There is no way to reverse some things (launch the missiles, write to stdout, etc), so I see one of two choices for you here:
clone the world, and run the action in a sandbox. If it succeeds, then go ahead and run the action in the Real World.
clone the world, and run the action in the real world. If it fails, then replace the Real World with the snapshot you took earlier.
Of course, both of these are actually the same approach: fork the world. One world runs the action, one world doesn't. If the action succeeds, then that world continues; otherwise, the other world continues. You are proposing to accomplish this by building upon forkOS, which would clone the entire state of the program, but this would not be sufficient to deal with, for example, file modifications. Allow me to suggest instead an approach that is nearer to the simplicity of immutable state:
tryIO :: IO s -> (s -> IO ()) -> IO (Maybe a) -> IO (Maybe a)
tryIO save restore action = do
initialState <- save
result <- action
case result of
Nothing -> restore initialState >> return Nothing
Just x -> return (Just x)
Here you must provide some data structure s, and a way to save to and restore from said data structure. This allows you the flexibility to perform any cloning you know to be necessary. (e.g. save could copy a certain file to a temporary location, and then restore could copy it back and delete the temporary file. Or save could copy the value of certain IORefs, and then restore could put the value back.) This approach may not be the most efficient, but it's very straightforward.

A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts what ever is happening in the thread and I need to complete processing a job out the the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?

Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar

How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse,
which allows them to be composed as alternatives, so that
the second is run if the ﬁrst retries (Section 3.4). This ability allows threads to wait for many things at once, like the
Unix select system call – except that orElse composes well,
whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string