Unmask async exceptions in STM transaction

Unmask async exceptions in STM transaction - haskell

I have a function that posts a request to an STM channel and then blocks the thread waiting for that request's completion. If my blocked thread is terminated by an async exception then I want to post a cancellation request to that STM channel. It currently looks like this:
runRequest channel request = mask $ \restore -> do
(reqId, resultVar) <- restore . atomically $ requestPostSTM channel request
onException
(restore . atomically $ do
maybeResult <- readTVar resultVar
case maybeResult of
Just x -> return x
Nothing -> retry)
(atomically $ requestCancelSTM channel reqId)
This is almost what I need, but there is a small possibility that async exception would arrive right after requestPostSTM transaction succeeds, but still before we exit restore block. If this happens, then I will have my request posted, but wouldn't be able to post a cancellation for it.
Ideally I would like to change the second line to something like
(reqId, resultVar) <- atomically . restore $ requestPostSTM channel request
This way I would be sure that no async exception could sneak in after I have requestPostSTM transaction commited. But this, of course, doesn't compile because restore is IO a -> IO a, not STM a -> STM a.
Is there any way I could unmask async exceptions only for duration of my STM transaction? Or maybe another way to have a guarantee that either an async exception arrives and my first STM transaction aborts or my first STM transaction commits and no async exception could terminate my thread until I install a handler for it with onException?

It is better to call requestPostSTM with async exceptions masked. All STM actions, that retries, are interruptible, so they will be aborted in case of async exception, yet exception can't arrive in unexpected point. The rule is: if your action allocates resource (i.e. something, that should be deallocated), then you should run it with async exceptions masked, and rely on interruptible actions to abort the action at some known points (or pull async exceptions with allowInterrupt manually).

Related

Behavior of async package

What would the following code using the async package do:
action <- async $ mapM_ someFunc someList
wait action
Will this merely spawn a single thread in which mapM_ occurs? (Implying that this has no benefit over just mapM_ someFunc someList)
Or will it perform the mapM_ action asynchronously (or is mapConcurrently the only way to get such behavior)?

Will this merely spawn a single thread in which mapM_ occurs?
Yes, it will fork a thread and immediately block waiting for the mapM_ to finish and return a () (or to throw an exception).
The async package is very simple; you might like to look at the source to see how it all works together and learn more about the underlying haskell concurrency primitives.

TMVar, but without the buffer?

I'm trying to do communication between Haskell lightweight threads. Threads want to send each other messages for communication and synchronisation.
I was originally using TMVar for this, but I've just realised that the semantics are wrong: a TMVar will store one message in it internally, so positing a message to an empty TMVar won't block. It'll only block if you post a message to a full TMVar.
Can anyone suggest a similar STM IPC construct which:
will cause all writes to block until the message is consumed;
will cause all reads to block until a message is provided?
i.e. a zero-length pipe would be ideal; but I don't think BoundedChan would be happy if I gave it a capacity of 0. (Also, it's not STM.)

If I understand your problem correctly, I don't think you can, since the transactional guarantees mean that transaction A can't read from transaction B's write until transaction B is committed, at which point it can no longer block.
TMVar is the closest you're going to get if you're using STM. With IO, you may be able to build a structure which only completes a write when a reader is available (this structure may already exist, but I'm not aware of it).

I'd suggest to reformulate the two requirements:
will cause all writes to block until the message is consumed;
will cause all reads to block until a message is provided.
The problem is with terms block and consumed/provided. With STM there is no notion of block, there is just retry, which has a different semantics: It restarts the current transaction - it doesn't wait until something happens (this could cause deadlocks). So we can't say "block until ...", we can only say something like "the transaction succeeds only when ...".
Similarly, what does "until a message is consumed/provided" mean? Since transactions are atomic, it can only be "until the transaction that consumed/provided a message succeeded".
So let's try to reformulate:
will cause all writes to retry until a transaction that consumes the message succeeds;
will cause all reads to retry until a transaction that provides a message succeeds.
But now the first point doesn't make sense: If a write retries, there is no message to be consumed, the transaction didn't pause, it's been discarded and started over - possibly producing a different message!
In other words: Any data can ever leave a STM transaction only when it succeeds (completes). This is by design - the transactions are always atomic from the point of view of the outside world / other transactions - you can never observe results of only a part of a transaction. You can never observe two transactions interacting.
So a 0-length queue is a bad analogy - it will never allow to pass any data though. At the end of any transaction, it'll have to have to be empty, so no data will ever pass through.
Nevertheless I believe it'll be possible to reformulate the requirements according to your goals and subsequently find a solution.

You say you would be happy with one side or the other being in IO rather than STM. So then it is not too hard to code this up. Let's start with the version that has receiving in IO. To make this happen, the receiver will have to initiate the handshake.
type SynchronousVar a = TChan (TMVar a)
send :: SynchronousVar a -> a -> STM a
receive :: SynchronousVar a -> IO a
send svar a = do
tmvar <- readTChan svar
putTMVar tmvar a
receive svar = do
tmvar <- newEmptyTMVarIO
atomically $ writeTChan svar tmvar
atomically $ takeTMVar tmvar
A similar protocol can be written that has sending start the handshake.
type SynchronousVar a = TChan (a, TMVar ())
send :: SynchronousVar a -> a -> IO a
receive :: SynchronousVar a -> STM a
send svar a = do
tmvar <- newEmptyTMVarIO
atomically $ writeTChan svar (a, tmvar)
atomically $ takeTMVar tmvar
receive svar = do
(a, tmvar) <- readTChan svar
putTMvar tmvar ()
return a
Probably, if you really need synchronous communication, this is because you want two-way communication (i.e. the action that's running in IO wants to know something about the thread it's synchronizing with). It is not hard to extend the above protocol to pass off a tad more information about the synchronization (by adding it to the one-tuple in the former case or to the TMVar in the latter case).

Is it safe to use trace inside a STM stransaction?

I have a transaction failing indefinitely for some reason, and I would like to use trace instructions inside. For example, to print the state of the MVar's before executing the transaction in this fragment:
data_out <- atomically $ do
rtg_state <- takeTMVar ready_to_go
JobDescr hashid url <- T.readTBChan next_job_descr
case rtg_state of
Ready_RTG n -> do
putTMVar ready_to_go $ Processing_RTG n
putTMVar start_harvester_browser hashid
putTMVar next_test_url_to_check_chan hashid
putTMVar next_harvest_url hashid
return (n,hashid,url)
_ -> retry
Would that make the program segfault or miss-behave?

As long as you use trace for debug purposes only you should be OK. As a general rule, just assume that in the final production-ready version of your program there will be no traces around.
You will never observe segfaults from trace. Its "unsafety" stems from it injecting observable effects in pure code. E.g., in STM, when a transaction retries, its effects are assumed to be rolled back. If trace was used to send a message to the user, you can't roll that back. If the output of trace triggers a missile launch, you will have to deal with international side effects. If trace instead just signals the developer with "FYI, the code is doing X", this is not part of the core logic of the program, and is perfectly fine.

Ensure IO computations are run in a specific thread

I need to make sure that some actions are run on a specific OS thread. I wrote an API where this thread runs a loop listening to a TQueue and executes the given commands. From the API user side, there is an opaque value that is really just a newtype over this queue.
One problem is that what I really need is to embed arbitrary actions (type IO a), but I believe I can't directly exchange messages of that type. So I currently have something like this (pseudo code) :
makeSafe :: RubyInterpreter -> IO a -> IO (Either RubyError a)
makeSafe (RubyInterpreter q) a = do
mv <- newEmptyTMVarIO
-- embedded is of type IO (), letting me send this in my queue
let embedded = handleErrors a >>= atomically . putTMVar mv
atomically (writeTQueue q (SomeMessage embedded))
atomically (readTMVar mv)
(for more details, this is for the hruby package)
edit - clarifications :
Being able to send actions of type IO a would be nicer, but is not my main objective.
My main problem is that you can shoot yourself in the foot with this API, for example if there is a makeSafe call in the IO action that is passed as a parameter, this will hang.
My secondary problem is that this solution feels a bit contrived, and I wondered if there was a nicer/safer solution around.

State-dependent event processing with state updates

I want to use FRP (i.e., reactive banana 0.6.0.0) for my project (a GDB/MI front-end). But I have troubles declaring the event network.
There are commands from the GUI and there are stop events from GDB. Both need to be handled and handling them depends on the state of the system.
My current approach looks like this (I think this is the minimum required complexity to show the problem):
data Command = CommandA | CommandB
data Stopped = ReasonA | ReasonB
data State = State {stateExec :: Exec, stateFoo :: Int}
data StateExec = Running | Stopped
create_network :: NetworkDescription t (Command -> IO ())
create_network = do
(eCommand, fCommand) <- newEvent
(eStopped, fStopped) <- newEvent
(eStateUpdate, fStateUpdate) <- newEvent
gdb <- liftIO $ gdb_init fStopped
let
eState = accumE initialState eStateUpdate
bState = stepper initialState eState
reactimate $ (handleCommand gdb fStateUpdate <$> bState) <#> eCommand
reactimate $ (handleStopped gdb fStateUpdate <$> bState) <#> eStopped
return fCommand
handleCommand and handelStopped react on commands and stop events depending on the current state. Possible reactions are calling (synchronous) GDB I/O functions and firing state update events. For example:
handleCommand :: GDB -> ((State -> State) -> IO ()) -> State -> Command -> IO ()
handleCommand gdb fStateUpdate state CommandA = case stateExec state of
Running -> do
gdb_interrupt gdb
fStateUpdate f
where f state' = state' {stateFoo = 23}
The problem is, when f gets evaluated by accumE, state' sometimes is different from state.
I am not 100% sure why this can happen as I don't fully understand the semantics of time and simultaneity and the order of "reactimation" in reactive banana. But I guess that state update functions fired by handleStopped might get evaluated before f thus changing the state.
Anyway, this event network leads to inconsistent state because the assumptions of f on the "current" state are sometimes wrong.
I have been trying to solve this problem for over a week now and I just cannot figure it out. Any help is much appreciated.

It looks like you want to make a eStateUpdate event occur whenever eStop or eCommand occurs?
If so, you can simply express it as the union of the two events:
let
eStateUpdate = union (handleCommand' <$> eCommand)
(handleStopped' <$> eStopped)
handleCommand' :: Command -> (State -> State)
handleStopped' :: Stopped -> (State -> State)
eState = accumE initialState eStateUpdate
etc.
Remember: events behave like ordinary values which you can combine to make new ones, you're not writing a chain of callback functions.
The newEvent function should only be used if you want to import an event from the outside world. That's the case for eCommand and eStopped, as they are triggered by the external GDB, but the eStateUpdate event seems to be internal to the network.
Concerning behavior of your current code, reactive-banana always does the following things when receiving an external event:
Calculate/update all event occurrences and behavior values.
Run the reactimates in order.
But it may well happen happen that step 2 triggers the network again (for instance via the fStateUpdate function), in which case the network calculates new values and calls the reactimates again, as part of this function call. After this, flow control returns to the first sequence of reactimates that is still being run, and a second call to fStateUpdate will have strange effects: the behaviors inside the network have been updated already, but the argument to this call is still an old value. Something like this:
reactimate1
reactimate2
fStateUpdate -- behaviors inside network get new values
reactimate1'
reactimate2'
reactimate3 -- may contain old values from first run!
Apparently, this is tricky to explain and tricky to reason about, but fortunately unnecessary if you stick to the guidelines above.
In a sense, the latter part embodies the trickiness of writing event handlers in the traditional style, whereas the former part embodies the (relative) simplicity of programming with events in FRP-style.
The golden rule is:
Do not call another event handler while handling an event.
You don't have to follow this rule, and it can be useful at times; but things will become complicated if you do that.

As far as I can see, FRP seems not to be the right abstraction for my problem.
So I switched to actors with messages of type State -> IO State.
This gives me the required serialization of events and the possibility to do IO when updating the state. What I loose is the nice description of the event network. But it's not too bad with actors either.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string