Getting parallel IO while accounting for failure

Getting parallel IO while accounting for failure - haskell

I'm making several API calls that are encapsulated in a type alias:
type ConnectT a = EitherT String (RWST ConnectReader ConnectWriter ConnectState IO) a
Here's a simplified version of a function which connects to two separate APIs:
connectBoth :: ConnectT ()
connectBoth = do
a <- connectAPI SomeAPI someFunction
b <- connectAPI OtherAPI otherFunction
connectAPI OtherAPI (b `sendTo` a)
The final call in connectBoth is very time sensitive (and the transactions are of a financial nature). I figure a and b could be evaluated in parallel, and with lazy IO I should be able to do this:
b <- a `par` connectAPI OtherAPI otherFunction
The documentation for par says that it Indicates that it may be beneficial to evaluate the first argument in parallel with the second.
Does this work with IO?
Can I get any more guaranteed than "it may be beneficial?"
Or if I want greater guarantees will I need to use an MVar and liftIO . forkIO?
If I evaluate a first, I think I can use eitherT to check if a succeeded. But if I evaluate both at the same time I get confused. Here is the situation:
If only a failed, I will retry a, if that fails I will run a function that manually reverses b
If only b failed, I will retry b, write to the log in RWS and return left
if both fail write to the log in RWS and return left
if both succeed process c (which is not as time sensitive as a or b)
But if I evaluate both in parallel, then how can I identify which one failed? If I use eitherT immediately after a then a will evaluate first. If I use it after b then I won't be able to tell which one failed.
Is there a way I can evaluate the IO calls in parallel but respond differently depending on which one (if any) fails? Or am I left with a choice of parallelism vs failure mitigation?

The solution you are looking for will use forkIO and MVars.
par
par is for multiprocessor parallelism, it helps evaluate terms in parallel. It doesn't help with IO. If you do
do
a <- (someProcess :: IO a)
...
By the time you reach ... everything from the IO action has happened (if we ignore evil lazy IO) to a point that a can be determined entirely by ordinary evaluation. This means that by the time you do b <- someOtherProcess, all of someProcess is already done. It's too late to do anything in parallel.
EitherT
You can explicitly examine the Either e a result of an EitherT e m a. runEitherT :: EitherT e m a -> m (Either e a) makes the success or failure explicit in the underlying monad. We can lift that right back into EitherT to make a computation that always succeeds (sometimes with an error) from one that sometimes fails.
import Control.Monad.Trans.Class
examine :: (MonadTrans t, Monad m) => EitherT e m a -> t m (Either e a)
examine = lift . runEitherT
forkIO
The simplest solution for doing two things in IO is forkIO. It starts another lightweight thread that you can forget about.
If you run a value with your transformer stack, there will be four pieces of data when you are done. The state ConnectState, the written ConnectWriter log, whether the computation was successful, and, depending on whether or not it was successful, either the value or the error.
EitherT String (RWST ConnectReader ConnectWriter ConnectState IO) a
^ ^ ^ ^ ^
If we write out the structure of this, it looks like
(RWST ConnectReader ConnectWriter ConnectState IO) (Either String a)
^ ^ ^ ^ ^
ConnectReader -> ConnectState -> IO (Either String a, ConnectState, ConnectWriter)
^ ^ ^ ^ ^
All four of those pieces of information end up in the result of the IO action. If you fork your stack, you need to decide what to do with all of them when you join the results back together. You have already decided that you want to explicitly handle the Either String a. The ConnectWriters can probably be combined together with <>. You will need to decide what to do with ConnectState.
We'll make a fork that returns all four of these pieces of data by shoving them into an MVar.
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad.IO.Class
forkConnectT :: ConnectT a -> ConnectT (MVar (Either String a, ConnectState, ConnectWriter))
forkConnectT cta = do
result <- liftIO newEmptyMVar
r <- lift ask
s <- lift get
liftIO $ forkIO $ do
state <- runRWST (runEitherT cta) r s
putMVar result state
return result
Later, when we want the result, we can try and see if it is ready. We'll handle the Either for success and failure explicitly, while handling the state and writer behind the scenes.
import Data.Traversable
tryJoinConnectT :: MVar (Either String a, ConnectState, ConnectWriter) -> ConnectT (Maybe (Either String a))
tryJoinConnectT result = liftIO (tryTakeMVar result) >>= traverse reintegrate
Behind the scenes we reintegrate the ConnectWriter by telling this ConnectT to write what was accumulated in the other thread. You will need to decide what to do to combine the two states.
reintegrate :: (a, ConnectState, ConnectWriter) -> ConnectT a
reintegrate (a, s, w) = do
-- Whatever needs to be done with the state.
-- stateHere <- lift get
lift $ tell w
return a
If we want to wait until the result is ready, we can block reading the MVar. This offers less opportunity for handling errors such as timeouts.
joinConnectT :: MVar (Either String a, ConnectState, ConnectWriter) -> ConnectT (Either String a)
joinConnectT result = liftIO (takeMVar result) >>= reintegrate
Example
Putting it all together, we can fork a task in parallel, do something in this thread explicitly examining the success or failure, join with the result from the other thread, and reason about what to do next with explicit Eithers representing success or failure from each process.
connectBoth :: ConnectT ()
connectBoth = do
bVar <- forkConnectT $ connectAPI OtherAPI otherFunction
a <- examine $ connectAPI SomeAPI someFunction
b <- joinConnectT bVar
...
Going farther
If you are paranoid, you will also want to handle exceptions (some of which can be handled by forkFinally) and asynchronous exceptions. You will need to decide whether to bundle these exceptions into your stack or treat IO like it can always throw exceptions.
Consider using async instead of forkIO and MVars.
monad-control (which you already have dependencies on via either) provides mechanisms for building up, one transformer at a time, the type that represents the state of a monad transformer stack. We wrote this by hand as (Either String a, ConnectState, ConnectWriter). If you are going to grow your transformer stack, you might want to get this from MonadTransControl instead. You can restore the state from the forked thread(see MonadBaseControl section) in the parent to inspect it. You will still need to decide how to deal with the data from the two states..

Related

Apparent redundant calls in a IO monad?

Here is a snippet of code taken from the Haskell GPipe project (commented by myself, save the line with "Really?"). In the memoize function, I don't understand why its author call the getter a second time to cache a newly computed value. It doesn't seem necessary to me and it can be removed without apparent bad consequences (at least, a medium-sized project of mine still works without it).
{- | A map (SN stands for stable name) to cache the results 'a' of computations 'm a'.
The type 'm' ends up being constrained to 'MonadIO m' in the various functions using it.
-}
newtype SNMap m a = SNMap (HT.BasicHashTable (StableName (m a)) a)
newSNMap :: IO (SNMap m a)
newSNMap = SNMap <$> HT.new
memoize :: MonadIO m
=> m (SNMap m a) -- ^ A "IO call" to retrieve our cache.
-> m a -- ^ The "IO call" to execute and cache the result.
-> m a -- ^ The result being naturally also returned.
memoize getter m = do
s <- liftIO $ makeStableName $! m -- Does forcing the evaluation make sense here (since we try to avoid it...)?
SNMap h <- getter
x <- liftIO $ HT.lookup h s
case x of
Just a -> return a
Nothing -> do
a <- m
SNMap h' <- getter -- Need to redo because of scope. <- Really?
liftIO $ HT.insert h' s a
return a

I get it. The scope term used is not related to the Haskell 'do' scope. It is simply that a computation could recursively update the cache when evaluated (as in the scopedM function in the same module...). It is kind of obvious in retrospect.

Haskell Conduit: is it possible to optionally have the result of a source?

I have the following types built from Data.Conduit:
type Footers = [(ByteString, ByteString)]
type DataAndConclusion = ConduitM () ByteString IO Footers
The idea of the second type being "produce a lot of ByteStrings, and if you can produce all of them, return a Footers". The condition is because conduits are governed by downstream functions, so the consumer of DataAndConclusion may have no need to consume all its items, and in that case the return wouldn't be reached. Which is precisely the behavior that I need. But when the end of the source is reached, I would like to have the produced Footers. This would be useful for example if the DataAndConclusions were incrementally computing an MD5 and such MD5 was only needed if the entire message was processed by the downstream (for example, downstream could be simply sending it through the network, but it doesn't make sense to finish computing and send the MD5 if the socket was closed before the last piece was sent by downstream).
So, basically I want to have something with this signature to consume a DataAndConclusions:
type MySink = Sink ByteString IO ()
mySink :: MySink
mySink = ...
difficultFunction :: ConduitM () a2 m r1 -> ConduitM a2 Void m r2 -> m (Maybe r1)
Question is, is there any way to implement "difficultFunction"? How?

There should be definitely a nice solution, but I wasn't able to construct it using ConduitM primitives. Something with signature
ConduitM i a m r1 -> ConduitM a o m r2 -> ConduitM i o m (Maybe r1, r2)
Looks like a primitive function with this signature would be a good addition for the conduit library.
Nevertheless, #danidiaz's suggestion about StateT lead me to the following generic solution that lifts the whole computation to WriterT internally in order to remember the output of the first conduit, if it's reached:
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Writer
import Data.Conduit
import Data.Monoid
import Data.Void
difficultFunction :: (Monad m)
=> ConduitM () a2 m r1 -> ConduitM a2 Void m r2
-> m (r2, Maybe r1)
difficultFunction l r = liftM (fmap getLast) $ runWriterT (l' $$ r')
where
l' = transPipe lift l >>= lift . tell . Last . Just
r' = transPipe lift r
(untested!)

This would be useful for example if the DataAndConclusions were
incrementally computing an MD5 and such MD5 was only needed if the
entire message was processed by the downstream
Instead of relying on the return value of the upstream conduit, in this case perhaps you could accumulate the ongoing MD5 computation in a StateT layer beneath ConduitM, and access it after running the conduit.
The other part of the puzzle is detecting that the producer has finished first. Sinks can detect upstream end-of-input in await calls. You could write a Sink that notifies you of upstream termination in its own result type, perhaps with a Maybe.
But what if you are given a Sink that doesn't already do that? We would need a function like Sink i m r -> Sink i m (Maybe r). "Given a Sink that may halt early, return a new Sink that returns Nothing if upstream finishes first". But I don't know how to write that function.
Edit: This conduit sets an IORef to True when it detects upstream termination:
detectUpstreamClose :: IORef Bool -> Conduit i IO i
detectUpstreamClose ref = conduit
where
conduit = do
m <- await
case m of
Nothing -> liftIO (writeIORef ref True)
Just i -> do
yield i
conduit
detectUpstreamClose could be inserted in a pipeline, and the IORef could be checked afterwards.

Haskell: Resume monadic computation inside IO

I'm trying to "resume" a monadic computation from within IO and fearing that I may be out of luck. The situation is the following:
ioBracketFoo :: (a - > IO b) -> IO b
withBar :: MonadIO m => (a -> m b) -> m b
withBar action = liftIO $ ioBracketFoo $ \foo -> runMagic (action f)
Basically I want to resume my (unknown) monadic computation from within ioBracketFoo. If it were not a bracketing function then I'd be able to get the resource using res <- liftIO getFoo and release it later, and I wouldn't have to resume my monadic computation from within IO.
Is there any other creative use of lift or similar to make this possible?

This problem is sloved by MonadBaseControl. MonadBaseControl provides the functions to store and restart a monadic computation. You'll require an additional dependency to MonadBaseControl, which will prevent unstorable monads from beeing used in your bracket-funciton, for example
There is a tutorial on fp-complete, that should answer all basic questions.

Is it possible to run several instances of hint in parallel?

Is there any way to start two hint interpreters and at runtime & subsequently assign smaller computations to either one or the other? When I invoke hint for a small expression (e.g. typed into a website) then, - without reliable testing -, it seems to me as if the time to start/load hint is approximately one second. If the instance is already started that second would be shaved.
The hint seems to have no function where I can start it and keep it nicely pending for later use.
(Auto)Plugins would be a further option of course but I think that is more suitable for modules and less elegant for smaller computations.

The GHC api, which hint is implemented in terms of (the various plugin packages are, too), does not support concurrent use.
You can leave hint running, though. It's an instance of MonadIO.
interpreterLoop :: (MonadIO m, Typeable) a => Chan ((MVar a, String)) -> InterpreterT m ()
interpreterLoop ch = do
(mvar, command) <- liftIO $ readChan ch
a <- interpret command $ argTypeWitness mvar
liftIO $ putMVar mvar a
interpreterLoop ch
where
argTypeWitness :: MVar a -> a
argTypeWitness = undefined -- this value is only used for type checking, never evaluated
runInLoop :: Typeable a => Chan ((MVar a, String)) -> String -> IO a
runInLoop ch command = do
mvar <- newEmptyMVar
writeChan ch (mvar, command)
takeMVar mvar
(I didn't test this, so I may have missed a detail or two, but the basic idea will work.)

Composable atomic-like operations

I want to compose operations that may fail, but there is a way to roll back.
For example - an external call to book a hotel room, and an external call to charge a credit card. Both of those calls may fail such as no rooms left, invalid credit card. Both have ways to roll back - cancel hotel room, cancel credit charge.
Is there a name for this type of (not real) atomic. Whenever i search for haskell transaction, I get STM.
Is there an abstraction, a way to compose them, or a library in haskell or any other language?
I feel you could write a monad Atomic T which will track these operations and roll them back if there is an exception.
Edit:
These operations may be IO operations. If the operations were only memory operations, as the two answers suggest, STM would suffice.
For example booking hotels would via HTTP requests. Database operations such as inserting records via socket communication.
In the real world, for irreversible operations there is a grace period before the operation will be done - e.g. credit cards payments and hotel bookings may be settled at the end of the day, and therefore it is fine to cancel before then.

This is exactly the purpose of STM. Actions are composed so that they succeed or fail together, automatically.
Very similar to your hotel room problem is the bank transaction example in Simon Peyton-Jones's chapter in "Beautiful Code": http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/beautiful.pdf

If you need to resort to making your own monad, it will look something like this:
import Control.Exception (onException, throwIO)
newtype Rollbackable a = Rollbackable (IO (IO (), a))
runRollbackable :: Rollbackable a -> IO a
runRollbackable (Rollbackable m) = fmap snd m
-- you might want this to catch exceptions and return IO (Either SomeException a) instead
instance Monad Rollbackable where
return x = Rollbackable $ return (return (), x)
Rollbackable m >>= f
= do (rollback, x) <- m
Rollbackable (f x `onException` rollback)
(You will probably want Functor and Applicative instances also, but they're trivial.)
You would define your rollbackable primitive actions in this way:
rollbackableChargeCreditCard :: CardNumber -> CurrencyAmount -> Rollbackable CCTransactionRef
rollbackableChargeCreditCard ccno amount = Rollbackable
$ do ref <- ioChargeCreditCard ccno amount
return (ioUnchargeCreditCard ref, ref)
ioChargeCreditCard :: CardNumber -> CurrencyAmount -> IO CCTransactionRef
-- use throwIO on failure
ioUnchargeCreditCard :: CCTransactionRef -> IO ()
-- these both just do ordinary i/o
Then run them like so:
runRollbackable
$ do price <- rollbackableReserveRoom roomRequirements when
paymentRef <- rollbackableChargeCreditCard ccno price
-- etc

If your computations could be done only with TVar like things then STM is perfect.
If you need a side effect (like "charge Bob $100") and if there is a error later issue a retraction (like "refund Bob $100") then you need, drumroll please: Control.Exceptions.bracketOnError
bracketOnError
:: IO a -- ^ computation to run first (\"acquire resource\")
-> (a -> IO b) -- ^ computation to run last (\"release resource\")
-> (a -> IO c) -- ^ computation to run in-between
-> IO c -- returns the value from the in-between computation
Like Control.Exception.bracket, but only performs the final action if there was an
exception raised by the in-between computation.
Thus I could imagine using this like:
let safe'charge'Bob = bracketOnError (charge'Bob) (\a -> refund'Bob)
safe'charge'Bob $ \a -> do
rest'of'transaction
which'may'throw'error
Make sure you understand where to use the Control.Exception.mask operation if you are in a multi-threaded program and try things like this.
And I should emphasize that you can and should read the Source Code to Control.Exception and Control.Exception.Base to see how this is done in GHC.

You really can do this with the clever application of STM. The key is to separate out the IO parts. I assume the trouble is that a transaction might appear to succeed initially, and fail only later on. (If you can recognize failure right away, or soon after, things are simpler):
main = do
r <- reserveHotel
c <- chargeCreditCard
let room = newTVar r
card = newTVar c
transFailure = newEmptyTMVar
rollback <- forkIO $ do
a <- atomically $ takeTMVar transFailure --blocks until we put something here
case a of
Left "No Room" -> allFullRollback
Right "Card declined" -> badCardRollback
failure <- listenForFailure -- A hypothetical IO action that blocks, waiting for
-- a failure message or an "all clear"
case failures of
"No Room" -> atomically $ putTMVar (Left "No Room")
"Card Declined" -> atomically $ putTMVar (Right "Card declined")
_ -> return ()
Now, there's nothing here that MVars couldn't handle: all we're doing is forking a thread to wait and see if we need to fix things. But you'll presumably be doing some other stuff with your card charges and hotel reservations...

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Getting parallel IO while accounting for failure - haskell

Related

Apparent redundant calls in a IO monad?

Haskell Conduit: is it possible to optionally have the result of a source?

Haskell: Resume monadic computation inside IO

Is it possible to run several instances of hint in parallel?

Composable atomic-like operations

Categories

Resources