Using the Par monad with STM and Deterministic IO - haskell

I'm in the process of writing a report for an assignment in which I implemented a concurrent multicore branch and bound algorithm using the STM package and there was an issue I've come up against.
The implementation which uses STM is obviously in the IO monad since it both uses STM's 'atomically' and Concurrent's 'forkIO', but it is deterministic. Despite the use of a shared memory variable, the final result of the function will always be the same for the same input.
My question is, what are my options when it comes to getting out of IO, besides 'unsafePerformIO'? Should I even try and get it out of the IO monad, since the use of multiple cores could potentially affect other concurrent code that doesn't have the same guarantee for determinism.
I've heard of the Par monad package (although not used it), but STM exists in the IO monad, and in order to get thread safe global variables my only alternative to STM is MVars (that I'm aware of), which also exist in the IO monad.

Please do not use unsafePerformIO with STM. STM has side-effects under the hood, using unsafePerformIO hides these side effects, making your code deceptively impure and thus hard or dangerous to refactor. Try harder to see if the parallel package will help you.
One example of unsafe STM operations being unsafe is when you end up using the "pure" STM operation nested inside of another (perhaps by a higher level library). For example, the below code loops (terminates with <loop>) due to the nested STM operations. I recall older GHC versions crashing but can't seem to reproduce that behavior right now with GHC 7.0.1.
import Control.Concurrent
import Control.Concurrent.STM
import System.IO.Unsafe
import GHC.Conc.Sync
main = newTVarIO 5 >>= runComputation >>= print
runComputation :: TVar Int -> IO Int
runComputation tv = atomically $ do
let y = getFiveUnsafe tv + 1
writeTVar tv y
return y
getFiveUnsafe tv = unsafePerformIO . atomically $ do
x <- readTVar tv
writeTVar tv (x + 5)
return x
(I welcome other people editing and adding more convincing examples - I believe better ones exist)

STM and it's related functions cannot be used from unsafePerformIO but forkIO can be, and the new thread can call atomically safely. You can do something like this:
purifiedAlgorithm = unsafePerformIO $ do
rr <- newEmptyMVar
forkIO $ concurrentAlgorithm >> putMVar rr
takeMVar rr

Related

Can I use IORef as a kind of "mutex" in this way?

I have an operation that must be executed in the mutual exclusive way. In other languages I can do something like (Python like pseudocode):
with myLock:
# here lock is acquired
do mutual exclusive operation
# here lock is released
Sure, I can do it in Haskell with MVar: to take it and then to put it back. But I want to do it with IORef as:
-- somewhere
duringOperation :: IORef Bool
.....
-- operation execution:
mayIStart <- atomicModifyIORef' duringOperation $ \case
-- tuple is treated as (lock/keep-locked, may I start the operation)
True -> (True, False)
False -> (True, True)
if mayIStart then do
-- here I am doing my operation in a mutual exclusive way
...
-- after completion I reset duringOperation flag
atomicWriteIORef' _duringOperation False
else
-- something else...
It looks like typical "atomic" or "synchronized" flag in other languages. But I am not sure how it's safe in Haskell and with an usage of IORef for such goal. Again, my idea is to do it with IORef. Is it really safe (the operation will be really mutual exclusive)?
Yes, it's safe. That is precisely what the atomic in atomicModifyIORef means. From the Fine Documentation:
Atomically modifies the contents of an IORef.
This function is useful for using IORef in a safe way in a multithreaded program. If you only have one IORef, then using atomicModifyIORef to access and modify it will prevent race conditions.
I believe you do not need to write atomically when releasing the "mutex", i.e. writeIORef duringOperation False should be fine -- believing that the operation is running when it isn't is safe, just less efficient.

When using bracket with a Ptr as resource, can it be replaced with ForeignPtr?

My code uses a resource that can be described as a pointer; I'll use a void pointer here for simplicity. The resource must be closed after the computation with it finishes, so the Control.Exception.bracket function is a natural choice to make sure the code won't leak if an error occurs:
run :: (Ptr () -> IO a) -> IO a
run action = bracket acquireResource closeResource action
-- no eta reduction for clarity
The downside of this pattern is that the resource will always be closed after action completes. AFAIU this means that it isn't possible to do something like
cont <- run $ \ptr -> do
a <- someAction ptr
return (\x -> otherActionUsingResource ptr a x)
cont ()
The resource will already be close by the time cont is executed. Now my approach is to use a ForeignPtr instead:
run' :: (ForeignPtr () -> IO a) -> IO a
run' action = do
ptr <- acquireResource
foreignPtr <- newForeignPtr closeResourceFunPtr ptr
action foreignPtr
Now it seems that this is roughly equivalent to the first version, minor typing differences and resource closing latency aside. However, I do wonder whether this is true, or if I miss something. Can some error conditions can lead to different outcomes with those two versions? Are ForeignPtr safe to use in this way?
If you want to do this, I'd recommend avoiding that run', which makes it look like you're going to close the resource. Do something like this instead.
acquire :: IO (ForeignPtr ())
acquire action = mask $ \unmask -> do
ptr <- unmask acquireResource
newForeignPtr closeResourceFunPtr ptr
As Carl pointed out in a comment, it's important that exceptions be masked between acquiring the resource and installing the finalizer to close it; otherwise an asynchronous exception could be delivered in between and cause a resource leak.
The challenge with anything of this sort is that you're leaving it up to the user and/or garbage collector to make sure the resource gets freed. The original Ptr-based code made the lifespan explicit. Now it's not. Many people believe that explicit lifespans are better for critical resources. What ForeignPtr gives you, automatic finalization by the GC, these people consider poor design. So think carefully! Is this a cheap resource (like a little bit of malloced memory) that you just want to free eventually? Or is it something expensive (like a file descriptor) that you really want to be sure about?
Side note: Ptr () and ForeignPtr () aren't very idiomatic. Usually the type argument should be a Haskell type representing whatever is being pointed to.

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock protected, I would need to take a lock at every iteration in the pipe, and protect tune with taking the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem do not seem to force their users like either.
An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case performing som IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
The compression library allocated a bunch of memory off the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)
Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
(output, input) <- spawn Unbounded
-- input :: Input (Either (Context, Int) Foo)
-- output :: Output (Either (Context, Int) Foo)
let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left >-> toOutput output
as <- mapM async [io1, io2]
runEffect (fromInput >-> yourPipe >-> someConsumer)
mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
Regarding (2) there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe, but that is probably overkill for your purpose and only exists for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is just to use the following with idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource -> do
k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe -> do
runEffect (someProducer >-> yourPipe >-> someConsumer)
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.

Ensure IO computations are run in a specific thread

I need to make sure that some actions are run on a specific OS thread. I wrote an API where this thread runs a loop listening to a TQueue and executes the given commands. From the API user side, there is an opaque value that is really just a newtype over this queue.
One problem is that what I really need is to embed arbitrary actions (type IO a), but I believe I can't directly exchange messages of that type. So I currently have something like this (pseudo code) :
makeSafe :: RubyInterpreter -> IO a -> IO (Either RubyError a)
makeSafe (RubyInterpreter q) a = do
mv <- newEmptyTMVarIO
-- embedded is of type IO (), letting me send this in my queue
let embedded = handleErrors a >>= atomically . putTMVar mv
atomically (writeTQueue q (SomeMessage embedded))
atomically (readTMVar mv)
(for more details, this is for the hruby package)
edit - clarifications :
Being able to send actions of type IO a would be nicer, but is not my main objective.
My main problem is that you can shoot yourself in the foot with this API, for example if there is a makeSafe call in the IO action that is passed as a parameter, this will hang.
My secondary problem is that this solution feels a bit contrived, and I wondered if there was a nicer/safer solution around.

A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts what ever is happening in the thread and I need to complete processing a job out the the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?
Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse,
which allows them to be composed as alternatives, so that
the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the
Unix select system call – except that orElse composes well,
whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM

Resources