Haskell Thread Communication Pattern Scenario - multithreading

You have two threads, a and b. Thread a is in a forever loop, listening on a blocking socket 1. Thread b is also in a forever loop, listening on blocking socket 2. Both socket 1 and socket 2 may return data at arbitrary times, so Thread a may be sleeping forever waiting for data whereas Thread b constantly gets data from the socket and goes on with its processing. That's the background.
Now suppose they need to share a dictionary. When Thread a gets some data (if ever) it adds a key value pair into the dictionary after some processing, and then continues to wait for more data. When Thread b receives data from its socket it first queries the dictionary to see if there is information related to the data it has received before going on with its processing. There are no deletions to the dictionary, only inserts and queries (I'd be interested if this makes a difference in the end solution).
In a standard imperative language like Python or C, this is pretty easy to do by making the dictionary available in both scopes and only querying it after a thread has acquired a lock, so Thread b always sees the most (well, almost) up-to-date dictionary.
In Haskell, I seem to be struggling to come up with a good implementation of this pattern. An MVar can only hold one item at a time, so it can't simply be that Thread a puts the dictionary in: a new update might occur, and it could not push the new dictionary until Thread b had fetched the old one from the MVar. On the other hand, if Thread b uses an MVar to send a ready signal ("ok!") to Thread a, it may be the case that Thread a is sleeping on its read socket, so it would be unable to send back data until its read socket unblocked! There are also channels, but that seems messy, since I would have to keep sending new dictionaries and Thread b would discard all but the last one.
The alternative solution that would work is to simply send the updates down a channel, and have thread B construct the dictionary for itself. However I'm wondering if there are better alternative solutions.
Thanks for taking the time to read this very long question!

You can use an MVar in the following way:
When thread A gets new data, it tries to get the dictionary with takeMVar. When that succeeds, it updates the dictionary and puts it back into the MVar
When thread B gets data, it tries to get the dictionary with takeMVar - in the above scenario where A seldom gets data that should succeed rather quickly on average. Then it does the lookup and puts the dictionary back.
As hammar pointed out, it's probably better to not directly use takeMVar and putMVar but rather wrap them in modifyMVar_ resp. modifyMVar to not leave the MVar empty if one thread gets an exception while using the dictionary.
In thread A, something like
modifyMVar_ mvar (\dict -> return (insert newStuff dict))
in thread B all you need is a simple readMVar (thanks to @hammar again for pointing that out).
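Putting the pieces together, an untested sketch of the whole pattern might look like the following; the socket reads are faked with threadDelay, and recvFromSocket1/recvFromSocket2 are placeholders for whatever blocking reads you actually have:
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Monad (forever)
import qualified Data.Map as Map

main :: IO ()
main = do
  dict <- newMVar Map.empty                   -- shared dictionary, starts "full"
  _ <- forkIO (threadA dict)
  threadB dict

-- Thread A: block on socket 1, then insert into the shared dictionary.
threadA :: MVar (Map.Map String Int) -> IO ()
threadA dict = forever $ do
  (key, val) <- recvFromSocket1               -- blocks until data arrives
  modifyMVar_ dict (return . Map.insert key val)

-- Thread B: block on socket 2, then do a non-destructive read of the dictionary.
threadB :: MVar (Map.Map String Int) -> IO ()
threadB dict = forever $ do
  key <- recvFromSocket2                      -- blocks until data arrives
  m   <- readMVar dict                        -- never leaves the MVar empty
  print (key, Map.lookup key m)

-- Fake sockets so the sketch compiles and runs on its own.
recvFromSocket1 :: IO (String, Int)
recvFromSocket1 = threadDelay 2000000 >> return ("answer", 42)

recvFromSocket2 :: IO String
recvFromSocket2 = threadDelay 500000 >> return "answer"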

Related

Sending variables from one python thread to another

Let's say I have a function that will run in its own thread, since it's getting serial data through a port.
def serialDataIncoming():
    device = Radar()
    device.connect(port=1, baudrate=256000)
    serialdata = device.startscan()
    for count, scan in enumerate(serialdata):
        distance = device.distance
        sector = device.angle
Now I want to run this in its own thread
try:
    thread.start_new_thread(serialDataIncoming, ())
except:
    pass  # error handling here
Now I want to add to the code of serialDataIncoming() a line where I send the distance and sector to another function to be processed and then sent somewhere else. Here is the issue: the data coming from "device" is continuously being sent, so I can experience a delay or even lose some data if I spend time inside the loop on another loop. So I want to create a new thread, and from that thread run a function that will receive data from the first thread, process it, and do whatever.
def dataProcessing():
    # random code here where I process the data
    pass
However, my issue is: how do I send both variables from one thread to the second thread? In my mind, with multiple threads the second thread would have to wait until it receives the variables and then start working; it's going to be sent a lot of data at the same time, so I might have to introduce a third thread that would hold that data and then feed it to the thread that processes it.
So the question is basically this: how would I write, in Python, the code that sends two variables to another thread, and how would that be written in the function used by the second thread?
To pass arguments to the thread function you can do:
def thread_fn(a, b, c):
    print(a, b, c)

thread.start_new_thread(thread_fn, ("asdsd", 123, False))
The list of arguments must be a tuple. However, in Python only one thread actually runs Python code at a time, so it may be more reliable (and simpler) to work out a way to do this with one thread. From the sounds of it you are polling the data, so this is not like file access, where the OS will notify the thread when it can wake up again once the file operation has completed (hence you won't get the kind of gains you would from multithreaded file access).

What's the best way to exit a Haskell program?

I've got a program which uses several threads. As I understand it, when thread 0 exits, the entire program exits, regardless of any other threads which might still be running.
The thing is, these other threads may have files open. Naturally, this is wrapped in exception-handling code which cleanly closes the files in case of a problem. That also means that if I use killThread (which is implemented via throwTo), the file should also be closed before the thread exits.
My question is, if I just let thread 0 exit, without attempting to stop the other threads, will all the various file handles be closed nicely? Does any buffered output get flushed?
In short, can I just exit, or do I need to manually kill threads first?
You can use Control.Concurrent.MVar to achieve this. An MVar is essentially a flag which is either "empty" or "full". A thread can try to read an MVar and if it is empty it blocks the thread. Wherever you have a thread which performs file IO, create an MVar for it, and pass it that MVar as an argument. Put all the MVars you create into a list:
main = do
    mvars <- sequence (replicate num_of_child_threads newEmptyMVar)
    returnVals <- sequence (zipWith (\m f -> f m)
                                    mvars
                                    (list_of_child_threads :: [MVar () -> IO a]))
Once a child thread has finished all the file operations you are worried about, it writes to its MVar. Then, instead of just writing killThread tid, you can do
mapM_ takeMVar mvars >> killThread tid
and wherever your thread would exit otherwise, just take all the MVars first.
See the documentation on GHC concurrency for more details.
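Fleshing that out into a self-contained sketch (the two workers and the file names here are invented for illustration):
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (replicateM)
import System.IO

main :: IO ()
main = do
  dones <- replicateM 2 newEmptyMVar
  _ <- forkIO (worker "a.log" (dones !! 0))
  _ <- forkIO (worker "b.log" (dones !! 1))
  -- ... the main thread's own work goes here ...
  mapM_ takeMVar dones            -- block until every worker has signalled
  putStrLn "all workers finished, safe to exit"

worker :: FilePath -> MVar () -> IO ()
worker path done =
  withFile path WriteMode (\h -> hPutStrLn h "some output")
    `finally` putMVar done ()     -- signal even if the worker dies with an exception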
From my testing, I have discovered a few things:
exitFailure and friends only work in thread 0. (The documentation actually says so, if you go to the trouble of reading it. These functions just throw exceptions, which are silently ignored in other threads.)
If an exception kills your thread, or your whole program, any open handles are not flushed. This is excruciatingly annoying when you're desperately trying to figure out exactly where your program crashed!
So it appears that if you want your stuff flushed before the program exits, you have to implement it yourself. Just letting thread 0 die doesn't flush anything and doesn't throw any exception; it just silently terminates all threads without running exception handlers.
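One simple way to implement that yourself for the main thread's own output, as a sketch (for handles owned by other threads, combine it with an MVar handshake like the one in the previous answer):
import Control.Exception (finally)
import System.IO

main :: IO ()
main = realWork `finally` hFlush stdout    -- flush however realWork ends
  where
    realWork = putStr "output with no trailing newline"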

Implement main server loop in Haskell?

What is the generally accepted way to implement the main loop of a server that needs to wait on a heterogeneous set of events? That is, the server should wait (not busy-wait) until one of the following occurs:
new socket connection
data available on an existing socket
OS signal
third-party library callbacks
I think you're thinking in terms of a C paradigm with a single thread, nonblocking I/O, and a select() call.
You can manage to write something like that in Haskell, but Haskell has much more to offer:
lightweight threads
safe and efficient concurrent data primitives like MVar and Chan
the Big Gun: Software Transactional Memory
I recommend you fork a new thread for every separate point of contact with the outside world, and keep everything coordinated with STM.
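As a rough, untested sketch of that shape (the event sources are faked with threadDelay): one lightweight thread per point of contact with the outside world, all funnelling into a single STM channel that the coordinating loop blocks on.
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

main :: IO ()
main = do
  events <- newTChanIO
  -- one lightweight thread per event source
  _ <- forkIO $ forever $ do
    threadDelay 700000                              -- stand-in for "accept a connection"
    atomically (writeTChan events "new connection")
  _ <- forkIO $ forever $ do
    threadDelay 300000                              -- stand-in for "data on a socket"
    atomically (writeTChan events "data available")
  -- the main loop sleeps until some source produces an event
  forever $ do
    ev <- atomically (readTChan events)
    putStrLn ("handling: " ++ ev)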
Use takeMVar and putMVar to synchronize between threads. They block the thread when the operation cannot proceed yet (takeMVar on an empty MVar, putMVar on a full one).
Read the GHC concurrency docs.
I'd like to make it clear I think the two solutions posted first are better than this one for the specific problem you have, but here's a way to solve the type of problem you presented.
A simple way round this is to take your definitions like
data SocketConn = ....
data DataAvail = ...
data OSSignal = ...
data Callback = ...
and define the unsimplified version of
data ServerEvent = Soc SocketConn | Dat DataAvail | Sig OSSignal | Call Callback
handleEvent :: ServerEvent -> IO ()
handleEvent (Soc s) = ....
handleEvent (Dat d) = ....
handleEvent (Sig o) = ....
handleEvent (Call c) = ....
Like I say, read up on the other answers!
Software Transactional Memory (STM) is the main way to do a multi-way wait.
However, by the looks of things, in your case you probably just want to spawn a separate Haskell thread for each task, and let each such thread block while there's nothing happening.
You wouldn't want to create a thousand OS threads, but a thousand Haskell threads is no trouble at all.
(If these threads need to coordinate from time to time, then again, STM is probably the simplest, most reliable way to do that.)

A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts whatever is happening in the thread, and I need to finish processing a job out of the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?
Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
   ... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
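A quick usage sketch of that select (the TMVar contents are made up): the transaction blocks until at least one of the TMVars is full, and hands back that value.
import Control.Concurrent.STM

select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar

main :: IO ()
main = do
  a <- newEmptyTMVarIO
  b <- newTMVarIO "b was full"
  x <- atomically (select [a, b])   -- a is empty, so this takes b's value
  putStrLn x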
How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call – except that orElse composes well, whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM

Removing the contents of a Chan or MVar in a single discrete step

I'm writing a discrete simulation where request values from multiple threads accumulate in a centralized queue. Every n milliseconds, a manager wakes up to process requests. When the manager wakes up, it should retrieve all of the contents of the central queue in a single discrete step. While processing these, any client threads attempting to submit to the queue should block. When processing completes, the queue reopens and the manager goes back to sleep.
What's the best way to do this? The retry behavior of STM isn't really what I want. If I use a Chan or MVar, there's no way to prevent clients from enqueuing additional requests during processing. One approach is to use an MVar as a mutex on a Chan holding the queue. Are there other ways to do this?
I'd have to benchmark under your expected contention levels to know exactly what the best solution is, but here's my guess.
Use an MVar containing [Item], whatever your item type is. Initialize the MVar with newMVar []. To add an element to the central list, use modifyMVar_ mvar (return . (item :)), where item is what you are adding to the list. Use takeMVar at the start of the processing pass, and putMVar mvar [] at the end of it.
First, note that this isn't a queue, internally. If you want to handle things in the order they were added, reverse the list after extracting it.
Second, so long as these are the only operations you perform on the MVar, this is free of race conditions. That's because the MVar is initialized full, and each operation is "take the contents of the MVar, put something else in." Operations may block while waiting for the latter part of that, but this can't deadlock and there will be no lost updates.
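As a sketch of that recipe (Item, the 100 ms interval, and the dummy client here are made up for illustration): clients prepend with modifyMVar_, and the manager drains the whole list with one takeMVar, leaving submitters blocked until it puts [] back.
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Monad (forever)

type Item = String                          -- stand-in item type

-- client threads call this; it blocks while the manager holds the MVar
submit :: MVar [Item] -> Item -> IO ()
submit box item = modifyMVar_ box (return . (item :))

-- manager: wake up every n milliseconds and drain everything in one discrete step
manager :: MVar [Item] -> IO ()
manager box = forever $ do
  threadDelay 100000                        -- n = 100 ms here
  items <- takeMVar box                     -- submitters block from here...
  mapM_ putStrLn (reverse items)            -- process in arrival order
  putMVar box []                            -- ...until the queue reopens

main :: IO ()
main = do
  box <- newMVar []
  _ <- forkIO (mapM_ (submit box) ["req1", "req2", "req3"])
  manager box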
Isn't there a problem with MVar [a] though? When there are no items to be read we want readers to be blocked, but since the MVar actually contains [] this doesn't happen.
