Multithreading - Does this new design cause any data races? - multithreading

I figured it would be a lot easier if I drew a picture of my problem. Here it is:
Everything that is black in the diagram is part of the old design. Everything that is blue is part of the new design. Basically, I need to add a new thread (Worker Thread C) that will handle most of the work that Worker Thread B used to do. Worker Thread A listens for real-time updates from an external application. When it receives an update, it posts a message to Worker Thread B. Worker Thread B will set its copy of the new data (it still needs it in the new design) and then notify the GUI Thread as well as Worker Thread C that new data has arrived.
The user will send a request from the GUI to the new thread (Worker Thread C). Worker Thread C will process the request using the last received copy of the data that originally came from Worker Thread A. So my question is: Will Worker Thread C always be using the latest copy of the data when processing a request with this new design? What if Worker Thread B is too slow to update and then the user submits a request from the GUI? Thanks!

If I'm not mistaken, worker A is conceptually different from workers B and C, right? It rather looks like B and C handle user requests in the background in order not to block the UI. So, there could be a whole list of these background workers performing UI operations, or none at all, while there will always be a worker A that pulls/receives updates.
Now, what I would do is that the worker A sends new data to the UI. The UI then uses this data in the next request. When it starts one of the workers like B or C, it just passes the data along with the other info that tells the thread what to do.
Note that you need to take care that you don't modify the data in different threads. The easiest way is to always copy the data when passing it between different parts, but that is often too expensive. Another easy way is to make the data constant. In worker A, you use a unique_ptr<Data> to accumulate the update and then send that data as a shared_ptr<Data const> to the UI thread. From that point on, this data is immutable (the compiler makes sure that you don't change it by accident) so it can be shared between threads without any further lock.
When creating a worker for a background operation, you pass in the shared_ptr&lt;Data const&gt;. If it needs to modify that data, it would first have to copy it, but usually that can be avoided.
Notes:
The basic idea is that you have either shared and immutable data or exclusive-owned and mutable data.
The data received from thread A is stored in the UI here, but conceptually it is part of the model in an MVC design. There, you only keep a reference to the last update; the earlier ones can be discarded. A worker thread still using an earlier update won't notice, because the data is refcounted using shared_ptr.
At some point, I would consider aborting the background workers. Computing anything based on old data is not necessary, so it could be worthwhile to not waste time on it but to restart based on recent data.
I'm assuming that the channels between the threads (message queues) are synchronized. If they are already synchronized, that is all that you need.
If you're using C++98, you will need auto_ptr instead of unique_ptr and Boost's shared_ptr.

Related

QSemaphore - implementing overwrite policy

I want to implement a ring buffer for classic Producer--Consumer interaction. In the future, both P and C will be implemented as permanent threads running during the data-processing task, and the GUI will be a third thread used only for displaying the actual data and coordinating starts and stops of data processing via user interaction. C may be too slow to fully process all incoming data, but only slightly and only for short periods of time. So I want to just allocate a ring buffer several of P's MTUs in size, but in any case, if C is too slow to process the existing data, it's okay to lose old data in favor of new data (overwrite policy).
I've read the QSemaphore example in the Qt help and realized that with the semaphore's acquires and releases I can only implement a discard policy, because acquiring a specified chunk in the queue will block until free space is available.
Are there any ways of implementing overwrite policy together with QSemaphore or I just need to go and implement another approach?
I've come to this solution. If we must push a portion of the source data to the ring buffer at any cost (it's OK to drop possible newly incoming data), we should use acquire() in the Producer part - that gives us the discard policy. If we need the overwrite policy, we should use tryAcquire() in the Producer - thus, at the earliest possible moment, only the newest data will be pushed to the ring buffer.

ActiveMQ CMS: Is there a way to use it without threading?

I took the sample code from Apache here: https://activemq.apache.org/components/cms/example
(the producer section specifically) and tried to rewrite it so it doesn't create any threads for producing. Instead, in my program's main thread, it creates a producer object and sets up the connection, session, destination, and so on. Then it sends messages using a message producer. This is all done in a singleton so that my program has just one Producer object and goes to it whenever it needs to dump a message onto one of my queues. The example code seems to create a producer for every thread and set everything up each time, just to send a message, then deletes everything - and it does this every time you want to produce something from your program.
I am crashing right when I try to call send on a message producer with any given message. I found out after some digging that after the send call it tries to lock a mutex and enter a critical section. I guess this is for threading? I don't use threads at all in my code so I guess it crashes because of that... Does anyone know a way to bypass this? I don't want to use multiple threads, I won't need to worry about two threads trying to call send at the same time or whatever the problem is that using mutexes is trying to solve.
You don't need to create a thread to run the producer in, but internally the library is going to use a couple of threads, as that is necessary to meet the API requirements. Also, just because you don't use multiple threads doesn't mean others won't, so the mutex is an internal requirement.
You are free to modify the example to only create a producer inside the main thread of the application, the example uses two threads because it is acting as both a producer and consumer.
One likely cause of the error you are receiving is because you did not initialize the ActiveMQ-CPP library:
activemq::library::ActiveMQCPP::initializeLibrary();

locking between 2 user threads

I have a main thread that creates/destroys objects. Let's name the object 'f'.
Now, every time this object is created it is added to the tail queue of another object - say 'mi' - and removed from that queue when the object is deleted.
Now, there is another thread that runs every second and tries to gather statistics for these 'f' objects. It basically walks through all the maximum possible instances of 'mi' (say 2048), and for each such 'mi' it gathers all the 'f' objects attached to it and sends a command down to the lower layer, which emits some values corresponding to these objects. It must then update the corresponding 'f' objects with these values.
Now the concern is what IF one of these 'f' objects gets deleted by the main thread while this walk is happening every 1s ?
Intuitively one would think of having a lock at the 'mi' level that is acquired before beginning the walk and released post the walk /update of all the 'f' objects belonging to a particular instance of 'mi', correct?
But the only hitch with this is that there could be 10,000's and even millions of 'f' objects tied to this instance of 'mi'.
The other requirement is that the main thread's performance in creating/destroying these 'f' objects should be high, i.e. at a rate of at least 10,000 objects per second...
So given that, I'm not sure it's feasible to have this per-'mi' object lock. Or am I overestimating the side effects of lock contention?
Any other ideas ?
Now the concern is what IF one of these 'f' objects gets deleted by the main thread while this walk is happening every 1s ?
If an f object gets deleted while the other thread is trying to use it, undefined behavior will be invoked and you will probably end up spending some hours debugging your program to try to figure out why it is occasionally crashing. :) The trick is to make sure that you never delete any f while the other thread might be using it -- typically that would mean that your main thread needs to lock the mi's mutex before removing the f from its queue -- once the f is no longer in the queue, you can release the mutex before deleting the f if you want to, since at that point the other thread will not be able to access the f anyway.
i'm not sure if it's feasible to have this per 'mi' object lock?
It's feasible, as long as you don't mind your main thread occasionally getting held off (i.e. blocked waiting in a mutex::lock() method-call) until your other thread finishes iterating through the mi's queue and releases the mutex. Whether that holdoff time is acceptable or not will depend on the latency requirements of your main thread (e.g. if it's generating a report, then being blocked for some number of milliseconds is no problem; OTOH if it is operating the control surfaces on a rocket in flight, being blocked for any length of time is unacceptable).
Any other ideas ?
My first idea is to get rid of the second thread entirely -- just have main thread call the statistics-collection function directly once per second, instead. Then you don't have to worry about mutexes or mutex-contention at all. This does mean that your main thread won't be able to perform its primary function during the time it is running the statistics-collection function, but at least now its "down time" is predictable rather than being a random function of which mi objects the two threads happen to try to lock/access at any given instant.
If that's no good (i.e. you can't tolerate any significant hold-off time whatsoever), another approach would be to use a message-passing paradigm rather than a shared-data paradigm. That is, instead of allowing both threads direct access to the same set of mi's, use a message queue of some sort so that the main thread can take an mi out of service and send it over to the second thread for statistics-gathering purposes. The second thread would then scan/update it as usual, and when it's done, pass it back (via a second message queue) to the primary thread, which would put it back into service. You could periodically do this with various mi's to keep statistics updated on each of them without ever requiring shared access to any of them. (This only works if your main thread can afford to go without access to certain mi's for short periods, though.)

non-blocking producer and consumer using .NET 2.0

In our scenario,
the consumer takes at least half a second to complete a cycle of processing (against a row in a data table).
the producer produces at least 8 items per second (no worries - we don't mind how long consuming takes).
the shared data is simply a data table.
we should never ask the producer to wait (as it is a server and we don't want it waiting on this).
How can we achieve the above without locking the data table at all (as we don't want the producer to wait in any way)?
We cannot use .NET 4.0 yet in our org.
There is a great example of a producer/consumer queue using Monitors at this page under the "Producer/Consumer Queue" section. In order to synchronize access to the underlying data table, you can have a single consumer.
That page is probably the best resource for threading in .NET on the net.
Create a buffer that holds the data while it is being processed.
It takes you half a second to process one item, and you get 8 items a second... unless you have at least 4 processors working on it, you'll have a problem.
Just to be safe, I'd use a buffer at least twice the size needed (16 rows), and make sure it's possible with the hardware.
There is no magic bullet that is going to let you access a DataTable from multiple threads without using a blocking synchronization mechanism. What I would do is to hold the lock for as short a duration as possible. Keep in mind that modifying any object in the data table's hierarchy will require locking the whole data table. This is because modifying a column value on a DataRow can change the internal indexing structures inside the parent DataTable.
So what I would do is: from the producer, acquire the lock, add a new row, and release the lock. Then in the consumer, acquire the same lock, copy the data contained in a DataRow into a separate data structure, and release the lock immediately. Now you can operate on the copied data without synchronization mechanisms, since it is isolated. After you have completed the operation on it, you again acquire the lock, merge the changes back into the DataRow, release the lock, and start the process all over again.

C++/CLI efficient multithreaded circular buffer

I have four threads in a C++/CLI GUI I'm developing:
Collects raw data
The GUI itself
A background processing thread which takes chunks of raw data and produces useful information
Acts as a controller which joins the other three threads
I've got the raw data collector working and posting results to the controller, but the next step is to store all of those results so that the GUI and background processor have access to them.
New raw data is fed in one result at a time at regular (frequent) intervals. The GUI will access each new item as it arrives (the controller announces new data and the GUI then accesses the shared buffer). The data processor will periodically read a chunk of the buffer (a second's worth, for example) and produce a new result. So effectively, there is one producer and two consumers which need access.
I've hunted around, but none of the CLI-supplied stuff sounds all that useful, so I'm considering rolling my own: a shared circular buffer which allows write locks for the collector and read locks for the GUI and data processor. This would allow multiple threads to read the data as long as those sections of the buffer are not being written to.
So my question is: Are there any simple solutions in the .net libraries which could achieve this? Am I mad for considering rolling my own? Is there a better way of doing this?
Is it possible to rephrase the problem so that:
The Collector collects a new data point ...
... which it passes to the Controller.
The Controller fires a GUI "NewDataPointEvent" ...
... and stores the data point in an array.
If the array is full (or otherwise ready for processing), the Controller sends the array to the Processor ...
... and starts a new array.
If the values passed between threads are not modified after they are shared, this might save you from needing the custom thread-safe collection class, and reduce the amount of locking required.

Resources