Multi-threaded application, parallel reading with write possibility - multithreading

I have a multi-threaded application that can be sketched as:
Data bigData;
void thread1()
{
    workOn(bigData);
}
void thread2()
{
    workOn(bigData);
}
void thread3()
{
    workOn(bigData);
}
Several threads work on the data. I could leave it as it is, but occasionally (very rarely) the data is modified by thread4:
void thread4()
{
    sometimesModifyData(bigData);
}
I could add critical sections, but that would defeat the purpose of multi-threading, because only one thread could work on the data at a time.
What is the best way to keep the benefit of multi-threading while making this thread-safe?
I am thinking of some kind of state (a semaphore?) that would prevent reading and writing at the same time but still allow parallel reading.

This is called a readers–writer lock. It makes sure that no one reads while a write is in progress and no one writes while reads are in progress, yet it still lets readers run in parallel. You can build one on top of a mutex and a flag: when the writer has something to modify, it sets the flag. From then on no new readers are admitted; once all current readers have finished, the writer does its job, and then the readers resume.
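In C++17 the standard library already provides such a lock as std::shared_mutex. A minimal sketch, assuming Data can be modelled as a small struct and that workOn() only reads it (the struct contents, the return values and runSharedMutexDemo() are made up for illustration):

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's bigData.
struct Data {
    std::vector<int> values{1, 2, 3};
};

Data bigData;
std::shared_mutex bigDataMutex;  // guards bigData

// Reader side (thread1..thread3): any number of threads may hold the
// shared lock at the same time.
long workOn() {
    std::shared_lock<std::shared_mutex> lock(bigDataMutex);
    long sum = 0;
    for (int v : bigData.values) sum += v;
    return sum;
}

// Writer side (thread4): the exclusive lock waits for current readers to
// leave and, once held, keeps readers out until the modification is done.
void sometimesModifyData(int extra) {
    std::unique_lock<std::shared_mutex> lock(bigDataMutex);
    bigData.values.push_back(extra);
}

// Runs three readers and one writer, then returns the final sum.
long runSharedMutexDemo() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 3; ++i) threads.emplace_back([] { workOn(); });
    threads.emplace_back([] { sometimesModifyData(4); });
    for (auto& t : threads) t.join();
    return workOn();
}
```

Whether a waiting writer takes priority over newly arriving readers is left to the implementation, so a very read-heavy workload may need a lock with an explicit fairness policy.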

Related

Is this reader-writer lock implementation correct?

I am wondering whether the following implementation of the reader/writer problem is correct.
We're using only one mutex and a count variable to indicate the number of readers.
read api:
void read() {
    mutex.lock();
    count++;
    mutex.unlock();
    // Do read
    mutex.lock();
    count--;
    mutex.unlock();
}
write api:
void write() {
    while (1) {
        mutex.lock();
        if (count == 0) {
            // Do write
            mutex.unlock();
            return;
        }
        mutex.unlock();
    }
}
It looks like, in this code:
Only one lock is used, so there is no deadlock problem;
The writer can only write when count == 0, so there are no race conditions.
As a reader-preferring solution to the readers/writers problem, is there anything wrong with the above code? It looks like all the standard implementations use two locks (e.g. https://en.wikipedia.org/wiki/Readers%E2%80%93writers_problem#First_readers-writers_problem). If the above implementation is correct, why does the Wikipedia version use two locks? Thank you!
It's correct, but it will perform atrociously. Imagine if while a reader is trying to do work there are two waiting writers. Those two waiting writers will constantly acquire and release the mutex, saturating the CPU resources while the reader is trying to finish its work so that the system as a whole can make forward progress.
The nightmare scenario would be where the reader shares a physical core with one of the waiting writers. Yikes.
Correct, yes. Useful and sensible, definitely not!
One reason to use two locks is to prevent two writers from competing. A more common solution, at least in my experience, is to use a lock with a condition variable to release waiting writers or alternate phases.
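A minimal sketch of the lock-plus-condition-variable approach, in C++ and with made-up names (ReadWriteLock, runCondVarDemo). It keeps the question's reader count but, as an assumption of this sketch, adds a writerActive_ flag so the writer does not have to hold the mutex while writing. Waiting writers sleep on the condition variable instead of spinning:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class ReadWriteLock {
public:
    void lockRead() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Readers wait only while a writer is active (reader preference).
        cond_.wait(lock, [this] { return !writerActive_; });
        ++readers_;
    }
    void unlockRead() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(readers_ > 0);
        // The last reader out wakes any sleeping writers.
        if (--readers_ == 0) cond_.notify_all();
    }
    void lockWrite() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleeps until there is no reader and no other writer.
        cond_.wait(lock, [this] { return !writerActive_ && readers_ == 0; });
        writerActive_ = true;
    }
    void unlockWrite() {
        std::lock_guard<std::mutex> lock(mutex_);
        writerActive_ = false;
        cond_.notify_all();  // wakes both readers and writers
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    int readers_ = 0;
    bool writerActive_ = false;
};

// Two writers increment a counter while two readers observe it.
int runCondVarDemo() {
    ReadWriteLock rw;
    int shared = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < 2; ++i)
        threads.emplace_back([&] {
            for (int n = 0; n < 1000; ++n) {
                rw.lockWrite();
                ++shared;
                rw.unlockWrite();
            }
        });
    for (int i = 0; i < 2; ++i)
        threads.emplace_back([&] {
            for (int n = 0; n < 1000; ++n) {
                rw.lockRead();
                int seen = shared;
                (void)seen;
                rw.unlockRead();
            }
        });
    for (auto& t : threads) t.join();
    return shared;
}
```

Like the question's version, this one prefers readers, so a steady stream of readers can still starve writers; it only removes the busy-waiting.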

Verifying thread safety with JUnit

I would like to test if my code is thread safe by calling it from multiple threads at the same time with different parameters.
Below is an example of what the code I want to test looks like:
public void writeStringToFile(String fileName, String toBeWritten) {
    // some implementation that JUnit should not care about
}
I want to verify whether this code is thread safe when it is invoked with different file names from multiple threads at the same time.
How should I design my JUnit test so that it is easy to understand and maintain?
If you changed the code to public void writeStringToOutputStream(OutputStream out, String toBeWritten) (or made this the actual implementation, and the original method just a thin wrapper), you could pass a FilterOutputStream that looks out for concurrent writes, possibly even blocking for a while so other potential concurrent writes can try to execute.
But this really only checks that the writes aren't concurrent, not that the class is thread-safe. If you detect concurrent writes, you could read back the data and make sure it's valid.
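The detection idea does not depend on Java. A minimal C++ sketch of it, in which every name (OverlapDetectingSink, SerializedWriter, writesOverlap) is hypothetical and the code under test is assumed to serialise its writes with a mutex:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sink that notices when two write() calls overlap in time.
class OverlapDetectingSink {
public:
    void write(const std::string& text) {
        if (inFlight_.fetch_add(1) != 0) overlapSeen_ = true;
        // Linger briefly so a concurrent caller has a chance to collide.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        bytes_.fetch_add(text.size());
        inFlight_.fetch_sub(1);
    }
    bool overlapSeen() const { return overlapSeen_.load(); }

private:
    std::atomic<int> inFlight_{0};
    std::atomic<bool> overlapSeen_{false};
    std::atomic<std::size_t> bytes_{0};
};

// Hypothetical code under test: serialises access to the sink.
class SerializedWriter {
public:
    explicit SerializedWriter(OverlapDetectingSink& sink) : sink_(sink) {}
    void writeString(const std::string& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        sink_.write(s);
    }

private:
    std::mutex mutex_;
    OverlapDetectingSink& sink_;
};

// Hammers the writer from several threads and reports whether any two
// writes reached the sink at the same time.
bool writesOverlap(int threadCount) {
    OverlapDetectingSink sink;
    SerializedWriter writer(sink);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&writer, i] {
            for (int n = 0; n < 5; ++n)
                writer.writeString("line " + std::to_string(i));
        });
    for (auto& t : threads) t.join();
    return sink.overlapSeen();
}
```

A test of this kind can prove that an overlap happened, but a run without overlaps does not prove that one never can.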

<Spring Batch> Why does making ItemReader thread-safe lead to losing restartability?

I have a multi-threaded batch job reading from a DB, and I am concerned about different threads re-reading records, as ItemReader is not thread safe in Spring Batch. I went through the Spring Batch FAQ section, which states:
You can synchronize the read() method (e.g. by wrapping it in a delegator that does the synchronization). Remember that you will lose restartability, so best practice is to mark the step as not restartable and to be safe (and efficient) you can also set saveState=false on the reader.
Why would I lose restartability in this case? What has restartability got to do with synchronizing my read operations? It can always try again, right?
Also, will this piece of code be enough for synchronizing the reader?
public class SynchronizedItemReader<T> implements ItemReader<T> {
    private final ItemReader<T> delegate;
    public SynchronizedItemReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }
    // Only one thread at a time may pull an item from the delegate.
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}
When using an ItemReader with multiple threads, the lack of restartability is not about the read itself. It's about saving the state of the reader, which occurs in the update method. The issue is that there needs to be coordination between the calls to read() (the method providing the data) and update() (the method persisting the state). When you use multiple threads, the internal state of the reader (and therefore the update() call) may or may not reflect the work that has been done. Take for example the FlatFileItemReader using a chunk size of 5 and running on multiple threads. You could have thread 1 having read 5 items (time to update), yet thread 2 could have read an additional 3. This means that the call to update would record that 8 items have been read. If the chunk on thread 2 fails, the state would be incorrect and the restart would miss the three items that were already read.
This is not to say that it is impossible to write a thread-safe ItemReader. However, as your example above illustrates, if delegate is a stateful ItemReader (implements ItemStream as well), the state will not be persisted correctly with calls to update (in fact, your example above doesn't even take the ItemStream aspect of stateful readers into account).
If you want your job to be restartable with parallel processing of items, you can persist each item the reader has read, together with its processing state, yourself.

Synchronizing data access in different threads

I have a worker thread, which sends some data over TCP, taking that data from several other threads. I need to fill some data, having sort of a mutex over it, and then call another thread's method, which would then unlock the mutex when finished while caller thread would continue its own job.
I've first implemented this using Qt as follows:
Data globalData;
QMutex mutex;
void requestSend() // several such functions in other threads
{
    mutex.lock(); // we want to change the data
    globalData=fillData();
    invokeMethod(workerClass,"work",Qt::QueuedConnection);
}
void work() // a slot in a class instanced in worker thread
{
    sendData(globalData);
    mutex.unlock(); // data is now available to be changed
}
This seems reasonable and even works, but then I found this in the QMutex documentation:
void QMutex::unlock ()
Unlocks the mutex. Attempting to unlock a mutex in a different thread to the one that locked
it results in an error. Unlocking a mutex that is not locked results in undefined behavior.
I have two questions:
What is the reason for this restriction on unlocking in a different thread? (And why don't I see the error the documentation mentions?)
What should I use instead of QMutex to achieve what I'm trying to do? Would QWaitCondition be an adequate replacement?
The purpose of the mutex is to ensure that only one thread can access the data at any one time. Therefore, it doesn't really make sense to lock in one thread and unlock the same mutex in another.
If you're finding that it works, you're probably just lucky at the moment, but that doesn't mean it won't cause you issues if the timing of the threads changes.
I'm not quite sure exactly what you're trying to do, but it appears that you have various threads that can write to globalData and, as soon as one writes to it, you want another thread to send the data before more data is written to globalData.
What I suggest is to lock a mutex around the writing of the data and emit a signal that carries the data to the thread that will send it. Because the threads differ, the data will be copied anyway:
void requestSend() // several such functions in other threads
{
    QMutexLocker locker(&mutex);
    globalData=fillData();
    emit SendData(globalData); // send signal to the thread which will send the data
}
Note that QMutexLocker is used to ensure the lock is released, even if an exception should occur.
Don't be too concerned about the copying of data in signals and slots; Qt is very efficient, and will only create a "copy on write", due to implicit sharing, if you use its container objects. Even if it has to make the copy for passing the data between the threads, you shouldn't really worry about it, unless you can see a performance issue.
Finally, note that implicit sharing and multithreading can work happily together, as you can read here.
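Outside Qt, the same hand-off can be sketched with the standard library only. This is an assumption-laden analogue of the queued signal, not Qt's implementation: SendQueue, Data and runSendDemo() are made-up names, and a vector stands in for sendData() over TCP. Every lock is taken and released in the same thread, and the worker receives its own copy of the data:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Data = std::string;  // hypothetical payload

class SendQueue {
public:
    // Called by producer threads; the lock is released before returning.
    void post(Data data) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(data));
        }
        cond_.notify_one();
    }
    // Tells the worker that no more data will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cond_.notify_all();
    }
    // Called by the worker; blocks until data is available.
    // Returns false once the queue is closed and drained.
    bool take(Data& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<Data> queue_;
    bool closed_ = false;
};

// Three producers post one item each; the worker "sends" them.
std::size_t runSendDemo() {
    SendQueue queue;
    std::vector<Data> sent;  // stands in for sendData() over TCP
    std::thread worker([&] {
        Data d;
        while (queue.take(d)) sent.push_back(d);
    });
    std::vector<std::thread> producers;
    for (int i = 0; i < 3; ++i)
        producers.emplace_back(
            [&queue, i] { queue.post("packet " + std::to_string(i)); });
    for (auto& p : producers) p.join();
    queue.close();
    worker.join();
    return sent.size();
}
```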

Effects of swapping buffers on concurrent access

Consider an application with two threads, Producer and Consumer.
Both threads run at roughly the same frequency, multiple times per second.
Both threads access the same memory region, where Producer writes to the memory, and Consumer reads the current chunk of data and does something with it, without invalidating the data.
A classical approach is this one:
int[] sharedData;
//Called frequently by thread Producer
void WriteValues(int[] data)
{
    lock(sharedData)
    {
        Array.Copy(data, sharedData, LENGTH);
    }
}
//Called frequently by thread Consumer
void WriteValues()
{
    int[] data = new int[LENGTH]; // destination must be allocated before copying
    lock(sharedData)
    {
        Array.Copy(sharedData, data, LENGTH);
    }
    DoSomething(data);
}
If we assume that Array.Copy takes time, this code would run slowly, since Producer always has to wait for Consumer during copying and vice versa.
An approach to this problem would be to create two buffers, one which is accessed by the Consumer, and one which is written to by the Producer, and swap the buffers, as soon as writing has finished.
int[] frontBuffer;
int[] backBuffer;
//Called frequently by thread Producer
void WriteValues(int[] data)
{
    lock(backBuffer)
    {
        Array.Copy(data, backBuffer, LENGTH);
        // Swap the buffers once writing has finished
        int[] temp = frontBuffer;
        frontBuffer = backBuffer;
        backBuffer = temp;
    }
}
//Called frequently by thread Consumer
void WriteValues()
{
    int[] data = new int[LENGTH]; // destination must be allocated before copying
    int[] currentFrontBuffer = frontBuffer;
    lock(currentFrontBuffer)
    {
        Array.Copy(currentFrontBuffer, data, LENGTH);
    }
    DoSomething(data);
}
Now, my questions:
Is locking, as shown in the 2nd example, safe? Or does the change of references introduce problems?
Will the code in the 2nd example execute faster than the code in the 1st example?
Are there any better methods to efficiently solve the problem described above?
Could there be a way to solve this problem without locks? (Even if I think it is impossible)
Note: this is no classical producer/consumer problem: It is possible for Consumer to read the values multiple times before Producer writes it again - the old data stays valid until Producer writes new data.
Is locking, as shown in the 2nd example, safe? Or does the change of references introduce problems?
As far as I can tell, because reference assignment is atomic, this may be safe but not ideal. Because the WriteValues() method reads from frontBuffer without a lock or memory barrier forcing a cache refresh, there is no guarantee that the variable will ever be updated with new values from main memory. There is then a potential to continuously read the stale, cached values of that instance from the local register or CPU cache. I'm unsure whether the compiler/JIT might infer a cache refresh anyway based on the local variable; maybe somebody with more specific knowledge can speak to this area.
Even if the values aren't stale, you may also run into more contention than you would like. For example...
Thread A calls WriteValues()
Thread A takes a lock on the instance in frontBuffer and starts copying.
Thread B calls WriteValues(int[])
Thread B writes its data, moves the currently locked frontBuffer instance into backBuffer.
Thread B calls WriteValues(int[])
Thread B waits on the lock for backBuffer because Thread A still has it.
Will the code in the 2nd example execute faster than the code in the 1st example?
I suggest that you profile it and find out. X being faster than Y only matters if Y is too slow for your particular needs, and you are the only one who knows what those are.
Are there any better methods to efficiently solve the problem described above?
Yes. If you are using .NET 4 or later, there is a BlockingCollection type in System.Collections.Concurrent that models the Producer/Consumer pattern well. If you consistently read more than you write, or have multiple readers to very few writers, you may also want to consider the ReaderWriterLockSlim class. As a general rule of thumb, you should do as little within a lock as you can, which will also help to alleviate your time issue.
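To illustrate the "as little within a lock as you can" rule, here is a minimal sketch in C++ rather than C#, with made-up names (SnapshotBuffer, runSnapshotDemo). It is one possible design, not the types mentioned above: the producer copies its data outside the lock, and both sides hold the lock only long enough to swap or copy a single pointer. Taking the lock on every read also gives the visibility guarantee that the unlocked read of frontBuffer lacks:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class SnapshotBuffer {
public:
    using Snapshot = std::shared_ptr<const std::vector<int>>;

    explicit SnapshotBuffer(std::size_t length)
        : front_(std::make_shared<const std::vector<int>>(length, 0)) {}

    // Producer: the expensive copy happens outside the lock; only the
    // pointer swap is protected.
    void write(const std::vector<int>& data) {
        Snapshot fresh = std::make_shared<const std::vector<int>>(data);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            front_.swap(fresh);
        }
        // 'fresh' now refers to the old buffer, which is freed when the
        // last reader still holding it lets go.
    }

    // Consumer: copies one pointer under the lock. The snapshot is
    // immutable and stays valid for as long as the caller holds it.
    Snapshot read() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return front_;
    }

private:
    mutable std::mutex mutex_;
    Snapshot front_;
};

// The producer publishes buffers filled with one value each; the consumer
// checks that it never sees a half-written buffer.
bool runSnapshotDemo() {
    SnapshotBuffer buffer(4);
    bool consistent = true;
    std::thread producer([&] {
        for (int n = 1; n <= 1000; ++n) buffer.write(std::vector<int>(4, n));
    });
    std::thread consumer([&] {
        for (int i = 0; i < 1000; ++i) {
            SnapshotBuffer::Snapshot snap = buffer.read();
            for (int v : *snap)
                if (v != snap->front()) consistent = false;
        }
    });
    producer.join();
    consumer.join();
    return consistent && buffer.read()->front() == 1000;
}
```

The consumer may read the same snapshot several times before the producer publishes a new one, which matches the note in the question. The price is one allocation per write.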
Could there be a way to solve this problem without locks? (Even if I think it is impossible)
You might be able to, but I wouldn't suggest trying that unless you are extremely familiar with multi-threading, cache coherency, and potential compiler/JIT optimizations. Locking will most likely be fine for your situation and it will be much easier for you (and others reading your code) to reason about and maintain.

Resources