Should I use CRITICAL_SECTION or not?

Should I use CRITICAL_SECTION or not? - multithreading

I have a program which has a Ui with which users choose the way to display and do small configurations. It also has a background procedure, which continuously reads data from the network and update the data to display.
Now I put them in one process:
background procedure:
STATE MainWindow::Rcv()
{
DeviceMAP::iterator dev;
for(dev= dev_map.begin(); dev!= dev_map.end(); dev++)
{
dev->second.rcvData();//receive data from the network, the time can be ignored.
BitLog* log = new BitLog();
dev->second.parseData(log);
LogItem* logItem = new LogItem();
logItem->time = QString::fromLocal8Bit(log->rcvTime.c_str());
logItem->name = QString::fromLocal8Bit(log->basicInfo.getName().c_str());
logItem->PIN = QString::fromLocal8Bit(log->basicInfo.getPIN().c_str()).toShort();
delete log;
add_logItem(logItem);
}
return SUCCESS;
}
add_logItem:
void MainWindow::add_logItem(LogItem* logItem)
{
writeToFile(logItem);
Device* r = getDevbyPIN(QString::number(logItem->PIN));
if(r == NULL)return;
devInfo_inside_widget::States state = logItem->state;
bool bool_list[portsNum_X];
for(int i =0; i < portsNum_X; i++)
{
bool_list[i] = 0;
}
for(int i = 0; i < portsNum; i++)
{
bool_list[i] = (logItem->BITS[i/8] >> (7 - i%8)) & 0x1;
}
r->refresh(state, logItem->time, bool_list);//update data inside...state, time , BITS...
IconLabel* icl = getIConLabelByDev(r);//update data
icl->refresh(state);
logDisplayQueue.enqueue(logItem);//write queue here
int size = logDisplayQueue.size();
if(size > 100)
{
logDisplayQueue.dequeue();//write queue here
}
}
The section above has not dealt with any ui operations yet, but when user push a radio button in the ui, the program has to filter the data in the queue and display it in the table widget:
ui operations:
void MainWindow::filter_log_display(bool bol)
{
row_selectable = false;
ui->tableWidget->setRowCount(0);//delete table items all
row_selectable = true;
int size_1 = logDisplayQueue.size() - 1;
ui->tableWidget->verticalScrollBar()->setSliderPosition(0);
if(size_1+1 < 100)
{
ui->tableWidget->setRowCount(size_1 + 1);
}
else
{
ui->tableWidget->setRowCount(100);//display 100 rows at most
}
if(bol)//filter from logDisplayQueue and display unworking-state-log rows
{
int index = 0;
for(int queue_i = size_1; queue_i >= 0; queue_i--)
{
LogItem* logItem = (LogItem*)logDisplayQueue.at(queue_i); // read queue here
if(logItem->state == STATE_WORK || logItem->state == STATE_UN)continue;
QString BITS_str = bits2Hexs(logItem->BITS);
ui->tableWidget->setItem(index, 0, new QTableWidgetItem(logItem->time));//time
ui->tableWidget->setItem(index, 1, new QTableWidgetItem(logItem->name));//name
ui->tableWidget->setItem(index, 2, new QTableWidgetItem(BITS_str));//BITS
if(queue_i == oldRowItemNo)ui->tableWidget->selectRow(index);
index++;
}
ui->tableWidget->setRowCount(index);
}
else//display all rows
{
for(int queue_i = size_1, index = 0; queue_i >= 0; queue_i--, index++)
{
LogItem* logItem = (LogItem*)logDisplayQueue.at(queue_i); //read queue here
QString BITS_str = bits2Hexs(logItem->BITS);//
finish = clock();
ui->tableWidget->setItem(index, 0, new QTableWidgetItem(logItem->time));//time
ui->tableWidget->setItem(index, 1, new QTableWidgetItem(logItem->name));//name
ui->tableWidget->setItem(index, 2, new QTableWidgetItem(BITS_str));//BITS
if(queue_i == oldRowItemNo)ui->tableWidget->selectRow(index);
}
}
}
So the queue is quite samll and the background procedure is quite frequent(nearly 500 times per sec). That is, the queue will be written 500 times in 1 sec, but displayed time from time by the user.
I want to split the functions into two threads and run them together, one rev and update data, one display.
If i do not use any lock or mutex, the user may get the wrong data, but if i force the write-data procedure enter critical section and leave critical section everytime, it will be a heavy overload. :)
Should I use CRITICAL_SECTION or something else, any suggestions related?(my words could be verbose for you :) , i only hope for some hints :)

I'd put "Recv" function in another QObject derived class, put it under other QThread not main gui thread and connect "logItemAdded(LogItem* item)" signal to main window's "addLogItem(LogItem* item)" slot.
for just quick and dirty hint my conceptual code follows.
#include <QObject>
class Logger : public QObject
{
Q_OBJECT
public:
Logger(QObject* parent=0);
virtual ~Logger();
signals:
void logItemAdded(LogItem* logItem);
public slots:
protected:
void Rcv()
{
// ...
// was "add_logItem(logItem)"
emit logItemAdded(logItem);
}
};
MainWindow::MainWindow(...)
{
Logger logger = new Logger;
// setup your logger
QThread* thread = new QThread;
logger->moveToThread(thread);
connect(thread, SIGNAL(finished()), thread, SLOT(deleteLater()));
thread->start();
}
Hope this helps.
Good luck.

Related

Getting value from thread running in while loop

I have a java thread which is running a path-finding algorithm in a constant while loop. Then, every so often I want to retrieve the most updated path from the thread. However, I am unsure how to do this, and think I might be doing it wrong.
My thread consists of the following code:
public class BotThread extends Thread {
Bot bot;
AStar pathFinder;
Player targetPlayer;
public List<boolean[]> plan;
public BotThread(Bot bot) {
this.bot = bot;
this.plan = new ArrayList<>();
pathFinder = new AStar(bot, bot.getLevelHandler());
}
public void run() {
while (true) {
System.out.println("THREAD RUNNING");
targetPlayer = bot.targetPlayer;
plan = pathFinder.optimise(targetPlayer);
}
}
public boolean[] getNextAction() {
return plan.remove(0);
}
}
I then create an object of BotThread, and call start(). Then when I call getNextAction() on the thread, I seem to receive a null pointer. Is this because I am not able to call another method on the thread whilst it is in the main loop? How should I do this properly?

This is because you are not giving enough time to thread to initialise plan Arraylist. You need to add sleeping time to the threads. Something like this while calling BotThread class from main:
int num_threads = 8;
BotThread myt[] = new BotThread[num_threads];
for (int i = 0; i < num_threads; ++i) {
myt[i] = new BotThread();
myt[i].start();
Thread.sleep(1000);
myt[i].getNextAction();
}

unordered_map insertion is creating bottleneck

So here I am trying to create a Graph data structure in which i have to keep track of edges according to their ids. So I am creating edge ids in string data structure as eid: sourceid_destinationid
using namespace std;
class Edge{
public:
bool operator==(const Edge* &obj) const
{
return eid==obj->eid;
}
std::string eid;
set<int> rrids;
int sourceid;
int destid;
int strength;
public:
Edge(std::string eid,int from,int to);
std::string getId();
void addRRid(int rrid);
void removeRRid(int rrid);
void setRRid(set<int> rrids);
void setId(std::string eid);
};
This is another class which I am using for adding and removing the edges.
hpp-file
using namespace std;
class RRassociatedGraph{
public:
unordered_map<int,vertex*> vertexMap;
std::unordered_map<std::string,Edge*> EdgeMap;
int noOfEdges;
public:
RRassociatedGraph();
unordered_set<vertex> getVertices();
int getNumberOfVertices();
void addVertex(vertex v);
vertex* find(int id);
Edge* findedge(std::string id);
void addEdge(int from, int to, int label);
void removeEdge(int from, int to,int rrSetID);
};
When I debugged the code I found out that in the function add edge here the place where I am doing EdgeMap.insert the execution doesn't go to next line. It remains in hashtable for loop of some bucket entry. I can't debug this code frequently because I have to wait for 3 hours to get this issue. The code is working perfectly with small graphs. But for larger graphs where edgeMap has to store 800k edges. It goes in this hashtable infinite loop. I don't get this hashtable code. But is there something wrong with my data structure of creating Edgemap?
#include "RRassociatedGraph.hpp"
RRassociatedGraph::RRassociatedGraph() {
noOfEdges=0;
}
void RRassociatedGraph::addVertex(vertex v) {
vertexMap.insert(pair<int,vertex*>(v.getId(), &v));
}
vertex* RRassociatedGraph::find(int id) {
unordered_map<int,vertex*>::const_iterator got=vertexMap.find(id);
if(got != vertexMap.end() )
return got->second;
return nullptr;
}
Edge* RRassociatedGraph::findedge(std::string id){
unordered_map<std::string,Edge*>::const_iterator got=EdgeMap.find(id);
if(got != EdgeMap.end() )
return got->second;
return nullptr;
}
void RRassociatedGraph::addEdge(int from, int to, int label) {
vertex* fromVertex = find(from);
if (fromVertex == nullptr) {
fromVertex = new vertex(from);
vertexMap.insert(pair<int,vertex*>(fromVertex->getId(), fromVertex));
}
vertex* toVertex = find(to);
if (toVertex == nullptr) {
toVertex = new vertex(to);
vertexMap.insert(pair<int,vertex*>(toVertex->getId(), toVertex));
}
if(fromVertex==toVertex){
// fromVertex->outDegree++;
//cout<<fromVertex->getId()<<" "<<toVertex->getId()<<"\n";
return;
}
std::string eid=std::to_string(from);
eid+="_"+std::to_string(to);
Edge* edge=findedge(eid);
if(edge==nullptr){
edge=new Edge(eid,from,to);
edge->addRRid(label);
fromVertex->addOutGoingEdges(edge);
EdgeMap.insert(pair<std::string,Edge*>(edge->getId(), edge));
noOfEdges++;
}
else{
edge->addRRid(label);
fromVertex->outDegree++;
}
}
void RRassociatedGraph::removeEdge(int from, int to,int rrSetID) {
vertex* fromVertex = find(from);
std::string eid=std::to_string(from);
eid+="_"+std::to_string(to);
if(EdgeMap.count(eid)==1){
Edge* e=EdgeMap.find(eid)->second;
if(fromVertex->removeOutgoingEdge(e,rrSetID)){
EdgeMap.erase(eid);
delete e;
}
}
}
this is the place where it keeps going into this for loop. The insertion time of map should be very less but this is creating bottleneck in my code.
template <class _Tp, class _Hash, class _Equal, class _Alloc>
void
__hash_table<_Tp, _Hash, _Equal, _Alloc>::__rehash(size_type __nbc)
{
#if _LIBCPP_DEBUG_LEVEL >= 2
__get_db()->__invalidate_all(this);
#endif // _LIBCPP_DEBUG_LEVEL >= 2
__pointer_allocator& __npa = __bucket_list_.get_deleter().__alloc();
__bucket_list_.reset(__nbc > 0 ?
__pointer_alloc_traits::allocate(__npa, __nbc) : nullptr);
__bucket_list_.get_deleter().size() = __nbc;
if (__nbc > 0)
{
for (size_type __i = 0; __i < __nbc; ++__i)
__bucket_list_[__i] = nullptr;
__next_pointer __pp = __p1_.first().__ptr();
__next_pointer __cp = __pp->__next_;
if (__cp != nullptr)
{
size_type __chash = __constrain_hash(__cp->__hash(), __nbc);
__bucket_list_[__chash] = __pp;
size_type __phash = __chash;
for (__pp = __cp, __cp = __cp->__next_; __cp != nullptr;
__cp = __pp->__next_)
{
__chash = __constrain_hash(__cp->__hash(), __nbc);
if (__chash == __phash)
__pp = __cp;
else
{
if (__bucket_list_[__chash] == nullptr)
{
__bucket_list_[__chash] = __pp;
__pp = __cp;
__phash = __chash;
}
else
{
__next_pointer __np = __cp;
for (; __np->__next_ != nullptr &&
key_eq()(__cp->__upcast()->__value_,
__np->__next_->__upcast()->__value_);
__np = __np->__next_)
;
__pp->__next_ = __np->__next_;
__np->__next_ = __bucket_list_[__chash]->__next_;
__bucket_list_[__chash]->__next_ = __cp;
}
}
}
}
}
}
I have many files so I can't put the whole code. I am not that good in c++. Please let me know if I have to implement it some other way. I have to use hashMap because I also need faster search.

You are probably experiencing re-hash at insert. Unordered_map has number of buckets. When they are filled worst case insert time is O(size()).
http://en.cppreference.com/w/cpp/container/unordered_map/insert
Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count().
What you may do with your current setup is:
1. Growth map at the init of the program, as usually number of buckets doesn't shrink.
2. Change from std::unordered_map to Boost::intrusive_map, where you can manager number of buckets manually.

Performance testing: One core vs multiple

I'm trying to understand one problem that I encountered recently in my project. I'm using Aurigma library to resize images. It is used in the single thread mode and produce only one thread during calculation. Lately I decided to move to ImageMagick project, because it is free and open source. I've built IM in the single thread mode and started to test. At first I wanted to compare their performance without interruptions, so I created a test that has high priorities for a process and its thread. Also, I set affinity to one core. I got that IM faster than Aurigma on ~25%. But than more threads I added than less IM had advantage against Aurigma.
My project is a windows service that starts about 7-10 child processes. Each process has 2 threads to process images. When I run my test as two different processes with 2 threads each, I noticed that IM worked worse than Aurigma on about 5%.
Maybe my question is not very detailed, but this scope is a little new for me and I would be glad to get direction for further investigation. How can it be that one program works faster if it is run on one thread in one process, but worse if it is run in multiple processes at the same time.
Fro example,
Au: 8 processes x 2Th(20 tasks per thread) = 320 tasks for 245 secs
IM: 8 processes x 2Th(20 tasks per thread) = 320 tasks for 280 secs
Au: 4 processes x 2Th(20 tasks per thread) = 160 tasks for 121 secs
IM: 4 processes x 2Th(20 tasks per thread) = 160 tasks for 141 secs
We can see that Au works better if we have more that 1 process, but in single process mode: Au process one task for 2,2 sec, IM for 1,4 sec and the sum time is better for IM
private static void ThreadRunner(
Action testFunc,
int repeatCount,
int threadCount
)
{
WaitHandle[] handles = new WaitHandle[threadCount];
var stopwatch = new Stopwatch();
// warmup
stopwatch.Start();
for (int h = 0; h < threadCount; h++)
{
var handle = new ManualResetEvent(false);
handles[h] = handle;
var thread = new Thread(() =>
{
Runner(testFunc, repeatCount);
handle.Set();
});
thread.Name = "Thread id" + h;
thread.IsBackground = true;
thread.Priority = ThreadPriority.Normal;
thread.Start();
}
WaitHandle.WaitAll(handles);
stopwatch.Stop();
Console.WriteLine("All Threads Total time taken " + stopwatch.ElapsedMilliseconds);
}
private static void Runner(
Action testFunc,
int count
)
{
//Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Use only the second core
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;
Process.GetCurrentProcess().PriorityBoostEnabled = false;
Thread.CurrentThread.Priority = ThreadPriority.Normal;
var stopwatch = new Stopwatch();
// warmup
stopwatch.Start();
while(stopwatch.ElapsedMilliseconds < 10000)
testFunc();
stopwatch.Stop();
long elmsec = 0;
for (int i = 0; i < count; i++)
{
stopwatch.Reset();
stopwatch.Start();
testFunc();
stopwatch.Stop();
elmsec += stopwatch.ElapsedMilliseconds;
Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks +
" mS: " + stopwatch.ElapsedMilliseconds + " Thread name: " + Thread.CurrentThread.Name);
}
Console.WriteLine("Total time taken " + elmsec + " Thread name: " + Thread.CurrentThread.Name);
}
/// <summary>
/// Entry point
/// </summary>
/// <param name="args"></param>
private static void Main(string[] args)
{
var files = GetFiles(args.FirstOrDefault());
if (!files.Any())
{
Console.WriteLine("Source files were not found.");
goto End;
}
//// Run tests
Console.WriteLine("ImageMagick run... Resize");
Runner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20);
Console.WriteLine("Aurigma run... Resize");
Runner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20);
Console.WriteLine("ImageMagick run... multi Resize");
ThreadRunner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20, 2);
Console.WriteLine("Aurigma run... multi Resize");
ThreadRunner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20, 2);
End:
Console.WriteLine("Done");
Console.ReadKey();
}
public static void RunResizeImageMagickTest(string source)
{
float[] ratios = { 0.25f, 0.8f, 1.4f };
// load the source bitmap
using (MagickImage bitmap = new MagickImage(source))
{
foreach (float ratio in ratios)
{
// determine the target image size
var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));
MagickImage thumbnail = null;
try
{
thumbnail = new MagickImage(bitmap);
// scale the image down
thumbnail.Resize(size.Width, size.Height);
}
finally
{
if (thumbnail != null && thumbnail != bitmap)
{
thumbnail.Dispose();
}
}
}
}
}
public static void RunResizeAurigmaTest(string source)
{
float[] ratios = { 0.25f, 0.8f, 1.4f };
//// load the source bitmap
using (ABitmap bitmap = new ABitmap(source))
{
foreach (float ratio in ratios)
{
// determine the target image size
var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));
ABitmap thumbnail = null;
try
{
thumbnail = new ABitmap();
// scale the image down
using (var resize = new Resize(size, InterpolationMode.HighQuality))
{
resize.ApplyTransform(bitmap, thumbnail);
}
}
finally
{
if (thumbnail != null && thumbnail != bitmap)
{
thumbnail.Dispose();
}
}
}
}
}
Code for testing is added. I use C#/.NET, ImageMagick works through ImageMagick.Net library, for Aurigma there is one too. For IM .net lib is written on C++/CLI, IM is C. A lot of languages are used.
OpenMP for IM is off.

Could be a memory cache issue. It is possible that multiple threads utilizing memory in a certain way create a scenario that one thread invalidates cache memory that another thread was using, causing a stall.
Programs that are not purely number crunching, but rely on a lot of IO (CPU<->Memory) are more difficult to analyze.

Multithreading and Monitoring

I am trying to get multithreading more unraveled in my head. I made these three classes.
A global variable class
public partial class globes
{
public bool[] sets = new bool[] { false, false, false };
public bool boolChanged = false;
public string tmpStr = string.Empty;
public int gcount = 0;
public bool intChanged = false;
public Random r = new Random();
public bool gDone = false;
public bool first = true;
}
Drop in point
class Driver
{
static void Main(string[] args)
{
Console.WriteLine("start");
globes g = new globes();
Thread[] threads = new Thread[6];
ParameterizedThreadStart[] pts = new ParameterizedThreadStart[6];
lockMe _lockme = new lockMe();
for (int b = 0; b < 3; b++)
{
pts[b] = new ParameterizedThreadStart(_lockme.paramThreadStarter);
threads[b] = new Thread(pts[b]);
threads[b].Name = string.Format("{0}", b);
threads[b].Start(b);
}
}
}
And then my threading class
class lockMe
{
#region Fields
private string[] words = new string[] {"string0", "string1", "string2", "string3"};
private globes g = new globes();
private object myKey = new object();
private string[] name = new string[] { String.Empty, String.Empty, String.Empty };
#endregion
#region methods
// first called for all threads
private void setName(Int16 i)
{
Monitor.Enter(myKey);
{
try
{
name[i] = string.Format("{0}:{1}", Thread.CurrentThread.Name, g.r.Next(100, 500).ToString());
}
finally
{
Monitor.PulseAll(myKey);
Monitor.Exit(myKey);
}
}
}
// thread 1
private void changeBool(Int16 a)
{
Monitor.Enter(myKey);
{
try
{
int i = getBools();
//Thread.Sleep(3000);
if (g.gcount > 5) { g.gDone = true; return; }
if (i == 3) resets();
else { for (int x = 0; x <= i; i++) { g.sets[x] = true; } }
Console.WriteLine("Thread {0} ran through changeBool()\n", name[a]);
}
finally
{
Monitor.PulseAll(myKey);
Monitor.Exit(myKey);
}
}
}
// thread 2
private void changeInt(Int16 i)
{
Monitor.Enter(myKey);
{
try
{
g.gcount++;
//Thread.Sleep(g.r.Next(1000, 3000));
Console.WriteLine("Thread {0}: Count is now at {1}\n", name[i], g.gcount);
}
finally
{
Monitor.PulseAll(myKey);
Monitor.Exit(myKey);
}
}
}
// thread 3
private void printString(Int16 i)
{
Monitor.Enter(myKey);
{
try
{
Console.WriteLine("...incoming...");
//Thread.Sleep(g.r.Next(1500, 2500));
Console.WriteLine("Thread {0} printing...{1}\n", name[i], words[g.r.Next(0, 3)]);
}
finally
{
Monitor.PulseAll(myKey);
Monitor.Exit(myKey);
}
}
}
// not locked- called from within a locked peice
private int getBools()
{
if ((g.sets[0] == false) && (g.sets[1] == false) && (g.sets[2] == false)) return 0;
else if ((g.sets[0] == true) && (g.sets[1] == false) && (g.sets[2] == false)) return 1;
else if ((g.sets[2] == true) && (g.sets[3] == false)) return 2;
else if ((g.sets[0] == true) && (g.sets[1] == true) && (g.sets[2] == true)) return 3;
else return 99;
}
// should not need locks- called within locked statement
private void resets()
{
if (g.first) { Console.WriteLine("FIRST!!"); g.first = false; }
else Console.WriteLine("Cycle has reset...");
}
private bool getStatus()
{
bool x = false;
Monitor.Enter(myKey);
{
try
{
x = g.gDone;
}
finally
{
Monitor.PulseAll(myKey);
Monitor.Exit(myKey);
}
}
return x;
}
#endregion
#region Constructors
public void paramThreadStarter(object starter)
{
Int16 i = Convert.ToInt16(starter);
setName(i);
do
{
switch (i)
{
default: throw new Exception();
case 0:
changeBool(i);
break;
case 1:
changeInt(i);
break;
case 2:
printString(i);
break;
}
} while (!getStatus());
Console.WriteLine("fin");
Console.ReadLine();
}
#endregion
}
So I have a few questions. The first- is it better to have my global class set like this? Or should I be using a static class with properties and altering them that way? Next question is, when this runs, at random one of the threads will run, pulse/exit the lock, and then step right back in (sometimes like 5-10 times before the next thread picks up the lock). Why does this happen?

Each thread is given a certain amount of CPU time, I doubt that one particular thread is getting more actual CPU time over the others if you are locking all the calls in the same fashion and the thread priorities are the same among the threads.
Regarding how you use your global class, it doesn't really matter. The way you are using it wouldn't change it one way or the other. Your use of globals was to test thread safety, so when multiple threads are trying to change shared properties all that matters is that you enforce thread safety.
Pulse might be a better option knowing that only one thread can actually enter, pulseAll is appropriate when you lock something because you have a task to do, once that task is complete and won't lock the very next time. In your scenario you lock every time so doing a pulseAll is just going to waste cpu because you know that it will be locked for the next request.
Common example of when to use static classes and why you must make them thread safe:
public static class StoreManager
{
private static Dictionary<string,DataStore> _cache = new Dictionary<string,DataStore>(StringComparer.OrdinalIgnoreCase);
private static object _syncRoot = new object();
public static DataStore Get(string storeName)
{
//this method will look for the cached DataStore, if it doesn't
//find it in cache it will load from DB.
//The thread safety issue scenario to imagine is, what if 2 or more requests for
//the same storename come in? You must make sure that only 1 thread goes to the
//the DB and all the rest wait...
//check to see if a DataStore for storeName is in the dictionary
if ( _cache.ContainsKey( storeName) == false )
{
//only threads requesting unknown DataStores enter here...
//now serialize access so only 1 thread at a time can do this...
lock(_syncRoot)
{
if (_cache.ContainsKey(storeName) == false )
{
//only 1 thread will ever create a DataStore for storeName
DataStore ds = DataStoreManager.Get(storeName); //some code here goes to DB and gets a DataStore
_cache.Add(storeName,ds);
}
}
}
return _cache[storeName];
}
}
What's really important to see is that the Get method only single threads the call when there is no DataStore for the storeName.
Double-Check-Lock:
You can see the first lock() happens after an if, so imagine 3 threads simultaneously run the if ( _cache.ContainsKey(storeName) .., now all 3 threads enter the if. Now we lock so that only 1 thread can enter, now we do the same exact if statement, only the very first thread that gets here will actually pass this if statement and get the DataStore. Once the first thread .Add's the DataStore and exits the lock the other 2 threads will fail the second check (double check).
From that point on any request for that storeName will get the cached instance.
So we single threaded our application only in the spots that required it.

CyclicBarrier code not working?

I got CyclicBarrier code from oracle page to understand it more. I modified it and now having one doubt.
Below code doesn't terminate but If I uncomment Thread.sleep condition, It works fine.
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
class Solver {
final int N;
final float[][] data;
boolean done = false;
final CyclicBarrier barrier;
class Worker implements Runnable {
int myRow;
Worker(int row) {
myRow = row;
}
public void run() {
while (!done) {
processRow(myRow);
try {
barrier.await();
} catch (InterruptedException ex) {
return;
} catch (BrokenBarrierException ex) {
return;
}
}
System.out.println("Run finish for " + Thread.currentThread().getName());
}
private void processRow(int row) {
float[] rowData = data[row];
for (int i = 0; i < rowData.length; i++) {
rowData[i] = 1;
}
/*try {
Thread.sleep(2000);
} catch (InterruptedException e) {
e.printStackTrace();
}*/
done = true;
}
}
public Solver(float[][] matrix) {
data = matrix;
N = matrix.length;
barrier = new CyclicBarrier(N, new Runnable() {
public void run() {
for (int i = 0; i < data.length; i++) {
System.out.println("Data " + Arrays.toString(data[i]));
}
System.out.println("Completed:");
}
});
for (int i = 0; i < N; ++i)
new Thread(new Worker(i), "Thread "+ i).start();
}
}
public class CyclicBarrierTest {
public static void main(String[] args) {
float[][] matrix = new float[5][5];
Solver solver = new Solver(matrix);
}
}
Why Thread.sleep is required in above code?

I've not run your code but there may be a race condition, here is a scenario that reveals it:
you start the first thread, it runs during a certain amount of time sufficient for it to finish the processRow method call so it sets done to true and then waits on the barrier,
the other threads start but they see that all is "done" so they don't enter the loop and they'll never wait on the barrier, and end directly
the barrier will never be activated as only one of the N threads has reached it
deadlock
Why it is working with the sleep:
when one of the thread starts to sleep it lets the other threads work before marking the work as "done"
the other threads have enough time to work and can themselves reach the barrier
2 seconds is largely enough for 5 threads to end a processing that should not last longer than 10ms
But note that if your system is ovrerloaded it could too deadlock:
the first thread starts to sleep
the OS scheduler lets another application work during more than 2 seconds
the OS scheduler comes back to your application and the threads scheduler chooses the first thread again and lets it terminate, setting done to true
and here again the first scenario => deadlock too
And a possible solution (sorry not tested):
change your while loops for do/while loops:
do
{
processRow(myRow);
...
}
while (!done);

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Should I use CRITICAL_SECTION or not? - multithreading

Related

Getting value from thread running in while loop

unordered_map insertion is creating bottleneck

Performance testing: One core vs multiple

Multithreading and Monitoring

CyclicBarrier code not working?

Categories

Resources