boost::thread not updating global variable - multithreading

I am using a wrapper function in an external application to start a new thread that updates a global variable, yet the update seems invisible to the main thread. I can't call join(), because that would block the main thread and crash the application. boost::async, boost::thread and boost::packaged_task all behave the same way.
uint32 *Dval;
bool hosttask1()
{
while(*Dval<10)
{
++*Dval;
PlugIn::gResultOut << " within thread global value: " << *Dval << std::endl;
Sleep(500);
}
return false;
}
void SU_HostThread1(uint32 *value)
{
Dval = value;
*Dval = 2;
PlugIn::gResultOut << " before thread: " << *value << " before thread global: " << *Dval << std::endl;
auto myFuture = boost::async(boost::launch::async,&hosttask1);
//boost::thread thread21 = boost::thread(&hosttask1);
//boost::packaged_task<bool> pt(&hosttask1);
//boost::thread thread21 = boost::thread(boost::move(pt));
}
When I call the function:
number a=0
su_hostthread1(a)
sleep(2) //seconds
result(" function returned "+a+" \n")
OUTPUT:
before thread value: 2 before thread global value: 2
within thread global value: 3
within thread global value: 4
within thread global value: 5
within thread global value: 6
function returned 2
within thread global value: 7
within thread global value: 8
within thread global value: 9
within thread global value: 10
Any ideas?
Thanks in advance!

If you share data between threads, you must synchronize access to that data. The two possible ways are a mutex protecting the data and atomic operations. The reason is that caches and read/write reordering (by both the CPU and the compiler) exist. This is a complex topic that can't be fully explained in an answer here, but there are a few good books out there and plenty of code that gets it right.
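As a minimal sketch of the mutex option, assuming standard C++11 threads rather than Boost: the names g_mutex, g_value, worker, read_value and run_demo are made up for illustration and do not come from the question. Both the writer and the reader take the same lock, so the reader never works with a stale or half-written value.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical names for illustration; none of them come from the question.
std::mutex g_mutex;          // protects g_value
unsigned long g_value = 0;   // shared between the worker and the caller

// Worker: increments the shared value `steps` times, locking around each write.
void worker(unsigned long steps)
{
    for (unsigned long i = 0; i < steps; ++i)
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        ++g_value;
    }
}

// Reader: takes the same lock before reading the shared value.
unsigned long read_value()
{
    std::lock_guard<std::mutex> lock(g_mutex);
    return g_value;
}

// Starts the worker, polls the value while the worker runs, then joins.
unsigned long run_demo(unsigned long steps)
{
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_value = 0;
    }
    std::thread t(worker, steps);
    while (read_value() < steps)   // the caller observes progress without join()
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    t.join();
    const unsigned long result = read_value();
    assert(result == steps);       // every increment is visible after join()
    return result;
}
```

The polling loop is only there to show that the caller can observe the updates while the worker is still running; the thread still has to be joined (or detached) at some point before its std::thread object is destroyed.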

The following code correctly reproduces what I intend to do. Namely, the thread updates a global variable, and the main thread correctly observes the update.
#include "stdafx.h"
#include <iostream>
#include <boost/thread.hpp>
#include <boost/chrono.hpp>
unsigned long *dataR;
bool hosttask1()
{
bool done = false;
std::cout << "In thread global value: " << *dataR << "\n"; //*value11 << *dataL <<
unsigned long cc = 0;
boost::mutex m;
while (!done)
{
m.lock();
*dataR = cc;
m.unlock();
cc++;
std::cout << "In thread loop global value: "<< *dataR << "\n";
if (cc==5) done = true;
}
return done;
}
void SU_HostThread1(unsigned long *value)
{
dataR = value;
std::cout << "Before thread value: " << *value << " Before thread global value: " << *dataR << "\n"; //*value11 << *dataL <<
auto myFuture = boost::async(boost::launch::async, &hosttask1);
return;
}
int main()
{
unsigned long value =1;
unsigned long *value11;
value11 = &value;
SU_HostThread1(value11);
boost::this_thread::sleep(boost::posix_time::seconds(1));
std::cout << "done with end value: " << *value11 << "\n";
return 0;
}
output:
Before thread value: 1 Before thread global value: 1
In thread global value: 1
In thread loop global value: 0
In thread loop global value: 1
In thread loop global value: 2
In thread loop global value: 3
In thread loop global value: 4
done with end value: 4
Yet when I copy this exactly into the SDK of the external software, the main thread does not see the updated global value. Any ideas why?
Thanks
output in external software:
before thread value: 1 before thread global value: 1
In thread global value: 1
In thread loop global value: 0
In thread loop global value: 1
In thread loop global value: 2
In thread loop global value: 3
In thread loop global value: 4
done with end value: 1

Likely this is because the compiler doesn't generally think about multithreading when optimising your code. It has seen that your code checks a value repeatedly, and it knows that in a single thread that value cannot change, so it just omits the check.
If you declare the variable as volatile, it will probably generate less efficient code that re-reads the value each time. Note, though, that volatile on its own does not synchronise threads in standard C++.
Of course you also have to understand that when a value is written, there are circumstances where it may not all be written in one go, so if you are unlucky enough to read it back when it is half-written, you get a garbage value. The fix for that is to declare it as std::atomic (which the optimiser treats conservatively, much as it would a volatile), and then more complex code is emitted to ensure that the write and the read cannot overlap (or dedicated processor primitives may be used for small objects such as integers).
Most variables are not shared between threads, and when they are, it is up to the programmer to identify them and balance optimisation against thread synchronisation needs during design.
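To make the std::atomic suggestion concrete, here is a small sketch using standard C++11 (not Boost). The names g_counter, g_done, count_to and wait_for_worker are hypothetical. The waiting loop reads an atomic flag, so it observes the worker's store instead of spinning on a cached copy.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
std::atomic<unsigned long> g_counter{0};   // value updated by the worker
std::atomic<bool> g_done{false};           // set once the worker has finished

// Worker: bumps the counter `limit` times, then signals completion.
void count_to(unsigned long limit)
{
    for (unsigned long i = 0; i < limit; ++i)
        g_counter.fetch_add(1);
    g_done.store(true);
}

// Caller: waits on the atomic flag, then reads the final counter value.
unsigned long wait_for_worker(unsigned long limit)
{
    g_counter.store(0);
    g_done.store(false);
    std::thread t(count_to, limit);
    while (!g_done.load())            // atomic load on every iteration
        std::this_thread::yield();
    t.join();
    assert(g_done.load());
    return g_counter.load();
}
```

With a plain bool and a plain unsigned long in place of the atomics, the same loop would be a data race, and the compiler would be free to read the flag only once.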

Related

Dekker Mutual-Exclusion

I'm studying operating systems, in particular Dekker's third attempt at mutual exclusion.
I wrote my code in C++ in Visual Studio (see below), but I wonder how the two threads can still be in the critical section at the same time.
The output of the program is below.
#include<iostream>
#include<conio.h>
#include<thread>
using namespace std;
bool flag0 = false;
bool flag1 = false;
void p1()
{
flag0 = true;
while (flag1);
for (int i = 1; i <= 10; i++)
cout << "p1 : " << i << endl;
flag0 = false;
}
void p2()
{
flag1 = true;
while (flag0);
for (int i = -1; i >= -10; i--)
cout << "p2 : " << i << endl;
flag1 = false;
}
int main()
{
thread t1(p1);
thread t2(p2);
t1.join();
t2.join();
_getch();
return 0;
}
Output:
p1 : p2 : -11
p2 : p1 : 2
p1 : 3
p1 : 4
p1 : 5
p1 : 6
-2
p2 : -3
p2 : -4
p2 : -5
p2 : -6
p2 : -7
p2 : -8
p2 : -9
p2 : p1 : -107
p1 : 8
p1 : 9
p1 : 10
" I wonder how these two threads are still in the critical area at the same time"
You've got two threads accessing the same variables, without any form of synchronization. In C++, that is a form of Undefined Behavior. Undefined Behavior means that anything can happen. "Two threads in the same critical area at the same time" is not even remotely surprising.
The problem is that your flags are ordinary bool variables, which results in a data race and therefore UB. That basically means that all bets are off about how your program behaves! For example, the compiler could hoist the load out of the while-loop, effectively transforming the loop into an infinite loop. But infinite loops without side effects are also UB, so the compiler is well within its rights to remove the loop entirely!
But even if the compiler does not perform these optimizations, the code still does not guarantee mutual exclusion, because the flag operations are not sequentially consistent. Essentially what can happen is that both threads store true in their respective flag, but it is not guaranteed that this updated value is visible to the other thread. So it can happen that the subsequent load operations in the while-loop still return false for both threads.
To get the desired behavior, the operations on the flags need to be sequentially consistent, which means that all such operations have a single total order. To achieve that you have to define your flags as std::atomic<bool>. All operations on atomics are sequentially consistent by default (unless specified otherwise).
However, since this is the third attempt by Dekker and not the final (correct) version, it does provide mutual exclusion (under a sequentially consistent memory model), but is prone to deadlocks!
For more details you should familiarize yourself with the C++ memory model. I can recommend the paper Memory Models for C/C++ Programmers which I have co-authored.
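As an illustration of the std::atomic<bool> advice, here is a minimal sketch. Note that it is not Dekker's third attempt: because that version can deadlock, the sketch swaps in Peterson's algorithm, which adds a turn variable. All names apart from flag0 and flag1 are made up for the example, and a plain counter stands in for the critical section. Every atomic operation uses the default sequentially consistent ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> flag0{false};   // thread 0 wants to enter
std::atomic<bool> flag1{false};   // thread 1 wants to enter
std::atomic<int> turn{0};         // whose turn it is to go first (Peterson's addition)
long shared_counter = 0;          // plain variable, touched only inside the critical section

// Entry protocol: announce interest, yield priority, wait while the other thread has it.
void enter(std::atomic<bool>& mine, std::atomic<bool>& theirs, int other_id)
{
    mine.store(true);
    turn.store(other_id);
    while (theirs.load() && turn.load() == other_id)
        std::this_thread::yield();
}

// Exit protocol: withdraw interest.
void leave(std::atomic<bool>& mine)
{
    mine.store(false);
}

// Each thread increments the counter `iterations` times inside the critical section.
void p(int id, int iterations)
{
    assert(id == 0 || id == 1);
    std::atomic<bool>& mine = (id == 0) ? flag0 : flag1;
    std::atomic<bool>& theirs = (id == 0) ? flag1 : flag0;
    for (int i = 0; i < iterations; ++i)
    {
        enter(mine, theirs, 1 - id);
        ++shared_counter;             // critical section
        leave(mine);
    }
}

// Runs both threads and returns the final counter value.
long run_two_threads(int iterations)
{
    shared_counter = 0;
    std::thread t0(p, 0, iterations);
    std::thread t1(p, 1, iterations);
    t0.join();
    t1.join();
    return shared_counter;
}
```

If mutual exclusion holds, no increment is lost, so the result is exactly twice the iteration count. In real code a std::mutex is the simpler choice; the sketch only shows what the flags need in order to be well defined.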
#include<iostream>
#include<conio.h>
#include<thread>
using namespace std;
bool turnop1 = false;
void p1()
{
for (int i = 1; i <= 10; i++)
{
while (turnop1 == false);
cout << "p1 : " << i << endl;
turnop1 = false;
}
}
void p2()
{
for (int t = -1; t >= -10; t--)
{
while (turnop1 == true);
cout << "p2 : " << t << endl;
turnop1 = true;
}
}
int main()
{
thread t1(p1);
thread t2(p2);
t2.join();
t1.join();
_getch();
return 0;
}
I modified your code and it worked for me. Thanks! I know this is not completely right, but this version of Dekker's idea is enough for a simple explanation.

read variable value in main from thread in c++

I need a thread that executes a function in a while loop (say, incrementing an int value). In main I need a while loop that executes some function (say, a for loop that counts from 0 to 5) and then reads the current value of a variable in the thread. The thread must keep running its own while loop irrespective of what's going on in main. However, the value of the thread variable must not change while main reads it.
I guess this problem can be solved using atomic. However, this is a toy problem in which the variable in the thread is an int. In my actual problem the thread variable is of type Eigen::quaternionf or float[4], so I need to ensure that the entire Eigen::quaternionf or float[4] is held constant while it is read from main.
The cout in the thread is only for debugging. Once the code is thread-safe, it can be removed. I read in another post that using cout in a thread-safe manner may require writing a wrapper around cout with a mutex. I want to avoid that.
My main concern is reading the variable in the correct order in main.
My code fails (today is my first day with multithreading); it is shown below along with selected parts of the observed output. The code fails to keep the order of the cout output (it is garbled). I am also not sure that the thread variable is correctly read by main.
#include <thread>
#include <mutex>
#include <iostream>
int i = 0;
void safe_increment(std::mutex& i_mutex)
{
while(1)
{
std::lock_guard<std::mutex> lock(i_mutex);
++i;
std::cout << "thread: "<< std::this_thread::get_id() << ", i=" << i << '\n';
}
}
int main()
{
std::mutex i_mutex;
std::thread t1(safe_increment, std::ref(i_mutex));
while(1)
{
for(int k =0; k < 5; k++)
{
std::cout << "main: k =" << k << '\n';
}
std::lock_guard<std::mutex> lock(i_mutex);
std::cout << "main: i=" << i << '\n';
}
}
The output(selected parts) I get is
thread: 139711042705152, i=223893
thread: 139711042705152, i=223894
thread: 139711042705152, i=223895
main: i=223895
main: k =0
thread: main: k =1139711042705152
main: k =2
main: k =3
, i=main: k =4
223896
thread: 139711042705152, i=223897
thread: 139711042705152, i=223898
thread: 139711042705152, i=224801
thread: 139711042705152, i=224802
main: i=224802
main: k =0
main: k =1
thread: main: k =2
main: k =3
main: k =4
139711042705152, i=224803
thread: 139711042705152, i=224804
thread: 139711042705152, i=224805
i is properly synchronized with the mutex. Well done! Obviously this runs until you force it to stop, so when you do find a better way to end execution, be sure to join your thread.
To fix the garbling, you need to synchronize on std::cout:
int main()
{
std::mutex i_mutex;
std::thread t1(safe_increment, std::ref(i_mutex));
while(1)
{
std::lock_guard<std::mutex> lock(i_mutex);//moved up here to sync on std::cout << k
for(int k =0; k < 5; k++)
{
std::cout << "main: k =" << k << '\n';
}
std::cout << "main: i=" << i << '\n';
if (i > 100) break;
}
t1.join(); //thread will continue and main will wait until it is done
//your thread needs to have some way out of its while(1) as well.
}
The thread could then look like this:
void safe_increment(std::mutex& i_mutex)
{
while(1)
{
std::lock_guard<std::mutex> lock(i_mutex);
++i;
std::cout << "thread: "<< std::this_thread::get_id() << ", i=" << i << '\n';
if (i > 111) break;
}
}
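For the float[4] case mentioned in the question, one possible sketch is to copy the whole value while holding the lock and then work on the copy. The names q, q_mutex, stop, updater, snapshot and run_reader are made up for illustration, and std::array<float, 4> stands in for float[4] or a quaternion type. An atomic stop flag gives the thread a way out of its loop so that it can be joined.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical names; std::array<float, 4> stands in for float[4] or a quaternion.
std::mutex q_mutex;                       // protects q
std::array<float, 4> q = {{0, 0, 0, 0}};  // value written by the thread
std::atomic<bool> stop{false};            // tells the thread to leave its loop

// Thread body: keeps writing all four components together under the lock.
void updater()
{
    float v = 0.0f;
    while (!stop.load())
    {
        v += 1.0f;
        {
            std::lock_guard<std::mutex> lock(q_mutex);
            q.fill(v);
        }
        // Brief pause outside the lock so the reader gets a chance to take it.
        std::this_thread::sleep_for(std::chrono::microseconds(10));
    }
}

// Copies the whole value under the lock; the caller then works on the copy.
std::array<float, 4> snapshot()
{
    std::lock_guard<std::mutex> lock(q_mutex);
    return q;
}

// Takes `reads` snapshots and reports whether each one was internally consistent.
bool run_reader(int reads)
{
    assert(reads > 0);
    stop.store(false);
    std::thread t(updater);
    bool consistent = true;
    for (int r = 0; r < reads; ++r)
    {
        const std::array<float, 4> s = snapshot();
        if (s[0] != s[1] || s[1] != s[2] || s[2] != s[3])
            consistent = false;
    }
    stop.store(true);
    t.join();
    return consistent;
}
```

Because main only holds the lock for the duration of the copy, the thread is blocked for a very short time, and printing or other slow work can be done on the snapshot without holding the lock.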

What is the difference between two join statements in the code?

In the code below, there are two joins (one of them is commented out). I would like to know the difference between
join being executed before the loop and join being executed after the loop.
#include <iostream>
#include <thread>
using namespace std;
void ThreadFunction();
int main()
{
thread ThreadFunctionObj(ThreadFunction);
//ThreadFunctionObj.join();
for (int j=0;j<10;++j)
{
cout << "\tj = " << j << endl;
}
ThreadFunctionObj.join();
return 0;
}
void ThreadFunction()
{
for (int i=0;i<10;++i)
{
cout << "i = " << i << endl;
}
}
A join() on a thread waits for it to finish execution; your code doesn't continue until the thread is done. As such, calling join() right after starting a new thread defeats the purpose of multithreading, as it is the same as executing the two for loops serially. Calling join() after the loop in main() lets both for loops run concurrently, and at the end of the for loop in main() you wait for the ThreadFunction() loop to finish too. This is the equivalent of you and a friend going out to eat: you both start eating at roughly the same time, but the first one to finish still has to wait for the other (it might not be the best example, but I hope it does the job).
Hope it helps
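The difference can be made visible with a small sketch that records events in a shared log instead of printing them. The names event_log, log_mutex, record, thread_function and run_with_join are hypothetical. With join_first set to true (the commented-out join in the question), every worker entry is recorded before any main entry; with false, the two loops may interleave in any order.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical names for illustration only.
std::mutex log_mutex;                  // protects event_log
std::vector<std::string> event_log;    // order in which the loop iterations ran

// Appends one entry to the shared log.
void record(const std::string& who)
{
    std::lock_guard<std::mutex> lock(log_mutex);
    event_log.push_back(who);
}

// Stand-in for ThreadFunction() from the question.
void thread_function()
{
    for (int i = 0; i < 10; ++i)
        record("worker");
}

// join_first == true mimics the commented-out join() before the loop.
std::vector<std::string> run_with_join(bool join_first)
{
    {
        std::lock_guard<std::mutex> lock(log_mutex);
        event_log.clear();
    }
    std::thread t(thread_function);
    if (join_first)
        t.join();                      // main waits here, so the loops run serially
    for (int j = 0; j < 10; ++j)
        record("main");
    if (!join_first)
        t.join();                      // the loops may overlap; main waits only at the end
    assert(!t.joinable());             // the thread has been joined exactly once
    return event_log;                  // safe: no other thread is running any more
}
```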

How to suspend all other threads inside a separate class function c++

I am working on a final project for a class. The project is to mimic multiple ATMs, and my program already runs. Inside my main.cpp I create the threads, just two for now and maybe more later. They call a member function of the class Begin that uses rand() to decide whether a customer makes a deposit or a withdrawal, uses rand() again to pick the amount, and does this 5 times.
#include "ATM.h"
void main()
{
Begin test1;
test1.manager();
thread first(&Begin::atm, test1);
thread second(&Begin::atm, test1);
first.join();
second.join();
delete resbox::cashbox;
system("pause");
}
I cannot figure out how to suspend my threads created in Main.cpp inside of my observe() function like so:
void watcher::observe()
{
float cash;
if (resbox::cashbox->gettotal() >= resbox::cashbox->getmax())
{
//suspend all other threads
cout << "Please empty cash box it is full! with $"<< resbox::cashbox->gettotal() << endl;
cout << "How much would like to withdraw?" << endl;
cin >> cash;
resbox::cashbox->cashwd(cash);
cout << "This is the amount in the reserve box now is $" << resbox::cashbox->gettotal() << endl;
//resume all other threads
}
if (resbox::cashbox->gettotal() <= 500)
{
//suspend all other threads
cout << "Please fill cashbox it is low, has $" << resbox::cashbox->gettotal() << endl;
cout << "How much would like to add?" << endl;
cin >> cash;
resbox::cashbox->cashdp(cash);
cout << "This is the amount in the reserve box now $" << resbox::cashbox->gettotal() << endl;
//resume all other threads
}
}
Whenever the condition of one of the if statements is met, I need to suspend all threads except the current one that met the condition. Then, once the data has been handled and before leaving the if statement and the observe() function, I need to resume all the other threads.
I read about the possibility of using SuspendThread and ResumeThread in "how to suspend thread", yet I am having a hard time passing the threads created in main.cpp to the observe() function so that I could call those functions. I figured out how to create threads from cplusplus.com. I also noticed I could potentially use mutex locking, as referred to in "What is the best solution to pause and resume pthreads?"
I am using c++ under Microsoft Visual Studio 2015 Community.
This is my first time dealing with threads. For my use, which is better: passing the created threads to the observe() function, or is there another way to pause/suspend and then resume them, and how would I do so? Thank you for any advice/help provided.
Currently, if I run my program and one of the conditions is met by a thread, the other thread also meets the same condition, and I have to enter the amount to deposit/withdraw twice before the threads continue, until each thread has dealt with 5 customers for a total of 10 customers.
I finally figured out what I needed and what to use, thanks to:
Class RWLock
I added this class to my project and created a global instance of it.
Then I added the reader and writer locks and unlocks where they fit best in my code, like so:
void Begin::atm() //The main function that makes it easier for threads to call and run the program.
{
ATM atm;
int choice, amount;
LARGE_INTEGER cicles;
QueryPerformanceCounter(&cicles);
srand(cicles.QuadPart);
for (int i = 0; i < imax; i++) //mimics a total of 5 customers
{
rw.ReadLock(); //Have to place to read lock here.
choice = rand() % 2; //Randomizes the choice of depositing or withdrawing.
amount = rand() % 5000 + 1; //Randomizes the amount of cash that the customers use.
rw.ReadUnlock(); //Read unlock must happen here otherwise it blocks the writers.
rw.WriteLock(); //Must happen here!
if (choice == 0)
{
atm.cashdp(amount);
cout << "\tCustomer depositing $" << amount << endl;
}
else if (choice == 1)
{
atm.cashwd(amount);
cout << "\tCustomer withdrawing $" << amount << endl;
}
else
//error checker against the randomizer for the choice of depositing or withdrawing.
cout << "error rand creating wrong number" << endl;
rw.WriteUnlock(); //Must Happen here!
Sleep(5000); // Sleeps the program between customer usage to mimic actual use.
}
}
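A different way to get the "everyone else waits" effect, sketched here with a plain std::mutex rather than the RWLock class used above: if the limit check and the refill/empty step happen while the lock is held, any other thread that wants to touch the cash box simply blocks until the lock is released, so nothing needs to be suspended explicitly. CashBox, atm and run_atms are hypothetical stand-ins, and resetting the total takes the place of the operator prompt.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the question's cash box.
class CashBox
{
public:
    explicit CashBox(long max) : max_(max) {}

    // Deposits and, if the box is full, empties it while still holding the lock.
    void deposit(long amount)
    {
        assert(amount > 0);
        std::lock_guard<std::mutex> lock(m_);
        total_ += amount;
        if (total_ >= max_)
        {
            // Stands in for the operator prompt; other threads wait on m_ meanwhile.
            total_ = 0;
            ++times_emptied_;
        }
    }

    long total() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return total_;
    }

    int times_emptied() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return times_emptied_;
    }

private:
    mutable std::mutex m_;    // protects every member below
    long total_ = 0;
    long max_;
    int times_emptied_ = 0;
};

// One ATM thread: each customer deposits a fixed amount.
void atm(CashBox& box, int customers)
{
    for (int i = 0; i < customers; ++i)
        box.deposit(100);
}

// Runs two ATM threads against the same cash box and waits for both.
void run_atms(CashBox& box, int customers_per_atm)
{
    std::thread first(atm, std::ref(box), customers_per_atm);
    std::thread second(atm, std::ref(box), customers_per_atm);
    first.join();
    second.join();
}
```

Because the check and the emptying step are one locked operation, only the thread that actually fills the box handles it, which avoids the "prompted twice" behaviour described in the question.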

std::async performance on Windows and Solaris 10

I'm running a simple threaded test program on both a Windows machine (compiled using MSVS2015) and a server running Solaris 10 (compiled using GCC 4.9.3). On Windows I'm getting significant performance increases from increasing the threads from 1 to the amount of cores available; however, the very same code does not see any performance gains at all on Solaris 10.
The Windows machine has 4 cores (8 logical) and the Unix machine has 8 cores (16 logical).
What could be the cause for this? I'm compiling with -pthread, and it is creating threads since it prints all the "S"es before the first "F". I don't have root access on the Solaris machine, and from what I can see there's no installed tool which I can use to view a process' affinity.
Example code:
#include <iostream>
#include <vector>
#include <future>
#include <random>
#include <chrono>
std::default_random_engine gen(std::chrono::system_clock::now().time_since_epoch().count());
std::normal_distribution<double> randn(0.0, 1.0);
double generate_randn(uint64_t iterations)
{
// Print "S" when a thread starts
std::cout << "S";
std::cout.flush();
double rvalue = 0;
for (int i = 0; i < iterations; i++)
{
rvalue += randn(gen);
}
// Print "F" when a thread finishes
std::cout << "F";
std::cout.flush();
return rvalue/iterations;
}
int main(int argc, char *argv[])
{
if (argc < 2)
return 0;
uint64_t count = 100000000;
uint32_t threads = std::atoi(argv[1]);
double total = 0;
std::vector<std::future<double>> futures;
std::chrono::high_resolution_clock::time_point t1;
std::chrono::high_resolution_clock::time_point t2;
// Start timing
t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threads; i++)
{
// Start async tasks
futures.push_back(std::async(std::launch::async, generate_randn, count/threads));
}
for (auto &future : futures)
{
// Wait for tasks to finish
future.wait();
total += future.get();
}
// End timing
t2 = std::chrono::high_resolution_clock::now();
// Take the average of the threads' results
total /= threads;
std::cout << std::endl;
std::cout << total << std::endl;
std::cout << "Finished in " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << " ms" << std::endl;
}
As a general rule, classes defined by the C++ standard library do not have any internal locking. Modifying an instance of a standard library class from more than one thread, or reading it from one thread while writing it from another, is undefined behavior, unless "objects of that type are explicitly specified as being sharable without data races". (N3337, sections 17.6.4.10 and 17.6.5.9.) The RNG classes are not "explicitly specified as being sharable without data races". (cout is an example of a stdlib object that is "sharable without data races", as long as you haven't done ios::sync_with_stdio(false).)
As such, your program is incorrect because it accesses a global RNG object from more than one thread simultaneously; every time you request another random number, the internal state of the generator is modified. On Solaris, this seems to result in serialization of accesses, whereas on Windows it is probably instead causing you not to get properly "random" numbers.
The cure is to create separate RNGs for each thread. Then each thread will operate independently, and they will neither slow each other down nor step on each other's toes. This is a special case of a very general principle: multithreading always works better the less shared data there is.
There's an additional wrinkle to worry about: each thread will call system_clock::now at very nearly the same time, so you may end up with some of the per-thread RNGs seeded with the same value. It would be better to seed them all from a random_device object. random_device requests random numbers from the operating system, and does not need to be seeded; but it can be very slow. The random_device should be created and used inside main, and seeds passed to each worker function, because a global random_device accessed from multiple threads (as in the previous edition of this answer) is just as undefined as a global default_random_engine.
All told, your program should look something like this:
#include <iostream>
#include <vector>
#include <future>
#include <random>
#include <chrono>
static double generate_randn(uint64_t iterations, unsigned int seed)
{
// Print "S" when a thread starts
std::cout << "S";
std::cout.flush();
std::default_random_engine gen(seed);
std::normal_distribution<double> randn(0.0, 1.0);
double rvalue = 0;
for (int i = 0; i < iterations; i++)
{
rvalue += randn(gen);
}
// Print "F" when a thread finishes
std::cout << "F";
std::cout.flush();
return rvalue/iterations;
}
int main(int argc, char *argv[])
{
if (argc < 2)
return 0;
uint64_t count = 100000000;
uint32_t threads = std::atoi(argv[1]);
double total = 0;
std::vector<std::future<double>> futures;
std::chrono::high_resolution_clock::time_point t1;
std::chrono::high_resolution_clock::time_point t2;
std::random_device make_seed;
// Start timing
t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threads; i++)
{
// Start async tasks
futures.push_back(std::async(std::launch::async,
generate_randn,
count/threads,
make_seed()));
}
for (auto &future : futures)
{
// Wait for tasks to finish
future.wait();
total += future.get();
}
// End timing
t2 = std::chrono::high_resolution_clock::now();
// Take the average of the threads' results
total /= threads;
std::cout << '\n' << total
<< "\nFinished in "
<< std::chrono::duration_cast<
std::chrono::milliseconds>(t2 - t1).count()
<< " ms\n";
}
(This isn't really an answer, but it won't fit into a comment, especially with the command formatting and links.)
You can profile your executable on Solaris using Solaris Studio's collect utility. On Solaris, that will be able to show you where your threads are contending.
collect -d /tmp -p high -s all app [app args]
Then view the results using the analyzer utility:
analyzer /tmp/test.1.er &
Replace /tmp/test.1.er with the path to the output generated by a collect profile run.
If your threads are contending over some resource(s), as @zwol posted in his answer, you will see it.
Oracle marketing brief for the toolset can be found here: http://www.oracle.com/technetwork/server-storage/solarisstudio/documentation/o11-151-perf-analyzer-brief-1405338.pdf
You can also try compiling your code with Solaris Studio for more data.
