I need a thread which executes a function in a while loop (say, incrementing an int value). In main I need another while loop which executes some function (say, a for loop that counts from 0 to 5) and then reads the current value of the variable owned by the thread. The thread must keep running its own while loop irrespective of what's going on in main. However, the value of the thread's variable must not change while main reads it.
I guess this problem could be solved with an atomic. However, this is a toy problem in which the thread's variable is an int; in my actual problem the variable is of type Eigen::Quaternionf or float[4], so I need to ensure that the entire Eigen::Quaternionf or float[4] is held constant while it is read from main.
The cout in the thread is only for debugging; once the code is thread safe it can be removed. I read in another post that using cout in a thread-safe manner may require writing a wrapper around cout with a mutex, which I want to avoid.
My main concern is that main reads a correct, consistent value of the variable.
My code fails (today is my first day with multithreading) and is shown below along with selected parts of the observed output. The code fails to keep the cout output in order (it comes out garbled), and I am also not sure that the thread's variable is correctly read by main.
#include <thread>
#include <mutex>
#include <iostream>

int i = 0;

void safe_increment(std::mutex& i_mutex)
{
    while (1)
    {
        std::lock_guard<std::mutex> lock(i_mutex);
        ++i;
        std::cout << "thread: " << std::this_thread::get_id() << ", i=" << i << '\n';
    }
}

int main()
{
    std::mutex i_mutex;
    std::thread t1(safe_increment, std::ref(i_mutex));

    while (1)
    {
        for (int k = 0; k < 5; k++)
        {
            std::cout << "main: k =" << k << '\n';
        }
        std::lock_guard<std::mutex> lock(i_mutex);
        std::cout << "main: i=" << i << '\n';
    }
}
The output (selected parts) I get is:
thread: 139711042705152, i=223893
thread: 139711042705152, i=223894
thread: 139711042705152, i=223895
main: i=223895
main: k =0
thread: main: k =1139711042705152
main: k =2
main: k =3
, i=main: k =4
223896
thread: 139711042705152, i=223897
thread: 139711042705152, i=223898
thread: 139711042705152, i=224801
thread: 139711042705152, i=224802
main: i=224802
main: k =0
main: k =1
thread: main: k =2
main: k =3
main: k =4
139711042705152, i=224803
thread: 139711042705152, i=224804
thread: 139711042705152, i=224805
i is properly synchronized with the mutex, well done! Obviously this runs until you force it to stop, so when you do add a better way to end execution, be sure to join your thread.
To fix the garbling, you need to synchronize on std::cout as well:
int main()
{
    std::mutex i_mutex;
    std::thread t1(safe_increment, std::ref(i_mutex));

    while (1)
    {
        std::lock_guard<std::mutex> lock(i_mutex); // moved up here so the lock also covers std::cout << k
        for (int k = 0; k < 5; k++)
        {
            std::cout << "main: k =" << k << '\n';
        }
        std::cout << "main: i=" << i << '\n';
        if (i > 100) break;
    }
    t1.join(); // the thread will continue and main will wait here until it is done
    // your thread needs to have some way out of its while(1) as well
}
The thread could then look like this:
void safe_increment(std::mutex& i_mutex)
{
    while (1)
    {
        std::lock_guard<std::mutex> lock(i_mutex);
        ++i;
        std::cout << "thread: " << std::this_thread::get_id() << ", i=" << i << '\n';
        if (i > 111) break;
    }
}
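For the actual use case mentioned in the question (an Eigen::Quaternionf or float[4] rather than an int), a single atomic will not cover all four components, but the same lock_guard pattern does: write all four components under the mutex, and copy them out under the mutex before using them. Below is a minimal sketch of that idea, my own illustration rather than code from the original answer, using a std::array<float, 4> stand-in and names I made up:

#include <array>
#include <atomic>
#include <cmath>
#include <iostream>
#include <mutex>
#include <thread>

std::array<float, 4> quat{1.0f, 0.0f, 0.0f, 0.0f}; // stand-in for Eigen::Quaternionf / float[4]
std::mutex quat_mutex;
std::atomic<bool> stop{false};

void update_quat()
{
    float t = 0.0f;
    while (!stop)
    {
        std::lock_guard<std::mutex> lock(quat_mutex);
        // All four components are written while the lock is held, so a reader
        // holding the same lock can never observe a half-updated value.
        quat = {std::cos(t), std::sin(t), 0.0f, 0.0f};
        t += 0.01f;
    }
}

int main()
{
    std::thread t1(update_quat);
    for (int iter = 0; iter < 5; ++iter)
    {
        std::array<float, 4> snapshot;
        {
            std::lock_guard<std::mutex> lock(quat_mutex);
            snapshot = quat; // copy under the lock, then work on the copy
        }
        std::cout << "main: w=" << snapshot[0] << " x=" << snapshot[1] << '\n';
    }
    stop = true;
    t1.join();
}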
Related
I'm studying operating systems and the program Dekker wrote for his third attempt at mutual exclusion.
I wrote my code in C++ in Visual Studio (shown below), but I wonder how these two threads can still be in the critical section at the same time?
The output of the program is below.
#include <iostream>
#include <conio.h>
#include <thread>
using namespace std;

bool flag0 = false;
bool flag1 = false;

void p1()
{
    flag0 = true;
    while (flag1);
    for (int i = 1; i <= 10; i++)
        cout << "p1 : " << i << endl;
    flag0 = false;
}

void p2()
{
    flag1 = true;
    while (flag0);
    for (int i = -1; i >= -10; i--)
        cout << "p2 : " << i << endl;
    flag1 = false;
}

int main()
{
    thread t1(p1);
    thread t2(p2);
    t1.join();
    t2.join();
    _getch();
    return 0;
}
Output:
p1 : p2 : -11
p2 : p1 : 2
p1 : 3
p1 : 4
p1 : 5
p1 : 6
-2
p2 : -3
p2 : -4
p2 : -5
p2 : -6
p2 : -7
p2 : -8
p2 : -9
p2 : p1 : -107
p1 : 8
p1 : 9
p1 : 10
" I wonder how these two threads are still in the critical area at the same time"
You've got two threads accessing the same variables, without any form of synchronization. In C++, that is a form of Undefined Behavior. Undefined Behavior means that anything can happen. "Two threads in the same critical area at the same time" is not even remotely surprising.
The problem is that your flags are ordinary bool variables, which results in a data race and therefore UB. That basically means that all bets are off about how your program behaves! For example, the compiler could hoist the load in the while-loop, effectively transforming the loop into an infinite loop. But infinite loops without side effects are also UB, so the compiler is well in its right to remove the loop entirely!
But even if the compiler does not perform these optimizations, the code still does not guarantee mutual exclusion, because the flag operations are not sequentially consistent. Essentially what can happen is that both threads store true in their respective flag, but it is not guaranteed that this updated value is visible to the other thread. So it can happen that the subsequent load operations in the while-loop still return false for both threads.
To get the desired behavior, the operations on the flags need to be sequentially consistent, which means that all such operations have a single total order. To achieve that you have to define your flags as std::atomic<bool>. All operations on atomics are sequentially consistent by default (unless specified otherwise).
However, since this is the third attempt by Dekker and not the final (correct) version, it does provide mutual exclusion (under a sequentially consistent memory model), but is prone to deadlocks!
For more details you should familiarize yourself with the C++ memory model. I can recommend the paper Memory Models for C/C++ Programmers which I have co-authored.
#include <iostream>
#include <conio.h>
#include <thread>
using namespace std;

bool turnop1 = false;

void p1()
{
    for (int i = 1; i <= 10; i++)
    {
        while (turnop1 == false);
        cout << "p1 : " << i << endl;
        turnop1 = false;
    }
}

void p2()
{
    for (int t = -1; t >= -10; t--)
    {
        while (turnop1 == true);
        cout << "p2 : " << t << endl;
        turnop1 = true;
    }
}

int main()
{
    thread t1(p1);
    thread t2(p2);
    t2.join();
    t1.join();
    _getch();
    return 0;
}
I modified your code and it worked for me. Thanks! I know this is not completely right, but this Dekker idea/example is enough for a simple explanation.
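To illustrate the std::atomic<bool> suggestion from the answer above, here is a minimal sketch (my own illustration, not code from either poster) of the third attempt with atomic flags. As the answer notes, it provides mutual exclusion under sequential consistency but can still deadlock if both threads raise their flag before either checks the other's:

#include <atomic>
#include <iostream>
#include <thread>

// Sequentially consistent by default, so the stores and the loads in the
// spin loops cannot be reordered or optimised away.
std::atomic<bool> flag0{false};
std::atomic<bool> flag1{false};

void p1()
{
    flag0 = true;
    while (flag1); // spin while the other thread's flag is raised
    for (int i = 1; i <= 10; i++)
        std::cout << "p1 : " << i << std::endl;
    flag0 = false;
}

void p2()
{
    flag1 = true;
    while (flag0);
    for (int i = -1; i >= -10; i--)
        std::cout << "p2 : " << i << std::endl;
    flag1 = false;
}

int main()
{
    std::thread t1(p1);
    std::thread t2(p2);
    t1.join();
    t2.join();
}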
I am trying to understand, and then write, some code that has to read from and write to many different files from the main loop of my application. I am hoping to use the C++11 threading model available in VS 2013.
I don't want to stall the main loop, so I am investigating spinning off a thread each time a request to read or write a file is generated.
I've tried many things, including std::async, which sounds promising. I boiled my code down to a simple example:
#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>

bool write_file(const std::string filename)
{
    std::cout << "write_file: filename is " << filename << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(2000));
    std::cout << "write_file: written" << std::endl;
    return true;
}

int main(int argc, char* argv[])
{
    const std::string filename = "foo.txt";
    auto write = std::async(std::launch::async, write_file, filename);
    while (true)
    {
        std::cout << "working..." << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::cout << "write result is " << write.get() << std::endl;
    }
}
I'm struggling to understand the basics, but my expectation would be that this code constantly prints "working..." with the write_file start and end messages interspersed in the output. Instead, I see that the write_file thread seems to block the main loop's output until its timer expires.
I realize I will also need to consider mutexes/locking around the code that actually writes the file, but I would like to understand this bit first.
Thank you if you can point me in the right direction.
Molly.
write.get() blocks until the async task has finished (and a future's result can only be retrieved once). You want to poll with wait_for() instead:
do {
    std::cout << "working...\n";
} while (write.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready);
std::cout << "write result is " << write.get() << "\n";
I am using a wrapper function in an external piece of software to start a new thread which updates a global variable, yet this update seems invisible to the main thread. I can't call join(), since that would block the main thread and crash the software. boost::async, boost::thread and boost::packaged_task all behave the same way.
uint32 *Dval;

bool hosttask1()
{
    while (*Dval < 10)
    {
        ++*Dval;
        PlugIn::gResultOut << " within thread global value: " << *Dval << std::endl;
        Sleep(500);
    }
    return false;
}

void SU_HostThread1(uint32 *value)
{
    Dval = value;
    *Dval = 2;
    PlugIn::gResultOut << " before thread: " << *value << " before thread global: " << *Dval << std::endl;
    auto myFuture = boost::async(boost::launch::async, &hosttask1);
    //boost::thread thread21 = boost::thread(&hosttask1);
    //boost::packaged_task<bool> pt(&hosttask1);
    //boost::thread thread21 = boost::thread(boost::move(pt));
}
When I call the function:
number a=0
su_hostthread1(a)
sleep(2) //seconds
result(" function returned "+a+" \n")
OUTPUT:
before thread value: 2 before thread global value: 2
within thread global value: 3
within thread global value: 4
within thread global value: 5
within thread global value: 6
function returned 2
within thread global value: 7
within thread global value: 8
within thread global value: 9
within thread global value: 10
Any ideas?
Thanks in advance!
If you share data between threads, you must synchronize access to that data. The two possible ways are a mutex protecting the data or atomic operations. The simple reason is that caches and read/write reordering (both by the CPU and the compiler) exist. This is a complex topic, though, and it's nothing that can be fully explained in an answer here, but there are a few good books out there and also a bunch of code that gets it right.
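As a minimal illustration of the mutex approach (my own sketch, using std::thread for brevity rather than the SDK's wrapper or boost), note that the shared value and the mutex guarding it must be the same objects in every thread that touches the value:

#include <iostream>
#include <mutex>
#include <thread>

unsigned long counter = 0; // shared data
std::mutex counter_mutex;  // a single mutex object guarding it, visible to all threads

void worker()
{
    for (int i = 0; i < 5; ++i)
    {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter; // every write happens while the lock is held
    }
}

int main()
{
    std::thread t(worker);
    {
        // A reader takes the same lock, so it never sees a torn or stale value.
        std::lock_guard<std::mutex> lock(counter_mutex);
        std::cout << "so far: " << counter << '\n';
    }
    t.join(); // join also synchronizes: all of the worker's writes are visible after this
    std::cout << "final: " << counter << '\n';
}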
The following code correctly reproduces what I intend to do: the thread updates a global variable which the main thread correctly observes.
#include "stdafx.h"
#include <iostream>
#include <boost/thread.hpp>
#include <boost/chrono.hpp>
unsigned long *dataR;
bool hosttask1()
{
bool done = false;
std::cout << "In thread global value: " << *dataR << "\n"; //*value11 << *dataL <<
unsigned long cc = 0;
boost::mutex m;
while (!done)
{
m.lock();
*dataR = cc;
m.unlock();
cc++;
std::cout << "In thread loop global value: "<< *dataR << "\n";
if (cc==5) done = true;
}
return done;
}
void SU_HostThread1(unsigned long *value)
{
dataR = value;
std::cout << "Before thread value: " << *value << " Before thread global value: " << *dataR << "\n"; //*value11 << *dataL <<
auto myFuture = boost::async(boost::launch::async, &hosttask1);
return;
}
int main()
{
unsigned long value =1;
unsigned long *value11;
value11 = &value;
SU_HostThread1(value11);
boost::this_thread::sleep(boost::posix_time::seconds(1));
std::cout << "done with end value: " << *value11 << "\n";
return 0;
}
output:
Before thread value: 1 Before thread global value: 1
In thread global value: 1
In thread loop global value: 0
In thread loop global value: 1
In thread loop global value: 2
In thread loop global value: 3
In thread loop global value: 4
done with end value: 4
Yet when I copy this code exactly into the SDK of the external software, the main thread does not see the updated global value. Any ideas how this can be?
Thanks
output in external software:
before thread value: 1 before thread global value: 1
In thread global value: 1
In thread loop global value: 0
In thread loop global value: 1
In thread loop global value: 2
In thread loop global value: 3
In thread loop global value: 4
done with end value: 1
Likely this is because the compiler doesn't generally think about multithreading when optimising your code. It has seen that your code checks a value repeatedly, and it knows that in a single-threaded program that value cannot change, so it simply omits the check.
If you declare the variable as volatile, then it will probably generate less efficient code that checks more often.
Of course, you also have to understand that when a value is written, there are circumstances in which it may not all be written in one go, so if you are unlucky enough to read it back when it is half-written, you get back a garbage value. The fix for that is to declare it as std::atomic (which is automatically treated as volatile by the optimiser); then even more complex code is emitted to ensure that the write and the read cannot overlap (or different processor primitives may be used for small objects such as integers).
Most variables are not shared between threads, and when they are, it is up to the programmer to identify them and balance optimisation against thread-synchronisation needs during design.
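As a minimal sketch of the std::atomic suggestion (my own illustration using std::thread, not the SDK wrapper from the question):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<unsigned long> dval{0}; // atomic: loads/stores cannot be torn or optimised away

void worker()
{
    while (dval < 10)
    {
        ++dval; // atomic read-modify-write
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}

int main()
{
    std::thread t(worker);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    // main observes the worker's progress instead of a cached, never-updated value
    std::cout << "main sees: " << dval << '\n';
    t.join();
}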
I am using Raspbian on a Raspberry Pi 3.
I need to divide my code into a few blocks (2 or 4) and assign a thread per block to speed up calculations.
At the moment I am testing with simple loops (see the attached code), first on one thread and then on 4 threads. The execution time on 4 threads is always 4 times longer, so it looks like these 4 threads are scheduled to run on the same CPU.
How do I assign each thread to run on a different CPU? Even 2 threads on 2 CPUs would make a big difference to me.
I also tried g++ 6 with no improvement, and using OpenMP with "#pragma omp for" the code still runs on one CPU.
I tried running this code on Fedora Linux x86 and saw the same behaviour, but on Windows 8.1 with VS2015 I got different results: the time was the same on one thread and on 4 threads, so there the threads were running on different CPUs.
Would you have any suggestions?
Thank you.
#include <iostream>
//#include <arm_neon.h>
#include <ctime>
#include <thread>
#include <mutex>
#include <vector>
using namespace std;

float simd_dot0() {
    unsigned int i;
    unsigned long rezult;
    for (i = 0; i < 0xfffffff; i++) {
        rezult = i;
    }
    return rezult;
}

int main() {
    unsigned num_cpus = std::thread::hardware_concurrency();
    std::mutex iomutex;
    std::vector<std::thread> threads(num_cpus);

    cout << "Start Test 1 CPU" << endl;
    double t_start, t_end, scan_time;
    scan_time = 0;
    t_start = clock();
    simd_dot0();
    t_end = clock();
    scan_time += t_end - t_start;
    std::cout << "\nExecution time on 1 CPU: "
              << 1000.0 * scan_time / CLOCKS_PER_SEC << "ms" << std::endl;
    cout << "Finish Test on 1 CPU" << endl;

    cout << "Start Test 4 CPU" << endl;
    scan_time = 0;
    t_start = clock();
    for (unsigned i = 0; i < 4; ++i) {
        threads[i] = std::thread([&iomutex, i] {
            {
                simd_dot0();
                std::cout << "\nExecution time on CPU: " << i << std::endl;
            }
            // Simulate important work done by the thread by sleeping for a bit...
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    t_end = clock();
    scan_time += t_end - t_start;
    std::cout << "\nExecution time on 4 CPUs: "
              << 1000.0 * scan_time / CLOCKS_PER_SEC << "ms" << std::endl;
    cout << "Finish Test on 4 CPU" << endl;
    cout << "!!!Hello World!!!" << endl; // prints !!!Hello World!!!
    while (1);
    return 0;
}
Edit:
On the Raspberry Pi 3 with Raspbian I used g++ 4.9 and 6 with the following flags:
-std=c++11 -ftree-vectorize -Wl,--no-as-needed -lpthread -march=armv8-a+crc -mcpu=cortex-a53 -mfpu=neon-fp-armv8 -funsafe-math-optimizations -O3
I have a multithreaded program and I am profiling the time taken from just before all the pthread_create calls to just after all the pthread_join calls.
Now I find that this time, let's call it X (shown below as "Done in Xms"), is actually the user + sys time of the time output. In my app the numeric argument to a.out controls how many threads to spawn: ./a.out 1 spawns 1 pthread and ./a.out 2 spawns 2 threads, where each thread does the same amount of work.
I was expecting X to be the real time instead of the user + sys time. Can someone please tell me why this is not so? Does this mean my app is indeed running in parallel, without any locking between the threads?
[jithin#whatsoeverclever tests]$ time ./a.out 1
Done in 320ms
real 0m0.347s
user 0m0.300s
sys 0m0.046s
[jithin#whatsoeverclever tests]$ time ./a.out 2
Done in 450ms
real 0m0.266s
user 0m0.383s
sys 0m0.087s
[jithin#whatsoeverclever tests]$ time ./a.out 3
Done in 630ms
real 0m0.310s
user 0m0.532s
sys 0m0.105s
Code
int main(int argc, char **argv) {
    //Read the words
    getWords();

    //Set number of words to use
    int maxWords = words.size();
    if (argc > 1) {
        int numWords = atoi(argv[1]);
        if (numWords > 0 && numWords < maxWords) maxWords = numWords;
    }

    //Init model
    model = new Model(MODEL_PATH);

    pthread_t *threads = new pthread_t[maxWords];
    pthread_attr_t attr;
    void *status;

    // Initialize and set thread joinable
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    int rc;
    clock_t startTime = clock();
    for (unsigned i = 0; i < maxWords; i++) {
        //create thread
        rc = pthread_create(&threads[i], NULL, processWord, (void *)&words[i]);
        if (rc) {
            cout << "Error:unable to create thread: " << i << "," << rc << endl;
            exit(-1);
        }
    }

    // free attribute and wait for the other threads
    pthread_attr_destroy(&attr);
    for (unsigned i = 0; i < maxWords; i++) {
        rc = pthread_join(threads[i], &status);
        if (rc) {
            cout << "Error:unable to join thread: " << i << "," << rc << endl;
            exit(-1);
        }
    }
    clock_t endTime = clock();
    float diff = (((float)endTime - (float)startTime) / 1000000.0F) * 1000;
    cout << "Done in " << diff << "ms\n";

    delete[] threads;
    delete model;
}
The clock function is specifically documented to return the processor time used by the process, summed over all of its threads. If you want to measure elapsed wall-clock time, it is not the right function.
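For example, swapping clock() for std::chrono::steady_clock around the create/join section would report elapsed wall time instead. A sketch of just the timing part (not the questioner's full program):

#include <chrono>
#include <iostream>

int main()
{
    auto startTime = std::chrono::steady_clock::now();

    // ... the pthread_create / pthread_join section from the question goes here ...

    auto endTime = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
    std::cout << "Done in " << ms << "ms\n"; // wall-clock time, not CPU time
}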