C++11: Thread pool for successive input reduction - multithreading

Consider a container of type T elements, e.g. std::vector<T> input, and an excessively time-consuming function T f(const T&, const T&). I wish to apply f on input until one single element remains - in parallel.
I am new to C++ so I adapted an existing implementation for a Thread pool pattern in C++11 to my needs, in particular this example.
Code for discussion:
#include <thread>
#include <iostream>
#include <algorithm>
#include <vector>
#include <queue>
#include <memory>
#include <future>
#include <functional>
#include <stdexcept>
#include "ThreadPool.h"
template<class T>
T ParallelReduction(ThreadPool& pool, const std::vector<T>& input, const std::function<T(T,T)>& func)
{
std::queue<std::future<T>> futureResults;
std::for_each(input.begin(), input.end(), [&futureResults](T e)
{
// T -> future<T>
futureResults.push(std::async([e]{return e;}));
});
for (;;)
{
if (futureResults.size() > 1)
{
// Get two "old" partial results, forward them to func(T,T)
T a = futureResults.front().get();
futureResults.pop();
T b = futureResults.front().get();
futureResults.pop();
futureResults.push(pool.enqueue([a, b, func]{return func(a,b);}));
}
else
{
// all input and partial result elements have been processed once, return the final result
return futureResults.front().get();
}
}
}
int main(int argc, char *argv[])
{
ThreadPool pool(4);
std::vector<int> input(12);
int n = 0;
std::generate(input.begin(), input.end(), [&n]{ return ++n; });
auto sum = ParallelReduction<int>(pool, input, [](int a, int b){return a + b;});
std::cout << sum << "\n";
auto product = ParallelReduction<int>(pool, input, [](int a, int b){return a * b;});
std::cout << product << "\n";
return 0;
}
My questions:
Is it possible to implement this without introducing futureResults, i.e. a solution that works "inline" on std::vector<T>& input?
If not, can we optimize the way we assign values T to std::future<T> in the first part?
Is it more efficient to forward std::future<T> to our function func instead of T, i.e. let the child thread wait and access the value of a and b and not the main thread?
The parts inside the loop look awful - any ideas?
Where can I optimize the code?
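One possible shape for the first question, sketched under some assumptions: the reduction below works directly on a vector of values and only creates futures for the pairs combined at each level. It uses std::async instead of the ThreadPool class so that it is self-contained, and the name pairwiseReduce is hypothetical. It assumes a non-empty input and an associative func; neighbours are combined in their original order, so commutativity is not required.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper: combines neighbouring elements in parallel, level by
// level, until a single element is left.
template <class T, class F>
T pairwiseReduce(std::vector<T> values, F func)
{
    assert(!values.empty()); // assumption: there is at least one element
    while (values.size() > 1) {
        // Launch one task per neighbouring pair of the current level.
        std::vector<std::future<T>> level;
        level.reserve(values.size() / 2);
        for (std::size_t i = 0; i + 1 < values.size(); i += 2) {
            level.push_back(std::async(std::launch::async, func,
                                       values[i], values[i + 1]));
        }
        // Collect the partial results in order.
        std::vector<T> next;
        next.reserve(level.size() + 1);
        for (auto& f : level) {
            next.push_back(f.get());
        }
        // An odd element at the end is carried over to the next level.
        if (values.size() % 2 == 1) {
            next.push_back(values.back());
        }
        values = std::move(next);
    }
    return values.front();
}
```

Called as pairwiseReduce(input, [](int a, int b){ return a + b; }), this starts one task per pair, which is only reasonable when f is expensive; with a real pool, the std::async call would be replaced by something like pool.enqueue.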

Related

Rcpp: how to use unwind protection?

I was wondering how could I make some Rcpp code use automatic unwind protection in all Rcpp object creations.
For example, suppose I have some code like this:
#include <stdint.h>
#include <Rcpp.h>
class MyObj {
public:
int val;
MyObj(int val) : val(val) {};
~MyObj() {
std::cout << "I'm being destructed - value was: " << val << std::endl;
}
};
// [[Rcpp::export]]
Rcpp::NumericVector crashme(unsigned int seed)
{
srand(seed);
MyObj obj1(rand());
Rcpp::NumericVector out(INT64_MAX-1, 100.);
return out;
}
When I call crashme, obj1 doesn't get destructed before the function ends, due to R's long jumps which I want to protect against.
I see there is a function Rcpp::unwindProtect, but it's implemented as something that takes a callback.
I'm not 100% sure if I'm doing it right, but I managed to add unwind protection like this:
#include <stdint.h>
#include <Rcpp.h>
#include <Rcpp/unwindProtect.h>
// [[Rcpp::plugins(unwindProtect)]]
class MyObj {
public:
int val;
MyObj(int val) : val(val) {};
~MyObj() {
std::cout << "I'm being destructed - value was: " << val << std::endl;
}
};
struct NumVecArgs {
size_t size;
double fillwith;
};
SEXP alloc_NumVec(void *data)
{
NumVecArgs *args = (NumVecArgs*)data;
return Rcpp::NumericVector(args->size, args->fillwith);
}
// [[Rcpp::export]]
Rcpp::NumericVector crashme(unsigned int seed)
{
srand(seed);
MyObj obj1(rand());
NumVecArgs args = {INT64_MAX-1, 100.};
Rcpp::NumericVector out = Rcpp::unwindProtect(alloc_NumVec, (void*)&args);
return out;
}
Now calling crashme will successfully destruct obj1 and print the destructor message.
But this is very inconvenient, since I have a series of different Rcpp object allocations taking different constructor types, which would imply either defining a different struct and callback for each one of them, or translating all the calls to lengthy lambda functions.
Is there any way to automatically make all calls to constructors of e.g. Rcpp::NumericVector and Rcpp::IntegerVector have unwind protection?

How should you use C++14 shared mutex with lambda captures and multiple threads?

I have some very simple code which is supposed to test a multi-threaded logger by starting 10 threads at the same time which will all write to the logger at once.
I expect to see all 10 messages, in no particular order. However, I randomly get 5, 6, 7, 8, 9, or sometimes 10 output messages.
Here is the code:
//*.cxx
#include <iostream>
#include <mutex>
#include <shared_mutex> // requires c++14
#include <string>
#include <thread>
#include <vector>
namespace {
std::mutex g_msgLock;
std::shared_timed_mutex g_testingLock;
}
void info(const char * msg) {
std::unique_lock<std::mutex> lock(g_msgLock);
std::cout << msg << '\n'; // don't flush
}
int main(int argc, char** argv) {
info("Start message..");
std::vector<std::thread> threads;
unsigned int threadCount = 10;
threads.reserve(threadCount);
{ // Scope for locking all threads
std::lock_guard<std::shared_timed_mutex> lockAllThreads(g_testingLock); // RAII (scoped) lock
for (unsigned int i = 0; i < threadCount; i++) {
// Here we start the threads using lambdas
threads.push_back(std::thread([&, i](){
// Here we block and wait on lockAllThreads
std::shared_lock<std::shared_timed_mutex> threadLock(g_testingLock);
std::string msg = std::string("THREADED_TEST_INFO_MESSAGE: ") + std::to_string(i);
info(msg.c_str());
}));
}
} // End of scope, lock is released, all threads continue now
for(auto& thread : threads){
thread.join();
}
}
The output is generally something of the form:
Start message..
THREADED_TEST_INFO_MESSAGE: 9
THREADED_TEST_INFO_MESSAGE: 5
THREADED_TEST_INFO_MESSAGE: 3
THREADED_TEST_INFO_MESSAGE: 1
THREADED_TEST_INFO_MESSAGE: 4
THREADED_TEST_INFO_MESSAGE: 0
THREADED_TEST_INFO_MESSAGE: 8
THREADED_TEST_INFO_MESSAGE: 7
Notice that there are only 8 outputs for this run.
Interestingly enough, this problem was associated with my build system which was dropping messages. The executable is always producing the outputs as expected.

Issue in predicate function in wait in thread C++

I am trying to put the condition inside a function, but it produces a confusing compile-time error. If I instead write it as a lambda like this: []{ return i == k; }, the compiler reports k as unidentified. Can anybody tell me how to solve this problem?
#include <iostream>
#include <mutex>
#include <sstream>
#include <thread>
#include <chrono>
#include <condition_variable>
using namespace std;
condition_variable cv;
mutex m;
int i;
bool check_func(int i,int k)
{
return i == k;
}
void print(int k)
{
unique_lock<mutex> lk(m);
cv.wait(lk,check_func(i,k)); // Line 33
cout<<"Thread no. "<<this_thread::get_id()<<" parameter "<<k<<"\n";
i++;
return;
}
int main()
{
thread threads[10];
for(int i = 0; i < 10; i++)
threads[i] = thread(print,i);
for(auto &t : threads)
t.join();
return 0;
}
Compiler Error:
In file included from 6:0:
/usr/include/c++/4.9/condition_variable: In instantiation of 'void std::condition_variable::wait(std::unique_lock<std::mutex>&, _Predicate) [with _Predicate = bool]':
33:30: required from here
/usr/include/c++/4.9/condition_variable:97:14: error: '__p' cannot be used as a function
while (!__p())
^
wait() takes a predicate, which is a callable that takes no arguments and returns bool. wait() uses that predicate like so:
while (!pred()) {
wait(lock);
}
check_func(i,k) is a bool. It's not callable and it's a constant - which defeats the purpose. You're waiting on something that can change. You need to wrap it in something that can be repeatedly callable - like a lambda:
cv.wait(lk, [&]{ return check_func(i,k); });
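A minimal sketch of the same idea as a complete function, with a few assumptions: the names runInOrder, turn and order are hypothetical, the shared state is local rather than global, and a notify_all() call has been added after the counter is incremented. The original print() never notifies, so threads that are already waiting would not be woken when i changes.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: starts `count` threads and lets thread k proceed only
// when the shared counter `turn` equals k. Returns the order in which the
// threads got through the wait.
std::vector<int> runInOrder(int count)
{
    assert(count >= 0);
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;           // protected by m
    std::vector<int> order; // protected by m
    std::vector<std::thread> threads;
    for (int k = 0; k < count; ++k) {
        threads.emplace_back([&, k] {
            std::unique_lock<std::mutex> lk(m);
            // The predicate is a callable, re-evaluated on every wakeup.
            cv.wait(lk, [&] { return turn == k; });
            order.push_back(k);
            ++turn;
            lk.unlock();
            cv.notify_all(); // wake the waiters so they re-check the predicate
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return order;
}
```

Because the lambda captures by reference, each call of the predicate sees the current value of the counter, which is what passing the bool result of check_func(i,k) cannot do.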

C++ thread and mutex and condition variable

Find the least common multiple of 10 million numbers; the queue must not exceed 10,000 elements.
I have spent two days trying to sort this out, but I just do not understand. Please help me.
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>
#include <queue>
#include <chrono>
#include <cmath>
#include <map>
#include <cstdlib>
#include <fstream>
#include <ctime>
using namespace std;
int main()
{
std::map <int, int> NOK;
map<int, int> snok;
std::queue<int> oche;
std::mutex m;
std::condition_variable cond_var;
bool done = false;
bool notified = false;
std::thread filev([&]() {
//std::unique_lock<std::mutex> lock(m);
ifstream in; // the stream "in" will be used for reading
int ch;
in.open("/home/akrasikov/prog/output.txt");
while(!in.eof()){
if (oche.size()>9999){
std::this_thread::sleep_for(std::chrono::milliseconds(3));
std::unique_lock<std::mutex> lock(m);
} else {
in>>ch;
oche.push(ch);
}
}
notified = true;
cond_var.notify_one();
done = true;
cond_var.notify_one();
});
std::thread nok([&]() {
std::unique_lock<std::mutex> lock(m);
while (!done) {
while (!notified) { // loop to avoid spurious wakeups
cond_var.wait(lock);
}
while (!oche.empty()) {
ch=oche.front();
oche.pop();
int j=2;
while (j < sqrt((double)ch)+1){
int s=0;
while(!(ch%j)){
s++;
ch/=j;
}
if (s > 0 && NOK[j] < s){
NOK[j] = s;
}
j++;
}
if (NOK[ch] == 0) NOK[ch]++;
}
long int su=1;
int temp=-1;
int step=0;
int sa=1;
std::cout << " NOK= ";
for (std::map<int, int>::iterator it=NOK.begin(); it!=NOK.end(); it++){
for (int i=0; i<it->second; i++){
su*=it->first;
sa=it->first;
if (temp<sa && sa >1){
temp=sa;
step=1;
} else {
if(sa>1)
step++;
}
}
cout<< temp << "^"<< step << " * " ;
}
std::cout << "su = " << su << '\n';
}
notified = false;
});
filev.join();
nok.join();
}
This program does not work. Why? What is wrong? It just starts and hangs, but if I delete this code
if (oche.size()>9999){
std::this_thread::sleep_for(std::chrono::milliseconds(3));
std::unique_lock<std::mutex> lock(m);
} else {
and
while (!done) {
while (!notified) { // loop to avoid spurious wakeups
cond_var.wait(lock);
}
everything works. Please help.
From what I understand of your question, you have three problems:
Compute the least common multiple of a list of 1M elements.
You want one thread that produces the elements and one that consumes them. They transfer the elements through a buffer (a queue in your case).
Your queue cannot exceed 10K elements.
In my implementation I am generating the numbers randomly and using condition variables to coordinate between the threads.
Note that the LCM is associative and commutative, so you can compute it recursively, no matter what the order is.
Here is the code, but please DO NOT POST DIRTY CODE LIKE YOU DID NEXT TIME OR EVERYONE will kick you out.
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>
#include <queue>
#include <chrono>
#include <cmath>
#include <map>
#include <cstdlib>
#include <fstream>
#include <ctime>
#include <atomic>
#include <random>
using namespace std;
std::mutex mutRandom;//use for multithreading for random variables
int getNextRandom()
{
std::lock_guard<std::mutex> lock(mutRandom);
// C++11 random number generator, seeded once per run; re-seeding on every
// call would return the same value for all calls made within the same second
static std::mt19937 eng (time(NULL)); // Mersenne Twister generator
static std::uniform_int_distribution<int> dist (1, 1000000);
return dist(eng);
}
//thread coordination
std::mutex mut;
std::queue<int> data_queue;
std::condition_variable data_cond;
std::atomic<int> nbData{0};//brace-init: atomics cannot be copy-initialized before C++17
std::atomic<int> currLCM{1};//current LCM
const unsigned int nbMaxData=100000;
const unsigned int queueMaxSize=10000;
//Arithmetic function, nothing to do with threads
//greatest common divisor
int gcd(int a, int b)
{
for (;;)
{
if (a == 0) return b;
b %= a;
if (b == 0) return a;
a %= b;
}
}
//least common multiple
int lcm(int a, int b)
{
int temp = gcd(a, b);
return temp ? (a / temp * b) : 0;
}
/// Thread related part
//for producing the data
void produceData()
{
while (nbData<nbMaxData)
{
std::unique_lock<std::mutex> lk(mut);
data_cond.wait(lk,[]{
return data_queue.size()<queueMaxSize;
});
cout<<nbData<<endl;
++nbData;
data_queue.push(getNextRandom());
data_cond.notify_one();
lk.unlock();
}
cout<<"Producer done \n";
}
//for consuming the data
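void consumeData()
{
unsigned int nbConsumed=0;//count the items actually taken, so none are left in the queue
while (nbConsumed<nbMaxData)
{
std::unique_lock<std::mutex> lk(mut);
data_cond.wait(lk,[]{
return !data_queue.empty();
});
int currData=data_queue.front();
data_queue.pop();
++nbConsumed;
lk.unlock();
data_cond.notify_one();//wake the producer if it is waiting for free space in the queue
currLCM = lcm(currLCM,currData);//note: an int overflows quickly for inputs this large
}
cout<<"Consumer done \n";
}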
int main()
{
std::thread thProduce(&produceData);
std::thread thConsume(&consumeData);
thProduce.join();//to wait for the producing thread to finish before the program closes
thConsume.join();//same thing for the consuming one
return 0;
}
Hope that helps,
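The associativity remark above can be illustrated without any threads. This is a minimal sketch, not part of the original answer: the names gcd64, lcm64 and lcmOfAll are hypothetical, and 64-bit unsigned integers are used only to postpone overflow, which the assert makes visible instead of letting the result wrap silently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Greatest common divisor (Euclid's algorithm).
std::uint64_t gcd64(std::uint64_t a, std::uint64_t b)
{
    while (b != 0) {
        std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Least common multiple; divides before multiplying to keep values small.
std::uint64_t lcm64(std::uint64_t a, std::uint64_t b)
{
    if (a == 0 || b == 0) {
        return 0;
    }
    std::uint64_t q = a / gcd64(a, b);
    // The result must fit in 64 bits; otherwise the product would wrap.
    assert(q <= std::numeric_limits<std::uint64_t>::max() / b);
    return q * b;
}

// Folds lcm64 over the whole list, one element at a time.
std::uint64_t lcmOfAll(const std::vector<std::uint64_t>& values)
{
    std::uint64_t result = 1;
    for (std::uint64_t v : values) {
        result = lcm64(result, v);
    }
    return result;
}
```

Since lcm64(lcm64(a, b), c) equals lcm64(a, lcm64(b, c)), a consumer thread can fold each value into a running result as it arrives, which is what the currLCM variable does in the code above.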

writing my first exploit in linux

How can I modify the source code in func() so that the address to which the program returns after executing func() is changed in such a manner that the instruction printf("first print\n") is skipped? Use the pointer *ret defined in func() to modify the return address appropriately in order to achieve this.
Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void func(char *str)
{
char buffer[24];
int *ret;
strcpy(buffer,str);
}
int main(int argc, char **argv)
{
if (argc < 2)
{
printf("One argument needed.\n");
exit(0);
}
int x;
x = 0;
func(argv[1]);
x = 1;
printf("first print\n");printf("second print\n");
}
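As sherrellbc noted, a program's exploits are usually written without modifying its source code. But if you want, inserting these two lines into func() may do (this assumes a 32-bit x86 build, where the argument str sits directly above the saved return address on the stack):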
ret = (int *)&str; // point behind saved return address
ret[-1] += 12; // or however many code bytes are to be skipped
