Non-repeatable affinity for pthreads - multithreading

I am trying to measure how long it takes from creating a thread until it actually starts running.
I am using POSIX threads on a Debian 6.0 machine with 32 cores (no hyper-threading) and calling pthread_attr_setaffinity_np to set the affinity.
In a loop, I repeatedly create the threads and wait for them to finish.
So, my code looks like the following (thread 0 is running this).
for (ni=0; ni<n; ni++)
{
    pthread_t *thrds;
    pthread_attr_t attr;
    cpu_set_t cpuset;
    ths = 1; // thread starts from 1
    thrds = malloc(sizeof(pthread_t)*nt); // thrds[0] not used
    assert(!pthread_attr_init(&attr));
    for (i=ths; i<nt; i++)
    {
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        CPU_ZERO(&cpuset);
        CPU_SET(i, &cpuset); // setting i as the affinity for thread i
        assert(!pthread_attr_setaffinity_np(&attr,
                                            sizeof(cpu_set_t), &cpuset));
        assert(!pthread_create(thrds+i, &attr, DoWork, i));
    }
    pthread_attr_destroy(&attr);
    DoWork(0);
    for (i=ths; i<nt; i++)
    {
        pthread_join(thrds[i], NULL);
    }
    if (thrds) free(thrds);
}
Inside the thread function, I call sched_getcpu() to verify that the affinity is working. The problem is that this check only passes in the first iteration of the outer (ni) loop. In the second iteration, thrds[1] ends up on CPU nt-1 (instead of 1), and so on.
Can anyone please explain why? And/or how to fix it?
NOTE: I found a workaround: if I put the master thread to sleep for 1 second after the joins finish in each iteration, the affinity works correctly. But the required sleep duration could be different on other machines, so I still need a real fix for the issue.

Related

How an exit code is determined in a multithreaded program when one of the threads fails?

#include <thread>
struct func
{
    int& i;
    func(int& i_) : i(i_) {}
    void operator()()
    {
        for (unsigned j = 0; j < 1000000; ++j)
        {
            ++i;
        }
    }
};
int main()
{
    int some_local_state = 0;
    func my_func(some_local_state);
    std::thread my_thread(my_func);
    my_thread.detach();
    return 0;
}
Output is
(process 1528) exited with code -1073741819
What determines the exit code? What does detaching mean for a Windows process?
In this example, the exit code -1073741819 (0xC0000005, an access violation) is not produced by your executable but by the operating system, which decided to kill your process.
You also asked (in the comments) about detaching a thread.
Detaching means that you will not call join() on this thread: you launch it, but you are not interested in knowing when it finishes its work.
EDIT
In my first answer I misread the example and thought the abrupt termination was due to an invalid memory access through an uninitialized i reference.
That was wrong, since i is actually initialised to reference some_local_state.
However, when main() returns, some_local_state no longer exists while the thread still references it.
Nothing in the example guarantees that the detached thread has stopped at the exact moment main() returns.
Does the thread terminate before the local variables of main() disappear? I really doubt it: the process only ends after main() has returned, so in the meantime the thread can keep writing through a dangling reference.
This probably explains the abnormal termination of the process.
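A minimal sketch of one way to avoid the problem, assuming it is acceptable for the caller to wait for the worker: join the thread instead of detaching it, so that some_local_state outlives every access made through the reference. The helper name run_and_join is made up for this illustration and is not part of the original code.

```cpp
#include <thread>

// Same functor as in the question: it increments an int through a reference.
struct func {
    int& i;
    explicit func(int& i_) : i(i_) {}
    void operator()() {
        for (unsigned j = 0; j < 1000000; ++j) {
            ++i;
        }
    }
};

// Hypothetical helper: runs the functor on a thread and waits for it.
int run_and_join() {
    int some_local_state = 0;
    std::thread my_thread{func(some_local_state)};
    // join() blocks until the thread is done, so some_local_state is still
    // alive for every ++i the thread performs.
    my_thread.join();
    return some_local_state;
}
```

If the thread really has to be detached, the state it touches would need a lifetime that does not depend on main(), for example by giving the thread its own copy instead of a reference.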

Thread scheduling/Data race conditions

In class, we are studying threads and race conditions. By my estimates, it should be possible for the code below to output 8 or 9, because thread 1 can be interrupted by thread 2 after the value has been decremented in the eax register but before the counter is written back.
#include <pthread.h>
#include <stdio.h>
int counter = 10;
void *worker(void *arg) {
    counter--;
    return NULL;
}
int main(int argc, char *argv[]) {
    pthread_t p1, p2;
    pthread_create(&p1, NULL, worker, NULL);
    pthread_create(&p2, NULL, worker, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("%d\n", counter);
}
However, when I run the code, I always receive the output 8. Is it a mechanism of the compiler that normalizes the output, or is it only possible for the code to output 8 (no race condition is created)?
There's no way for us to tell without knowing lots of complicated details about your platform, compiler, maybe even CPU. The code has a race condition in theory but it may be exceptionally difficult, maybe even impossible, to trigger.
Of course, if you upgrade your compiler or CPU, change compilation options, upgrade your OS, or do any number of other things, it may start behaving differently.
This is one of the reasons race conditions can be so insidious. They can be impossible to trigger under some conditions and then suddenly start happening all the time when some change is made elsewhere.
The code definitely has a race condition.
I don't find it surprising that you're seeing consistent results--starting a thread takes a little while, so there's a good chance that in your case, the first thread finishes before the second gets started.
Nonetheless, the code clearly has undefined behavior, because there's no question it has a race condition.
There definitely is a race condition. The reason you're not seeing it is that the decrement happens so fast compared to the time it takes to start a thread that the first thread is likely to be done before the second thread even starts. You'll see the race condition if you make the amount of work large enough that the first thread is still running when the second one starts.
Example: modify the worker function to decrement in a loop.
int counter = 1000000000;
void* worker(void *arg)
{
    for (int i = 0; i < 500000000; ++i)
        --counter;
    return NULL;
}
Since counter starts at 1 billion and the two threads each decrement it 500 million times, you would expect counter to be 0 at the end if there were no race. When the threads overlap, some decrements are typically lost (at least in an unoptimized build) and the final value ends up greater than 0.
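For comparison, here is a hedged sketch of the same experiment with the race removed. It swaps the plain int and pthreads of the question for std::atomic and std::thread from C++11, which is my substitution and not something the answers above use. The function name run_atomic_counter and its parameters are invented for the example.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: two threads each decrement a shared counter
// per_thread times; returns the final counter value.
long run_atomic_counter(long start, long per_thread) {
    std::atomic<long> counter{start};

    auto worker = [&counter, per_thread] {
        for (long i = 0; i < per_thread; ++i) {
            // fetch_sub is an atomic read-modify-write, so no decrement
            // can be lost even when both threads run at the same time.
            counter.fetch_sub(1, std::memory_order_relaxed);
        }
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return counter.load();
}
```

With an atomic counter the result no longer depends on how the threads interleave, so starting at 2 * per_thread always ends at 0.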

How to join all threads before deleting the ThreadPool

I am using a MultiThreading class which creates the required number of threads in its own threadpool and deletes itself after use.
std::thread *m_pool; // number of threads according to available cores
std::mutex m_locker;
std::condition_variable m_condition;
std::atomic<bool> m_exit;
int m_processors;
m_pool = new std::thread[m_processors + 1];
void func()
{
    // code
}
for (int i = 0; i < m_processors; i++)
{
    m_pool[i] = std::thread(func);
}
void reset(void)
{
    {
        std::lock_guard<std::mutex> lock(m_locker);
        m_exit = true;
    }
    m_condition.notify_all();
    for (int i = 0; i <= m_processors; i++)
        m_pool[i].join();
    delete[] m_pool;
}
After running through all tasks, the for-loop is supposed to join all running threads before delete[] is executed.
But there seems to be one last thread still running when m_pool no longer exists.
As a result, I can't close my program anymore.
Is there any way to check if all threads are joined or wait for all threads to be joined before deleting the threadpool?
Simple typo bug, I think.
Your loop with the condition i <= m_processors runs one iteration too many. This is an off-by-one bug. Suppose m_processors is 2: you start threads only in m_pool[0] and m_pool[1], yet the loop also calls join() on m_pool[2]. With the allocation shown (m_processors + 1 elements), m_pool[2] is a default-constructed std::thread that is not joinable, so join() throws std::system_error; had the array held only m_processors elements, the access would be past the end of the array and undefined behaviour. Either way, you're likely going to crash or hang there.
You likely intended i < m_processors.
The real source of the problem is addressed by Wick's answer. I will extend it with some tips that also solve your problem while improving other aspects of your code.
If you use C++11 for std::thread, then you shouldn't create your thread handles using operator new[]. There are better ways of doing that with other C++ constructs, which will make everything simpler and exception safe (you don't leak memory if an unexpected exception is thrown).
Store your thread objects in a std::vector. It will manage the memory allocation and deallocation for you (no more new and delete). You can use other more flexible containers such as std::list if you insert/delete threads dynamically.
Fill the vector in place with std::generate_n or similar:
std::vector<std::thread> m_pool;
m_pool.reserve(m_processors);
// Fill the vector
std::generate_n( std::back_inserter(m_pool), m_processors,
                 [](){ return std::thread(func); } );
Join all the elements using a range-for loop and delete the handles using the container's functions:
for( std::thread& t: m_pool ) {
    t.join();
}
m_pool.clear();
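Putting the pieces together, a self-contained sketch of such a pool might look like the following. It assumes the workers do nothing but wait for the exit flag, since the real task code is not shown in the question. The class name PoolSketch and the return value of reset() are invented for illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool: owns its threads in a std::vector and joins them all
// in reset() before the vector is cleared.
class PoolSketch {
public:
    explicit PoolSketch(unsigned n) {
        m_pool.reserve(n);
        for (unsigned i = 0; i < n; ++i) {
            m_pool.emplace_back([this] { work(); });
        }
    }

    ~PoolSketch() { reset(); }

    // Signals the workers to stop, joins them, and returns how many
    // threads were joined by this call.
    std::size_t reset() {
        {
            std::lock_guard<std::mutex> lock(m_locker);
            m_exit = true;
        }
        m_condition.notify_all();
        std::size_t joined = 0;
        for (std::thread& t : m_pool) {
            if (t.joinable()) {  // never join a thread that was not started
                t.join();
                ++joined;
            }
        }
        m_pool.clear();
        return joined;
    }

private:
    // Placeholder worker (assumption): blocks until m_exit is set.
    void work() {
        std::unique_lock<std::mutex> lock(m_locker);
        m_condition.wait(lock, [this] { return m_exit.load(); });
    }

    std::vector<std::thread> m_pool;
    std::mutex m_locker;
    std::condition_variable m_condition;
    std::atomic<bool> m_exit{false};
};
```

Because the loop iterates over the vector itself, there is no index bound to get wrong, and the joinable() check keeps a second reset() call harmless.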

Multithreaded Environment - Signal Handling in c++ in unix-like environment (freeBSD and linux)

I wrote a network packet listener program with 2 threads. Both run forever, but one of them sleeps for 30 seconds and the other for 90 seconds. In the main function, I install a signal handler with sigaction and then create these 2 threads. After creating the threads, the main function calls pcap_loop, which is an infinite loop. The basic structure of my program is:
(I use pseudo syntax)
signalHandler()
    only sets a flag (exitState = true)
thread1()
{
    while 1
    {
        sleep 30 sec
        check exitState, if true exit(0);
        do smth;
    }
}
thread2()
{
    while 1
    {
        sleep 90 sec
        check exitState, if true exit(0);
        do smth;
    }
}
main()
{
    necessary setup for sigaction;
    sigaction( SIGINT, &act, NULL );
    sigaction( SIGUSR1, &act, NULL );
    create thread1;
    create thread2;
    pcap_loop(..., processPacket,...); // infinite loop, calls the callback function (processPacket) every time a packet arrives
    join threads;
    return 0;
}
processPacket()
{
    check exitState, if true exit(0);
    do smth;
}
And here is my question. When I press CTRL-C, the program does not always terminate. If the program has run for less than 6-7 hours, pressing CTRL-C terminates it. If it has run overnight, 10 hours or more, I cannot terminate the program. In that case the signal handler is not even called.
What could be the problem? Which thread does catch the signal?
It would be better to remove the pseudocode from your example and post minimal working code that shows exactly what you have.
From what I can see so far in your example, error handling for the sigaction calls is missing.
Try checking their return values for errors.
I am writing this for those who face the same problem. My problem was with thread synchronization. After I fixed the synchronization problem, the program could handle the signals. My advice is to check your synchronization again and make sure that it works correctly.
I am sorry for the late answer.
Edited:
I also used sigaction for signal handling
and changed my global bool variable to this definition:
static volatile sig_atomic_t exitFlag = 0;
This flag is used to check whether the signal has been received.
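For readers who want to see the flag and the handler side by side, here is a small POSIX-only sketch, with the error check on sigaction that the other answer asks for. The names onSignal and installHandler are invented for the example. The original program's handler and threads are not shown, so this only illustrates the pattern.

```cpp
#include <csignal>
#include <signal.h>  // POSIX sigaction

// Flag written by the handler and polled elsewhere, as in the answer above.
static volatile sig_atomic_t exitFlag = 0;

// The handler only sets the flag; all real work happens outside of it.
extern "C" void onSignal(int) {
    exitFlag = 1;
}

// Hypothetical helper: installs onSignal for signo and reports failure
// instead of silently ignoring the return value of sigaction.
bool installHandler(int signo) {
    struct sigaction act {};
    act.sa_handler = onSignal;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    return sigaction(signo, &act, nullptr) == 0;
}
```

A caller would run installHandler(SIGINT) and installHandler(SIGUSR1) before creating the threads, and stop with an error message if either call returns false.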

ptrace one thread from another

Experimenting with the ptrace() system call, I am trying to trace another thread of the same process. According to the man page, both the tracer and the tracee are specific threads (not processes), so I don't see a reason why it should not work. So far, I have tried the following:
use PTRACE_TRACEME from the clone()d child: the call succeeds, but does not do what I want, probably because the parent of the to-be-traced thread is not the thread that called clone()
use PTRACE_ATTACH or PTRACE_SEIZE from the parent thread: this always fails with EPERM, even if the process runs as root and with prctl(PR_SET_DUMPABLE, 1)
In all cases, waitpid(-1, &status, __WALL) fails with ECHILD (same when passing the child pid explicitly).
What should I do to make it work?
If it is not possible at all, is it by design or a bug in the kernel (I am using version 3.8.0)? In the former case, could you point me to the right bit of the documentation?
As @mic_e pointed out, this is a known fact about the kernel: not quite a bug, but not quite correct either. See the kernel mailing list thread about it. To provide an excerpt from Linus Torvalds:
That "new" (last November) check isn't likely going away. It solved
so many problems (both security and stability), and considering that
(a) in a year, only two people have ever even noticed
(b) there's a work-around as per above that isn't horribly invasive
I have to say that in order to actually go back to the old behaviour,
we'd have to have somebody who cares deeply, go back and check every
single special case, deadlock, and race.
The solution is to actually start the process that is being traced in a subprocess - you'll need to make the ptracing process be the parent of the other.
Here's an outline of doing this based on another answer that I wrote:
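// _GNU_SOURCE is needed for clone() in <sched.h>.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
// this number is arbitrary - find a better one.
#define STACK_SIZE (1024 * 1024)
int main_thread(void *ptr) {
    // do work for main thread
    return 0;
}
int main(int argc, char *argv[]) {
    // char * so that the pointer arithmetic below is standard C
    char *vstack = malloc(STACK_SIZE);
    pid_t v;
    if (vstack == NULL) {
        perror("failed to allocate child stack");
        return 2;
    }
    // the stack grows downwards, so pass the top of the allocation
    if (clone(main_thread, vstack + STACK_SIZE, CLONE_PARENT_SETTID | CLONE_FILES | CLONE_FS | CLONE_IO, NULL, &v) == -1) { // you'll want to check these flags
        perror("failed to spawn child task");
        return 3;
    }
    long ptv = ptrace(PTRACE_SEIZE, v, NULL, NULL);
    if (ptv == -1) {
        perror("failed monitor seize");
        return 1;
    }
    // do actual ptrace work
    return 0;
}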
