Crash in program using OpenMP, x64 only - visual-c++

The program below crashes when I build it in Release x64 (all other configurations run fine).
Am I doing it wrong or is it an OpenMP issue?
Well-grounded workarounds are highly appreciated.
To reproduce, build a project (console application) with the code below.
Build with the /openmp and /GL and (/O1 or /O2 or /Ox) options in the Release x64 configuration.
That is, OpenMP support and C++ optimization must be turned on. The resulting program crashes, although it should not.
#include <omp.h>
#include <vector>

class EmptyClass
{
public:
    EmptyClass() {}
};

class SuperEdge
{
public:
    SuperEdge() {mp_points[0] = NULL; mp_points[1] = NULL;}
private:
    const int* mp_points[2];
};

EmptyClass CreateEmptyClass(SuperEdge s)
{
    return EmptyClass();
}

int main(int argc, wchar_t* argv[], wchar_t* envp[])
{
    std::vector<int> v;
    long count = 1000000;
    SuperEdge edge;
    #pragma omp parallel for
    for(long i = 0; i < count; ++i)
    {
        EmptyClass p = CreateEmptyClass(edge);
        #pragma omp critical
        {
            v.push_back(0);
        }
    }
    return 0;
}

I think it is a compiler bug. In the ASM output with /O2 on, the push_back call has been optimized away: there are just a couple of reserve calls and what look like direct accesses instead. The reserve calls, however, don't appear to be inside the critical section, so you end up with heap corruption. If you build Release x64 with /openmp /GL /Od, you will see that there is a call to push_back in the ASM, that it sits between the _vcomp_enter_critsect calls, and that the program doesn't crash. I'd report it to MS. (Tested with VS 2010.)
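Since workarounds were asked for, here is a minimal sketch of one option: size the vector once before the loop so that nothing reallocates inside the parallel region and no critical section is needed. It only applies when the element count is known up front, as it is in the reproduction. The helper name fill_preallocated is hypothetical, and this has not been verified against the VS 2010 x64 optimizer issue described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): fills a vector in
// parallel without calling push_back inside a critical section.
std::vector<int> fill_preallocated(long count)
{
    // Allocate all elements once, before any thread runs.
    std::vector<int> v(static_cast<std::size_t>(count));

    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < count; ++i)
    {
        // Each iteration writes only its own slot, so no locking is required.
        v[static_cast<std::size_t>(i)] = 0;
    }
    return v;
}
```

Calling fill_preallocated(1000000) yields a vector of the same size as the original loop would have produced, without growing the vector concurrently.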

Related

Visual C++ compiler bug?

I've reduced my case as much as possible below.
#include <vector>
#include <atomic>
#include <chrono>
using namespace std;

class Unused
{
private:
    vector<vector<unsigned>> data;
    atomic<unsigned> counter;
};

class Timer
{
private:
    chrono::time_point<chrono::high_resolution_clock> begin;
public:
    void start()
    {
        begin = std::chrono::high_resolution_clock::now();
    }
};

int main()
{
    Unused unused;
    vector<Timer> timers;
    timers.resize(1);
    timers[0].start();
}
I've compiled it as (note the specific flags):
cl /O2 /GL /EHsc driver.cpp
This is with
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x86
Microsoft (R) Incremental Linker Version 14.27.29111.0
but I've tried a couple of other recent versions as well. The executable segfaults with a memory access violation. It works with g++, and it works if I change the compile flags. It also works if I simplify the code further.
Is this a compiler bug?
It was indeed a compiler bug, https://developercommunity.visualstudio.com/content/problem/1157189/possible-compiler-bug-1.html.

which openmp schedule am I running?

How can I check the OpenMP schedule at runtime?
I compile my code with a parallel loop and a runtime schedule:
#pragma omp parallel for schedule(runtime) collapse(2)
for(j=1;j>-2;j-=2){
    for(i=0;i<n;i++){
        //nested loop code here
    }
}
and I specify the environment variable OMP_SCHEDULE=dynamic,50.
How can I check at runtime that my program actually picked up the OMP_SCHEDULE variable?
I am using OpenMP 3.1 with GCC 4.7.3.
I downloaded http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
Then went to the section "C/C++ Stub Routines" and found this
void omp_get_schedule(omp_sched_t *kind, int *chunk_size)
{
    *kind = omp_sched_static;
    *chunk_size = 0;
}
Then made this test
/*
typedef enum omp_sched_t {
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
} omp_sched_t;
*/
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_sched_t kind;
    int chunk;
    omp_get_schedule(&kind, &chunk);
    printf("%d %d\n", kind, chunk);
}
and compiled
gcc -fopenmp -O3 foo.c
and then
export OMP_SCHEDULE=static,50
./a.out
1 50
export OMP_SCHEDULE=dynamic,100
./a.out
2 100
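Note that omp_get_schedule only reports the runtime schedule setting, i.e. the one taken from OMP_SCHEDULE (or set with omp_set_schedule) and used by loops declared with schedule(runtime). It does not report the schedule of any particular loop. If you change the scheduling with e.g.
#pragma omp parallel for schedule(static,1)
and define OMP_SCHEDULE=dynamic,100, then omp_get_schedule still reports dynamic scheduling and chunk size 100.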
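To make the numeric output of the test above easier to read, the same query can be wrapped in a small helper that returns the schedule as text. This is a sketch only: the name describe_runtime_schedule is hypothetical, and the _OPENMP guard is an addition so that the code still compiles when OpenMP is switched off.

```cpp
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns the runtime schedule setting as "kind,chunk".
// It reflects OMP_SCHEDULE / omp_set_schedule, not any specific loop.
std::string describe_runtime_schedule()
{
#ifdef _OPENMP
    omp_sched_t kind;
    int chunk = 0;
    omp_get_schedule(&kind, &chunk);

    const char *name = "other"; // anything outside the four basic kinds
    switch (kind)
    {
        case omp_sched_static:  name = "static";  break;
        case omp_sched_dynamic: name = "dynamic"; break;
        case omp_sched_guided:  name = "guided";  break;
        case omp_sched_auto:    name = "auto";    break;
        default: break;
    }
    return std::string(name) + "," + std::to_string(chunk);
#else
    // Compiled without OpenMP support: there is no runtime schedule to query.
    return "OpenMP not enabled";
#endif
}
```

Printing the result once at program start, before the schedule(runtime) loop, shows which setting that loop will use.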

Visual Studio Settings For Multithreading

I'm trying to run the C++ code below in Visual Studio, compiling with both Visual C++ and the Intel Compiler. I'm setting the /Qopenmp option. Although omp_get_max_threads() returns 2, the printf("Running on multiple threads\n") line is not printed twice.
This is the code:
#include <omp.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Starting Program!\n");
    int ntr;
    omp_set_dynamic(0);
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        printf("Running on multiple threads\n");
        ntr = omp_get_max_threads();
        printf("%d\n",ntr);
    }
    printf("Finished!\n");
    return 0;
}
This is the output
Starting Program!
Running on multiple threads
2
Finished!
What is wrong with this code or Visual Studio settings?
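One way to narrow this down, offered as a diagnostic sketch rather than a confirmed diagnosis: check whether the compiler actually enabled OpenMP for this translation unit. Conforming compilers define the _OPENMP macro when OpenMP is on, and omp_get_num_threads() reports the real team size inside a parallel region, whereas omp_get_max_threads() only reports the upper limit. Note also that /Qopenmp is the Intel option, while Visual C++ uses /openmp. The function name observed_team_size is hypothetical.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical diagnostic: returns the number of threads that actually ran
// the parallel region, or 0 if this file was compiled without OpenMP.
int observed_team_size()
{
    int team = 0;
#ifdef _OPENMP
    omp_set_dynamic(0);
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        // Only one thread records the team size, which avoids a data race.
        #pragma omp single
        team = omp_get_num_threads();
    }
#endif
    return team;
}
```

A result of 0 would mean the pragma was ignored because OpenMP was not enabled at compile time. A result of 1 would mean OpenMP is on but the runtime granted only one thread.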

error: undefined reference to `sched_setaffinity' on windows xp

Basically, the code below was intended for use on Linux, and maybe that's the reason I get the error, because I'm using Windows XP. But I figured that pthreads should work just as well on both machines. I'm using gcc as my compiler and I did link with -lpthread, but I got the following error anyway.
|21|undefined reference to `sched_setaffinity'|
|30|undefined reference to `sched_setaffinity'|
If there is another method of setting the thread affinity using pthreads (on Windows), let me know. I already know all about the thread affinity functions available in windows.h, but I want to keep things multiplatform. Thanks.
#include <stdio.h>
#include <math.h>
#include <sched.h>

double waste_time(long n)
{
    double res = 0;
    long i = 0;
    while(i <n * 200000)
    {
        i++;
        res += sqrt (i);
    }
    return res;
}

int main(int argc, char **argv)
{
    unsigned long mask = 1; /* processor 0 */
    /* bind process to processor 0 */
    if (sched_setaffinity(0, sizeof(mask), &mask) <0)//line 21
    {
        perror("sched_setaffinity");
    }
    /* waste some time so the work is visible with "top" */
    printf ("result: %f\n", waste_time (2000));
    mask = 2; /* process switches to processor 1 now */
    if (sched_setaffinity(0, sizeof(mask), &mask) <0)//line 30
    {
        perror("sched_setaffinity");
    }
    /* waste some more time to see the processor switch */
    printf ("result: %f\n", waste_time (2000));
}
sched_getaffinity() and sched_setaffinity() are strictly Linux-specific calls. Windows provides its own set of specific Win32 API calls that affect scheduling. See this answer for sample code for Windows.
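Because there is no single portable call, a common way to keep the calling code multiplatform is to hide the two APIs behind one small function. The following is a minimal sketch under that assumption: bind_process_to_cpu is a hypothetical name, the Windows branch uses SetProcessAffinityMask, the Linux branch uses sched_setaffinity with a cpu_set_t, and other platforms are not covered.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for CPU_ZERO/CPU_SET; g++ usually defines it already
#endif
#include <sched.h>
#endif

// Hypothetical wrapper: binds the calling process to a single CPU.
// Returns 0 on success and -1 on failure.
int bind_process_to_cpu(int cpu)
{
    if (cpu < 0)
        return -1; // reject invalid indices before touching any OS API
#ifdef _WIN32
    if (cpu >= static_cast<int>(sizeof(DWORD_PTR) * 8))
        return -1; // the mask has only one bit per CPU
    DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu;
    return SetProcessAffinityMask(GetCurrentProcess(), mask) ? 0 : -1;
#else
    if (cpu >= CPU_SETSIZE)
        return -1; // index does not fit in a cpu_set_t
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means the calling thread, which in a single-threaded program is the process.
    return sched_setaffinity(0, sizeof(set), &set);
#endif
}
```

With this in place, the two sched_setaffinity calls in the question would become bind_process_to_cpu(0) and bind_process_to_cpu(1), and the rest of the program stays the same on both systems.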

Try...catch causes segmentation fault on embedded ARM with posix threads

Today, I posted a problem about a segmentation fault after destruction of a std::string (see this post). I've stripped the code down so that it no longer uses the STL, and I still get a segmentation fault sometimes.
The following code works just fine on my PC running Linux. But using the ARM cross compiler supplied by the manufacturer of our embedded device, it gives a segmentation fault just before catch (...).
This problem seems to be linked to this post in Google Groups, but I haven't found any solution yet.
The code is compiled using an ARM cross compiler.
Any suggestions are still welcome!
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *ExecuteThreadMethod(void *AThread);

class Thread
{
private:
    pthread_t internalThread;
public:
    void RunSigSegv()
    {
        try
        {
            for (int i = 0; i < 200; i++)
            {
                usleep(10000);
            }
        } // <---- Segmentation fault occurs here
        catch(...)
        {
        }
    }
    void Start()
    {
        pthread_attr_t _attr;
        pthread_attr_init(&_attr);
        pthread_attr_setdetachstate(&_attr, PTHREAD_CREATE_DETACHED);
        pthread_create (&internalThread, &_attr, ExecuteThreadMethod, this);
    }
};

void *ExecuteThreadMethod(void *AThread)
{
    ((Thread *)AThread)->RunSigSegv();
    pthread_exit(NULL);
}

Thread _thread1;
Thread _thread2;
Thread _thread3;
Thread _thread4;

void s()
{
    _thread1.Start();
    _thread2.Start();
    _thread3.Start();
    _thread4.Start();
}

int main(void)
{
    s();
    usleep(5000000);
}
I once encountered a problem like this which was caused by linking with a version of libstdc++ with no threading support, meaning that all threads shared a common exception handling stack with disastrous consequences.
Make sure the cross-compiler and its libraries were configured with --enable-threads=posix.
Just a diagnostic question: What happens if you don't detach the thread in Start()? You'd have to pthread_join() the threads back in main().
Also, have you considered Boost's threads? That might be more appropriate since you're using C++ rather than C.
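For the diagnostic question about not detaching, a joinable variant could look roughly like the sketch below. The names JoinableThread, ExecuteJoinable and RunAndJoinAll are hypothetical, the loop is shortened so the example finishes quickly, and the thread function returns normally instead of calling pthread_exit. Whether this changes the behaviour on the ARM toolchain is untested. Build with -pthread.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical joinable variant of the Thread class from the question.
class JoinableThread
{
private:
    pthread_t internalThread;
public:
    void Run()
    {
        try
        {
            for (int i = 0; i < 5; i++)
            {
                usleep(1000); // shortened from the question's 200 x 10 ms
            }
        }
        catch (...)
        {
        }
    }
    // Default attributes (NULL) create a joinable thread; returns 0 on success.
    int Start();
    // Blocks until the thread has finished; returns 0 on success.
    int Join() { return pthread_join(internalThread, NULL); }
};

extern "C" void *ExecuteJoinable(void *AThread)
{
    static_cast<JoinableThread *>(AThread)->Run();
    return NULL;
}

int JoinableThread::Start()
{
    return pthread_create(&internalThread, NULL, ExecuteJoinable, this);
}

// Starts four threads, then joins them; returns how many were joined cleanly.
int RunAndJoinAll()
{
    JoinableThread threads[4];
    bool started[4];
    int joined = 0;
    for (int i = 0; i < 4; i++)
        started[i] = (threads[i].Start() == 0);
    for (int i = 0; i < 4; i++)
        if (started[i] && threads[i].Join() == 0)
            joined++;
    return joined;
}
```

Joining in main() replaces the fixed usleep(5000000) wait, so the process cannot exit while a thread is still inside the try block.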

Resources