Using GLUT after fork on OSX Sierra [duplicate] - multithreading

This question already has an answer here:
Why can't I use cocoa frameworks in different forked processes?
(1 answer)
Closed 6 years ago.
Before Sierra, I used to be able to initialize GLUT in the child process after forking the original process. With the latest version of Sierra, this seems to have changed: the following program crashes. If I instead move all the GLUT calls to the parent process, everything works. Why is there a difference between using the parent and the child process?
#include <stdlib.h>
#include <unistd.h>
#include <GLUT/glut.h>

void pass(void) {
}

int main(int argc, char* argv[]) {
    pid_t childpid;

    childpid = fork();
    if (childpid == 0) {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB | GLUT_DEPTH);
        glutInitWindowSize(100, 100);
        glutCreateWindow("test");
        glutDisplayFunc(pass);
        glGetError();
        glutMainLoop();
    } else {
        sleep(5);
    }
    exit(1);
}
The crash report I get:
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Illegal instruction: 4
Termination Reason: Namespace SIGNAL, Code 0x4
Terminating Process: exc handler [0]
Application Specific Information:
BUG IN CLIENT OF LIBDISPATCH: _dispatch_main_queue_callback_4CF called from the wrong thread
crashed on child side of fork pre-exec
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libdispatch.dylib 0x00007fffe8e7bd21 _dispatch_main_queue_callback_4CF + 1291
1 com.apple.CoreFoundation 0x00007fffd3c7bbe9 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
2 com.apple.CoreFoundation 0x00007fffd3c3d00d __CFRunLoopRun + 2205
3 com.apple.CoreFoundation 0x00007fffd3c3c514 CFRunLoopRunSpecific + 420
4 com.apple.Foundation 0x00007fffd57e1c9b -[NSRunLoop(NSRunLoop) limitDateForMode:] + 196
5 com.apple.glut 0x0000000104f39e93 -[GLUTApplication run] + 321
6 com.apple.glut 0x0000000104f46b0e glutMainLoop + 279
7 a.out 0x0000000104f24ed9 main + 121 (main.c:18)
8 libdyld.dylib 0x00007fffe8ea4255 start + 1

fork() does not create a thread; it forks the process. After calling fork() you have two processes with nearly identical address space contents, but their address spaces are protected from each other. The general rule regarding fork() is that the only sensible thing to do in the child process after a fork is to replace the process image with execve(); doing anything else requires a lot of foresight in the program's design.
Threads are created in a different way. I suggest you use the actual threading primitives offered by your programming language of choice.
That being said, many OSs and GUI libraries want a process' GUI parts to run in the main thread, so that could be part of the reason as well. Also be aware that OpenGL and multithreading are a little bit finicky together.
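For illustration, here is a minimal sketch of the fork-then-exec pattern described above. The helper binary ./glut_child is hypothetical; it would contain all of the GLUT/Cocoa initialization, so the frameworks only ever run in a freshly exec'ed process whose main thread is clean:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t childpid = fork();
    if (childpid == 0) {
        /* Child: replace the process image before touching any GUI framework.
           "./glut_child" is a hypothetical helper that calls glutInit(),
           glutCreateWindow(), glutMainLoop(), etc. in its own main thread. */
        execl("./glut_child", "glut_child", (char *)NULL);
        perror("execl");   /* only reached if the exec fails */
        _exit(127);
    } else if (childpid > 0) {
        /* Parent: do its own work, then reap the child. */
        waitpid(childpid, NULL, 0);
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}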

Related

Linux - Using mutex to synchronise serial port

I'm writing a C program for Linux.
The program can start a timer; both the main program and the timer can send and receive characters on a serial port.
My attempt is to serialize serial port access with a mutex kept in a global structure, initialized when the port is opened:
if (pthread_mutex_init(&pED->lockSerial, NULL) != 0)
{
    lwsl_err("lockSerial init failed\n");
}
I protected all the functions that send data on the port as follow:
ssize_t cmdFirmwareVersion(EngineData *pED)
{
    if (pED->fdSerialPort == -1)
        return -1;

    LOCK_SERIAL;
    unsigned char cmd[] = { 0x00, 0x00, 0x7F };
    write(pED->fdSerialPort, cmd, sizeof(cmd));
    int rx = read(pED->fdSerialPort, rxbuffer, sizeof rxbuffer);
    dump(rxbuffer, rx);
    UNLOCK_SERIAL;
    return rx;
}
where
#define LOCK_SERIAL if (0!=pthread_mutex_lock(&pED->lockSerial)) {printf("Err lock");return 0;}
#define UNLOCK_SERIAL pthread_mutex_unlock(&pED->lockSerial);
Running the program and starting the timer, I see the requests are regular. But when I trigger one of these calls another way (from an rx websocket function), the program hangs and I need to kill it.
Why does the entire program stop?
If a process hangs, it could be because of a circular wait for mutexes, or because a thread holds a mutex and tries to lock it again; either causes a deadlock.
ps output will show a thread's state as D or S if it's waiting for a resource, which makes the process appear hung.
D uninterruptible sleep (usually IO)
S interruptible sleep (waiting for an event to complete)
To demonstrate, I made a thread hold a mutex and then try to lock it again.
The ps output and GDB show that the main thread and the child thread are both sleeping.
xxxx#virtualBox:~$ ps -eflT |grep a.out
0 S root 3982 3982 2265 0 80 0 - 22155 - 20:28 pts/0 00:00:00 ./a.out
1 S root 3982 3984 2265 0 80 0 - 22155 - 20:28 pts/0 00:00:00 ./a.out
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7fdf740 (LWP 4625) "a.out" 0x00007ffff7bbed2d in __GI___pthread_timedjoin_ex (
threadid=140737345505024, thread_return=0x0, abstime=0x0, block= <optimized out>) at pthread_join_common.c:89
2 Thread 0x7ffff77c4700 (LWP 4629) "a.out" __lll_lock_wait ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
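If the hang really is the same thread taking pED->lockSerial twice (for example, the websocket rx path calling cmdFirmwareVersion() while it already holds the lock), an error-checking mutex will report the self-deadlock instead of blocking forever. A minimal sketch of that idea, independent of the original structures:

#include <stdio.h>
#include <string.h>
#include <pthread.h>

int main(void) {
    pthread_mutex_t lockSerial;
    pthread_mutexattr_t attr;

    /* PTHREAD_MUTEX_ERRORCHECK makes a second lock attempt by the same
       thread fail with EDEADLK instead of hanging. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lockSerial, &attr);

    pthread_mutex_lock(&lockSerial);
    int rc = pthread_mutex_lock(&lockSerial);   /* same thread, second lock */
    if (rc != 0)
        printf("second lock failed: %s\n", strerror(rc));   /* EDEADLK */

    pthread_mutex_unlock(&lockSerial);
    pthread_mutex_destroy(&lockSerial);
    pthread_mutexattr_destroy(&attr);
    return 0;
}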

How can sys_sigsuspend be atomic in Linux kernel 2.6.11?

I'm reading the Linux 2.6.11 source.
The implementation of sys_sigsuspend is the following:
/*
 * Atomically swap in the new signal mask, and wait for a signal.
 */
asmlinkage int
sys_sigsuspend(int history0, int history1, old_sigset_t mask)
{
    struct pt_regs * regs = (struct pt_regs *) &history0;
    sigset_t saveset;

    mask &= _BLOCKABLE;
    spin_lock_irq(&current->sighand->siglock);
    saveset = current->blocked;
    siginitset(&current->blocked, mask);
    recalc_sigpending();
    spin_unlock_irq(&current->sighand->siglock);

    regs->eax = -EINTR;
    while (1) {
        current->state = TASK_INTERRUPTIBLE;
        schedule();
        if (do_signal(regs, &saveset))
            return -EINTR;
    }
}
In ULK3 the author says:
the sigsuspend( ) system call does not allow signals to be sent after unblocking and before the schedule( ) invocation, because other processes cannot grab the CPU during that time interval.
Between spin_unlock_irq and schedule the syscall can be interrupted and preempted, so another process has enough time to send this process a signal that is not blocked.
But in that case the signal would be lost, because the process only calls schedule() after the signal has already been delivered.
That's why sigsuspend should be atomic, but it is not, according to its implementation.
The sigsuspend implementation is correct, but the explanation in ULK seems to be misleading.
When a process executes kernel code, that execution is never interrupted by user signals. Instead, such signals are accumulated inside the current task structure. At the moment the process leaves kernel code and returns to user code, all accumulated (and not blocked) signals are fired.
The kernel's schedule() function checks whether some signals are accumulated. If they are, and current->state is TASK_INTERRUPTIBLE, schedule() returns. So signals collected before the schedule() call are not lost.
Atomicity of the sigsuspend() system call means that if signals temporarily unblocked by the call are emitted, the call is guaranteed to see them and return. This atomicity is achieved simply by placing both the unblocking and the checking of signals inside the same kernel function.
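To see from user space why that atomicity matters, here is a minimal sketch (my own illustration, not taken from ULK) of the classic block-check-sigsuspend pattern. Because unblocking and waiting happen atomically inside the one system call, a SIGUSR1 arriving at any point is never lost between the check of the flag and the wait:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig) { (void)sig; got_signal = 1; }

int main(void) {
    struct sigaction sa;
    sigset_t block, old;

    sa.sa_handler = handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    /* Block SIGUSR1 while testing the flag, so delivery cannot slip in
       between the check and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);

    while (!got_signal) {
        /* sigsuspend() atomically installs the old (unblocking) mask and
           sleeps; it returns only after a signal handler has run. */
        sigsuspend(&old);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);

    printf("got SIGUSR1\n");
    return 0;
}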

Cause all processes running under OpenMPI to dump core

I'm running a distributed process under Open MPI on Linux.
When one process dies, mpirun detects this and kills the other processes. But even though I get a core file from the process that died, I don't get core files from the processes killed by Open MPI.
Why am I not getting these other core files? How can I get them?
The other processes are just killed by Open MPI. They didn't segfault themselves. From an MPI perspective, their execution was erroneous, but from a C perspective, it was fine. As such, there's no reason for them to have dumped core.
OMPI's mpiexec kills the remaining ranks by first sending them SIGTERM and then SIGKILL (should any of them survive SIGTERM). None of those signals results in core being dumped. You could probably install a signal handler for SIGTERM that calls abort(3) in order to force core dumps on kill.
Here is some sample code that works with Open MPI 1.6.5:
#include <stdlib.h>
#include <signal.h>
#include <mpi.h>

void term_handler (int sig) {
    // Restore the default SIGABRT disposition
    signal(SIGABRT, SIG_DFL);
    // Abort (dumps core)
    abort();
}

int main (int argc, char **argv) {
    int rank;

    MPI_Init(&argc, &argv);

    // Override the SIGTERM handler AFTER the call to MPI_Init
    signal(SIGTERM, term_handler);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Cause division-by-zero exception in rank 0
    rank = 1 / rank;

    // Make other ranks wait for rank 0
    MPI_Bcast(&rank, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
Open MPI's MPI_Init installs special handlers for some known signals that either print useful debug information or generate backtrace files (.btr). That's why the SIGTERM handler has to be installed after the call to MPI_Init and the default action of SIGABRT (used by abort(3)) has to be restored before calling abort().
Note that the signal handler will appear at the top of the call stack in the core file:
(gdb) bt
#0 0x0000003bfd232925 in raise () from /lib64/libc.so.6
#1 0x0000003bfd234105 in abort () from /lib64/libc.so.6
#2 0x0000000000400dac in term_handler (sig=15) at test.c:8
#3 <signal handler called>
#4 0x00007fbac7ad0bc7 in mca_btl_sm_component_progress () from /path/libmpi.so.1
#5 0x00007fbac7c9fca7 in opal_progress () from /path/libmpi.so.1
...
I would rather recommend using a parallel debugger such as TotalView or DDT, if you have one at your disposal.

Why does C++ thread class create two threads?

I am running Visual C++ 2013 and I notice that creating a thread with the std::thread class spawns two threads. Is this by design? If so, what is the reason for this?
When I use _beginthreadex() it only spawns one thread as I would expect.
#include <thread>

using namespace std;

unsigned int __stdcall Func(void*)
{
    unsigned int i = 0;
    while (i < 1000000000)
    {
        ++i;
    }
    return i;
}

int wmain()
{
    thread doStuff(Func, nullptr);
    auto id = doStuff.get_id();
    doStuff.join();
}
EDIT 1
When I put a breakpoint on doStuff.join() I see the following output. The id variable matches the 55760 thread. When I use _beginthreadex() I do not get that extra thread "ntdll.dll thread".
EDIT 2
Here is the call stack with symbols loaded.
ThreadTest.exe!wmain() Line 21
ThreadTest.exe!__tmainCRTStartup() Line 623
ThreadTest.exe!wmainCRTStartup() Line 466
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!___RtlUserThreadStart#8()
ntdll.dll!__RtlUserThreadStart#8()
Main Thread is obvious: it's your main thread. When you create a thread, only one thread is created. The msvcr* thread belongs to the Microsoft C Runtime Library. I don't think you can control it, but don't mind it; your code works as you expect.

high resource usage program stalls/crashes linux

I have a program that reads about 1000 images and creates a statistical summary of their contents. Each image is processed in its own thread using OpenMP, and I have the thread limit set to match my number of processors.
Until about two weeks ago, the program ran fine. Now, however, if I run the program more than once, my system slows down and eventually freezes up.
In order to troubleshoot, I wrote the simple code listed below that emulates what my program is doing. This code will freeze my system, just as my original program does, after trying to read only a few files at line 35.
I ran the program, successively reverting to an earlier kernel after each failure, and found that it fails with all 3.6 kernels up to version 3.6.8.
However, when I go back to kernel 3.5.6, it works.
test.cc:
1 #include <cstdio>
2 #include <iostream>
3 #include <vector>
4 #include <unistd.h>
5
6 using namespace std;
7
8 int main ()
9 {
10 // number of files
11 const size_t N = 1000;
12 // total system memory
13 const size_t MEM = sysconf (_SC_PHYS_PAGES) * sysconf (_SC_PAGE_SIZE);
14 // file size
15 const size_t SZ = MEM/N;
16
17 // create temp filenames
18 vector<string> fn (N);
19 for (size_t i = 0; i < fn.size (); ++i)
20 fn[i] = string (tmpnam (NULL));
21
22 // write a bunch of files to disk
23 for (size_t i = 0; i < fn.size (); ++i)
24 {
25 vector<char> a (SZ);
26 FILE *fp = fopen (fn[i].c_str (), "wb");
27 fwrite (&a[0], a.size (), 1, fp);
28 clog << fn[i] << " written" << endl;
29 }
30
31 // read a bunch of files from disk
32 #pragma omp parallel for
33 for (size_t i = 0; i < fn.size (); ++i)
34 {
35 vector<char> a (SZ);
36 FILE *fp = fopen (fn[i].c_str (), "rb");
37 fread (&a[0], a.size (), 1, fp);
38 clog << fn[i] << " read" << endl;
39 }
40
41 return 0;
42 }
Makefile:
a:
	g++ -fopenmp -Wall -o test -g test.cc
	./test
My question is: What is different about kernel 3.6 that would cause this program to fail, but does not cause it to fail in version 3.5?
Without going through the code: if you want to set some limits on your processes, have a look at cgroups for limiting resource usage.
As for the freezing: you are trying to read/write GBs of data to disk at once. Given the ~100 MB/s speed of today's hard drives, I would expect a freeze when the kernel decides to flush the caches to disk, which will probably happen as soon as you try to read a reasonably sized chunk of data under memory pressure (since you allocated lots of memory, the space left for caches is limited).
You can try to mmap() the files (see the sketch below) or change the kernel I/O scheduler.
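As a rough illustration of the mmap() route (a sketch only, with error handling trimmed to the essentials), mapping the file lets the kernel page data in on demand instead of copying everything into a freshly allocated buffer:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sum the bytes of a file through a read-only mapping.
long sum_file(const char *name) {
    int fd = open(name, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    char *p = (char *)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       // the mapping keeps the file accessible
    if (p == MAP_FAILED) return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += p[i];                 // pages are faulted in as needed

    munmap(p, st.st_size);
    return sum;
}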
I haven't looked in depth at your code, but I noticed some practices that look bad to me (at least, I think they are):
First, the critical section inside the OpenMP loop. That is a synchronization point, and putting it in every iteration sounds problematic to me. Since each thread must be sure no other thread has entered it, the overhead that synchronization introduces probably grows with the number of threads.
Second: I am not very used to C++, but I guess that every time vector<char> a (SZ) is executed, memory is allocated (and freed at the end of the block). Excuse me if I am wrong. Since you know the value of SZ beforehand, you'd do better to allocate a vector<vector<char> > with as many elements as threads before the parallel region. Then, in the parallel region, each thread would access its own vector<char>, as sketched below.
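A sketch of that second suggestion, reusing one buffer per thread instead of reallocating inside the loop; the file count and buffer size are placeholders, and the actual file I/O is left as a comment:

#include <vector>
#include <omp.h>

int main() {
    const size_t N  = 1000;        // number of files (placeholder)
    const size_t SZ = 1 << 20;     // per-file buffer size (placeholder)

    // One buffer per thread, allocated once before the parallel region.
    std::vector<std::vector<char>> buf(omp_get_max_threads(),
                                       std::vector<char>(SZ));

    #pragma omp parallel for
    for (size_t i = 0; i < N; ++i) {
        // Each thread reuses its own buffer; no allocation per iteration.
        std::vector<char> &a = buf[omp_get_thread_num()];
        // ... fread() the i-th file into &a[0] and process it ...
        (void)a;
    }
    return 0;
}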
