Segmentation fault using glGetString() with pthreads under Linux

I'm trying to load textures in a background thread to help speed up my application.
The stack we are using is C/C++ on Linux, compiling with gcc. We're using OpenGL, GLUT and GLEW. We have been using libSOIL for texture loading.
Ultimately, launching texture loads with libSOIL fails because it encounters a glGetString() call that causes a segfault. Trying to narrow down the problem, I wrote a very simple OpenGL application that reproduces the behavior. The below code sample shouldn't "do anything," but it also shouldn't segfault. If I knew why it did, I could in theory rework libSOIL so that it would behave in a pthreaded environment.
void *glPthreadTest( void* arg ) {
    glGetString( GL_EXTENSIONS ); // SIGSEGV
    return NULL;
}

int main( int argc, char **argv ) {
    glutInit( &argc, argv );
    glutInitDisplayMode( GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH );
    glewInit();
    glGetString( GL_EXTENSIONS ); // Does not cause SIGSEGV

    pthread_t id;
    if (pthread_create( &id, NULL, glPthreadTest, (void*)NULL ) != 0)
        fprintf( stderr, "pthread_create glPthreadTest failed.\n" );

    glutMainLoop();
    return EXIT_SUCCESS;
}
A sample stacktrace for this application from gdb looks like this:
#0 0x00000038492f86e9 in glGetString () from /usr/lib64/nvidia/libGL.so.1
No symbol table info available.
#1 0x0000000000404425 in glPthreadTest (arg=0x0) at sf.cpp:168
No locals.
#2 0x0000003148e07d15 in start_thread (arg=0x7ffff7b36700) at pthread_create.c:308
__res = <optimized out>
pd = 0x7ffff7b36700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737349117696, -5802871742031723458, 1, 211665686528, 140737349117696, 0, 5802854601940796478,
-5829171783283899330}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = 0
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
#3 0x00000031486f246d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114
No locals.
You'll notice I am using the NVIDIA libGL implementation, but this also occurs identically with the Mesa libGL that Ubuntu uses for Intel HD graphics cards.
Any tips for what might be going wrong, or how to investigate further to see what's happening?
Edit: Here are the #includes and the compile string for my example test:
#include <SOIL.h>
#include <GL/glew.h>
#include <GL/freeglut.h>
#include <GL/freeglut_ext.h>
#include <signal.h>
#include <pthread.h>
#include <cstdio>
g++ -Wall -pedantic -I/usr/include/SOIL -O0 -ggdb -o sf sf.cpp -lSOIL -pthread -lGL -lGLU -lGLEW -lglut -lX11

In order for any OpenGL call to operate properly, it requires an OpenGL context. Contexts are created using a window-system binding call (like wglCreateContext or similar). After creating a context, it needs to be "made current", which means associating the context with the current thread of execution. This is accomplished with another window-system specific call (like wglMakeCurrent for Microsoft Windows, or glXMakeCurrent for X Windows). GLUT abstracts all of that complexity away from you, doing all of those operations when you call glutCreateWindow.
Now, an important rule to know is that only a single OpenGL context can be current to a thread of execution at any one time. So, in the original example, if the OP could make the context current in the pthread they created, the context would no longer be current in the main thread. The way to keep all this consistent is to use a single context only in a single thread. (It's possible to have OpenGL contexts share data, but that's neither exposed by GLUT, nor possible without using the window-system context-creation calls.)
In your case, the likely problem is that GLUT doesn't give you access to what you really need (i.e., the OpenGL context) to make it current in the other thread. You'd need to create and manage OpenGL contexts yourself.
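For illustration, here is a minimal sketch of what managing the context yourself might look like with raw GLX. It is hypothetical code, not the OP's: it creates a second context that shares objects with GLUT's current one and makes it current only in the loader thread. The helper names and the attribute list are assumptions; real code would check errors, make sure the chosen visual is compatible with the one GLUT picked, and might give the loader thread its own hidden drawable or pbuffer instead of reusing the GLUT window.
// Hypothetical sketch: a second GLX context, sharing objects with GLUT's,
// made current in a worker thread. Assumes the GLUT window already exists.
#include <GL/glx.h>
#include <stdio.h>

static Display    *g_dpy;
static GLXDrawable g_draw;
static GLXContext  g_loaderCtx;

void createLoaderContext( void ) {       // call from the main thread, after glutCreateWindow()
    g_dpy  = glXGetCurrentDisplay();
    g_draw = glXGetCurrentDrawable();
    GLXContext mainCtx = glXGetCurrentContext();

    int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, GLX_DEPTH_SIZE, 16, None };
    XVisualInfo *vi = glXChooseVisual( g_dpy, DefaultScreen( g_dpy ), attribs );

    // Third argument requests object sharing (textures, buffers) with mainCtx.
    g_loaderCtx = glXCreateContext( g_dpy, vi, mainCtx, True );
}

void *textureLoaderThread( void *arg ) {
    // Bind the loader context to *this* thread; the main thread keeps its own.
    if ( !glXMakeCurrent( g_dpy, g_draw, g_loaderCtx ) ) {
        fprintf( stderr, "glXMakeCurrent failed in loader thread\n" );
        return NULL;
    }
    glGetString( GL_EXTENSIONS );        // now legal: this thread has a current context
    // ... texture loading would go here; shared objects become visible to the main context ...
    glXMakeCurrent( g_dpy, None, NULL );
    return NULL;
}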

Related

how to prevent some values from being optimized out in linux kernel debugging? [duplicate]

This question already has answers here:
Is there a way to tell GCC not to optimise a particular piece of code?
(3 answers)
This is code from the Linux kernel (5.4.21).
When I use a virtual machine and connect gdb to the Linux process, I can set breakpoints and step through the code. For example, I set a breakpoint on the function arm_smmu_device_probe. When I step with the 'next' command, some values, for example 'smmu' or 'dev' below, are shown as optimized out. How can I keep them from being optimized out so that I can see them in gdb?
static int arm_smmu_device_probe(struct platform_device *pdev)
{
    int irq, ret;
    struct resource *res;
    resource_size_t ioaddr;
    struct arm_smmu_device *smmu;
    struct device *dev = &pdev->dev;
    bool bypass;

    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
    if (!smmu) {
        dev_err(dev, "failed to allocate arm_smmu_device\n");
        return -ENOMEM;
    }
    smmu->dev = dev;

    if (dev->of_node) {
        ret = arm_smmu_device_dt_probe(pdev, smmu);
    } else {
        ret = arm_smmu_device_acpi_probe(pdev, smmu);
        if (ret == -ENODEV)
            return ret;
    }
I tried changing -O2 to -Og in the top Makefile, but then the kernel build fails.
Recently I found out how to do this (from Is there a way to tell GCC not to optimise a particular piece of code?, flolo's answer).
If you want a function aaa(...) not to be optimized, you can do it like this:
#pragma GCC push_options
#pragma GCC optimize ("O0")
aaa ( ... )
{
    function body
}
#pragma GCC pop_options
In some cases, putting the #pragma there causes a discrepancy between the #include header file and the function source. In that case (not often) you need to add the #pragma around the corresponding #include statement instead. If linux/bbb.h causes this kind of problem, do this:
#pragma GCC push_options
#pragma GCC optimize ("O0")
#include <linux/bbb.h>
#pragma GCC pop_options
This works reliably, and I'm enjoying(?) debugging and analysis this way.
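For a quick sanity check outside the kernel, here is a minimal userspace sketch of the same technique (my own example, not from the original answer). Build it with gcc -O2 -g and step through both functions in gdb: the local in the pragma-wrapped function stays visible, while the one in the normal function may be reported as optimized out. GCC also accepts an equivalent per-function attribute, __attribute__((optimize("O0"))).
/* pragma_demo.c -- hypothetical, minimal demonstration of the pragma trick. */
#include <stdio.h>

#pragma GCC push_options
#pragma GCC optimize ("O0")
static int not_optimized(int x)
{
    int local = x * 2;   /* compiled at -O0: 'local' stays visible in gdb */
    return local + 1;
}
#pragma GCC pop_options

static int optimized(int x)
{
    int local = x * 2;   /* compiled at -O2: 'local' may show as <optimized out> */
    return local + 1;
}

int main(void)
{
    printf("%d %d\n", not_optimized(3), optimized(3));
    return 0;
}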

Calling "clone()" on linux but it seems to malfunction

A simple test program. I expect clone() to fork a child process, and each process to run to its end.
#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <sys/types.h>
#include <errno.h>

int f(void *arg)
{
    pid_t pid = getpid();
    printf("child pid=%d\n", pid);
}

char buf[1024];

int main()
{
    printf("before clone\n");
    int pid = clone(f, buf, CLONE_VM | CLONE_VFORK, NULL);
    if (pid == -1) {
        printf("%d\n", errno);
        return 1;
    }
    waitpid(pid, NULL, 0);
    printf("after clone\n");
    printf("father pid=%d\n", getpid());
    return 0;
}
Run it:
$g++ testClone.cpp && ./a.out
before clone
It didn't print what I expected. It seems that after clone() the program is in an unknown state and then quits. I tried gdb and it prints:
Breakpoint 1, main () at testClone.cpp:15
(gdb) n
before clone
(gdb) n
waiting for new child: No child processes.
(gdb) n
Single stepping until exit from function clone@plt,
which has no line number information.
If I remove the waitpid line, then gdb prints a different kind of weird information.
(gdb) n
before clone
(gdb) n
Detaching after fork from child process 26709.
warning: Unexpected waitpid result 000000 when waiting for vfork-done
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x00007fb18a446bf1 in clone () from /lib64/libc.so.6
ptrace: No such process.
Where did I go wrong in my program?
You should never call clone in a user-level program -- there are way too many restrictions on what you are allowed to do in the cloned process.
In particular, calling any libc function (such as printf) is a complete no-no (because libc doesn't know that your clone exists, and has not performed any setup for it).
As K. A. Buhr points out, you also pass too small a stack, and the wrong end of it. Your stack is also not properly aligned.
In short, even though K. A. Buhr's modification appears to work, it doesn't really.
TL;DR: clone, just don't use it.
The second argument to clone is a pointer to the child's stack. As per the manual page for clone(2):
Stacks grow downward on all processors that run Linux (except the HP PA processors), so child_stack usually points to the topmost address of the memory space set up for the child stack.
Also, 1024 bytes is a paltry amount for a stack. The following modified version of your program appears to run correctly:
// #define _GNU_SOURCE // may be needed if compiled as C instead of C++
#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>

int f(void *arg)
{
    pid_t pid = getpid();
    printf("child pid=%d\n", pid);
    return 0;
}

char buf[1024*1024]; // *** allocate more stack ***

int main()
{
    printf("before clone\n");
    int pid = clone(f, buf + sizeof(buf), CLONE_VM | CLONE_VFORK, NULL);
    // *** in previous line: pointer is to *end* of stack ***
    if (pid == -1) {
        printf("%d\n", errno);
        return 1;
    }
    waitpid(pid, NULL, 0);
    printf("after clone\n");
    printf("father pid=%d\n", getpid());
    return 0;
}
Also, @Employed Russian is right -- you probably shouldn't use clone except if you're trying to have some fun. Either fork or vfork is a more sensible interface to clone whenever it meets your needs.
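For comparison, here is a sketch of the same test written with fork() (my own version, not from the answers above), which sidesteps the stack and libc pitfalls entirely:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    printf("before fork\n");
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                  /* child: gets its own copy of the address space */
        printf("child pid=%d\n", (int)getpid());
        return 0;
    }
    waitpid(pid, NULL, 0);           /* parent waits for the child */
    printf("after fork\n");
    printf("father pid=%d\n", (int)getpid());
    return 0;
}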

Xlib fails (Segmentation fault) even with each connection per thread

As far as I know, with X11 and Xlib a multithreaded programmer has two choices:
call XInitThreads early enough,
or use a new connection (XOpenDisplay) per thread.
Suppose I don't like the first method with the XInitThreads() call. Why does the second fail?
#include <X11/Xlib.h>
#include <cstdio>   // fprintf
#include <cstdlib>  // exit
#include <thread>

void startBasicWin() {
    Display *display;
    if ( (display = XOpenDisplay(NULL)) == NULL )
    {
        fprintf( stderr, "cannot connect to X server\n");
        exit( -1 );
    }
    XCloseDisplay(display);
}

int main() {
    std::thread t3 = std::thread(startBasicWin);
    std::thread t4 = std::thread(startBasicWin);
    std::thread t5 = std::thread(startBasicWin);
    std::thread t6 = std::thread(startBasicWin);
    std::thread t7 = std::thread(startBasicWin);
    std::thread t8 = std::thread(startBasicWin);
    std::thread t9 = std::thread(startBasicWin);

    t3.join();
    t4.join();
    t5.join();
    t6.join();
    t7.join();
    t8.join();
    t9.join();
}
compiled with
g++ -o xlib_multi xlib_multi.cpp -lX11 -std=c++11 -pthread -g
sometimes produces output:
Segmentation fault
or
No protocol specified
cannot connect to X server :0
Could it be that I can't use XOpenDisplay() without thread synchronization, but that once the X11 connections are created I can use Xlib in a multithreaded environment without any problems? Is that assumption correct?
Or is Xlib just buggy for multi-threading anyway?
Chances are that XOpenDisplay() internally uses some global variable that is not thread-safe, or shares data between the displays. I don't think it's wise to call XOpenDisplay like that from within a thread; I suggest opening the displays sequentially first, then starting the threads with a Display pointer. Or protect the code section around XOpenDisplay (and XCloseDisplay!) with a mutex.
Either way, the fact that there is a separate XInitThreads() call makes your assumption that everything will be fine "after" XOpenDisplay() very dangerous.
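For what it's worth, a sketch of the mutex variant suggested above might look like this (only startBasicWin changes; main stays the same as in the question). Whether the rest of your Xlib usage is safe without XInitThreads is a separate question, so treat this as an illustration rather than a guarantee:
#include <X11/Xlib.h>
#include <cstdio>
#include <cstdlib>
#include <mutex>

static std::mutex xlibMutex;  // serializes the non-thread-safe connection setup/teardown

void startBasicWin() {
    Display *display;
    {
        std::lock_guard<std::mutex> lock(xlibMutex);
        display = XOpenDisplay(NULL);
    }
    if (display == NULL)
    {
        fprintf(stderr, "cannot connect to X server\n");
        exit(-1);
    }
    // ... per-thread work on 'display' would go here ...
    {
        std::lock_guard<std::mutex> lock(xlibMutex);
        XCloseDisplay(display);
    }
}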

context sharing in FreeGLUT under Linux with xorg

I am trying to use OpenGL with a shared context (so I can share textures between windows) via the FreeGLUT library. It works fine and I can share textures, but it fails at the end of the program or when windows are closed with the mouse.
I have created code which reproduces the problem: (http://pastie.org/9437038)
// file: main.c
// compile: gcc -o test -lglut main.c
// compile: gcc -o test -lglut -DTIME_LIMIT main.c
#include "GL/freeglut.h"
#include <unistd.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int winA, winB, winC;
    int n;

    glutInit(&argc, argv);
    glutSetOption(GLUT_ACTION_ON_WINDOW_CLOSE, GLUT_ACTION_CONTINUE_EXECUTION);
    //glutSetOption(GLUT_RENDERING_CONTEXT, GLUT_USE_CURRENT_CONTEXT);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);

    winA = glutCreateWindow("Test A");
    glutSetOption(GLUT_RENDERING_CONTEXT, GLUT_USE_CURRENT_CONTEXT);
    winB = glutCreateWindow("Test B");
    winC = glutCreateWindow("Test C");

    printf("loop\n");
#ifdef TIME_LIMIT
    for (n = 0; n < 50; n++)
    {
        glutMainLoopEvent();
        usleep(5000);
    }
#else // TIME_LIMIT
    glutMainLoop();
#endif // TIME_LIMIT

    printf("Destroy winC\n");
    glutDestroyWindow(winC);
    printf("Destroy winB\n");
    glutDestroyWindow(winB);
    printf("Destroy winA\n");
    glutDestroyWindow(winA);
    printf("Normal end\n");
    return 0;
}
Output:
loop
X Error of failed request: GLXBadContext
Major opcode of failed request: 153 (GLX)
Minor opcode of failed request: 4 (X_GLXDestroyContext)
Serial number of failed request: 113
Current serial number in output stream: 114
Segmentation fault
output with TIME_LIMIT:
loop
Destroy winC
Destroy winB
Destroy winA
Segmentation fault
Without calling glutSetOption(GLUT_RENDERING_CONTEXT, GLUT_USE_CURRENT_CONTEXT);, it works well.
Does anybody have an idea what I am doing wrong?
The option GLUT_USE_CURRENT_CONTEXT does not create shared contexts. It just means that the same GL context is used for all windows. You only have one GL context, and you destroy it when you destroy the first window that uses it, so the other destruction calls fail. None of the GLUT implementations I'm aware of actually supports GL context sharing.
GLUT_USE_CURRENT_CONTEXT is more of a hack (and it is not part of the GLUT specification anyway), and not really well implemented. It could use reference counting so that the context is not destroyed before the last window using it is destroyed, but that is simply not the case.

Getting stack traces on Unix systems, automatically

What methods are there for automatically getting a stack trace on Unix systems? I don't mean just getting a core file or attaching interactively with GDB, but having a SIGSEGV handler that dumps a backtrace to a text file.
Bonus points for the following optional features:
Extra information gathering at crash time (e.g. config files).
Email a crash info bundle to the developers.
Ability to add this in a dlopened shared library
Not requiring a GUI
FYI,
the suggested solution (using backtrace_symbols in a signal handler) is dangerously broken. DO NOT USE IT -
Yes, backtrace and backtrace_symbols will produce a backtrace and translate it to symbolic names. However:
backtrace_symbols allocates memory using malloc, and you use free to free it. If you're crashing because of memory corruption, your malloc arena is very likely to be corrupt and cause a double fault.
malloc and free protect the malloc arena with a lock internally. You might have faulted in the middle of a malloc/free with the lock taken, which will cause these functions, or anything that calls them, to deadlock.
You use puts, which uses the standard stream, which is also protected by a lock. If you faulted in the middle of a printf, you once again have a deadlock.
On 32-bit platforms (e.g. your normal PC of 2 years ago), the kernel will plant a return address to an internal glibc function instead of your faulting function on your stack, so the single most important piece of information you are interested in (in which function did the program fault) will actually be corrupted on those platforms.
So, the code in the example is the worst kind of wrong - it LOOKS like it's working, but it will really fail you in unexpected ways in production.
BTW, interested in doing it right? check this out.
Cheers,
Gilad.
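For reference, a somewhat safer sketch of the same idea (my own, not from the answers here) avoids malloc and stdio inside the handler by using backtrace_symbols_fd(), which writes directly to a file descriptor. It is still not fully async-signal-safe (glibc's backtrace machinery can allocate on its first use), but it sidesteps the malloc-arena and stdio-lock problems described above:
#include <execinfo.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static void safer_sig_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);

    /* backtrace_symbols_fd() writes straight to the fd and does not call malloc. */
    int fd = open("/tmp/backtrace.log", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd >= 0) {
        backtrace_symbols_fd(frames, n, fd);
        close(fd);
    }
    _exit(128 + sig);   /* do not return into the faulting code */
}

int main(void)
{
    signal(SIGSEGV, safer_sig_handler);
    volatile char *p = 0;
    *p = 1;             /* deliberate crash to exercise the handler */
    return 0;
}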
If you are on systems with the BSD backtrace functionality available (Linux, OSX 1.5, BSD of course), you can do this programmatically in your signal handler.
For example (backtrace code derived from IBM example):
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void sig_handler(int sig)
{
    void *array[25];
    int nSize = backtrace(array, 25);
    char **symbols = backtrace_symbols(array, nSize);

    for (int i = 0; i < nSize; i++)
    {
        puts(symbols[i]);
    }
    free(symbols);

    signal(sig, &sig_handler);
}

void h()
{
    kill(0, SIGSEGV);
}

void g()
{
    h();
}

void f()
{
    g();
}

int main(int argc, char **argv)
{
    signal(SIGSEGV, &sig_handler);
    f();
}
Output:
0 a.out 0x00001f2d sig_handler + 35
1 libSystem.B.dylib 0x95f8f09b _sigtramp + 43
2 ??? 0xffffffff 0x0 + 4294967295
3 a.out 0x00001fb1 h + 26
4 a.out 0x00001fbe g + 11
5 a.out 0x00001fcb f + 11
6 a.out 0x00001ff5 main + 40
7 a.out 0x00001ede start + 54
This doesn't get bonus points for the optional features (except not requiring a GUI), however, it does have the advantage of being very simple, and not requiring any additional libraries or programs.
Here is an example of how to get some more info using a demangler. As you can see, this one also logs the stack trace to a file.
#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <cxxabi.h>
#include <execinfo.h>  // backtrace, backtrace_symbols
#include <csignal>     // signal
#include <cstdlib>     // free

void sig_handler(int sig)
{
    std::stringstream stream;
    void *array[25];
    int nSize = backtrace(array, 25);
    char **symbols = backtrace_symbols(array, nSize);

    for (int i = 0; i < nSize; i++) {
        int status;
        char *realname;
        std::string current = symbols[i];
        size_t start = current.find("(");
        size_t end = current.find("+");
        realname = NULL;
        if (start != std::string::npos && end != std::string::npos) {
            std::string symbol = current.substr(start + 1, end - start - 1);
            realname = abi::__cxa_demangle(symbol.c_str(), 0, 0, &status);
        }
        if (realname != NULL)
            stream << realname << std::endl;
        else
            stream << symbols[i] << std::endl;
        free(realname);
    }
    free(symbols);

    std::cerr << stream.str();

    std::ofstream file("/tmp/error.log");
    if (file.is_open()) {
        if (file.good())
            file << stream.str();
        file.close();
    }

    signal(sig, &sig_handler);
}
Derek's solution is probably the best, but here's an alternative anyway:
Recent Linux kernel versions allow you to pipe core dumps to a script or program. You could write a script to catch the core dump, collect any extra information you need, and mail everything back.
This is a global setting, though, so it would apply to any crashing program on the system. It also requires root rights to set up.
It can be configured through the /proc/sys/kernel/core_pattern file. Set it to something like '|/home/myuser/bin/my-core-handler-script' (the pipe must be the first character).
The Ubuntu people use this feature as well.
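As an illustration (a hypothetical handler, not Ubuntu's actual one), a minimal core-pipe handler can be a small program that reads the dump from standard input and stashes it somewhere, along with whatever extra information you want. Install it with something like echo '|/home/myuser/bin/my-core-handler %p' > /proc/sys/kernel/core_pattern (as root); the %p is expanded by the kernel and arrives as argv[1].
// my-core-handler.cpp -- hypothetical minimal core_pattern pipe handler (sketch).
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char **argv)
{
    // With core_pattern set to '|/home/myuser/bin/my-core-handler %p',
    // argv[1] is the PID of the crashing process.
    std::string pid = (argc > 1) ? argv[1] : "unknown";
    std::ofstream out("/tmp/core." + pid, std::ios::binary);

    // The kernel feeds the core image to this process on standard input.
    char buf[65536];
    while (std::cin.read(buf, sizeof buf) || std::cin.gcount() > 0)
        out.write(buf, std::cin.gcount());

    // ... gather config files, e-mail a crash bundle to the developers, etc. ...
    return 0;
}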
