Vulkan threaded application gets error message on queue submissions under mutex - multithreading

I have an application with Vulkan for rendering and glfw for windowing. If I start several threads, each with a different window, I get errors on threading and queue submission even though ALL vulkan calls are protected by a common mutex. The vulkan layer says:
THREADING ERROR : object of type VkQueue is simultaneously used in thread 0x0 and thread 0x7fc365b99700
Here is the skeleton of the loop under which this happens in each thread:
while (!finished) {
    window.draw(...);
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
The draw function skeleton looks like:
draw(Arg arg) {
    static std::mutex mtx;
    std::lock_guard lock{mtx};
    // .... drawing calls. Including
    device.acquireNextImageKHR(...);
    // Fill command buffers
    graphicsQueue.submit(...);
    presentQueue.presentKHR(presentInfo);
}
This is C++17 which slightly simplifies the syntax but is otherwise irrelevant.
Clearly everything is under a mutex. I also intercept the call to the debug message. When I do so, I see that one thread is waiting for glfw events, one is printing the vulkan layer message and the other two threads are trying to acquire the mutex for the lock_guard.
I am at a loss as to what is going on or how to even figure out what is causing this.
I am running on linux, and it does not crash. However on Mac OS X, after a random amount of time, the code will crash in a queue submit call of MoltenVK and when the crash happens, I see a similar situation of the threads. That is to say no other thread is inside a Vulkan call.
I'd appreciate any ideas. My next move would be to move all queue submissions to a single thread, though that is not my favorite solution.
PS: I created a complete MCVE under the Vookoo framework. It is at https://github.com/FunMiles/Vookoo/tree/lock_guard_queues and is the example 00-parallelTriangles
To try it, do the following:
git clone https://github.com/FunMiles/Vookoo.git
cd Vookoo
git checkout lock_guard_queues
mkdir build
cd build
cmake ..
make
examples/00-parallelTriangles

The way you call the draw is:
window.draw(device, fw.graphicsQueue(), [&](){ /* some lambda */ });
The inside of draw is protected by the mutex, but the call to fw.graphicsQueue() isn't.
A million abstraction layers below, fw.graphicsQueue() just calls vkGetDeviceQueue. I found that executing vkGetDeviceQueue in parallel with vkQueueSubmit causes the validation error.
So there are a few issues here:
There is a bug in the layers that causes multiple initialization of VkQueue state on vkGetDeviceQueue, which is the cause of the validation error
KhronosGroup/Vulkan-ValidationLayers#1751
Thread id 0 is not a separate issue. As there are no actual previous accesses, no thread id is recorded. The layers issue the error because the access count goes negative, having previously been wrongly reset to 0.
Arguably there is a spec issue here as well. It is not immediately obvious from the text that VkQueue is not actually accessed in vkGetDeviceQueue, beyond the silent assumption that this is the sane thing to do.
KhronosGroup/Vulkan-Docs#1254
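Until the layer bug is fixed, one way to sidestep the race is to fetch the queue handle once, before any render thread starts, and hand it to each draw call, so that no thread looks up the queue while another is submitting. A minimal sketch of that ordering in plain C++ follows. FakeQueue, getQueue, draw and runFrames are hypothetical stand-ins, not Vulkan or Vookoo APIs; getQueue plays the role of fw.graphicsQueue() and submit the role of vkQueueSubmit.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a queue wrapper; not a Vulkan or Vookoo type.
struct FakeQueue {
    int submissions = 0;
    void submit() { ++submissions; }  // stands in for vkQueueSubmit
};

// Stand-in for fw.graphicsQueue(), which ends up in vkGetDeviceQueue.
FakeQueue& getQueue() {
    static FakeQueue queue;
    return queue;
}

// Every use of the queue happens under the one shared mutex.
void draw(FakeQueue& queue, std::mutex& mtx) {
    std::lock_guard<std::mutex> lock{mtx};
    queue.submit();
}

// Looks the queue up once on the calling thread, then shares the handle.
int runFrames(int threadCount, int framesPerThread) {
    std::mutex mtx;
    FakeQueue& queue = getQueue();  // fetched before any thread exists
    queue.submissions = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&queue, &mtx, framesPerThread] {
            for (int f = 0; f < framesPerThread; ++f) {
                draw(queue, mtx);  // no queue lookup inside the loop
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return queue.submissions;
}
```

In the real application the equivalent change would be to store the result of fw.graphicsQueue() in a variable before spawning the window threads and pass that variable to window.draw.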

Related

How to safely use [NSTask waitUntilExit] off the main thread?

I have a multithreaded program that needs to run many executables at once and wait for their results.
I use [NSTask waitUntilExit] in an NSOperationQueue that runs it on a non-main thread (running NSTask on the main thread is completely out of the question).
My program randomly crashes or runs into assertion failures, and the crash stacks always point to the runloop run by waitUntilExit, which executes various callbacks and handlers, including (IMHO incorrectly) KVO and bindings updating the UI, which causes them to run on a non-main thread. (It's probably the problem described by Mike Ash.)
How can I safely use waitUntilExit?
Is it a problem of waitUntilExit being essentially unusable, or do I need to do something special (apart from explicitly scheduling my callbacks on the main thread) when using KVO and IB bindings to prevent them from being handled on a wrong thread running waitUntilExit?
As Mike Ash points out, you just can't call waitUntilExit on a random runloop. It's convenient, but it doesn't work. You have to include "doesn't work" in your computation of "is this actually convenient?"
You can, however, use terminationHandler in 10.7+. It does not pump the runloop, so shouldn't create this problem. You can recreate waitUntilExit with something along these lines (untested; probably doesn't compile):
dispatch_group_t group = dispatch_group_create();
dispatch_group_enter(group);
task.terminationHandler = ^{ dispatch_group_leave(group); };
[task launch];
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
// If not using ARC:
dispatch_release(group);
It's hard to say without more context about what you are doing.
In general, you can't update the interface from non-main threads. So if you observe KVO notifications for your NSTasks on a non-main thread and update the UI from there, that is a bug.
In that case you can fix the situation simply by calling
-[NSObject performSelectorOnMainThread:withObject:waitUntilDone:]
or similar whenever you want to update the UI.
To my mind, though, a more graceful solution is this:
create a separate NSOperationQueue with maxConcurrentOperationCount = 1 (so a FIFO queue) and write an NSOperation subclass that executes the NSTask and updates the UI through delegate methods. That way you control the number of tasks executing in the application (and you can stop all of them, and so on).
But I think the high-level solution to your problem would be to write a privileged helper tool. With this approach you get two main benefits: your NSTasks will execute in a separate process, and you will have root privileges for executing your tasks.
I hope my answer covers your problem.

QThread execution freezes my GUI

I'm new to multithreaded programming. I wrote this simple multithreaded program with Qt. But when I run this program it freezes my GUI, and when I click inside my window, the system reports that the program is not responding.
Here is my widget class. My thread counts up an integer and emits it whenever the number is divisible by 1000. In my widget I simply catch this number with the signal-slot mechanism and show it in a label and a progress bar.
Widget::Widget(QWidget *parent) :
    QWidget(parent),
    ui(new Ui::Widget)
{
    ui->setupUi(this);
    MyThread *th = new MyThread;
    connect( th, SIGNAL(num(int)), this, SLOT(setNum(int)));
    th->start();
}
void Widget::setNum(int n)
{
    ui->label->setNum( n);
    ui->progressBar->setValue(n%101);
}
and here is my thread run() function :
void MyThread::run()
{
    for( int i = 0; i < 10000000; i++){
        if( i % 1000 == 0)
            emit num(i);
    }
}
thanks!
The problem is with your thread code producing an event storm. The loop counts very fast -- so fast that the fact that you emit a signal only every 1000 iterations is pretty much immaterial. On modern CPUs, doing 1000 integer divisions takes on the order of 10 microseconds IIRC. If the loop were the only limiting factor, you'd be emitting signals at a peak rate of about 100,000 per second. This is not the case because the performance is limited by other factors, which we shall discuss below.
Let's understand what happens when you emit signals in a different thread from where the receiver QObject lives. The signals are packaged in a QMetaCallEvent and posted to the event queue of the receiving thread. An event loop running in the receiving thread -- here, the GUI thread -- acts on those events using an instance of QAbstractEventDispatcher. Each QMetaCallEvent results in a call to the connected slot.
The access to the event queue of the receiving GUI thread is serialized by a QMutex. On Qt 4.8 and newer, the QMutex implementation got a nice speedup, so the fact that each signal emission results in locking of the queue mutex is not likely to be a problem. Alas, the events need to be allocated on the heap in the worker thread, and then deallocated in the GUI thread. Many heap allocators perform quite poorly when this happens in quick succession if the threads happen to execute on different cores.
The biggest problem comes in the GUI thread. There seems to be a bunch of hidden O(n^2) complexity algorithms! The event loop has to process 10,000 events. Those events will most likely be delivered very quickly and end up in a contiguous block in the event queue. The event loop will have to deal with all of them before it can process further events. A lot of expensive operations happen when you invoke your slot. Not only is the QMetaCallEvent deallocated from the heap, but the label schedules an update() (repaint), and this internally posts a compressible event to the event queue. Compressible event posting has to, in the worst case, iterate over the entire event queue. That's one potential O(n^2) complexity action. Another such action, probably more important in practice, is the progress bar's setValue internally calling QApplication::processEvents(). This can recursively call your slot to deliver the subsequent signal from the event queue. You're doing way more work than you think you are, and this locks up the GUI thread.
Instrument your slot and see if it's called recursively. A quick-and-dirty way of doing it is
void Widget::setNum(int n)
{
    static int level = 0, maxLevel = 0;
    level ++;
    maxLevel = qMax(level, maxLevel);
    ui->label->setNum( n);
    ui->progressBar->setValue(n%101);
    if (level > 1 && level == maxLevel-1) {
        qDebug("setNum recursed up to level %d", maxLevel);
    }
    level --;
}
What is freezing your GUI thread is not QThread's execution, but the huge amount of work you make the GUI thread do. Even if your code looks innocuous.
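One way to cut that load is to signal only when the displayed value would actually change. The following is a minimal sketch in plain C++ (no Qt): the hypothetical progressUpdates function stands in for the worker's run() loop, and the returned vector stands in for the signals it would emit.

```cpp
#include <vector>

// Returns the values that would be emitted if the worker signalled only
// when the whole-number percentage changes. `total` must be positive.
std::vector<int> progressUpdates(long long total) {
    std::vector<int> emitted;
    int lastPercent = -1;  // nothing emitted yet
    for (long long i = 0; i < total; ++i) {
        int percent = static_cast<int>((i * 100) / total);
        if (percent != lastPercent) {
            lastPercent = percent;
            emitted.push_back(percent);  // in Qt this would be: emit num(percent);
        }
    }
    return emitted;
}
```

With the loop bound from the question (10,000,000) this yields 100 updates, one per percentage point, instead of 10,000 queued events.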
Side Note on processEvents and Run-to-Completion Code
I think it was a very bad idea to have QProgressBar::setValue invoke processEvents(). It only encourages the broken way people code things (continuously running code instead of short run-to-completion code). Since the processEvents() call can recurse into the caller, setValue becomes persona non grata, and possibly quite dangerous.
If one wants to code in continuous style yet keep the run-to-completion semantics, there are ways of dealing with that in C++. One is just by leveraging the preprocessor, for example code see my other answer.
Another way is to use expression templates to get the C++ compiler to generate the code you want. You may want to leverage a template library here -- Boost spirit has a decent starting point of an implementation that can be reused even though you're not writing a parser.
The Windows Workflow Foundation also tackles the problem of how to write sequential style code yet have it run as short run-to-completion fragments. They resort to specifying the flow of control in XML. There's apparently no direct way of reusing standard C# syntax. They only provide it as a data structure, a-la JSON. It'd be simple enough to implement both XML and code-based WF in Qt, if one wanted to. All that in spite of .NET and C# providing ample support for programmatic generation of code...
The way you implemented your thread, it does not have its own event loop (because it does not call exec()). I'm not sure if your code within run() is actually executed within your thread or within the GUI thread.
Usually you should not subclass QThread. You probably did so because you read the Qt Documentation which unfortunately still recommends subclassing QThread - even though the developers long ago wrote a blog entry stating that you should not subclass QThread. Unfortunately, they still haven't updated the documentation appropriately.
I recommend reading "You're doing it wrong" on Qt Blog and then use the answer by "Kari" as an example of how to set up a basic multi-threaded system.
But when I run this program it freezes my GUI and when I click inside my window,
it responds that your program is not responding.
Yes, because IMO you're doing so much work in the thread that it exhausts the CPU. Generally, the "program is not responding" message pops up when a process shows no progress in handling its application event queue. That is what happens in your case.
So you should find a way to divide the work. Just for the sake of example, say the thread runs in chunks of 100 and repeats until it reaches 10000000.
You should also have a look at QCoreApplication::processEvents() when you're performing a lengthy operation.

Multithreading (pthreads)

I'm working on a project where I need to make a program run on multiple threads. However, I'm running into a bit of an issue.
In my program, I have an accessory function called 'func_call'.
If I use this in my code:
func_call((void*) &my_pixels);
The program runs fine.
However, if I try to create a thread, and then run the function on that, the program runs into a segmentation fault.
pthread_t thread;
pthread_create (&thread, NULL, (void*)&func_call, (void*) &my_pixels);
I've included pthread.h in my program. Any ideas what might be wrong?
You are not handling data in a thread safe manner:
the thread copies data from the thread argument, which is a pointer to the main thread's my_pixels variable; the main thread may exit, making my_pixels invalid.
the thread uses scene, main thread calls free_scene() on it, which I imagine makes it invalid
the thread calls printf(), the main thread closes stdout (kind of unusual itself)
the thread updates the picture array, the main thread accesses picture to output data from it
It looks like you should just wait for the thread to finish its work after creating it - call pthread_join() to do that.
For a single thread, that would seem to be pointless (you've just turned a multi-threaded program into a single threaded program). But on the basis of code that's commented out, it looks like you're planning to start up several threads that work on chunks of the data. So, when you get to the point of trying that again, make sure you join all the threads you start. As long as the threads don't modify the same data, it'll work. Note that you'll need to use separate my_pixels instances for each thread (make an array of them, just like you did with pthreads), or some threads will likely get parameters that are intended for a different thread.
Without knowing what func_call does, it is difficult to give you an answer. Nevertheless, here are few possibilities
Does func_call use some sort of global state? Check whether that is initialized properly from within the thread. The order of execution of threads is not always the same from one run to the next.
Without knowing your operating system (AIX/Linux/Solaris etc.) it is difficult to answer this, but please check your compilation options.
Please provide the signal trapped and at least a few lines of the stack trace for all the threads. One thing you can check for yourself is to print the threads' stack traces (using threads/thread or pthread and thread current <x>, depending on the debugger) and see if there is common data being accessed. It is most likely that the segfault occurred when two threads were trying to read each other's (uncommitted) changes.
Hope that helps.
Edit:
After checking your code, I think the problem is the global picture array. You seem to be modifying that in the thread function without any guards. You loop using px and py and all the threads will have the same px and py and will try to write into the picture array at the same time. Please try to modify your code to prevent multiple threads from stepping on each other's data modifications.
Is func_call a function, or a function pointer? If it's a function pointer, there is your problem: you took the address of a function pointer and then cast it.
People are guessing because you've provided only a fraction of the program, which mentions names like func_call with no declaration in scope.
Your compiler must be giving you diagnostics about this program, because you're passing a (void *) expression to a function pointer parameter.
Define your thread function in a way that is compatible with pthread_create, and then just call it without any casts.
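The advice in the answers above can be combined into a short sketch: a thread function with the signature pthread_create expects, one argument object per thread, and a join for every thread that was started. PixelRange and runThreads are hypothetical names chosen for illustration (the question's my_pixels type and the real body of func_call are not shown), and the summing loop is only a placeholder for the per-thread work.

```cpp
#include <pthread.h>

// Hypothetical per-thread argument; the question's my_pixels type is not shown.
struct PixelRange {
    int first;  // first index this thread handles
    int last;   // one past the last index
    long sum;   // result written by the thread itself
};

// Matches what pthread_create expects: takes void*, returns void*.
void* func_call(void* arg) {
    PixelRange* range = static_cast<PixelRange*>(arg);
    range->sum = 0;
    for (int i = range->first; i < range->last; ++i) {
        range->sum += i;  // placeholder work on this thread's own slice
    }
    return nullptr;
}

// Starts up to 8 threads, each with its own argument, and joins them all.
long runThreads(int count, int itemsPerThread) {
    const int kMaxThreads = 8;
    if (count > kMaxThreads) count = kMaxThreads;
    pthread_t threads[kMaxThreads];
    PixelRange ranges[kMaxThreads];
    int started = 0;
    for (int t = 0; t < count; ++t) {
        ranges[t] = {t * itemsPerThread, (t + 1) * itemsPerThread, 0};
        // No casts needed: the function already has the right type.
        if (pthread_create(&threads[t], nullptr, func_call, &ranges[t]) != 0) {
            break;
        }
        ++started;
    }
    long total = 0;
    for (int t = 0; t < started; ++t) {
        pthread_join(threads[t], nullptr);  // arguments stay alive until here
        total += ranges[t].sum;
    }
    return total;
}
```

Because each thread writes only to its own PixelRange and the caller reads the results only after pthread_join, no additional locking is needed in this sketch.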

QPointer in multi-threaded programs

According to http://doc.qt.io/qt-5/qpointer.html, QPointer is very useful. But I found it could be inefficient in the following context:
If I want to show a label three times, or do something else with it, I have to use
if(label) label->show1();
if(label) label->show2();
if(label) label->show3();
instead of
if(label) { label->show1();label->show2();label->show3(); }
just because label might be destroyed in another thread after label->show1(); or label->show2();.
Is there a beautiful way other than three ifs to get the same functionality?
Another question is, when label is destroyed after if(label), is if(label) label->show1(); still wrong?
I don't have experience in multi-threaded programs. Any help is appreciated. ;)
I think the only safe way to do it is to make sure you only access your QWidgets from within the main/GUI thread (that is, the thread that is running Qt's event loop, inside QApplication::exec()).
If you have code that is running within a different thread, and that code wants the QLabels to be shown/hidden/whatever, then that code needs to create a QEvent object (or a subclass thereof) and call qApp->postEvent() to send that object to the main thread. Then when the Qt event loop picks up and handles that QEvent in the main thread, that is the point at which your code can safely do things to the QLabels.
Alternatively (and perhaps more simply), your thread's code could emit a cross-thread signal (as described here) and let Qt handle the event-posting internally. That might be better for your purpose.
Neither of your approaches is thread-safe. It's possible that your first thread will execute the if statement, then the other thread will delete your label, and then you will be inside of your if statement and crash.
Qt provides a number of thread synchronization constructs, you'll probably want to start with QMutex and learn more about thread-safety before you continue working on this program.
Using a mutex, your function would look something like this:
mutex.lock();
label1->show();
label2->show();
label3->show();
mutex.unlock();
As long as your other thread locks that same mutex object before deleting, it will be prevented from deleting your labels while you're showing them.
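The key point is that the null check and every use must sit inside the same critical section as the deletion. A minimal sketch of that idea in plain C++ follows; Label and GuardedLabel are hypothetical stand-ins rather than Qt classes, and for real QWidgets the rule from the first answer (touch them only from the GUI thread) still applies.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for a QLabel; counts how often it was shown.
struct Label {
    int shown = 0;
    void show() { ++shown; }
};

// Owns the label together with the mutex that guards it.
class GuardedLabel {
public:
    GuardedLabel() : label_(new Label) {}

    // The check and all three calls happen under one lock, so the label
    // cannot be destroyed between them. Returns false if it is already gone.
    bool showThreeTimes() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!label_) {
            return false;
        }
        label_->show();
        label_->show();
        label_->show();
        return true;
    }

    // Destruction takes the same lock, so it waits for any ongoing use.
    void destroy() {
        std::lock_guard<std::mutex> lock(mutex_);
        label_.reset();
    }

private:
    std::mutex mutex_;
    std::unique_ptr<Label> label_;
};
```

Using std::lock_guard instead of manual lock()/unlock() calls also guarantees the mutex is released on every return path.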

OpenGL Rendering in a secondary thread

I'm writing a 3D model viewer application as a hobby project, and also as a test platform to try out different rendering techniques. I'm using SDL to handle window management and events, and OpenGL for the 3D rendering. The first iteration of my program was single-threaded, and ran well enough. However, I noticed that the single-threaded program caused the system to become very sluggish/laggy. My solution was to move all of the rendering code into a different thread, thereby freeing the main thread to handle events and prevent the app from becoming unresponsive.
This solution worked only intermittently: the program frequently crashed due to a changing (and to my mind bizarre) set of errors coming mainly from the X window system. This led me to question my initial assumption that as long as all of my OpenGL calls took place in the thread where the context was created, everything should still work out. After spending the better part of a day searching the internet for an answer, I am thoroughly stumped.
More succinctly: Is it possible to perform 3D rendering using OpenGL in a thread other than the main thread? Can I still use a cross-platform windowing library such as SDL or GLFW with this configuration? Is there a better way to do what I'm trying to do?
So far I've been developing on Linux (Ubuntu 11.04) using C++, although I am also comfortable with Java and Python if there is a solution that works better in those languages.
UPDATE: As requested, some clarifications:
When I say "The system becomes sluggish" I mean interacting with the desktop (dragging windows, interacting with the panel, etc) becomes much slower than normal. Moving my application's window takes time on the order of seconds, and other interactions are just slow enough to be annoying.
As for interference with a compositing window manager... I am using the GNOME shell that ships with Ubuntu 11.04 (staying away from Unity for now...) and I couldn't find any options to disable desktop effects such as there was in previous distributions. I assume this means I'm not using a compositing window manager...although I could be very wrong.
I believe the "X errors" are server errors due to the error messages I'm getting at the terminal. More details below.
The errors I get with the multi-threaded version of my app:
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0"
after 73 requests (73 known processed) with 0 events remaining.
X Error of failed request: BadColor (invalid Colormap parameter)
Major opcode of failed request: 79 (X_FreeColormap)
Resource id in failed request: 0x4600001
Serial number of failed request: 72
Current serial number in output stream: 73
Game: ../../src/xcb_io.c:140: dequeue_pending_request: Assertion `req == dpy->xcb->pending_requests' failed.
Aborted
I always get one of the three errors above. Which one I get varies, apparently at random, which (to my eyes) would appear to confirm that my issue does in fact stem from my use of threads. Keep in mind that I'm learning as I go along, so there is a very good chance that in my ignorance I've done something rather stupid along the way.
SOLUTION: For anyone who is having a similar issue, I solved my problem by moving my call to SDL_Init(SDL_INIT_VIDEO) to the rendering thread, and locking the context initialization using a mutex. This ensures that the context is created in the thread that will be using it, and it prevents the main loop from starting before initialization tasks have finished. A simplified outline of the startup procedure:
1) Main thread initializes struct which will be shared between the two threads, and which contains a mutex.
2) Main thread spawns render thread and sleeps for a brief period (1-5ms), giving the render thread time to lock the mutex. After this pause, the main thread blocks while trying to lock the mutex.
3) Render thread locks mutex, initializes SDL's video subsystem and creates OpenGL context.
4) Render thread unlocks mutex and enters its "render loop".
5) The main thread is no longer blocked, so it locks and unlocks the mutex before finishing its initialization step.
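The startup handshake above can also be written without the 1-5 ms sleep. The sketch below swaps in a condition variable, a different technique from the sleep-then-lock sequence described in the steps: the main thread waits until the render thread reports that initialization is done. SharedState, renderThreadMain and startRenderer are hypothetical names, and the SDL and OpenGL calls are left as comments so the example stays self-contained.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// State shared between the main thread and the render thread.
struct SharedState {
    std::mutex mtx;
    std::condition_variable cv;
    bool initialized = false;    // set once the render thread is set up
    std::thread::id initThread;  // records which thread ran the setup
};

// Hypothetical render-thread entry point; the real setup is elided.
void renderThreadMain(SharedState* state) {
    {
        std::lock_guard<std::mutex> lock(state->mtx);
        // SDL_Init(SDL_INIT_VIDEO) and context creation would go here,
        // so the context is created on the thread that will use it.
        state->initThread = std::this_thread::get_id();
        state->initialized = true;
    }
    state->cv.notify_one();
    // The render loop would start here.
}

// Returns true if initialization ran on a thread other than the caller's.
bool startRenderer() {
    SharedState state;
    std::thread render(renderThreadMain, &state);
    bool onOtherThread = false;
    {
        std::unique_lock<std::mutex> lock(state.mtx);
        // Blocks until the render thread has finished its setup.
        state.cv.wait(lock, [&state] { return state.initialized; });
        onOtherThread = (state.initThread != std::this_thread::get_id());
    }
    render.join();
    return onOtherThread;
}
```

The predicate passed to wait makes the handshake correct whichever thread gets to the mutex first, which is the case the brief sleep in step 2 was guarding against.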
Be sure and read the answers and comments, there is a lot of useful information there.
As long as the OpenGL context is touched from only one thread at a time, you should not run into any problems. You said even your single threaded program made your system sluggish. Does that mean the whole system or only your own application? The worst that should happen in a single threaded OpenGL program is, that processing user inputs for that one program gets laggy but the rest of the system is not affected.
If you use some compositing window manager (Compiz, KDE4 kwin), please try out what happens if you disable all compositing effects.
When you say X errors, do you mean client-side errors, or errors reported in the X server log? The latter case should not happen, because the X server must be able to cope with any kind of malformed X command stream and at most emit a warning. If it (the X server) crashes, this is a bug and should be reported to X.org.
If your program crashes, then there's something wrong in its interaction with X; in that case please provide us with the error output in its variations.
What I did in a similar situation was to keep my OpenGL calls in the main thread but move the vertex arrays preparation to a separate thread (or threads).
Basically, if you manage to separate the cpu intensive stuff from the OpenGL calls you don't have to worry about the unfortunately dubious OpenGL multithreading.
It worked out beautifully for me.
Just in case: Xlib (the X client library) has its own locking support.
Try the following while drawing:
man XInitThreads - for initialization
man XLockDisplay/XUnlockDisplay - for drawing (not sure about event processing)
I was getting one of your errors:
../../src/xcb_io.c:140: dequeue_pending_request: Assertion `req ==
dpy->xcb->pending_requests' failed. Aborted
and a whole host of different ones as well. Turns out that SDL_PollEvent needs a pointer to valid, allocated memory. So this fails:
SDL_Event *event;
SDL_PollEvent(event);
while this works:
SDL_Event event;
SDL_PollEvent(&event);
In case anyone else runs across this from google.
This is half an answer and half a question.
Rendering with SDL in a separate thread is possible, and it usually works on any OS. What you need to do is make the GL context current when the render thread takes over. Before you do so, you need to release it from the main thread, e.g.:
Called from the main thread:
void Renderer::Init()
{
#ifdef _WIN32
m_CurrentContext = wglGetCurrentContext();
m_CurrentDC = wglGetCurrentDC();
// release current context
wglMakeCurrent( nullptr, nullptr );
#endif
#ifdef __linux__
if (!XInitThreads())
{
THROW( "XLib is not thread safe." );
}
SDL_SysWMinfo wm_info;
SDL_VERSION( &wm_info.version );
if ( SDL_GetWMInfo( &wm_info ) ) {
Display *display = wm_info.info.x11.gfxdisplay;
m_CurrentContext = glXGetCurrentContext();
ASSERT( m_CurrentContext, "Error! No current GL context!" );
glXMakeCurrent( display, None, nullptr );
XSync( display, false );
}
#endif
}
Called from the render thread:
void Renderer::InitGL()
{
// This is important! Our renderer runs its own render thread
// All
#ifdef _WIN32
wglMakeCurrent(m_CurrentDC,m_CurrentContext);
#endif
#ifdef __linux__
SDL_SysWMinfo wm_info;
SDL_VERSION( &wm_info.version );
if ( SDL_GetWMInfo( &wm_info ) ) {
Display *display = wm_info.info.x11.gfxdisplay;
Window window = wm_info.info.x11.window;
glXMakeCurrent( display, window, m_CurrentContext );
XSync( display, false );
}
#endif
// Init GLEW - we need this to use OGL extensions (e.g. for VBOs)
GLenum err = glewInit();
ASSERT( GLEW_OK == err, "Error: %s\n", glewGetErrorString(err) );
}
The risk here is that SDL unfortunately does not have a native MakeCurrent() function, so we have to poke around a little in SDL internals (this is SDL 1.2; 1.3 might have solved this by now).
One problem remains: for some reason, I run into trouble when SDL is shutting down. Maybe someone can tell me how to safely release the context when the thread terminates.
C++, SDL, OpenGl:::
on main thread: SDL_CreateWindow( );
SDL_CreateSemaphore( );
SDL_SemWait( );
on renderThread: SDL_CreateThread( run, "rendererThread", (void*)this )
SDL_GL_CreateContext( )
"initialize the rest of openGl and glew"
SDL_SemPost( ) //unlock the previously created semaphore
P.S.: SDL_CreateThread( ) only takes a function as its first parameter, not a method. If a method is wanted, you can simulate one by making the function a friend of your class; this way it has access to the class's members while still being usable as the callback for SDL_CreateThread( ).
P.P.S.: The "(void*)this" argument is important: inside the "run( void* data )" function created for the thread, this line is needed to re-obtain "this": "ClassName* me = (ClassName*)data;"
