OpenGL: glClientWaitSync on separate thread

I am using glMapBufferRange with the GL_MAP_UNSYNCHRONIZED_BIT to map a buffer object. I then pass the returned pointer to a worker thread that computes the new vertices asynchronously. The object is double-buffered, so I can render one copy while the other is being written. Using GL_MAP_UNSYNCHRONIZED_BIT gives me significantly better performance (mainly because glUnmapBuffer returns sooner), but I am getting some visual artifacts despite the double buffering. So I assume that either the GPU starts rendering while the DMA upload is still in progress, or the worker thread starts writing to the vertices too early.
If I understand glFenceSync, glWaitSync and glClientWaitSync correctly, then I am supposed to address these issues in the following way:
A: Avoid having the GPU render the buffer object before the DMA transfer has completed:
directly after glUnmapBuffer, call on the main thread
GLsync uploadSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();
glWaitSync(uploadSync, 0, GL_TIMEOUT_IGNORED);
B: Avoid writing to the buffer from the worker thread before the GPU has finished rendering it:
directly after glDrawElements, call on the main thread
GLsync renderSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
and on the worker thread, right before starting to write data to the pointer previously returned by glMapBufferRange,
glClientWaitSync(renderSync, 0, 100000000); // timeout is in nanoseconds (100 ms)
...start writing to the mapped pointer
1: Is my approach to the explicit syncing correct?
2: How can I handle the second case? I want to wait in the worker thread (I don't want my main thread to stall), but I cannot issue GL commands from the worker thread. Is there a way to check whether the GLsync has been signalled other than a GL call?

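What you could do is create a second OpenGL context for the worker thread that shares objects with the main thread's context (sync objects are shared between contexts in the same share group), and make it current on the worker thread. Next: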
Run on the main thread:
GLsync renderSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();
then
Run on the worker thread:
glClientWaitSync(renderSync,0,100000000);
The glFlush on the main thread is important. Without it, the fence command may never be submitted to the GPU, so the worker thread could wait indefinitely (or until its timeout expires). See also the OpenGL specification:
4.1.2 Signaling
Footnote 3: The simple flushing behavior defined by SYNC_FLUSH_COMMANDS_BIT will not help when waiting for a fence command issued in another context’s command stream to complete. Applications which block on a fence sync object must take additional steps to assure that the context from which the corresponding fence command was issued has flushed that command to the graphics pipeline.

Related

Can a `VkCommandPool` be allocated on the main thread and then moved to other threads?

Is it possible to allocate a VkCommandPool on the main thread and then move it into a new thread, where it is used exclusively?
Pseudo code:
// Pool for creating secondary buffers
threaded_command_pool = new CommandPool();
// Thread for filling secondary buffers
// threaded_command_pool is used only here
thread_handle = new Thread(move(threaded_command_pool))
thread_handle.join()
// Pool for merging secondary buffers
command_pool = new CommandPool()
primary_command_buffer = command_pool.create_buffer()
// fill primary_command_buffer with secondary buffers from thread
In all the examples and presentations I have found, the command pool is created in the thread, not in the main thread, but I couldn't find this requirement in the spec.

Nothing in vulkan is bound to a specific thread.
You are free to call any vulkan function from any thread as long as you obey the externally synchronized requirements.
If two commands operate on the same object and at least one of the commands declares the object to be externally synchronized, then the caller must guarantee not only that the commands do not execute simultaneously, but also that the two commands are separated by an appropriate memory barrier (if needed).
In other APIs when an object is bound to a thread it is very clearly documented.
In this case, only one thread at a time may access a command pool; however, successive commands on the same command pool can come from different threads.
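To make the ownership rule concrete, here is a minimal C++ sketch. A hypothetical `FakeCommandPool` stands in for a VkCommandPool; this is not the Vulkan API. The sequence is:

- The pool is created on the calling thread.
- One worker uses it exclusively.
- The calling thread then uses it again.

Creating the std::thread and calling join() order the two threads' accesses, which plays the role of the "appropriate memory barrier" from the spec quote. No extra locking is therefore needed.

```cpp
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an externally synchronized object such as a
// VkCommandPool: it has no internal locking of its own.
struct FakeCommandPool {
    std::vector<std::string> recorded;
    void record(std::string cmd) { recorded.push_back(std::move(cmd)); }
};

FakeCommandPool FillOnWorker(int count) {
    FakeCommandPool pool;  // created on the calling ("main") thread

    // While the worker runs, it is the only thread touching `pool`.
    std::thread worker([&pool, count] {
        for (int i = 0; i < count; ++i)
            pool.record("secondary " + std::to_string(i));
    });

    // join() orders everything the worker did before what follows, so the
    // calling thread can safely use the pool again afterwards.
    worker.join();
    pool.record("primary");
    return pool;
}
```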

How to close a thread in WinAPI

What is the right way to close a thread in WinAPI? The threads don't use any common resources.
I am creating threads with CreateThread, but I don't know how to close them correctly: some people suggest TerminateThread, others ExitThread. What is the correct way to close a thread?
Also, where should I call the closing function: in WM_CLOSE or WM_DESTROY?
Thanks in advance.

The "nicest" way to close a thread in Windows is to "tell" the thread to shut down via some thread-safe signaling mechanism, then simply let it reach its demise on its own. If completion detection is needed (which is frequently the case), you can wait for it to do so via one of the WaitForXXXX functions. Something like:
Main thread:
// some global event all threads can reach (declared at file scope: HANDLE ghStopEvent;)
ghStopEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
// create the child thread
HANDLE hThread = CreateThread(NULL, 0, ThreadProc, NULL, 0, NULL);
//
// ... continue other work.
//
// tell thread to stop
SetEvent(ghStopEvent);
// now wait for thread to signal termination
WaitForSingleObject(hThread, INFINITE);
// important. close handles when no longer needed
CloseHandle(hThread);
CloseHandle(ghStopEvent);
Child thread:
DWORD WINAPI ThreadProc(LPVOID pv)
{
// do threaded work
while (WaitForSingleObject(ghStopEvent, 1) == WAIT_TIMEOUT)
{
// do thread busy work
}
return 0;
}
Obviously things can get a lot more complicated once you start putting this into practice. If by "common" resources you mean something like the ghStopEvent in the prior example, it becomes considerably more difficult. Terminating a child thread via TerminateThread is strongly discouraged because no logical cleanup is performed at all. The warnings in the TerminateThread documentation are self-explanatory and should be heeded. With great power comes...
Finally, you are not required to have the thread call ExitThread explicitly. Though you can do so, I strongly advise against it in C++ programs. It is called for you once the thread procedure returns from ThreadProc. I prefer the model above simply because it is dead-easy to implement and supports full RAII cleanup of C++ objects, which neither ExitThread nor TerminateThread provides. As the ExitThread documentation puts it:
...in C++ code, the thread is exited before any destructors can be called or any other automatic cleanup can be performed. Therefore, in C++ code, you should return from your thread function.
Anyway, start simple. Get a handle on things with super-simple examples, then work your way up from there. There are a ton of multithreaded examples on the web; learn from the good ones and challenge yourself to identify the bad ones.
Best of luck.
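As a portable sketch of the same cooperative-shutdown pattern, the code below uses standard C++ (std::thread and std::atomic) in place of the Win32 calls above. This is a swapped-in analog, not WinAPI code. `g_stop` plays the role of ghStopEvent; `g_iterations` and `RunAndStop` are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> g_stop{false};    // plays the role of ghStopEvent
std::atomic<long> g_iterations{0};  // hypothetical work counter

void ThreadProc() {
    // do-while so at least one unit of work runs before the first check
    do {
        ++g_iterations;  // "do thread busy work"
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } while (!g_stop.load());
    // Returning normally lets destructors of locals run (no ExitThread needed).
}

long RunAndStop(std::chrono::milliseconds runFor) {
    g_stop = false;
    g_iterations = 0;
    std::thread worker(ThreadProc);       // like CreateThread
    std::this_thread::sleep_for(runFor);  // "... continue other work"
    g_stop = true;                        // like SetEvent(ghStopEvent)
    worker.join();                        // like WaitForSingleObject(hThread, INFINITE)
    // No CloseHandle analog: std::thread releases its resources on join.
    return g_iterations.load();
}
```

Because the worker simply returns, the destructors of its local objects run normally. That is the RAII benefit described above.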

So you need to figure out what sort of behaviour you need to have.
The following descriptions of both functions are taken from the documentation:
"TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL."
So if you need your thread to terminate at any cost, call this method.
ExitThread is more graceful. By calling ExitThread, you tell Windows that the calling thread is done, so the rest of that thread's code will not run. It's a bit like calling exit(0), but only for the calling thread: ExitThread cannot end a different thread.
"ExitThread is the preferred method of exiting a thread. When this function is called (either explicitly or by returning from a thread procedure), the current thread's stack is deallocated, all pending I/O initiated by the thread is canceled, and the thread terminates. If the thread is the last thread in the process when this function is called, the thread's process is also terminated."

mutex destroyed while busy

There is a singleton EventHandler object that receives events from the main thread. It registers the input in a vector and creates a thread that runs a lambda. The lambda waits for some time before deleting the input from the vector, which prevents the event from being executed repeatedly for this input during that time.
But I'm getting a "mutex destroyed while busy" error. I'm not sure where or how it happened, and I'm not even sure what it means, because as a singleton the object should never be destroyed. Some help would be appreciated.
class EventHandler{
public:
std::mutex simpleLock;
std::vector<UInt32> stuff;
void RegisterBlock(UInt32 input){
stuff.push_back(input);
std::thread removalCallBack([&](UInt32 input){
std::this_thread::sleep_for(std::chrono::milliseconds(200));
simpleLock.lock();
auto it = Find(stuff, input);
if (it != stuff.end())
stuff.erase(it);
simpleLock.unlock();
}, input);
removalCallBack.detach();
}
virtual EventResult ReceiveEvent(UInt32 input){
simpleLock.lock();
if (Find(stuff, input) != stuff.end()){
RegisterBlock(input);
//dostuff
}
simpleLock.unlock();
}
};

What is happening is that a thread is created
std::thread removalCallBack([&](UInt32 input){
std::this_thread::sleep_for(std::chrono::milliseconds(200));
simpleLock.lock();
...
removalCallBack.detach();
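Because removalCallBack is a local variable in RegisterBlock, its destructor runs when the function returns. If the thread were still joinable at that point, the destructor would invoke std::terminate(). The detach() call prevents that, but it also means nothing ever waits for the thread to finish. From the documentation for the thread destructor: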
~thread(); (since C++11)
Destroys the thread object. If *this still has an associated running thread (i.e. joinable() == true), std::terminate() is called.
Depending on timing, the detached thread can still own simpleLock (the mutex is busy) when the mutex is destroyed, or it can exit while still holding it. Both cases are undefined behavior according to the standard, which in your case shows up as the "destroyed while busy" error.
To avoid this error, keep the std::thread object alive after the function returns (e.g. by storing it in a member instead of a local variable) so it can be joined later. Alternatively, block until the thread finishes by calling std::thread::join before the function returns.
Cleaning up after threads can be tricky, especially if they are essentially used as different programs occupying the same address space. In those cases a manager thread is often created whose only job is to reclaim thread-related resources. Your situation is a little easier because the thread created for removalCallBack does so little work, but there is still cleanup to do.
If the thread object is created with new, the system resources used by the underlying system thread will get cleaned up. However, the memory the C++ thread object itself uses remains allocated until delete is called.
Also, consider what happens if the program exits while threads are still running. Static objects such as your singleton (and its mutex) are destroyed, and the threads are then terminated; if the mutex is locked when that happens, the behavior is once again undefined.
The usual way to guarantee that a thread is no longer running is to join it, but that is not possible here. As the pthread_detach(3) man page states:
Once a thread has been detached, it can't be joined with pthread_join(3) or be made joinable again.
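Below is a minimal corrected sketch. It assumes the intent is to ignore repeats of an input for a fixed delay after the input is first seen. It also replaces the question's UInt32 and Find with std::uint32_t and std::find.

The key change is that the std::thread objects are kept instead of detached, and the destructor joins them. As a result, the mutex cannot be destroyed while a removal thread still holds it. The `delay` parameter, the bool return value and `IsBlocked` are hypothetical additions for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

class EventHandler {
public:
    explicit EventHandler(std::chrono::milliseconds delay = std::chrono::milliseconds(200))
        : delay_(delay) {}

    ~EventHandler() {
        // Join every removal thread before the members (including the mutex) die.
        for (auto& t : removers_)
            if (t.joinable()) t.join();
    }

    // Returns true if the event should be handled (input not currently blocked).
    bool ReceiveEvent(std::uint32_t input) {
        std::lock_guard<std::mutex> guard(lock_);
        if (std::find(stuff_.begin(), stuff_.end(), input) != stuff_.end())
            return false;  // seen recently: ignore the repeat
        stuff_.push_back(input);
        // Capture `this` and `input` by value; the thread is owned, not detached.
        removers_.emplace_back([this, input] {
            std::this_thread::sleep_for(delay_);
            std::lock_guard<std::mutex> g(lock_);
            auto it = std::find(stuff_.begin(), stuff_.end(), input);
            if (it != stuff_.end()) stuff_.erase(it);
        });
        return true;  // "//dostuff" would go here
    }

    bool IsBlocked(std::uint32_t input) {
        std::lock_guard<std::mutex> guard(lock_);
        return std::find(stuff_.begin(), stuff_.end(), input) != stuff_.end();
    }

private:
    std::chrono::milliseconds delay_;
    std::mutex lock_;
    std::vector<std::uint32_t> stuff_;
    std::vector<std::thread> removers_;
};
```

This still creates one thread per event, and removers_ grows without bound. A single timer thread or a per-input timestamp would scale better, but the sketch stays close to the original structure.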

Should Storm Spouts only emit output using the thread calling Spout.nextTuple?

The ISpout.nextTuple() javadoc specifies that nextTuple(), ack(...) and fail(...) are called on the same thread.
However, the actual collector upon which emit(...) is called is supplied earlier, as a parameter on open(..., collector).
The question is whether a background thread that sees new data must always enqueue it for nextTuple() to dequeue and emit. What would happen if the background thread emitted the data immediately? Is that supported? If that is allowed, what's the recommended way to implement the "sleep for a short amount of time" in nextTuple()?

The statement that nextTuple(), ack() and fail() are called on the same thread implies the following. The task (a background Java thread) running on machine 'A' that emits a tuple is the same task on which ack() or fail() is later called. Which of the two is called depends on whether the bolts in the topology (running on 'B' or 'C') processed the tuple successfully.
As long as the messageId is not null and the bolt tasks call ack(tuple) in their execute() method, the Storm framework keeps track of the tuple's traversal through the topology. It then calls ack()/fail() on the tuple's owning task.
Before answering your question, here is a brief introduction to how the background task thread works.
- It has an in-memory buffer for emitted tuples, plus a few other in-memory structures for status, pending tuples, etc.
- The buffer fills up as the spout/bolt starts emitting data, and space is freed as tuples are processed, i.e. after ack()/fail() is called.
- The background thread calls nextTuple() while the buffer has space and stops calling it once the buffer is full.

In simple words, emit() (whether called in open(), nextTuple() or close()) fills the background thread's buffer, and ack()/fail() frees it.
Given the above, the background thread is unaware of new or incoming data. It's up to the logic within nextTuple() to read the data from the source (Twitter, JMS providers, ESB, AMQP-compliant servers, RDBMS) and emit it. So Storm calls nextTuple() depending on the state of the background thread's buffer, as explained above.
As for your other question, it should be fine to sleep for a short duration if required. Note that nextTuple() need not emit anything; it can return without emitting.

It is my understanding that you shouldn't emit data unless requested by Storm by calling your nextTuple() method. Consequently, your background thread must enqueue new data, so that it is emitted when requested. Your nextTuple() method should sleep briefly only if there are no tuples to emit when the method is called.
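The enqueue-then-emit design can be sketched as follows. This is a language-neutral illustration written in C++, not Storm's Java API:

- `QueueingSpout`, `Enqueue` and `NextTuple` are hypothetical names.
- `emit_` stands in for the collector's emit(...).
- The background thread only enqueues.
- NextTuple() dequeues at most one item per call and sleeps briefly only when the queue is empty.

```cpp
#include <chrono>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class QueueingSpout {
public:
    explicit QueueingSpout(std::function<void(int)> emit) : emit_(std::move(emit)) {}

    // Called by the background thread that discovers new data.
    void Enqueue(int value) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(value);
    }

    // Called repeatedly by the framework thread; emits at most one item.
    void NextTuple() {
        bool hasValue = false;
        int value = 0;
        {
            std::lock_guard<std::mutex> lock(m_);
            if (!pending_.empty()) {
                value = pending_.front();
                pending_.pop_front();
                hasValue = true;
            }
        }
        if (hasValue) {
            emit_(value);  // emitting happens only on the NextTuple() thread
        } else {
            // Nothing to emit: return after a short sleep instead of spinning.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

private:
    std::function<void(int)> emit_;
    std::mutex m_;
    std::deque<int> pending_;
};
```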

multithreading: how to process data in a vector, while the vector is being populated?

I have a single-threaded Linux app which I would like to make parallel. It reads a data file, creates objects, and places them in a vector. Then it calls a compute-intensive method (0.5 s or more) on each object. I want to call the method in parallel with object creation. While I've looked at Qt and TBB, I am open to other options.
I planned to start the thread(s) while the vector was empty. Each one would call makeSolids (below), which has a while loop that would run until interpDone==true and all objects in the vector have been processed. However, I'm a n00b when it comes to threading, and I've been looking for a ready-made solution.
QtConcurrent::map(Iter begin,Iter end,function()) looks very easy, but I can't use it on a vector that's changing in size, can I? And how would I tell it to wait for more data?
I also looked at Intel's TBB, but it looked like my main thread would halt if I used parallel_for or parallel_while. That stinks, since their memory manager was recommended (Open CASCADE's mmgt has poor performance when multithreaded).
/**intended to be called by a thread
\param start the first item to get from the vector
\param incr how many items to step over (4 for 4 threads)
*/
void g2m::makeSolids(uint start, uint incr) {
uint curr = start;
while ((!interpDone) || (lineVector.size() > curr)) {
if (lineVector.size() > curr) {
if (lineVector[curr]->isMotion()) {
((canonMotion*)lineVector[curr])->setSolidMode(SWEPT);
((canonMotion*)lineVector[curr])->computeSolid();
}
lineVector[curr]->setDispMode(BEST);
lineVector[curr]->display();
curr += incr;
} else {
uio::sleep(); //wait a little bit for interp
}
}
}
EDIT: To summarize, what's the simplest way to process a vector at the same time that the main thread is populating the vector?

Firstly, to benefit from threading you need to find similarly slow tasks for each thread to do. You said your per-object processing takes 0.5 s or more; how long does your file reading and object creation take? It could easily be a tenth or a thousandth of that time, in which case your multithreading approach will produce negligible benefit.

If that's the case (yes, I'll answer your original question soon, in case it's not), then think about processing multiple objects simultaneously. Given that your processing takes quite a while, the thread-creation overhead isn't terribly significant. You could simply have your main file-reading/object-creation thread spawn a new thread for each newly created object. The main thread then continues reading/creating subsequent objects. Once all objects are read/created and all the processing threads launched, the main thread "joins" (waits for) the worker threads.

If this would create too many threads (thousands), put a limit on how far ahead the main thread is allowed to get. For example, it might read/create 10 objects then join 5, then read/create 10, join 10, read/create 10, join 10, etc. until finished.
Now, if you really want the read/create to be in parallel with the processing, but the processing to be serialised, then you can still use the above approach but join after each object. That's kind of weird if you're designing this with only this approach in mind, but good because you can easily experiment with the object processing parallelism above as well.
Alternatively, you can use a more complex approach that involves just two threads: the main thread (which the OS creates when your program starts) and a single worker thread that the main thread starts. They should be coordinated using:
- a mutex (a variable ensuring mutually exclusive, which means non-concurrent, access to data);
- a condition variable, which lets the worker thread block efficiently until the main thread has provided more work.

"Mutex" and "condition variable" are the standard terms in the POSIX threading that Linux uses, so they should also appear in the documentation of the particular libraries you're interested in.

In summary, the worker thread waits until the main read/create thread sends it a wake-up signal indicating that another object is ready for processing. You may want to keep a counter holding the index of the last fully created, ready-for-processing object. The worker thread can then maintain its own count of processed objects and work through the ready ones before checking the condition variable again.
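Here is a minimal sketch of the single-worker scheme, assuming C++14 standard threads rather than Qt or TBB. `Item`, `process` and `RunPipeline` are hypothetical stand-ins for the question's objects, computeSolid() and main loop.

Items are held through std::unique_ptr, so the worker's pointer stays valid even when the main thread's push_back reallocates the vector. The vector's size serves as the "last ready index" counter.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Item {
    int value = 0;
    int result = 0;
};

// Hypothetical slow method (stands in for computeSolid()).
void process(Item& item) { item.result = item.value * item.value; }

struct WorkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::vector<std::unique_ptr<Item>> items;  // guarded by m
    bool done = false;                         // guarded by m
};

void worker(WorkQueue& q) {
    std::size_t next = 0;  // index of the next item to process
    for (;;) {
        Item* item = nullptr;
        {
            std::unique_lock<std::mutex> lock(q.m);
            // Sleep until an unprocessed item exists or the producer is done.
            q.cv.wait(lock, [&] { return next < q.items.size() || q.done; });
            if (next == q.items.size()) return;  // done and fully drained
            item = q.items[next].get();          // heap object: stable address
        }
        process(*item);  // slow work runs without holding the lock
        ++next;
    }
}

int RunPipeline(int n) {
    WorkQueue q;
    std::thread t(worker, std::ref(q));
    for (int i = 1; i <= n; ++i) {
        auto item = std::make_unique<Item>();
        item->value = i;  // stands in for reading the file and creating the object
        {
            std::lock_guard<std::mutex> lock(q.m);
            q.items.push_back(std::move(item));
        }
        q.cv.notify_one();  // wake the worker
    }
    {
        std::lock_guard<std::mutex> lock(q.m);
        q.done = true;
    }
    q.cv.notify_one();
    t.join();  // after join, every result written by the worker is visible
    int sum = 0;
    for (const auto& p : q.items) sum += p->result;
    return sum;
}
```

With a single worker, notify_one is enough. With several workers you would use notify_all (the "broadcast" above) and need a shared next-index instead of a per-thread one.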

It's hard to tell if you have been thinking about this problem deeply and there is more than you are letting on, if you are just overthinking it, or if you are just wary of threading.
Reading the file and creating the objects is fast; the one method is slow. The only dependency is that each constructor call depends on the outcome of the previous one (a little odd). Otherwise there are no data-integrity issues, so there doesn't seem to be anything that needs to be protected by mutexes and such.
Why is this more complicated than something like this (in crude pseudo-code):
while (! eof)
{
readfile;
object O(data);
push_back(O);
pthread_create(...., O, makeSolid);
}
while(x < vector.size())
{
pthread_join();
x++;
}
If you don't want to loop on the joins in your main then spawn off a thread to wait on them by passing a vector of TIDs.
If the number of created objects/threads is insane, use a thread pool, or put a counter in the creation loop to limit the number of threads that can be created before running ones are joined.
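The pseudo-code above can be sketched with std::thread in place of pthread_create/pthread_join (a swapped-in API). `Solid`, `makeSolid` and `ReadAndProcess` are hypothetical, and the loop counter stands in for reading the file.

Objects are heap-allocated so that push_back can reallocate the vector without invalidating the pointers the running threads hold. `maxInFlight` implements the counter that limits how many threads run before they are joined.

```cpp
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

struct Solid {
    int input = 0;
    long output = 0;
};

// Hypothetical slow method; each thread touches only its own Solid.
void makeSolid(Solid* s) { s->output = 2L * s->input; }

// Requires maxInFlight >= 1.
std::vector<std::unique_ptr<Solid>> ReadAndProcess(int count, std::size_t maxInFlight) {
    std::vector<std::unique_ptr<Solid>> solids;
    std::vector<std::thread> running;
    for (int i = 0; i < count; ++i) {           // stands in for while (!eof)
        solids.push_back(std::make_unique<Solid>());
        solids.back()->input = i;               // "readfile; object O(data)"
        if (running.size() >= maxInFlight) {    // too far ahead: join the batch
            for (auto& t : running) t.join();
            running.clear();
        }
        running.emplace_back(makeSolid, solids.back().get());
    }
    for (auto& t : running) t.join();           // join the remaining threads
    return solids;
}
```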

