I have a job in a pthread which prepares a data set for plotting. I then need to display this data as a graph in the main window. How can I transfer the data set from the thread to the rendering widget, which is in the main window?
I use signals and slots. What happens when my thread emits the signal more frequently than the slot can receive it?
The problem is that I use a QMap* to transfer the data set from one thread to another, and I need to be confident that the slot has finished its job before I update this map in the job thread.
Firstly, I assume you mean you have a job in 'QThread', not pthread (as in posix thread). In that case, you're right to use signals and slots to pass the data to the main thread for rendering.
How frequent is 'more frequently than the slot could receive it'? Have you tried it and are you having problems, or just speculating about something that you think may go wrong? If you are actually having a problem with sending too many signals, then batch up the data on the processing thread and send the batch periodically on a timer.
As for ensuring the slot has finished its job, you can use QMutex to control the access to the QMap in each thread. The Qt help for QMutex clearly explains its usage; lock the mutex, do the work and then unlock.
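As a rough sketch of that lock, work, unlock pattern, the following uses std::mutex and std::map as stand-ins for QMutex and QMap so that it compiles without Qt. The SharedPlotData class and demo_point_count function are hypothetical names invented for illustration. The job thread replaces the data under the lock, and the reader copies it out under the same lock so that rendering works on a private snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical holder for the plot data. std::mutex and std::map stand in
// for QMutex and QMap here.
class SharedPlotData {
public:
    // Called from the job thread: replace the data set while holding the lock.
    void update(const std::map<int, double>& fresh) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_ = fresh;
    }

    // Called from the GUI thread: copy the data set out while holding the
    // lock, so drawing never touches the shared map directly.
    std::map<int, double> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;
    }

private:
    mutable std::mutex mutex_;
    std::map<int, double> data_;
};

// Runs a job thread that publishes 100 points, then reads them back.
std::size_t demo_point_count() {
    SharedPlotData shared;
    std::thread job([&shared] {
        std::map<int, double> points;
        for (int x = 0; x < 100; ++x) points[x] = x * 0.5;
        shared.update(points);
    });
    job.join();
    std::map<int, double> copy = shared.snapshot();
    assert(!copy.empty());
    return copy.size();
}
```

Copying under the lock keeps the critical section short. With Qt types the same shape should apply, with QMutexLocker playing the role of std::lock_guard.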
Can Vulkan Compute dispatch from a child CPU thread, or does it have to dispatch from the main thread? I don't think this is possible to dispatch compute shaders in Unity from child threads and I wanted to find out if it could be done in Unreal Engine.
It depends on what you mean by "dispatch" and "main thread".
vkCmdDispatch, as the "Cmd" prefix suggests, puts a command in a command buffer. This can be called on any thread, so long as the VkCommandBuffer object will not have other vkCmd functions called on it at the same time (typically, you reserve specific command buffers for a single thread). So by one definition, you can "dispatch" compute operations from other threads.
Of course, recording commands in a command buffer doesn't actually do anything. Commands only get executed when you queue up those CBs via vkQueueSubmit. Like vkCmdDispatch, it doesn't matter what thread you call that function on. However, like vkCmdDispatch, it does matter that multiple threads be prevented from accessing the same VkQueue object at the same time.
Now, you don't have to use a single thread for that VkQueue; you can lock the VkQueue behind some kind of mutex, so that only one thread can own it at a time. And thus, a thread that creates a CB could submit its own work.
However, ignoring the fact that tasks often need to be inserted into the queue in an order (one task might generate some compute data that a graphics task needs to wait on, so the graphics task CB must be after the compute CB), there's a bigger problem. vkQueueSubmit takes a long time. If you look at the function, it can take an arbitrarily large number of CBs to insert, and it has the ability to have multiple batches, with each batch guarded by semaphores and fences for synchronization. As such, you are strongly encouraged to make as few vkQueueSubmit calls as possible, since each call has a quantity of overhead to it that has nothing to do with how many CBs you are queuing up.
There's even a warning about this in the spec itself.
So the typical way applications are structured is that you farm out tasks to the available CPU threads, and these tasks build command buffers. One particular thread will be anointed as the owner of the queue. That thread may perform some CB building, but once it is done, it will wait for the other tasks to complete and gather up all of the CBs from the other threads. Once gathered, that thread will vkQueueSubmit them in appropriate batches.
You could call that thread the "main thread", but Vulkan itself doesn't really care which thread "owns" the queue. It certainly doesn't have to be your process's initial thread.
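The structure described above can be sketched without any Vulkan calls. In the following, CommandBuffer, record_commands and FakeQueue are hypothetical stand-ins (for VkCommandBuffer, the vkCmd recording calls, and vkQueueSubmit on a VkQueue respectively); only the threading shape is meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded VkCommandBuffer.
using CommandBuffer = std::vector<std::string>;

// Stand-in for recording: each worker touches only its own command buffer,
// so no locking is needed while recording.
void record_commands(int worker, CommandBuffer& cb) {
    cb.push_back("dispatch from worker " + std::to_string(worker));
}

// Stand-in for a queue; counts submit calls so the batching is visible.
struct FakeQueue {
    int submit_calls = 0;
    std::size_t commands_submitted = 0;

    void submit(const std::vector<CommandBuffer>& batch) {
        ++submit_calls;
        for (const CommandBuffer& cb : batch) commands_submitted += cb.size();
    }
};

// The calling thread acts as the queue owner: it farms out recording,
// waits for the workers, then submits everything in a single call.
FakeQueue build_and_submit(int worker_count) {
    assert(worker_count > 0);
    std::vector<CommandBuffer> buffers(worker_count);
    std::vector<std::thread> workers;
    for (int i = 0; i < worker_count; ++i)
        workers.emplace_back(record_commands, i, std::ref(buffers[i]));
    for (std::thread& t : workers) t.join();  // all recording has finished

    FakeQueue queue;
    queue.submit(buffers);  // one submission for the whole batch
    return queue;
}
```

Calling build_and_submit(4) records on four threads but touches the queue exactly once, which is the property the answer is arguing for.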
Can someone help me to understand thread enqueuing while using GCD.
I want to understand thread enqueuing which we see while putting breakpoints.
How does it work?
Does every thread execute on either the main or a global queue? Is that the reason for the enqueuing?
Thanks,
Can someone help me to understand thread enqueuing while using GCD. I want to understand thread enqueuing which we see while putting breakpoints.
I’d suggest you think of it the other way around. You don’t “enqueue” threads. You dispatch blocks of code to a queue (i.e. “enqueue”), and the dispatch queue will select the appropriate thread on which that code shall run.
For example, above, I created a queue, dispatched a block of code to that queue, and added a breakpoint. I can see that my queue spun up a thread (it’s “Thread 3” in this case) and I can see that this was “enqueued” from the viewDidLoad method running on the “main thread”.
Does every thread execute on either the main or a global queue?
Again, it’s the other way around. Code that is dispatched to a particular queue will trigger that queue to run that block of code on a particular thread.
But there are three types of queues:
the “main” queue (which runs its code on a single, special, dedicated “main” thread);
one of the various, shared “global” queues (which will select a background thread from a pool of worker threads and run the code on that thread); or
a “custom” queue that you create yourself, like above.
Is that the reason for the enqueuing?
This “enqueuing” is merely the process of adding a block of code to a queue. Xcode will try to show you where the code was enqueued, to help you diagnose from where the code was dispatched.
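To make the "you enqueue blocks, not threads" idea concrete, here is a small analogy in standard C++. It is not the GCD API: SerialQueue and its async method are hypothetical names, and this sketch always uses one dedicated worker thread, whereas a real dispatch queue picks a thread from a pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical analogy of a serial dispatch queue: callers hand over blocks
// of code, and the queue decides which thread runs them.
class SerialQueue {
public:
    SerialQueue() : worker_([this] { run(); }) {}

    ~SerialQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        worker_.join();  // remaining blocks are drained before the join returns
    }

    // "Enqueue" a block; returns immediately without running it.
    void async(std::function<void()> block) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            blocks_.push(std::move(block));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> block;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !blocks_.empty(); });
                if (blocks_.empty()) return;  // shut down once drained
                block = std::move(blocks_.front());
                blocks_.pop();
            }
            block();  // run outside the lock, in enqueue order
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> blocks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Enqueues three blocks and checks that none of them ran on the caller.
bool demo_runs_off_caller_thread() {
    const std::thread::id caller = std::this_thread::get_id();
    std::vector<std::thread::id> seen;
    {
        SerialQueue queue;
        for (int i = 0; i < 3; ++i)
            queue.async([&seen] { seen.push_back(std::this_thread::get_id()); });
    }  // destructor drains the queue and joins the worker
    assert(seen.size() == 3);
    return seen[0] != caller && seen[0] == seen[1] && seen[1] == seen[2];
}
```

The caller never names a thread. It only hands blocks to the queue, which matches what the debugger reports as "enqueued from" the calling method.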
In my application it is imperative that "state" and "graphics" are processed in separate threads. So for example, the "state" thread is only concerned with updating object positions, and the "graphics" thread is only concerned with graphically outputting the current state.
For simplicity, let's say that the entirety of the state data is contained within a single VkBuffer. The "state" thread creates a Compute Pipeline with a Storage Buffer backed by the VkBuffer, and periodically vkCmdDispatchs to update the VkBuffer.
Concurrently, the "graphics" thread creates a Graphics Pipeline with a Uniform Buffer backed by the same VkBuffer, and periodically draws/vkQueuePresentKHRs.
Obviously there must be some sort of synchronization mechanism to prevent the "graphics" thread from reading from the VkBuffer whilst the "state" thread is writing to it.
The only idea I have is to employ the usage of a host mutex from vkQueueSubmit to vkWaitForFences in both threads.
I want to know, is there perhaps some other method that is more efficient or is this considered to be OK?
Try using semaphores. They are used to synchronize operations solely on the GPU, which is much more optimal than waiting in the app and submitting work after previous work is fully processed.
When You submit work You can provide a semaphore which gets signaled when this work is finished. When You submit another work You can provide the same semaphore on which the second batch should wait. Processing of the second batch will start automatically when the semaphore gets signaled (this semaphore is also automatically unsignaled and can be reused).
(I think there are some constraints on using semaphores, associated with queues. I will update the answer later when I confirm this but they should be sufficient for Your purposes.
[EDIT] There are constraints on using semaphores but it shouldn't affect You - when You use a semaphore as a wait semaphore during submission, no other queue can wait on the same semaphore.)
There are also events in Vulkan which can be used for similar purposes but their use is a little bit more complicated.
If You really need to synchronize GPU and Your application, use fences. They are signaled in a similar way as semaphores. But You can check their state on the app side and You need to manually unsignal them before You can use them again.
[EDIT]
I've added an image that more or less shows what I think You should do. One thread calculates state and with each submission adds a semaphore to the top of the list (or a ring buffer as #NicolasBolas wrote). This semaphore gets signaled when the submission is finished (it is provided in pSignalSemaphores during "compute" batch submission).
Second thread renders Your scene. It manages its own list of semaphores similarly to the compute thread. But when You want to render things, You need to be sure that compute thread finished calculations. That's why You need to take the latest "compute" semaphore and wait on it (provide it in pWaitSemaphores during "render" batch submission). When You submit rendering commands, compute thread can't start and modify the data because it may influence the results of a rendering. So compute thread also needs to wait until the most recent rendering is done. That's why compute thread also needs to provide a wait semaphore (the most recent "rendering" semaphore).
You just need to synchronize submissions. Rendering thread cannot start when the compute thread submits commands and vice versa. That's why adding semaphores to the lists (and taking semaphores from the list) should be synchronized. But this has nothing to do with Vulkan. Probably some mutex will be helpful (for example a C++-ish std::lock_guard<std::mutex>). But this synchronization is a problem only when You have a single buffer.
Another thing is what to do with old semaphores from both lists. You cannot directly check what is their state and You cannot directly unsignal them. The state of semaphores can be checked by using additional fences provided with each submission. You don't wait on them but from time to time check if a given fence is signaled and, if it is, You can destroy old semaphore (as You cannot unsignal it from the application) or You can make an empty submission, with no command buffers, and use that semaphore as a wait semaphore. This way the semaphore will be unsignaled and You can reuse it. But I don't know which solution is more optimal: destroying old and creating new semaphores, or unsignaling them with empty submissions.
When You have a single buffer, a one-element list/ring is probably enough. But a more optimal solution would have some kind of a ping-pong set of buffers - You read data from one buffer, but store results in another buffer. And in the next step You swap them. That's why in the image above, the lists of semaphores (rings) may have more elements depending on Your setup. The more independent buffers and semaphores in the lists (of course up to some reasonable count), the better performance You will get, as You reduce time wasted on waiting. But this complicates Your code and it may also increase lag (rendering thread gets data that is a bit older than the data currently processed by the compute thread). So You may need to balance performance, code complexity and rendering lag.
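The ping-pong idea on its own, leaving out all GPU synchronization, can be sketched in a few lines. PingPong and step are hypothetical names, and plain std::vector<float> stands in for the two VkBuffer objects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pair of state buffers: one is read while the other is written.
struct PingPong {
    std::array<std::vector<float>, 2> buffers;
    int read_index = 0;  // index of the buffer currently safe to read

    const std::vector<float>& read_buffer() const { return buffers[read_index]; }
    std::vector<float>& write_buffer() { return buffers[1 - read_index]; }
    void swap() { read_index = 1 - read_index; }
};

// One update step: read the previous state, write the next one, then swap
// roles so the freshly written buffer becomes the readable one.
void step(PingPong& state) {
    const std::vector<float>& in = state.read_buffer();
    std::vector<float>& out = state.write_buffer();
    assert(&in != &out);  // reader and writer never share a buffer
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] + 1.0f;
    state.swap();
}
```

Because the writer never touches the buffer being read, a renderer reading read_buffer() only has to be synchronized with the swap, not with the whole update.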
How you do this depends on two factors:
1. Whether you want to dispatch the compute operation on the same queue as its corresponding graphics operation.
2. The ratio of compute operations to their corresponding graphics operations.
#2 is the most important part.
Even though they are generated in separate threads, there must be at least some idea that the graphics operation is being fed by a particular compute operation (otherwise, how would the graphics thread know where the data is to read from?). So, how do you do that?
At the end of the day, that part has nothing to do with Vulkan. You need to use some inter-thread communication mechanism to allow the graphics thread to ask, "which compute task's data should I be using?"
Typically, this would be done by having the compute thread add every compute operation it does to some kind of circular buffer (thread-safe of course. And non-locking). When the graphics thread goes to decide where to read its data from, it asks the circular buffer for the most recently added compute operation.
In addition to the "where to read its data from" information, this would also provide the graphics thread with an appropriate Vulkan synchronization primitive to use to synchronize its command buffer(s) with the compute operation's CB.
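A minimal sketch of such a "most recently added" ring, assuming a single producer (the compute thread) and a ring large enough that the producer never wraps around onto a slot the consumer is still copying. ComputeOp, LatestOpRing and their fields are hypothetical names; a real record would also carry whatever synchronization primitive the graphics thread needs.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of one compute submission.
struct ComputeOp {
    std::uint64_t id = 0;  // submission counter
    int buffer_index = 0;  // which buffer holds the results
};

// Single-producer ring without locks. ASSUMPTION: the producer never laps a
// slot that the consumer is still reading; a production version needs to
// enforce or detect that.
class LatestOpRing {
public:
    // Compute thread: fill the next slot, then publish it.
    void publish(const ComputeOp& op) {
        const std::uint64_t next = count_.load(std::memory_order_relaxed);
        slots_[next % kSlots] = op;
        count_.store(next + 1, std::memory_order_release);
    }

    // Graphics thread: fetch the most recently published operation.
    // Returns false if nothing has been published yet.
    bool latest(ComputeOp& out) const {
        const std::uint64_t n = count_.load(std::memory_order_acquire);
        if (n == 0) return false;
        out = slots_[(n - 1) % kSlots];
        return true;
    }

private:
    static constexpr std::size_t kSlots = 8;
    std::array<ComputeOp, kSlots> slots_{};
    std::atomic<std::uint64_t> count_{0};
};

// Publishes three operations and returns the id the reader sees last.
std::uint64_t demo_latest_id() {
    LatestOpRing ring;
    ComputeOp op;
    assert(!ring.latest(op));  // nothing published yet
    for (std::uint64_t i = 0; i < 3; ++i) {
        ComputeOp next;
        next.id = i;
        next.buffer_index = static_cast<int>(i % 2);
        ring.publish(next);
    }
    ring.latest(op);
    return op.id;
}
```

The release store in publish pairs with the acquire load in latest, so a reader that observes the new count also observes the slot contents written before it.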
If the compute and graphics operations are being dispatched on the same queue, then this is pretty simple. There doesn't have to actually be a synchronization primitive. So long as the graphics CBs are issued after the compute CBs in the batch, all the graphics CBs need is to have a vkCmdPipelineBarrier at the front which waits on all memory operations from the compute stage.
srcStageMask would be STAGE_COMPUTE_SHADER_BIT, with dstStageMask being, well, pretty much everything (you could narrow it down, but it won't matter, since at the very least your vertex shader stage will need to be there).
You would need a single VkMemoryBarrier in the pipeline barrier. Its srcAccessMask would be SHADER_WRITE_BIT, while the dstAccessMask would be however you intend to read it. If the compute operations wrote some vertex data, you need VERTEX_ATTRIBUTE_READ_BIT. If they wrote some uniform buffer data, you need UNIFORM_READ_BIT. And so on.
If you're dispatching these operations on separate queues, that's where you need an actual synchronization object.
There are several problems:
You cannot detect if a Vulkan semaphore has been signaled by user code. Nor can you set a semaphore to the unsignaled state by user code. Nor can you reasonably submit a batch that has a semaphore in it that is currently signaled and nobody's waiting on it. You can do the latter, but it won't do the right thing.
In short, you can never submit a batch that signals a semaphore unless you are certain that some process is going to wait for it.
You cannot issue a batch that waits on a semaphore, unless a batch that signals it is "pending execution". That is, your graphics thread cannot vkQueueSubmit its batch until it is certain that the compute queue has submitted its signaling batch.
So what you have to do is this. When the graphics thread goes to get its compute data, it must send a signal to the compute thread to add a semaphore to its next submit call. When the graphics thread submits its graphics operation, it then waits on that semaphore.
But to ensure proper ordering, the graphics thread cannot submit its operation until the compute thread has submitted the semaphore signaling operation. That requires a CPU-synchronization operation of some form. It could be as simple as the graphics thread polling an atomic variable set by the compute thread.
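The polling variant mentioned above could look roughly like this. The names are hypothetical and the assignment to shared_submission_id merely stands in for the compute thread's vkQueueSubmit call; no Vulkan is involved.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Set by the compute thread once its signaling batch has been submitted.
std::atomic<bool> compute_submitted{false};
// Plain data written before the flag is set; stands in for "the submit happened".
int shared_submission_id = 0;

void compute_thread() {
    shared_submission_id = 42;  // stand-in for the compute submission
    compute_submitted.store(true, std::memory_order_release);
}

// Graphics side: spin until the compute submission is known to have happened.
// Only after this returns would the graphics batch be submitted.
int graphics_thread_wait() {
    while (!compute_submitted.load(std::memory_order_acquire))
        std::this_thread::yield();
    return shared_submission_id;
}

// Runs both sides once and returns what the graphics side observed.
int demo_handshake() {
    compute_submitted.store(false);
    shared_submission_id = 0;
    std::thread compute(compute_thread);
    const int seen = graphics_thread_wait();
    compute.join();
    assert(compute_submitted.load());
    return seen;
}
```

The release/acquire pair guarantees that everything the compute thread did before setting the flag is visible to the graphics thread after it sees the flag. A condition variable would avoid the spinning if the wait can be long.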
When a process tries to output to the console (using printf), does that count as an I/O event, so that the process is sent to the waiting queue and the short-term scheduler selects another process to take the CPU time?
Does a context switch occur here, at the console output event?
Sure, it may do, yes, if the I/O stream is locked by another thread that is performing output.
Not sure what you mean by 'short term scheduler'. The console stream will probably be protected by a mutex and gets locked/unlocked by threads in the 'usual' way when they request I/O.
You will need to do manual synchronization. You can not make the assumption that it is thread safe.
If you want to make sure that separate threads don't access the stream at the same time, you need to wrap the output with a mutex.
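A minimal sketch of wrapping output with a mutex, using C++ streams. The helper locked_print and the global output_mutex are hypothetical names, and an std::ostringstream stands in for the console so that the result can be checked afterwards.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

std::mutex output_mutex;  // one lock shared by every writer of the stream

// Writes one whole line while holding the lock, so lines from different
// threads cannot interleave.
void locked_print(std::ostream& out, const std::string& line) {
    std::lock_guard<std::mutex> lock(output_mutex);
    out << line << '\n';
}

// Four threads each write 50 lines to a shared stream; returns how many
// lines came out intact.
int demo_intact_lines() {
    std::ostringstream out;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&out, t] {
            for (int i = 0; i < 50; ++i)
                locked_print(out, "thread " + std::to_string(t) + " says hello");
        });
    for (std::thread& th : threads) th.join();

    std::istringstream in(out.str());
    std::string line;
    int intact = 0;
    while (std::getline(in, line)) {
        assert(!line.empty());
        for (int t = 0; t < 4; ++t)
            if (line == "thread " + std::to_string(t) + " says hello") ++intact;
    }
    return intact;
}
```

The lock only helps if every writer goes through locked_print. Any code that writes to the stream directly bypasses it.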
I would like to have three threads in a sample application.
Thread #1 (Main Thread) - User Interface/GUI
Thread #2 - Tied to a serial port device receiving data via events passing to a data queue.
Thread #3 - Activated when a queue entry is made, process data node, frees data object.
The goal is to
a) Prevent the loss of data when a button or the form is held by the mouse on the main form.
b) Quickly get the data from the event, stuff it in the queue, go back to sleep
c) Process data when we have it, otherwise sleep.
Can packages like AsyncoPro tie event handling to a non-main thread?
I've never done much with serial port event-driven apps; most of what I've worked with is polled, and I want to do some testing.
You can definitely tie event handling to a non-main thread. What you can't do is tie screen updating to a non-main thread. The Windows API is not threadsafe, and so the Delphi VCL, which is built on top of the Windows API, isn't either. But your design is basically a good, workable idea; just remember to use the Synchronize or Queue methods of TThread to send any UI updates back to be executed on the main thread.
The easiest should be to define some user messages, then send them from sub-threads to the main thread.
It's perfectly thread-safe, and even process-safe.
Use PostMessage() with the Handle of the main form. But don't broadcast this WM_USER+n message to the whole UI, because you could confuse some part of the VCL which defines its own custom messages.
If you want to copy some textual data across threads or processes, you can see WM_COPYDATA. In practice, this is very fast, faster than named pipes for small messages.
For the user interface, I discovered that a stateless implementation is sometimes a good idea. That is, you don't call back the main thread via a Synchronize() call or a GDI message, but your main GUI thread has a timer which checks a shared memory buffer for pending updates. This is how the web works, and in practice, it's pretty easy to work with: you don't have to write any callback, each thread is independent, does its own stuff, and refreshes when necessary.
But of course, the solution depends on your exact project architecture.
For a simple but proven library, see AsyncCalls, working from Delphi 5 up to XE. For latest versions of the IDE (Delphi 2007 and later), take a look at OmniThreadLibrary. By using such libraries, you'll ensure that your software implementation won't break anywhere: it's very common for a multi-threaded application to work as expected most of the time, then, for unknown reasons, go into an endless loop. And, of course, it happens only on the customer side, not yours... If you don't want to spend hours debugging your program, just trust those proven libraries, which are known to be well designed and debugged.
Sure you can do this, one way or another. Not used Apro since D5 - the Apro I have does not work on my D2009 (unicode/string/ANSIstring issues), and I have my own serial classes. Most of the available serial components have the option of firing dataRx events on either the rx thread or the main GUI thread - obviously in your case you should select the rx thread (Thread #2). Shove the rx data into some buffer class and push it onto a producer-consumer queue to Thread #3. Process it there. If you need to do a GUI update from there, PostMessage the reference to the GUI thread and handle it in a user-defined message-handler procedure.
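The hand-off from Thread #2 to Thread #3 is a classic producer-consumer queue. The following is a language-neutral sketch written in standard C++ rather than Delphi; DataQueue and demo_sum are hypothetical names, and plain integers stand in for the received data nodes. The consumer sleeps on a condition variable until an entry arrives, which matches goals (b) and (c) in the question.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical blocking queue between the receiver and processor threads.
class DataQueue {
public:
    // Receiver thread: stuff the item in the queue and return quickly.
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after close() is a usage error
            items_.push(value);
        }
        ready_.notify_one();
    }

    // Signals that no more data will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Processor thread: sleeps until an item is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(int& value) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        value = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
    bool closed_ = false;
};

// Receiver pushes 1..100, processor sums whatever it pops.
int demo_sum() {
    DataQueue queue;
    int sum = 0;
    std::thread processor([&] {
        int value = 0;
        while (queue.pop(value)) sum += value;
    });
    std::thread receiver([&] {
        for (int i = 1; i <= 100; ++i) queue.push(i);
        queue.close();
    });
    receiver.join();
    processor.join();
    return sum;
}
```

Neither thread touches the GUI here. Any screen update would still have to be marshalled to the main thread, as the answers above describe.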
Done this sort of stuff loadsa times - it will work OK.
Rgds,
Martin