I am designing real-time windows application displaying graphics or images gotten from multiple sensors. I assigned a thread for getting data from each sensor, and a UI thread for each display. According to MSDN, I can employ PostThreadMessage to send a message to another thread.
That sounds fine but in my architecture, a worker thread needs to send lots of information such as image. So I don't think I can send single big image data to UI thread with PostThreadMessage because the worker thread has to hold that data until corresponding UI thread processes it.
If so, what is best way to send the large amount data from worker thread to UI thread?
I thought about saving it as a file but I am sure it can be big bottleneck as it requires to process the data very quickly.
One idea I have is to send very small part, for example, few lines of image, when you send a message from worker thread.
Any advice would be appreciated.
my comment as an answer:
As jeffamaphone wrote, use the pointer to the memory instead of copying everything. Thats the advantage of threads - shared memory - dont waste it.
Leave the freeing of the images memory to the ui-thread, and allocate new memory for the next image in the worker thread. So the worker thread does not have to wait until the ui-thread is finished with the image. Requires more memory, but no copying or long waits will be necessary.
There are possible improvements, which can reduce the number of allocations you will have to make, but they are quite fiddly - and its quite doutable that they indeed would improve performance, because they would reintroduce some kind of synchronisation. So i would go ahead and implement it like i suggested, and if you notice that the amount of memory allocations is a performance bottleneck, you/we can rethink this matter.
Related
In the interest of ease of programming (local function calls instead of IPC) and performance (e.g. avoiding copies of large buffers), I'd like to have a Java VM call native code using JNI instead of through interprocess communication. There would be lots of worker threads, each doing computer vision on some image and sending back a list of detected features.
I've found a few other posts about this topic:
How to implement a native code sandbox?
Linux: Is it possible to sandbox shared library code
but in all cases, the agreed upon solution is to use multiple processes.
But I would like to explore the feasibility of partly sand boxing threads. Clearly, this goes against common sense, but I think if your client processes aren't malicious and if you can recover from faults, and in the worst case, are willing to tolerate a whole system crash once in a blue moon, it might work.
There are some hints that this is possible such as from jmajnert in #2. You would have to capture segfaults and other crashes, and terminate and restart the crashed thread. But I also want to reset the heap of the thread. That means each thread should have a private heap, but I don't know of any common malloc implementation that lets you create multiple heaps (AIX seems to).
Then I would want to close all files opened by the thread when it gets restarted.
Also, if Java objects get compromised by the native code, would it be practical to provide some fault tolerance like recreating them?
Because if complexity of hopping models between some native code, and the JVM -- The idea itself is really not even feasible.
To be feasible, you'd need to be within a single machine/threading model.
Lets assume you're in posix/ansi c.
You'd need to write a custom allocator that allocated from pools. Each time you launched a thread you'd allocate a new pool and set that pool as a thread local variable that all your custom_malloc() functions would allocate from. This way, when your thread died you could crush all of it's memory along with it.
Next, you'll need to set up some niftyness with setjmp/longjmp and signal to catch all those segfaults etc. exit the thread, crush it's memory and restart.
If you have objects from the "parent process" that you don't want to get corrupted, you'd have to create some custom mutexes that would have rollback functions that could be triggered when a threads signal handler was triggered to destroy the thread.
I have a multithreaded application where I want to allow all but one of the threads to run synchronously. However, when a specific thread wakes up I need the rest of the threads to block.
My Current implementation is:
void ManyBackgroundThreadsDoingWork()
{
AquireMutex(mutex);
DoTheBackgroundWork();
ReleaseTheMutex(mutex);
}
void MainThread()
{
AquireMutex(mutex);
DoTheMainThreadWork();
ReleaseTheMutex(mutex);
}
This works, in that it does indeed keep the background threads from operating inside the critical block while the main thread is doing its work. However, There is a lot of contention for the mutex amongst the background threads even when they don't necessarily need it. The main thread runs intermittently and the background threads are able to run concurrently with each other, just not with the main thread.
What i've effectively done is reduced a multithreaded architecture to a single threaded one using locks... which is silly. What I really want is an architecture that is multithreaded for most of the time, but then waits while a small operation completes and goes back to being multithreaded.
Edit: An explanation of the problem.
What I have is an application that displays multiple video feeds coming from pcie capture cards. The pcie capture card driver issues callbacks on threads it manages into what is effectively the ManyBackgroundThreadsDoingWork function. In this function I copy the captured video frames into buffers for rendering. The main thread is the render thread that runs intermittently. The copy threads need to block during the render to prevent tearing of the video.
My initial approach was to simply do double buffering but that is not really an option as the capture card driver won't allow me to buffer frames without pushing the frames through system memory. The technique being used is called "DirectGMA" from AMD that allows the capture card to push video frames directly into the GPU memory. The only method for synchronization is to put a glFence and mutex around the actual rendering as the capture card will be continuously streaming data to the GPU Memory. The driver offers no indication of when a frame transfer completes. The callback supplies enough information for me to know that a frame is ready to be transferred at which point I trigger the transfer. However, I need to block transfers during the scene render to prevent tearing and artifacts in the video. The technique described above is the suggested technique from the pcie card manufacturer. The technique, however, breaks down when you want more than one video playing at a time. Thus, the question.
You need a lock that supports both shared and exclusive locking modes, sometimes called a readers/writer lock. This permits multiple threads to get read (shared) locks until one thread requests an exclusive (write) lock.
My Previous Question
From the above answer, means if in my threads has create objects, i will face memory allocation/deallocation bottleneck, thus result running threads may slower or no obvious time taken diff. than no thread. What's the advantages of running multi threads in the application if I cannot allocate memory to create the object for calculations in my thread?
What's the advantages of running multi threads in the application if I cannot allocate memory to create the objects for calculations in my thread?
It depends on where your bottlenecks are. If your bottleneck is the amount of memory available, then creating more threads won't help. Or, if I/O is a bottleneck, trying to parallelize will just slightly slow down everything because of context switching. It's like trying to make an underpowered car faster by putting wider tyres in it: fixing the wrong thing doesn't help.
Threads are useful when the bottleneck is the processor and there are several processors available.
Well, if you allocate chunks of memory in a loop, things will slow down.
If you can create your objects once at the beginning of TThread.execute, the overhead will be smaller.
Threads can also be benificial if you have to wait for IO-operations, or if you have expensive calculations to do on a machine with more than one physical core.
If you have memory intensive threads (many memory allocations/deallocations) you better use TopMM instead of FastMM:
http://www.topsoftwaresite.nl/
FastMM uses a lock which blocks all other threads, TopMM does not so it scales much better on multi cores/cpus!
When it comes to multithreding, shared resources issues will always arise (with current technology). All resources that may need serialization (RAM, disk, etc.) are a possible bottleneck. Multithreading is not a magic solution that turns a slow app in a fast one, and not always result in better speed. Made in the wrong way, it can actually result in worse speed. it should be analyzed to find possible bottlenecks, and some parts could need to be rewritten to minimize bottlenecks using different techniques (i.e. preallocating memory, using async I/O, etc.). Anyway, performance is only one of the reasons to use more than one thread. There are several other reason, for example letting the user to be able to interact with the application while background threads perform operations (i.e. printing, checking data, etc.) without "locking" the user. The application that way could seem "faster" (the user can keep on using it without waiting) even if it is actually slowerd (it takes more time to finish operations than if made them serially).
How do I control the number of threads that my program is working on?
I have a program that is now ready for mutithreading but one problem is that the program is extremely memory intensive and i have to limit the number of threads running so that i don't run out of ram. The main program goes through and creates a whole bunch of handles and associated threads in suspended state.
I want the program to activate a set number of threads and when one thread finishes, it will automatically unsuspended the next thread in line until all the work has been completed. How do i do this?
Someone has once mentioned something about using a thread handler, but I can't seem to find any information about how to write one or exactly how it would work.
If anyone can help, it would be greatly appreciated.
Using windows and visual c++.
Note: i don't need to worry about the traditional problems of access with the threads, each one is completely independent of each other, its more of like batch processing rather than true mutithreading of a program.
Thanks,
-Faken
Don't create threads explicitly. Create a thread pool, see Thread Pools and queue up your work using QueueUserWorkItem. The thread pool size should be determined by the number of hardware threads available (number of cores and ratio of hyperthreading) and the ratio of CPU vs. IO your work items do. By controlling the size of the thread pool you control the number of maximum concurrent threads.
A Suspended thread doesn't use CPU resources, but it still consumes memory, so you really shouldn't be creating more threads than you want to run simultaneously.
It is better to have only as many threads as your maximum number of simultaneous tasks, and to use a queue to pass units of work to the pool of worker threads.
You can give work to the standard pool of threads created by Windows using the Windows Thread Pool API.
Be aware that you will share these threads and the queue used to submit work to them with all of the code in your process. If, for some reason, you don't want to share your worker threads with other code in your process, then you can create a FIFO queue, create as many threads as you want to run simultaneously and have each of them pull work items out of the queue. If the queue is empty they will block until work items are added to the queue.
There is so much to say here.
There are a few ways
You should only create as many thread handles as you plan on running at the same time, then reuse them when they complete. (Look up thread pool).
This guarantees that you can never have too many running at the same time. This raises the question of funding out when a thread completes. You can have a callback be called just before a thread terminates where a parameter in that callback is the thread handle that just finished. Use Boost bind and boost signals for that. When the callback is called, look for another task for that thread handle and restart the thread. That way all you have to do is add to the "tasks to do" list and the callback will remove the tasks for you. No polling needed, and no worries about too many threads.
As a side project I'm currently writing a server for an age-old game I used to play. I'm trying to make the server as loosely coupled as possible, but I am wondering what would be a good design decision for multithreading. Currently I have the following sequence of actions:
Startup (creates) ->
Server (listens for clients, creates) ->
Client (listens for commands and sends period data)
I'm assuming an average of 100 clients, as that was the max at any given time for the game. What would be the right decision as for threading of the whole thing? My current setup is as follows:
1 thread on the server which listens for new connections, on new connection create a client object and start listening again.
Client object has one thread, listening for incoming commands and sending periodic data. This is done using a non-blocking socket, so it simply checks if there's data available, deals with that and then sends messages it has queued. Login is done before the send-receive cycle is started.
One thread (for now) for the game itself, as I consider that to be separate from the whole client-server part, architecturally speaking.
This would result in a total of 102 threads. I am even considering giving the client 2 threads, one for sending and one for receiving. If I do that, I can use blocking I/O on the receiver thread, which means that thread will be mostly idle in an average situation.
My main concern is that by using this many threads I'll be hogging resources. I'm not worried about race conditions or deadlocks, as that's something I'll have to deal with anyway.
My design is setup in such a way that I could use a single thread for all client communications, no matter if it's 1 or 100. I've separated the communications logic from the client object itself, so I could implement it without having to rewrite a lot of code.
The main question is: is it wrong to use over 200 threads in an application? Does it have advantages? I'm thinking about running this on a multi-core machine, would it take a lot of advantage of multiple cores like this?
Thanks!
Out of all these threads, most of them will be blocked usually. I don't expect connections to be over 5 per minute. Commands from the client will come in infrequently, I'd say 20 per minute on average.
Going by the answers I get here (the context switching was the performance hit I was thinking about, but I didn't know that until you pointed it out, thanks!) I think I'll go for the approach with one listener, one receiver, one sender, and some miscellaneous stuff ;-)
use an event stream/queue and a thread pool to maintain the balance; this will adapt better to other machines which may have more or less cores
in general, many more active threads than you have cores will waste time context-switching
if your game consists of a lot of short actions, a circular/recycling event queue will give better performance than a fixed number of threads
To answer the question simply, it is entirely wrong to use 200 threads on today's hardware.
Each thread takes up 1 MB of memory, so you're taking up 200MB of page file before you even start doing anything useful.
By all means break your operations up into little pieces that can be safely run on any thread, but put those operations on queues and have a fixed, limited number of worker threads servicing those queues.
Update: Does wasting 200MB matter? On a 32-bit machine, it's 10% of the entire theoretical address space for a process - no further questions. On a 64-bit machine, it sounds like a drop in the ocean of what could be theoretically available, but in practice it's still a very big chunk (or rather, a large number of pretty big chunks) of storage being pointlessly reserved by the application, and which then has to be managed by the OS. It has the effect of surrounding each client's valuable information with lots of worthless padding, which destroys locality, defeating the OS and CPU's attempts to keep frequently accessed stuff in the fastest layers of cache.
In any case, the memory wastage is just one part of the insanity. Unless you have 200 cores (and an OS capable of utilizing) then you don't really have 200 parallel threads. You have (say) 8 cores, each frantically switching between 25 threads. Naively you might think that as a result of this, each thread experiences the equivalent of running on a core that is 25 times slower. But it's actually much worse than that - the OS spends more time taking one thread off a core and putting another one on it ("context switching") than it does actually allowing your code to run.
Just look at how any well-known successful design tackles this kind of problem. The CLR's thread pool (even if you're not using it) serves as a fine example. It starts off assuming just one thread per core will be sufficient. It allows more to be created, but only to ensure that badly designed parallel algorithms will eventually complete. It refuses to create more than 2 threads per second, so it effectively punishes thread-greedy algorithms by slowing them down.
I write in .NET and I'm not sure if the way I code is due to .NET limitations and their API design or if this is a standard way of doing things, but this is how I've done this kind of thing in the past:
A queue object that will be used for processing incoming data. This should be sync locked between the queuing thread and worker thread to avoid race conditions.
A worker thread for processing data in the queue. The thread that queues up the data queue uses semaphore to notify this thread to process items in the queue. This thread will start itself before any of the other threads and contain a continuous loop that can run until it receives a shut down request. The first instruction in the loop is a flag to pause/continue/terminate processing. The flag will be initially set to pause so that the thread sits in an idle state (instead of looping continuously) while there is no processing to be done. The queuing thread will change the flag when there are items in the queue to be processed. This thread will then process a single item in the queue on each iteration of the loop. When the queue is empty it will set the flag back to pause so that on the next iteration of the loop it will wait until the queuing process notifies it that there is more work to be done.
One connection listener thread which listens for incoming connection requests and passes these off to...
A connection processing thread that creates the connection/session. Having a separate thread from your connection listener thread means that you're reducing the potential for missed connection requests due to reduced resources while that thread is processing requests.
An incoming data listener thread that listens for incoming data on the current connection. All data is passed off to a queuing thread to be queued up for processing. Your listener threads should do as little as possible outside of basic listening and passing the data off for processing.
A queuing thread that queues up the data in the right order so everything can be processed correctly, this thread raises the semaphore to the processing queue to let it know there's data to be processed. Having this thread separate from the incoming data listener means that you're less likely to miss incoming data.
Some session object which is passed between methods so that each user's session is self contained throughout the threading model.
This keeps threads down to as simple but as robust a model as I've figured out. I would love to find a simpler model than this, but I've found that if I try and reduce the threading model any further, that I start missing data on the network stream or miss connection requests.
It also assists with TDD (Test Driven Development) such that each thread is processing a single task and is much easier to code tests for. Having hundreds of threads can quickly become a resource allocation nightmare, while having a single thread becomes a maintenance nightmare.
It's far simpler to keep one thread per logical task the same way you would have one method per task in a TDD environment and you can logically separate what each should be doing. It's easier to spot potential problems and far easier to fix them.
What's your platform? If Windows then I'd suggest looking at async operations and thread pools (or I/O Completion Ports directly if you're working at the Win32 API level in C/C++).
The idea is that you have a small number of threads that deal with your I/O and this makes your system capable of scaling to large numbers of concurrent connections because there's no relationship between the number of connections and the number of threads used by the process that is serving them. As expected, .Net insulates you from the details and Win32 doesn't.
The challenge of using async I/O and this style of server is that the processing of client requests becomes a state machine on the server and the data arriving triggers changes of state. Sometimes this takes some getting used to but once you do it's really rather marvellous;)
I've got some free code that demonstrates various server designs in C++ using IOCP here.
If you're using unix or need to be cross platform and you're in C++ then you might want to look at boost ASIO which provides async I/O functionality.
I think the question you should be asking is not if 200 as a general thread number is good or bad, but rather how many of those threads are going to be active.
If only several of them are active at any given moment, while all the others are sleeping or waiting or whatnot, then you're fine. Sleeping threads, in this context, cost you nothing.
However if all of those 200 threads are active, you're going to have your CPU wasting so much time doing thread context switches between all those ~200 threads.