How to choose priority for threads? - multithreading

Considering a single core: when multiple requests arrive at a server at the same timestamp, and all have the same priority, which request gets a thread allotted to it first?
Ex: the CPU has a single core with 2 threads. Now 4 people have made requests (processes) A, B, C, D to some server, and the server needs to assign threads from the message queue in order to process those requests. Which 2 processes get those 2 threads first?
Assume they all arrived at the same timestamp and have equal priority.

TUSHAR, there is a bit of a language gap here. Considering you chose "kernel", and didn't seem to think it was something to do with algebra, I am going to translate your question:
In a single CPU system, when multiple interrupts are asserted
simultaneously, and all have the same priority, which handler would be
serviced first?
The first bit of info is that most interrupt controllers are little more than a priority encoder with some extra glue. As such, they have no notion of same priority, but that is less important than you might think.
Real Time Operating Systems, in particular, seek to disassociate their implementation with the hardware, and may even dynamically adjust interrupt priorities to suit the current workload. The key here is that the OS spends a minimal time at the mercy of the interrupt controller, and chooses what to do based upon its state. As the system designer, you can choose what happens.
Time Sharing Operating Systems also have some control over this, but typically less, as they strive for maximum throughput rather than predictable response. As such, they might do anything from first-in-first-served to random-served, or even random-starved.
So the answer to your question depends upon your environment. For the most part, if you have a very simple environment (eg. an executive like vxWorks or freeRTOS), expect it to follow the dictates of the interrupt controller. If you have a more sophisticated device OS (eg. INTEGRITY or QNX) it is up to your configuration. If you have Linux/winDOS, there are likely 320 control knobs that all result in burning the toast.
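If the question was meant at the application level rather than the interrupt level (a server handing requests to a worker pool), the usual answer is simply "in the order they were queued". A minimal Java sketch, with a made-up class name and request labels, assuming a plain fixed-size pool backed by a FIFO work queue:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FifoPoolDemo {
        public static void main(String[] args) {
            // A fixed pool of 2 worker threads backed by a FIFO queue:
            // equal-priority tasks are picked up in submission order,
            // so A and B get the two threads first, then C, then D.
            ExecutorService pool = Executors.newFixedThreadPool(2);

            for (String request : new String[] {"A", "B", "C", "D"}) {
                pool.submit(() ->
                    System.out.println(request + " handled by "
                            + Thread.currentThread().getName()));
            }
            pool.shutdown();
        }
    }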

Related

Java Threads and number of Cores

Is it recommended that the number of threads in a java application should be less than the number of cpu cores?
If so, why is this the case, and what are the implications of using more threads than the number of cpu cores?
You will probably not get a definitive answer to the general question of how many threads an app should have relative to the number of cores the underlying computer has.
One may also argue that, in the era of PaaS software design and/or elastic clusters, the notion of a fixed number of cores for any given process might be overrated.
Still, the first part of your question :
Is it recommended that the number of threads in a java application should be less than the number of cpu cores?
This has a definitive answer, which is "no" (once more: as a general rule). The short reason why is that not all created threads are typically running (and, maybe more importantly, runnable) at once, meaning there is an opportunity to optimize here.
To support this discussion, I'll contrast two ways of building apps, which you could call "classical" versus "reactive", although this is not a universally accepted division. Still, let's use it as a frame.
Classical application design
I label as "classical" the applications that rely mostly on "blocking" calls and/or the "thread per request" pattern. Consider the traditional way I/O is done (socket communication like HTTP or database connections, hard-drive-based file reading, ...): your app thread calls some kind of read or write method, which usually triggers an OS-level call that blocks your app thread and fills some device buffer at the OS level (say, reads from a disk). Once the buffer has received enough data, the OS signals your Java app and thread to resume activity, and the read method returns with the data from the buffer.
The whole time the OS is working (usually just a tiny fraction of a second, but still a large amount of time compared to your typical GHz CPU speed), your Java thread is blocked or waiting, doing nothing but waiting for the OS to signal that it can resume. This happens all the time. A code profiler tool, like JProfiler or YourKit, can help you measure this time. If you do so, you'll notice that in many apps doing I/O, this is a significant part of the so-called "wall time" or "clock time" that is spent... waiting.
So we have one thread waiting, meaning it is not using any CPU time. It can be scheduled out, and the OS is free to give CPU time to anybody else.
Suppose this is a single-core CPU; then NOW would be a good time to have another thread feed the CPU. So having two or more threads can be a good design for maximizing CPU usage even on a single-core CPU, getting the most out of your hardware.
Most "classical" web applications are typically subject to this type of CPU underuse if you follow the rule of "one thread per CPU core", because Socket communications (or more typically : the time spent waiting for a response to your SQL queries) will incur so much blocking.
If you raise the number of threads your app has, then even if one or two long-running requests remain waiting, other, faster requests will have runnable threads to run them, and you'll get better CPU usage and better performance (more concurrent requests). That is... until something else reaches saturation (too many heavy requests on your DB, too many simultaneous hard-drive reads/writes...).
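To make the "classical" shape concrete, here is a rough sketch of a thread-per-request server in Java (the port, pool size, and handler are invented for the illustration): each connection ties up a whole thread that spends most of its life blocked in a read call, which is exactly why more threads than cores pays off here.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BlockingServerSketch {
        public static void main(String[] args) throws IOException {
            // Far more threads than cores: most of them will be blocked on I/O,
            // so the extra threads keep the CPU busy while others wait.
            ExecutorService workers = Executors.newFixedThreadPool(50);

            try (ServerSocket server = new ServerSocket(8080)) {
                while (true) {
                    Socket client = server.accept();         // blocks until a connection arrives
                    workers.submit(() -> handle(client));    // one task (thread) per request
                }
            }
        }

        private static void handle(Socket client) {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()))) {
                String line = in.readLine();                  // blocks while waiting for data
                // ... query a database, read a file, etc.: more blocking calls ...
                client.getOutputStream().write(("echo: " + line + "\n").getBytes());
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try { client.close(); } catch (IOException ignored) {}
            }
        }
    }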
Reactive app design
Recognizing this typical behavior of apps, and using different sets of OS features, some application frameworks now use non blocking patterns (even for I/O) to mitigate the above issues. Examples in the Java ecosystem are NIO based networking stacks like Netty, or actor pattern implementations like Akka.
In a typical "reactive" app, one usually abandons the "thread per request" pattern that we have in classical apps (meaning one thread is responsible for handling everything from start to finish of a given user request, and waiting when need be for external resources to become available), in favor of a vastly more modular, and non-blocking approach.
Threads are given smaller, more technical-grained bits of work to do, and threads hand work off to one another, with callbacks to hear back when the work they depend upon is done. This handing off of units of work means each thread can quickly grab new units of work it is able to handle. That means one of two things: either you achieve higher CPU usage with far fewer threads in your app (because each can grab work more efficiently, instead of just sitting and waiting), or you can instantiate many, many more threads because they'll mostly be waiting (not saturating the CPUs) and the dynamic hand-off will still allow for good CPU usage.
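As a rough sketch of that hand-off style using nothing but the JDK's CompletableFuture (Netty or Akka look quite different in detail; the stage names here are invented): each stage is a small unit of work, and no pooled thread sits blocked waiting for the previous stage to finish.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ReactiveSketch {
        public static void main(String[] args) {
            // A small pool is enough: threads only run short stages and are
            // handed the next unit of work via callbacks.
            ExecutorService pool = Executors.newFixedThreadPool(4);

            CompletableFuture
                .supplyAsync(ReactiveSketch::fetchUser, pool)       // stage 1
                .thenApplyAsync(user -> user + ":enriched", pool)   // stage 2, runs once stage 1 is done
                .thenAcceptAsync(System.out::println, pool)         // stage 3
                .join();                                            // only the main thread waits here

            pool.shutdown();
        }

        private static String fetchUser() {
            // stand-in for work that would normally complete a future asynchronously
            return "user42";
        }
    }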
Conclusion
Anyway, you don't design the number of threads solely based on the number of available cores. The nature of your implementation and workload dictates the optimal number of threads to create.
With a classical app-design philosophy, the two numbers are more closely related than with a reactive one, but we still have different profiles:
a very simple server app can accommodate many more threads than CPU cores, because that allows for better throughput (the limit being, say, the outbound network bandwidth).
a SQL-heavy app should be scaled to the point where your app server saturates the SQL backend; as your app server will be mostly waiting on your SQL server, that is the limit.
a mixed application consisting of some SQL heavy work, and some lightweight work, will need precision tuning, because you don't want the stuck threads (those blocked waiting for the DB) starving the light requests that would be served more rapidly
a compute-intensive program (say, a cryptography service) will probably benefit from a number of threads close to the number of CPU cores (if your algorithm is implemented in a classical way), because creating more threads than you are able to run is pointless. In an actor-based implementation, creating more threads could actually be a win. (See the sizing sketch below.)
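For the compute-intensive case in particular, the common Java idiom is to size the pool from the reported core count; a minimal sketch, where the exact numbers are rules of thumb rather than hard limits:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ComputePoolSizing {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            // CPU-bound work: roughly one thread per core, since extra threads
            // would mostly add context-switching overhead.
            ExecutorService cpuPool = Executors.newFixedThreadPool(cores);

            // Blocking-I/O-heavy work: far more threads than cores is fine,
            // because most of them are waiting at any given moment.
            ExecutorService ioPool = Executors.newFixedThreadPool(cores * 10);

            cpuPool.shutdown();
            ioPool.shutdown();
        }
    }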

What makes a kernel/OS real-time?

I was reading this article, but my question is on a generic level, I was thinking along the following lines:
Can a kernel be called real time just because it has a real time scheduler? Or in other words, say I have a Linux kernel: if I change the default scheduler from O(1) or CFS to a real-time scheduler, will it become an RTOS?
Does it require any support from the hardware? Generally I have seen embedded devices running an RTOS (e.g. VxWorks, QNX); do these have any special provisions/hardware to support them? I know an RTOS process's running time is deterministic, but then one can use longjmp/setjmp to get the output in a determined time.
I'd really appreciate some input/insight on it, if I am wrong about something, please correct me.
After doing some research, talking to people (Jamie Hanrahan, Juha Aaltonen - LinkedIn group "Device Driver Experts") and of course the input from @Jim Garrison, this is what I can conclude:
In Jamie Hanrahan's words-
What makes a kernel real time?
The sine qua non of a real time OS -
The ability to guarantee a maximum latency between an external interrupt and the start of the interrupt handler.
Note that the maximum latency need not be particularly short (e.g. microseconds), you could have a real time OS that guaranteed an absolute maximum latency of 137 milliseconds.
A real time scheduler is one that offers completely predictable (to the developer) behavior of thread scheduling - "which thread runs next".
This is generally separate from the issue of a guaranteed maximum latency to responding to an interrupt (since interrupt handlers are not necessarily scheduled like ordinary threads) but it is often necessary to implement a real-time application. Schedulers in real-time OSs generally implement a large number of priority levels. And they almost always implement priority inheritance, to avoid priority inversion situations.
So: it is good to have a guaranteed latency for an interrupt and predictable thread scheduling; why not make every OS real time, then?
Because an OS suited for general purpose use (servers and/or desktops) needs to have characteristics that are generally at odds with real-time latency guarantees.
For example, a real-time scheduler should have completely predictable behavior. That means, among other things, that whatever priorities have been assigned to the various tasks by the developer should be left alone by the OS. This might mean that some low-priority tasks end up being starved for long periods of time. But the RT OS has to shrug and say "that's what the dev wanted." Note that to get the correct behavior, the RT system developer has to worry a lot about things like task priorities and CPU affinities.
A general-purpose OS is just the opposite. You want to be able to just throw apps and services on it, almost always things written by many different vendors (instead of being one tightly integrated system as in most R-T systems), and get good performance. Perhaps not the absolute best possible performance, but good.
Note that "good performance" is not just measured in interrupt latency. In particular, you want CPU and other resource allocations that are often described as "fair", without the user or admin or even the app developers having to worry much if at all about things like thread priorities and CPU affinities and NUMA nodes. One job might be more important than another, but in a general-purpose OS, that doesn't mean that the second job should get no resources at all.
So the general purpose OS will usually implement time-slicing among threads of equal priority, and it may adjust the priorities of threads according to their past behavior (e.g. a CPU hog might have its priority reduced; an I/O bound thread might have its priority increased, so it can keep the I/O devices working; a CPU-starved thread might have its priority boosted so it can get a little bit of CPU time now and then).
Can a kernel be called real time just because it has a real time scheduler?
No, an RT scheduler is a necessary component of an RT OS, but you also need predictable behavior in other parts of the OS.
Does it require any support from the hardware?
In general, the simpler the hardware the more predictable its behavior is. So PCI-E is less predictable than PCI, and PCI is less predictable than ISA, etc. There are specific I/O buses that were designed for (among other things) easy predictability of e.g. interrupt latency, but a lot of R-T requirements can be met these days with commodity hardware.
The specific description of real-time is that processes have guaranteed maximum response times. This alone is often not sufficient for the application, and is even less important than determinism. It is especially hard to achieve with modern feature-rich OSes. Consider:
If I want to command some hardware or a machine at precise points in time, I need to be able to generate command signals at those specific moments, often with far sub-millisecond accuracy. Generally, if you write, say, C code with a loop that waits for "half a millisecond" and then does something, the wait time is not exactly half a millisecond; it is a little bit more, since the way common OSes handle this is to put the process aside at least until the requested time has passed, after which the scheduler might (at some point) pick it up again.
What is seriously problematic is not that the time is not exactly half a millisecond, but that it cannot be known in advance how much more it is. This inaccuracy is neither constant nor deterministic.
This has surprising consequences when doing physical automation. For example, it is impossible to command a stepper motor accurately with any typical OS without using dedicated hardware through kernel interfaces and telling it how long the time steps you really want are. Because of this, a single AVR module can command several motors accurately, but a Raspberry Pi (which absolutely stomps the AVR in terms of clock speed) cannot manage more than 2 with any typical OS.
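You can see this overshoot, and how much it varies, straight from user space; a quick Java sketch measuring how late Thread.sleep(1) actually returns on a general-purpose OS (the 1 ms figure is just for illustration):

    public class SleepJitter {
        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 10; i++) {
                long start = System.nanoTime();
                Thread.sleep(1);                      // ask for 1 ms
                long actualMicros = (System.nanoTime() - start) / 1_000;
                // On a general-purpose OS this typically prints more than 1000 us,
                // and the overshoot varies from run to run: it is neither constant
                // nor known in advance, which is exactly the problem for control loops.
                System.out.println("asked for 1000 us, got " + actualMicros + " us");
            }
        }
    }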

These days, what are good reasons for setting thread affinity rather than leaving it to the OS?

Searching answers here for "thread affinity", I see a lot of interest in doing it but little justification for it, save possibly getting stable QueryPerformanceCounter results.
Assuming a modern OS and a modern 2-4 socket workstation/server class machine with modern 4-6 core CPUs, what good reasons would anyone have for thinking they know better than their OS's scheduler? Are there any real-world situations where taking more control of thread affinity is the right thing to do? What sort of performance benefits can be demonstrated?
The last time I saw a really good case for setting thread affinity somewhere (as in, it was backed up by concrete results showing genuine and significant improvements in system performance), it was some obscure thing to do with Win2K device drivers. But I haven't seen anything like that in years so when someone tells me they need to control thread affinity (but not why) these days I am deeply sceptical... but curious to be shown otherwise.
The primary reason is if you have something that depends heavily upon caching. The OS scheduler doesn't necessarily take that into account to the degree you might like.
I use it to assign threads to cores; for example, in a simulation you do the physics entirely on one core and allow the rest of the computation to execute on another. It makes sense to be able to control this if you're in a tight environment where you know the hardware.
Of course, configuring this needs to be done per system, so by default I let the OS decide the cores on which to run, but keep the option of restricting core usage.
In the OS kernel and sometimes in kernel mode drivers you need to perform the same action on every CPU (e.g. update a system register). You can do that in a loop in a single thread, changing the affinity on each iteration.
For desktops it's quite unnecessary.
But I can see some applications where it would help. For example the CPU cache likes it if the app that runs on it doesn't change.
Another possibility is you have a critical task - you give it an entire CPU, and the other tasks use the rest of the CPUs.
Or the opposite: You have some low priority tasks, you put them all on one CPU, then leave the others free for more important tasks (using process priority will give you most of this benefit without having affinity, but I can imagine some memory heavy cases where it wouldn't).
I would agree it's best to leave this to the OS to figure out in most situations. However, the most common reason to go for thread affinity, as far as I have seen, is when you depend heavily on caching. In multi-CPU systems, when a particular CPU caches something individually for itself and the same thing has been cached by some other CPU, then I believe it can automatically get invalidated on the other CPU. So if a particular thread keeps changing the CPU on which it executes, the cache hit rate will be too low. So in this case I guess it makes sense for the programmer to be a better judge of the CPU affinities.
I also think the point above by Ariel, about making sure a critical task constantly gets a CPU without throttling other low-priority processes, makes sense.

Two processes on two CPUs -- is it possible that they complete at exactly the same moment?

This is sort of a strange question that's been bothering me lately. In our modern world of multi-core CPUs and multi-threaded operating systems, we can run many processes with true hardware concurrency. Let's say I spawn two instances of Program A in two separate processes at the same time. Disregarding OS-level interference which may alter the execution time for either or both processes, is it possible for both of these processes to complete at exactly the same moment in time? Is there any specific hardware/operating-system mechanism that may prevent this?
Now before the pedants grill me on this, I want to clarify my definition of "exactly the same moment". I'm not talking about time in the cosmic sense, only as it pertains to the operation of a computer. So if two processes complete at the same time, that means that they complete
with a time difference that is so small, the computer cannot tell the difference.
EDIT : by "OS-level interference" I mean things like interrupts, various techniques to resolve resource contention that the OS may use, etc.
Actually, thinking about time in the "cosmic sense" is a good way to think about time in a distributed system (including multi-core systems). Not all systems (or cores) advance their clocks at exactly the same rate, making it hard to actually tell which events happened first (going by wall clock time). Because of this inability to agree, systems tend to measure time by logical clocks. Two events happen concurrently (i.e., "exactly at the same time") if they are not ordered by sharing data with each other or otherwise coordinating their execution.
Also, you need to define when exactly a process has "exited." Thinking in Linux terms, is it when it prints an "exiting" message to the screen? When it returns from main()? When it executes the exit() system call? When its process state is set to "exiting" in the kernel? When the process's parent receives a SIGCHLD?
So getting back to your question (with a precise definition for "exactly at the same time"), the two processes can end (or do any other event) at exactly the same time as long as nothing coordinates their exiting (or other event). What counts as coordination depends on your architecture and its memory model, so some of the "exited" conditions listed above might always be ordered at a low level or by synchronization in the OS.
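The "logical clock" idea can be made concrete with a tiny Lamport-clock sketch (Lamport clocks are one standard form of logical clock; the class here is purely illustrative): events only get ordered when processes actually exchange information, otherwise they are simply concurrent.

    import java.util.concurrent.atomic.AtomicLong;

    public class LamportClock {
        private final AtomicLong time = new AtomicLong();

        // A purely local event: just advance our own clock.
        public long tick() {
            return time.incrementAndGet();
        }

        // Receiving a message carrying the sender's timestamp: the receive event
        // is ordered after the send, so jump past the sender's clock if needed.
        public long onReceive(long senderTime) {
            return time.updateAndGet(local -> Math.max(local, senderTime) + 1);
        }

        public static void main(String[] args) {
            LamportClock a = new LamportClock();
            LamportClock b = new LamportClock();
            long t1 = a.tick();            // local event on A
            long t2 = b.onReceive(t1);     // B receives a message from A: ordered after t1
            long t3 = b.tick();            // another local event on B
            // Two events with no chain of messages between them get independent
            // timestamps; neither "happened before" the other, so they are concurrent.
            System.out.println(t1 + " < " + t2 + " < " + t3);
        }
    }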
You don't even need "exactly" at the same time. Sometimes you can be close enough to seem concurrent. Even on a single core with no true concurrency, two processes could appear to exit at the same time if, for instance, two child processes exited before their parent was next scheduled. It doesn't matter which one really exited first; the parent will see that in an instant while it wasn't running, both children died.
So if two processes complete at the same time, that means that they complete with a time difference that is so small, the computer cannot tell the difference.
Sure, why not? Except for shared memory (and other resources, see below), they're operating independently.
Is there any specific hardware/operating-system mechanism that may prevent this?
Anything that is a resource contention:
memory access
disk access
network access
explicit concurrency management via locks/semaphores/mutexes/etc.
To be more specific: these are separate CPU cores, meaning they have computing circuitry implemented in separate logic circuits.
The fact that each core can have its own memory cache means that it is quite possible for most of the computation to occur as interaction of each core with its own cache. Once you have that, it's just a matter of probability. That's not to say that algorithms take a nondeterministic amount of time, but their inputs may come from a probabilistic distribution and the amount of time it takes to run is unlikely to be completely independent of input data unless the algorithm has been carefully designed to take the same amount of time.
Well I'm going to go with I doubt it:
Internally any sensible OS maintains a list of running processes.
It therefore seems sensible for us to define the moment that the process completes as the moment that it is removed from this list.
It also strikes me as fairly unlikely (but not impossible) that a typical OS will go to the effort to construct this list in such a way that two threads can independently remove an item from this list at exactly the same time (processes don't terminate that frequently and removing an item from a list is relatively inexpensive - I can't see any real reason why they wouldn't just lock the entire list instead).
Therefore for any two terminating processes A and B (where A terminates before B), there will always be a reasonably large time period (in a cosmic sense) where A has terminated and B has not.
That said it is of course possible to produce such a list, and so in reality it depends on the OS.
Also I don't really understand the point of this question, in particular what do you mean by
the computer cannot tell the difference
In order for the computer to tell the difference, it has to be able to check the running-process table at a point where A has terminated and B has not. If the OS schedules removing process B from the process table immediately after process A, it could very easily be that no such code gets a chance to execute, and so by some definitions it isn't possible for the computer to tell the difference. This situation holds true even on a single-core / single-CPU processor.
Yes, without any OS scheduling interference they could finish at the same time, if they don't have any resource contention (shared memory, external I/O, system calls). When either of them has a lock on a resource, it will force the other to stall, waiting for the resource to free up.
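A minimal sketch of that last point (the shared lock and the sleep are stand-ins for any contended resource): whichever thread grabs the lock first serializes the other, so their finishing times are no longer independent.

    public class ContentionSketch {
        private static final Object lock = new Object();

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                synchronized (lock) {              // only one thread may hold this at a time
                    try {
                        Thread.sleep(100);         // stand-in for work on the shared resource
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
                System.out.println(Thread.currentThread().getName()
                        + " finished at " + System.nanoTime());
            };

            Thread a = new Thread(work, "A");
            Thread b = new Thread(work, "B");
            a.start();
            b.start();
            a.join();
            b.join();
            // Without the lock the two threads could finish arbitrarily close together;
            // with it, one always waits for the other, so the finish times are serialized.
        }
    }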

When should I consider changing thread priority

I once was asked to increase thread priority to fix a problem. I refused, saying that changing it was dangerous and was not the root cause of the problem.
My question is, under what circumstances should I consider changing the priority of threads?
When you've made a list of the threads you're using and defined a priority order for them which makes sense in terms of the work they do.
If you nudge threads up here and there in order to bodge your way out of a problem, eventually they'll all be high priority and you're back where you started. Don't assume you can fix a race condition with prioritisation when really it needs locking, because chances are you've only fixed it in friendly conditions. There may still be cases where it can fail, such as when the lower-priority thread has undergone priority inheritance because another high-priority thread is waiting on another lock it's holding.
If you classify threads along the lines of "these threads fill the audio buffer", "these threads make my app responsive to system events", "these threads make my app responsive to the user", "these threads are getting on with some business and will report when they're good and ready", then the threads ought to be prioritised accordingly.
Finally, it depends on the OS. If thread priority is completely secondary to process priority, then it shouldn't be "dangerous" to prioritise threads: the only thing you can starve of CPU is yourself. But if your high-priority threads run in preference to the normal-priority threads of other, unrelated applications, then you have a broader responsibility. You should only be raising priorities of threads which do small amounts of urgent work. The definition of "small" depends what kind of device you're on - with a 3GHz multi-core processor you get away with a lot, but a mobile device might have pseudo real-time expectations that user-level apps can break.
Keeping the audio buffer serviced is the canonical example of when to be high priority, though, since small under-runs usually cause nasty crackling. Long downloads (or other slow I/O) are the canonical example of when to be low priority, since there's no urgency processing this chunk of data if the next one won't be along for ages anyway. If you're ever writing a device driver you'll need to make more complex decisions how to play nicely with others.
Not many. The only time I've ever had to change thread priorities in a positive direction was with a user interface thread. UIs must be extremely snappy in order for the app to feel right, so a lot of times it is best to prioritize painting threads higher than others. For example, the Swing Event Dispatch Thread runs at priority 6 by default (1 higher than the default).
I do push threads down in priority quite a bit. Again, this is usually to keep the UI responsive while some long-running background process does its thing. However, this also will sometimes apply to polling daemons and the like which I know that I don't want to be interfering with anything, regardless of how minimal the interference.
Our app uses a background thread to download data and we didn't want that interfering with the UI thread on single-core machines, so we deliberately prioritized that lower.
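In Java terms that decision looks roughly like this; a sketch using the standard Thread priority constants (the downloader body is omitted):

    public class BackgroundDownload {
        public static void main(String[] args) {
            Thread downloader = new Thread(() -> {
                // long-running, non-urgent work: fetch data, index files, etc.
            }, "downloader");

            // Lower than NORM_PRIORITY (5) so that on a busy single-core machine
            // the UI / event-dispatch thread wins the scheduling contest.
            downloader.setPriority(Thread.MIN_PRIORITY);   // 1
            downloader.setDaemon(true);
            downloader.start();
        }
    }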
I think it depends on the direction you're looking at changing the priority.
Normally you shouldn't ever increase thread priority unless you have a very good reason. Increasing thread priority can cause your app's thread to start taking away time from other applications, which probably isn't what the user wants. If your thread is using up a significant amount of CPU it can make the machine hard to use, as some standard UI threads may start to starve.
I'd say the only times you should increase priority above normal is if the user explicitly told your app to do so, but even then you want to prevent "clueless" users from doing so. Maybe if your app doesn't use much CPU normally, but might have brief bursts of really really important activity then it could be OK to have an increased priority, as it wouldn't normally detract from the user's general experience.
Decreasing priority is another matter. If your app is doing something that takes a LOT of CPU and runs for a long time, yet isn't critical, then lowering the priority can be good. By lowering the priority you allow the CPU to be used for other things when it's needed, which helps keep the system responding quickly. As long as the system is mostly idling other than your app you'll still get most of the CPU time, but won't take away from tasks that need it more than you. An example of this would be a thread that indexes the hard drive (think google desktop).
I would say when your original design assumptions about the threads are no longer valid.
Thread priority is mostly a design decision about what work is most important. So for some examples of when to reconsider: If you add a new feature that might require its own thread that becomes more important, then reconsider thread priorities. If some requirements change that force you to reconsider the priorities of the work you are doing, then reconsider. Or, if you do performance testing and realize that your "high priority work" as specified in your design do not get the required performance, then tweak priorities.
Otherwise, it's often a hack.
