Does -XX:+CMSIncrementalMode run on application threads or in GC-dedicated threads?

While reading "Really? iCMS? Really?" from this blog, one statement caught my attention:
The concurrent phases are typically long (think seconds and not milliseconds).
If CMS hogged the single hardware thread for several
seconds, the application would not execute during those
several seconds and would in
effect experience a stop-the-world pause.
This doesn't make sense to me on preemptive operating systems. My assumption is that CMS has one or more dedicated collector threads running. An alternative hypothesis is that, instead of CMS having dedicated GC threads, application threads interleave their own logic with GC logic (time-multiplexing).
Is this the case? What am I getting wrong here?
Thanks

In HotSpot JVM, the Garbage Collector (including CMS and i-CMS) uses dedicated worker threads.
CMS threads run concurrently with application threads, but they have a higher priority: NearMaxPriority. On a single-core machine, a CMS cycle could indeed starve application threads. The idea of CMS incremental mode was to make the GC voluntarily yield the CPU to the application without relying on the OS scheduler.
From HotSpot GC Tuning Guide:
Normally, the CMS collector uses one or more processors during the
entire concurrent tracing phase, without voluntarily relinquishing
them. Similarly, one processor is used for the entire concurrent sweep
phase, again without relinquishing it. This overhead can be too much
of a disruption for applications with response time constraints that
might otherwise have used the processing cores, particularly when run
on systems with just one or two processors. Incremental mode solves
this problem by breaking up the concurrent phases into short bursts of
activity, which are scheduled to occur midway between minor pauses.
Note that CMS incremental mode was deprecated long ago in 2012.

Related

How many thread Pools are allowed to be created?

I have a Spring Boot application in which, every time an API call is made, I create an ExecutorService with a fixed thread pool of 5 threads and pass around 500 tasks to CompletableFuture to run asynchronously. I am using this for a migration of lakhs of records.
When I started the migration, the API initially worked fine, and each API call (basically code logic + thread pool creation + job assignment to threads) took only around 200 ms. But as the number of API calls increased and new thread pools kept being created, I saw a gradual increase in the time taken to create the thread pool and assign the jobs; as a result, the API response time went up to 4 seconds.
Note: After the jobs are done, I shut down the executor service in a finally block.
Question :
Can multiple creation create overhead to the application and do those pools keep on piling up?
Won't there be any automatic garbage collection for this?
Will there be any limit to how many pools get created?
And what could be causing this time delay?
I can add further clarifications based on specific queries.
Can multiple creation create overhead to the application and do those pools keep on piling up?
Yes, absolutely. Unless you shut down the thread pools, they won't be destroyed automatically and will keep consuming resources. See the next question for more details.
Won't there be any automatic garbage collection for this?
You need to make sure that the thread pools are shut down once they are no longer needed. The javadoc of ThreadPoolExecutor provides some hints:
A pool that is no longer referenced in a program AND has no remaining threads will be shutdown automatically. If you would like to ensure that unreferenced pools are reclaimed even if users forget to call shutdown(), then you must arrange that unused threads eventually die, by setting appropriate keep-alive times, using a lower bound of zero core threads and/or setting allowCoreThreadTimeOut(boolean).
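As a minimal sketch of that advice (not the asker's actual code), the example below creates one pool and reuses it for every call instead of building a new pool per request. The class name SharedPoolExample, the handleRequest method, the 10-second keep-alive and the trivial task body are all assumptions made for illustration. allowCoreThreadTimeOut(true) lets idle core threads die, as the javadoc quote suggests.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedPoolExample {
    // Build the pool once; 5 threads as in the question, 10 s keep-alive is an assumed value.
    static ThreadPoolExecutor createPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 5, 10, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // Idle core threads may now time out, so a forgotten pool does not pin threads forever.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Hypothetical stand-in for one API call: fans out taskCount jobs on the shared pool.
    static int handleRequest(Executor pool, int taskCount) {
        List<CompletableFuture<Integer>> futures = IntStream.range(0, taskCount)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> i * 2, pool))
                .collect(Collectors.toList());
        // Wait for every job and combine the results.
        return futures.stream().mapToInt(CompletableFuture::join).sum();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = createPool();
        try {
            // Two "API calls" reuse the same five threads.
            System.out.println(handleRequest(pool, 500));
            System.out.println(handleRequest(pool, 500));
            System.out.println("largest pool size: " + pool.getLargestPoolSize());
        } finally {
            // Shut down once, when the application no longer needs the pool.
            pool.shutdown();
        }
    }
}
```

In a Spring Boot application the pool would more likely live in a singleton bean and be shut down when the application stops; that wiring is left out here.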
Will there be any limit to how many pools get created?
Java itself imposes no hard limit on the number of pools or threads; however, there may be restrictions depending on your operating system and available resources such as memory. This is quite a complex question. More details can be found in the answers to this question: How many threads can a Java VM support?
And what could be causing this time delay?
I assume that you don't have a proper cleanup / shutdown mechanism in place for the thread pools. Each thread reserves memory for its stack (typically around 1 MB by default on 64-bit JVMs), so the more threads you create, the more memory your application consumes. Depending on the system / JVM configuration, the application may start using swap, which dramatically slows down performance.
There may be other things that cause a drop in performance, so this is just what came to my mind right now.
Profilers will help you to identify performance issues or resource leaks. This article by Baeldung shows a few profilers you could use.

Can you run NodeJs parallelly in a single-core CPU?

I know that a single-core CPU (typically) will be able to have 2 threads running. So does this mean you can have NodeJs running parallelly in a single-core CPU?
First off, nodejs only runs your Javascript in a single thread, regardless of how many CPUs there are (assuming there are no WorkerThreads being used). It may use some other threads internally for the implementation of some library functions (like file operations or asynchronous crypto operations). But, it only ever uses a single thread/CPU to execute your Javascript instructions.
So does this mean you can have NodeJs running parallelly in a single-core CPU?
That depends upon what you mean by "parallelly".
Your operating system supports lots of threads, even with only a single CPU. But, when you only have a single CPU, those threads get time-sliced across the single CPU such that none of them are ever actually running at the same time.
One thread gets to run for a short time, then the OS suspends that thread, context switches to another thread, runs it for a short time, suspends that thread, context switches to another thread and so on.
So, the one CPU is "shared" among multiple threads, but the threads still run one at a time (typically for short periods of time).
The more CPUs you have, the more threads can run simultaneously where there is true parallel execution.
This is all managed by the OS, independent of nodejs or any particular app running on the computer. Also, please be aware that a typical modern OS has a lot of services running in it. Each of these services may also have its own threads that need to use the CPU from time to time in order to keep the OS and its services running properly. For example, you might be doing a file system backup while typing into your word processor, while running a nodejs app. That can all happen on a single CPU by just sharing it between the different threads that all want to have some cycles. Apps or services that need lots of CPU to do their job will run more slowly when a single CPU is being shared among a bunch of different uses, but they will all still proceed via the time-slicing.
Time-slicing on a single CPU will give the appearance of parallel execution because multiple threads can appear to be making progress, but in reality, one thread may run for a few milliseconds, then the OS switches over to another thread which runs for a few milliseconds and so on. Tasks get done in parallel (at a somewhat slower rate) even though both tasks are never actually using the CPU at exactly the same moment.

tasks Scheduler and CPU isolation in Linux

I'm a kernel noob, and that includes schedulers. I understand that there is an IO scheduler and a task scheduler and, according to this post, the IO scheduler uses normal tasks that are handled by the task scheduler in the end.
So if I run a user space thread that was assigned to an isolated core (using isolcpus) and it does some IO operation, will the task created by the IO scheduler get executed on the isolated core?
Since CFS seems to favor user interaction, does this mean that CPU intensive threads might get a lower CPU time in the long run?
Can isolating cores help mitigate this issue?
Can isolating cores decrease the scheduling latency (the time it takes for a thread that was marked as runnable to get executed) for the threads that are pinned to the isolated cores?
So if I run a user space thread that was assigned to an isolated core
(using isolcpus) and it does some IO operation, will the task
created by the IO scheduler get executed on the isolated core?
What isolcpus does is take that particular core out of the kernel's list of CPUs on which it can schedule tasks. So once you isolate a CPU from the kernel's list, the scheduler will never place a task on that core on its own, no matter whether the core is idle or is being used by some other process/thread.
Since CFS seems to favor user interaction, does this mean that CPU
intensive threads might get a lower CPU time in the long run?
Can isolating cores help mitigate this issue?
Isolating CPUs has a different use altogether, in my opinion. Basically, if your application has both fast threads (threads with no system calls that are latency sensitive) and slow threads (threads with system calls), you would want to have dedicated CPU cores for your fast threads so that they are not interrupted by the kernel's scheduling and can run to completion without any noise. Fast threads are usually latency sensitive. On the other hand, slow threads, or threads which are not really latency sensitive and do supporting logic for your application, need not have dedicated CPU cores. As mentioned earlier, isolating CPUs serves a different purpose. We do this all the time in our organization.
Can isolating cores decrease the scheduling latency (the time it takes
for a thread that was marked as runnable to get executed) for the
threads that are pinned to the isolated cores?
Since you are taking CPUs out of the kernel's list of CPUs, this will surely impact other threads and processes. You would therefore want to think carefully about which code is really latency sensitive and separate it from your non-latency-sensitive code.
Hope it helps.

What Use are Threads Outside of Parallel Problems on MultiCore Systems?

Threads make the design, implementation and debugging of a program significantly more difficult.
Yet many people seem to think that every task in a program that can be threaded should be threaded, even on a single core system.
I can understand threading something like an MPEG2 decoder that's going to run on a multicore cpu ( which I've done ), but what can justify the significant development costs threading entails when you're talking about a single core system or even a multicore system if your task doesn't gain significant performance from a parallel implementation?
Or more succinctly, what kinds of non-performance related problems justify threading?
Edit
Well I just ran across one instance that's not CPU limited but threads make a big difference:
TCP, HTTP and the Multi-Threading Sweet Spot
Multiple threads are pretty useful when trying to max out your bandwidth to another peer over a high latency network connection. Non-blocking I/O would use significantly less local CPU resources, but would be much more difficult to design and implement.
Performing a CPU intensive task without blocking the user interface, for example.
Any application in which you may be waiting around for a resource (for example, blocking I/O from network sockets or disk devices) can benefit from threading.
In that case the thread blocking on the slow operation can be put to sleep while other threads continue to run (including, under some operating systems, the GUI thread which, if the OS cannot contact it for a while, will offer the user the chance to destroy it, thinking it's deadlocked somehow).
So it's not just for multi-core machines at all.
An interesting example is a webserver - you need to be able to handle multiple incoming connections that have nothing to do with each other.
what kinds of non-performance related
problems justify threading?
Web applications are the classic example. Each user request is conceptually a new thread. Nothing to do with performance, it's just a natural fit for the design.
Blocking code is usually much simpler to write and easier to read (and therefore maintain) than non-blocking code. Yet, using blocking code limits you to a single execution path and also locks out things like user interface (mentioned) and other IO ports. Threading is an elegant solution in these cases.
Another case when multithreading is to be considered is when you have several near-synchronous IO channels that should be managed: using multiple threads (and usually a local message queue) allows for much clearer code.
Here are a couple of specific and simple scenarios where I have launched threads...
A long running report request by the user. When the report is submitted, it is placed in a queue to be processed by a separate thread. The user can then go on within the application and check back later to see the status of their report, they aren't left with a "Processing..." page or icon.
A thread that iterates cache storage, removing data that has expired or is no longer needed. The thread's job within the application is independent of any processing for a specific user, but part of the overall application run-time maintenance.
Although not specifically a threading scenario, logging within our web site is handed off to a parallel process, so the throughput of the web site isn't hindered by the time it takes to record log data.
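The first scenario above, a long-running report handed to a background thread, could look roughly like the following sketch. Everything here is hypothetical: the class ReportQueueExample, the buildReport and submitReport methods, and the 200 ms sleep that stands in for the slow work. A single-threaded executor plays the role of the queue plus worker thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReportQueueExample {
    // Hypothetical slow job; the sleep stands in for real report generation.
    static String buildReport(String name) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "report:" + name;
    }

    // Queue the report on the worker and return immediately with a handle to the result.
    static CompletableFuture<String> submitReport(ExecutorService worker, String name) {
        return CompletableFuture.supplyAsync(() -> buildReport(name), worker);
    }

    public static void main(String[] args) {
        // The executor's internal queue holds pending reports; one thread processes them in order.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> pending = submitReport(worker, "sales");
            // The caller is free to do other work and check back later.
            System.out.println("finished yet? " + pending.isDone());
            // join() blocks only when the result is actually needed.
            System.out.println(pending.join());
        } finally {
            worker.shutdown();
        }
    }
}
```

The caller never blocks on submission, which is the point of the scenario: the user can keep working and poll the status later.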
I agree that threading just for threading's sake isn't a good idea, and it can introduce problems within your application if it isn't done properly, but it is an extremely useful tool for solving some problems.
Whenever you need to call some external component (be it a database query, a third-party library, an operating system primitive, etc.) that only provides a synchronous/blocking interface, or where using the asynchronous interface is not worth the extra trouble and pain, and you also need some form of concurrency, e.g. serving multiple clients in a server or keeping the GUI responsive.
Well, how do you know if your app is going to run on a multi-core system or not?
Beyond that, there are a lot of processes that take up time but don't require the CPU, such as writing to a disk or networking. Who wants to push a button in a GUI and then have to sit there and wait for a network connection? Even on a single core machine, having a separate IO thread greatly improves user experience. You always at least want a separate thread for the UI.
Yet many people seem to think that
every task in a program that can be
threaded should be threaded, even on a
single core system.
"Many people"... Who?
Also, in my experience, many programs that should be multithreaded aren't (especially games; I have an i7 and yet most games still use only 1 of my cores), so I'm not sure what you're talking about. Programs like calc.exe are definitely not multithreaded (or, if they are, 1 thread does 99% of the work).
Performing a CPU intensive task
without blocking the user interface,
for example.
Yes, this is true but this is fairly easy to implement and it's not what the OP is referring to (since, in this case, 1 thread does almost all the work and you only need very few mutexes)

How to determine the best number of threads in Tomcat?

How does one determine the best number of maxSpare, minSpare and maxThreads, acceptCount etc in Tomcat? Are there existing best practices?
I do understand this needs to be based on hardware (e.g. per core) and can only be a basis for further performance testing and optimization on specific hardware.
The "how many threads" problem is quite a big and complicated issue, and cannot be answered with a simple rule of thumb.
Considering how many cores you have is useful for multi threaded applications that tend to consume a lot of CPU, like number crunching and the like. This is rarely the case for a web-app, which is usually hogged not by CPU but by other factors.
One common limitation is lag between you and other external systems, most notably your DB. Each time a request arrives, it will probably query the database a number of times, which means streaming some bytes over a JDBC connection, then waiting for those bytes to arrive at the database (even if it's on localhost there is still a small lag), then waiting for the DB to consider our request, then waiting for the database to process it (the database itself will be waiting for the disk to seek to a certain region), etc.
During all this time, the thread is idle, so another thread could easily use those CPU resources to do something useful. It's quite common to see 40% to 80% of the time spent waiting on DB responses.
The same also happens on the other side of the connection. While a thread of yours is writing its output to the browser, the speed of the CLIENT connection may keep your thread idle, waiting for the browser to ack that a certain packet has been received. (This was quite an issue some years ago. Recent kernels and JVMs use larger buffers to prevent your threads from idling that way; however, a reverse proxy in front of your web application server, even simply an httpd, can be really useful to keep people with bad internet connections from acting as DDOS attacks :) )
Considering these factors, the number of threads should usually be much higher than the number of cores you have. Even on a simple dual- or quad-core server, you should configure a few dozen threads at least.
So, what is limiting the number of threads you can configure?
First of all, each thread consumes (or used to consume) a lot of resources. Each thread has a stack, which consumes RAM. Moreover, each thread will allocate objects on the heap to do its work, again consuming RAM, and the act of switching between threads (context switching) is quite heavy for the JVM/OS kernel.
This makes it hard to run a server with thousands of threads "smoothly".
Given this picture, there are a number of techniques (mostly: try, fail, tune, try again) to determine more or less how many threads your app will need:
1) Try to understand where your threads spend time. There are a number of good tools, but even jvisualvm profiler can be a great tool, or a tracing aspect that produces summary timing stats. The more time they spend waiting for something external, the more you can spawn more threads to use CPU during idle times.
2) Determine your RAM usage. Given that the JVM will use a certain amount of memory (most notably the permgen space, usually up to a hundred megabytes, again jvisualvm will tell) independently of how many threads you use, try running with one thread and then with ten and then with one hundred, while stressing the app with jmeter or whatever, and see how heap usage will grow. That can pose a hard limit.
3) Try to determine a target. Each user request needs a thread to be handled. If your average response time is 200ms per "get" (it would be better not to consider loading of images, CSS and other static resources), then each thread is able to serve 4 to 5 pages per second. If each user is expected to "click" every 3 to 4 seconds (it depends: is it a browser game or a site with a lot of long texts?), then one thread will "serve 20 concurrent users" (about 5 pages per second times about 4 seconds between clicks), whatever that means. If in the peak hour you have 500 distinct users hitting your site in 1 minute, then you need enough threads to handle that.
4) Crash test the upper limit. Use jmeter, configure a server with a lot of threads on a spare virtual machine, and see how response time gets worse when you go over a certain limit. More than the hardware, the thread implementation of the underlying OS is important here, but no matter what, it will hit a point where the CPU spends more time trying to figure out which thread to run than actually running it, and that number is not so incredibly high.
5) Consider how threads will impact other components. Each thread will probably use one (or maybe more than one) connection to the database. Is the database able to handle 50/100/500 concurrent connections? Even if you are using a sharded cluster of nosql servers, does the server farm offer enough bandwidth between those machines? What else will run on the same machine as the web-app server? Apache httpd? Squid? The database itself? A local caching proxy to the database like mongos or memcached?
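The arithmetic in step 3 can be written down as a small back-of-the-envelope calculation. This is only a sketch, not a sizing rule: the class and method names are hypothetical, and it assumes that all 500 users are active at the same time, which is just one possible reading of the peak-hour figure.

```java
class ThreadEstimate {
    // Users one thread can serve: requests it completes per second
    // multiplied by the seconds each user waits between clicks.
    static double usersPerThread(double avgResponseMs, double secondsBetweenClicks) {
        double requestsPerSecondPerThread = 1000.0 / avgResponseMs;
        return requestsPerSecondPerThread * secondsBetweenClicks;
    }

    // Threads needed for a given number of simultaneously active users (rounded up).
    static int threadsNeeded(int concurrentUsers, double avgResponseMs, double secondsBetweenClicks) {
        return (int) Math.ceil(concurrentUsers / usersPerThread(avgResponseMs, secondsBetweenClicks));
    }

    public static void main(String[] args) {
        // 200 ms per request and one click every 4 seconds, as in step 3.
        System.out.println(usersPerThread(200, 4));     // prints 20.0
        // Assumed load: 500 users active at once.
        System.out.println(threadsNeeded(500, 200, 4)); // prints 25
    }
}
```

The result is a starting point for the load tests in step 4 rather than a final value, since real response times vary with load.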
I've seen systems in production with only 4 threads + 4 spare threads, because the work done by that server was merely to resize images, so it was nearly 100% CPU intensive, and others configured on more or less the same hardware with a couple of hundred threads, because the webapp was doing a lot of SOAP calls to external systems and spending most of its time waiting for answers.
Once you've determined the approximate minimum and maximum number of threads optimal for your webapp, I usually configure it this way:
1) Based on the constraints on RAM, other external resources and experiments on context switching, there is an absolute maximum which must not be reached. So, use maxThreads to limit it to about half or 3/4 of that number.
2) If the application is reasonably fast (for example, it exposes REST web services that usually send a response in a few milliseconds), then you can configure a large acceptCount, up to the same number as maxThreads. If you have a load balancer in front of your web application server, set a small acceptCount; it's better for the load balancer to see unaccepted requests and switch to another server than to put users on hold on an already busy one.
3) Since starting a thread is (still) considered a heavy operation, use minSpareThreads to have a few threads ready when peak hours arrive. This again depends on the kind of load you are expecting. It's even reasonable to have minSpareThreads, maxSpareThreads and maxThreads set up so that an exact number of threads is always ready, never reclaimed, and performance is predictable. If you are running Tomcat on a dedicated machine, you can raise minSpareThreads and maxSpareThreads without any danger of starving other processes; otherwise tune them down, because threads are resources shared with the rest of the processes running on most OSes.

Resources