Requirement:
I have to design a micro service which performs search query in a sql db multiple times(say 7 calls) along with multiple third party http calls(say 8 calls) in sequential and interleaved manner to complete an order, by saying sequential I mean before next call of DB or third party previous call to DB or third party must be completed as the result of these calls will be used in further third party or search operations in DB.
Resources:
I) CPU: 4 cores(per instance)
II) RAM: 4 GB(per instance)
III) It can be auto scaled upto at max of 4 pods or instances.
IV) Deployment: Open Shift (Own cloud architecture)
V) Framework: Spring Boot
My Solution:
I've created a fixed thread pool of 5 threads(Size of blocking queue is not configured, also there are another 20 fixed pool threads running apart from these 5 threads for creating orders of multiple types i.e. in total there are 25 threads running per instance) using thread pool executor of Java. So when multiple requests are sent to this micro service I keep submitting the job and the JVM by using some scheduling algorithms schedules these jobs and complete the jobs.
Problem:
I'm not able to achieve the expected through put, using above approach the micro service is able to achieve only 3 to 5 tps or orders per second which is very low. Sometimes it also happens that tomcat gets choked and we have to restart services to bring back the system in responsive situation.
Observation:
I've observed that even when orders are processed very slowly by the thread pool executor if I call orders api through jmeter at the same time when things are going slow, these kind of requests which are directly landing on the controller layer are processed faster than the request getting processed by thread pool executor.
My Questions
I) What changes I should make at the architectural level to make through put upto 50
to 100 tps.
II) What changes should be done so that even if traffic on this service increases in
future then the service can either be auto scaled or justification to increase
hardware resources can be given easily.
III) Is this the way tech giants(Amazon, Paypal) solve scaling problems like these
using multithreading to optimise performance of their code.
You can assume that third parties are responding as expected and query optimisation is already done with proper indexing.
Tomcat already has a very robust thread pooling algorithm. Making your own thread pool is likely causing deadlocks and slowing things down. The java threading model is non-trivial, and you likely are causing more problems than you are solving. This is further evidenced by the fact that you are getting better performance relying on Tomcat's scheduling when you hit the controller directly.
High-volume services generally solve problems like this by scaling wide, keeping things as stateless as possible. This allows you to allocate many small servers to solve the solution much more efficiently than a single large server.
Debugging multi-threaded executions is not for the faint of heart. I would highly recommend you simplify things as much as possible. The most important bit about threading is to avoid mutable state. Mutable state is the bane of shared executions, moving memory around and forcing reads through to main memory can be very expensive, often costing far more than savings due to threading.
Finally, the way you are describing your application, it's all I/O bound anyway. Why are you bothering with threading when it's likely I/O that's slowing it down?
When reading Really? iCMS? Really? from this blog, one statement caught my attention:
The concurrent phases are typically long (think seconds and not milliseconds).
If CMS hogged the single hardware thread for several
seconds, the application would not execute during those
several seconds and would in
effect experience a stop-the-world pause.
Which doesn't make sense to me on preemptive operating systems. My assumption is that CMS has one or more collector threads running. Another hypothesis would be that instead of having CMS having dedicated GC threads executing the garbage collection we are talking about making application threads interleave their logic with GC logic (time-multiplexing).
Is this the case? What am I getting wrong here?
Thanks
In HotSpot JVM, the Garbage Collector (including CMS and i-CMS) uses dedicated worker threads.
CMS threads run concurrently with application threads, but they have higher priority: NearMaxPriority. On a single core machine, CMS cycle could indeed make application threads starving. The idea of CMS incremental mode was to make GC voluntarily yield CPU to the application without relying on OS scheduler.
From HotSpot GC Tuning Guide:
Normally, the CMS collector uses one or more processors during the
entire concurrent tracing phase, without voluntarily relinquishing
them. Similarly, one processor is used for the entire concurrent sweep
phase, again without relinquishing it. This overhead can be too much
of a disruption for applications with response time constraints that
might otherwise have used the processing cores, particularly when run
on systems with just one or two processors. Incremental mode solves
this problem by breaking up the concurrent phases into short bursts of
activity, which are scheduled to occur midway between minor pauses.
Note that CMS incremental mode was deprecated long ago in 2012.
Hazelcast in Embedded mode is increasing Context Switching by approximately 46%, Is this expected?
Is there anyway to control or configure the same?
Except the 2 health monitor threads, rest are Hazelcast internal threads and it is not recommended to change anything there unless they are having a drastically negative affect on performance.
What is shown in the attached picture is the total time taken by the threads. 8 threads is not a lot of context switching. You will need to provide more information on how this impacts your application.
If you do not want health monitoring and diagnostics, you can disable it.
Check out https://docs.hazelcast.org/docs/3.12.5/manual/html-single/index.html#threading-model for info on other threads.
I was just playing with threads and see how much CPU they consume. I have checked in two scenarios.
In first scenario I created four threads and started them with infinite loop. Soon those threads consumed my all 4 CPU cores. After checking performance monitor in task manager I found CPU consumption is 100%.
In second scenario when I tried it with web application and in rest controller(using tomcat server 8.5 version) I have run infinite loop. So that if I request url 4 times with browser(with different tabs obviously). My CPU consumption should be 100%. I couldn't see 100% CPU consumption.
Why is there difference?
My Second question is: how would I tune the server thread pool. I have to use more than 4 threads because it might be possible few of them are waiting for IO operation. I am using hibernate as ORM that maintains connection pooling. So how many threads I should use in thread pool as well as connection pool. How would I decide?
We can't answer the first part of your question without seeing your code. But I suspect the problem is in the way that you have implemented the threads in the webapp case. (Because what you report shouldn't happen ...)
The answer to the second part is "trial and error". More specifically:
Make the pool sizes tunable parameters
Develop a benchmark that is representative of your expected system load.
Run benchmark with different settings, measure performance and graph results.
Based on the graph (and other criteria) pick a settings that are the best compromise between performance and resource (e.g. memory) utilization.
Thread pools and connection pools are different, and have different resource implications. The first is (largely) about memory; i.e. thread stacks and temporary objects used by the threads while they are active. The second is (largely) about resources associated with connections (active or idle).
How does one determine the best number of maxSpare, minSpare and maxThreads, acceptCount etc in Tomcat? Are there existing best practices?
I do understand this needs to be based on hardware (e.g. per core) and can only be a basis for further performance testing and optimization on specific hardware.
the "how many threads problem" is quite a big and complicated issue, and cannot be answered with a simple rule of thumb.
Considering how many cores you have is useful for multi threaded applications that tend to consume a lot of CPU, like number crunching and the like. This is rarely the case for a web-app, which is usually hogged not by CPU but by other factors.
One common limitation is lag between you and other external systems, most notably your DB. Each time a request arrive, it will probably query the database a number of times, which means streaming some bytes over a JDBC connection, then waiting for those bytes to arrive to the database (even is it's on localhost there is still a small lag), then waiting for the DB to consider our request, then wait for the database to process it (the database itself will be waiting for the disk to seek to a certain region) etc...
During all this time, the thread is idle, so another thread could easily use that CPU resources to do something useful. It's quite common to see 40% to 80% of time spent in waiting on DB response.
The same happens also on the other side of the connection. While a thread of yours is writing its output to the browser, the speed of the CLIENT connection may keep your thread idle waiting for the browser to ack that a certain packet has been received. (This was quite an issue some years ago, recent kernels and JVMs use larger buffers to prevent your threads for idling that way, however a reverse proxy in front of you web application server, even simply an httpd, can be really useful to avoid people with bad internet connection to act as DDOS attacks :) )
Considering these factors, the number of threads should be usually much more than the cores you have. Even on a simple dual or quad core server, you should configure a few dozens threads at least.
So, what is limiting the number of threads you can configure?
First of all, each thread (used to) consume a lot of resources. Each thread have a stack, which consumes RAM. Moreover, each Thread will actually allocate stuff on the heap to do its work, consuming again RAM, and the act of switching between threads (context switching) is quite heavy for the JVM/OS kernel.
This makes it hard to run a server with thousands of threads "smoothly".
Given this picture, there are a number of techniques (mostly: try, fail, tune, try again) to determine more or less how many threads you app will need:
1) Try to understand where your threads spend time. There are a number of good tools, but even jvisualvm profiler can be a great tool, or a tracing aspect that produces summary timing stats. The more time they spend waiting for something external, the more you can spawn more threads to use CPU during idle times.
2) Determine your RAM usage. Given that the JVM will use a certain amount of memory (most notably the permgen space, usually up to a hundred megabytes, again jvisualvm will tell) independently of how many threads you use, try running with one thread and then with ten and then with one hundred, while stressing the app with jmeter or whatever, and see how heap usage will grow. That can pose a hard limit.
3) Try to determine a target. Each user request needs a thread to be handled. If your average response time is 200ms per "get" (it would be better not to consider loading of images, CSS and other static resources), then each thread is able to serve 4/5 pages per second. If each user is expected to "click" each 3/4 seconds (depends, is it a browser game or a site with a lot of long texts?), then one thread will "serve 20 concurrent users", whatever it means. If in the peak hour you have 500 single users hitting your site in 1 minute, then you need enough threads to handle that.
4) Crash test the high limit. Use jmeter, configure a server with a lot of threads on a spare virtual machine, and see how response time will get worse when you go over a certain limit. More than hardware, the thread implementation of the underlying OS is important here, but no matter what it will hit a point where the CPU spend more time trying to figure out which thread to run than actually running it, and that numer is not so incredibly high.
5) Consider how threads will impact other components. Each thread will probably use one (or maybe more than one) connection to the database, is the database able to handle 50/100/500 concurrent connections? Even if you are using a sharded cluster of nosql servers, does the server farm offer enough bandwidth between those machines? What else will run on the same machine with the web-app server? Anache httpd? squid? the database itself? a local caching proxy to the database like mongos or memcached?
I've seen systems in production with only 4 threads + 4 spare threads, cause the work done by that server was merely to resize images, so it was nearly 100% CPU intensive, and others configured on more or less the same hardware with a couple of hundreds threads, cause the webapp was doing a lot of SOAP calls to external systems and spending most of its time waiting for answers.
Oce you've determined the approx. minimum and maximum threads optimal for you webapp, then I usually configure it this way :
1) Based on the constraints on RAM, other external resources and experiments on context switching, there is an absolute maximum which must not be reached. So, use maxThreads to limit it to about half or 3/4 of that number.
2) If the application is reasonably fast (for example, it exposes REST web services that usually send a response is a few milliseconds), then you can configure a large acceptCount, up to the same number of maxThreads. If you have a load balancer in front of your web application server, set a small acceptCount, it's better for the load balancer to see unaccepted requests and switch to another server than putting users on hold on an already busy one.
3) Since starting a thread is (still) considered a heavy operation, use minSpareThreads to have a few threads ready when peak hours arrive. This again depends on the kind of load you are expecting. It's even reasonable to have minSpareThreads, maxSpareThreads and maxThreads setup so that an exact number of threads is always ready, never reclaimed, and performances are predictable. If you are running tomcat on a dedicated machine, you can raise minSpareThreads and maxSpareThreads without any danger of hogging other processes, otherwise tune them down cause threads are resources shared with the rest of the processes running on most OS.