We are running a web API hosted in IIS 10 on an 8-core machine with 16 GB of memory running Windows 10, and we are driving a load of roughly 100 to 200 requests per second at the server through JMeter.
Individual transactions take less than 500 milliseconds. When we first apply the load, the IIS thread count grows to around 150-160 (monitored through Resource Monitor and Performance Monitor) and throughput rises to 22-24 transactions per second. Beyond that point neither throughput nor the thread count grows, even though CPU usage is below 40 percent and plenty of physical memory is still available at peak. Resource Monitor does not show any choking at the network or I/O level.
The web API is making calls to the Oracle database (3-4 select calls and 2-3 inserts/updates).
We fail to understand what is stopping IIS from growing its thread pool further to process more requests in parallel while all the resources, including processing power, memory and network, are available.
We have added many performance counters as well, and there is no queue build-up (probably because JMeter works in synchronous mode).
We have also tried setting the minimum and maximum thread counts through machine.config as well as through the ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads APIs, but we observed no difference; those settings do not seem to take effect.
It is important to mention that we are using synchronous calls/operations (no async and await). Someone has advised us to convert all our blocking I/O calls, e.g. database calls, to asynchronous mode to achieve more throughput, but my understanding is that if the thread count can't grow beyond this level, then making async calls might not help, or might even reduce throughput. Since our code base is huge, that would be very costly in terms of time and effort, and we don't want to invest in it until we are sure that it would really help. If someone has anything to share on these two problems, please do.
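As a sanity check on the figures above, Little's Law (throughput = requests in flight / average time per request) can be applied to them. This is only a rough sketch: it assumes all ~150 threads are actually busy serving requests, which the counters in the question do not confirm, and the function names are made up for illustration.

```python
def expected_throughput(requests_in_flight: float, avg_latency_s: float) -> float:
    """Little's Law: throughput = requests in flight / average time per request."""
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    return requests_in_flight / avg_latency_s


def implied_latency(requests_in_flight: float, throughput_per_s: float) -> float:
    """Little's Law rearranged: average time per request = in flight / throughput."""
    if throughput_per_s <= 0:
        raise ValueError("throughput_per_s must be positive")
    return requests_in_flight / throughput_per_s


# Figures taken from the question; "all threads busy" is an assumption.
busy_threads = 150
print(expected_throughput(busy_threads, 0.5))  # 300.0 requests/s if each takes 0.5 s
print(implied_latency(busy_threads, 24))       # 6.25 s per request at 24 requests/s
```

If the numbers are read this way, 150 busy threads at under 500 ms each would give about 300 requests per second, not 22-24. So either most of those threads are not serving requests, or the latency under load is far above 500 ms (for example, time spent waiting on the database or a connection pool). With a closed-loop client such as JMeter, the number of JMeter threads also caps how many requests can be in flight at once.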
Below is a screenshot of Performance Monitor.
So, I have a Node.js process which, when scaled, consumes around 5-8 GB of RAM. It is running within a Docker container and is launched with the argument --max-old-space-size=12192 to raise the Node process memory limit.
The memory consumption is OK, since I am trying to use a dedicated server (AMD EPYC CPU, 64 GB of memory) instead of scaling horizontally with AWS or another cloud provider, because it is 10x cheaper if I can make it work on a dedicated server (most of the expense with AWS/Google Cloud goes on network traffic, while the VDS has unlimited traffic. The network side is already optimised by using GraphQL and minimising the number of requests). The process itself processes a huge amount of data in memory, in a multithreaded fashion. No further significant optimisation is possible in the process code itself.
Once the process memory consumption reaches 3 GB+, it slows down significantly. Docker is not limiting the container's resources. The server itself is running at 5-10% load in terms of memory and CPU. The drive is an SSD and drive load is low (little I/O on the server side).
I guess rewriting the app in Go, for example, might improve it significantly, but that is really a lot of work.
Can anything be done on the server setup / Node.js app side to prevent the slowdown?
Thanks!
Solved by horizontal scaling within a single VDS with docker.
As jfriend00 noticed, the problem may lie with garbage collection; I personally have no other guess.
Hey guys, I am having a big problem and I need some advice.
I have a dedicated server with the following specifications:
Atom C2750 8/8t 2,4 / 2,6 GHz
16 GB RAM DDR3 1600MHz
12TB
500Mbps Bandwidth
1Gbps Network Burst
I am running a website using Node.js where users can download high-volume files.
The website has grown rapidly and I now have 10K users per day and an average of 1K concurrent users (downloads).
The problem is that the download speed on the client side keeps getting lower and lower, so I have throttled downloads to 800Kb/s. It helped a bit, but the problem remains. What should I do?
Thanks
You have to figure out where your worst bottleneck is. You will have to design some measurements and tests and maybe some calculations to help determine where the bottleneck is.
Here are some possibilities:
Total server bandwidth over your server ethernet connection to the Internet. If you have 1K users all trying to download something and you have 500Mbps total bandwidth, then you're only going to get 0.5Mbps or 500Kbps per user. At that point, you need to either shrink the data, reduce the number of users or increase your server bandwidth.
Server CPU. This should be easy to detect. Check your CPU utilization. If your node.js process is using 100% of a CPU, then you are CPU bound and you either need a faster computer or, if you have a multi-core processor, you can cluster your server on the same host to get more cores working for you. If you don't have a multi-core processor, then get one or cluster across multiple servers (the idea is that you need more CPUs). I have no idea whether you are CPU bound (I suspect you're more likely bandwidth bound), but this Atom C2750 has 8 cores, so it would be good for clustering, although each core is not particularly fast compared to other Intel CPUs.
Network card. It's possible that your network card could be holding you back and not fully saturating your bandwidth. For example, if you only had a 100Mbps network connection to your server, then that's the max bandwidth you can use. If you think you should have a 1Gbps network connection to your server, then you need to make sure you actually are getting that fast a link.
FYI, the 1Gbps Network Burst probably doesn't help you much if you have lots of users downloading stuff over a longer period of time. That is most useful for a sudden and short peak of activity, not for a continuous high load.
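To put numbers on the bandwidth possibility, here is a small sketch of the per-user arithmetic. It assumes the link is shared evenly, uses decimal units (1 Mbps = 1000 Kbps), and ignores protocol overhead; the function names are just for illustration. Note that "800Kb/s" in the question is ambiguous, so both the kilobit and the kilobyte reading are shown.

```python
def per_user_kbps(link_mbps: float, concurrent_users: int) -> float:
    """Even share of the link per user, in kilobits per second."""
    return link_mbps * 1000 / concurrent_users


def max_users_at_rate(link_mbps: int, rate_kbps: int) -> int:
    """How many users the link can serve if each one downloads at rate_kbps."""
    return link_mbps * 1000 // rate_kbps


# 500 Mbps shared by 1K concurrent downloads
print(per_user_kbps(500, 1000))         # 500.0 Kbps each

# Throttle of 800 per user, read as kilobits per second
print(max_users_at_rate(500, 800))      # 625 users saturate the link

# Same throttle read as kilobytes per second (800 KB/s = 6400 Kbps)
print(max_users_at_rate(500, 800 * 8))  # 78 users saturate the link
```

Under either reading, a throttle of 800 per user is above the roughly 500 Kbps that 1K simultaneous users can each get from a 500 Mbps link, which would be consistent with the throttle helping only a little.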
I have 6 "identical" x64 IIS6 Servers containing the same content.
Occasionally on one or more servers we observe slow running: it can take around 20 seconds to load a simple HTML page.
When this happens we can see an increase in threads to 300 (normally 80) and the non paged pool 1000k (normally 200k).
Can't see much else that is different, i.e. Other Processes, Disk I/O.
The network counters appear to be shifting less data, but I'd say this was symptomatic.
Does anyone know what would cause a high non-paged pool? This is an x64 server, so maybe I'm looking in the wrong place.
How does one determine the best number of maxSpare, minSpare and maxThreads, acceptCount etc in Tomcat? Are there existing best practices?
I do understand this needs to be based on hardware (e.g. per core) and can only be a basis for further performance testing and optimization on specific hardware.
The "how many threads" problem is quite a big and complicated issue, and cannot be answered with a simple rule of thumb.
Considering how many cores you have is useful for multi threaded applications that tend to consume a lot of CPU, like number crunching and the like. This is rarely the case for a web-app, which is usually hogged not by CPU but by other factors.
One common limitation is lag between you and other external systems, most notably your DB. Each time a request arrives, it will probably query the database a number of times, which means streaming some bytes over a JDBC connection, then waiting for those bytes to arrive at the database (even if it's on localhost there is still a small lag), then waiting for the DB to consider our request, then waiting for the database to process it (the database itself will be waiting for the disk to seek to a certain region) etc...
During all this time, the thread is idle, so another thread could easily use those CPU resources to do something useful. It's quite common to see 40% to 80% of time spent waiting on the DB response.
The same also happens on the other side of the connection. While a thread of yours is writing its output to the browser, the speed of the CLIENT connection may keep your thread idle waiting for the browser to ack that a certain packet has been received. (This was quite an issue some years ago; recent kernels and JVMs use larger buffers to prevent your threads from idling that way. However, a reverse proxy in front of your web application server, even simply an httpd, can be really useful to stop people with bad internet connections from acting as DDoS attacks :) )
Considering these factors, the number of threads should usually be much higher than the number of cores you have. Even on a simple dual or quad core server, you should configure a few dozen threads at least.
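One common back-of-the-envelope heuristic for this (not something measured here, just a rule of thumb) is: if a thread spends a fraction w of its time waiting, you need roughly cores / (1 - w) threads before the CPU can be kept busy. A minimal sketch, with a hypothetical function name, using the 40% to 80% waiting range mentioned above:

```python
import math


def threads_to_keep_cpu_busy(cores: int, wait_percent: int) -> int:
    """Rough lower bound on threads needed to keep all cores busy.

    wait_percent is the share of each request's time spent blocked on
    something external (DB, network), as an integer percentage.
    """
    if not 0 <= wait_percent < 100:
        raise ValueError("wait_percent must be in the range 0..99")
    return math.ceil(cores * 100 / (100 - wait_percent))


print(threads_to_keep_cpu_busy(8, 40))  # 14
print(threads_to_keep_cpu_busy(8, 80))  # 40
print(threads_to_keep_cpu_busy(4, 90))  # 40
```

This only says when the CPU stops being the idle resource. It ignores RAM, context switching and the capacity of the systems being waited on, so treat the result as a starting point for testing rather than an answer.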
So, what is limiting the number of threads you can configure?
First of all, each thread consumes (or at least used to consume) a lot of resources. Each thread has a stack, which consumes RAM. Moreover, each thread will allocate objects on the heap to do its work, again consuming RAM, and the act of switching between threads (context switching) is quite heavy for the JVM/OS kernel.
This makes it hard to run a server with thousands of threads "smoothly".
Given this picture, there are a number of techniques (mostly: try, fail, tune, try again) to determine more or less how many threads your app will need:
1) Try to understand where your threads spend time. There are a number of good tools, but even jvisualvm profiler can be a great tool, or a tracing aspect that produces summary timing stats. The more time they spend waiting for something external, the more you can spawn more threads to use CPU during idle times.
2) Determine your RAM usage. Given that the JVM will use a certain amount of memory (most notably the permgen space, usually up to a hundred megabytes, again jvisualvm will tell) independently of how many threads you use, try running with one thread and then with ten and then with one hundred, while stressing the app with jmeter or whatever, and see how heap usage will grow. That can pose a hard limit.
3) Try to determine a target. Each user request needs a thread to be handled. If your average response time is 200ms per "get" (it would be better not to consider loading of images, CSS and other static resources), then each thread is able to serve 4 to 5 pages per second. If each user is expected to "click" every 3 to 4 seconds (it depends: is it a browser game or a site with a lot of long texts?), then one thread will "serve 20 concurrent users", whatever that means. If in the peak hour you have 500 single users hitting your site in 1 minute, then you need enough threads to handle that.
4) Crash test the high limit. Use jmeter, configure a server with a lot of threads on a spare virtual machine, and see how response time gets worse when you go over a certain limit. More than the hardware, the thread implementation of the underlying OS is important here, but no matter what, it will hit a point where the CPU spends more time trying to figure out which thread to run than actually running it, and that number is not so incredibly high.
5) Consider how threads will impact other components. Each thread will probably use one (or maybe more than one) connection to the database. Is the database able to handle 50/100/500 concurrent connections? Even if you are using a sharded cluster of nosql servers, does the server farm offer enough bandwidth between those machines? What else will run on the same machine as the web-app server? Apache httpd? squid? The database itself? A local caching proxy to the database like mongos or memcached?
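The arithmetic in step 3 can be sketched as follows. This follows the same simplification as the text (a thread is busy only for the response time, and each user sends one request per click interval), and it reads "500 users" as 500 users active at the same time, which is an assumption. The function names are illustrative only.

```python
import math


def users_per_thread(response_ms: int, click_interval_ms: int) -> float:
    """How many users one thread can serve if each clicks once per interval."""
    return click_interval_ms / response_ms


def threads_needed(active_users: int, response_ms: int, click_interval_ms: int) -> int:
    """Threads required so every active user's click finds a free thread."""
    return math.ceil(active_users * response_ms / click_interval_ms)


# 200 ms per page, one click every 4 s: one thread covers 20 users
print(users_per_thread(200, 4000))     # 20.0

# 500 active users at the same pace
print(threads_needed(500, 200, 4000))  # 25

# Same users clicking every 3 s instead
print(threads_needed(500, 200, 3000))  # 34
```

These are averages, so real traffic with bursts will need some headroom on top, which is what the crash test in step 4 is meant to reveal.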
I've seen systems in production with only 4 threads + 4 spare threads, because the work done by that server was merely to resize images, so it was nearly 100% CPU intensive, and others configured on more or less the same hardware with a couple of hundred threads, because the webapp was doing a lot of SOAP calls to external systems and spending most of its time waiting for answers.
Once you've determined the approximate minimum and maximum number of threads that is optimal for your webapp, I usually configure it this way:
1) Based on the constraints on RAM, other external resources and experiments on context switching, there is an absolute maximum which must not be reached. So, use maxThreads to limit it to about half or 3/4 of that number.
2) If the application is reasonably fast (for example, it exposes REST web services that usually send a response in a few milliseconds), then you can configure a large acceptCount, up to the same number as maxThreads. If you have a load balancer in front of your web application server, set a small acceptCount: it's better for the load balancer to see unaccepted requests and switch to another server than to put users on hold on an already busy one.
3) Since starting a thread is (still) considered a heavy operation, use minSpareThreads to have a few threads ready when peak hours arrive. This again depends on the kind of load you are expecting. It's even reasonable to set minSpareThreads, maxSpareThreads and maxThreads so that an exact number of threads is always ready and never reclaimed, and performance is predictable. If you are running tomcat on a dedicated machine, you can raise minSpareThreads and maxSpareThreads without any danger of starving other processes; otherwise tune them down, because threads are a resource shared with the rest of the processes running on most OSes.
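As a rough illustration of how these three rules combine, here is a small helper that turns a measured "absolute maximum" into starting values. The function, the 25% spare share and the "small" acceptCount of 10 are all made up for the example; only the half-to-3/4 headroom and the acceptCount logic come from the rules above.

```python
def suggest_connector_settings(
    absolute_max_threads: int,
    behind_load_balancer: bool,
    headroom: float = 0.5,         # rule 1: use half to 3/4 of the absolute maximum
    spare_fraction: float = 0.25,  # assumption: share of maxThreads kept warm
    small_accept_count: int = 10,  # assumption: what "small" means behind a balancer
) -> dict:
    """Derive starting values for maxThreads, acceptCount and minSpareThreads."""
    if not 0.5 <= headroom <= 0.75:
        raise ValueError("headroom should be between 0.5 and 0.75")
    max_threads = int(absolute_max_threads * headroom)
    # Rule 2: a large backlog for a fast app, a small one behind a load balancer.
    accept_count = small_accept_count if behind_load_balancer else max_threads
    # Rule 3: keep some threads ready before peak hours arrive.
    min_spare = max(1, int(max_threads * spare_fraction))
    return {
        "maxThreads": max_threads,
        "acceptCount": accept_count,
        "minSpareThreads": min_spare,
    }


print(suggest_connector_settings(400, behind_load_balancer=False))
# {'maxThreads': 200, 'acceptCount': 200, 'minSpareThreads': 50}
```

The output is only a first guess to feed into the next round of load testing, not a recommended configuration.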