.NET 2.0 ASPX app / IIS 6 creating a silly number of threads in the w3wp.exe process app pool.
The app has been isolated to its own app pool with the following settings:
RECYCLING
recycle worker processes (in minutes): 870
recycle worker process (no of requests): (not ticked)
recycle worker processes at the following times: 00:00
max virtual memory: (not ticked)
max used memory (in mb): 1000mb (1gb)
PERFORMANCE
shutdown worker processes after being idle for (time in mins): 20
limit the kernel request queue (number of requests): 1000
enable cpu monitoring (%): 85
refresh cpu usage numbers (in mins): 5
action performed when cpu usage exceeds maximum cpu usage: NO ACTION (keeps sessions)
max number of worker processes: 1
HEALTH
enable pinging (checked)
ping worker process every (seconds) : 30
enable rapid fail protection (checked)
failures: 5
time period (in mins):5
start time limit - worker process must startup within (seconds): 90
shutdown time limit - worker process must shutdown within (seconds): 90
Normal running would see the w3wp.exe process utilise 300MB of RAM and 50 threads. When my problem occurs the thread count slowly increases to 10,000 and RAM to 1GB before the threads are knocked back to 0. The w3wp.exe process is NOT shut down and my users are not logged out (crucially), i.e. they keep their session and don't have to log back in, although the standard 50 threads are killed in amongst the 10,000 rogue threads.
1) Can an expert offer any pros/cons on the above app pool settings?
2) The "max used memory" setting appears to be doing the trick to automatically handle this issue (by killing the threads while keeping the session alive), but can someone explain why? I take it threads are unrelated to the session.
The app uses server based sessions but we store a local cookie for authentication.
Threads
10k threads is insanely high, and your threads are spending more time hopping on and off the processor than doing actual work, a.k.a. thrashing.
EDIT: I'm assuming here that it's a .NET web application.
Does your application use a ThreadPool or BackgroundWorkers? It seems like you'd have to be using some mechanism other than IIS's standard thread entourage (which is only around 4 per processor) to reach 10k threads.
Memory
Each thread requires memory for bookkeeping in addition to the memory it uses for its work, so by sheer volume of threads you are probably reaching the 1 GB limit.
Session (I will survive!)
The application is probably set up to store session state in persistent storage or in the session state service. That being the case, the worker process can safely be recycled without losing user state. If session state were configured (in the Web.config) as In-Proc, it would be lost when the worker process recycled.
Worker process recycling
One other thing of note: before a worker process dies, another worker process is set up and started to take its place. It's somewhere in this handover that you're probably seeing the w3wp.exe process (either old or new) with 0 threads.
BackgroundWorkers are like rabbits
If your threads are performing work that lasts longer than 1 second (really 1/2 second), don't use BackgroundWorkers. Unless you change the ThreadPool's max threads (which is NOT recommended, as it can break deeper functionality within .NET), there is no hard (enough) limit on the number of BackgroundWorkers that can run concurrently. It would be better to use a producer-consumer queue model in that case, as sketched below.
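As a rough illustration of the producer-consumer idea (the pattern is language-agnostic, so the sketch below is in Java; in newer .NET versions, BlockingCollection<T> plays a similar role to the bounded queue, and the queue size, worker count and task bodies here are made up for the example):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumerSketch {

        // Bounded queue: producers block once 100 items are pending, so the number
        // of in-flight work items (and threads) can never explode.
        private static final BlockingQueue<Runnable> QUEUE = new ArrayBlockingQueue<>(100);
        private static final int WORKERS = 4; // small, fixed number of consumer threads

        public static void main(String[] args) throws InterruptedException {
            // Consumers: a fixed set of threads that drain the queue forever.
            for (int i = 0; i < WORKERS; i++) {
                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            QUEUE.take().run(); // blocks until work is available
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                worker.setDaemon(true);
                worker.start();
            }

            // Producer: enqueue 1,000 long-running tasks without ever creating 1,000 threads.
            for (int i = 0; i < 1_000; i++) {
                int id = i;
                QUEUE.put(() -> System.out.println("processed item " + id)); // blocks when the queue is full
            }

            Thread.sleep(1_000); // sketch only: give the workers a moment to drain the remaining queue
        }
    }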
Check out this site. It's an awesome resource on concurrent programming with lots of models and examples.
Related
I have a web server that is run by a fleet of 50 hosts. One request to the server can result in 10 subsequent network calls, all of which can be done in parallel.
My idea is to create an executor service thread pool with 10 threads, so that each host can make the network calls in parallel.
However, there seems to be a problem with this. What if I get 1000 requests at once? And suppose a single host is tasked with 20 requests at the same time? Does this mean that the host will only have 10 threads available, and thus all 20 requests will compete with each other for the 10 threads? This seems WORSE than without thread pooling, in which case each request lives on its own thread and there are effectively 20 threads running at once.
Thus, it appears as if executor service is very dangerous in this situation, and has potential to actually make my application slower when in spiky volume. Am I understanding the situation correctly? If so, what is a way to solve it? Should I have each request CREATE the 10 threads manually, rather than attempting to share from a pool and introduce that entanglement between different requests?
You seem to be conflating thread pooling with easier thread creation, but its primary aim is to reduce the thread requirements of an application, because threads get reused. So if the first request ends up starting 10 threads, some of them may be available for reuse when the second request comes in, so the second request may not end up creating another 10 threads but maybe 5, and the third request may not create any new threads at all. Based on this, at any one time your service may need a thread pool of only, say, 15 threads. The advantage of the thread pool here is that those 15 threads are created shortly after requests start coming in and are reused until the pool dies, so your application, its runtime, and the underlying OS do not waste time creating and destroying threads, allocating stacks for them, and so on.
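A minimal sketch of that reuse in Java, assuming a single pool is shared across requests on each host; the pool size of 15, the fetch() placeholder and the class name FanOutService are illustrative, not from the question:

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class FanOutService {

        // One shared pool for the whole host: threads are created once and reused
        // by every incoming request, instead of 10 new threads per request.
        private static final ExecutorService POOL = Executors.newFixedThreadPool(15);

        // Placeholder for one of the 10 downstream network calls (illustrative only).
        private static String fetch(int callIndex) {
            return "result-" + callIndex;
        }

        // Handle one request by fanning out its 10 downstream calls onto the shared pool.
        public static List<String> handleRequest() throws Exception {
            List<Callable<String>> calls = IntStream.range(0, 10)
                    .mapToObj(i -> (Callable<String>) () -> fetch(i))
                    .collect(Collectors.toList());

            List<Future<String>> futures = POOL.invokeAll(calls); // waits for all 10 to finish
            return futures.stream().map(f -> {
                try {
                    return f.get();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }).collect(Collectors.toList());
        }

        public static void main(String[] args) throws Exception {
            System.out.println(handleRequest());
            POOL.shutdown();
        }
    }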
I am using Spring Boot and Java 8
Calling the API with 1 employee id takes 1 millisecond. So if I am calling the API 100,000 times with different employee ids,
why is it taking hours and not 100,000 × 1 ms, i.e. just 100 seconds (roughly 1.7 minutes)?
Spring Boot uses a thread pool to manage the workload of incoming requests; the maximum number of worker threads is set to 200 by default.
Though this is a good number, the number of threads that can actually work in parallel depends on CPU time slicing and the availability of backend resources. Assuming the backend resources are unlimited, throughput depends solely on the CPU time available to each thread; in a multi-core CPU, that is the number of cores available to serve the embedded Tomcat container.
As Spring is a blocking framework, for a normal quad-core single-CPU environment (assuming all 4 cores are able to serve), this number is 4. That means a maximum of 4 requests can be served in parallel; the rest are queued and taken up when the next CPU slice becomes available.
Mathematical analysis:
Time taken by the API to process 1 request = 1 ms
Time taken by the API to process 4 concurrent requests = 1 ms
Time taken by the API to process 100,000 requests = 100,000 × 1 ms / 4 = 25 secs
This is just the best-case scenario. In real scenarios, all the CPUs are unlikely to provide a time slice at the same instant, so you are likely to see differences.
In such scenarios, it would be better to use Spring Reactive (WebFlux) rather than the conventional blocking Spring stack in Spring Boot.
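As a hedged sketch of what the non-blocking style looks like on the calling side (assuming spring-webflux and Reactor are on the classpath; the base URL, endpoint path and the concurrency limit of 100 are assumptions, not taken from the question), Spring WebFlux's WebClient lets many requests be in flight without tying up one thread per request:

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.LongStream;

    import org.springframework.web.reactive.function.client.WebClient;
    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    public class EmployeeClient {

        // Hypothetical base URL; replace with the real employee API host.
        private final WebClient client = WebClient.create("http://example.internal/api");

        // Fetch a single employee; returns immediately with a Mono, no thread is blocked while waiting.
        Mono<String> fetchEmployee(long id) {
            return client.get()
                    .uri("/employees/{id}", id)
                    .retrieve()
                    .bodyToMono(String.class);
        }

        // Fan out over many ids, keeping at most 100 requests in flight at any time.
        List<String> fetchAll(List<Long> ids) {
            return Flux.fromIterable(ids)
                    .flatMap(this::fetchEmployee, 100)
                    .collectList()
                    .block();
        }

        public static void main(String[] args) {
            List<Long> ids = LongStream.rangeClosed(1, 100_000).boxed().collect(Collectors.toList());
            List<String> employees = new EmployeeClient().fetchAll(ids);
            System.out.println("Fetched " + employees.size() + " employees");
        }
    }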
The API you're pulling from could be limiting the number of requests you can make in a certain period of time. If you don't have access to the API source, I would try running larger and larger numbers of requests until you notice it taking significantly longer.
Well, the time needed to get a response from a web server depends on its hosting machine and environment.
Usually a single machine has a limited number of threads in its thread pool, and each request is bound to one thread. So when making concurrent requests, only a certain number of them are processed at a time on the available threads while the rest wait in a queue.
This can be the reason your requests take a while to get a response, or why some of them even time out.
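A small Java sketch of that queuing effect; the pool size of 4 and the one-second task are made-up numbers standing in for a server's worker threads and its per-request work:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class QueueingDemo {
        public static void main(String[] args) throws InterruptedException {
            // 4 threads stand in for a server that can process 4 requests at a time.
            ExecutorService server = Executors.newFixedThreadPool(4);
            long start = System.nanoTime();

            // 16 "requests", each taking about 1 second of work.
            for (int i = 0; i < 16; i++) {
                server.submit(() -> {
                    try {
                        Thread.sleep(1_000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }

            server.shutdown();
            server.awaitTermination(1, TimeUnit.MINUTES);
            // Expect roughly 4 seconds, not 1: requests 5..16 waited in the queue.
            System.out.printf("16 one-second requests on 4 threads took %.1f s%n",
                    (System.nanoTime() - start) / 1e9);
        }
    }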
In my application, when I execute 2000 virtual users (number of threads) for 1 HTTP request, the response time was 30 sec; when I changed the number of threads to 500 and put 4 copies of the same HTTP request instead of 1, the response time was 3 sec. What is the difference? Is it the right way to reduce the number of threads and increase the replicas of the request? Please help.
Note: In each request I have changed the user id as well.
In terms of HTTP Request samplers, your test must behave exactly as a real browser behaves, so artificially adding more HTTP Requests may (and will) break the logic of your workload (if there is any).
In your case the high response time seems to be caused by incorrect JMeter configuration: if JMeter is not properly configured for high load, it simply will not be able to fire requests fast enough, resulting in increased response time while your server just sits idle.
2000 threads sounds like quite a big number so make sure to:
Follow JMeter Best Practices
Follow recommendations from 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure especially these:
Increase JVM Heap size allocated for JMeter
Run your test in non-GUI mode
Remove all the Listeners from the Test Plan
Monitor baseline OS health metrics on the machine where JMeter is running (CPU, RAM, disk, network usage). You can use the JMeter PerfMon Plugin for this. If you notice a lack of any of the aforementioned resources, i.e. usage starts exceeding, say, 90% of the total available capacity, JMeter is not acting at full speed and you will need to consider Distributed Testing.
To extend Dmitri T's answer: if your server responds 10 times more slowly under load, as you execute 2000 virtual users, it means there's a bottleneck that you need to identify.
Read JMeter's Best Practices
consider running multiple non-GUI JMeter instances on multiple machines using distributed mode
Also check Delay Thread creation until needed checkbox in Thread Group
JMeter has an option to delay thread creation until the thread starts sampling, i.e. after any thread group delay and the ramp-up time for the thread itself. This allows for a very large total number of threads, provided that not too many are active concurrently.
And set Thread Group Ramp-up to 2000
Start with Ramp-up = number of threads and adjust up or down as needed.
I'm using a Managed Executor Service to implement a process manager which will process tasks in the background upon receiving a JMS message event. Normally there will be a small number of tasks running (maybe 10 max), but what if something happens and my application starts getting hundreds of JMS message events? How do I handle that?
My thought is to limit the number of threads if possible, save all the other messages to a database, and run them when a thread becomes available. Thanks in advance.
My thought is to limit the number of threads if possible, save all the other messages to a database, and run them when a thread becomes available.
The detailed answer to this question depends on which Java EE app server you choose to run on, since they all have slightly different configuration.
Any Java EE app server will allow you to configure the thread pool size of your Managed Executor Service (MES), this is the number of worker threads for your thread pool.
Say you have 10 worker threads and you get flooded with 100 requests all at once: the MES will keep a queue of the backlogged requests, and the worker threads will take work off the queue whenever they finish, until the queue is empty.
Now, it's fine if work goes to the queue sometimes, but if your work queue consistently grows more quickly than your worker threads can drain it, you will run into problems. The solution is to increase your thread pool size; otherwise the backlog will keep growing and your server will eventually run out of memory.
what if something happens and my application starts getting hundreds of JMS message events? How do I handle that?
If the load on your server is so sporadic that tasks need to be saved to a database, the best approach would be one of the following:
increase thread pool size
have the server immediately reject incoming tasks when the task backlog queue is full (see the sketch after this list)
have clients do a blocking wait for the server task queue to be not full (I would only advise this option if client task submission is in no way connected to user experience)
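To make the rejection option concrete, here is a plain-Java sketch of a bounded pool that refuses work once its backlog queue is full. The pool and queue sizes are illustrative, and with a ManagedExecutorService the equivalent limits would normally be set in the app server's configuration rather than in code:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedTaskRunner {

        // 10 worker threads, at most 100 queued tasks; anything beyond that is rejected.
        private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.AbortPolicy());

        /** Returns true if the task was accepted, false if the backlog is full. */
        public boolean trySubmit(Runnable task) {
            try {
                pool.execute(task);
                return true;
            } catch (RejectedExecutionException full) {
                // Backlog full: the caller could persist the message to the database
                // and replay it later, as suggested in the question.
                return false;
            }
        }

        public void shutdown() {
            pool.shutdown();
        }
    }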
I am trying to fork worker clusters up to a maximum of 10, and only if the working load increases. Can it be done?
I have tried strong-cluster-control's setSize, but I can't find an easy way of forking automatically (if many requests are being made then fork, for example), or of closing/"suiciding" forks (maybe with a timeout if nothing is being done, like in this answer).
This is my repo's main file at GitHub
Thank you in advance!!
I assume that you already have some idea as to how you would like to spread your load so I will not include details about that and instead focus on the interprocess communication required for this.
Notifying the master
To send arbitrary data to the master, you can use process.send() from a worker. The way I would go about this is probably something along these steps:
The application is started
Minimum amount of workers are spawned
Each worker will send the master a request message every time it receives a new request, via process.send()
The master keeps track of all the request events from all workers
If the number of request events rises above a predefined threshold (e.g. > 100 requests/s), it spawns a new worker
If the number of request events drops below a predefined threshold, it asks one of the workers to stop accepting new requests and close itself gracefully (note that it should not simply kill the process, to avoid interrupting ongoing requests)
Main point is: do not focus on time, focus on rate. In an application that is supposed to handle tens to thousands of requests per second, your setTimeout() (whose job might be to kill the worker if it has been idle for too long) will never fire, because Node.js distributes your load evenly across workers: you could start with one worker, but once you reach your maximum you will never drop back to one worker under continuous load, even if there is only one request per second.
It should be noted that it is counterproductive to spawn more workers than the number of CPU cores you have at your disposal. It might, however, be beneficial to start with a single worker and incrementally increase the count up to the number of cores as load increases.