I would like to know what happens to the worker process when the server runs at full capacity.
I'm asking because sometimes my web application loads very slowly and throws a server error (without any error code).
Goal
Determine the cause of the sporadic lock ups of our web application running on IIS.
Problem
An application we are running on IIS sporadically locks up throughout the day. When it locks up, it locks up on all workers and on all load-balanced instances.
Environment and Application
The application is running on 4 different Windows Server 2016 machines. The machines are load balanced by HAProxy using a round-robin scheme. The IIS application pools this website is hosted in are configured with 4 worker processes each, and the application they host is a 32-bit application. The IIS instances are not using a shared configuration file, but the application pools for this application are all configured the same.
This application is the only application in the IIS application pool. The application is an ASP.NET Web API targeting .NET 4.6.1, and it does not create threads of its own.
Theory
My theory for why this is happening is that requests are coming in that take ~5-30 minutes to complete, and every machine gets tied up servicing these requests, so they all look "locked up". The company rolled its own logging mechanism, and from it I can tell we do have requests taking ~5-30 minutes to complete. The team responsible for the application has cleaned up many of these, but I am still seeing ~5 minute requests in the log.
I do not have access to the machines personally, so our systems team has captured memory dumps of the application when this happens. In the dumps I generally see ~50 running threads, all of them in our code; they are spread all over the application and do not appear to be stopped on any common piece of code. When the application is running correctly, the dumps show 3-4 running threads. I have also looked at performance counters such as ASP.NET\Requests Queued, but it never shows any requests queued. During these times the CPU, memory, disk, and network usage look normal. In WinDbg none of the threads show high CPU time other than the finalizer thread, which as far as I know should live for the entire life of the process.
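For illustration, a minimal sketch (hypothetical controller; the real code is not shown here) of the kind of long-running synchronous action this theory assumes. Each in-flight request like this holds a CLR thread-pool thread for its full duration, which matches the dump signature above: many threads deep in application code, nothing queued, and almost no CPU.

```csharp
using System;
using System.Threading;
using System.Web.Http;

// Hypothetical Web API controller used only to illustrate the theory.
public class ReportsController : ApiController
{
    [HttpGet]
    public IHttpActionResult GetLargeReport(int id)
    {
        // Stand-in for a 5-30 minute blocking operation (database query,
        // downstream service call, etc.). The thread-pool thread serving
        // this request is held the whole time, yet ASP.NET\Requests Queued
        // stays at 0 (the request has already been dispatched) and CPU stays
        // low (the thread is blocked, not computing).
        Thread.Sleep(TimeSpan.FromMinutes(5));
        return Ok();
    }
}
```

Fifty concurrent requests of this shape would produce roughly the picture seen in the dumps: ~50 threads scattered across application code, none of them hot.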
Conclusion
I am looking for a means to prove or disprove my theory as to why we are locking up as well as any metrics or tools I should look at.
So this issue came down to our application running a query that stitched a table with 2,000,000 records in it to another table. Memory would become so fragmented that the garbage collector was spending more time trying to find places to put objects, and moving them around, than it was running our code. This is why the application appeared to still be working and why there were no exceptions. Oddly, IIS would time out the requests but would continue processing the threads.
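A rough sketch of that pattern (hypothetical entity types and query; the real schema is not shown here), assuming the joined results were materialized in memory: stitching a ~2,000,000-row table to another table and holding the results produces a burst of millions of allocations plus large backing arrays, which is the kind of load that leaves the GC doing most of the work.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities standing in for the two tables involved.
public class Order    { public int Id; public int CustomerId; public string Payload; }
public class Customer { public int Id; public string Name; }

public static class ReportBuilder
{
    // Joining ~2,000,000 orders to customers and materializing the result
    // allocates millions of small objects plus large backing arrays on the
    // Large Object Heap. Under concurrent requests the GC spends its time
    // finding and compacting space rather than running request code, so the
    // process looks alive, makes little progress, and throws no exceptions.
    public static List<object> Build(List<Order> orders, List<Customer> customers)
    {
        return orders
            .Join(customers, o => o.CustomerId, c => c.Id,
                  (o, c) => (object)new { c.Name, o.Payload })
            .ToList();
    }
}
```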
I'm currently prototyping a very lightweight TCP server based on a custom protocol. It's written in C++ and uses Boost.Asio for cross-platform sockets. When I monitor the process on Windows it uses less than about 3MB of memory and barely grows with many concurrent connections (I tested up to 8).
I built the same server for Linux and put it on a VPS with 128MB of RAM and 64MB of swap for testing. It runs fine and my tests are successful, but the process gets killed in the middle of the night by the kernel. I checked the logs and it was out of memory (its OOM score was 0).
I highly doubt my process has memory leaks. I checked my server logs and only 1 person connected to it the previous night, which should not result in an OOM. The process sleeps for the majority of the time, as it only does processing when Boost's async handler wakes up the main thread to process a packet.
What I did notice is that the default VM allocation for the process is a whopping 89MB (according to top), and as soon as I make a connection it roughly doubles to about 151MB. My VPS has about 100MB of free RAM and all 64MB of swap while running the server, so the only thing I can think of is that the process tried to allocate more virtual memory, went over the ~164MB remaining, exceeded the physical limit, and triggered the OOM killer.
I've since used the ulimit command to limit the VM allocation to 30MB and it seems to be working fine, but I'll have to wait a while to see if it actually helps the issue.
My question is: how does Linux determine how much VM to allocate for a process? Is there a compiler/linker setting I can use to reduce the default VM reservation? Is my reasoning correct, or are there other reasons for the OOM?
Recently in our production environment, one of our web sites running on IIS hit 100% CPU usage.
We figured out that the issue was long-running MySQL queries. I know that the ASP.NET engine uses the thread pool to process HTTP requests, and when I took a dump file of the worker process, the thread count was around 100. I can't really understand why long-running DB queries cause CPU overhead. I'd be glad if anyone could explain in detail.
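For context, a minimal sketch (hypothetical controller; MySql.Data is assumed as the client library) of the pattern in question: each request blocks its thread-pool thread on a long MySQL query, and as more requests arrive the pool keeps injecting threads, which is roughly how a dump ends up with ~100 of them.

```csharp
using System.Web.Http;
using MySql.Data.MySqlClient;   // assumes the MySql.Data client library

// Hypothetical Web API controller, for illustration only.
public class OrdersController : ApiController
{
    [HttpGet]
    public IHttpActionResult Search(string term)
    {
        using (var conn = new MySqlConnection("...connection string..."))
        using (var cmd = new MySqlCommand("SELECT ... /* long-running query */", conn))
        {
            conn.Open();
            // Blocks the request's thread-pool thread until MySQL finishes.
            // While threads sit blocked like this, new requests force the CLR
            // thread pool to grow, so the worker process accumulates threads.
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read()) { /* materialize results */ }
            }
        }
        return Ok();
    }
}
```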
Can someone explain this sentence from MSDN?
Idle time-out can be helpful in the following situations:
The server has a heavy processing load.
Is the idle time-out for the w3wp.exe process as a whole, or does each user connected to the website get some nested process inside w3wp with its own idle time? If it is the idle time-out for the w3wp process as a whole, then what does the MSDN sentence mean?
It prevents an application pool worker process from hanging around when no users are hitting the web pages hosted by that worker process. When users stop hitting the web site for a while, IIS stops the process.
This can be annoying if you have an expensive setup/teardown process for the application, such as populating a cache.
The idle time-out is per application pool. You can observe the running worker processes in IIS, or in Visual Studio via Tools > Attach to Process; clicking the Process column header brings any running w3wp.exe instances to the top.
You can set the time-out for each app pool in IIS 7+ under Advanced Settings..., in the Process Model section.
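If you prefer to inspect or change it in code, here is a small sketch (assuming the Microsoft.Web.Administration assembly that ships with IIS, run with administrative rights; the pool name is a placeholder) that reads and sets the same per-pool idle time-out:

```csharp
using System;
using Microsoft.Web.Administration;   // ships with IIS; requires admin rights

class IdleTimeoutExample
{
    static void Main()
    {
        using (var serverManager = new ServerManager())
        {
            // The idle time-out is a property of the application pool's
            // process model, not of individual users or connections.
            foreach (ApplicationPool pool in serverManager.ApplicationPools)
            {
                Console.WriteLine("{0}: idle time-out = {1}",
                                  pool.Name, pool.ProcessModel.IdleTimeout);
            }

            // Example: give one pool ("MyAppPool" is a placeholder) a
            // 40-minute idle time-out.
            serverManager.ApplicationPools["MyAppPool"].ProcessModel.IdleTimeout =
                TimeSpan.FromMinutes(40);
            serverManager.CommitChanges();
        }
    }
}
```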
I ran into an issue with an IIS web app shutting down an idle worker process! The next request would then have to re-initialize the application, leading to delays.
I disabled the IIS shutdown of idle worker processes on the application pool to resolve this. Are there any issues associated with turning this off? If the process is leaking memory, I imagine it is nice to recycle the process every now and then.
Are there any other benefits to having this process shutdown?
I'm assuming that you're referring to IIS 6.
Instead of disabling the shutdown altogether, maybe you can just increase how long it waits before killing the process. The server is essentially conserving resources; if your server can stand the resource allocation for a process that mostly sits around doing nothing, then there isn't any harm in letting it be.
As you mentioned, configuring the process to auto-recycle when it hits a memory limit would be a good idea if a memory leak is a possibility.
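If you do disable the idle shutdown, here is one hedged sketch (again using Microsoft.Web.Administration; the pool name and the 1 GB limit are placeholders) of combining that with the memory-based recycling mentioned above:

```csharp
using System;
using Microsoft.Web.Administration;   // requires admin rights

class RecyclingLimitExample
{
    static void Main()
    {
        using (var serverManager = new ServerManager())
        {
            var pool = serverManager.ApplicationPools["MyAppPool"]; // placeholder name

            // Turn off the idle time-out entirely...
            pool.ProcessModel.IdleTimeout = TimeSpan.Zero;

            // ...but recycle the worker process if its private memory exceeds
            // ~1 GB (the value is specified in kilobytes), so a slow leak
            // still gets cleaned up periodically.
            pool.Recycling.PeriodicRestart.PrivateMemory = 1024 * 1024;

            serverManager.CommitChanges();
        }
    }
}
```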