How to diagnose a memory leak in an Azure WebJob

I suspect that I may have a memory leak in a WebJob, but I'm not certain how to definitively prove that I do. I suspect that I can find the information by going to /processExplorer in the Kudu management console, starting a profile, and downloading the results. However, I am not entirely sure whether this is the right route, or what I should do with the file once I get it.
Any suggestions would be appreciated.

I can find the information by going to /processExplorer in the Kudu management console, starting a profile, and downloading the results
After you get the .diagsession file, you can open it in Visual Studio. You will see the CPU usage trend, but memory data is not included in this file. To identify whether there is a memory leak, the steps below are for your reference.
Refresh the Process Explorer in Kudu manually at regular intervals (for example, once every 30 seconds).
Each time you refresh, record the private memory and virtual memory figures; these are what you will use to diagnose the leak. Click the Properties button next to the process name to see the private memory and virtual memory of the current process.
Once you have recorded enough data, compare the growth rates of virtual memory and private memory. If both grow quickly, or virtual memory grows faster than private memory, there is a memory leak. A sketch for automating this recording is shown after these steps.
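If refreshing by hand is tedious, the same figures can be pulled from Kudu's REST API on a timer. Below is a minimal sketch, assuming the standard /api/processes/{pid} endpoint and basic authentication with your deployment (Kudu) credentials; the site name, PID, and JSON property names are placeholders to verify against your own API response.

    # Sketch: poll Kudu's process API every 30 seconds and log private and
    # virtual memory so their growth rates can be compared afterwards.
    # The property names below are assumptions -- check your own
    # /api/processes/{pid} response. Stop with Ctrl+C once you have enough samples.
    $site   = "yourwebapp"       # placeholder app name
    $procId = 1234               # PID shown in Process Explorer
    $creds  = Get-Credential     # deployment (Kudu) credentials

    while ($true) {
        $p = Invoke-RestMethod -Credential $creds `
            -Uri "https://$site.scm.azurewebsites.net/api/processes/$procId"
        "$(Get-Date -Format u) private=$($p.private_memory_size64) virtual=$($p.virtual_memory_size64)" |
            Tee-Object -FilePath memory-log.txt -Append
        Start-Sleep -Seconds 30
    }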
If you need more information about the leak, you can download a memory dump file from the Process Properties page and inspect it in detail using WinDbg. You can also analyze the dump file online using Diagnostics as a Service for Azure Web Sites; the link below explains how to use it.
DaaS – Diagnostics as a Service for Azure Web Sites

Related

100% Memory usage on Azure App Service Plan with two Apps - working set used 10gb+

I've got an App Service Plan with 14 GB of memory; it should be plenty for my application's needs. There are two app services running on it, each identical. The private memory consumption of these hovers around 1 GB but can spike to 4 GB during periods of high usage. One app has a heavier usage pattern than the other.
Lately, during periods of high usage, I've noticed that the heavily used service can become unresponsive, and memory usage stays at 100% in the App Service Plan.
The high-traffic service is using 4 GB of private memory and starting to massively slow down. When I head over to the /scm.../ProcessExplorer/ page, I can see that the low-traffic service has 1 GB of private memory used and 10 GB of 'Working Set'.
As I understand it, on a single machine at least, the working set should be freed up when that memory is needed on another process. Does this happen naturally when two App Services share a single Plan?
It looks to me like the working set on the low-traffic instance is not being freed up to supply the needs of the high-traffic App Service.
If this is indeed the case, the simple fix is to move them to separate App Service Plans, each with 7 GB of memory. However, this seems like it might just be shifting the problem around. Has anyone else noticed similar issues with multiple apps on a single App Service Plan? As far as I understand it, these shouldn't interfere with one another to the extent that they all need to be separated. Or have I got the wrong diagnosis?
In some high memory-consumption scenarios, your app might truly require more computing resources; in that case, consider scaling to a higher service tier so the application gets all the resources it needs. Other times, a bug in the code might cause a memory leak, or a coding practice might increase memory consumption. Getting insight into what's triggering high memory consumption is a two-part process: first create a process dump, then analyze it. Crash Diagnoser from the Azure Site Extension Gallery can efficiently perform both steps. For more information, refer to Capture and analyze a dump file for intermittent high memory for Web Apps.
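If you prefer to capture the dump yourself rather than through Crash Diagnoser, Kudu's process API can produce one too. A hedged sketch, assuming the /api/processes/{pid}/dump endpoint and its dumpType parameter (1 = mini, 2 = full) behave as on current Kudu builds:

    # Sketch: download a full memory dump of a process through Kudu.
    # Site name and PID are placeholders.
    $site   = "yourwebapp"
    $procId = 1234               # PID of the process to dump
    $creds  = Get-Credential     # deployment (Kudu) credentials
    Invoke-WebRequest -Credential $creds `
        -Uri "https://$site.scm.azurewebsites.net/api/processes/$procId/dump?dumpType=2" `
        -OutFile "w3wp-full.dmp"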
In the end we solved this one via mitigation, rather than getting to the root cause.
We found a mitigation strategy for our previous memory issues several months ago, which was simply to restart the server each night using a PowerShell script. This seems to prevent the memory from building up over time, and only costs us a few seconds of downtime. Our system doesn't have much overnight traffic, as our users are all based in the same geographic location.
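For reference, the restart itself can be as small as the sketch below, assuming the Az PowerShell module and a service principal sign-in; the tenant, resource group, and app names are placeholders. The try/catch matters as much as the restart: it makes the job fail loudly instead of reporting false success.

    # Sketch of the nightly restart job; all names are placeholders.
    $tenantId     = "<tenant-guid>"
    $spCredential = Get-Credential   # app id + secret; load from a secure store in a real job
    try {
        Connect-AzAccount -ServicePrincipal -Tenant $tenantId `
            -Credential $spCredential -ErrorAction Stop
        Restart-AzWebApp -ResourceGroupName "my-rg" -Name "my-heavy-app" `
            -ErrorAction Stop
    } catch {
        throw "Nightly restart failed: $_"   # surface expired credentials etc.
    }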
However, we recently found that the overnight restart was reporting 'success' but actually failing each night due to expired credentials. This meant that the memory issues described in the question I posted were actually exacerbated by server uptimes of several weeks. Restoring the overnight restart resolved the memory issues we were seeing, and we certainly don't see our system using 10 GB+ anymore.
We'll investigate the memory issues if they rear their heads again; KetanChawda-MSFT's suggestion of using memory dumps to analyse memory usage will be employed when that investigation is needed.

How to analyze memory leaks for Azure Web Apps (PaaS)

I am looking to analyze memory leaks for a web app deployed in Azure.
Referring to the following URL:
https://blogs.msdn.microsoft.com/kaushal/2017/05/04/azure-app-service-manually-collect-memory-dumps/
we were able to extract memory dumps and analyze them. But since we were not able to inject the LeakTrack dll / enable memory-leak tracking while collecting the dump, the memory analysis reports that leak analysis was not performed because the dll was not injected.
Please suggest how to find memory leaks by analyzing the dump in this scenario.
As you said, DebugDiag currently can't create reflected process dumps, and ProcDump doesn't have a way to inject the LeakTrack dll to track allocations. So we can get around this by working with both tools.
Normally we would simply go to the Processes tab in DebugDiag, right-click the process, and choose "Monitor for Leaks."
Here, we can script DebugDiag and ProcDump to do the individual tasks we've set out for them.
Once we have the PID of the troubled process, we can use a script to inject the LeakTrack dll into the process. With the PID known and the script created, we can launch DebugDiag from a command line.
Such as:
C:\PROGRA~1\DEBUGD~1\DbgHost.exe -script "your LeakTrack dll path" -attach your PID
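Putting it together, a wrapper might look like the sketch below. The DbgHost switches mirror the command above; the injection script path, the wait time, and the use of ProcDump's -r switch (take the dump from a reflected clone) are assumptions to adapt to your environment.

    # Sketch: inject LeakTrack via a DebugDiag script, let it track
    # allocations for a while, then capture a reflected dump with ProcDump.
    # All paths and the wait time are placeholders.
    $procId = (Get-Process w3wp | Select-Object -First 1).Id   # troubled process
    & "C:\PROGRA~1\DEBUGD~1\DbgHost.exe" `
        -script "C:\scripts\InjectLeakTrack.vbs" -attach $procId
    Start-Sleep -Seconds (15 * 60)        # give LeakTrack time to gather data
    & "C:\tools\procdump.exe" -accepteula -ma -r $procId "C:\dumps\w3wp-leak.dmp"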
For more detail, you could refer to this article.
Here is also the reference case.

Why does w3wp memory keep increasing?

I am on a medium instance which has 3 GB of RAM. When I start my web app, the w3wp process starts at around 80 MB. I notice that as time passes this goes up and up. I took a memory dump of the process when it was at 570 MB, after the site had been running for 5 days, to see whether any .NET objects were consuming a lot, but found that the largest object was 18 MB, which was a set of string objects.
I am not using any cache objects since I'm using redis for my session storage, and in actual fact the dump showed that there was nothing in the cache.
Now my question is the following: I am thinking that since I have 3 GB of memory, IIS will retain some pages in memory (cached) so the website is faster whenever there are requests, and that is the reason the memory keeps increasing. What I'm concerned about is that I have a memory leak of some kind, even though I am disposing of all EntityFramework objects after use, along with any other streams that need to be disposed. I am assuming that when some threshold is reached, old cached data in memory gets removed and new pages are cached. Am I right in saying this?
I want to point out that in the past I was on a small instance and memory usage never went above 70%; now I am on a medium instance and memory is already at 60%. Very strange with the same code.
I can send memory dump if anyone would like to help me out.
There is an issue that is affecting a small number of Web Apps, and that we're working on patching.
There is a workaround if you are hitting this particular issue:
Go to Kudu Console for your app (e.g. https://{yourapp}.scm.azurewebsites.net/DebugConsole)
Go into the LogFiles folder. If you are running into this issue, you will have a very large eventlog.xml file
Make that file read-only by running attrib +r eventlog.xml (a scripted version is sketched at the end of this answer)
Optionally, restart your Web App so you have a clean w3wp
Monitor whether the usage still goes up
The one downside is that you'll no longer get those events generated, but in most cases they are not needed (and this is temporary).
The problem has been identified, but we don't have an ETA for the deployment yet.
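If you'd rather script this workaround than click through the console, the same command can be sent via Kudu's command endpoint. A minimal sketch, assuming /api/command accepts a JSON command/dir pair (as it does on current Kudu builds) and using a placeholder site name and credentials:

    # Sketch: apply the read-only workaround remotely through Kudu.
    $site  = "yourwebapp"        # placeholder app name
    $creds = Get-Credential      # deployment (Kudu) credentials
    $body  = @{ command = "attrib +r eventlog.xml"; dir = "LogFiles" } |
        ConvertTo-Json
    Invoke-RestMethod -Method Post -Credential $creds `
        -Uri "https://$site.scm.azurewebsites.net/api/command" `
        -ContentType "application/json" -Body $body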

What can cause an Azure Cloud Service's Disk Write/Read to spike unexpectedly?

We have a simple worker that picks up messages from a queue and runs a few queries. We don't ever write to the disk ourselves, but we do have diagnostics turned on in the role settings.
Once in a while the disk write/read spikes and the worker becomes unresponsive. What is the role trying to write to the disk? On the surface it doesn't appear to be a crash dump, because those tables and blobs are still empty. Are our diagnostics configured improperly?
Here's an example of a spike we saw recently. It was writing for over an hour!
Try enabling remote desktop support in the role configuration in the Azure portal.
Once the problem resurfaces, log in via RDP and start Resource Monitor. The Disk tab should be able to pinpoint disk IO usage by process and by file.
Enabling storage logs should tell you exactly what those reads and writes are.
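A hedged sketch of turning those logs on with the Az.Storage cmdlets, assuming you know which storage account your diagnostics write to; the account name and key are placeholders:

    # Sketch: enable Storage Analytics logging so every read/write/delete
    # operation against the account is recorded (in the $log container).
    $ctx = New-AzStorageContext -StorageAccountName "mydiagstorage" `
        -StorageAccountKey "<key>"
    foreach ($svc in "Blob", "Table", "Queue") {
        Set-AzStorageServiceLoggingProperty -ServiceType $svc `
            -LoggingOperations Read, Write, Delete -RetentionDays 7 -Context $ctx
    }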
So, this is a very open-ended question, and the cause is very hard to predict. Your Cloud Services are ultimately Windows machines, and what's happening on Windows can (usually) only be monitored by something inside Windows.
It is very possible that a Windows Update-related task was running; those can cause spikes in disk R/W.
We typically advise users who use CloudMonix and want to know what causes CPU/memory/other issues to install the CloudMonix agent on their machines, as it captures running processes and their memory and CPU utilization, and can show the process that caused a spike. Usually spikes in disk R/W are correlated to spikes in CPU usage.
Note: if the spike was caused by your own code, you'll need to use a profiler such as RedGate's ANTS Performance Profiler or JetBrains dotTrace to determine the ultimate root cause.
HTH

What is normal Azure WaIISHost.exe Memory Usage?

I have recently installed NewRelic server monitoring to our Azure web role. The role is a small instance. We are on OSv4 (Win 2012 R2) using 2.2 Service Runtime.
Looking at memory usage, I notice that WaIISHost.exe (which I understand to be Azure-related) is reported by NewRelic as consuming 219 MB (down from a peak of 250 MB). Is that a lot of memory for it? Can I reduce it? It just seemed like a lot to be taking up.
CPU usage seems to spike aperiodically at about 4% for it. However, CPU isn't really an issue, as my instance rarely goes above 50%.
First off, why do you care how much memory a process is taking up? All of that memory will be paged out to disk, and assuming it isn't being paged back in regularly, all it does is take up page file space, which is usually irrelevant.
The WaIISHost process runs your role entry point code (OnStart, Run, StatusCheck, Changing, etc.) and is typically implemented in WebRole.cs. If you want to reduce this process's memory footprint, reduce the amount of memory loaded by your role entry point code.
See http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx for more information about the WaIISHost.exe process and what it does.
