How do I list or retrieve object instances from a memory generation in C# - memory-leaks

I have searched everywhere, and I haven't found a way to list which objects in a given GC generation are not being collected by the garbage collector in C#.
Two of our .NET Core services (deployed on AKS) eventually die due to excessive memory consumption. (AKS evicts the pods after about 5 GB.)
Our administrator has not managed to find a good tool that can help us identify which objects are not being reclaimed by garbage collection on Linux.
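For what it's worth, this is a sketch of the kind of listing we are after, assuming we can capture a dump from the pod (for example with dotnet-dump collect) and walk it with the Microsoft.Diagnostics.Runtime (ClrMD) package; the API names below are from ClrMD 2.x and may differ in other versions:

// Sketch only: enumerate managed heap objects from a dump and group them by type,
// assuming the Microsoft.Diagnostics.Runtime (ClrMD) 2.x NuGet package.
using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

class HeapReport
{
    static void Main(string[] args)
    {
        // args[0] is the path to a dump collected inside the pod, e.g. with dotnet-dump collect -p <pid>.
        using DataTarget target = DataTarget.LoadDump(args[0]);
        ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

        var byType = runtime.Heap.EnumerateObjects()
            .Where(o => o.Type != null)
            .GroupBy(o => o.Type!.Name)
            .Select(g => new { Type = g.Key, Count = g.Count(), Bytes = g.Sum(o => (long)o.Size) })
            .OrderByDescending(x => x.Bytes)
            .Take(20);

        foreach (var row in byType)
            Console.WriteLine($"{row.Bytes,12} B  {row.Count,8} objs  {row.Type}");
    }
}

Types whose totals keep growing between two dumps taken a few minutes apart are the ones surviving garbage collection.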

Related

100% Memory usage on Azure App Service Plan with two Apps - working set used 10gb+

I've got an app service plan with 14gb of memory - it should be plenty for my application's needs. There are two application services running on it, each identical - the private memory consumption of these hovers around 1gb but can spike to 4gb during periods of high usage. One app has a heavier usage pattern than the other.
Lately, during periods of high usage, I've noticed that the heavily used service can become unresponsive, and memory usage stays at 100% in the App Service Plan.
The high traffic service is using 4gb of private memory and starting to massively slow down. When I head over to the /scm.../ProcessExplorer/ page, I can see that the low traffic service has 1gb private memory used and 10gb of 'Working Set'.
As I understand it, on a single machine at least, the working set should be freed up when that memory is needed on another process. Does this happen naturally when two App Services share a single Plan?
It looks to me like the working set on the low-traffic instance is not being freed up to supply the needs of the high-traffic App Service.
If this is indeed the case, the simple fix is to move them to separate App Service Plans, each with 7gb of memory. However, this seems like it might just be shifting the problem around - has anyone else noticed similar issues with multiple Apps on a single App Service Plan? As far as I understand it, they shouldn't interfere with one another to the extent that they all need to be separated. Or have I got the wrong diagnosis?
In some high memory-consumption scenarios, your app might truly require more computing resources. In that case, consider scaling to a higher service tier so the application gets all the resources it needs. Other times, a bug in the code might cause a memory leak, or a coding practice might increase memory consumption. Getting insight into what's triggering high memory consumption is a two-part process: first create a process dump, then analyze it. Crash Diagnoser from the Azure Site Extension Gallery can efficiently perform both of these steps. For more information, refer to Capture and analyze a dump file for intermittent high memory for Web Apps.
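If you would rather trigger the dump from code instead of the Crash Diagnoser site extension, here is a minimal sketch (not from the linked article, and it only applies to .NET Core 3.0+ processes) using the Microsoft.Diagnostics.NETCore.Client package; the process ID and output path are placeholders:

// Sketch: capture a memory dump of a running .NET Core process on demand,
// assuming the Microsoft.Diagnostics.NETCore.Client NuGet package.
using Microsoft.Diagnostics.NETCore.Client;

class DumpCapture
{
    static void Main()
    {
        int pid = 1234;                          // placeholder: target process ID
        var client = new DiagnosticsClient(pid);
        // WithHeap includes the managed heap contents needed for leak analysis.
        client.WriteDump(DumpType.WithHeap, @"D:\home\LogFiles\highmem.dmp");
    }
}

For full-framework App Service apps, stick with the Crash Diagnoser route described above.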
In the end we solved this one via mitigation, rather than getting to the root cause.
We found a mitigation strategy for our previous memory issues several months ago, which was simply to restart the server each night using a PowerShell script. This seems to prevent memory from building up over time, and only costs us a few seconds of downtime. Our system doesn't have much overnight traffic as our users are all based in the same geographic location.
However, we recently found that the overnight restart was reporting 'success' but actually failing each night due to expired credentials. This meant that the memory issues described in the question were actually exacerbated by server uptimes of several weeks. Restoring the overnight restart resolved the memory issues we were seeing, and we certainly don't see our system ever using 10gb+ again.
We'll investigate the memory issues if they rear their heads again. KetanChawda-MSFT's suggestion of using memory dumps to analyse the memory usage will be employed for this investigation when it's needed.

Why does w3wp memory keep increasing?

I am on a medium instance which has 3GB of RAM. When I start my webapp the w3wp process starts with say 80MB. I notice that the longer the site runs, the higher this goes. I took a memory dump of the process when it had reached 570MB after the site had been running for 5 days, to see whether any .NET objects were consuming a lot of memory, but found that the largest object was 18MB, which was a set of string objects.
I am not using any cache objects since I'm using Redis for my session storage, and in actual fact the dump showed that there was nothing in the cache.
Now my question is the following... I am thinking that since I have 3GB of memory, IIS will retain some pages in memory (cached) so the website is faster whenever there are requests, and that is the reason why the memory keeps increasing. What I'm concerned about is that I have a memory leak somewhere, even though I dispose of all Entity Framework objects after use, as well as any other streams which need to be disposed. I am assuming that when some specific threshold is reached, old cached data which was in memory gets removed and new pages are included. Am I right in saying this?
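To be concrete, the disposal pattern in question is the standard using scope around the context (the type names below are only illustrative, not my real code):

// Illustrative only: the DbContext and its change tracker are released deterministically
// at the end of the using block instead of waiting for garbage collection.
using System;
using System.Collections.Generic;
using System.Data.Entity;   // EF6; in EF Core this would be Microsoft.EntityFrameworkCore
using System.Linq;

public class OrderReader
{
    public IList<Order> GetRecentOrders()
    {
        using (var db = new ShopContext())      // ShopContext/Order are illustrative types
        {
            return db.Orders
                     .AsNoTracking()            // don't keep entities alive in the change tracker
                     .Where(o => o.CreatedUtc > DateTime.UtcNow.AddDays(-7))
                     .ToList();
        }
    }
}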
I want to point out that in the past I was on a small instance and the memory percentage never went above 70%, and now I am on a medium instance and the memory is already at 60%... very strange with the same code.
I can send memory dump if anyone would like to help me out.
There is an issue affecting a small number of Web Apps that we're working on patching.
There is a workaround if you are hitting this particular issue:
Go to Kudu Console for your app (e.g. https://{yourapp}.scm.azurewebsites.net/DebugConsole)
Go into the LogFiles folder. If you are running into this issue, you will have a very large eventlog.xml file.
Make that file read-only by running attrib +r eventlog.xml
Optionally, restart your Web App so you have a clean w3wp
Monitor whether the usage still goes up
The one downside is that you'll no longer get those events generated, but in most cases they are not needed (and this is temporary).
The problem has been identified, but we don't have an ETA for the deployment yet.

What is normal Azure WaIISHost.exe Memory Usage?

I have recently installed NewRelic server monitoring to our Azure web role. The role is a small instance. We are on OSv4 (Win 2012 R2) using 2.2 Service Runtime.
Looking at memory usage I notice that WaIISHost.exe (which I understand to be Azure related) is reported as consuming 219MB (down from a peak of 250MB) via NewRelic. Is that a lot of memory for it? Can I reduce it? It just seemed like a lot to be taking up.
CPU usage seems to spike sporadically to about 4% for it. However, CPU isn't really an issue as my instance rarely goes above 50%.
First off, why do you care how much memory a process is taking up? All of that memory will be paged out to disk, and assuming it isn't being paged back in regularly then all it does is take up page file size which is usually irrelevant.
The WaIISHost process runs your role entry point code (OnStart, Run, StatusCheck, Changing, etc) and is typically implemented in WebRole.cs. If you want to reduce the memory size of this process then you can reduce the amount of memory being loaded by your role entry point code.
See http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx for more information about the WaIISHost.exe process and what it does.
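For context, a minimal role entry point of the kind WaIISHost hosts looks like the sketch below (classic cloud service web role using Microsoft.WindowsAzure.ServiceRuntime); anything you keep alive in fields of this class stays in WaIISHost's memory rather than in w3wp's:

// Minimal sketch of the role entry point hosted by WaIISHost.exe.
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Keep startup work lightweight; avoid loading large data sets here,
        // because this code runs in WaIISHost.exe, not in the w3wp.exe that serves requests.
        return base.OnStart();
    }
}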

Lucene .NET 3 and Azure: unable to complete indexing

I'm trying to use the latest version of Lucene.NET (applied to my project via NuGet) in an Azure web role.
The original web application (MVC4) has been created to be able to run in a traditional web host or in Azure: in the former case it uses a file-system based Lucene directory, writing the Lucene index in an App_Data subdirectory; in the latter case it uses the AzureDirectory installed from NuGet (Lucene.Net.Store.Azure).
The documents being indexed either come from the web, or from some files uploaded locally, as some of the collections to index are closed and rather small. To start with, I am trying with one of these small closed sets, containing about 1,000 files totaling a couple of GB.
When I index this set locally in my development environment, the indexing completes and I can successfully use it for searching. When instead I try indexing on Azure, it fails to complete and I have no clue about the exact problem: I added both Elmah and NLog for logging any problem, but nothing gets registered in Elmah or in the monitoring tools configured from the Azure console. Only once did I get an error from NLog, which was an out of memory exception thrown by the Lucene index writer at the end of the process, when committing the document additions. So I tried:
explicitly setting a very low RAM buffer size by calling SetRAMBufferSizeMB(10.0) on my writer.
committing multiple times, e.g. every 200 documents added.
removing any call to Optimize after the indexing completes (see also http://blog.trifork.com/2011/11/21/simon-says-optimize-is-bad-for-you/ on this).
targeting either the file system or the Azure storage.
upscaling the web role VM up to the large size.
Most of these attempts fail at different stages: sometimes the indexing stops after 1-200 documents, other times it gets up to 8-900; when I'm lucky, it even completes. This happened only with the file system, never with Azure storage: I have never managed to complete indexing with it.
The essential part of my Lucene code is very simple:
IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
writer.SetRAMBufferSizeMB(10.0);
where directory is an instance of FSDirectory or AzureDirectory, according to the test being executed. I then add documents with their fields (using UpdateDocument, as one of the fields represents a unique ID). Once finished, I call writer.Dispose(). If required by the test, I call writer.Commit() several times before the final Dispose; this usually helps the system go on before hitting the memory exception.
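For reference, the batched variant of the loop looks roughly like this (Lucene.NET 3.x API; the field names and the documentsToIndex collection are placeholders):

// Sketch of the batched indexing loop; writer is the IndexWriter created above.
int count = 0;
foreach (var item in documentsToIndex)
{
    var doc = new Document();
    doc.Add(new Field("id", item.Id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("content", item.Text, Field.Store.NO, Field.Index.ANALYZED));

    // UpdateDocument replaces any existing document with the same unique ID.
    writer.UpdateDocument(new Term("id", item.Id), doc);

    // Commit periodically so the RAM buffer is flushed before it grows too large.
    if (++count % 200 == 0)
        writer.Commit();
}
writer.Dispose();   // final flush and commit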
Could anyone suggest a hint to be able to complete my indexing?
The error seems to hold the key: Lucene is running out of memory while indexing.
From my perspective you have two options:
Allocate more memory to the RAM buffer, which actually improves your performance (refer to the Lucene documentation on the subject) or,
Reduce the number of documents between each commit.
You could try unit testing your indexing job under several different configurations (more RAM vs. less documents) until you come up with a suitable combination for your application.
On the other hand, if the problem is strictly with the Azure server, you might want to try using a local file cache instead of a RAM cache.

Is a Windows Azure worker role instance a whole VM?

When I run a worker role instance on Azure, is it a complete VM running in a shared host (like EC2)? Or is it running in a shared system (like Heroku)?
For example, what happens if my application starts requesting 100 GB of memory? Will it get killed off automatically for violation of limits (à la Google App Engine), or will it just exhaust the VM, so that the Azure fabric restarts it?
Do two roles ever run in the same system?
It's a whole VM, and the resources allocated are based directly on the size of VM you choose, from 1.75GB (Small) to 14GB (XL), with 1-8 cores. There's also an Extra Small instance with 768MB RAM and shared core. Full VM size details are here.
With Windows Azure, your VM is allocated on a physical server, and it's the fabric's responsibility to find such servers and properly allocate all of your web or worker role instances. If you have multiple instances, this means allocating these VMs across fault domains.
With your VM, you don't have to worry about being killed off if you try allocating too many resources: it's just like having a machine, and you can't go beyond what's there.
As far as two roles running on the same system: Each role has instances, and with multiple instances, as I mentioned above, your instances are divided into fault domains. If, say, you had 4 instances and 2 fault domains, it's possible that you may have two instances on the same rack (or maybe same server).
I ran a quick test to check this. I'm using a "small" instance that has something like 1.75 gigabytes of memory. My code uses an ArrayList to store references to large arrays so that those arrays are not garbage collected. Each array is one billion bytes, and once it is allocated I run a loop that sets each element to zero and then another loop to check that each element is zero, to ensure that memory is indeed allocated from the operating system (not sure if it matters in C#, but it certainly mattered in C++). Once the array is created, written to and read from, it is added to the ArrayList.
So my code successfully allocated five such arrays, and the attempt to allocate the sixth one resulted in System.OutOfMemoryException. Since 5 billion bytes plus overhead is definitely more than 1.75 gigabytes of physical memory allocated to the machine, I believe this proves that the page file is enabled on the VM and the behavior is the same as on a usual Windows Server 2008 installation, with the limitations imposed by the machine where it is running.
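A minimal sketch of that test (reconstructed from memory, not the verbatim code):

// Keep allocating 1-billion-byte arrays, touch every element so the pages are really
// committed, and hold the references in an ArrayList so the GC cannot reclaim them.
using System;
using System.Collections;

class MemoryProbe
{
    static void Main()
    {
        var keepAlive = new ArrayList();
        try
        {
            while (true)
            {
                var block = new byte[1_000_000_000];
                for (int i = 0; i < block.Length; i++) block[i] = 0;   // write every byte
                for (int i = 0; i < block.Length; i++)                 // read every byte back
                    if (block[i] != 0) throw new InvalidOperationException("unexpected value");
                keepAlive.Add(block);
                Console.WriteLine($"Allocated {keepAlive.Count} billion-byte arrays");
            }
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine($"OutOfMemoryException after {keepAlive.Count} successful allocations");
        }
    }
}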
