.NET Core 2.1 hoarding memory in Linux

I have a websocket server that hoards memory over days, to the point that Kubernetes eventually kills it. We monitor it using prometheus-net.
# dotnet --info
Host (useful for support):
Version: 2.1.6
Commit: 3f4f8eebd8
.NET Core SDKs installed:
No SDKs were found.
.NET Core runtimes installed:
Microsoft.AspNetCore.All 2.1.6 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.6 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.1.6 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
But when I connect remotely and take a memory dump (using createdump), the memory suddenly drops... without the service stopping, restarting or losing any connected user. See the green line in the picture.
I can see in the graphs that the GC is collecting regularly in all generations.
Server GC is disabled using:
<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>
Before disabling Server GC, the service used to grow its memory much faster. Now it takes two weeks to reach 512 MB.
Other services using ASP.NET Core in a request/response fashion do not show this problem. This one uses WebSockets, where each connection usually lasts around 10 minutes... so I guess everything related to a connection easily survives into Gen 2.
Note that there are two pods showing the same behaviour, and then one (the green) suddenly drops in memory usage due to the memory dump being taken.
The pods did not restart while the memory dump was being taken:
No connection was lost or restarted.
Heap:
(lldb) eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00007F8481C8D0B0
generation 1 starts at 0x00007F8481C7E820
generation 2 starts at 0x00007F852A1D7000
ephemeral segment allocation context: none
segment          begin            allocated        size
00007F852A1D6000 00007F852A1D7000 00007F853A1D5E90 0xfffee90(268430992)
00007F84807D0000 00007F84807D1000 00007F8482278000 0x1aa7000(27947008)
Large object heap starts at 0x00007F853A1D7000
segment          begin            allocated        size
00007F853A1D6000 00007F853A1D7000 00007F853A7C60F8 0x5ef0f8(6222072)
Total Size: Size: 0x12094f88 (302600072) bytes.
------------------------------
GC Heap Size: Size: 0x12094f88 (302600072) bytes.
(lldb)
Free objects:
(lldb) dumpheap -type Free -stat
Statistics:
MT               Count  TotalSize Class Name
00000000010c52b0 219774 10740482  Free
Total 219774 objects
Is there any explanation for this behaviour?
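For anyone reproducing this analysis: the eeheap/dumpheap commands come from the SOS plugin shipped with the runtime. A rough sketch of loading it into lldb (the core-file name is hypothetical, and paths depend on the installed runtime version):

lldb --core /tmp/coredump.1234 dotnet
(lldb) plugin load /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.6/libsosplugin.so
(lldb) eeheap -gc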

The problem was the connection to RabbitMQ. Because we were using short-lived channels, the "auto-reconnect" feature of RabbitMQ.Client was keeping a lot of state about dead channels. We switched this configuration off, since we do not need the "perks" of the "auto-reconnect" feature, and everything started working normally. It was a pain: we basically had to set up a Windows deployment and do the usual memory analysis process with Windows tools (JetBrains dotMemory in this case). Using lldb is not productive at all.
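For reference, a minimal sketch of switching the feature off on the client's ConnectionFactory (host and queue names are hypothetical):

using System.Text;
using RabbitMQ.Client;

class Publisher
{
    static void Main()
    {
        var factory = new ConnectionFactory
        {
            HostName = "rabbitmq",        // hypothetical host
            // Automatic recovery tracks every channel it has seen so it can
            // recreate them after a reconnect; with many short-lived channels
            // that bookkeeping accumulates.
            AutomaticRecoveryEnabled = false,
            TopologyRecoveryEnabled = false
        };

        using (var connection = factory.CreateConnection())
        using (var channel = connection.CreateModel())
        {
            channel.BasicPublish(exchange: "",
                                 routingKey: "my-queue", // hypothetical queue
                                 basicProperties: null,
                                 body: Encoding.UTF8.GetBytes("ping"));
        }
    }
}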

Disclaimer: I am no .NET Wizard.
But there are two things you should do to follow Kubernetes best practices:
Define sensible resource limits for your app. If the app does not need more than 200 MB of memory, define a resource limit to prevent it from consuming all available host memory. But be aware that the standard Unix APIs for querying available memory are not cgroup-aware and report the host's memory no matter what your cgroup says.
Tell your app what this resource limit is. It seems your app does not "feel the need" to free memory because there is plenty available. Almost all applications and frameworks have a switch to define the maximum memory to consume. Tell your app this limit and it will "see" memory pressure and perform a full GC (which I guess could be the problem here). A sketch of this follows.
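As an illustration only (container and image names are hypothetical), the limit belongs in the pod/container spec; on newer runtimes (.NET Core 3.0+) the GC itself can additionally be capped via the COMPlus_GCHeapHardLimit* settings:

containers:
- name: websocket-server                        # hypothetical
  image: registry.example.com/ws-server:latest  # hypothetical
  resources:
    requests:
      memory: "256Mi"
    limits:
      memory: "512Mi"   # the kubelet OOM-kills the container above this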

Related

IBM WebSphere - JVM size continue to grow and GC not helping

I have a RESTful service deployed on IBM WAS 8.5.5.x. The application functions correctly and performs well most of the time, except occasionally (approx. once or twice a week) the JVM size does not go down; I don't see the typical 'saw tooth' pattern in JVM heap utilization.
When the heap was high, I forced a heap dump to look at what is holding up JVM space, and all I see are IBM classes; I can't figure out what my application could be doing wrong.
What could be using/holding this high number of HashMap instances within IBM WAS?
[Screenshot: Leak Suspect in IBM Memory Analyzer, with a HashMap$Node expanded to show its children; com.ddtek.jdbc is a JDBC-compliant driver library location.]

Node.js killed due to out of memory

I'm running Node.js on a server with only 512 MB of RAM. The problem is that when I run a script, it gets killed because it runs out of memory.
By default the Node.js memory limit is 512 MB, so I think using --max-old-space-size is useless.
Follows the content of /var/log/syslog:
Oct 7 09:24:42 ubuntu-user kernel: [72604.230204] Out of memory: Kill process 6422 (node) score 774 or sacrifice child
Oct 7 09:24:42 ubuntu-user kernel: [72604.230351] Killed process 6422 (node) total-vm:1575132kB, anon-rss:396268kB, file-rss:0kB
Is there a way to get rid of the out-of-memory kills without upgrading the memory? (like using persistent storage as additional RAM)
Update:
It's a scraper which uses the Node modules request and cheerio. When it runs, it opens hundreds or thousands of webpages (but not in parallel).
If you're giving Node access to every last megabyte of the available 512 and it's still not enough, then there are two ways forward:
Reduce the memory requirements of your program. This may or may not be possible. If you want help with this, you should post another question detailing your functionality and memory usage.
Get more memory for your server. 512 MB is not much, especially if you're running other services (such as databases or message queues) which require in-memory storage.
There is a third possibility: using swap space (disk storage that acts as a memory backup), but this will have a strong impact on performance. If you still want it, Google how to set this up for your operating system; there are a lot of articles on the topic. This is OS configuration, not Node's.
Old question, but maybe this answer will help people: using --max-old-space-size is not useless.
Before Node.js 12, the default heap size depended on the OS architecture (32- or 64-bit). According to the documentation, on 64-bit machines the old generation alone defaulted to 1400 MB, far beyond your 512 MB.
Since Node.js 12, the default heap size takes the system RAM into account; however, Node's heap isn't the only thing in memory, especially if your server isn't dedicated to it. Setting --max-old-space-size puts a limit on the old-generation heap, and when your application approaches it, the garbage collector is triggered and tries to free memory.
I've written a post about how I've observed this: https://loadteststories.com/nodejs-kubernetes-an-oom-serial-killer-story/
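For example (the script name is hypothetical), capping the old generation at 400 MB on a 512 MB box leaves some headroom for the rest of the process and the OS:

node --max-old-space-size=400 scraper.js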

RavenDb Raven.Server.exe service memory leaking

I started using RavenDB not too long ago, but I've noticed that after installation the Raven.Server.exe process steadily gains memory (increasing by around 0.1 MB per second).
If I leave my computer for a longer period of time, the process builds up to a high memory usage.
I am not hosting any databases at all (clean reinstall), just running a plain RavenDB installation as a Windows service.
What could be causing this and how can I avoid it?

What is normal Azure WaIISHost.exe Memory Usage?

I have recently installed New Relic server monitoring on our Azure web role. The role is a small instance. We are on OS v4 (Windows Server 2012 R2) using the 2.2 Service Runtime.
Looking at memory usage, I notice that WaIISHost.exe (which I understand to be Azure-related) is reported as consuming 219 MB (down from a peak of 250 MB) via New Relic. Is that a lot of memory for it? Can I reduce it? It just seemed like a lot to be taking up.
CPU usage seems to spike aperiodically at about 4% for it. However, CPU isn't really an issue, as my instance rarely goes above 50%.
First off, why do you care how much memory a process is taking up? All of that memory will be paged out to disk, and assuming it isn't being paged back in regularly, all it does is take up page file space, which is usually irrelevant.
The WaIISHost process runs your role entry point code (OnStart, Run, StatusCheck, Changing, etc.), which is typically implemented in WebRole.cs. If you want to reduce the memory footprint of this process, reduce the amount of memory loaded by your role entry point code, as sketched below.
See http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx for more information about the WaIISHost.exe process and what it does.
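To make "role entry point code" concrete, here is a minimal hedged sketch of a WebRole.cs; anything allocated here is held by WaIISHost.exe, not by the IIS worker process:

using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Caches, clients or configuration created here live in the
        // WaIISHost.exe process for the lifetime of the role instance,
        // so trimming allocations here trims that process's footprint.
        return base.OnStart();
    }
}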

IIS CPU is at 95% usage with very few users - on production

I have a website and I am using IIS as my web server. I noticed that on the production server the CPU reaches 95% usage pretty fast with very few users. I don't see this behaviour on my development server. I am using Visual Studio to develop, and IIS as my local web server as well.
How much traffic do you have on production compared to the development server? How do their parameters compare? Before starting a deep analysis of the application itself, I would identify all the infrastructure and environmental differences. Sometimes such problems happen because of some other software, like antivirus software running in the background...
Nevertheless, because it sounds more like an application problem, I would first check the Event Viewer for errors. Then I would start by monitoring a few performance counters, correlating the % Processor Time counter with Current Connections, Available Memory, # of Exceps Thrown / sec, % Time in GC and so on (see the sketch at the end of this answer). This kind of behaviour usually has one of the following causes:
excessive looping due to some logic error, like calling the same service again and again, or repeatedly trying to load or parse a malformed file, etc. This can be analyzed with dump analysis (see below).
high CPU usage due to the Garbage Collector: when memory usage is extensive (or there is even a memory leak), the GC may consume more and more CPU fighting the memory shortage. You will see this in the memory-related performance counters.
a considerable number of exceptions thrown (for example due to environmental problems like network unavailability, or production data differences) can also consume a lot of CPU. The Event Viewer and the exception-related performance counters (as exceptions can be handled silently by your application) should be an indicator here.
To analyze your application further, I suggest taking a full memory dump during high CPU usage. You can do that with the Debug Diag tool. Please refer to this IIS troubleshooting guide for details.
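As a hedged sketch of the counter monitoring mentioned above (instance names like "w3wp" and "_Total" are assumptions; adjust them to your process and site):

using System;
using System.Diagnostics;
using System.Threading;

class CounterWatch
{
    static void Main()
    {
        // Windows-only counters; categories and counter names as listed above.
        var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        var gc  = new PerformanceCounter(".NET CLR Memory", "% Time in GC", "w3wp");
        var exc = new PerformanceCounter(".NET CLR Exceptions", "# of Exceps Thrown / sec", "w3wp");

        while (true)
        {
            // Rate counters return 0 on the first NextValue() call, so sample in a loop.
            Console.WriteLine($"CPU={cpu.NextValue():F1}%  TimeInGC={gc.NextValue():F1}%  Exceps/s={exc.NextValue():F1}");
            Thread.Sleep(1000);
        }
    }
}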
