Too many TimerQueueTimer objects getting allocated and consuming high memory

I have a Web API, deployed in Service Fabric, that connects to Azure Service Bus to write messages to topics. I see unusually high memory usage, and most of the memory is consumed by TimerQueueTimer objects, although I have not explicitly initialized any Timer.
When I trace back how these timers are created, I see the following reference chains:
Microsoft.ApplicationInsights.Extensibility.Implementation.TelemetryModules.<Modules>k__BackingField ->
Microsoft.ApplicationInsights.Extensibility.Implementation.SnapshottingList<Microsoft.ApplicationInsights.Extensibility.ITelemetryModule>.Collection ->
System.Collections.Generic.List<Microsoft.ApplicationInsights.Extensibility.ITelemetryModule>._items ->
Microsoft.ApplicationInsights.Extensibility.ITelemetryModule[16] at [4] ->
Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.PerformanceCollectorModule.timer ->
Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.Implementation.Timer.Timer.timer ->
System.Threading.Timer.m_timer
--------------------------------------------
Microsoft.Azure.ServiceBus.TopicClient.<ServiceBusConnection>k__BackingField ->
Microsoft.Azure.ServiceBus.ServiceBusConnection.<ConnectionManager>k__BackingField ->
Microsoft.Azure.Amqp.FaultTolerantAmqpObject<Microsoft.Azure.Amqp.AmqpConnection>.taskCompletionSource ->
System.Threading.Tasks.TaskCompletionSource<Microsoft.Azure.Amqp.AmqpConnection>.m_task ->
System.Threading.Tasks.Task<Microsoft.Azure.Amqp.AmqpConnection>.m_result ->
Microsoft.Azure.Amqp.AmqpConnection.heartBeat ->
Microsoft.Azure.Amqp.AmqpConnection+HeartBeat+TimedHeartBeat.heartBeatTimer ->
System.Threading.Timer.m_timer ->
System.Threading.TimerHolder.m_timer ->
System.Threading.TimerQueueTimer.m_next ->
--------------------------------------------
Microsoft.ServiceFabric.Services.Communication.Client.CommunicationClientCache<ServiceClient.HttpCommunicationClient>.cacheCleanupTimer ->
System.Threading.Timer.m_timer ->
System.Threading.TimerHolder.m_timer
I'm not sure why these timers are never collected; all of them sit in Gen 2. Over time the memory spikes and the application becomes unresponsive. Any leads on when such Timer objects are never collected?

We were seeing a similar issue recently, primarily with TimerQueueTimer and Application Insights.
We upgraded Microsoft.ApplicationInsights.AspNetCore from 2.3.0 to 2.8.2 and that seems to have resolved the issue. Most likely it was #690, fixed in v2.4.0.
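For reference, the fix amounted to bumping the package version; assuming PackageReference-style project files, the .csproj entry would look like this:

<!-- Upgrade the Application Insights package past the 2.4.0 fix. -->
<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.8.2" />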

Related

Configuring Azure Application Insights with Terraform

I would like to configure alert rules in Azure Application Insights. Is this something that can be done using Terraform, or do I have to do it through the portal?
I would like to be alerted on the following:
Whenever the average available memory is less than 200 megabytes (Signal Type = Metrics)
Whenever the average process CPU is greater than 80 (Signal Type = Metrics)
Whenever the average server response time is greater than 5000 milliseconds (Signal Type = Metrics)
Whenever the count of failed requests is greater than 5 (Signal Type = Metrics)
Failure Anomalies - prodstats-masterdata-sandbox (Signal Type = Smart Detector)
You can do this with the Terraform azurerm provider, using the azurerm_monitor_metric_alert and azurerm_monitor_smart_detector_alert_rule resources; see the sketch below.
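As a minimal sketch (the resource names, the action group, and the exact metric_namespace/metric_name values are assumptions to adapt to your environment), the server-response-time alert and the Failure Anomalies rule could look like this:

# Sketch only: azurerm_application_insights.example and
# azurerm_monitor_action_group.example are assumed to exist elsewhere.
resource "azurerm_monitor_metric_alert" "server_response_time" {
  name                = "avg-server-response-time"
  resource_group_name = azurerm_resource_group.example.name
  # Scope the alert to the Application Insights component.
  scopes = [azurerm_application_insights.example.id]

  criteria {
    metric_namespace = "microsoft.insights/components"
    metric_name      = "requests/duration"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 5000
  }

  action {
    action_group_id = azurerm_monitor_action_group.example.id
  }
}

resource "azurerm_monitor_smart_detector_alert_rule" "failure_anomalies" {
  name                = "Failure Anomalies - prodstats-masterdata-sandbox"
  resource_group_name = azurerm_resource_group.example.name
  severity            = "Sev3"
  scope_resource_ids  = [azurerm_application_insights.example.id]
  frequency           = "PT1M"
  detector_type       = "FailureAnomaliesDetector"

  action_group {
    ids = [azurerm_monitor_action_group.example.id]
  }
}

The other three metric alerts follow the same azurerm_monitor_metric_alert pattern with different metric_name, operator, and threshold values.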

ekg-core/GHC RTS: bogus GC stats when running on Google Cloud Run

I have two services deployed on Google Cloud infrastructure: Service 1 runs on Compute Engine and Service 2 on Cloud Run, and I'd like to log their memory usage via the ekg-core library (https://hackage.haskell.org/package/ekg-core-0.1.1.7/docs/System-Metrics.html).
The logging bracket is similar to this:
mems <- newStore
registerGcMetrics mems
void $ concurrently io (loop mems)
  where
    loop ms = do
      m <- sampleAll ms
      -- ... (look up the gauges from m and log their values)
      threadDelay dt
      loop ms
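Not part of the original post, but for context, a minimal sketch of how the elided lookup could be written, assuming the standard ekg-core types (Sample is a HashMap from Text keys to Value; the key names come from registerGcMetrics):

import Data.Int (Int64)
import qualified Data.HashMap.Strict as HM
import qualified Data.Text as T
import System.Metrics (Sample, Value (..))

-- Look up a named gauge in a sample and print it; the logging target is
-- illustrative, not from the original code.
logGauge :: Sample -> T.Text -> IO ()
logGauge sample key =
  case HM.lookup key sample of
    Just (Gauge n) -> putStrLn (T.unpack key ++ " = " ++ show (n :: Int64))
    _              -> putStrLn (T.unpack key ++ ": missing or not a gauge")

Inside the loop one would then call, e.g., logGauge m (T.pack "rts.gc.current_bytes_used").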
I'm very puzzled by this: both the rts.gc.current_bytes_used and rts.gc.max_bytes_used gauges return a constant 0 for Service 2 (the Cloud Run one), even though I'm using the same sampling/logging functionality and build options for both services. I should add that the concurrent process in concurrently is a web server, and I expect the base memory load to be around 200 KB, not 0 B.
This is about where my knowledge ends; could this behaviour be due to the Google Cloud Run hypervisor ("gVisor") implementing certain syscalls in a non-standard way (gVisor syscall guide : https://gvisor.dev/docs/user_guide/compatibility/linux/amd64/) ?
Thank you for any pointers to guides/manuals/computer wisdom.
Details:
Both are built with these options:
-optl-pthread -optc-Os -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
The only difference is that Service 2 has an additional flag, -with-rtsopts=-M2G, since Cloud Run services must work with at most 2 GB of memory.
The container OS in both cases is Debian 10.4 ("Buster").
Thinking a bit longer about this, the behaviour is perfectly reasonable in the "serverless" model: resources (both CPU and memory) are throttled down to 0 when the service is not processing requests [1], which is exactly what ekg picks up.
Why logs are printed out even outside of requests is still a bit of a mystery, though...
[1] https://cloud.google.com/run/docs/reference/container-contract#lifecycle

Remote Procedure Call (RPC) errors only thrown when a user is connected via Remote Desktop

I'm trying to resolve the following Remote Procedure Call (RPC) error that occurs when an unattended Windows Service application attempts to create a new Excel file via Interop.
Exception: System.Runtime.InteropServices.COMException (0x800706BE): The remote procedure call failed. (Exception from HRESULT: 0x800706BE)
at Microsoft.Office.Interop.Excel.Workbooks.Add(Object Template)
Here's the code throwing the exception. For context, I included a few lines before and after the culprit code.
' Create Excel objects
Dim objExcelApp As New Excel.Application
Dim objExcelBooks As Excel.Workbooks = objExcelApp.Workbooks 'This is the specific line it fails on every time.
Dim objExcelBook As Excel.Workbook = objExcelBooks.Add
Dim objExcelSheet As Excel.Worksheet = objExcelApp.ActiveSheet
Although this code worked without any issues for the past 7 years, this error started happening immediately upon migrating the service from Windows Server 2008 R2 to Windows Server 2016. Through much trial, error, and hair pulling, I finally discovered that this error is only thrown when someone is connected to the server via Remote Desktop. If no one is connected via Remote Desktop, everything works flawlessly.
What I can't figure out is why does this error only occur when someone is connected to the server via Remote Desktop?
Here are a few things I've tried so far.
Added service account to the Administrators group.
Component Services -> Computers -> My Computer -> Properties -> COM Security -> Added user to all permissions.
Component Services -> Computers -> My Computer -> DCOM Config -> Microsoft Excel Application -> Properties -> Security -> Added user to all permissions.
Component Services -> Computers -> My Computer -> DCOM Config -> Windows Management and Instrumentation -> Properties -> Security -> Added user to all permissions.
Excel -> File -> Options -> Advanced -> Checked "Ignore other applications that use Dynamic Data Exchange (DDE)"
Tried switching from early binding to late binding.
Inspired by Caius Jard's suggestions, I eventually settled on ClosedXML. It's fast, lightweight, free to use for commercial purposes, and doesn't rely on an Excel installation or the Excel Interop library to function. The best part: not only did it resolve my issue, it also takes fewer lines of code to accomplish the same things.
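As a rough sketch of the replacement (the file path, sheet name, and cell value are illustrative, not from the original code):

' Requires the ClosedXML NuGet package; no Excel installation or Interop needed.
Imports ClosedXML.Excel

Module Example
    Sub Main()
        ' XLWorkbook is disposable, so wrap it in a Using block.
        Using wb As New XLWorkbook()
            Dim ws As IXLWorksheet = wb.Worksheets.Add("Report")
            ws.Cell("A1").Value = "Hello from ClosedXML"
            wb.SaveAs("C:\Temp\Report.xlsx")
        End Using
    End Sub
End Module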

App Engine Google Cloud Storage - Error 500 when downloading a file

I'm getting a 500 error when I download a JSON file (approx. 2 MB) using the nodejs-storage library. The file downloads without any problem, but once I render the view and pass the file contents as a parameter, the app crashes with "The server encountered an error and could not complete your request."
file.download(function(err, contents) {
  // Surface download errors instead of parsing undefined contents.
  if (err) {
    return res.status(500).send(err.message);
  }
  var messages = JSON.parse(contents);
  res.render('_myview.ejs', {
    "messages": messages
  });
});
I am using the App Engine Standard Environment and have this further error detail:
Exceeded soft memory limit of 256 MB with 282 MB after servicing 11 requests total. Consider setting a larger instance class in app.yaml
Can someone give me a hint? Thank you in advance.
500 error messages are quite hard to troubleshoot because of all the possible scenarios that could go wrong with App Engine instances. A good way to start debugging this type of error with App Engine is to go to Stackdriver Logging, query for the 500 error messages, click the expander arrow, and check the specific error code. In the specific case of the Exceeded soft memory limit... error message in the App Engine standard environment, my suggestion would be to choose an instance class better suited to your application's load.
Assuming you are using automatic scaling, you could try an F2 instance class (which has higher memory and CPU limits than the default F1) and start from there. Adding or modifying the instance_class element of your app.yaml to instance_class: F2 would suffice (see the sketch below), and from there you can move to whichever instance class best suits your application's load.
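A minimal app.yaml illustration (the runtime line is just an example; keep whatever your app already declares):

# Bump the instance class from the default F1 to F2.
runtime: nodejs10
instance_class: F2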
Notice that increasing the instance class directly affects your billing; you can use the Google Cloud Platform Pricing Calculator to estimate the costs associated with using a different instance class for your App Engine application.

How to access counters from a Reporting Task in NiFi 1.2.0

In NiFi 1.4.0, in order to access counters from a scripted reporting task, you can do something like:
context.eventAccess.controllerStatus.processGroupStatus.each { pg ->
  pg.processorStatus.each { ps ->
    ps.counters.each { counter ->
      System.out.println("${counter.key} -> ${counter.value}")
    }
  }
}
That's because the ProcessorStatus API exposes:
Map<String,Long> getCounters()
However, I'm on NiFi 1.2.0, which does not have this method on ProcessorStatus.
I'm desperately searching for a way to access the counters from a ReportingTask (so, through a ReportingContext, because that's what's bound within the script), because my ReportingTask reports metrics to our Graphite server.
Any ideas?
I know I could access the metrics through the REST API, but that would completely defeat the purpose of my ScriptedReportingTask, and I would need to set up an additional piece of software to collect these metrics from outside, while the ReportingTask just runs inside NiFi.
I believe you will have to upgrade to 1.4.0 in order to get access to the getCounters method; there is no other way I know of through a ReportingContext, which is why it was added in 1.4.0 as part of this JIRA:
https://issues.apache.org/jira/browse/NIFI-106
