How to stop the memory leak in "com.newrelic.agent.Transaction" - memory-leaks

I integrated newrelic-api into my service purely for monitoring, but over the past few days the service has been going down with OOM errors. I read a post on New Relic's forum which basically says that it's a problem with my service and suggests increasing -Xmx: https://discuss.newrelic.com/t/relic-solution-troubleshooting-java-agent-memory-leaks-and-outofmemoryerror-issues-with-eclipse-memory-analyzer-tool/59596 Mine is a tiny service that doesn't need much memory, and I have already set -Xmx to 3GB.
The dependency I added to the service is shown below. Beyond that, all I do is add newrelic.jar to the Java options in my Dockerfile: -javaagent:tmp/newrelic/newrelic.jar
There are multiple questions here. This tool is good to have but not critical for me, so I can't spend much time debugging the issue. If this is a known issue, please let me know whether there is any configuration I can add to or remove from newrelic.yml. I am currently using the default settings (common: &default_settings).
Should I downgrade New Relic, since I am not using any of its advanced features? Or should I upgrade to the latest version?
<dependency>
    <groupId>com.newrelic.agent.java</groupId>
    <artifactId>newrelic-api</artifactId>
    <version>5.7.0</version>
</dependency>
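Before changing anything in newrelic.yml, it may be worth confirming that the -Xmx value set in the Dockerfile actually reaches the JVM inside the container. The following is a minimal sketch using only the standard java.lang.management API. The class name HeapSnapshot is hypothetical and is not part of the New Relic API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of New Relic): reports the heap figures
// of the JVM it runs in.
final class HeapSnapshot {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use, in megabytes. */
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / MB;
    }

    /** Maximum heap the JVM will attempt to use (set by -Xmx or the default), in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %d MB of max %d MB%n", usedHeapMb(), maxHeapMb());
    }
}
```

If the reported maximum is far below 3 GB, the option is probably not being passed to the process. Logging the used figure periodically would also show whether usage climbs steadily, which is what a leak tends to look like.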
This is what Eclipse MAT shows

Related

Debugging an Out of Memory Error in Node.js

I'm currently working on a Node.js project and my server keeps running out of memory. It has happened 4 times in the last 2 weeks, usually after about 10,000 requests. This project is live and has real users.
I am using:
Node.js 16
Google Cloud Platform's App Engine (instances have 2048 MB of memory)
Express as my server framework
TypeORM as database ORM (the database is PostgreSQL, hosted on a separate GCP SQL instance)
I have installed the GCP profiling tools and have captured the app running out of memory, but I'm not quite sure how to use the results. It almost looks like there is a memory leak in the _handleDataRow function within the pg client library. I am currently using version 8.8.0 of the library (8.9.0 was just released a few weeks ago and doesn't mention fixing any memory leaks in the release notes).
I'm a bit stuck with what I should do at this point.
Any suggestions or advice would be greatly appreciated! Thanks.
Update: I have also cross-posted to Reddit, and someone there helped me determine that the issue is related to large queries with many joins. I was able to reproduce the issue and will report back here once I am able to solve it.
When using App Engine, a great place to start looking for why a problem occurred in your app is the Logs Explorer, particularly if you know the time frame in which the issues started escalating or the crash occurred.
Based on your Memory Usage graph, though, it's a slow leak, so a top-to-bottom review of your back end is really necessary to pinpoint the culprit. I would go through the whole stack and look for things like globals that are set and never cleaned up, promises that are not being returned, and large result sets from the database that are bottlenecking the server, perhaps from a scheduled task.
Looking at the 2pm - 2:45pm range on the right-hand of the graph, I would narrow the Logs Explorer down to that exact time-frame. Then I would look for the processes or endpoints that are being utilized most frequently in that time-frame as well as the ones that are taking the most memory to get a good starting point.

Understanding DebugDiag Tool

I have been trying to understand the cause of high memory usage by processes on my Windows server. I installed DebugDiag 1.2 to try to find the problem.
Here is what runs on my server:
I have an IIS server with a fairly large number of application pools (68). Each application pool hosts at least 4 applications.
Recently I have faced problems related to high memory usage, with the server running at 97% memory usage or higher.
It was working fine when I took the screenshot below. However, the memory usage easily climbs higher.
Task Manager:
With that being said, I have been trying to understand how to use Microsoft's DebugDiag 1.2 to find something (part of the source code, an SQL procedure) that might help me locate what is causing the problem.
I read that we can't limit the memory for each IIS application pool, so I guess the solution is to optimize the application. But first I need to know where to start.
I hope someone can help me out.

Jenkins running at very high CPU usage

I recently upgraded from Jenkins 1.6 to 2.5. After I did this, I noticed very high CPU usage, sometimes over 300% (there are only 4 cores, so I don't think it could go over 400%). I'm not sure where to begin debugging this, but here's a thread dump and some screenshots from top/htop
htop:
top:
As it turned out, my issue was that several jobs had thousands of old builds. This was fine in Jenkins 1.6 but it's a problem in 2.5 (I guess maybe Jenkins tries to load all the builds into memory when you view the job overview page). To fix it, I just deleted most of the old builds from the problem jobs using this strategy and then reloaded Jenkins. Worked like a charm!
I also set the "discard old builds" plugin to keep only the 50 most recent builds, to prevent this from happening again.
Whenever a request comes in, Jenkins spawns threads to serve it. After the upgrade, Jenkins may simply have been running under heavy load at that time. Please check the CPU and memory usage of the Jenkins server in the following scenarios:
Jenkins is idle and no other apps are running on the server.
A build is scheduled and no other apps are running on the server.
Comparing the two should help you determine whether Jenkins itself, or running Jenkins alongside other apps, is really causing the trouble.
As @vlp said, try monitoring the Jenkins application with JVisualVM, using a jstatd configuration to hook in. Refer to this link to configure JVisualVM with jstatd.
I have noticed a couple of reasons for abnormal CPU usage with my Jenkins install on Windows 7 Ultimate.
I had recently upgraded from v2.138 to v2.140 plus added a few additional plugins. I started noticing a problem with the Jenkins java executable taking up to 60% of my CPU time every time a job would trigger. None of the jobs were CPU bound, just grabbing data from external servers, so it didn't make any sense. It was fixed with a simple restart of the Jenkins service. I assume the upgrade just didn't finish cleanly.
Java garbage collection was throwing errors and hogging the CPU when running with the default memory settings. It was probably overkill, but I went wild and upped the Java heap space for Jenkins from the default 256 MB to 4 GB, which solved this problem for me. See this solution for instructions:
https://stackoverflow.com/a/8122566/4479786
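To judge whether garbage collection really is what is consuming the CPU, one option is to compare accumulated GC time with JVM uptime. The sketch below uses the standard GarbageCollectorMXBean API. The class name GcOverhead is hypothetical, and the code reports on the JVM it runs in, so it is an illustration of the measurement rather than a drop-in Jenkins tool. Tools such as VisualVM read the same MXBeans over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates how much of the JVM's lifetime went to GC.
final class GcOverhead {

    /** Approximate milliseconds spent in GC since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Accumulated GC time divided by JVM uptime; 0.0 when uptime is not yet measurable. */
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.1f%% of uptime)%n",
                totalGcMillis(), gcFraction() * 100.0);
    }
}
```

A fraction that stays high with a small heap and drops after raising -Xmx would be consistent with the heap having been too small, rather than with a CPU-bound job.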
2.5 seems to be a development release, while 1.6 is their Long Term Support version. Thus it seems logical that you should expect some regressions when using the bleeding edge version. The bounty on this question is proof that other users are experiencing this as well. The solution is to report a bug on the Jenkins bug tracker. You can temporarily downgrade to the known good version for now.
Try passing the following argument to Jenkins:
-Dhudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH=true
as mentioned here: https://issues.jenkins-ci.org/browse/JENKINS-52150

Tomcat 7 tuning on Linux: how to know what should be changed?

I have a big enterprise Java application running on several machines under Tomcat 7.
There are various performance problems, such as slow responses and server hangs.
I want to experiment with parameters such as maxThreads, maxConnections, acceptCount and so on.
But before changing them, how can I check that I am actually running out of connections, for example, and need to increase the limit? The same goes for everything else, such as whether acceptCount should be increased.
Typically, Apache Tomcat performance issues lie with the underlying JVM configuration. In my experience they are mainly with the size of the PermGen and other memory settings. I have been able to troubleshoot quite a few of them using VisualVM, which visualizes a lot of the JVM memory operations. I would also highly recommend JMeter.
IMHO maxThreads and other Tomcat-specific parameters have rarely been the source of application performance issues; the JVM settings are where most issues are.
Start with at least these settings:
-Xms1024M -Xmx2048m -XX:MaxPermSize=1024m
I would recommend finding the problem before starting to "fix" things.
There are several applications that can monitor your servers and show where the problems are. You can try AppDynamics, New Relic, Ruxit, or any other application monitoring product. (Some offer free versions, which come in handy.)
Then search for your bottlenecks. They can be anywhere (server, database, network, JVM, ...) depending on your application and your architecture.
Once you find the problem, you can start fixing it.
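As one concrete way of measuring before tuning, the connector's thread usage can be read over JMX and compared with its configured maximum. This is a rough sketch, not a definitive recipe. It assumes Tomcat registers its connectors under the object name pattern Catalina:type=ThreadPool,name=* and exposes the attributes currentThreadsBusy and maxThreads, and that the code runs inside the Tomcat JVM (for example from a servlet). The class name ConnectorSaturation is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: lists busy versus maximum threads for each Tomcat connector.
final class ConnectorSaturation {

    /** One line per connector found; an empty list when no Tomcat MBeans are registered. */
    static List<String> report(MBeanServer server) {
        List<String> lines = new ArrayList<>();
        try {
            // Assumption: the default "Catalina" JMX domain; an embedded Tomcat may use another.
            ObjectName pattern = new ObjectName("Catalina:type=ThreadPool,name=*");
            for (ObjectName pool : server.queryNames(pattern, null)) {
                Object busy = server.getAttribute(pool, "currentThreadsBusy");
                Object max = server.getAttribute(pool, "maxThreads");
                lines.add(pool.getKeyProperty("name") + ": "
                        + busy + " busy of " + max + " max threads");
            }
        } catch (JMException e) {
            throw new IllegalStateException("Could not read connector MBeans", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints nothing unless this runs in the same JVM as Tomcat.
        report(ManagementFactory.getPlatformMBeanServer()).forEach(System.out::println);
    }
}
```

If the busy count regularly sits at the maximum during slow periods, raising maxThreads is worth testing. If it stays well below the maximum, the bottleneck is more likely elsewhere.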
Good luck!

Meteor Node Process CPU Usage Nears 100%

I'm having trouble with my Meteor app when it gets to its peak amount of traffic (peak for this is nothing, 1k visits, maybe 2,500 pageviews in a day). CPU usage spikes and never recovers, so I've taken to using Nodetime to monitor usage and I've been reloading the process (forever restart) to get things back to normal.
I'm fairly new to profiling, so finding the underlying cause has me at a loss for where to start. I'm fairly certain it has to do with my app's server code, but the profiling seems to point to the Fibers module as a "hotspot" which I understand aids in making my server code synchronous.
Below is a snippet from the profiling results. I hope someone can guide me in the right direction in troubleshooting this!
While I don't have a specific answer to your question, I have experience dealing with CPU issues in our production Meteor app, so I can give you a list of things to investigate.
Upgrade to the latest version of meteor and the appropriate node version (see the changelog). As of this writing that's meteor 0.8.2 and node 0.10.28.
Read this and this article. The latter makes a great point that you really should always try to delay activation of subscriptions until you need them. In particular you may not need to publish anything for users who are not logged in. In my experience, meteor CPU problems have everything to do with subscriptions.
Be careful with observe and observeChanges. These are expensive and are easy to abuse. In particular:
Make sure you are calling stop() on your handles when they are no longer needed (consider using a package like publish-with-relations so this is done for you).
Fetch only the collections and fields that you absolutely need. Observe works by continually diffing objects (requires lots of CPU). The fewer and smaller objects you have, the less there is to compute.
Consider using smart-collections before it is retired. Use oplog tailing - this can make for a night and day difference in performance and CPU usage in your app.
Consider making some things not reactive (also mentioned in the articles above). For us that was a big win. We had one extremely expensive join that was used on two frequently accessed pages on the site. When it got to the point where the CPU was pegged at 100% about every 30 minutes I gave up on reactivity for that element and just did the join on the server and shipped the data to the client via a method call. I also created a server-side expiring cache for these results and stored them by user (special thanks to Matt DeBergalis for this suggestion).
Do a preventative nightly restart. I have a cron job that tells forever to restart our app once a day in the middle of the night. That brings the CPU down from ~10% to 1%. This seems like black magic, but the fact that the CPU usage changes after a reset leads me to believe this is a good idea.
Updated thoughts (1/13/14)
We migrated to oplog tailing as soon as it was available (meteor 0.7) and that made a big difference. Note that in order to get access to the oplog, you'll probably need to either host your own db or run a dedicated instance on the hosting provider of your choice. I'd also recommend adding the facts package to actually tell if it's working.
There was a memory leak discovered in publish-with-relations, and as of this writing the atmosphere version (v0.1.5) hasn't been bumped to reflect these changes. If you are using it in production, I strongly recommend checking out the HEAD version and running it locally.
We stopped doing nightly restarts a couple of weeks ago. So far everything has been fine (fingers crossed).
Updated thoughts (7/2/14)
A few months ago we switched over to using an Elastic Deployment on mongohq. It's affordable, the performance has been great, and they even have a blog post which tells you how to enable oplog tailing.
I'd strongly recommend checking out kadira to help diagnose performance issues in your app. Also check out the academy articles which have a number of good tips in them.
I'm also having this problem. There is actually an issue with 0.6.6.1; I now run meteor --release 0.6.6 and the CPU is back to normal.
