We are using:
spring-cloud-starter-zuul:jar:1.2.6.RELEASE
spring-boot-starter-web:jar:1.4.1.RELEASE
embedded tomcat version: 8.0.43
OpenJDK JRE 1.8.0_101
spring-cloud-starter-eureka:jar:1.2.6.RELEASE
After running load tests on our zuul gateway, we observe that the memory it consumes keeps increasing:
pcf metrics:
I took a heap dump before and after the load test. The biggest increase in uncollected objects we noted:
heap dump after load test:
It looks like those objects are responsible for (or are) requests for some static data, which lives in one of the microservices behind the zuul proxy.
Our zuul is a vanilla one, without any custom code.
We googled a little and tried to follow this, and configured -Djdk.nio.maxCachedBufferSize=262144 in JAVA_OPTS, but without any effect.
Our stress tests use Gatling. Its report says all the requests completed with OK status (11 KO: those are AWS S3 image timeouts, which do not go through our zuul).
This looks like a memory leak, but we cannot find where it is.
I have a Node.js project that was running fine on my local machine using the pm2 process manager. I have now migrated my code to a micro EC2 instance, and when I started my server using pm2 it crashed. I checked memory consumption using free -m and found that node consumes almost 855 MB of the 1 GB of RAM. I refactored my code, starting from scratch, and found that as I add more files and modules the memory used by Node.js increases. I changed the swappiness limit to 15 instead of 50. I also tried using my hard disk as swap. Although that works, Node.js consumes more memory as I add more files and functionality to my project. Is this behaviour normal? If not, how can I debug it? Finally, are there any tweaks to Linux or the Node.js infrastructure to minimize the memory footprint?
My server is not running any databases or anything. Only nodejs app.
Here are the value code metrics:
And here is the output of the free -m command:
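For comparison with the system-wide numbers from free -m, the process can report its own memory through the built-in process.memoryUsage(). The following is only a minimal sketch in TypeScript; the helper names toMb and formatMemory and the log format are made up for illustration and are not part of the original project:

```typescript
// Sketch: print the Node.js process's own view of its memory, in MB.
// toMb and formatMemory are hypothetical helper names chosen for this example.

type MemorySnapshot = {
  rss: number;       // resident set size: memory held in RAM by the process
  heapTotal: number; // V8 heap currently reserved
  heapUsed: number;  // V8 heap currently in use by JS objects
  external: number;  // memory of C++ objects bound to JS objects
};

/** Convert a byte count to megabytes, rounded to one decimal place. */
function toMb(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

/** Render a snapshot as a single log line. */
function formatMemory(m: MemorySnapshot): string {
  return (
    `rss=${toMb(m.rss)}MB heapTotal=${toMb(m.heapTotal)}MB ` +
    `heapUsed=${toMb(m.heapUsed)}MB external=${toMb(m.external)}MB`
  );
}

// process.memoryUsage() is a built-in Node.js call.
console.log(formatMemory(process.memoryUsage()));
```

Logging this line periodically while adding modules back one at a time would show whether the growth is in the JS heap (heapUsed) or elsewhere in the process (rss growing while heapUsed stays flat).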
I'm running an API server using NodeJS 6.10.3 LTS on Ubuntu 14.04 (trusty). I've noticed that my API server tops out at ~600 reqs/min running on a c4.large EC2 instance. By tops out I mean that I see the CPU go up to 100%. Note: I know that I'm not fully utilizing the instance, since I'm not using the cluster module, but that's OK for now.
I took a .cpuprofile dump of my API server for 10 seconds, and noticed that every second, for ~300ms, the profiler shows my NodeJS code is sitting (idle).
Does anyone know what that (idle) implies? Is it a GC issue? Or is it an internal (to V8) lock that I'm triggering? Any help or pointers to tools to help debug this would be nice. I'm working on anonymizing some of the stack traces in the cpuprofile so I can share it.
The packages I'm using are mainly ExpressJS 4, the Couchbase NodeJS SDK and Socket.IO. The code paths mainly read requests and push to Couchbase, then query Couchbase via the Views API and push some aggregated data on a Socket.IO channel. So it is all pretty I/O async friendly stuff. I've made sure that I'm not calling any synchronous functions. There are no patterns of function calls before the (idle) in the CPU profile.
It could also just be I/O wait, meaning none of the sockets have data ready to read yet and so the time is spent idle. If you are using a load testing library you should check that the requests are evenly distributed within a second.
Take a look at https://www.npmjs.com/package/gc-stats to check GC data. There are flags to increase heap space, and to change when GC runs, if the problem turns out to be GC related.
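The evenness check suggested above can be approximated by bucketing request arrival times by their position within the second. This is a rough sketch in TypeScript, assuming you already record an arrival timestamp in milliseconds per request (for example Date.now() in a middleware); arrivalsPerSlot and peakToMean are hypothetical helper names:

```typescript
// Sketch: see whether requests arrive evenly within each second.
// Input: arrival timestamps in milliseconds (how they are collected is up to you).

/** Count arrivals per slot within the second, e.g. 10 slots of 100 ms each. */
function arrivalsPerSlot(timestampsMs: number[], slots: number = 10): number[] {
  const counts: number[] = new Array<number>(slots).fill(0);
  const slotWidth = 1000 / slots;
  for (const t of timestampsMs) {
    const offset = ((t % 1000) + 1000) % 1000; // position inside its second
    const index = Math.min(slots - 1, Math.floor(offset / slotWidth));
    counts[index] += 1;
  }
  return counts;
}

/** Ratio of the busiest slot to the average slot; 1 means perfectly even. */
function peakToMean(counts: number[]): number {
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0;
  return (Math.max(...counts) * counts.length) / total;
}

const counts = arrivalsPerSlot([0, 50, 120, 980, 1010, 1990]);
console.log(counts, peakToMean(counts));
```

A ratio well above 1 would suggest that the load generator sends requests in bursts, which would be consistent with the profiler showing (idle) for part of every second.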
I have performed some performance tests on WSO2 APIM on both the WebServices (WSDL) and Gateway interfaces. Everything went well on the gateway one; however, I am facing odd behavior when using the WebServices one.
Basically I created a test that adds a user, changes its password and deletes it, and ran the test plan using 64 threads. At the very beginning my throughput increases a lot, until all 64 threads are running (the throughput peak was 1600 req/s). However, after that the throughput starts to decrease for no apparent reason.
All 64 threads are still active and running, and CPU usage on the machine hosting wso2am drops. It seems that APIM has given up handling the requests even though it has threads and processors available for that.
The picture below shows the vmstat result for the processor (user, system and idle), context switches and interrupts. It is possible to see that CPU usage and context switches follow the throughput.
And the next picture illustrates the JMeter test result at the end (after the throughput decreased).
Basically what I need is a clue as to what may be the reason for such behavior. I have already tried increasing the thread pool on both wso2am and tomcat, but it has no effect. It is as if the requests were not arriving at all, even though JMeter has plenty of capacity and had already sent a higher throughput before.
I would bet that a simple configuration change on tomcat or wso2 is the answer. Any help is appreciated.
Thanks and Regards
It may be due to JMeter not being able to send the requests fast enough. Try the following steps:
Upgrade JMeter to the latest version (3.1 as of now). You can get the most recent JMeter distribution from the JMeter download page.
Run your test in command-line non-GUI mode. JMeter GUI can be used for tests development and/or debugging only, it is not designed for running load tests.
Remove (or disable) all the listeners during test execution. Later on you can open JMeter GUI, add the listener of your choice, load .jtl results file and perform analysis or create an HTML Reporting Dashboard out of results file
See the article 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure for a detailed explanation of the points above and a few more tips on configuring JMeter for maximum performance and throughput.
I'm running a node.js express application in production. After a few hours of running, in a heap snapshot I can see that there are more than 10 huge TLSWrap objects per worker (these are the largest objects in the application).
Some Technical Aspects
I'm running forever with the cluster module (2 workers).
The application runs inside an AWS EC2 large instance.
Most of the tasks per request are getting data from redis and sending some requests (events) to another server.
Normal memory usage: ~450MB, after a few hours suddenly: 3.5GB (then there is too much latency and my load balancer removes this machine). See Memory usage graph.
Normal CPU usage: 16%, during the memory leak: 99%.
What I've Tried Already
Code refactoring with memory leak problems in mind (closures, big objects and minimal string concatenation).
Upgrading node all the way from v0.12.7 through v4.1.1 and v4.1.2 to v4.2.0.
Some Interesting Insights
The growth of memory usage is not linear but exponential, and it happens suddenly and very fast.
I have both permanent instances and also auto-scaling instances (same type) and this memory leak occurs at the same time on all machines.
Traffic (# requests) is not higher than usual during the memory leak.
I've read that sometimes these problems can be the result of letting the application continue running after an uncaughtException, but my uncaughtException handler just logs the error and then immediately calls process.exit(). Isn't that the same as when node crashes and forever automatically restarts it?
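For reference, the log-then-exit pattern described above might look roughly like this in TypeScript. It is only a sketch of the pattern, not the actual handler from the application; makeFatalHandler is a hypothetical helper, and the non-zero exit code 1 is an assumption (the post calls process.exit() without saying which code it uses):

```typescript
// Sketch of an uncaughtException handler that logs and exits immediately,
// leaving the restart to the process supervisor (forever, in the post).

type LogFn = (message: string) => void;
type ExitFn = (code: number) => void;

/** Build a handler that logs the error and then terminates the process. */
function makeFatalHandler(log: LogFn, exit: ExitFn): (err: Error) => void {
  return (err: Error): void => {
    log(`uncaughtException: ${err.stack || err.message}`);
    // Assumption: exit with a non-zero code so the supervisor sees a failure.
    exit(1);
  };
}

process.on(
  "uncaughtException",
  makeFatalHandler(
    (message) => console.error(message),
    (code) => process.exit(code)
  )
);
```

Taking the log and exit functions as parameters is just a design choice to make the handler easy to exercise without actually killing the process.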
I have another application that's:
Running from the same AWS EC2 AMI.
Has a larger number of requests per second.
Has the uncaughtException handler (with process.exit()), too.
But no memory leaks at all!
Any ideas?
Thanks,
I believe that your memory leak is caused by something other than the TLSWrap objects, probably in your application layer.
According to this recently closed node issue, https://github.com/nodejs/node/issues/4250, TLSWrap has been incorrectly reporting its size as a large number (a pointer cast to an int). The actual size of TLSWrap objects is much smaller.
I was also seeing very large TLSWrap objects in my heapdumps, but after upgrading to node 5.3.0 (which includes the fix, https://github.com/nodejs/node/pull/4268), I can confirm that they are now correctly shown as quite small in my heapdumps.
Background
I have a relatively simple node js application (essentially just expressjs + mongoose). It is currently running in production on an Ubuntu Server and serves about 20,000 page views per day.
Initially the application was running on a machine with 512 MB memory. Upon noticing that the server would essentially crash every so often I suspected that the application might be running out of memory, which was the case.
I have since moved the application to a server with 1 GB of memory. I have been monitoring the application and within a few minutes the application tends to reach about 200-250 MB of memory usage. Over longer periods of time (say 10+ hours) it seems that the amount keeps growing very slowly (I'm still investigating that).
I have since been trying to figure out what is consuming the memory. I have been going through my code and have not found any obvious memory leaks (for example unclosed db connections and such).
Tests
I have implemented a handy heapdump function using node-heapdump and I have now enabled --expose-gc to be able to manually trigger garbage collection. From time to time I try triggering a manual GC to see what happens with the memory usage, but it seems to have no effect whatsoever.
I have also tried analysing heapdumps from time to time, but I'm not sure if what I'm seeing is normal or not. I do find it slightly suspicious that there is one entry with 93% of the retained size, but it just points to "builtins" (not really sure what that signifies).
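The manual GC check described above could be sketched as follows in TypeScript. This is an illustration under assumptions, not the code from the application: heapUsedAroundGc and dumpHeap are hypothetical names, and dumpHeap uses the built-in v8.writeHeapSnapshot (available in newer Node.js releases) in place of node-heapdump:

```typescript
import * as v8 from "v8";

const BYTES_PER_MB = 1024 * 1024;

/**
 * Force one GC and report heapUsed before and after, in MB.
 * Returns null when gc() is not available, i.e. node was not
 * started with --expose-gc.
 */
function heapUsedAroundGc(): { beforeMb: number; afterMb: number } | null {
  const gc: unknown = (globalThis as any).gc;
  if (typeof gc !== "function") return null;
  const before = process.memoryUsage().heapUsed;
  (gc as () => void)();
  const after = process.memoryUsage().heapUsed;
  return { beforeMb: before / BYTES_PER_MB, afterMb: after / BYTES_PER_MB };
}

/** Write a heap snapshot to the given path (swapped in for node-heapdump). */
function dumpHeap(path: string): string {
  return v8.writeHeapSnapshot(path);
}

const gcResult = heapUsedAroundGc();
console.log(
  gcResult === null
    ? "gc() is not exposed; start node with --expose-gc"
    : `heapUsed before=${gcResult.beforeMb.toFixed(1)}MB after=${gcResult.afterMb.toFixed(1)}MB`
);
```

If heapUsed barely moves after a forced GC, the objects are still reachable from somewhere, so comparing two snapshots taken some time apart is likely to be more informative than triggering further collections.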
Upon inspecting the 2nd highest retained size (Buffer) I can see that it links back to the same "builtins" via a setTimeout function in some Native Code. I suspect it is cache or https related (_cache, slabBuffer, tls).
Questions
Does this look normal for a Node JS application?
Is anyone able to draw any sort of conclusion from this?
What exactly is "builtins" (does it refer to builtin js types)?