AWS Lambda Memory vs Execution time

AWS Lambda Memory vs Execution time - node.js

I have a (Nodejs 4.3) lambda function that I have tested with several memory limit settings (128, 256, 512).
As I pull up the memory limit, the execution time decreases as expected. However the max memory used also goes down. Every time I reduce the memory limit the execution time and max memory used go back up.
Any thoughts? Trying to figure out how to hit the execution time I need while not over paying.

This is the Node VM utilizing memory. If it's available, it's going to use it (and you could probably reproduce this scenario locally with VMs or Docker). I wouldn't worry too much and I wouldn't recommend trying to instruct the VM on what to do or even thinking about garbage collection (which is not easy with Node.js anyway). Node is very opportunistic about memory is all I can say. I would select the amount you need in order to get a reasonable response time and leave it at that.
I would also probably have to imagine that on "warm" runs your speed will increase when you have a lower amount of memory set, but a "cold" run will be higher. So in production, it may not be as big of a concern.
You may wish to start profiling your code and trying to optimize it...But again, Node doesn't really want the developer worrying about the resources on a machine. It tries to optimize for you. It's a bit unfortunate when you're billed that way though. This is part of why I wish Go was natively supported with Lambda.

Related

How To implement swapfile cross operating system

There are use cases where I can't have a lot of ram, and sometimes due to docker based services doesn't always provide more than 512mb/1gb of ram, or if I run multiple rust based gui apps and if each take 100mb of ram normally, how can I implement a swapfile/ virtual ram to exceed allotted ram? Also os level swapfiles don't let users choose which app can use real ram and which swapfile, so it can become a problem too. I want to use swapfile as much as possible, and not even real ram, if possible. Users and hosting services provide with lot of storage usually (more than 10gb normally) so it would be a good way to use the available storage too!
If swapfile or anything like that aren't possible, I would like to know if there is any difference in speed and cpu consumption between "cache data in ram" apps and "cache data in file and read it when required" apps. If the latter is slow normally and not as efficient as swapfiles, I would like to know the possible ways how os manages to make swapfiles that efficient than apps.

An application does not control whether the memory they allocate is allocated on real RAM, on a swap partition, or else. You just ask for memory, and the OS is responsible for finding available memory to give to you.
Besides that, note that using swap (sometimes called swapping) is extremely bad performance-wise. How much depends a lot on your hardware, but it's about three orders of magnitude. This is even amplified if you are interacting with a user: a program that is fetching some resources will not be too bothered if it has to wait one minute to get them instead of a few milliseconds because the system is under heavy load, but a user will generally not be that patient.
Also note that, when swapping, the OS does not chose which application gets the faster RAM and which ones get the swap memory at random. It will try to determine which application should be prioritized, by how much, etc. based on how it was configured (at least for the Linux kernel), so in reality it's the user who, in the end, decides which applications get the most RAM (ahead of time, of course: they are not prompted each time the kernel has to make that decision with a little pop-up...).
Finally, modern OS allow several applications to allocate memory that overlap, as long as each application is not fully using the memory it asked for (which is kind of usual), allowing you to run applications that in theory require more RAM that you actually have.
This was on the OS part: now to the application part. Usually, when you write a program (whose purpose is not specifically RAM-related), you should not really care for memory consumption (up to a certain point), especially in Rust. Not only that is usually handled by the OS in case you used a little bit too much memory, but when it's possible, most people prefer to trade a little more memory usage (even a lot more) for better CPU performance, because RAM is a lot cheaper than CPU.
There are exceptions, of course, in which the memory consumption is so high that you can't really afford not paying attention. In these cases, either you let the user deal with this problem (ie. this application is known to consume a lot of memory because there are no other ways to do this, so if you want to use it, just have a lot of memory), as often video games do, or you rethink your application to reduce the memory usage trading it for some CPU efficiency, as for example is done when you are handling graphs so huge you couldn't even store them on all the hard disks of the world (in which case your application has to be smart enough to be able to work on small parts of the graph at the time), or finally you are working with a big resource but which can be stored on the hard disk, so you just write it on a file and access it chunks-by-chunks, as some database managers do.

Does RAM affect the time taken to sort an array?

I have an array of a 500k to million items to be sorted. Does going with a configuration of increased RAM be beneficial or not, say 8GB to 32GB or above. Im using a node.JS/mongoDB environment.

Adding RAM for an operation like that would only make a difference if you have filled up the available memory with everything that was running on your computer and the OS was swapping data out to disk to make room for your sort operation. Chances are, if that was happening, you would know because your computer would become pretty sluggish.
So, you just need enough memory for the working set of whatever applications you're running and then enough memory to hold the data you are sorting. Adding additional memory beyond that will not make any difference.
If you had an array of a million numbers to be sorted in Javascript, that array would likely take (1,000,000 * 8 bytes per number) + some overhead for a JS data structure = ~8MB. If your array values were larger than 8 bytes, then you'd have to account for that in the calculation, but hopefully you can see that this isn't a ton of memory in a modern computer.
If you have only an 8GB system and you have a lot of services and other things configured in it and are perhaps running a few other applications at the time, then it's possible that by the time you run nodejs, you don't have much free memory. You should be able to look at some system diagnostics to see how much free memory you have. As long as you have some free memory and are not causing the system to do disk swapping, adding more memory will not increase performance of the sort.
Now, if the data is stored in a database and you're doing some major database operation (such as creating a new index), then it's possible that the database may adjust how much memory it can use based on how much memory is available and it might be able to go faster by using more RAM. But, for a Javascript array which is already all in memory and is using a fixed algorithm for the sort, this would not be the case.

GC taking 32% of runtime expected?

Currently working on optimizing a library for speed. I've already reduced execution time drastically, using V8 CPU and Memory Profiling through Webstorm. This was achieved mainly by changing the core method from recursive to iterative.
Now the self time distribution breaks down as
I'm assuming the first entry "node" is timing internal functions calls, which is great. The other entries also make sense. I'm new to Nodejs profiling, but 31.6% for GC seems high, so I've decided to investigate.
I've now created a heap dump through Webstorm, but unfortunately that doesn't give me much information.
These seem to be system internal memory references mainly. Stepping through the core iteration code logic again, there also don't seem to be a lot of places where memory is explicitly allocated (using this as a reference).
Question
Can the GC overhead be reduced?
Is this amount of allocation just expected here?
Is it possible to get better memory profiling information?
Setup Instructions
In case someone want's to try debugging this, I'm including setup instructions.
Download or clone object-scan and run
yarn install --frozen-lockfile
yarn run test-simple --verbose
Now create a file test.js in the project root containing this content and run node --trace_gc test.js or run it through Webstorm for advanced profiling.

In Javascript and in v8 (node) particularly an amount of time spent for garbage collection depends on amount of data stored in heap, but that's only one of many factors.
In v8 engine there are two main "types" of GC: minor (scavenge) and major (mark-sweep/mark-compact). You may see GC types that happen during your tests in console with --trace-gc enabled. And in different cases one type could "eat" more time than other an vice versa. So before optimizations you should determine which gc takes more time.
There are not a lot of options for optimizing major GC, cause it highly affected by amount of data that stays in memory for "long" (actually in this case long means that object survives scavenge GC) period. Such data is stored in so called "old space" in heap. And major GC works with this space and it should scan all that memory and mark objects that no longer have any references for further clearance.
In your case the amount of test data you're loading goes to old space. As a result it affects major GC during the whole test. And in this case major GC will not clear too much, because you're using your test object, but it still consume time for scanning entire old space. So you may consider preventing v8 from doing that by launching node with gc-specific flags like: --nouse-idle-notification --expose-gc --gc_interval=100500 (where 100500 is number of allocation, it can be take high value that will prevent running gc before the whole test will pass) that will allow trigger garbage collections manually. Test your code using this approach and see how major GC affects it, try tests with different amount of data you provide to function. If the impact is quiet high you may try to refactor your code trying to minimize long-lived variables, closures, etc.
If you'll discover that major GC doesn't have much impact on performance, then scavenge GC takes the most of time. Unlike major GC it operates with so called "new space" in heap. It's a space where all new objects are stored. If those objects survive scavenge, then they are moved to old space. New space has much smaller size ( you may control it by setting --max_semi_space_size, note: new space size = 2 * semi space size) than old space and more new objects and variables you allocate more scavenge GC runs will happen. If this GC heats performance too much you may consider refactor your code to make less new allocations. But if you'll reuse variables it may also slowdown the performance and those objects will go to old space and may become a problem described in "major GC" section.
Also v8 GC doesn't always work in the same thread that your program runs. It does some work in background too, but I don't know what Webstorm shows in your case. If it counts just total time spend in GC, may be it just doesn't have so much impact.
You may find more details on v8 GC in this blog post.
TL;DR:
Can the GC overhead be reduced?
Yes, but first you should discover what should be optimized by following steps above.
Is this amount of allocation just expected here?
That's could be just discovered by comparing different approaches. There's no some absolute number that could limit "good" amount from "bad", because it depends on lot's of factors, including the amount on entry data.
Is it possible to get better memory profiling information?
You may find some good tools here, but in general you may use Chrome dev tools which could provide a bit more details rather than Webstorm does.

Node app eating memory incrementally over time

I've just launched two Express servers on DigitalOcean along with an instance of mongodb. I'm using PM2 to keep both of them running.
When I use htop to see the memory usage, the total usage is usually around 220-235mb (out of a total 488mb). The only thing I can see changing is the blue bars which I assume is buffer memory, the actual green memory in usage seems to always be around same.
I look on DO's graph however and over the past 24 hours the memory graph has been climbing upwards slowly, say 0.5% of the total per hour, sometimes it drops but overall it's on the up, at the moment it has been hovering around 60-65% of the total memory for a few hours.
There has been almost no traffic on these node web servers yet the memory keeps increasing slowly. So my question is could this be a memory leak within one of my servers or is it the nature of the v8 engine to incrementally expand its memory?

If you are considering memory-leak, then why don't you check your theory by writing 2-3 heapdumps with 2-3 hours difference in time. Then you can answer surely on your question.
You may use this module to write heapdumps on disk and then simply compare it using Chrome Developer Tools. Moreover you will see what's exactly placed inside the heap.
FYI: snapshots comparison from official documentation

linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512Mb RAM (we have other models which have 1Gb, but they see this issue much less often), and have no disk swap files to avoid wearing the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpin/s, pgnscand/s and pgsteal/s, and majflt/s all increase steadily before snowballing to crazy levels. This puts the CPU up correspondingly high levels (30-60 on dual core Arm chips). At the same time, the frmpg/s values go very negative, whilst campg/s go highly positive. The upshot is that the system is trying to allocate a large amount of cache pages all at once. I don't understand why this would be.
The target then essentially locks up until it's rebooted or someone kills the main GUI process or it crashes and is restarted (We have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, telnet blocks forever, as do /proc filesystem queries and things that rely on it like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU using OpenGL ES 2.0. Most of the time it'll be accessing a single file (they are about 50Mb in size), but I guess it could be triggered by having to suddenly use a new file and trying to cache all 50Mb of it all in one go. I haven't done the test (putting more logging in) to correlate this event with these system effects yet.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen oom-killer swoop in the kill the main application with >100Mb free and 40Mb cache RAM). The main application's memory usage seems reasonably well-behaved with a VmRSS value that seems pretty stable. Valgrind hasn't found any progressive leaks that would happen during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to question: linux high kernel cpu usage on memory initialization but that issue seemed driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various vm files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness would seem good candidates for some tuning, but I'd like some insight into good values to try here. vfs_cache_pressure seems ill-defined as to what the difference between setting it to 200 as opposed to 10000 would be quantitatively.
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that fact. A full cache purge might push this back out, though.
Here's a link to the log data with two files, sar1.log, which is the complete output of sar -A, and overview.log, a extract of free / cache mem, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute force approach and put a script on which echoes 3 to drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string