AWS Lambda async concurrency limits - node.js

I'm working on an AWS Lambda function that currently makes hundreds of API calls, but in production it will make hundreds of thousands. The problem is that I can't test at that scale.
I'm using the async module to execute my API calls with async.eachLimit so that I can limit the concurrency (I currently set it at 300).
The thing that I don't understand is the limits on AWS Lambda. Here's what the docs say:
AWS Lambda Resource Limits per Invocation
Number of file descriptors: 1,024
Number of processes and threads (combined total): 1,024
As I understand it, Node.js is single threaded so I don't think I would exceed that limit. I'm not using child processes and the async library doesn't either so OK on that front too.
Now, about those file descriptors: my function strictly calls AWS's REST APIs and I'm never writing to disk, so I don't think I'm using them.
The other important AWS Lambda limits are execution time and memory consumed. Those are very clearly reported on each execution and I am perfectly aware when I'm close to reaching them or not, so let's ignore these for now.
A little bit of context:
The exact nature of my function is that every time a sports match starts I need to subscribe all mobile devices to the appropriate SNS topics, so basically I'm calling our own MySQL database and then the AWS SNS endpoint repeatedly.
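Simplified, and with placeholder names (subscribe side only; the MySQL query that produces the endpoint list is omitted), the hot path looks roughly like this:

```js
const async = require('async');
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

const matchTopicArn = 'arn:aws:sns:...';  // the topic for this match (placeholder)
const deviceEndpointArns = [];            // endpoint ARNs loaded from our MySQL database

// Subscribe every device, with at most 300 calls in flight at a time.
async.eachLimit(deviceEndpointArns, 300, (endpointArn, done) => {
  sns.subscribe({
    TopicArn: matchTopicArn,
    Protocol: 'application',
    Endpoint: endpointArn,
  }, done);
}, (err) => {
  if (err) console.error('some subscriptions failed', err);
});
```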
So the question is...
How far can I push async's concurrency in AWS Lambda in this context? Are there any practical limits or something else that might come into play that I'm not considering?

As I understand it, Node.js is single threaded so I don't think I would exceed that limit. I'm not using child processes and the async library doesn't either so OK on that front too.
Node.js is event-driven, not single-threaded.
The JavaScript engine runs on a single thread (the event loop) and delegates I/O operations to an internal library (libuv), which manages its own thread pool and asynchronous operations.
async doesn't spawn a child process on its own, but behind the scenes, whether you're making an HTTP request or interacting with the file system, you're delegating these operations to libuv.
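You can see the split for yourself with a toy script (the URL is just an example): file-system calls land on libuv's thread pool (4 threads by default, tunable via UV_THREADPOOL_SIZE), while network sockets use the OS's non-blocking I/O facilities and don't occupy a pool thread at all:

```js
const fs = require('fs');
const https = require('https');

// Eight "parallel" reads, but only four run at once: they queue
// on libuv's default thread pool of 4.
for (let i = 0; i < 8; i++) {
  fs.readFile(__filename, () => console.log('read', i, 'done'));
}

// The HTTP request uses a non-blocking socket (epoll/kqueue/IOCP),
// so it needs no thread-pool thread while it waits.
https.get('https://example.com', (res) => console.log('status', res.statusCode));
```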
In other words, you've answered your own question well with the resource limits:
How far can I push async's concurrency in AWS Lambda in this context? Are there any practical limits or something else that might come into play that I'm not considering?
AWS Lambda Resource Limits per Invocation
Number of file descriptors: 1,024
Number of processes and threads (combined total): 1,024
It's hard to say whether libuv would open a new thread for each I/O operation, so you might get away with a little more than the numbers listed above. But you will probably run out of memory way before reaching those limits anyway.
The bottom line is no, you won't be able to make hundreds of thousands of calls in a single lambda execution.
Regarding the context of your function, depending on how often your job needs to run, you might want to refactor your Lambda into multiple executions (it would also run faster), or run it on an EC2 instance with Auto Scaling triggered by Lambda.
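For example, here's a sketch of the fan-out approach, assuming a hypothetical worker function named subscribe-worker and a device list already split into batches:

```js
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

const matchId = 42;   // placeholder
const batches = [];   // the device list split into chunks, e.g. 1,000 devices each

// 'Event' invocations are fire-and-forget: the trigger Lambda
// dispatches the batches and exits instead of doing the work itself.
Promise.all(batches.map((devices) =>
  lambda.invoke({
    FunctionName: 'subscribe-worker',
    InvocationType: 'Event',            // async: don't wait for the result
    Payload: JSON.stringify({ matchId, devices }),
  }).promise()
)).then(() => console.log('all batches dispatched'));
```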

Related

Details of how Node JS works?

I want to ask some clarifying questions about NodeJS, which I think is poorly explained in the resources I've studied.
Many sources say that NodeJS is not suitable for complex calculations because it is single-threaded and queries are executed sequentially.
I created the simplest server in Node and wrote an endpoint that takes about 10 seconds to execute (a loop). Then I made 10 consecutive requests via Postman, and indeed, each subsequent request began executing only after the previous one had returned a response.
Do I understand correctly that in this case, if one endpoint takes approximately 300ms to execute and 700 users access the server at the same time, the last user will wait a critical 210,000 ms?
I've also heard that an advantage of NodeJS is its ability to support a large number of simultaneous connections. What does that mean, and why is it a plus if the last user from the previous question will still wait a very long time for a response?
Another statement I came across is that libuv allows many I/O operations to run at the same time. How does that work if NodeJS processes requests sequentially anyway?
Thank you very much!
TL;DR: I/O operations don't block the single execution thread. CPU-intensive tasks DO block the thread, and a NodeJS web server is not a good option in that case.
Yes, if your endpoint needs 300ms of synchronous (CPU) work to complete the operation, the last user will wait 210,000ms.
NodeJS is good at handling a large number of connections when the work it needs to do is I/O-bound. It is not a good choice if the endpoint needs a lot of CPU time.
I/O operations happen at a different layer and take ZERO CPU time. That means that once an I/O operation is fired, NodeJS can accept new calls to the endpoint. NodeJS then polls the operating system for completed I/O calls whenever it's not using the CPU and executes their callbacks. This is what allows it to handle a large number of concurrent requests without one user having to wait for the others to finish.
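Here is a minimal way to see both behaviors side by side (Express is used purely for illustration): /block serializes requests because the loop holds the CPU, while /io lets 700 concurrent requests all finish in roughly 300ms:

```js
const express = require('express');
const app = express();

// CPU-bound: 300ms of synchronous work blocks the event loop,
// so concurrent requests complete one after another.
app.get('/block', (req, res) => {
  const end = Date.now() + 300;
  while (Date.now() < end) {}
  res.send('done');
});

// I/O-bound (simulated with a timer): the wait happens off the
// event loop, so concurrent requests overlap.
app.get('/io', (req, res) => {
  setTimeout(() => res.send('done'), 300);
});

app.listen(3000);
```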

Is it possible for multiple AWS Lambdas to service a single HTTP request?

On AWS, is it possible to have one HTTP request execute a Lambda, which in turn triggers a cascade of Lambdas running in serial, where the final Lambda returns the result to the user?
I know one way to achieve this is for the initial Lambda to "stay running" and orchestrate the other Lambdas, but I'd be paying for that orchestration Lambda to effectively do nothing most of the time, i.e. paying for the time it's waiting on the others. If it were non-lambda code, that would be like blocking (and paying for) an entire thread while the other threads do their work.
Unless AWS stops the billing clock while async Lambdas are "sleeping"/waiting on network IO?
Unfortunately, as you've found, only a single Lambda function can be invoked per request, so it becomes an orchestrator.
This is not ideal, but it has to be the case if you want to use multiple Lambda functions to serve an HTTP request. You can either use that Lambda to call a number of other Lambdas, or create a Step Function that orchestrates the individual steps. You would still need the initial Lambda to start the Step Function and then poll its status before returning the results.
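A sketch of that pattern with the JavaScript SDK (the state machine ARN is a placeholder, and error handling is omitted):

```js
const AWS = require('aws-sdk');
const stepfunctions = new AWS.StepFunctions();

exports.handler = async (event) => {
  const { executionArn } = await stepfunctions.startExecution({
    stateMachineArn: 'arn:aws:states:...',  // placeholder
    input: JSON.stringify(event),
  }).promise();

  // Poll until the state machine leaves RUNNING, then return its output.
  for (;;) {
    const exec = await stepfunctions.describeExecution({ executionArn }).promise();
    if (exec.status !== 'RUNNING') {
      return exec.output;  // check for FAILED/TIMED_OUT/ABORTED in real code
    }
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
};
```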

Improving Amazon SQS Performance

Everything I can find about performance of Amazon Simple Queue Service (SQS), including their own documentation, suggests that getting high throughput requires multiple threads. And I've verified this myself using the JS API with Node 12. If I create multiple threads, I get about the same throughput on each thread, so the total throughput increase is pretty much linear. But I'm running this on a nice machine with lots of cores. When I run in Lambda on a single core, multiple threads don't improve the performance, and generally this is what I would expect of multi-threaded apps.
But here's what I don't understand - there should be very little going on here in the way of CPU, most of the time is spent waiting on web requests. The AWS SQS API appears to be asynchronous in that all of the methods use callbacks for the responses, and I'm using Promises to "asyncify" all of the API calls, with multiple tasks running concurrently. Normally doing this with any kind of async IO is handled great by Node, and improves throughput hugely, I do it all the time with database APIs, multiple streams, etc. But SQS definitely isn't behaving that way, it's behaving as though its IO is actually synchronous and blocking threads on the network calls, which would be outrageous for any modern API.
Has anyone had success getting high SQS message throughput in a single Node thread? The max I'm seeing is about 50 to 100 messages/sec for FIFO queues (send, receive, and delete, all of which are calling the batch methods with the max batch size of 10). And this is running in lambda, i.e. on their own network, which is only slightly faster than running it on my laptop over the Internet, another surprising find. Amazon's documentation says FIFO queues should support up to 3000 messages per second when batching, which would be just fine for me. Does it really take multiple threads on multiple cores or virtual CPUs to achieve this? That would be ridiculous, I just can't believe that much CPU would be used, it should be mostly IO time, which should be asynchronous.
Edit:
As I continued to test, I found that the linear improvement with the number of threads only happened when each thread was processing a different queue. If the threads are all processing the same queue, there is no improvement by adding threads. So it behaves as though each queue is throttled by Amazon. But the throughput to which it seems to be throttling is way below what I found documented as the max throughput. Really confused and disappointed right now!
Michael's comments to the original question were right on. I was sending all messages to the same message group. I had previously been working with AMQP message queues, in which messages will be ordered in the queue in the order they're sent, and they'll be distributed to subscribers in that order. But when multiple listeners are consuming the AMQP queue, because of varying network latencies, there is no guarantee that they'll be received in that order chronologically.
So that's actually a really cool feature of SQS, the guarantee that messages will be chronologically received in the order they were sent within the same message group. In my case, I don't care about the receipt order. So now I'm setting a unique message group ID on each message, and scaling up performance by increasing the number of async message receive loops, still just in one thread, and the throughput is amazing!
So the bottom line: If exact receipt order of messages isn't important for your FIFO queue, set the message group ID to a unique value on each message, and scale out with more receiver tasks to get the best throughput performance. If you do need guaranteed message ordering, it looks like around 50 messages per second is about the best you'll do.
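A sketch of the setup that worked (the queue URL is a placeholder and error handling is omitted): every message gets its own group ID, and several receive loops run concurrently on the single thread:

```js
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const QueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/jobs.fifo'; // placeholder

let seq = 0; // makes group/dedup IDs unique per message

// Unique MessageGroupId per message: gives up ordering, unlocks parallelism.
const sendBatch = (bodies) => sqs.sendMessageBatch({
  QueueUrl,
  Entries: bodies.map((MessageBody, i) => {
    const id = `${Date.now()}-${seq++}`;
    return {
      Id: String(i),
      MessageBody,
      MessageGroupId: id,
      MessageDeduplicationId: id,
    };
  }),
}).promise();

// One receive loop; many of these can run concurrently in one thread.
async function receiveLoop(handle) {
  for (;;) {
    const { Messages = [] } = await sqs.receiveMessage({
      QueueUrl, MaxNumberOfMessages: 10, WaitTimeSeconds: 1,
    }).promise();
    if (!Messages.length) continue;
    await Promise.all(Messages.map(handle));
    await sqs.deleteMessageBatch({
      QueueUrl,
      Entries: Messages.map((m) => ({ Id: m.MessageId, ReceiptHandle: m.ReceiptHandle })),
    }).promise();
  }
}

// Scale out the receivers, still single-threaded:
for (let i = 0; i < 10; i++) receiveLoop(async (m) => console.log(m.Body));
```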

Nodejs using child_process runs a function until it returns

I want this kind of structure:
An Express backend gets a request and runs a function; this function gets data from different APIs and saves it to the DB. Because this could take minutes, I want it to run in parallel while my web server continues processing requests.
I want this because of this scenario:
The user has a dashboard. After they log in, the app starts collecting data from the APIs and preparing the dashboard. During that time the user can navigate through the site or even close the browser, but the function has to keep running until it finishes fetching the data. Once it finishes, all the data will be saved to the DB and the dashboard will be ready for the user.
How can I do this using child_process or any other kind of structure in Node.js?
Since what you're describing is all async I/O (networking or disk) and is not CPU intensive, you don't need multiple child processes in order to effectively serve multiple requests. This is the beauty of node.js. With async I/O, node.js can be working on many different requests over the same period of time.
Let's suppose part of your process is downloading an image. Your node.js code sends a request to fetch an image. That request is sent off via TCP. Immediately, there is nothing else to do on that request. It's winging its way to the destination server, and the destination server is preparing the response. While all that is going on, your node.js server is completely free to pull other events from its event queue and start working on other requests. Those other requests do something similar (they start async operations and then wait for events to happen sometime later).
Your server might get 10 different async operations started and "in flight" before the first one actually starts getting a response. When a response starts coming in, the system puts an event into the node.js event queue. When node.js has a moment between other requests, it pulls the next event out of the event queue and processes it. If the processing has further async operations (like saving it to disk), the whole async and event-driven process starts over again as node.js requests a write to disk and node.js is again free to serve other events. In this manner, events are pulled from the event queue one at a time as they become available and lots of different operations can all get worked on in the idle time between async operations (of which there is a lot).
The only thing that upsets the apple cart and ruins the ability of node.js to juggle lots of different things at once is an operation that takes a lot of CPU cycles (like, say, some unusually heavy-duty crypto). If you had something like that, it would hog too much of the CPU, and the CPU couldn't be effectively shared among lots of other operations. If that were the case, then you would want to move the CPU-intensive operations to a group of child processes. But just doing async I/O (disk, networking, other hardware ports, etc.) does not hog the CPU; in fact, it barely uses much CPU at all in node.js.
So, the next question is often "how do I know if I have too much stuff that uses the CPU". The only way to really know is to just code your server properly using async I/O and then measure its performance under load and see how things go. If you're doing async things appropriately and the CPU still spikes to 100%, then you have too much CPU load and you'll want to either use generic clustering or move specific CPU-heavy operations to a group of child processes.
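A minimal sketch of that fire-and-forget pattern (Express and all names here are placeholders, not the asker's real code):

```js
const express = require('express');
const app = express();
app.use(express.json());

// Long-running but purely async I/O: fetch from several APIs, save to the DB.
async function buildDashboard(userId) {
  // const results = await Promise.all(apis.map(fetchFrom));
  // await saveToDb(userId, results);
}

app.post('/login', (req, res) => {
  // Start the job without awaiting it: the response goes out immediately,
  // and the work keeps running on the same event loop.
  buildDashboard(req.body.userId).catch((err) => console.error(err));
  res.json({ dashboard: 'preparing' });
});

app.listen(3000);
```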

Is it acceptable to use ThreadPool.GetAvailableThreads to throttle the amount of work a service performs?

I have a service which polls a queue very quickly to check for more 'work' that needs to be done. There is always more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let's say my worker grabs 10 messages from the queue every N ms and uses the Task Parallel Library to process each message in parallel on different threads. The work itself is very IO-heavy: many SQL Server queries and even Azure Table storage calls (HTTP requests) are made for a single unit of work.
Is using ThreadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to available WorkerThreads and CompletionPortThreads. For an IO-heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1,000 is the number made available per process regardless of CPU count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So, each request to check for messages is made as an async http request which returns with the next 10 messages. (and costs money)
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
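A sketch of that feedback loop (written in JavaScript only because that's what the rest of this page uses; the idea is language-agnostic, and all numbers are illustrative):

```js
const MIN_FETCH = 1;
const MAX_FETCH = 32;   // e.g. the Azure queue's per-request maximum
let avgJobMs = 100;     // running estimate of job cost, seeded with a guess

// Update the estimate as each job completes (exponential moving average).
function recordJob(durationMs) {
  avgJobMs = 0.9 * avgJobMs + 0.1 * durationMs;
}

// Fetch roughly what we can finish before the next poll, clamped so a few
// unusually cheap or expensive jobs don't send the number haywire.
function nextFetchCount(pollIntervalMs, workers) {
  const ideal = Math.floor((pollIntervalMs / avgJobMs) * workers);
  return Math.max(MIN_FETCH, Math.min(MAX_FETCH, ideal));
}
```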
I've been working on virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. There's a single thread that monitors that queue; when the number of items gets low, it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down on costs. It also has automatic backoff in the event that the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
I decided to keep an internal counter of how many message are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class since each message is tied to its own Thread but wasn't able to due to the async nature of the queue poller and the code which spawned the threads.
