I'm running load tests on AWS Lambda with Charles Proxy, but I'm confused by the timeline chart it produces. I've set up a test with 100 concurrent connections. I expect varying degrees of latency, but I also expect all 100 requests to be kicked off at the same time (hence the concurrency setting in Charles Proxy's advanced repeat feature). Instead, some requests appear to start a bit late, if I understand the chart correctly.
With only 100 invocations, I should be well within the concurrency limit set by AWS Lambda, so why are these requests being kicked off late (see requests 55 - 62 on the attached image)?
Lambda can take anywhere from a few hundred milliseconds to 1-2 seconds to start up when it's in a "cold state". Cold means it needs to download your package, unpack it, load it into memory, and then start executing your code. After execution, the container is kept "alive" for about 5 to 30 minutes ("warm state"). If another request arrives while it's warm, container startup is much faster.
You probably had a few containers already warm when you started your test. Those started up faster. Since the other requests came concurrently, Lambda needed to start more containers and those came from a "cold state", thus the time difference you see in the chart.
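As a rough illustration of the explanation above, here is a toy model in which every request beyond the number of warm containers pays a cold-start delay. All numbers (92 warm containers, 50 ms warm start, 1000 ms cold start) are made-up assumptions chosen only to mirror the handful of late requests in the question; real Lambda behaviour is not this tidy.

```python
def start_delays_ms(n_requests, warm_containers, warm_ms=50, cold_ms=1000):
    """Return one assumed start delay per concurrent request.

    Requests are served by warm containers first; the remainder
    each need a new (cold) container. All values are hypothetical.
    """
    warm = min(n_requests, warm_containers)
    return [warm_ms] * warm + [cold_ms] * (n_requests - warm)


delays = start_delays_ms(n_requests=100, warm_containers=92)
cold_starts = delays.count(1000)
print(f"{cold_starts} of {len(delays)} requests started cold")
```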
I currently have a script on a Linux EC2 instance that processes some documents. The script is called from AWS Lambda using (SSM) send_command. It works fine when it processes one or two documents, but beyond that I get empty responses. I'm assuming the system bottlenecks, as there is essentially no limit to the number of calls that I can send to the instance. So is there a way to set the concurrency level on the instance so that it only processes, say, 2 commands at a time?
I know I can set the concurrency level on the lambdas, but the execution time is usually less than 200ms. Meanwhile the processing time in the instance is about 5 to 15 seconds.
Ultimately, I can have the lambdas wait for the job to be completed but it would be expensive as I need to process thousands of documents.
Thank you!
I have a Python project with a server that distributes work to one or more clients. Each client is given a number of assignments which contain parameters for querying a target API. This includes a maximum number of requests per second they can make with a given API key. The clients process the response and send the results back to the server to store into a database.
Both the server and clients use Tornado for asynchronous networking. My initial implementation for the clients relied on PeriodicCallback to ensure that n calls to the API would occur per second. I thought that this was working properly because my tests only lasted 1-2 minutes.
I added some telemetry to collect performance statistics and noticed that the clients were actually having issues after almost exactly 2 minutes of runtime. I had set the API requests to 20 per second (the maximum allowed by the API itself), which the clients could reliably hit. However, after 2 minutes, performance would fluctuate between 12 and 18 requests per second. The number of active tasks steadily increased until it hit the maximum number of active assignments (100) given by the server, and the HTTP request time to the API, as reported by Tornado, went from 0.2-0.5 seconds to 6-10 seconds. Performance is steady if I only do 14 requests per second. Anything higher than 15 requests per second will experience issues 2-3 minutes after starting. Logs can be seen here. Notice how the "Active Queries" column is steady until 01:19:26. I've truncated the log to demonstrate.
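One way to read these numbers is through Little's law (requests in flight = arrival rate x latency). The sketch below is only back-of-the-envelope arithmetic on the figures quoted above, not part of the original telemetry, and the function names are made up for illustration.

```python
def in_flight(rate_per_s, latency_s):
    """Average number of requests in flight at a given rate and latency."""
    return rate_per_s * latency_s


def max_throughput(max_active, latency_s):
    """Highest sustainable rate when at most max_active requests may be in flight."""
    return max_active / latency_s


# Healthy phase: 20 requests/s at 0.2-0.5 s latency.
print(in_flight(20, 0.2), in_flight(20, 0.5))

# Degraded phase: 100 active assignments at 6-10 s latency.
print(max_throughput(100, 6), max_throughput(100, 10))
```

Under these assumptions, the healthy phase needs only 4 to 10 requests in flight, while at 6 to 10 seconds per request a cap of 100 active assignments limits throughput to roughly 10 to 17 requests per second, which is in the same range as the 12 to 18 reported.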
I believed the issue was the use of a single process on the client to handle both communication to the server and the API. I proceeded to split the primary process into several different processes. One handles all communication to the server, one (or more) handles queries to the API, another processes API responses into a flattened class, and finally a multiprocessing Manager for Queues. The performance issues were still present.
I thought that, perhaps, Tornado was the bottleneck and decided to refactor. I chose aiohttp and uvloop. I split the primary process in a similar manner to that in the previous attempt. Unfortunately, performance issues are unchanged.
I took both refactors and enabled them to split work into several querying processes. However, no matter how much you split the work, you still encounter problems after 2-3 minutes.
I am using both Python 3.7 and 3.8 on macOS and Linux.
At this point, it does not appear to be a limitation of a single package. I've thought about the following:
Python's asyncio library cannot handle more than 15 coroutines/tasks being generated per second
I doubt that this is true given that different libraries claim to be able to handle several thousand messages per second simultaneously. Also, we can hit 20 requests per second just fine at the start with very consistent results.
The API is unable to handle more than 15 requests from a single client IP
This is unlikely as I am not the only user of the API and I can request 20 times per second fairly consistently over an extended period of time if I over-subscribe processes to query from the API.
There is a system configuration causing the limitation
I've tried both macOS and Debian, which yield the same results. It's possible that it's a *nix problem.
Variations in responses cause a backlog which grows linearly until it cannot be tackled fast enough
Sometimes response times from the API fluctuate between 0.2 and 1.2 seconds. The number of active tasks returned by asyncio.all_tasks remains consistent in the telemetry data. If this hypothesis were true, we wouldn't be encountering the issue at the same point in every run.
We're overtaxing the hardware with the number of tasks generated per second and causing thermal throttling
Although CPU temperatures spike, neither macOS nor Linux reports any thermal throttling in the logs. We are not hitting more than 80% CPU utilization on a single core.
At this point, I'm not sure what's causing it and have considered refactoring the clients into a different language (perhaps C++ with Boost libraries). Before I dive into something so foolish, I wanted to ask if I'm missing something simple.
Conclusion
Performance appears to vary wildly depending on time of day. It's likely to be the API.
How this conclusion was made
I created a new project to demonstrate the capabilities of asyncio and determine whether it's the bottleneck. This project takes two websites, one acting as the baseline and the other being the target API, and runs through different methods of testing:
Spawn one process per core, pass a semaphore, and query up to n-times per second
Create a single event loop and create n-number of tasks per second
Create multiple processes with an event loop each to distribute the work, with each loop performing (n-number / processes) tasks per second
(Note that spawning processes is incredibly slow and often commented out unless using high-end desktop processors with 12 or more cores)
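The second method in the list above (a single event loop creating n tasks per second) might look roughly like the sketch below. It is not the author's project code: `fake_request` is a hypothetical stand-in that sleeps instead of making an HTTP call, and the rate, count and latency values are arbitrary.

```python
import asyncio


async def fake_request(latency_s):
    """Hypothetical stand-in for an HTTP call; it only sleeps."""
    await asyncio.sleep(latency_s)
    return latency_s


async def run_at_rate(rate_per_s, total, latency_s):
    """Start `total` tasks on one event loop, spaced 1/rate_per_s apart."""
    interval = 1.0 / rate_per_s
    loop = asyncio.get_running_loop()
    start = loop.time()
    tasks = []
    for i in range(total):
        # Wait for this task's scheduled slot, measured from the start time,
        # so that small pacing errors do not accumulate.
        delay = start + i * interval - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(fake_request(latency_s)))
    results = await asyncio.gather(*tasks)
    return results, loop.time() - start


results, elapsed = asyncio.run(run_at_rate(rate_per_s=50, total=10, latency_s=0.01))
print(f"{len(results)} tasks finished in {elapsed:.2f} s")
```

Because tasks are started on a schedule rather than after the previous one finishes, slow responses show up as a growing number of tasks in flight instead of a lower start rate.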
The baseline website would be queried up to 50 times per second. asyncio could complete 30 tasks per second reliably for an extended period, with each task completing its run in 0.01 to 0.02 seconds. Responses were very consistent.
The target website would be queried up to 20 times per second. Sometimes asyncio would struggle despite circumstances being identical (JSON handling, dumping response data to queue, returning immediately, no CPU-bound processing). However, results varied between tests and could not always be reproduced. Responses would be under 0.4 seconds initially but quickly increase to 4-10 seconds per request. 10-20 requests would return as complete per second.
As an alternative method, I chose a parent URI for the target website. This URI wouldn't trigger a large query against their database but would instead return a static JSON response. Response times bounced between 0.06 seconds and 2.5-4.5 seconds. However, 30-40 responses would be completed per second.
Splitting requests across processes with their own event loop would decrease response time in the upper-bound range by almost half, but still took more than one second each to complete.
The inability to reproduce consistent results every time from the target website would indicate that it's a performance issue on their end.
I have an API service that sometimes takes a while to produce an answer, maybe 120 to 200 seconds. Most of the time the answer is produced in 1 to 5 seconds.
When there are, let's say, 20 requests to IIS with long-running answers of 120 - 200 seconds, all the other incoming requests seem to take much longer to process, and everything gets really slow until those 20 requests produce results.
I have the application pool set to 50,000 concurrent requests, but I see this behaviour with as few as 100 x 200 requests, and then the number of pending requests keeps growing until the long-running ones complete.
I was under the impression that all requests will run normally and independently of one another. And a few shouldn't slow down the rest. The CPU on the machine never churns over 1%.
Is there anything I might be overlooking? Configuration wise?
After reading Lex Li's answer, it seems that setting the max worker threads might help. How many CPUs is this setting calculated for?
And how many worker threads will this enable?
<processModel autoConfig="false" requestQueueLimit="100000" maxWorkerThreads="200" maxIoThreads="200" minWorkerThreads="50" minIoThreads="50" />
And how many under this scenario?
Solved: It's a Node.js bug. It happens after ~25 days (2^31 milliseconds). See answer for details.
Is there a maximum number of iterations of this cycle?
function do_every_x_seconds() {
    // Do some basic things to get x (the delay in seconds)
    setTimeout(do_every_x_seconds, 1000 * x);
}
As I understand, this is considered a "best practice" way of getting things to run periodically, so I very much doubt it.
I'm running an Express server on Ubuntu with a number of timeout loops:
One loop that runs every second and basically prints the timestamp.
One that calls an external http request every 5 seconds
and one that runs every X seconds, where X is between 30 and 300.
It all seems to work well enough. However, after 25 days without any usage and several million iterations, the node instance is still up, but all three of the setTimeout loops have stopped. No error messages are reported at all.
Even stranger, the Express server is still up, and I can load HTTP pages, which print to the same console where the periodic timestamp was being printed.
I'm not sure if it's related, but I also run nodejs with the --expose-gc flag and perform periodic garbage collection and monitor that memory stays within acceptable ranges.
It is a development server, so I have left the instance up in case there is some advice on what I can do to look further into the issue.
Could it be that the event loop somehow dropped all its timers?
I have a similar problem with setInterval().
I think it may be caused by the following bug in Node.js, which seems to have been fixed recently: setInterval callback function unexpected halt #22149
Update: it seems the fix has been released in Node.js 10.9.0.
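For reference, the roughly 25 days reported in the question is close to 2^31 milliseconds, the point at which a signed 32-bit millisecond counter would overflow. The arithmetic below only checks that figure; whether such a counter is the actual mechanism behind the bug is an assumption here and should be confirmed against the linked issue.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day

days_signed_32 = 2**31 / MS_PER_DAY    # signed 32-bit millisecond range
days_unsigned_32 = 2**32 / MS_PER_DAY  # unsigned 32-bit millisecond range

print(f"2^31 ms = {days_signed_32:.2f} days")
print(f"2^32 ms = {days_unsigned_32:.2f} days")
```

So 2^31 ms is about 24.86 days, matching the observed failure time, while 2^32 ms would be about 49.71 days.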
I think the problem is that you are relying on setTimeout to be active over days. setTimeout is great for periodic running of functions, but I don't think you should trust it over extended time periods. Consider this question: can setInterval drift over time? and one of its linked issues: setInterval interval includes duration of callback #7346.
If you need to have things happen intermittently at particular times, a better way to attack this would be to schedule cron tasks that perform the tasks instead. They are more resilient and failures are recorded at a system level in the journal rather than from within the node process.
A good related answer/question is Node.js setTimeout for 24 hours - any caveats? which mentions using the npm package cron to do task scheduling.
I'm working on a crawler, and I've noticed that waiting 1 minute between requests has made the application more reliable: I now get fewer connection resets. Can you recommend a reasonable amount of time to wait? I think 1 minute is quite a belt-and-braces approach, and ideally I would like to reduce it.