Google Cloud Function: behaviour when --max-instances reached - python-3.x

I’m learning to work with Cloud Functions (and Cloud Run) and would like to know their behaviour when an (HTTP-triggered) function is already running at max-instances capacity and more HTTP requests come in.
1)
Here’s my function's basic code (a simple function with approx. 1000ms execution time per invocation):
import time
import flask

ctr = 0

def hello_world(request):
    global ctr
    print("hello_world(): " + str(ctr))
    ctr = ctr + 1
    time.sleep(10)  # simulate work
    response = flask.Response("success::", 200)
    return response
2)
deployed this function with the flag --max-instances=1 (just to ensure no new VM instances come up to handle concurrent requests)
3)
and then sent 5 concurrent requests (a minimal test sketch is shown after this question)
From what I observe, only one of the requests gets processed. The other 4 requests are simply dropped (the client received HTTP status code 500, and there is no trace of these dropped requests in Stackdriver Logging either).
In the link here https://cloud.google.com/functions/docs/max-instances it says:
In that case, incoming requests queue for up to 60 seconds. During this 60 second window, if an instance finishes processing a request, it becomes available to process queued requests. If no instances become available during the 60 second window, the request fails.
Based on this I expected that, while one request is being handled, the others would be queued up for at most 60 seconds. Therefore all 5 requests should have been processed (or at least more than 1, if not all 5!). However, the actual behaviour I'm seeing is different from this.
Can someone please explain?
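For reference, a minimal sketch of the kind of test I ran, assuming the requests library and a plain thread pool (the URL is a placeholder, and the deploy command in the comment is only indicative):

# Deployed roughly like this:
#   gcloud functions deploy hello_world --runtime python37 --trigger-http --max-instances 1
import concurrent.futures
import requests

URL = "https://REGION-PROJECT.cloudfunctions.net/hello_world"  # placeholder

def call(i):
    # Each worker issues one plain GET and reports the status code it got back.
    return i, requests.get(URL).status_code

# Fire 5 requests at (roughly) the same time.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    for i, status in pool.map(call, range(5)):
        print("request", i, "->", status)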
UPDATE-1: It seems the fix has been released and the documentation updated.
1) It still continued to return status code 500 during the initial cold start (when no instances are running) for some of the concurrent requests. EXPECTED, I suppose.
2) Also, it temporarily exceeded max-instances=1 during the very initial burst of 10 requests, launching up to 4 instances AND successfully serving all 4 requests.
3) Thereafter, the number of instances did come down to respect the --max-instances=1 setting, and all but one request returned with status code 429.

The Cloud Functions engineering team is now aware of this, and they will proceed to change the documentation to reflect the changes made to this feature, which is still in active development. Stay tuned for the documentation, the update will come shortly =). Thank you for noticing this with your tests!
Aside from this, and as a suggestion from the team: if you are interested in queuing jobs, it is recommended to use Cloud Tasks.
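As a rough illustration of that suggestion, here is a minimal sketch that enqueues an HTTP task with the google-cloud-tasks client; the project, location, queue name and target URL are placeholders, and the exact call signature can differ slightly between client library versions:

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
# Fully qualified queue name: projects/PROJECT/locations/LOCATION/queues/QUEUE
parent = client.queue_path("my-project", "us-central1", "my-queue")

task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "url": "https://REGION-PROJECT.cloudfunctions.net/hello_world",  # placeholder
        "body": b"payload",
    }
}

# The queue, not the caller, now controls dispatch rate and retries.
response = client.create_task(parent=parent, task=task)
print("created task:", response.name)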

Related

Azure Functions service not recognizing request sent from outside client

We have a service which pings our EP1 Premium service, and yesterday we received 3 client-side timeout errors after 2 minutes of waiting. When opening the trace in App Insights, these requests which time out are not even logged and have no trace of ever being received on the Azure side, and therefore stay unanswered. By looking at the metrics provided in the Azure Functions app, I found out that 1-2 minutes after the request has been sent, the app loses all its ability to work as its Total App Domains falls to 0, as well as all connections, threads and so on, and this state lasts until the next request is received, therefore "skipping" the request that happened beforehand. This is a big issue as I need to make sure requests get answered in a timely manner.
The client service sent HTTP requests to the Azure Functions app expecting an answer, only to time out while the Azure-side doesn't have any record of ever receiving the request.
I believe this issue is related to the Azure Functions Consumption Plan's cold start behaviour. The "skipping" mechanism is explained below:
Apps may scale to zero when idle, meaning some requests may have additional latency at startup. The consumption plan does have some optimizations to help decrease cold start time, including pulling from pre-warmed placeholder functions that already have the function host and language processes running. https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#cold-start-behavior
Please also consider having a look at this article, which explains the behaviour: https://azure.microsoft.com/en-us/blog/understanding-serverless-cold-start/

One Instance-One Request at a time App Engine Flexible

I am using
App Engine Flexible, custom runtime.
Node.js, as the base image.
express
Cloud Tasks for queuing the requests
puppeteer job
My Requirements
20GB RAM
long-running process
Because of my unique requirement, I want 1 request to be handled by only 1 instance. Only when it gets free, or the request times out, should it take on a new request.
I have managed to reject other requests while the instance is processing 1 request, but I am not able to figure out the appropriate automatic scaling settings.
Please suggest the best way to achieve this.
Thanks in advance!
In your app.yaml try restricting the max_instances and max_concurrent_requests.
I also recommend looking into rate limiting your Cloud Tasks queue in order to reduce unnecessary attempts to send requests. Also you may want to increase your MIN_INTERVAL for retry attempts to spread out requests as well.
Your task queue will continue to process and send tasks by the rate you have set, so if your instance rejects the request it will go into a retry pattern. It seems like you're focused on the scaling of App Engine but your issue is with Cloud Tasks. You may want to schedule your tasks so they fire at the interval you want.
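For example, something along these lines should throttle the queue itself (flag names as in recent gcloud releases; the queue name and values are just placeholders): gcloud tasks queues update my-queue --max-dispatches-per-second=1 --max-concurrent-dispatches=1 --min-backoff=60s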
You could set readiness checks on your app.
When an instance is handling a request, set the readiness check to return a non-ready status. 429 (too many requests) seems like a good option.
This should avoid traffic to that specific instance.
Once the request is finished, return a 200 from the readiness endpoint to signal that the instance is ready to accept a new request.
However, I'm not sure how this will work with the auto-scaling options. Since the app will only scale up once the average CPU is over the defined threshold, if all instances are occupied but do not reach that threshold, the load balancer won't know where to route requests (no instances are ready), and it won't scale up.
You could play around a little bit with this idea and manual scaling, or by programmatically changing min_instances (in automatic scaling) through the GAE Admin API.
Be sure to always return a 200 for the liveness check, or the instance will be killed as it will be considered unhealthy.
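A minimal sketch of that idea, written in Python/Flask for brevity (the question uses Node/Express, but the shape is the same); the health-check paths below are the flexible environment defaults and must match whatever is configured in app.yaml:

import threading
import flask

app = flask.Flask(__name__)
busy = threading.Event()  # set while this instance is handling a job

@app.route("/readiness_check")
def readiness():
    # Report 429 while busy so the load balancer stops routing new traffic here.
    return ("busy", 429) if busy.is_set() else ("ready", 200)

@app.route("/liveness_check")
def liveness():
    # Always healthy, otherwise the instance would be restarted as unhealthy.
    return "alive", 200

@app.route("/work", methods=["POST"])
def work():
    busy.set()
    try:
        # ... long-running Puppeteer-style job goes here ...
        return "done", 200
    finally:
        busy.clear()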

What is the correct client reaction to a HTTP 429 when the client is multi-threaded?

The HTTP status code 429 tells the client making the request to back off and retry the request after a period specified in the response's Retry-After header.
In a single-threaded client, it is obvious that the thread getting the 429 should wait as told and then retry. But the RFC explicitly states that
this specification does not define how the origin server identifies
the user, nor how it counts requests.
Consequently, in a multi-threaded client, the conservative approach would stop all threads from sending requests until the Retry-After point in time. But:
Many threads may already be past the point where they can note the information from the one rejected thread and will send at least one more request.
The global synchronization between the threads can be a pain to implement and get right
If the setup runs not only several threads but several clients, potentially on different machines, backing off all of them on one 429 becomes non-trivial.
Does anyone have specific data from the field on how servers of cloud providers actually handle this? Will they get immediately aggravated if I don't globally hold back all threads? Microsoft's advice is:
Wait the number of seconds specified in the Retry-After field.
Retry the request.
If the request fails again with a 429 error code, you are still being throttled. Continue to use the recommended Retry-After delay and retry the request until it succeeds.
It twice says 'the request', not 'any requests' or 'all requests', but this is a legal-type interpretation I am not confident about (a sketch of that per-request reading follows this question).
To be sure this is not an opinion question, let me phrase it as fact-based as possible:
Are there more detailed specifications for cloud APIs (Microsoft, Google, Facebook, Twitter) then the example above that allow me to make an informed decision whether global back-off is necessary or whether it suffices to back-off with the specific request that got the 429?
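For what it's worth, the narrow reading of that advice (back off only the request that received the 429) looks roughly like this per thread; a hedged sketch using the requests library, with a made-up retry cap:

import time
import requests

def get_with_backoff(url, max_retries=5):
    # Retry only this request, honouring Retry-After, as the quoted advice suggests.
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        try:
            delay = int(response.headers.get("Retry-After", "1"))
        except ValueError:
            delay = 1  # Retry-After given as an HTTP date (or malformed); fall back to 1 second
        time.sleep(delay)
    return response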
Servers know that it's tough to sync, or to expect programmers to do this. So I doubt there is a penalty unless they get an ocean of requests that do not back off at all after a 429.
Each thread should wait, but each would do so only after being told individually.
A good system would know what its rate is and stay within it. One way to implement this is to have a sleepFor value between requests; the exact production value can be arrived at by trial and error, and the actual pause would be the sleep time minus the previous request's duration (a short sketch follows the list below).
So if one request ends and, say, it took x milliseconds. Now, if the sleep time is:
0 or less, move immediately to the next request;
1 or more, work out sleepTime - x: if this is less than 1, go to the next request immediately, else sleep for that many milliseconds and then move to the next request.
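A rough sketch of that pacing logic; the names (make_request, sleep_for_ms) are made up for illustration, and the timing uses milliseconds as in the description above:

import time

def paced_call(make_request, sleep_for_ms):
    # Issue one request, then pause only for whatever is left of the configured gap.
    start = time.monotonic()
    result = make_request()
    elapsed_ms = (time.monotonic() - start) * 1000
    remaining_ms = sleep_for_ms - elapsed_ms
    if remaining_ms > 0:
        time.sleep(remaining_ms / 1000)
    return result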
Another way would be to have a timeCountStarted at request 1, and to count requests over every 5 minutes or so. After every request, check whether the actual request count of the current thread is more than the allowed figure. If yes, the current thread sleeps until the 5 minutes are up before moving to the next request. Here 5 can be configured as the timePeriod. If, after a request, the count is not more than the set figure but the time elapsed since timeCountStarted is more than 5 minutes, then set timeCountStarted to the current time and the count of requests to 0.
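And a sketch of that windowed-count variant (checking just before each request rather than after, which amounts to the same pacing; the period and cap are the configurable values mentioned above):

import time

class WindowedThrottle:
    # Allow at most max_requests per period_s for this thread;
    # sleep out the rest of the window when the cap is hit.
    def __init__(self, max_requests, period_s=300):
        self.max_requests = max_requests
        self.period_s = period_s
        self.window_start = time.monotonic()
        self.count = 0

    def before_request(self):
        now = time.monotonic()
        if now - self.window_start >= self.period_s:
            # Window elapsed: start a new one and reset the count.
            self.window_start = now
            self.count = 0
        elif self.count >= self.max_requests:
            # Over the cap: sleep until the window ends, then start a new one.
            time.sleep(self.period_s - (now - self.window_start))
            self.window_start = time.monotonic()
            self.count = 0
        self.count += 1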
What we do is keep these configuration values in a database but cache them at run time.
We also have a page to invalidate the caches, so if we like we can update the database from an admin page, then invalidate the caches, and thus the clients pick up the new information on the run. This helps to configure the correct values to stay within API limits and still get enough jobs done.

Are Message Queue Triggers betters than Http Triggers for scalability?

Two of the trigger types for Azure Functions are message queues and HTTP triggers. I'm guessing that one difference is that with HTTP triggers a request could be denied if there are not enough instances to service the request, whereas with message queues it will check whether there is an instance available and, if not, spin one up before trying to process the message. Is that a correct understanding?
Not quite..! I think you're getting the information from here:
Whenever possible, refactor large functions into smaller function sets that work together and return responses fast. For example, a webhook or HTTP trigger function might require an acknowledgment response within a certain time limit; it is common for webhooks to require an immediate response.
The acknowledgement within a certain time limit is all about how long the client is willing to wait for a response. So if you were to execute some task that takes a long time (on the order of minutes), you may not receive an informative response because the client would simply consider the connection dead. However, your function would still execute just fine as long as it stayed within the functionTimeout limits (on a consumption plan, the default is 5 minutes, the maximum being 10 minutes).

How to optimize AWS Lambda?

I'm currently building web API using AWS Lambda with Serverless Framework.
In my lambda functions, each of them connects to Redis (elasticache) and RDB (Aurora, RDS) or DynamoDB to retrieve data or write new data.
And all my lambda functions are running in my VPC.
Everything works fine except that when a lambda function is first executed, or executed a while after the last execution, it takes quite a long time (1-3 seconds) to execute, or sometimes it even responds with a gateway timeout error (around 30 seconds), even though my lambda functions are configured with a 60-second timeout.
As stated here, I assume the 1-3 seconds is for initializing a new container. However, I wonder if there is a way to reduce this time, because 1-3 seconds or a gateway timeout is not really ideal for production use.
You've got two issues:
The 1-3 second delay. This is expected and well documented when using Lambda. As @Nick mentioned in the comments, the only way to prevent your container from going to sleep is to keep using it. You can use Lambda Scheduled Events to execute your function as often as every minute using the rate expression rate(1 minute). If you add some parameter to your function to help you distinguish between a real request and one of these ping requests, you can return immediately on the ping requests and you've worked around your problem (a tiny handler sketch is shown after this answer). It will cost you more, but we're probably talking pennies per month, if anything. Lambda has a generous free tier.
The 30-second delay is unusual. I would definitely check your CloudWatch logs. If you see logs from when your function is working normally but no logs from when you see the 30-second timeout, then I would assume the problem is with API Gateway and not with Lambda. If you do see logs, then maybe they can help you troubleshoot. Another place to check is the AWS Status Page. I've seen times where Lambda functions time out and respond intermittently, and I pull my hair out only to realize that there's a problem on Amazon's end and they're working on it.
Here's a blog post with additional information on Lambda Container Reuse that, while a little old, still has some good information.
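A tiny sketch of the keep-warm idea from point 1, assuming a scheduled rule whose event carries a marker field (the field name "warmup" is made up for illustration):

import json

def lambda_handler(event, context):
    # Scheduled "keep warm" invocations short-circuit before any real work
    # (and before touching Redis/Aurora), so they stay cheap.
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}

    # ... normal request handling: connect to ElastiCache / RDS, query, etc. ...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}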
