How do I set my AWS Lambda provisioned capacity to zero in off hours? - node.js

I'm using the Serverless Framework (serverless.com) and it looks like I can save some pretty serious money by only scaling up the provisioned capacity of my Lambda functions during business hours (9am-5pm, 40 hours per week) and scaling down to zero during off hours. My application isn't used very much in off hours, and when it is, my users are okay with cold starts and longer response times.
In my serverless.yml, I have functions declared such as:
home_page:
  handler: homePage/home_page_handler.get
  events:
    - http:
        path: homePage
        method: get
        cors: true
        authorizer: authorizer_handler
  provisionedConcurrency: 1
I also have a Lambda that runs on a regular basis to set the provisioned capacity of the other Lambdas in the account depending on the time of day. In that Lambda if I call:
await lambda.putProvisionedConcurrencyConfig({
  FunctionName: myFunctionName,
  ProvisionedConcurrentExecutions: 0,
  Qualifier: "provisioned",
}).promise();
Then I get this error:
Received error: ValidationException: 1 validation error detected: Value '0' at 'provisionedConcurrentExecutions' failed to satisfy constraint: Member must have value greater than or equal to 1
at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:688:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:690:12) {
code: 'ValidationException',
time: 2021-08-14T17:45:48.932Z,
requestId: '8594fad6-d5dd-4a00-adca-c34f9d38b25e',
statusCode: 400,
retryable: false,
retryDelay: 77.81562781029108
}
On the other hand, if I try to delete the provisioned capacity entirely, like this:
await lambda.deleteProvisionedConcurrencyConfig({
  FunctionName: myFunctionName,
  Qualifier: "provisioned",
}).promise();
That works fine, but when I try to deploy my function again in off hours (which is the norm), CloudFormation errors out with:
No Provisioned Concurrency Config found for this function (Service: AWSLambdaInternal; Status Code: 404; Error Code: ProvisionedConcurrencyConfigNotFoundException; Request ID: 75dd221b-35d2-4a49-80c5-f07ce261d357; Proxy: null)
Does anybody have a neat solution to turn off provisioned capacity during some hours of the day?

After some thinking, if you're still committed to Lambda as the compute solution, I think your best option is to manage provisioned concurrency outside of the Serverless Framework entirely.
You've already got an orchestrator function that enables provisioned concurrency. You could try removing provisionedConcurrency from your serverless.yml file, adding another method to your orchestrator that disables provisioned concurrency in the evenings, and verifying that you can deploy while your orchestrator has your functions in either state.
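For illustration, here is a minimal sketch of what such an orchestrator handler could look like with the Node.js AWS SDK v2 (the same put/delete calls you're already making). The function list, the "provisioned" alias, and the 9-5 UTC window are assumptions you'd adapt:

// Sketch only: runs on a schedule and toggles provisioned concurrency.
// FUNCTIONS, the 'provisioned' alias, and the UTC business hours are assumptions.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

const FUNCTIONS = ['home_page']; // hypothetical list of deployed function names

module.exports.handler = async () => {
  const hour = new Date().getUTCHours();
  const businessHours = hour >= 9 && hour < 17;

  for (const functionName of FUNCTIONS) {
    if (businessHours) {
      // Scale up; the API's minimum is 1, so 0 is never passed here.
      await lambda.putProvisionedConcurrencyConfig({
        FunctionName: functionName,
        Qualifier: 'provisioned', // must be an alias or version, never $LATEST
        ProvisionedConcurrentExecutions: 1,
      }).promise();
    } else {
      // "Zero" is expressed by deleting the config entirely.
      try {
        await lambda.deleteProvisionedConcurrencyConfig({
          FunctionName: functionName,
          Qualifier: 'provisioned',
        }).promise();
      } catch (err) {
        // Ignore the case where the config is already gone.
        if (err.code !== 'ProvisionedConcurrencyConfigNotFoundException') throw err;
      }
    }
  }
};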
If you're willing to throw away your orchestrator function, AWS suggests using Application Auto Scaling, which is very useful for exactly what you're doing. (hat tip to #mpv)
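For reference, here is a minimal sketch of that approach using the Node.js AWS SDK v2 and Application Auto Scaling scheduled actions. The function name, alias, capacities, and cron expressions are assumptions, and you should confirm whether the API accepts 0 as a capacity for your case:

// Sketch only: scheduled scaling of provisioned concurrency for a
// hypothetical alias function:my-function:provisioned.
const AWS = require('aws-sdk');
const autoscaling = new AWS.ApplicationAutoScaling();

const target = {
  ServiceNamespace: 'lambda',
  ResourceId: 'function:my-function:provisioned', // function:<name>:<alias>
  ScalableDimension: 'lambda:function:ProvisionedConcurrency',
};

async function configureSchedule() {
  await autoscaling.registerScalableTarget({
    ...target,
    MinCapacity: 0, // if 0 is rejected, use 1 as the off-hours floor
    MaxCapacity: 5,
  }).promise();

  // Scale up at 09:00 UTC on weekdays...
  await autoscaling.putScheduledAction({
    ...target,
    ScheduledActionName: 'business-hours-up',
    Schedule: 'cron(0 9 ? * MON-FRI *)',
    ScalableTargetAction: { MinCapacity: 1, MaxCapacity: 5 },
  }).promise();

  // ...and back down at 17:00 UTC.
  await autoscaling.putScheduledAction({
    ...target,
    ScheduledActionName: 'off-hours-down',
    Schedule: 'cron(0 17 ? * MON-FRI *)',
    ScalableTargetAction: { MinCapacity: 0, MaxCapacity: 0 },
  }).promise();
}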
That being said, Lambda isn't particularly well suited to predictable, steady-state traffic. If cost is a concern, I'd suggest exploring Fargate or ECS and writing a few autoscaling rules. Your Lambda code is already stateless, and likely is portable and has pretty limited networking rules. There are other forms of compute which would be dramatically cheaper to use.

Related

Azure Function StackExchange.Redis.RedisTimeoutException on StringGetAsync

I am running an Azure Function App in Azure. From time to time I am getting the following error:
Exception while executing function: ****Timeout performing SETEX (10000ms), next: GET *****, inst: 140, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 2939, serverEndpoint: *****:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: 4ad57eb720e9(SE.Redis-v2.6.66.47313), IOCP: (Busy=0,Free=1000,Min=6,Max=1000), WORKER: (Busy=69,Free=32698,Min=6,Max=32767), POOL: (Threads=69,QueuedItems=54,CompletedItems=8674751), v: 2.6.66.47313 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
It doesn't look like there is a specific reason for this to happen.
Any ideas why this happens?
The bottleneck could be either network bandwidth or CPU cycles. Based on the log you shared, you have 69 busy worker threads, so I would first check the CPU usage to be sure you aren't maxing it out.
You are likely not hitting network bandwidth issues, given that you are running in Azure and the qs/qu values are 0. There could also be transient network glitches causing the timeouts, but those should resolve on their own.

Name or Service not known - intermittent error in Azure

I have a TimerTrigger which calls my own Azure Function at a relatively high rate - a few times per second. Every call takes only about 100 ms, and the purpose of the test is not a stress test.
This call to my own endpoint works about 9999 times out of 10000 but just once in a while I get the following error:
System.Net.Http.HttpRequestException: Name or service not known (app.mycustomdomain.com:443)
---> System.Net.Sockets.SocketException (0xFFFDFFFF): Name or service not known
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
I replaced my actual domain with "app.mycustomdomain.com" in the error message above. It is a custom domain set up to point to the Azure Function App using CNAME.
The Function App does not show any downtime in the Azure Portal, and I have Application Insights enabled and do not see any errors there. So I assume the issue is somehow on the caller's side and the call never actually happens.
What does this error indicate? And how can I alleviate the problem?
For your second question - alleviating the problem - one option would certainly be to build in retries using a library like Polly. At a high level, you create a policy, e.g. for a simple retry:
var myPolicy = Policy
    .Handle<SomeExceptionType>()
    .RetryAsync(3);
This retries up to 3 times. To use the policy, call ExecuteAsync (or build a synchronous policy with Retry and call Execute):
await myPolicy.ExecuteAsync(async () =>
{
    // do stuff that might fail, retried up to three times
});
More complete samples are available in the Polly documentation.
The library has lots of support for other approaches, e.g. retries with fixed delays, exponential back-off, etc.

How to define multiple targets in CloudWatch events by serverless framework

I have been using the Serverless Framework (1.61.0). I have many, many scheduled events that sync data from another source. For instance, I am syncing Category entities within one Lambda function.
categories:
  handler: functions/scheduled/categories/index.default
  name: ${self:provider.stage}-categories-sync
  description: Sync categories
  events:
    - schedule:
        name: ${self:provider.stage}-moe-categories
        rate: rate(1 hour)
        enabled: true
        input:
          params:
            appId: moe
            mallCode: moe
            scheduled: true
For this worker alone, I have another 15 scheduled events. Each one is created as a new resource in CloudWatch, which makes the stack really big. We are exceeding the CloudWatch Events limit even though we increased it by submitting a limit increase request to AWS.
Is there any way to define multiple targets for the same CloudWatch Events rule? That way, instead of defining lambda_func_count (15) x event_count (15) x stage_count (dev, staging, prod) resources in CloudWatch, we could define just one rule with multiple targets for the individual Lambda functions.
Currently this is supported in the AWS console, but I couldn't find a way to achieve it with the Serverless Framework.
One way to help mitigate this issue is to not use the same AWS account for all your stages. Take a look at the AWS Organisations feature, which helps you create sub-accounts under a master account; if you use Serverless Framework Pro, even on the free tier, you can easily have specific stages deploy to specific AWS accounts. Each sub-account has its own set of resources that don't affect other accounts. You could take this even further if you have other ways of splitting things across multiple accounts; perhaps you can break it up per Category?
Here is an example of a single CloudWatch rule with multiple targets (each either an AWS Lambda function or a Lambda alias):
"LCSCombinedKeepWarmRule2":{
"Type":"AWS::Events::Rule",
"Properties": {
"Description":"LCS Keep Functions Warm Rule",
"ScheduleExpression": "rate(3 minutes)",
"State":"ENABLED",
"Targets":[
{
"Arn":{"Fn::GetAtt":["CheckCustomer","Arn"]},
"Id":"CheckCustomerId"
},
{
"Arn":{"Fn::GetAtt":["PatchCustomerId","Arn"]},
"Id":"PatchCustomerId"
},
{
"Arn":{"Ref":"GetTierAttributes.Alias"},
"Id":"GetTierAttributes"
},
{
"Arn":{"Ref":"ValidateToken.Alias"},
"Id":"ValidateTokenId"
},
{
"Arn":{"Ref":"EventStoreRecVoucher.Alias"},
"Id":"EventStoreRecVoucherId"
}
]
}
},
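If you'd rather not hand-write CloudFormation, the same fan-out can also be created programmatically. Here is a minimal sketch with the Node.js AWS SDK v2; the rule name, ARNs, and input payload are placeholders, and each target function still needs a resource-based permission allowing events.amazonaws.com to invoke it:

// Sketch only: one schedule rule shared by several Lambda targets.
// 'hourly-sync', the ARNs passed in, and the Input payload are hypothetical.
const AWS = require('aws-sdk');
const events = new AWS.CloudWatchEvents();

async function createSharedSchedule(targetArns) {
  await events.putRule({
    Name: 'hourly-sync',
    ScheduleExpression: 'rate(1 hour)',
    State: 'ENABLED',
  }).promise();

  // Note that a single rule only supports a limited number of targets
  // (5 at the time of writing), each with its own static Input payload.
  await events.putTargets({
    Rule: 'hourly-sync',
    Targets: targetArns.map((arn, i) => ({
      Id: `target-${i}`,
      Arn: arn,
      Input: JSON.stringify({ params: { appId: 'moe', scheduled: true } }),
    })),
  }).promise();
}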

Why ffmpeg launched on a google cloud function can throw "Error: Output stream error: Maximum call stack size exceeded"?

I'm trying to process video files with ffmpeg running on Google Cloud Functions. Video files are downloaded from Google Cloud Storage, processed as a stream by fluent-ffmpeg, and streamed back to a new Google Cloud Storage file. It works on smaller files but throws an "Output stream error: Maximum call stack size exceeded" on larger files.
I tried running the code on a normal PC and did not encounter this error, even with larger files.
These are the parameters I deploy the function with
gcloud functions deploy $FUNCTION_NAME --runtime nodejs8 --trigger-http --timeout=180 --memory 256
This is the code that processes video
function cutVideo({videoUrl, startTime, duration, dist}) {
  return ffmpeg(videoUrl)
    .outputOptions('-movflags frag_keyframe+empty_moov')
    .videoCodec('copy')
    .audioCodec('copy')
    .format('mp4')
    .setStartTime(startTime)
    .setDuration(duration);
}

const sectionStream = cutVideo({
  videoUrl,
  startTime,
  duration,
  dist: tempFilePath,
});

const outputStream = bucket.file(sectionPath)
  .createWriteStream({
    metadata: {
      contentType: config.contentType,
    },
    public: true,
  });
Actual error stack looks like this
Error: Output stream error: Maximum call stack size exceeded
at Pumpify.<anonymous> (/srv/node_modules/fluent-ffmpeg/lib/processor.js:498:34)
at emitOne (events.js:121:20)
at Pumpify.emit (events.js:211:7)
at Pumpify.Duplexify._destroy (/srv/node_modules/duplexify/index.js:191:15)
at /srv/node_modules/duplexify/index.js:182:10
at _combinedTickCallback (internal/process/next_tick.js:132:7)
at process._tickDomainCallback (internal/process/next_tick.js:219:9)
RangeError: Maximum call stack size exceeded
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:28:31)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at value.map.v (/srv/node_modules/@google-cloud/projectify/build/src/index.js:30:32)
at Array.map (<anonymous>)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:30:23)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at value.map.v (/srv/node_modules/@google-cloud/projectify/build/src/index.js:30:32)
at Array.map (<anonymous>)
What could cause this error on a google cloud function?
Apart from the memory/CPU restrictions, this kind of long-running process is a poor fit for Google Cloud Functions because of the execution timeout.
The only way to achieve this is to use Google App Engine Flex, which by nature has the longest available timeout; it can be set at two levels, in app.yaml/gunicorn (or whichever web server you intend to use) and in the actual GAE timeout setting.
The other services, GAE Standard and Google Cloud Functions, have strict timeouts (in the range of 10 seconds to 30 minutes) that you cannot raise. These timeouts will not be sufficient for ffmpeg and transcoding purposes.

AWS node.js SDK error - SignatureDoesNotMatch: Signature expired

Node.js version 0.10.25
AWS SDK Version latest - 2.0.23
I have an app that is continuously listening to a queue (SQS); if there are messages posted to that queue, the app reads each message, processes it, and saves some data to S3. About 20 minutes after I start the app, I get the following error continuously.
Potentially unhandled rejection [160] SignatureDoesNotMatch: Signature expired: 20141104T062952Z is now earlier than 20141104T062952Z (20141104T064452Z - 15 min.)
at Request.extractError (/myproject/node_modules/aws-sdk/lib/protocol/query.js:39:29)
at Request.callListeners (/myproject/node_modules/aws-sdk/lib/sequential_executor.js:100:18)
at Request.emit (/myproject/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/myproject/node_modules/aws-sdk/lib/request.js:604:14)
at Request.transition (/myproject/node_modules/aws-sdk/lib/request.js:21:12)
at AcceptorStateMachine.runTo (/myproject/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /myproject/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/myproject/node_modules/aws-sdk/lib/request.js:22:9)
at Request.<anonymous> (/myproject/node_modules/aws-sdk/lib/request.js:606:12)
at Request.callListeners (/myproject/node_modules/aws-sdk/lib/sequential_executor.js:104:18)
It is not an issue with my system time. My system time is in sync with the time of my EC2 instance. Why am I getting this error? Is it related to SQS or S3?
I know this is an old question, but I experienced it myself today.
Fortunately, the AWS Node.js SDK now has a configuration option called correctClockSkew, which will correct for the system clock offset once an error occurs:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#correctClockSkew-property
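For example, a minimal sketch of enabling it (the SQS client here is just illustrative):

// Sketch only: opt in to clock-skew correction, globally or per client.
// The SDK then adjusts its signing time after detecting skew from an error.
const AWS = require('aws-sdk');

AWS.config.update({ correctClockSkew: true });       // globally
const sqs = new AWS.SQS({ correctClockSkew: true }); // or per client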
It's likely the time on your computer is incorrect.
The signature mismatch occurs because the authentication process depends on clock synchronisation.
Check the time on your machine.
On Linux / WSL you can run sudo hwclock -s to fix it. Other OSs will require different commands.
Thanks to Loren Segal (Amazon) for his quick response. Refer to https://github.com/aws/aws-sdk-js/issues/401 for more details. In short, the SDK does not retry signature errors; the workaround is:
AWS.events.on('retry', function(resp) {
  if (resp.error.code === 'SignatureDoesNotMatch') {
    resp.error.retryable = true;
  }
});
This is not a regression, i.e. it is not a bug introduced in 2.0.23.
