Why does ffmpeg launched on a Google Cloud Function throw "Error: Output stream error: Maximum call stack size exceeded"? - node.js

I'm trying to process video files with ffmpeg running on Google Cloud Functions. Video files are downloaded from Google Cloud Storage, processed as a stream by fluent-ffmpeg, and streamed into a new Cloud Storage file. It works on smaller files but throws an "Output stream error: Maximum call stack size exceeded" on larger files.
I tried running the code on a normal PC and did not encounter this error, even with larger files.
These are the parameters I deploy the function with
gcloud functions deploy $FUNCTION_NAME --runtime nodejs8 --trigger-http --timeout=180 --memory 256
This is the code that processes video
function cutVideo({videoUrl, startTime, duration, dist}) {
  return ffmpeg(videoUrl)
    .outputOptions('-movflags frag_keyframe+empty_moov')
    .videoCodec('copy')
    .audioCodec('copy')
    .format('mp4')
    .setStartTime(startTime)
    .setDuration(duration);
}
const sectionStream = cutVideo({
  videoUrl,
  startTime,
  duration,
  dist: tempFilePath,
});
const outputStream = bucket.file(sectionPath)
  .createWriteStream({
    metadata: {
      contentType: config.contentType,
    },
    public: true,
  });
The actual error stack looks like this:
Error: Output stream error: Maximum call stack size exceeded
at Pumpify.<anonymous> (/srv/node_modules/fluent-ffmpeg/lib/processor.js:498:34)
at emitOne (events.js:121:20)
at Pumpify.emit (events.js:211:7)
at Pumpify.Duplexify._destroy (/srv/node_modules/duplexify/index.js:191:15)
at /srv/node_modules/duplexify/index.js:182:10
at _combinedTickCallback (internal/process/next_tick.js:132:7)
at process._tickDomainCallback (internal/process/next_tick.js:219:9)
RangeError: Maximum call stack size exceeded
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:28:31)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at value.map.v (/srv/node_modules/@google-cloud/projectify/build/src/index.js:30:32)
at Array.map (<anonymous>)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:30:23)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at replaceProjectIdToken (/srv/node_modules/@google-cloud/projectify/build/src/index.js:37:30)
at value.map.v (/srv/node_modules/@google-cloud/projectify/build/src/index.js:30:32)
at Array.map (<anonymous>)
What could cause this error on a google cloud function?

Apart from memory/CPU restrictions, the real blocker is the timeout: long-running processes like this are effectively impossible to run in Google Cloud Functions.
The only way to achieve this is to use Google App Engine Flex. It has by nature the longest available timeout mechanism, which can be set at two levels: in app.yaml/gunicorn (or whichever web server you intend to use) and via the actual GAE timeout.
The other services, GAE Standard and Google Cloud Functions, have strict timeouts that you cannot change, ranging from 10 seconds up to at most 30 minutes. These timeouts will not be sufficient for ffmpeg and transcoding purposes.
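For reference, a minimal app.yaml for a Node.js service on App Engine Flex could look like this (a sketch only; the resource values are illustrative, not requirements for transcoding):

```yaml
runtime: nodejs
env: flex

# Illustrative sizing for CPU-heavy ffmpeg work.
resources:
  cpu: 2
  memory_gb: 4
  disk_size_gb: 20
```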

Related

How do I set my AWS Lambda provisioned capacity to zero in off hours?

I'm using the Serverless framework (serverless.com) and it looks like I can save some pretty serious money by only scaling up the provisioned capacity of my Lambda functions during business hours (9am-5pm, 40 hours per week) and then down to zero during off hours. My application isn't used very much in off hours, and if it is, my users are okay with cold starts and things taking longer.
In my serverless.yml, I have functions declared such as:
home_page:
  handler: homePage/home_page_handler.get
  events:
    - http:
        path: homePage
        method: get
        cors: true
        authorizer: authorizer_handler
  provisionedConcurrency: 1
I also have a Lambda that runs on a regular basis to set the provisioned capacity of the other Lambdas in the account depending on the time of day. In that Lambda if I call:
await lambda.putProvisionedConcurrencyConfig({
  FunctionName: myFunctionName,
  ProvisionedConcurrentExecutions: 0,
  Qualifier: "provisioned",
}).promise();
Then I get this error:
Received error: ValidationException: 1 validation error detected: Value '0' at 'provisionedConcurrentExecutions' failed to satisfy constraint: Member must have value greater than or equal to 1
at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:688:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:690:12) {
code: 'ValidationException',
time: 2021-08-14T17:45:48.932Z,
requestId: '8594fad6-d5dd-4a00-adca-c34f9d38b25e',
statusCode: 400,
retryable: false,
retryDelay: 77.81562781029108
}
On the other hand, if I try to delete the provisioned capacity entirely, like this:
await lambda.deleteProvisionedConcurrencyConfig({
  FunctionName: myFunctionName,
  Qualifier: "provisioned",
}).promise();
That works fine, but when I try to deploy my function again in off hours (which is the norm), CloudFormation errors out with:
No Provisioned Concurrency Config found for this function (Service: AWSLambdaInternal; Status Code: 404; Error Code: ProvisionedConcurrencyConfigNotFoundException; Request ID: 75dd221b-35d2-4a49-80c5-f07ce261d357; Proxy: null)
Does anybody have a neat solution to turn off provisioned capacity during some hours of the day?
After some thinking, if you're still committed to Lambda as the compute solution, I think your best option is to manage provisioned concurrency outside of the Serverless Framework entirely.
You've already got an orchestrator function that enables provisioned concurrency. You could try removing provisionedConcurrency from your serverless.yml file, adding another method to your orchestrator that disables provisioned concurrency in the evenings, and verifying that you can deploy when your orchestrator has set your functions to either state.
If you're willing to throw away your orchestrator function, AWS suggests using Application Auto Scaling, which is very useful for exactly what you're doing. (hat tip to @mpv)
That being said, Lambda isn't particularly well suited to predictable, steady-state traffic. If cost is a concern, I'd suggest exploring Fargate or ECS and writing a few autoscaling rules. Your Lambda code is already stateless, and likely is portable and has pretty limited networking rules. There are other forms of compute which would be dramatically cheaper to use.
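A rough sketch of what the orchestrator's reconcile step could look like. The inBusinessHours helper and function names are hypothetical; the two SDK calls are exactly the ones shown in the question, and the client is passed in rather than constructed here:

```javascript
// Pure scheduling helper: provisioned capacity only 9am-5pm, Monday-Friday.
// Uses UTC for simplicity; a real orchestrator would use the business timezone.
function inBusinessHours(date) {
  const day = date.getUTCDay();   // 0 = Sunday ... 6 = Saturday
  const hour = date.getUTCHours();
  return day >= 1 && day <= 5 && hour >= 9 && hour < 17;
}

// `lambda` is an instantiated AWS.Lambda client (aws-sdk v2, as in the question).
async function reconcile(lambda, functionName) {
  if (inBusinessHours(new Date())) {
    // The minimum allowed value is 1, hence the ValidationException for 0.
    await lambda.putProvisionedConcurrencyConfig({
      FunctionName: functionName,
      ProvisionedConcurrentExecutions: 1,
      Qualifier: 'provisioned',
    }).promise();
  } else {
    // Deleting the config entirely is the only way to reach "zero".
    await lambda.deleteProvisionedConcurrencyConfig({
      FunctionName: functionName,
      Qualifier: 'provisioned',
    }).promise();
  }
}
```

The key point is that the deploy-time serverless.yml no longer declares provisionedConcurrency at all, so CloudFormation never expects a config the orchestrator may have deleted.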

How to debug an error from a Stripe API call? Need a node.js stacktrace across event emitters?

I have a node webapp that makes various API calls to Stripe using the stripe npm package. Sometimes I get errors like the one below. Notice that the stacktrace is truncated so that I cannot see which stripe API call causes the error and I also cannot see where in my app this API call is made.
Is there anything I can do to get better error stacktraces?
Error: Missing required param: customer.
at Function.generate (/home/molsson/dv/foobar/node_modules/stripe/lib/Error.js:39:16)
at IncomingMessage.<anonymous> (/home/molsson/dv/foobar/node_modules/stripe/lib/StripeResource.js:175:33)
at Object.onceWrapper (events.js:299:28)
at IncomingMessage.emit (events.js:215:7)
at IncomingMessage.EventEmitter.emit (domain.js:476:20)
at endReadableNT (_stream_readable.js:1183:12)
at processTicksAndRejections (internal/process/task_queues.js:80:21)
Note: the error itself is just an example. I have already fixed it. I just want to get better stacktraces or a better method of debugging these types of errors quickly.
I'm pretty sure my node version has async stacks by default:
$ node --version
v12.12.0
$ node -p process.versions.v8
7.7.299.13-node.12
I tried running with NODE_OPTIONS='--trace-warnings --stack-trace-limit=9999' but it didn't help.
Does some kind of "async stacktraces across event emitters" debugging facility exist?
I found a good answer to this question myself. The stripe library fires an event before making a new API request, so you can print a stacktrace from there:
stripe.on('request', request => {
  const currentStack = (new Error()).stack.replace(/^Error/, '')
  console.log(`Making Stripe HTTP request to ${request.path}, callsite: ${currentStack}`)
})
I added a STRIPE_API_TRACING option to my app that I can turn on if I experience errors without stacks. With tracing on, I can just scroll up a bit in the log and see which API calls were dispatched just before the error happened.
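For reference, the tracing hook gated behind that flag could look roughly like this (a sketch: the STRIPE_API_TRACING name comes from above, and stripe is assumed to be an already-initialized client):

```javascript
// Capture the current call site without throwing.
function currentCallsite() {
  return new Error().stack.replace(/^Error\n?/, '');
}

// Attach the trace listener only when the flag is set.
function enableStripeTracing(stripe) {
  if (!process.env.STRIPE_API_TRACING) return;
  // The stripe client emits 'request' just before each outgoing API call.
  stripe.on('request', (request) => {
    console.log(`Making Stripe HTTP request to ${request.path}, callsite: ${currentCallsite()}`);
  });
}
```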

How to upload files larger than 10MB via Google Cloud HTTP functions? Any alternative options?

I have made a Google Cloud Function that uploads a file into a Google Cloud Storage bucket and returns a signed URL in the response.
Whenever large files (more than 10MB) are uploaded, it does not work.
It works fine for files smaller than 10MB.
I have searched the Cloud documentation. It says the maximum request size for HTTP functions is 10MB, and this limit cannot be increased.
resource: {…}
severity: "ERROR"
textPayload: "Function execution could not start, status: 'request too large'"
timestamp: "2019-06-25T06:26:41.731015173Z"
For a successful file upload, it gives the log below:
Function execution took 271 ms, finished with status code: 200
For large files, it gives the log below:
Function execution could not start, status: 'request too large'
Are there any alternative options to upload a file into the bucket using an API? A different service would be fine. I need to upload files of up to 20MB. Thanks in advance.
You could upload directly to a Google Cloud Storage bucket using the Firebase SDK for web and mobile clients. Then, you can use a Storage trigger to deal with the file after it's finished uploading.
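Another way to keep the file bytes out of the function entirely is to have the function hand back a V4 signed URL and let the client PUT the file straight into the bucket. A rough sketch with @google-cloud/storage, where `file` is assumed to be a File object such as storage.bucket('my-bucket').file('uploads/video.mp4') (names and expiry are illustrative):

```javascript
// Returns a URL the client can PUT the file to directly, bypassing the
// 10MB HTTP function request limit.
async function getUploadUrl(file, contentType) {
  const [url] = await file.getSignedUrl({
    version: 'v4',
    action: 'write',
    expires: Date.now() + 15 * 60 * 1000, // valid for 15 minutes
    contentType, // the client must send the same Content-Type header
  });
  return url;
}
```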

Load testing our elastic cluster

We are currently trying to load test our app, which involves a lot of logging to our Elastic cluster. Under heavy load, I start seeing the below error from ES:
Error: No Living connections
at sendReqWithConnection (D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\transport.js:225:15)
at next (D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\connection_pool.js:213:7)
at _combinedTickCallback (internal/process/next_tick.js:131:7)
at process._tickDomainCallback (internal/process/next_tick.js:218:9)
and before that, we see another bunch of errors:
Error: Request Timeout after 30000ms
at D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\transport.js:354:15
at Timeout.<anonymous> (D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\transport.js:383:7)
at ontimeout (timers.js:482:11)
at tryOnTimeout (timers.js:317:5)
at Timer.listOnTimeout (timers.js:277:5)
and
Error: [es_rejected_execution_exception] rejected execution of org.elasticsearch.transport.TransportService$7@4d532edc on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6c5897a1[Running, pool size = 1, active threads = 1, queued tasks = 200, completed tasks = 122300]]
at respond (D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\transport.js:307:15)
at checkRespForFailure (D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\transport.js:266:7)
at HttpConnector.<anonymous> (D:\home\site\wwwroot\node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
at IncomingMessage.bound (D:\home\site\wwwroot\node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
at emitNone (events.js:111:20)
at IncomingMessage.emit (events.js:208:7)
at endReadableNT (_stream_readable.js:1064:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickDomainCallback (internal/process/next_tick.js:218:9)
Is this just caused by heavy load? I'm wondering how I can fix the bottleneck. We currently have 3 data nodes and 3 master nodes running on separate Linux servers.
Should I bring in something like Logstash? How many servers would I need?
Should I bring in a queue to set aside ES tasks for later?
EDIT: a bit more info -
We're performing one insert per request (we send around 100 parallel requests, up to 2000 in total)
CPU usage hasn't gone very high (< 10%)
We're hosting the machines in Azure. All applications (Node and ES) are in the same region
I think the problem is that your queue capacity is being exceeded; the error says your limit is 200. You didn't provide the memory limit of your ES server, but try increasing the queue size and monitoring your memory.
Edit elasticsearch.yml:
threadpool.bulk.queue_size: 500
The right value differs from scenario to scenario, so you will need to experiment and test different settings yourself.
If you have a lot of data to insert at the same time, consider using a message queue like Kafka to handle the data asynchronously.
You can read more about this here: https://discuss.elastic.co/t/any-idea-what-these-errors-mean-version-2-4-2/70690/4
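Independently of the queue-size tweak, one insert per request is itself a big part of the pressure: each document becomes a separate queued task. Batching documents into _bulk calls reduces the task count dramatically. A minimal sketch of building bulk request bodies (index/type names and batch size are illustrative; the ES 2.x-style _type field matches the version in the linked thread):

```javascript
// Split docs into batches and build an elasticsearch bulk body for each:
// an action line followed by the document itself, per document.
function buildBulkBodies(docs, index, type, batchSize) {
  const bodies = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    const body = [];
    for (const doc of docs.slice(i, i + batchSize)) {
      body.push({ index: { _index: index, _type: type } });
      body.push(doc);
    }
    bodies.push(body);
  }
  return bodies;
}

// With the elasticsearch client from the stack trace, each batch would then
// be sent as: await client.bulk({ body });
```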

ElasticSearch _bulk call resulting in "socket hang up" despite small size

I'm using elastical to connect to ElasticSearch via node.js.
In the process of profiling my app with Nodetime to attempt to improve performance, I noticed something odd. My ElasticSearch "PUT" requests to _bulk index are frequently resulting in a "socket hang up". Furthermore, these calls are taking huge amounts of CPU time.
I'm capping each _bulk index request at 10 items to index, and as you can see, the content-length of the requests does not even reach 50Kb, so it is hard to imagine that the size is an issue. Yet, the response time is > 60 seconds and the CPU time is > 10 seconds! Yikes!!
In attempts to debug, I started running ElasticSearch in the foreground. I noticed this strange error:
[2013-02-27 11:42:39,188][WARN ][index.gateway.s3 ] [Lady Mandarin] [network][1] failed to read commit point [commit-f34]
java.io.IOException: Failed to get [commit-f34]
at org.elasticsearch.common.blobstore.support.AbstractBlobContainer.readBlobFully(AbstractBlobContainer.java:83)
at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.buildCommitPoints(BlobStoreIndexShardGateway.java:847)
at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.doSnapshot(BlobStoreIndexShardGateway.java:188)
at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshot(BlobStoreIndexShardGateway.java:160)
at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:271)
at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:265)
at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:1090)
at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:496)
at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:265)
at org.elasticsearch.index.gateway.IndexShardGatewayService$SnapshotRunnable.run(IndexShardGatewayService.java:366)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: ..., AWS Error Code: NoSuchKey, AWS Error Message: The specified key does not exist., S3 Extended Request ID: ....
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:548)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:288)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:170)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2632)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:811)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:717)
at org.elasticsearch.cloud.aws.blobstore.AbstractS3BlobContainer$1.run(AbstractS3BlobContainer.java:73)
I'm aware that I'm using a deprecated gateway (the S3 bucket gateway). However, given that I have multiple servers running on the Amazon Cloud which need to share data (I use ElasticSearch for caching), I don't see any alternative until the ElasticSearch team releases a replacement for the S3 Bucket Gateway...
Other than this problem with the _bulk calls, I'm not seeing any problems. Searches etc. all return quickly and effectively.
