Google App Engine NodeJS app stops after 30 min

I have a very basic NodeJS application hosted on Google App Engine that executes an async function on 15 second intervals. The deployment is successful and the app starts and runs fine, but stops after about 30 minutes with the following error logs. This runs fine locally, though.
Quitting on terminated signal
Start program failed: user application failed with exit code -1 (refer to stdout/stderr logs for more detail): signal: terminated
I have used App Engine before with no issues, so I'm not sure why this is happening. I used https://github.com/GoogleCloudPlatform/nodejs-docs-samples/tree/main/appengine/typescript as a reference and am still not able to resolve this issue. Any ideas?

Quitting on terminated signal
You may receive this error when your App Engine instance is scaling down or shutting down, possibly because:
Your application runs out of Instance Hours quota.
Your instance is moved to a different machine, either because the current machine that is running the instance is restarted, or App Engine moved your instance to improve load distribution.
There are good strategies to avoid instance downtime; here are a few:
Keep a minimum number of idle instances.
Use manual scaling, which lets you specify a number of instances that run continuously regardless of the load level.
Increase the maximum number of instances.
Asynchronous background work is not recommended in App Engine: it can result in higher billing, and users may also experience increased latency because of high pushback or request queuing. Google recommends using Cloud Tasks instead. With Cloud Tasks, HTTP requests are long-lived and return a response only after any asynchronous work ends.
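For illustration, here is a minimal sketch of enqueuing such background work as a Cloud Task from Node.js, assuming the @google-cloud/tasks client library; the project, region, queue name, and handler URL below are placeholders:

import { CloudTasksClient } from "@google-cloud/tasks";

const client = new CloudTasksClient();

// Enqueue an HTTP task instead of running background work inside the serving instance.
// The task targets a handler you deploy; that handler does the work and returns a
// response only when it finishes.
export async function enqueueWork(payload: object): Promise<void> {
  const parent = client.queuePath("my-project", "us-central1", "background-work");

  const [task] = await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: "POST",
        url: "https://my-service.example.com/tasks/do-work", // placeholder handler URL
        headers: { "Content-Type": "application/json" },
        body: Buffer.from(JSON.stringify(payload)).toString("base64"),
      },
    },
  });

  console.log(`Created task ${task.name}`);
}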

Related

Why is my Azure node.js app becoming unresponsive?

I recently deployed a Node.js backend service to Azure and have the following problem: the service becomes unresponsive after a certain amount of time, and only comes back to life when an external request is sent. The problem is that it takes about 3 minutes for the container to start back up and actually return the request. I'm running Node 14 LTS. I also added a health check yesterday, but Azure simply doesn't bother keeping the app alive; here is the metric from Azure.
I verified that Azure is actually trying to reach the correct endpoint, and it does. I also have "Always On" enabled. I also verified that the app itself is not crashing: I log every request, and all of a sudden requests are no longer received, which means the health endpoint doesn't respond either, yet this does not result in a container restart. It just waits for an external request to appear and then decides to start everything back up, which takes too long.
I feel like it's some kind of configuration issue, because the app itself is not very complex and I never experienced crashes when doing local development.
The official documentation tells us that on the Free pricing tier you are currently using, Always On does not take effect.
See also: How do I decrease the response time for the first request after idle time?

Heroku - restart on failed health check

Heroku does not support health checks on its own. It will restart services that crashed, but there is nothing like health checks.
It sometimes happens that a service becomes unresponsive while the process is still running. In most modern cloud solutions, you can provide a health endpoint that is periodically called by the hosting service; if that endpoint returns an error, or doesn't respond at all, the service is shut down and a new one is started.
That seems like an industry standard these days, but I am unable to find any solution for this on Heroku. I could even use an external service with the Heroku CLI, but just calling some endpoint is not sufficient: if there are multiple instances, they all share the same URL and the load balancer calls one of them at random, so it is possible to never hit the failed instance at all. And even when I do hit it, health checks usually work like "after 3 failed health checks in a row, restart that instance", which is highly improbable if there are 10 instances and only one of them becomes unhealthy.
Do you have any solution to this?
You are right that this is an industry standard, and it's a shame that it's not provided out of the box.
I can think of two solutions (both involve running some extra code that does this yourself):
a) use the Heroku API, which allows you to get the IP of individual dynos, and then call each dyno however you want
b) have each dyno instance periodically send a request to an external web server, for example https://iamaalive.com/?dyno=${process.env.HEROKU_DYNO_ID} (a sketch of this heartbeat approach follows below)
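For illustration, here is a minimal sketch of option (b), assuming Node 18+ (for the global fetch) and a hypothetical external monitoring endpoint; the HEARTBEAT_URL fallback and the 30-second interval are placeholders, not anything Heroku provides:

// heartbeat.ts - each dyno periodically reports itself alive to an external monitor.
const HEARTBEAT_URL = process.env.HEARTBEAT_URL ?? "https://iamaalive.com/";
const DYNO_ID = process.env.HEROKU_DYNO_ID ?? "local";
const INTERVAL_MS = 30_000;

async function sendHeartbeat(): Promise<void> {
  try {
    // Include the dyno id so the monitor can track each instance separately.
    await fetch(`${HEARTBEAT_URL}?dyno=${encodeURIComponent(DYNO_ID)}`, { method: "POST" });
  } catch (err) {
    // A failed ping should not crash the app; just log it.
    console.error("heartbeat failed", err);
  }
}

setInterval(sendHeartbeat, INTERVAL_MS);

The external monitor then flags a dyno as unhealthy when its pings stop arriving, and you can restart that dyno (e.g. with heroku ps:restart).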

Best way to approach long running tasks (but not cpu, memory intensive) in GCP

We built a web application where we use Firebase Functions for lightweight work such as login, profile updates, etc., and we deployed 2 functions to App Engine with Node.js.
Function 1: downloads an audio/video file from Firebase Storage, converts it with ffmpeg, and uploads the converted version back to Storage.
But App Engine terminates with a signal in the middle of the download process (after ~40 seconds) if the file is large (>500 MB).
Function 2: calls a Google API (ASR), waits for the response (progress %), and writes it to Firestore until the job is completed (100%).
This process may take between 1 and 20 minutes depending on the file length. But here we run into two different problems.
Either App Engine creates a new instance in the middle of the API call and kills the current one (since we set the instance count to 1), even though there are no concurrent requests.
I don't understand this behavior, since this should not require intensive CPU or memory usage. App Engine gives the following info in the logs:
This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application
Or App Engine terminates due to idle_timeout even though it is waiting for the async API response and writing it to the database.
It looks like, when there are no incoming requests, App Engine considers itself idle and terminates after a while (~10 minutes).
We are new to GCP and App Engine, so maybe we are using the wrong product (e.g. should it be Compute Engine?) or doing the wrong implementation. We also saw Pub/Sub, Cloud Tasks, etc., which look like possible solutions for our case.
So I wonder: what would be the most elegant way to approach this problem and implement a solution?
Any comments or feedback are appreciated.
Regards
A. Faruk Acar
App Engine app.yaml Configuration
runtime: nodejs10
manual_scaling:
  instances: 1
App Engine has a maximum timeout per request:
10 minutes for App Engine Standard
60 minutes for App Engine Flex
In both cases the default values are lower.
So for the process you describe, App Engine is not the optimal solution.
Cloud Tasks has similar limitations to App Engine, so you might run into similar problems.
Compute Engine can work for you, as it gives you virtual machines where you control the configuration. To keep it cost-effective, see what the smallest machine type is that can run your app.
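Since you already mention Pub/Sub, a common pattern is to publish a job from the lightweight web tier and let a worker on a small Compute Engine VM pull it and run the long ffmpeg/ASR work, free of request deadlines and idle timeouts. A minimal sketch, assuming the @google-cloud/pubsub client and a hypothetical topic transcode-jobs with subscription transcode-jobs-sub:

import { PubSub, Message } from "@google-cloud/pubsub";

const pubsub = new PubSub();

// Web tier: enqueue a job and return immediately instead of doing the work in-request.
export async function enqueueJob(filePath: string): Promise<string> {
  return pubsub.topic("transcode-jobs").publishMessage({ json: { filePath } });
}

// Worker (e.g. on a small Compute Engine VM): pull jobs and run the long work with no
// request deadline or idle timeout applying to it.
export function startWorker(): void {
  const subscription = pubsub.subscription("transcode-jobs-sub");
  subscription.on("message", async (message: Message) => {
    const { filePath } = JSON.parse(message.data.toString());
    try {
      await processFile(filePath); // placeholder for the ffmpeg / ASR pipeline described above
      message.ack();
    } catch (err) {
      console.error("job failed, message will be redelivered", err);
      message.nack();
    }
  });
}

async function processFile(filePath: string): Promise<void> {
  // ... download from Storage, run ffmpeg or call the ASR API, write progress to Firestore ...
}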

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that they should spawn as many instances as needed to handle the load sent to them. But this function app might not be scaling out quickly enough, as it's only handling those 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps you can take to improve things:
If you want more control, use an App Service plan with Always On to avoid cold starts; you will also need to configure autoscaling, since in this plan you are responsible for it and autoscale is not enabled by default.
Your Azure Function must be fully async: you have external dependencies, and you don't want to block the thread while you are calling them (see the sketch after this list).
Look at the limits; using host.json you can tweak them.
A 429 error means the function is too busy to process your request, so probably when you write to the table you are not using async and are blocking the thread.
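To illustrate the "fully async" point, here is a minimal sketch of an HTTP-triggered function that awaits both writes concurrently, assuming the @azure/functions v4 programming model plus the @azure/data-tables and @azure/storage-queue clients; the connection string variable, table name, and queue name are placeholders:

import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";
import { TableClient } from "@azure/data-tables";
import { QueueClient } from "@azure/storage-queue";
import { randomUUID } from "node:crypto";

// Create the clients once, outside the handler, so every invocation reuses the same
// connections instead of paying the setup cost per request.
const conn = process.env.STORAGE_CONNECTION_STRING ?? "";
const table = TableClient.fromConnectionString(conn, "requests"); // placeholder table name
const queue = new QueueClient(conn, "work-items");                // placeholder queue name

app.http("ingest", {
  methods: ["POST"],
  handler: async (req: HttpRequest, ctx: InvocationContext): Promise<HttpResponseInit> => {
    const body = await req.text();

    // Await both I/O operations concurrently; nothing here blocks the event loop.
    await Promise.all([
      table.createEntity({ partitionKey: "ingest", rowKey: randomUUID(), payload: body }),
      queue.sendMessage(Buffer.from(body).toString("base64")),
    ]);

    return { status: 202 };
  },
});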
Function apps work very well and scale as advertised. The 429s could be because the requests come from a single IP and Azure may be treating it as a DDoS attack. You can do the following:
Azure DevOps Load Test
You can load test using one of the Azure services; I am fairly sure they have better criteria for handling IPs.
Provision a VM in Azure
The way I normally do it is to provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and subdivide the load.
Use professional load testing services
If possible, you may use services like Loader.io. They use sophisticated algorithms to run the load test and provision a bunch of VMs to run the same test.
Use Application Insights
If you are not already, you should be using Application Insights to get a better look from the server's perspective. Go to Live Stream and see how many instances are provisioned to handle the load test. You can easily look into the events and error logs that arise and investigate, and you can dive into each associated dependency to track down the problem.

Google App Engine - nodejs application goes down over night

Hi, I am using Google App Engine to host a single-instance Node.js application. The application works fine and my scripts show no errors in the logs. The application is currently just in testing and is not used overnight; however, I often come to work the next day and find the server just returning internal server errors. No errors are shown in my application log other than the 502 errors I get when trying to access it the next day. I see hundreds of calls to /_ah/_background/ overnight, some of which appear to have timed out. At this point I must restart my instance for the app to continue to function.
I am completely stumped. Because my app uses WebSockets, I must use manual scaling and a single instance. I would appreciate any help or suggestions.
I would venture a guess that you have a deferred task stuck running. Tasks that run through the Task Queue API are set by default to retry continuously. See the Task Queue API documentation.
To get the tasks to stop running right now, visit the Google Cloud Console and select your project. Then select App Engine, then Task queues. Click on the queue that is running (probably default). There should be an option to pause the queue. This should prevent the 500 errors from occurring, but it will not fix the reason the task is failing.

Resources