My Node app on IBM Cloud keeps crashing - node.js

I have a Node app on IBM Cloud that keeps crashing, and most of the time it is not running. I have even increased the memory per instance to 1 GB. I end up having to check the app continually and restart it manually. How do I diagnose where the issue is? Here is my manifest.yml:
applications:
- instances: 1
  timeout: 600
  name: TicketSokoChatbot
  buildpack: sdk-for-nodejs
  command: npm start
  memory: 1024M
  random-route: true
Here is the error:
an instance of the app crashed: Instance never healthy after 1m0s: Failed to make TCP connection to port 8080: connection refused; process did not exit

When running on Cloud Foundry, the port is set for you. Your app must listen on that port, which you can find in the environment variable PORT, e.g.
app.listen(process.env.PORT || 3000);
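As a minimal sketch of the advice above, a server that reads its port from the environment might look like the following. The `resolvePort` and `start` names, the `0.0.0.0` bind address, and the plain-text response are assumptions for illustration and are not taken from the question's app.

```javascript
const http = require('http');

// Prefer the platform-assigned PORT; fall back to 3000 for local runs.
function resolvePort(env) {
  const parsed = Number.parseInt(env.PORT, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3000;
}

// Hypothetical entry point; call it from the file that `npm start` runs.
function start(env = process.env) {
  const port = resolvePort(env);
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok\n');
  });
  // Bind to all interfaces so the platform's health check can connect.
  return server.listen(port, '0.0.0.0', () => {
    console.log(`listening on port ${port}`);
  });
}
```

Calling `start()` at the end of the entry file starts the server. If the app listens on a hard-coded port instead, the TCP health check on the assigned port would likely fail with the "connection refused" error shown in the question.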
If the port isn’t the cause of the issue, the next thing you could try is changing the health check timeout.
If this doesn’t work for you, the Cloud Foundry docs provide information on troubleshooting; in particular, take a look at the section App Fails to Start. Here is one of the debug steps listed in the Cloud Foundry documentation:
Find the reason app is failing and modify your code. Run cf events APP-NAME and cf logs APP-NAME --recent and look for messages similar to this:
2014-04-29T17:52:34.00-0700 app.crash index: 0, reason: CRASHED, exit_description: app instance exited, exit_status: 1
These messages may identify a memory or port issue. If they do, take that as a starting point when you re-examine and fix your application code.
After trying all of the debug steps, if you are still unable to fix your problem, add more information to your question about what you have tried.
I recommend that anyone building Cloud Foundry apps get acquainted with the developer-focused Cloud Foundry documentation, Deploying and Managing Applications.

Related

Google App Engine NodeJS app stops after 30 min

I have a very basic NodeJS application hosted on Google App Engine that executes an async function at 15-second intervals. The deployment is successful and the app starts and runs fine, but it stops after about 30 minutes with the following error logs. It runs fine locally, though.
Quitting on terminated signal
Start program failed: user application failed with exit code -1 (refer to stdout/stderr logs for more detail): signal: terminated
I have used App Engine before with no issues, so I'm not sure why this is happening. I used https://github.com/GoogleCloudPlatform/nodejs-docs-samples/tree/main/appengine/typescript as a reference and am still not able to resolve this issue. Any ideas?
Quitting on terminated signal
You may receive this error if your App Engine instance is scaling down or shutting down, possibly because:
Your application runs out of Instance Hours quota.
Your instance is moved to a different machine, either because the current machine that is running the instance is restarted, or because App Engine moved your instance to improve load distribution.
There are several strategies for reducing instance downtime:
You can try to keep a minimum number of idle instances.
Use manual scaling, which lets you specify the number of instances that will run continuously regardless of the load level.
Increase the maximum number of instances.
Asynchronous background work is not recommended in App Engine. It can result in higher billing, and users may also experience increased latency because of high pushback or request queuing. Google recommends using Cloud Tasks instead. With Cloud Tasks, HTTP requests are long-lived and return a response only after any asynchronous work ends.
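To illustrate the last point, here is a rough sketch of a handler that a task queue such as Cloud Tasks could call, and that responds only once the asynchronous work has settled. The `/tasks/poll` path, the `runTask` and `createTaskServer` names, and the status mapping are assumptions for illustration; the sketch uses only Node's built-in `http` module and does not show how the task itself is enqueued.

```javascript
const http = require('http');

// Runs one unit of work and maps the outcome to an HTTP status.
// A non-2xx status signals failure so the queue can retry the task.
async function runTask(work) {
  try {
    const result = await work();
    return { status: 200, body: String(result) };
  } catch (err) {
    return { status: 500, body: String((err && err.message) || err) };
  }
}

// Hypothetical server: the response is sent only after the work finishes.
function createTaskServer(work) {
  return http.createServer(async (req, res) => {
    if (req.method !== 'POST' || req.url !== '/tasks/poll') {
      res.writeHead(404);
      res.end();
      return;
    }
    const { status, body } = await runTask(work);
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(body);
  });
}
```

A possible way to use it is `createTaskServer(doWork).listen(process.env.PORT || 8080)`, where `doWork` stands in for the async function that currently runs on the 15-second timer.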

Heroku - restart on failed health check

Heroku does not support health checks on its own. It will restart services that crashed, but there is nothing like health checks.
It sometimes happens that a service becomes unresponsive while the process is still running. In most modern cloud solutions, you can provide a health endpoint that the hosting service calls periodically; if that endpoint returns an error or does not respond at all, the service is shut down and a new one is started.
That seems to be the industry standard these days, but I am unable to find any solution to this for Heroku. I could even use an external service with the Heroku CLI, but just calling some endpoint is not sufficient: if there are multiple instances, they all share the same URL and the load balancer calls one of them at random, so it is possible never to hit the failed instance at all. Even if I do hit it, health checks usually use a rule like "after 3 failed health checks in a row, restart that instance", and three consecutive failures are highly improbable if there are 10 instances and only one of them becomes unhealthy.
Do you have any solution to this?
You are right that this is the industry standard, and it is a shame that it's not provided out of the box.
I can think of 2 solutions (both involve running some extra code):
a) use the Heroku API, which allows you to get the IP of individual dynos, and then you can call each dyno however you want
b) from each dyno instance, you can send a request to a web server, like https://iamaalive.com/?dyno=${process.env.HEROKU_DYNO_ID}
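Option b) could be sketched roughly as follows. The base URL is the placeholder from the answer above rather than a real monitoring service, and the `buildHeartbeatUrl` and `startHeartbeat` names and the 30-second interval are assumptions. The external monitor that notices missing heartbeats and restarts the dyno is not shown.

```javascript
const https = require('https');

// Builds the heartbeat URL for one dyno. The base URL is the placeholder
// from the answer above, not a real monitoring service.
function buildHeartbeatUrl(baseUrl, dynoId) {
  const url = new URL(baseUrl);
  url.searchParams.set('dyno', dynoId);
  return url.href;
}

// Sends a heartbeat every intervalMs; a hypothetical external monitor would
// restart any dyno whose heartbeats stop arriving.
function startHeartbeat(baseUrl, dynoId, intervalMs = 30000) {
  const target = buildHeartbeatUrl(baseUrl, dynoId);
  const timer = setInterval(() => {
    https
      .get(target, (res) => res.resume()) // drain the response body
      .on('error', (err) => console.error('heartbeat failed:', err.message));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for heartbeats
  return timer;
}
```

Assuming `HEROKU_DYNO_ID` is set in the dyno's environment, a call such as `startHeartbeat('https://iamaalive.com/', process.env.HEROKU_DYNO_ID || 'unknown')` at startup would begin reporting. Because each dyno reports for itself, the load balancer's random routing no longer hides an unhealthy instance.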

Azure App service returns 502 bad gateway from HttpClient

I have an app service (plan B2) running on Azure.
My integration tests, running from a Docker container, call some app service endpoints one by one and sometimes receive a 500 or 502 error.
When I debug the tests I pause between calls, and all requests succeed. Also, when I scale up my app service, everything works properly. (I don't want to scale up because CPU and other metrics are low.)
In my tests I have only one HttpClient and I dispose of it at the end, so I don't think there should be any connection leaks.
Also, in TCP Connections I have around 60 total connections, while in the Azure docs the limit is 1,920.
This app is not accessed by any users, but here it says that I had the maximum connections. Is there any way I can track these connections? Why, when I receive these 5xx errors, don't I see anything in App Insights? Also, how can 15 connections exceed the limit when the limit is 1,920? Are these connections related to my errors, and how can they be fixed?
You don't see them in Application Insights because they're happening at the IIS level, which breaks the request, so no data is sent to Application Insights.
The place to look for information is "Diagnose and solve problems", then "Availability and Performance". More info here:
https://learn.microsoft.com/en-us/azure/app-service/overview-diagnostics
PS: I do think the problem is related to the disposal of your HttpClient. It's a well-known issue and the reason HttpClientFactory was introduced. More info here:
https://www.stevejgordon.co.uk/httpclient-creation-and-disposal-internals-should-i-dispose-of-httpclient
https://stackoverflow.com/a/15708633/1384539

AWS EBS runs into "504 Gateway Time-out"

I'm new to using AWS EBS and ECS, so please bear with me if I ask questions that might be obvious for others. To the issue:
I've got a single-container Node/Express application that runs on EBS. The local Docker container works as expected. On EBS, I can access one endpoint of the API and get the expected output. For the second endpoint, which runs longer (around 10-15 seconds), I get no response and after 60 seconds run into a timeout: "504 Gateway Time-out".
I wonder how I would approach debugging this, as I can't connect to the container directly. Currently there isn't any debugging functionality included in the code either, as I'm not sure what the best Node approach for an EBS container is. Any recommendations are highly appreciated.
Thank you in advance!
You can see the EC2 instances running on EBS in your AWS console, and you can choose to give them IP addresses in your EBS options. That will let you SSH directly into them if you need to.
Otherwise, check the keepAliveTimeout field on your server (the value returned by app.listen() if you're using Express).
I got a decent number of 504s when my Node server timeout was less than my load balancer timeout.
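A minimal sketch of that adjustment, assuming a load balancer idle timeout of 60 seconds (check your own setting; the `applyTimeouts` name and the 5-second and 1-second margins are illustrative choices):

```javascript
const http = require('http');

// Keeps idle keep-alive sockets open longer than the load balancer does, so
// the balancer never reuses a connection that Node has already closed.
function applyTimeouts(server, lbIdleTimeoutMs) {
  server.keepAliveTimeout = lbIdleTimeoutMs + 5000;
  // Commonly recommended: keep headersTimeout above keepAliveTimeout.
  server.headersTimeout = server.keepAliveTimeout + 1000;
  return server;
}

// 60000 ms is an assumed load balancer idle timeout, not a measured value.
const server = applyTimeouts(http.createServer(), 60000);
```

With Express, the same helper could wrap the return value of `app.listen()`, for example `applyTimeouts(app.listen(port), 60000)`.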
Your application takes longer than expected (> 60 seconds) to respond, so either nginx or the Load Balancer terminates your request.
See my answer here

Node.JS with forever on Heroku

So, I need to run my Node.js app on Heroku. It works very well, but when my app crashes I need something to restart it, so I added forever to package.json and created a file named forever.js with this:
var forever = require('forever');
var child = new (forever.Monitor)('web.js', {
  max: 3,
  silent: false,
  options: []
});
//child.on('exit', this.callback);
child.start();
forever.startServer(child);
In my Procfile (which Heroku uses to know what to start) I put:
web: node forever.js
Alright! Now every time my app crashes it restarts automatically, but from time to time (almost every hour) Heroku starts throwing H99 - Platform error. About this error, they say:
Unlike all of the other errors which will require action from you to correct, this one does not require action from you. Try again in a minute, or check the status site.
But when I manually restart my app, the error goes away. If I don't do that, it may take hours to go away by itself.
Can anyone help me here? Maybe this is a forever problem? A heroku issue?
This is an issue with free Heroku accounts: Heroku automatically kills unpaid apps after 1 hour of inactivity, and then spins them back up the next time a request comes in. (As mentioned below, this does not apply to paid accounts. If you scale up to two servers and pay for the second one, you get two always-on servers.) - https://devcenter.heroku.com/articles/dynos#dyno-sleeping
This behavior is probably not playing nicely with forever. To confirm this, run heroku logs and look for the lines "Idling" and "Stopping process with SIGTERM", then see what comes next.
Instead of using forever, you might want to try using the Cluster API to automatically create a new child each time one dies. http://nodejs.org/api/cluster.html#cluster_cluster is a good example; you'd just put your code into the else block.
The upshot is that your app is now much more stable, plus it gets to use all of the available CPU cores (4 in my experience).
The downside is that you cannot store any state in memory. If you need to store sessions or something along those lines, try out the free Redis To Go addon (heroku addons:add redistogo).
Here's an example that's currently running on heroku using cluster and Redis To Go: https://github.com/nfriedly/node-unblocker
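The cluster approach described above might be sketched like this. It is not the code from the linked repository; the `runClustered`, `pickWorkerCount`, and `startWorker` names are hypothetical, and treating `WEB_CONCURRENCY` as an optional override for the worker count is an assumption.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// WEB_CONCURRENCY is an assumed override; otherwise use one worker per core.
function pickWorkerCount(env, cpuCount) {
  const requested = Number.parseInt(env.WEB_CONCURRENCY, 10);
  return Number.isInteger(requested) && requested > 0 ? requested : cpuCount;
}

function runClustered(startWorker, env = process.env) {
  // isPrimary replaced isMaster in newer Node releases; accept either.
  if (cluster.isPrimary || cluster.isMaster) {
    const count = pickWorkerCount(env, os.cpus().length);
    for (let i = 0; i < count; i += 1) cluster.fork();
    // Replace any worker that dies unexpectedly.
    cluster.on('exit', (worker, code, signal) => {
      if (worker.exitedAfterDisconnect) return; // intentional shutdown
      console.error(`worker ${worker.process.pid} died (${signal || code}), restarting`);
      cluster.fork();
    });
  } else {
    startWorker();
  }
}

// Hypothetical worker body: put the app's own server code here.
function startWorker() {
  http
    .createServer((req, res) => res.end('ok\n'))
    .listen(process.env.PORT || 3000);
}
```

Calling `runClustered(startWorker)` from the file named in the Procfile would replace the forever wrapper. As the answer notes, each worker is a separate process, so in-memory state is not shared between them.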
UPDATE: Heroku has recently made some major changes to how free apps work, and the big one is they can only be online for a maximum of 18 hours per day, making it effectively unusable as a "real" web server. Details at https://blog.heroku.com/archives/2015/5/7/heroku-free-dynos
UPDATE 2: They changed it again. Now, if you verify your ID, you can run 1 free dyno constantly: https://blog.heroku.com/announcing_heroku_free_ssl_beta_and_flexible_dyno_hours#flexible-free-dyno-hours
