Docker exit status 1 for Node app on AWS - node.js

I'm hosting a beta app on AWS using Express.js, Node, Mongoose and Docker. Daily active users < 10, mainly friends of mine helping with testing. The app goes down almost every day for some reason. Initially I thought it was an AWS issue, so I stopped the app, changed the instance from the free tier to t2.medium and started it again.
That didn't resolve the issue, so I inspected the stopped container. The exit was not caused by OOMKilled:
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 1,
"Error": "",
"StartedAt": "2017-03-22T00:51:59.234643501Z",
"FinishedAt": "2017-03-22T07:21:41.351927073Z"
},
"Config": {
...
"AttachStdin": false,
"AttachStdout": true,
"AttachStderr": true,
...
}
I could configure Docker to always restart the container, but I want to figure out the root cause. Any suggestions?

That happens to everyone. Lots of things can kill an Express application, such as unexpected HTTP requests. The Docker log should show the exception. Add an uncaughtException handler to log the issue:
process.on('uncaughtException', (e) => {
console.error(e); // try console.log if that doesn't work
process.exit(10);
});
If you can't find an error in the Docker log, then instead of logging to the console you could log to a file (make sure it lives on a volume that persists between Docker runs, though).
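As a rough sketch of that idea (the /var/log/app path and file name are only placeholders, assuming that directory is bind-mounted or declared as a volume):
const fs = require('fs');
process.on('uncaughtException', (e) => {
  // Append the stack trace to a file on a mounted volume so it survives
  // the container being removed or recreated.
  fs.appendFileSync('/var/log/app/crash.log', new Date().toISOString() + ' ' + e.stack + '\n');
  process.exit(10);
});
Run the container with something like -v /var/log/app:/var/log/app so the file persists on the host.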
People don't like admitting it, but many applications simply log the exception in the uncaughtException handler and keep going without exiting. Because the things that kill the server are often broken requests or other issues that don't really matter, you can usually get away with this. But once in a while you'll get burned by something strange happening to the server state that it can't recover from, and you'll have no idea because you swallowed the exception.
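If you do go that route, the handler is the same as above minus the exit, something like:
process.on('uncaughtException', (e) => {
  // Log and swallow: the process keeps running, at the risk of continuing
  // with corrupted in-memory state.
  console.error('Unhandled exception, continuing anyway:', e);
});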
You may be able to have Docker automatically restart the app (https://docs.docker.com/docker-cloud/apps/autorestart/), which might be a good solution.
Otherwise, look into using pm2 together with Docker if possible; pm2 will handle restarting for you.
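As a rough sketch of the pm2 route (the file name and options are illustrative, not taken from the question; inside the container you would start it with pm2-runtime start ecosystem.config.js):
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'beta-app',
    script: './server.js',
    instances: 1,
    autorestart: true,          // restart the app whenever it crashes
    max_restarts: 10,           // stop retrying if it is stuck in a crash loop
    out_file: '/var/log/app/out.log',
    error_file: '/var/log/app/err.log'
  }]
};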

Related

Azure Function App Cosmos DB trigger connection drop

I am using a Function App with a Cosmos DB trigger. When running locally, the behavior is very strange: I stop receiving events randomly, as if the connection to the lease collection drops. I am also getting an error message saying that a read operation to Blob storage fails, but I'm not sure whether it is related. Here's the error:
There was an error performing a read operation on the Blob Storage Secret Repository.
Please ensure the 'AzureWebJobsStorage' connection string is valid
I am running the Function App with this command: func host start --cors * --verbose
And here's the CosmosDBOptions object I can see in the console:
[2021-02-09T16:17:58.305Z] CosmosDBOptions
[2021-02-09T16:17:58.307Z] {
[2021-02-09T16:17:58.307Z] "ConnectionMode": null,
[2021-02-09T16:17:58.308Z] "Protocol": null,
[2021-02-09T16:17:58.309Z] "LeaseOptions": {
[2021-02-09T16:17:58.310Z] "CheckpointFrequency": {
[2021-02-09T16:17:58.310Z] "ExplicitCheckpoint": false,
[2021-02-09T16:17:58.311Z] "ProcessedDocumentCount": null,
[2021-02-09T16:17:58.311Z] "TimeInterval": null
[2021-02-09T16:17:58.312Z] },
[2021-02-09T16:17:58.313Z] "FeedPollDelay": "00:00:05",
[2021-02-09T16:17:58.313Z] "IsAutoCheckpointEnabled": true,
[2021-02-09T16:17:58.314Z] "LeaseAcquireInterval": "00:00:13",
[2021-02-09T16:17:58.314Z] "LeaseExpirationInterval": "00:01:00",
[2021-02-09T16:17:58.315Z] "LeasePrefix": null,
[2021-02-09T16:17:58.316Z] "LeaseRenewInterval": "00:00:17"
[2021-02-09T16:17:58.316Z] }
[2021-02-09T16:17:58.323Z] }
and my host.json file:
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[1.*, 2.0.0)"
  }
}
Finally, the issue started after I added a shared folder; I'm not sure if that's related, but it's really annoying. Deleting the leases collection solves the problem temporarily, but it costs a lot of time, and all the other running Functions break because I wipe the whole collection.
TL;DR: Using the Cosmos DB emulator for local development solves this issue, as you won't have two Functions pointing to the same lease collection.
There are two important points:
If you have one Azure Function deployed on Azure and one running locally on your machine with the same lease configuration, both listening for changes on the same monitored collection, they will behave as multiple instances of the same deployment: changes are delivered to one or the other, and you may see apparent "event loss" on the one running in Azure. This is documented at https://learn.microsoft.com/en-us/azure/cosmos-db/troubleshoot-changefeed-functions#some-changes-are-missing-in-my-trigger. If you want two independent Functions listening for changes on the same monitored collection while sharing the same lease collection, you need to use the LeaseCollectionPrefix configuration (https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-create-multiple-cosmos-db-triggers); see the sketch after the next point.
The error you are seeing locally is likely related to either not having the Azure Storage emulator running or not configuring AzureWebJobsStorage locally to use it. The Azure Functions runtime requires a storage account regardless of the Cosmos DB trigger. You can set AzureWebJobsStorage to UseDevelopmentStorage=true to use the local storage emulator.
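To make both points concrete, here is a sketch of a JavaScript function's binding with a lease prefix (the database, collection and connection-setting names are placeholders; the property names assume the Cosmos DB trigger from the extension bundle declared in host.json):
{
  "bindings": [
    {
      "type": "cosmosDBTrigger",
      "name": "documents",
      "direction": "in",
      "connectionStringSetting": "CosmosDBConnection",
      "databaseName": "mydb",
      "collectionName": "monitored",
      "leaseCollectionName": "leases",
      "leaseCollectionPrefix": "local-dev"
    }
  ]
}
And for the second point, local.settings.json (used only on your machine, never deployed) can point AzureWebJobsStorage at the storage emulator:
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true"
  }
}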

Repeated 'Too many certificates already issued for exact set of domains' error with GreenLock package

I have already checked this post, but even though I tried that method it didn't work, so I'm opening a new question.
I use an AWS EC2 server and deploy with an AWS pipeline, so when I push to the GitHub repository it automatically builds and deploys to the production server.
At first it worked fine and there were no errors in the console.
But one day an error began to occur. When I checked the console, I saw the error below.
[Error Message in console]
set greenlockOptions.notify to override the default logger
certificate_order (more info available: account subject altnames challengeTypes)
Error cert_issue:
[acme-v2.js] authorizations were not fetched for 'mydomain.com':
{"type":"urn:ietf:params:acme:error:rateLimited","detail":"Error creating new order :: too many certificates already issued for exact set of domains: mydomain.com: see https://letsencrypt.org/docs/rate-limits/","status":429,"_identifiers":[{"type":"dns","value":"mydomain.com"}]}
Error: [acme-v2.js] authorizations were not fetched for 'mydomain.com':
{"type":"urn:ietf:params:acme:error:rateLimited","detail":"Error creating new order :: too many certificates already issued for exact set of domains: mydomain.com: see https://letsencrypt.org/docs/rate-limits/","status":429,"_identifiers":[{"type":"dns","value":"mydomain.com"}]}
at Object.E.NO_AUTHORIZATIONS (/home/project/build/node_modules/@root/acme/errors.js:75:9)
at /home/project/build/node_modules/@root/acme/acme.js:1198:11
at processTicksAndRejections (internal/process/task_queues.js:97:5)
Error cert_issue:
[acme-v2.js] authorizations were not fetched for 'mydomain.com':
My guess is that a certificate gets reissued every time I push code and that this is what hits the rate limit, but even after checking the code I can't find where the problem occurs.
My code is structured as below and developed with Express.
[server.js]
"use strict";
const app = require("./app.js");
require("greenlock-express")
.init({
packageRoot: __dirname,
configDir: "./greenlock.d",
// contact for security and critical bug notices
maintainerEmail: process.env.EMAIL,
// whether or not to run at cloudscale
cluster: false
})
// Serves on 80 and 443
// Gets SSL certificates magically!
.serve(app);
[greenlock.d/config.json]
{ "sites": [{ "subject": "mydomain.com", "altnames": ["mydomain.com"] }] }
[.greenlockrc]
{"configDir":"./greenlock.d"}
[package.json (scripts.start line)]
"scripts": {
"start": "node server.js"
},
I am aware of the seven-day rate limit from Let's Encrypt, so I want to find a way to solve this problem.
In my Express folder, I ran
sudo chmod 775 ./greenlock.d
then I deleted greenlock.d (one time) and ran npm start.
I haven't had the issue since.
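One thing worth checking (this is an assumption about the root cause, since the question doesn't show the deploy step): if the pipeline recreates the build directory on every push, ./greenlock.d and the certificates stored in it disappear with each deploy, so Greenlock orders a fresh certificate every time and eventually hits the rate limit. Pointing configDir at a path outside the deploy directory lets a redeploy reuse the already-issued certificate; a rough sketch (the absolute path is only an example, and the Node process needs write access to it):
"use strict";
const app = require("./app.js");
require("greenlock-express")
  .init({
    packageRoot: __dirname,
    // keep certificates outside the build folder so a redeploy reuses them
    configDir: "/home/ubuntu/greenlock.d",
    maintainerEmail: process.env.EMAIL,
    cluster: false
  })
  .serve(app);
If .greenlockrc is checked into the repo, it presumably needs to point at the same directory.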

Azure Function silently fails to start if dependency is misconfigured

When developing locally on my machine, if I do not start the Azure Cosmos Emulator, my Function fails to start correctly. There are no errors in the console output, but if I try to call an HttpTrigger function the TCP connection is refused. No exceptions are trapped by the Visual Studio debugger, and no errors are shown in the console.
How can I get an error to be logged to console?
The only difference I've seen in the console output is that the following lines do not appear when Cosmos is switched off:
[2020-10-19T19:20:29.487] Host started (2444ms)
[2020-10-19T19:20:29.490] Job host started
Hosting environment: Production
Content root path: C:\Code\MyApp.Functions\bin\Debug\netcoreapp3.1
Now listening on: http://0.0.0.0:7071
Application started. Press Ctrl+C to shut down.
[2020-10-19T19:20:34.554] Host lock lease acquired by instance ID '000000000000000000000000C51E9459'.
I'm not seeing much when using enhanced logging either, as configured below:
{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "Host.Triggers.Warmup": "Trace",
      "Host.General": "Trace",
      "Host": "Trace",
      "Function": "Trace",
      "MyApp": "Trace",
      "default": "Trace"
    }
  }
}
Below is a diff of the logs after normalizing meaningless differences, including timestamps, guids and execution-times.

The V8 platform used by this instance of Node does not support creating Workers

With my current project I run into the following error message when creating a worker:
ERROR Error: The V8 platform used by this instance
of Node does not support creating Workers
I found a variety of posts here on SO with comments like "It was added in Node.js v10.5.0."
Does anyone know what's going on?
$ process.versions
ares:'1.16.0'
brotli:'1.0.7'
chrome:'85.0.4183.39'
electron:'10.0.0-beta.14'
http_parser:'2.9.3'
icu:'67.1'
llhttp:'2.0.4'
modules:'82'
napi:'5'
nghttp2:'1.41.0'
node:'12.16.3'
openssl:'1.1.0'
unicode:'13.0'
main.ts
win = new BrowserWindow({
webPreferences: {
nodeIntegrationInWorker: true,
nodeIntegration: true,
allowRunningInsecureContent: (serve) ? true : false,
},
});
Launch Workers from main.js
I had my Workers launching from my renderer.js, and then I noticed a comment here (https://www.giters.com/nrkno/sofie-atem-connection/issues/125) mentioning a bug that produces this V8 platform message when Workers are started outside of main.js.
In my particular case it's not fully working yet, but I no longer get this message, and I think my remaining problems are unrelated.
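For reference, a minimal sketch of that approach using Node's built-in worker_threads from the Electron main process (the worker file name is hypothetical):
// main.js (Electron main process)
const { Worker } = require('worker_threads');
const path = require('path');

function startWorker() {
  // Creating the Worker here, rather than in the renderer, avoids the
  // "V8 platform ... does not support creating Workers" error above.
  const worker = new Worker(path.join(__dirname, 'my-worker.js'));
  worker.on('message', (msg) => console.log('worker said:', msg));
  worker.on('error', (err) => console.error('worker error:', err));
  return worker;
}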
I had a similar error when I tried to use a worker pool in Electron. I solved it by adding { workerType: 'process' } when creating the worker pool, as follows.
const pool = workerpool.pool('', { workerType: 'process' });
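For what it's worth, a small usage sketch of that pool (the offloaded function is just an example):
const workerpool = require('workerpool');

// workerType: 'process' makes workerpool use child processes instead of
// worker threads, sidestepping the V8 platform limitation in Electron.
const pool = workerpool.pool({ workerType: 'process' });

function heavyTask(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;
  return sum;
}

pool.exec(heavyTask, [10000000])
  .then((result) => console.log('result:', result))
  .catch((err) => console.error(err))
  .then(() => pool.terminate());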

Running integration tests on ephemeral server instance using Heroku CI

I'm trying to take advantage of this new feature of Heroku to test a parse-server/nodejs application that we have on Heroku, using mocha.
I was expecting Heroku to launch an ephemeral instance of my app along with the tests so that they could be run against it, but it doesn't seem like that's happening. Only the tests get launched.
Now, I found at least one snippet about configuring the Dyno formation to use dynos other than performance-m for the test, so I'm trying to declare my other dynos there as well:
"environments": {
"test": {
"scripts": {
"test-setup": "echo done",
"test": "npm run test"
},
"addons": [
{
"plan": "rediscloud:30",
"as": "REDISCLOUD_URL"
}
],
"formation": {
"test": {
"quantity": 1,
"size": "standard-1x"
},
"worker": {
"quantity": 1,
"size": "standard-1x"
},
"web": {
"quantity": 1,
"size": "standard-1x"
}
}
}
}
in my app.json, but it seems to be getting totally ignored.
I know my Mocha script could import the relevant part of the web server and test against it, and that's what I've seen in the non-Heroku examples, but our app also has a worker. I'd like to profile the interaction of both and test job lengths against our performance expectations, rather than test individual components, hence "integration tests". Is this a legitimate use of Heroku CI, or am I doing something wrong or holding wrong expectations? I'm more concerned about this than about getting it to work, because I'm fairly sure I could get it working in a number of ways (Mocha spawning the server processes, the npm concurrently package, etc.), but if I can avoid hacks, all the better.
Locally, I was able to import both into the test script, but performance degrades since what used to be two processes plus the tests now runs in a single process, with Node.js's memory cap and a single event loop instead of three. While writing this I realized I could probably use throng and spawn different functions depending on the process ID. I'll try that if I don't get any better solutions.
Edit: I managed to make it run by spawning the server and worker as separate processes in a Mocha before step, using env vars to work out how much RAM to allow each. I'm still interested in knowing whether there's a better solution.
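In case it helps anyone, a rough sketch of that before step (paths, ports, memory sizes and the readiness wait are placeholders for whatever the real app uses):
// test/setup.js - spawn the web and worker processes before the suite runs
const { spawn } = require('child_process');

let web;
let worker;

before(function () {
  this.timeout(30000);
  // Split the dyno's memory between the two children via --max-old-space-size.
  web = spawn('node', ['--max-old-space-size=512', 'index.js'], {
    env: { ...process.env, PORT: '5000' },
    stdio: 'inherit'
  });
  worker = spawn('node', ['--max-old-space-size=512', 'worker.js'], {
    env: { ...process.env },
    stdio: 'inherit'
  });
  // A real suite should poll a health endpoint instead of sleeping.
  return new Promise((resolve) => setTimeout(resolve, 3000));
});

after(function () {
  if (web) web.kill();
  if (worker) worker.kill();
});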
