Azure node expressjs app crashes randomly without error

I have built a simple Node application with Express that uses socket.io.
In order for the sockets to communicate in cluster mode on Azure, the app also uses Azure Cache for Redis.
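Roughly, the wiring looks like the sketch below; the adapter package (@socket.io/redis-adapter) and the REDIS_CONNECTION_STRING variable are placeholders rather than my exact code:

const { createServer } = require('http');
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

const httpServer = createServer();
const io = new Server(httpServer);

// Two Redis connections (publish + subscribe) pointed at the Azure Cache for Redis
// instance, so socket.io events are shared across all cluster workers / instances.
const pubClient = createClient({ url: process.env.REDIS_CONNECTION_STRING });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));
  httpServer.listen(process.env.PORT || 3000);
});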
The project is deployed to Azure as a Linux Web App running in a Docker container.
The problem is that the app crashes randomly after 18 to 36 hours (from what I have seen), and when you visit the URL you see the nginx error page.
I used the following code to surface the error in the logs, and I also deployed the app to a staging environment on a Linux machine I own:
process
  .on('unhandledRejection', (reason, p) => {
    logger.error(reason, 'Unhandled Rejection at Promise', p)
  })
  .on('uncaughtException', err => {
    logger.error(err, 'Uncaught Exception thrown')
    process.exit(1)
  })
On the staging machine there are no crashes at all.
I am starting to think this has something to do with the Docker container on Azure, but I have no indication of that.
**Important: the web app is set to Always On.**
Any ideas or suggestions?

I had something similar before. In my case, it was because I had no log rotation, so the log file kept growing into one giant file until the server could no longer write to it.
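If the logger above writes to a file, adding rotation is straightforward. A minimal sketch, assuming winston with winston-daily-rotate-file (the original post doesn't say which logging library is actually in use):

const winston = require('winston');
// Registers winston.transports.DailyRotateFile
require('winston-daily-rotate-file');

// Rotate the log file daily and cap its size so it can never grow unbounded.
const logger = winston.createLogger({
  transports: [
    new winston.transports.DailyRotateFile({
      filename: 'app-%DATE%.log',
      datePattern: 'YYYY-MM-DD',
      maxSize: '20m',   // start a new file once the current one reaches 20 MB
      maxFiles: '14d'   // keep two weeks of logs and delete older ones
    })
  ]
});

logger.info('log rotation configured');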

Related

PWA Setup with CloudFront CDN?

My web app has a service worker that was working fine before I added integration with the CloudFront CDN.
I'm registering the service worker on the client like so:
navigator.serviceWorker
  .register('https://www.my-domain-name.com/sw.js')
  .then(() => console.info('service worker registered.'))
  .catch(error => {
    console.log('Error registering serviceWorker: ', error)
  })
I get this in the client console logs (screenshot not reproduced here):
And Lighthouse is telling me this:
No matching service worker detected. You may need to reload the page,
or check that the scope of the service worker for the current page
encloses the scope and start URL from the manifest.
I've tried multiple URLs when registering the service worker, including:
"/sw.js"
"https://www.my-domain-name.com/sw.js"
"https://###.cloudfront.net/sw.js"
...but so far none have worked.
What am I missing?
UPDATES
I've learned a lot more but still have questions:
I got a great answer from @JeffPosnick to my related SO post here. That explained the errors.
The URL for "failed to fetch" was the URL specified in my manifest "start_url". But I'm still getting a fetch error here, even outside of Lighthouse testing. Anybody know why?
I unchecked the Lighthouse "Clear Storage" checkbox. Now Lighthouse is saying my service worker is working.
Theory: when Lighthouse clears storage it blows away the service worker, explaining why I get a failing PWA score from Lighthouse when "Clear Storage" is checked. Is this theory correct?
Okay, I finally got this all working. In addition to the things I posted in the update to the original post, I also had to find a service worker that would work with React.
I went with this one.
Now if the network is offline, and I reload the page, I still see my React home page. Nice!
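For anyone hitting the same wall: a service worker can only control pages on the origin it is served from, so registering a script hosted on the cloudfront.net domain will never control pages on www.my-domain-name.com. A minimal same-origin registration sketch (the file path and scope here are assumptions, not taken from my final setup):

// Register a worker served from the page's own origin; a worker hosted on a
// different origin (e.g. the CDN domain) cannot control this page.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('/sw.js', { scope: '/' }) // '/sw.js' at the site root can claim scope '/'
    .then(reg => console.info('service worker registered with scope:', reg.scope))
    .catch(error => console.log('Error registering serviceWorker: ', error))
}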

HTTP Error 500.30 - ANCM In-Process Start Failure with newly created app service

We created a new development environment, so I cloned a currently working App Service into a new one, changed the configuration, and deployed the same code, but the new App Service is returning HTTP Error 500.30 - ANCM In-Process Start Failure.
After trying the console for more details, that's what I get. I don't think it's related to the runtime identifier, because the exact same code runs on other App Services.
The dreaded 500.3x ANCM error can mean different things, so I'm going to help you pin down the cause.
My recommendation:
Go to Azure Portal > Your App Service > Development Tools.
Open the Console.
The screen should look like this: [Azure console screenshot]
Type in (YourWebAppName).exe
This will show the error messages that are relevant to your startup issue.
Also, some information regarding errors can be seen here:
https://learn.microsoft.com/en-us/aspnet/core/test/troubleshoot-azure-iis?view=aspnetcore-3.1#app-startup-errors

Stackdriver-trace on Google Cloud Run failing, while working fine on localhost

I have a Node server running on Google Cloud Run. Now I want to enable Stackdriver tracing. When I run the service locally, I am able to get the traces in GCP. However, when I run the service on Google Cloud Run, I am getting an error:
"@google-cloud/trace-agent ERROR TraceWriter#publish: Received error with status code 403 while publishing traces to cloudtrace.googleapis.com: Error: The request is missing a valid API key."
I made sure that the service account has the Trace Agent role.
The first line in my app.js is:
require('@google-cloud/trace-agent').start();
Running locally, I am using a .env file containing:
GOOGLE_APPLICATION_CREDENTIALS=<path to credentials.json>
According to https://github.com/googleapis/cloud-trace-nodejs, these values are auto-detected if the application is running on Google Cloud Platform, so I don't have these credentials in the GCP image.
There are two challenges to using this library with Cloud Run:
Despite the note about auto-detection, Cloud Run is an exception: it is not yet auto-detected. This can be addressed for now with some explicit configuration.
Because Cloud Run services only have CPU until they finish responding to a request, queued-up trace data may not be sent before the CPU is withdrawn. This can be addressed for now by configuring the trace agent to flush ASAP:
const tracer = require('@google-cloud/trace-agent').start({
  serviceContext: {
    service: process.env.K_SERVICE || "unknown-service",
    version: process.env.K_REVISION || "unknown-revision"
  },
  flushDelaySeconds: 1,
});
On a quick review I couldn't see how to trigger the trace flush, but the shorter timeout should help avoid some delays in seeing the trace data appear in Stackdriver.
EDIT: While nice in theory, in practice there are still significant race conditions with CPU withdrawal. Filed https://github.com/googleapis/cloud-trace-nodejs/issues/1161 to see if we can find a more consistent solution.

Error while trying to query Active Directory - Requires App Pool recycling

I have the following chunk of code in a .NET web app used to query AD for a user
using (DirectoryEntry de = new DirectoryEntry(ldap))
{
    using (DirectorySearcher adSearch = new DirectorySearcher(de))
    {
        adSearch.Filter = "(&(objectCategory=person)(objectClass=user)(samAccountName=username))";
        SearchResult adSearchResult = adSearch.FindOne();
    }
}
When I run this, I sometimes get the following error:
System.Runtime.InteropServices.COMException (0x80005000): Unknown error (0x80005000)
   at System.DirectoryServices.DirectoryEntry.Bind(Boolean throwIfFail)
   at System.DirectoryServices.DirectoryEntry.Bind()
   at System.DirectoryServices.DirectoryEntry.get_AdsObject()
   at System.DirectoryServices.DirectorySearcher.FindAll(Boolean findMoreThanOne)
   at System.DirectoryServices.DirectorySearcher.FindOne()
Once this error starts being thrown, it occurs every time the code runs. To correct it, I have to go to the IIS App Pool associated with this web app and recycle it. After recycling, the code works ... for a period of time. Then the error comes back a few hours later.
Additional information to note:
The App Pool is still running when it errors. Still, recycling fixes it.
I have this same code running on 2 different web servers that are identically configured. The issue occurs on the first web server but never on the second.
The App Pool is running under an AD service account. The same account is used on the app pool of both servers.
I have tried recreating the App Pool associated with this web app, without success.
I would greatly appreciate any suggestions on where to look in IIS for a permanent solution. I can't be recycling the app pool every few hours.
Thanks

Unable to update VM with nodejs app on Google App Engine

When I try to deploy from the gcloud CLI, I get the following error:
Copying files to Google Cloud Storage...
Synchronizing files to [gs://staging.logically-abstract-www-site.appspot.com/].
Updating module [default]...\Deleted [https://www.googleapis.com/compute/v1/projects/logically-abstract-www-site/zones/us-central1-f/instances/gae-builder-vm-20151030t150724].
Updating module [default]...failed.
ERROR: (gcloud.preview.app.deploy) Error Response: [4] Timed out creating VMs.
My app.yaml is:
runtime: nodejs
vm: true
api_version: 1
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cool_down_period_sec: 60
  cpu_utilization:
    target_utilization: 0.5
I am logged in successfully and have the correct project ID. I see the new version created in the Cloud Console for App Engine, but the error seems to happen after that.
In the stdout log I see both instances go up with the last console.log statement I put in the app after it starts listening on the port, but in the shutdown.log I see "app was unhealthy" and in syslog I see "WARNING: never got healthy response from app, but sending /_ah/start query anyway."
From my experience with Node.js on Google App Engine, "Timed out creating VMs" is neither a traditional timeout nor necessarily about creating VMs. I found that other errors were reported while the server was launching, which happens right after the VMs are created. So I recommend checking the console output to see if it tells you anything.
To see the console output:
For a VM instance, go to your VM instances, click the instance you want, scroll towards the bottom, and click "Serial console output".
For stdout console logging, go to Monitoring > your logs and change the log type dropdown from "Request" to "stdout".
I also found differences in process.env when running locally versus in the cloud. I hope you find your solution too. Good luck!
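As a quick way to spot those environment differences, here is a minimal sketch that dumps a few variables at startup (the variable names listed are just examples, not taken from this answer):

// Print selected environment variables at startup so local runs and
// App Engine runs can be compared side by side in the logs.
const keysToCompare = ['NODE_ENV', 'PORT', 'GOOGLE_APPLICATION_CREDENTIALS'];

keysToCompare.forEach(function (key) {
  const value = process.env[key];
  console.log(key + '=' + (value === undefined ? '<not set>' : value));
});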
