Winston: multi process logging - node.js

I have created a nodejs application which encapsulates four nodejs processes.
Until now, each of the individual Node.js processes has been using the winston npm package to log to its own log file.
Now I want to make a single log file where every node process can log.
Does winston implicitly ensure the serialization of logging data to make it process-safe (multiple processes writing to the same file without having to worry about race conditions, deadlocks, etc.)? Or is it the developer's job to ensure that only one process writes to the log file at a time?

Does Winston implicitly ensure the serialization of logging data to
make it process safe?
The answer is no.
Data is lost when you have multiple processes writing logs to the same file through Winston. This is actually a known issue that they decided not to address properly.
There are a lot of options: you could change your logging tool, use inter-process communication and only call Winston from the master process (there is a sketch of that at the end of this answer), use a message broker, or even write the logs to a database, for example.
Assuming that your software uses MongoDB, the last one is really easy to achieve with the help of winston-mongodb.
1. Install it using NPM
npm install --save winston-mongodb
2. Configure it
const winston = require('winston');
require('winston-mongodb');
// 'db' is required; the URI below is only a placeholder for your own connection string.
const options = { db: 'mongodb://localhost:27017/logs' };
winston.add(winston.transports.MongoDB, options); // winston 2.x style; with 3.x use winston.add(new winston.transports.MongoDB(options))
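If you'd rather keep plain files, here is a minimal sketch of the inter-process-communication option mentioned above, using Node's built-in cluster module. It assumes the four processes can be forked from one master (for fully independent processes the same idea works over any IPC channel or a message broker); the file name and log format are placeholders.

// master-logger.js - sketch: only the master process owns the Winston file
// transport; workers forward their log entries to it over IPC.
const cluster = require('cluster');
const winston = require('winston');

if (cluster.isMaster) {
  const logger = winston.createLogger({
    transports: [new winston.transports.File({ filename: 'combined.log' })] // placeholder path
  });

  for (let i = 0; i < 4; i++) {
    const worker = cluster.fork();
    worker.on('message', (msg) => {
      // Every write goes through this single process, so entries never interleave mid-line.
      if (msg && msg.type === 'log') logger.log(msg.level, `[worker ${worker.id}] ${msg.message}`);
    });
  }
} else {
  // In a worker, "logging" is just an IPC message to the master.
  const log = (level, message) => process.send({ type: 'log', level, message });
  log('info', 'worker started');
}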

Related

Is it possible to force Node.js/Express to process requests sequentially?

I took over a project where the developers were not fully aware of how Node.js works, so they created code accessing MongoDB with Mongoose which would leave inconsistent data in the database whenever you had any concurrent request reaching the same endpoint / modifying the same data. The project uses the Express web framework.
I already instructed them to implement a fix for this (basically, to use Mongoose transaction support with automatically managed retriable transactions), but due to the size of the project they will take a lot of time to fix it.
I need to put this in production ASAP, so I thought I could try to do it if I'm able to guarantee sequential processing of the incoming requests. I'm completely aware that this is a bad thing to do, but it would be just a temporary solution (with a low count of concurrent users) until a proper fix is in place.
So is there any way to make Node.js process incoming requests in a sequential manner? Basically, I just don't want code from different requests to run interleaved; or, putting it another way, I want non-blocking operations (.then()/await) not to yield to another task but to block until the asynchronous operation ends, so every request is processed entirely before attending to another request.
I have an NPM package that can do this: https://www.npmjs.com/package/async-await-queue
Create a queue limited to 1 concurrent user and enclose the code that calls Mongo in wait()/end()
Or you can also use an async mutex, there are a few NPM packages as well.
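For illustration, here is a rough sketch of the queue idea with async-await-queue; treat the exact constructor arguments as something to verify against the package README, and the route and Mongoose calls as placeholders.

const express = require('express');
const { Queue } = require('async-await-queue');

const app = express();
// Concurrency of 1: only one request's Mongo section runs at a time.
const mongoQueue = new Queue(1);

app.post('/some-endpoint', async (req, res) => {
  const ticket = Symbol();            // unique key identifying this task in the queue
  await mongoQueue.wait(ticket, 0);   // wait here until it's our turn
  try {
    // ... all the Mongoose reads/writes for this request go here, un-interleaved ...
    res.sendStatus(200);
  } finally {
    mongoQueue.end(ticket);           // always release the slot, even if something threw
  }
});

app.listen(3000);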

Change the log destination for node.js running on GCE

I am using rc.local to start my node script on start with:
node . > "/log_file_$(date +"%H:%M:%S_%m_%d_%Y").txt"
It works fine, but once the log grows in size I need to create a new log file on the server every 12/24 hours, without restarting the server.
Is there any simple way to change the node app output destination?
I would prefer not to use any library for that, because I need to log all the messages including errors, warns, not only console.log.
Thanks for your help.
There are a number of options, I'll offer two:
1. Stackdriver
Stream your logs to Stackdriver, which is part of Google Cloud, and don't store them on your server at all. In your Node.js application, you can set up Winston and use the Winston transport for Stackdriver (a sketch follows after these options). Then you can analyze and query them there, and you don't need to worry about storage running out.
2. logrotate
If you want to deal with this manually, you can configure logrotate. It will gzip older logs so that they consume less disk space. This is a sort of older, "pre-cloud" way of doing things.
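For option 1, the setup looks roughly like this with Winston and the @google-cloud/logging-winston transport; on GCE it should pick up the instance's default credentials, but verify the package docs for current options.

const winston = require('winston');
const { LoggingWinston } = require('@google-cloud/logging-winston');

const logger = winston.createLogger({
  level: 'info',
  transports: [
    new winston.transports.Console(),  // still visible locally
    new LoggingWinston()               // ships entries to Cloud Logging (Stackdriver)
  ]
});

logger.info('server started');  // appears in the Logs Explorer; no local file to rotate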

Doing tasks before heroku nodejs server is ready

When deploying a new release, I would like my server to do some tasks before actually being released and listen to http requests.
Let's say that those tasks take around a minute and are setting some variables: until the tasks are done I would like the users to be redirected to the old release.
Basically do some nodejs work before the server is ready.
I tried a naive approach:
doSomeTasks().then(() => {
  app.listen(PORT);
});
But as soon as the new version is released, all HTTP requests made while the tasks are running fail instead of being redirected to the old release.
I have read https://devcenter.heroku.com/articles/release-phase but this looks like I can only run an external script which is not good for me since my tasks are setting cache variables.
I know this is possible with /check_readiness on App Engine, but I was wondering for Heroku.
You have a couple options.
If the work you're doing only changes on release, you can add a task as part of your dyno build stage that will fetch and store data inside the compiled slug that will be deployed to virtual containers on Heroku and booted as your dyno. For example, you can run a task in your build cycle that fetches data and stores/caches it as a file in your app that you read on boot (there is a sketch of this at the end of this answer).
If this data changes more frequently (e.g. daily), you can utilize “preboot” to capture and cache this data on a per-dyno basis. Depending on the data and architecture of your app you may want to be cautious with this approach when running multiple dynos as each dyno will have data that was fetched independently, thus this data may not match across instances of your application. This can lead to subtle, hard to diagnose bugs.
This is a great option if you need to, for example, pre-cache a larger chunk of data and then fetch only new data on a per-request basis (e.g. fetch the last 1,000 posts in an RSS feed on-boot, then per request fetch anything newer—which is likely to be fewer than a few new entries—and coalesce the data to return to the client).
Here's the documentation on customizing a build process for Node.js on Heroku.
Here's the documentation for enabling and working with Preboot on Heroku
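To make the build-stage idea from the first option concrete, here is a hypothetical sketch; the URL, file names, and the heroku-postbuild hook wiring are assumptions to adapt to your app.

// build-cache.js - run at build time (e.g. via "heroku-postbuild": "node build-cache.js"
// in package.json) so the fetched data is baked into the slug.
const fs = require('fs');
const https = require('https');

fs.mkdirSync('cache', { recursive: true });

https.get('https://example.com/data.json', (res) => {   // placeholder URL
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    fs.writeFileSync('cache/data.json', body);           // read on boot: require('./cache/data.json')
    console.log('build-time cache written');
  });
});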
I don't think it's a good approach to do it this way. You can use an external script (an npm script) to do this task and then use the release phase. The situation here is very similar to running migrations: you can require the needed libraries in the script, and you can even load the whole application in the script without listening on a port. Let's make it clearer with an example:
// script file (e.g. set_cache.js)
var client = require('cache_client');
// and here you can require all the other libraries the script needs
// then execute your logic using sync APIs
client.setCacheVar('xyz','xyz');
Then in package.json, under "scripts", add this script. Let's assume that you named the file set_cache.js:
"scripts": {
"set_cache": "set_cache",
},
Now you can use npm to run this script as npm run set_cache and use this command in the Procfile:
web: npm start
release: npm run set_cache

How to log - the 12 factor application way

I want to know the best practice behind logging my node application. I was reading the 12 factor app guidelines at https://12factor.net/logs and it states that logs should always be sent to the stdout. Cool, but then how would someone manage logs in production? Is there an application that scoops up whatever is sent to stdout? In addition, is it recommended that I only be logging to stdout and not stderr? I would appreciate a perspective on this matter.
Is there an application that scoops up whatever is sent to stdout?
The page you linked to provides some examples of log management tools, but the simplest version of this would be just redirecting the output of your application to a file. So in bash node app.js > app.out. You could also split your stdout and stderr like node app.js 2> app.err 1> app.out.
You could additionally have some sort of service that collects the logs from this file and then indexes them for searching somewhere else.
The idea behind the suggestion to only log to stdout is to let the environment control what to do with the logs because the application doesn't necessarily know the environment that it will eventually run within. Furthermore, by treating all logs as an event stream, you leave the choice of what to do with this stream up to the environment. You may want to send the log stream directly to a log aggregation service for instance, or you may want to first preprocess it, and then stream the result somewhere else. If you mandate a specific output such as logging to a file, you reduce the portability of your service.
Two of the primary goals of the 12 factor guidelines are to be "suitable for deployment on modern cloud platforms" and to offer "maximum portability between execution environments". On a cloud platform where you might have ephemeral storage on your instance, or many instances running the same service, you'd want to aggregate your logs into some central store. By providing a log stream, you leave it up to the environment to coordinate how to do this. If you put them directly into a file, then you would have to tailor your environment to wherever each application has decided to put the logs in order to then redirect them to the central store. Using stdout for logs is thus primarily a useful convention.
I think it's a mistake to categorically say "[web] applications should write logs to stdout".
Rather, I would suggest:
a) Professional-quality, robust web apps should HAVE logs
b) The application should treat the "log" as an abstract, "stream" object
c) Ideally, the logger implementation MAY be configured to write to stdout, to stderr, to a file, to a date-stamped file, to a rotating file, filter by severity level, etc. etc. as appropriate.
I would strongly argue that hard-coded writes to stdout, without any intervening "logger" abstraction, are POOR practice (a sketch of a configurable logger follows below).
Here is a good article:
https://blog.risingstack.com/node-js-logging-tutorial/
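As a sketch of what that abstraction can look like (Winston is used here purely as an example, and the environment variable names are made up), the application only ever talks to the logger and the destination is configuration:

// logger.js - the destination is configuration, not hard-coded in the app:
// stdout by default, plus a file if LOG_FILE is set.
const winston = require('winston');

const transports = [
  new winston.transports.Console({ level: process.env.LOG_LEVEL || 'info' }) // stdout
];
if (process.env.LOG_FILE) {
  transports.push(new winston.transports.File({ filename: process.env.LOG_FILE }));
}

module.exports = winston.createLogger({ transports });

// elsewhere: require('./logger').warn('disk almost full');  // no stream or file mentioned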
Cool, but then how would someone manage logs in production?
The log sink is what you're looking for.
Is there an application that scoops up whatever is sent to stdout?
Yes and no. It's the log ship (or log router). It could be an application, but it's really just some process within the execution or runtime environment that your app doesn't really know about.
Another way to look at this is separation of concern. As it was stated in a different answer, it's about letting the environment own what happens to the log and only expecting the application to concern itself with emitting log events at all. I think what's missing from the 12FA documentation is that they don't try to complete the puzzle for you because there will be different opinions on where to go from stdout, so I'll help by adding in those missing pieces based on my personal experience and what I'm seeing all over the cloud space.
Logger sends log event to log stream (aka 'the log')
It goes without saying that your application should have some sort of "logger" abstraction, but that's really just an entry point for emitting a log event to stdout. That abstraction's responsibility is to get your log event onto the log stream (stdout) in the desired format and then your application's responsibility is done. In fact, the 12FA documentation ends here.
12 Factor App is about creating cloud-friendly and portable applications, so you have to assume that you don't know what the executing/runtime environment even is. So we don't know what "the environment" is and that's the whole point. So from here, it is the responsibility of the executing/runtime environment to process the stream and move it to the sink.
Log ship/router realizes log stream to log sink
So the way we solve for this now is to have some sort of listener for the stdout stream that will take the output and send it downstream to the log sink.
The "ship" (also known as the log router or scraper) might be something in the environment or the runtime, or honestly it could be something running the background of your application (a stream listener); it could be some other custom process; it could be even be Kafka -- I think GCP uses fluentd to scoop up logs from various sources and put them in stackdriver. The point is that it should be a separate "class" in your application that your application doesn't really know about. It just listens to the stream and sends it to the sink. In some solutions, this is something you need to build, in other solutions, it's handled by your platform. Put simply "how do I get the stream to the sink?"
The "sink" is the destination. This can be the console (hello it's literally a stream reader), it can be a file, it can be Splunk, Application Insights, Stack Driver, etc. There are simple solutions and there are larger more complex enterprise solutions, but the concept stays the same.
So in short, this is the answer to your question, if we're writing to stdout "how do we manage logs in production." It's the log sink or log aggregator that you're looking for. In 12FA vernacular, something like "splunk" isn't the "log". The log is the stream itself (stdout). In terms of 12FA - Your application doesn't know what the sink is and ideally, it shouldn't because that sink could change, in which case all of your applications would break, or there could be many different sinks and that could bog your application down particularly if you're writing straight to the sinks instead of stdout first. It's just another decoupling exercise if nothing else.
You can send to a single sink, multiple sinks at once, or you can send to a single sink and have some other component 'ship' your logs from that sink to another (e.g. write to a rolling file and have a router scrape that into splunk). Just depends on your needs.
You can actually see this popping up more and more in cloud providers by default. For example, on GCP, all logs to stdout automatically get picked up and sent to stackdriver. In Azure, so long as you add the instrumentation to your .NET application (the application diagnostics package), it will emit events to stdout and it'll get picked up by azure monitor. There are also more and more packages out there that are beginning to implement this pattern, so in .NET you could use Serilog to abstract most of these concepts.
Logger -> Log Event -> Log [stream] (stdout) -> Sink -> Your eyeballs
Logger: The thing you use to emit the log, typically an abstraction (e.g. Serilog, NLog, Log4net)
Log Event: The individual log itself
Log Stream (or 'the log'): stdout; it's the unbuffered, time-ordered aggregation of all events and has no beginning or end.
Log Ship/Router: The transport that sends the stream to one or more sinks. (e.g. in process like log4net, out of process like fluentd)
Log Sink: The thing that you're actually looking at like a console, file, or index/search engine, or analytics/monitoring platform (e.g. splunk, datadog, appinsights, stackdriver, etc.)
There are packages and platforms that provide one or more of these pieces, but all of those pieces are always there. It makes 12FA logging make more sense when you're aware of them.
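To make the ship/sink split tangible, here is a toy log router in Node: it treats its stdin as the log stream (node app.js | node ship.js) and forwards each event to a stand-in sink, an append-only file. A real ship would be fluentd, a platform agent, or similar; everything here is illustrative.

// ship.js - toy log ship/router: stdin is the log stream, a file stands in for the sink.
// Usage:  node app.js | node ship.js
const fs = require('fs');
const readline = require('readline');

const sink = fs.createWriteStream('aggregated.log', { flags: 'a' });
const rl = readline.createInterface({ input: process.stdin });

rl.on('line', (event) => {
  // A real router might parse, filter, or fan out to several sinks here.
  sink.write(`${new Date().toISOString()} ${event}\n`);
});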

Nodejs failover

I am a beginner in Node.js and I am trying to use it in production. I want to achieve Node.js failover. Since I am running a chat app, if a node server fails the chat should not break: it should automatically connect to a different node server, and the same socket id should be used for further chatting so that the chat messages aren't lost. Can this be achieved? Any samples?
I should not use Nginx/HAProxy. Also, let me know how the node servers should be set up: Active-Active or Active-Passive.
PM2 is the preferred process manager here, especially for its auto-failover, auto-scaling, and auto-restart features.
Its introduction reads as follows:
PM2 is a production process manager for Node.js applications with
a built-in load balancer. It allows you to keep applications alive
forever, to reload them without downtime and to facilitate common
system admin tasks.
Starting an application in production mode is as easy as:
$ pm2 start app.js
PM2 is constantly assailed by more than 700 tests.
Official website: http://pm2.keymetrics.io
Works on Linux (stable) & MacOSx (stable) & Windows (bêta).
There are several problems you're tackling at once here:
Daemonization - keeping your app up: As already mentioned, tools such as forever can be used to supervise your Node.js application and restart it on failure. This is good for getting the application back up after a worst-case failure.
Similarly, recluster can be used to fork your application and make it more fault-resistant by creating a supervisor process and subprocesses (a sketch of that pattern follows at the end of this answer).
Uncaught exceptions: A known hurdle in Node.js is that asynchronous errors cannot be caught with a try/catch block. As a consequence, exceptions can bubble up and cause your entire application to crash.
Rather than letting this occur, you should use domains to create a logical grouping of activities that are affected by the exception and handle it as appropriate. If you're running a webserver with state, an unhandled exception should probably be caught and the rest of the connections closed off gracefully before terminating the application.
(If you're running a stateless application, it may be possible to just ignore the exception and attempt to carry on, though this is not necessarily advisable. Use it with care.)
Security: This is a huge topic. You need to ensure at the very least:
Your application is running as a non-root user with least privileges. Ports < 1024 require root permissions. Typically this is proxied from a higher port with nginx, docker or similar.
You're using helmet and have hardened your application to the extent you can.
As an aside, I see you're using Apache in front of Node.js; this isn't necessarily a good idea, as Apache will probably struggle under load with its threading model more than Node.js will with its event-loop model.
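Here is the supervisor/subprocess pattern mentioned under daemonization, sketched with Node's built-in cluster module (recluster and PM2's cluster mode wrap the same idea); the restart logic is deliberately naive and './app' is a placeholder for your server entry point.

// supervisor.js - minimal supervisor that forks workers and restarts them on exit.
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(() => cluster.fork());            // one worker per CPU core
  cluster.on('exit', (worker, code) => {
    console.error(`worker ${worker.id} died (code ${code}), restarting`);
    cluster.fork();                                    // naive restart; add backoff in practice
  });
} else {
  require('./app');                                    // placeholder: your actual server
}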
Assuming you use a database for authenticating clients, there isn't much more to it than a script to manage the state of the server script, like forever does: it would try to restart the script if it fails. Beyond that, you should design the server script to handle every known and possible unknown error, any signal sent to it, etc.
A small example would be with streams:
(Websocket Router)
|
|_(Chat Channel #1) \
|_(Chat Channel #2) - Channel Cache // hold last 15 messages of every channel
|_(Chat Channel #3) /
|_(Authentication handler) // login-logout
-- Hope I helped in some way.
For a simple approach, I think you should build a reconnect mechanism on your client side and use a process manager such as forever or PM2 to manage your Node.js processes. I tried many ways but still couldn't overcome the socket issue: the socket is always killed whenever the process stops.
You could try using pm2 start app.js -i 0. This runs your application in cluster mode, creating multiple child processes for the same app (with 0, one per CPU core). You can share socket information between the various processes.
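The same setup can be expressed as a PM2 ecosystem file, which is easier to keep in version control; the values here are illustrative, so check the PM2 docs for the full option list.

// ecosystem.config.js - start with: pm2 start ecosystem.config.js
module.exports = {
  apps: [{
    name: 'chat-app',        // illustrative name
    script: 'app.js',
    instances: 0,            // 0 / 'max' = one instance per CPU core
    exec_mode: 'cluster',    // cluster mode so PM2 can restart/balance processes
    autorestart: true        // bring a process back up if it crashes
  }]
};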
