How to log - the 12 factor application way - node.js

I want to know the best practice behind logging my node application. I was reading the 12 factor app guidelines at https://12factor.net/logs and it states that logs should always be sent to the stdout. Cool, but then how would someone manage logs in production? Is there an application that scoops up whatever is sent to stdout? In addition, is it recommended that I only be logging to stdout and not stderr? I would appreciate a perspective on this matter.

Is there an application that scoops up whatever is sent to stdout?
The page you linked to provides some examples of log management tools, but the simplest version of this would be just redirecting the output of your application to a file. In bash: node app.js > app.out. You could also split stdout and stderr: node app.js 2> app.err 1> app.out.
You could additionally have some sort of service that collects the logs from this file and then indexes them for searching somewhere else.
The idea behind the suggestion to only log to stdout is to let the environment control what to do with the logs because the application doesn't necessarily know the environment that it will eventually run within. Furthermore, by treating all logs as an event stream, you leave the choice of what to do with this stream up to the environment. You may want to send the log stream directly to a log aggregation service for instance, or you may want to first preprocess it, and then stream the result somewhere else. If you mandate a specific output such as logging to a file, you reduce the portability of your service.
Two of the primary goals of the 12 factor guidelines are to be "suitable for deployment on modern cloud platforms" and to offer "maximum portability between execution environments". On a cloud platform where you might have ephemeral storage on your instance, or many instances running the same service, you'd want to aggregate your logs into some central store. By providing a log stream, you leave it up to the environment to coordinate how to do this. If you put them directly into a file, then you would have to tailor your environment to wherever each application has decided to put the logs in order to then redirect them to the central store. Using stdout for logs is thus primarily a useful convention.
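To make the convention concrete, here is a minimal sketch of what the application side could look like. The names formatEvent and emit are made up for illustration, and sending errors to stderr is an assumption on my part, not something the 12 factor page prescribes.

```javascript
// Minimal sketch: emit each log event as one JSON line on stdout or stderr.
// formatEvent and emit are illustrative names, not part of any library.
function formatEvent(level, message, now = new Date()) {
  return JSON.stringify({ time: now.toISOString(), level, message });
}

function emit(level, message) {
  // Assumption: errors go to stderr, everything else goes to stdout.
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(formatEvent(level, message) + '\n');
}

emit('info', 'server started');
emit('error', 'could not reach database');
```

Run with node app.js 2> app.err 1> app.out, the first event ends up in app.out and the second in app.err, without the application knowing about either file.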

I think it's a mistake to categorically say "[web] applications should write logs to stdout".
Rather, I would suggest:
a) Professional-quality, robust web apps should HAVE logs
b) The application should treat the "log" as an abstract, "stream" object
c) Ideally, the logger implementation MAY be configured to write to stdout, to stderr, to a file, to a date-stamped file, to a rotating file, filter by severity level, etc. etc. as appropriate.
I would strongly argue that hard-coded writes to stdout, without any intervening "logger" abstraction, are POOR practice.
Here is a good article:
https://blog.risingstack.com/node-js-logging-tutorial/
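As a rough illustration of points b) and c), a logger can take its destination stream and minimum severity as configuration. This is only a sketch: createLogger and the LEVELS table are hypothetical names, and a real project would more likely use an existing logging package.

```javascript
// Sketch of a logger abstraction: the destination is any object with a
// write(string) method, so stdout, stderr, or a file stream all work.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function createLogger({ stream = process.stdout, minLevel = 'info' } = {}) {
  const threshold = LEVELS[minLevel];
  const logger = {};
  for (const [level, rank] of Object.entries(LEVELS)) {
    logger[level] = (message) => {
      if (rank < threshold) return false; // filtered out by severity
      stream.write(`${level.toUpperCase()} ${message}\n`);
      return true;
    };
  }
  return logger;
}

// Usage: the call sites stay the same whatever the destination is.
const log = createLogger({ minLevel: 'warn' });
log.info('not written');
log.warn('disk almost full');
```

Switching to a file would then be a matter of passing a different stream (for example one from fs.createWriteStream) where the logger is created, not of touching every call site.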

Cool, but then how would someone manage logs in production?
The log sink is what you're looking for.
Is there an application that scoops up whatever is sent to stdout?
Yes and no. It's the log ship (or log router). It could be an application, but it's really just some process within the execution or runtime environment that your app doesn't really know about.
Another way to look at this is separation of concern. As it was stated in a different answer, it's about letting the environment own what happens to the log and only expecting the application to concern itself with emitting log events at all. I think what's missing from the 12FA documentation is that they don't try to complete the puzzle for you because there will be different opinions on where to go from stdout, so I'll help by adding in those missing pieces based on my personal experience and what I'm seeing all over the cloud space.
Logger sends log event to log stream (aka 'the log')
It goes without saying that your application should have some sort of "logger" abstraction, but that's really just an entry point for emitting a log event to stdout. That abstraction's responsibility is to get your log event onto the log stream (stdout) in the desired format and then your application's responsibility is done. In fact, the 12FA documentation ends here.
12 Factor App is about creating cloud-friendly and portable applications, so you have to assume that you don't know what the executing/runtime environment even is. So we don't know what "the environment" is and that's the whole point. So from here, it is the responsibility of the executing/runtime environment to process the stream and move it to the sink.
Log ship/router realizes log stream to log sink
So the way we solve for this now is to have some sort of listener for the stdout stream that will take the output and send it downstream to the log sink.
The "ship" (also known as the log router or scraper) might be something in the environment or the runtime, or honestly it could be something running in the background of your application (a stream listener); it could be some other custom process; it could even be Kafka. I think GCP uses fluentd to scoop up logs from various sources and put them in Stackdriver. The point is that it should be a separate "class" in your application that your application doesn't really know about. It just listens to the stream and sends it to the sink. In some solutions, this is something you need to build; in other solutions, it's handled by your platform. Put simply: "how do I get the stream to the sink?"
The "sink" is the destination. This can be the console (hello it's literally a stream reader), it can be a file, it can be Splunk, Application Insights, Stack Driver, etc. There are simple solutions and there are larger more complex enterprise solutions, but the concept stays the same.
So in short, this is the answer to your question, if we're writing to stdout "how do we manage logs in production." It's the log sink or log aggregator that you're looking for. In 12FA vernacular, something like "splunk" isn't the "log". The log is the stream itself (stdout). In terms of 12FA - Your application doesn't know what the sink is and ideally, it shouldn't because that sink could change, in which case all of your applications would break, or there could be many different sinks and that could bog your application down particularly if you're writing straight to the sinks instead of stdout first. It's just another decoupling exercise if nothing else.
You can send to a single sink, multiple sinks at once, or you can send to a single sink and have some other component 'ship' your logs from that sink to another (e.g. write to a rolling file and have a router scrape that into splunk). Just depends on your needs.
You can actually see this popping up more and more in cloud providers by default. For example, on GCP, all logs to stdout automatically get picked up and sent to stackdriver. In Azure, so long as you add the instrumentation to your .NET application (the application diagnostics package), it will emit events to stdout and it'll get picked up by azure monitor. There are also more and more packages out there that are beginning to implement this pattern, so in .NET you could use Serilog to abstract most of these concepts.
Logger -> Log Event -> Log [stream] (stdout) -> Sink -> Your eyeballs
Logger: The thing you use to emit the log, typically an abstraction (e.g. Serilog, NLog, Log4net)
Log Event: The individual log itself
Log Stream (or 'the log'): stdout. It's the unbuffered, time-ordered aggregation of all events and has no beginning or end.
Log Ship/Router: The transport that sends the stream to one or more sinks. (e.g. in process like log4net, out of process like fluentd)
Log Sink: The thing that you're actually looking at like a console, file, or index/search engine, or analytics/monitoring platform (e.g. splunk, datadog, appinsights, stackdriver, etc.)
There are packages and platforms that provide one or more of these pieces, but all of those pieces are always there. It makes 12FA logging make more sense when you're aware of them.
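To show how small the "ship" piece can be conceptually, here is a toy router that takes one line from the log stream and copies it to every sink that wants it. Everything here (createRouter, the sink shape with accepts and write) is invented for the example; real routers such as fluentd are configured rather than hand-written.

```javascript
// Hypothetical router: takes one line from the log stream and copies it to
// every sink whose filter accepts it. None of these names come from a library.
function createRouter(sinks) {
  return function route(line) {
    let delivered = 0;
    for (const sink of sinks) {
      if (!sink.accepts || sink.accepts(line)) {
        sink.write(line);
        delivered += 1;
      }
    }
    return delivered; // number of sinks that received the line
  };
}

// Two illustrative sinks: the console, and an in-memory list standing in for
// a remote aggregator that only wants errors.
const consoleSink = { write: (line) => console.log(line) };
const errorStore = [];
const errorSink = {
  accepts: (line) => line.startsWith('ERROR'),
  write: (line) => errorStore.push(line),
};

const route = createRouter([consoleSink, errorSink]);
route('INFO server started');
route('ERROR could not reach database');
```

The application that produced those two lines never sees consoleSink or errorSink, which is the decoupling described above.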

Related

Azure function reduce log on Application insight

After my system has been running for some time I get a connection error, and I want to remove it from my Application Insights.
Is it possible to remove the exceptions and traces that come from the EventProcessorHost error? You can see my Application Insights log below.
The only way is to use the Application Insights Purge API to delete logs from the Exceptions table and the Traces table.
But the limitation is that you cannot specify such detailed filters, for example that the messages come from EventProcessorHost.
Also, the delete operation will be completed in the background within 7 days; you should be aware of these limitations when using this API.
If the question was "how do i not collect these in the future", I believe the information you are looking for is here:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-monitoring?tabs=cmd#configure-categories-and-log-levels
summary:
Log configuration in host.json
The host.json file configures how much logging a function app sends to Application Insights. For each category, you indicate the minimum log level to send
There are a lot of samples in the link above for turning things on and off by level, source, sampling and batching; it is probably too much to paste here and keep up to date.

Using Apache Kafka for log aggregation

I am learning Apache Kafka from their quickstart tutorial: http://kafka.apache.org/documentation.html#quickstart. Up to now, I have done the setup as follows: a producer node, where a web server is running on port 8888, and a Kafka server (broker), consumer and Zookeeper instance on another node. I have tested the default console/file-enabled producer and consumer with 3 partitions. The setup works, and I am able to see the messages I sent in the order they were created (within each partition).
Now, I want to send the logs generated from the web server to Kafka Broker. These messages will be processed by consumer later. Currently I am using syslog-ng to capture server logs to a text file. I have come up with 3 rough ideas on how to implement producer to use kafka for log aggregation
Producer Implementations
First Kind:
Listen to tcp port of syslog-ng. Fetch each message and send to kafka server. Here we have two middle processes: Producer and syslog-ng
Second Kind: Using syslog-ng as Producer. Should find a way to send messages to Kafka server instead of writing to a file. Syslog-ng, the producer is the middle process.
Third Kind: Configuring the webserver itself as producer.
Am I correct in my thinking? In the last case we don't have any middle process, but I suspect its implementation will affect server performance. Can anyone let me know the best way of using Apache Kafka (if the above 3 are not good) and guide me through the appropriate configuration of the server?
P.S.: I am using node.js for my web server
Thanks,
Sarath
Since you specify that you wish to send the logs generated to kafka broker, it indeed looks as if executing a process to listen and resend messages mainly creates another point of failure with no additional value (unless you need a specific syslog-ng capability).
Syslog-ng can send messages to external applications using:
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guides/en/syslog-ng-ose-v3.4-guide-admin/html/configuring-destinations-program.html. I don't know if there are other ways to do that.
For the third option, I am not sure whether Kafka can easily be integrated into Node.js, as it requires a C++ producer and when I last looked for one, I was not able to find it. However, an easy alternative could be to have Kafka read the log file created by the server and send those logs (using the console producer provided with Kafka). This is usually a good approach, as it completely removes dependencies between Kafka and the web server (embedding the producer would require error handling, configuration, etc.). It requires the use of tail --follow, and it works very well for us. If you would like more details on that, I can include them as well. You would still need to supervise the Kafka execution to make sure messages are not lost (and provide a recovery option to send, offline, the messages that failed). But the good thing about this method is that there is no dependency between the tools.
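If you later want your own forwarder in place of the console producer, the part that reads from tail --follow mostly comes down to turning arbitrary chunks into complete lines. The sketch below shows only that part; createLineSplitter is a made-up helper, and the step that actually hands a line to Kafka is left as a callback because I am not assuming any particular Node.js client.

```javascript
// Sketch: chunks piped in from `tail --follow` do not respect line
// boundaries, so buffer the trailing partial line until it is completed.
function createLineSplitter(onLine) {
  let pending = '';
  return function push(chunk) {
    pending += chunk;
    const parts = pending.split('\n');
    pending = parts.pop(); // last piece is an incomplete line (or '')
    for (const line of parts) {
      if (line !== '') onLine(line);
    }
  };
}

// Simulated input: the second request line arrives split across two chunks.
const sent = [];
const push = createLineSplitter((line) => sent.push(line)); // stand-in for a producer
push('GET /index 200\nGET /ab');
push('out 404\n');
console.log(sent);

// To run it for real, wire it to standard input instead:
//   process.stdin.setEncoding('utf8');
//   process.stdin.on('data', createLineSplitter(sendLineSomewhere));
```

Here sendLineSomewhere is a placeholder for whatever delivers the line to the broker, and that is where the error handling and recovery mentioned above would have to live.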
Hope it helps...
Eran

Custom Logging mechanism: Master Operation with n-Operation Details or Child operations

I'm trying to implement a logging mechanism in a Service-Workflow-hybrid application. The requirement is that, instead of being an independent log action, each log entry must be treated as a detail operation and placed under a parent/master operation. So it's a parent-child relationship that goes to database table(s). This is the primary reason NLog failed.
To help understand better, I'm diving in a generic detail. This is how the application flow goes:
Now, the Main entry point of the application (normally called Program.cs) is Platform. It initializes an engine that is capable of listening incoming calls from ISDN lines, VoIP, or web services. The interface is generic, so any call that reaches the Platform triggers OnConnecting(). OnConnecting() is a thread-safe event and can be triggered as many times as system requires.
Within OnConnecting(), a new instance of our custom Workflow manager is launched and the context is a custom object called ProcessingInfo:
new WorkflowManager<ZeProcessingInfo>();
Where, ZeProcessingInfo:
var ZeProcessingInfo = new ProcessingInfo(this, new LogMaster());
As you can see, the ProcessingInfo is composed of Platform itself and a new instance of LogMaster. LogMaster is defined in an independent assembly.
Now this LogMaster is available throughout the WorkflowManager, all the Workflows it launches, all the activities within any running Workflow, and passed on to external code called from within any Activity. Now, when a new LogMaster is initialized, a Master Operation entry is created in the database and this LogMaster object now lives until this call is ended after a series of very serious roller coaster rides through different workflows. Upon every call of OnConnecting(), a new Master Operation is created and maintained.
The LogMaster allows for calling a AddDetail() method that adds new child detail under the internally stored Master Operation (distinguished through a Guid Primary Key). The LogMaster is built upon Entity Framework.
And I'm able to log under the same Master Operation as many times as I require. But the application requirements are changing and there is now a need to log from other assemblies. There is a Platform Server assembly, which is a Windows Service that acts as a server listening to web-service-based calls; once a client calls a method, OnConnecting in Platform is triggered.
I need a mechanism to somehow retrieve the related LogMaster object so that I can add detail to the same Master Operation. But Platform Server is the one triggering OnConnecting() on the Platform and thus instantiating LogMaster. This creates a redundancy loop.
Failure scenarios are being considered as well. If LogMaster fails, we need to fall back from Database Logging to Event Logging. If Event Logging fails (or is not allowed through unified configuration), we need to fall back to file-based (XML) logging.
I hope I have given a rough idea. I don't expect code but I need some strategy for a very seamless plug-able configurable logging mechanism that supports Master-Child operations.
Thanks for reading. Any help would be much appreciated.
I've read this question a number of times and it was pretty hard to figure out what was going on. I don't think your diagram helps at all. If your question is about trying to retrieve the master log record when writing child log records then I would forget about trying to create normalised data in the log tables. You will just slow down the transactional system in trying to do so. You want the log/audit records to write as fast as possible and you can later aggregate them when you want to read them.
Create a de-normalised table for the logs entries and use a single Guid in that table to track the session/parent log master. Yes this will be a big table but it will write fast.
As for guaranteed delivery of log messages to a destination, I would try not to create multiple destinations as combining them later will be a nightmare but rather use something like MSMQ to emit the audit logs as fast as possible and have another service pick them up and process them in a guaranteed delivery manner. ETW (Event Logging) is not guaranteed under load and you will not know that it has failed.
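To illustrate the de-normalised idea, here is a small sketch in JavaScript rather than your C#; the shape translates directly. The names startOperation and addDetail and the row fields are my own invention. The point is only that each row carries the master operation's id, so any assembly that knows the id can write a child record without looking up the parent.

```javascript
const { randomUUID } = require('crypto');

// Illustrative sketch: every detail row is flat and carries the master
// operation's id, so writing a child never requires reading the parent.
function startOperation(writeRow, name) {
  const operationId = randomUUID();
  let sequence = 0;
  const addDetail = (message) => {
    sequence += 1;
    writeRow({ operationId, sequence, time: new Date().toISOString(), message });
  };
  addDetail(`started ${name}`);
  return { operationId, addDetail };
}

// Usage: rows go to an array here; a real system would append them to the
// de-normalised table or hand them to a queue.
const rows = [];
const call = startOperation((row) => rows.push(row), 'incoming call');
call.addDetail('workflow launched');
call.addDetail('call ended');
```

Passing operationId along with the call (instead of the LogMaster object itself) is one way the Platform Server side could log against the same master operation.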

Architecture and performance issue

I have a question about architecture/performance. I'm talking about a SIP server that processes multiple client requests concurrently. I suppose that each request is handled in a dedicated thread. At the end of processing, the thread concerned logs request-specific information to a file. I want to optimize this last part of the processing. I mean I want to know what alternatives you propose instead of logging this information to a file. Why? Because writing to a file after processing uses resources that I would rather use to process other incoming requests.
First, what do you think about the question? And, if you think that it's a "true" question (I mean that an alternative may optimize the performances), what do you propose?
I thought about logging the data into a queue and to use another process IN ANOTHER MACHINE that would read from the queue and write to a file.
Thanks for your suggestions
If it is NOT a requirement that the log is written before the request returns - i.e. the logging is not part of the atomic response - then you have the option of returning the response and just initiating the logging action.
Putting the logging data in a queue in memory seems reasonable. You can read that queue and write to disk either on the same machine or another. I would start with a thread in your app as this is easiest to implement and since the disk I/O is going to be the limiting factor, it shouldn't impact your server much.
If the log is required to be written BEFORE the response is returned, you still have the option of using a reliable queue like MSMQ.
I suspect that the network overhead involved in moving the logging to another machine is probably going to create more problems than it solves. I would go with @Nicholas' solution: queue off the logs to one thread on the same machine. The queue allows slack so that occasional disk latency is mitigated, and the logging thread can make its own optimizations, e.g. waiting until it has a cluster-size of logs before writing. Other things, like opening a new log file every day or whenever the log file reaches a limiting size, are also much easier without affecting the performance of the main server.
Even if you log on another machine, you should still queue off the logging to mitigate network latency.
If the log objects on the queue contain, say, a 'request' enumeration, (eg. ElogWrite, ElogNewFile, ElogPath, ElogShutdown), you could try both - you could queue up a request for the log thread to close its current log file and open a path to a file on a networked machine at runtime - the queue buffer would absorb the delay of doing this.
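The queue idea can be sketched in a few lines. This is JavaScript, so there is no dedicated logging thread; a timer stands in for it, and that substitution is an assumption of the sketch. createLogQueue and writeBatch are hypothetical names.

```javascript
// Sketch: request handlers only enqueue; a separate step drains the queue
// and writes everything that has accumulated as one batch.
function createLogQueue(writeBatch) {
  const queue = [];
  return {
    enqueue(entry) {
      queue.push(entry); // cheap, safe to call on the request path
    },
    drain() {
      if (queue.length === 0) return 0;
      const batch = queue.splice(0, queue.length);
      writeBatch(batch); // one write for the whole batch
      return batch.length;
    },
  };
}

// Usage: batches are collected in memory here instead of going to disk.
const written = [];
const logQueue = createLogQueue((batch) => written.push(batch.join('\n')));
['INVITE 200', 'REGISTER 401', 'BYE 200'].forEach((e) => logQueue.enqueue(e));
logQueue.drain();

// In a running server the drain would be driven by a timer, for example:
//   setInterval(() => logQueue.drain(), 1000).unref();
```

In a threaded server the drain loop would be the body of the logging thread, and the request enumeration mentioned above would just be an extra field on each queued entry.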

syslog question

I am looking into syslog.
I understand that it is a centralized logging facility that collects logs from various sources.
I have heard that syslog can generate alerts on conditions e.g. max file size of log file is reached.
Is this true?
Because I haven't found how this is done.
Most posts just refer to the logging.
How is the event generation done?
I.e. if I have an app that acts as a log source (redirects logging to a syslog) then is it possible my app can receive an alert, if the max file size has been reached?
How is this configured?
Thank you!
From the application perspective, the syslog function is primarily a receiver of information from the application; the application can write messages to the syslog. There are various bits of information that the application provides to the syslog daemon, including the severity of the message.
The syslog daemon can be configured to take different actions on receipt of different types of message.
No, your application cannot receive an alert when the maximum file size is reached - at least, not via syslog. You might get a SIGXFSZ signal which you can trap. You might prefer to look at your resource limits and keep tabs on your file size to avoid the problem.
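For reference, the severity the application supplies is combined with a facility into a single priority number, computed as facility * 8 + severity in the syslog protocol. The sketch below computes it and builds a deliberately simplified line; a complete syslog message also carries a timestamp and hostname, which are left out here, and the function names are my own.

```javascript
// Severity codes and two facility codes as defined by the syslog protocol.
const SEVERITY = { emerg: 0, alert: 1, crit: 2, err: 3, warning: 4, notice: 5, info: 6, debug: 7 };
const FACILITY = { user: 1, local0: 16 };

// Priority value: facility * 8 + severity.
function priority(facility, severity) {
  return FACILITY[facility] * 8 + SEVERITY[severity];
}

// Simplified line: "<PRI>tag: message" (no timestamp or hostname).
function formatLine(facility, severity, tag, message) {
  return `<${priority(facility, severity)}>${tag}: ${message}`;
}

console.log(formatLine('local0', 'err', 'myapp', 'disk write failed'));
```

The daemon's configuration then matches on facility and severity to decide which action to take for each message.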

Resources