Logstash log processing from multiple sources

I am new to the ELK stack. Let me explain what I am trying to do. I have an application that runs separately for different users, i.e. 5 different users will have 5 independent instances of the same application. I am using Filebeat to send the logs from each application to Logstash, where they are processed before being sent to Elasticsearch. What I want is to build an application that lets each user view the logs of their own instance only. What I have tried so far is to create a Logstash pipeline per user, each listening on a different port, which processes the logs and sends them to Elasticsearch under a different index name.
Can you tell me whether this is best practice or whether I am doing it wrong? Is there a better way to do it that does not need a separate pipeline and port for every user? I suspect my current approach is wrong and that it will become hard to manage as the number of instances grows.
Thank You

I would suggest that if there is no filtering, validation, or enrichment involved, you skip Logstash altogether. You can pass Filebeat logs straight to Elasticsearch. There are two ways from here: Filebeat can send an additional field (any fixed string) along with the scanned message to Elasticsearch, or you can use the metadata (such as the source IP) that Filebeat already sends along with the message. That string can then be used to identify the source of each log message, and in Kibana you can build dashboards filtered on that fixed string / user / metadata. This simplifies the process and avoids unnecessary hops.
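If you do want to keep Logstash in the middle, the same idea removes the need for per-user ports: run a single pipeline and derive the index name from a field the shipper attaches. The sketch below is a minimal, hedged example; it assumes each Filebeat instance is configured to add a custom field (here called fields.user, a name chosen purely for illustration), and the host, port and index pattern are placeholders.

# Single pipeline for all users; one beats port instead of one port per user.
# Assumes each Filebeat adds a custom field, e.g. fields.user = "alice".
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Route each event to its user's index based on the Filebeat-supplied field.
    index => "app-logs-%{[fields][user]}-%{+YYYY.MM.dd}"
  }
}

With this layout, adding a sixth user only means configuring that user's Filebeat with its own field value; the Logstash side does not change.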

Related

Logstash listens to its own logs

I want to be able to track Logstash's own logs in case of an error.
I want to be able to monitor issues when Logstash tries to send events to an output destination. I've checked the monitoring API, but it doesn't fulfill my requirements.
Is it good practice to use the file input plugin and point it at Logstash's own log file?
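For reference, the setup the question describes would look roughly like the sketch below. The log path assumes a package install (logstash-plain.log under /var/log/logstash); adjust it to your installation.

# Sketch of Logstash tailing its own log file.
input {
  file {
    path => "/var/log/logstash/logstash-plain.log"
    start_position => "beginning"
    # Tag these events so they can be routed or filtered separately later.
    tags => ["logstash-self"]
  }
}

One caveat worth weighing: if the output that is failing is the same output these self-monitoring events go to, they will be stuck behind the same problem, so shipping them to a separate destination is usually safer.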

Solution to bypass Logstash mixing logs

I'm currently facing a structural problem with Logstash.
I have a syslog-ng client sending logs from different files through the network to an ELK stack.
I noticed that Logstash is mixing logs up, in particular multiline events: it appends exception lines to non-error logs from other files. So I guess the trouble is that my logs are treated without any differentiation by file of origin. I found 2 ways to avoid that, but they are not optimal in my opinion:
Instead of using syslog-ng, use Filebeat and add a tag to each file that identifies its origin, then parse my logs with Logstash depending on this tag. The problem is that using syslog-ng as the client is a requirement, and it would bother me to have to change it.
Change my syslog-ng sources to send each log file to a different port on the ELK stack. I find that a little dirty, and it can become unwieldy with a large number of log files.
What do you think about that? Did I miss a better solution?
Is there a way to add a tag just like Filebeat does in syslog-ng?
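Whichever shipper ends up attaching the identifier (a Filebeat field, or a value set on the syslog-ng side), the Logstash half of option 1 would look roughly like the sketch below. The field name source_file and the grok patterns are purely illustrative.

# Differentiate parsing by an identifying field instead of by port.
# Assumes the shipper sets a field such as [source_file] on each event.
filter {
  if [source_file] == "app-error.log" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{GREEDYDATA:msg}" }
    }
  } else if [source_file] == "access.log" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}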

How to log - the 12 factor application way

I want to know the best practice for logging in my Node application. I was reading the 12-factor app guidelines at https://12factor.net/logs and they state that logs should always be sent to stdout. Cool, but then how would someone manage logs in production? Is there an application that scoops up whatever is sent to stdout? In addition, is it recommended that I only log to stdout and not stderr? I would appreciate a perspective on this matter.
Is there an application that scoops up whatever is sent to stdout?
The page you linked to provides some examples of log management tools, but the simplest version of this would be just redirecting the output of your application to a file. So in bash: node app.js > app.out. You could also split your stdout and stderr, like node app.js 2> app.err 1> app.out.
You could additionally have some sort of service that collects the logs from this file and then indexes them for searching somewhere else.
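Tying this back to the ELK theme of the other questions here, one hedged sketch of such a collection service is a Logstash pipeline that tails the redirected file and indexes it into Elasticsearch. The file path, host and index name below are placeholders.

# Illustrative collector: tail the redirected stdout file and index it.
input {
  file {
    # Wherever the environment redirected the application's stdout.
    path => "/var/log/myapp/app.out"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "myapp-logs-%{+YYYY.MM.dd}"
  }
}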
The idea behind the suggestion to only log to stdout is to let the environment control what to do with the logs because the application doesn't necessarily know the environment that it will eventually run within. Furthermore, by treating all logs as an event stream, you leave the choice of what to do with this stream up to the environment. You may want to send the log stream directly to a log aggregation service for instance, or you may want to first preprocess it, and then stream the result somewhere else. If you mandate a specific output such as logging to a file, you reduce the portability of your service.
Two of the primary goals of the 12 factor guidelines are to be "suitable for deployment on modern cloud platforms" and to offer "maximum portability between execution environments". On a cloud platform where you might have ephemeral storage on your instance, or many instances running the same service, you'd want to aggregate your logs into some central store. By providing a log stream, you leave it up to the environment to coordinate how to do this. If you put them directly into a file, then you would have to tailor your environment to wherever each application has decided to put the logs in order to then redirect them to the central store. Using stdout for logs is thus primarily a useful convention.
I think it's a mistake to categorically say "[web] applications should write logs to stdout".
Rather, I would suggest:
a) Professional-quality, robust web apps should HAVE logs
b) The application should treat the "log" as an abstract, "stream" object
c) Ideally, the logger implementation MAY be configured to write to stdout, to stderr, to a file, to a date-stamped file, to a rotating file, filter by severity level, etc. etc. as appropriate.
I would strongly argue that hard-coded writes to stdout, without any intervening "logger" abstraction, are POOR practice.
Here is a good article:
https://blog.risingstack.com/node-js-logging-tutorial/
Cool, but then how would someone manage logs in production?
The log sink is what you're looking for.
Is there an application that scoops up whatever is sent to stdout?
Yes and no. It's the log ship (or log router). It could be an application, but it's really just some process within the execution or runtime environment that your app doesn't really know about.
Another way to look at this is separation of concern. As it was stated in a different answer, it's about letting the environment own what happens to the log and only expecting the application to concern itself with emitting log events at all. I think what's missing from the 12FA documentation is that they don't try to complete the puzzle for you because there will be different opinions on where to go from stdout, so I'll help by adding in those missing pieces based on my personal experience and what I'm seeing all over the cloud space.
Logger sends log event to log stream (aka 'the log')
It goes without saying that your application should have some sort of "logger" abstraction, but that's really just an entry point for emitting a log event to stdout. That abstraction's responsibility is to get your log event onto the log stream (stdout) in the desired format and then your application's responsibility is done. In fact, the 12FA documentation ends here.
The 12 Factor App is about creating cloud-friendly and portable applications, so you have to assume that you don't know what the executing/runtime environment even is; that's the whole point. So from here, it is the responsibility of the executing/runtime environment to process the stream and move it to the sink.
Log ship/router moves the log stream to the log sink
So the way we solve for this now is to have some sort of listener for the stdout stream that will take the output and send it downstream to the log sink.
The "ship" (also known as the log router or scraper) might be something in the environment or the runtime, or honestly it could be something running the background of your application (a stream listener); it could be some other custom process; it could be even be Kafka -- I think GCP uses fluentd to scoop up logs from various sources and put them in stackdriver. The point is that it should be a separate "class" in your application that your application doesn't really know about. It just listens to the stream and sends it to the sink. In some solutions, this is something you need to build, in other solutions, it's handled by your platform. Put simply "how do I get the stream to the sink?"
The "sink" is the destination. This can be the console (hello it's literally a stream reader), it can be a file, it can be Splunk, Application Insights, Stack Driver, etc. There are simple solutions and there are larger more complex enterprise solutions, but the concept stays the same.
So, in short, if we're writing to stdout, the answer to your question "how do we manage logs in production?" is the log sink or log aggregator you're looking for. In 12FA vernacular, something like Splunk isn't the "log"; the log is the stream itself (stdout). In terms of 12FA, your application doesn't know what the sink is, and ideally it shouldn't, because that sink could change (in which case all of your applications would break), or there could be many different sinks, and that could bog your application down, particularly if you're writing straight to the sinks instead of to stdout first. It's just another decoupling exercise if nothing else.
You can send to a single sink, multiple sinks at once, or you can send to a single sink and have some other component 'ship' your logs from that sink to another (e.g. write to a rolling file and have a router scrape that into splunk). Just depends on your needs.
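A rough sketch of the multi-sink case, again using Logstash as the "ship" purely for illustration (the paths, host and index are placeholders):

# One stream fanned out to multiple sinks.
input {
  file {
    path => "/var/log/myapp/app.out"
  }
}
output {
  # Sink 1: a search index.
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "myapp-%{+YYYY.MM.dd}"
  }
  # Sink 2: a rolling file that another router could scrape into Splunk later.
  file {
    path => "/var/log/aggregated/myapp-%{+YYYY-MM-dd}.log"
  }
}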
You can actually see this popping up more and more in cloud providers by default. For example, on GCP, all logs to stdout automatically get picked up and sent to Stackdriver. In Azure, so long as you add the instrumentation to your .NET application (the application diagnostics package), it will emit events to stdout and they'll get picked up by Azure Monitor. There are also more and more packages out there that are beginning to implement this pattern, so in .NET you could use Serilog to abstract most of these concepts.
Logger -> Log Event -> Log [stream] (stdout) -> Sink -> Your eyeballs
Logger: The thing you use to emit the log, typically an abstraction (e.g. Serilog, NLog, Log4net)
Log Event: The individual log itself
Log Stream (or 'the log'): stdout; it's the unbuffered, time-ordered aggregation of all events and has no beginning or end.
Log Ship/Router: The transport that sends the stream to one or more sinks. (e.g. in process like log4net, out of process like fluentd)
Log Sink: The thing that you're actually looking at like a console, file, or index/search engine, or analytics/monitoring platform (e.g. splunk, datadog, appinsights, stackdriver, etc.)
There are packages and platforms that provide one or more of these pieces, but all of those pieces are always there. It makes 12FA logging make more sense when you're aware of them.

How to install logstash-forwarder for multiple logstash servers?

Currently we are working on forwarding logs to 2 different Logstash servers. We cannot figure out how to configure logstash-forwarder on a single machine to do this. Is it possible for logstash-forwarder to forward logs to multiple Logstash servers?
Otherwise, how can we do it with Filebeat?
In the LSF config, you can specify a list of hosts, but it will pick one at random and only switch to another in case of failure.
FB has the same system, but it also allows you to load balance across the list of hosts.
AFAIK, neither allows you to send events to multiple logstash instances.
Logstash, on the other hand, will send events to all of its outputs, so you could have FB send to a single LS, and have that LS output to your other LS instances. Note that if one output is unavailable, the system will block.
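A hedged sketch of that fan-out on the relaying Logstash instance, assuming the downstream instances expose a plain TCP input with a JSON lines codec (hosts and ports are illustrative):

# Relay instance: one beats input, fanned out to two downstream Logstash servers.
# Each downstream instance would need a matching tcp input with codec => json_lines.
input {
  beats {
    port => 5044
  }
}
output {
  tcp {
    host => "logstash-a.example.com"
    port => 5000
    codec => json_lines
  }
  tcp {
    host => "logstash-b.example.com"
    port => 5000
    codec => json_lines
  }
}

As noted above, if either downstream instance becomes unavailable, this relay will block, so it is a simple fan-out rather than a highly available one.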

Suggestion on Logstash Setup

I've implemented Logstash (in testing) with the architecture described below.
Component Breakdown
Rsyslog client: rsyslog is installed by default on all Linux distros; we just need to configure it to send logs to the remote server.
Logstash: Logstash will receive logs from the syslog client and store them in Redis.
Redis: Redis will work as the broker; the broker holds log data sent by the agents until Logstash indexes it. Having a broker enhances the performance of the Logstash server: Redis acts as a buffer for log data until Logstash indexes and stores it, and since it lives in RAM it is very fast.
Logstash: yes, two instances of Logstash: the first acts as the syslog server, the second reads data from Redis and sends it out to Elasticsearch.
Elasticsearch: The main objective of a central log server is to collect all logs in one place and to provide meaningful data for analysis, e.g. you should be able to search all log data for a particular application over a specified time period. Hence our log server must have good search and indexing capability. To achieve this, we install another open-source tool called Elasticsearch. Elasticsearch builds an index and then searches that index to make queries fast; it is a kind of search engine for text data.
Kibana: Kibana is a user-friendly way to view, search and visualize your log data.
But I'm a little bit confused about Redis. In this scenario I'll be running 3 Java processes on the Logstash server plus one Redis process, and that will take a huge amount of RAM.
Question
Can I use only one Logstash and Elasticsearch? Or what would be the best way?
I am actually in the midst of setting up logstash, redis, elasticsearch, kibana (aka ELK architecture) at my company.
I have the processes split between virtual machines. While you could put them on the same machine, what happens if a machine dies? Then you are left with your indexer and cluster down at the same time.
You also have the problem of not being able to properly replicate your shards in Elasticsearch. Since you only have one server, the shards won't be replicated and your cluster health will always be yellow. You need to add enough servers to avoid the split-brain scenario.
Why keep Redis?
Since Redis can talk to multiple Logstash indexers, one key point is that this makes the indexing transparent to your shippers: if one indexer goes down, the alternates will pick up the load. This makes your setup highly available.
It's not just a matter of shipping logs and having them indexed and searchable. While your setup will likely work in a very small environment, people are running ELK setups across hundreds or even thousands of servers, so the ELK architecture is meant to scale. All of these servers will also need to be remotely managed by a configuration management tool such as Puppet.
Finally, if you have not read it yet, I suggest you read The Logstash Book by James Turnbull.
The following are some more recommended books that have helped me so far:
Pro Puppet, Second Edition
Elasticsearch Cookbook, Second Edition
Redis Cookbook
Redis in Action
Mastering Elasticsearch
ElasticSearch Server
Elasticsearch: The Definitive Guide
Puppet Types and Providers
Puppet 3 Cookbook
You can use only one Logstash and one Elasticsearch if you put all the instances on a single machine. Logstash can read the syslog file directly using the file input plugin.
Otherwise, you have to use two Logstash instances and Redis. This is because Logstash does not have its own buffering mechanism, so it needs Redis as a broker to buffer the log events. Redis does not use much RAM: once Logstash reads a log event from it, the memory is released. If Redis does start using a lot of RAM, you should add Logstash workers so the logs get processed faster.
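A rough sketch of that two-instance layout, with illustrative hosts, ports and key names; in practice these would be two separate configuration files:

# --- shipper.conf (instance 1): receive syslog, buffer into Redis ---
input {
  syslog {
    port => 5514
  }
}
output {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash-buffer"
  }
}

# --- indexer.conf (instance 2): drain Redis, write to Elasticsearch ---
input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash-buffer"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}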
You should only need to run one instance of Logstash; Logstash by design has the ability to have multiple input channels and output channels.
input {
  # input instances
  syslog {
    # add other settings accordingly
    type => "syslog"
  }
  redis {
    # add other settings accordingly
    type => "redis"
  }
}

filter {
  # add other settings accordingly
}

output {
  # output instances
  if [type] == "syslog" {
    redis {
      # add other settings accordingly
    }
  }
  else if [type] == "redis" {
    elasticsearch {
      # add other settings accordingly
    }
  }
}
