Logstash + Syslog Input Plugin VS Logstash + File Input Plugin + Syslog server - logstash

I have an existing system that sends me log entries to my server via Syslog protocol. The log entries are written into local files, and then I process these log files with Logstash using its File input plugin.
I like it because even if the Logstash goes down (it happens sometimes), I do not lose any log.
I have just realized today that the Logstash also has a Syslog input plugin that is capable of reading log data on the Syslog protocol.
I am wondering if I turn off my Syslog server, and read the data via the Syslog input plugin of the Logstash, will I have the same reliable system, or If the Logstash goes down, I will lose data during the downtime?

If Logstash goes down you will lose data during the downtime.
Also, the syslog input only works if the messages from your logs are in compliance with the RFC3164, anything different and you will need a grok pattern to parse that message.
If you don't want to use the file input anymore you can create a rule on your syslog server to redirect the messages to your logstash input, in this case, if your logstash goes down, you will still have the files to fill the missing data.

Related

Logstash listens to its own logs

I want to be able to track logstash logs in case of an error.
I want to be able to monitor issues when logstash tries to send events to output destination. I've checked the monitor API but it doesn't fulfills my requirements.
Is it a good practice to use file input plugin and set the path to its own log?

Solution to bypass Logstash mixing logs

I'm currently in front of a structure problem with logstash.
I have a syslog-ng client sending logs from different files through the network to an ELK stack.
I noticed that Logstash is mixing logs, especially multiline with adding exception lines to non error logs from others files. So, i guess the trouble is that my logs are treated without any differentiation from their file origins. I found 2 ways to avoid that, but they are not optimal in my opinion:
Instead of using syslog-ng, use FileBeat and add a tag for each file which identify their origins. Then, parse my log with Logstash depending of this tag. The problem is that using syslog-ng as client is a requirement, and it will bother me if I have to change it
Change my syslog-ng sources to send each log files to a different port on ELK. I found that a little dirty and can be embarrassing with a great quantity of log files
What do you think about that ? Did I miss a better solution?
Is there a way to add a tag just like filebeat in syslog-ng?

Logstash should log only grok parsed messages

Currently I have a ELK stack in which logs are shipped by filebeat and after some filters in logstash, it is forwarded to ES. As there are a lot of servers and logs, a huge logs are coming to logstash, but I have configured the filter to only process a very specific type of log message. Which it is doing fine, but the logs which are not even matching are logged in logstash.log file. As I mentioned earlier that huge logs are coming, the size of logstash.log file is soon reaching to a high value and there is space issue coming up. How to configure the logstash so that I only log the processed logs, and not all.
You could use logrotate to automatically rotate on either a daily basis or once it hits a certain threshold. You could then set the number of rotations to be 1 or 2. This would allow you time to see what is going to the file in case you need to troubleshoot, but purge before it creates space contention.

How to install logstash-forwarder for multiple logstash server?

Currently we are working on forwarding logs to 2 different logstash servers. We cannot figure out a way with which we can install logstash-forwarder on a single machine. Is it possible with logstash-forwarder forwarding logs to multiple logstash ??
Else how can we do it with filebeat ??
In the LSF config, you can specify a list of hosts, but it will pick one at random and only switch to another in case of failure.
FB has the same system, but it allows you to also load balance across the list of hosts.
AFAIK, neither allows you to send events to multiple logstash instances.
Logstash, on the other hand, will send events to all of its outputs, so you could have FB send to a single LS, and have that LS output to your other LS instances. Note that if one output is unavailable, the system will block.

Suggestion on Logstash Setup

I've implemented logstash ( in testing ) as below mentioned architecture.
Component Break Down
Rsyslog client: By default syslog installed in all Linux destros, we just need to configure rsyslog to send logs to remote server.
Logstash: Logstash will received logs from syslog client and it will
store in Redis.
Redis: Redis will work as broker, broker is to hold log data sent by agents before logstash indexes it. Having a broker will enhance performance of the logstash server, Redis acts like a buffer for log data, till logstash indexes it and stores it. As it is in RAM its too fast.
Logstash: yes, two instance of logstash, 1st one for syslog server,
2nd for read data from redis and send out to elasticsearch.
Elasticsearch: The main objective of a central log server is to collect all logs at one place, plus it should provide some meaningful data for analysis. Like you should be able to search all log data for your particular application at a specified time period.Hence there must be a searching and well indexing capability on our logstash server. To achieve this, we will install another opensource tool called as elasticsearch.Elasticsearch uses a mechanism of making an index, and then search that index to make it faster. Its a kind of search engine for text data.
Kibana : Kibana is a user friendly way to view, search and visualize
your log data
But I'm little bit confuse with redis. using this scenario I'll be running 3 java process on Logstash server and one redis, this will take hugh ram.
Question
Can I use only one logstash and elastic search ? Or what would be the best way ?
I am actually in the midst of setting up logstash, redis, elasticsearch, kibana (aka ELK architecture) at my company.
I have the processes split between virtual machines. While you could put them on the same machine, what happens if a machine dies? Then you are left with your indexer and cluster down at the same time.
You also have the problem of not being able to properly replicate your shards on the Elasticsearch. Since you only have one server, the shards won't be replicated and your cluster health will always be yellow. You need to add enough servers to avoid the split-brain scenario.
Why keep Redis?
Since Redis can talk to multiple logstash indexers, one key point is that this makes the indexing transparent to your shippers in that if one indexer goes down, the alternates will pick up the load. This makes your setup high availability.
It's not just a matter of shipping logs and having them indexed and searchable. While your setup will likely work in a very small, rare situation, the stuff people are doing with ELK setups are hundreds of servers, even thousands, so the ELK architecture is meant to scale. All of these servers will also need to be remotely managed by something called Puppet.
Finally, if you have not read it yet, I suggest you read The Logstash Book by James Turnbull.
The following are some more recommended books that have helped me so far:
Pro Puppet, Second Edition
Elasticsearch Cookbook, Second Edition
Redis Cookbook
Redis in Action
Mastering Elasticsearch
ElasticSearch Server
Elasticsearch: The Definitive Guide
Puppet Types and Providers
Puppet 3 Cookbook
You can use only one logstash and elasticsearch if you put all the instance in a machine. Logstash directly read the syslog file by using file input plugin.
Otherwise, you have to use two logstash and redis. It is because logstash do not have any buffer mechanism, so it needs redis as its broker to buffer the log event. Redis do not use more ram. When logstash read the log event from it, the memory will release. If redis use large ram, you have to add the logstash workers for processing the logs faster.
You should only be running one instance of logstash. logstash by design has the ability to have multiple input channels and output channels.
input {
# input instances
syslog {
# add other settings accordingly
type => "syslog"
}
redis {
# add other settings accordingly
type => "redis"
}
}
filter {
# add other settings accordingly
}
output {
# output instances
if [type] == "syslog" {
redis {
# add other settings accordingly
}
}
else if [type] == "redis" {
elasticsearch {
# add other settings accordingly
}
}
}

Resources