Graylog: when to use multiple inputs (use case) - graylog2

When should I use multiple inputs in Graylog? Do you have a use case?
For instance, I have several Symfony (3.x) applications running in different environments (integration, prod, ...) and I want all of their logs in Graylog.
What is the best way (or the best practice) to send all of them to Graylog and easily create streams based on these environments?

The way I always understood this is that you create separate inputs for "kinds of logs". Like: one rsyslog input for all machines sending logs in rsyslog format, second for all GELF applications, third for capturing NetFlow, etc.

You can send logs from an arbitrary number of applications and systems (i.e. environments) to Graylog (even on the same input).
Simply configure your applications and systems to send logs to Graylog and create an appropriate input for them.
See http://docs.graylog.org/en/2.4/pages/sending_data.html for some hints.
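
As a minimal sketch of the single-input approach, assuming a GELF UDP input is listening on the default port 12201: each application tags its messages with custom fields, and a stream rule can then match on the environment. The field names `_application` and `_environment`, the helper names, and the server name `graylog.example.org` are hypothetical, chosen only for illustration. GELF requires custom fields to start with an underscore; Graylog typically stores them without it.

```python
import json
import socket
import time


def build_gelf_message(app, environment, text, level=6):
    """Build a GELF 1.1 payload; custom fields must start with an underscore."""
    return {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": text,
        "timestamp": time.time(),
        "level": level,               # syslog severity, 6 = informational
        "_application": app,          # hypothetical custom field
        "_environment": environment,  # hypothetical custom field
    }


def send_gelf_udp(message, server="graylog.example.org", port=12201):
    """Send one uncompressed GELF message to a GELF UDP input.

    The server name is a placeholder; 12201 is the usual GELF port.
    """
    payload = json.dumps(message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))


if __name__ == "__main__":
    msg = build_gelf_message("shop", "integration", "Order created")
    print(json.dumps(msg, indent=2))
    # send_gelf_udp(msg)  # uncomment once the server name points at a real input
```

With this shape, all applications share one input and the environment is just a field, so adding a new environment needs a new stream rule rather than a new input.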

Related

Can anyone give me more details about journald and syslog

Can anyone help me understand the difference between journald and rsyslog, the advantages and disadvantages of each, and which one I should use?
Journald is the part of systemd that deals with logging - systemd, at its core, is in charge of managing services: it starts them up and keeps them alive.
It was originally designed for local logs on desktops, where there are not that many logs. Rsyslog, on the other hand, was designed from the ground up for high-performance centralized log collection.
Rsyslog can collect logs from many more sources, including pipes, sockets, and files. File sources are especially important, as many applications – like web servers – log to files and do that at a rate that journald cannot handle.
Both use the syslog protocol, so you can run rsyslog and journald on the same machine without any problems.
The rsyslog modules imjournal (input) and omjournal (output) make it possible for rsyslog to read from and write to the journal.
So if you write something to rsyslog, it will only appear in journald if you've configured the omjournal module.

Logstash vs Rsyslog for log file aggregation

I am working on a solution for centralized log file aggregation from our CentOS 6.x servers. After installing the Elasticsearch/Logstash/Kibana (ELK) stack I came across the rsyslog omelasticsearch plugin, which can send messages from rsyslog to Elasticsearch in Logstash format, and started asking myself why I need Logstash.
Logstash has a lot of different input plugins, including one that accepts rsyslog messages. Is there a reason why I would use Logstash for my use case, where I need to gather the content of log files from multiple servers? Also, is there a benefit to sending messages from rsyslog to Logstash instead of sending them directly to Elasticsearch?
I would use Logstash in the middle if there's something I need from it that rsyslog doesn't have. For example, getting GeoIP from an IP address.
If, on the other hand, I need to get syslog or file contents indexed in Elasticsearch, I'd use rsyslog directly. It can do buffering (disk+memory) and filtering, you can choose what the document will look like (you can put the textual severity instead of the number, for example), and it can parse unstructured data. But the main advantage is performance, which is what rsyslog focuses on. Here's a presentation with some numbers (and tips and tricks) on Logstash, rsyslog and Elasticsearch:
http://blog.sematext.com/2015/05/18/tuning-elasticsearch-indexing-pipeline-for-logs/
I would recommend Logstash. It is easier to set up, there are more examples, and the ELK components are tested to work together.
There are other benefits too: in Logstash you can filter and modify your logs.
You can extend logs with useful data: server name, timestamp, ...
You can cast types, string to int, etc. (useful for a correct Elasticsearch index)
You can filter out logs by rules.
Moreover, you can set the batch size to optimize writes to Elasticsearch.
Another feature: if something goes wrong and there is a huge volume of logs per second that Elasticsearch cannot process, you can configure Logstash to keep a queue of events or to drop events that cannot be saved.
If you go straight from the server to elasticsearch, you can get the basic documents in (assuming the source is json, etc). For me, the power of logstash is to add value to the logs by applying business logic to modify and extend the logs.
Here's one example: syslog provides a priority level (0-7). I don't want to have a pie chart where the values are 0-7, so I make a new field that contains the pretty names ("emerg", "debug", etc) that can be used for display.
Just one example...
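
The severity mapping described above can be sketched in a few lines. This is only an illustration of the idea, not a Logstash filter: the function and field names are hypothetical, while the eight labels are the standard syslog severity keywords. A raw syslog PRI value combines facility and severity as facility * 8 + severity, so the sketch also shows how to split it.

```python
# Standard syslog severity keywords, indexed by numeric severity 0-7.
SEVERITY_NAMES = [
    "emerg", "alert", "crit", "err",
    "warning", "notice", "info", "debug",
]


def severity_label(severity):
    """Return the readable name for a numeric syslog severity (0-7)."""
    if not 0 <= severity <= 7:
        raise ValueError(f"severity out of range: {severity}")
    return SEVERITY_NAMES[severity]


def decode_priority(pri):
    """Split a syslog PRI value into facility, severity and a display label."""
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": severity,
        "severity_label": severity_label(severity),  # hypothetical field name
    }


if __name__ == "__main__":
    print(decode_priority(13))
```

A pie chart built on `severity_label` then shows "notice" or "err" instead of bare numbers.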
Neither is a viable option if you really want to rely on the system to operate under load and be highly available.
We found that using rsyslog to send to a centralized location, archiving it using Redis or Kafka, and then using Logstash to do its magic and ship to Elasticsearch is the best option.
Read our blog about it here - http://logz.io/blog/deploy-elk-production/
(Disclaimer - I am the VP product for logz.io and we offer ELK as a service)

Graphite, collect metrics with similar names from different hosts

I've installed Graphite+Carbon to collect metrics from several hosts. These hosts send Apache Spark and Java metrics. I can't distinguish metrics from different hosts on the Graphite side. What is the right approach? I want to group metrics by host.
The "master" is located on one remote host and the "workers" on three other remote hosts, and I can't tell the incoming numbers apart. I don't understand the right way to add a host identifier to a metric.
Graphite has the notion of namespaces, e.g. host.app.metric.dimension. If you don't send your metrics this way, you have no way of distinguishing them from one another.
Depending on your library, there should be a way to prefix the sent metrics with some kind of identifier. I recommend starting with a unique internal identifier and building the rest of the path from it.
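
As a rough sketch of such a prefix, assuming Carbon's plaintext listener (one "path value timestamp" line per metric, usually on TCP port 2003): the host name becomes the first path component. The helper names and `graphite.example.org` are hypothetical. Dots in the host name are replaced because Graphite treats every dot as a namespace separator.

```python
import socket
import time


def graphite_line(host, app, metric, value, timestamp=None):
    """Format one metric for Carbon's plaintext protocol, prefixed by host."""
    safe_host = host.replace(".", "_")  # dots would create extra namespace levels
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{safe_host}.{app}.{metric} {value} {ts}\n"


def send_to_carbon(lines, server="graphite.example.org", port=2003):
    """Send pre-formatted lines to Carbon; the server name is a placeholder."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    line = graphite_line(socket.gethostname(), "spark", "jvm.heap.used", 1024)
    print(line, end="")
    # send_to_carbon([line])  # uncomment once the server name is real
```

With paths shaped like this, a Graphite target such as `*.spark.jvm.heap.used` returns one series per host.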

How to get sent and received bytes and network bandwidth from different servers to parse with the ELK stack (Elasticsearch, Logstash, Kibana)

Can I get the sent and received bytes from Unix servers, for example a CentOS server? I want to get the inbound and outbound bytes and then send them to Logstash so that I am able to see a histogram in Kibana. Thanks in advance. I am new here.
There are lots of ways to make this happen.
You could run a script to gather the data on the machine, write it to a log file, use logstash-forwarder to send the logs to a centralized logstash, which then inserts into elasticsearch.
You could use some existing program to ship to logstash (collectd, etc).
Since you won't really need any of the features of logstash, you could write to elasticsearch directly via one of the client libraries (python, etc).
Unless you're limited to ELK, you could also use some "normal" network management system (snmp polling, etc).
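
The first option (a script that gathers the data and writes it to a log) might look like the sketch below. It assumes a Linux host, where the kernel exposes per-interface byte counters in /proc/net/dev. The function names and JSON field names are hypothetical. Each output line is one JSON event per interface, which a log shipper can forward as-is.

```python
import json
import os
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: {rx_bytes, tx_bytes}}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),  # first receive column
            "tx_bytes": int(fields[8]),  # first transmit column
        }
    return counters


def to_events(counters, timestamp):
    """Turn the counters into one flat event per interface."""
    return [
        {"timestamp": timestamp, "interface": name, **values}
        for name, values in sorted(counters.items())
    ]


if __name__ == "__main__":
    # /proc/net/dev only exists on Linux; do nothing elsewhere.
    if os.path.exists("/proc/net/dev"):
        with open("/proc/net/dev") as handle:
            parsed = parse_net_dev(handle.read())
        for event in to_events(parsed, int(time.time())):
            print(json.dumps(event))
```

The counters are cumulative since boot, so bandwidth is the difference between two samples divided by the sampling interval. That calculation can be done in the script or later in the pipeline.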

AWS EC2 instance Application logs

I want to store application logs, such as uWSGI's ("/var/log/uwsgi/uwsgi.log"), on a device that can be accessed from multiple instances, with each instance saving its logs under a directory named after the instance.
Does AWS provide any solution for that?
There are a number of approaches you can take here. If you want to have an experience that is like writing directly to the filesystem, then you could look at using something like s3fs to mount a common S3 bucket to each of your instances. This would give you more or less a real-time log merge though honestly I would be concerned over the performance of such a set up in a high volume application.
You could process the logs at some regular interval to push the data to some common store. This would not be real time, but would likely be a pretty simple solution. The problem here is that it may be difficult to interleave your log entries from different servers if you need to have them arranged in time order.
Personally, I set up a Graylog server for each instance cluster I have, to which I log all my access logs, error logs, etc. It is UDP based, so it is fire and forget from the application servers' standpoint. It provides nice search/querying tools as well. I like this approach because it removes log management from the application servers altogether.
Two options that I've used:
Use syslog (or Syslog-NG) to log to a centralized location. We do this to ship our AWS log data offsite to our datacenter. Syslog-NG is more reliable than plain ole' Syslog and allows us to use MongoDB as a backing store.
Use logrotate to push your logs to S3. It's not real-time like the Syslog solution, but it's a lot easier to set up and manage, especially if you have a lot of instances and aren't using a VPC.
Loggly and Splunk Storm are also two interesting SaaS products intended to solve this problem.
