Logstash configuration variable expansion

I have a strange problem with a logstash filter, that was working up until yesterday.
This is my .conf file:
input {
  beats {
    port => 5044
  }
}
filter {
  if "access.log" in [source] {
    grok {
      match => { "message" => "%{GREEDYDATA:messagebefore}\[%{HTTPDATE:real_date}\]\ %{GREEDYDATA:messageafter}" }
    }
    mutate {
      replace => { "[message]" => "%{messagebefore} %{messageafter}" }
      remove_field => [ "messagebefore" ]
      remove_field => [ "messageafter" ]
    }
    date {
      match => [ "real_date", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}
The issue is that in the output, the derived fields %{messagebefore} and %{messageafter} are coming through as literal text rather than their contents.
Example:
source:/var/log/nginx/access.log message:%{messagebefore} %{messageafter}...
The strange thing is that this was working fine until yesterday afternoon. I also appreciate that this is probably not the best way to process nginx logs, but I'm using it as an example only, as the problem is affecting all of my other configuration files as well.
My environment:
ELK stack running as a Docker container on CentOS 7, derived from docker.io/sebp/elk.
Filebeat running on a CentOS 7 client.
Any ideas?
Thanks.

Solved this myself, and posting here in case anyone gets the same issue.
When building the docker container, I inadvertently left behind another .conf file that also contained a reference to access.log. The two .conf files were clashing because Logstash was processing both. I deleted the erroneous file and everything started working again.
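For anyone hitting the same symptom, here is a rough illustration of why a leftover file causes this (the file names are hypothetical): Logstash merges every .conf file in its pipeline configuration directory into a single pipeline, so filters from a forgotten file still run on the same events.
# /etc/logstash/conf.d/ is merged in lexical order into ONE pipeline, so a
# hypothetical leftover file like 99-old-access.conf runs after the intended
# 02-beats-filter.conf. By then the grok-created fields have already been
# removed, so a duplicate mutate like the one below writes the literal %{...}
# text into the message field:
filter {
  if "access.log" in [source] {
    mutate {
      replace => { "[message]" => "%{messagebefore} %{messageafter}" }
    }
  }
}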

Related

Filebeat with multiple Logstash pipelines

I have Filebeat configured to watch several different logs on a single host, e.g. Nginx and my app server. However, as I understand it, you cannot have multiple outputs in any one Beat -- so my filebeat.yml has a single output.logstash directive which points to my Logstash server.
Does Logstash have the concept of pipeline routing? I have several pipelines configured on my Logstash server but it's unclear how to leverage that from Filebeat, e.g. I would like to send Nginx logs to my Logstash pipeline for Nginx, etc.
Alternatively, is there a way to route the beats for Nginx to logstash:5044, and the beats for my app server to logstash:5045?
For each of the filebeat prospectors you can use the fields option to add a field that logstash can check to identify what type of data the prospector is collecting. Then in logstash you can use pipeline-to-pipeline communication with the distributor pattern to send different types of data to different pipelines.
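A minimal sketch of the Logstash side of that distributor pattern (the pipeline names, the [fields][log_type] values, and the pipelines.yml entries are assumptions, and pipeline-to-pipeline communication requires a Logstash version that supports it):
# distributor pipeline: receives everything from Filebeat and forwards each
# event to a downstream pipeline based on a field set by the Filebeat prospector
input {
  beats {
    port => 5044
  }
}
output {
  if [fields][log_type] == "nginx" {
    pipeline { send_to => ["nginx-logs"] }
  } else if [fields][log_type] == "app-server" {
    pipeline { send_to => ["app-logs"] }
  }
}
# each downstream pipeline is declared as its own entry in pipelines.yml and
# reads from its virtual address, e.g. input { pipeline { address => "nginx-logs" } }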
You can use tags on your filebeat inputs and filter on your logstash pipeline using those tags.
For example, add the tag nginx to your nginx input in Filebeat and the tag app-server to your app server input, then use those tags in the Logstash pipeline to apply different filters and outputs. It will be the same pipeline, but it will route the events based on the tag.
If you want to send the different logs to different ports, you will need to run another instance of Filebeat.
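If you do go the two-port route, the Logstash side could look roughly like this (a sketch; each port is fed by its own Filebeat instance, and the tag values are just examples):
input {
  beats {
    port => 5044
    tags => ["nginx"]        # Filebeat instance shipping nginx logs
  }
  beats {
    port => 5045
    tags => ["app-server"]   # Filebeat instance shipping app server logs
  }
}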
You can use tags to handle multiple log files:
filebeat.yml
filebeat.inputs:
  - type: log
    tags: [apache]
    paths:
      - "/home/ubuntu/data/apache.log"
  - type: log
    tags: [gunicorn]
    paths:
      - "/home/ubuntu/data/gunicorn.log"
queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 5s
output.logstash:
  hosts: ["****************:5047"]
conf.d/logstash.conf
input {
  beats {
    port => "5047"
    host => "0.0.0.0"
  }
}
filter {
  if "gunicorn" in [tags] {
    grok {
      match => {
        "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\""
      }
      remove_field => ["message"]
    }
    date {
      match => ["http_date", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
    mutate {
      remove_field => ["agent"]
    }
  }
  else if "apache" in [tags] {
    grok {
      match => {
        "message" => "%{IPORHOST:client_ip} %{DATA:u1} %{DATA:u2} \[%{HTTPDATE:http_date}\] \"%{WORD:http_method} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:gd}\" \"%{DATA:u3}\""
      }
      remove_field => ["message"]
    }
    date {
      match => ["http_date", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
    mutate {
      remove_field => ["agent"]
    }
  }
}
output {
  if "gunicorn" in [tags] {
    stdout { codec => rubydebug }
    elasticsearch {
      hosts => ["0.0.0.0:9200"]
      index => "gunicorn-sample-%{+YYYY.MM.dd}"
    }
  }
  else if "apache" in [tags] {
    stdout { codec => rubydebug }
    elasticsearch {
      hosts => ["0.0.0.0:9200"]
      index => "apache-sample-%{+YYYY.MM.dd}"
    }
  }
}

Read logs for last month with logstash

I have configured my logstash config file to read apache access logs like this:
input {
  file {
    type => "apache_access"
    path => "/etc/httpd/logs/access_log*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  if [path] =~ "access" {
    mutate { replace => { "type" => "apache_access" } }
    grok {
      match => { "message" => "%{IPORHOST:clientip} - %{DATA:username} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-)" }
    }
    kv {
      source => "request"
      field_split => "&?"
      prefix => "requestarg_"
    }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    host => "10.13.10.18"
    cluster => "awstutorialseries"
  }
}
Files that I have in the directory /etc/httpd/logs are:
access_log
access_log-20161002
access_log-20161005
access_log-20161008
access_log-20161011
...
Since the path access_log* matches all the files, processing can take a long time when there is a large number of archived files.
On the server we rotate logs every 3 days, so we archive the access_log file as access_log-{date}, and Logstash, as the config says, reads all access_log files in that directory, the archived ones included.
After a few months we end up with a lot of files that Logstash has to read, so it can take a long time to read them all.
Q1: Is there a way to read all the logs once, and from then on just the access_log file?
Q2: Is there a way, or a custom expression to put in the config file, to read only certain log files depending on their date rather than all of them?
I have tried plenty of combinations and filters in my config file based on the official documentation, but with no luck.
Your pattern "access_log*" will match all the old files too, but Logstash will ignore any file older than a day by default. See the ignore_older param in the file{} input. When catching up on old files, you can set this to a higher value.
Once you're caught up, I would release a new config that only looks at "access_log" (no wildcard, thus the latest file only).
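A minimal sketch of that catch-up configuration, based on the question's input block (the ignore_older value is an assumption; pick one large enough to cover your oldest rotated file):
input {
  file {
    type => "apache_access"
    path => "/etc/httpd/logs/access_log*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # ignore_older is given in seconds; 2592000 = 30 days, wide enough to
    # pick up last month's rotated files during the initial catch-up run
    ignore_older => 2592000
  }
}
Once the backlog has been indexed, drop the wildcard (path => "/etc/httpd/logs/access_log") and remove sincedb_path => "/dev/null" so Logstash remembers its position in the live file.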

Logstash not printing anything

I am using Logstash for the first time and trying to set up a simple pipeline for just printing the nginx logs. Below is my config file:
input {
  file {
    path => "/var/log/nginx/*access*"
  }
}
output {
  stdout { codec => rubydebug }
}
I have saved the file as /opt/logstash/nginx_simple.conf and am trying to execute the following command:
sudo /opt/logstash/bin/logstash -f /opt/logstash/nginx_simple.conf
However the only output I can see is:
Logstash startup completed
Logstash shutdown completed
The file is definitely not empty. As per my understanding I should be seeing the output on my console. What am I doing wrong?
Make sure that the character encoding of your logfile is UTF-8. If it is not, try to change it and restart Logstash.
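If re-encoding the file is not practical, the plain codec's charset option can tell the file input which encoding to expect (the encoding below is only an example; use whatever your log file is actually written in):
input {
  file {
    path => "/var/log/nginx/*access*"
    codec => plain { charset => "ISO-8859-1" }   # example encoding, adjust to match your file
  }
}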
Please try this configuration for Logstash, in order to set up a simple pipeline for just printing the nginx logs.
input {
  file {
    path => "/var/log/nginx/*.log"
    type => "nginx"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  if [type] == "nginx" {
    grok {
      patterns_dir => "/home/krishna/Downloads/logstash-2.1.0/pattern"
      match => {
        "message" => "%{NGINX_LOGPATTERN:data}"
      }
    }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => [ "127.0.0.1:9200" ]
  }
  stdout { codec => rubydebug }
}

Groking and then mutating?

I am running the following filter in a logstash config file:
filter {
  if [type] == "logstash" {
    grok {
      match => {
        "message" => [
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
        ]
      }
    }
  }
}
It kind of works:
it does identify and carve out variables "timestamp", "severity", "instance", "mymessage", and "reason"
Really what I wanted was for the text that is now %{mymessage} to become the message field, but when I add any sort of mutate command to this grok it stops working (by the way, should there be a log that tells me what is breaking? I didn't see one... ironic for a logging solution not to have verbose logging).
Here's what I tried:
filter {
  if [type] == "logstash" {
    grok {
      match => {
        "message" => [
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
        ]
      }
      mutate => {
        replace => [ "message", "%{mymessage}"]
        remove => [ "mymessage" ]
      }
    }
  }
}
So in summary I'd like to understand:
Are there log files I can look at to see why/where a failure is happening?
Why would my mutate commands illustrated above not work?
I also thought that if I never used the mymessage variable, but instead just referred to message, then maybe it would automatically truncate message to just the matched pattern; instead it appeared to append the results ... what is the correct behaviour?
Using the overwrite option is the best solution, but I thought I'd address a couple of your questions directly anyway.
It depends on how Logstash is started. Normally you'd run it via an init script that passes the -l or --log option. /var/log/logstash would be typical.
mutate is a filter of its own, not a part of grok. You could have done it like this (or used rename instead of replace + remove):
grok {
  ...
}
mutate {
  replace => [ "message", "%{mymessage}" ]
  remove => [ "mymessage" ]
}
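The rename alternative mentioned above would look roughly like this (a sketch; renaming mymessage onto message overwrites the original value in one step, and older Logstash versions also accept the array form rename => [ "mymessage", "message" ]):
mutate {
  rename => { "mymessage" => "message" }
}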
I'd do it a different way. For what you're trying to do, the overwrite option might be more apt.
Something like this:
grok {
  overwrite => [ "message" ]
  match => {
    "message" => [
      "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:message}, reason:%{GREEDYDATA:reason}",
      "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:message}"
    ]
  }
}
This'll replace 'message' with the 'grokked' bit.
I know that doesn't directly answer your question - about all I can say is that when you start logstash, it writes to STDOUT - at least on the version I'm using - which I'm capturing and writing to a file. In there, it reports some of the errors.
There's a -l option to logstash that lets you specify a log file to use - this will usually show you what's going on in the parser, but bear in mind that if something doesn't match a rule, it won't necessarily tell you why it didn't.

Can't get logstash to geoip locate from Netflow input

I'm trying to use Logstash to parse out and geolocate IP addresses from a Netflow source. It works to get the data into Elasticsearch, but it's not putting in the geoip info. Here's the config file that I'm using in Logstash:
input {
  udp {
    host => "localhost"
    port => 5555
    codec => netflow
  }
}
filter {
  geoip {
    target => "geoip"
    source => "ipv4_dst_addr"
    add_tag => ["geoip"]
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
}
output {
  stdout { }
  elasticsearch { host => "127.0.0.1" }
}
More info that might help: I'm using Logstash 1.4.2 and Elasticsearch 1.3.4.
Any luck in figuring this one out?
If not, please note that you need to use a mutate to convert the coordinates to float.
However, the geoip filter in Logstash 1.3 and up adds a location field directly so you won't have to use add_field and you won't even have to use the converter. If you try these two solutions, please tell me how it goes. Thank you.
A side note: the recommended Elasticsearch version to work with Logstash 1.4.2 is 1.1.1.
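A minimal sketch of the simpler form described above, using the field names from the question (whether [geoip][location] ends up mapped as a geo_point depends on your Elasticsearch index template, which is an assumption here):
filter {
  geoip {
    source => "ipv4_dst_addr"
    target => "geoip"
    # recent geoip filter versions populate [geoip][location] on their own,
    # so no add_field or float conversion is needed
  }
}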
I just spent some time digging into this, and it ends up being something of a bug in the Netflow codec code (specifically, in the IP4Addr class in netflow/util.rb).
You should be able to work around this with a mutate filter, like this:
filter {
  mutate {
    convert => {
      "[netflow][ipv4_src_addr]" => "string"
      "[netflow][ipv4_dst_addr]" => "string"
    }
  }
  geoip {
    source => "[netflow][ipv4_src_addr]"
    target => "src_geoip"
  }
  geoip {
    source => "[netflow][ipv4_dst_addr]"
    target => "dst_geoip"
  }
}
I've submitted a pull request to fix this properly, but in the meantime, try that config.

Resources