Duplicate entries in Elasticsearch while a Logstash instance is running

I have been trying to send logs from Logstash to Elasticsearch. Suppose a Logstash instance is running and I change the file it is monitoring: all the logs that were previously saved in Elasticsearch are saved again, so duplicates are created.
Also, when the Logstash instance is stopped and restarted, the logs get duplicated in Elasticsearch.
How do I counter this problem?
How do I send only the newest entry added to the file from Logstash to Elasticsearch?
I start Logstash with the following command:
bin/logstash -f logstash-complex.conf
and the configuration file is this:
input {
  file {
    path => "/home/amith/Desktop/logstash-1.4.2/accesslog1"
  }
}
filter {
  if [path] =~ "access" {
    mutate {
      replace => { "type" => "apache_access" }
    }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    host => localhost
    index => feb9
  }
  stdout { codec => rubydebug }
}

I found the solution.
I was opening the file in an editor, adding a record, and saving it. Each save registered a different inode number for the file, so Logstash treated it as a new file and read it again from the start.
The solution is to append a line to the file without opening it in an editor, by running the following command:
echo "the string you want to add to the file" >> filename

[ELK stack]
I wanted some custom settings in
/etc/logstash/conf.d/vagrant.conf
so my first step was to make a backup: /etc/logstash/conf.d/vagrant.conf.bk
This caused Logstash to add 2 entries to Elasticsearch for each entry in <file>.log.
Likewise, with 3 files matching /etc/logstash/conf.d/*.conf.*, I had 8 entries in ES for each line in *.log.
Presumably Logstash was loading every file in that directory, backups included, and merging them into one pipeline, so move backup copies out of conf.d.

As you mentioned in your question:
when the Logstash instance is stopped and restarted, the logs get duplicated in Elasticsearch.
So you have probably deleted the sincedb file. Please have a look here.
Try specifying sincedb_path and start_position. For example:
input {
  file {
    path => "/home/amith/Desktop/logstash-1.4.2/accesslog1"
    # only applies to files Logstash has not seen before
    start_position => "end"
    # the path must be a quoted string
    sincedb_path => "/home/amith/Desktop/sincedb"
  }
}

Related

logstash or filebeat to create multiple output files from tomcat log

I need to parse a tomcat log file and output it into several output files.
Each file is the result of a certain filter that will pick certain entries in the tomcat file that match a series of regexes or other transformation rule.
Currently I am doing this using a python script but it is not very flexible.
Is there a configurable tool for doing this?
I have looked into filebeat and logstash [neither of which I am very familiar with], but it is not clear whether they can be configured to map a single input file to multiple output files, each with a different filter/grok set of expressions.
Is it possible to achieve this with filebeat/logstash?
If all the log files are on the same server, you don't need Filebeat. Logstash can do the work.
Here is an example of what your Logstash config could look like.
The input is your Tomcat log file, and there are multiple (JSON) outputs, chosen by log level once the logs have been parsed.
The grok pattern is only an example; you must define your own pattern to match your log format.
input {
  file {
    path => "/var/log/tomcat.log"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} - %{POSTFIX_SESSIONID:sessionId}: %{GREEDYDATA:messageText}" }
  }
}
output {
  if [loglevel] == "info" {
    file {
      codec => "json"
      path => "/var/log/tomcat_info_parsed.log"
    }
  }
  if [loglevel] == "warning" {
    file {
      codec => "json"
      path => "/var/log/tomcat_warning_parsed.log"
    }
  }
}
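One thing to watch with the conditionals above: string comparison in Logstash conditionals is case-sensitive, and the LOGLEVEL grok pattern also matches upper-case forms such as INFO or WARN, so [loglevel] == "info" will not match a line logged as INFO. A minimal sketch of one way around that, assuming the same grok pattern as above and not tested against a real Tomcat log: lowercase the field first, then let the output path take the level from the event, so no per-level conditional is needed. The output file name pattern is only an illustration.

```conf
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} - %{POSTFIX_SESSIONID:sessionId}: %{GREEDYDATA:messageText}" }
  }
  # Normalize "INFO", "Info" and "info" to "info" so comparisons are predictable.
  mutate {
    lowercase => [ "loglevel" ]
  }
}
output {
  # Only write events that grok parsed successfully.
  if "_grokparsefailure" not in [tags] {
    file {
      # One JSON object per line.
      codec => "json_lines"
      # The field reference is filled in per event, e.g. tomcat_info_parsed.log
      path => "/var/log/tomcat_%{loglevel}_parsed.log"
    }
  }
}
```

With this layout a new level (say, error) gets its own file without touching the config. If you only want certain levels, keep the conditionals and compare against the lowercased value.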

Make logstash filter by name

I have a log file called "/var/log/commands.log" that I'm trying to separate into fields with logstash & grok. I've got it working. Now I'm trying to make Logstash apply this only to the file "/var/log/commands.log" and not to every input, with something like "if name = commands.log", but something about the "if" statement seems wrong, because the block is skipped.
input {
  file {
    path => "/var/log/commands.log"
  }
  beats {
    port => 5044
  }
}
filter {
  if [log][file][path] == "/var/log/commands.log" {
    grok {
      match => { "message" => "*very long statement*" }
    }
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
If I remove the if statement it works and the fields are visible in kibana. I'm testing things locally. Does anyone know what's going on?
EDIT: SOLVED: In Logstash, the condition has to use just [path] instead of [log][file][path].
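Which field holds the file name depends on where the event came from: in the setup above the file input put it in [path], while events shipped by Beats normally carry it in [log][file][path]. Newer versions of the file input can also write [log][file][path] when ECS compatibility is enabled, so the right name may differ on your version. A sketch that avoids depending on either field is to tag the events in the input and test for the tag. The tag name "commands" is an arbitrary choice for illustration:

```conf
input {
  file {
    path => "/var/log/commands.log"
    # Mark everything read from this file; the tag name is arbitrary.
    tags => [ "commands" ]
  }
  beats {
    port => 5044
  }
}
filter {
  # Applies only to events from the file input above, not to Beats events.
  if "commands" in [tags] {
    grok {
      match => { "message" => "*very long statement*" }
    }
  }
}
```

To see which fields an event actually has, temporarily add stdout { codec => rubydebug } to the output section and look at one event.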

Logstash not producing output or inserting to elastic search

When I run the configuration file with the command bin\logstash -f configfile.conf, nothing is displayed on the console except Logstash's own logs.
Here is the configuration file:
input {
  file {
    path => "F:\ELK\50_Startups.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["R&D","Administration","Marketing","State","Profit"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => ["Startups"]
  }
  stdout{}
}
Does the input file (50_Startups.csv) have fresh data written to it? If not, Logstash may already have stored the read offset as the last line, and it will not re-read the file on future runs unless you delete the sincedb offset files or add the following setting to the file input:
sincedb_path => "/dev/null"
That forces Logstash to re-parse the file.
See more info on file offsets here: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#_tracking_of_current_position_in_watched_files
From it:
By default, the sincedb file is placed in the data directory of Logstash with a filename based on the filename patterns being watched
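Since the path in the question is on Windows, a few more things may be worth checking. This is a sketch under assumptions, not a tested fix: the file input documentation asks for forward slashes in path even on Windows, the Windows equivalent of /dev/null is NUL, Elasticsearch index names must be lowercase, and index takes a single string rather than an array.

```conf
input {
  file {
    # Forward slashes, even on Windows.
    path => "F:/ELK/50_Startups.csv"
    start_position => "beginning"
    # Do not persist read offsets, so the file is re-read on every run.
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["R&D","Administration","Marketing","State","Profit"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Index names must be lowercase.
    index => "startups"
  }
  # Print each event so it is obvious whether anything is being read.
  stdout { codec => rubydebug }
}
```

Keep in mind that with offsets disabled the whole file is indexed again on every restart, which is handy for testing but produces duplicates in a real index.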

How do I parse a json-formatted log message in Logstash to get a certain key/value pair?

I send json-formatted logs to my Logstash server. The log looks something like this (Note: The whole message is really on ONE line, but I show it in multi-line to ease reading)
2016-09-01T21:07:30.152Z 153.65.199.92
{
  "type":"trm-system",
  "host":"susralcent09",
  "timestamp":"2016-09-01T17:17:35.018470-04:00",
  "@version":"1",
  "customer":"cf_cim",
  "role":"app_server",
  "sourcefile":"/usr/share/tomcat/dist/logs/trm-system.log",
  "message":"some message"
}
What do I need to put in my Logstash configuration to get the "sourcefile" value, and ultimately get the filename, e.g., trm-system.log?
If you pump the hash field (without the timestamp) into ES, it should recognize it.
If you want to do it inside a Logstash pipeline, you would use the json filter and point its source => at the second part of the line (possibly adding the timestamp prefix back in).
This adds all the parsed fields to the current event, and you can access them directly or all together:
Config:
input { stdin { } }
filter {
  # split the line into timestamp, IP and JSON
  grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
  # parse the JSON part (called "js") and add its keys as new fields
  json { source => "js" }
}
output {
  # stdout { codec => rubydebug }
  # you access fields directly with %{fieldname}:
  stdout { codec => line { format => "sourcefile: %{sourcefile}" } }
}
Sample run
2016-09-01T21:07:30.152Z 153.65.199.92 { "sourcefile":"/usr" }
sourcefile: /usr
and with rubydebug (host and @timestamp removed):
{
       "message" => "2016-09-01T21:07:30.152Z 153.65.199.92 { \"sourcefile\":\"/usr\" }",
      "@version" => "1",
            "ts" => "2016-09-01T21:07:30.152Z",
            "ip" => "153.65.199.92",
            "js" => "{ \"sourcefile\":\"/usr\" }",
    "sourcefile" => "/usr"
}
As you can see in the rubydebug output, the sourcefile field is available directly with its value.
Depending on the source of your log records, you might need to use the multiline codec as well. You might also want to delete the js field, rename @timestamp to _parsedate, and parse ts into the record's timestamp (for Kibana to be happy). This is not shown in the sample. I would also remove message to save space.
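The cleanup steps mentioned above, plus the file name the question asked for, could look roughly like this. It is an untested sketch: the field name filename is made up for illustration, and it assumes ts is always in ISO8601 form as in the sample.

```conf
filter {
  # Split the line into timestamp, IP and the JSON part.
  grok { match => { "message" => "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" } }
  # Parse the JSON part into top-level fields (sourcefile, customer, ...).
  json { source => "js" }
  # Keep only what follows the last "/" of sourcefile, e.g. trm-system.log.
  grok { match => { "sourcefile" => "/(?<filename>[^/]+)$" } }
  # Use the leading timestamp as the event's @timestamp.
  date { match => [ "ts", "ISO8601" ] }
  # Drop the helper fields once they have been parsed.
  mutate { remove_field => [ "js", "ts" ] }
}
```

For the sample record, filename would come out as trm-system.log. Note that the json filter overwrites existing fields of the same name, so the "message" key inside the JSON replaces the original raw line.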

Multiple identical messages with logstash/kibana

I'm running an ELK stack on my local filesystem. I have the following configuration file set up:
input {
  file {
    path => "/var/log/rfc5424"
    type => "RFC"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOG5424LINE}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
I have a kibana instance running as well. I write a line to /var/log/rfc5424:
$ echo '<11>1' "$(date +'%Y-%m-%dT%H:%M:%SZ')" 'test-machine test-tag f81d4fae-7dec-11d0-a765-00a0c91e6bf6 log [nsId orgID="12 \"hey\" 345" projectID="2345[hehe]6"] this is a test message' >> /var/log/rfc5424
And it shows up in Kibana. Great! However, weirdly, it shows up six times.
As far as I can tell everything about these messages is identical, and I only have one instance of logstash/kibana running, so I have no idea what could be causing this duplication.
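Check whether there is a .swp or .tmp copy of your configuration file in the conf directory. If Logstash loads every file in that directory, a stray copy duplicates the pipeline.
Also add a document ID to the documents, so that a re-sent event overwrites the existing document instead of creating a new one: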
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{uuid_field}"
  }
}
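The uuid_field above has to exist on the event. If the logs do not carry a unique ID, one option is to derive one from the content with the fingerprint filter, so the same line always maps to the same document ID and a re-sent line overwrites instead of duplicating. A sketch, with two caveats: genuinely identical lines would then be stored as one document, and older versions of the plugin may require a key for the SHA methods.

```conf
filter {
  fingerprint {
    # Hash the raw line; add more fields here if the line alone is not unique.
    source => [ "message" ]
    # @metadata fields are not sent to Elasticsearch.
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

This hides the duplication rather than removing its cause, so it is still worth finding out why each line is being sent more than once.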
