As part of my Logstash config I want to pull the current date from the server, which is used as part of the API query made by http_poller.
Is there any way to do that? I've tried something along the lines of this:
$(date +%d%m%y%H%M%S)
But it doesn't get picked up. This is the config:
input {
  http_poller {
    #proxy => { host => "" }
    proxy => ""
    urls => {
      q1 => {
        method => post
        url => ""
        headers => {.... }
        body => '{
          "rsid": "....",
          "globalFilters": [
            {
              "type": "dateRange",
              "dateRange": "%{+ddMMyyHHmmss}"
            }
            ................
        }'
      }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC" }
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
output {
elasticsearch {
hosts => ["xxxx"]
index => "xxxx"
}
}
There is nothing like a variable declaration that we can do in the input block.
A workaround is to define environment variables holding the dates, e.g. on Windows (PowerShell):
$env:startDate=(Get-Date).AddDays(-1).ToString('yyyy-MM-dd')
$env:endDate=(Get-Date).AddDays(0).ToString('yyyy-MM-dd')
These variables can then be used as ${startDate} in the URL.
However, once Logstash is started the dates remain static, so I guess we need to restart Logstash every day for it to pick up the new date value.
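On Linux, the equivalent could be a small Ruby wrapper run daily from cron: it computes the dates, exports them, and then starts Logstash so ${startDate}/${endDate} resolve at startup. This is only a sketch; the config filename is hypothetical.

```ruby
require 'date'

# Compute yesterday/today and export them before Logstash starts,
# so ${startDate}/${endDate} substitution sees fresh values.
ENV['startDate'] = (Date.today - 1).strftime('%Y-%m-%d')
ENV['endDate']   = Date.today.strftime('%Y-%m-%d')

# exec 'bin/logstash', '-f', 'logstash-poller.conf'  # hypothetical config path
puts ENV['startDate']
```

Run from cron once a day, this restarts Logstash with the new dates, which is exactly the restart the static-variable limitation forces on us.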
Another alternative is to write a proxy web service, probably in Java or another language, which computes the dates, invokes the actual web service, and returns the response back to Logstash.
This issue has been pending in Logstash since 2016... not sure why it cannot be addressed!
I send json-formatted logs to my Logstash server. The log looks something like this (Note: The whole message is really on ONE line, but I show it in multi-line to ease reading)
2016-09-01T21:07:30.152Z 153.65.199.92
{
"type":"trm-system",
"host":"susralcent09",
"timestamp":"2016-09-01T17:17:35.018470-04:00",
"#version":"1",
"customer":"cf_cim",
"role":"app_server",
"sourcefile":"/usr/share/tomcat/dist/logs/trm-system.log",
"message":"some message"
}
What do I need to put in my Logstash configuration to get the "sourcefile" value, and ultimately get the filename, e.g., trm-system.log?
If you pump the hash part (without the timestamp) into ES it should recognize the fields.
If you want to do it inside a Logstash pipeline, use the json filter and point source => at the second part of the line (possibly adding the timestamp prefix back in).
This results in all fields added to the current message, and you can access them directly or all combined:
Config:
input { stdin { } }
filter {
# split line in Timestamp and Json
grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
# parse json part (called "js") and add new field from above
json { source => "js" }
}
output {
# stdout { codec => rubydebug }
# you access fields directly with %{fieldname}:
stdout { codec => line { format => "sourcefile: %{sourcefile}"} }
}
Sample run
2016-09-01T21:07:30.152Z 153.65.199.92 { "sourcefile":"/usr" }
sourcefile: /usr
and with rubydebug (host and @timestamp removed):
{
"message" => "2016-09-01T21:07:30.152Z 153.65.199.92 { \"sourcefile\":\"/usr\" }",
"#version" => "1",
"ts" => "2016-09-01T21:07:30.152Z",
"ip" => "153.65.199.92",
"js" => "{ \"sourcefile\":\"/usr\" }",
"sourcefile" => "/usr"
}
As you can see, the field sourcefile is directly known with the value in the rubydebug output.
Depending on the source of your log records you might need to use the multiline codec as well. You might also want to delete the js field, rename @timestamp to _parsedate, and parse ts into the record's timestamp (to keep Kibana happy). This is not shown in the sample. I would also remove message to save space.
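The same split-and-parse step can be sketched in plain Ruby. The sample line below is hypothetical, with the sourcefile path taken from the question; File.basename then extracts just the log file name the question asks for.

```ruby
require 'json'

# One raw line: timestamp, IP, then the JSON payload (mirrors the grok pattern).
line = '2016-09-01T21:07:30.152Z 153.65.199.92 {"sourcefile":"/usr/share/tomcat/dist/logs/trm-system.log"}'
ts, ip, js = line.split(' ', 3)  # split into at most 3 parts, keeping the JSON intact

event = JSON.parse(js)
puts event['sourcefile']                 # the full path
puts File.basename(event['sourcefile'])  # just the file name: trm-system.log
```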
I'm building out logstash and would like to build functionality to anonymize fields as specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows
input {
stdin { codec => json }
}
filter {
if [containsPII] {
anonymize {
algorithm => "SHA1"
key => "123456789"
fields => %{fta}
}
}
}
output {
stdout {
codec => rubydebug
}
}
The output is
{
"containsPII" => "True",
"fta" => [
[0] "f1",
[1] "f2"
],
"f1" => "test",
"f2" => "5551212",
"#version" => "1",
"#timestamp" => "2016-07-13T22:07:04.036Z",
"host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base Logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
if [containsPII] {
ruby {
code => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
add_tag => ["Rubyrun"]
}
}
}
You can execute filters from a ruby script. The steps are:
Create the required filter instance in the init block of the inline ruby script.
For every event, call the filter method of that filter instance.
Below is an example for the above problem statement. It will replace the my_ip field in the event with its SHA1.
The same can be achieved using a ruby script file.
Here is the sample config file:
input { stdin { codec => json_lines } }
filter {
ruby {
init => "
require 'logstash/filters/anonymize'
# Create instance of filter with applicable parameters
#anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
'key' => '123456789',
'fields' => ['my_ip']})
# Make sure to call register
#anonymize.register
"
code => "
# Invoke the filter
#anonymize.filter(event)
"
}
}
output { stdout { codec => rubydebug {metadata => true} } }
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did get to the functional goal.
filter {
if [fta] {
ruby {
init => "require 'openssl'"
code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item] ) }"
}
}
}
If the fta field exists, this hashes each of the fields listed in that array with HMAC-SHA256.
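The same transformation can be checked outside Logstash in plain Ruby, with the sample values and placeholder key from the question:

```ruby
require 'openssl'

event = { 'fta' => ['f1', 'f2'], 'f1' => 'test', 'f2' => '5551212' }

# Replace each field named in fta with its HMAC-SHA256 hex digest.
event['fta'].each do |item|
  event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item])
end

puts event['f1']  # a 64-char hex digest; the original value is gone
```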
I am using Logstash to get the log from a URL using http_poller. This works fine. The problem is that the received log does not get sent to Elasticsearch in the right way. I tried splitting the result into different events, but the only event that shows up in Kibana is the last event from the log. Since I am pulling the log every 2 minutes, a lot of log information gets lost this way.
The input is like this:
input {
http_poller {
urls => {
logger1 => {
method => get
url => "http://servername/logdirectory/thislog.log"
}
}
keepalive => true
automatic_retries => 0
# Check the site every 2 minutes
interval => 120
request_timeout => 110
# Wait no longer than 110 seconds for the request to complete
# Store metadata about the request in this field
metadata_target => http_poller_metadata
type => 'log4j'
codec => "json"
# important tag settings
tags => stackoverflow
}
}
I then use a filter to add some fields and to split the logs:
filter {
if "stackoverflow" in [tags] {
split {
terminator => "\n"
}
mutate {
add_field => {
"Application" => "app-stackoverflow"
"Environment" => "Acceptation"
}
}
}
}
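As a rough model of what the split filter above should do (an assumption for illustration, not Logstash internals), one polled body becomes one event per line, each carrying the added fields:

```ruby
# One HTTP poll returns the whole log file as a single string.
body = "GET /a 200\nGET /b 404\nGET /c 200"

# The split filter effectively clones the event once per terminator-separated
# line; the mutate-added fields end up on every clone.
base = { 'Application' => 'app-stackoverflow', 'Environment' => 'Acceptation' }
events = body.split("\n").map { |line| base.merge('message' => line) }

puts events.size  # 3 separate events instead of one
```

If only the last event shows up downstream, the events are likely colliding on something that should be unique per event (for example an explicit document id), rather than the split itself failing.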
The output is then sent to the Kibana server using the following output config:
output {
redis {
host => "kibanaserver.internal.com"
data_type => "list"
key => "logstash N"
}
}
Any suggestions why not all the events are stored in Kibana?
I have been trying to send logs from Logstash to Elasticsearch. Suppose I am running a Logstash instance, and while it is running I make a change to the file which the Logstash instance is monitoring. Then all the logs which have been previously saved in Elasticsearch are saved again, hence duplicates are formed.
Also, when the Logstash instance is closed and restarted, the logs get duplicated in Elasticsearch.
How do I counter this problem?
How do I send only the newest added entry in the file from Logstash to Elasticsearch?
My logstash instance command is the following:
bin/logstash -f logstash-complex.conf
and the configuration file is this:
input
{
file
{
path => "/home/amith/Desktop/logstash-1.4.2/accesslog1"
}
}
filter
{
if [path] =~ "access"
{
mutate
{
replace =>
{ "type" => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => localhost
index => feb9
}
stdout { codec => rubydebug }
}
I found the solution.
I was opening the file, adding a record, and saving it. Because of this, Logstash treated the same file as a different file each time I saved it, since the file got a different inode number with every save.
The solution is to append a line to the file without opening it in an editor, by running the following command:
echo "the string you want to add to the file" >> filename
[ELK stack]
I wanted some custom configs in /etc/logstash/conf.d/vagrant.conf, so the first step was to make a backup: /etc/logstash/conf.d/vagrant.conf.bk.
This caused Logstash to add 2 entries in Elasticsearch for each entry in <file>.log, because Logstash loads every file in conf.d, backups included.
Likewise, when I had 3 files matching /etc/logstash/conf.d/*.conf.*, I got 8 entries in ES for each line in *.log.
As you mentioned in your question:
when the logstash instance is closed and is restarted again, the logs gets duplicated in the elasticsearch.
you have probably deleted the sincedb file; have a look at the sincedb section of the file input documentation.
Try specifying sincedb_path and start_position. For example:
input
{
file
{
path => "/home/amith/Desktop/logstash-1.4.2/accesslog1"
start_position => "end"
sincedb_path => "/home/amith/Desktop/sincedb"
}
}
I have my Logstash instance create a new directory every day to store its logs. The config file is below. It seems to create the directory (and start using it) in the evening, a day early, as opposed to creating it right after midnight (when the date actually changes). I am on the West coast (UTC−08:00), on an OEL OS.
Configuration:
input {
udp {
port => 6379
}
}
filter {
ruby {
code => "event['@timestamp'] = event['@timestamp'].localtime('-08:00')"
}
}
output {
file {
path => ["/logstash-1.4.1/logs/%{+YYYY-MM-dd}/logstash_in.txt"]
}
elasticsearch {
protocol => http
}
stdout {
codec => rubydebug
}
}
My system date and time are correct:
[root@xxx]# date
Mon Jul 14 18:22:37 PDT 2014
Short answer: the file output path timestamp %{+YYYY-MM-dd} refers to UTC time.
That means your directory will be created in your evening, once the UTC date has already rolled over to the next day.
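Using the timestamp from the date output above, this is easy to verify in plain Ruby: at 18:22 in UTC−08:00, the UTC clock already reads 02:22 on the next day.

```ruby
require 'time'

local = Time.parse('2014-07-14 18:22:37 -08:00')  # from the `date` output above

puts local.strftime('%Y-%m-%d')         # 2014-07-14 (local date)
puts local.getutc.strftime('%Y-%m-%d')  # 2014-07-15 (the UTC date Logstash uses in the path)
```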
Long answer: you can refer to the file output source code. The path is
path = event.sprintf(@path)
And drilling down into event.rb:
t = @data["@timestamp"]
formatter = org.joda.time.format.DateTimeFormat.forPattern(key[1 .. -1])\
.withZone(org.joda.time.DateTimeZone::UTC)
#next org.joda.time.Instant.new(t.tv_sec * 1000 + t.tv_usec / 1000).toDateTime.toString(formatter)
# Invoke a specific Instant constructor to avoid this warning in JRuby
# > ambiguous Java methods found, using org.joda.time.Instant(long)
org.joda.time.Instant.java_class.constructor(Java::long).new_instance(
t.tv_sec * 1000 + t.tv_usec / 1000
).to_java.toDateTime.toString(formatter)
The path parameter %{+YYYY-MM-dd} is based on UTC time (org.joda.time.DateTimeZone::UTC).
So, there are two solutions to do what you need:
a) Modify event.rb to use your timezone instead of UTC.
b) Create your own day field and use that field in the path instead of %{+YYYY-MM-dd}.
Here is my configuration:
filter {
ruby {
code => "
ownTime = event['@timestamp'].localtime('-08:00')
event['day'] = ownTime.strftime('%Y-%m-%d')
"
}
}
output {
file {
path => "/logstash-1.4.1/logs/%{day}/logstash_in.txt"
}
stdout {
codec => "rubydebug"
}
}
Hope this can help you.
If you want to convert the timezone based on the timezone's name:
filter {
date {
match => [ "#timestamp", "ISO8601" ]
timezone => "America/New_York"
}
}
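For comparison, rendering a UTC timestamp in a named zone can be done in plain Ruby by setting TZ (an illustration that assumes a system with tzdata installed, as most Linux hosts have; it is not what the date filter does internally):

```ruby
require 'time'

ENV['TZ'] = 'America/New_York'            # named zone, like the filter's timezone option
t = Time.parse('2016-09-01T21:07:30Z')    # a UTC timestamp

puts t.getlocal.strftime('%Y-%m-%d %H:%M')  # 2016-09-01 17:07 (EDT is UTC-4 on that date)
```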