Logstash fingerprint filter not working

I have a logstash filter that extracts an API token string from an XML payload. I don't want to store the actual API token in elasticsearch; I want to store a hashed version. My filter file is as follows:
filter {
  xml {
    source => "xml_request"
    store_xml => "false"
    force_array => "false"
    xpath => [ "//authentication/apiKey/text()", "api_key" ]
  }
  if [api_key] =~ /.+/ {
    fingerprint {
      method => "SHA256"
      key => "some_random_string"
      source => "api_key"
      target => "api_key"
    }
  }
}
Unfortunately the fingerprint filter does not seem to be working: the api_key value is always the raw value from the XML input, not a SHA256 hash. I have tried setting the target to a new field (e.g. api_key_hashed) to test, but the new field does not show up. Can anyone shed some light, please?

I do not know if this helps, but you can try:
fingerprint {
  add_field => { "apikey" => "%{api_key}" }
  remove_field => [ "api_key" ]
}
mutate {
  add_field => { "api_key" => "%{apikey}" }
  remove_field => [ "apikey" ]
}
Otherwise, you can try the grok overwrite option, grok add_field, or drop with add_field.
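For reference, a minimal sketch of the end state the question describes (hash stored, raw key removed) might look like the following. It assumes the xml filter's xpath option stores its match as a single-element array, which is why the first element is referenced explicitly; it does not explain why the original filter was skipped:
filter {
  xml {
    source => "xml_request"
    store_xml => "false"
    force_array => "false"
    xpath => [ "//authentication/apiKey/text()", "api_key" ]
  }
  # assumption: the xpath result lands as a single-element array
  if [api_key] {
    fingerprint {
      method => "SHA256"
      key => "some_random_string"
      source => "[api_key][0]"       # hash the first (only) element
      target => "api_key_hashed"
    }
    mutate {
      remove_field => [ "api_key" ]  # keep only the hashed value
    }
  }
}
Whether this applies depends on how the xpath result actually lands in the event; dumping it with stdout { codec => rubydebug } is the quickest way to check.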

Related

logstash - Conditionally convert field types

I inherited a logstash config as follows. I do not want to make major changes to it because I do not want to break anything that is working. The metrics are sent as logs with JSON in the format "metric": "metricname", "value": "int". This has been working great. However, there is now a requirement to allow a string in value for a new metric. It is not really a metric; it indicates the state of the processing as a string. With the following filter, everything is converted to integer, so any string in value is converted to 0. The requirement is that if the value is a string, the filter shouldn't attempt to convert it. Thank you!
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} - M_%{DATA:task}_%{NUMBER:thread} - INFO - %{GREEDYDATA:jmetric}" }
    remove_field => [ "message", "ecs", "original", "agent", "log", "host", "path" ]
    break_on_match => false
  }
  if "_grokparsefailure" in [tags] {
    drop {}
  }
  date {
    match => ["ts", "ISO8601"]
    target => "@timestamp"
  }
  json {
    source => "jmetric"
    remove_field => "jmetric"
  }
  split {
    field => "points"
    add_field => {
      "metric" => "%{[points][metric]}"
      "value" => "%{[points][value]}"
    }
    remove_field => [ "points", "event", "tags", "ts", "stream", "input" ]
  }
  mutate {
    convert => { "value" => "integer" }
    convert => { "thread" => "integer" }
  }
}
You should mainly use index mappings for this.
Even if you handle things in logstash, elasticsearch will - if configured with the defaults - do dynamic mapping, which may work against any configuration you do in logstash.
See Elasticsearch index templates
An index template is a way to tell Elasticsearch how to configure an index when it is created.
...
Index templates can contain a collection of component templates, as well as directly specify settings, mappings, and aliases.
Mappings are per index! This means that when you apply a new mapping, you will have to create a new index. You can "rollover" to a new index, or delete and import your data again. What you do depends on your data, how you receive it, etc. YMMV...
No matter what, if your index has the wrong mapping you will need to create a new index to get the new mapping.
PS: If you have a lot of legacy data, take a look at the reindex API for Elasticsearch.
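That said, if you also want logstash itself to stop coercing strings to 0, a hedged sketch (not part of the answer above, just an illustration using a regex conditional) is to guard the convert so it only runs when value looks numeric:
filter {
  # only convert when value is purely digits; otherwise leave it as a string
  if [value] =~ /^[0-9]+$/ {
    mutate { convert => { "value" => "integer" } }
  }
  mutate { convert => { "thread" => "integer" } }
}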

Creating dynamic Key-Value pairs in logstash

I have the following data in logstash output:
"Details" => "SAID,:EGT1_M2P7_01,::LIP,:10-168-98-203::RIP,:10-81-122-84:",
I want to make dynamic key-value pairs according to these delimiters:
",:" means that "SAID" is the key and "EGT1_M2P7_01" is the value
"::" means that it is a new line and again ",:" means that "LIP" is the key and "10-168-98-203" is the value.
I need to know how to do this. Looking forward to answers.
For the input you have given:
"SAID,:EGT1_M2P7_01,::LIP,:10-168-98-203::RIP,:10-81-122-84:"
this filter configuration with stdout output
filter {
  kv {
    source => "Details"
    field_split => "::"
    value_split => ":"
  }
  mutate {
    remove_field => [ "host", "@timestamp", "@version", "message", "sequence" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
gives you
{
    "LIP," => "10-168-98-203",
    "SAID," => "EGT1_M2P7_01,",
    "RIP," => "10-81-122-84"
}
Remove any additional fields that are specific to your host system by adding them to the remove_field list above.
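Note that field_split and value_split are treated as sets of single characters, which is why the keys above keep a trailing comma. If your kv filter version supports the pattern-based split options, a sketch like the following (untested; option availability depends on the plugin version) may give cleaner keys:
filter {
  kv {
    source => "Details"
    # split pairs on the literal "::" and key/value on the literal ",:"
    field_split_pattern => "::"
    value_split_pattern => ",:"
    # strip stray commas/colons left around keys and values
    trim_key => ",:"
    trim_value => ",:"
  }
}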

Logstash - change value of field in cloned document (logstash-clone filter plugin)

Logstash 7.8.1
I'm trying to create two documents from one input with logstash. Different templates, different output indexes. Everything worked fine until I tried to change a value only on the cloned doc.
I need to have one field in both documents with different values - is it possible with clone filter plugin?
Doc A - [test][event]- trn
Doc B (cloned doc) - [test][event]- spn
I thought it would work if I used remove_field and then add_field in the clone plugin, but I'm afraid there was a problem with ordering - maybe remove_field is called after add_field (the field was only removed, not added with the new value).
Next I tried to add the value to the cloned document first and then to the original, but it always created an array with both values (original and cloned), and I need only one value in that field :/
Can someone help me please?
Config:
input {
  file {
    path => "/opt/test.log"
    start_position => beginning
  }
}
filter {
  grok {
    match => { "message" => "... grok...." }
  }
  mutate {
    add_field => { "[test][event]" => "trn" }
  }
  clone {
    clones => ["cloned"]
    #remove_field => [ "[test][event]" ] #remove the field completely
    add_field => { "[test][event]" => "spn" } #not added
    add_tag => [ "spn" ]
  }
}
output {
  if "spn" in [tags] {
    elasticsearch {
      index => "spn-%{+yyyy.MM}"
      hosts => ["localhost:9200"]
      template_name => "templ1"
    }
    stdout { codec => rubydebug }
  } else {
    elasticsearch {
      index => "trn-%{+yyyy.MM}"
      hosts => ["localhost:9200"]
      template_name => "templ2"
    }
    stdout { codec => rubydebug }
  }
}
If you want to make the field that is added conditional on whether the event is the clone or the original, then check the [type] field.
clone { clones => ["cloned"] }
if [type] == "cloned" {
  mutate { add_field => { "foo" => "spn" } }
} else {
  mutate { add_field => { "foo" => "trn" } }
}
add_field is always done before remove_field.
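Putting that together with the config from the question, a sketch could look like the following; it assumes the clone filter sets [type] to "cloned" on the copies (as the answer above relies on), and the grok pattern stays elided as in the question:
filter {
  grok {
    match => { "message" => "... grok...." }
  }
  clone { clones => ["cloned"] }
  # set the field differently on the clone and the original
  if [type] == "cloned" {
    mutate { add_field => { "[test][event]" => "spn" } }
  } else {
    mutate { add_field => { "[test][event]" => "trn" } }
  }
}
output {
  if [type] == "cloned" {
    elasticsearch {
      index => "spn-%{+yyyy.MM}"
      hosts => ["localhost:9200"]
      template_name => "templ1"
    }
  } else {
    elasticsearch {
      index => "trn-%{+yyyy.MM}"
      hosts => ["localhost:9200"]
      template_name => "templ2"
    }
  }
}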

How to validate the data format in Logstash after parsing with KV filter

I have log messages like:
2017-01-06 19:27:53,893 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=941753644, oslocale=JPN, fng=CJ6FRE1208VMNRQG, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, llmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=10002, siid=240, skum=21356539, skup=01001230, psn=O749UPCN8KSY, cip=84.100.138.144, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=7428, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=ah00s8CIdqUQyW2V, sasvid=106, xlsid=3730, baseactkey=186635290403122706518307794, coupon=651218, translogid=75033f05-9cf2-48e2-b924-fc2441d11d33}
2017-01-06 19:28:03,894 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=228228131, oslocale=JPN, fng=1TA6U8RVL0JQXA0N, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, lpmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=0000, siid=240, skum=21356539, skup=01001230, psn=28MHHH2VPR4T, cip=222.230.107.165, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=1027, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=StrlisGXA4yAt1ad, sasvid=130, xlsid=2820, baseactkey=028200017462383754273799438, coupon=123456, translogid=72df4536-6038-4d1c-b213-d0ff5c3c20fb}
I use the below grok pattern to match against these:
(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}
After that, I use the KV filter to split the fields in the logmsg field and include only those that are of interest to me. My question is: how can I validate the format of those fields? One thing I need to mention: the log contains different numbers of fields in logmsg, which is why I've used GREEDYDATA.
My Logstash.conf is as follows:
input {
  kafka {
    bootstrap_servers => "brokers_list"
    topics => ["transaction-log"]
    codec => "json"
  }
}
filter {
  grok {
    match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}" ]
    #overwrite => [ "message" ]
  }
  if "_grokparsefailure" not in [tags] {
    kv {
      field_split => ", "
      source => "logmsg"
      include_keys => ["api", "fng", "status", "cip", "cpmv", "translogid", "coupon", "baseactkey", "xlsid", "sasvid", "seatid", "srcHostname", "serverId" ]
      allow_duplicate_values => false
      remove_field => [ "message", "kafka.*", "logmsg" ]
    }
  }
  if [api] != "228228131" {
    mutate { add_tag => "_grokparsefailure" }
  }
  date { # use the timestamp from the log
    match => [ "timestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
    target => "@timestamp"
  }
  mutate {
    remove_field => [ "timestamp" ] # remove unused stuff
  }
}
output {
  if "_grokparsefailure" not in [tags] {
    kafka {
      topic_id => "valid-topic"
      bootstrap_servers => "brokers_list"
      codec => json {}
    }
  } else {
    kafka {
      topic_id => "invalid-topic"
      bootstrap_servers => "brokers_list"
      codec => json {}
    }
  }
}
After parsing with the KV filter, I check the value of the api field, and if it is NOT equal to 228228131, I add the _grokparsefailure tag and don't process the event further.
I want to be able to validate the format of the fields listed in include_keys, like cip, which is the client IP. How can I validate the data format for those fields? Since my log contains a varying number of fields, I can't validate at the grok level. Only after parsing with KV can I get those fields and validate them. By validation I mean verifying that they conform to the type defined in the ES index. If they do not conform, I want to send them to the invalid topic in Kafka.
Should I use a ruby filter to validate? If so, can you give me a sample? Or should I rebuild the event after KV parsing and run grok again on that newly created event?
I would very much appreciate a sample showing this.
A concrete example would have helped, but you can check for a lot of things with regular expressions:
if [myField] =~ /^[0-9]+$/ {
  # it contains only digits
}
or something like this:
if [myField] =~ /^[a-z]+$/ {
  # it contains only lowercase letters
}
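For the cip example from the question, a hedged sketch along the same lines (a simple dotted-quad check, not full IP validation) that reuses the _grokparsefailure routing already in the config might be:
filter {
  # tag events whose cip does not look like an IPv4 address
  if [cip] and [cip] !~ /^[0-9]{1,3}(\.[0-9]{1,3}){3}$/ {
    mutate { add_tag => [ "_grokparsefailure" ] }
  }
}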

Retrieving RESTful GET parameters in logstash

I am trying to get logstash to parse key-value pairs in an HTTP GET request from my ELB log files.
The request field looks like:
http://aaa.bbb/get?a=1&b=2
I'd like there to be a field for a and b in the log line above, and I am having trouble figuring it out.
My logstash conf (formatted for clarity) is below; it does not load any additional key fields. I assume that I need to split off the address portion of the URI, but I have not figured that out.
input {
  file {
    path => "/home/ubuntu/logs/**/*.log"
    type => "elb"
    start_position => "beginning"
    sincedb_path => "log_sincedb"
  }
}
filter {
  if [type] == "elb" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp}
        %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int}
        %{IP:backend_ip}:%{NUMBER:backend_port:int}
        %{NUMBER:request_processing_time:float}
        %{NUMBER:backend_processing_time:float}
        %{NUMBER:response_processing_time:float}
        %{NUMBER:elb_status_code:int}
        %{NUMBER:backend_status_code:int}
        %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int}
        %{QS:request}" ]
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    kv {
      field_split => "&?"
      source => "request"
      exclude_keys => ["callback"]
    }
  }
}
output {
  elasticsearch { host => localhost }
}
kv will take a URL and split out the params. This config works:
input {
  stdin { }
}
filter {
  mutate {
    add_field => { "request" => "http://aaa.bbb/get?a=1&b=2" }
  }
  kv {
    field_split => "&?"
    source => "request"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
stdout shows:
{
    "request" => "http://aaa.bbb/get?a=1&b=2",
    "a" => "1",
    "b" => "2"
}
That said, I would encourage you to create your own versions of the default URI patterns so that they set fields. You can then pass the querystring field off to kv. It's cleaner that way.
UPDATE:
For "make your own patterns", I meant to take the existing ones and modify them as needed. In logstash 1.4, installing them was as easy as putting them in a new file the 'patterns' directory; I don't know about patterns for >1.4 yet.
MY_URIPATHPARAM %{URIPATH}(?:%{URIPARAM:myuriparams})?
MY_URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{MY_URIPATHPARAM})?
Then you could use MY_URI in your grok{} pattern and it would create a field called myuriparams that you could feed to kv{}.
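A sketch of how that could be wired up, assuming the two custom patterns above are installed in the patterns directory (MY_URI and myuriparams are the hypothetical names from this answer):
filter {
  grok {
    # match the URI inside the ELB request field; MY_URIPATHPARAM captures
    # the query string into myuriparams
    match => [ "request", "%{MY_URI}" ]
  }
  kv {
    source => "myuriparams"
    field_split => "&?"
  }
}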
