Multi-value fields only store last value - logstash

I'm relatively new to ELK and grok. I'm trying to parse a log file that may contain 1 or more repetitions of the same value. For example the log file could contain:
value1;value2;value3;
value1;
value1;value2;value3;value4;........value900;
For this example, I'm using the following grok pattern:
((?<tag>[a-z0-9]*)[;])+
This appears to work properly and parses each value. The problem is that the "tag" field only contains the last value (i.e. value900); all of the previous values seem to be overwritten.
Is there a way to gather all of the values and store them into an array instead of only getting the last value?

Simply use mutate:
mutate {
  split => ["tag", ";"]
}
This will split the value that's in the tag field into an array. So just match the whole string in your grok with (?<tag>[a-z0-9;]+) and then split it from there.
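Putting both pieces together, a minimal filter sketch (assuming the whole message is the semicolon-delimited list and keeping the tag field name from the question):
filter {
  grok {
    match => ["message", "(?<tag>[a-z0-9;]+)"]
  }
  mutate {
    # turn "value1;value2;value3" into ["value1", "value2", "value3"]
    split => ["tag", ";"]
  }
}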

Related

Hapi/Joi Validation For Number Fails

I am trying to validate a number value that can be either an integer or a float. The following is my implementation.
Joi schema:
const numcheckschema = Joi.object().keys({
  v1: Joi.number().empty("").allow(null).default(99999),
  v2: Joi.number().empty("").allow(null).default(99999),
  v3: Joi.number().empty("").allow(null).default(99999)
})
Object:
objnum = {
  v1: "15",
  v2: "13.",
  v3: "15"
}
objValidated = Joi.validate(objnum, numcheckschema);
console.log(objValidated);
When I execute the above code I get an error:
ValidationError: child "v2" fails because ["v2" must be a number]
As per the documentation, when a numeric value is passed as a string it is converted to a number, but in this case my value is "13.", which cannot be converted to a number, so an error is thrown.
Is there any way to convert this value to 13.0?
You can use a regex in order to match numbers with a dot, for instance:
Joi.string().regex(/\d{1,2}[\,\.]{1}/)
And then combine both validations using Joi.alternatives:
Joi.alternatives().try([
  Joi.number().empty("").allow(null),
  Joi.string().regex(/\d{1,2}[\,\.]{1}/)
])
However, I think you may need to convert the payload to a number using Number(value): check the payload's type and, if it isn't already a number, convert it.
If you want to know more about the regex used in the example, you can test it here: https://regexr.com/
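For the conversion route, a minimal sketch (the coercion loop is plain JavaScript for illustration, not part of Joi):
// Coerce string payload values up front; Number("13.") === 13,
// so the coerced object then passes the original numcheckschema.
const coerced = {};
for (const key of Object.keys(objnum)) {
  const value = objnum[key];
  coerced[key] = typeof value === "string" && value !== "" ? Number(value) : value;
}
const result = Joi.validate(coerced, numcheckschema);
console.log(result); // no ValidationError for v2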

How to extract value from log with grok and logstash

I must extract values from a log composed of rows like this:
<38>1 [2017-03-15T08:45:23.168Z] apache.01.mysite.com event=login;src_ip=xxx.xxx.xxx.xxx\, xxx.xxx.xxx.xxx\, xxx.xxx.xxx.xxx;site=FE-B1-Site;cstnr=1454528;user=498119;result=SUCCESS
For example, with %{IP:source} I obtain only the first IP, but sometimes I have 3 IP addresses.
How can I extract all the IPs, 'cstnr', 'result' and 'user'?
Looks like you have a nested, delimited key-value format. The first delimiter is ;, with each of those segments being a key=value pair. Additionally, the src_ip value is itself delimited on commas. You have enough grok to get the first IP address, but I suggest doing something a bit different:
Use grok to grab the entire string after your site-name.
Use the kv filter with field_split => ';', which will create fields named the same as your keys.
Use the csv filter on the src_ip address captured in the kv filter stage.
Use the csv filter's columns option to name the split addresses (for example src_ip_1, src_ip_2, src_ip_3); the kv stage already gives you cstnr, result and user as fields. A sketch of the whole filter follows.
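A hedged sketch of that pipeline (the grok pattern and the src_ip_* column names are assumptions about your exact layout):
filter {
  grok {
    # keep everything after the host name in one field
    match => ["message", "%{SYSLOG5424PRI}%{NONNEGINT} \[%{TIMESTAMP_ISO8601:ts}\] %{NOTSPACE:site_host} %{GREEDYDATA:kvpairs}"]
  }
  kv {
    # value_split defaults to "=", so this creates event, src_ip, site, cstnr, user and result
    source => "kvpairs"
    field_split => ";"
  }
  csv {
    # the sample IPs are separated by "\, ", so you may need a mutate/gsub
    # beforehand to strip the backslash and the extra space
    source => "src_ip"
    separator => ","
    columns => ["src_ip_1", "src_ip_2", "src_ip_3"]
  }
}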

Elasticsearch: How to get the length of a string field (before analysis)?

My index has a string field containing a variable-length random id. Obviously it shouldn't be analysed.
But I didn't know much about Elasticsearch, especially when I created the index.
Today I tried a lot to filter documents based on the length of the id; finally I got to this Groovy script:
doc['myfield'].values.size()
or
doc['myfield'].value.size()
Both return mysterious numbers; I think that's because the field got analysed.
If that's really the case, is there any way to get the original length or fix the problem without rebuilding the whole index?
Use _source instead of doc. That's using the source of the document, meaning the initial indexed text:
_source['myfield'].size()
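For example, as a script_fields request (a minimal sketch, assuming the Groovy scripting setup from the question; id_length is just an illustrative name):
{
  "query": { "match_all": {} },
  "script_fields": {
    "id_length": {
      "script": "_source['myfield'].size()"
    }
  }
}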
If possible, try to re-index the documents to:
use doc[field] on a not-analyzed version of that field
even better, find out the size of the field before you index the document and consider adding its size as a regular field in the document itself
Elasticsearch stores a string in tokenized form in the data structure (the field data cache) that scripts have access to.
So, assuming that your field is not not_analyzed, doc['field'].values will look like this:
"In america" => [ "in", "america" ]
Hence what you get from doc['field'].values is an array, not a string.
The story doesn't change even if you have a single token or have the field as not_analyzed:
"america" => [ "america" ]
Now, to see the size of the first token, you can use the following request:
{
  "script_fields": {
    "test1": {
      "script": "doc['field'].values[0].size()"
    }
  }
}

How to define grok pattern for pipe delimited log message?

Setting up ELK is very easy until you hit the Logstash filter. I have a log delimited into 10 fields by pipes. Some fields may be blank, but I am sure there will always be 10 fields:
7/5/2015 10:10:18 AM|KDCVISH01|
|ClassNameUnavailable:MethodNameUnavailable|CustomerView|xwz261|ef315792-5c41-4bdf-aa66-73317e82e4d6|52|6182d1a1-7916-4874-995b-bc9a23437dab|<Exception>
afkh akla 487234 &*<Exception>
Q:
1- I am confused about how a grok or regex pattern will pick only the field I am looking for and not a similar match from another field. For example, what is the guarantee that the DATESTAMP pattern picks only the first value and not a timestamp present in the last field (buried in the stack trace)?
2- Is there a way to define a positional mapping? For example, the 1st field is dateTime, the 2nd is machine name, the 3rd is class name, and so on. This will make sure the fields are displayed in Kibana whether or not a field value is present.
I know I am a little late, but here is a simple solution which I am using.
Replace your | with a space.
Option 1:
filter {
  mutate {
    gsub => ["message", "\|", " "]
  }
  grok {
    match => ["message", "%{DATESTAMP:time} %{WORD:MESSAGE1} %{WORD:EXCEPTION} %{WORD:MESSAGE2}"]
  }
}
Option 2: escaping the |
filter {
  grok {
    match => ["message", "%{DATESTAMP:time}\|%{WORD:MESSAGE1}\|%{WORD:EXCEPTION}\|%{WORD:MESSAGE2}"]
  }
}
It works fine; you can check it at http://grokdebug.herokuapp.com/.
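If you want a strict positional mapping for all 10 fields (question 2), here is a hedged sketch that anchors on the literal pipes; the field names are only illustrative, and it assumes the multi-line exception has already been joined into a single event (e.g. with a multiline codec):
filter {
  grok {
    # one DATA capture per pipe-delimited position, so blank fields still match
    match => ["message", "^%{DATA:time}\|%{DATA:machine}\|%{DATA:field3}\|%{DATA:class_method}\|%{DATA:view}\|%{DATA:user}\|%{DATA:guid1}\|%{DATA:field8}\|%{DATA:guid2}\|%{GREEDYDATA:exception}"]
  }
}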

logstash grok filter for logs with arbitrary attribute-value pairs

(This is related to my other question logstash grok filter for custom logs )
I have a logfile whose lines look something like:
14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
14:46:17.378 [http-nio-8080-exec-3] INFO METERING - msg=c1ddb068-e6a2-450a-9f8b-7cbc1dbc222a SET_STATUS job=a820018e-7ad7-481a-97b0-bd705c3280ad status=ACTIVE final=false
I built a pattern that matched the first line:
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}
but obviously that only works for lines that have data= at the end, not for the status= and final= at the end of the second line, or other attribute-value pairs on other lines. How can I set up a pattern that says that after a certain point there will be an arbitrary number of foo=bar pairs that I want to recognize and output as attribute/value pairs?
You can change your grok pattern like this to have all the key value pairs in one field (kvpairs):
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - %{GREEDYDATA:kvpairs}
Afterwards you can use the kv filter to parse the key value pairs.
kv {
  source => "kvpairs"
  remove_field => [ "kvpairs" ] # Delete the field afterwards
}
Unfortunately, you have some simple values inside your kv pairs (e.g. CREATE_JOB). You could parse them with grok and use one kv filter for the values before and another kv filter for the values after those simple values; one way to handle this is sketched below.
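A hedged sketch of that idea (it simplifies the two-kv suggestion by letting grok capture msg and the bare action token, then running a single kv on the remainder; field names follow the question's log):
filter {
  grok {
    match => ["message", "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}%{GREEDYDATA:kvpairs}"]
  }
  kv {
    # parses job=..., data=..., status=..., final=... into separate fields
    source => "kvpairs"
    remove_field => [ "kvpairs" ]
  }
}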
