A piece of data I read from Kafka looks like aaaabbbb\u0000�\u0005{"a":"1"}.
How can I use \u0000�\u0005 as the delimiter?
I tried editing the config file with vim, using Ctrl+V to insert u0000, ufffd, and u0005:
filter {
  dissect {
    mapping => {
      "message" => "%{header}^@�^E%{body}"
    }
  }
  json {
    source => "body"
    target => "body"
  }
}
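One way to sidestep typing raw control characters into the config at all, as a hedged sketch (assuming the delimiter bytes are exactly U+0000, U+FFFD, U+0005): mutate's gsub patterns are compiled as Ruby regexes, so \u escapes written as plain text in the pattern are interpreted, and the delimiter can be normalized to a printable character before dissecting.
filter {
  mutate {
    # the pattern string is a Ruby regex, so these \u escapes match the raw bytes
    gsub => [ "message", "\u0000\uFFFD\u0005", "|" ]
  }
  dissect {
    mapping => { "message" => "%{header}|%{body}" }
  }
  json {
    source => "body"
    target => "body"
  }
}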
I've tried parsing it with the json and json_lines codecs and even the multiline codec, to no avail. The multiline approach works well on my local machine but doesn't seem to work on my S3 and EC2 instance.
How would I write the grok filter to parse this?
This is what my JSON file looks like
{
  "sourceId": "94:54:93:3B:81:6F1",
  "machineId": "c1VR21A0GoCBgU6EMJ78d3CL",
  "columnsCSV": "timestamp,state,0001,0002,0003,0004",
  "tenantId": "iugcp",
  "valuesCSV": "1557920277890,1,98.66,0.07,0.1,0.17 ",
  "timestamp": "2019-05-15T11:37:57.890Z"
}
This is my config -
input {
  file {
    codec => multiline {
      pattern => '^\{'
      negate  => true
      what    => previous
    }
    path           => "/home/*myusername*/Desktop/data/*.json"
    start_position => "beginning"
    sincedb_path   => "/dev/null"
  }
}
filter {
  mutate {
    replace => [ "message", "%{message}}" ]
    gsub    => [ "message", "\n", "" ]
  }
  if [message] =~ /^{.*}$/ {
    json { source => "message" }
  }
}
# The output section is correct; I haven't included it here.
The result I get is just the whole JSON file sitting in the "message" field.
What I want is a separate field in the document for every JSON key.
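One possible culprit, as a hedged sketch: if the reassembled message carries leading or trailing whitespace, the /^{.*}$/ guard never matches, the json filter never runs, and everything stays in message. Stripping the field first (mutate's strip option) and keeping the json filter without a target would look like:
filter {
  mutate {
    replace => [ "message", "%{message}}" ]  # restore the closing brace the multiline pattern splits off
    gsub    => [ "message", "\n", "" ]       # collapse the pretty-printed object to one line
    strip   => [ "message" ]                 # trim whitespace so the guard below can match
  }
  if [message] =~ /^\{.*\}$/ {
    json { source => "message" }             # no target: each JSON key becomes a top-level field
  }
}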
I am using Logstash to parse a JSON input message and then add another field derived from one of the parsed values:
filter {
  json {
    source => "message"
    target => "data"
  }
  mutate {
    add_field => {
      "index_date" => "%{[data][@timestamp]}"
    }
  }
}
This works fine, but now I need index_date to contain only the date.
How can I format the [data][@timestamp] field to return only the date?
You will need to install the date_formatter plugin with
bin/logstash-plugin install logstash-filter-date_formatter
And then you can use something like this in your Logstash filter block:
date_formatter {
  source  => "index_date"
  target  => "[@metadata][indexDateOnlyDate]"
  pattern => "YYYY.MM.dd"
}
This should work :)
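If installing an extra plugin is not an option, here is a hedged sketch using only the stock mutate filter, assuming [data][@timestamp] is an ISO8601 string such as 2016-07-13T22:07:04.036Z. Two mutate blocks are needed because, within a single mutate, gsub runs before add_field:
filter {
  mutate {
    add_field => { "index_date" => "%{[data][@timestamp]}" }
  }
  mutate {
    # keep only the date part and swap the dashes for dots: 2016-07-13 -> 2016.07.13
    gsub => [ "index_date", "T.*$", "",
              "index_date", "-", "." ]
  }
}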
I have a filename in the format <key>:<value>-<key>:<value>.log, e.g. pr:64-author:mxinden-platform:aws.log, containing the logs of a test run.
I want to stream each line of the file to Elasticsearch via Logstash. Each line should be treated as a separate document, and each document should get fields derived from the filename. E.g. for the example above, the log line 17-12-07 foo something happened bar would get the fields pr with value 64, author with value mxinden, and platform with value aws.
At the time I write the Logstash configuration, I do not know the names of the fields.
How do I dynamically add fields to each line based on the fields contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
  grok { match => { "file" => "author:%{USERNAME:author}-" } }
  grok { match => { "file" => "platform:%{USERNAME:platform}" } }
}
Changes to the filename structure are fine.
Answering my own question based on @dan-griffiths' comment:
The solution for a file like pr=64,author=mxinden,platform=aws.log is to use the Logstash kv filter, e.g.:
filter {
  kv {
    source      => "file"
    field_split => ","
  }
}
where file is a field extracted from the filename via the AWS S3 input plugin.
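For completeness, a hedged sketch of the whole chain with the renamed files, assuming the S3 input is in use and the .log suffix should not end up in the last value (two mutate blocks, because gsub runs before add_field within a single mutate):
filter {
  mutate {
    add_field => { "file" => "%{[@metadata][s3][key]}" }  # copy the object key
  }
  mutate {
    gsub => [ "file", "\.log$", "" ]                      # drop the extension so "aws" is captured, not "aws.log"
  }
  kv {
    source      => "file"
    field_split => ","
    value_split => "="
  }
}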
My log format is:
XXX: 03-20 17:52:28: XXX. * 0 XXX [XXX] [X XX: X]:XXX\tABC:AD_EF:123\t0\tXXXXXXXXXXXXXXXX\tXXXXXXXXXXXXXXXXXXX
How do I write the Logstash output config to extract ABC, AD_EF, 123?
Example output:
good,ABC,DEF,123
output {
  file {
    path  => "/xxx/xxx/xxx/output.txt"
    codec => plain {
      format => "good,ABC,DEF,123" # how to write this regular expression????
    }
    flush_interval => 0
  }
}
Your log output seems to have embedded tabs in it, and those tabs bracket your data. This is good, as it means the csv filter can pull that out for you.
filter {
  csv {
    separator => "	"
    columns   => [ 'garbage1', 'good', 'garbage2', 'garbage3', 'garbage4' ]
    source    => "message"
  }
}
Note, that is the actual tab character in there, which is hard to represent here.
You would then output the content of the good field to your file.
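A sketch of that output block, using sprintf interpolation in the plain codec:
output {
  file {
    path  => "/xxx/xxx/xxx/output.txt"
    codec => plain { format => "%{good}" }
    flush_interval => 0
  }
}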
Thanks for all the help, but maybe I made a mistake.
Finally, I got the answer to my question:
filter {
  grok {
    match => {
      "message" => "XXX\t(?<field1>\w+?):(?<field2>\w+?):(?<field3>\d+?)\t"
    }
  }
}
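With those captures in place, the file output can interpolate them to produce the desired line; a sketch:
output {
  file {
    path  => "/xxx/xxx/xxx/output.txt"
    codec => plain {
      format => "good,%{field1},%{field2},%{field3}"
    }
    flush_interval => 0
  }
}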
I'm building out Logstash and would like to add functionality to anonymize fields as specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows
input {
  stdin { codec => json }
}
filter {
  if [containsPII] {
    anonymize {
      algorithm => "SHA1"
      key       => "123456789"
      fields    => %{fta}
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
The output is
{
    "containsPII" => "True",
            "fta" => [
        [0] "f1",
        [1] "f2"
    ],
             "f1" => "test",
             "f2" => "5551212",
       "@version" => "1",
     "@timestamp" => "2016-07-13T22:07:04.036Z",
           "host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base Logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck, and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
  if [containsPII] {
    ruby {
      code    => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
      add_tag => ["Rubyrun"]
    }
  }
}
You can execute filters from a Ruby script. The steps are:
Create the required filter instance in the init block of the inline Ruby script.
For every event, call the filter method of the filter instance.
The following is an example for the problem statement above; it replaces the my_ip field in the event with its SHA1 hash.
The same can be achieved using a Ruby script file.
Here is the sample config file:
input { stdin { codec => json_lines } }
filter {
  ruby {
    init => "
      require 'logstash/filters/anonymize'
      # Create an instance of the filter with the applicable parameters
      @anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
                                                     'key'       => '123456789',
                                                     'fields'    => ['my_ip']})
      # Make sure to call register
      @anonymize.register
    "
    code => "
      # Invoke the filter
      @anonymize.filter(event)
    "
  }
}
output { stdout { codec => rubydebug { metadata => true } } }
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did get to the functional goal.
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item]) }"
    }
  }
}
If the fta field exists, this replaces each of the fields listed in that array with its HMAC-SHA256 digest.
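Note that direct hash-style access on the event (event[item]) was removed in later Logstash releases; a sketch of the same filter using the newer Event get/set API, assuming Logstash 5.x or later:
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "
        # Same HMAC-SHA256 replacement, via the Event API
        event.get('fta').each do |item|
          digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event.get(item))
          event.set(item, digest)
        end
      "
    }
  }
}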