Logstash multiline input works locally but not on an EC2 instance

I've tried to parse it using the json and json_lines codecs and even the multiline codec, to no avail. The multiline codec works well on my local machine but doesn't seem to work on my S3 and EC2 instances.
How would I write the grok filter to parse this?
This is what my JSON file looks like:
{
"sourceId":"94:54:93:3B:81:6F1",
"machineId":"c1VR21A0GoCBgU6EMJ78d3CL",
"columnsCSV":"timestamp,state,0001,0002,0003,0004",
"tenantId":"iugcp",
"valuesCSV":"1557920277890,1,98.66,0.07,0.1,0.17 ",
"timestamp":"2019-05-15T11:37:57.890Z"
}
This is my config:
input {
  file {
    codec => multiline {
      pattern => '^\{'
      negate => true
      what => previous
    }
    path => "/home/*myusername*/Desktop/data/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  mutate {
    replace => [ "message", "%{message}}" ]
    gsub => [ 'message','\n','']
  }
  if [message] =~ /^{.*}$/ {
    json { source => message }
  }
}
# The output section is correct; it is not included here
The result I get is just the contents of the JSON file in the "message" field.
What I want is a separate field in the document for every JSON key.
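To see what the multiline settings above are meant to do, here is a minimal pure-Ruby simulation; it is not Logstash code, and the helper group_previous and the sample lines are hypothetical. With pattern => '^\{', negate => true and what => previous, every line that does not start with { is appended to the previous event. It also shows one thing worth checking: an object is only emitted once another line starting with { arrives, so the last object in a file (or the only one) can stay in the codec's buffer. The multiline codec's auto_flush_interval setting is intended for that case.

```ruby
require 'json'

# Simulates a multiline codec configured with pattern => '^\{',
# negate => true, what => previous: a line that does NOT match the pattern
# is glued onto the previous event; a matching line starts a new event.
# Returns [emitted_events, event_still_buffered].
def group_previous(lines, pattern = /^\{/)
  events = []
  buffer = []
  lines.each do |line|
    if line.match?(pattern) && !buffer.empty?
      events << buffer.join("\n") # a new "{" line flushes the buffered event
      buffer = []
    end
    buffer << line
  end
  [events, buffer.empty? ? nil : buffer.join("\n")]
end

lines = ['{', '"tenantId":"iugcp",', '"timestamp":"2019-05-15T11:37:57.890Z"', '}',
         '{', '"tenantId":"other"', '}']
events, pending = group_previous(lines)
events.each { |e| p JSON.parse(e) } # each joined event decodes into separate keys
p pending                           # the last object is still waiting for a flush
```

Once an event holds a complete object, JSON parsing it yields one key per field, which is the result the question is after.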

Related

In Logstash, how do I extract fields from a log event using the json filter?

Logstash v2.4.1.
I'm sending JSON-formatted logs to my Logstash server via UDP packets. The logs look similar to this:
{
"key1":"value1",
"key2":"value2",
"msg":"2017-03-02 INFO [com.company.app] Hello world"
}
This is my output section:
output {
  stdout {
    codec => rubydebug
  }
  file {
    path => "/var/log/trm/debug.log"
    codec => line { format => "%{msg}" }
  }
}
The rubydebug output codec shows the log like this:
{
  "message" => "{\"key1\":\"value1\", \"key2\":\"value2\", \"msg\":\"2017-03-02 INFO [com.company.app] Hello world\"}"
}
and the file output also shows the JSON log correctly, like this:
{"key1":"value1", "key2":"value2", "msg":"2017-03-02 INFO [com.company.app] Hello world"}
When I use the json codec in the input, I get _jsonparsefailure tags from Logstash on "some" logs, even though several online JSON parsers parse the JSON correctly, meaning my logs are valid JSON.
input {
  udp {
    port => 5555
    codec => json
  }
}
Therefore, I'm trying to use the json filter instead, like this:
filter {
  json {
    source => "message"
  }
}
Using the json filter, how can I extract the "key1", "key2", and "msg" fields from "message"?
I tried this to no avail; I don't see the "key1" field in my rubydebug output.
filter {
  json {
    source => "message"
    add_field => {
      "key1" => "%{[message][key1]}"
    }
  }
}
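Without a target option, the json filter writes the decoded keys at the top level of the event, while "message" stays a plain string. So %{[message][key1]} has nothing to resolve, and %{key1} is the reference to use (setting target => "parsed" would instead nest the keys under [parsed]). Below is a rough pure-Ruby model of that, with the event as a Hash; json_filter and field_ref are hypothetical helpers, not Logstash APIs.

```ruby
require 'json'

# Rough model of the json filter: decode the source field and either merge the
# keys into the top level of the event (no target) or nest them under target.
def json_filter(event, source: "message", target: nil)
  decoded = JSON.parse(event[source])
  if target
    event[target] = decoded
  else
    event.merge!(decoded)
  end
  event
end

# Minimal stand-in for a Logstash field reference such as key1 or [message][key1].
def field_ref(event, ref)
  path = ref.scan(/\[([^\]]+)\]/).flatten
  path = [ref] if path.empty?
  path.reduce(event) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

event = { "message" => '{"key1":"value1","key2":"value2","msg":"2017-03-02 INFO [com.company.app] Hello world"}' }
json_filter(event)
p field_ref(event, "key1")            # decoded key sits at the top level
p field_ref(event, "[message][key1]") # message is still a String, so nothing here
```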
I would suggest starting with one of the two configurations below. I use the multiline codec to concatenate the input into a single JSON document, because otherwise Logstash reads the file line by line, and one line of a JSON object is not valid JSON. Then either apply the json filter or use the json codec, and send the output wherever it is needed. You will still have some configuration to do, but this should help you get started:
input {
  file {
    path => "/an/absolute/path/tt2.json" # It really has to be absolute!
    start_position => beginning
    sincedb_path => "/another/absolute/path" # Not mandatory, just for ease of testing
    codec => multiline {
      # A line that does not start with "{" is appended to the previous event
      pattern => "^\{"
      negate => true
      what => "previous"
    }
  }
}
filter {
  json {
    # The multiline codec puts the joined text in the "message" field
    source => "message"
  }
}
output {
  file {
    path => "data/log/trm/debug.log"
  }
  stdout { codec => json }
}
Second possibility:
input {
  file {
    path => "/an/absolute/path/tt2.json" # It really has to be absolute!
    start_position => beginning
    sincedb_path => "/another/absolute/path" # Not mandatory, just for ease of testing
    # An input takes a single codec, so multiline and json cannot be stacked here.
    # The json codec alone works when each JSON object sits on a single line.
    codec => json {}
  }
}
output {
  file {
    path => "data/log/trm/debug.log"
  }
  stdout { codec => json }
}
Edit: with the udp input, I guess it should be something like this (not tested):
input {
  udp {
    port => 5555
    codec => multiline { # not tested this part
      pattern => "^\{"
      negate => true
      what => "previous"
    }
  }
}
filter {
  json {
    source => "message"
  }
}

Unable to create CSV output from Logstash

Maybe it's just me, but why does the csv output in Logstash not write in CSV format? I am not using anything special (see the configuration). Can someone tell me what I am doing wrong?
input {
  stdin {
    type => "stdin-type"
  }
}
filter {
  mutate { add_field => { "test" => "testme" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][test2]" => "world" } }
}
output {
  # .\bin\logstash-plugin.bat install logstash-output-csv
  csv {
    fields => ["test", "[@metadata][test]"]
    path => "./TestLogs.csv"
  }
  stdout { codec => rubydebug { metadata => true } }
}
It does create output: if I type something (e.g., test me) in the console (stdin), it creates the file. But the CSV file contains the following:
2016-11-25T11:49:40.338Z MyPcName test me
And I am expecting the following:
testme,Hello
Note: I am using Logstash 5 (the latest version at the moment).
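This is a Logstash 5.x issue. For now, I'm using the configuration below:
output {
  file {
    path => "/app/logstash/test.csv"
    # Write each event as one comma-separated line built from the question's fields
    codec => line { format => "%{test},%{[@metadata][test]}" }
  }
}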
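For reference, the line the question expects (testme,Hello) is just the configured fields looked up on the event and joined into one CSV row. Here is a pure-Ruby illustration using the standard csv library, with the event modelled as nested Hashes; csv_row is a hypothetical helper, not the plugin's code.

```ruby
require 'csv'

# Look up each configured field (plain name or [a][b] reference) on the event
# and render the values as a single CSV line.
def csv_row(event, fields)
  values = fields.map do |f|
    path = f.scan(/\[([^\]]+)\]/).flatten
    path = [f] if path.empty?
    path.reduce(event) { |node, key| node.is_a?(Hash) ? node[key] : nil }
  end
  CSV.generate_line(values)
end

event = {
  "message"   => "test me",
  "test"      => "testme",
  "@metadata" => { "test" => "Hello", "test2" => "world" }
}
print csv_row(event, ["test", "[@metadata][test]"]) # prints testme,Hello
```

Using CSV.generate_line rather than a plain join keeps values that contain commas or quotes correctly escaped.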

How to call another filter from within a Ruby filter in Logstash

I'm building out Logstash and would like to add functionality to anonymize the fields specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows:
input {
  stdin { codec => json }
}
filter {
  if [containsPII] {
    anonymize {
      algorithm => "SHA1"
      key => "123456789"
      fields => %{fta}
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
The output is:
{
  "containsPII" => "True",
  "fta" => [
    [0] "f1",
    [1] "f2"
  ],
  "f1" => "test",
  "f2" => "5551212",
  "@version" => "1",
  "@timestamp" => "2016-07-13T22:07:04.036Z",
  "host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base Logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck, and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
  if [containsPII] {
    ruby {
      code => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
      add_tag => ["Rubyrun"]
    }
  }
}
You can execute filters from a Ruby script. The steps are:
1. Create the required filter instance in the init block of the inline Ruby script.
2. For every event, call the filter method of the filter instance.
The following example addresses the problem above. It replaces the my_ip field in the event with its SHA1 hash.
The same can be achieved with a Ruby script file.
Here is the sample config file:
input { stdin { codec => json_lines } }
filter {
  ruby {
    init => "
      require 'logstash/filters/anonymize'
      # Create an instance of the filter with the applicable parameters
      @anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
                                                     'key' => '123456789',
                                                     'fields' => ['my_ip']})
      # Make sure to call register
      @anonymize.register
    "
    code => "
      # Invoke the filter
      @anonymize.filter(event)
    "
  }
}
output { stdout { codec => rubydebug { metadata => true } } }
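The two steps above follow the usual filter plugin lifecycle: construct the filter from a params Hash, call register once, then call filter for each event. Below is a plain-Ruby stand-in with that shape. HashFieldsFilter is hypothetical and only mimics the shape of LogStash::Filters::Anonymize (the real class needs a running Logstash to load); using a keyed SHA1 via OpenSSL::HMAC is an assumption made for the illustration.

```ruby
require 'openssl'

# Hypothetical stand-in shaped like a Logstash filter plugin: built once from a
# params Hash, registered once, then applied to every event.
class HashFieldsFilter
  def initialize(params)
    @key = params.fetch('key')
    @fields = params.fetch('fields')
    @algorithm = nil
  end

  # One-time setup; in the inline ruby filter this belongs in the init block.
  def register
    @algorithm = 'SHA1'
  end

  # Per-event work; in the inline ruby filter this belongs in the code block.
  def filter(event)
    raise 'register must be called before filter' if @algorithm.nil?
    @fields.each do |field|
      next unless event.key?(field)
      event[field] = OpenSSL::HMAC.hexdigest(@algorithm, @key, event[field].to_s)
    end
    event
  end
end

anonymize = HashFieldsFilter.new({ 'key' => '123456789', 'fields' => ['my_ip'] })
anonymize.register # created and registered once...
[{ 'my_ip' => '10.0.0.1' }, { 'my_ip' => '10.0.0.2' }].each do |event|
  p anonymize.filter(event) # ...then reused for every event
end
```

Keeping construction and register out of the per-event path is the point of the init block: the setup cost is paid once per pipeline rather than once per event.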
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did reach the functional goal:
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item] ) }"
    }
  }
}
If the fta field exists, each field listed in that array is replaced with its HMAC-SHA256 hex digest (keyed with '123456789').
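The ruby filter above uses the event['field'] syntax from Logstash 2.x; from Logstash 5.0 onward, the event is accessed with event.get('field') and event.set('field', value) instead. The hashing logic itself can be tried outside Logstash with plain Ruby. FakeEvent below is a hypothetical stand-in that only exposes get/set, and hash_listed_fields is an illustrative helper, not a Logstash API.

```ruby
require 'openssl'

# Hypothetical stand-in for a Logstash 5.x+ event, which exposes get/set
# instead of event['field']. For illustration outside Logstash only.
class FakeEvent
  def initialize(data)
    @data = data
  end

  def get(field)
    @data[field]
  end

  def set(field, value)
    @data[field] = value
  end
end

# Same logic as the ruby filter above: HMAC-SHA256 every field named in "fta",
# skipping names that are not present on the event.
def hash_listed_fields(event, key)
  Array(event.get('fta')).each do |field|
    value = event.get(field)
    next if value.nil?
    event.set(field, OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, value.to_s))
  end
  event
end

event = FakeEvent.new({ 'containsPII' => 'True', 'fta' => ['f1', 'f2'], 'f1' => 'test', 'f2' => '5551212' })
hash_listed_fields(event, '123456789')
p event.get('f1') # 64-character hex HMAC-SHA256 of "test"
```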

Avoid collision of 'type' key when both event and logstash input provide values

I am trying to write a pipeline to load a file into Logstash. My setup requires specifying the type field in the input section so that I can run multiple independent Logstash config files, each with its own input, filter, and output. Unfortunately, the source data already contains a type field, and it looks like the value from the source data conflicts with the value provided in the input configuration.
The source data contains a JSON array like the following:
[
{"key1":"obj1", "type":"looks like a bad choose for a key name"},
{"key1":"obj2", "type":"you can say that again"}
]
My pipeline looks like the following:
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    type => "typeSpecifiedInInput"
    interval => 3600
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
The output never gets called because type has been set to the value provided from the source data instead of the value provided from the input section.
How can I set up the input pipeline to avoid this conflict?
Nathan
Create a new field in your input instead of reusing 'type'. The exec{} input has add_field available.
Below is the final pipeline, which uses add_field instead of type. A filter phase was added to clean up the document so that the type field contains the value needed for writing into Elasticsearch (the class of similar documents). The type value from the original JSON document is preserved in the key typeSpecifiedFromDoc. The mutate step had to be split into two separate mutate filters so that the replace would not change type before its original value had been copied into the new field typeSpecifiedFromDoc.
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => ["msgType", "typeSpecifiedInInput"]
    interval => 3600
  }
}
filter {
  if [msgType] == "typeSpecifiedInInput" {
    mutate {
      add_field => ["typeSpecifiedFromDoc", "%{type}"]
    }
    mutate {
      replace => ["type", "%{msgType}"]
      remove_field => ["msgType"]
    }
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
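The final pipeline needs two mutate blocks because, within a single mutate, operations such as replace run before common options like add_field, so %{type} would already hold the replaced value. Below is a plain-Ruby model of the whole flow, with events as Hashes. apply_input_type and normalize_type are hypothetical helpers, not Logstash code. The model also covers the input-side rule behind the original problem: an input's type setting does not overwrite a type already present on the event.

```ruby
# Models an input's `type` setting: it is only applied when the event does
# not already carry a "type" field (why the question's output was skipped).
def apply_input_type(event, type)
  event["type"] = type unless event.key?("type")
  event
end

# Models the two mutate blocks, in order:
# 1) add_field typeSpecifiedFromDoc => %{type}   (copy the document's type)
# 2) replace type => %{msgType}, then remove_field msgType
def normalize_type(event)
  event["typeSpecifiedFromDoc"] = event["type"]
  event["type"] = event["msgType"]
  event.delete("msgType")
  event
end

doc = { "key1" => "obj1", "type" => "you can say that again" }
p apply_input_type(doc.dup, "typeSpecifiedInInput")["type"] # the document's value wins
p normalize_type(doc.merge({ "msgType" => "typeSpecifiedInInput" }))
```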

Parse a log file of JSON using Logstash

I have JSON objects like the following in a log file:
{"con":"us","sl":[[1,2]],"respstats_1":"t:2,ts:140,m:192.168.7.5,p:|mobfox:1,P,E,0,0.4025:0.0:-:-,0-98;appnexus-marimedia:2,P,L,140,0.038:0.0:-:-,-;","rid":"AKRXRWLYCZIDFM","stats":"t:2,h:2,ts:140,mobfox:0,appnexus-marimedia:140,;m:192.168.7.5;p:","resp_count":0,"client_id":"15397682","err_stats":"mobfox:0-98,"}
{"con":"br","sl":[[1,2,3,4]],"respstats_1":"t:4,ts:285,m:192.168.7.5,p:|smaato:1,P,M,143,0.079:0.0:-:-,-;vserv-specialbuy:2,P,W,285,0.0028:0.0:-:-,-;mobfox:3,P,E,42,0.077:0.0:-:-,0-98;inmobi-pre7:4,P,H,100,0.0796:0.0:-:-,-;","rid":"AKRXRWLYCY4DOU","stats":"t:4,h:4,ts:285,smaato:143,vserv-specialbuy:285,mobfox:42,inmobi-pre7:100,;m:192.168.7.5;p:","resp_count":1,"client_id":"15397682","err_stats":"mobfox:0-98,","ads":[{"pricing":{"price":"0","type":"cpc"},"rank":2,"resp_json":{"img_url":"http://img.vserv.mobi/i/320x50_7/7bfffd967a91e0e38ee06ffcee1a75e5.jpg?108236_283989_c46e3f74","cli_url":"http://c.vserv.mobi/delivery/ck.php?p=2__b=283989__zoneid=108236__OXLCA=1__cb=c46e3f74__dc=1800__cd=usw3_uswest2a-1416567600__c=37742__rs=0a587520_15397682__mser=cdn__dat=3__dacp=12__zt=s__r=http%3A%2F%2Fyeahmobi.go2cloud.org%2Faff_c%3Foffer_id%3D28007%26aff_id%3D10070%26aff_sub%3D108236_283989_c46e3f74","beacons":["http://img.vserv.mobi/b.gif"],"ad_type":"image"},"resp_code":200,"resp_html":"<a href=\"http://c.vserv.mobi/delivery/ck.php?p=2__b=283989__zoneid=108236__OXLCA=1__cb=c46e3f74__dc=1800__cd=usw3_uswest2a-1416567600__c=37742__rs=0a587520_15397682__mser=cdn__dat=3__dacp=12__zt=s__r=http%3A%2F%2Fyeahmobi.go2cloud.org%2Faff_c%3Foffer_id%3D28007%26aff_id%3D10070%26aff_sub%3D108236_283989_c46e3f74\"><img src=\"http://img.vserv.mobi/i/320x50_7/7bfffd967a91e0e38ee06ffcee1a75e5.jpg?108236_283989_c46e3f74\" alt=\"\" /> <\/a><img src=\"http://img.vserv.mobi/b.gif\" alt=\"\" />","tid":"vserv-specialbuy","bid":"576111"}]}
However, I can't tell whether they are multiline or single-line, so I used the following configuration:
input {
  file {
    codec => multiline {
      pattern => '^{'
      negate => true
      what => previous
    }
    path => ['/home/pp38/fetcher.log']
  }
}
filter {
  json {
    source => message
    remove_field => message
  }
}
output { stdout { codec => rubydebug } }
I don't see any output or error when it starts.
Edit:
I used the following config, which did generate output:
input {
  file {
    codec => "json"
    type => "json"
    path => "/home/pp38/fetcher.log"
    sincedb_path => "/home/pp38/tmp/logstash/sincedb"
  }
}
filter {
  json {
    source => "message"
    target => "message"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
But in the output, each field is indexed separately by Elasticsearch.
How can I put the entire JSON message into a new field, as message:jsonContent?
You can handle this with the plain multiline codec, but for your situation there is a better codec plugin called json_lines.
The json_lines codec takes a source containing multiple JSON objects (one per line) and handles each object out of the box.
From the json_lines codec documentation: "This codec will decode streamed JSON that is newline delimited. Encoding will emit a single JSON string ending in a \n. NOTE: Do not use this codec if your source input is line-oriented JSON, for example, redis or file inputs. Rather, use the json codec. More info: This codec is expecting to receive a stream (string) of newline terminated lines. The file input will produce a line string without a newline. Therefore this codec cannot work with line oriented inputs."
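Since each object in fetcher.log sits on a single line, no joining is needed: every line is a complete JSON document and can be decoded on its own, which is what the json codec does per line on a file input. Below is a plain-Ruby illustration with the stdlib JSON module. decode_json_lines is a hypothetical helper, not the codec, and the two sample lines are shortened versions of the log lines in the question. Each result keeps the raw line in "message" and the decoded object under "jsonContent", roughly what a json filter with target => "jsonContent" would produce.

```ruby
require 'json'

# Decode newline-delimited JSON: every non-blank line is one complete object.
# The raw line is kept in "message" and the decoded Hash under "jsonContent".
def decode_json_lines(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    { "message" => line, "jsonContent" => JSON.parse(line) }
  end
end

log = <<~LOG
  {"con":"us","rid":"AKRXRWLYCZIDFM","resp_count":0,"client_id":"15397682"}
  {"con":"br","rid":"AKRXRWLYCY4DOU","resp_count":1,"client_id":"15397682"}
LOG
decode_json_lines(log).each { |e| p e["jsonContent"]["con"] }
```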
