Logstash is receiving a docs JSON object which contains various types of docs.
{
"docs": [
{
"_id": "project:A",
"_rev": "project:1",
"name": "secret",
"children": ["item:A"]
},
{
"_id": "item:A",
"_rev": "item:1",
"name": "secret"
}
]
}
I want each doc with an _id starting with project to include matching children. The end result should be:
{
"docs": [
{
"_id": "project:A",
"_rev": "project:1",
"name": "secret",
"children": [{
"_id": "item:A",
"_rev": "item:1",
"name": "secret"
}]
}
]
}
How can I achieve this?
Here is my conf file. I haven't been able to figure out how to solve this:
input {
file {
path => ["/home/logstash/logstash-testdata.json"]
sincedb_path => "/dev/null"
start_position => "beginning"
}
}
filter {
json {
source => "message"
}
# ... ???
}
output {
elasticsearch {
hosts => ["localhost:9200"]
}
stdout {
codec => rubydebug
}
}
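One way to get there (an untested sketch rather than a definitive answer) is to do the regrouping in a ruby filter right after the json filter, in place of the # ... ??? placeholder. This assumes the whole docs object reaches the filter as a single event (for example because the JSON sits on one line, or a multiline codec has reassembled it first), and that every doc whose _id does not start with "project" is a potential child:
filter {
  json {
    source => "message"
  }
  ruby {
    code => '
      docs = event.get("docs") || []
      items = {}
      projects = []
      # split the docs into project docs and everything else, indexed by _id
      docs.each do |d|
        if d["_id"].to_s.start_with?("project")
          projects << d
        else
          items[d["_id"]] = d
        end
      end
      # replace the child ids of each project with the matching item docs
      projects.each do |p|
        p["children"] = (p["children"] || []).map { |id| items[id] }.compact
      end
      event.set("docs", projects)
    '
  }
}
With the sample input above, each project doc ends up with a children array holding the full item docs, and the standalone item docs are dropped from docs, as in the expected result.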
I am using Logstash 5.6.
In my document, I have a subfield "[emailHeaders][reingested-on]", and another field called [attributes], which contains several subfields ([string], [double]), each of which is an array:
{
"emailHeaders": {
"reingested-on": ["1613986076000"]
},
"attributes": {
"string": [
{
"name": "attributeString1",
"value": "attributeStringValue1"
},
{
"name": "attributeString2",
"value": "attributeStringValue2"
}
],
"double": [
{
"name": "attributeDouble1",
"value": 1.0
}
]
}
}
If the element [emailHeaders][reingested-on] is present in the document, I want to copy 1613986076000 (i.e., the first element of [emailHeaders][reingested-on]) into [attributes][date], as follows:
{
"emailHeaders": {
"reingested-on": ["1613986076000"]
},
"attributes": {
"string": [
{
"name": "attributeString1",
"value": "attributeStringValue1"
},
{
"name": "attributeString2",
"value": "attributeStringValue2"
}
],
"double": [
{
"name": "attributeDouble1",
"value": 1.0
}
],
"date": [
{
"name": "Reingested on",
"value": 1613986076000
}
]
}
}
Note that if [attributes][date] already exists, and already contains an array of name/value pairs, I want my new object to be appended to the array.
Also, note that [attributes][date] is an array of objects which contain a date in their [value] attribute, as per the mapping of my Elasticsearch index:
...
"attributes": {
"properties": {
...
"date": {
"type": "nested",
"properties": {
"id": {"type": "keyword"},
"name": {"type": "keyword"},
"value": {"type": "date"}
}
},
...
}
},
...
I tried the following logstash configuration, with no success:
filter {
# See https://stackoverflow.com/questions/30309096/logstash-check-if-field-exists : this is supposed to let me test whether [@metadata][reingested-on] exists
mutate {
add_field => { "[@metadata][reingested-on]" => "None" }
copy => { "[emailHeaders][reingested-on][0]" => "[@metadata][reingested-on]" }
}
if [@metadata][reingested-on] != "None" {
# See https://stackoverflow.com/questions/36127961/append-array-of-json-logstash-elasticsearch: I create a temporary [error] field and try to append it to [attributes][date]
mutate {
add_field => { "[error][name]" => "Reingested on" }
add_field => { "[error][value]" => "[@metadata][reingested-on]" }
}
mutate {
merge => {"[attributes][date]" => "[error]"}
}
}
}
But what I get is:
{
"emailHeaders": {
"reingested-on": ["1613986076000"]
},
"error": {
"name": "Reingested on",
"value": "[#metadata][reingested-on]"
},
"attributes": {
"string": [
{
"name": "attributeString1",
"value": "attributeStringValue1"
},
{
"name": "attributeString2",
"value": "attributeStringValue2"
}
],
"double": [
{
"name": "attributeDouble1",
"value": 1.0
}
]
}
}
My temporary [error] object is created, but its value is wrong: it should be 1613986076000 instead of the literal string [@metadata][reingested-on].
Also, it is not appended to the array [attributes][date]. In this example, that array does not exist, so I want it to be created with my temporary object as its first element, as per the expected result above.
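For what it is worth, here is a rough sketch of one way to get there (untested, assuming the ruby filter is available on Logstash 5.6). Two things go wrong in the configuration above: add_field treats its value as a sprintf template, so a bare field reference like [@metadata][reingested-on] is copied literally (it would need to be written %{[@metadata][reingested-on]}), and even then the copied value would be a string rather than the number the date mapping expects. A ruby filter can do the lookup, the conversion and the append in one place:
filter {
  # only act when the source field is present
  if [emailHeaders][reingested-on] {
    ruby {
      code => '
        # first element of the reingested-on array, converted to a number
        value = event.get("[emailHeaders][reingested-on][0]").to_i
        entry = { "name" => "Reingested on", "value" => value }
        # append to [attributes][date], creating the array if it does not exist yet
        existing = event.get("[attributes][date]")
        dates = existing ? existing.to_a : []
        event.set("[attributes][date]", dates + [entry])
      '
    }
  }
}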
Finally, I got a working ELK stack to collect some logs from a remote server. However, I would like to customize the output of the logs. Is there a way to remove some fields, the ones I am highlighting in yellow in my screenshot?
I tried to remove them from _source by including remove_field in logstash.conf:
input {
beats {
port => 5044
ssl => true
ssl_certificate => "/..."
ssl_key => "/..logstash.key"
}
}
filter {
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"
}
remove_field => [ "tags", "prospector.type", "host.architecture", "host.containerized", "host.id", "host.os.platform", "host.os.family" ]
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "%{[#metadata][beat]}-%{+YYYY.MM.dd}"
}
}
Do you know how I can get rid of the yellow fields in _source for the logs coming from Filebeat?
Update of logstash.conf based on Leandro's comments:
input {
beats {
port => 5044
ssl => true
ssl_certificate => ".../logstash.crt"
ssl_key => ".../logstash.key"
}
}
filter {
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"
}
remove_field => [ "tags","[prospector][type]","[host][architecture]", "[host][containerized]", "[host][id]", "[host][os][platform]", "[host][os][family]", "[beat][hostname]", "[beat][name]", "[beat][version], "[offset]", "[input][type]", "[meta][cloud][provider]", "[meta][cloud][machine_type]", "[meta][cloud][instance_id]"]
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "%{[#metadata][beat]}-%{+YYYY.MM.dd}"
}
}
In the Filebeat logs:
2019-02-27T17:03:41.637-0800 DEBUG [input] file/states.go:68 New state added for /logs/api.log
2019-02-27T17:03:41.637-0800 DEBUG [registrar] registrar/registrar.go:315 Registrar state updates processed. Count: 1
2019-02-27T17:03:41.637-0800 DEBUG [registrar] registrar/registrar.go:400 Write registry file: /filebeat/registry
2019-02-27T17:03:41.637-0800 INFO log/harvester.go:255 Harvester started for file: /logs/api.log
2019-02-27T17:03:41.647-0800 DEBUG [publish] pipeline/processor.go:308 Publish event: {
"#timestamp": "2019-02-28T01:03:41.647Z",
"#metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.6.0"
},
"log": {
"file": {
"path": "/logs/api.log"
}
},
"input": {
"type": "log"
},
"host": {
"name": "tomcat",
"os": {
"family": "redhat",
"name": "CentOS Linux",
"codename": "Core",
"platform": "centos",
"version": "7 (Core)"
},
"id": "6aaed308aa5a419f880c5e45eea65414",
"containerized": true,
"architecture": "x86_64"
},
"meta": {
"cloud": {
"region": "CanadaCentral",
"provider": "az",
"instance_id": "6452bcf4-7f5d-4fc3-9f8e-5ea57f00724b",
"instance_name": "tomcat",
"machine_type": "Standard_D8s_v3"
}
},
"message": "2018-09-14 20:23:37 INFO ContextLoader:272 - Root WebApplicationContext: initialization started",
"source": "/logs/api.log",
"offset": 0,
"prospector": {
"type": "log"
},
"beat": {
"hostname": "tomcat",
"version": "6.6.0",
"name": "tomcat"
}
}
Thanks
Some of those fields are nested fields; the way to access them in a Logstash filter is with the [field][subfield] notation.
Your remove_field should be something like this:
remove_field => ["tags","[host][architecture]","[meta][cloud][provider]"]
But I don't think you can remove the @version field.
UPDATE:
Using the event example from your Filebeat log, I simulated a pipeline and got a _grokparsefailure. To remove the fields even when the grok fails, you need to use remove_field inside a mutate filter:
filter {
grok {
# your grok pattern here
}
mutate {
remove_field => ["[prospector]","[host][architecture]", "[host][containerized]", "[host][id]", "[host][os][platform]", "[host][os][family]", "[beat]", "[offset]", "[input]", "[meta]"]
}
}
Don't remove the tags field until you have fixed your groks.
The Logstash output for that example is:
{
"source": "/logs/api.log",
"tags": [
"_grokparsefailure"
],
"#timestamp": "2019-02-28T01:03:41.647Z",
"message": "2018-09-14 20:23:37 INFO ContextLoader:272 - Root WebApplicationContext: initialization started",
"log": {
"file": {
"path": "/logs/api.log"
}
},
"#version": "1",
"host": {
"os": {
"codename": "Core",
"version": "7 (Core)",
"name": "CentOS Linux"
},
"name": "tomcat"
}
}
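The _grokparsefailure tag in that output appears because the syslog-style pattern from the question does not match the sample message ("2018-09-14 20:23:37 INFO ContextLoader:272 - ..."). As a hedged sketch based on that single sample line (the field names log_timestamp, log_level, logger and log_message are only illustrative), a pattern along these lines should match it:
grok {
  match => {
    "message" => "%{TIMESTAMP_ISO8601:log_timestamp}%{SPACE}%{LOGLEVEL:log_level}%{SPACE}%{NOTSPACE:logger} - %{GREEDYDATA:log_message}"
  }
}
Once the grok matches, the tags field no longer carries _grokparsefailure and can be kept or removed as described above.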
When attempting to parse JSON data with Logstash, the parse seems to fail and my JSON doesn't get sent to ES as expected. Any suggestions would be great. I'm attempting to log failed WordPress logins, but having no luck with the parsing of the JSON.
Currently using Logstash 6.4.2 on FreeBSD 11.
Example log file. The file contains nothing but this data.
{
"username": "billy",
"password": "gfdgdfdfg4",
"time": "2019-02-03 00:39:11",
"agent": "Mozilla\/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko\/20100101 Firefox\/62.0",
"ip": "11.11.11.11"
}
Template
{
"index_patterns": ["wpbadlogin*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas" : 0,
"index.refresh_interval": "60s"
},
"mappings": {
"_default_": {
"properties": {
"host": {
"type": "text"
},
"username": {
"type": "text"
},
"password": {
"type": "text"
},
"agent": {
"type": "text"
},
"ip": {
"type": "ip"
}
},
"_all": {
"enabled": false
}
}
}
}
Logstash config
input {
file {
type => "json"
codec => "json"
sincedb_path => "/dev/null"
path => "/var/log/lighttpd/badlogin.txt"
start_position => "beginning"#
tags => ["wpbadlogin"]
}
}
#filter { }
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["10.0.5.30:9200"]
template => "/usr/local/etc/logstash/templates/wpbadlogin.json"
template_name => "wpbadlogin"
template_overwrite => true
index => "wpbadlogin"
}
}
Error: https://pastebin.com/raw/KWEYGkLn
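In case it helps to narrow things down: the file input emits one event per line, so a pretty-printed JSON object like the one above reaches the json codec as separate fragments that cannot be parsed on their own. One possible fix (an untested sketch, assuming the file only ever contains such pretty-printed objects, each opening with a line that starts with "{") is to reassemble each object with a multiline codec and parse it with a json filter:
input {
  file {
    path => "/var/log/lighttpd/badlogin.txt"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    tags => ["wpbadlogin"]
    # join every line that does not start with "{" onto the previous event,
    # so each pretty-printed object becomes a single event
    codec => multiline {
      pattern => "^\{"
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}
filter {
  json {
    source => "message"
  }
}
Alternatively, writing each login record as a single line of JSON in badlogin.txt would let the existing codec => "json" input work unchanged.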
Please don't tell me to "google it"!
I have been poring over the Apache pages and the IBM pages for days, trying to find the full allowed syntax for a Design Doc.
From the above readings:
the 'map' property is always a JavaScript function
the 'options' property may be one/both of local_seq or include_design.
When I use Fauxton to edit a Mango Query, however, I see that the reality is much broader.
I defined a query ...
{
"selector": {
"data.type": {
"$eq": "invoice"
},
"data.idib": {
"$gt": 0,
"$lt": 99999
}
},
"sort": [
{
"data.type": "desc"
},
{
"data.idib": "desc"
}
]
}
... with an accompanying index ...
{
"index": {
"fields": [
"foo"
]
},
"name": "foo-json-index",
"type": "json"
}
... and then looked at the design doc produced ...
{
"_id": "_design/5b1cf1be5a6b7013019ba4afac2b712fc06ea82f",
"_rev": "1-1e6c5b7bc622d9b3c9b5f14cb0fcb672",
"language": "query",
"views": {
"invoice_code": {
"map": {
"fields": {
"data.type": "desc",
"data.idib": "desc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
{
"data.type": "desc"
},
{
"data.idib": "desc"
}
]
}
}
}
}
}
Both of the published syntax rules are broken!
map is not a function
options defines the fields of the index
Where can I find a full description of all the allowed properties of a Design Document?
Hi, I am trying to set up an ELK server for log management. My Logstash service is running fine. I am receiving logs from another machine, but the Logstash pipeline is not able to send data to Elasticsearch.
When I look at the Logstash log file, it shows this error.
I am not able to identify the error in my configuration file:
:message=>"Error: Expected one of #, input, filter, output at line 34, column 1 (byte 855) after ", :level=>:error}
my logstash/conf.d
input {
beats {
port => 5044
ssl => true
ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
}
}
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
add_field => [ "received_at", "%{#timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
sniffing => true
manage_template => false
index => "%{[#metadata][beat]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
}
Here is my Elasticsearch template for filebeat-*:
{
"mappings": {
"_default_": {
"_all": {
"enabled": true,
"norms": {
"enabled": false
}
},
"dynamic_templates": [
{
"template1": {
"mapping": {
"doc_values": true,
"ignore_above": 1024,
"index": "not_analyzed",
"type": "{dynamic_type}"
},
"match": "*"
}
}
],
"properties": {
"#timestamp": {
"type": "date"
},
"message": {
"type": "string",
"index": "analyzed"
},
"offset": {
"type": "long",
"doc_values": "true"
},
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"location" : { "type" : "geo_point" }
}
}
}
}
},
"settings": {
"index.refresh_interval": "5s"
},
"template": "filebeat-*"
}
This error happens when there are files in the /etc/logstash/conf.d directory that Logstash cannot parse. Remove them and see if this helps. In my case, I had the same error when a reports.xml file was present in the conf.d directory.