I am trying to merge Filebeat messages in Logstash.
I have the following log file:
----------- SCAN SUMMARY -----------
Known viruses: 8520944
Engine version: 0.102.4
Scanned directories: 408
Scanned files: 1688
Infected files: 0
Total errors: 50
Data scanned: 8.93 MB
Data read: 4.42 MB (ratio 2.02:1)
Time: 22.052 sec (0 m 22 s)
I read it with Filebeat and send it to Logstash.
The problem is that I receive each line in Logstash as a different message. I want to merge them all into a single event and also add a new field, "received_at", taken from the Filebeat input.
I would like to get the following output from Logstash:
{
Known_viruses: 8520944
Engine_version: 0.102.4
Scanned_directories: 408
Scanned_files: 1688
Infected_files: 0
Total_errors: 50
Data_scanned: 8.93MB
Data_read: 4.42MB
Time: 22.052sec (0 m 22 s)
Received_at: <timestamp taken from the received Filebeat JSON message>
}
The input I receive in Logstash from Filebeat, for each line, is:
{
"#timestamp": "2021-04-20T08:03:33.843Z",
"#version": "1",
"tags": ["beats_input_codec_plain_applied"],
"host": {
"name": "PRLN302",
"architecture": "x86_64",
"ip": ["10.126.40.18", "fe80::7dbf:4941:cd39:c0f9", "172.17.0.1", "fe80::42:82ff:fe2b:895", "fe80::24b2:8cff:feeb:20b4", "172.18.0.1", "fe80::42:53ff:fe31:a025", "fe80::b420:e1ff:fe97:c152", "fe80::9862:21ff:fe3a:c33e", "fe80::48a6:70ff:fec2:60d6", "192.168.126.117", "fe80::2831:c644:33d5:321"],
"id": "a74e193a551f4d379f9488b80a463581",
"os": {
"platform": "ubuntu",
"version": "20.04.2 LTS (Focal Fossa)",
"family": "debian",
"name": "Ubuntu",
"type": "linux",
"codename": "focal",
"kernel": "5.8.0-49-generic"
},
"mac": ["e8:6a:64:32:fe:4d", "dc:8b:28:4a:c8:88", "02:42:82:2b:08:95", "26:b2:8c:eb:20:b4", "02:42:53:31:a0:25", "b6:20:e1:97:c1:52", "9a:62:21:3a:c3:3e", "4a:a6:70:c2:60:d6", "00:50:b6:b9:19:d7"],
"containerized": false,
"hostname": "PRLN302"
},
"message": "----------- SCAN SUMMARY -----------",
"agent": {
"ephemeral_id": "ad402f64-ab73-480c-b6de-4af6184f012c",
"type": "filebeat",
"version": "7.12.0",
"id": "f681d775-d452-490a-9b8b-036466a87d35",
"name": "PRLN302",
"hostname": "PRLN302"
},
"input": {
"type": "log"
},
"ecs": {
"version": "1.8.0"
},
"log": {
"offset": 0,
"file": {
"path": "/var/log/clamav-test.log"
}
}
}
Is it possible?
It is possible; you'll need to handle the multiline messages in the Filebeat input:
https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html
Something like the following should do it, I think:
multiline.type: pattern
multiline.pattern: '^----------- SCAN SUMMARY -----------'
multiline.negate: true
multiline.match: after
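For reference, those multiline options live under the log input in filebeat.yml. A minimal sketch, assuming the path shown in your Filebeat event:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/clamav-test.log
    # lines that do NOT match the banner are appended to the preceding
    # line that does, so the whole summary becomes a single event
    multiline.type: pattern
    multiline.pattern: '^----------- SCAN SUMMARY -----------'
    multiline.negate: true
    multiline.match: after

On the Logstash side you would still need to split the merged message into fields and add received_at. A rough, untested sketch using the kv and mutate filters (the rename is only an example; adjust it to the exact field names you want):

filter {
  kv {
    source => "message"
    # one "Key: value" pair per line of the merged event; the banner line
    # contains no separator, so kv skips it
    field_split_pattern => "\n"
    value_split_pattern => ":\s+"
    trim_key => " "
  }
  mutate {
    # keep the time the event was received in its own field
    add_field => { "received_at" => "%{@timestamp}" }
    # kv produces keys containing spaces ("Known viruses"); rename as needed
    rename => { "Known viruses" => "Known_viruses" }
  }
}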
I'm new to shell scripting and am taking baby steps with it.
I recently wrote a shell script to call a REST API, and I was able to execute it without any issues.
I've stored the output in a variable, like below:
{
"id": 3184136,
"name": "XXX TEST API",
"url": "http://xxxxxxxxxxx/_apis/test/Runs/3184136",
"isAutomated": true,
"owner": {
"displayName": "XXXX",
"url": "http://xxxxxxxxxxx/_apis/Identities/dbf722a9-73b0-46d6-a2bd-9835c1f0c221",
"_links": {
"avatar": {
"href": "http://xxxxxxxxxxx/_api/_common/identityImage?id=dbf722a9-73b0-46d6-a2bd-9835c1f0c221"
}
},
"id": "dbf722a9-73b0-46d6-a2bd-9835c1f0c221",
"uniqueName": "xxxxxxxxxxx\\ServiceLaunchpadDev",
"imageUrl": "http://xxxxxxxxxxx/_api/_common/identityImage?id=dbf722a9-73b0-46d6-a2bd-9835c1f0c221"
},
"project": {
"id": "6d5e21e7-c75e-464a-9708-90fbff086902",
"name": "eDellPrograms"
},
"startedDate": "2018-10-11T06:36:50.627Z",
"completedDate": "2018-10-11T07:04:45.153Z",
"state": "Completed",
"plan": {
"id": "5299555",
"name": "Smoke Dashboard Peso - DIT",
"url": "http://xxxxxxxxxxx/_apis/test/Plans/5299555"
},
"postProcessState": "Complete",
"totalTests": 5,
"incompleteTests": 0,
"notApplicableTests": 0,
"passedTests": 0,
"unanalyzedTests": 5,
"createdDate": "2018-10-11T06:36:50.533Z",
"lastUpdatedDate": "2018-10-11T07:04:45.153Z",
"lastUpdatedBy": {
"displayName": "xxxxxxxxxxx",
"url": "http://xxxxxxxxxxx/_apis/Identities/8de2a654-063b-48bd-8101-87e4ec2f05e3",
"_links": {
"avatar": {
"href": "http://xxxxxxxxxxx/_api/_common/identityImage?id=8de2a654-063b-48bd-8101-87e4ec2f05e3"
}
},
"id": "8de2a654-063b-48bd-8101-87e4ec2f05e3",
"uniqueName": "xxxxxxxxxxx\\xxxxxxxxxxx",
"imageUrl": "http://xxxxxxxxxxx/_api/_common/identityImage?id=8de2a654-063b-48bd-8101-87e4ec2f05e3"
},
"controller": "xxxxxxxxxxx",
"revision": 5,
"comment": "Build Definition : xxxxxxxxxxx \nBuild Version : xxxxxxxxxxx_20180925.1\nConfiguration : DIT\nBatch type : Suite\nTest type : Parallel\nTest Controller Name : xxxxxxxxxxx\nPreferred Agents : ADPTAW10A618|ADPTAW10A619|ADPTAW10A621 \nRequested by : xxxxxxxxxxx\nEmail Request : Y\nEmail To : xxxxxxxxxxx\nEmailCc : xxxxxxxxxxx\nEnvironment : DIT\nTest Setting : DIT\nContinue On Failure : false\nDNS Setting : false",
"dropLocation": "\\\\xxxxxxxxxxx\\DropFolder\\xxxxxxxxxxx_20180925.1",
"runStatistics": [
{
"state": "Completed",
"outcome": "Failed",
"count": 5
}
],
"webAccessUrl": "http://xxxxxxxxxxx/_TestManagement/Runs#runId=3184136&_a=runCharts"
}
From the above output, I'm trying to find "state" and its value, but I couldn't make that happen. Could anyone kindly help me?
echo $result | grep -o 'state*'
With the above command I was able to print "state", but I'm expecting both "state" and its value.
I appreciate your help. Thanks in advance.
I tried storing your JSON in a file called n2.json (it is the same JSON as shown in your question).
Then use jq on top of this:
jq -r '.state' n2.json
Completed
With 'state*' you are looking for 'stat', 'state', 'statee', 'stateee', and so on: the * wildcard applies to the preceding character, not to whatever follows the word.
Try this:
echo $result | grep -o '"state":[^,]*'
It matches "state": plus everything up to, but excluding, the next comma.
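If you only need the value itself (for example, to store it in a variable), the jq approach from the other answer composes cleanly; a small sketch, assuming jq is installed:

# extract just the value of the top-level "state" key
state=$(echo "$result" | jq -r '.state')
echo "$state"   # prints: Completed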
I have looked at some posts and documentation on how to specify custom folder paths while creating an Azure blob (using Azure Data Factory).
Official documentation:
https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-blob-connector#using-partitionedBy-property
Forums posts:
https://dba.stackexchange.com/questions/180487/datafactory-tutorial-blob-does-not-exist
I am able to write into date-indexed folders successfully; however, what I am not able to do is write into incremented/decremented date folders.
I tried using $$Text.Format (like below), but it gives a compile error --> "Text.Format is not a valid blob path".
"folderPath": "$$Text.Format('MyRoot/{0:yyyy/MM/dd}/', Date.AddDays(SliceEnd,-2))",
I tried using the PartitionedBy section (like below) but it too gives a compile error --> Only SliceStart and SliceEnd are valid options for "date"
{
"name": "MyBlob",
"properties": {
"published": false,
"type": "AzureBlob",
"linkedServiceName": "MyLinkedService",
"typeProperties": {
"fileName": "MyTsv.tsv",
"folderPath": "MyRoot/{Year}/{Month}/{Day}/",
"format": {
"type": "TextFormat",
"rowDelimiter": "\n",
"columnDelimiter": "\t",
"nullValue": ""
},
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "Date.AddDays(SliceEnd,-2)",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "Date.AddDays(SliceEnd,-2)",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "Date.AddDays(SliceEnd,-2)",
"format": "dd"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 1
},
"external": false,
"policy": {}
}
Any pointers are appreciated!
EDIT for response from Adam:
I also used the folder structure directly in fileName, as Adam suggested in the forum post below:
Windows Azure: How to create sub directory in a blob container
I used it as in the sample below.
"typeProperties": {
"fileName": "$$Text.Format('{0:yyyy/MM/dd}/MyBlob.tsv', Date.AddDays(SliceEnd,-2))",
"folderPath": "MyRoot/",
"format": {
"type": "TextFormat",
"rowDelimiter": "\n",
"columnDelimiter": "\t",
"nullValue": ""
},
It gives no compile error and no error during deployment, but it throws an error during execution.
The runtime error is ---> Error in Activity: ScopeJobManager:PrepareScopeScript, Unsupported unstructured stream format '.adddays(sliceend,-2))', can't convert to unstructured stream.
I think the problem is that fileName can be used to create folders, but only with static folder names, not dynamic ones.
You should create a blob using the following naming convention: "foldername/myfile.txt", so you can also add further blobs under that folder name. I'd recommend checking this thread: Windows Azure: How to create sub directory in a blob container. It may help you resolve this case.
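For reference, and as the second compile error itself says, the partitionedBy "date" property only accepts SliceStart or SliceEnd, with no date arithmetic. A minimal sketch of the form that does compile (how to apply the minus-two-days shift is a separate question and is not shown here):

"folderPath": "MyRoot/{Year}/{Month}/{Day}/",
"partitionedBy": [
  { "name": "Year",  "value": { "type": "DateTime", "date": "SliceEnd", "format": "yyyy" } },
  { "name": "Month", "value": { "type": "DateTime", "date": "SliceEnd", "format": "MM" } },
  { "name": "Day",   "value": { "type": "DateTime", "date": "SliceEnd", "format": "dd" } }
]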
I know I can get the version of Spark (v2.2.1) that's running on the Spark Master with this request:
http://<spark-master>:4040/api/v1/version
which will return something like
{
"spark" : "2.2.1"
}
However, I also want to check the version of Spark running on each Worker. I know I can get a list of all Workers like this:
http://<spark-master>:8080/json/
which will return a response similar to
{
"url": "spark://<spark-master>:7077",
"workers": [{
"id": "worker-20180228071440-<ip-address>-7078",
"host": "<ip-address>",
"port": 7078,
"webuiaddress": "http://<ip-address>:8081",
"cores": 8,
"coresused": 8,
"coresfree": 0,
"memory": 40960,
"memoryused": 35875,
"memoryfree": 5085,
"state": "ALIVE",
"lastheartbeat": 1519932580686
}, ...
],
"cores": 32,
"coresused": 32,
"memory": 163840,
"memoryused": 143500,
"activeapps": [{
"starttime": 1519830260440,
"id": "app-20180228070420-0000",
"name": "<spark-app-name>",
"user": "<spark-app-user>",
"memoryperslave": 35875,
"submitdate": "Wed Feb 28 07:04:20 PST 2018",
"state": "RUNNING",
"duration": 102328434
}
],
"completedapps": [],
"activedrivers": [],
"status": "ALIVE"
}
I'd like to use that information to query each Spark Worker's version. Is this possible?
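For example, here is a rough sketch (assuming jq is installed) of how I can already pull each Worker's web UI address out of that response; what I am missing is a way to ask each Worker for its version:

# print each worker's web UI address from the master's JSON endpoint
curl -s "http://<spark-master>:8080/json/" | jq -r '.workers[].webuiaddress'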
I am not getting any output, and I get the following error:
------Exception-------
Class: Kitchen::ActionFailed
Message: 1 actions failed.
cookbook/test/integration/nodes
JSON file:
{
"id": "hive server",
"chef_type": "node",
"environment": "dev",
"json_class": "Chef::Node",
"run_list": [],
"automatic": {
"hostname": "test.net",
"fqdn": "127.0.0.1",
"name": "test.net",
"ipaddress": "127.0.0.1",
"node_zone": "green",
"roles": []
},
"attributes": {
"hiveserver": "true"
}
}
Recipe
hiveNodes = search(:node, "hiveserver:true AND environment:node.environment AND node_color:node["node_color"])
# hiveserverList = ""
# hiveNodes.each |hnode| do
# hiveserverList += hnode
#end
#file '/tmp/test.txt' do
# content '#{hiveserverList}'
#end
I think you mean to be using "hiveserver:true AND chef_environment:#{node.chef_environment} AND node_color:#{node['node_color']}" as your search string. The #{} syntax is how you embed a Ruby expression's value into a string. Also, for complicated backwards-compatibility reasons, the environment on a node is called chef_environment.
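Putting that together with the file resource you sketched, something like the following (untested, and it assumes the matching nodes expose an fqdn attribute, as in your JSON) should work:

# search with interpolated values; note chef_environment, not environment
hive_nodes = search(:node, "hiveserver:true AND chef_environment:#{node.chef_environment} AND node_color:#{node['node_color']}")

# join the matching nodes' FQDNs into a single comma-separated string
hiveserver_list = hive_nodes.map { |hnode| hnode['fqdn'] }.join(',')

file '/tmp/test.txt' do
  # pass the variable directly; the single-quoted '#{hiveserverList}' in the
  # commented-out example would write the literal text, not the value
  content hiveserver_list
end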
The message looks like:
1.2.3.4 "-" - - [19/Apr/2016:11:42:18 +0200] "GET http://monsite.vpù/api/opa/status HTTP/1.1" 200 92 "-" "curl - API-Player - PREPROD" hit OPA-PREPROD-API - 0.000144958
My grok pattern is
grok {
match => { "message" => "%{IP:clientip} \"%{DATA:x_forwarded_for}\" %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} (%{NOTSPACE:hitmiss}|-) (%{NOTSPACE:varnish_conf}|-) (%{NOTSPACE:varnish_backend}|-) %{NUMBER:time_firstbyte}"}
}
I get a _grokparsefailure tag even though all my fields are filled in correctly, except for the last one: I get 0 instead of 0.000144958.
The full message in ES is:
{
"_index": "logstash-2016.04.19",
"_type": "syslog",
"_id": "AVQt7WSCN-2LsQj9ZIIq",
"_score": null,
"_source": {
"message": "212.95.71.201 \"-\" - - [19/Apr/2016:11:50:12 +0200] \"GET http://monsite.com/api/opa/status HTTP/1.1\" 200 92 \"-\" \"curl - API-Player - PREPROD\" hit OPA-PREPROD-API - 0.000132084",
"#version": "1",
"#timestamp": "2016-04-19T09:50:12.000Z",
"type": "syslog",
"host": "212.95.70.80",
"tags": [
"_grokparsefailure"
],
"application": "varnish-preprod",
"clientip": "1.2.3.4",
"x_forwarded_for": "-",
"ident": "-",
"auth": "-",
"timestamp": "19/Apr/2016:11:50:12 +0200",
"verb": "GET",
"request": "http://monsite.com/api/opa/status",
"httpversion": "1.1",
"response": "200",
"bytes": "92",
"referrer": "\"-\"",
"agent": "\"curl - API-Player - PREPROD\"",
"hitmiss": "hit",
"varnish_conf": "OPA-PREPROD-API",
"varnish_backend": "-",
"time_firstbyte": "0.000132084",
"geoip": {
"ip": "1.2.3.4",
"country_code2": "FR",
"country_code3": "FRA",
"country_name": "France",
"continent_code": "EU",
"region_name": "C1",
"city_name": "Strasbourg",
"latitude": 48.60040000000001,
"longitude": 7.787399999999991,
"timezone": "Europe/Paris",
"real_region_name": "Alsace",
"location": [
7.787399999999991,
48.60040000000001
]
},
"agentname": "Other",
"agentos": "Other",
"agentdevice": "Other"
},
"fields": {
"#timestamp": [
1461059412000
]
},
"highlight": {
"agent": [
"\"curl - API-Player - #kibana-highlighted-field#PREPROD#/kibana-highlighted-field#\""
],
"varnish_conf": [
"OPA-#kibana-highlighted-field#PREPROD#/kibana-highlighted-field#-API"
],
"application": [
"#kibana-highlighted-field#varnish#/kibana-highlighted-field#-#kibana-highlighted-field#preprod#/kibana-highlighted-field#"
],
"message": [
"1.2.3.4 \"-\" - - [19/Apr/2016:11:50:12 +0200] \"GET http://monsote.com/api/opa/status HTTP/1.1\" 200 92 \"-\" \"curl - API-Player - #kibana-highlighted-field#PREPROD#/kibana-highlighted-field#\" hit OPA-#kibana-highlighted-field#PREPROD#/kibana-highlighted-field#-API - 0.000132084"
]
},
"sort": [
1461059412000
]
}
The answer is that Kibana does not display very small numbers.
You would only get a grokparsefailure if the grok actually fails, so it's not this grok that's producing the tag. Use the tag_on_failure parameter in your groks to provide a unique tag for each grok.
As for your parsing problem, I'll bet that your grok is working just fine. Note that Elasticsearch creates fields dynamically and will guess the type of the field based on the first data seen. If your first data was "0", it would have made the field an integer, and later entries would be cast to that type. You can pull the mapping to see what happened.
You need to control the mapping that is created. You can specify that the field is a float in the grok itself (%{NUMBER:myField:float}) or by creating your own template.
Also notice that NOTSPACE matches "-", so your patterns for varnish_backend, etc., are not entirely correct.
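For example (a sketch only, reusing your own pattern), casting the last capture to a float and tagging failures distinctly would look like:

grok {
  match => { "message" => "%{IP:clientip} \"%{DATA:x_forwarded_for}\" %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} (%{NOTSPACE:hitmiss}|-) (%{NOTSPACE:varnish_conf}|-) (%{NOTSPACE:varnish_backend}|-) %{NUMBER:time_firstbyte:float}"}
  # a distinct tag so this grok's failures are not confused with the syslog filter's
  tag_on_failure => ["_varnish_grokparsefailure"]
}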
The problem was coming from the syslog filter, which uses grok internally, as explained here: https://kartar.net/2014/09/when-logstash-and-syslog-go-wrong/.
The solution was to remove the tag in my own filter.
The other problem is that Kibana does not display numbers like 0.0000xxx, but they are stored correctly anyway, so I can still use them.