Sample message from the Azure Event Hubs Logstash plugin:
https://pastebin.com/b8WnQHug
I would like to have this output:
{
"operationName": "Microsoft.ContainerService/managedClusters/diagnosticLogs/Read",
"category": "kube-apiserver",
"ccpNamespace": "5d764286d7481f0001d4b054",
"resourceId": "/SUBSCRIPTIONS/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/RESOURCEGROUPS/MY-RG/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/MY-AKS",
"properties": {
"log": "First line from record\n Second line from another record\n Third line from another record \n etc from another recors",
"stream": "stderr",
"pod": "kube-apiserver-8b5b9cd44-khjfk",
"containerID": "4c2ddb8ba9639ae9c88f728d850d550473eb36f4eb3e1d99c3f052b87cff9357"
},
"time": "2019-10-16T13:44:16.0000000Z",
"Cloud": "Public",
"Environment": "prod"
}
Main fields:
time (as timestamp)
pod (pod name field)
stream (event type field)
log (the worst part: the log field should be concatenated from other message.records[] entries with the same time and containerID fields)
Elasticsearch has an experimental Azure module; here is the source code/filter for Logstash:
https://github.com/elastic/logstash/blob/master/x-pack/modules/azure/configuration/logstash/azure.conf.erb
I don't need that much complexity.
I guess I need:
a split filter for the new fields
a date filter for message.records[].time (the timestamp)
"something" to find all message.records[] entries with the same message.records[].time and message.records[].properties.containerID fields and concatenate their message.records[].properties.log fields (see the sketch below)
Can anyone help?
Thanks
EDIT: I think I will also have to consider this:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html
Although in probably 90% of cases a whole multiline log will arrive in a single event, there is a chance it will be split across multiple events.
Another problem is that the aggregate filter does not work at scale (the Azure Event Hub plugin can), so aggregate will be a bottleneck.
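Roughly, I imagine the filter chain would look something like this (just a sketch; field paths such as [records][properties][log] are guesses, since the exact structure depends on how the Event Hubs input/codec decodes the payload):
filter {
  # one event per element of the records[] array
  split {
    field => "[records]"
  }

  date {
    match => ["[records][time]", "ISO8601"]
  }

  # group events that share time + containerID, concatenate their log lines,
  # and emit one event per group after 5 seconds of inactivity;
  # note: aggregate requires pipeline.workers = 1, which is the bottleneck mentioned above
  aggregate {
    task_id => "%{[records][time]}_%{[records][properties][containerID]}"
    code => "
      map['time']        ||= event.get('[records][time]')
      map['pod']         ||= event.get('[records][properties][pod]')
      map['stream']      ||= event.get('[records][properties][stream]')
      map['containerID'] ||= event.get('[records][properties][containerID]')
      map['log']         ||= ''
      map['log']          += event.get('[records][properties][log]').to_s
      event.cancel()
    "
    push_map_as_event_on_timeout => true
    timeout => 5
    timeout_task_id_field => "aggregate_id"
  }
}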
I am starting to learn Azure Logic Apps, and my first task is to store the result of a specific Kusto query obtained by calling the Azure Log Analytics API at https://api.loganalytics.io/v1/workspaces/{guid}/query.
Currently, I can successfully call the Log Analytics API using HTTP in the Logic App, and this is a sample response:
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{
"name": "UserPrincipalName",
"type": "string"
},
{
"name": "OperationName",
"type": "string"
},
{
"name": "ResultDescription",
"type": "string"
},
{
"name": "AuthMethod",
"type": "string"
},
{
"name": "TimeGenerated",
"type": "string"
}
],
"rows": [
[
"first.name#email.com",
"Fraud reported - no action taken",
"Successfully reported fraud",
"Phone call approval (Authentication phone)",
"22-01-03 [09:01:03 AM]"
],
[
"last.name#email.com",
"Fraud reported - no action taken",
"Successfully reported fraud",
"Phone call approval (Authentication phone)",
"22-02-19 [01:28:29 AM]"
]
]
}
]
}
From this result, I'm stuck on how to iterate over the rows property of the JSON result and save that data to Azure Table Storage, mapping each value to the corresponding entry in the columns property.
E.g.,
| UserPrincipalName | OperationName | ResultDescription | AuthMethod | TimeGenerated |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| first.name#email.com | Fraud reported - no action taken | Successfully reported fraud | Phone call approval (Authentication phone) | 22-01-03 [09:01:03 AM] |
| last.name#email.com | Fraud reported - no action taken | Successfully reported fraud | Phone call approval (Authentication phone) | 22-02-19 [01:28:29 AM] |
Hope someone can guide me on how to achieve this.
TIA!
You can use Parse_JSON to extract the inner data of the output provided and then use an Insert or Replace Entity action inside a for_each loop. Here is a screenshot of my Logic App:
In my storage account
UPDATED ANSWER
Instead of directly using Insert or Replace Entity, I have initialised two variables and then used Insert or Merge Entity. One variable is used to iterate over the rows and the other to iterate over the columns using Until loops, fetching the required values from the tables. Here is a screenshot of my Logic App.
In the first until loop, the iteration continues until the rows variable equals the number of rows. Below is the expression:
length(body('Parse_JSON')?['tables']?[0]?['rows'])
In the second until loop, the iteration continues until the columns variable equals the number of columns. Below is the expression:
length(body('Parse_JSON')?['tables']?[0]?['columns'])
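Written out as Until loop conditions (the loop exits once the expression is true; each counter is presumably initialised to 0 and advanced by an Increment variable action inside its loop), these become:
@equals(variables('rows'), length(body('Parse_JSON')?['tables']?[0]?['rows']))
@equals(variables('columns'), length(body('Parse_JSON')?['tables']?[0]?['columns']))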
Below is the expression I'm using in the Insert or Merge Entity action's entity field:
{
"@{body('Parse_JSON')?['tables']?[0]?['columns']?[variables('columns')]?['name']}": "@{body('Parse_JSON')?['tables']?[0]?['rows']?[variables('rows')]?[variables('columns')]}"
}
RESULTS:
First, use a Parse JSON action and load your JSON in as a sample to generate the schema.
Then use a For each (rename them accordingly) to traverse the rows; this will automatically generate an outer For each for the tables.
This is the trickier part: you need to generate a payload that contains your data along with some specific keys that you can then identify in your storage table.
This is my test flow with your data ...
... and this is the end result in storage explorer ...
The JSON within the entity field in the Insert Entity action looks like this ...
{
"Data": "@{items('For_Each_Row')}",
"PartitionKey": "@{guid()}",
"RowKey": "@{guid()}"
}
I simply used GUIDs to make it work, but you'd want to come up with some kind of key from your data to make it more rational. Maybe the date field or something.
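For example, a variant of the entity JSON that derives the keys from the data instead of GUIDs might look like this (a sketch only; index 4 is the TimeGenerated column in the sample above, and keys must still be unique per partition and avoid characters Table Storage forbids, such as '/', '\', '#' and '?'):
{
  "Data": "@{items('For_Each_Row')}",
  "PartitionKey": "@{body('Parse_JSON')?['tables']?[0]?['name']}",
  "RowKey": "@{items('For_Each_Row')?[4]}"
}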
So, I want to capture Administrative events sent by Azure to an Event Hub with a Stream Analytics job and forward only the events that match specific criteria to an Azure Function. The events come in an object like this (heavily trimmed to simplify):
{
"records": [
{
"resourceId": "<resource_path>",
"operationName": "MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE",
},
{
"time": "2021-03-19T19:19:56.0639872Z",
"operationName": "MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE",
"category": "Administrative",
"resultType": "Accept",
"resultSignature": "Accepted.Created",
"properties": {
"statusCode": "Created",
"serviceRequestId": "<trimmed>",
"eventCategory": "Administrative",
"message": "Microsoft.Compute/virtualMachines/write",
"hierarchy": "<trimmed>"
},
"tenantId": "<trimmed>"
}
],
"EventProcessedUtcTime": "2021-03-19T19:25:21.1471185Z",
"PartitionId": 1,
"EventEnqueuedUtcTime": "2021-03-19T19:20:43.9080000Z"
}
I want to filter the query based on these criteria: records[0].operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE' AND records[1].properties.statusCode = 'Created'. To achieve that, I began with the following query, which returns the record but lacks one of the criteria I need to match (statusCode):
SELECT
records
INTO
[output]
FROM
[input]
WHERE
GetArrayElement(records, 0).operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
Trying the query below doesn't work (it returns 0 matches):
SELECT
records
INTO
[output]
FROM
[input]
WHERE
GetArrayElement(records, 0).operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
AND GetArrayElement(records, 1).properties.statusCode = 'OK'
Does anyone have a clue about this?
I found the solution! I need to use GetRecordPropertyValue, like so:
SELECT
records
INTO
[output]
FROM
[input]
WHERE
GetArrayElement(records, 0).operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
AND GetRecordPropertyValue(GetArrayElement(records, 1).properties, 'statusCode') = 'Created'
Looks a bit clumsy to me, but it worked!
I'm using: NiFi v1.8.0 and Logstash v7.1.1
I'm tasked with moving all our Logstash configurations over to NiFi. I am trying to understand how the NiFi ExtractGrok processor works, but I can't find any examples. How is it intended to be used? And how can you set a NiFi attribute with this grok processor? When I say examples, I mean actual examples that show a before and after so people can understand what's going on. I've read the NiFi ExtractGrok documentation, but it's very limited and seems to assume you already understand how it works.
This is the only example I've been able to find: How to fetch multiline with ExtractGrok processor in ApacheNifi?
Based on what you are describing, the processor you need is ConvertRecord rather than ExtractGrok. ExtractGrok will only extract certain fields into FlowFile attributes or content.
If you want to convert your log files into a workable format (like JSON, if you want to send them to Elasticsearch), then you would use GrokReader as the Record Reader and JsonRecordSetWriter as the Record Writer.
Then, you would configure the Schema Text (or use a Schema Registry) in both the RecordReader and RecordWriter to be your schema, and set Grok Expression in your GrokReader to be your grok expression.
For example, my log messages look like this:
2019-12-09 07:59:59,136 this is the first log message
2019-12-09 09:59:59,136 this is the first log message with a stack trace: org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator - DataSource health check failed
org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:81)......
So, my grok would be:
%{TIMESTAMP_ISO8601:timestamp}\s+%{GREEDYDATA:log_message}
and my schema would be:
{
"name": "MyClass",
"type": "record",
"namespace": "com.acme.avro",
"fields": [
{
"name": "timestamp",
"type": "string"
},
{
"name": "log_message",
"type": "string"
},
{
"name": "stackTrace",
"type": "string"
}
]
}
Note the stackTrace field I've added to the schema. The GrokReader automatically maps stack traces into their own field, so you have to add a stackTrace field if you want to map it too. Then, you can merge it into the log_message field if you want, using Jolt.
The output of this ConvertRecord would be:
[ {
"timestamp" : "2019-12-09 07:59:59,136",
"log_message" : "this is the first log message",
"stackTrace" : null
}, {
"timestamp" : "2019-12-09 09:59:59,136",
"log_message" : "this is the first log message with a stack trace: org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator - DataSource health check failed",
"stackTrace" : "org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.)\nat org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:81)......"
} ]
This is a sample JSON input packet. I'm writing transformation queries to get the data and it is working fine.
[{
"source": "xda",
"data":
[{
"masterTag": "UNIFY",
"speed": 180
}],
"EventEnqueuedUtcTime": "2018-07-20T19:28:18.5230000Z",
},
{
"source": "xda",
"data": [{
"masterTag": "UNIFY",
"speed": 214
}],
"EventEnqueuedUtcTime": "2018-07-20T19:28:20.5550000Z",
}
]
However, a custom property named "proFilter" has been added to the message when it is sent to IoT Hub. It is not inside the payload, but it is present on the message object. I can get this property using an Azure Function, but I'm not sure how to get it in the Stream Analytics transformation query. Is there any way I can get it?
Basic transformation query:
WITH data AS
(
SELECT
source,
GetArrayElement(data,0) as data_packet
FROM input
)
SELECT
source,
data_packet.masterTag
INTO
output
FROM data
Include the following function in your SELECT statement:
GetMetadataPropertyValue(input, '[User].[proFilter]') AS proFilter
If you are interested in retrieving all your custom properties as a record, you can use
GetMetadataPropertyValue(input, '[User]') AS userprops
See this doc for further reference
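Plugged into the transformation query above, that would look roughly like this (a sketch; the property name in '[User].[proFilter]' must match the casing set on the message):
WITH data AS
(
    SELECT
        source,
        GetArrayElement(data, 0) AS data_packet,
        GetMetadataPropertyValue(input, '[User].[proFilter]') AS proFilter
    FROM input
)
SELECT
    source,
    data_packet.masterTag,
    proFilter
INTO
    output
FROM data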
I am doing a spike in which we want to publish data, as it is written to a Cassandra table, to a Kafka topic. We are looking at using Kafka Connect and the Stream Reactor connectors.
I am using Kafka 0.10.0.1
I am using DataMountaineer Stream Reactor 0.2.4
I placed the jar file for Stream Reactor into the Kafka libs folder and am running Kafka Connect in distributed mode
bin/connect-distributed.sh config/connect-distributed.properties
I added the Cassandra Source connector as follows:
curl -X POST -H "Content-Type: application/json" -d @config/connect-idoc-cassandra-source.json.txt localhost:8083/connectors
When I add data to the Cassandra table I see it being added to the topic using the Kafka command line consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic idocs-topic --from-beginning
Here is a sample of what is being written to the Topic right now:
{
"schema": {
"type": "struct",
"fields": [{
"type": "string",
"optional": true,
"field": "idoc_id"
}, {
"type": "string",
"optional": true,
"field": "idoc_event_ts"
}, {
"type": "string",
"optional": true,
"field": "json_doc"
}],
"optional": false,
"name": "idoc.idocs_events"
},
"payload": {
"idoc_id": "dc4ab8a0-fdf8-11e6-8285-1bce55915fdd",
"idoc_event_ts": "dc4ab8a1-fdf8-11e6-8285-1bce55915fdd",
"json_doc": "{\"foo\":\"bar\"}"
}}
What I would like written to the topic is the value of the json_doc column.
Here is what I have in my config for the Cassandra source
{
"name": "cassandra-idocs",
"config": {
"tasks.max": "1",
"connector.class": "com.datamountaineer.streamreactor.connect.cassandra.source.CassandraSourceConnector",
"connect.cassandra.key.space": "idoc",
"connect.cassandra.source.kcql": "INSERT INTO idocs-topic SELECT json_doc FROM idocs_events PK idoc_event_ts",
"connect.cassandra.import.mode": "incremental",
"connect.cassandra.contact.points": "localhost",
"connect.cassandra.port": 9042,
"connect.cassandra.import.poll.interval": 10000
}}
How do I change the way the Kafka Connect Cassandra Source is configured so that only the value of json_doc is written to the topic, so it would look something like this:
{"foo":"bar"}
The Kafka Connect Query Language (KCQL) seemed to be the way to go, but it isn't limiting what is written to the topic to just the column specified in the KCQL.
UPDATE
I saw this answer on Stack Overflow and changed the converters in the connect-distributed.properties file from JsonConverter to StringConverter.
The result is that this is now written to the topic:
Struct{idoc_id=74597cf0-fdf7-11e6-8285-1bce55915fdd,idoc_event_ts=74597cf1-fdf7-11e6-8285-1bce55915fdd,json_doc={"foo":"bar"}}
UPDATE 2
I changed the converters in the connect-distributed.properties file back to JsonConverter and also disabled the schemas:
key.converter.schemas.enable=false
value.converter.schemas.enable=false
The result is that this is now written to the topic:
{
"idoc_id": "dc4ab8a0-fdf8-11e6-8285-1bce55915fdd",
"idoc_event_ts": "dc4ab8a1-fdf8-11e6-8285-1bce55915fdd",
"json_doc": "{\"foo\":\"bar\"}"
}
Note
Using code from the snapshot release and changing the KCQL to
INSERT INTO idocs-topic
SELECT json_doc, idoc_event_ts
FROM idocs_events
IGNORE idoc_event_ts
PK idoc_event_ts
yields this result on the topic:
{"json_doc": "{\"foo\":\"bar\"}"}
Thanks
It turns out that what I was attempting to do was not possible with the Cassandra Source in DataMountaineer Stream Reactor 0.2.4. However, the snapshot release (of what I assume will become release 0.2.5) will support it.
Here is how it will work:
1) Set the converters in the connect-distributed.properties file to StringConverter.
2) Set the KCQL in the JSON configuration for the Cassandra Source connector to
INSERT INTO idocs-topic
SELECT json_doc, idoc_event_ts
FROM idocs_events
IGNORE idoc_event_ts
PK idoc_event_ts
WITHUNWRAP
This will result in the value of the json_doc column being published to the Kafka Topic without any schema information or the column name itself.
So if the column json_doc contained the value {"foo":"bar"} then this is what would appear on the Topic:
{"foo":"bar"}
Here is some background information on how the KCQL works in the snapshot release.
The SELECT will now retrieve only the columns of the table that are specified in the KCQL. Originally it always retrieved all of the columns. It is important to note that the PK column must be part of the SELECT statement when using the incremental import mode. If the value of the PK column is not something that should be included in the message published to the Kafka topic, then add it to the IGNORE statement (as in the example above).
WITHUNWRAP is a new KCQL feature that tells the Cassandra Source connector to create a SourceRecord using the String schema type (instead of Struct). In this mode, only the values of the columns in the SELECT statement are stored as the value of the SourceRecord. If there is more than one column in the SELECT statement after applying the IGNORE statement, the values are appended together, separated by a comma.
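For completeness, putting the two changes together would look roughly like this (a sketch based on the configuration from the question; option names are those used in 0.2.4 and may differ in the snapshot build). In connect-distributed.properties:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
And the connector configuration with the new KCQL:
{
  "name": "cassandra-idocs",
  "config": {
    "tasks.max": "1",
    "connector.class": "com.datamountaineer.streamreactor.connect.cassandra.source.CassandraSourceConnector",
    "connect.cassandra.key.space": "idoc",
    "connect.cassandra.source.kcql": "INSERT INTO idocs-topic SELECT json_doc, idoc_event_ts FROM idocs_events IGNORE idoc_event_ts PK idoc_event_ts WITHUNWRAP",
    "connect.cassandra.import.mode": "incremental",
    "connect.cassandra.contact.points": "localhost",
    "connect.cassandra.port": 9042,
    "connect.cassandra.import.poll.interval": 10000
  }
}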