Logstash grok filter for collectd metrics data - logstash-grok

I'm processing some metrics data and storing it in Elasticsearch. Now I want to read that data back from Elasticsearch and apply a filter to it; the goal is to have more relevant fields after the Logstash filtering. For this purpose, I planned to use a grok filter, but I'm not a grok expert and I have never parsed this kind of data.
This is a sample document coming from Elasticsearch:
{
"_index" : "metrics",
"_type" : "metrics",
"_id" : "AVh4R8n3cN8PY7B3sFIM",
"_score" : 1.0,
"_source" : {
"event_time" : "2016-11-18T16:31:59.769Z",
"message" : "[{\"values\":[0.04,0.18,0.17],\"dstypes\":[\"gauge\",\"gauge\",\"gauge\"],\"dsnames\":[\"shortterm\",\"midterm\",\"longterm\"],\"time\":1479486719.645,\"interval\":10.000,\"host\":\"test-host\",\"plugin\":\"load\",\"plugin_instance\":\"\",\"type\":\"load\",\"type_instance\":\"\"}]",
"version" : "1",
"tags" : [ ]
}
}
After logstash filtering I expect to have this:
{
"_index" : "metrics",
"_type" : "metrics",
"_id" : "AVh4R8n3cN8PY7B3sFIM",
"_score" : 1.0,
"_source" : {
"event_time" : "2016-11-18T16:31:59.769Z",
"values" : [0.04,0.18,0.17],
"dstypes" : ["gauge","gauge","gauge"],
"dsnames": ["shortterm","midterm","longterm"],
"time" : 1479486719.645,
"interval" : 10.000,
"host" : "test-host",
"plugin" : "load",
"plugin_instance" : "",
"type" : "load",
"type_instance" : ""
}
}
Can someone help me with some advice or a sample grok filter to achieve this?
Thank you in advance!!

I finally resolved that issue by using another filter. grok was not appropriate for this use case.
filter {
  json {
    source => "message"
  }
}
The json filter parses the JSON held in the message field and extracts each entry as a key/value pair, which solved the issue.
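To make the effect of the filter concrete, here is a minimal JavaScript sketch of the same transformation applied to the sample document from the question. It only models the result described in this answer and is not Logstash's implementation. The function name `expandMessage` is hypothetical, and the sketch assumes `message` holds a JSON array containing a single metric object, as in the sample.

```javascript
// Sketch: parse the JSON string held in `message` and promote its keys
// to top-level fields. `expandMessage` is a hypothetical helper.
function expandMessage(source) {
  const { message, ...rest } = source;
  const parsed = JSON.parse(message);
  // The sample wraps the metric object in an array; take its first element.
  const metric = Array.isArray(parsed) ? parsed[0] : parsed;
  return { ...rest, ...metric };
}

// The _source object from the question.
const source = {
  event_time: "2016-11-18T16:31:59.769Z",
  message: '[{"values":[0.04,0.18,0.17],"dstypes":["gauge","gauge","gauge"],"dsnames":["shortterm","midterm","longterm"],"time":1479486719.645,"interval":10.000,"host":"test-host","plugin":"load","plugin_instance":"","type":"load","type_instance":""}]',
  version: "1",
  tags: []
};

const expanded = expandMessage(source);
console.log(expanded.values, expanded.host, expanded.plugin);
```

Unlike the expected output in the question, this sketch keeps the `version` and `tags` fields; dropping them would be a separate step.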

Apache Spark circular reference Exception when creating Encoders

I am trying to generate a test AVRO file from a collection of objects represented by generated classes (TestAggregate.java, TestTuple.java). I used avro-tools-1.10.2.jar to generate those classes from this AVRO schema (dataset.avsc):
{
"type" : "record",
"name" : "TestAggregate",
"namespace" : "com....",
"fields" : [ {
"name" : "uuid",
"type" : "string"
}, {
"name" : "bag",
"type" : {
"type" : "array",
"items" : {
"type" : "record",
"name" : "TestTuple",
"fields" : [ {
"name" : "s",
"type" : "int"
}, {
"name" : "n",
"type" : "int"
}, {
"name" : "c",
"type" : "int"
}, {
"name" : "f",
"type" : "int"
} ]
}
},
"aliases" : [ "bag" ]
} ]
}
When I try to create an Encoder using
Encoder<TestAggregate> datasetEncoder = Encoders.bean(TestAggregate.class);
it throws an exception:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class org.apache.avro.Schema...
There is no circular reference in those generated files (or schema) as far as I can tell.
I am using Spark release 3.2.1.
Any ideas on how to resolve it?
I'm not sure you need an encoder (or the compiled class).
Take the AVSC text itself, and you can get a schema like so:
SchemaConverters.toSqlType(new Schema.Parser().parse(avroSchema))
Then this can be given to the Spark SQL from_avro function.

Artifactory AQL search for builds on promotion.status

I'm trying to use AQL to get a list of all builds not promoted to "release".
Our binaries pass through the statuses integration -> aat -> release.
I want to get a list of those with promotion status integration or aat, but not release.
One example build has these statuses:
"statuses" : [ {
"status" : "integration",
"timestamp" : "2016-04-20T08:36:42.009+0000",
"user" : "user",
"ciUser" : "changes",
"timestampDate" : 1461141402009
}, {
"status" : "aat",
"repository" : "repo-aat",
"timestamp" : "2016-04-20T08:56:11.843+0000",
"user" : "user",
"ciUser" : "changes",
"timestampDate" : 1461142571843
}, {
"status" : "aat",
"repository" : "repo-aat",
"timestamp" : "2016-04-20T08:58:55.417+0000",
"user" : "user",
"ciUser" : "changes",
"timestampDate" : 1461142735417
}, {
"status" : "aat",
"repository" : "repo-aat",
"timestamp" : "2016-04-20T09:20:32.619+0000",
"user" : "user",
"ciUser" : "changes",
"timestampDate" : 1461144032619
}, {
"status" : "release",
"repository" : "repo-release",
"timestamp" : "2016-04-20T09:30:12.143+0000",
"user" : "user",
"ciUser" : "changes",
"timestampDate" : 1461144612143
}, {
"status" : "release",
"repository" : "repo-release",
"timestamp" : "2016-04-20T09:40:50.595+0000",
"user" : "admin",
"ciUser" : "changes",
"timestampDate" : 1461145250595
} ],
This build is matched regardless of which of these criteria we use:
{"promotion.status": {"$nmatch":"aat"}}
{"promotion.status": {"$nmatch":"release"}}
{"promotion.status": {"$nmatch":"integration"}}
with the request:
builds.find({
"$and" : [
{"name": {"$match": "test"}},
{"created": {"$lt": "2016-12-01"}},
{"promotion.status": {"$nmatch":"release"}}
]
}).include("promotion.status").limit(10)
we get this response:
{
"results" : [ {
"build.created" : "2016-04-20T10:12:46.905Z",
"build.created_by" : "test",
"build.modified" : "2016-04-20T11:45:12.309Z",
"build.modified_by" : "admin",
"build.name" : "user",
"build.number" : "2551",
"build.promotions" : [ {
"build.promotion.status" : "aat"
}, {
"build.promotion.status" : "integration"
} ],
"build.url" : "URL"
} ],
"range" : {
"start_pos" : 0,
"end_pos" : 1,
"total" : 1,
"limit" : 10
}
}
If you do not need to use wildcards with $nmatch, you can use $ne instead, for example:
builds.find({
"$and" : [
{"name": {"$match": "test"}},
{"created": {"$lt": "2016-12-01"}},
{"promotion.status": {"$ne":"release"}}
]
}).include("promotion.status").limit(10)
With $nmatch, the following will also work:
builds.find({
"$and" : [
{"name": {"$match": "test"}},
{"created": {"$lt": "2016-12-01"}},
{"promotion.status": {"$nmatch":"releas*"}}
]
}).include("promotion.status").limit(10)
What you are trying to do is ask Artifactory for the "most recent" status of a build and filter based on that. However, this is not how Artifactory treats your AQL query.
Please note that your build does not have a property "build.promotion.status". Instead, your build has a property of type array with the name "build.promotions". Within this array, any number of promotion history items may be set for your build, each including the property "build.promotion.status".
Now suppose your AQL query selects builds that have "build.promotion.status" : "aat". What you are really asking Artifactory is this: please return any build for which any of the elements of the build.promotions array has a matching property "build.promotion.status" : "aat".
So even though build #2551 in your example has been promoted from "aat" to "release", you are asking AQL whether it had promotion status "aat" at any point in time, which it did.
To add to the confusion, when you include("promotion.status"), you are going to see a filtered subset of the promotion history items.
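The matching behavior described above can be modelled in a few lines of JavaScript. This is only a sketch of the semantics this answer describes, not Artifactory's implementation. The helper names `matchesAny` and `includedPromotions` are made up for illustration, and the promotion history is the one from the build in the question, reduced to status names.

```javascript
// Promotion history of build #2551 from the question (status names only).
const promotions = [
  { status: "integration" },
  { status: "aat" },
  { status: "aat" },
  { status: "aat" },
  { status: "release" },
  { status: "release" }
];

// Hypothetical model: a criterion on promotion.status matches the build
// as soon as ANY promotion history item satisfies it.
function matchesAny(items, predicate) {
  return items.some(predicate);
}

// Hypothetical model of include("promotion.status"): only the distinct
// statuses of the items that satisfied the criterion are reported back.
function includedPromotions(items, predicate) {
  return [...new Set(items.filter(predicate).map((p) => p.status))];
}

// Stand-in for {"$nmatch": "release"} without wildcards.
const notRelease = (p) => p.status !== "release";

console.log(matchesAny(promotions, notRelease));
console.log(includedPromotions(promotions, notRelease));
```

Under this model the build matches even though it was later promoted to "release", and the included statuses are only "integration" and "aat", which is consistent with the response shown in the question.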
If you are trying to work around this by asking the reverse question (which builds do not have any promotion history item with "build.promotion.status" = "release"), then even if that were possible with AQL, it would not tell you what the current status is. Nor would it work correctly if your build is ever rolled back.
I think JFrog should actually introduce a "build.promotion.status" field that does what people reasonably expect: give you the current status to display and to query on. Until then, the only solution I can think of is to fetch all the build promotion items and then do the filtering in a higher-level language.
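A minimal sketch of that client-side step, in JavaScript. It assumes you have already fetched each build's `statuses` array in the shape shown in the question. The helpers `currentStatus` and `notYetReleased` are hypothetical, build 2551 reuses timestamps from the question, and build 2552 is made-up sample data.

```javascript
// Hypothetical helper: treat the history item with the latest
// timestampDate as the build's current status.
function currentStatus(statuses) {
  if (statuses.length === 0) return undefined;
  return statuses.reduce((latest, s) =>
    s.timestampDate > latest.timestampDate ? s : latest
  ).status;
}

// Hypothetical helper: keep builds whose current status is not "release".
// Builds with no promotion history at all are kept as well.
function notYetReleased(builds) {
  return builds.filter((b) => currentStatus(b.statuses) !== "release");
}

const builds = [
  { number: "2551", statuses: [
    { status: "integration", timestampDate: 1461141402009 },
    { status: "aat", timestampDate: 1461142571843 },
    { status: "release", timestampDate: 1461145250595 }
  ] },
  // Made-up build that has only reached "aat".
  { number: "2552", statuses: [
    { status: "integration", timestampDate: 1461150000000 },
    { status: "aat", timestampDate: 1461151000000 }
  ] }
];

console.log(notYetReleased(builds).map((b) => b.number));
```

Because the decision is based on the latest history item rather than on any item, a build that was rolled back after a release would be classified by the status it was rolled back to.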

Find documents with sub-documents matching both of two (or more) properties

In Node with Mongoose I want to find an object in the collection Content. It has a list of sub-documents called users, each with the properties stream, user and added. I do this to get all documents with a certain user's _id property in their users.user field:
Content.find( { 'users.user': user._id } ).sort( { 'users.added': -1 } )
This seems to work (although I'm unsure whether .sort is really working here). However, I want to match two fields, like this:
Content.find( { 'users.user': user._id, 'users.stream': stream } ).sort( { 'users.added': -1 } )
That does not seem to work. What is the right way to do this?
Here is a sample document
{
"_id" : ObjectId("551c6b37859e51fb9e9fde83"),
"url" : "https://www.youtube.com/watch?v=f9v_XN7Wxh8",
"title" : "Playing Games in 360°",
"date" : "2015-03-10T00:19:53.000Z",
"author" : "Econael",
"description" : "Blinky is a proof of concept of enhanced peripheral vision in video games, showcasing different kinds of lens projections in Quake (a mod of Fisheye Quake, using the TyrQuake engine).\n\nDemo and additional info here:\nhttps://github.com/shaunlebron/blinky\n\nThanks to #shaunlebron for making this very interesting proof of concept!\n\nSubscribe: http://www.youtube.com/subscription_center?add_user=econaelgaming\nTwitter: https://twitter.com/EconaelGaming",
"duration" : 442,
"likes" : 516,
"dislikes" : 13,
"views" : 65568,
"users" : [
{
"user" : "54f6688c55407c0300b883f2",
"added" : 1427925815190,
"_id" : ObjectId("551c6b37859e51fb9e9fde84"),
"tags" : []
}
],
"images" : [
{
"hash" : "1ab544648d7dff6e15826cda7a170ddb",
"thumb" : "...",
"orig" : "..."
}
],
"tags" : [],
"__v" : 0
}
Use the $elemMatch operator to specify multiple criteria on an array of embedded documents:
Content.find({"users": {$elemMatch: {"user": user.id, "stream": stream}}});
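The difference between the two queries can be illustrated with plain JavaScript arrays. This is an in-memory sketch of the matching rules, not Mongoose or MongoDB code. The functions `dottedMatch` and `elemMatch` and the sample document are made up for illustration.

```javascript
// Hypothetical document: the user/stream pairs sit in two different elements.
const doc = {
  users: [
    { user: "u1", stream: "news" },
    { user: "u2", stream: "games" }
  ]
};

// Dotted-field conditions are checked independently: one element may
// satisfy the user condition and a different one the stream condition.
function dottedMatch(d, user, stream) {
  return d.users.some((u) => u.user === user) &&
         d.users.some((u) => u.stream === stream);
}

// $elemMatch requires a single element to satisfy all conditions at once.
function elemMatch(d, user, stream) {
  return d.users.some((u) => u.user === user && u.stream === stream);
}

console.log(dottedMatch(doc, "u1", "games")); // true
console.log(elemMatch(doc, "u1", "games"));   // false
console.log(elemMatch(doc, "u1", "news"));    // true
```

The first call shows why the dotted form can return documents where the user and the stream belong to different sub-documents.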

MongoDB remove the lowest score, node.js

I am trying to remove the lowest homework score.
I tried this,
var a = db.students.find({"scores.type":"homework"}, {"scores.$":1}).sort({"scores.score":1})
but how can I remove this set of data?
I have 200 documents similar to the one below.
{
"_id" : 148,
"name" : "Carli Belvins",
"scores" : [
{
"type" : "exam",
"score" : 84.4361816750119
},
{
"type" : "quiz",
"score" : 1.702113040528119
},
{
"type" : "homework",
"score" : 22.47397850465176
},
{
"type" : "homework",
"score" : 88.48032660881387
}
]
}
You are trying to remove an element, but the statement you provided only finds it.
Use db.students.remove(<query>) instead.
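Note that `remove` deletes whole documents, so to drop a single element from the `scores` array you would typically use an update with `$pull` instead. A minimal JavaScript sketch of the idea follows. It works on an in-memory copy of the sample document, the helper name `lowestHomeworkPull` is hypothetical, and the shell call that would apply the result is shown only as a comment.

```javascript
// Hypothetical helper: build the filter and $pull update that would remove
// the lowest homework score from one student document.
function lowestHomeworkPull(student) {
  const homework = student.scores.filter((s) => s.type === "homework");
  if (homework.length === 0) return null; // nothing to remove
  const lowest = Math.min(...homework.map((s) => s.score));
  return {
    filter: { _id: student._id },
    update: { $pull: { scores: { type: "homework", score: lowest } } }
  };
}

// The sample document from the question.
const student = {
  _id: 148,
  name: "Carli Belvins",
  scores: [
    { type: "exam", score: 84.4361816750119 },
    { type: "quiz", score: 1.702113040528119 },
    { type: "homework", score: 22.47397850465176 },
    { type: "homework", score: 88.48032660881387 }
  ]
};

const op = lowestHomeworkPull(student);
console.log(JSON.stringify(op));
// In the mongo shell this could then be applied per student with:
//   db.students.update(op.filter, op.update)
```

Running the helper for each of the 200 documents and applying the resulting update leaves the exam, quiz, and higher homework scores untouched.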

How to get double quotes around a number-typed field in a MongoDB aggregation query result?

Scenario: I have a collection 'People' with the following documents:
{
"_id" : ObjectId("512bc95fe835e68f199c8686"),
"name": "David",
"age" : 78
},
{ "_id" : ObjectId("512bc962e835e68f199c8687"),
"name" : "Dave",
"age" : 35
}
When I query using the following code from Node.js
db.People.aggregate(
{ $match : { name : "Dave" } }
);
the output looks like this:
{ "_id" : ObjectId("512bc962e835e68f199c8687"),
"name" : "Dave",
"age" : 35
}
Issue: The above is just a sample of the actual scenario. I want the 'age' field value to be wrapped in double quotes, i.e. for the example above it should be "age": "35".
That is, the full resulting document should look like the following:
{ "_id" : ObjectId("512bc962e835e68f199c8687"),
"name" : "Dave",
"age" : "35"
}
Given that I have a huge number of documents, how can I efficiently get the desired output?
Question: Can someone suggest a clean and efficient way to achieve this?
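One possible approach, sketched in JavaScript, is to convert the field on the client after the query returns. The helper name `stringifyAge` is hypothetical and the sample result is reduced to the fields relevant here. On MongoDB 4.0 or later, an `$addFields` stage using `$toString` could do the same conversion on the server; that assumes a server version the question does not state.

```javascript
// Hypothetical helper: return copies of the documents with `age` as a string.
function stringifyAge(docs) {
  return docs.map((doc) => ({ ...doc, age: String(doc.age) }));
}

// Reduced stand-in for the aggregation result shown above.
const results = [{ name: "Dave", age: 35 }];

const converted = stringifyAge(results);
console.log(JSON.stringify(converted)); // [{"name":"Dave","age":"35"}]
```

The conversion copies each document rather than mutating it, so the original query results stay available with their numeric values.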
