How to create custom formatted complex JSONs with Logstash?

Is it possible in Logstash, using filters, to adjust the resulting JSON so as to build nested documents?
I will need to build JSON like the following:
[
{
"name" : "hd_used",
"columns" : ["value", "host", "mount"],
"points" : [
[23.2, "serverA", "/mnt"]
]
}
]
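This target shape looks like the InfluxDB 0.8 write format (name / columns / points). Restructuring like this is typically done in a Logstash ruby filter or a custom output rather than with the standard filters alone; as a minimal sketch of the reshaping only, here it is in Python (not Logstash configuration), assuming a flat event with the hypothetical fields value, host, and mount:
import json

# Hypothetical flat Logstash-style event with the fields from the question.
event = {"value": 23.2, "host": "serverA", "mount": "/mnt"}

# Reshape into the nested name/columns/points structure shown above.
payload = [{
    "name": "hd_used",
    "columns": ["value", "host", "mount"],
    "points": [[event["value"], event["host"], event["mount"]]],
}]
print(json.dumps(payload, indent=2))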

Related

Inject data from MongoDB into chart.js

I'm building a little app and I need to inject data from MongoDB, which looks like this:
And here is my chart, which looks like this:
labels : ['label1', 'label2', 'label3'],
datasets: [ {
data: [ 170.3, 108.75, 35.3, 37.98 ]
} ]
What I would like is:
In labels: it needs to contain all the "titre" values (see the picture); if a "titre" is used more than once, it should appear only once.
In data: it needs to add up all the "montant" values for each category.
I'm sure nobody will understand 🤪 Sorry
Example: there are 3 documents with the "titre" Groceries.
Their "montant" values are 4, 7, and 12, so the chart needs to look like this:
labels : ['Groceries', 'label2', 'label3'],
datasets: [ {
data: [ 23, ..., ... ]
} ]
How can I achieve this in the component.ts?
Thank you everyone!
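One way to approach this is to let MongoDB do the grouping with an aggregation pipeline: group by "titre", sum "montant", then split the result into the labels and data arrays. Here is a minimal sketch with pymongo (database and collection names are hypothetical; field names come from the question); the same pipeline runs unchanged in the Node.js driver a component.ts backend would call:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["expenses"]  # hypothetical database/collection names

# One row per distinct "titre", with its "montant" values summed.
rows = list(coll.aggregate([
    {"$group": {"_id": "$titre", "total": {"$sum": "$montant"}}},
    {"$sort": {"_id": 1}},
]))

labels = [r["_id"] for r in rows]   # e.g. ["Groceries", ...]
data = [r["total"] for r in rows]   # e.g. [23, ...]  (4 + 7 + 12)
print(labels, data)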

Indexing a JSON file with multiple documents in Elasticsearch

I am new to Elasticsearch. I want to index a JSON file and perform search queries against Elasticsearch.
How can I index this JSON and run queries that return the matching value when I pass the parameter "field3.innerfield" : "someval"?
I have tried indexing this file with helpers.bulk and search, but it returns all the fields instead of just the selected field.
Below is the JSON sample
[
{
"id": "someid",
"metadata": {
"docType": "value",
"otherfield": " ",
morefields
.
.
},
"field1":"value1",
"field2":"value2,
"field3": [
{
"innerfield": "someval",
"innerfield1":[
"kind of a paragraph"
]
}
],
"field4": [
{
"innerfield": "someval",
"innerfield1": "kind of a paragraph"
}
]
},
{ again the format repeats with different id but same fields
},
{
}
]
Your question lacks clarity; what I understood is that you want to fetch values by key from a nested JSON. You can do that as shown below.
Parse it multiple times and adapt it to your needs.
import json

# "data" is a pandas Series of JSON strings; "metadata" is a nested
# object, so a single json.loads is enough, and the key is "docType".
data = data.apply(
    lambda x: json.loads(x).get("metadata", {}).get("docType") if x else None
)
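If the goal is also to get back only the matching field rather than the whole document, Elasticsearch can do that at query time with source filtering. A minimal sketch with the official Python client, assuming default dynamic mapping and a hypothetical index name:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Match on the nested path and return only that part of _source.
resp = es.search(
    index="my-index",  # hypothetical index name
    body={
        "query": {"match": {"field3.innerfield": "someval"}},
        "_source": ["field3.innerfield"],
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])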

How to sort on multiple fields individually using a single index

I am trying to declare multiple fields in a single index, like below, and then sort on a single field only. Is it possible?
Is there any way, using a single combined-fields index, to sort on individual fields dynamically?
{
"index": {
"fields": ["name","createdDate","updatedDate"]
},
"name" : "multi-filter",
"ddoc" : "MultiFilter"
"type" : "json"
}
After that, I can apply a sort over the same sequence of fields:
{
"selector": {"name": "Robert De Niro"},
"sort": [{"name": "asc"}, {"createdDate": "asc"},{"updatedDate": "asc"}]
}
BUT if I change the sequence, or want to filter/sort on a single field, like
{
"selector": {"name": "Robert De Niro"},
"sort": [{"name": "asc"}]
}
it gives the error below. My motive is to use a single index but sort on individual fields. It looks like a limitation of CouchDB, and I may need to create three separate indexes to make this work; I'm still hoping for a better option.
{"error":"no_usable_index","reason":"No index exists for this sort, try indexing by the sort fields."}
I found this answer here: "Unknown Error: mango_idx :: {no_usable_index,missing_sort_index}"
You could define an index with only the field you need, e.g.:
{
"index": {
"fields": ["name"]
},
"name" : "name_sort",
"type" : "json"
}
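In other words, a Mango sort can only use an index whose leading fields match the sort, so sorting on each field individually does mean one single-field index per field. A minimal sketch of creating them over CouchDB's HTTP API with Python's requests (server URL and database name are hypothetical):
import requests

BASE = "http://admin:password@localhost:5984/mydb"  # hypothetical server/db

# One single-field index per sort field.
for field in ["name", "createdDate", "updatedDate"]:
    requests.post(BASE + "/_index", json={
        "index": {"fields": [field]},
        "name": field + "_sort",
        "type": "json",
    })

# A sort on a single field now finds a usable index.
resp = requests.post(BASE + "/_find", json={
    "selector": {"name": "Robert De Niro"},
    "sort": [{"name": "asc"}],
})
print(resp.json()["docs"])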

PySpark: Formatting JSON before input to DataFrame

I want to create a Spark dataframe which contains a list of labelled tweets from a number of separate JSON files. I've tried simply using spark.read.json(files, multiLine=True), but I end up with a _corrupt_record in some files; there's something Spark doesn't seem to like about the format (the JSON is valid, I've checked).
The following is a representation of the format of each JSON object per file that I'm dealing with:
{"annotator": {
"eventsAnnotated" : [ {...} ],
"id" : "0939"
},
"events": [
{"eventid": "039393",
"tweets": [
{
"postID" : "111",
"timestamp" : "01/01/01",
"categories" : [ "Category" ],
"indicatorTerms" : [ ],
"priority" : "Low",
"text" : "text"
},
...]
However, I'm only interested in the tweets section of the JSON and can disregard eventid, or anything included in annotator:
"tweets": [
{
"postID" : "111",
"timestamp" : "01/01/01",
"categories" : [ "Category" ],
"indicatorTerms" : [ ],
"priority" : "Low",
"text" : "text"
},
...]
I'd like that to end up in a Spark dataframe in which postID, timestamp, categories, indicatorTerms, priority, and text are my columns and each row corresponds to one of these JSON entries.
I guess what I'm asking is: how can I read these files into some sort of temporary structure from which I can stream each tweet, line by line, and then transform that into a Spark dataframe? I've seen some posts about RDDs but only managed to confuse myself; I am pretty new to Spark as a whole.
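Assuming the files do parse (multiLine=True is needed since each file holds one multi-line JSON object), the nesting can be flattened inside Spark itself with explode, with no line-by-line streaming required. A minimal sketch, using a hypothetical input path and the schema shown above:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

# Hypothetical input path; one multi-line JSON object per file.
raw = spark.read.json("path/to/files/*.json", multiLine=True)

# Flatten events -> tweets, then keep only the tweet fields as columns.
tweets = (
    raw.select(explode("events").alias("event"))
       .select(explode("event.tweets").alias("tweet"))
       .select("tweet.postID", "tweet.timestamp", "tweet.categories",
               "tweet.indicatorTerms", "tweet.priority", "tweet.text")
)
tweets.show()
If some files really are malformed for Spark's reader, one fallback is to json.load each file in plain Python, collect the tweet dicts into a list, and hand that to spark.createDataFrame.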

Getting counts of embedded collections in a MongoDB Document

I am using MongoDB and the 10gen Node.js driver.
I have a collection in MongoDB that has docs similar to this:
{
_id: ObjectId( "500310affdc47af710000001" ),
name:'something',
tags:['apple','fruit','red'],
created: Date( 1342378221351 )
}
What I would like is to look at all the documents and get a distinct count of all tags across all documents in the collection. I tried the group function and got this:
> db.photos.group(
... {cond:{}
... ,key:{tags:true}
... ,initial:{count:0}
... ,reduce:function(doc,out){out.count++}
... ,finalize:function(out){}});
[
{
"tags" : null,
"count" : 35
},
{
"tags" : [
"#strawberry",
"#friutpicture"
],
"count" : 1
}
]
Which is not right. I can get the distinct values, without the counts, like this:
> db.photos.distinct('tags')
[
"#friutpicture",
"#strawberry",
"#factory",
"#wallpaper",
"#bummber",
"#test",
"#fire",
"#watermark"
]
but I need the counts.
You can use the following in the new Aggregation Framework (2.1+):
> db.photos.aggregate([{$unwind:"$tags"},
{$group:{_id:"$tags"}},
{$group:{_id:"DistinctCount",count:{$sum:1}}}])
Your result will be:
{
"result" : [
{
"_id" : "DistictCount",
"count" : 8
}
],
"ok" : 1
}
The group function won't do what you need, because you need to "unroll" the tags array before you can group on it; that means you need a map function that emits each tag in a document, as well as a reduce. You can use map/reduce if you are stuck on 2.0 or earlier and can't use the aggregation framework. Here is an example that may help.
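For completeness: if what is actually needed is a count per tag rather than the number of distinct tags, the same $unwind applies, with the tag as the group key. A minimal sketch with pymongo (connection string and database name are hypothetical):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical
photos = client["mydb"]["photos"]

# Count occurrences of each individual tag across all documents.
for doc in photos.aggregate([
    {"$unwind": "$tags"},
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]):
    print(doc["_id"], doc["count"])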
