NiFi Jolt: how to select the first non-null value in a JSON array (transform)

I have a problem that I am trying to solve in NiFi and would love your help in coming up with a solution. I have thought of using a Jolt transform to achieve this, but I am open to other suggestions.
I have a JSON array that looks like this:
[
{
"val1": "AAA",
"val2": "",
"val3": "111",
"val4": "red"
},
{
"val1": "BBB",
"val2": "2",
"val3": "222",
"val4": "blue"
},
{
"val1": "CCC",
"val2": "",
"val3": "333",
"val4": "orange"
},
{
"val1": "DDD",
"val2": "2",
"val3": "4444",
"val4": "green"
}
]
and I wrote a Jolt spec
[
{
"operation": "shift",
"spec": {
"*": {
"val1": "&",
"val2": "&",
"val3": "&"
},
"0": {
"val4": "&"
}
}
}
]
that transforms the JSON array into:
{
"val4" : "red",
"val1" : [ "BBB", "CCC", "DDD" ],
"val2" : [ "2", "", "2" ],
"val3" : [ "222", "333", "4444" ]
}
However, this is not exactly the outcome I am looking for. I need val2 to be a single value: I want to ignore all the empty-string occurrences and select the first non-empty string that is available.
val2 is either an empty string "" or a value that occurs repeatedly, e.g. "2". (I am using 2 as an example here; val2 can be anything, like 3, 123, or 345, but if it is 123, every non-empty occurrence of val2 will be 123.)
Sample desired output:
{
"val4" : "red",
"val1" : [ "BBB", "CCC", "DDD" ],
"val2" : "2",
"val3" : [ "222", "333", "4444" ]
}
Any help would be appreciated. Thank you in advance.
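This is not a Jolt spec, only a minimal Python sketch of the rule described above, which can be handy for checking the expected output before writing or testing a spec. It assumes, as the Jolt output shows, that the first element contributes only val4 (the literal "0" entry takes precedence over "*" for that element). `first_non_empty` and `build_output` are hypothetical helper names.

```python
import json

# The input array from the question.
records = [
    {"val1": "AAA", "val2": "", "val3": "111", "val4": "red"},
    {"val1": "BBB", "val2": "2", "val3": "222", "val4": "blue"},
    {"val1": "CCC", "val2": "", "val3": "333", "val4": "orange"},
    {"val1": "DDD", "val2": "2", "val3": "4444", "val4": "green"},
]


def first_non_empty(values):
    """Return the first value that is neither None nor "", or None if there is none."""
    return next((v for v in values if v not in (None, "")), None)


def build_output(records):
    # Like the "0" entry in the spec, element 0 only contributes val4;
    # the "*" entry collects val1/val2/val3 from the remaining elements.
    rest = records[1:]
    return {
        "val4": records[0]["val4"],
        "val1": [r["val1"] for r in rest],
        # val2 collapses to one scalar: its first non-empty occurrence.
        "val2": first_non_empty(r["val2"] for r in rest),
        "val3": [r["val3"] for r in rest],
    }


print(json.dumps(build_output(records), indent=2))
```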

I figured it out... silly me.
I used the QueryRecord processor with an ORDER BY clause and then used a Jolt transform (JoltTransformJSON) to select the first record.
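The answer doesn't show the exact query or spec, so here is a rough Python emulation of the idea, assuming the ORDER BY sorted val2 in descending order (something like `ORDER BY val2 DESC` in QueryRecord). It only works because all non-empty val2 values are identical; if they differed, a descending sort would return the largest value rather than the first one.

```python
# Only val1/val2 from the question's input are needed for this sketch.
records = [
    {"val1": "AAA", "val2": ""},
    {"val1": "BBB", "val2": "2"},
    {"val1": "CCC", "val2": ""},
    {"val1": "DDD", "val2": "2"},
]

# Step 1 (QueryRecord, assumed ORDER BY val2 DESC): the empty string sorts
# before every non-empty string, so a descending sort moves records with a
# non-empty val2 to the front. Python's sort is stable even with
# reverse=True, so ties keep their original order.
ordered = sorted(records, key=lambda r: r["val2"], reverse=True)

# Step 2 (Jolt): keep only the first record of the sorted array.
first_val2 = ordered[0]["val2"]
print(first_val2)
```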

Related

Filtering Data in JSON based on value instead of Index - Kusto Query Language

I am trying to extract a specific field from JSON by filtering the data based on its value instead of its index.
For example, my JSON looks like this:
{
"AllData": [
{
"ID": "1",
"Value": "Value1"
},
{
"ID": "2",
"Value": "Value2"
},
{
"ID": "3",
"Value": "Value3"
},
{
"ID": "4",
"Value": "Value4"
},
{
"ID": "5",
"Value": "Value5"
}
]
}
I need to project the section (ID and Value) where Value = ValueX. But ValueX is not always at index X; it can be at any other index. So I cannot use an index when projecting; I need to project based on the value. I can use the contains operator in my where clause, which helps filter the arrays (the list of AllData arrays), as shown below:
MyDataSet
| where parse_json(MyJson) contains("Value5")
| project MyJson[5].ID, MyJson[5].Value // this may give wrong result because Value5 can be at some other index
Any suggestions would be helpful.
You can use mv-apply: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/mv-applyoperator
let my_value = "Value3";
print d = dynamic({"AllData": [
{
"ID": "1",
"Value": "Value1"
},
{
"ID": "2",
"Value": "Value2"
},
{
"ID": "3",
"Value": "Value3"
},
{
"ID": "4",
"Value": "Value4"
},
{
"ID": "5",
"Value": "Value5"
}
]
})
| mv-apply d = d.AllData on (
project ID = d.ID, Value = d.Value
| where Value == my_value
)
Result:
ID   Value
3    Value3
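To see why this does not depend on the index, here is a rough Python model of what the mv-apply subquery does (an approximation, not how Kusto actually evaluates it): each array element becomes its own row, and the filter runs on the element's Value, so its position in the array never matters. `mv_apply_filter` is a hypothetical name.

```python
# Same shape as the dynamic() literal in the answer.
data = {"AllData": [{"ID": str(i), "Value": f"Value{i}"} for i in range(1, 6)]}
my_value = "Value3"


def mv_apply_filter(doc, value):
    """Expand AllData into one row per element, then keep the rows whose Value matches."""
    # mv-apply d = d.AllData on (project ID = d.ID, Value = d.Value ...)
    rows = ({"ID": item["ID"], "Value": item["Value"]} for item in doc["AllData"])
    # ... | where Value == my_value
    return [row for row in rows if row["Value"] == value]


print(mv_apply_filter(data, my_value))
```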

MongoDB Aggregate with sum of array object values

I have a collection with the following data:
{ "id": 1,
"name": "abc",
"age" : "12",
"quizzes": [
{
"id": "1",
"time": "10"
},
{
"id": "2",
"time": "20"
}
]
},
{ "id": 2,
"name": "efg",
"age" : "20",
"quizzes": [
{
"id": "3",
"time": "11"
},
{
"id": "4",
"time": "25"
}
]
},
...
I would like to use a MongoDB aggregation to sum the quiz times for each document and store the result in a totalTimes field.
This is the result I would like to get from the query:
{ "id": 1,
"name": "abc",
"age" : "12",
"totalTimes": "30",
"quizzes": [
{
"id": "1",
"time": "10"
},
{
"id": "2",
"time": "20"
}
]
},
{ "id": 2,
"name": "efg",
"age" : "20",
"totalTimes": "36",
"quizzes": [
{
"id": "3",
"time": "11"
},
{
"id": "4",
"time": "25"
}
]
},
...
How can I write a query that returns the sum of the quiz times?
This is quite simple using $reduce:
db.collection.aggregate([
{
$addFields: {
totalTimes: {
$reduce: {
input: "$quizzes",
initialValue: 0,
in: {
$sum: [
{
$toInt: "$$this.time"
},
"$$value"
]
}
}
}
}
}
])
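A small Python model of what the $reduce stage computes per document, as a sketch only: `$$value` is the running total that starts at `initialValue` 0, `$$this` is the current quiz, and `int()` stands in for `$toInt`. Note that this produces a number (30), not the string "30" shown in the question; wrapping the `$reduce` in `$toString` (MongoDB 4.0+, like `$toInt`) should give the string form if you really need it. `add_total_times` is a hypothetical name.

```python
from functools import reduce

docs = [
    {"id": 1, "name": "abc", "age": "12",
     "quizzes": [{"id": "1", "time": "10"}, {"id": "2", "time": "20"}]},
    {"id": 2, "name": "efg", "age": "20",
     "quizzes": [{"id": "3", "time": "11"}, {"id": "4", "time": "25"}]},
]


def add_total_times(doc):
    # $reduce: start from 0 ($$value) and, for each quiz ($$this),
    # add its time converted to an integer (like $toInt).
    total = reduce(lambda value, this: value + int(this["time"]), doc["quizzes"], 0)
    # $addFields: return a copy of the document with the new field added.
    return {**doc, "totalTimes": total}


for d in map(add_total_times, docs):
    print(d["id"], d["totalTimes"])
```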

How do I remove a list from an array of lists based on a condition using groovy?

I am new to Groovy and need help removing an entire entry from a list if it does not meet a criterion.
Here is the JSON:
{
"School" : "New Elementary School",
"District" : "District1",
"City" : "NewTown",
"Students" : [ {
"Name": "Student1",
"Grade": "1"
}, {
"Name": "Student2",
"Grade": "2"
}, {
"Name": "Student3",
"Grade": "1"
}, {
"Name": "Student4",
"Grade": "1"
}, {
"Name": "Student5",
"Grade": "1"
} ]
}
I want JSON that contains only the Grade 1 students, i.e. with Student2 removed.
The output should be:
{
"School" : "New Elementary School",
"District" : "District1",
"City" : "NewTown",
"Students" : [ {
"Name": "Student1",
"Grade": "1"
}, {
"Name": "Student3",
"Grade": "1"
}, {
"Name": "Student4",
"Grade": "1"
}, {
"Name": "Student5",
"Grade": "1"
} ]
}
I have the loop and the condition in place, but I couldn't find anything online about removing an entire entry from a list.
You can use the removeAll method to do that. For example:
List a = [[a:1],[a:2],[a:1]]
a.removeAll{ it.a==2 }
print(a)
[[a:1], [a:1]]
In your case (removeAll modifies the list in place and returns a boolean, so don't assign its result back to students):
students.removeAll { it.Grade != "1" }
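For comparison, here is the same in-place filtering as a Python sketch (not Groovy): slice assignment replaces the list's contents without creating a new list object, which is the closest analogue to `removeAll` mutating the existing list, so the parsed document sees the change without any reassignment.

```python
import json

doc = json.loads("""
{
  "School": "New Elementary School",
  "District": "District1",
  "City": "NewTown",
  "Students": [
    {"Name": "Student1", "Grade": "1"},
    {"Name": "Student2", "Grade": "2"},
    {"Name": "Student3", "Grade": "1"},
    {"Name": "Student4", "Grade": "1"},
    {"Name": "Student5", "Grade": "1"}
  ]
}
""")

students = doc["Students"]
# Keep only Grade 1 students; slice assignment mutates the same list object,
# so doc["Students"] reflects the change (like removeAll in Groovy).
students[:] = [s for s in students if s["Grade"] == "1"]

print(json.dumps(doc, indent=2))
```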

Iterating over array in Cosmos DB

I have a Cosmos DB where a document called Auditlog resides.
The simplified structure is as follows:
[
{
"id": "1",
"name": "A",
"messages": [
{
"gps": {
"src": "GPS"
},
"ts": "0"
}
]
},
{
"id": "2",
"name": "B",
"messages": [
{
"gps": {
"src": "DR"
},
"ts": "1"
}
]
}
]
I want to filter the document to get all entries that have src: GPS.
The result also needs to show the ID.
I have no idea how to accomplish this.
I tried using the IN operator, but without luck:
with the IN operator, I can't display the ID.
I tried:
SELECT * FROM c
IN Auditlog.messages
WHERE c.gps.src = "GPS"
The result is correct but I need the ID to be displayed in the result.
The following just results in an array of empty objects:
SELECT c.id FROM c
IN Auditlog.messages
WHERE c.gps.src = "GPS"
Can someone please help me?
Thanks in advance.
Regards
SELECT c.id AS id
FROM c JOIN a IN c.messages
WHERE a.gps.src = "GPS"
The result will be (only document 1 has a message whose src is GPS):
[
{
"id": "1"
}
]
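A rough Python model of the JOIN query, as a sketch rather than how Cosmos DB executes it: `JOIN a IN c.messages` yields one (document, message) pair per message, the WHERE clause filters on the message, and the SELECT projects the parent document's id. One consequence: a document with several GPS messages would appear once per matching message; `SELECT DISTINCT c.id` should collapse those duplicates. `ids_with_src` is a hypothetical name.

```python
docs = [
    {"id": "1", "name": "A", "messages": [{"gps": {"src": "GPS"}, "ts": "0"}]},
    {"id": "2", "name": "B", "messages": [{"gps": {"src": "DR"}, "ts": "1"}]},
]


def ids_with_src(docs, src):
    # JOIN a IN c.messages: one (c, a) pair per message of each document.
    pairs = ((c, a) for c in docs for a in c.get("messages", []))
    # WHERE a.gps.src = src, then SELECT c.id AS id.
    return [{"id": c["id"]} for c, a in pairs if a.get("gps", {}).get("src") == src]


print(ids_with_src(docs, "GPS"))
```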

How to find duplicate values in mongodb using distinct query?

I am working on a MongoDB distinct query. I have a collection with repeated entries (the same id with different created_at values), and I want to fetch it without the repeated values.
Sample JSON:
{
"posts": [{
"id": "580a2eb915a0161010c2a562",
"name": "\"Ah Me Joy\" Porter",
"created_at": "15-10-2016"
}, {
"id": "580a2eb915a0161010c2a562",
"name": "\"Ah Me Joy\" Porter",
"created_at": "25-10-2016"
}, {
"id": "580a2eb915a0161010c2a562",
"name": "\"Ah Me Joy\" Porter",
"created_at": "01-10-2016"
}, {
"id": "580a2eb915a0161010c2bf572",
"name": "Hello All",
"created_at": "05-10-2016"
}]
}
MongoDB query:
db.getCollection('posts').find({"id" : ObjectId("580a2eb915a0161010c2a562")})
So I would like to know how MongoDB's distinct query works. Please take a look at my post and let me know.
Try the following:
db.getCollection('posts').distinct("id")
It returns all the unique IDs in the posts collection:
["580a2eb915a0161010c2a562", "580a2eb915a0161010c2bf572"]
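A small Python sketch of what distinct returns for a scalar field: each value once, and only the values, not the documents (array-valued fields, which distinct treats element by element, are not modeled). The first-seen order is a choice of this sketch, not something distinct promises. `distinct` here is a hypothetical helper.

```python
# Flattened sample from the question: one dict per post.
posts = [
    {"id": "580a2eb915a0161010c2a562", "created_at": "15-10-2016"},
    {"id": "580a2eb915a0161010c2a562", "created_at": "25-10-2016"},
    {"id": "580a2eb915a0161010c2a562", "created_at": "01-10-2016"},
    {"id": "580a2eb915a0161010c2bf572", "created_at": "05-10-2016"},
]


def distinct(docs, field):
    """Return each value of `field` once, in first-seen order (scalar fields only)."""
    seen = {}
    for doc in docs:
        if field in doc:
            # dict keys are unique and keep insertion order
            seen.setdefault(doc[field], None)
    return list(seen)


print(distinct(posts, "id"))
```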
From the MongoDB docs:
The example uses the inventory collection that contains the following documents:
{ "_id": 1, "dept": "A", "item": { "sku": "111", "color": "red" }, "sizes": [ "S", "M" ] }
{ "_id": 2, "dept": "A", "item": { "sku": "111", "color": "blue" }, "sizes": [ "M", "L" ] }
{ "_id": 3, "dept": "B", "item": { "sku": "222", "color": "blue" }, "sizes": "S" }
{ "_id": 4, "dept": "A", "item": { "sku": "333", "color": "black" }, "sizes": [ "S" ] }
To Return Distinct Values for a Field (dept):
db.inventory.distinct( "dept" )
The method returns the following array of distinct dept values:
[ "A", "B" ]
Reference:
https://docs.mongodb.com/v3.2/reference/method/db.collection.distinct/
As I understand it, you want distinct results that eliminate the documents with duplicate ids in that collection.
Using distinct in MongoDB only returns a list of the distinct values:
db.getCollection('posts').distinct("id");
["580a2eb915a0161010c2a562", "580a2eb915a0161010c2bf572"]
To get the documents themselves, you should look into MongoDB aggregation:
db.posts.aggregate([
{ "$group" : { "_id" : "$id", "name" : { "$first" : "$name" }, "created_at" : { "$first" : "$created_at" } } }
])
The output will be a list of results in which the documents with duplicate ids are eliminated.
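A Python model of that $group stage, as a sketch: one result per distinct id, with name and created_at taken from the first document seen in that group. In MongoDB, "first" means first in the order documents reach $group, which is not guaranteed without a preceding $sort stage. Also note that created_at is stored as DD-MM-YYYY strings, which do not sort chronologically, so sorting on it would not pick the earliest date. `group_first` is a hypothetical name.

```python
posts = [
    {"id": "580a2eb915a0161010c2a562", "name": '"Ah Me Joy" Porter', "created_at": "15-10-2016"},
    {"id": "580a2eb915a0161010c2a562", "name": '"Ah Me Joy" Porter', "created_at": "25-10-2016"},
    {"id": "580a2eb915a0161010c2a562", "name": '"Ah Me Joy" Porter', "created_at": "01-10-2016"},
    {"id": "580a2eb915a0161010c2bf572", "name": "Hello All", "created_at": "05-10-2016"},
]


def group_first(docs):
    """One result per distinct id, keeping name/created_at from the first doc seen."""
    groups = {}
    for doc in docs:
        if doc["id"] not in groups:
            # $first: only the first document in each group sets these fields
            groups[doc["id"]] = {"_id": doc["id"], "name": doc["name"], "created_at": doc["created_at"]}
    return list(groups.values())


for row in group_first(posts):
    print(row)
```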
