Jolt merge array values from objects in one array into one object

I have the following JSON array:
[ {
"name" : [ "roger", "roger" ],
"state" : [ "primary", "quality" ],
"value" : [ 1, 2 ]
}, {
"name" : [ "david", "david" ],
"state" : [ "primary", "quality" ],
"value" : [ 4, 5 ]
} ]
and I want to get the following JSON object result using Jolt:
{
"name" : [ "roger", "roger" , "david", "david" ],
"state" : [ "primary", "quality" ,"primary", "quality" ],
"value" : [ 1, 2 , 4, 5]
}
Can someone please help me?

You can apply the shift transformation twice, such as
[
{
"operation": "shift",
"spec": {
"*": {
"*": "&.&1"
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"0": {
"*": "&2[]"
},
"1": {
"*": "&2[]"
}
}
}
}
]
In the first step, each value is keyed by its attribute name (&) and its index within the outer array (&1, which resolves to 0 or 1), through the target path "&.&1". In the second step, the values are spread back out through "*": "&2[]", where &2 goes two levels up (traversing two curly braces) to reach the attribute name at the root, so that each array element is appended to the matching output array.
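For reference, the intermediate output after the first shift should look roughly like this (a sketch, not verified output):
{
  "name" : {
    "0" : [ "roger", "roger" ],
    "1" : [ "david", "david" ]
  },
  "state" : {
    "0" : [ "primary", "quality" ],
    "1" : [ "primary", "quality" ]
  },
  "value" : {
    "0" : [ 1, 2 ],
    "1" : [ 4, 5 ]
  }
}
The second shift then walks the "0" and "1" buckets in input order and appends each of their elements to the name, state and value arrays at the root.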

Related

Pymongo query to extract value of a matching key

I have the following document:
{
"_id": "61f7d5cfd0c32b744d3f81c2",
"_form": "61e66b8fd0c32b744d3e24a0",
"_workflow": "61e54fe2d0c32b744d3e0b7c",
"_appUser": "61e6b098d0c32b744d3e3808",
"sectionResponse": [{
"_id": "61f7d5cfd0c32b744d3f81c3",
"name": "Project Details & Goals",
"order": 2,
"fieldResponse": [{
"_id": "61f7d5cfd0c32b744d3f81c4",
"fieldType": "Text",
"name": "Project Name",
"value": "TRT",
"order": 0
},
{
"_id": "61f7d5cfd0c32b744d3f81c5",
"fieldType": "Number",
"name": "Amount Requested",
"value": "20",
"order": 1
},
{
"_id": "61f7d5cfd0c32b744d3f81c6",
"fieldType": "Number",
"name": "Project Cost",
"value": "50",
"order": 1
},
{
"_id": "61f7d5cfd0c32b744d3f81c7",
"fieldType": "Comment",
"name": "Project Goals",
"value": "TRT",
"order": 3
}
]
},
{
"_id": "61f7d5cfd0c32b744d3f81c8",
"name": "Section Heading",
"order": 2,
"fieldResponse": [{
"_id": "61f7d5cfd0c32b744d3f81c9",
"fieldType": "Multiselectdropdown",
"name": "Multiselectdropdown",
"value": "Y",
"order": 0
},
{
"_id": "61f7d5cfd0c32b744d3f81ca",
"fieldType": "Image_Upload",
"name": "Image Upload",
"value": "Y",
"order": 1
}
]
}
],
"order": 2,
"status": "Reviewed",
"updatedAt": "2022-01-31T12:27:59.541Z",
"createdAt": "2022-01-31T12:27:59.541Z",
"__v": 0
}
Inside the document, there is a sectionResponse which contains the responses of multiple sections. Inside this, there is a fieldResponse which contains the name and value. I have to extract the value from all the documents where the name is "Amount Requested".
How can I write a query for such a situation?
Here is a solution that returns only matching material and requires no $unwind.
db.foo.aggregate([
// This stage alone is enough to give you the basic info.
// You will get not only doc _id but also an array of arrays
// (fieldResponse within sectionResponse) containing the whole
// fieldResponse doc. It is slightly awkward but if you need structural data
// other than *just* the value, it is a good start:
{$project: {
// outer filter removes inner filter results where size is 0
// i.e. no "Amount Requested" found.
XX: {$filter: {input:
{$map: {
input: "$sectionResponse", as: "z1", in:
// inner filter gets only name = Amount Requested
{$filter: {input: "$$z1.fieldResponse",
as: "z1",
cond: {$eq:["$$z1.name","Amount Requested"]}
}}
}},
as: "z2",
cond: {$ne: ["$$z2", [] ]}
}}
}}
])
which yields (given a slightly expanded input set where subdocs were copied but the value and order changed for clarity):
{
"_id" : 0,
"XX" : [
[
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "20",
"order" : 1
},
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "77",
"order" : 18
}
],
[
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "99",
"order" : 818
}
]
]
}
{
"_id" : 1,
"XX" : [
[
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "333",
"order" : 1
}
]
]
}
{ "_id" : 2, "XX" : [ ] }
If you don't want to know about top level docs that contained
NO fieldResponses where name = "Amount Requested" then append this stage:
{$match: {XX: {$ne: [] }}}
Finally, if you really want just the values, append this reduce stage:
,{$addFields: {XX: {$reduce: {
input: "$XX",
initialValue: [],
in: {$concatArrays: ["$$value",
{$map: {input: "$$this",
as:"z",
in: "$$z.value"
}} ] }
}}
}}
which yields:
{ "_id" : 0, "XX" : [ "20", "77", "99" ] }
{ "_id" : 1, "XX" : [ "333" ] }
If you want a little more than just the value (like order, for example), then have $map return a doc instead of a scalar, e.g.:
{$map: {input: "$$this",
as:"z",
in: {v:"$$z.value",o:"$$z.order"}
}} ] }
to yield:
{
"_id" : 0,
"XX" : [
{
"v" : "20",
"o" : 1
},
{
"v" : "77",
"o" : 18
},
{
"v" : "99",
"o" : 818
}
]
}
{ "_id" : 1, "XX" : [ { "v" : "333", "o" : 1 } ] }
Again, the input set provided by the OP was expanded with additional {name:"Amount Requested"} subdocs tossed into different sectionResponse arrays to generate a more complex structure.
Maybe something like this, which you can easily adapt to Python, assuming you only need the value from the sectionResponse.$[].fieldResponse.$[] elements whose name is "Amount Requested":
db.collection.aggregate([
{
$match: {
"sectionResponse.fieldResponse.name": "Amount Requested"
}
},
{
"$project": {
"sectionResponse": {
"$filter": {
"input": {
"$map": {
"input": "$sectionResponse",
"as": "somesub",
"in": {
"_id": "$$somesub._id",
"fieldResponse": {
"$filter": {
"input": "$$somesub.fieldResponse",
"as": "sub",
"cond": {
"$eq": [
"$$sub.name",
"Amount Requested"
]
}
}
}
}
}
},
"as": "some",
"cond": {
"$gt": [
{
"$size": "$$some.fieldResponse"
},
0
]
}
}
}
}
},
{
$unwind: "$sectionResponse"
},
{
$unwind: "$sectionResponse.fieldResponse"
},
{
$project: {
value: "$sectionResponse.fieldResponse.value"
}
}
])
Match the documents containing at least one element with sectionResponse.fieldResponse.name: "Amount Requested".
Project/map all sectionResponse.fieldResponse elements with name = "Amount Requested" (non-empty elements only).
Unwind the sectionResponse array.
Unwind the fieldResponse array.
Project only the value field.
For best performance, an index on "sectionResponse.fieldResponse.name" should be added.
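Since the question asks about Pymongo, here is a minimal sketch of running this second pipeline from Python. The connection string, database, and collection names are assumptions to adapt to your setup; the shell pipeline translates almost verbatim once every key and operator is quoted:
from pymongo import MongoClient

# Hypothetical connection/collection names; adjust them to your environment.
client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["mycollection"]

pipeline = [
    {"$match": {"sectionResponse.fieldResponse.name": "Amount Requested"}},
    {"$project": {
        "sectionResponse": {
            "$filter": {
                "input": {
                    "$map": {
                        "input": "$sectionResponse",
                        "as": "somesub",
                        "in": {
                            "_id": "$$somesub._id",
                            "fieldResponse": {
                                "$filter": {
                                    "input": "$$somesub.fieldResponse",
                                    "as": "sub",
                                    "cond": {"$eq": ["$$sub.name", "Amount Requested"]}
                                }
                            }
                        }
                    }
                },
                "as": "some",
                "cond": {"$gt": [{"$size": "$$some.fieldResponse"}, 0]}
            }
        }
    }},
    {"$unwind": "$sectionResponse"},
    {"$unwind": "$sectionResponse.fieldResponse"},
    {"$project": {"value": "$sectionResponse.fieldResponse.value"}}
]

# Each resulting doc carries its original _id plus the extracted value.
for doc in coll.aggregate(pipeline):
    print(doc)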

Using a Jolt spec, how to reverse-reduce a list of dictionaries by a key

Using the following code I was able to map a list of dictionaries by a key
import json

values_list = [{"id" : 1, "user":"Rick", "title":"More JQ"}, {"id" : 2, "user":"Steve", "title":"Beyond"}, {"id" : 1, "user":"Rick", "title":"Winning"}]

result = {}
for data in values_list:
    id = data['id']
    user = data['user']
    title = data['title']
    if id not in result:
        result[id] = {
            'id': id,
            'user': user,
            'books': {'titles': []}
        }
    result[id]['books']['titles'].append(title)

print(json.dumps((list(result.values())), indent=4))
Knowing how clean Jolt specs are, and wanting to keep the transformation schema separate from the code:
Is there a way to use a Jolt spec to achieve the same result?
The expected result:
[
{
"id": 1,
"user": "Rick",
"books": {
"titles": [
"More JQ",
"Winning"
]
}
},
{
"id": 2,
"user": "Steve",
"books": {
"titles": [
"Beyond"
]
}
}
]
You can use a chain of three consecutive specs, such as
[
{
"operation": "shift",
"spec": {
"*": {
"*": "#(1,id).&",
"title": "#(1,id).books.&s[]"
}
}
},
{
"operation": "shift",
"spec": {
"*": ""
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"id": "ONE",
"user": "ONE"
}
}
}
]
In the first spec, the objects sharing a common id value are grouped under that id by the "#(1,id)." expression (see the intermediate output sketched below).
In the second spec, the integer keys (1, 2) of the outermost object are removed, leaving an array of the grouped objects.
In the last spec, only the first of the repeated id and user values is picked.
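For reference, the intermediate output after the first shift should look roughly like this sketch (not verified output); the second spec then strips the "1" and "2" wrapper keys, and the cardinality spec collapses the repeated id and user values:
{
  "1" : {
    "id" : [ 1, 1 ],
    "user" : [ "Rick", "Rick" ],
    "books" : {
      "titles" : [ "More JQ", "Winning" ]
    }
  },
  "2" : {
    "id" : 2,
    "user" : "Steve",
    "books" : {
      "titles" : [ "Beyond" ]
    }
  }
}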

How do I use Jolt to flatten a JSON array of n objects with the key?

I have a fairly straightforward use case, but I can't seem to wrap my head around the shift specification that would make this transpose possible. It's primarily just flattening the tree hierarchy into simple output arrays.
How would I turn this input JSON:
{
"123": [
{
"VALUE_ONE": "Y",
"VALUE_TWO": "12"
},
{
"VALUE_ONE": "N",
"VALUE_TWO": "2"
}
],
"456": [
{
"VALUE_ONE": "Y",
"VALUE_TWO": "35"
}
]
}
Into this output:
[
{
"value_one_new_name": "Y",
"value_two_new_name": "12",
"key": "123"
},
{
"value_one_new_name": "N",
"value_two_new_name": "2",
"key": "123"
},
{
"value_one_new_name": "Y",
"value_two_new_name": "35",
"key": "456"
}
]
NOTE that I don't know what the key ("456", "123", etc.) would be for each object, so the Jolt spec needs to be generic enough to convert any keys; the only known field names are "VALUE_ONE" and "VALUE_TWO".
These steps will do the trick:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"VALUE_ONE": "&2.[&1].value_one_new_name",
"VALUE_TWO": "&2.[&1].value_two_new_name",
"$1": "&2.[&1].key"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": "[]"
}
}
}
]
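In the first shift, &2 resolves to the top-level key ("123" or "456"), &1 to the element's index within that key's array, and "$1" writes the matched top-level key itself into "key". After that step the intermediate output should look roughly like this sketch (not verified output); the second shift then simply collects every object into one top-level array:
{
  "123" : [
    { "value_one_new_name" : "Y", "value_two_new_name" : "12", "key" : "123" },
    { "value_one_new_name" : "N", "value_two_new_name" : "2", "key" : "123" }
  ],
  "456" : [
    { "value_one_new_name" : "Y", "value_two_new_name" : "35", "key" : "456" }
  ]
}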

Find with arrayFilters using Mongoose

I have to filter the objects that contain status "C" in their comments (if at least one comment has status "C", then that object alone should be printed). I tried using arrayFilters but I don't get the exact result.
{
"_id" : ObjectId("5b8f84379f432a42383a85bb"),
"projectID" : ObjectId("00000000e614c33390237ce3"),
"inspection_data" : [
{
"locationAspects" : [
{
"aspectname" : "Ground floor",
"comments" : [
{
"status" :"C",
"comment" : [
"good"
],
"_id" : ObjectId("5b8f84379f16400f884d9974")
}
],
"_id" : ObjectId("5b8f84379f16400f884d9975")
},
{
"aspectname" : "Second floor",
"comments" : [
{
"status" :"P",
"comment" : [
"nothing"
],
"_id" : ObjectId("5b8f84379f16400f884d9971")
}
],
"_id" : ObjectId("5b8f84379f16400f884d9972")
},
],
"published_date" : ISODate("2018-09-05T07:22:31.017Z"),
"_id" : ObjectId("5b8f84379f16400f884d9976")
},
{
"locationAspects" : [
{
"aspectname" : "Ground floor",
"comments" : [
{
"status" :"P",
"comment" : [
"good"
],
"_id" : ObjectId("5b8f84379f16400f884d9974")
}
],
"_id" : ObjectId("5b8f84379f16400f884d9975")
}
],
"published_date" : ISODate("2018-09-05T07:22:31.017Z"),
"_id" : ObjectId("5b8f84379f16400f884d9976")
}
]
}
The inspection_data array has two objects, but only one of them contains a comment with status "C", so only that object should be returned.
Expected Result
[ {
"locationAspects" : [
{
"aspectname" : "Ground floor",
"comments" : [
{
"status" :"C",
"comment" : [
"good"
],
"_id" : ObjectId("5b8f84379f16400f884d9974")
}
],
"_id" : ObjectId("5b8f84379f16400f884d9975")
},
{
"aspectname" : "Second floor",
"comments" : [
{
"status" :"P",
"comment" : [
"nothing"
],
"_id" : ObjectId("5b8f84379f16400f884d9971")
}
],
"_id" : ObjectId("5b8f84379f16400f884d9972")
},
],
"published_date" : ISODate("2018-09-05T07:22:31.017Z"),
"_id" : ObjectId("5b8f84379f16400f884d9976")
}]
The object above is the only one having status "C": if at least one comment's status is "C", that object alone has to be displayed.
You need $filter to process inspection_data. Things get complicated since you have multiple levels of nesting, so before you can apply the $in condition you need to get an array of all statuses for a single inspection_data element. To achieve that you can use a direct path like "$$data.locationAspects.comments.status", but it returns an array of arrays, for example:
[ [ "C" ], [ "P" ] ] for the first inspection_data element and [ [ "P" ] ] for the second.
So you have to flatten that array and that can be achieved using $reduce and $concatArrays. Try:
db.col.aggregate([
{ $match: { projectID: ObjectId("00000000e614c33390237ce3") } },
{
$project: {
filtered_inspection_data: {
$filter: {
input: "$inspection_data",
as: "data",
cond: {
$let: {
vars: {
statuses: {
$reduce: {
input: "$$data.locationAspects.comments.status",
initialValue: [],
in: { $concatArrays: [ "$$this", "$$value" ] }
}
}
},
in: { $in: [ "C", "$$statuses" ] }
}
}
}
}
}
}
])
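To illustrate the flattening (a sketch of the intermediate values, not verified output): for the first inspection_data element the $reduce turns [ [ "C" ], [ "P" ] ] into [ "P", "C" ], and for the second element it turns [ [ "P" ] ] into [ "P" ], so the condition { $in: [ "C", "$$statuses" ] } keeps only the first element.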
EDIT: to match "C" or "P", you can replace { $in: [ "C", "$$statuses" ] } with the following line:
{ $or: [ { $in: [ "C", "$$statuses" ] }, { $in: [ "P", "$$statuses" ] } ] }

Converting a MongoDB aggregate into an ArangoDB COLLECT

I'm migrating data from Mongo to Arango and I need to reproduce a $group aggregation. I have successfully reproduced the results but I'm concerned that my approach may be sub-optimal. Can the AQL be improved?
I have a collection of data that looks like this:
{
"_id" : ObjectId("5b17f9d85b2c1998598f054e"),
"department" : [
"Sales",
"Marketing"
],
"region" : [
"US",
"UK"
]
}
{
"_id" : ObjectId("5b1808145b2c1998598f054f"),
"department" : [
"Sales",
"Marketing"
],
"region" : [
"US",
"UK"
]
}
{
"_id" : ObjectId("5b18083c5b2c1998598f0550"),
"department" : "Development",
"region" : "Europe"
}
{
"_id" : ObjectId("5b1809a75b2c1998598f0551"),
"department" : "Sales"
}
Note that the value can be a string, an array, or not present.
In Mongo I'm using the following code to aggregate the data:
db.test.aggregate([
{
$unwind:{
path:"$department",
preserveNullAndEmptyArrays: true
}
},
{
$unwind:{
path:"$region",
preserveNullAndEmptyArrays: true
}
},
{
$group:{
_id:{
department:{ $ifNull: [ "$department", "null" ] },
region:{ $ifNull: [ "$region", "null" ] },
},
count:{$sum:1}
}
}
])
In Arango I'm using the following AQL:
FOR i IN test
LET FIELD1=(FOR a IN APPEND([],NOT_NULL(i.department,"null")) RETURN a)
LET FIELD2=(FOR a IN APPEND([],NOT_NULL(i.region,"null")) RETURN a)
FOR f1 IN FIELD1
FOR f2 IN FIELD2
COLLECT id={department:f1,region:f2} WITH COUNT INTO counter
RETURN {_id:id,count:counter}
Edit:
APPEND() is used to convert string values into an array.
Both produce results that look like this:
{
"_id" : {
"department" : "Marketing",
"region" : "US"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Development",
"region" : "Europe"
},
"count" : 1.0
}
{
"_id" : {
"department" : "Sales",
"region" : "null"
},
"count" : 1.0
}
{
"_id" : {
"department" : "Marketing",
"region" : "UK"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Sales",
"region" : "UK"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Sales",
"region" : "US"
},
"count" : 2.0
}
Your approach seems alright. I would suggest using TO_ARRAY() instead of APPEND() to make it easier to understand, though.
Both functions skip null values, so you have to provide some placeholder, or test for null explicitly and return an array containing a null value (or whatever works best for you):
FOR doc IN test
FOR field1 IN doc.department == null ? [ null ] : TO_ARRAY(doc.department)
FOR field2 IN doc.region == null ? [ null ] : TO_ARRAY(doc.region)
COLLECT department = field1, region = field2
WITH COUNT INTO count
RETURN { _id: { department, region }, count }
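To see why the explicit null test matters, consider the document that only has department: "Sales" (a sketch of the iteration, not verified output): field1 iterates over [ "Sales" ] and, because doc.region is null, field2 iterates over [ null ] rather than being skipped, which is what produces the { "department": "Sales", "region": null } group with count 1 in the result below.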
Collection test:
[
{
"_key": "5b17f9d85b2c1998598f054e",
"department": [
"Sales",
"Marketing"
],
"region": [
"US",
"UK"
]
},
{
"_key": "5b18083c5b2c1998598f0550",
"department": "Development",
"region": "Europe"
},
{
"_key": "5b1808145b2c1998598f054f",
"department": [
"Sales",
"Marketing"
],
"region": [
"US",
"UK"
]
},
{
"_key": "5b1809a75b2c1998598f0551",
"department": "Sales"
}
]
Result:
[
{
"_id": {
"department": "Development",
"region": "Europe"
},
"count": 1
},
{
"_id": {
"department": "Marketing",
"region": "UK"
},
"count": 2
},
{
"_id": {
"department": "Marketing",
"region": "US"
},
"count": 2
},
{
"_id": {
"department": "Sales",
"region": null
},
"count": 1
},
{
"_id": {
"department": "Sales",
"region": "UK"
},
"count": 2
},
{
"_id": {
"department": "Sales",
"region": "US"
},
"count": 2
}
]
