Mongodb aggregation or projection


{
"items": [
{
"id": "5bb619e49593e5d3cbaa0b52",
"name": "Flowers",
"weight": "1.5"
},
{
"id": "5bb619e4ebdccb9218aa9dcb",
"name": "Chair",
"weight": "8.4"
},
{
"id": "5bb619e4911037797edae511",
"name": "TV",
"weight": "20.8"
},
{
"id": "5bb619e4504f248e1be543d3",
"name": "Skateboard",
"weight": "5.9"
},
{
"id": "5bb619e40fee29e3aaf09759",
"name": "Donald Trump statue",
"weight": "18.4"
},
{
"id": "5bb619e44251009d72e458b9",
"name": "Molkkÿ game",
"weight": "17.9"
},
{
"id": "5bb619e439d3e99e2e25848d",
"name": "Helmet",
"weight": "22.7"
}
]
}
I have this structure of models and want to calculate the weight of each order.
Should I use aggregation, or does someone have a better idea?
This is an example of an order:
{
"id": "5bb61dfd4d64747dd8d7d6cf",
"date": "Sat Aug 11 2018 02:01:25 GMT+0000 (UTC)",
"items": [
{
"item_id": "5bb619e44251009d72e458b9",
"quantity": 4
},
{
"item_id": "5bb619e4504f248e1be543d3",
"quantity": 2
},
{
"item_id": "5bb619e40fee29e3aaf09759",
"quantity": 3
}
]
}

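You can use the aggregation below. It looks up each ordered item and copies its weight onto the matching entry of the order's items array:
db.order.aggregate([
  // one document per order line
  { "$unwind": "$items" },
  // join the matching document from the items collection
  { "$lookup": {
    "from": "items",
    "localField": "items.item_id",
    "foreignField": "id",
    "as": "item"
  }},
  { "$unwind": "$item" },
  // copy the item's weight onto the order line
  { "$addFields": { "items.weight": "$item.weight" }},
  // rebuild one document per order
  { "$group": {
    "_id": "$_id",
    "items": { "$push": "$items" },
    "date": { "$first": "$date" }
  }}
])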
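The pipeline above only copies each weight onto its order line. To get a single number per order, the grouping stage could sum weight times quantity instead. This is a minimal sketch, assuming MongoDB 4.0+ for $toDouble (the sample weights are stored as strings) and the same collection and field names as above; `orderWeightPipeline` and `totalWeight` are names chosen here for illustration.

```javascript
// Sketch only: pass this array to db.order.aggregate(orderWeightPipeline).
const orderWeightPipeline = [
  { $unwind: "$items" },
  { $lookup: {
      from: "items",
      localField: "items.item_id",
      foreignField: "id",
      as: "item"
  } },
  { $unwind: "$item" },
  { $group: {
      _id: "$_id",
      date: { $first: "$date" },
      // weights are strings in the sample data, so convert before multiplying
      totalWeight: {
        $sum: { $multiply: [ { $toDouble: "$item.weight" }, "$items.quantity" ] }
      }
  } }
];
```

For the sample order this should come to about 4 × 17.9 + 2 × 5.9 + 3 × 18.4 = 138.6.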

You have two options here without changing your model structure:
1. pull all Items used in a Parcel from the database into your application and do the calculation there
2. perform all computations on the database side using aggregation (and $lookup)
Which one is better depends heavily on your actual data model and dataset size. The first option is straightforward and can perform better on big datasets, especially when sharding or replica sets are involved, but it requires more round trips to the database, which adds latency. Aggregation, on the other hand, can be quite slow on lookups in certain cases.
The only reliable way to decide is to test on your real data. If your current dataset is small (say, hundreds of MB), choose whichever approach you are comfortable with; both will work fine.
Update
Since you need to distribute Orders to Parcels, I'd prefer to go with option #1, though using aggregation is still possible.
This is what I would do:
1. pull an Order from the database
2. pull all related Items from the database by the ids found in Order.items
3. calculate the Order weight
4. create one Parcel if weight < 30 and save it to the database
5. or, if weight > 30, distribute the Items to Parcels somehow and save them to the database
Note that you can pull multiple Items by their ids in one call with a query like this:
{
_id: { $in: [<id1>, <id2>] }
}
There is one more thing to consider. MongoDB versions before 4.0 have no multi-document transactions, and from 4.0 on they have to be used explicitly and require a replica set. So performing this type of operation (pulling something from the DB, performing calculations, and storing the result back) with the schema defined the way you show can lead to creating duplicates, for example when two processes handle the same Order at the same time.
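The application-side steps above can be sketched in plain JavaScript. This is only an illustration: `orderWeight` and `splitIntoParcels` are hypothetical helpers, the 30 limit is treated as inclusive, and the greedy "first parcel with room" strategy is an assumption, since the answer leaves the distribution open. The `items` argument stands for the Item documents already fetched with the $in query.

```javascript
// Build a lookup table from item id to numeric weight
// (weights are strings in the sample data).
function weightTable(items) {
  return new Map(items.map((it) => [it.id, parseFloat(it.weight)]));
}

// Total weight of an order: sum of weight * quantity over its lines.
function orderWeight(order, items) {
  const weights = weightTable(items);
  return order.items.reduce((sum, line) => {
    const w = weights.get(line.item_id);
    if (w === undefined) throw new Error(`unknown item ${line.item_id}`);
    return sum + w * line.quantity;
  }, 0);
}

// Greedy split (an assumption, not the only option): each unit goes into
// the first parcel that still has room, otherwise into a new parcel.
// A single unit heavier than maxWeight still gets a parcel of its own.
function splitIntoParcels(order, items, maxWeight = 30) {
  const weights = weightTable(items);
  const parcels = [];
  for (const line of order.items) {
    const w = weights.get(line.item_id);
    if (w === undefined) throw new Error(`unknown item ${line.item_id}`);
    for (let n = 0; n < line.quantity; n++) {
      let parcel = parcels.find((p) => p.weight + w <= maxWeight);
      if (!parcel) {
        parcel = { weight: 0, items: [] };
        parcels.push(parcel);
      }
      parcel.weight += w;
      parcel.items.push(line.item_id);
    }
  }
  return parcels;
}
```

A greedy split is simple but not optimal; if the number of parcels matters, a proper bin-packing heuristic would be worth comparing on real orders.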

Related

How to perform a bidirectional upsert in ArangoDB

I'm working with a dataset similar to ArangoDB official "friendship" example, except I'm adding a "weight" concept on the Edge Collection. Like so :
People
[
{ "_id": "people/100", "_key": "100", "name": "John" },
{ "_id": "people/101", "_key": "101", "name": "Fred" },
{ "_id": "people/102", "_key": "102", "name": "Jacob" },
{ "_id": "people/103", "_key": "103", "name": "Ethan" }
]
Friendship
[
{ "_from": "people/100", "_to": "people/101", "weight": 27 },
{ "_from": "people/103", "_to": "people/102", "weight": 31 },
{ "_from": "people/102", "_to": "people/100", "weight": 12 },
{ "_from": "people/101", "_to": "people/103", "weight": 56 }
]
I want to write a function that, when someone interacts with someone else, UPSERTs the Friendship between the two (incrementing the weight by 1 if it already existed, or initializing it with a weight of 1 if it's new).
The trouble is that, when executing that function, I have no clue in which direction the friendship was initialized, so I cannot really use an upsert. So, two questions:
Is there any way to do an upsert on an edge with a "bidirectional" filter?
Like this, but bidirectional:
UPSERT {
// HERE, I BASICALLY WANT TO IGNORE THE SIDE
_from: ${people1}, _to: ${people2}
}
INSERT {
_from: ${people1}, _to: ${people2}, weight: 1
}
UPDATE {
weight: OLD.weight + 1
}
IN ${friendshipCollection}
RETURN NEW
Instead of trying to "select the friendship, no matter the direction", should I actually duplicate the friendship in both directions (and constantly maintain / update it)?

How to identify possible inconsistencies in a particular collection based on similar fields?

My boss asked me if I could implement some sort of report on where human error might have occurred. We have a trips collection that contains an origin, a destination and a distance. The idea is that if 10 trips with the same origin and destination have a distance of 40 and one single trip has a distance of 39 or 41, it should be marked as suspicious, or something else should indicate the inconsistency.
In other words, if a trip has the same origin and destination as others but a different distance from, say, 90% of them, it might be wrong and should be reviewed.
Is this something that can be done within the aggregation pipeline, or would it require some extra logic in code?
Example:
[
{
"_id": "1",
"source": "City A",
"destination": "City B",
"distance": 40
},
{
"_id": "2",
"source": "City A",
"destination": "City B",
"distance": 40
},
{
"_id": "3",
"source": "City A",
"destination": "City B",
"distance": 40
},
{
"_id": "4",
"source": "City A",
"destination": "City B",
"distance": 39 // This is inconsistent, and should be flagged so it can be reviewed
},
]

You can certainly find all source-destination pairs that have distances outside of an acceptable margin.
Consider this aggregation pipeline.
1. A $group stage collects all documents with the same source and destination and calculates the average and standard deviation of their distances.
2. $unwind restores one document per trip, each now carrying the aggregate stats.
3. $addFields flags each record whose distance differs from the average by more than one standard deviation. It also flags all documents of a group whose standard deviation is greater than 5.
4. $match keeps only the flagged documents.
db.collection.aggregate([
  // group trips by route and compute per-route statistics
  { $group: {
      _id: { source: "$source", destination: "$destination" },
      original: { $push: "$$ROOT" },
      avdistance: { $avg: "$distance" },
      stdDev: { $stdDevSamp: "$distance" }
  } },
  // back to one document per trip, now carrying the route statistics
  { $unwind: "$original" },
  { $addFields: {
      flag: { $or: [
        // distance is more than one standard deviation away from the average
        { $gt: [
            { $abs: { $subtract: [ "$original.distance", "$avdistance" ] } },
            "$stdDev"
        ] },
        // the route as a whole is too spread out
        { $gt: [ "$stdDev", 5 ] }
      ] }
  } },
  { $match: { flag: true } }
])
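If the check has to run in application code instead, the same rule can be sketched in plain JavaScript. This is an illustration under assumptions: `flagSuspiciousTrips` and `maxStdDev` are hypothetical names, and the sketch deliberately skips routes with a single trip, for which a sample standard deviation is not defined.

```javascript
// Flag trips whose distance is more than one sample standard deviation
// away from the average of their route, or whose route is too spread out.
function flagSuspiciousTrips(trips, maxStdDev = 5) {
  // group trips by source/destination pair
  const groups = new Map();
  for (const trip of trips) {
    const key = JSON.stringify([trip.source, trip.destination]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(trip);
  }

  const flagged = [];
  for (const group of groups.values()) {
    if (group.length < 2) continue; // no sample std dev for one trip
    const avg = group.reduce((s, t) => s + t.distance, 0) / group.length;
    const variance =
      group.reduce((s, t) => s + (t.distance - avg) ** 2, 0) / (group.length - 1);
    const stdDev = Math.sqrt(variance);
    for (const trip of group) {
      if (Math.abs(trip.distance - avg) > stdDev || stdDev > maxStdDev) {
        flagged.push(trip);
      }
    }
  }
  return flagged;
}
```

With the four sample trips from the question, the average is 39.75 and the sample standard deviation is 0.5, so only the trip with distance 39 is returned.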

How to get friend's leaderboard in MongoDB

This is my Friends Collection
[
{
"_id": "59e4fbcac23f38cdfa6963a8",
"friend_id": "59e48f0af8c277d7a8886ed7",
"user_id": "59e1d36ad17ad5ad3d0453f7",
"__v": 0,
"created_at": "2017-10-16T18:34:50.875Z"
},
{
"_id": "59e5065f705a90cfa218c9e5",
"friend_id": "59e48f0af8c277d7a8886edd",
"user_id": "59e1d36ad17ad5ad3d0453f7",
"__v": 0,
"created_at": "2017-10-16T19:19:59.483Z"
}
]
This is my Scores collection:
[
{
"_id": "59e48f0af8c277d7a8886ed8",
"score": 19,
"user_id": "59e48f0af8c277d7a8886ed7",
"created_at": "2017-10-13T09:02:10.010Z"
},
{
"_id": "59e48f0af8c277d7a8886ed9",
"score": 24,
"user_id": "59e48f0af8c277d7a8886ed7",
"created_at": "2017-10-11T00:56:10.010Z"
},
{
"_id": "59e48f0af8c277d7a8886eda",
"score": 52,
"user_id": "59e48f0af8c277d7a8886ed7",
"created_at": "2017-10-24T09:16:10.010Z"
}
]
This is my Users collection.
[
{
"_id": "59e48f0af8c277d7a8886ed7",
"name": "testuser_0",
"thumbnail": "path_0"
},
{
"_id": "59e48f0af8c277d7a8886edd",
"name": "testuser_1",
"thumbnail": "path_1"
},
{
"_id": "59e48f0af8c277d7a8886ee3",
"name": "testuser_2",
"thumbnail": "path_2"
},
{
"_id": "59e48f0af8c277d7a8886ee9",
"name": "testuser_3",
"thumbnail": "path_3"
}
]
Finally, I need a list of friends sorted by high score for a particular time period (say, the last 24 hours), something like this:
[
{
"friend_id": "59e48f0af8c277d7a8886ed7",
"friend_name": "test_user_2",
"thumbnail": "image_path",
"highscore": 15
},
{
"friend_id": "59e48f0af8c277d7a8886edd",
"friend_name": "test_user_3",
"thumbnail": "image_path",
"highscore": 10
}
]
What's the best way to achieve this? I have tried the aggregation pipeline but got quite confused working with 3 collections.

Following your comments, storing the friends as an array of up to 500 entries inside a document may not be a bad idea, since each entry would only hold the friend's id and a "created" date. It saves you a separate collection.
You should not run into serious performance issues if you project the data in your query and select only the fields you want.
https://docs.mongodb.com/v3.2/tutorial/project-fields-from-query-results/#return-specified-fields-only
As for the scores growing by 30 per day, it depends on what type of query you run.
It would take a while to reach the 16 MB per-document limit by adding 30 scores per day.
Regarding joining the different collections, there is a Stack Overflow question about it:
How do I perform the SQL Join equivalent in MongoDB?
or
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
You will need MongoDB's aggregation framework to use $lookup; a plain find command is not enough.
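If the three collections are kept as shown in the question, the $lookup approach could look roughly like the sketch below. It is untested against real data and rests on assumptions: the collections are named friends, scores and users, created_at is stored as a Date, and ids have the same type in all three collections. `buildLeaderboardPipeline` is a hypothetical helper; $addFields needs MongoDB 3.4+.

```javascript
// Builds a pipeline to run on the friends collection, e.g.
// db.friends.aggregate(buildLeaderboardPipeline(userId, since))
function buildLeaderboardPipeline(userId, since) {
  return [
    // friends of the given user
    { $match: { user_id: userId } },
    // all scores of each friend
    { $lookup: {
        from: "scores",
        localField: "friend_id",
        foreignField: "user_id",
        as: "scores"
    } },
    // keep only scores inside the time window
    { $addFields: {
        scores: { $filter: {
          input: "$scores",
          as: "s",
          cond: { $gte: [ "$$s.created_at", since ] }
        } }
    } },
    // best score in the window; friends without scores are dropped
    { $addFields: { highscore: { $max: "$scores.score" } } },
    { $match: { highscore: { $ne: null } } },
    // friend's name and thumbnail
    { $lookup: {
        from: "users",
        localField: "friend_id",
        foreignField: "_id",
        as: "friend"
    } },
    { $unwind: "$friend" },
    { $project: {
        _id: 0,
        friend_id: 1,
        friend_name: "$friend.name",
        thumbnail: "$friend.thumbnail",
        highscore: 1
    } },
    { $sort: { highscore: -1 } }
  ];
}
```

For the last 24 hours, `since` would be something like `new Date(Date.now() - 24 * 60 * 60 * 1000)`.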

Mongoose find and return only a part of the document

My object:
[
{
"_id": "568ad3db59b494d4284ac191",
"name": "Test",
"groups": [
{
"number": "1",
"name": "GroupTest",
"_id": "568ad3db59b494d4284ac19b",
"orders": [
{
"date": "2016-03-06T13:07:40.990Z",
"_id": "56dc2b9c1d47772806e4f0f4",
"readings": [
{
"readingid": "568ad3db59b494d4284ac1a5",
"_id": "56dc2b9c1d47772806e4f0fc"
},
{
"readingid": "568ad3db59b494d4284ac1a4",
"_id": "56dc2b9c1d47772806e4f0fb"
},
{
"readingid": "568ad3db59b494d4284ac1a3",
"_id": "56dc2b9c1d47772806e4f0fa"
},
{
"readingid": "568ad3db59b494d4284ac1a2",
"_id": "56dc2b9c1d47772806e4f0f9"
},
{
"readingid": "56d48ae1a0f6e04413fc8b3e",
"_id": "56dc2b9c1d47772806e4f0f8"
},
{
"readingid": "56d48ae1a0f6e04413fc8b3f",
"_id": "56dc2b9c1d47772806e4f0f7"
},
{
"readingid": "568ad3db59b494d4284ac1a1",
"_id": "56dc2b9c1d47772806e4f0f6"
},
{
"readingid": "568ad3db59b494d4284ac1a0",
"_id": "56dc2b9c1d47772806e4f0f5"
}
]
},
{....}
]
},
{....}
]
},
{.....}
]
I need to find the order with _id "56dc2b9c1d47772806e4f0f4" in the group with _id "568ad3db59b494d4284ac19b" inside the client object with _id "568ad3db59b494d4284ac191", and I only want to get that order subobject, not the whole client object.
I tried something like:
Client.find(
{_id: "568ad3db59b494d4284ac191", groups._id: "568ad3db59b494d4284ac19b", groups.orders._id:"56dc2b9c1d47772806e4f0f4"},
{groups.orders:{$elemMatch:{_id: "56dc2b9c1d47772806e4f0f4"}}})
Another attempt without success:
Client.find(
{_id: req.company, groups:ObjectId(req.params.groupId)},
{"groups.orders":{$elemMatch:{_id: ObjectId(req.params.orderId)}}}, function(e,company){
if(!e) {
console.log(company);
}
});

MongoDB is a "document-oriented" database, so the normal operations and queries it provides return "documents" and not "parts of documents"; with the normal query methods there is no way to get just a subobject.
You can use the MongoDB aggregation framework to achieve what you want (https://docs.mongodb.org/manual/reference/operator/aggregation/), using two $unwind stages: https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/ .
This produces one document per order; then use $match to keep only the ones you need.
But it is probably better to rethink your data model. If you really need to query and retrieve orders only, without getting the whole client object, it might be better to store the orders in their own collection and use references to groups and clients.
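The two-$unwind approach described above could be sketched like this. `buildOrderPipeline` is a hypothetical helper, and the final $replaceRoot stage assumes MongoDB 3.4+; on older versions a $project of the order fields would have to take its place. As far as I know, Mongoose does not cast values in aggregation pipelines the way it does for find() filters, so the ids should be passed as ObjectId values rather than strings.

```javascript
// Builds a pipeline to run on the clients collection, e.g.
// Client.aggregate(buildOrderPipeline(clientId, groupId, orderId))
function buildOrderPipeline(clientId, groupId, orderId) {
  return [
    // narrow down to the one client first so $unwind stays cheap
    { $match: { _id: clientId } },
    // one document per group
    { $unwind: "$groups" },
    { $match: { "groups._id": groupId } },
    // one document per order of that group
    { $unwind: "$groups.orders" },
    { $match: { "groups.orders._id": orderId } },
    // return just the order subdocument
    { $replaceRoot: { newRoot: "$groups.orders" } }
  ];
}
```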

Index and query items in an array with Mango query for Cloudant and CouchDB 2.0

I have the following db structure:
{"_id": "0096874","genre": ["Adventure","Comedy", "Sci-Fi" ]}
{"_id": "0099088","genre": ["Comedy", "Sci-Fi", "Western"]}
and would like to query it the way I could in MongoDB:
db.movies.find({genre: {$in: ["Comedy"]}})
It works when I use a text index for the whole database, but that seems very wasteful:
// index
{
"index": {},
"type": "text"
}
//query
{
"selector": {
"genre": {
"$in": ["Comedy"]
}
},
"fields": [
"_id",
"genre"
]
}
The following index does not work:
{
"index": {
"fields": [
"genre"
]
},
"type": "json"
}
What is the correct index for cloudant query, which does not index the whole db?
Thanks for your help

You almost had it. Your index is right, but you need to add a selector on _id to get all IDs (see https://cloudant.com/blog/mango-json-vs-text-indexes/).
This isn't a great solution performance-wise. As Tony says:
"The reason this works is, again, because Mango performs the above $in operation as a filtering mechanism against all the documents. As we saw in the conclusion of the previous section on JSON syntax, the performance tradeoff with the query above is that it, essentially, performs a full index scan and then applies a filter."
{
"selector": {
"_id": {
"$gt": null
},
"genre": {
"$in": ["Western"]
}
},
"fields": [
"_id",
"genre"
],
"sort": [
{
"_id": "asc"
}
]
}
