MongoDB sort by calculated field - Node.js

I have a users collection in MongoDB.
I want to sort the documents in the collection by the difference between wins and losts.
For the document below, that is 2 - 11 = -9.
How do I do that? (I am using MongoDB from Node.js.)
{
"_id": {
"$oid": "615ebae9f5b277c71b886906"
},
"name": "jack",
"wins": {
"$numberInt": "2"
},
"losts": {
"$numberInt": "11"
},
"lastVisit": {
"$date": {
"$numberLong": "1633852348009"
}
}
}

$sort doesn't accept an expression to sort by, so the only solution I can think of is to add one more field, sort by it, and then remove it.
Query
In the $sort stage, set the value to 1 (ascending) or -1 (descending) depending on the order you want.
aggregate(
[{"$set": {"dif": {"$subtract": ["$wins", "$losts"]}}},
{"$sort": {"dif": 1}},
{"$unset": ["dif"]}])
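From Node.js, the pipeline above could be built and run roughly as follows. This is a minimal sketch: the function name buildSortByDeltaPipeline and the `users` handle in the usage comment are assumptions for illustration, and the $set and $unset stages require MongoDB 4.2 or later.

```javascript
// Build the aggregation pipeline that sorts by (wins - losts).
// direction: 1 for ascending, -1 for descending.
function buildSortByDeltaPipeline(direction = 1) {
  if (direction !== 1 && direction !== -1) {
    throw new RangeError("direction must be 1 or -1");
  }
  return [
    { $set: { dif: { $subtract: ["$wins", "$losts"] } } }, // temporary field
    { $sort: { dif: direction } },
    { $unset: ["dif"] }, // drop the temporary field from the output
  ];
}

// Usage (assumes `users` is a Collection obtained from the official driver):
//   const docs = await users.aggregate(buildSortByDeltaPipeline(-1)).toArray();
```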

Related

How can I sort data by an array element in MongoDB without using $unwind

This is my sample data. Each document has a userId and a "watchHistory" array, which contains the list of videos watched by the user:
{
"_id": "62821344445c30b35b441f11",
"userId": 579,
"__v": 0,
"watchHistory": [
{
"seenTime": "2022-05-23T08:29:19.781Z",
"videoId": 789456,
"uploadTime": "2022-03-29T12:33:35.312Z",
"description": "Biography of Indira Gandhi",
"speaker": "andrews",
"title": "Indira Gandhi",
"_id": "628b45df775e3973f3a670ec"
},
{
"seenTime": "2022-05-23T08:29:39.867Z",
"videoId": 789455,
"uploadTime": "2022-03-31T07:37:39.712Z",
"description": "What are some healthy food habits to stay healthy",
"speaker": "morris",
"title": "Healthy Food Habits",
"_id": "628b45f3775e3973f3a670"
},
]
}
I need to match on userId and then sort by "watchHistory.seenTime". The seenTime field indicates when the user watched the video, so the most recently watched video should come first in the list.
I don't have permission to use $unwind, so can anyone help me with this? Thank you.
If you are using MongoDB version 5.2 and above, you can use $sortArray operator in an aggregation pipeline. Your pipeline should look something like this:
db.collection.aggregate(
[
{"$match":
{ _id: '62821344445c30b35b441f11' }
},
{
"$project": {
_id: 1,
"userId": 1,
"__v": 1,
"watchHistory": {
"$sortArray": { input: "$watchHistory", sortBy: { seenTime: -1 }}
}
}
}
]
);
Please modify the filter in the "$match" stage according to the key and value you need to filter on. Here's the link to the documentation.
Without $unwind (and without $sortArray), this is awkward to do in an aggregation pipeline, but as a workaround you can use the update method with the $push operator, which re-sorts the stored array in place:
db.collection.update({
_id: "62821344445c30b35b441f11"
},
{
$push: {
watchHistory: {
"$each": [],
"$sort": {
seenTime: -1
},
}
}
})
Please see the working example here
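If neither option is available, another route is to fetch the matched document and sort the array in application code. A minimal sketch, assuming seenTime is stored as an ISO 8601 UTC string as in the sample (such strings sort chronologically when compared as text); the helper name sortWatchHistoryDesc is hypothetical:

```javascript
// Return a copy of the document with watchHistory sorted newest-first.
// The input document is not modified.
function sortWatchHistoryDesc(doc) {
  const sorted = [...(doc.watchHistory || [])].sort((a, b) => {
    if (a.seenTime === b.seenTime) return 0;
    return a.seenTime < b.seenTime ? 1 : -1; // later timestamps first
  });
  return { ...doc, watchHistory: sorted };
}

const sample = {
  userId: 579,
  watchHistory: [
    { videoId: 789456, seenTime: "2022-05-23T08:29:19.781Z" },
    { videoId: 789455, seenTime: "2022-05-23T08:29:39.867Z" },
  ],
};
console.log(sortWatchHistoryDesc(sample).watchHistory.map((v) => v.videoId));
// prints [ 789455, 789456 ]
```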

How to define an index to use in a Mango Query

I am trying to create a CouchDB Mango Query with an index with the hope that the query runs faster. At the moment I have the following Mango Query which returns what I am looking for but it's slow. Therefore, I assume, I need to create an index to make it faster. I need help figuring out how to create that index.
selector: {
categoryIds: {
$in: categoryIds,
},
},
sort: [{ publicationDate: 'desc' }],
You can assume that my documents are, let's say, news articles from different categories. Each document therefore has a field containing one or more categories that the news article belongs to, stored as an array of categoryIds. My query needs to be optimized for requests like "Give me all news that have categoryId1 in their array of categoryIds, sorted by publicationDate". What I don't know is:
1. How to define an index
2. What that index should be
3. How to use that index in the "use_index" field of the Mango Query
Any help is appreciated.
Update after "Alexis Côté" answer:
If I define the index like this:
{
"_id": "_design/0f11ca4ef1ea06de05b31e6bd8265916c1bbe821",
"_rev": "6-adce50034e870aa02dc7e1e075c78361",
"language": "query",
"views": {
"categoryIds-json-index": {
"map": {
"fields": {
"categoryIds": "asc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
"categoryIds"
]
}
}
}
}
}
And run the Mango Query like this:
{
"selector": {
"categoryIds": {
"$in": [
"e0bd5f97ac35bdf6893351337d269230"
]
}
},
"use_index": "categoryIds-json-index"
}
It still returns the results, but they are not sorted by publicationDate as I want. So it is not clear to me what solution you are suggesting.
You can create an index as documented here
In your case, you will need an index on the "categoryIds" field.
You can specify the index using "use_index": "_design/<name>"
Note: The query planner should automatically pick this index if it's compatible.
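As a rough sketch of what the two requests could look like, the bodies below target CouchDB's POST /{db}/_index and POST /{db}/_find endpoints. The design document name "news-indexes" is made up for illustration, and the helper buildFindBody is hypothetical. Whether the planner can actually use this index for an $in selector on an array field is worth checking with POST /{db}/_explain.

```javascript
// Body for POST /{db}/_index: a JSON index on the categoryIds field.
// "news-indexes" is an illustrative design document name.
const createIndexBody = {
  index: { fields: ["categoryIds"] },
  ddoc: "news-indexes",
  name: "categoryIds-json-index",
  type: "json",
};

// Body for POST /{db}/_find that asks the planner to use that index.
// use_index takes a design document, or a [design document, index name] pair.
function buildFindBody(categoryIds) {
  return {
    selector: { categoryIds: { $in: categoryIds } },
    use_index: [`_design/${createIndexBody.ddoc}`, createIndexBody.name],
  };
}

console.log(JSON.stringify(buildFindBody(["e0bd5f97ac35bdf6893351337d269230"]), null, 2));
```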

Mongoose: how to set a field of a model with the result from an aggregation

Here is my sample:
Two simple Mongoose models:
a Note model with, among other fields, an id field that is a ref to the Notebook model.
a Notebook model, with the id I mentioned above.
My goal is to output something like this:
[
{
"notes_count": 7,
"title": "first notebook",
"id": "5585a9ffc9506e64192858c1"
},
{
"notes_count": 3,
"title": "second notebook",
"id": "558ab637cab9a2b01dae9a97"
}
]
Using aggregation and population on the Note model like this :
Note.aggregate(
[{
"$group": {
"_id": "$notebook",
"notes_count": {
"$sum": 1
}
}
}, {
"$project": {
"notebook": "$_id",
"notes_count": "$notes_count",
}
}]
gives me this kind of result :
{
"_id": "5585a9ffc9506e64192858c1",
"notes_count": 7,
"notebook": {
"_id": "5585a9ffc9506e64192858c1",
"title": "un carnet court",
"__v": 0
}
}
Forget about the __v and _id fields; they would be easy to handle with a modified toJSON function.
But in that function, neither the doc nor the ret parameter gives me access to the computed notes_count value.
Obviously, I could manage this in the route handler (parse the result and rebuild the data to be returned), but is there a proper way to do that with Mongoose?
You can't use the aggregate method to update. As you have noted, you'll need to use output from the aggregate constructor to update the relevant documents.
As the Mongoose aggregate method will return a collection of plain objects, you can iterate through this and utilise the _id field (or similar) to update the documents.
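For the output shape asked for in the question, the reshaping can be a small pure function applied to the populated aggregation results. A sketch, assuming each result looks like the sample in the question (a notebook object with _id and title, plus notes_count); the function name toNotebookSummaries is hypothetical:

```javascript
// Map populated aggregation results to { notes_count, title, id } objects.
function toNotebookSummaries(results) {
  return results.map((r) => ({
    notes_count: r.notes_count,
    title: r.notebook.title,
    id: String(r.notebook._id), // ObjectId or string, normalised to a string
  }));
}

const populated = [
  {
    _id: "5585a9ffc9506e64192858c1",
    notes_count: 7,
    notebook: { _id: "5585a9ffc9506e64192858c1", title: "un carnet court", __v: 0 },
  },
];
console.log(toNotebookSummaries(populated));
```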

Query data where the user ID is one of multiple IDs

I am trying to write a query and I don't know the right way to do it.
Each document in the Mongo collection contains multiple user IDs (uid), and I want a query that gets all the data ("Albums") where the user ID matches one of the uid values.
I do not know whether the structure of the collection is suited to that, and I would like to know if I should do it differently.
{
"_id": ObjectId("55814a9799677ba44e7826d1"),
"album": "album1",
"pictures": [
"1434536659272.jpg",
"1434552570177.jpg",
"1434552756857.jpg",
"1434552795100.jpg"
],
"uid": [
"12814a8546677ba44e745d85",
"e745d677ba4412814e745d7b",
"28114a85466e745d677d85qs"
],
"__v": 0
}
I searched the internet and found this documentation, http://docs.mongodb.org/manual/reference/operator/query/in/, but I'm not certain that this is the right way.
In short, I need to know whether I am using the right structure for the collection and whether the "$in" operator is the right solution (knowing that there may be a lot of user IDs: between 2 and 2000 at most).
You don't need $in unless you are matching for more than one possible value in a field, and that field does not have to be an array. $in is in fact shorthand for $or.
You just need a simple query here:
Model.find({ "uid": "12814a8546677ba44e745d85" },function(err,results) {
})
If you want "multiple" user id's then you can use $in:
Model.find(
{ "uid": { "$in": [
"12814a8546677ba44e745d85",
"e745d677ba4412814e745d7b",
] } },
function(err,results) {
}
)
Which is short for $or in this way:
Model.find(
{
"$or": [
{ "uid": "12814a8546677ba44e745d85" },
{ "uid": "e745d677ba4412814e745d7b" }
]
},
function(err,results) {
}
)
Just to answer your question, you can use the below query to get the desired result.
db.mycollection.find( {uid : {$in : ["28114a85466e745d677d85qs"] } } )
However, you need to revisit your data structure; it looks like a many-to-many problem, and you might want to think about introducing an intermediate collection for it.
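The choice between a plain equality match and $in, as described in the answers above, can be wrapped in a small helper. A sketch with a hypothetical name; it only builds the filter object, which would then be passed to Model.find or db.collection.find:

```javascript
// Build a filter on the uid array field for one or more user ids.
function buildUidFilter(uids) {
  const unique = [...new Set(uids)]; // drop duplicate ids
  if (unique.length === 0) {
    throw new Error("at least one uid is required");
  }
  return unique.length === 1
    ? { uid: unique[0] }            // single value: plain equality is enough
    : { uid: { $in: unique } };     // several values: match any of them
}

console.log(JSON.stringify(buildUidFilter(["12814a8546677ba44e745d85"])));
console.log(JSON.stringify(buildUidFilter(["12814a8546677ba44e745d85", "e745d677ba4412814e745d7b"])));
```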

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I want something like the SQL query select full_name from authors, so that I can display the list to users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly you need to make sure you're not tokenizing it while indexing, otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it you can just index it in two different ways using multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
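On Elasticsearch 5.x and later, where the text and keyword types replaced analyzed and not_analyzed strings, the multi-field idea could be expressed with a mapping like the one below. It is shown as a JavaScript object that would be serialised into the mapping request; the variable names are illustrative, and the sub-field name "keyword" is a common convention rather than a requirement.

```javascript
// Index full_name twice: analyzed for search, exact for aggregations.
const fullNameMapping = {
  properties: {
    full_name: {
      type: "text", // tokenized, for full-text search
      fields: {
        keyword: { type: "keyword" }, // untokenized, for unique-value listing
      },
    },
  },
};

// Aggregations would then target the sub-field path:
const aggregationField = "full_name.keyword";
console.log(JSON.stringify(fullNameMapping, null, 2), aggregationField);
```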
For Elasticsearch 1.0 and later, you can leverage terms aggregation to do this,
query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of authors field.
size=0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later; as noted further down, it is rejected in 5.x).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
},
]
}
}
}
See Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword with the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
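On the client side, the unique values are the bucket keys in the response. A sketch that builds the request body from the curl example and pulls the keys out of a response; the function names and the aggregation name unique_values are illustrative only, and the response object is hand-written sample data:

```javascript
// Request body: no hits, one terms aggregation capped at maxValues buckets.
function buildUniqueValuesBody(field, maxValues = 10000) {
  return {
    size: 0,
    aggs: { unique_values: { terms: { field, size: maxValues } } },
  };
}

// Pull the bucket keys out of a search response.
function extractUniqueValues(response) {
  const agg = response.aggregations && response.aggregations.unique_values;
  return agg ? agg.buckets.map((b) => b.key) : [];
}

const fakeResponse = {
  aggregations: {
    unique_values: {
      buckets: [
        { key: "Charles Dickens", doc_count: 372 },
        { key: "Brian Kernighan", doc_count: 283 },
      ],
    },
  },
};
console.log(extractUniqueValues(fakeResponse));
// prints [ 'Charles Dickens', 'Brian Kernighan' ]
```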
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means that by default you cannot aggregate on the full_name text field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
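Either way, the client pages through hits and de-duplicates the field values itself. A sketch of the Search After bookkeeping, assuming each hit carries a sort array (as it does when the request specifies sort) and that the caller supplies a sort with a unique tiebreaker. The function names are hypothetical and the HTTP call is left out:

```javascript
// Build the body for the next page; lastHit is undefined for the first page.
function buildPageBody(field, sort, pageSize, lastHit) {
  const body = { size: pageSize, _source: [field], sort };
  if (lastHit) {
    body.search_after = lastHit.sort; // resume after the last hit already seen
  }
  return body;
}

// Add the field values from one page of hits to the running Set.
function collectValues(seen, hits, field) {
  for (const hit of hits) {
    const value = hit._source[field];
    if (value !== undefined) seen.add(value);
  }
  return seen;
}
```

The loop around these two helpers would stop when a page comes back with no hits.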
