Get the size of the result of aggregate method MongoDB - python-3.x

I have this aggregate query:
cr = db.last_response.aggregate([
    {"$unwind": "$blocks"},
    {"$match": {"sender_id": "1234", "page_id": "563921", "blocks.tag": "pay1"}},
    {"$project": {"block": "$blocks.block"}}
])
Now I want to get the number of elements it returned (that is, whether the cursor is empty or not).
This is how I did it.
I defined an empty list:
x = []
Then I iterated through the cursor and appended each element to x:
for i in cr:
    x.append(i['block'])
print("length of the result of aggregation cursor:", len(x))
My question is: is there a faster way to get the number of results of an aggregate query, like the count() method for a find() query?
Thanks

The faster way is to avoid transferring all the data from mongod to your application. To do this, you can add a final $group stage that counts the documents:
{"$group": {"_id": None, "count": {"$sum": 1}}},
This means that mongod runs the aggregation and returns only the count of documents as the result.
There is no way to get the count of the result without executing the aggregation pipeline.
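A minimal sketch of the approach above, assuming `collection` is any object whose `aggregate()` method returns an iterable of documents, as a pymongo collection does. The helper name `count_aggregate_results` is hypothetical. When the pipeline matches nothing, the `$group` stage emits no document at all, so the helper falls back to 0.

```python
def count_aggregate_results(collection, pipeline):
    """Return how many documents `pipeline` produces, counted on the server."""
    # Append the counting stage from the answer; the caller's list is not mutated.
    counting_pipeline = list(pipeline) + [
        {"$group": {"_id": None, "count": {"$sum": 1}}}
    ]
    # At most one document comes back; none at all if nothing matched.
    result = next(iter(collection.aggregate(counting_pipeline)), None)
    return result["count"] if result else 0


# Usage (assumes `db` is a pymongo Database):
# n = count_aggregate_results(db.last_response, [
#     {"$unwind": "$blocks"},
#     {"$match": {"sender_id": "1234", "page_id": "563921", "blocks.tag": "pay1"}},
# ])
```

Only a single small document crosses the network, but the server still has to execute every stage of the pipeline.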

Related

update an array element in mongodb

I am trying to update an array element inside a document without replacing the whole array.
Array elements look like this
Suppose I only have to update the value of the element at index 1. For that I have:
_id of the document
the current value at index 1 ("optionalImages-624476a7bd4d2bfe6bf86e9a-1-1650025533684.jpeg")
the new value ("optionalImages-624476a7bd4d2bfe6bf86e9a-1-1650025534589.jpeg").
I think it can be done with MongoDB's arrayFilters, but I don't fully understand the documentation.
Your help will be highly appreciated.
Query1
Uses arrayFilters with $[m] inside the update path to identify the array member that we want to change.
Here m is the member with value 20, and we set it to 100
(instead of 20 and 100, put your "....jpg" strings).
Playmongo
update(
  {"_id": {"$eq": 1}},
  {"$set": {"ar.$[m]": 100}},
  {"arrayFilters": [{"m": {"$eq": 20}}]})
Query2
Pipeline update, requires MongoDB 4.2 or later.
Uses $map over the array to find the member with value 20 and replace it with 100.
Playmongo
update(
{"_id": {"$eq": 1}},
[{"$set":
{"ar":
{"$map":
{"input": "$ar",
"in": {"$cond": [{"$eq": ["$$this", 20]}, 100, "$$this"]}}}}}])

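The two queries above could be built from Python roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the array field name ("ar" in the answer) is passed in as a parameter because the real field name is not shown in the question.

```python
def array_filter_update(field, old_value, new_value):
    """Build (update, array_filters) for the arrayFilters approach (Query1)."""
    update = {"$set": {f"{field}.$[m]": new_value}}
    array_filters = [{"m": {"$eq": old_value}}]
    return update, array_filters


def pipeline_update(field, old_value, new_value):
    """Build the pipeline-style update (Query2, MongoDB 4.2 or later).

    Note: inside a pipeline, a string value starting with "$" would be read
    as a field path, so such values would need wrapping in {"$literal": ...}.
    """
    return [{"$set": {field: {"$map": {
        "input": f"${field}",
        "in": {"$cond": [{"$eq": ["$$this", old_value]}, new_value, "$$this"]},
    }}}}]


# Usage with pymongo (assumes `collection`, `doc_id`, `old_name`, `new_name` exist):
# update, filters = array_filter_update("ar", old_name, new_name)
# collection.update_one({"_id": doc_id}, update, array_filters=filters)
# collection.update_one({"_id": doc_id}, pipeline_update("ar", old_name, new_name))
```

Both variants replace every array member equal to the old value, not only the one at index 1.
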
Count and data in single query in Azure Cosmos DB

I want to return both the count and the data from a single Cosmos DB SQL query.
Something like
Select *, count() from c
Or, if possible, I want to get the count as part of the JSON result:
[
{
"Count" : 1111
},
{
"Name": "Jon",
"Age" : 30
}
]
You're going to have to issue two separate queries - one to get the total number of documents matching your query, and a second to get a page of documents.
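A rough sketch of the two-query approach, assuming `container` behaves like an azure-cosmos `ContainerProxy` (only its `query_items` method is used). The helper name `count_and_page`, the `where` parameter, and the page size are illustrative assumptions, and real paging would normally use continuation tokens rather than cutting the iterator short.

```python
from itertools import islice


def count_and_page(container, where="", page_size=20):
    """Run a count query and a data query, then combine them client-side.

    `where` is an optional, trusted WHERE clause such as "WHERE c.Age > 18".
    """
    count_query = f"SELECT VALUE COUNT(1) FROM c {where}".strip()
    data_query = f"SELECT * FROM c {where}".strip()

    # Query 1: total number of matching documents (a single scalar).
    total = next(iter(container.query_items(
        query=count_query, enable_cross_partition_query=True)))

    # Query 2: the documents themselves, limited to one page.
    docs = container.query_items(
        query=data_query, enable_cross_partition_query=True)
    page = list(islice(docs, page_size))

    # Shape the result like the JSON in the question.
    return [{"Count": total}, *page]
```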

Query to get all Cosmos DB documents referenced by another

Assume I have the following Cosmos DB container with the possible doc type partitions:
{
"id": <string>,
"partitionKey": <string>, // Always "item"
"name": <string>
}
{
"id": <string>,
"partitionKey": <string>, // Always "group"
"items": <array[string]> // Always an array of ids for items in the "item" partition
}
I have the id of a "group" document, but I do not have the document itself. What I would like to do is perform a query which gives me all "item" documents referenced by the "group" document.
I know I can perform two queries: 1) Retrieve the "group" document, 2) Perform a query with IN clause on the "item" partition.
As I don't care about the "group" document other than getting the list of ids, is it possible to construct a single query to get me all the "item" documents I want with just the "group" document id?
You'll need to perform two queries, as there are no joins between separate documents. Even though there is support for subqueries, only correlated subqueries are currently supported (meaning, the inner subquery is referencing values from the outer query). Non-correlated subqueries are what you'd need.
Note that, since you only need the list of ids, you don't have to retrieve the entire group document. You can project just the items property and then use it in your second query, with something like array_contains(). First:
SELECT VALUE g.items
FROM g
WHERE g.id="1"
AND g.partitionKey="group"
And then:
SELECT VALUE i.name
FROM i
WHERE array_contains(<items-from-prior-query>,i.id)
AND i.partitionKey="item"
This documentation page clarifies the two subquery types and support for only correlated subqueries.
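The two queries can be chained from application code along these lines. This is a sketch, assuming `container` behaves like an azure-cosmos `ContainerProxy` and that the container is partitioned on the `partitionKey` property; the function name `items_for_group` is hypothetical. Like the second query above, it returns item names; use `SELECT *` there to get whole item documents.

```python
def items_for_group(container, group_id):
    """Return the names of all items referenced by the given group document."""
    # Query 1: project only the `items` array of the group document.
    id_lists = list(container.query_items(
        query='SELECT VALUE g.items FROM g '
              'WHERE g.id = @id AND g.partitionKey = "group"',
        parameters=[{"name": "@id", "value": group_id}],
        partition_key="group",
    ))
    item_ids = id_lists[0] if id_lists else []
    if not item_ids:
        return []  # unknown group or empty list: skip the second query

    # Query 2: fetch the referenced items from the "item" partition.
    return list(container.query_items(
        query='SELECT VALUE i.name FROM i '
              'WHERE ARRAY_CONTAINS(@ids, i.id) AND i.partitionKey = "item"',
        parameters=[{"name": "@ids", "value": item_ids}],
        partition_key="item",
    ))
```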

Timeout for db.collection.distinct()?

I have a database with a collection of about 90k documents. Each document is as follows:
{
'my_field_name': "a", # Or "b" or "c" ...
'p1': Array[30],
'p2': Array[10000]
}
There are about 9 unique values for this field. When there were ~30k documents in the collection:
>>> db.collection.distinct("my_field_name")
["a", "b", "c"]
However, now with 90k documents, db.collection.distinct() returns an empty list.
>>> db.collection.distinct("my_field_name")
[]
Is there a maxTimeMS setting for db.collection.distinct? If so, how can I set it to a higher value? If not, what else could I investigate?
One thing you can do to immediately speed up your query's execution time is to index the field on which you are running the 'distinct' operation (if the field is not already indexed).
That being said, if you want to set a maxTimeMS, one workaround is to rewrite your query as an aggregation and set the operation timeout on the returned cursor. E.g.:
db.collection.aggregate([
{ $group: { _id: '$my_field_name' } },
]).maxTimeMS(10000);
However, unlike distinct, the above query returns a cursor rather than an array of values.
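Since the question uses a Python shell, here is a hedged pymongo-style equivalent of the aggregation above. It assumes `collection` is a pymongo collection (or anything with a compatible `aggregate()` method); the helper name `distinct_with_timeout` is hypothetical. In pymongo the time limit is passed to `aggregate()` as the `maxTimeMS` keyword, and the cursor is drained into a plain list so the result resembles what `distinct` returns.

```python
def distinct_with_timeout(collection, field, max_time_ms=10_000):
    """Collect the distinct values of `field` via $group, with a time limit."""
    pipeline = [{"$group": {"_id": f"${field}"}}]
    # maxTimeMS caps how long the server may spend on the operation.
    cursor = collection.aggregate(pipeline, maxTimeMS=max_time_ms)
    # Each group document carries one distinct value in its _id.
    return [doc["_id"] for doc in cursor]


# Usage (assumes `db` is a pymongo Database):
# values = distinct_with_timeout(db.collection, "my_field_name", 60_000)
```

If the limit is exceeded, pymongo is expected to raise `pymongo.errors.ExecutionTimeout`, which at least makes a timeout visible instead of yielding a silently empty result.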

Increase performance for this MongoDB query

I have a MongoDB document with quite a large embedded array:
name : "my-dataset"
data : [
{country : "A", province: "B", year : 1990, value: 200}
... 150 000 more
]
Let us say I want to return data objects where country == "A".
What is the proper way of doing this, for example via Node.js?
Given 150 000 entries with 200 matches, how long should the query take approximately?
Would it be better (performance/structure wise) to store data as documents and the name as a property of each document?
Would it be more efficient to use MySQL for this?
A) Just find them with a query.
B) If the compound index {name: 1, "data.country": 1} is built, the query should be fast. However, because you store all the data in one array, the $unwind operator has to be used. As a result, the query could be slow.
C) It will be better. If you store the data like:
{country : "A", province: "B", year : 1990, value: 200, name:"my-dataset"}
{country : "B", province: "B", year : 1990, value: 200, name:"my-dataset"}
...
With a compound index {name: 1, country: 1}, the query time should be < 10ms.
D) See the question "MySQL vs MongoDB 1000 reads".
1. You can use the MongoDB aggregation framework:
db.collection.aggregate([
{$match: {name: "my-dataset"}},
{$unwind: "$data"},
{$match: {"data.country": "A"}}
])
This returns one document for each data entry where the country is "A". If you want to regroup the entries per dataset, add a $group stage:
db.collection.aggregate([
{$match: {name: "my-dataset"}},
{$unwind: "$data"},
{$match: {"data.country": "A"}},
{$group: {_id: "$_id", data: {$addToSet: "$data"}}}
])
(I didn't test it on a proper dataset, so it might be buggy.)
2. 150,000 subdocuments is still not a lot for MongoDB, so if you're only querying one dataset it should be pretty fast (on the order of milliseconds).
3. As long as you are sure that your document is going to stay smaller than 16MB, the maximum BSON document size (kinda hard to say), it should be fine. However, the queries would be simpler if you stored your data as separate documents with the dataset name as a property, which is generally better for performance.
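For reference, the aggregation from the answers can be assembled in Python as below. This is a sketch: the function name `country_pipeline` is hypothetical, and the usage lines assume a pymongo `collection`.

```python
def country_pipeline(dataset_name, country, regroup=False):
    """Build the $match / $unwind / $match pipeline for the embedded-array layout."""
    pipeline = [
        {"$match": {"name": dataset_name}},     # pick the dataset document first
        {"$unwind": "$data"},                   # one document per array entry
        {"$match": {"data.country": country}},  # keep only the wanted entries
    ]
    if regroup:
        # Optionally fold the matching entries back into one document per dataset.
        pipeline.append(
            {"$group": {"_id": "$_id", "data": {"$addToSet": "$data"}}}
        )
    return pipeline


# Usage with pymongo (assumes `collection` exists):
# rows = list(collection.aggregate(country_pipeline("my-dataset", "A")))
#
# With the one-document-per-entry layout, an index plus a plain find() suffices:
# collection.create_index([("name", 1), ("country", 1)])
# rows = list(collection.find({"name": "my-dataset", "country": "A"}))
```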
