CosmosDb Mongo - collection with shardkey, slow query by shardkey? - azure

I have a CosmosDB collection using the MongoDB API.
This is a customer database, and the shard key is CustomerId.
My collection has 200,000 records and has a compound index on both Email and CustomerId.
An example of a customer:
{
    "CustomerId" : "6a0f4360-d722-4926-9751-9c7fe6a97cb3",
    "FirstName" : "This is my company first name",
    "LastName" : "This is my company last name",
    "Email" : "6a0f4360-d722-4926-9751-9c7fe6a97cb3#somemail.com",
    "Addresses" : [
        {
            "AddressId" : "54e34da9-55fb-4d60-8411-107985c7382e",
            "Door" : "11111",
            "Floor" : "99",
            "Side" : "B",
            "ZipCode" : "8888"
        }
    ]
}
What I find strange is that if I query by Email it costs 7000 RUs (which is too much, at least according to what Data Explorer tells me...), but if I query by CustomerId, it costs more or less the same RUs...
My questions are:
Shouldn't both operations cost fewer RUs than this, especially the query by CustomerId?
An example of a query by E-mail:
{ "Email" : { $eq: "3f7da6c3-81bd-4b1d-bfa9-d325388079ab#somemail.com" } }
An example of a query by CustomerId:
{ "CustomerId" : { $eq: "3f7da6c3-81bd-4b1d-bfa9-d325388079ab" } }
Another question, my index contains both Email and CustomerId. Is there any way for me to query by e-mail and return only CustomerId, for example?

Shouldn't both operations cost fewer RUs than this, especially the query by CustomerId?
CustomerId is your shard key (aka partition key), which groups documents with the same value of CustomerId into the same logical partition. This grouping is used during pin-point GET/SET calls to Cosmos, but not during querying. So you need an explicit index on CustomerId.
Furthermore, since the index that you have is a composite index on CustomerId and Email, a query on only one of these fields at a time leads to a scan being performed in order to get back the result. Hence the high RU charge, and the similar RU charge for each of these queries.
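A minimal sketch of what that looks like from the Mongo shell (the collection name customers is an assumption, not from the question):
// Hypothetical collection name; single-field indexes so each
// query can be answered from an index instead of a scan.
db.customers.createIndex({ "CustomerId" : 1 })
db.customers.createIndex({ "Email" : 1 })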
Another question, my index contains both Email and CustomerId. Is there any way for me to query by e-mail and return only CustomerId, for example?
Firstly, in order to query optimally on Email, you would need to create a separate index on Email. Thereafter, you may use Mongo's projection feature to include only certain fields in the response.
Something like this:
find({ "Email" : { $eq: "3f7da6c3-81bd-4b1d-bfa9-d325388079ab#somemail.com" } }, { "CustomerId":1 })

Related

MongoDB aggregation lookup with pagination is working slow in huge amount of data

I have a collection with more than 150,000 documents in MongoDB. I'm using the Mongoose ODM v5.4.2 for MongoDB in Node.js. When retrieving data I'm using an aggregation $lookup with $skip and $limit for pagination. My code works fine, but past 100k documents it takes 10-15 seconds to retrieve data, even though I'm only showing 100 records at a time with the help of $skip and $limit. I've already created an index for the foreignField, but it's still slow.
campaignTransactionsModel.aggregate([
    {
        $match: {
            campaignId: new importModule.objectId(campaignData._id)
        }
    },
    {
        $lookup: {
            from: userDB,
            localField: "userId",
            foreignField: "_id",
            as: "user"
        }
    },
    {
        $lookup: {
            from: 'campaignterminalmodels',
            localField: "terminalId",
            foreignField: "_id",
            as: "terminal"
        }
    },
    {
        $facet: {
            edges: [
                { $sort: { [sortBy]: order } },
                { $skip: skipValue },
                { $limit: viewBy }
            ]
        }
    }
]).allowDiskUse(true).exec(function(err, docs) {
    console.log(docs);
});
The query takes longer because the server must scan from the beginning of the input results (i.e. the results before the skip stage) and discard the given number of docs before returning the rest.
From the official MongoDB docs:
The cursor.skip() method requires the server to scan from the
beginning of the input results set before beginning to return results.
As the offset increases, cursor.skip() will become slower.
You can use range queries to achieve a result similar to .skip() or the $skip stage in aggregation.
Using Range Queries
Range queries can use indexes to avoid scanning unwanted documents,
typically yielding better performance as the offset grows compared to
using cursor.skip() for pagination.
Descending Order
Use this procedure to implement pagination with range queries:
Choose a field such as _id which generally changes in a consistent direction over time and has a unique index to prevent duplicate values,
Query for documents whose field is less than the start value using the $lt and cursor.sort() operators, and
Store the last-seen field value for the next query.
Increasing Order
Same procedure, but query for documents whose field is greater than the start value using the $gt and cursor.sort() operators.
Let's say the last doc you got has _id : objectid1. Then you can query the docs that have _id : { $lt : objectid1 } to get the docs in decreasing order, and for increasing order you can query the docs that have _id : { $gt : objectid1 }.
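A minimal sketch of this range-based pagination in the Mongo shell, assuming a collection named campaigntransactions and a page size of 100 (both names are assumptions, not from the original post):
// First page, newest first:
var page = db.campaigntransactions.find().sort({ _id: -1 }).limit(100).toArray();
// Remember the last _id of the current page:
var lastSeenId = page[page.length - 1]._id;
// Next page: only documents "older" than the last one seen:
db.campaigntransactions.find({ _id: { $lt: lastSeenId } }).sort({ _id: -1 }).limit(100);
Unlike $skip, each page here is an indexed range scan, so the cost stays flat as the offset grows.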
Read the official docs on range queries for more information.

(AQL) how to INSERT more than one document INTO a collection

According to the ArangoDB documentation, the INSERT operation applies to a single document insertion. I have an array which may hold thousands of objects (documents). Is there a way to INSERT this array of documents INTO a collection with a single query?
I believe you've misunderstood the documentation, which says:
Only a single INSERT statement per collection is allowed per AQL query
You can have multiple INSERT statements per AQL query (subject to the above limitation, amongst others), and each of them can entail multiple insertions.
Here's an example of 1000 insertions executed successfully as one AQL query:
FOR n IN 1..1000
    INSERT { _from: "tasks/1", _to: CONCAT("tasks/", TO_STRING(n)) } INTO depends
COLLECT WITH COUNT INTO c
RETURN c
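If the documents already exist in an array on the client side, one way (a sketch, not from the original answer) is to hand the array to a single AQL query as a bind parameter from arangosh; the collection name example is assumed. This still uses only one INSERT statement per collection, so it respects the limitation quoted above:
// The array of documents to store (could hold thousands of entries):
var docs = [ { id : "doc1" }, { id : "doc2" } ];
// One AQL query, one INSERT statement, many insertions:
db._query('FOR d IN @docs INSERT d INTO example', { docs: docs });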
Another approach would be to access the database with arangosh:
You can use the method collection.insert(array) to insert an array consisting of multiple documents into a collection.
Example: Insert two documents into the collection "example"
db.example.insert([{ id : "doc1" }, { id : "doc2" }]);
[
    {
        "_id" : "example/12156601",
        "_key" : "12156601",
        "_rev" : "_WEnsap6---"
    },
    {
        "_id" : "example/12156605",
        "_key" : "12156605",
        "_rev" : "_WEnsap6--_"
    }
]
The method is documented at: https://docs.arangodb.com/3.2/Manual/DataModeling/Documents/DocumentMethods.html
Bulk insert in ArangoDB v3.6 works as follows:
db.collection('collection_name').save([{ id : "doc1" }, { id : "doc2" }]);
as described in the docs; I have tested it in my project using NodeJS.
https://www.arangodb.com/docs/stable/programs-arangosh-details.html#database-wrappers

Conditionally update an array in mongoose [duplicate]

Currently I am working on a mobile app. Basically, people can post their photos and their followers can like the photos, like on Instagram. I use MongoDB as the database. As on Instagram, there might be a lot of likes for a single photo, so using a separate document for a single "like", with an index, does not seem reasonable because it would waste a lot of memory. However, I'd like a user to be able to add a like quickly. So my question is: how do I model the "like"? Basically the data model is very similar to Instagram's, but using MongoDB.
No matter how you structure your overall document, there are basically two things you need: a property for a "count" and a "list" of those who have already posted their "like", in order to ensure no duplicates are submitted. Here's a basic structure:
{
    "_id": ObjectId("54bb201aa3a0f26f885be2a3"),
    "photo": "imagename.png",
    "likeCount": 0,
    "likes": []
}
Whatever the case, there is a unique "_id" for your "photo post" and whatever information you want, but then the other fields as mentioned. The "likes" property here is an array, and that is going to hold the unique "_id" values from the "user" objects in your system. So every "user" has their own unique identifier somewhere, either in local storage or OpenId or something, but a unique identifier. I'll stick with ObjectId for the example.
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
    {
        "_id": ObjectId("54bb201aa3a0f26f885be2a3"),
        "likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
    },
    {
        "$inc": { "likeCount": 1 },
        "$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
    }
)
Now the $inc operation there will increase the value of "likeCount" by the number specified, so increase by 1. The $push operation adds the unique identifier for the user to the array in the document for future reference.
The main important thing here is to keep a record of those users who voted, and what is happening in the "query" part of the statement. Apart from selecting the document to update by its own unique "_id", the other important thing is to check the "likes" array to make sure the current voting user is not in there already.
The same is true for the reverse case or "removing" the "like":
db.photos.update(
    {
        "_id": ObjectId("54bb201aa3a0f26f885be2a3"),
        "likes": ObjectId("54bb2244a3a0f26f885be2a4")
    },
    {
        "$inc": { "likeCount": -1 },
        "$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
    }
)
Again, the important thing here is the query conditions, which make sure that no document is touched unless all conditions are met. So the count does not increase if the user has already voted, and does not decrease if their vote was not actually present at the time of the update.
Of course it is not practical to read an array with a couple of hundred entries in a document back in any other part of your application. But MongoDB has a very standard way to handle that as well:
db.photos.find(
    {
        "_id": ObjectId("54bb201aa3a0f26f885be2a3")
    },
    {
        "photo": 1,
        "likeCount": 1,
        "likes": {
            "$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
        }
    }
)
This usage of $elemMatch in projection will return the current user only if they are present, or just a blank array where they are not. This allows the rest of your application logic to be aware of whether the current user has already placed a vote or not.
That is the basic technique, and it may work for you as is, but you should be aware that embedded arrays should not grow unbounded, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but it cannot be used on its own if you are expecting thousands of "like votes" on your content. There is a concept known as "bucketing", which is discussed in some detail in this example for Hybrid Schema design and allows one solution to storing a high volume of "likes". You can look at that to use along with the basic concepts here as a way to do this at volume.
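For a rough idea of what bucketing looks like (a hypothetical document shape, not taken from the linked article), the likes are split across several fixed-size documents per photo:
// Hypothetical "bucket" document: each photo owns a sequence of
// buckets, each holding at most, say, 1000 like entries.
{
    "_id": "54bb201aa3a0f26f885be2a3/0",          // photo id + bucket number
    "photoId": ObjectId("54bb201aa3a0f26f885be2a3"),
    "count": 1000,                                 // entries in this bucket
    "likes": [ /* up to 1000 user ObjectIds */ ]
}
A new bucket is started once the current one is full, keeping every document comfortably under the BSON size limit.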

Show message owner

Help me understand MongoDB, please.
I have three collections: threads, messages and users.
thread
{ "title" : "1212", "message" : "12121", "user_id" : "50ffdfa42437e00223000001", "date" : ISODate("2013-04-11T19:48:36.878Z"), "_id" : ObjectId("51671394e5b854b042000003") }
message
{ "message" : "text", "image" : null, "thread_id" : "51671394e5b854b042000003", "user_id" : "516d08a7772d141766000001", "date" : ISODate("2013-04-17T15:58:07.021Z"), "_id" : ObjectId("516ec68fb91b762476000001") }
user
{ "user" : "admin", "date" : ISODate("2013-04-16T08:15:35.497Z"), "status" : 1, "_id" : ObjectId("516d08a7772d141766000001") }
How can I display all messages for the current thread and get the user name (for each comment) from the users collection?
This code gets only the messages, without the user name:
exports.getMessages = function(id, skip, callback) {
    skip = parseInt(skip);
    messages.find({ thread_id: id }).sort({ date: 1 }).skip(skip).limit(20).toArray(
        function(e, res) {
            if (e) {
                callback(e);
            } else {
                callback(null, res);
            }
        });
};
Node.js and mongo native
Generally, Mongo uses embedded documents or references to maintain relationships. Here is a link from the mongo docs worth reading.
What you are currently doing is storing a manual reference to the users collection within your messages collection. Mongo manual references require additional queries in order to fetch the referenced data. In this case, a reference-based relationship will work, but it forces the N+1 query problem: you have to make an additional query for every message you wish to display, plus the original query for the messages. References are explained in further detail here. One solution would be to incorporate DBRefs, which require language-specific driver support.
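As an illustration (a sketch, not from the original answer), the follow-up lookup can at least be batched into one extra query with $in, assuming a users collection handle alongside messages and the native driver's ObjectID:
var ObjectID = require('mongodb').ObjectID;

messages.find({ thread_id: id }).sort({ date: 1 }).skip(skip).limit(20).toArray(function(e, msgs) {
    if (e) return callback(e);
    // Collect the user ids referenced by this page of messages:
    var userIds = msgs.map(function(m) { return new ObjectID(m.user_id); });
    users.find({ _id: { $in: userIds } }).toArray(function(e2, userDocs) {
        if (e2) return callback(e2);
        // Index the users by id, then attach each name to its message:
        var byId = {};
        userDocs.forEach(function(u) { byId[u._id] = u.user; });
        msgs.forEach(function(m) { m.userName = byId[m.user_id]; });
        callback(null, msgs);
    });
});
This keeps it to two queries per page rather than one per message.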
Another alternative would be to use embedded documents. In this case you would store the related user object embedded within the message object. Here is another link to the Mongo docs with a great example. In this case, you would make a single query, which returns all of the messages with each related user object embedded inside. Although embedded documents encourage duplicate data, in many cases they provide performance benefits. All of this is explained in the Mongo docs and can be read in detail to further understand data modeling in Mongo.
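For the sample documents above, an embedded version of a message might look like this (a sketch; the embedded fields are assumptions, duplicated from the user document at write time):
// The user's name is copied into the message when it is created,
// so displaying a thread needs only one query:
{
    "message" : "text",
    "image" : null,
    "thread_id" : "51671394e5b854b042000003",
    "user" : { "_id" : "516d08a7772d141766000001", "user" : "admin" },
    "date" : ISODate("2013-04-17T15:58:07.021Z")
}
The trade-off is that renaming a user requires updating every embedded copy.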
Additionally, the mongoose library is pretty awesome and has a populate function which is helpful for references.

Object mapping solution

I have this SQL database structure: a users table, an objects table and a mapping table users_objects_map to assign objects to a user account.
In SQL it works fine. With this structure it is easy to fetch the objects of a user, or the users assigned to an object. I can also assign an object to multiple users.
users
    id
    firstname
    lastname
    ...
objects
    id
    ...
users_objects_map
    user_id
    object_id
What is the best way to build this with MongoDB?
My first idea was to add an array to each user where the IDs of all assigned objects are stored:
{"firstname":"John", "lastname": "Doe", "object_ids":["id","id2",...,"id-n"]}
But what if a user is assigned to thousands of objects? I don't think that's a good solution. And how am I able to fetch all users assigned to an object, or all objects assigned to a user?
Is there any clever MongoDB solution for my problem?
Using object IDs within BsonArrays as references to the objects is a great way to go. Also consider using BsonDocuments within the user's "object_ids" array; then you will be able to scale it more easily, and using the "_id" (ObjectId) means MongoDB indexes those IDs, which gains performance.
You will eventually have two collections, one with users and the other with objects:
user:
{
    "_id" : "user_id",
    "firstname" : "John",
    "lastname" : "Doe",
    "object_ids" : [
        { "_id" : "26548", "futurefield" : "futurevalue" },
        { "_id" : "26564", "futurefield" : "futurevalue" }
    ]
}
At this moment I really don't know what kind of objects they are going to be, but I can give you an example:
workshop object:
{
    "_id" : "user_id",
    "name" : "C# for Advanced Users",
    "level" : "300",
    "location" : "Amsterdam, The Netherlands",
    "date" : "2013-05-08T15:00:00"
}
Now comes the fun part and that is querying.
I am developing in C# and using the driver from mongodb.org.
Example:
Give me everyone that has object id == "26564".
var query = from user in userCollection.Find(Query.EQ("object_ids._id", "26564"))
            select user;
This query will return the documents, in this case the users that have matched the ID.
If you have a range of values, please use: Query.All("name", values), where values is a BsonArray.
The second query finds and matches the IDs of the objects that the user's BsonDocuments contain:
var secondQuery =
    from user in userCollection.Find(Query.EQ("_id", "userid"))
    select user["object_ids"].AsBsonArray.ToArray();
I hope I have helped you this way.
Good luck with it!
