MongoDB Subdocument Query Performance on Large Dataset

I have created a schema for Conversations in MongoDB where Messages are stored as an Array of Objects in Conversations Object.
Conversation {
company_id: { type:ObjectId, index: true },
messages: [{
_id: { type: ObjectId, index: true }
}]
}
There is a query I have in place that looks up a Conversation based on the company_id and _id of the first message sent in the array (that is being sent from another part of the application).
Conversation.findOne({ company_id: c_id, 'messages._id': firstMessage })
Theoretically, if a company has 100 million conversations and each conversation has 1 million messages, how much of a performance hit will the subdocument query take, compared with storing the first message id in the main document and querying just the base object?
Conversation {
company_id: { type:ObjectId, index: true },
firstMessage_id: { type:ObjectId, index: true },
messages: [{
_id: { type: ObjectId, index: true }
}]
}
Conversation.findOne({ company_id: c_id, firstMessage_id: firstMessage })
Thanks in advance for the help.

If each conversation really holds a million messages, I would keep the messages in a separate collection altogether and use the aggregation framework's $lookup (available in version 3.4) to get the outcome. A single MongoDB document is capped at 16 MB, which a million embedded messages would likely exceed anyway. Of course, I would assume that the proper indexes are in place on both collections and that a proper $match filter is used to select the company.
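As a rough sketch of that suggestion, assuming the messages live in their own collection: the collection name conversations, the conversation_id field on each message, and the helper name are all hypothetical and not taken from the question. The function only builds the pipeline, so it can be inspected without a database.

```javascript
// Sketch only. Assumptions (not from the question): messages are stored in
// their own collection, each message has a `conversation_id` field, and the
// parent collection is named `conversations`.
function buildFirstMessageLookupPipeline(firstMessageId, companyId) {
  return [
    // Find the message by its indexed _id first; this is a single-key lookup.
    { $match: { _id: firstMessageId } },
    // Join the parent conversation that the message belongs to.
    {
      $lookup: {
        from: 'conversations',
        localField: 'conversation_id',
        foreignField: '_id',
        as: 'conversation',
      },
    },
    // $lookup yields an array; flatten it to a single embedded document.
    { $unwind: '$conversation' },
    // Keep the result only if the conversation belongs to the company.
    { $match: { 'conversation.company_id': companyId } },
  ];
}
```

With a hypothetical Message model this would be run as Message.aggregate(buildFirstMessageLookupPipeline(firstMessage, c_id)). Mongoose does not cast values inside aggregation pipelines, so both ids would need to be ObjectId instances already.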

Related

Mongoose - query doc if element not in array

I am dealing with an issue while querying a notification schema
receiver: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Profile' }],
readBy: [{
readerId: { type: mongoose.Schema.Types.ObjectId,
ref: 'Profile', default: [] },
readAt: { type: Date }
}]
To query the latest notifications, I have written the query below.
The goal is to check that profile.id does NOT exist in the readBy array (which means the notification is unread by that user).
const notifications = await Notification.find({
receiver: profile.id, // this works
readBy: { // but, adding this returns an empty array
$elemMatch: {
readerId: { $ne: profile.id }
}
}
})
Would really appreciate the help, stuck here for days.

I think this is easier than what we are trying to do.
If you want to know whether the element is in the array, query for it. If the result is empty, the element does not exist; otherwise it does.
Look at this example.
It is a simple query that only checks whether a document has been received and read by the user profile.id.
db.collection.find({
"receiver": profile.id,
"readBy.readerId": profile.id
})
Please check whether the output is as expected or whether I misunderstood your request.
In Mongoose you can use this:
var find = await model.find({"receiver":1,"readBy.readerId":1}).countDocuments()
It returns how many documents match the query.
Edit, to get the documents where readerId is not present in the array:
Here you only need to add the $ne operator:
db.collection.find({
"receiver": profile.id,
"readBy.readerId": {
"$ne": profile.id
}
})
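To see why the $elemMatch version in the question behaves differently from the dotted-path $ne, here is a small plain-JavaScript model of the two filters. It is only an illustration of the matching rules, not MongoDB code: the sample documents and helper names are made up, and string ids stand in for ObjectIds.

```javascript
// $elemMatch + $ne: true if AT LEAST ONE array element has a different readerId.
// An empty readBy array has no elements, so it can never match.
const matchesElemMatchNe = (doc, id) => doc.readBy.some((r) => r.readerId !== id);

// "readBy.readerId": { $ne: id }: true only if NO element has that readerId.
// An empty readBy array matches, which is what "unread" needs.
const matchesDottedNe = (doc, id) => doc.readBy.every((r) => r.readerId !== id);

// Made-up sample notifications, all assumed to be addressed to user "a".
const docs = [
  { name: 'unread', readBy: [] },
  { name: 'readByOther', readBy: [{ readerId: 'b' }] },
  { name: 'readByMe', readBy: [{ readerId: 'a' }] },
  { name: 'readByBoth', readBy: [{ readerId: 'a' }, { readerId: 'b' }] },
];

const viaElemMatch = docs.filter((d) => matchesElemMatchNe(d, 'a')).map((d) => d.name);
const viaDottedNe = docs.filter((d) => matchesDottedNe(d, 'a')).map((d) => d.name);

console.log(viaElemMatch); // [ 'readByOther', 'readByBoth' ]
console.log(viaDottedNe);  // [ 'unread', 'readByOther' ]
```

The $elemMatch form misses notifications nobody has read yet and wrongly includes ones the user has read alongside someone else, which would explain the empty result when every readBy array is still empty.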

Need a MongoDB data model design: whether to choose refs or embedded docs

I'm designing a backend for a talent hunt application. Initially I designed the DB using refs, following the primary key / foreign key pattern to join my collections. But as the number of collections grows, it is getting hard to join them all. Then I came across the one-to-many (embedded) approach, so I'd like a suggestion from experts. Here are the requirements.
I have the following collections initially.
User.js
{
_id: "5ecfdc903165f709b49a4a14",
name: "Lijo",
email: "lijo#gmail.com"
}
Then I have a category collection for serving the categories:
Category.js
{
_id: "5ecfdc903165f709b49a5a18",
title: "Acting",
code: "ACT"
}
When a user adds talents to their profile, I store them in another collection. Remember that a user can save multiple talents. So I created:
UserTalents.js
{
_id: "5ecfdc903165f709b49a6c87",
categoryId: "5ecfdc903165f709b49a5a18",
userId: "5ecfdc903165f709b49a4a14",
level: "beginner"
}
For each category the user needs to upload at least one media item along with a description, so again I created a new collection for that:
Media.js
{
_id: "5ecfdc903165f709b49a8a14",
talentId: "5ecfdc903165f709b49a6c87",
userId: "5ecfdc903165f709b49a4a14",
media: "5ecfdc903165f709b49a4a14_1.jpg"
}
And I need to have these users connected. For that I created:
Friends.js
{
_id: "5ecfdc903165f709b49a8a18",
sender: "5ecfdc903165f709b49a4a14",
receiver: "5ecfdc903165f709b49a4a15",
status: "accepted"
}
Is this a good design to continue with? I'm expecting a huge number of users. Or can I do something like this:
User.js
{
_id: "5ecfdc903165f709b49a4a14",
name: "Lijo",
email: "lijo#gmail.com",
talents: [
{
_id: "5ecfdc903165f709b49a5a18", // _id from Category
title: "Acting",
code: "ACT",
level: "beginner",
media: "5ecfdc903165f709b49a4a14_1.jpg"
}
],
friends: [
{
_id: "5ecfdc903165f709b49a4a15",
name: "Test",
status: "approved"
}
]
}
If I follow this, how do I update the name fields in the talents and friends arrays when one of the original names changes?
Which is the better approach?

Query/sort reference of reference in mongoose

Hopefully I can explain this well.
I have 3 Model types in play here: Users, Products, and Stores. What I'm after is a sorted list of Stores, per user, based on how many Products they've added from that Store. So basically "show me this User's top Stores".
pseudo-schemas:
var User = {
name: String
};
var Store = {
name: String
};
var Product = {
title: String,
user: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
store: {
type: mongoose.Schema.Types.ObjectId,
ref: 'Store'
}
};
So how can I find which Stores the User has added the most Products to? This may be obvious, it's late. :-P
Thanks!

You can use the aggregation framework to solve this, in particular the $group stage:
// aggregate whole `Product` collection
Product.aggregate([
// count products by `user` and `store` and save result to `products_count`
{$group: {
_id: {user_id:"$user", store_id:"$store"},
products_count: {$sum: 1}
}},
// sort by most products count
{$sort: {products_count: -1}}
])
There are also $limit and $skip stages that help to paginate the results.
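Since the question asks for one user's top stores, a variation that filters on the user first might look like the sketch below. The helper name and the limit parameter are my own additions for illustration; the field names come from the pseudo-schema in the question.

```javascript
// Sketch: build a pipeline returning one user's stores, most products first.
// `buildTopStoresPipeline` and `limit` are illustrative names, not an existing API.
function buildTopStoresPipeline(userId, limit = 5) {
  return [
    // Only look at products added by this user.
    { $match: { user: userId } },
    // Count this user's products per store.
    { $group: { _id: '$store', products_count: { $sum: 1 } } },
    // Stores with the most products come first.
    { $sort: { products_count: -1 } },
    // Keep only the top N stores.
    { $limit: limit },
  ];
}
```

It would be run as Product.aggregate(buildTopStoresPipeline(user._id)). Mongoose does not cast pipeline values, so userId should already be an ObjectId rather than a string.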

How to find documents by some conditions for its linked documents

I am a newbie in MongoDB and I have a problem querying the linked documents of a collection.
Here is my database schema:
var tagScheme = Schema({
name: { type: String, required: true }
});
tagScheme.index({ name: 1 }, { unique: true });
var linkScheme = Schema({
name: { type: String },
tags: [{ type: Schema.Types.ObjectId, ref: 'Tag' }]
});
linkScheme.index({ name: 1 }, { unique: true });
I need to get a count of the links that have the specified tag. I tried the following query:
dbschemes.Link.find({ 'tags.name': specifiedTagName }, function (err, links) {
if (err) return res.send(500, err);
console.log(links.length);
});
This query does not work properly: it always returns an empty list of links. Could someone explain what the problem is?

As JohnnyHK commented, the query you want is a relational-style query, and a document database such as MongoDB simply does not support it: tags holds only ObjectId references, so there is no tags.name field to match. Fix your schema to put the tag data directly in the link schema (nesting, or "denormalizing" from a relational standpoint, which is OK in this case); then you can query it:
var LinkSchema = new Schema({
name: String,
tags: [String]
});
With that Schema, your query will work as you expect.
To address the comments below: this is a document database. It's not relational. There are trade-offs. Your data is denormalized, which gives you some scalability and performance, and in exchange you give up some flexibility of queries and data consistency. If you wanted to rename a tag, a relatively rare occurrence, you'd have to do a whopping 2 database commands (a $push of the new name, then a $pull of the old name), as opposed to a relational database, where a single update command would do it.
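The two-command rename could be sketched roughly as follows, assuming the tags: [String] schema above. The helper name is hypothetical; it just returns the filter and update documents in the order they would be run.

```javascript
// Sketch: the two updates needed to rename a tag stored as a plain string.
// `buildTagRenameOps` is an illustrative helper, not part of Mongoose.
function buildTagRenameOps(oldName, newName) {
  return [
    // Step 1: add the new name to every link that still carries the old one.
    { filter: { tags: oldName }, update: { $push: { tags: newName } } },
    // Step 2: remove the old name from those same links.
    { filter: { tags: oldName }, update: { $pull: { tags: oldName } } },
  ];
}
```

Each entry would be passed in order to something like Link.updateMany(op.filter, op.update), with Link being the model built from LinkSchema. The steps stay separate because MongoDB rejects a single update that applies both $push and $pull to the same field.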

How to sort array of embedded documents via Mongoose query?

I'm building a node.js application with Mongoose and have a problem related to sorting embedded documents. Here's the schema I use:
var locationSchema = new Schema({
lat: { type: String, required: true },
lon: { type: String, required: true },
time: { type: Date, required: true },
acc: { type: String }
})
var locationsSchema = new Schema({
userId: { type: ObjectId },
source: { type: ObjectId, required: true },
locations: [ locationSchema ]
});
I'd like to output the locations embedded in the userLocations document sorted by their time attribute. I currently do the sorting in JavaScript after retrieving the data from MongoDB, like so:
function locationsDescendingTimeOrder(loc1, loc2) {
return loc2.time.getTime() - loc1.time.getTime()
}
LocationsModel.findOne({ userId: theUserId }, function(err, userLocations) {
userLocations.locations.sort(locationsDescendingTimeOrder).forEach(function(location) {
console.log('location: ' + location.time);
});
});
I did read about the sorting API provided by Mongoose but I couldn't figure out if it can be used for sorting arrays of embedded documents and if yes, if it is a sensible approach and how to apply it to this problem. Can anyone help me out here, please?
Thanks in advance and cheers,
Georg

You're doing it the right way, Georg. Your other options are either to sort locations by time upon embedding in the first place, or going the more traditional non-embedded route (or minimally embedded route so that you may be embedding an array of ids or something but you're actually querying the locations separately).
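For the "sort upon embedding" option, one possible sketch uses MongoDB's $push with the $each and $sort modifiers, so the array is re-sorted whenever a location is added. The helper name is made up for illustration; the locations and time fields come from the schema in the question.

```javascript
// Sketch: build an update that appends a location and keeps the embedded
// array ordered by time, newest first. The helper name is hypothetical.
function buildSortedLocationPush(location) {
  return {
    $push: {
      locations: {
        $each: [location],  // the $sort modifier only works together with $each
        $sort: { time: -1 }, // re-sort the whole array, descending by time
      },
    },
  };
}
```

It could then be applied with something like LocationsModel.updateOne({ userId: theUserId }, buildSortedLocationPush(newLocation)), after which reads return the locations already in descending time order.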

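The Mongoose sort API also accepts a nested field. Note that sort() orders the documents the query returns; it does not reorder the elements inside a document's locations array.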
LocationsModel.findOne({ userId: theUserId })
// .sort({ "locations.time": "desc" }) // option 1
.sort("-locations.time") // option 2
.exec((err, result) => {
// compute fetched data
})
Sort by field in nested array with Mongoose.js
More methods are mentioned in this answer as well
Sorting Options in Mongoose
Mongoose Sort API
