Embed document without knowing _id in Mongoose/Mongodb - node.js

I know there are no "joins" in MongoDB. I'm attempting to link a large number of documents to the 40,000+ locations in my locations collection.
My locations collection has custom (read: not under my control) identifiers for locations and their corresponding lat/lng coordinates.
var Locations = new Schema({
location_id: String,
loc: { //lng, lat: as per mongodb documents
type: [Number],
index: '2d'
}
});
There are several collections that have a field referencing this custom identifier to match latitude and longitude.
var MyCollection = new Schema({
location: String,
otherFields: Strings...
});
I'm a little lost on how best to go about this. A lot of posts suggest linking via the Schema, but I've only seen that done with a Schema.Types.ObjectId. This seems impractical for me because the data I'm importing only has the custom identifier.
Could I perhaps add another field to MyCollection and look up the correct _id of the location to link to while I'm uploading data? If so, can someone point me in the right direction for accomplishing this?
Could map-reduce be used somehow? I'm still a bit of a novice with Mongo.
Tried
I did try loading up the entirety of the location data into a JS object then checking that object against the return object from my other query, injecting the matching location data into my return object. This works but is unbearably slow.

First, just for the record: MongoDB will still generate an _id property for each object you store.
1. "[...], if the mongod receives a document to insert that does not contain an _id field, mongod will add the _id field that holds an ObjectId. [...]"
Source
You wrote that location_id is not under your control. And you want to use location_id because the other collections also use it? So you don't want to break the standard in your project, which is good.
As I see it, you already have the location property in MyCollection and can store the location_id there.
As far as I know, you have to write your own linking methods now. You have to store the location_id in MyCollection and load the Location whenever you want to access it via MyCollection, using
Locations.find({location_id: <the_location_id>})
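The manual linking described here can also be done in application code once the locations are loaded. Below is a minimal sketch in plain JavaScript, assuming the location documents are available as plain objects; the helper names indexByLocationId and attachLocations and the sample data are hypothetical. It builds a Map keyed by location_id once, so each document is matched with a constant-time lookup instead of a scan over all 40,000+ locations.

```javascript
// Build a lookup table keyed by the custom identifier, once.
function indexByLocationId(locations) {
  const byId = new Map();
  for (const loc of locations) {
    byId.set(loc.location_id, loc);
  }
  return byId;
}

// Attach the matching coordinates to each document; null if the id is unknown.
function attachLocations(docs, byId) {
  return docs.map((doc) => {
    const match = byId.get(doc.location);
    return { ...doc, loc: match ? match.loc : null };
  });
}

// Sample data (made up for illustration); loc is [lng, lat] as in the schema.
const locations = [
  { location_id: "A1", loc: [-73.97, 40.77] },
  { location_id: "B2", loc: [2.35, 48.85] },
];
const docs = [{ location: "B2" }, { location: "ZZ" }];
const linked = attachLocations(docs, indexByLocationId(locations));
```

If the lookups are done in the database instead, an index on location_id is probably what keeps Locations.find({location_id: ...}) fast.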
But maybe your main problem is that you cannot find the Locations you are looking for in a reasonable time?
I don't know what your criteria for finding Locations for MyCollection are. If it is proximity of coordinates, then you can reduce the number of locations to check by filtering out the ones you don't need. Then you don't have to check all 40,000 Locations, but maybe just 100. In the following I assume it is proximity.
Do you have lat, lon in both collections (Locations, MyCollection)?
If so, you can define a query that fetches only the locations in a square around your MyCollection object. MongoDB then returns a much smaller set of Locations, and you can apply your more complex check to decide whether they really belong to your MyCollection object.
Something like this:
Locations.find({lat: {$gt: <x>-a, $lt: <x>+a}, lon: {$gt: <y>-b, $lt: <y>+b}}, function(err, locations){ ... });
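As a rough sketch of how that square filter could be built, here are two small helpers (both names are hypothetical). The first assumes separate lat and lon fields, as in the query above. The second assumes the schema from the question, where loc is a [lng, lat] array with a 2d index, and uses the $geoWithin operator with $box instead.

```javascript
// Square filter for documents that store lat and lon as separate fields.
function boundingBoxFilter(lat, lon, dLat, dLon) {
  return {
    lat: { $gt: lat - dLat, $lt: lat + dLat },
    lon: { $gt: lon - dLon, $lt: lon + dLon },
  };
}

// Same idea for a loc: [lng, lat] field with a 2d index.
function boxFilterFor2d(lng, lat, dLng, dLat) {
  return {
    loc: {
      $geoWithin: {
        $box: [
          [lng - dLng, lat - dLat], // bottom-left corner
          [lng + dLng, lat + dLat], // upper-right corner
        ],
      },
    },
  };
}

const squareFilter = boundingBoxFilter(40, -74, 1, 2);
const boxFilter = boxFilterFor2d(-74, 40, 2, 1);
// Usage sketch: Locations.find(boxFilter) returns only the nearby candidates.
```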
I hope it helps.

Related

Trying to understand mongodb indexes for finding documents with exact and unique value(s)

I am reading through the MongoDB docs for the Node.js driver, particularly this index section https://www.mongodb.com/docs/drivers/node/current/fundamentals/indexes/#geospatial-indexes and it looks like all of the indexes they mention are for sortable / searchable data. So I wanted to ask if I need indexes for the following use case:
I have this user document structure
{
email: string,
version: number,
otherData: ...
}
As far as I understand, I can query each user by _id, and this already has a default unique index applied to it? I also want to query users by email, so I created the following unique index
collection.createIndex({ email: 1 }, { unique: true })
Is my understanding correct that by creating this index I guarantee that:
Email is always unique
My queries like collection.findOne({email: 'my#email.com'}) are optimised?
Next, I want to perform update operations on user documents, but only on specific versions, so:
collection.updateOne({email: '...', version: 2}, update)
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
Yes, the unique constraint is enforced at the DB layer, so by definition emails will be unique. It is worth mentioning that this can affect insert/update performance, as the check has to be executed on each of those operations. From my experience you only start feeling this overhead at larger scale (hundreds of millions of documents in a single collection plus thousands of inserts per minute).
Yes, there is no other way to optimize this further.
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
You want to create a compound index; the syntax will look like this:
collection.createIndex({ email: 1, version: 1 }, { unique: true })
I will just say that the (first) email index already ensures uniqueness, so any additional filtering you add to the query and index will not really affect anything, as there will always be only one document with a given email in the DB. Basically, why bother adding a "version" field to the query? If you need it for filtering, that's fine, but then you won't need to alter the existing index.
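To make the version-filtered update concrete, here is a small sketch. The helper versionedUpdate is hypothetical, and bumping version with $inc on every successful update is an assumption of mine, not something stated in the question. Since email is unique, the existing email index already narrows the match to at most one document, and the version condition only decides whether that document gets updated.

```javascript
// Build the filter and update documents for a version-guarded update.
function versionedUpdate(email, expectedVersion, changes) {
  return {
    // Matches only if the stored document is still at expectedVersion.
    filter: { email: email, version: expectedVersion },
    // Assumption: each successful update moves the document to the next version.
    update: { $set: changes, $inc: { version: 1 } },
  };
}

const { filter, update } = versionedUpdate("user@example.com", 2, {
  otherData: "new value",
});
// Usage sketch with the Node.js driver:
//   const result = await collection.updateOne(filter, update);
//   result.matchedCount === 0 means no document had that email at version 2.
```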

Can mongoose batch update based on an array of objects that matches the collection?

I am working on a project in Express/Node, and I am utilizing a MongoDB database that has a collection of Course documents that represent a course in my school system that changes in real-time. The Course documents in my database each look like this:
Course Document
{
courseID: Number,
restrictions: String,
status: String,
}
My program has to check for changes in the school's course system and apply any changes it sees to my private MongoDB database. To accomplish this, I currently have a script that looks at all the courses in the school system and records them in an array of objects, with each object corresponding to a course.
var allCourses =
[
{
courseID: 123456,
restrictions: "A and B",
status: "OPEN"
},
{
courseID: 678990,
restrictions: "A",
status: "FULL",
}
]
The goal now is to be able to go through my database, and skip the documents that are the same as the corresponding javascript object in the array, and update those that are not.
Obviously, I could just iterate through my array with forEach, and update every single course by filtering by 'courseID' and updating both fields one document at a time, but I can foresee that this would take a large amount of time.
I was wondering if there was a batch update function, similar to the insertMany operation, that can take my array of objects and update my database documents that correspond to an object within the array?
These are helpful links
Trying to do a bulk upsert with Mongoose. What's the cleanest way to do this?
https://docs.mongodb.com/manual/reference/method/db.collection.insertMany/
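One way to batch the updates, in the spirit of the bulk upsert link above, is bulkWrite, which both Mongoose models and the Node.js driver provide. The sketch below only builds the list of operations from the allCourses array; the helper name toBulkOps and the Course model in the usage comment are assumptions.

```javascript
// Turn each course object into one updateOne operation keyed by courseID.
function toBulkOps(courses) {
  return courses.map((course) => ({
    updateOne: {
      filter: { courseID: course.courseID },
      update: {
        $set: { restrictions: course.restrictions, status: course.status },
      },
    },
  }));
}

const allCourses = [
  { courseID: 123456, restrictions: "A and B", status: "OPEN" },
  { courseID: 678990, restrictions: "A", status: "FULL" },
];
const ops = toBulkOps(allCourses);
// Usage sketch (Course is assumed to be the Mongoose model for the collection):
//   const result = await Course.bulkWrite(ops);
```

All operations are sent to the server together, so this avoids one round trip per course.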

Compare two collections in MongoDb and remove common

I have three collections in MongoDB
achievements
students
student_achievements
achievements is a list of achievements a student can achieve in an academic year, while
the students collection holds the list of students in the school.
student_achievements holds documents where each document contains a studentId and an achievementId.
I have an interface where I use a select2 multiselect to allocate one or more achievements from achievements to students from students and save them to student_achievements. Right now, to do this, I populate select2 with the available achievements from the database. I have also made an arrangement where, if a student is allocated the same achievement again, the system throws an error.
What I am trying to achieve is that once an achievement is allocated to a student, it should no longer be available in the list, i.e. it should be removed when fetching the list for that student id.
What function in MongoDB or its aggregation framework can I use to achieve this, i.e. to compare two collections and remove the common entries?
Perhaps your data structure could be changed to make the problem easier to solve. MongoDB is a NoSQL schemaless store; don't try to make it behave like a relational database.
Perhaps we could do something like this:
var StudentSchema = new Schema({
name: String,
achievements: [{ type: Schema.Types.ObjectId, ref: 'Achivement' }]
});
Then you can do something like this which will only add the value if it is unique to the achievements array:
db.student.update(
{ _id: 1 },
{ $addToSet: { achievements: <achivement id> } }
)
If you are using something like Mongoose you can also write your own middleware to remove orphaned docs:
AchivementSchema.post('remove', function(doc) {
// Remove all references to achievements in Student schema
});
Also, if you need to verify that the achievement exists before adding it to the set, you can do a findOne query before updating/inserting.
Even with the post remove hook in place, there are certain cases where you can still end up with orphaned relationships. The best thing to do for those situations is to have a regularly run cron task to do cleanup when needed. These are some of the tradeoffs you encounter when using a NoSQL store.
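With the achievements stored as an array on the student, the list of achievements still available to that student can be fetched with a $nin filter. This is a minimal sketch, assuming the embedded-array schema suggested above; the helper name, the Achievement model in the usage comment and the sample student are hypothetical.

```javascript
// Build a filter matching every achievement the student does not have yet.
function availableAchievementsFilter(student) {
  return { _id: { $nin: student.achievements } };
}

// Sample student (made up); ids are shown as strings for readability.
const student = { name: "Ada", achievements: ["a1", "a3"] };
const availableFilter = availableAchievementsFilter(student);
// Usage sketch: Achievement.find(availableFilter) returns the options for select2.
```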

query filter for mongodb using node js

I have two collections. One is questions, which stores _id, title, options, result and feedback; the second is child, in which I store question_id and score. I want to filter out the _id values from the questions collection that already appear in child, but I don't know how to do this. Is it possible to write a query for this, so that the next time I fetch questions from the questions collection it returns only the filtered ones? That is, return only those questions from the questions collection whose _id is not the same as any question_id in the second collection, child.
This is my first collection, where I store the questions: _id, title, options, result, feedback
_id:{type:String},
title:{type:String, required:true},
options:{type:Array, required:true},
result:{type:Array, required:true},
feedback:{type:String}
This is my second collection, where I store the attempted question_id and score
quiz:[
{
questionId:{
type:mongoose.Schema.Types.ObjectId,
ref: 'Question',
index: true
},
score:{type:Number},
time:{type:String}
}
]
This is not exact code; I just created an example
var query = {}
firstcollection.find($and[{_id:},{secondcollection question_id:}]},function(err, data){
so that filter data means filter _id will store in data.
and I send this data to the frontend
res.send(data);
});
The main problem is conceptual: you are trying to work with MongoDB, which is a document store, in RDBMS style. Under community pressure Mongo added some minimal join functionality in recent versions, but that doesn't make it a relational DB.
There is no good way to perform such a query. The idea behind a document store is simple: you have a collection of documents and you query this collection, and only this collection. All links between collections are "virtual" and provided only by code logic, with no support from the DB engine.
So all you can do with Mongo is: query the first collection for ids (with an appropriate projection, to fetch ids only), store the result in an array, and then perform a second query on the other collection using this array.
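The two-step approach might look roughly like this for the schemas in the question. The helper names are hypothetical. Note that the question declares the question _id as a String while questionId in the child is an ObjectId, so the sketch converts the ids to strings before comparing; that conversion is my assumption about how the two are meant to line up.

```javascript
// Step 1: collect the ids of the questions the child has already attempted.
function attemptedQuestionIds(child) {
  return child.quiz.map((entry) => String(entry.questionId));
}

// Step 2: build the filter for the questions collection from that array.
function unattemptedQuestionsFilter(child) {
  return { _id: { $nin: attemptedQuestionIds(child) } };
}

// Sample child document (made up for illustration).
const child = {
  quiz: [
    { questionId: "q1", score: 1 },
    { questionId: "q2", score: 0 },
  ],
};
const questionFilter = unattemptedQuestionsFilter(child);
// Usage sketch: run a find on the questions model with questionFilter
// and send the result to the frontend.
```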

Mongoose find in referenced document properties

I'm going nuts on a query to find a match based on referenced document properties.
I've defined my schema like this:
mongoose.model('Route', new mongoose.Schema({
user: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
}
}));
mongoose.model('Match', new mongoose.Schema({
route: {
type: mongoose.Schema.Types.ObjectId,
ref: 'Route'
}
}));
So when I'm searching for a route from a specific user in the Match model, I'd do something like (also tried without the '_id' property):
match.find({'route.user._id': '53a821577a24cbb86cd290d0'}, function(err, docs){});
But unfortunately it doesn't give me any results. I've also tried to populate the model:
match.find({'route.user._id': '53a821577a24cbb86cd290d0'}).populate('route').exec(function(err, docs){});
But this doesn't make a difference. The solutions I'm aware of (but don't think they're the neatest):
Querying all the results and iterate through them, filtering by code
Saving the nested documents as an array (so not a reference) inside the route model
Any suggestions? Many thanks in advance!
Related questions (but not a working solution offered):
Mongodb/Mongoose in Node.js. Finding by id of the nested document
Use mongoose to find by a nested documents properties
I'm going nuts on a query to find a match based on nested document properties
You don't have nested documents. You have references, which are just IDs that point to documents that reside in other collections. This is your basic disconnect. If you really had nested documents, your query would match.
You are encountering the "mongodb does not do joins" situation. Each MongoDB query can search the documents within one and only one collection. Your "match" model points to a route, and the route points to a user, but the match does not directly know about the user, so your schema does not support the query you want to do. You can search the "routes" collection first and use the result of that query to find the corresponding "match" document, or you can de-normalize your schema and store both the routeId and userId directly on the match document, in which case you could then use a single query.
Based on your question title, it seems like you want nested documents but you are defining them in mongoose as references instead of real nested schemas. Use full nested schemas and fix your data, then your queries should start matching.
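If the references are kept, the first option mentioned above (search the routes first, then use the result to find the matches) can be sketched as follows. The helper name matchFilterForRoutes and the sample route ids are hypothetical; Route and Match in the usage comment refer to the models from the question.

```javascript
// Build the filter for the Match collection from the routes found in step 1.
function matchFilterForRoutes(routes) {
  return { route: { $in: routes.map((route) => route._id) } };
}

// Sample result of step 1 (ids made up for illustration).
const routes = [{ _id: "r1" }, { _id: "r2" }];
const matchFilter = matchFilterForRoutes(routes);
// Usage sketch with Mongoose:
//   const routes = await Route.find({ user: userId }).select('_id');
//   const matches = await Match.find(matchFilterForRoutes(routes));
```

This costs two queries instead of one, which is the price of keeping the user only on the route.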

Resources