How to expose MongoDB documents primary keys in a REST API? - node.js

I am building a REST API with MongoDB and Node.js. All the documents are stored using _id as the primary key. I've read here that we should not expose the _id and that we should use another ID which is not incremental.
In the DB, a document is represented as:
{
_id: ObjectId("5d2399b83e9148db977859ea")
bookName: "My book"
}
For the following endpoints, how should the documents be exposed?
GET /books
GET /books/{bookId}
Currently my API returns:
{
_id: "5d2399b83e9148db977859ea"
bookName: "My book"
}
but should it instead return something like:
{
id: "some-unique-id-generated-on-creation"
bookName: "My book"
}
Questions
Should I expose the _id so that one can make queries such as:
GET /books/5d2399b83e9148db977859ea
Should I use a UUID for my ID instead of ObjectId?
Should I keep the internal _id (but never expose it) and create another attribute id which would use UUID or another custom generated ID ?
Is it a good practice to work with _id in my backend or should I only make queries using my own custom ID? Example: find({ id: }) instead of find({ _id: })

To answer your questions.
You can expose _id so that authenticated users can make queries like GET, PUT and PATCH on that _id.
MongoDB lets you generate your own BSON ID and use it, instead of having MongoDB create its own _id during the insert.
There is no need to duplicate logic. The main purpose of _id is to identify each document uniquely, and having two id fields means storing redundant data. Follow the DRY (Don't Repeat Yourself) principle wherever possible.
It's not a bad practice to work with _id in your backend.
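If _id is exposed in the URL, it can help to check that the path parameter is shaped like an ObjectId before querying with it. A minimal sketch in plain JavaScript: the helper name isObjectIdHex is hypothetical, and it only checks the 24-character hexadecimal form shown in the question.

```javascript
// Hypothetical helper: true when `value` looks like the 24-character
// hexadecimal string form of an ObjectId.
function isObjectIdHex(value) {
  return typeof value === 'string' && /^[0-9a-fA-F]{24}$/.test(value);
}

console.log(isObjectIdHex('5d2399b83e9148db977859ea')); // true
console.log(isObjectIdHex('not-an-id'));                // false
```

A handler for GET /books/{bookId} could answer with a 400 when this check fails instead of passing the value on to the database.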
Hope this helps!

Given you're using Mongoose, you can use 'virtuals', which are essentially fake fields that Mongoose creates. They're not stored in the DB, they just get populated at run time:
// Duplicate the ID field.
Schema.virtual('id').get(function(){
return this._id.toHexString();
});
// Ensure virtual fields are serialised.
Schema.set('toJSON', {
virtuals: true
});
Any time toJSON is called on the Model you create from this Schema, it will include an 'id' field that matches the _id field Mongo generates. Likewise you can set the behaviour for toObject in the same way.
You can refer to the following docs:
1) https://mongoosejs.com/docs/api.html
2) toObject method
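Where virtuals are not available (for instance when working with plain objects), the same reshaping can be done by hand. This is only a sketch: toApiShape is a hypothetical name, and it assumes _id is either a string or a value whose toString() returns the hex form.

```javascript
// Hypothetical helper: copy a plain document, replacing `_id` with a
// string `id` so the internal field name is not part of the response.
function toApiShape(doc) {
  const { _id, ...rest } = doc;
  return { id: String(_id), ...rest };
}

const book = { _id: '5d2399b83e9148db977859ea', bookName: 'My book' };
console.log(toApiShape(book));
// { id: '5d2399b83e9148db977859ea', bookName: 'My book' }
```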

In my case, security risk or not, my _id is a concatenation of the fields in my document that are semantically considered keys. For example, if First Name, Last Name, and Email are my identifier and a fourth field such as Age is an attribute, then _id is the concatenation of those three fields. It is not difficult to get and update such a record as long as I have the First Name, Last Name, and Email available.
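One thing to watch with concatenated keys is that different field combinations can collapse into the same string. A possible way around that, sketched below under the assumption that a '|' separator and percent-encoding are acceptable (makeNaturalKey is a hypothetical name, not something from the answer above):

```javascript
// Hypothetical helper: build one string key from several key fields.
// Each part is percent-encoded, so the '|' separator can never appear
// inside a part and two different field lists cannot produce the same key.
function makeNaturalKey(...parts) {
  return parts.map((part) => encodeURIComponent(String(part))).join('|');
}

const key = makeNaturalKey('Ada', 'Lovelace', 'ada@example.com');
console.log(key); // Ada|Lovelace|ada%40example.com
```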

Related

define primary key myself and prevent creating the _id in mongodb

I have two questions:
1- I want to get a document by id from the database, but the operation is based on the key that MongoDB creates automatically, i.e. _id. I want to search based on a field that I created myself, e.g. id. For this purpose I used the following:
app.get('/:id',async(req,res)=>{
try{
const get=await product.findById({id:req.params.id});
res.json(get);
}
catch(err){
res.send(err);
}
});
and get following output:
{"stringValue":"\"{ id: '1' }\"","valueType":"Object","kind":"Number","value":{"id":"1"},"path":"_id","reason":{"generatedMessage":true,"code":"ERR_ASSERTION","actual":false,"expected":true,"operator":"=="},"name":"CastError","message":"Cast to Number failed for value \"{ id: '1' }\" (type Object) at path \"_id\" for model \"product\""}
When creating the database, the field _id is created by MongoDB and known as the key. What command should I use to prevent this field from being created? How do I define the key field myself in the database, or select a field of my own definition as the primary key on which to search?
If you want to do that, you should use the find method:
const get = await product.find({id:req.params.id});
As far as I know, it is not possible to avoid the creation of the _id field in MongoDB, although you can avoid the ObjectId type by giving _id a custom value (which must be unique) when you insert the document.
But if you want to use a custom id value, you will need to query with the Mongoose find method on your custom id, as I indicated above. Since your custom id field should be unique, you can use findOne instead, which returns a single document and stops at the first match.
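Put together, a route handler along these lines could look like the sketch below. The factory name makeGetByCustomId and the 404/500 responses are assumptions for illustration. `model` is anything exposing findOne(filter) that returns a promise or thenable, as a Mongoose model does.

```javascript
// Sketch of a route handler that looks a document up by a custom `id`
// field instead of `_id`. `model` is assumed to expose findOne(filter).
function makeGetByCustomId(model) {
  return async function getByCustomId(req, res) {
    try {
      // Filter on the custom field; findById would filter on `_id`.
      const doc = await model.findOne({ id: req.params.id });
      if (!doc) {
        res.status(404).json({ error: 'Not found' });
        return;
      }
      res.json(doc);
    } catch (err) {
      res.status(500).json({ error: err.message });
    }
  };
}
```

With Express this could be mounted as app.get('/:id', makeGetByCustomId(product)).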

MongoDb integrating with external db

I have a database which contains data from two separate systems/servers. The first is generated locally [I develop and create this data] (users, activity logs, orders, ...). The second comes from a "product provider" [I only have READ access from API] These objects were created by MySQL and sent in JSON. They already have an "id" property.
With Node.js, I use request to get a product by "id" and then store it with newProduct.save(), which appends an _id.
In products, "id" is necessary to form relationships with the other collections in my database (such as products_price) and to access dynamic endpoints, such as "products/:id/promos".
Note that products are constantly being updated externally and I need to be able to update my documents by "id" not by "_id" as the external server has no knowledge about "_id." [id is unique on a collection level, as each collection is a fresh iteration]
For my first question: should I treat "product.id" as a "regular" MongoDB field and use aggregate/lookup to merge documents from my collections? Or should I overwrite ObjectID() with id? (before saving rename "id" to "_id")
At some point, Orders (local) and Products (external) need to form a relationship where Order _id and Product id (or _id) are stored together for easy retrieval.
Which id do I use in this case?
If you are pretty sure that the 'id' coming from your product provider's API is unique, you are better off using it as _id (overwriting _id). It will save you:
an unneeded index ('_id' is indexed anyway)
some CPU cycles that MongoDB would spend producing the ObjectID
some disk and memory space
(*) Even if you find yourself dealing with many different product providers, assuming each one uses its own unique product id, you could use a combined _id to keep it unique, such as:
_id = {provider: 'foo', id: xxx}
or _id = provider_name + product_id
etc.
(An array such as [provider_name, product_id] will not work, since MongoDB does not allow arrays as _id values.)
With multiple providers, the format depends on how you plan to fetch those products later.
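As an illustration of the embedded-document form, a provider's JSON could be reshaped before saving roughly as follows. The helper name toProductDoc and the sample product fields are hypothetical.

```javascript
// Hypothetical helper: turn a provider's JSON product into a document whose
// `_id` combines the provider name with the provider's own `id`.
// The `_id` fields are always written in the same order, because MongoDB
// treats embedded documents with a different field order as different values.
function toProductDoc(providerName, product) {
  const { id, ...rest } = product;
  return { _id: { provider: providerName, id }, ...rest };
}

console.log(toProductDoc('foo', { id: 42, name: 'Widget' }));
// { _id: { provider: 'foo', id: 42 }, name: 'Widget' }
```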

"Right" way to keep API db tables in sync

It's my first time creating an application solo (back-end, front-end, design) and I could use some guidance on my back-end.
Let's say there is a table of Users, Jobs, and Applications (Job Applications). Right now I have it structured like so:
UserSchema {
// other attributes
_id: String,
application_ids: [String] // array of String id's
}
JobSchema {
// other attributes
_id: String,
application_ids: [String]
}
ApplicationSchema {
// other attributes
_id: String,
applicant_id: String, // the _id of a User
job_id: String // the _id of a Job
}
My current plan is like when a new Application is created (via POST /api/applications where I send the User's _id and Job's _id) I would then set the Application's applicant_id to the User's _id and then update the User's application_ids array as well as the Job's application_ids array.
Question: Am I going about this in a reasonable manner or am I touching too many tables for one POST request? There are other tables/schemas in my application that will follow a similar relationship structure. Then there's the matter of deleting Applications and then having to update application_ids again and etc, etc but that's another matter.
Note: I am using Express.js and Mongoose.js, if that helps
No, you shouldn't do it this way. By storing the ID of the user and job in the application, you can use a query to get all the applications by user or all applications for a given job. No need to touch both.
If you really want to have the relationship on both sides, at least set it up as an ObjectId and use the "ref" declaration. Check out the populate docs in the mongoose docs.
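To illustrate why the references on the Application are enough, here is a small sketch. An in-memory stand-in plays the role of the model (makeInMemoryModel is hypothetical; with Mongoose the find calls would go to the real Application model instead), and the sample ids are made up.

```javascript
// Stand-in for a model: an in-memory array with a find(filter) method that
// resolves to the rows whose fields equal every key in `filter`.
function makeInMemoryModel(rows) {
  return {
    async find(filter) {
      return rows.filter((row) =>
        Object.entries(filter).every(([key, value]) => row[key] === value));
    },
  };
}

const Application = makeInMemoryModel([
  { _id: 'a1', applicant_id: 'u1', job_id: 'j1' },
  { _id: 'a2', applicant_id: 'u1', job_id: 'j2' },
  { _id: 'a3', applicant_id: 'u2', job_id: 'j1' },
]);

// Both directions are answered from the Application records alone, so no
// application_ids arrays have to be kept in sync on User or Job.
const applicationsForUser = (userId) => Application.find({ applicant_id: userId });
const applicationsForJob = (jobId) => Application.find({ job_id: jobId });

applicationsForUser('u1').then((apps) => console.log(apps.map((a) => a._id))); // [ 'a1', 'a2' ]
applicationsForJob('j1').then((apps) => console.log(apps.map((a) => a._id)));  // [ 'a1', 'a3' ]
```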

Sub documents vs Mongoose population

I have the following scenario:
A user can log in to a website. A user can add/delete a poll (a question with two options). Any user can give their opinion on a poll by selecting any one of the options.
Considering the above scenario, I have three models: Users, Polls, Options. They are as follows, in order of dependency:
Option Schema
var optionSchema = new Schema({
optionName : {
type : String,
required : true,
},
optionCount : {
type : Number,
default : 0
}
});
Poll Schema
var pollSchema = new Schema({
question : {
type : String,
required : true
},
options : [optionSchema]
});
User Schema: parent schema
var usersSchema = new Schema({
username : {
type : String,
required : true
},
email : {
type : String,
required : true,
unique : true
},
password : String,
polls : [pollSchema]
});
How do I implement the above relation between those documents? What exactly is Mongoose population? How is it different from subdocuments? Should I go for subdocuments or should I use Mongoose population?
MongoDB doesn't have joins the way relational databases do, so population works like a hidden join. When you have a User model and populate its Polls, Mongoose does something like this:
fetch User
fetch related Polls, by ObjectIds which are stored in User document
put fetched Polls documents into User document
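These three steps can be sketched in plain JavaScript. This is a conceptual illustration only, not how Mongoose implements population: two arrays stand in for the collections, and the sample data and the populatePolls name are made up.

```javascript
// Conceptual sketch only: plain arrays stand in for the two collections.
const users = [{ _id: 'u1', username: 'alice', polls: ['p1', 'p2'] }];
const polls = [
  { _id: 'p1', question: 'Tabs or spaces?' },
  { _id: 'p2', question: 'Cats or dogs?' },
];

function populatePolls(userId) {
  // 1. Fetch the user.
  const user = users.find((u) => u._id === userId);
  if (!user) return null;
  // 2. Fetch the polls whose ids are stored on the user.
  const related = polls.filter((p) => user.polls.includes(p._id));
  // 3. Put the fetched polls into a copy of the user document.
  return { ...user, polls: related };
}

console.log(populatePolls('u1').polls.map((p) => p.question));
// [ 'Tabs or spaces?', 'Cats or dogs?' ]
```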
If instead you make User the document and Polls a subdocument, all of the data lives in a single document. This means that to fetch a user's polls, Mongoose doesn't need to run two queries (it only needs to fetch the User document, because the Polls data is already there).
But which is the better choice? It depends on the case.
If your Polls documents will be referenced from other documents (you need access to Polls from documents User, A, B, C), populating could be better, though not for sure. The advantage of populating is that when you need to change some Polls fields, you only update the Polls document itself, rather than changing the data in every document (User, A, B, C) that would otherwise hold a copy of it as a subdocument. I say it's not certain that populating is better in that case because I don't know how you need to retrieve your Polls data. If you store your data the wrong way, you will get performance issues or have trouble fetching data easily.
Subdocuments are the basic way of storing data. They work well when Polls belong only to a User. There is a performance advantage: Mongoose needs one query instead of the two that population needs. The update disadvantage mentioned above doesn't apply either, because the Polls data is stored in a single place, so no other documents need updating.
MongoDB was basically designed with subdocuments in mind; it is, after all, a non-relational database. So in most cases I prefer subdocuments. I can't say which way is better in your case, because I don't know what your full DB looks like or how you want to retrieve your data.
There is some useful info in official documentation:
http://mongoosejs.com/docs/subdocs.html
http://mongoosejs.com/docs/populate.html
Take a look at those.
Edit
Since I prefer to fetch data easily, care about performance, and know that data redundancy is common in MongoDB, I would choose to store this data as subdocuments.

Mongoose.js: is it possible to change name of ObjectId?

Some questions about the Mongo ObjectId in Mongoose:
1) Can an ObjectId field be named something other than _id, and how do I do that? When I do this in my code:
MySchema = new mongoose.Schema({
id : mongoose.Schema.ObjectId
});
it changes nothing.
2) If I have an ObjectId field called _id, is it possible to return another name for this field from a request (for example just "id", to send it in the web response)?
3) And question just for understanding: why is the ObjectId _id field accessible through "id" property not "_id"?
Thanks, Alex
The "_id" element is part of the MongoDB architecture, which guarantees that every document in a collection can be uniquely identified. This is especially important if you use sharding, to allow unique identifiers across disparate machines. It is a design choice, so there is no way to get rid of it :)
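The default value for _id is an ObjectId, which was originally built from:
timestamp
hash of the machine hostname
pid of the generating process
increment
(Newer MongoDB versions replace the hostname hash and pid with a random value generated once per process.) You can use whatever value you want as long as it is unique.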
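Because the value starts with a timestamp, the creation time can be read back from the hex form of an ObjectId, which is worth knowing when deciding whether to expose it. A minimal sketch in plain JavaScript (objectIdTimestamp is a hypothetical helper; the sample id is the one from the first question on this page):

```javascript
// Read the creation time from an ObjectId's 24-character hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp('5d2399b83e9148db977859ea').toISOString());
// 2019-07-08T19:30:00.000Z
```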
If it's easier for you, think of _id as something that has to be there but that you don't really care about :) Just let the system auto-generate it and use your own identifier.
So if you still want to create your own "id", execute something like this:
db.mySchema.createIndex({"id": 1}, {"unique" : true})
but make sure it really is unique and doesn't conflict with the API you use.
2) Rename it on the application side, just before sending it as the web response.
3) I think this is because of the API you use. Maybe the author found it more logical to return the id instead of _id ? Honestly never tried mongoose :)
