Using default values in mongodb (mongoose) to deal with missing properties - node.js

I am using Mongoose schemas to create models for my MongoDB database, and I am wondering whether or not to use default values. I understand that default values are good for createdAt and similar fields, but what about my case, where I have schemas with a lot of properties, which could potentially lead to a lot of null/undefined errors in the client? Should I solve this with default values on my Mongoose schemas, or should I deal with it on the client side, or even in Node.js?

My answer would be simply to avoid using default values if a schema field is unused. In my production Node/Mongo/Mongoose app, there are a number of points that help make this a good strategy:
Mongoose does not save unused fields, i.e. those that don't have a default in the schema and are not set during the create operation. This saves a lot of MongoDB disk space, since every document stores each field's name alongside its value (as BSON, MongoDB's binary JSON format). I've seen as much as a 60% gain in disk space from using short field names and avoiding unnecessary defaults.
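To make the per-document cost concrete, here is a rough sketch of the arithmetic. It assumes the element layout from the BSON specification (a one-byte type tag, the field name as a null-terminated string, then the value; a boolean value takes one byte) and ignores the document header and WiredTiger's on-disk compression (snappy by default), so real savings on disk will be smaller than these raw numbers. The function names and field names are hypothetical.

```javascript
// Rough size of one BSON boolean element, per the BSON spec:
// 1 type byte + UTF-8 field name + 1 null terminator + 1 value byte.
// Illustrative only: ignores document headers and storage compression.
function booleanElementBytes(fieldName) {
  return 1 + Buffer.byteLength(fieldName, 'utf8') + 1 + 1;
}

// Bytes added across a collection when every document stores the field,
// e.g. because the schema gives it a default value.
function bytesForField(fieldName, docCount) {
  return booleanElementBytes(fieldName) * docCount;
}

const docCount = 1000000;
console.log(bytesForField('emailVerified', docCount)); // 16000000
console.log(bytesForField('ev', docCount)); // 5000000
// A field that is never set (and has no default) costs 0 bytes in that document.
```

Across a million documents, a single defaulted boolean with a descriptive name costs about 16 MB before compression, which is why skipping unused defaults and shortening names both show up in collection size.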
When you are writing code in Node.js that deals with the database via Mongoose queries, you get Mongoose-decorated objects (documents) that can deal with the presence or absence of schema properties, even if they are not set. This is the default behavior when you do a model.find() operation. As far as back-end code is concerned, you don't need to worry about JavaScript undefined errors. Mongoose will set up any properties that are declared in the schema, and conversely it will ignore (and not serialize to the DB) any properties you add that are not declared.
Note that the above functionality is expensive for queries. Bottom line: if you are writing Node.js code that gets/sets properties while performing create or update operations, use the Mongoose-decorated objects; they deal with schema defaults etc. If you are just querying to send data back to the front-end as-is, use the .lean() method on your Mongoose queries: it returns plain JavaScript objects without Mongoose's getters, virtuals, or change tracking, which is significantly faster.
In the front-end, fields that have no values but need defaults can be dealt with easily enough. For attributes that are one level deep:
var someField = myMongoDoc.attrWithoutDefault || 'default value';
Attributes that are nested (e.g. myMongoDoc.attr1.subAttr1) can be read safely using a library like lodash and its _.get function (see https://lodash.com/docs#get).
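A short sketch of both cases in plain JavaScript. One caveat: || also replaces legitimate falsy values such as 0, false or '', so the nullish coalescing operator ?? (ES2020, Node.js 14+) is usually the safer choice. getPath below is a hypothetical, simplified stand-in for lodash's _.get; it only handles dot-separated paths, not the bracket syntax lodash also accepts.

```javascript
// Simplified stand-in for lodash's _.get: walks a dot-separated path and
// returns `fallback` if any step is null/undefined. Dot paths only.
function getPath(obj, path, fallback) {
  let current = obj;
  for (const key of path.split('.')) {
    if (current === null || current === undefined) return fallback;
    current = current[key];
  }
  return current === undefined ? fallback : current;
}

const myMongoDoc = { count: 0, attr1: { subAttr1: 'x' } };

// || treats 0 as missing; ?? only falls back on null/undefined.
const viaOr = myMongoDoc.count || 10;      // 10 (the stored 0 is discarded)
const viaNullish = myMongoDoc.count ?? 10; // 0

const nested = getPath(myMongoDoc, 'attr1.subAttr1', 'default value');  // 'x'
const missing = getPath(myMongoDoc, 'attr2.subAttr1', 'default value'); // 'default value'
console.log(viaOr, viaNullish, nested, missing);
```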

Related

Should Mongo concurrency issues be handled using Mongo functionality or a locking module?

In our MEAN stack application we have started running into concurrency issues in MongoDB: things like operations that affect array indices, as well as concurrent updates to subdocuments. We are trying to decide whether it is better to use a locking Node module such as rwlock to handle these problems, or to leverage the functionality provided by MongoDB and Mongoose. I should note that the issues so far are not related to performance; they are unwanted behavior in the database.
Example: we have a route that contains an ID and a field name, and the body is a JSON object that will be the value for the specified field. The objects are stored in an array, and if no object with the ID is present, one will be created.
/:id/:field
When two POST requests arrive for the same object ID and that object does not yet exist, both requests create the object, and the array ends up containing two objects, each with one field set, when it should contain one object with two fields set.
One of the reasons I am hesitant to use the database functionality (such as the __v field) is that it requires methods that bypass the Mongoose middleware, and middleware is one of the reasons we chose Mongoose in the first place.
I'm wondering if there are any hidden/obvious pros and cons to using one of these approaches over the other?
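The race described above can be reproduced, and serialized, in a few lines of plain JavaScript. This is a minimal sketch under loud assumptions: an in-memory object stands in for the parent MongoDB document, setTimeout stands in for the database round trip, and KeyedLock is a hypothetical helper, not the rwlock module's API. The main con of this approach: an in-process lock only helps while a single Node.js process handles all writes. With several app servers, or other clients writing to the same collection, you need an atomic update in the database itself (for example, a conditional update that only pushes a new array element when none with that ID exists).

```javascript
// Hypothetical keyed lock: runs async tasks one at a time per key.
// ASSUMPTION: only serializes work inside ONE Node.js process.
class KeyedLock {
  constructor() {
    this.tails = new Map(); // key -> promise that settles when that key's queue drains
  }

  run(key, task) {
    const prev = this.tails.get(key) || Promise.resolve();
    const result = prev.then(() => task()); // start after the previous task for this key
    const tail = result.catch(() => {});    // a failed task must not block later ones
    this.tails.set(key, tail);
    tail.then(() => {
      // Forget the key once no newer task has been queued behind this one.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

// In-memory stand-in for the parent document that holds the array.
const doc = { items: [] };
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Naive read-modify-write handler for POST /:id/:field.
async function setField(id, field, value) {
  let item = doc.items.find((it) => it.id === id); // read
  await sleep(10);                                   // simulated DB latency
  if (!item) {                                       // concurrent callers can both get here
    item = { id };
    doc.items.push(item);
  }
  item[field] = value;                               // write
}

const lock = new KeyedLock();
const safeSetField = (id, field, value) =>
  lock.run(id, () => setField(id, field, value));

async function main() {
  // Unlocked: two concurrent POSTs for a new ID create two array entries.
  await Promise.all([setField('a', 'x', 1), setField('a', 'y', 2)]);
  // Locked: the second POST sees the entry created by the first.
  await Promise.all([safeSetField('b', 'x', 1), safeSetField('b', 'y', 2)]);
  console.log(doc.items);
}

main();
```

Run as-is, this should log two separate entries for 'a' and a single merged entry for 'b'.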

Rarely used values in mongoose schema

I have a users document specified via a Mongoose schema, and internally I want to store a few values in the MongoDB document, such as flags for various things like "has_sent_welcome_email".
None of these flags values will ever be seen inside the web-app directly, but will have external reporting which will read them.
An example of the use case is:
1. A user registers; we create and save a new document using the Mongoose user model.
2. We attempt to send an automated email response, but our email server errors for some reason, so we set a flag to indicate that the welcome email was never sent.
This applies to various other flags we have, but this is the sort of scenario I am referring to.
Should I store these in the mongoose user schema?
It seems a bit of a waste if they are never going to be displayed, however setting the flag seems easier if I do. Can/should I have the flags as a separate model/schema? Are there any best practices around this sort of thing?
I think it's fine to store those flags in the user schema. The flags are related to the user, their cardinality is low (a boolean is only true or false), and, most importantly, it is easier for you to implement.
You should consider a separate model/schema when you have a one-to-many relationship, such as one city having millions of users. Otherwise, an embedded field in the schema is preferred, because it's easier to do and you can get all the data in one query.
For your app, if you don't want MongoDB to return the extra fields, you can use a projection (http://mongoosejs.com/docs/queries.html) to reduce network transfer between your server and your web app.

MongoDB - various document types in one collection

MongoDB is schemaless, which means a collection (table in relational DB) can contain documents (rows) of different structure - having different fields, for instance.
I'm new to Mongo, so I decided to use Mongoose which should make things a bit easier. Reading the guide:
Defining your schema
Everything in Mongoose starts with a Schema. Each schema maps to a MongoDB collection and defines the shape of the documents within that collection.
Notice the last sentence. Doesn't it conflict with the schemaless philosophy of MongoDB? Or is it that in 99% of cases I want a collection of documents with the same structure, so the introductory guide only discusses that scenario? Does Mongoose even allow me to create a schemaless collection?
MongoDB does not require a schema, but that confuses a lot of people from a standard SQL background, so Mongoose aims to bridge the gap between SQL and NoSQL. If you want to maintain a collection with different document types, then by all means do not use Mongoose.
If you're okay with the schemaless nature of MongoDB, there is no reason to add the additional abstraction and overhead that Mongoose surely applies.
The purpose of Mongoose is to use a schema; there are other database drivers you can use to take advantage of MongoDB's schemaless nature, such as Mongoskin.
If you want to use Mongoose's schema design and still make an exception, you can use Mongoose's strict option.
According to the docs:
The strict option, (enabled by default), ensures that values passed to our model constructor that were not specified in our schema do not get saved to the db.
NoSQL doesn't mean no schema. It means the database doesn't control the schema. For instance, with MongoDB you will look hard to find anything that determines whether a field in a document is a string, a number or a date. The database doesn't care. You could store a number in a field in one document and a string in the same field of another document in the same collection. But from a coding perspective that can become quite hairy and would be bad practice. This is why you still have to define data types: you still need a schema of sorts, which is why Mongoose offers and, in fact, enforces this functionality.
Going a conceptual level higher, the major concept of NoSQL is to put the schema inside your code rather than in some file of SQL commands, i.e. the database is not told what data types to expect and does not control the schema. So instead of needing migration files/paths and versioning of the database schema, you just have your code. ORMs, for example, try to bridge this issue too, often with automated migration systems.
ORMs also try to avoid the Object Relational Impedance Mismatch problem, which MongoDB avoids completely. Well, it doesn't have relationships per se, so the problem is avoided out of necessity.
Getting back to schema: with MongoDB and Mongoose, if you or a team member makes a change to the schema in the code, all your other team members need to do to get the database to work with it is pull in that new code. Voilà, the schema is up-to-date and will work. There is no need to also pull in a copy of the newer migration file (to determine the new schema of the DB) and then run it on a (copy of the) DB to update it too, just to continue programming. There is no need to make schema changes in multiple places.
So, in the end, if you can imagine your schema is always in your code (only), making changes to an application with a database persisting state like MongoDB is a good bit simpler and even safer. (Safer, because code and schema can't get out of sync, as it's the one and the same.)

Should I use mongoose if the schema is too dynamic?

What I have understood so far is that Mongoose needs us to define a schema. But what if my schema keeps changing on a per-user basis? For instance, let's say there are thousands of mobile phone users. Each user has different kinds of offer subscriptions and whatnot. New offers keep coming, and a user can even combine offers, creating new offers on the fly. So these offers become keys holding subdocuments with various other details about that offer. Such a schema can't be predefined. Should I use Mongoose then? Or stick to mongojs-style thin wrappers and forget about Mongoose's ODM capabilities?
You could create a Mixed type schema, where there's no restriction on the type of data you can store. However, it comes with a trade-off: Mongoose no longer casts or validates those values, and it cannot detect changes made inside a Mixed path, so you have to call doc.markModified(path) before saving. Take a look at the official documentation for more information and implementation details.
From the Mongoose docs:
NOTE: Any key/val set on the instance that does not exist in your schema is always ignored, regardless of schema option.
With the schema option strict set to false, you can set the value when the model instance is created:
var task = new Task({ otherKey: 'some value' });
You could also put the ad-hoc values under a mixed subdocument type as well.

Using and populating (real) DBRef arrays with Mongoose / mongoose-dbref

Mongoose doesn't appear to support Mongo DBRefs. Apparently they released "DBRef" support but it was actually just plain references (no ability to reference documents from different collections). I've finally managed to craft a schema that allows me to hold an array of ObjectID references and populate them, which is great for certain parts of my schema, but it would be extremely convenient if I could use proper DBRefs to create an array that lets me refer to documents from a number of collections.
Luckily(?) there's a module that can monkey patch DBRef support into mongoose: https://github.com/goulash1971/mongoose-dbref
Unluckily, I can't make any sense of the documents. The best I can tell is that there is no ability to use DBRefs in an array (there is a 'fetch' method to dereference, but it takes a single dbref); 'populate' doesn't seem to be patched to fill in DBRefs, and I can't tell how I'm supposed to assign a DBRef given a source document [collection.items.push(?????)].
From the internet, it appears that I can assign an object of the form { $id: document._id, $ref: 'Collection' } -- when logging the result, it appears to have "taken" as a DBRef data type, but I am unsure if this is correct since I cannot seem to do anything useful with it (turn the ref back into a document).
What I really want is a way to represent an ordered list of items from multiple collections; any solution to this is fine by me, but so far DBRefs are the best I've got. Help?
A DBRef (as explained in detail here) is a tuple containing the ObjectId, collection name, and possibly the database container name of a referenced object in another collection.
Internally in the MongoDB server these serve no purpose and are just data within a document. The point is for use in some drivers and ODM implementations to allow for some sort of automatic expansion by issuing additional queries to the server in order to have the data that is elsewhere appear to be an ordinary sub-document part of the referencing document. This can be automatic or a lazy load depending on the implementation, but is always done over the wire and processed on the client side. The server will do nothing to traverse or join this data.
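For the asker's case (an ordered array of { $ref, $id } pairs pointing into several collections), that client-side expansion can be sketched as follows. The sketch groups IDs by collection so each collection is queried once (the equivalent of a find with { _id: { $in: ids } }), then restores the original order. The collections are in-memory Maps and fetchByIds is a hypothetical stand-in for the real query; you would swap in your driver or Mongoose call.

```javascript
// Hypothetical in-memory "collections": name -> Map(_id -> document).
const db = {
  posts: new Map([[1, { _id: 1, title: 'Hello' }]]),
  photos: new Map([[7, { _id: 7, url: 'a.png' }], [8, { _id: 8, url: 'b.png' }]]),
};

// Stand-in for collection.find({ _id: { $in: ids } }). A real $in query does
// not guarantee result order, so callers must not rely on it.
async function fetchByIds(collectionName, ids) {
  const coll = db[collectionName] || new Map();
  return ids.map((id) => coll.get(id)).filter(Boolean);
}

// Resolve an ordered list of DBRef-like { $ref, $id } objects with one
// query per collection, then put the results back in the original order.
async function resolveRefs(refs) {
  const idsByCollection = new Map();
  for (const { $ref, $id } of refs) {
    if (!idsByCollection.has($ref)) idsByCollection.set($ref, []);
    idsByCollection.get($ref).push($id);
  }

  const found = new Map(); // "collection:id" -> document
  await Promise.all(
    [...idsByCollection].map(async ([name, ids]) => {
      for (const doc of await fetchByIds(name, ids)) {
        found.set(`${name}:${doc._id}`, doc);
      }
    })
  );

  // Dangling references (deleted targets) come back as null.
  return refs.map(({ $ref, $id }) => found.get(`${$ref}:${$id}`) || null);
}

const items = [
  { $ref: 'photos', $id: 8 },
  { $ref: 'posts', $id: 1 },
  { $ref: 'photos', $id: 7 },
];
resolveRefs(items).then((docs) => console.log(docs));
```

Note that this is exactly the "additional queries over the wire" cost described above: one round trip per referenced collection, on top of the read for the referencing document.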
Additionally, MongoDB collections are schemaless, so there is nothing, in the relational sense, that says all documents in a collection have to have the same structure.
In the case of Mongoose, there are built-in functions (populate) to do this sort of loading for you as a convenience. This is not strictly a DBRef, but referencing documents with different schemas stored in the same collection achieves the same result as storing those documents in separate collections external to the referencing document.
It is important to consider the data access patterns of your application and not to simply opt for the same sort of relational design you are used to. Keeping in mind that you are only ever reading from one collection at a time, it is most desirable to get at the data you need in a single read or write, without multiple operations over the wire, which will slow things down considerably.
In short, you should always consider embedding sub-documents first, and only use external references, in the best supported form, when you absolutely have to. Your application users will thank you in the end.

Resources