Rarely used values in mongoose schema - node.js

I have a users document defined via a mongoose schema, and internally I want to store a few values in the MongoDB document, such as flags like "has_sent_welcome_email".
None of these flag values will ever be seen inside the web-app directly, but external reporting will read them.
An example of the use case is:
User registers, we create and save a new document using the mongoose user model.
We attempt to send an automated email response, but our email server errors for some reason, so we set a flag to indicate the welcome email was never sent.
This can apply to various other flags we have, but this is the sort of scenario I am referring to.
Should I store these in the mongoose user schema?
It seems a bit of a waste if they are never going to be displayed, however setting the flag seems easier if I do. Can/should I have the flags as a separate model/schema? Are there any best practices around this sort of thing?

I think it's fine to store those flags in the user schema. The flags are related to the user, their cardinality is low (a boolean is only true or false), and, most importantly, it is easier for you to implement.
You should consider a separate model/schema when you have a "one-to-many" relationship, like 1 city having millions of users. Otherwise, an embedded field in the schema is preferred, because it's easier to do and you can get all the data in one query.
For your app, if you don't want mongo to return the extra fields you can use a projection (http://mongoosejs.com/docs/queries.html) to reduce network transfer between your server and web-app.
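A minimal sketch of the projection idea, not a definitive implementation. Only has_sent_welcome_email comes from the question; the second flag name and the User model mentioned in the comments are assumptions made for illustration. The helpers just build the projection arguments, so they run without a database.

```javascript
// Internal-only flags kept on the user document.
// 'has_completed_import' is a made-up example name.
const INTERNAL_FLAGS = ['has_sent_welcome_email', 'has_completed_import'];

// Build an exclusion projection object such as { has_sent_welcome_email: 0 }.
function exclusionProjection(fields) {
  return Object.fromEntries(fields.map((name) => [name, 0]));
}

// Build the equivalent string form, e.g. '-flag_a -flag_b'.
function exclusionSelect(fields) {
  return fields.map((name) => `-${name}`).join(' ');
}

const projection = exclusionProjection(INTERNAL_FLAGS);
const select = exclusionSelect(INTERNAL_FLAGS);

// With a (hypothetical) Mongoose model named User, these would be used as:
//   User.find({}, projection)
//   User.find({}).select(select)
console.log(projection, select);
```

The reporting side can simply query without the projection and read the flags as normal fields.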

Related

MongoDB, how to manage user related records

I'm currently trying to learn Node.js and MongoDB by building the server side of a web application which should manage insurance documents for an insurance agent.
So let's say I'm the user, I sign in, then I start to add my customers and their insurances.
So I have 2 related collections, Customers and Insurances.
I have one more collection to store the users login data, let's call it Users.
I don't want the new users to see and modify the customers and the insurances of other users.
How can I "divide" every user related record, so that each user can work only with his data?
I figured out I can actually add to every record the _id of the user who created it.
For example I log in as myself, I get my id "001", and I could add one field with this value to every customer and insurance.
That way I could filter every query with this value.
Would it be a good idea? In my opinion this filtering is a waste of processing power for MongoDB.
If someone has any idea of a solution, or even a link to an article about it, it would be helpful.
Thank you.
This is more a general permissions problem than just a MongoDB question. Also, without knowing more about your schemas it's hard to give specific advice.
However, here are some approaches:
1) Embed sub-documents
Since MongoDB is a document store allowing you to store arbitrary JSON-like objects, you could simply store the customers and insurances wholly inside each user object. That way querying for a user would return their customers and insurances as well.
2) Denormalise
Common practice for NoSQL databases is to denormalise related data (i.e. duplicate the data). This might include embedding a sub-document that is a partial representation of your customers/insurances/whatever inside your user document. This has a similar benefit to the above solution in that it eliminates additional queries for sub-documents. It also has the same drawback of requiring more care to preserve data integrity.
3) Reference with foreign key
This is a more traditionally relational approach, and is basically what you're suggesting in your question. Depending on whether you want the reference to be bi-directional (both documents reference each other) or uni-directional (one document references the other), you can either store the user's ID as a foreign user_id field on each customer and insurance, or store arrays of customer_ids and insurance_ids in the user document. In relational parlance this is sometimes referred to as "has many" or "belongs to" (the user has many customers, the customer belongs to a user).
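A rough sketch of approach 3 with a user_id field on each record. The customers array and the find helper are in-memory stand-ins invented for illustration, not a MongoDB API; the point is only that the owner's id is merged into every filter on the server side. With a real collection, an index on user_id would normally keep this kind of filter cheap.

```javascript
// Merge the owner's id into any query filter so a user only sees their own
// records. user_id is written last so a caller-supplied value cannot override it.
function scopedFilter(userId, filter = {}) {
  return { ...filter, user_id: userId };
}

// Tiny in-memory stand-in for the Customers collection (illustrative data).
const customers = [
  { _id: 'c1', user_id: '001', name: 'Alice' },
  { _id: 'c2', user_id: '002', name: 'Bob' },
  { _id: 'c3', user_id: '001', name: 'Carol' },
];

// Minimal matcher supporting only equality on top-level fields.
function find(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value));
}

const mine = find(customers, scopedFilter('001'));
console.log(mine.map((c) => c.name));
```

The same scopedFilter result could be passed as the filter argument of a real query, with the user id taken from the authenticated session rather than from the request body.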

Using default values in mongodb (mongoose) to deal with missing properties

I am using mongoose Schemas to create models for my MongoDB database, and I am wondering whether or not to use default values. I understand that default values are good for createdAt and similar values, but what about my case, where I have schemas with a bunch of properties, which could potentially lead to a lot of null pointers in the client? Should I solve this using default values on my mongoose schemas, or should I deal with this issue on the client side, or even in Node.js?
My answer would be simply to avoid using default values if a schema field is unused. In my production Node/Mongo/Mongoose app, there are a number of points that help make this a good strategy:
Mongoose does not save unused fields, i.e. those that don't have a default in the schema and are not set during the create operation. This saves a lot of Mongo disk space, since every Mongo document stores its field names and values in a BSON document. I've seen as much as a 60% saving in disk space from using short field names and avoiding unnecessary defaults.
When you are writing code in NodeJS that deals with the database via Mongoose queries, you get Mongoose-decorated objects that can deal with the presence or absence of schema properties, even if not set. Note that this is the default behavior if you do a model.find() operation. As far as the back-end code goes, you don't need to worry about JavaScript errors caused by undefined properties. Mongoose will help set any properties that are declared in the schema, and conversely it will ignore (and not serialize to the DB) any properties you add that are not declared.
Note that the above functionality is expensive for queries. Bottom line: if you are writing code in NodeJS to get/set props while performing create or update methods, using the Mongoose-decorated objects will deal with schema defaults etc. If you are just querying to send data back to the front-end (as is), you should use the .lean() method on your Mongoose queries - they are significantly faster.
In the front-end, fields that have no values but need defaults can be dealt with easily enough. For attributes that are 1-level deep:
var someField = myMongoDoc.attrWithoutDefault || 'default value';
Attributes that are nested (e.g. myMongoDoc.attr1.subAttr1) can be tested using a library like lodash (see https://lodash.com/docs#get).
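As a sketch of the front-end fallback above: the || form treats every falsy value (0, '', false) as missing, so the built-in ?? and ?. operators (ES2020) are shown as an alternative to a lodash-style lookup. The contents of myMongoDoc here are hypothetical.

```javascript
// Hypothetical document as it might arrive in the front-end; attr2 is absent.
const myMongoDoc = { attrWithoutDefault: undefined, attr1: { subAttr1: 0 } };

// One level deep: || falls back on any falsy value (undefined, null, 0, '', false).
const someField = myMongoDoc.attrWithoutDefault || 'default value';

// || would also replace a legitimate 0; ?? only replaces null and undefined.
const withOr = myMongoDoc.attr1.subAttr1 || 5;
const withNullish = myMongoDoc.attr1.subAttr1 ?? 5;

// Nested: ?. stops at the first missing level instead of throwing.
const nested = myMongoDoc.attr2?.subAttr1 ?? 'default value';

console.log(someField, withOr, withNullish, nested);
```

Which operator fits depends on whether 0, an empty string, or false are meaningful values for the field in question.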

Structuring Session Data in MongoDB

This might be a bad title, but I was having trouble thinking of a good way to phrase my problem. Basically, I have a NodeJS application that has session management. Each session interacts with a set of data independent from the other sessions. I am having trouble coming up with a way to structure this in MongoDB. Things I have thought of:
Currently I'm storing a list of JSON "pages" that each have an ID corresponding to the session using it. I am almost positive this will not scale well though, because these "pages" will be read and updated frequently, so if I'm connected to Session1000, I'm going to have to search through 1000 items looking for the correct ID every time I update something from that session. If 1000 people are doing that roughly once a second, well...
Ideally I would like to store each session in a different collection, but the sessions need to be created and referenced dynamically, and I can't find a way in MongoDB to access a collection without hard-coding the name.
Hopefully this accurately describes my problem. Does anyone have any ideas to help me structure the db so that accessing/updating will give fast performance/scalability?

CouchDB - human readable id

I'm using CouchDB with node.js. Right now there is one node involved, and even in the remote future it's not planned to change that. While I can remove most of the cases where a short, auto-increment-like ID (it can be sparse but not random) is required, there remains one place where the users actually need to enter the ID of a product. I'd like to keep this ID as short as possible and in a more human readable format than something like '4ab234acde242349b', as it sometimes has to be typed by hand and so on.
However, in the database it can be stored with whatever ID pleases CouchDB (using the default auto-generated UUID), but it should be possible to give it a number that can be used to identify it as well. What I have thought about is creating a document that consists of an array with all the UUIDs from CouchDB. When I create a new product in node, I would run an update handler that updates said document with the new unique ID at the end. To obtain the product's ID I'd then query the array and, client side, use indexOf to get the index as a short ID.
I don't know if this is feasible. From the performance point of view I can say the following: there are more queries that do numerical ID -> uuid than uuid -> numerical ID. There will be at most 7000 new entries a year in the database. Also, there is no use case where a product can be deleted, yet I'd rather not rely on that.
Are there any other applicable ways to generate a shorter and more human readable ID that can be associated with my document?
/EDIT
From a technical point of view: it seems to be working. I can do both conversions number <-> uuid and it seems to go well. I don't know if this works well with replication and stuff, but as there is said array I guess it should, right?
You have two choices here:
Set your human readable id as the _id field. Basically you can just set it in the create-document calls to the DB, and it will accept it. This can be a more lightweight solution, but it comes with some limitations:
It has to be unique. You should also be careful about clients that try to create documents but overwrite existing ones instead.
It can only contain alphanumeric or a few special characters. In my experience it is asking for trouble to have extra character types.
It cannot be longer than a theoretical string length limit (CouchDB doesn't define any, but you should). Long ids will badly increase the size of your views (indexes), and that might make it slower.
If these things are no problem for you, then you should go with this solution.
As you said yourself, let the _id be a UUID, and set the human readable id in another field. To reach the document by the human readable id, you can just create a view emitting the human readable id as a key, and then either emit the document as the value or get the document via the include_docs=true option. Whenever the view is queried, CouchDB will update the view incrementally and return you the list. This is really the same as you creating a document with an array/object of ids inside it, except that with a CouchDB view you get more performance.
This might also be slightly slower on querying and inserting. If the ids are inserted sequentially, it's fine; if not, CouchDB will take slightly more time to insert each one at the right place. Views don't work well with huge amounts of inserts coming at the DB.
Querying shouldn't take more than 10% longer than with the first option, and I think 10% is really a big number. It will most probably be less than 5%. I remember in my CouchDB application I switched from reading by _id to reading from a view by a key, and the slowdown was so small that from the user's end, when making 100 queries at the same time, it wasn't noticeable.
This is how people query documents by fields other than the id, for example querying a user document by email when the user is logging in.
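The second option can be sketched as a map function. The field name product_number is an assumption for illustration; inside CouchDB, emit is supplied by the view server, so the emit, rows, and lookup pieces below are only a local stand-in that lets the function run outside the database.

```javascript
// CouchDB-style map function: index documents by a human readable number.
// `product_number` is a hypothetical field name.
const mapByProductNumber = function (doc) {
  if (doc.product_number !== undefined) {
    emit(doc.product_number, null);
  }
};

// Local stand-in for the view server: collect emitted rows with the doc id,
// roughly the way CouchDB attaches an id to each view row.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Illustrative documents; the one without a number is simply not indexed.
const docs = [
  { _id: 'uuid-b', product_number: 1002, name: 'Widget' },
  { _id: 'uuid-a', product_number: 1001, name: 'Gadget' },
  { _id: 'uuid-c', name: 'Not a product' },
];

for (const doc of docs) {
  currentId = doc._id;
  mapByProductNumber(doc);
}
rows.sort((a, b) => a.key - b.key); // views are returned sorted by key

// Number -> UUID lookup, the more frequent direction in the question.
function lookup(productNumber) {
  const row = rows.find((r) => r.key === productNumber);
  return row ? row.id : null;
}

console.log(lookup(1002));
```

Against a real database the lookup would be a view query along the lines of GET /products/_design/products/_view/by_number?key=1002&include_docs=true, where the database, design document, and view names are placeholders.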
If you don't know how CouchDB views work, you should read the views chapter of the book CouchDB: The Definitive Guide.
Also make sure you stay away from documents with huge arrays inside them. I think CouchDB has a limit of 4GB per document. I remember having many such documents, and querying took really long because the view had to iterate over each array item. In the end I created one document per array item instead. It was way faster.

Using and populating (real) DBRef arrays with Mongoose / mongoose-dbref

Mongoose doesn't appear to support Mongo DBRefs. Apparently they released "DBRef" support but it was actually just plain references (no ability to reference documents from different collections). I've finally managed to craft a schema that allows me to hold an array of ObjectID references and populate them, which is great for certain parts of my schema, but it would be extremely convenient if I could use proper DBRefs to create an array that lets me refer to documents from a number of collections.
Luckily(?) there's a module that can monkey patch DBRef support into mongoose: https://github.com/goulash1971/mongoose-dbref
Unluckily, I can't make any sense of the documentation. The best I can tell is that there is no ability to use DBRefs in an array (there is a 'fetch' method to dereference, but it takes a single dbref); 'populate' doesn't seem to be patched to fill in DBRefs, and I can't tell how I'm supposed to assign a DBRef given a source document [collection.items.push(?????)].
From the internet, it appears that I can assign an object of the form { $id: document._id, $ref: 'Collection' } -- when logging the result, it appears to have "taken" as a DBRef data type, but I am unsure if this is correct since I cannot seem to do anything useful with it (turn the ref back into a document).
What I really want is a way to represent an ordered list of items from multiple collections; any solution to this is fine by me, but so far DBRefs are the best I've got. Help?
A DBRef (as explained in detail here) is a tuple containing the ObjectId, collection name, and possibly the database container name of a referenced object in another collection.
Internally in the MongoDB server these serve no purpose and are just data within a document. The point is for use in some drivers and ODM implementations to allow for some sort of automatic expansion by issuing additional queries to the server in order to have the data that is elsewhere appear to be an ordinary sub-document part of the referencing document. This can be automatic or a lazy load depending on the implementation, but is always done over the wire and processed on the client side. The server will do nothing to traverse or join this data.
Additionally, MongoDB collections are schemaless, so unlike in the relational world, nothing says that all documents in a collection have to have the same structure.
In the case of Mongoose, there are built-in functions to do this sort of loading for you as a convenience. While that is not strictly a DBRef, utilizing documents with different schemas in the same collection serves the same purpose as storing the documents external to the referencing document.
It is important to consider the data access patterns of your application and not to simply opt for the same sort of relational design you are used to. Keeping in mind that you are only ever reading from one collection at a time, it is most desirable to get at the data you need in a single read or write, without multiple operations over the wire, which will slow things down considerably.
In short, you should always consider embedding sub-documents first, and then use external references, in whichever form is best supported, only when you absolutely have to. Your application users will thank you in the end.
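For the case where external references really are needed, here is a sketch of resolving an ordered list of DBRef-shaped items on the client side, which is what the answer describes drivers and ODMs doing over the wire. The collections object and fetchByIds are in-memory stand-ins invented for illustration; with a real driver, fetchByIds would be one query per collection.

```javascript
// In-memory stand-ins for two collections, keyed by _id (illustrative data).
const collections = {
  articles: new Map([['a1', { _id: 'a1', title: 'Intro' }]]),
  videos: new Map([
    ['v1', { _id: 'v1', title: 'Demo' }],
    ['v2', { _id: 'v2', title: 'Clip' }],
  ]),
};

// Ordered list of references in the { $ref, $id } shape from the question.
const items = [
  { $ref: 'videos', $id: 'v1' },
  { $ref: 'articles', $id: 'a1' },
  { $ref: 'videos', $id: 'v2' },
];

// Stand-in for a single "find by ids" round trip to one collection.
function fetchByIds(name, ids) {
  return ids.map((id) => collections[name].get(id)).filter(Boolean);
}

function resolveRefs(refs) {
  // Group ids by collection so each collection is queried once.
  const idsByCollection = new Map();
  for (const { $ref, $id } of refs) {
    if (!idsByCollection.has($ref)) idsByCollection.set($ref, []);
    idsByCollection.get($ref).push($id);
  }
  // Fetch and index the results by "collection/id".
  const found = new Map();
  for (const [name, ids] of idsByCollection) {
    for (const doc of fetchByIds(name, ids)) {
      found.set(`${name}/${doc._id}`, doc);
    }
  }
  // Rebuild the original order; missing targets become null.
  return refs.map(({ $ref, $id }) => found.get(`${$ref}/${$id}`) ?? null);
}

const resolved = resolveRefs(items);
console.log(resolved.map((doc) => doc && doc.title));
```

Every collection touched costs an extra round trip, which is the overhead the answer is warning about when it recommends embedding first.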
