MongoDB: Is there a plugin to query multiple collections like a "join"? - node.js

MongoDB doesn't support multi-collection queries like a "left join" in SQL. Mongoose only supports "populate", but it can't populate a subdocument and find the parent document at the same time.
What is a one-line query in SQL requires a MongoDB user to query every parent document's _id themselves.
I have run into these questions:
How to populate other collection's subdocument in mongoose?
https://stackoverflow.com/questions/24075910/mongoose-cant-update-or-insert-subdocuments-array
In the end, I queried every _id myself in a forEach loop.
Is there a plugin to query multiple collections like a "join"? Or is there a better solution for multi-collection queries?

Is there a plugin to query multiple collections like a "join"?
Some DB products have plug-ins that provide extra features, like PostGIS for Postgres. MongoDB does not have such a plug-in for JOINs. It is unlikely that such a plug-in will ever exist because MongoDB is designed not to have join support.
mongoose...
So the one place where there is "join-like" support is the drivers. Some drivers and wrappers (like Morphia) support loading a document and its related sub-documents from a different collection. However, in this case, the driver/tool is simply performing multiple queries on your behalf.
This can easily generate too many queries.
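The difference between querying each parent _id in a forEach loop (as the question describes) and batching the lookup can be sketched in plain JavaScript. This is a minimal in-memory sketch, not driver code: the `users`/`posts` arrays and the `findById`/`findByIds` helpers are hypothetical stand-ins that only count simulated round trips. With the real Node.js driver, the batched version would correspond to a single `find({ _id: { $in: ids } })` query.

```javascript
// Hypothetical in-memory "collections"; a real app would query MongoDB.
const users = [
  { _id: 1, name: "Ann" },
  { _id: 2, name: "Bob" },
];
const posts = [
  { _id: 10, title: "Hello", author: 1 },
  { _id: 11, title: "Again", author: 1 },
  { _id: 12, title: "Hi", author: 2 },
];

let roundTrips = 0; // counts simulated lookups (the initial posts query is not counted)

// One simulated query per id, like a forEach loop over _ids.
function findById(collection, id) {
  roundTrips += 1;
  return collection.find((doc) => doc._id === id) || null;
}

// One simulated query for many ids, analogous to { _id: { $in: ids } }.
function findByIds(collection, ids) {
  roundTrips += 1;
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

// N+1 pattern: one lookup per post to fetch its author.
function joinOneByOne(postList) {
  return postList.map((p) => ({ ...p, author: findById(users, p.author) }));
}

// Batched pattern: collect distinct ids, fetch once, stitch in memory.
function joinBatched(postList) {
  const ids = [...new Set(postList.map((p) => p.author))];
  const byId = new Map(findByIds(users, ids).map((u) => [u._id, u]));
  return postList.map((p) => ({ ...p, author: byId.get(p.author) || null }));
}

roundTrips = 0;
const slow = joinOneByOne(posts);
const slowTrips = roundTrips; // one per post

roundTrips = 0;
const fast = joinBatched(posts);
const fastTrips = roundTrips; // a single batched lookup

console.log(slowTrips, fastTrips); // prints "3 1"
```

Both functions return the same stitched documents; only the number of lookups differs, and the batched cost stays at one extra query however many posts are returned.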
Or is there a better solution for multi-collection queries?
The only solution provided wholly within MongoDB is going to be via the Map / Reduce or Aggregation Framework tools. Even with these tools, you are likely going to have to do multiple passes of the data and then write some scripts to stitch together this data. This will be a lot of work, but you're trying to do something that MongoDB doesn't like to do, so that's expected.
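If you do end up running separate queries (or separate aggregation passes) and stitching the results in a script, the stitching step can look like the sketch below. The `orders`/`items` arrays and the `leftJoin` helper are hypothetical. It groups child rows under each parent in two passes and, in the spirit of a SQL LEFT JOIN, keeps parents that have no matching children (here with an empty array rather than NULL columns).

```javascript
// Hypothetical results of two separate queries, one per collection.
const orders = [
  { _id: "o1", customer: "Ann" },
  { _id: "o2", customer: "Bob" },
  { _id: "o3", customer: "Cy" },
];
const items = [
  { orderId: "o1", sku: "A" },
  { orderId: "o1", sku: "B" },
  { orderId: "o2", sku: "C" },
];

// Pass 1: group the "child" rows by their foreign key.
// Pass 2: attach each group to its "parent" under the field name `as`.
function leftJoin(parents, children, parentKey, childKey, as) {
  const groups = new Map();
  for (const child of children) {
    const key = child[childKey];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(child);
  }
  // Parents without children are kept, with an empty array.
  return parents.map((p) => ({ ...p, [as]: groups.get(p[parentKey]) || [] }));
}

const joined = leftJoin(orders, items, "_id", "orderId", "items");
for (const o of joined) {
  console.log(o._id, o.items.map((i) => i.sku).join(","));
}
```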
Another solution would be to leverage Hadoop. MongoDB has a Hadoop plug-in, so you can run Hadoop over the existing data. Add some Hadoop query tools like Hive and you get SQL-like queries on top. This will also be a lot of work, but it will enable you to run all sorts of SQL-like queries.

Related

Using Mongoose Rather than the MongoDB Driver to Find Document

We are in the process of writing tests for our Node/MongoDB backend and I have a question about finding documents.
My understanding is it's preferable to use Mongoose to get your documents as opposed to the MongoDB driver. In other words, doing Customer.findOne().exec() instead of setting up a db connection and then doing db.collection("customers").findOne().
Other than the first option (using Mongoose to find the doc) being slightly less verbose, I'm curious what the other reasons are. Is a straight MongoDB lookup a bigger drag on the database?
One of the great features of Mongoose is its built-in validation mechanism. The populate method, which gets data from multiple collections, is another great characteristic of Mongoose.
In terms of query performance, here is a good read:
https://medium.com/@bugwheels94/performance-difference-in-mongoose-vs-mongodb-60be831c69ad
Hope this helps :)

MongoDB - various document types in one collection

MongoDB is schemaless, which means a collection (table in relational DB) can contain documents (rows) of different structure - having different fields, for instance.
I'm new to Mongo, so I decided to use Mongoose which should make things a bit easier. Reading the guide:
Defining your schema
Everything in Mongoose starts with a Schema. Each schema maps to a MongoDB collection and defines the shape of the documents within that collection.
Notice the last sentence. Doesn't it conflict with the schemaless philosophy of MongoDB? Or is it that in 99% of cases I want a collection of documents with the same structure, so the introductory guide only discusses that scenario? Does Mongoose even allow me to create a schemaless collection?
MongoDB does not require a schema, but that confuses a lot of people from a standard SQL background, so Mongoose aims to bridge the gap between SQL and NoSQL. If you want to maintain a collection with different document types, then by all means do not use Mongoose.
If you're okay with the schemaless nature of MongoDB, there is no reason to add the extra abstraction and overhead that Mongoose certainly brings.
The purpose of Mongoose is to use a Schema; there are other database drivers, such as Mongoskin, that let you take advantage of MongoDB's schemaless nature.
If you want to use Mongoose's schema design but make an exception, you can use Mongoose Strict (the strict schema option).
According to the docs:
The strict option, (enabled by default), ensures that values passed to our model constructor that were not specified in our schema do not get saved to the db.
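To make the effect of the strict option concrete, here is a small plain-JavaScript sketch that mimics it: fields not declared in the schema are dropped before the document would be saved. This is not Mongoose's implementation; `applyStrict` and `schemaFields` are hypothetical, and only top-level fields are handled. In Mongoose itself you would pass `{ strict: false }` as a schema option to keep undeclared fields instead.

```javascript
// Hypothetical schema: the set of top-level fields the model declares.
const schemaFields = ["name", "email"];

// Mimics strict mode: keep declared fields, silently drop the rest.
// With strict disabled, the input is passed through unchanged (shallow copy).
function applyStrict(doc, fields, strict = true) {
  if (!strict) return { ...doc };
  const out = {};
  for (const key of Object.keys(doc)) {
    if (fields.includes(key)) out[key] = doc[key];
  }
  return out;
}

const input = { name: "Ann", email: "ann@example.com", nickname: "A" };
const strictResult = applyStrict(input, schemaFields);       // nickname dropped
const looseResult = applyStrict(input, schemaFields, false); // nickname kept
console.log(strictResult, looseResult);
```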
NoSQL doesn't mean no schema. It means the database doesn't control the schema. For instance, with MongoDB, you will look hard to find anything that determines whether a field in a document is a string, a number or a date. The database doesn't care. You could store a number in a field in one document and, in another document in the same collection, store a string in that same field. But from a coding perspective, that can become quite hairy and would be bad practice. This is why you still have to define data types. So you still need a schema of sorts, and that is why Mongoose offers, and in fact enforces, this functionality.
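The point that the database itself does not care about field types can be illustrated with a sketch that scans a set of documents and reports fields holding more than one type, which is the "number in one document, string in another" situation. The `fieldTypes` helper and the sample documents are hypothetical; a schema layer like Mongoose exists to stop this from happening in the first place.

```javascript
// Hypothetical documents from one collection; MongoDB would accept all of them.
const docs = [
  { _id: 1, price: 10 },
  { _id: 2, price: "10" },
  { _id: 3, price: 12.5, tags: ["a"] },
];

// Returns a Map of field name -> Set of JavaScript types seen for that field.
function fieldTypes(documents) {
  const seen = new Map();
  for (const doc of documents) {
    for (const [key, value] of Object.entries(doc)) {
      const type = Array.isArray(value) ? "array" : typeof value;
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key).add(type);
    }
  }
  return seen;
}

// Fields whose values are not all of the same type.
const inconsistent = [...fieldTypes(docs)]
  .filter(([, types]) => types.size > 1)
  .map(([key]) => key);

console.log(inconsistent); // prints [ 'price' ]
```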
Going a conceptual level higher now, the major concept of NoSQL is to put the schema inside your code and not in some file of SQL commands; that is, you don't tell the database what data types to expect, and the schema isn't controlled by the database. So, instead of needing migration files/paths and versioning of the database schema, you just have your code. ORMs, for example, try to bridge this issue too, often with automated migration systems.
ORMs also try to avoid the Object Relational Impedance Mismatch problem, which MongoDB avoids completely. Well, it doesn't have relationships per se, so the problem is avoided out of necessity.
Getting back to schema: with MongoDB and Mongoose, if you or a team member changes the schema in the code, all your other team members need to do to get the database to work with it is pull in that new code. Voila, the schema is up to date and will work. There is no need to also pull in a copy of the newer migration file (which determines the new schema of the DB) and then run it on a (copy of the) DB just to continue programming. There is no need to make schema changes in multiple places.
So, in the end, if you can imagine your schema is always in your code (only), making changes to an application with a database persisting state like MongoDB is a good bit simpler and even safer. (Safer, because code and schema can't get out of sync, as it's the one and the same.)

Solr vs Elasticsearch for nested documents

I have been using Solr for my project, but recently I encountered Elasticsearch, which seems very promising. My project requires the ability to handle nested documents, and I would like to know which one does a better job. Solr added child documents only recently, but is its support as good as Elasticsearch's? Can Elasticsearch query both parent and children at once? Thanks
I've been looking into the subject recently and, to my understanding, ElasticSearch makes life a lot easier when working with nested documents, although Solr also supports nesting (but is less flexible in querying).
So the features of ElasticSearch are:
"Seamlessly" supports nesting: you don't have to change your nested document structure or add specific fields. However, you need to indicate in the mapping which fields are nested when creating the index.
Supports nested queries with "nested" and "path".
Supports aggregation and filtering with nested docs: also via "nested" and "path".
With Solr you will have to:
Modify your schema.xml by adding the _root_ field
Modify your dataset so that parent and child documents have a specific distinguishing field, in particular _childDocuments_ to indicate children (see more at this question)
Aggregation and filtering on nested documents promises to be very complicated if not impossible.
Also, nested fields are not supported at all.
Recent Solr versions (5.1 and up) can eventually be configured to support nesting (though you'll have to change your input data structure); however, the documentation is not very clear and there is not much information on the Internet because these features are recent.
The bottom line is that, for nested documents, ElasticSearch can do everything Solr can and more, with less effort and a smoother learning curve. So going with ElasticSearch seems more reasonable in this case.
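To make "indicate in the mapping which fields are nested" and nested queries with "path" more concrete, here are the two request bodies built as JavaScript objects. This is a sketch assuming an Elasticsearch 7.x-style mapping (no mapping type name) and a hypothetical `comments` field with `author` and `text` subfields; check the documentation for the version you run. The first body would go to the index-creation request and the second to `_search`.

```javascript
// Mapping: declare "comments" as nested so each array element is
// indexed as its own hidden document (hypothetical field names).
const createIndexBody = {
  mappings: {
    properties: {
      title: { type: "text" },
      comments: {
        type: "nested",
        properties: {
          author: { type: "keyword" },
          text: { type: "text" },
        },
      },
    },
  },
};

// Search: match parents that have at least one comment by `author`.
// "path" names the nested field; inner field names are fully qualified.
function nestedAuthorQuery(author) {
  return {
    query: {
      nested: {
        path: "comments",
        query: { term: { "comments.author": author } },
      },
    },
  };
}

console.log(JSON.stringify(nestedAuthorQuery("alice"), null, 2));
```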
I am not familiar with Elasticsearch, so this is at best half an answer.
Solr works best with denormalized data. However, given that you have nested documents, you can use Solr in two scenarios:
Query for parent, with a child attribute
Query for all children of a parent.
You can use block join to perform the above queries. Even though you deal with nested levels, Solr internally manages them as denormalized documents. That is, when a parent has 2 children, you end up with three top-level documents in Solr, and Solr manages the relationship between them.
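The "one parent with two children becomes three documents" behaviour can be sketched in plain JavaScript: the nested record is flattened into a block of sibling documents sharing a root id, and the two query shapes listed above become simple filters. This only simulates the idea; in Solr itself you would index with `_childDocuments_` and query with the block join parsers (`{!parent which=...}` and `{!child of=...}`). The `docType` and `rootId` fields here are hypothetical simplifications, with `rootId` standing in for Solr's `_root_`.

```javascript
// Hypothetical nested record as it might arrive from an application.
const book = {
  id: "b1",
  title: "Solr in Practice",
  children: [
    { id: "c1", chapter: "Intro", pages: 10 },
    { id: "c2", chapter: "Indexing", pages: 30 },
  ],
};

// Flatten into a block: children first, then the parent (Solr also writes the
// parent after its children), all sharing rootId.
function toBlock(parent) {
  const { children = [], ...parentFields } = parent;
  const kids = children.map((c) => ({ ...c, docType: "child", rootId: parent.id }));
  return [...kids, { ...parentFields, docType: "parent", rootId: parent.id }];
}

const index = toBlock(book); // 3 top-level documents for 1 parent + 2 children

// "Query for parent, with a child attribute": parents having a matching child.
function parentsWhereChild(docs, pred) {
  const roots = new Set(
    docs.filter((d) => d.docType === "child" && pred(d)).map((d) => d.rootId)
  );
  return docs.filter((d) => d.docType === "parent" && roots.has(d.rootId));
}

// "Query for all children of a parent".
function childrenOf(docs, parentId) {
  return docs.filter((d) => d.docType === "child" && d.rootId === parentId);
}

console.log(index.length); // prints 3
console.log(parentsWhereChild(index, (c) => c.pages > 20).map((p) => p.id)); // prints [ 'b1' ]
console.log(childrenOf(index, "b1").map((c) => c.id)); // prints [ 'c1', 'c2' ]
```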

Mongoose populate option and query time

I am working on a platform where I use Mongoose's .populate() many times across all my queries. I turned on Mongoose debug mode and found there is hardly any difference in query execution time with and without populate (for 100 documents now; there will be 100,000 documents in the future).
I know that populate basically does a findOne query internally. My question is: will using .populate() increase my query time, or otherwise affect my performance, once the number of records reaches millions? Also, is there any alternative I can choose to improve performance?
In general, you're correct: you want to avoid relying on populate() because it issues additional queries. Mongoose collects the referenced _ids and runs a separate find (batched with $in) for each populated path, and each of those is a full round trip to the server. Populate is not a join inside the database; Mongoose performs these extra queries itself and stitches the results together.
The technique to work around this is to denormalize your data: don't design a Mongo database like a relational one. The Mongo docs have lots of information on how to do this: https://docs.mongodb.org/manual/core/data-model-design/ One important thing to keep in mind with Mongo design is that you never want a subdocument with unbounded growth. Due to the way Mongo space allocation and paging work, this can cause severe performance problems, so if you're in a situation like this it's best to normalize.
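One way to respect the "no unbounded subdocument growth" rule while still embedding is to cap the embedded array, keeping only the most recent N entries. MongoDB supports this server-side with `$push` combined with `$each` and `$slice`; the sketch below only mimics that behaviour in memory, and the `pushCapped` helper and the limit of 3 are hypothetical.

```javascript
// In-memory analogue of { $push: { [field]: { $each: items, $slice: -limit } } }:
// append the new items, then keep only the last `limit` elements.
function pushCapped(doc, field, items, limit) {
  const current = doc[field] || [];
  return { ...doc, [field]: [...current, ...items].slice(-limit) };
}

let user = { _id: 1, recentComments: [] };
for (const text of ["c1", "c2", "c3", "c4", "c5"]) {
  user = pushCapped(user, "recentComments", [text], 3);
}

console.log(user.recentComments); // prints [ 'c3', 'c4', 'c5' ]
```

The full history would still live in its own collection; the capped embedded copy only serves "show the latest few" reads without a second lookup.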
Another very common technique is subdocument caching. This is where you take partial data from the "joined" collection and cache it on the collection you're querying. In this case, you're trading space for performance because you have duplicate data. Also, you'll have to make sure you keep the data updated whenever there's a change. With mongoose, it is easy to do this as a post-save hook on the model of the foreign collection.
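Subdocument caching, and the need to refresh the cached copies, can be sketched as below. In Mongoose the refresh would typically live in a `post('save')` hook on the foreign model, as described above; here a plain function `saveAuthor` plays that role, and the collections and field names are hypothetical.

```javascript
// Hypothetical "foreign" collection and a collection that caches part of it.
const authors = new Map([[1, { _id: 1, name: "Ann", bio: "long text..." }]]);
const posts = [
  { _id: 10, title: "Hello", author: { _id: 1, name: "Ann" } }, // cached subset
  { _id: 11, title: "Again", author: { _id: 1, name: "Ann" } },
];

// Plays the role of a post-save hook: after an author changes,
// rewrite the cached fields wherever that author is embedded.
function refreshCachedAuthor(author) {
  for (const post of posts) {
    if (post.author._id === author._id) {
      post.author = { _id: author._id, name: author.name }; // only cached fields
    }
  }
}

function saveAuthor(author) {
  authors.set(author._id, author);
  refreshCachedAuthor(author);
}

saveAuthor({ _id: 1, name: "Ann Smith", bio: "long text..." });
// Reading posts now needs no second query to show the author's name.
console.log(posts.map((p) => p.author.name)); // prints [ 'Ann Smith', 'Ann Smith' ]
```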

mongoDB + mongoose: choosing the right schema for hierarchical data

I'm new to MongoDB, and in a Node project I'm using Mongoose to create a schema for my database. I have come to understand that in my case I should be embedding data instead of referencing it. (http://docs.mongodb.org/manual/core/data-modeling-introduction/)
The structure that i need to store in the database is something like this:
book
| title
| | chapter
| | | content (url to a html file)
Coming from a MySQL world, I'm trying to understand NoSQL database concepts, and I was wondering how one would design the Mongoose schema.
Thx,
You can nest documents, and document arrays in a MongoDB document:
db.books.findOne();
can return a document like this:
{
isbn:"253GHST78F6",
title:"some book",
author:"Ernest Hemingway",
chapters:[
{title:"chapter 1",content:"http://api.com/chapters/1.html"},
{title:"chapter 2",content:"http://api.com/chapters/2.html"}
]
}
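Because the chapters are embedded, a single query on the books collection can match on a chapter field; in the mongo shell that would be something like `db.books.find({ "chapters.title": "chapter 2" })`. The sketch below mimics that dot-notation match in memory so it runs without a server. `valuesAt` and `findByPath` are hypothetical helpers, not driver functions, and they only handle simple equality.

```javascript
// A book document with embedded chapters, as in the example above.
const books = [
  {
    isbn: "253GHST78F6",
    title: "some book",
    chapters: [
      { title: "chapter 1", content: "http://api.com/chapters/1.html" },
      { title: "chapter 2", content: "http://api.com/chapters/2.html" },
    ],
  },
];

// Collect every value reachable via a dotted path, descending into arrays
// the way MongoDB does for queries like { "chapters.title": ... }.
function valuesAt(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) return [].concat(...value.map((v) => valuesAt(v, parts)));
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[parts[0]], parts.slice(1));
}

// Equality match: a document matches if any value at the path equals `expected`.
function findByPath(docs, path, expected) {
  const parts = path.split(".");
  return docs.filter((d) => valuesAt(d, parts).includes(expected));
}

const hits = findByPath(books, "chapters.title", "chapter 2");
console.log(hits.length, hits[0].chapters[1].content);
```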
But, there are a few more things to keep in mind when modeling a collection in MongoDB:
Model data as close as possible to what will be asked for: there are no joins in MongoDB, so try to keep together (pre-joined) all data that is supposed to be queried together in the future.
Nesting is good for queries, but nested updates and searches are bad:
Indexing and searching inside nested documents, especially arrays, will be expensive.
Avoid placing data that is updated very frequently in nested arrays.
There are no foreign keys in MongoDB. So, if you have multiple copies of the same document nested under different collections, keeping all of them updated is your responsibility.
First, take a look at this excellent video by Martin Fowler:
I think there is no better authority to explain NoSQL than Fowler, especially given his SQL background.
Second, MongoDB stores JSON-like documents. If you know how to work with JSON, you will know how to work with NoSQL (MongoDB). The basic thing to understand is that a Mongo schema is expressed in the language that uses it.
So, if you are using Node.js with Mongo, you still have objects and arrays to work with. In simple words, Mongo does not force any particular schema. It is up to the developer to create the schema based on their language/Mongo driver.
So how would you express your data logic in your language? If it is in the form of a JS object, then move that form to the DB.
I really like MongoDB because it can be combined with some great JS tools, like Underscore.js, for all kinds of data manipulation.
