How to query many collections elements in mongodb - node.js

Good morning colleagues,
I'd like to ask how to formulate queries over a large number of elements, and what the recommended and most efficient approach would be.
I have an API using Express that creates a new MongoDB model with a unique name and then inserts elements into it.
Example:
Collections
*product234 => includes elements => { _id: ...., name: ... }, { ...}
*product512 => includes elements => { _id: ...., name: ... }, { ...}
Each collection hosts more than 5000 elements and I want to create a search engine that returns all the results of all the collections that match the "name" that I will send in the request.
How could I perform this query using mongoose? Would it be viable and would it not bring me performance problems by having more than 200 collections and more than 5000 elements in each one?
Answer (Edit):
As I see in the comments, the best solution for fast queries is to create a copy of each element containing only the name (or other needed values), the reference id and the reference collection name in a new collection, named for example "ForSearchUse", and then run the search against that collection. If the complete information is needed, you can then query the specific collection using the stored reference id and collection name.
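A minimal sketch of that idea, with plain JavaScript objects standing in for the documents. The helper names (toSearchEntry, searchByName) and the field names refId and refCollection are hypothetical and not from the original post. With Mongoose the entries would be saved to the "ForSearchUse" collection and the lookup would be a single find on an indexed name field.

```javascript
// Build the slim copy that goes into the search collection.
// refId / refCollection are illustrative field names (assumptions).
function toSearchEntry(collectionName, doc) {
  return { name: doc.name, refId: doc._id, refCollection: collectionName };
}

// In-memory stand-in for querying the search collection by name.
// With Mongoose this would be roughly a find({ name }) on the search model.
function searchByName(entries, name) {
  return entries.filter((entry) => entry.name === name);
}

const entries = [
  toSearchEntry("product234", { _id: "a1", name: "chair", price: 10 }),
  toSearchEntry("product512", { _id: "b7", name: "chair", price: 12 }),
  toSearchEntry("product512", { _id: "b8", name: "table", price: 40 }),
];

// Each hit says which collection and which _id to load for the full document.
const hits = searchByName(entries, "chair");
console.log(hits);
```

One query against a single indexed collection replaces 200+ queries, at the cost of keeping the copies up to date whenever a source element changes.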

Related

ravendb NodeJS, load related document and create a nested result in a query

I have an index that returns something like this
Company_All {
    name : string;
    id : string;
    agentDocumentId : string
}
Is it possible to load the related agent document and then generate a nested result with selectFields and QueryData like this?
ICompanyView {
    companyName : 'Warner',
    user : {
        documentId : 'A/1',
        firstName : 'john',
        lastName : 'paul'
    }
}
I need something like the below query that obviously doesn't work as I expect:
const queryData = new QueryData(
    ["name", "agentDocumentId", "agent.firstName", "agent.lastName"],
    ["companyName", "user.documentId", "user.lastName", "user.firstName"]);
return await session.query<Company_AllResult>({ index: Company_All })
    .whereEquals("companyId", request.companyId)
    .include(`agents/${agentDocumentId}`) // ????
    .selectFields(queryData,ICompanyView)
    .single();
Yes, you can do that using:
https://ravendb.net/docs/article-page/5.4/nodejs/indexes/indexing-related-documents
This is called indexing related documents, and is accessible at indexing time, not query time.
Alternatively, you have the filter clause, which has access to the loaded document, but I wouldn't generally recommend doing this.
Generally:
When you query an index, the results are the documents from the collection the index was defined on.
Index-fields defined in the index are used to filter the index-query, but the results are still documents from the original collection.
If you define an index that indexes content from a related-document, then when making an index-query you can filter the documents by the indexed-fields from the related documents, but the results are still documents from the original collection.
When making an index-query (or any other query) you can project the query results so that some other object is returned instead of the full documents of the original collection.
Now:
To project/get data from the indexed related-document you have 2 options:
1. Store the index-fields from the related-document in the index (store all or specific fields). This way you have access to that content when making a projection in your query. See this code sample.
2. Don't store the index-fields from the related-document. You will then be able to use the index-fields to filter by in your query, but to get content you will need to use the 'include' feature in your query and then use session.load, which will not make another trip to the server.
See https://demo.ravendb.net/demos/nodejs/related-documents/query-related-documents
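Whichever option is used, the last step is shaping the company and its agent into the nested view from the question. A small sketch of that step only, assuming the two documents are already in hand (with the second option they would come from the query result and from session.load on the included agentDocumentId, which is not shown here). The function name toCompanyView is hypothetical.

```javascript
// Shape a company document and its related agent document into the nested view.
// Only the mapping is shown; how the two inputs are fetched from RavenDB is
// deliberately left out of this sketch.
function toCompanyView(company, agent) {
  return {
    companyName: company.name,
    user: {
      documentId: company.agentDocumentId,
      firstName: agent.firstName,
      lastName: agent.lastName,
    },
  };
}

const view = toCompanyView(
  { id: "companies/1", name: "Warner", agentDocumentId: "A/1" },
  { firstName: "john", lastName: "paul" }
);
console.log(view);
```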

What is MongoDB aggregate pipeline?

Using Mongoose driver
Consider the following code :
Connecting to database
const mongoose = require("mongoose");
require("dotenv").config();
mongoose.connect(process.env.DB);
const userSchema = new mongoose.Schema({ name: String }, {collection: 'test'});
const Model = mongoose.model('test', userSchema);
Creating dummy document
async function createDocs() {
    await Model.deleteMany({});
    await Model.insertMany([{name: "User1"}, {name: "User2"}, {name: "User3"}, {name: "User4"}]);
}
createDocs();
Filtering data using Model.find()
async function findDoc() {
    let doc = await Model.find({name: 'User1'});
    console.log(`Using find method : ${doc}`);
}
findDoc();
Filtering data using Model.aggregate()
async function matchDoc() {
    let doc = await Model.aggregate([
        {
            $match: {name : 'User1'}
        }
    ]);
    console.log(`Using aggregate pipeline : `, doc);
}
matchDoc();
• Both the processes produce the same output
Q1) What is an aggregate pipeline and why use it?
Q2) Which method of retrieving data is faster?
I will not get too much into this as there is a lot of information online. But essentially the aggregation pipeline gives you access to a lot of strong operators - mainly used for data analysis and object restructuring, for simple get and set operations there is no use for it.
A "real life" example of when you'd want to use the aggregation pipeline is if you want to calculate an avg of a certain value in your data, obviously this is just the tip of the iceberg in terms of power that this feature allows.
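To make the average example concrete, here is a rough sketch. The field names category and price are made up for illustration. The pipeline is what you would pass to Model.aggregate(), and the plain JavaScript function below it only mimics what that $group stage computes (assuming every price is a number), so the result can be seen without a database.

```javascript
// Hypothetical pipeline: average price per category.
const avgPricePipeline = [
  { $group: { _id: "$category", avgPrice: { $avg: "$price" } } },
];

// Plain-JS equivalent of that $group stage, for illustration only.
// Assumes every document has a numeric price.
function averageByCategory(docs) {
  const totals = new Map();
  for (const { category, price } of docs) {
    const t = totals.get(category) || { sum: 0, count: 0 };
    t.sum += price;
    t.count += 1;
    totals.set(category, t);
  }
  return [...totals].map(([category, t]) => ({
    _id: category,
    avgPrice: t.sum / t.count,
  }));
}

const sampleDocs = [
  { category: "books", price: 10 },
  { category: "books", price: 20 },
  { category: "games", price: 60 },
];
const averages = averageByCategory(sampleDocs);
console.log(averages);
```

A plain find cannot produce this grouped result; you would have to fetch every document and do the arithmetic in application code.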
find is slightly faster. The aggregation pipeline has some overhead compared to the query language, as each stage has to load the BSON documents into memory, where find doesn't. If your use case is indeed just a simple query, then find is the way to go.
You are checking a smaller piece of the aggregation pipeline.
You cannot build a multi-stage pipeline with a single find query.
Let's say you want to find all the orders which have a product that was purchased within a given period. Here, orders and customers are two different collections, so you need multiple stages.
Let's say you stored data in a different format, for example dates as strings or integers as decimals. If you want to convert on the fly, you can use aggregation.
From MongoDB 4.2 onward you can also use aggregation operators in update operations.
You can restructure data with aggregation. Let's say I just need a few fields from an array field after aggregating.
You cannot find a particular array or object element matching a condition using a simple find. $elemMatch is not that powerful.
You cannot group things with a simple find.
And many more.
I suggest you check the aggregation operators and relevant examples in the documentation.
Retrieval speed does not depend on whether you use aggregation or find. It depends on the data, hardware, and index settings.
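The orders/customers case and the on-the-fly conversion from the list above could be combined roughly like this. All collection and field names (customers, purchasedAt, customerId, product) are assumptions for the sake of the example. The function only builds the pipeline array, which you would then hand to something like Order.aggregate().

```javascript
// Build a multi-stage pipeline: convert a string date, filter by period,
// join the customer, and keep only a few fields.
// Collection and field names are hypothetical.
function buildOrdersPipeline(fromDate, toDate) {
  return [
    // Convert the string date on the fly so it can be compared as a date.
    { $addFields: { purchasedAtDate: { $toDate: "$purchasedAt" } } },
    // Keep only orders purchased within the period.
    { $match: { purchasedAtDate: { $gte: fromDate, $lt: toDate } } },
    // Pull in the matching document(s) from the customers collection.
    {
      $lookup: {
        from: "customers",
        localField: "customerId",
        foreignField: "_id",
        as: "customer",
      },
    },
    // Restructure: return only the fields the caller needs.
    { $project: { _id: 0, product: 1, purchasedAtDate: 1, "customer.name": 1 } },
  ];
}

const ordersPipeline = buildOrdersPipeline(
  new Date("2023-01-01"),
  new Date("2023-02-01")
);
console.log(JSON.stringify(ordersPipeline, null, 2));
```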

Maintain a custom order/sort of documents in MongoDB

In my web app XY I'm showing a classic list/table of data (documents) to the user. While all the sort functions provided by MongoDB (and Mongoose, which I'm using) are pretty clear to me, I'm not interested in sorting by date or alphabetical order. In my case it is important to let the user maintain a custom sort, as in manually dragging and dropping items to set a specific order of the documents (e.g. putting favourites at the top of the list). The UI to do this is a no-brainer, but how to actually save the order in the database gives me brain-freeze.
Problem : How would I go about saving such a custom order of documents ?
What I use : NodeJS / Express / Mongoose (MongoDB)
Ideas
So far I could think of 2 ideas on how to do this.
A : Having an additional key (e.g. orderKey) in the mongoose Schema. Big con : I would need to keep constantly updating all documents orderKeys. Also I would need some sort of auto-increment for new documents.
const mySch = new Schema({
    orderKey : { type : Number }
});
B : Creating one Schema/Model only for sorting, with an Array including all documents _ids for example. The order of the elements within the array would be used as reference for the custom order of the documents. Whenever the order changes, this Array would be changed as well.
const orderSch = new Schema({
    orderArray : { type : Array }
});
const Order = mongoose.model('Order', orderSch);
/* const customOrder = new Order({
    orderArray : [ _id1, _id2, _id3, _id10, _id7, .. ]
}); */
Any more ideas or best practises are highly appreciated !
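For what it's worth, idea B can be sketched in a few lines of plain JavaScript. The helper names sortByCustomOrder and moveItem are made up for this example, and documents missing from the order array are simply placed last, which is an assumption rather than a requirement.

```javascript
// Return docs sorted by the position of their _id in orderArray.
// Docs whose _id is not in orderArray go to the end (an assumption).
function sortByCustomOrder(docs, orderArray) {
  const position = new Map(orderArray.map((id, index) => [String(id), index]));
  const rank = (doc) =>
    position.has(String(doc._id)) ? position.get(String(doc._id)) : orderArray.length;
  return [...docs].sort((a, b) => rank(a) - rank(b));
}

// Move one id within the order array (what a drag/drop would trigger).
// Returns a new array; the result is what would be saved back to the Order document.
function moveItem(orderArray, fromIndex, toIndex) {
  const next = [...orderArray];
  const [moved] = next.splice(fromIndex, 1);
  next.splice(toIndex, 0, moved);
  return next;
}

const listDocs = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];
const newOrder = moveItem(["a", "b", "c"], 2, 0); // drag "c" to the top
const sortedIds = sortByCustomOrder(listDocs, newOrder).map((d) => d._id);
console.log(sortedIds);
```

With this layout a drag/drop rewrites one array in one document instead of touching the orderKey of many documents, which is the main con listed for idea A.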

Query documents in an array in mongodb using node.js

How to query all documents present inside an Array which itself is present in a MongoDB collection under Node.js.
For example: I have a DB with a structure:
{
"name1":[{"height":"5.5"},
{"weight":"57"}],
"name2":[{"height":"6.1"},
{"weight":"74"}]
}
What query should I make to get all the documents( i.e. height, weight) of the array "name1"
Output should be :
{
{ "height":"5.5"}
{"weight":"57"}
}
My suggestion would be to reorganise the collection so each document has a key of name and key of physicalattributes e.g.
{
    'name' : 'name1',
    'physAttr': { 'height': heightvalue,
                  'weight': weightvalue }
}
then suppose you wanted to find all documents with height 5.5, the query would be trivial
db.collection.find({ 'physAttr.height': 5.5 })
As described in the question each document in the collection has a different schema from the others (different name key for every document) making query operations difficult.
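A rough sketch of the reorganisation step, assuming physAttr is stored as a plain object and the values are converted to numbers (in the question they are strings, so a query for 5.5 would otherwise not match "5.5"). The function name reshape is hypothetical, and the _id field a real document would carry is ignored here.

```javascript
// Turn { name1: [{height}, {weight}], name2: [...] } into one document per name.
function reshape(original) {
  return Object.entries(original).map(([name, parts]) => {
    // Merge the single-key objects of the array into one object.
    const merged = Object.assign({}, ...parts);
    const physAttr = {};
    for (const [key, value] of Object.entries(merged)) {
      // Store numbers so that { 'physAttr.height': 5.5 } can match.
      physAttr[key] = Number(value);
    }
    return { name, physAttr };
  });
}

const originalDoc = {
  name1: [{ height: "5.5" }, { weight: "57" }],
  name2: [{ height: "6.1" }, { weight: "74" }],
};
const reshaped = reshape(originalDoc);
console.log(reshaped);
```

After inserting the reshaped documents, the attributes of one person can be fetched with a filter on name, and a search by attribute becomes the dot-notation query shown above.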

MongoDB Relational Data Structures with array of _id's

We have been using MongoDB for some time now and there is one thing I just can't wrap my head around. Let's say I have a collection of Users that have a Watch List or Favorite Items List like this:
usersCollection = [
    {
        _id: 1,
        name: "Rob",
        itemWatchList:[
            "111111",
            "222222",
            "333333"
        ]
    }
];
and a separate Collection of Items
itemsCollection = [
    {
        _id:"111111",
        name: "Laptop",
        price:1000.00
    },
    {
        _id:"222222",
        name: "Bike",
        price:123.00
    },
    {
        _id:"333333",
        name: "House",
        price:500000.00
    }
];
Obviously we would not want to insert the whole item obj inside the itemWatchList array because the items data could change i.e. price.
Let's say we pull that user to the GUI and want to display a grid of the user's itemWatchList. We can't, because all we have is a list of IDs. Is the only option to do a second collection.find([itemWatchList]) and then, in the results callback, manipulate the user record to display the current items? The problem with that is: what if I return an array of multiple Users, each with an itemWatchList array? That would be a callback nightmare to try and keep the results straight. I know Map Reduce or Aggregation framework can't traverse multiple collections.
What is the best practice here and is there a better data structure that should be used to avoid this issue all together?
You have 3 different options with how to display relational data. None of them are perfect, but the one you've chosen may not be the best option for your use case.
Option 1 - Reference the IDs
This is the option you've chosen. Keep a list of Ids, generally in an array of the objects you want to reference. Later to display them, you do a second round-trip with an $in query.
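For several users at once, the second round-trip can still be a single $in query. A rough sketch, with hypothetical helper names (collectItemIds, buildItemsFilter, attachItems): the filter would be passed to a find on the items collection, and the join afterwards happens in memory, so there is no per-user callback.

```javascript
// Collect every distinct item id referenced by any user.
function collectItemIds(users) {
  return [...new Set(users.flatMap((u) => u.itemWatchList))];
}

// Filter for the second round-trip, e.g. passed to a find on the items collection.
function buildItemsFilter(users) {
  return { _id: { $in: collectItemIds(users) } };
}

// Replace each id in itemWatchList with the matching item document.
// Ids with no matching item are dropped (an assumption).
function attachItems(users, items) {
  const byId = new Map(items.map((item) => [item._id, item]));
  return users.map((u) => ({
    ...u,
    itemWatchList: u.itemWatchList.map((id) => byId.get(id)).filter(Boolean),
  }));
}

const users = [
  { _id: 1, name: "Rob", itemWatchList: ["111111", "222222"] },
  { _id: 2, name: "Ann", itemWatchList: ["222222", "333333"] },
];
const items = [
  { _id: "111111", name: "Laptop", price: 1000 },
  { _id: "222222", name: "Bike", price: 123 },
  { _id: "333333", name: "House", price: 500000 },
];

const itemsFilter = buildItemsFilter(users);
const usersWithItems = attachItems(users, items);
console.log(itemsFilter, usersWithItems);
```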
Option 2 - Subdocuments
This is probably a bad solution for your situation. It means putting the entire array of documents that are stored in the items collection into your user collection as a sub-document. This is great if only one user can own an item at a time. (For example, different shipping and billing addresses.)
Option 3 - A combination
This may be the best option for you, but it'll mean changing your schema. For example, let's say that your items have 20 properties, but you really only care about the name and price for the majority of your screens. You then have a schema like this:
usersCollection = [
    {
        _id: 1,
        name: "Rob",
        itemWatchList:[
            {
                _id:"111111",
                name: "Laptop",
                price:1000.00
            },
            {
                _id:"222222",
                name: "Bike",
                price:123.00
            },
            {
                _id:"333333",
                name: "House",
                price:500000.00
            }
        ]
    }
];
itemsCollection = [
    {
        _id:"111111",
        name: "Laptop",
        price:1000.00,
        otherAttributes: ...
    },
    {
        _id:"222222",
        name: "Bike",
        price:123.00,
        otherAttributes: ...
    },
    {
        _id:"333333",
        name: "House",
        price:500000.00,
        otherAttributes: ...
    }
];
The difficulty is that you then have to keep these items in sync with each other. (This is what is meant by eventual consistency.) If you have a low-stakes application (not banking, health care etc) this isn't a big deal. You can have the two update queries happen successively, updating the users that have that item to the new price. You'll notice this sort of latency on some websites if you pay attention. Ebay for example often has different prices on the search results pages than the actual price once you open the actual page, even if you return and refresh the search results.
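The second of those two update queries could look roughly like this. The helper names are hypothetical. buildPriceSync only builds the filter and update objects, which would be passed to an updateMany on the users collection, and applyPriceSync mimics the effect on one user document in memory. It assumes an item appears at most once per watch list, since the positional $ operator only touches the first matching array element.

```javascript
// Build the filter/update pair for propagating a price change to all users
// that have the item embedded in their itemWatchList.
function buildPriceSync(itemId, newPrice) {
  return {
    filter: { "itemWatchList._id": itemId },
    update: { $set: { "itemWatchList.$.price": newPrice } },
  };
}

// In-memory illustration of the effect on one user document.
// Assumes each item id appears at most once per watch list.
function applyPriceSync(user, itemId, newPrice) {
  return {
    ...user,
    itemWatchList: user.itemWatchList.map((item) =>
      item._id === itemId ? { ...item, price: newPrice } : item
    ),
  };
}

const priceSync = buildPriceSync("222222", 99);
const rob = {
  _id: 1,
  name: "Rob",
  itemWatchList: [
    { _id: "111111", name: "Laptop", price: 1000 },
    { _id: "222222", name: "Bike", price: 123 },
  ],
};
const robAfter = applyPriceSync(rob, "222222", 99);
console.log(priceSync, robAfter);
```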
Good luck!
