Custom search using Mongodb - node.js

I have MongoDB as my database and the backend is written in node.js. I am trying to implement a search for a table that returns all results containing the entered string, with whole-word matches listed before partial (substring) matches.
For example, searching "foo" will return (in that order):
foo maker moo
doo foo doo //The order of word search does not matter as long as it puts the word search first
foobar
fooboo
Currently I have this but I am convinced there is a better way to do it without searching the db twice:
async function(req, res) {
var customerName = req.params.customerName;
//word match
var customers1 = await Models.DummyContactTable.find({
customerName: {
$regex: "^" + customerName,
$options: 'i'
},
IsActive: true
});
//String match
var customers2 = await Models.DummyContactTable.find({
$and: [
{
customerName: {
$regex: customerName, $options: 'i'
}
},
{
customerName: {
$not: {
$regex: "^" + customerName,
}
},
IsActive: true
}
]
});
//Since sometimes we get duplicates, doing a filter and find to de-dup
var customers = customers1.concat(customers2.filter((customer) => !customers1.find(f => f.uuid === customer.uuid)));

If you were using Atlas Search, you could write a query like this:
{
$search: {
autocomplete: {
path: "customerName",
query: "foo"
}}}
// atlas search index definition
{
"mappings": {
"fields": {
"customerName" : {
"type" : "autocomplete"
}}}}
If you needed to control the result scores, you could use compound
{
$search: {
compound: {
should: [
{autocomplete: {path: "customerName", query: "foo" }},
{text: {path: "customerName", query: "foo", score: { boost: { "value": 3 }}}}
]}}}
In this case, we're using the text operator to split on word boundaries using the lucene.standard analyzer, and boosting those results above. Results from Atlas Search are automatically sorted by score with top results first. Queries are optimized for performance and this query would be done in one pass.
There are a lot of other knobs in the docs to turn depending on your sorting and querying needs (such as using different analyzers, prefix searches, phrase searches, regex, etc).

If you want those kinds of ordering rules I would load up all of your customer names into an application that does the search and perform search & sort entirely in the application. I don't expect even Atlas search to provide this kind of flexibility.
(I don't think the queries you provided achieve the ordering you want either.)
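A minimal sketch of the in-application approach suggested above, assuming the customer names are already loaded into an array and the search term is a single word. `rankByWordMatch` is a hypothetical helper name, and a "word" here means a run of letters, digits, or underscores:

```javascript
// Hypothetical helper: whole-word matches first, then other substring matches.
function rankByWordMatch(names, term) {
  const needle = term.toLowerCase();
  // True when the term equals one of the words in the name.
  const hasWord = (name) => name.toLowerCase().split(/\W+/).includes(needle);
  return names
    .filter((name) => name.toLowerCase().includes(needle))
    .map((name) => ({ name, rank: hasWord(name) ? 0 : 1 }))
    // Array.prototype.sort is stable, so names within a group keep their input order.
    .sort((a, b) => a.rank - b.rank)
    .map((entry) => entry.name);
}

const names = ['foobar', 'doo foo doo', 'bar', 'fooboo', 'foo maker moo'];
console.log(rankByWordMatch(names, 'foo'));
```

Names that do not contain the term at all are dropped; the rest are split into two groups, whole-word matches and partial matches, without reordering inside a group.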

Related

How to filter with pagination efficiently with millions of records in mongodb?

I know there are a LOT of questions regarding this subject. And while most work, they are really poor in performance when there are millions of records.
I have a collection with 10,000,000 records.
At first I was using mongoose paginator v2 and it took around 8s to get each page, with no filtering and 25s when filtering. Fairly decent compared to the other answers I found googling around. Then I read about aggregate (in some question about the same here) and it was a marvel, 7 ms to get each page without filtering, no matter what page it is:
const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
let recordCount;
ServiceClass.find().count().then((count) =>{
recordCount = count;
ServiceClass.aggregate().skip(currentPage).limit(pageSize).exec().then((documents) => {
res.status(200).json({
message: msgGettingRecordsSuccess,
serviceClasses: documents,
count: recordCount,
});
})
.catch((error) => {
res.status(500).json({ message: msgGettingRecordsError });
});
}).catch((error) => {
res.status(500).json({ message: "Error getting record count" });
});
What I'm having issues with is when filtering. aggregate doesn't really work like find so my conditions are not working. I read the docs about aggregate and tried with [ {$match: {description: {$regex: regex}}} ] inside aggregate as a start but it did not return anything.
This is my current working function for filtering and pagination (which takes 25s):
const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
const filter = req.params.filter;
const regex = new RegExp(filter, 'i');
ServiceClass.paginate({
$or:[
{code: { $regex: regex }},
{description: { $regex: regex }},
]
},{limit: pageSize, page: currentPage}).then((documents)=>{
res.status(200).json({
message: msgGettingRecordsSuccess,
serviceClasses: documents
});
}).catch((error) => {
res.status(500).json({ message: "Error getting the records." });
});
code and description are both indexed: code has a unique index and description has a normal index. I need to search for documents that contain a string in either the code or the description field.
What is the most efficient way to filter and paginate when you have millions of records?
The code below gets the paginated result from the database together with the total count of documents matching that query, in a single aggregation.
const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
const skip = currentPage * pageSize - pageSize;
const query = [
{
$match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] },
},
{
$facet: {
result: [
{
$skip: skip,
},
{
$limit: pageSize,
},
{
$project: {
createdAt: 0,
updatedAt: 0,
__v: 0,
},
},
],
count: [
{
$count: "count",
},
],
},
},
{
$project: {
result: 1,
count: {
$arrayElemAt: ["$count", 0],
},
},
},
];
const result = await ServiceClass.aggregate(query);
console.log(result)
// result is an array with a single element of the form
// { result: [ ...documents of the page ], count: { count: <total matches> } }
Hope it helps.
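The shape of that aggregation result can be awkward to consume directly, so here is a small sketch of how it might be unpacked. `unpackFacetPage` is a hypothetical helper, not part of Mongoose; it assumes the pipeline above, where `$count` emits nothing when no document matches and the `count` key is therefore absent:

```javascript
// Hypothetical helper: normalise the single-element array returned by the $facet pipeline.
function unpackFacetPage(aggregateResult) {
  const [first = {}] = aggregateResult;
  return {
    items: first.result || [],
    // $count produces no document when nothing matched, so `count` may be missing.
    total: first.count ? first.count.count : 0,
  };
}

console.log(unpackFacetPage([{ result: [{ code: 'A1' }], count: { count: 42 } }]));
console.log(unpackFacetPage([{ result: [] }]));
```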
The most efficient way to filter and paginate when you have millions of records is to use MongoDB's built-in filtering and pagination stages in the aggregate() pipeline: $match, $skip, and $limit.
Use the $skip stage to skip a certain number of documents and the $limit stage to limit the number of documents returned. Use the $match stage to filter the documents based on certain conditions.
To filter your documents based on the code or description field, you can use the $match operator with the $or operator, like this:
ServiceClass.aggregate([
{ $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
{ $skip: (currentPage - 1) * pageSize }, // skip whole pages (pages counted from 1), not single documents
{ $limit: pageSize }
])
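You can also use the $text operator instead of $regex, which is usually more efficient for text search queries. Note that $text requires a text index and matches whole words rather than arbitrary substrings.
It's also important to make sure that the relevant fields (code and description) have indexes. Keep in mind that an unanchored or case-insensitive $regex generally cannot use an index effectively; a case-sensitive prefix expression such as /^abc/ can.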
You might have to adjust the query according to your specific use case and data.
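As a rough sketch of the pagination arithmetic, the pipeline can be built by a small function. `buildPagePipeline` is a hypothetical name, and the sketch assumes pages are numbered from 1, as in the earlier answer:

```javascript
// Hypothetical builder: $match, then skip whole pages, then take one page.
function buildPagePipeline(regex, currentPage, pageSize) {
  const page = Math.max(1, currentPage); // assumption: pages are 1-based
  return [
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize },
  ];
}

// Page 3 with 20 documents per page skips the first 40 matches.
console.log(buildPagePipeline(/abc/i, 3, 20)[1]);
```

The returned array can be passed to ServiceClass.aggregate() as is.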

Perform full search (using `$regex`) on several fields if search term is presented using aggregation

I have a service that displays products. I need to be able to search products by their fields (product name, barcode or sku).
Previously I used this approach
const query: FilterQuery<TProductSchema> = {};
if (search) {
query.$or = [
{
productName: {
$regex: String(search).split(' ').join('|'),
$options: 'i',
},
},
{
barcode: {
$regex: String(search),
$options: 'i',
},
},
{ sku: { $regex: String(search).split(' ').join('|'), $options: 'i' } },
];
}
if (folderId && folderId !== 'all') {
query.folder = { _id: folderId };
}
const products = await ProductModel.find<HydratedDocument<TProductSchema>>(query)
.limit(Number(limit) === -1 ? 0 : Number(limit))
.skip(Number(page) * Number(limit));
and it worked well but now I also need to include all documents count (which changes depending on selected folderId) in the resulting object.
I thought I could do it with the aggregation framework but I can't figure out how to conditionally match documents only if search is presented.
I thought I could do something like that
const products = await ProductModel.aggregate([
{ $match: { /* match folder */ } },
{ /* count matched documents */ },
// next search documents IF `search` is present
{
$match: {
$cond: [search, /* here goes `query` object */, '']
}
},
]);
but it doesn't work saying unknown top level operator "$cond"
So how can I apply $match conditionally?
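You already built the query object in your first snippet. Pass the same object to $match and it will work the same way; when search is empty, query has no $or key, so no search condition is applied.
{ $match: query }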
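One way to combine the folder count with the optional search is a $facet stage, sketched below. `buildProductPipeline` and its parameters are hypothetical; it assumes the folder filter and the search filter are built beforehand as plain objects, the way `query` is built in the question. Mongoose does not cast values inside aggregation pipelines, so ids may need converting to ObjectId before they are passed in.

```javascript
// Hypothetical builder: count every document in the folder, and apply the
// search filter only when one was supplied.
function buildProductPipeline(folderFilter, searchFilter, page, limit) {
  const products = [];
  if (searchFilter) {
    products.push({ $match: searchFilter });
  }
  // $skip: 0 is valid and keeps the sub-pipeline non-empty when limit is -1.
  products.push({ $skip: limit > 0 ? page * limit : 0 });
  if (limit > 0) {
    products.push({ $limit: limit });
  }
  return [
    { $match: folderFilter },
    { $facet: { total: [{ $count: 'count' }], products } },
  ];
}

const searchFilter = { $or: [{ barcode: { $regex: '123', $options: 'i' } }] };
console.log(JSON.stringify(buildProductPipeline({}, searchFilter, 0, 10)));
```

The `total` branch counts all documents in the folder regardless of the search, while the `products` branch is filtered and paginated.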

Mongoose full text search not filtering correctly

So basically i have model with a bunch of string fields like so:
const Schema: Schema = new Schema(
{
title: {
type: String,
trim: true
},
description: {
type: String,
trim: true
},
...
}
);
Schema.index({ '$**': 'text' });
export default mongoose.model('Watch', Schema);
where I index all of them.
This schema is used as a ref in another model, so I search through populate like this, where user is an instance of that other model:
const { search, limit = 5 } = req.query;
const query = search && { match: { $text: { $search: new RegExp(search, 'i') } } };
const { schemaRes } = await user
.populate({
path: 'schema',
...query,
options: {
limit
}
})
.execPopulate();
The search itself seems to work. The problem is that when the search terms become more specific, the results are not narrowed down accordingly.
Example
db
{ title: 'Rolex', name: 'Submariner', description: 'Nice' }
{ title: 'Rolex', name: 'Air-King', description: 'Nice' }
When the search param is Rolex I get both items, which is fine. But when the search param becomes Rolex Air-King I keep getting both items, which is not what I want: I would rather get only one.
Is there something I could do to achieve this?
Returning both items is correct, since both items match your search params, but with different similarity score.
You can output the similarity score to help sorting the result.
// run the aggregation on the model that owns the text index (Watch in the question)
Watch.aggregate([
{ $match: { $text: { $search: "Rolex Air-King" } } },
{ $set: { score: { $meta: "textScore" } } }
])
// new RegExp("Rolex Air-King", 'i') is not necessary and even invalid,
// as $search accepts string and is already case-insensitive by default
The query will return
[{
"_id": "...",
"title": "Rolex",
"name": "Air-King",
"description": "Nice",
"score": 2.6
},
{
"_id": "....",
"title": "Rolex",
"name": "Submariner",
"description": "Nice",
"score": 1.1
}]
Since the second result item matches your search query (even partially), MongoDB returns it.
You could use the score to help sort the items. But determining the right threshold to filter the result is complex, as the score depends on the word count as well.
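If only documents containing every search term should remain, one possible alternative to a score threshold is to post-filter the text search results in the application. This is a sketch under assumptions, not something $text does for you: `matchesAllTerms` is a hypothetical helper, and it uses plain case-insensitive substring checks rather than MongoDB's tokenisation and stemming.

```javascript
// Hypothetical post-filter: keep a document only if every search term appears
// (case-insensitively, as a substring) in at least one of the given fields.
function matchesAllTerms(doc, search, fields) {
  const haystack = fields.map((field) => String(doc[field] || '')).join(' ').toLowerCase();
  const terms = search.toLowerCase().split(/\s+/).filter(Boolean);
  return terms.every((term) => haystack.includes(term));
}

const watches = [
  { title: 'Rolex', name: 'Submariner', description: 'Nice' },
  { title: 'Rolex', name: 'Air-King', description: 'Nice' },
];
const strict = watches.filter((watch) =>
  matchesAllTerms(watch, 'Rolex Air-King', ['title', 'name', 'description'])
);
console.log(strict.map((watch) => watch.name)); // only the Air-King document remains
```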
On a side note: You can assign different weights to the fields if they are not equally important
https://docs.mongodb.com/manual/tutorial/control-results-of-text-search/

search mongoDB by 2 fields

I have been looking at some other answers on stack overflow and got as far as I could with that. I learnt that I need to create a text index where I define my Schema which I did like this:
productSchema.index({'title': 'text', 'address.city': 'text'});
If I search by just 1 field, ie: title then I get results as I expect.
Product.find( { $text: { $search: searchTerm } } )
But there is something wrong with my query when trying to search by title and city together.
Product.find( { $text: { $search: searchTerm }, $text: { $search: city } } )
I see no error but I get no results even though I know there should be results for my query. I am not sure if it is because address is an object (according to what I see in Compass)
This is how I defined it in my schema using Mongoose
address: {
city: {type: String, required: true }
},
If I do this:
Product.find().and([{ title: searchTerm }, { 'address.city': city }])
it almost works. But I have to type in the exact title of the product. If the product is called "a rubber duck" and I type in "duck" I get no results. If I type in "a rubber duck" and select the city it is listed in I get back a result.
I have just also tried this:
Product.find( { $and: [ {$text: { $search: searchTerm }}, { address: {city : city } } ] } )
Which seems to work but could probably be improved upon!
Have you looked into the link below? https://docs.mongodb.com/manual/reference/operator/query/or/
There you can find how to add multiple expressions in same query.
If you want to search for titles where title might be equal to A or to B use a query like this: db.inventory.find( { $or: [ { title: "A" }, { title: "B"} ] } ).
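Edit: If you need documents that match both expressions in the query, use $and with dot notation for the nested fields. A condition such as { address: { city: "CityName" } } only matches when the whole embedded document equals that object, so two such conditions on address can never both hold (country is just an example of a second field):
Product.find( { $and: [ { 'address.city': "CityName" }, { 'address.country': "UK" } ] } )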
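Applied to the original question, the text search and the city condition can live in one filter object. The sketch below uses a hypothetical `buildProductFilter` helper and assumes the text index from the question exists; keys in the same filter object are combined with an implicit AND, so no explicit $and is needed.

```javascript
// Hypothetical builder: one $text clause plus a dot-notation match on the nested city.
function buildProductFilter(searchTerm, city) {
  const filter = {};
  if (searchTerm) {
    filter.$text = { $search: searchTerm };
  }
  if (city) {
    filter['address.city'] = city; // dot notation matches the nested field only
  }
  return filter;
}

console.log(buildProductFilter('duck', 'London'));
```

The result can be passed straight to Product.find().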

Doing partial search with mongoose

I'm trying to get Mongoose to return results in a query when I only give a partial query. For example: I have a 'Company' schema that lists a bunch of companies. A document example:
{
"_id" : ObjectId("57aabeb80057405968de1539"),
"companyName" : "Vandelay Industries",
"owner" : "Ary Vandelay",
"inception" : 2012,
"__v" : 1
}
So if I do a search query like this:
Company.findOne(
{ companyName: 'Vandelay Industries' }, function (err, company) {
if (company) {
//do stuff
}
});
This will produce the document. But if I do the following, I won't get a result:
Company.findOne(
{ companyName: 'Vandelay' }, function (err, company) {
if (company) {
//do stuff
}
});
I would like to be able to do these sorts of partial searches and still get a result. Is there any way to do that with Mongoose?
In order to achieve this you can use a regex search to get the required result.
var searchKey = new RegExp('Vandelay', 'i')
Company.findOne({ companyName: searchKey }, function (err, company) {
if (company) {
//do stuff
}
});
Refer this stackoverflow post.
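When the search key comes from user input, regex metacharacters in it should be escaped first, otherwise a term like "a.b" matches more than intended. A minimal sketch, where `escapeRegex` and `buildPartialMatch` are hypothetical helper names:

```javascript
// Escape regex metacharacters so the input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder: case-insensitive "contains" filter for one field.
function buildPartialMatch(field, input) {
  return { [field]: new RegExp(escapeRegex(input), 'i') };
}

const filter = buildPartialMatch('companyName', 'vandelay');
console.log(filter.companyName.test('Vandelay Industries')); // true
```

The filter can then be passed to the query, for example Company.findOne(buildPartialMatch('companyName', userInput)).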
You can use this query to get result on specific value
Company.findOne({"companyName": /Vandelay/},function(err,company){
if(!err){
console.log(company);
}
});
To get results faster you should use an index; see https://docs.mongodb.com/manual/text-search/.
db.Company.createIndex( { companyName: "text" } )
then you can search
db.Company.find( { $text: { $search: "company name" } } )
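But this only supports whole-word search, not partial matches, so adding a prefix regex as a second condition will help:
db.Company.find({ $or: [
{ $text: { $search: query } },
{ companyName: { $regex: '^' + query } }
]})
This is faster than an unanchored regex because the anchored prefix can use a regular index on companyName. When $text appears inside $or, every clause of the $or must be backed by an index, so companyName needs that regular index as well.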
Are you doing a full-text search? If so, try this:
TagGroup.find({
$text: {
$search: text
}
}, {
score: {
$meta: "textScore"
}
}).sort({
score: {
$meta: 'textScore'
}
})
You also need to create a text index on that schema:
TagGroupSchema.index({
"$**": "text"
});
Here is the document
You can also use Elasticsearch for this; consider it as the number of documents grows.

Resources