MongoDB search results are slow and inaccurate - node.js

On https://cbbanalytics.com/, after logging in with email: stackacct#gmail.com, password: pass123, a search bar appears in the top-right corner. When text is input, the following route fires off:
router.get('/live-search/text/:text', function (req, res) {
  try {
    let text = req.params.text;
    // Use $regex
    let queryFilters = { label: { $regex: `${text}`, $options: 'i' } };
    // Use $search (text-index)
    // let queryFilters = { $text: { $search: text } };
    // Return Top 20
    db.gs__ptgc_selects
      .find(queryFilters)
      .limit(20)
      .then(data => res.json(data))
      .catch(err => res.status(400).json('Error: ' + err));
  } catch (error) {
    res.status(500).json({ statusCode: 500, message: error.message });
  }
});
gs__ptgc_selects is a MongoDB collection with 180K documents, and we are searching on the label field present in each document. label is set as a text index in MongoDB Atlas.
The primary issues with the regex implementation are:
Each fetch takes ~150ms, which is noticeable in the search performance.
Regex isn't returning the best search results. Searching Zio returns Alanya DeFazio before Zion Young. The optimal order of results would be (i) all first names starting with Zio, sorted alphabetically, (ii) all second words starting with Zio, (iii) other words with Zio nested inside the word.
Using regex doesn't leverage the text index at all. As a result, "Query Targeting: Scanned Objects / Returned has gone above 1000" warnings are raised when the search is used.
If we uncomment let queryFilters = { $text: { $search: text } }; and use this instead of regex:
Only exact matches are returned.
Fetches are still at ~150ms.
Is it possible to improve search within our current stack (Node.js, MongoDB, and Mongoose)? Or are these limitations unavoidable?
Edit: We had recently created a search-index for the entire gs__ptgc_selects collection, however this doesn't appear to be improving search.
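The result order described in the question (first-name prefix, then second-word prefix, then any other occurrence) can be applied in application code once the database has returned candidate matches. Below is a minimal sketch, assuming labels are plain strings with whitespace-separated words. rankLabels is a hypothetical helper, not part of MongoDB or Mongoose, and it only reorders what the query already returned.

```javascript
// Hypothetical helper: order labels for a query as the question describes.
function rankLabels(labels, query) {
  const q = query.toLowerCase();
  // Tier 0: first word starts with q; tier 1: second word starts with q;
  // tier 2: q appears anywhere else in the label; -1: no match.
  const tier = (label) => {
    const lower = label.trim().toLowerCase();
    const words = lower.split(/\s+/);
    if (words[0].startsWith(q)) return 0;
    if (words.length > 1 && words[1].startsWith(q)) return 1;
    return lower.includes(q) ? 2 : -1;
  };
  return labels
    .map((label) => ({ label, tier: tier(label) }))
    .filter((entry) => entry.tier >= 0)
    // Sort by tier first, then alphabetically within a tier.
    .sort((a, b) => a.tier - b.tier || a.label.localeCompare(b.label))
    .map((entry) => entry.label);
}
```

Because the route limits the query to 20 documents, the ranking can only be as good as the candidates fetched; the limit would probably need to be applied after ranking rather than before.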

Related

Firestore Query Nested Map

I am looking to construct a query against a Firestore collection ('parent') where the documents have a nested map (two logical levels deep), specifically when the first map has dynamic keys that are not known at the time of running the query. As an example:
Document 1
{
  codes: {
    abc: {
      id: 'hi'
    },
    def: {
      id: 'there'
    }
  }
}
Document 2
{
  codes: {
    ghi: {
      id: 'you'
    },
    zmp: {
      id: 'guys'
    }
  }
}
What I would like to do is have a WHERE clause that takes a wildcard for a key in the document. ie.
firestore.collection('parent').WHERE('codes.*.id', '==', 'there')
// Results in Document 1
or
firestore.collection('parent').WHERE('codes.*.id', '==', 'you')
// Results in Document 2
Is there any way to achieve this behavior without having to resort to generating subcollection documents to be used for indexing, or polluting the document itself with a second map that maps ids to codes?
== Not ideal solution 1 (subcollections) ==
Build out the server so that when these documents are filed, a subcollection ('child') is maintained with documents that contain the related information. As an example filing Document 1 above would require filing two documents in the child subcollection:
{
  id: 'hi',
  code: 'abc'
}
{
  id: 'there',
  code: 'def'
}
Now we can query for the id we want, and get the parent reference, and follow that all the way back to the parent...
firestore.collectionGroup('child').where('id', '==', 'there')
  .get()
  .then(snapshot => {
    for (const doc of snapshot.docs) {
      return doc.ref.parent.parent
    }
    return Promise.reject('no parents, how sad.')
  })
  .then(ref => ref.get())
  .then(snapshot => snapshot.data())
  .then(parent => {
    // Thank goodness, the parent is Document 1!
  })
The downside to this is the maintenance of the subcollections, as well as a number of extra operations against Firestore.
== Not ideal solution 2 (model pollution) ==
Another way to achieve this is to add another map or an array to the document itself that simply contains the ids, which would then let us query on those values, i.e.:
{
  codes: {
    abc: {
      id: 'hi'
    },
    def: {
      id: 'there'
    }
  },
  codeids: ['hi', 'there']
}
Although this is easy to query:
.WHERE('codeids', 'ARRAY CONTAINS', 'hi')
I don't like the idea of adding fields that are not meaningful to the consumer of the document (the only purpose of the field being to make the document queryable within system constraints).
Open to suggestions!
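If solution 2 is chosen, the extra array can at least be derived from the map at write time instead of being maintained by hand. Below is a minimal sketch, assuming documents have the codes shape shown above. withCodeIds is a hypothetical helper, not a Firestore API.

```javascript
// Hypothetical helper: return a copy of the document with a derived `codeids` array.
function withCodeIds(doc) {
  const ids = Object.values(doc.codes || {}).map((entry) => entry.id);
  // De-duplicate in case two codes share the same id.
  return { ...doc, codeids: [...new Set(ids)] };
}

// Usage (assumes `firestore` is an initialized client, as in the question):
// await firestore.collection('parent').doc(docId).set(withCodeIds(data));
// const snap = await firestore.collection('parent')
//   .where('codeids', 'array-contains', 'there').get();
```

Deriving the field in one place keeps it from drifting out of sync with codes, although it does not remove the objection that the field exists only for querying.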

Followers and following system

I am working on a social network web application. I have set up a following/followers system with Firebase and Node.js: I created a users collection, and each user document has two arrays, following and followers. I managed to add entries to them.
Now I want to add a condition that checks whether the user is already following, so that they are not added to the array a second time. How can I access the arrays (following, followers) to verify whether the user is already in them?
exports.onFollow = (req, res) => {
  const followDocument = db.doc(`/users/${req.body.email}`);
  const followerDocument = db.doc(`/users/${req.user.email}`);
  let followData;
  let followerData;
  followDocument
    .get()
    .then((doc) => {
      if (doc.exists) {
        followData = doc.data();
        if ('req.user.email', 'in', followData.followers.docs) {
          return res.status(200).json({
            error: 'user already follow'
          });
        } else {
          followData.followers.push(req.user.email);
          return followDocument.update({
            followers: followData.followers
          });
        }
      }
    })
    .catch((err) => {
      console.error(err);
      res.status(500).json({
        error: err.code
      });
    });
};
It sounds like you want followers to be an array with unique values, so that each email address can only occur once. Firestore has a special arrayUnion operation for adding values to such a field.
From the documentation on updating elements in an array:
If your document contains an array field, you can use arrayUnion() and arrayRemove() to add and remove elements. arrayUnion() adds elements to an array but only elements not already present. arrayRemove() removes all instances of each given element.
var washingtonRef = db.collection("cities").doc("DC");
// Atomically add a new region to the "regions" array field.
washingtonRef.update({
regions: admin.firestore.FieldValue.arrayUnion("greater_virginia")
});
// Atomically remove a region from the "regions" array field.
washingtonRef.update({
regions: admin.firestore.FieldValue.arrayRemove("east_coast")
});
I'd recommend switching to using arrayUnion() for your use-case, as it prevents having to do the query to detect if the email address is already in the array.
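Applied to the handler in the question, the two array updates might look like the sketch below. buildFollowUpdates is a hypothetical helper, and FieldValue is passed in (for example admin.firestore.FieldValue) so the snippet does not assume how the Admin SDK is initialized. The followers and following field names are taken from the question.

```javascript
// Hypothetical helper: build the update payloads for both user documents.
// `FieldValue` is injected, e.g. admin.firestore.FieldValue from firebase-admin.
function buildFollowUpdates(FieldValue, followerEmail, followedEmail) {
  return {
    // Applied to the document of the user being followed.
    followed: { followers: FieldValue.arrayUnion(followerEmail) },
    // Applied to the document of the user who follows.
    follower: { following: FieldValue.arrayUnion(followedEmail) },
  };
}

// Usage inside the route handler (assumes `db` and `admin` exist as in the question):
// const updates = buildFollowUpdates(admin.firestore.FieldValue, req.user.email, req.body.email);
// await db.doc(`/users/${req.body.email}`).update(updates.followed);
// await db.doc(`/users/${req.user.email}`).update(updates.follower);
```

Since arrayUnion skips values that are already present, calling this twice for the same pair leaves both arrays unchanged the second time.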

Search in array child firestore

How can I retrieve all the documents that match a child in a data structure like this:
{
[
id: {
name: "name",
products: {
items: [
productName: "this is the product Name"
]
}
}
]
}
The value I am trying to compare is the one inside products.items[0].productName.
This is what I tried, but it does not retrieve anything:
try {
  var data = [];
  const byName = await dbRef.where('producto.items[0].producto', '==', req.params.nombre).get();
  console.log(byName);
  if (byName.empty) {
    console.log('No matching documents.');
    res.send('No matching documents.');
    return;
  }
  byName.forEach(doc => {
    console.log(doc.id, '=>', doc.data());
    data.push(doc.data());
  });
  res.send(data);
} catch (err) {
  res.send(err);
}
If you want to search across all items in the items array for one that matches the value you have, you can use the array-contains operator:
dbRef.where('producto.items', 'array-contains', { producto: req.params.nombre})
But note that this only works if the array only contains the producto field in each item. The reason is that array-contains (and other array-level operators) work on complete items only.
So if the items in producto.items have multiple subfields, and you want to match on one or some of them, you can't use array-contains. In that case, your options are:
Store the item names in a separate/additional array field product-names and then query on that with array-contains.
Store the array items in a subcollection and query that.
Use a map instead of an array to store these values. This will generate many extra indexes though, which both adds to your storage cost, and may get you to the limit on the number of indexes.
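The first option above could be sketched as follows, assuming each item carries its name in a producto field as in the query shown earlier. withProductNames and the productNames field name are hypothetical (a camelCase name is used here instead of product-names only to keep the field path simple).

```javascript
// Hypothetical helper: add a flat array of item names next to the nested items.
function withProductNames(doc) {
  const items = (doc.producto && doc.producto.items) || [];
  return { ...doc, productNames: items.map((item) => item.producto) };
}

// Usage (assumes `dbRef` is the collection reference from the question):
// await dbRef.doc(id).set(withProductNames(data));
// const byName = await dbRef
//   .where('productNames', 'array-contains', req.params.nombre).get();
```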

Efficient search of substrings in MongoDB

I want to search in a User collection for a document containing a given username while the user is typing in the username. For example:
Database:
happyuser
happyuser2
happy_user
userhappy
User types in "hap", all usernames should be found, since "hap" is included in all of them. When I do the following, results are found only when the full username is provided, and none of the usernames are found for the query "hap":
User.find(
  {
    $text: {
      $search: "hap",
      $diacriticSensitive: true
    }
  },
  {
    score: {
      $meta: "textScore"
    }
  }, function(err, results) {
    if (err) {
      next(err);
      return;
    }
    return res.json(results);
  });
I only get results when I search with a regular expression:
User.find({
  "username": new RegExp("hap")
}).exec(function(err, results) {
  if (err) {
    next(err);
    return;
  }
  return res.json(results);
});
But this cannot be efficient, right? I mean, with a regular expression MongoDB basically touches all documents in the user collection or am I wrong? What is the best practice solution for such a search query?
Searching with a regular expression is fairly efficient (at least in SQL) unless your pattern covers a wide range of characters (a complex or random search) that exceeds what the database's regex engine handles well, or unless you have so many databases to search that the regex cannot cover them all in reasonable time. Specifically for MongoDB, I usually go with:
>db.collection.find( { key: { $regex: '^P' } } ).pretty();
This command should return all documents whose 'key' starts with the letter 'P'. The ^ anchors the match to the beginning of the string; without it, the pattern matches a 'P' anywhere in the value. Some people also use %Your-Expression-Here%, which is SQL LIKE syntax. However, I would suggest the above command as the more efficient one. Have a look at the documentation: https://docs.mongodb.com/manual/reference/method/db.collection.find/
In context of your example, you won't be able to perform plain text search.
The $text operator matches on the complete stemmed word. So if a
document field contains the word blueberry, a search on the term blue
will not match. However, blueberry or blueberries will match.
On the other hand, if you don't want to mess with regex, give $where a try:
User.find().$where(function() {
  return this.username.indexOf('hap') != -1;
});
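Whichever regex approach is used, user input should be escaped before it goes into the pattern. When only "starts with" matches are needed, a case-sensitive pattern anchored with ^ is the form of regex that MongoDB can serve from an ordinary index on the field. Below is a minimal sketch. escapeRegex and prefixFilter are hypothetical helpers, and the index on username is an assumption.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a case-sensitive "starts with" filter for one field.
function prefixFilter(field, input) {
  return { [field]: { $regex: '^' + escapeRegex(input) } };
}

// Usage (assumes a Mongoose model `User` with an index on `username`):
// User.find(prefixFilter('username', 'hap')).limit(20).exec(callback);
```

This trades away matches such as userhappy, where the text appears in the middle of the value. Those still need an unanchored pattern, which generally has to examine every value, and $where is slower again because it runs JavaScript against each document.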

Express js Mongoose alternative to MySQL % Wildcard

I've been reading up and tried a few different code snippets that others have had success with, but I can't seem to get it to work.
What I'd like is for users to be able to search using only part of the term, i.e. pe for 'peter'. I'd like to have a wildcard on the search term.
My code so far, which isn't working:
router.get('/:callsign', function(req,res){
  var search = req.params.callsign;
  var term = escape(search);
  term = term.toUpperCase();
  if(search=="*" || search==""){
    res.redirect("/");
  }
  User.find({'callsign' : new RegExp('^'+term+'$', "i") }, function(err, callsign){
    if(err)
    {
      console.log('No user found'+err);
      req.flash('message','Sorry, something went wrong. Try again.');
      res.render('callSearchResults'),{
        message: req.flash('message'),
        title: 'Sorry, no results'
      }
    }
    if(callsign){
      console.log('Callsign:'+callsign+term);
      res.render('callSearchResults',{
        call: callsign,
        title: 'You searched for '+search,
        query: term
      });
    }else{
      console.log('No entries found'+search);
    }
  });
});
Also, 'callsign' callback is constantly true - even when there are no results!
You are using a RegExp for this search. A literal ^ means that the pattern must match at the beginning of the string, and $ that it must match at the end. If you just want to match part of the string, you don't need to add them, as in the example below:
new RegExp(term, "i")
There is also a good full-text search mechanism built into MongoDB. You can read about it in the official docs.
About Mongoose queries and checking the result in the callback: the returned object is an array of documents, and in JavaScript an empty array is truthy. Instead, check the length of the array, like this:
if(callsign.length > 0) {
// Logic
} else {
// Nothing found
}
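Both points can be combined in two small helpers, sketched below. buildCallsignFilter and hasResults are hypothetical names. The sketch also swaps the global escape() from the question for regex escaping, since escape() percent-encodes text for URLs and does not neutralize regex metacharacters.

```javascript
// Hypothetical helper: build a partial, case-insensitive match on `callsign`.
// Returns null for an empty or "*" search so the caller can redirect and return.
function buildCallsignFilter(search) {
  const term = String(search || '').trim();
  if (term === '' || term === '*') return null;
  // Escape regex metacharacters so user input is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return { callsign: new RegExp(escaped, 'i') };
}

// Hypothetical helper: an empty array is truthy, so test the length explicitly.
function hasResults(docs) {
  return Array.isArray(docs) && docs.length > 0;
}

// Usage inside the route (assumes the `User` model from the question):
// const filter = buildCallsignFilter(req.params.callsign);
// if (!filter) return res.redirect('/');
// User.find(filter, function (err, docs) { if (hasResults(docs)) { /* render */ } });
```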
