Efficient search of substrings in MongoDB - node.js

I want to search in a User collection for a document containing a given username while the user is typing in the username. For example:
Database:
happyuser
happyuser2
happy_user
userhappy
When the user types "hap", all four usernames should be found, since "hap" is contained in each of them. With the following query, results are only returned when the full username is provided, and nothing is found for the query "hap":
User.find(
{
$text: {
$search: "hap",
$diacriticSensitive: true
}
},
{
score: {
$meta: "textScore"
}
}, function(err, results) {
if (err) {
next(err);
return;
}
return res.json(results);
});
I only get results when I search with a regular expression:
User.find({
"username": new RegExp("hap")
}).exec(function(err, results) {
if (err) {
next(err);
return;
}
return res.json(results);
});
But this cannot be efficient, right? I mean, with a regular expression MongoDB basically touches all documents in the user collection or am I wrong? What is the best practice solution for such a search query?

Searching with a regular expression is reasonably efficient (at least in SQL databases) unless the pattern is complex enough to exceed what the database's regex engine handles well, or there is so much data to scan that the search cannot finish in reasonable time. Specifically for MongoDB, I usually go with:
>db.collection.find( { key: { $regex: '^P' } } ).pretty();
This command returns all documents whose 'key' starts with the letter 'P'. The ^ anchor matters: without it, a pattern such as 'P.*' matches a 'P' anywhere in the value. Some people are used to the SQL LIKE form %Your-Expression-Here%, which MongoDB does not understand, so I would suggest the command above instead. Have a look at the documentation: https://docs.mongodb.com/manual/reference/method/db.collection.find/
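As a hedged illustration of the prefix search described above, the sketch below builds an anchored $regex filter from user input. The helper names escapeRegExp and prefixFilter are hypothetical and not part of MongoDB or Mongoose. The reason to anchor is that, per the MongoDB documentation, a case-sensitive prefix expression can make better use of an index on the field than an unanchored pattern.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so that user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a filter for "field starts with prefix" (hypothetical helper).
function prefixFilter(field, prefix) {
  return { [field]: { $regex: '^' + escapeRegExp(prefix) } };
}

// The resulting object can be passed to find(), e.g. User.find(filter).
console.log(prefixFilter('username', 'hap'));
```

Note that a prefix filter would not find userhappy for the input "hap"; matching in the middle of a value still needs an unanchored pattern.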

In the context of your example, you won't be able to use a plain text search.
The $text operator matches on the complete stemmed word. So if a
document field contains the word blueberry, a search on the term blue
will not match. However, blueberry or blueberries will match.
On the other hand, if you don't want to mess with regex, give $where a try:
User.find().$where(function() {
return this.username.indexOf('hap') != -1;
});
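To make the difference between whole-word matching and substring matching concrete, here is a small in-memory sketch using the four usernames from the question. The wholeWordMatch function is only a rough stand-in for text search (it does no stemming or normalization beyond lowercasing), and both function names are hypothetical. Keep in mind that $where evaluates JavaScript for each document, so it cannot use an index.

```javascript
const usernames = ['happyuser', 'happyuser2', 'happy_user', 'userhappy'];

// Rough stand-in for whole-word matching: split on non-alphanumeric
// characters and compare complete tokens only.
function wholeWordMatch(value, term) {
  return value.toLowerCase().split(/[^a-z0-9]+/).includes(term.toLowerCase());
}

// The same predicate the $where example above uses.
function substringMatch(value, term) {
  return value.indexOf(term) !== -1;
}

const byWord = usernames.filter(u => wholeWordMatch(u, 'hap'));
const bySubstring = usernames.filter(u => substringMatch(u, 'hap'));

console.log(byWord);      // no username has "hap" as a complete token
console.log(bySubstring); // every username contains "hap"
```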

Related

MongoDB search results are slow and inaccurate

On https://cbbanalytics.com/, after logging in with email: stackacct@gmail.com, password: pass123, a search bar appears in the top-right corner. When text is input, the following route fires off:
router.get('/live-search/text/:text', function (req, res) {
try {
let text = req.params.text;
// Use $regex
let queryFilters = { label: { $regex: `${text}`, $options: 'i' } };
// Use $search (text-index)
// let queryFilters = { $text: { $search: text } };
// Return Top 20
db.gs__ptgc_selects
.find(queryFilters)
.limit(20)
.then(data => res.json(data))
.catch(err => res.status(400).json('Error: ' + err));
} catch (error) {
res.status(500).json({ statusCode: 500, message: error.message });
}
});
gs__ptgc_selects is a MongoDB collection with 180K documents, and we are searching on the label field present in each document. label is set as a text index in MongoDB Atlas.
The primary issues with the regex implementation are:
Each fetch takes ~150ms, which is noticeable in the search performance.
Regex isn't returning the best search results. Searching Zio returns Alanya DeFazio before Zion Young. The optimal order of results would be (i) all first names starting with Zio, sorted alphabetically, (ii) all second words starting with Zio, (iii) other words with Zio nested inside the word.
Using regex doesn't leverage the text index at all. As a result, "Query Targeting: Scanned Objects / Returned has gone above 1000" warnings are raised when the search is used.
If we uncomment let queryFilters = { $text: { $search: text } }; and use this instead of regex:
Only exact matches are returned.
Fetches are still at ~150ms.
Is it possible to improve search within our current stack (Node.js, MongoDB, and Mongoose)? Or are these limitations unavoidable?
Edit: We had recently created a search-index for the entire gs__ptgc_selects collection, however this doesn't appear to be improving search.
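The ordering asked for in the question (first word starts with the term, then a later word starts with it, then the term nested inside a word) can be sketched as a re-ranking step applied in Node.js to whatever the query returns. The function names are hypothetical, "Marcus Zion" is made-up sample data, and the alphabetical tie-break is applied within every tier as an assumption. This only reorders documents that were already fetched, so it does not address the scan cost of the regex itself.

```javascript
// Rank a label for a search term. Lower is better:
// 0 = first word starts with the term
// 1 = a later word starts with the term
// 2 = the term appears somewhere inside the label
// 3 = no match
function rankLabel(label, term) {
  const lower = label.toLowerCase();
  const t = term.toLowerCase();
  const words = lower.split(/\s+/);
  if (words[0].startsWith(t)) return 0;
  if (words.slice(1).some(w => w.startsWith(t))) return 1;
  if (lower.includes(t)) return 2;
  return 3;
}

// Drop non-matches, then sort by tier and alphabetically within a tier.
function sortByRelevance(labels, term) {
  return labels
    .filter(l => rankLabel(l, term) < 3)
    .sort((a, b) => rankLabel(a, term) - rankLabel(b, term) || a.localeCompare(b));
}

const labels = ['Alanya DeFazio', 'Zion Young', 'Marcus Zion'];
console.log(sortByRelevance(labels, 'Zio'));
```

If the query is limited to 20 documents before this step, the best matches may never reach it, so the limit would need to be applied after ranking or the query itself made more selective.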

Does EdgeNGram autocomplete_filter make sense with prefix search?

I have an Elasticsearch index with around 1 million records.
I want to do a multi-field prefix search against 2 fields in the index, Name and ID (there are around 10 fields in total).
Does creating an EdgeNGram autocomplete filter make sense at all?
Or am I missing the point of EdgeNGram?
Here is the code I have for creating the index:
client.indices.create({
index: 'testing',
// type: 'text',
body: {
settings: {
analysis: {
filter: {
autocomplete_filter: {
type: 'edge_ngram',
min_gram: 3,
max_gram: 20
}
},
analyzer: {
autocomplete: {
type: 'custom',
tokenizer: 'standard',
filter: [
'lowercase',
'autocomplete_filter'
]
}
}
}
}
}
},function(err,resp,status) {
if(err) {
console.log(err);
}
else {
console.log("create",resp);
}
});
Code for searching
client.search({
index: 'testing',
type: 'article',
body: {
query: {
multi_match : {
query: "87041",
fields: [ "name", "id" ],
type: "phrase_prefix"
}
}
}
},function (error, response,status) {
if (error){
console.log("search error: "+error)
}
else {
console.log("--- Response ---");
console.log(response);
console.log("--- Hits ---");
response.hits.hits.forEach(function(hit){
console.log(hit);
})
}
});
The search returns the correct results, so my question is: does creating the edge n-gram filter and analyzer make sense in this case?
Or is this prefix functionality provided out of the box?
Thanks a lot for your info
It depends on your use case. Let me explain.
You can use an edge n-gram for this feature. Say your data is london bridge: with a min gram of 1 and a max gram of 20, each word is tokenized into its prefixes, l, lo, lon, and so on.
The advantage is that a search for bridge, or for any other token among the generated n-grams, will also match.
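As a rough illustration of what that tokenization produces, the sketch below generates edge n-grams in plain JavaScript. It is not Elasticsearch's implementation: splitting on whitespace is a simplification of the standard tokenizer, and the function names are hypothetical.

```javascript
// Produce the edge n-grams of one token: its prefixes whose length runs
// from minGram up to maxGram (or the token length, if that is shorter).
function edgeNgrams(token, minGram, maxGram) {
  const grams = [];
  const limit = Math.min(maxGram, token.length);
  for (let len = minGram; len <= limit; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Lowercase the text, split it into words, and expand each word.
function analyze(text, minGram, maxGram) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .flatMap(token => edgeNgrams(token, minGram, maxGram));
}

console.log(analyze('london bridge', 1, 20));
console.log(analyze('london bridge', 3, 20)); // min_gram 3, as in the question
```

With a min gram of 3, as configured in the question, inputs shorter than three characters produce no tokens at all, which is worth keeping in mind for search-as-you-type.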
There is also an out-of-the-box feature, the completion suggester. It uses an FST model to store suggestions; even the documentation says it is faster to search but costlier to build. But the thing is, it is a prefix suggester, meaning that searching bridge will not bring up london bridge by default. There are ways to make this work: the workaround is to supply an array of tokens, here london bridge and bridge.
There is one more, called the context suggester. If you know that you are going to search on name or id, it is a better fit than the completion suggester: the completion suggester works over the whole index, while the context suggester works on a particular subset based on the context.
Since you say it is a prefix search, you can go for the completion suggester. You also mentioned that there are around 10 such fields; if you know up front which field is to be suggested, then you can go for the context suggester.
One nice answer about edge ngram and completion
completion suggester for middle of the words - I used this solution, it works like a charm.
You can refer to the documentation for the other default options available within suggesters.

Checking if a field contains a string, works with string type field but not with number

I am using Loopback node js framework with MongoDB.
Here I am checking if a field contains a given string or not
user.find({
where: {
or: [{
mobile: {
"regexp": '/' + data.search + '/i'
},
contacts:{
"regexp": '/' + data.search + '/i'
}}]
}
}, function(err, mobileResult) {
if (err) {
callback(err, null);
} else {
.......
.......
}
});
This works with the string field contacts but not with the number field mobile.
I tried the answers from these posts, but they didn't work for me.
I think you can use this solution:
Using aggregate for querying your data:
MongoDB aggregation on Loopback
Using this awesome way to search on a number in MongoDB: (Second Answer)
MongoDB Regex Search on Integer Value
I also agree with @Stennie, who commented:
Regular expressions match patterns in strings. If you need to do
pattern matching I'd suggest storing your mobile values as strings
instead of numbers.
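The suggestion to store mobile values as strings can be illustrated with a small in-memory sketch. The regexMatches function mimics, in a simplified way, the fact that a MongoDB regex only matches string values; the sample user and both helper names are hypothetical, and this is not Loopback or MongoDB code.

```javascript
// Simplified stand-in for how a regex query treats a field value:
// only strings can match, numbers never do.
function regexMatches(value, pattern) {
  return typeof value === 'string' && pattern.test(value);
}

// Convert the numeric mobile field to a string, e.g. before saving.
function withStringMobile(user) {
  return { ...user, mobile: String(user.mobile) };
}

const rawUser = { mobile: 9876543210, contacts: 'office 98765' };
const storedUser = withStringMobile(rawUser);
const search = /98765/i;

console.log(regexMatches(rawUser.mobile, search));    // number: no match
console.log(regexMatches(storedUser.mobile, search)); // string: match
```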

Express js Mongoose alternative to MySQL % Wildcard

I've been reading up and tried a few different code snippets that others have had success with, but I can't seem to get it to work.
What I'd like is for users to be able to search using only part of the term, e.g. pe for 'peter'. I'd like to have a wildcard on the search term.
My code so far, which isn't working:
router.get('/:callsign', function(req,res){
var search = req.params.callsign;
var term = escape(search);
term = term.toUpperCase();
if(search=="*" || search==""){
res.redirect("/");
}
User.find({'callsign' : new RegExp('^'+term+'$', "i") }, function(err, callsign){
if(err)
{
console.log('No user found'+err);
req.flash('message','Sorry, something went wrong. Try again.');
res.render('callSearchResults'),{
message: req.flash('message'),
title: 'Sorry, no results'
}
}
if(callsign){
console.log('Callsign:'+callsign+term);
res.render('callSearchResults',{
call: callsign,
title: 'You searched for '+search,
query: term
});
}else{
console.log('No entries found'+search);
}
});
});
Also, the 'callsign' callback argument is always truthy, even when there are no results!
You are using a RegExp for this search. A literal ^ means that the pattern must match at the beginning of the string, and $ that it must match at the end. If you just want to match part of the value, you don't need them, as in the example below:
new RegExp(term, "i")
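The effect of the anchors can be checked in plain JavaScript without a database. The callsign values below are made-up sample data.

```javascript
const callsigns = ['peter', 'PE1ABC', 'hope'];

const exact = new RegExp('^pe$', 'i');  // whole value must be "pe"
const prefix = new RegExp('^pe', 'i');  // value must start with "pe"
const partial = new RegExp('pe', 'i');  // "pe" anywhere in the value

const exactHits = callsigns.filter(c => exact.test(c));
const prefixHits = callsigns.filter(c => prefix.test(c));
const partialHits = callsigns.filter(c => partial.test(c));

console.log(exactHits);   // nothing matches
console.log(prefixHits);  // peter and PE1ABC
console.log(partialHits); // all three, including hope
```

If only values that begin with the term should match, keeping ^ and dropping $ gives a prefix search.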
MongoDB also has a good built-in full-text search mechanism. You can read about it in the official docs.
As for the Mongoose query and the check in the callback: the returned object is an array of documents, and in JavaScript an empty array is truthy. Check the length of the array instead, like this:
if(callsign.length > 0) {
// Logic
} else {
// Nothing found
}

Mongoose find always returning true when it shouldn't

I’m using Mongoose.js to interface with my Mongo database. This function searches through my Location names and should be logging not found to the console instead of found, as I don't have a Location with the name !#£.
Location.find({name: "!#£"}, function(err, obj) {
console.log(obj);
if (!err) {
console.log("found");
} else {
console.log("not found");
}
});
This is what is logging out to my console:
[]
found
The expected behaviour should be for it to log not found to the console. Here's a dump of the Location model data:
[
{
"_id":"5384c421af3de75252522aa2",
"name":"London, UK",
"lat":51.508515,
"lng":-0.12548719999995228,
"__v":0,
"modified":"2014-05-27T16:58:09.546Z",
"search_count":1
},
{
"_id":"5384c766af3de75252522ab4",
"name":"Paris, France",
"lat":48.856614,
"lng":2.3522219000000177,
"__v":0,
"modified":"2014-05-27T17:12:06.990Z",
"search_count":1
},
{
"_id":"53851a213a33fe392b758046",
"name":"Zagreb, Croatia",
"lat":45.8150108,
"lng":15.981919000000062,
"__v":0,
"modified":"2014-05-27T23:05:05.306Z",
"search_count":1
}
]
The callback interface semantics are not what you think.
err means the query failed entirely due to an error like the DB being unreachable. It has no meaning with regard to whether documents matched or not.
obj is an array of results, which you should name "locations" IMHO to keep things clear. If no documents match the query, this array will be empty. If some match, they will be in the array.
So there are 3 total states to consider: error, success with no matches, success with some matches.
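Those three states can be sketched as a small helper that a find callback could delegate to. The name classifyFindResult is hypothetical, and the example calls use stand-in values rather than a real Mongoose query.

```javascript
// Map the (err, locations) pair passed to a find callback onto the three
// states: error, success with no matches, success with some matches.
function classifyFindResult(err, locations) {
  if (err) return 'error';
  if (locations.length === 0) return 'not found';
  return 'found';
}

// Inside the callback this could be used as:
//   Location.find({ name: '...' }, function (err, locations) {
//     console.log(classifyFindResult(err, locations));
//   });
console.log(classifyFindResult(new Error('DB unreachable'), undefined));
console.log(classifyFindResult(null, []));
console.log(classifyFindResult(null, [{ name: 'London, UK' }]));
```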

Resources