I want to create an autocomplete input box that shows word suggestions as users type.
Basically, my problem is that when I use the $text operator for searching strings in a document, the queries will only match on complete stemmed words. This is for the same reason that if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries would match.
find = {$text: { $search: 'blue' } };
^ (doesn't match blueberry or bluebird on a document.)
I want to be able to match 'blueberry' or 'bluebird' with 'blue'. Initially I thought this was possible by using a 'starts with' (^) regular expression, but it seems that $text and $search only accept a string, not a regexp.
I would like to know if there is a way to do this that is not excessively complex to implement/maintain. So far, I've only seen people trying to accomplish this by creating a new collection with the results of running a map/reduce across the collection with the text index.
I do not want to use ElasticSearch or Solr because I think it is overkill for what I am trying to do, and although I sometimes think that eventually I will have no other choice, I still cannot believe that there is not a simpler way to accomplish this.
MongoDB full text search matches whole words only, so it is inherently not suitable for autocomplete.
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
(Source: http://docs.mongodb.org/manual/core/index-text/)
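When only prefix matching is needed, one option that avoids $text entirely is an anchored regular expression on a field with an ordinary index. The following is a minimal sketch, not a definitive implementation: the `word` field and the helper name `buildPrefixFilter` are assumptions for illustration. Case-sensitive anchored patterns can use a regular index, whereas case-insensitive ones generally cannot use it efficiently.

```javascript
// Escape regex metacharacters so user input is treated literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a "starts with" filter for a hypothetical `word` field.
function buildPrefixFilter(term, field = 'word') {
  return { [field]: { $regex: '^' + escapeRegExp(term) } };
}

// Usage (shell or driver): db.words.find(buildPrefixFilter('blue')).limit(10)
const filter = buildPrefixFilter('blue');
```

The filter matches values such as blueberry and bluebird, but not values that merely contain blue somewhere in the middle.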
You can now use Atlas Search natively in MongoDB Atlas to achieve this. You first have to add the autocomplete field mapping to your index definition before you can use the autocomplete operator in your query. This can be done through the Visual Editor or the JSON editor; there's a tutorial which walks you through how to implement it.
Here's the index definition template from the docs:
{
"mappings": {
"dynamic": true|false,
"fields": {
"<field-name>": [
{
"type": "autocomplete",
"analyzer": "lucene.standard",
"tokenization": "edgeGram|rightEdgeGram|nGram",
"minGrams": <2>,
"maxGrams": <15>,
"foldDiacritics": true|false
}
]
}
}
}
And the query, where you can also specify support for typo-tolerance via the fuzzy parameter:
{
$search: {
"index": "<index name>", // optional, defaults to "default"
"autocomplete": {
"query": "<search-string>",
"path": "<field-to-search>",
"tokenOrder": "any|sequential",
"fuzzy": <options>,
"score": <options>
}
}
}
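As a concrete instance of the template above, here is a sketch of a helper that builds the aggregation pipeline. The helper name, the `limit` option, and the choice of `maxEdits: 1` are assumptions for illustration; the field passed as `path` must already be mapped as type autocomplete in the index.

```javascript
// Build an aggregation pipeline around the autocomplete operator.
// `indexName`, `path`, and `limit` are placeholders for illustration.
function buildAutocompletePipeline(query, path, { indexName = 'default', limit = 10 } = {}) {
  return [
    {
      $search: {
        index: indexName,
        autocomplete: {
          query,
          path,
          tokenOrder: 'sequential',
          fuzzy: { maxEdits: 1 }, // tolerate a single typo
        },
      },
    },
    { $limit: limit },
    { $project: { _id: 0, [path]: 1 } },
  ];
}

// Usage with a driver: collection.aggregate(buildAutocompletePipeline('blue', 'name'))
const pipeline = buildAutocompletePipeline('blue', 'name');
```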
For the application we are developing we need to allow our searches to support accents, be case insensitive and search for partial words. For example, given the product name "La Niña" in our collection, the following searches should be expected to return the entry:
La Niña
niña
nina
nin
La nin
Currently I have tried two approaches, each with its apparent limitations, based on testing and some research:
Regex
supports case insensitive and partial searches
does not support accents such that, niña != nina
Text Search
supports case insensitive, accents and partial phrases
does not support partial words
Example regex search, as we have used:
function escapeRegExp(text) {
  // Escape regex metacharacters so the search term is matched literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const escapedStr = escapeRegExp(searchTerm);
await Product.find({ name: new RegExp(escapedStr, 'i') });
Example text search, as we have used:
// On the schema
storeSchema.index({ name: 'text' });
// Searching:
await Product.find({ $text: { $search: searchTerm } })
.collation({locale: 'en', strength: 1});
BTW We have set the schemas in question to use collation strength level 1.
Some approaches I am considering, if MongoDB doesn't provide a solution:
shadow name field (not sure the right term?), with the accents removed
a separate full text search engine
Can anyone help here?
Note, we are leveraging mongoose 5.9.5, with node 12.16.2 and mongodb 4.3.8 running in mongo cloud.
I believe Text Search is what you need. There are two other features of Text Search that fulfill the requirement of a partial word match you described in the question.
Stop Words: Given a language option, MongoDB Text Search is capable of identifying words that shouldn't influence search results. The frequency of usage of these words is such that they appear in almost every sentence, for example, in English, words like "the", "a", "of", are all stop words. These words are stripped off the search phrase before the actual search takes place.
Word Stemming: Given a language option, MongoDB Text Search is capable of identifying the root version of a word. For example, in English, the stem version of "identifying" would be "identify", so they would both match in a text search.
I was able to figure out with Google Translate that the "La Niña" example you gave is in Spanish.
If I insert the following into a sample product collection:
db.products.insertMany([
{ "term" : "La Niña" },
{ "term" : "niña" },
{ "term" : "nina" },
{ "term" : "nin" },
{ "term" : "La nin" },
])
By specifying a language option of "spanish" on my Text Search query:
db.products.find({ $text: { $search: "La Niña", $language: "spanish" } })
MongoDB would effectively match that with all the products that were previously inserted. You can get a list of the supported language options for MongoDB here.
I'm not 100% sure of how the accent matching works though.
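The "shadow field" idea from the question can be sketched as follows. This is an illustration under stated assumptions, not something MongoDB provides out of the box: the field name `nameFolded` and both helper names are hypothetical, and the application would have to store `foldText(name)` in that field whenever a product is saved.

```javascript
// Remove diacritics and lowercase: "La Niña" -> "la nina".
function foldText(text) {
  return text
    .normalize('NFD')                 // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();
}

// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Filter against a hypothetical `nameFolded` field that stores foldText(name).
function buildFoldedFilter(searchTerm) {
  return { nameFolded: { $regex: escapeRegExp(foldText(searchTerm)) } };
}

// Usage: Product.find(buildFoldedFilter('niña'))
const stored = foldText('La Niña');
```

Because both the stored value and the search term go through the same folding, the partial, accented, and unaccented searches listed in the question all reduce to a plain substring match.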
I'm trying to find a way to return a document based on whether or not a field is a substring of a given string.
I got a prototype working that basically fetches everything from the collection and then does the needed logic in code. In code I can find what I want by iterating over every document and then returning a document based on search.includes(field). This is obviously not an ideal solution as fetching every document in a collection is an expensive operation that won't scale well.
Next, I looked at text search using MongoDB indexes. This kind of works, but it returns documents even if the field isn't a complete substring of the search.
Is there any way I can construct a query that checks if a field on a document is an exact substring of a given string?
As an example, here are three documents similar to those in my collection:
{
"_id": ObjectId("5b893f36e7e6ab1a88f87b39"),
"trigger": "hello",
"response": "World"
}
{
"_id": ObjectId("5b6ca6169cc009573bbc3571"),
"trigger": "stackoverflow",
"response": "Is awesome!"
}
{
"_id": ObjectId("5b6ca6169cc009573bbc3571"),
"trigger": "foo bar",
"response": "barfoo"
}
These are some cases with the output I expect:
The search strings stack or stackexchange should not return any documents as there is no trigger field which is a perfect substring of those.
The string hello stackexchange should get you only the first document as the trigger field is a substring of the search string.
The string hello stackoverflow would get you both documents as they both have a trigger field which is a substring of the search string.
EDIT: The query also has to deal with the fact that the trigger field may contain spaces. So the string foo bar foobar should match the last document but the string foo should not.
Any help is much appreciated!
After quite a bit of trial and error, I've found a way to achieve what I wanted. By using $indexOfBytes in a $gt, I was able to check if trigger existed as a substring in the search string by seeing if the result of $indexOfBytes was greater than -1. Here is my final Mongoose query:
Collection.find({
$expr: {
$gt: [
{
$indexOfBytes: [
search,
"$trigger"
]
},
-1
]
}
});
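To make the logic easier to check against the cases listed in the question, here is a sketch that wraps the filter in a helper and mirrors the same test in plain JavaScript. Both function names are hypothetical, and the local version uses `String.prototype.indexOf` only as a stand-in for `$indexOfBytes`.

```javascript
// Query filter: true when `trigger` occurs somewhere inside `search`.
function buildTriggerFilter(search) {
  return { $expr: { $gt: [{ $indexOfBytes: [search, '$trigger'] }, -1] } };
}

// Local stand-in for the same check, useful for reasoning about expected results.
function matchingTriggers(docs, search) {
  return docs
    .filter((doc) => search.indexOf(doc.trigger) > -1)
    .map((doc) => doc.trigger);
}

const docs = [
  { trigger: 'hello' },
  { trigger: 'stackoverflow' },
  { trigger: 'foo bar' },
];

// Usage: Collection.find(buildTriggerFilter('hello stackoverflow'))
const matched = matchingTriggers(docs, 'hello stackoverflow');
```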
I have a collection like this:
{
"_id" : ObjectId("5a7c49b02d2bbb28a4b2e6a2"),
"phone" : "Pinheiro",
"email" : "Pinheiro",
"variableParameters" : {
"loremIpsum" : "Do you see a little Asian child with a blank expression on his face sitting outside on a mechanical helicopter that shakes when you put quarters in it?",
"uf" : "Rio de Janeiro",
"city" : "Rio de Janeiro",
"end" : "RUA JARDIM BOTÂNICO 1060",
"tel" : "5521999999999",
"eml" : "teste#gmail.com",
"nome" : "Usuario de Teste"
}
}
I want to query the "variableParameters" object but, as the name says, its properties are variable. So in some cases it will have "uf", but in other cases it won't.
Currently I'm running a query that only matches a fixed field from the Mongoose schema:
{ 'phone': { $regex: filter, $options: 'i' } }
Is there any way that I can query "variableParameters" without knowing its child properties?
If you are unsure about the keys (since they are variable), then try using $text search.
To use text search, we need a text index on variableParameters.
Case-sensitive text search can also be performed, but it comes at a cost to performance.
Please read https://docs.mongodb.com/manual/reference/operator/query/text/ for more information on text search
[SOLVED]
Thanks @Clement Amarnath for the help.
The solution is something like this:
_Events.find({ $text: { $search: 'searchText' } }, (err, events) => {
if (err) return Exceptions.HandleApiException(err, res);
res.send(events);
});
The $text parameter can have these properties:
{
$text: {
$search: <string>,
$language: <string>,
$caseSensitive: <boolean>,
$diacriticSensitive: <boolean>
}
}
$search A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase. See Behavior for more information on the field.
$language Optional. The language that determines the list of stop words for the search and the rules for the stemmer and tokenizer. If not specified, the search uses the default language of the index. For supported languages, see Text Search Languages.
If you specify a language value of "none", then the text search uses simple tokenization with no list of stop words and no stemming.
$caseSensitive Optional. A boolean flag to enable or disable case sensitive search. Defaults to false; i.e. the search defers to the case insensitivity of the text index.
$diacriticSensitive Optional. A boolean flag to enable or disable diacritic sensitive search against version 3 text indexes. Defaults to false; i.e. the search defers to the diacritic insensitivity of the text index.
For more information, see MongoDB Documentation.
You can use $where to build your own matching function.
An example that matches when any value equals the whole search string exactly would be:
db.col.find({$where: function(){
return Object.values(this.variableParameters).includes("Rio de Janeiro")
}})
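To mirror the case-insensitive partial match from the question rather than an exact value match, the function body could test every string value. Below is a sketch of that predicate in plain JavaScript; the helper name `anyValueContains` is hypothetical. Note that $where evaluates JavaScript for each document and cannot use indexes, so it is likely to be slow on large collections.

```javascript
// True when any string value in `params` contains `needle`, ignoring case.
function anyValueContains(params, needle) {
  const lowered = needle.toLowerCase();
  return Object.values(params || {}).some(
    (value) => typeof value === 'string' && value.toLowerCase().indexOf(lowered) !== -1
  );
}

// Inside $where the same logic would be inlined, for example:
// db.col.find({ $where: function () {
//   return Object.values(this.variableParameters || {}).some(function (v) {
//     return typeof v === 'string' && v.toLowerCase().indexOf('rio') !== -1;
//   });
// }})
const found = anyValueContains({ uf: 'Rio de Janeiro', tel: '5521999999999' }, 'rio');
```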
I have documents indexed in an Elasticsearch cluster with the mapping below. Basically, I have a field named model which holds car model names like "Silverado 2500HD", "Silverado 1500HD", "LX 350", etc.
POST /location-test-no-boost {
"settings":{
"analysis":{
"analyzer":{
"mysynonym":{
"tokenizer":"standard",
"filter":[
"standard","lowercase","stop","mysynonym"
],
"ignore_case":true
}
},
"filter":{
"mysynonym":{
"type":"synonym",
"synonyms": [
"2500 HD=>2500HD",
"chevy silverado=>Silverado"
]
}
}
}
},
"mappings":{
"vehicles":{
"properties":{
"id":{
"type":"long",
"ignore_malformed":true
},
"model":{
"type":"String",
"index_analyzer": "standard",
"search_analyzer":"mysynonym"
}
}
}
}
}
The sample document content is
POST /location-test-no-boost/vehicles/10
{
"model" : "Silverado 2500HD"
}
When I search with the query string "Chevy silverado", the synonym maps correctly to Silverado and returns the result. In contrast, when I search with the query string "2500 HD", it returns 0 results. I tried different synonym combinations involving numbers and found that the Elasticsearch synonym mapper does not seem to support numbers. Is this correct?
Is there any way to set up a mapping so that when a user searches for "2500 HD", the query is mapped to "2500HD"?
Ok here's your problem:
You are trying to define a filter that merges "2500 HD" into "2500HD" at search time.
But the analyzer works like this:
Run the char_filter first (if any).
Then run the tokenizer, which is standard in your definition, so "2500 HD" is split into two terms: 2500 and HD.
Run the filters after that, which transform the terms into 2500 and hd. Your synonym rules are ignored because none of them matches the terms passed to the filter.
So when you query for "2500 HD", you actually search for 2500 and hd. None of the documents match, since the indexed term is 2500hd.
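The mismatch described in these steps can be illustrated with a rough simulation. This is not Elasticsearch code: `tokenize` is a deliberately simplified stand-in for the standard tokenizer, and `analyze` only applies lowercasing afterwards.

```javascript
// Very rough stand-in for the standard tokenizer: split on non-alphanumerics.
function tokenize(text) {
  return text.split(/[^A-Za-z0-9]+/).filter(Boolean);
}

// Tokenize first, then lowercase, in the order the steps above describe.
function analyze(text) {
  return tokenize(text).map((token) => token.toLowerCase());
}

const indexed = analyze('Silverado 2500HD'); // terms stored for the document
const queried = analyze('2500 HD');          // terms produced from the query
const overlap = queried.filter((term) => indexed.includes(term));
```

Here `indexed` contains silverado and 2500hd, `queried` contains 2500 and hd, and `overlap` is empty, which is why the search returns nothing.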
I suggest replacing your synonyms with a word_delimiter filter, something like this:
"filter":{
"my_delimiter":{
"type":"word_delimiter",
"preserve_original": true
}
}
It will transform 2500HD in your document into 2500hd, 2500, hd. Hence it will match the query "2500 HD", which is transformed into 2500, hd. Please refer to the documentation for more options.
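The effect can be imitated in a few lines to see why the terms now line up. This is a rough simulation under stated assumptions, not the real filter: it only splits at letter/digit boundaries and keeps the original token, whereas the actual word_delimiter filter has many more rules and options.

```javascript
// Rough imitation of word_delimiter with preserve_original: keep the token
// and also emit the parts on either side of letter/digit boundaries.
function splitLetterDigit(token) {
  const parts = token.split(/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)/);
  return parts.length > 1 ? [token, ...parts] : [token];
}

// Split on whitespace, apply the delimiter imitation, then lowercase.
function analyzeWithDelimiter(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .flatMap(splitLetterDigit)
    .map((token) => token.toLowerCase());
}

const indexedTerms = analyzeWithDelimiter('Silverado 2500HD');
const queryTerms = analyzeWithDelimiter('2500 HD');
const allFound = queryTerms.every((term) => indexedTerms.includes(term));
```

With the parts indexed alongside the original token, every term produced from the query "2500 HD" is present among the document's terms.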
You don't need to define a synonym filter like that. If you really want the transformation in your current definitions, define another tokenizer instead of using the standard tokenizer.
P/S: You can install inquisitor plugin to see how terms will be analyzed: https://github.com/polyfractal/elasticsearch-inquisitor
How do I get elastic search to work to solve a simple autocomplete use case that has multiple words?
Let's say I have a document with the following title - Elastic search is a great search tool built on top of lucene.
So if I use the prefix query and construct it with the form -
{
"prefix" : { "title" : "Elas" }
}
It will return that document in the result set.
However if I do a prefix search for
{
"prefix" : { "title" : "Elastic sea" }
}
I get no results.
What sort of query do I need to construct to present that result to the user for a simple autocomplete use case?
A prefix query made on Elastic sea would match a term like Elastic search in the index, but that term doesn't appear in your index if you tokenize on whitespace. What you have is elastic and search as two different tokens. Have a look at the analyze API to find out how you are actually indexing your text.
Using a boolean query like in your answer you wouldn't take into account the position of the terms. You would get as a result the following document for example:
Elastic model is a framework to store your Moose object and search
through them.
For autocomplete purposes you might want to make a phrase query and use the last term as a prefix. That's available out of the box using the phrase_prefix type in a match query, which was made available exactly for your use case:
{
"match" : {
"message" : {
"query" : "elastic sea",
"type" : "phrase_prefix"
}
}
}
With this query your example document would match but mine wouldn't since elastic is not close to search there.
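The difference between the two documents can be checked with a simplified model of phrase-prefix matching. This is only an illustration of the semantics, not how Elasticsearch implements it (it ignores options such as slop and max_expansions), and the function names are hypothetical.

```javascript
// Lowercase and split on non-alphanumerics as a crude analyzer.
function tokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// True when all but the last query term appear consecutively in the text
// and the token that follows them starts with the last query term.
function matchesPhrasePrefix(text, query) {
  const doc = tokens(text);
  const q = tokens(query);
  if (q.length === 0) return false;
  for (let start = 0; start + q.length <= doc.length; start++) {
    const headMatches = q.slice(0, -1).every((term, i) => doc[start + i] === term);
    const lastMatches = doc[start + q.length - 1].startsWith(q[q.length - 1]);
    if (headMatches && lastMatches) return true;
  }
  return false;
}

const first = 'Elastic search is a great search tool built on top of lucene';
const second = 'Elastic model is a framework to store your Moose object and search through them.';
const results = [matchesPhrasePrefix(first, 'elastic sea'), matchesPhrasePrefix(second, 'elastic sea')];
```

Under this model the first title matches and the second does not, because in the second one elastic is not immediately followed by a token starting with sea.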
To achieve that result, you will need to use a Boolean query. The partial word needs to be a prefix query and the complete word or phrase needs to be in a match clause. Other clauses such as must and should can be applied to the query as needed.
{
"query": {
"bool": {
"must": [
{
"prefix": {
"name": "sea"
}
},
{
"match": {
"name": "elastic"
}
}
]
}
}
}
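For an autocomplete box, a query of this shape would have to be built from whatever the user has typed so far. Here is a sketch of one way to do that; the helper name and the default `name` field are assumptions for illustration. Since prefix queries are not analyzed, the caller may need to normalize the case of the partial word to match the indexed tokens.

```javascript
// Turn raw user input into a bool query: full words become a match clause,
// the trailing partial word becomes a prefix clause.
function buildAutocompleteQuery(input, field = 'name') {
  const words = input.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return { query: { match_all: {} } };
  const partial = words.pop();
  const must = [{ prefix: { [field]: partial } }];
  if (words.length > 0) {
    must.push({ match: { [field]: words.join(' ') } });
  }
  return { query: { bool: { must } } };
}

const body = buildAutocompleteQuery('elastic sea');
```

For the input elastic sea this produces the same structure as the query above. As with that query, term positions are not taken into account.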