I am working on a search feature that has to search across about 300,000 documents.
For this I have created a compound text index over four fields and assigned weights to them. By default, if I search for a phrase with multiple words, MongoDB combines the keywords with an OR operation. E.g., if you search for small cell lung, it returns every document in which at least one of these words appears.
This is very fast.
But my requirement is to perform an AND operation. To achieve this I split the phrase into words and put each word in double quotes, so that when a user searches for a multi-word phrase, the search is performed as an AND over the words. E.g., if you search for small cell lung ("small" "cell" "lung"), it should find only those documents in which all three words appear. This also works, but it is now very slow.
Is there any way to make it faster?
I will share the code if required.
Thanks
Give this a shot:
db.table.find({ $text: { $search: "\"small\" \"cell\" \"lung\"" } })
The code above does the following:
If the search string includes phrases, the search performs an AND with any other terms in the search string; e.g. a search for "\"kiss me on\" cheeks lips" searches for "kiss me on" and ("cheeks" or "lips").
You can read it from here:
Docs
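As a rough sketch of the approach described in the question (quoting every word so that each one is required), the search string could be built from user input like this. The helper name toAndSearchString is made up for illustration, and db.table is just the placeholder collection used above.

```javascript
// Turn free-text input into a $text search string in which every word
// is wrapped in double quotes, so that each word is required.
// The function name is hypothetical, not part of any MongoDB API.
function toAndSearchString(input) {
  return input
    .split(/\s+/)                           // split on whitespace
    .map((word) => word.replace(/"/g, "")) // drop quotes typed by the user
    .filter((word) => word.length > 0)      // skip empty pieces
    .map((word) => `"${word}"`)             // quote each remaining word
    .join(" ");
}

const search = toAndSearchString("small cell lung");
console.log(search); // "small" "cell" "lung"

// Assumed usage in the mongo shell or the Node.js driver:
// db.table.find({ $text: { $search: search } })
```

Building the string in one place also means stray quotes in user input are stripped before they can break the phrase syntax.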
When you want to match the complete phrase, enclose the phrase in escaped double quotes:
db.table.find( { $text: { $search: "\"small cell lung\"" } } )
Also see the $text documentation on the MongoDB site.
I currently use Typesense to search in an HTML database. When I search for a term, I would like to retrieve N characters before and N characters after the term found in search.
For example, I search for "query" and this is the sentence that matches:
Let's repeat the query we made earlier with a group_by parameter
I would like to easily retrieve a fixed number of letters (or words) before and after the term, without breaking any words, so that I can show it in the presumably small area where the search results are displayed.
For this particular example, I would be showing:
..repeat the query we made earlier..
Is there a feature like this in Typesense?
I have checked Typesense's documentation, without any luck.
The feature you're referring to is called snippets/highlights and it's enabled by default. You can control how many words are returned on either side of the matched text using the highlight_affix_num_tokens search parameter, documented under the table here: https://typesense.org/docs/0.23.1/api/search.html#results-parameters
highlight_affix_num_tokens
The number of tokens that should surround the highlighted text on each side. This controls the length of the snippet.
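A minimal sketch of search parameters that set this option is shown below. Only q, query_by and highlight_affix_num_tokens are Typesense search parameters; the helper name, the field name "content" and the collection name "pages" are assumptions made for the example.

```javascript
// Build search parameters that ask Typesense for a short snippet around
// each match. The field name "content" is a placeholder for whichever
// field holds the indexed text.
function buildSnippetSearch(term, tokensPerSide) {
  return {
    q: term,
    query_by: "content",
    // Number of tokens kept on each side of the highlighted match.
    highlight_affix_num_tokens: tokensPerSide,
  };
}

const snippetParams = buildSnippetSearch("query", 2);
console.log(snippetParams);

// Assumed usage with the typesense-js client and a hypothetical
// collection called "pages":
// const results = await client.collections("pages").documents().search(snippetParams);
```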
If a user searches for "John Banglore" or "Banglore John", they should get John from Banglore in the first position of the results, followed by other related results.
How can I do this?
UserModel.js: { name: String, city: String }
I have tried splitting the query string, converting the parts to regexes and searching with those, but it does not return John from Banglore in the first position.
Backend - Node.js
Database - MongoDB
Module - mongoose
For your specific use case, this could be solved with a simple approach:
split the search string on the space character and build a set of regexes from the resulting words.
That solves this case, but you will likely run into other cases where the fuzzy matching capabilities are limited. It may be better to look into an inverted index such as Elasticsearch for enhanced search capabilities.
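The split-and-regex idea could look roughly like the sketch below. It assumes that ranking is done in application code after the query (counting how many of the words each user matches); the helper names and the sample data are invented for illustration.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a MongoDB filter that matches any of the words in name or city.
function buildUserFilter(searchTerm) {
  const words = searchTerm.trim().split(/\s+/).filter(Boolean);
  const conditions = [];
  for (const word of words) {
    const pattern = new RegExp(escapeRegex(word), "i");
    conditions.push({ name: pattern }, { city: pattern });
  }
  // $or must not be empty, so fall back to an empty filter.
  return { words, filter: conditions.length ? { $or: conditions } : {} };
}

// Return a sorted copy in which users matching more words come first.
function rankUsers(users, words) {
  const score = (user) =>
    words.filter((word) => {
      const pattern = new RegExp(escapeRegex(word), "i");
      return pattern.test(user.name) || pattern.test(user.city);
    }).length;
  return [...users].sort((a, b) => score(b) - score(a));
}

// Sample data, invented for this example.
const sampleUsers = [
  { name: "John", city: "Delhi" },
  { name: "Ravi", city: "Banglore" },
  { name: "John", city: "Banglore" },
];
const { words, filter } = buildUserFilter("Banglore John");
const ranked = rankUsers(sampleUsers, words);
console.log(ranked[0]); // { name: 'John', city: 'Banglore' }

// Assumed mongoose usage:
// const users = await UserModel.find(filter).lean();
// const sorted = rankUsers(users, words);
```

Because the word order is ignored when scoring, "John Banglore" and "Banglore John" produce the same ranking.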
const searchedWord = req.body.searchTerm;
console.log(searchedWord);
db.collection('subtitle')
  .find({
    $text: { $search: searchedWord },
  })
Here is my code. It takes a search word coming from the user, searches through all documents and returns the results. The problem is that it is case sensitive, and it returns all documents containing related forms of the word: if you search for "happen", documents with "happened" and "happens" are returned as well.
I just want the search to be case insensitive and to match the exact word.
I tried a regex, but it does not work when the input is dynamic like this.
All the MongoDB documentation I found uses a hardcoded search word.
MongoDB Text Indexes use language-specific stemming rules.
When using English, suffixes are removed and the stem word is indexed, so "happens", "happened", and "happening" are all stored in the index as "happen".
To disable stemming, explicitly specify the language as "none".
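A minimal sketch of this, assuming the searchable text lives in a field called "text" (the field name is a guess, since the question does not show the schema). The helpers only build the arguments; default_language and $caseSensitive are existing MongoDB options, and $caseSensitive already defaults to false.

```javascript
// Arguments for createIndex: a text index on one field with stemming
// turned off. The field name passed in is a placeholder.
function textIndexWithoutStemming(field) {
  return {
    keys: { [field]: "text" },
    options: { default_language: "none" },
  };
}

// A $text filter built from a dynamic, user-supplied word.
function exactWordFilter(word) {
  return { $text: { $search: word, $caseSensitive: false } };
}

const index = textIndexWithoutStemming("text");
console.log(index.options); // { default_language: 'none' }
console.log(exactWordFilter("happen"));

// Assumed usage with the Node.js driver:
// await db.collection("subtitle").createIndex(index.keys, index.options);
// await db.collection("subtitle").find(exactWordFilter("happen")).toArray();
```

A collection can have only one text index, so an existing one would need to be dropped before creating the new index.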
I'm using Solr for an enterprise application. So far it works well, as I am using an ngram field to search against. It works correctly for partial queries (matching against indexed ngrams). The problem is how to enforce exact query matches. For example, the query "Test 1" should match exactly that text when the user enters it with double quotation marks. Because of the tokenizers and filters I use, the double quotation marks currently get filtered out and there is no difference between the queries "test 1", "tEst 1" and "TEST 1" (that is due to my analyzer chain, which is needed for ngrams and partial search).
Currently I'm searching against an ngram query field. What should I do to enforce an exact query match, and what is the best practice? My current idea is to detect the double quotation marks on the client side and switch the query field to the original field (without ngrams). But I feel there should be a better way of doing this, since the problem is generic and Solr is a complete enterprise-level search engine.
You can add a second field with string as its fieldType and index the same content into it.
When you want to perform an exact match, query that field.
When you want to perform a partial search, query the earlier field that is indexed with ngrams.
Or, here is another way you can try.
You have defined the current field type using ngrams. In that field type, you can use the ngram tokenizer in the index analyzer, and only a keyword tokenizer (KeywordTokenizerFactory) and a lowercase filter (LowerCaseFilterFactory) in the query analyzer.
The text will then be tokenized into ngrams while indexing, but not while querying.
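To illustrate the first option (two fields, picking one per query), here is a small client-side sketch. The field names title_exact and title_ngram are hypothetical, as is the helper itself; q and df are standard Solr request parameters.

```javascript
// Escape characters that have special meaning in the Solr query syntax.
function escapeSolr(text) {
  return text.replace(/[+\-&|!(){}\[\]^"~*?:\\\/]/g, "\\$&");
}

// Choose the field to search based on whether the user wrapped the
// input in double quotes.
// "title_exact" (a string field) and "title_ngram" (the ngram field)
// are hypothetical field names.
function buildSolrParams(userInput) {
  const trimmed = userInput.trim();
  const isQuoted =
    trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"');
  if (isQuoted) {
    const phrase = escapeSolr(trimmed.slice(1, -1));
    return { df: "title_exact", q: `"${phrase}"` };
  }
  return { df: "title_ngram", q: escapeSolr(trimmed) };
}

console.log(buildSolrParams('"Test 1"')); // exact match against title_exact
console.log(buildSolrParams("test 1"));   // partial match against title_ngram
```

Note that a plain string field matches the whole field value and is case sensitive, so "Test 1" and "test 1" would be different values there.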
I have stemming enabled in my Solr instance. I had assumed that, in order to perform an exact word search without disabling stemming, it would be as simple as putting the word in quotes. However, this does not appear to be the case.
Is there a simple way to achieve this?
There is a simple way, if what you're referring to is the "slop" (required similarity) as part of a fuzzy search (see the Lucene Query Syntax here).
For example, if I perform this search:
q=field_name:determine
I see results that contain "determine", "determining", "determined", etc. If I then modify the query like so:
q=field_name:determine~1
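I only see results that contain the word "determine". This is because I'm specifying a required similarity of 1, which means "exact match". I can specify this value anywhere from 0 to 1. Note that this applies to older Lucene versions; from Lucene 4.0 onward, the number after ~ is a maximum edit distance (0, 1 or 2) rather than a similarity.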
Another thing you can do is index the same text without stemming in one field and with stemming in another. Boost the non-stemmed field, and exact versions of words should be preferred over stemmed versions. Of course, you could also write your own query parser that directs quoted phrases to the non-stemmed field only.
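One way the boosting could be expressed is with the eDisMax query parser and its qf parameter, sketched below. The field names body_exact and body_stemmed and the boost value 3 are assumptions for the example; defType, q and qf are existing Solr parameters.

```javascript
// Build request parameters that search both fields but weight the
// non-stemmed field higher.
// "body_exact" (no stemming) and "body_stemmed" are hypothetical fields,
// and the boost factor 3 is an arbitrary starting point.
function buildBoostedParams(userQuery) {
  return new URLSearchParams({
    defType: "edismax",
    q: userQuery,
    qf: "body_exact^3 body_stemmed",
  });
}

const boostedParams = buildBoostedParams("determine");
console.log(boostedParams.toString());

// Assumed usage against a hypothetical core named "docs":
// fetch(`http://localhost:8983/solr/docs/select?${boostedParams}`)
```

Documents containing the exact word then match both fields and score higher, while stemmed variants still appear further down the results.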