How can I search by part of a word with dismax?
For example, when my query is "wor", I want to get results whose field values contain "word", "world", "adwords", etc.
Is it possible?
Check out the EdgeNGramFilterFactory filter:
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25" side="front"/>
EdgeNGramFilterFactory generates edge n-grams for each token, e.g. with minGramSize="3"
word would generate -> wor, word (a minGramSize of 2 would also produce wo)
You can use this at index time to generate the tokens.
So when you search for wor, documents containing word or world will match.
However, if you want to match in the middle of words (e.g. wor inside adwords), check out NGramFilterFactory.
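To make the difference between the two filters concrete, here is a minimal Python sketch of the grams each one would produce for a token. The function names are hypothetical and this is not Solr code; the order of the grams is illustrative only.

```python
def edge_ngrams(token, min_gram=3, max_gram=25):
    """Front edge n-grams: prefixes of `token` from min_gram to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def ngrams(token, min_gram=3, max_gram=25):
    """All substrings of `token` from min_gram to max_gram characters."""
    grams = []
    for n in range(min_gram, min(len(token), max_gram) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


print(edge_ngrams("word"))               # ['wor', 'word']
print("wor" in edge_ngrams("adwords"))   # False: "wor" is not a prefix of "adwords"
print("wor" in ngrams("adwords"))        # True: "wor" occurs in the middle
```

Under these assumptions, a query for wor hits word and world through edge n-grams, while adwords only matches once full n-grams are indexed.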
Hello guys,
How can I count the number of highlighted words in an Algolia search, including duplicate words in the same hit?
Algolia only tracks the number of records that match, not the number of highlighted words. You'd have to traverse the hits object yourself to calculate that information based on the <em> tags in each _highlightResult.
What are you using it for?
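The traversal described above could look roughly like this in Python. It is a sketch that assumes the default <em> highlight tag and a hit already parsed into dicts and lists; count_highlights is a hypothetical helper, not part of any Algolia client. Note that it counts highlighted segments, so a partially highlighted word still counts once.

```python
def count_highlights(node, pre_tag="<em>"):
    """Recursively count occurrences of `pre_tag` in every "value" string."""
    if isinstance(node, dict):
        total = 0
        for key, child in node.items():
            if key == "value" and isinstance(child, str):
                total += child.count(pre_tag)
            else:
                total += count_highlights(child, pre_tag)
        return total
    if isinstance(node, list):
        return sum(count_highlights(child, pre_tag) for child in node)
    return 0  # plain strings, numbers, etc. outside a "value" key


# Hypothetical hit, shaped like a search response entry.
hit = {
    "objectID": "1",
    "_highlightResult": {
        "title": {
            "value": "<em>blue</em> sky, <em>blue</em> sea",
            "matchLevel": "full",
            "matchedWords": ["blue"],
        },
        "tags": [
            {"value": "<em>blue</em>", "matchLevel": "full", "matchedWords": ["blue"]},
            {"value": "green", "matchLevel": "none", "matchedWords": []},
        ],
    },
}

print(count_highlights(hit["_highlightResult"]))  # 3
```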
Situation
I need to create a live search with MongoDB, but I don't know which index is better to use: a normal index or a text index. Yesterday I looked up the main differences between them. I have the following document:
{
title: 'What vitamins are found in blueberries'
//other fields
}
So, when the user enters blue, the system must find this document (... blueberries).
Problem
I found these differences in the article about them:
A text index on the other hand will tokenize and stem the content of the field. So it will break the string into individual words or tokens, and will further reduce them to their stems so that variants of the same word will match ("talk" matching "talks", "talked" and "talking" for example, as "talk" is a stem of all three).
So, why is a text index, and its subsequent searches, faster than a regex on a non-indexed text field? It's because text indexes work as a dictionary, a clever one that's capable of discarding words on a per-language basis (defaults to English). When you run a text search query, you run it against the dictionary, saving yourself the time that would otherwise be spent iterating over the whole collection.
That's what I need, but:
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
Question
I need a fast, clever dictionary, but I also need to search by substring. How can I combine these two methods?
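To make the mismatch concrete, here is a toy Python model of the two behaviours. It does not call MongoDB; naive_stem is a deliberately crude, hypothetical stand-in for a real stemmer, and both matching functions are assumptions for illustration only.

```python
import re


def naive_stem(word):
    """Very rough plural stripping, only good enough for this example."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def text_style_match(field, term):
    """Match only when the stemmed term equals a complete stemmed word."""
    stems = {naive_stem(w) for w in re.findall(r"\w+", field)}
    return naive_stem(term) in stems


def substring_match(field, term):
    """Match when the term occurs anywhere in the field, like an unanchored regex."""
    return re.search(re.escape(term), field, re.IGNORECASE) is not None


title = "What vitamins are found in blueberries"
print(text_style_match(title, "blue"))       # False
print(text_style_match(title, "blueberry"))  # True
print(substring_match(title, "blue"))        # True
```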
In Azure Search, is there a way to get exact-match results for multiple words?
If I search for "Coca Cola Millenials", can I get only the results that match the whole phrase "Coca Cola Millenials"?
Are you asking if you can search for the phrase "Coca Cola Millenials"? Yes, you can. Surround the phrase with quotes as you did in this question.
From our documentation:
The phrase operator encloses a phrase in quotation marks. For example,
while Roach Motel (without quotes) would search for documents
containing Roach and/or Motel anywhere in any order, "Roach Motel"
(with quotes) will only match documents that contains that whole
phrase together and in that order (text analysis still applies).
Hope that helps
I have 4842 documents with a sample format
{"ID":"12345","NAME":"name_value","KIND":"kind_value",...,"Secondary":{...},"Tertiary":{...}} where “...” are a few more varying number of key value pairs per object
I indexed KIND with a fulltext index using db.collection.ensureFulltextIndex("KIND") before inserting data. Also, KIND is just a one-word string, i.e. without spaces.
The following queries were executed via AQL:
FOR doc IN FULLTEXT(collection, 'KIND', 'DeploymentFile') RETURN doc --> takes 3.54s (avg)
FOR doc IN collection FILTER doc.KIND == 'DeploymentFile' RETURN doc --> takes 1.16s (avg)
2944 Objects returned in both queries
Q1. Assuming that we have used a fulltext index and I haven't hash-indexed KIND, shouldn't the query using the FULLTEXT function be faster than the normal == comparison (since == doesn't use the fulltext index)? If so, what am I doing wrong here?
Q2. Using the fulltext index, can I perform a query that does a CONTAINS or LIKE on a string?
---UPDATE Q2.The requirement is searching for a substring within a parent string (which is only one word). The substring can lie anywhere within the parent string. (SQL equivalent of LIKE '%text%')
Q1: The fulltext index allows for more complex queries. It splits the text at word breaks and checks whether a word occurs within a larger text. None of these features is needed in your example, so the index generates more overhead than it saves.
In your example it would be better to create a skip-list or hash-index and search for equality.
Q2: In the simplest form, a fulltext query contains just the sought word. If multiple search words are given in a query, they should be separated by commas. All search words will be combined with a logical AND by default, and only such documents will be returned that contain all search words. This default behavior can be changed by providing the extra control characters in the fulltext query, which are:
+: logical AND (intersection)
|: logical OR (union)
-: negation (exclusion)
Examples:
"banana": searches for documents containing "banana"
"banana,apple": searches for documents containing both "banana" AND "apple"
"banana,|orange": searches for documents containing either "banana" OR "orange" OR both
"banana,-apple": searches for documents that contain "banana" but NOT "apple".
Logical operators are evaluated from left to right.
Each search word can optionally be prefixed with complete: or prefix:, with complete: being the default. This allows searching for complete words or for word prefixes. Suffix searches or any other forms of partial-word matching are currently not supported.
Examples:
"complete:banana": searches for documents containing the exact word "banana"
"prefix:head": searches for documents with words that start with the prefix "head"
"prefix:head,banana": searches for documents that contain words starting with the prefix "head" and that also contain the exact word "banana".
Complete match and prefix search options can be combined with the logical operators.
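The query rules above can be modelled with a small Python sketch. This is a toy evaluator written only to illustrate the described semantics (comma-separated terms, +, | and - operators applied left to right, complete: and prefix: modifiers); it is not how ArangoDB implements the index, and the function names and sample documents are hypothetical.

```python
def term_matches(term, words):
    """Check one search term against the set of words of a document."""
    if term.startswith("prefix:"):
        prefix = term[len("prefix:"):]
        return any(w.startswith(prefix) for w in words)
    if term.startswith("complete:"):
        term = term[len("complete:"):]
    return term in words  # complete-word match is the default


def fulltext(docs, query):
    """Return the ids of documents matching `query`, evaluated left to right."""
    word_sets = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    result = None
    for raw in query.split(","):
        raw = raw.strip()
        op = "+"  # terms are combined with logical AND by default
        if raw and raw[0] in "+|-":
            op, raw = raw[0], raw[1:]
        hits = {d for d, words in word_sets.items() if term_matches(raw.lower(), words)}
        if result is None:
            result = set(word_sets) - hits if op == "-" else hits
        elif op == "+":
            result &= hits
        elif op == "|":
            result |= hits
        else:
            result -= hits
    return result if result is not None else set()


docs = {
    1: "banana apple",
    2: "banana orange",
    3: "orange headphones",
    4: "header banana",
}
print(fulltext(docs, "banana,-apple"))       # {2, 4}
print(fulltext(docs, "prefix:head,banana"))  # {4}
```

Note that nothing in this model can express a suffix or infix match, which mirrors the limitation stated above for LIKE '%text%' style searches.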
Can anyone suggest the best way to get the hits (number of occurrences) of a word per document in Lucene?
Lucene uses a field-based, rather than document-based, index.
In order to get term counts per document:
Iterate over documents using IndexReader.document() and isDeleted().
In document d, iterate over fields using Document.getFields().
For each field f, get terms using getTermFreqVector().
Go over the term vector and sum frequencies per terms.
The sum of term frequencies per field will give you the document's term frequency vector.
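The final summation step can be sketched in Python. Plain dicts stand in for the per-field term frequency vectors here; this is not the Lucene API, and the field names and counts are made up for illustration.

```python
from collections import Counter


def document_term_frequencies(field_vectors):
    """Sum per-field term frequencies into one document-level vector."""
    total = Counter()
    for vector in field_vectors.values():
        total.update(vector)  # adds the counts term by term
    return total


# Hypothetical document: field name -> {term: frequency in that field}
doc = {
    "title": {"lucene": 1, "search": 1},
    "body": {"lucene": 3, "index": 2},
}
freqs = document_term_frequencies(doc)
print(freqs["lucene"])  # 4
```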
SpanTermQuery.getSpans will give an enumeration of docs and where the term appears. The docs are sorted, so you can just count the number of times each doc appears, ignoring the position info.