I use Azure Search, which in turn uses Lucene. Is there any way to make the search less strict?
What I need is for a search for "term" to match documents containing terms that have "term" as a substring.
Searching for term should match "PrefixTerm", "TermSuffix", "PrefixTermSuffix".
Searching for part2 should match "part1part2", "part2part3", "part1part2part3".
I need to run a search query that has several terms, like
"term part2"
To match documents like:
{ someField:"... PrefixTermSuffix ... part1part2part3 ..." }
{ someField:"... PrefixTerm ... part2part3 ..." }
etc
You can use regular expressions in the Lucene query syntax in Azure Search. In your example, you can construct a regex query like /.*term.*/ /.*part2.*/ to find the documents with terms that contain the two search terms as substrings.
https://[service name].search.windows.net/indexes/[search index]/docs?api-version=2016-09-01&queryType=full&search=/.*term.*/ /.*part2.*/
Azure Search supports two query syntaxes, simple and full. The latter enables the Lucene query syntax. Please see our documentation (https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search) to learn more about the capabilities.
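If you prefer POST, the same query can go in the request body. A minimal sketch, assuming the same placeholder service, index, field, and API version as above:

POST https://[service name].search.windows.net/indexes/[search index]/docs/search?api-version=2016-09-01
Content-Type: application/json
api-key: [query key]

{
  "queryType": "full",
  "search": "/.*term.*/ /.*part2.*/",
  "searchFields": "someField"
}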
I am working on the query below, trying to implement an ArangoDB wildcard search. The criteria are simple: I'd like to match records similar to the name or number field and limit the results to 25. The query works but is very slow, taking upwards of 30 seconds. The goal is to optimize this query and get it as close to sub-second as possible. I'd like the query to function similar to how a MySQL LIKE would, matching with the % wildcard on both sides.
https://www.arangodb.com/docs/stable/release-notes-new-features37.html#wildcard-search
Note, one thing I noticed is that in the release note examples, rather than using FILTER, they are using SEARCH.
Additional info:
name is alphanumeric
number is going to be an 8-digit number
LET str = CONCAT("%", 'test', '%')  // build the "%test%" pattern once
LET search = (
  FOR doc IN name_search
    FILTER ANALYZER(doc.name LIKE str, "text_en") OR
           ANALYZER(doc.number LIKE str, "text_en")
    LIMIT 25
    RETURN doc
)
RETURN search
FILTER doesn't utilize indices. To speed up your wildcard queries you have to create an ArangoSearch view over the collection and use the SEARCH keyword, as shown below.
Feel free to check the following interactive tutorial (see "LIKE Support" section):
https://www.arangodb.com/learn/search/arangosearch-tutorial-3-7/
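For illustration, a sketch of the query rewritten against a view, assuming name_search is an ArangoSearch view linked to the collection with the text_en analyzer on both fields:

FOR doc IN name_search
  SEARCH ANALYZER(LIKE(doc.name, "%test%"), "text_en") OR
         ANALYZER(LIKE(doc.number, "%test%"), "text_en")
  LIMIT 25
  RETURN doc

With a view in place, SEARCH can use the view's index instead of scanning every document the way FILTER does.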
I want to find all records containing the pattern "170629-2" in Azure Search explorer. I did try with
query string : customOfferId eq "170629-2*"
which only gives one result back: the exact match "170629-2". I do not get the records that have the patterns "170629-20", "170629-21" or "170629-201".
Two things.
1 - You can't use the standard analyzer, as it will break your "words" in two parts:
e.g. 170629-20 will be broken into one entry for 170629 and another for 20.
2 - You can use a regex and specify the pattern you want:
170629-2+.*
https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax#bkmk_regex
PS: use &queryType=full to allow regex
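Putting it together, a sketch of the full request; the service and index names are placeholders, Lucene regular expressions are delimited by forward slashes, and the pattern is written as /170629-2.*/ (equivalent here to the one above) to avoid the +, which would need URL encoding:

https://[service-name].search.windows.net/indexes/[index-name]/docs?api-version=2017-11-11&queryType=full&search=/170629-2.*/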
I've been trying to create a filter matching the end of the whole field text.
For example, taking a text field with the text: the brown fox jumped over the lazy dog
I would like it to match with a query that searches for fields with values ending with g. Something like:
{
  "search": "*",
  "queryType": "full",
  "searchMode": "any",
  ...
  "filter": "search.ismatchscoring('/g$/','MyField')"
}
The result is only records where MyField contains values with words composed of the single character g anywhere in the string.
Using the filter directly also produces no results:
{
  "search": "*",
  "queryType": "full",
  "searchMode": "any",
  ...
  "filter": "MyField eq '*g'"
}
As far as I can see, the tokenization will always be the basis for the search and filter, which means that in the above query, $ is completely ignored and matches are made by word, not by whole field.
I could probably use the keyword_v2 analyzer on this field, but then I would lose the tokenization that I use when searching normally.
One possible solution could be defining a second field in your index with the same value as MyField but a different analyzer (e.g. keyword_v2). That way you may still search over the original field while filtering over the other.
Regardless, you might have simplified the filter for the sake of the example, but otherwise it seems redundant to use search.ismatchscoring() when not combining it with another filter clause via 'or'; one can use the search parameter directly.
Moreover, the regex might not be working because the default queryType for search.ismatchscoring() is simple, not full - please see the docs.
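A sketch of that approach; the second field name MyFieldExact is hypothetical, and search.ismatch takes the query, the field list, the query type, and the search mode:

{
  "name": "MyFieldExact",
  "type": "Edm.String",
  "searchable": true,
  "analyzer": "keyword_v2"
}

Since keyword_v2 keeps the whole field value as a single term, a Lucene regex can then anchor against the full text:

{
  "search": "*",
  "filter": "search.ismatch('/.*g/', 'MyFieldExact', 'full', 'any')"
}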
I have fields in an index with [Analyzer(<name>)] applied. This analyzer is of type CustomAnalyzer with tokenizer = Keyword. I assume it treats both the field value and the search text as one term each. E.g.
ClientName = My Test Client (in the index, broken into 1 term). Search term = My Test Client (broken into 1 term). Result = match.
But surprisingly that's not the case unless I apply phrase search (enclose the term in double quotes). Does anyone know why? And how to solve it? I'd rather treat the search term as a whole than have to enclose it in quotes.
Regards,
Sergei.
This is expected behavior. Query text is processed first by the query parser, and only individual query terms go through lexical analysis. When you issue a phrase query, the whole expression between quotes is treated as a phrase term and goes through lexical analysis as one. You can find a complete explanation of this process here: How full text search works in Azure Search.
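To illustrate with the example from the question, the unquoted form

{ "search": "My Test Client", "searchFields": "ClientName" }

is split by the query parser into three terms (My, Test, Client) before analysis, none of which equals the single keyword term in the index, while the phrase form

{ "search": "\"My Test Client\"", "searchFields": "ClientName" }

is analyzed as one term and matches.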
I am using Azure Search and trying to perform a search against documents:
It seems as though doing this: /indexes/blah/docs?api-version=2015-02-28&search=abc\-1003
returns the same results as this: /indexes/blah/docs?api-version=2015-02-28&search=abc-1003
Shouldn't the first one return different results than the second due to the escaping backslash? From what I understand, the backslash should allow for an exact search on the whole string "abc-1003" instead of treating the dash as a "not" operator.
(more info here: https://msdn.microsoft.com/en-us/library/azure/dn798920.aspx)
The only way I can get it to work is by doing this (note the double quotes): /indexes/blah/docs?api-version=2015-02-28&search="abc-1003"
I would rather not do that, because it would mean making the user enter the quotes, which they will not know how to do.
Am I expecting something I shouldn't or is it possibly a bug with Azure Search?
First, a dash not preceded by whitespace acts like a dash, not a negation operator.
As per the MSDN docs for the simple query syntax:
- Only needs to be escaped if it's the first character after whitespace, not if it's in the middle of a term. For example, "wi-fi" is a single term
Second, unless you are using a custom analyzer for your index, the dash will be treated by the analyzer almost like whitespace and will break abc-1003 into two tokens, abc and 1003.
Then when you put it in quotes, "abc-1003" will be treated as a search for the phrase abc 1003, thus returning what you expect.
If you want an exact match on abc-1003, consider using a filter instead, as shown below. It is faster and can match things like GUIDs or text with dashes.
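For example, a sketch of such a filter; the field name is a placeholder and must be marked filterable in the index (URL-encode the spaces in practice):

https://[service-name].search.windows.net/indexes/[index-name]/docs?api-version=2015-02-28&$filter=someField eq 'abc-1003'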
The documentation says that a hyphen "-" is treated as a special character that must be escaped.
In reality, a hyphen is treated as a token split, and the words on both sides are searched, as Sean Saleh pointed out.
After a small investigation, I found that you do not need a custom analyzer; the built-in whitespace analyzer will do.
Here is how you can use it:
{
  "name": "example-index-name",
  "fields": [
    {
      "name": "name",
      "type": "Edm.String",
      "analyzer": "whitespace",
      ...
    },
  ],
  ...
}
You use this endpoint to update your index:
https://{service-name}.search.windows.net/indexes/{index-name}?api-version=2017-11-11&allowIndexDowntime=true
Do not forget to include the api-key in the request header.
You can also test this and other analyzers through the analyzer test endpoint:
{
  "text": "Text to analyze",
  "analyzer": "whitespace"
}
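That body is POSTed to the index's Analyze endpoint, with the same placeholders as above:

POST https://{service-name}.search.windows.net/indexes/{index-name}/analyze?api-version=2017-11-11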
Adding to Sean's answer, a custom analyzer configuration with the keyword tokenizer and a lowercase token filter will address the issue. It appears that you are using the default standard analyzer, which breaks words with special characters during lexical analysis at indexing time. At query time, this lexical analysis applies to regular queries, but not to wildcard search queries. As a result, with your example, you have abc and 1003 in the search index, while the wildcard search query, which wasn't tokenized the same way, looks for terms that start with abc-1003 and finds nothing, because neither term in the index starts with abc-1003. Hope this makes sense. Please let me know if you have any additional questions.
Nate
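A minimal sketch of such a configuration; the index, field, and analyzer names are placeholders, and the custom analyzer pairs the keyword_v2 tokenizer with the lowercase token filter:

{
  "name": "example-index-name",
  "fields": [
    {
      "name": "someField",
      "type": "Edm.String",
      "analyzer": "keyword_lowercase"
    }
  ],
  "analyzers": [
    {
      "name": "keyword_lowercase",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "keyword_v2",
      "tokenFilters": [ "lowercase" ]
    }
  ]
}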