Azure Search: search by partial terms

Here are two example searches in the portal. I would expect the second search to return some results, even with one letter missing.
The search is in Hebrew.
The full term returns some results.
The same term with one letter missing returns no results.

There are a few ways to search for partial terms in Azure Search. You'll need to decide which of the following methods works best in your scenario. Based on the example, it seems either fuzzy search or prefix search will do the job. You can learn about the differences between these methods in the documentation.
Fuzzy search: blog, documentation
Wildcard search, specifically prefix search: documentation
Regular expression search: documentation
Index partial terms by defining a custom analyzer: blog, documentation
Let me know if you have any questions about any of the above.
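As a rough illustration of the first three options, here is a minimal sketch that builds the query text for each one in the full Lucene query syntax (which requires queryType=full on the request). The function name is hypothetical, and the sketch assumes the term contains no characters that are special in Lucene syntax.

```python
def partial_term_queries(term):
    """Build query strings for three partial-match strategies.

    Assumes `term` has no Lucene special characters that need escaping.
    """
    return {
        "fuzzy": f"{term}~",         # matches terms within a small edit distance
        "prefix": f"{term}*",        # matches terms that start with `term`
        "regex": f"/.*{term}.*/",    # matches terms that contain `term` anywhere
    }


queries = partial_term_queries("hotel")
for name, query in queries.items():
    print(f"{name}: search={query}")
```

Which variant fits depends on where the missing letter falls: a prefix query only helps when the typed part is the start of the indexed term, while a fuzzy query tolerates a missing letter anywhere in the term.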

Check this answer. I solved this by using a regex and changing the GET request to a POST request.

Related

Azure Search spelling mistake handling

My application is built with MVC and has a search text box backed by an Azure Search index. It works fine, but if I enter "chres Harris" instead of "chris Harris", I expect "chris Harris" as the result. Instead it returns other results first, such as "bob Harris", and then "chris harris". I want the results to stay close to the intended ones even when there is a spelling mistake. Would an index scoring profile, a parameter boost, or something similar help?
As of now there are two ways you can handle spelling mistakes in Azure Search.
Use fuzzy queries with the Lucene query language. You can boost the relevance of exact matches over fuzzy matches, for example: search=term^2 OR term~2.
If you deal with names of things, like in your example, configure your index to support phonetic search. Different boosting options to influence relevance are described in the article.
Let me know if neither of them works for you.
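The term^2 OR term~2 pattern above can be applied to every word the user typed. This is a minimal sketch; the function name is hypothetical, and joining the per-word clauses with AND is an assumption made here so that every word must match, either exactly or fuzzily.

```python
def boost_exact_over_fuzzy(text, boost=2, distance=2):
    """Build a Lucene query that ranks exact word matches above fuzzy ones.

    Each word becomes (word^boost OR word~distance). Joining the clauses
    with AND is an assumption of this sketch, not something the answer states.
    """
    clauses = [f"({word}^{boost} OR {word}~{distance})" for word in text.split()]
    return " AND ".join(clauses)


print(boost_exact_over_fuzzy("chres Harris"))
```

For the input "chres Harris", the fuzzy clause lets "chris" match (one letter differs), while the boost keeps documents containing the exact words at the top.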

Fuzzy Search in the Search API

The Azure Search API offers a fuzzy parameter for suggestions, like this:
https://blssuggestions.search.windows.net/indexes/cities/docs/suggest?api-version=2015-02-28&suggesterName=default&fuzzy=true&search=berlen
This would return "Berlin" as a result for berlen.
I can't find any documentation on how to activate this in a normal search.
Setting fuzzy=true there does not seem to change anything:
https://blssuggestions.search.windows.net/indexes/cities/docs?api-version=2015-02-28&search=berlen&fuzzy=true
[Update]: Please see the other responses about using queryType=full, as this response is no longer correct.
This is correct. Fuzzy search is currently only available in the suggestions API.
You need to call:
https://blssuggestions.search.windows.net/indexes/cities/docs/suggest?api-version=2015-02-28&suggesterName=default&queryType=full&search=berlen~
You were missing queryType=full and the tilde after the term that you want to run a fuzzy search on.
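To avoid forgetting either piece, the request URL can be assembled in code. This is a minimal sketch that targets the regular search endpoint (/docs); the function name is hypothetical, and the service name, index name, and API version are passed in by the caller because the right values depend on your service.

```python
from urllib.parse import urlencode


def fuzzy_search_url(service, index, term, api_version):
    """Build a search URL with both pieces needed for fuzzy matching:
    queryType=full and a tilde appended to the search term."""
    params = {
        "api-version": api_version,
        "queryType": "full",
        "search": f"{term}~",
    }
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{urlencode(params)}"


url = fuzzy_search_url("blssuggestions", "cities", "berlen", "2015-02-28")
print(url)
```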
This is now in the preview version of the API:
https://{yourSite}.search.windows.net/indexes/{yourIndex}/docs?search={fieldToSearch}:{lookupValue}~&queryType=Full&api-version=2015-02-28-preview
Note the ~ and queryType=Full, both of which are required to force fuzzy matching.
Documentation is here:
https://msdn.microsoft.com/library/azure/mt589323.aspx
CAVEAT: The fuzzy search is very fuzzy! For example, dog will match any three-letter word that shares only a single letter with it: dim, now, bag.
I am trying to figure out how to tune and tweak it, but as it is still in preview the documentation is sparse.
UPDATE: I just re-read the documentation, and it has since been updated with details of an optional distance parameter. I will investigate.
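The caveat is easier to see with edit distances in hand. This is a minimal sketch using plain Levenshtein distance, which is a simplification: Lucene's fuzzy matching also counts a transposition of adjacent letters as a single edit. If a bare tilde means a maximum distance of 2, as the Lucene query syntax documentation describes, then dim, now, and bag all qualify as matches for dog, and writing dog~1 would exclude them. The helper names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_distance(term, candidates, max_distance):
    """Return the candidates whose distance to `term` is at most `max_distance`."""
    return [c for c in candidates if edit_distance(term, c) <= max_distance]


words = ["dim", "now", "bag", "dot"]
print(within_distance("dog", words, 2))  # what a distance of 2 lets through
print(within_distance("dog", words, 1))  # what a distance of 1 lets through
```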

Search option to only return articles about people on Wikipedia

I am trying to search Wikipedia for people only and return them in some format (ideally using regex, but a simpler search is okay).
The following query is close, but it doesn't let me include a specific search query, and it appears to include only dead people (historic figures, I believe).
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=wikipedia&srprop=timestamp&eititle=Template:Persondata
The following query works, although I can't seem to limit the results to people only.
http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Persondata&eilimit=100&format=xml&redirects
Use the Wikidata APIs for semantic searches. For example, this searches for P31 → Q5 ("instance of" → "human") using the Wikidata Query Service: http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A5%5D
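The same claim can also be expressed as a SPARQL query. This is a minimal sketch that only builds the request URL; it assumes the public SPARQL endpoint at query.wikidata.org, which is not the tool linked in the answer, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def humans_query_url(limit=10):
    """Build a URL asking for items whose "instance of" (P31) is "human" (Q5).

    The endpoint is an assumption of this sketch; the answer links a
    different tool for the same claim.
    """
    sparql = f"SELECT ?item WHERE {{ ?item wdt:P31 wd:Q5 }} LIMIT {limit}"
    return "https://query.wikidata.org/sparql?" + urlencode(
        {"query": sparql, "format": "json"}
    )


print(humans_query_url(5))
```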

Exact match in Google search

I am trying to build an application that finds all the copied code in a project.
My question, though, is purely about Google search.
I searched for the keyword "public void bubbleSort(int[] arr){"
and this was the result.
On the first page of search results, only the last URL is a perfect match for my keyword.
Can I give Google some search keywords so that it gives more weight to pages with an exact match of my search keyword?
Although the plus sign, +, is no longer an available Google search operator, you can use quotes, or, after running the query, select Search Tools and then Verbatim under the All Results drop-down.
You can also search the Google code archives at https://code.google.com/, or try some of the other code search engines around the Internet.
+"public void bubbleSort(int[] arr){"
The plus sign means to include this term no matter what. The quotes turn the loosely coupled words into a single term.
For a full list of Google syntax operators:
https://support.google.com/websearch/answer/136861?hl=en

How to find a match within a single term using Lucene

I am using the Lucene search engine but it only seems to find matches that occur at the beginning of terms.
For example:
Searching for "one" would match "onematch" or "one day a time" but not "loneranger".
The Lucene documentation says it doesn't support wildcards at the front of a search string, so I am not sure whether Lucene searches for matches inside terms at all or can only match documents that start with the search term.
Is this a problem with how I have created my index, how I am building my search query or just a limitation of Lucene?
Found some info in another post here on Stack Overflow: "[LUCENE.NET] Leading wildcard throws an error".
You can call SetAllowLeadingWildcard(true) on your QueryParser (setAllowLeadingWildcard in Java Lucene) to allow leading wildcards in your search. This will of course have a large performance impact, but it will let users find matches within a term.
Lucene will find a document if the search term appears anywhere within it, but by default it doesn't allow wildcard queries where the wildcard is at the front of the search term, because they perform horribly. If that is functionality you care about, you will have to change a config flag (thanks for the interesting link), find a third-party library that already supports it, or use a different search implementation (for small enough datasets, the built-in search in many RDBMS engines is sufficient).
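To see why a leading wildcard is so much more expensive, consider a simplified model in which the index keeps its terms in sorted order. This sketch is only an illustration of the idea, not Lucene's actual data structure: a prefix lookup can jump to the first candidate with a binary search and stop at the first non-match, while a match inside a term has to inspect every term in the dictionary.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Binary-search to the first possible match, then read until the prefix stops matching."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches


def infix_matches(sorted_terms, fragment):
    """No ordering helps here: every term must be checked."""
    return [term for term in sorted_terms if fragment in term]


terms = sorted(["day", "loneranger", "one", "onematch", "time"])
print(prefix_matches(terms, "one"))  # terms starting with "one"
print(infix_matches(terms, "one"))   # terms containing "one" anywhere
```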
Your query should be
Query query = new WildcardQuery(new Term("contents", "*one*"));
where contents is the name of the field you are searching.
The term "one" should be enclosed in asterisks, with no space before the closing asterisk.
