Azure search hit-highlighting and match delimiter - azure

I am using hit-highlighting in azure search. It works fine but I want to fine tune it a bit.
Say, a field has the following value:
"It uses period as the delimiter. If not, please clarify"
If I search for "please" I will get a highlight hit on that field, e.g.:
"If not, <em>please</em> clarify"
If I search for "period" I will get a highlight hit on that field, e.g.:
"It uses <em>period</em> as the delimiter."
After trying it with several examples it seems that it uses period (".") as a delimiter so that it doesn't return the whole field.
From another SO question (Hit Highlighting in Azure Search Service) it seems that I cannot configure azure search to return the whole field with all terms highlighted.
I want to ask:
if this is really the case or more complex rules apply
do I have any control of how the field is split for hit highlighting, e.g. change the delimiter to say "," or "\n"
Thanks in advance

Unfortunately there is no way to customize how documents are split for hit highlighting. Feel free to use Azure Search User Voice website to post improvements ideas giving other users opportunity to vote for them and helping us prioritize: http://feedback.azure.com/forums/263029-azure-search
The hit highlighter splits documents into sentences. In general it's fair to assume it breaks on dots but it also handles abbreviations etc.

Related

Delimiting quick-open path with fullstops in Sublime Text 3?

I'm making the move to ST3, and I'm having some trouble. I'd like to be able to delimit the quick-open filepath (⌘ + T) with periods instead of slashes or spaces. However, I can't find the setting to do that.
For example:
component.biz_site_promotions.presentation
should be able to open the file that
component biz_site_promotions presentation
would.
Any help would be greatly appreciated!
There is no setting in Sublime that changes the way this works; the search term is always used to directly match the text in the list items (except for space characters).
Note however that the Goto Anything panel uses fuzzy matching on the text that you're entering, so in many cases trying to enter an entire file name is more time consuming anyway.
As an example, to find the file you're mentioning, you could try entering the text cbspp, which in this case is the first letters of all of the parts of the file name in question.
As you add to the search term, the file list immediately filters down to text that matches what you entered; first only filenames that contain a C, then only filenames that contain a C that is followed somewhere after by a B, and so on.
Depending on the complexity and number of files that you have in your project, you may need to add in a few extra characters to dial in better (e.g. comb_s_pp). Usually this search method will either end you up at the exact file you want, or filter the list so much that the file that you want will be easier to find and select.
Additionally, when you select an item and there was more than one possible match, Sublime remembers which item you selected for that particular search term and brings it to the top of the search results next time you do it, under the assumption that you want the same thing again.
As you use Sublime more (and with different projects) you will quickly get a handle on what partial search terms work the best for you.
In addition to finding files, you can do other things with that panel as well, such as jumping to a specific line and/or column or searching inside the file for a search term and jumping directly to it. This applies not only to the current file but also the one that you're about to open.
For more complete details, there is a page in the Unofficial Documentation that covers File Navigation with Goto Anything
As an extra aside, starting with Sublime Text build 3154, the fuzzy searching algorithm handles spaces differently than previous builds.
Historically, spaces in the search term are essentially ignored and the entire input is treated as one search term to be matched character by character.
Starting in build 3154, spaces are handled by splitting up a single search term into multiple search terms, which are applied one after the other.
This allows multiple search terms to hit out of order. For example, index doc in build 3154 will find doc/index.html, but it won't find it in previous versions because the terms aren't in the right order.
As such, assuming you're not currently using such a build (as of right now it's a development build, so only licensed users have access to it), moving forward if you continue to search the way you're searching in your question, you might start getting more results than you expected.

Azure Search result highlight snippets

I am using the Hit Highlighting feature in Azure Search and noticed a discrepancy in the way it behaves from the documentation. In the documentation it says that when you use hit highlighting it will return a snippet of the field with the highlight, but it always returns the entire field (with proper highlighting).
Is there a way to have Azure Search instead return just a snippet (say of about 200 characters) that includes the highlight?
Currently, the answer is no, you cannot. The field breaks according to (English) sentence rules, ie. it breaks on ".", "!", "?".
Also see this question for an example on breaking and some more info relating to the delimiters.
Depending on the nature of the field you might be able to add one of the above delimiters to 'emulate' what you want to accomplish (as suggested by Nate Ko).
I want to suggest something else on top of what Nate spoke to. When you look at the document response, also take a look at the Highlights part of the results (as opposed to the Document). For example, you might be currently getting the field results by retrieving something like this:
Results[i].Document.DESCRIPTION
If there is a highlight found for that field, the snipped will be found here:
Results[i].Highlights.DESCRIPTION
What I like to do is to first check if there is a valid Highlight and if so display it. If not, I show the actual field content.
Liam
We recently introduced a change that improves the highlighter performance on large fields and NLP experience. One side effect of the change was that the new highlighter generates snippets based on sentences, breaking the text field on '.' (period).
One way to workaround the issue is to put '.'s in the field. We are working to enforce the snippet size and let you know when it is available.

Lucene, Change Search on one file

Question about Lucene,
I have a file that I would like to index and search by different analyzers. My goal is to be able to change how I search.
In one case I would like to search exact phrase with punctuation IE. for "one,two" and only return exact matchings w/ punctuation.
I would also like to be able to search the exact phrase without punctuation. IE. for "one two." As in the StandardAnalyzer
Essentially I need to change the search functionality on one field.
How can I change the search on the same file. Ive tried using two analyzers (standard and whitespace) however this makes the indexing time very long.
My second thought is to use just a WhitespaceAnalyzer and when searching pass a query that further tokenizes each string if needed? However I am not sure which API has this if any do.
Also is there a good reading on how analyzers and tokens work and are implemented.
Thanks
What do you mean you tried two analyzers? Duplicate the content to 2 seperate fields with different analyzers? That would be my suggestion.

Does Google offer the ability to ban results systematically from certain sources without the -site string?

I know the topic of removing www.experts-echange.com has been beaten to death but having to type -site:www.experts-exchange.com is tedious. Even the ability to auto add strings to a query would solve this problem. I can probably wrap this into some wget mess but this seems like basic functionality many users would base their search engine of choice on. If you have discovered some easy method to do this for yourself please let me know.
I imagine there is a really slick toolbar that feeds google your text plus the additional strings you choose. There is some internal limit to the number of words and or operators Google searches process (with good reason I suppose).
The Google Custom Search API allows you to include or exclude sites from your search. You can add a custom search engine to your iGoogle home page.
Google custom search: http://www.google.com/coop/cse/
2 easy ways in Firefox:
Write a Grease Monkey script.
Use a search keyword. You type the keyword plus a string in the address bar to trigger a search. In this case the URL is http://www.google.com/search?q=-site:expertsexchange.com%20%s. To search on the site your tab is currently in, use javascript:location='http://www.google.com/search?num=100&q=site:'%20+%20escape(location.hostname)%20+%20'%20%S'%20;%20void%200
As an alternative to excluding results, I have a greasemonkey script that highlights google search results by domain. I configure subtle colors for a few sites of interest to me, like wikipedia & stackoverflow. But I use red for expertsexchange, which allows me to visually skip right over it.
I can publish my script if there is interest...
If you want to whip up your own script, you need to operate on two kinds of elements. Here are the two XPath expressions that I use:
//cite[contains(., '" + domain + "')]/ancestor::li[1]
//span[#class='a'][contains(., '" + domain + "')]/ancestor::div[#class='g']
Then I just apply background-color styles to matching elements. Pretty straight forward.

Semantic difference between "Find" and "Search"?

When building an application, is there any meaningful difference between the idea of "Find" vs "Search" ? Do you think of them more or less as synonymous?
I'm asking in terms of labeling for application UI as well as API design.
Finding is the completion of searching.
If you might not succeed in finding something, call the feature "Search". For example text search in an editor can fail due to no matches - then calling it "Find" would be lying.
On the other hand: in an established job searching site, you can say "Find a PHP job" because you know that for (almost) anything your users want, there will be offerings. This also makes it sound confident, positive and energetic.
According to Steve Krug in Don't Make Me Think, when talking about usability for a publicly-facing web site, use the word Search for a search box and nothing else. (He specifically prohibits "Find", "Quick Find", "Quick Search", and all variations.)
The rationale is that "Search" is the most commonly understood term, so it's what people will look for when they aren't thinking, and you don't want your users to have to think (at all).
I would say that "find" is focused on getting a single, exact match. As in the example above, you "find" the perfect PHP job.
OTOH, you "search" for jobs that meet your criteria. Searching is what you do when you want to graze through several results. "Search" returns pages of results. "Find" is closer to "I'm feeling lucky."
Of course, the terms get used interchangeably sometimes. But, I think that's the essence of the difference.
In many applications, find means "find on the current page/screen", while search means "search the entire database/Internet." Web browsers, online help, and other applications seem to make this distinction.
Within most applications...
Find typically refers to locating text within the document at hand and jumps to the next occurrence.
Search typically refers to locating multiple documents (or other objects) and returns a list.
I wrote the built-in Find command in Acrobat 1.0 and worked on the full text Search engine for Acrobat 2.0 and 3.0.
Most software at that point that handled large amounts of text had a way to locate an exact match to a single word or phrase and called it Find/Find Next. This is what we called it in Acrobat 1.0. We knew from the start that this wasn't enough to handle entire repositories of documents, so we needed a way to scan across a whole set. We couldn't use Find since that was already in the UI and had established behavior, so we settled on Search. The decision was based on little more than the relatively small set of common words that convey the action.
Even harder is to come up with a reasonable icon for it. Our initial take was to use something similar to the old Yellow Pages logo:
(source: yellowpagecity.com)
but the lawyers shot that down - it was too close. We couldn't use a magnifying glass as we had zoom functions tied to that. We went with binoculars.
I don't think that there is any difference.
But then again, I'm Portuguese. :P
Find = Discover exact
Example: We write "Please find attached" in an email. We don't write "Please search attached".
Search = Discover exact + Related match
Example: Google Search
"Seek and ye shall find"
"Search and you will find"
One angle that (surprisingly) no one has mentioned, is that in English when you say you search something, that something is the thing you're searching within, not the thing you're trying to find. So unless you add the word 'for' (as in, to search for something), the two words are fundamentally different.
It becomes obvious with an example:
Find the room.
Search the room.
Two very different tasks! The first defines the object of your search. The second defines the scope of your search.
That's not completely irrelevant when talking about UIs. If your app has a search feature where the user can specify both the source and the object of their search, you might choose to use the words this way. For example:
Search: Current document
Find: "positive and energetic"
Yes, as some others have pointed out, the word 'Find' does imply a successful search, but let's not start calling app designers liars for using it when success isn't guaranteed. It's become a pretty standard term for searching a document for a particular string.
I think search is more generic and more suitable for text search. Find sounds more like 'find a specific record or a group of records'
After searching You find something.
Search for an answer on stackoverflow that you may find it.
For me Find is the success of a Search, that is to Find is to identify the location of something that's known to exist.
Search should always be used when you have no control on what the user is looking for.
Find talks about a specific one.
Search does not talk about a specific one.
Did you find the picture I requested yet?
No? Please search on internet. I need to present it in an hour.
Another one is below
Please find the attachment in this email.
(or)
You'll find the attachment below.
(or)
Please find attached.
here, we use find because it is a specific document which is attached to email.
we don't use the search here, as there is nothing to search in a larger domain.
Search is the primary interface to the Web for many users. Search should be global (not scoped to a subsite) and available from every page; booleans should be made intimidating since users usually use them wrong
Read this: https://www.nngroup.com/articles/search-and-you-may-find/

Resources