Solr/Lucene behaves weird with some word searches - search

I have Solr installed with default configuration (out of box). I have a word "alternatives" in the index. Search for any of the following gives empty results:
1. name:alterna
2. name:alterna
3. name:alterna*
4. name:*altern
Obviously, I am expecting to find that entry given any part of the word "alternatives"
Has anybody else run into this?

Lucene's (and Solr's) default query syntax searches for full terms. This rules out your searches 1, 2, and 4. Number 3 should have worked.
You can debug all of the cases using Solr's analysis admin screen. See also Debugging Search Application Relevance Issues.

I did use the Admin screen to debug. #3 should have worked, but it also gives zero matches. Is this related to the stemming feature? The search works for
alt*, alte*, alter*, and altern*; it does not work for alterna*, alternat*, alternati*, or alternativ*; and then it works again for alternative* and alternatives*.
Thanks
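One way to reason about the pattern above is a minimal sketch, assuming the name field is stemmed at index time and that wildcard prefixes are compared against the stored terms without being stemmed themselves. The toy_stem function below is a hypothetical stand-in for a real stemmer, not Solr's actual analyzer; it only illustrates why altern* could match while alterna* does not. It does not explain why alternative* and alternatives* match again, which likely depends on how the query parser handles those particular terms.

```python
def toy_stem(word: str) -> str:
    """Strip a few English suffixes. Hypothetical stand-in for a real stemmer."""
    for suffix in ("atives", "ative", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(words):
    # Index-time analysis: lowercase, then stem. Only the stems are stored.
    return {toy_stem(w.lower()) for w in words}


def prefix_query(index, pattern: str) -> bool:
    # The wildcard prefix is compared with stored terms as-is (no stemming).
    prefix = pattern.rstrip("*").lower()
    return any(term.startswith(prefix) for term in index)


index = build_index(["alternatives"])
for q in ("alt*", "alter*", "altern*", "alterna*", "alternativ*"):
    print(q, prefix_query(index, q))
```

In this sketch the only stored term is the stem, so any prefix longer than the stem cannot match. The field analysis screen in the Solr admin UI shows which terms are actually stored for a given field.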

Related

SharePoint: Search box not using search settings

I'm having a problem that just randomly occurred, and I have no idea what caused it or how to fix it.
So I have a searchbox that is supposed to use my search settings. Here are my current search settings:
(screenshots: Settings, Settings 2)
My problem is that even though I configure my searchbox to use this site's search settings, it ignores them. Instead of using /results.aspx?u={searchboxQuery} as specified in the settings, it uses /results.aspx?k={searchboxQuery}
I'm also using a custom Result Source:
{searchboxQuery} Path:https://mypage/Pages
The searchbox works if you provide exact matches, but if you try to shorten a word, no result is displayed. The result source query itself works correctly and displays all the correct pages, but not in combination with search queries.
Any help would be highly appreciated; it's quite an annoying problem.
I remember running into this a while back. I ended up using a Search Results template from the Pages library (Publishing needs to be turned on). Trying to create the connections myself was too much of a headache.

Verbatim search in azure/cognitive/bing web search (API, not website)

I cannot find any option to achieve a verbatim azure/cognitive/bing Web search.
In my case the difference is trying to sift through tens of millions of irrelevant search results to find the 10 results that actually match my query literally.
Even though I am a paying customer, there is no support available. And the API documentation did not help either.
I would think it should be super easy to provide a verbatim search option. Is there one that I did not see?
I checked further, and it seems that for the Bing Search APIs, +"phrase" works and returns documents containing this phrase at the top. Just add + in front of the quoted phrase you have been trying. Support link is here: https://azure.microsoft.com/en-us/support/plans/.
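As a minimal sketch of the suggestion above, the query string can be built so that the + and the double quotes survive URL encoding. The function names and the placeholder endpoint are assumptions made for illustration; substitute the endpoint and authentication that your subscription actually uses.

```python
from urllib.parse import urlencode


def verbatim_query(phrase: str) -> str:
    # Wrap the phrase in double quotes and prefix it with '+', as suggested above.
    return f'+"{phrase}"'


def build_search_url(phrase: str, endpoint: str = "https://example.invalid/search") -> str:
    # The endpoint is a placeholder, not a real service URL.
    # urlencode percent-encodes '+' as %2B and '"' as %22, and turns spaces into '+'.
    return f"{endpoint}?{urlencode({'q': verbatim_query(phrase)})}"


print(build_search_url("exact phrase"))
```

Encoding matters here: a raw + in a query string is normally decoded as a space, so the operator would be lost if it were not sent as %2B.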

Wikipedia wildcard search not working?

I'm trying to do a wildcard search on Wikipedia but the search is not behaving the way the instructions say it should. Here's the advanced search help page:
https://en.wikipedia.org/wiki/Help:Advanced_search
As an example, it says this regarding a Wildcard search:
the query *stan will match Kazakhstan or Afghanistan or Stan Kenton.
However, when I attempt to do that search (or even click on the embedded link to that search), I only get
the page *stan does not exist
and it just lists a bunch of "Stan" entries starting with "Stan Laurel filmography."
Why would this feature not work? Am I missing something?
It does work; however, because direct matches for "stan" are scored higher than words that merely contain it, Kazakhstan is waaaay down in the results. You can try narrowing the results slightly with intitle:*stan, but this is still bad. A quick check with k*stan shows that the wildcard does work.
Conclusion: user-written help page has a bad example.
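To illustrate why *stan is a poor example while k*stan is a clear one, here is a toy sketch using shell-style wildcards over a handful of titles. It is not how Wikipedia's search engine is implemented; the title list and the word-level matching rule are assumptions made only for this illustration.

```python
from fnmatch import fnmatchcase

TITLES = [
    "Kazakhstan",
    "Afghanistan",
    "Stan Kenton",
    "Stan Laurel filmography",
]


def wildcard_hits(pattern: str, titles=TITLES):
    # A title is a hit if any single word in it matches the pattern.
    pattern = pattern.lower()
    return [
        t for t in titles
        if any(fnmatchcase(word, pattern) for word in t.lower().split())
    ]


print(wildcard_hits("*stan"))   # every title contains a word ending in "stan"
print(wildcard_hits("k*stan"))  # only titles with a word starting with "k"
```

Because * also matches the empty string, the bare word "stan" satisfies *stan, so exact "Stan" titles compete with (and can outrank) the longer words.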

Solr behind Drupal returns too many results for specific query

We've got Solr sat behind one of our client's Drupal 7 websites, and while it's working well, it returns too many results for what should be quite specific queries. (It also has relevance/weighting problems; but I'm hoping that solving this problem will remove the - literally - irrelevant results.)
For example, searching for the phrase 'particular phrase in london' should return the node with that as its title, quite high up; I don't even think that any other content should be returned. But I find that it's returning lots of content, purely on the fact that it mentions "London"!
Frivolously, searching for the ridiculous phrase 'piecrusts in london' returns a lot of results too, apparently just because they mention London. No content on the site mentions actual piecrusts.
When I search for 'particular phrase in london', here are the parameters that end up in the catalina.out log on the server (whitespace added for clarity):
{spellcheck=false&facet=true&f.im_field_health_topic.facet.mincount=1
&facet.mincount=1&f.ds_created.facet.date.gap=%2B1YEAR
&spellcheck.q=particular+phrase+in+london
&qf=taxonomy_names^2.0&qf=path_alias^5.0&qf=content^40&qf=label^21.0
&qf=tos_content_extra^1.0&qf=ts_comments^20&qf=tm_vid_3_names^200
&facet.date=ds_created
&f.ds_created.facet.date.start=1970-01-01T00:00:00Z/YEAR
&f.bundle.facet.mincount=1&hl.fl=content,ts_comments
&json.nl=map&wt=json&rows=10&fl=id,entity_id,entity_type,bundle,bundle_name,
label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,
tos_name,tm_node,zs_entity
&start=0&facet.sort=count&f.bundle.facet.limit=50&q=particular+phrase+in+london
&f.ds_created.facet.date.end=2012-01-01T00:00:00Z%2B1YEAR/YEAR
&bf=recip(ms(NOW,ds_created),3.16e-11,1,1)^150.0
&facet.field=im_field_health_topic&facet.field=bundle
&f.im_field_health_topic.facet.limit=50&f.ds_created.facet.limit=50}
hits=1998 status=0 QTime=14
Note that these parameters have been built by Drupal's Apache Solr module; I don't believe we've got any particular custom code of our own that's doing anything to it.
This corresponds to the following URL, if entered directly in the browser:
http://example.com:8081/solr/CLIENT/select?spellcheck=false&facet=true&f.im_field_health_topic.facet.mincount=1&facet.mincount=1&f.ds_created.facet.date.gap=%2B1YEAR&spellcheck.q=particular+phrase+in+London&qf=taxonomy_names^2.0&qf=path_alias^5.0&qf=content^40&qf=label^21.0&qf=tos_content_extra^1.0&qf=ts_comments^20&qf=tm_vid_3_names^200&facet.date=ds_created&f.ds_created.facet.date.start=1970-01-01T00:00:00Z/YEAR&f.bundle.facet.mincount=1&hl.fl=content,ts_comments&json.nl=map&wt=json&rows=10&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,tm_node,zs_entity&start=0&facet.sort=count&f.bundle.facet.limit=50&q=particular+phrase+in+London&f.ds_created.facet.date.end=2012-01-01T00:00:00Z%2B1YEAR/YEAR&bf=recip(ms(NOW,ds_created),3.16e-11,1,1)^150.0&facet.field=im_field_health_topic&facet.field=bundle&f.im_field_health_topic.facet.limit=50&f.ds_created.facet.limit=50
This URL returns nearly 2000 results - that's most of the content on the site! I've experimented with removing the query parameters one at a time, and the only ones that seem to make any difference are qf and q: if I remove qf, I get zero results; if I remove q, I get more results back!
I guess there are two questions here:
Is there anything in these parameters that tells Solr "don't worry whether 'particular phrase' or 'piecrusts' appears: just collate the results for 'london'" and then order by relevancy? I would add that I think 'in' is listed in the stopwords file, so we can probably ignore the effect of that (?)
Or is this something in the (standard Drupal) schema that I need to change?
I appreciate that sometimes search is better for the visitor if it's inclusive; Google does return results even if it doesn't find perfect matches. But, stopwords and stemming aside, the client does require that searches return only results where all words appear in the content.
As mentioned in the post at http://drupal.org/node/1783454, the Apache Solr Search Integration module makes use of the mm (minimum should match) param, which is more or less configured to affect rankings by how closely the keywords match the dataset. Looking through the docs, there are other ways you can use the parameter to affect rankings as well. The results produced by Apache Solr Search Integration are therefore weighted more closely to the AND operator, even though it will return more results as you add more keywords. The benefit of this param is that when the user enters keywords that are too restrictive, results will still be returned. Displaying no results is a really quick way to guide people away from your site.
How are you displaying the search results?
Maybe you could use Solr Views to limit the search range?
http://drupal.org/project/apachesolr_views
thanks
Nick
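To make the effect of a minimum-should-match setting concrete, here is a toy sketch of the idea, not Solr's implementation. The function name, the stopword set, the sample document, and the round-down rule for percentages are all assumptions made for illustration.

```python
import math


def min_should_match(query_terms, doc_terms, mm_percent):
    """Return True if at least mm_percent of the query terms occur in the document."""
    # Assumption: percentages round down, with a floor of one required term.
    required = max(1, math.floor(len(query_terms) * mm_percent / 100))
    matched = sum(1 for t in query_terms if t in doc_terms)
    return matched >= required


STOPWORDS = {"in"}  # assumed stopword list
query = [t for t in "piecrusts in london".split() if t not in STOPWORDS]
doc = set("a clinic opening in london next year".split())

print(min_should_match(query, doc, 50))   # one of two terms is enough
print(min_should_match(query, doc, 100))  # every term must appear
```

With a permissive threshold, a document that mentions only "london" still qualifies, which matches the behaviour described in the question. Requiring 100% corresponds to the client's "all words must appear" rule, at the cost of more empty result pages.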

API returning crazy results for 2 word searches

Starting fairly recently, the API has started returning crazy results for 2 word searches. For example:
https://api.foursquare.com/v2/venues/search?oauth_token=XXX&query=local_edition&radius=35000&ll=37.8%2C-122.4&limit=20&intent=browse
It seems to return only things matching 'ion'. If I search for either 'local' or 'edition' on its own, the intended location is one of the first few results.
Is it time to stop replacing spaces with underscores? For a while, that was the only way to get reasonable results when searching for multiple words. (see this thread for more information: What's the best way to tune my Foursquare API search queries?)
I'm not sure why you're getting results for "ion", but if you replace the underscore with a plus sign or %20, it seems to work fine for me:
https://developer.foursquare.com/docs/explore#req=venues/search%3Fquery%3Dlocal+edition%26radius%3D35000%26ll%3D37.8%252C-122.4%26limit%3D20%26intent%3Dbrowse
https://developer.foursquare.com/docs/explore#req=venues/search%3Fquery%3Dlocal%2520edition%26radius%3D35000%26ll%3D37.8%252C-122.4%26limit%3D20%26intent%3Dbrowse
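The two encodings mentioned above (a plus sign or %20 for the space) can be produced with Python's standard library instead of replacing spaces by hand. This is a minimal sketch; the build_query helper is a hypothetical name, and the parameter values are copied from the question.

```python
from urllib.parse import quote, urlencode


def build_query(params: dict, spaces_as_plus: bool = True) -> str:
    # urlencode uses quote_plus by default (space -> '+');
    # passing quote_via=quote encodes spaces as %20 instead.
    if spaces_as_plus:
        return urlencode(params)
    return urlencode(params, quote_via=quote)


params = {
    "query": "local edition",
    "ll": "37.8,-122.4",
    "radius": 35000,
    "limit": 20,
    "intent": "browse",
}

print(build_query(params))
print(build_query(params, spaces_as_plus=False))
```

Either form keeps the two words as separate tokens, whereas an underscore joins them into a single term.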
