Omit certain fields from being highlighted in Solr

I have a Solr engine deployed with a standard request handler:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<!-- default values for query parameters -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="facet">true</str>
<str name="facet.field">path</str>
<str name="facet.sort">lex</str>
<str name="facet.limit">10</str>
<str name="facet.offset">0</str>
<str name="facet.method">fc</str>
<str name="hl">true</str>
<str name="hl.fl">body</str>
<str name="hl.fragsize">888</str>
<str name="hl.usePhraseHighlighter">true</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="hl.mergeContiguous">true</str>
</lst>
</requestHandler>
Is there a way to omit certain fields from highlighting? For example, say my query is q=Ruth+AND+age:23.
I want only the search term "Ruth" highlighted, not the number 23.

You could try expressing the query as q=Ruth&fq=age:23, since filter queries do not affect highlighting.

Related

How to configure IndexBasedSpellChecker in solr?

I'm using Solr 5.2 and I want to use IndexBasedSpellChecker inside my search handler. This is my search component for IndexBasedSpellChecker:
<searchComponent class="solr.SpellCheckComponent" name="spellcheck">
<str name="queryAnalyzerFieldType">text_en_general</str>
<lst name="spellchecker">
<str name="name">default</str>
<!--specify a field to use for the suggestions-->
<str name="field">body-en</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<!-- <str name="distanceMeasure">internal</str> -->
<!--The accuracy setting defines the threshold for a valid suggestion-->
<!-- <float name="accuracy">0.05</float> -->
<!-- maxEdits defines the number of changes to the term to allow-->
<int name="maxEdits">2</int>
<!--defines the minimum number of characters the terms should share-->
<int name="minPrefix">1</int>
<!--defines the maximum number of possible matches to review before returning results-->
<int name="maxInspections">5</int>
<!--defines how many characters must be in the query before suggestions are provided-->
<int name="minQueryLength">4</int>
<!-- sets the maximum threshold for the number of documents a term must appear in before being considered as a suggestion-->
<float name="maxQueryFrequency">0.01</float>
<!--sets the minimum number of documents a term must appear in-->
<float name="thresholdTokenFrequency">.01</float>
My problem is that when I enable the accuracy setting, I get this error:
Caused by: org.apache.solr.common.SolrException: java.lang.Float cannot be cast to java.lang.String
When I comment that setting out, I get another error, this time for distanceMeasure:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading class 'internal'
When I comment both of them out, I get no results from my spellchecker, and when I query a phrase it spellchecks only the first word of the phrase. What should I do?
I can't see the full component description, so I can't say for sure what's going on. If you have more than one spellchecker inside that component, make sure they all use the same field name:
<str name="field">body-en</str>
The following code works for me:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">variations</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">.01</float>
</lst>
</searchComponent>
with the following request handler snippet:
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">true</str>
<str name="spellcheck.count">3</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.collate">true</str>
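If you would rather stay on IndexBasedSpellChecker as in the question, the two errors suggest that this checker reads accuracy as a string and treats distanceMeasure as a class name instead of accepting the internal keyword. Here is a minimal sketch under that assumption. The field name and analyzer type come from the question, while the spellcheckIndexDir and buildOnCommit values are only illustrative, and the distance class name is the one shipped with Lucene 5.x:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_en_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">body-en</str>
    <!-- Illustrative location for the side index this checker builds -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
    <!-- A StringDistance class name, not the "internal" keyword -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
    <!-- Declared as str because the float form caused the cast error -->
    <str name="accuracy">0.5</str>
  </lst>
</searchComponent>
```

The other numeric settings from the question are left out here because they belong to the DirectSolrSpellChecker example above; add them back only if your checker accepts them.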
Hope it helps!

Solr 4 new style solr.xml - how to configure access to core admin pages?

The Solr wiki says:
To enable dynamic core configuration, make sure the adminPath attribute is set in solr.xml. If this attribute is absent, the CoreAdminHandler will not be available.
In the old-style solr.xml this attribute is set on the cores element:
cores adminPath="/admin/cores"
In the new (discovery) style solr.xml (available since Solr 4.4 and mandatory from the upcoming 5.0) there is no cores element and no mention of an adminPath attribute anywhere. As a result, opening localhost:8983/solr produces this error:
NetworkError: 404 Not Found - http://localhost:8983/solr/admin/cores?wt=json&indexInfo=false
Does all this mean that dynamic core handling via HTTP is unavailable in Solr 4.4+, or did I miss a setting in the configs?
Thanks in advance.
Edit solr.xml so that it includes the adminHandler entry:
<solr>
<str name="adminHandler">${adminHandler:org.apache.solr.handler.admin.CoreAdminHandler}</str>
<int name="coreLoadThreads">${coreLoadThreads:3}</int>
<str name="coreRootDirectory">${coreRootDirectory:#SOLR.CORES.DIRECTORY#}</str>
<str name="managementPath">${managementPath:}</str>
<str name="sharedLib">${sharedLib:}</str>
<str name="shareSchema">${shareSchema:false}</str>
<solrcloud>
<int name="distribUpdateConnTimeout">${distribUpdTimeout:1000000}</int>
<int name="distribUpdateSoTimeout">${distribUpdateTimeout:1000000}</int>
<int name="leaderVoteWait">${leaderVoteWait:1000000}</int>
<str name="host">${host:}</str>
<str name="hostContext">${hostContext:solr}</str>
<int name="hostPort">${jetty.port:8983}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
</solrcloud>
<logging>
<str name="class">${loggingClass:}</str>
<str name="enabled">${loggingEnabled:}</str>
<watcher>
<int name="size">${loggingSize:1000000}</int>
<int name="threshold">${loggingThreshold:100000}</int>
</watcher>
</logging>
</solr>

Limit the rows to 20 in a Solr query while letting clustering run over 100 rows?

I want to display only 20 rows from the Solr query. However, I want the Carrot2 clustering to create labels and cluster over 100 rows. Both should happen in the same query. Is that possible?
Nope. Carrot clustering happens dynamically on the results fetched by Solr, and the number of results is controlled by the rows parameter.
So limiting the displayed results has to happen on the client side.
You can add clustering to last-components in the request handler so that search and clustering are performed in a single call.
Example config:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
<bool name="clustering">true</bool>
<str name="clustering.engine">default</str>
<bool name="clustering.results">true</bool>
<!-- Fields to cluster on -->
<str name="carrot.title">name</str>
<str name="carrot.snippet">features</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
Or with URL parameters: clustering=true&clustering.engine=default&clustering.results=true&carrot.title=name&carrot.snippet=features
Change the title and snippet parameters to match your field mappings.
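Since clustering works on whatever rows returns, one way to cluster over 100 documents while showing 20 is to request 100 rows and trim the list in the client. Here is a sketch of the handler under that assumption. Only the rows value differs from the config above, the handler name /clustered is hypothetical, and a clustering search component is assumed to be defined elsewhere in solrconfig.xml:

```xml
<requestHandler name="/clustered" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Fetch 100 documents so the clustering component sees all of them -->
    <int name="rows">100</int>
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Fields to cluster on -->
    <str name="carrot.title">name</str>
    <str name="carrot.snippet">features</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

The client then renders only the first 20 documents of the response and uses the cluster labels, which were computed over all 100.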

Solr - identical search result scores for multiple search terms?

How can I get different scores for the results of a multi-term search?
Certain results in Solr have the same score even when there are multiple terms in the query, as you will see in the example below.
I have two documents indexed in Solr, each containing id, name and last_name.
The documents look like the following:
<doc>
<str name="id">1</str>
<str name="last_name">fisher</str>
<str name="name">john</str>
</doc>
<doc>
<str name="id">2</str>
<str name="last_name">darby</str>
<str name="name">john</str>
</doc>
When I query just "john" both results come up. That is perfect.
However, when I query "john fisher" both results come up but with the same score.
What I want is different scores based on the relevancy of the search terms.
Here is the result for the following query:
http://localhost:8983/solr/select?q=john+fisher%0D%0A&rows=10&fl=*%2Cscore
<response>
...
<result name="response" numFound="2" start="0" maxScore="0.85029894">
<doc>
<float name="score">0.85029894</float>
<str name="id">1</str>
<str name="last_name">fisher</str>
<str name="name">john</str>
</doc>
<doc>
<float name="score">0.85029894</float>
<str name="id">2</str>
<str name="last_name">darby</str>
<str name="name">john</str>
</doc>
</result>
</response>
Any help would be greatly appreciated
Your best bet is to understand and analyse how the different factors affect your document score. Lucene has a helpful feature called Explanation, and Solr uses it to show how a score is calculated. You can use debugQuery in Solr to see how the score is derived:
?q=john&fl=score,*&rows=2&debugQuery=on
Example response:
<lst name="debug">
<str name="rawquerystring">john</str>
<str name="querystring">john</str>
<str name="parsedquery">+DisjunctionMaxQuery((text:john))</str>
<str name="parsedquery_toString">+(text:john)</str>
<lst name="explain">
<!-- Score calculation for Result#1 -->
<str>
2.1536596 = (MATCH) fieldWeight(text:john in 36722), product of:
1.0 = tf(termFreq(text:john)=1)
8.614638 = idf(docFreq=7591, maxDocs=15393998)
0.25 = fieldNorm(field=text, doc=36722)
</str>
<!-- Score calculation for Result#2 -->
<str>
2.1536596 = (MATCH) fieldWeight(text:john in 36724), product of:
1.0 = tf(termFreq(text:john)=1)
8.614638 = idf(docFreq=7591, maxDocs=15393998)
0.25 = fieldNorm(field=text, doc=36724)
</str>
</lst>
</lst>
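Besides this, you can use explainOther to see how a particular document scores against the query even if it did not make the result list. It takes a query that identifies the document, for example:
?q=john&fl=score,*&rows=2&debugQuery=on&explainOther=id:2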
Do Read:
Solr Relevancy
Lucene Scoring
It looks to me like you are only searching on the "name" field. That's why the scores are the same. If you use DisMax, you can easily search on both fields, and the most relevant document will get a higher score.
e.g.
<str name="defType">edismax</str>
<str name="qf">name last_name</str>
Another way is to combine the two fields into one field with copyField and search only in the newly created field.
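A minimal sketch of that copyField approach in schema.xml follows. The catch-all field name full_name is hypothetical, and the text_general type is assumed to exist in your schema, as it does in the stock example schema:

```xml
<!-- Hypothetical catch-all field that receives both name parts -->
<field name="full_name" type="text_general" indexed="true" stored="false" multiValued="true"/>
<!-- Copy each source field into the catch-all field at index time -->
<copyField source="name" dest="full_name"/>
<copyField source="last_name" dest="full_name"/>
```

After reindexing, a query such as q=full_name:(john fisher) matches both terms in the same field, so a document containing both should score higher than one containing only "john".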
Thanks for the quick reply guys, I appreciate that.
From the explain output I was able to see that the search was indeed being performed on only one field.
I saw that it is possible to copy multiple fields into the same field for searching.
In the schema.xml I added the following:
<copyField source="last_name" dest="text"/>
The results now come up as expected when using more than one search term.

Solr hides some facet.fields when doing a distributed search

I am searching over 6 Solr shards (Solr version 3.5). What I noticed is that when I run the search on my normal standalone instance, which contains the same data, I get 2 facet_fields in the facet_counts section. This is what I expect:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="url">...</lst>
<lst name="url">...</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
As you can see, there are 2 facet_fields. When I run the same query using multiple shards (same data), I always get just one facet_field:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="url">...</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
I am also using tagging and excluding filters in my query. Could this be the problem?
Thanks to Yonik Seeley from the solr-user mailing list, the solution was to add output keys to the facets.
See also http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
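As an illustration of output keys, the same field can be faceted twice under different names with the key local parameter. This is a sketch written as request handler defaults. The tag name urlTag and both key names are made up for this example and are not from the original query:

```xml
<lst name="defaults">
  <str name="facet">true</str>
  <!-- Counts that ignore any filter tagged urlTag -->
  <str name="facet.field">{!ex=urlTag key=url_all}url</str>
  <!-- Counts with all filters applied -->
  <str name="facet.field">{!key=url_filtered}url</str>
</lst>
```

The matching filter would be sent with the same tag, for example fq={!tag=urlTag}url:somevalue. Each facet is then returned under its own key in facet_fields, so the two entries no longer share the name url.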

Resources