Solr conditional query fields (qf) - search

Is it possible to define query fields in Solr based on certain conditions? For example, I have three fields: text, title and product. The Solr config definition:
<str name="qf">text^0.5 title^10.0 Product</str>
What I'm looking for is to include "product" as a searchable field only when a certain condition is met, e.g. if author:"Tom", then search in Product as well.
Is there a way to do that at query time using edismax?
The alternative I have is to add the product information to either the text or the title of the document (where author=Tom) at index time so that it is searchable. But I'm trying to avoid this if possible.
Any pointers will be appreciated.
-Thanks

To search in different fields based on different conditions, you first have to search for those specific conditions, so it is more or less the same as issuing multiple queries.
That said, if it has to be done as a single query (e.g. to use out-of-the-box sorting, grouping or other Solr features), nested queries can be used.
To define two different conditions (as in the original question, but it can easily be extended with more OR clauses), the q parameter can receive the following value:
_query_:"{!edismax fq=$fq1 qf=$qf1 v=$condQuery}"
OR
_query_:"{!edismax fq=$fq2 qf=$qf2 v=$condQuery}"
The query uses Parameter Dereferencing, so there is no need to manually escape any special characters before passing the parameters to Solr.
fq1 - first special condition
qf1 - list of fields to search in for first special condition (fq1)
fq2 - second special condition
qf2 - list of fields to search in for second special condition (fq2)
condQuery - the actual search term/query
fq1 may be left empty in order to define a baseline (in this particular case, search in text and title, but not in Product).
The raw parameters themselves look like this:
fq1=&qf1=text^0.5 title^10.0&fq2=author:"Tom"&qf2=text^0.5 title^10.0 Product&condQuery=5
And the final query will be something like this:
http://localhost:8983/solr/collection1/select?q=_query_%3A%22%7B!edismax+fq%3D%24fq1+qf%3D%24qf1+v%3D%24condQuery%7D%22+OR+_query_%3A%22%7B!edismax+fq%3D%24fq2+qf%3D%24qf2+v%3D%24condQuery%7D%22&fl=*%2Cscore&wt=xml&indent=true&fq1=&qf1=text^0.5%20title^10.0&fq2=author:%22Tom%22&qf2=text^0.5%20title^10.0%20Product&condQuery=5
Or the same query as echoed back by Solr in its response (provided only to show it in a structured way):
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="q">_query_:"{!edismax fq=$fq1 qf=$qf1 v=$condQuery}" OR _query_:"{!edismax fq=$fq2 qf=$qf2 v=$condQuery}"</str>
<str name="condQuery">5</str>
<str name="indent">true</str>
<str name="fl">*,score</str>
<str name="fq1"/>
<str name="qf1">text^0.5 title^10.0</str>
<str name="fq2">author:"Tom"</str>
<str name="qf2">text^0.5 title^10.0 Product</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="..." start="..." maxScore="...">
...
</result>
</response>
Even though it works, I suggest considering the effect it has on query time (each condition runs as a separate internal search query) and measuring how it affects your specific case.
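For reference, here is a minimal Python sketch of how a client could assemble the parameters described above and let the standard library handle the URL encoding. The helper names (nested_clause, build_params, build_url) are made up for illustration; the host, collection, fields and parameter names are copied from the example in this answer. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Baseline field list from the question; Product is added only for the second condition.
BASE_FIELDS = "text^0.5 title^10.0"


def nested_clause(n):
    """Return one nested edismax clause that dereferences fqN, qfN and condQuery."""
    return '_query_:"{!edismax fq=$fq%d qf=$qf%d v=$condQuery}"' % (n, n)


def build_params(term):
    """Assemble the request parameters shown in the answer (hypothetical helper)."""
    return {
        "q": " OR ".join(nested_clause(n) for n in (1, 2)),
        "fq1": "",                         # empty baseline condition
        "qf1": BASE_FIELDS,
        "fq2": 'author:"Tom"',             # special condition
        "qf2": BASE_FIELDS + " Product",
        "condQuery": term,                 # the actual search term
        "fl": "*,score",
        "wt": "xml",
    }


def build_url(term, base="http://localhost:8983/solr/collection1/select"):
    """Return the full select URL; urlencode escapes quotes, spaces and braces."""
    return base + "?" + urlencode(build_params(term))


if __name__ == "__main__":
    print(build_url("5"))
```

Adding a third condition would mean one more nested clause plus its own fq3/qf3 pair, which also means one more internal query to measure.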

I haven't tried it myself, but it looks like this could be achieved using http://wiki.apache.org/solr/FunctionQuery#Boolean_Functions

Shamik,
I don't think there is an easy way to do this in Solr. Another thing to consider is managing these rules over time; it would be a nightmare for a large system.
If you really want to do something like this, maybe you can issue two calls to Solr to get the result set.

Related

SOLR: how to exclude fields globally?

I'm using Apache Solr for my site search. The website has a large number of pages, and each page has a boolean field called 'searchEnabled' that contains true or false. I want to exclude the disabled pages (those with searchEnabled set to false) from all search results. The site has a number of different searches.
I can use a filter query (fq) to exclude these pages. But my site uses a number of different searches with different queries, and I do not want to add the filter query to every search query across the website. Is there an easy way to exclude documents whose 'searchEnabled' field is set to false,
so that no Solr search will return the documents/pages where the field value is set to false?
In solrconfig.xml, you can add a parameter that will always be present for the request handler you're making your request against.
If you name the parameter list appends, its parameters will always be appended to the parameters given in the request.
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="appends">
<str name="fq">searchEnabled:true</str>
</lst>
</requestHandler>
This will always add a filter query to your requests behind the scenes that limits the result set to those documents that have searchEnabled set to true.

Solr: How to specify field relevancy/weight

I'm currently indexing data using Solr that consists of about 10 fields. When I perform a search I would like certain fields to be weighted higher. Could anyone help point me in the right direction?
For example, searching across all fields for a term such as "superman" should return hits in the "Title" field before the "Description" field.
I've found documentation on how to make one field score higher from the query, but I would prefer to set this in a configuration file or similar. The following would require all searches to specify the weight. Is it possible to specify this in the solr config file?
q=title:superman^2 description:superman
Try using qf with ExtendedDisMax. Your query would then look like this:
q=superman
While your config will look like:
<str name="qf">title^2 description</str>
You can get some working examples here
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field's importance in the query. For example, the query below:
qf="fieldOne^2.3 fieldTwo fieldThree^0.4"
Assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4. These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree."
Source: Apache Lucene
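To make the quoted description concrete, here is a small Python sketch that reads a qf string the way the quote describes it: each whitespace-separated entry is a field name, optionally followed by ^ and a boost factor. The function name parse_qf is made up for illustration, and the default of 1.0 for fields without an explicit boost is an assumption stated here rather than something the quote spells out. This is not Solr's own parser.

```python
def parse_qf(qf, default_boost=1.0):
    """Map each field in a qf string to its boost (illustrative helper, not Solr code)."""
    boosts = {}
    for token in qf.split():
        # "fieldOne^2.3" -> ("fieldOne", "^", "2.3"); "fieldTwo" -> ("fieldTwo", "", "")
        field, sep, boost = token.partition("^")
        boosts[field] = float(boost) if sep else default_boost
    return boosts


if __name__ == "__main__":
    print(parse_qf("fieldOne^2.3 fieldTwo fieldThree^0.4"))
```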
In your case, qf="title^100 description" may do the trick. If you're using Solr in a library, I'd love to chat.
By using edismax you can achieve what you're looking for.
Try adding these two settings to your request handler, replacing the field names with your own.
You can leave a particular field out completely if you don't want it searched.
<str name="defType">edismax</str>
<str name="qf">YourField^50 YourAnotherField^30 YetAnotherField</str>
The higher the boost value after the ^, the more weight matches in that field get.

solr - modeling multiple values on 1:n connection

I'm trying to model my DB using this example from the Solr wiki.
I have a table called item and a table called feature with the columns id, featureName and description.
Here is the updated XML (with featureName added):
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document>
<entity name="item" query="select * from item">
<entity name="feature" query="select description, featureName as features from feature where item_id='${item.ID}'"/>
</entity>
</document>
</dataConfig>
Now I get two lists in the xml element
<doc>
<arr name="featureName">
<str>number of miles in every direction the universal cataclysm was gathering</str>
<str>All around the Restaurant people and things relaxed and chatted. The</str>
<str>- Do we have... - he put up a hand to hold back the cheers, - Do we</str>
</arr>
<arr name="description">
<str>to a stupefying climax. Glancing at his watch, Max returned to the stage</str>
<str>air was filled with talk of this and that, and with the mingled scents of</str>
<str>have a party here from the Zansellquasure Flamarion Bridge Club from</str>
</arr>
</doc>
But I would like to see the lists combined (using XML attributes) so that I don't have to join the values.
Is it possible?
I wanted to suggest the ScriptTransformer, which gives you the flexibility to alter the data as needed, but it will not work in your case since it works at the row level.
You can always define an aggregation function for string concatenation in SQL (example), but you will potentially have performance issues.
If you were using an HTTP/XML data source, the solution would have been to use the flatten attribute.
Nevertheless, the search functionality will work as expected even if you end up with multi-valued fields. The downside is on the client, where you will have to combine the values before the presentation layer, which is not really a problem if you use some sort of pagination.
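As a rough idea of what that client-side step could look like, here is a Python sketch that pairs the two multi-valued fields by position. It assumes the document has already been parsed into a dict of lists and that both fields hold their values in the same order with none missing; that alignment is an assumption to verify against your own data. The function name pair_features is hypothetical.

```python
def pair_features(doc):
    """Pair featureName and description values by position (illustrative helper)."""
    names = doc.get("featureName", [])
    descriptions = doc.get("description", [])
    # Refuse to guess if the lists differ in length: the pairing would be unreliable.
    if len(names) != len(descriptions):
        raise ValueError("featureName and description are not aligned")
    return list(zip(names, descriptions))


if __name__ == "__main__":
    doc = {
        "featureName": ["color", "size"],
        "description": ["red", "large"],
    }
    for name, description in pair_features(doc):
        print(name, "-", description)
```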

Remove duplicates without considering position

Is there any Filter Factory that can be used to remove duplicates without considering positions?
I cannot use the RemoveDuplicatesTokenFilterFactory because it considers positions [stack].
I had a similar issue with lots of duplicate values within fields where I wanted them to be unique. The solution was to add a processor to the solrconfig.xml file. Below is the example. Every value for the fields listed will be unique. My field names are ingredient_substance, active_moiety ...
<updateRequestProcessorChain>
<processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
<lst name="fields">
<str>ingredient_substance</str>
<str>active_moiety</str>
<str>generic_medicine</str>
<str>inactive_ingredient_substance</str>
</lst>
</processor>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Solr - index JSON query string from database?

I would like to know if it is possible to index data that contains a JSON string so that the string is decoded and each JSON value is indexed as a separate field.
I am using the DIH to connect to a MySQL database and am able to index the individual columns.
The result would look like the following:
<response name="response" numFound="1" start="0" maxScore="2.7143538">
...
<result name="response" numFound="1" start="0" maxScore="2.7143538">
<doc>
<float name="score">2.7143538</float>
<str name="id">82</str>
<str name="name">jorge</str>
<str name="otherinfo">{"day":15,"year":1989,"month":"January"}</str>
</doc>
</result>
</response>
The problem is that "otherinfo" is a JSON string that I would like to decode and have something like the following in my index:
<response name="response" numFound="1" start="0" maxScore="2.7143538">
...
<result name="response" numFound="1" start="0" maxScore="2.7143538">
<doc>
<float name="score">2.7143538</float>
<str name="id">82</str>
<str name="name">jorge</str>
<str name="day">15</str>
<str name="year">1989</str>
<str name="month">January</str>
</doc>
</result>
</response>
Would this be possible to do at all with Solr?
Thanks in advance
I commented on this. I decided that I should answer instead.
The fix for your issue isn't at the Solr level. You shouldn't be storing your data this way in the DB to begin with. In the long run, it would be better to fix this problem there, as opposed to trying to hack this at the Solr indexing level.
Your question proves that someone, probably an end user, is interested in searching by this data. This implies that it should probably be stored in the database as an actual Date or Timestamp field so that it can be properly selected or sorted on.
I'm sure people won't like that this doesn't exactly answer your question, but someone needs to tell you this.
If you know your way around Java you could write your own, custom transformer that would handle your specific case.
Have you tried using DIH RegexTransformer to parse JSON?
I think that should be doable, especially if you have fixed json format (doesn't contain document in document in document in ...).
I've just noticed ScriptTransformer, which allows you to write your own parser. I think this is the way to go...
Is the otherinfo field in the DB a JSON string to start with?
You would need dynamic fields (docs, explanation) and client-side code to let Solr store data with an arbitrary schema.
You would need to define dynamic fields in your schema like:
dyn_string_*: store text as it is
dyn_text_*: store text and index it for search
etc
Then you will need to tell DIH to map DB fields to solr dynamic fields (pseudocode warning; sorry, but I am not familiar with DIH):
Select
day as dyn_number_day,
name as dyn_text_name
from
tablename
Edit
You do have the requirement to query into the data structure. This needs a schema-less datastore.
Document DBs like MongoDB offer exactly this functionality: store data in arbitrary fields that you determine at insert time, and run any kind of ad-hoc query on your data.
I am not aware of a request handler that can index your data that way. You can write code that periodically fetches updated (or added or removed) rows, decodes the JSON field and indexes the result into Solr.
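A minimal Python sketch of the decoding step might look like the following. It assumes each database row has already been fetched as a dict and that the JSON column is named otherinfo, as in the question; the function name flatten_row is made up for illustration. Fetching rows and sending the resulting documents to Solr are left out, since they depend on your client library.

```python
import json


def flatten_row(row, json_field="otherinfo"):
    """Return a flat document: regular columns plus the decoded JSON keys."""
    # Copy every regular column except the raw JSON string.
    doc = {key: value for key, value in row.items() if key != json_field}
    # Treat a missing or empty JSON column as an empty object.
    extra = json.loads(row.get(json_field) or "{}")
    for key, value in extra.items():
        # Keep the real column if a JSON key happens to use the same name.
        doc.setdefault(key, value)
    return doc


if __name__ == "__main__":
    row = {
        "id": "82",
        "name": "jorge",
        "otherinfo": '{"day":15,"year":1989,"month":"January"}',
    }
    print(flatten_row(row))
```

The resulting keys (day, year, month) still need matching fields in the Solr schema, for example through dynamic fields like the ones described above.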
I recommend a skinny data model, which stores attributes as properties independent of the current DB schema. I asked a question, 'Set intersection in MySQL: a clean way', a while back.
Recap: MongoDB and friends contain exactly the functionality you need. If you want relations and referential integrity, you can keep using RDBMS. If you still want that JSON thing, develop an active system that will parse it and index it to solr. But I recommend moving to a skinny data model, since you can get the same (conditions apply!) query capabilities that Solr gives you by SQL.
Exotic technology: Graph databases like Neo4j contain document database functionality (ad-hoc queries) and relations: a relation directly links one node to another, no joins involved. So it's just one step short of referential integrity.
