I'm using Apache Solr for my site search. The website has a large number of pages, and each page has a field called 'searchEnabled'. This is a boolean field containing true or false. I want to exclude the disabled pages from all search results (the site has a number of different searches) when the searchEnabled field is set to false.
I can use a filter query (fq) to exclude these pages, but my site uses a number of different searches with different queries, and I do not want to add the filter query to every search query across the website. Is there an easy way to exclude the indexed documents whose 'searchEnabled' field is set to false,
so that no Solr search will return the documents/pages where the field value is false?
You can add a parameter that will always be present to solrconfig.xml for the request handler you're querying.
By using the name appends for your parameter list, the parameter will always be appended to the other parameters given in the request.
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">searchEnabled:true</str>
  </lst>
</requestHandler>
This will always add a filter query to your requests behind the scenes that limits the result set to those documents that have searchEnabled set to true.
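As a quick sanity check (the collection name below is a placeholder), you can ask Solr to echo every effective parameter and confirm that the appended fq shows up in the response header:
http://localhost:8983/solr/yourcollection/select?q=*:*&echoParams=all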
I indexed data in Solr from a database and want to do a free-text search across all of the indexed columns.
I do not want to provide column names.
Add a catch-all copyField instruction (for example source="*" dest="_text_"). This will make sure that all content is copied into the _text_ field. Make your queries search against this field.
q=foo bar&qf=_text_
The _text_ field is usually already defined, but otherwise configure it as a text field.
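For reference, a minimal schema sketch of that catch-all setup (the field definition is only needed if _text_ does not already exist; the type name text_general is an assumption):
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="_text_"/>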
If you are using eDisMax or DisMax parser you can use the qf parameter to indicate what fields will be searched.
The general syntax (via query string parameters) is:
q="hello+world"&qf=field1+field2+field3&defType=edismax
You can set this value directly in your solrconfig.xml so that you don't have to pass it on every request. If you do, then your query will just be:
q="hello+world"&defType=edismax
Is it possible to define query fields in Solr based on certain conditions? For example, I have three fields: text, title, and product. The Solr config definition:
<str name="qf">text^0.5 title^10.0 Product</str>
What I'm looking for here is to include "product" as a searchable field only when a certain condition is met; for example, if author:"Tom", then search in product as well.
Is there a way to do that at query time using edismax?
The alternative I have is to add the product information to either the text or title field of the document (where author=Tom) at index time so it will be searchable. But I'm trying to avoid this if possible.
Any pointers will be appreciated.
-Thanks
In order to search different fields based on different conditions, you first need to search for those specific conditions, so it is more or less the same as issuing multiple queries.
That said, if there is a need to do it as a single query (e.g. to use out-of-the-box sorting/grouping/other Solr features), nested queries can be used.
For defining two different conditions (as in the original question, but it can easily be extended with more OR clauses), the q parameter can receive the following value:
_query_:"{!edismax fq=$fq1 qf=$qf1 v=$condQuery}"
OR
_query_:"{!edismax fq=$fq2 qf=$qf2 v=$condQuery}"
The query uses Parameter Dereferencing, so there is no need to manually escape any special characters before passing the parameters to solr.
fq1 - first special condition
qf1 - list of fields to search in for the first special condition (fq1)
fq2 - second special condition
qf2 - list of fields to search in for the second special condition (fq2)
condQuery - the actual search term/query
fq1 may be empty in order to define a baseline (in this particular case, search in text and title, but not in product).
The raw parameters themselves will look like this:
fq1=&qf1=text^0.5 title^10.0&fq2=author:"Tom"&qf2=text^0.5 title^10.0 Product&condQuery=5
And the final query will look something like this:
http://localhost:8983/solr/collection1/select?q=_query_%3A%22%7B!edismax+fq%3D%24fq1+qf%3D%24qf1+v%3D%24condQuery%7D%22+OR+_query_%3A%22%7B!edismax+fq%3D%24fq2+qf%3D%24qf2+v%3D%24condQuery%7D%22&fl=*%2Cscore&wt=xml&indent=true&fq1=&qf1=text^0.5%20title^10.0&fq2=author:%22Tom%22&qf2=text^0.5%20title^10.0%20Product&condQuery=5
... or the same query as returned by Solr in the response (provided only to show it in a structured way):
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">_query_:"{!edismax fq=$fq1 qf=$qf1 v=$condQuery}" OR _query_:"{!edismax fq=$fq2 qf=$qf2 v=$condQuery}"</str>
      <str name="condQuery">5</str>
      <str name="indent">true</str>
      <str name="fl">*,score</str>
      <str name="fq1"/>
      <str name="qf1">text^0.5 title^10.0</str>
      <str name="fq2">author:"Tom"</str>
      <str name="qf2">text^0.5 title^10.0 Product</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="..." start="..." maxScore="...">
    ...
  </result>
</response>
Even though it works, I suggest considering the effect it will have on query time (as each condition results in a separate internal search query) and measuring how it affects your specific case.
I didn't try it myself, but it looks like this could be achievable by using http://wiki.apache.org/solr/FunctionQuery#Boolean_Functions
Shamik,
I don't think there is an easy way to do this in Solr. One thing to consider is managing these rules over time, too; it would be a nightmare for a large system.
If you really wanted to do something like this, maybe you could issue two calls to Solr to get the result set.
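A sketch of what those two calls could look like using the fields from the question (the search term is a placeholder, and the two result sets would then be combined client-side):
q=something&defType=edismax&qf=text^0.5 title^10.0&fq=-author:"Tom"
q=something&defType=edismax&qf=text^0.5 title^10.0 Product&fq=author:"Tom"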
I have a set of indexed fields such as these:
submitted_form_2200FA17-AF7A-4E44-9749-79D3A391A1AF:true
submitted_form_2398389-2-32-43242423:true
submitted_form_54543-32SDf-3242340-32422:true
And I understand that it's possible to use wildcards in queries such as
submitted_form_2398389-2-32-43242423:t*e
What I'm trying to do is get "any" submitted form via something like:
submitted_form_*:true
Is this possible? Or will I have to do a chain of "OR"s on the known forms (which seems quite heavy)?
That's not the intended use of fields, I think. Field names aren't supposed to be the searchable values; field values are. Field names are supposed to be known a priori.
My suggestion is (if possible) to store the second part of the name as the field value, for instance: submitted_form:2398389-2-32-43242423. submitted_form would be the field known a priori, and the value could then be searched with a PrefixQuery.
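With that layout (a sketch; it assumes the single submitted_form field suggested above), the "any submitted form" case becomes an ordinary query:
submitted_form:[* TO *]
and a specific form can still be matched by prefix, e.g. submitted_form:2398389*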
Anyway, you could access the collection of field names using IndexReader.getFieldNames() in Lucene 3.x and this in Lucene 4.x. I wouldn't expect search performance there.
I'm currently indexing data using Solr that consists of about 10 fields. When I perform a search I would like certain fields to be weighted higher. Could anyone help point me in the right direction?
For example, searching across all fields for a term such as "superman" should return hits in the "Title" field before the "Description" field.
I've found documentation on how to make one field score higher in the query, but I would prefer to set this in a configuration file or similar. The following would require all searches to specify the weight. Is it possible to specify this in the Solr config file?
q=title:superman^2 description:superman
Try using qf with ExtendedDisMax; your query would then look like this:
q=superman
While your config will look like:
<str name="qf">title^2 description</str>
You can get some working examples here
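For reference, a sketch of where that line could sit in solrconfig.xml (the handler name and the use of defaults are assumptions):
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^2 description</str>
  </lst>
</requestHandler>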
The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field's importance in the query. For example, the query below:
qf="fieldOne^2.3 fieldTwo fieldThree^0.4"
Assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and gives fieldThree a boost of 0.4. These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree.
Source: Apache Lucene
In your case: qf="title^100 description" may do the trick - if you're using Solr in a library I'd love to chat.
By using edismax we can achieve what you're looking for.
Try adding these two parameters to your request handler, changing the field names as needed.
You can remove a particular field completely if you don't want it.
<str name="defType"> edismax </str>
<str name="qf"> YourField^50 YourAnotherField^30 YetAnotherField</str>
The higher the boost (^) value, the more priority that field gets.
I would like to know if it is possible to index data that contains a JSON string, decode it, and index each of the JSON values as separate fields.
I am using the DIH to connect to a MySQL database and am able to index the individual columns.
The result currently looks like the following:
<response name="response" numFound="1" start="0" maxScore="2.7143538">
...
<result name="response" numFound="1" start="0" maxScore="2.7143538">
<doc>
<float name="score">2.7143538</float>
<str name="id">82</str>
<str name="name">jorge</str>
<str name="otherinfo">{"day":15,"year":1989,"month":"January"}</str>
</doc>
</result>
</response>
The problem is that "otherinfo" is a JSON string that I would like to decode and have something like the following in my index:
<response name="response" numFound="1" start="0" maxScore="2.7143538">
...
<result name="response" numFound="1" start="0" maxScore="2.7143538">
<doc>
<float name="score">2.7143538</float>
<str name="id">82</str>
<str name="name">jorge</str>
<str name="day">15</str>
<str name="year">1989</str>
<str name="month">January</str>
</doc>
</result>
</response>
Would this be possible to do at all with Solr?
Thanks in advance
I commented on this, but decided that I should answer instead.
The fix for your issue isn't at the Solr level. You shouldn't be storing your data this way in the DB to begin with. In the long run, it would be better to fix this problem there, as opposed to trying to hack this at the Solr indexing level.
Your question proves that someone, probably an end user, is interested in searching by this data. This implies that it should probably be stored in the database as an actual Date or Timestamp field so that it can be properly selected or sorted on.
I'm sure people won't like that this doesn't exactly answer your question, but someone needs to tell you this.
If you know your way around Java, you could write your own custom transformer to handle your specific case.
Have you tried using the DIH RegexTransformer to parse the JSON?
I think that should be doable, especially if you have a fixed JSON format (i.e. it doesn't contain documents nested inside documents inside documents in ...).
I've just noticed the ScriptTransformer, which allows you to write your own parser. I think this is the way to go...
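A sketch of how a ScriptTransformer in DIH's data-config.xml could split the otherinfo JSON into the separate day, year and month fields (the connection details and table name are hypothetical, and JSON.parse assumes a script engine that provides it, e.g. Nashorn on Java 8):
<dataConfig>
  <!-- Hypothetical connection details; adjust for your database -->
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <script><![CDATA[
    // Decode the otherinfo JSON column and add its values as separate row entries
    function parseOtherInfo(row) {
      var json = row.get('otherinfo');
      if (json != null) {
        var info = JSON.parse(json);
        row.put('day', info.day);
        row.put('year', info.year);
        row.put('month', info.month);
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="page" transformer="script:parseOtherInfo"
            query="SELECT id, name, otherinfo FROM mytable">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>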
Is the otherinfo field in the DB a JSON string to start with?
You would need dynamic fields (docs, explanation) and client-side code to let Solr store data with an arbitrary schema.
You would need to define dynamic fields in your schema like the following (a concrete sketch follows this list):
dyn_string_*: store text as it is
dyn_text_*: store text and index it for search
etc.
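A sketch of those definitions in schema.xml (the type names string, text_general and int are assumptions and may differ in your schema):
<dynamicField name="dyn_string_*" type="string" indexed="false" stored="true"/>
<dynamicField name="dyn_text_*" type="text_general" indexed="true" stored="true"/>
<dynamicField name="dyn_number_*" type="int" indexed="true" stored="true"/>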
Then you will need to tell DIH to map DB columns to Solr dynamic fields (pseudocode warning; sorry, but I am not familiar with DIH):
SELECT
  day AS dyn_number_day,
  name AS dyn_text_name
FROM
  tablename
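A sketch of the corresponding DIH entity, assuming the aliased query above and relying on DIH matching result-set column names to Solr field names (the entity and data source names are hypothetical):
<entity name="item" dataSource="db"
        query="SELECT day AS dyn_number_day, name AS dyn_text_name FROM tablename"/>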
Edit
You do have a requirement to query into the data structure, which calls for a schema-less datastore.
Document DBs like MongoDB offer exactly this functionality: store data with arbitrary fields you determine at insert time, and run any kind of ad-hoc query on your data.
I am not aware of a request handler that can index your data for that. You can write code that periodically fetches updated (or added or removed) rows, decodes the JSON field, and indexes it to Solr.
I recommend a skinny data model to store attributes as properties independent of the current DB schema. I asked a question 'Set intersection in MySQL: a clean way' a while back.
Recap: MongoDB and friends provide exactly the functionality you need. If you want relations and referential integrity, you can keep using an RDBMS. If you still want that JSON thing, develop an active system that will parse it and index it to Solr. But I recommend moving to a skinny data model, since you can get the same (conditions apply!) query capabilities from SQL that Solr gives you.
Exotic technology: graph databases like Neo4j combine document-database functionality (ad-hoc queries) with relations: a relation directly links one node to another, no joins involved. So it's just one step short of referential integrity.