How to index a date without day & time in Apache Solr - search

In my database the dates look like 1973-01 and are stored as string values. How would I index them with Apache Solr?
I have written the below in my schema.xml:
<field name="pubdate" type="tdate" indexed="true" stored="true" multiValued="false" />
I have also tried changing all the dates to the form 1973-01Z, but I still get an error:
org.apache.solr.common.SolrException: Invalid Date in Date Math String:'1973-01Z'
I believe Solr only accepts dates like 1995-12-31T23:59:59Z.
Can anyone help?

In solrconfig.xml you can define the date formats your update request handler can process inside an updateRequestProcessorChain with the help of a ParseDateFieldUpdateProcessorFactory:
<updateRequestProcessorChain name="parse-field-types">
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
<processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
<processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<!-- A default time zone name or offset may optionally be specified for those
dates that don't include an explicit zone/offset.
-->
<str name="defaultTimeZone">Europe/Berlin</str>
<arr name="format">
<str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ssZ</str>
<str>yyyy-MM-dd HH:mm:ss Z</str>
<str>yyyy-MM-dd HH:mm:ss</str>
<str>yyyy-MM-dd HH:mm:ss 'UTC'</str>
</arr>
</processor>
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Then you have to connect the updateRequestProcessorChain to the update request handler:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">parse-field-types</str>
</lst>
</requestHandler>
You may be able to define a format here that matches your data.
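If changing the parser configuration is not an option, another approach is to normalize the values on the client before sending them to Solr. The sketch below is only an illustration and not part of the original answer: it assumes every value has the form YYYY-MM and that mapping it to the first instant of that month in UTC is acceptable for your data. The function name to_solr_date is hypothetical.

```python
from datetime import datetime, timezone


def to_solr_date(year_month: str) -> str:
    """Expand a 'YYYY-MM' string to a full UTC timestamp.

    Assumption: the missing day and time default to the first
    day of the month at midnight UTC.
    """
    # strptime fills in day=1 and time=00:00:00 when they are absent.
    parsed = datetime.strptime(year_month, "%Y-%m").replace(tzinfo=timezone.utc)
    # Emit the 1995-12-31T23:59:59Z shape mentioned in the question.
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_solr_date("1973-01"))
```

With this, 1973-01 becomes 1973-01-01T00:00:00Z, which has the same shape as the 1995-12-31T23:59:59Z example from the question. Note that the day and time are then invented values, so range queries should be written with that in mind.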

Related

How to get solr result's doc fields in str rather than arr?

I have made an index, secondCore {id, resid, title, name, cat, role, exp}. When I execute a query, the result fields in each doc are returned as arrays (<arr name="fid"><long>6767</long></arr>), but I want them to be plain values, the way id is returned (<str name="id">1</str>).
Where can I make the changes? I have multiple cores, and each core has a separate schema file (say server/solr/firstCore/conf/fcschema.xml and server/solr/secondCore/conf/scschema.xml). In the core.properties of each core, I have set the schema file name, e.g. schema=fcschema.xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">status:inbox</str>
<str name="_">1444301939167</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="3" start="0">
<doc>
<str name="id">1</str>
<arr name="fid">
<long>6767</long>
</arr>
<arr name="resid">
<long>384</long>
</arr>
<arr name="status">
<str>inbox</str>
</arr>
<long name="_version_">1514456876026167296</long></doc>
...
</result>
</response>
Entries in schema file:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="resid" type="int" indexed="true" stored="true" multiValued="false" />
<field name="title" type="string" indexed="true" stored="true" multiValued="false" />
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="cat" type="string" indexed="true" stored="true" multiValued="true" />
<field name="role" type="string" indexed="true" stored="true" multiValued="true" />
<field name="exp" type="float" indexed="true" stored="true" multiValued="false" />
So I wanted to ask:
1. Where can I make the changes to get results as strings rather than arrays?
2. How can I verify that my core is using the specified schema file?
3. To find the docs whose status is inbox filter, I have to query status:"inbox search" exactly, but I want these docs returned when I search for status:inbox or status:filter. How can I do that? I think this will be solved once the first problem is.
4. This is unrelated to the topic, but where can I set the default output format to XML rather than JSON? I tried in solrconfig.xml but could not get it to work.
PS: I restarted Solr after every change to any of the XML files, and I'm using solr-5.3.
Please feel free to ask for clarification in case the question is unclear. Thanks in advance. :)
Although I had made changes in schema.xml, I noticed that they were not being reflected. Later I learned that Solr 5.3.x implicitly creates managed-schema.xml, and editing that file resolved all my questions. Check here:
Why is solr returning result with only exact search?
But problem #4 is still pending. I have tried <str name="wt">xml</str> and also declared the response writer <queryResponseWriter name="xml" class="solr.XMLResponseWriter" />, but that did not resolve it, and neither did adding default="true". Can anyone offer a suggestion?
I had the same issue today: I was migrating from SOLR 4.x to 5.x and, after dumping the data, suddenly saw that all of the objects had their values nested inside arrays. Not being sure whether the issue was with Haystack or the load script, I tried inserting a few new records via the SOLR dashboard. Same thing, but I noticed a few SOLR-specific fields were loading fine.
This bug seems to be related to the field type that you specify. "tstrings" (I believe this is the default via Haystack) will make the stored data nested inside arrays, but the "string" type works just fine. Below is an example of a field specification that allowed me to go from values that were arrays to string values.
<field name="external_id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
It seems the Haystack schema.xml generator needs some work to adapt to new conventions with Solr 5.x.
It took some time, but the best way I found to fix all of my fields was to insert a JSON record and check whether each field came in with the correct format. Go one by one until they're all working properly.
If I find some time I'll look at Haystack's SOLR schema generator and see what might have changed.
Hope this helps someone!
I had the same problem while migrating from 4.9 to 6.x. I noticed that fields defined as text_general returned data as an array. The same field returned a string in Solr 4.9. Interestingly, some fields were not converted to arrays in Solr 6.x. I did not use the "managed-schema"; I was using the classic schema.xml.
To solve the problem I took the schema.xml from Solr 4.9 and moved it to the conf/ directory of my new Solr core, so all the field definitions came from Solr 4.9. I used the solrconfig.xml from Solr 6.x but disabled the updateRequestProcessorChain, as I am not going to use "field guessing" and the like. Once I restarted Solr and reindexed the content, the problem was solved: no data element is returned as an array unless it's a multi-valued field.

Solr - Searching Multiple Fields

I'm trying to allow a global search across all fields defined in my solr schema.xml. I have the following field:
<field name="catchall"
type="text_en_splitting"
stored="true"
indexed="true"
multiValued="true" />
Then, I have:
<copyField source="*" dest="catchall"/>
<defaultSearchField>catchall</defaultSearchField>
However, when I search without specifying a field, it only searches this field:
<field name="text" type="text_en_splitting" multiValued="false"/>
Is my configuration missing something to search across all fields? Here's an example of the field that is not being included in the default search:
<field name="summary" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
I think that I figured out the issue. Apparently with Solr 3.6.1, the default search field is specified in solrconfig.xml rather than in the schema.xml. In solrconfig.xml, I changed the element value from text to catchall.
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">catchall</str>
</lst>
</requestHandler>

Solr Ping query caused exception: undefined field text

I'm trying to do some work on my server but am running into problems. When I try to ping the server through the admin panel I get this error, which I believe might be causing the problem:
The server encountered an internal error (Ping query caused exception: undefined field text
org.apache.solr.common.SolrException: Ping query caused exception: undefined field text
at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:76)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
Can anyone give me a bit of guidance as to what might be going wrong? I'm using Solr 3.6. I think it may have to do with the "text" field defined in the schema.xml?
This is my schema currently: https://gist.github.com/3689621
Any help would be much appreciated.
James
Based on the error, I am guessing that the query that is defined in the /admin/ping requestHandler is searching against a field named text, which you do not have defined in your schema.
Here is a typical ping requestHandler section
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
<lst name="invariants">
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="qt">standard</str>
<str name="echoParams">all</str>
<str name="df">text</str>
</lst>
</requestHandler>
Note the <str name="df">text</str> setting. This is the default field that the ping will execute the search against. You should change it to a field that is defined in your schema, perhaps title or description.
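To spot this kind of mismatch before Solr reports it, the two files can be compared offline. The following is a rough sketch under several assumptions: it only looks at explicit field elements (dynamicField patterns are ignored), it only inspects str elements named df inside requestHandler elements, and the function name undefined_default_fields is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def undefined_default_fields(schema_xml: str, solrconfig_xml: str):
    """Return (handler name, df value) pairs whose df is not a schema field.

    Assumption: only explicit <field> elements count as defined;
    <dynamicField> patterns are not evaluated in this sketch.
    """
    schema = ET.fromstring(schema_xml)
    defined = {field.get("name") for field in schema.iter("field")}

    missing = []
    config = ET.fromstring(solrconfig_xml)
    for handler in config.iter("requestHandler"):
        # Look for <str name="df">...</str> anywhere inside the handler.
        for df in handler.findall(".//str[@name='df']"):
            if df.text not in defined:
                missing.append((handler.get("name"), df.text))
    return missing
```

Called with the text of schema.xml and solrconfig.xml, it would report something like ("/admin/ping", "text") for the situation described in the question, assuming no field named text exists in the schema.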
Add this line to your schema.xml:
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

Sorting by custom score in Solr does not sort consistently

I assign a custom "popularity" score for each document in my Solr database. I want search results to be ordered by this custom "score" field rather than the built-in relevancy score that is the default.
First I define my score field:
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<field name="score" type="sint" stored="true" multiValued="false" />
Then I rebuild the index, inserting a score for each document.
To run a query, I use something like this:
(text:hello)+_val_:"score"
Now I would expect the documents to come back sorted by the "score" field, but what I get instead is:
<doc>
<int name="score">566</int>
<str name="text">SF - You lost me at hello...</str>
</doc>
<doc>
<int name="score">41</int>
<str name="text">hello</str>
</doc>
<doc>
<int name="score">77</int>
<str name="text">
CAGE PAGE-SAY HELLO (MIKE GOLDEN's Life Is Bass Remix)-VIM
</str>
</doc>
<doc>
<int name="score">0</int>
<str name="text">Hello Hello Hello</str>
</doc>
Notice that the scores come back out of order: 566, 41, 77, 0. The weird thing is that it only sorts this way with certain queries. I'm not sure what the pattern is, but so far I've only seen the bad sorting when scores of "0" come back in the search results.
I've tried IntField instead of SortableIntField, and I've tried putting "sort=score desc" as a query parameter, with no change in behavior.
Am I doing something wrong, or just misunderstanding the meaning of using _val_:"score" in my query?
EDIT: I tried renaming the "score" field to "popularity" and got the same result.
The score field is used by Solr internally, so it may not be good practice to define a field with the same name.
You can try defining a field with a different name, and both of the options you mentioned should work fine.
Edit - This is what i have and works fine (Solr 3.3)
Schema -
Field Type -
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
Field -
<field name="popularity" type="int" indexed="true" stored="true" />
Data -
<add>
<doc>
<field name="id">1007WFP</field>
<field name="popularity">566</field>
<field name="text">SF - You lost me at hello...</field>
</doc>
<doc>
<field name="id">2007WFP</field>
<field name="popularity">41</field>
<field name="text">hello</field>
</doc>
<doc>
<field name="id">3007WFP</field>
<field name="popularity">77</field>
<field name="text">
CAGE PAGE-SAY HELLO (MIKE GOLDEN's Life Is Bass Remix)-VIM
</field>
</doc>
<doc>
<field name="id">4007WFP</field>
<field name="popularity">0</field>
<field name="text">Hello Hello Hello</field>
</doc>
</add>
Query -
http://localhost:8983/solr/select?q=*:*&sort=popularity%20desc
Results :-
<result name="response" numFound="4" start="0">
<doc>
<str name="id">1007WFP</str>
<int name="popularity">566</int>
</doc>
<doc>
<str name="id">3007WFP</str>
<int name="popularity">77</int>
</doc>
<doc>
<str name="id">2007WFP</str>
<int name="popularity">41</int>
</doc>
<doc>
<str name="id">4007WFP</str>
<int name="popularity">0</int>
</doc>
</result>
The _val_ hack actually ADDS the "popularity" field to the normally computed score of solr.
So, if you have popularity=41 on document A and popularity=77 on document B, but document A scores more than 36 points better than B for the keyword "hello", then they'll get sorted with A before B.
Use the "sort" parameter (as you did), which completely overrides the normal sorting by score.
An alternative way could be to use a filter query (parameter fq instead of q), that filters matching document without computing any score, and then use _val_ to define your scoring formula. Since with filter queries all retrieved documents will have a score of zero, _val_ would be unaffected and behave as you originally expected.
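The additive behaviour described above can be sketched in a few lines. This is a simplified model, not Solr's actual scoring code: the relevancy numbers are invented for illustration, and real Solr scores involve further factors that are left out here. The function name rank is hypothetical.

```python
def rank(docs, use_sort_param=False):
    """Order documents given as (doc_id, relevancy, popularity) tuples.

    use_sort_param=False models q=(text:hello) _val_:"popularity",
    where popularity is ADDED to the relevancy score.
    use_sort_param=True models sort=popularity desc, where relevancy
    is ignored for ordering.
    """
    if use_sort_param:
        key = lambda doc: doc[2]
    else:
        key = lambda doc: doc[1] + doc[2]
    return [doc[0] for doc in sorted(docs, key=key, reverse=True)]


# Hypothetical relevancy values: A matches "hello" far better than B.
docs = [("A", 40.0, 41), ("B", 1.0, 77)]

if __name__ == "__main__":
    print(rank(docs))                       # A first: 81.0 beats 78.0
    print(rank(docs, use_sort_param=True))  # B first: 77 beats 41
```

Document A wins in the first case because its relevancy lead (39 points) is larger than B's popularity lead (36 points), which is exactly the situation the answer describes.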

facet dynamic fields with apache solr

I have defined a dynamic field in Apache Solr:
I use it to store product features like color_feature, diameter_feature, material_feature and so on. The number of these fields is not constant because the products change.
Is it possible to get facet results for all these dynamic fields with the same query, or do I always need to list every field in the query, like ... facet.field=color_feature&facet.field=diameter_feature&facet.field=material_feature&facet.field=...
Solr currently does not support wildcards in the facet.field parameter.
So *_feature won't work for you.
You may want to check on this: https://issues.apache.org/jira/browse/SOLR-247
If you don't want to pass parameters, you can easily add these to your request handler defaults.
Requests made with qt=requesthandler would then always include these facets.
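Since the field names have to be listed one by one, the client can generate the repeated facet.field parameters from whatever list of feature fields it already knows. The following is a minimal sketch under that assumption; how the list of names is obtained is not covered, and the function name build_facet_query is hypothetical.

```python
from urllib.parse import urlencode


def build_facet_query(q, feature_fields):
    """Build a query string with one facet.field parameter per field.

    Assumption: feature_fields is a list of concrete field names such as
    'color_feature'; wildcards like '*_feature' are not expanded.
    """
    params = [("q", q), ("facet", "true")]
    # A list of tuples lets the same parameter name appear several times.
    params += [("facet.field", name) for name in feature_fields]
    return urlencode(params)


if __name__ == "__main__":
    print(build_facet_query("hello", ["color_feature", "diameter_feature"]))
```

The resulting string can be appended to the select URL of the core. When the set of features changes, only the list passed in needs to change.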
I was in a similar situation when working on an e-commerce platform. Each item had static fields (Price, Name, Category) that easily mapped to SOLR's schema.xml, but each item could also have a dynamic amount of variations.
For example, a t-shirt in the store could have Color (Black, White, Red, etc.) and Size (Small, Medium, etc.) attributes, whereas a candle in the same store could have a Scent (Pumpkin, Vanilla, etc.) variation. Essentially, this is an entity-attribute-value (EAV) relational database design used to describe some features of the product.
Since the schema.xml file in SOLR is flat from the perspective of faceting, I worked around it by munging the variations into a single multi-valued field ...
<field
name="variation"
type="string"
indexed="true"
stored="true"
required="false"
multiValued="true" />
... shoving data from the database into these fields as Color|Black, Size|Small, and Scent|Pumpkin ...
<doc>
<field name="id">ITEM-J-WHITE-M</field>
<field name="itemgroup.identity">2</field>
<field name="name">Original Jock</field>
<field name="type">ITEM</field>
<field name="variation">Color|White</field>
<field name="variation">Size|Medium</field>
</doc>
<doc>
<field name="id">ITEM-J-WHITE-L</field>
<field name="itemgroup.identity">2</field>
<field name="name">Original Jock</field>
<field name="type">ITEM</field>
<field name="variation">Color|White</field>
<field name="variation">Size|Large</field>
</doc>
<doc>
<field name="id">ITEM-J-WHITE-XL</field>
<field name="itemgroup.identity">2</field>
<field name="name">Original Jock</field>
<field name="type">ITEM</field>
<field name="variation">Color|White</field>
<field name="variation">Size|Extra Large</field>
</doc>
... so that when I tell SOLR to facet, then I get results that look like ...
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="variation">
<int name="Color|White">2</int>
<int name="Size|Extra Large">2</int>
<int name="Size|Large">2</int>
<int name="Size|Medium">2</int>
<int name="Size|Small">2</int>
<int name="Color|Black">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
... so that my code that parses these results to display to the user can just split on my | delimiter (assuming that neither my keys nor values will have a | in them) and then group by the keys ...
Color
White (2)
Black (1)
Size
Extra Large (2)
Large (2)
Medium (2)
Small (2)
... which is good enough for government work.
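The split-and-group step described above could look roughly like this. It is a sketch rather than the code I actually used: it assumes the facet counts have already been parsed into a dict of label to count, and the function name group_facets is made up for the example.

```python
from collections import defaultdict


def group_facets(counts, delimiter="|"):
    """Group 'Key|Value' facet labels by key.

    Assumption: neither keys nor values contain the delimiter.
    Returns {key: [(value, count), ...]} with the highest counts first.
    """
    grouped = defaultdict(list)
    for label, count in counts.items():
        # partition splits on the first delimiter only.
        key, _, value = label.partition(delimiter)
        grouped[key].append((value, count))
    for values in grouped.values():
        # Sort by descending count, then alphabetically for stable output.
        values.sort(key=lambda item: (-item[1], item[0]))
    return dict(grouped)


if __name__ == "__main__":
    facets = {"Color|White": 2, "Size|Large": 2, "Color|Black": 1}
    for key, values in group_facets(facets).items():
        print(key)
        for value, count in values:
            print(f"{value} ({count})")
```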
One disadvantage of doing it this way is that you'll lose the ability to do range facets on this EAV data, but in my case, that didn't apply (the Price field applying to all items and thus being defined in schema.xml so that it can be faceted in the usual way).
Hope this helps someone!

Resources