Solr Ping query caused exception: undefined field text

I'm trying to do some work on my server but am running into problems. When I try to ping the server through the admin panel I get this error, which I believe might be the cause:
The server encountered an internal error (Ping query caused exception: undefined field text
org.apache.solr.common.SolrException: Ping query caused exception: undefined field text
    at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:76)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    ...
Can anyone give me a bit of guidance as to what might be going wrong? I'm using Solr 3.6. I think it may have to do with the "text" field defined in schema.xml?
This is my schema currently: https://gist.github.com/3689621
Any help would be much appreciated.
James

Based on the error, I am guessing that the query that is defined in the /admin/ping requestHandler is searching against a field named text, which you do not have defined in your schema.
Here is a typical ping requestHandler section:
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
<lst name="invariants">
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="qt">standard</str>
<str name="echoParams">all</str>
<str name="df">text</str>
</lst>
</requestHandler>
Note the <str name="df">text</str> setting. This is the default field the ping query is executed against. You should change it to a field that is defined in your schema, perhaps title or description, based on your schema.
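A minimal sketch of the same handler with the default field switched, assuming your schema defines a field called title (one of the suggestions above; substitute any indexed field your schema actually has):

```xml
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <!-- The ping always runs this query -->
    <str name="q">solrpingquery</str>
  </lst>
  <lst name="defaults">
    <str name="qt">standard</str>
    <str name="echoParams">all</str>
    <!-- Assumption: "title" is defined and indexed in schema.xml -->
    <str name="df">title</str>
  </lst>
</requestHandler>
```

Reload the core (or restart Solr) after editing solrconfig.xml. Another option is to set q to *:*, which matches all documents and does not depend on df at all.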

Add this line to your schema.xml:
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
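Declaring the field makes the ping query valid, but the field stays empty unless something is copied into it. The Solr example schema fills its text field with copyField rules; a sketch, assuming your schema has title and description fields (hypothetical names here, use your own):

```xml
<!-- Assumed source fields: copy each field you want searchable by default into "text" -->
<copyField source="title" dest="text"/>
<copyField source="description" dest="text"/>
```

copyField is applied at index time, so documents indexed before the change have to be reindexed before they show up in searches on text.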

Related

How to index a date without day & time in Apache Solr

In my database the dates look like 1973-01. They are stored as string values. If I have to index them using Apache Solr, how would I do it?
I have written the below in my schema.xml:
<field name="pubdate" type="tdate" indexed="true" stored="true" multiValued="false" />
I have also changed all the dates to the form 1973-01Z, but I am still getting an error:
org.apache.solr.common.SolrException: Invalid Date in Date Math String:'1973-01Z'
I believe Solr only accepts dates like 1995-12-31T23:59:59Z.
Can anyone help?
In solrconfig.xml you can define the date formats your update request handler can process inside an updateRequestProcessorChain with the help of a ParseDateFieldUpdateProcessorFactory:
<updateRequestProcessorChain name="parse-field-types">
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
<processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
<processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<!-- A default time zone name or offset may optionally be specified for those
dates that don't include an explicit zone/offset.
-->
<str name="defaultTimeZone">Europe/Berlin</str>
<arr name="format">
<str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ssZ</str>
<str>yyyy-MM-dd HH:mm:ss Z</str>
<str>yyyy-MM-dd HH:mm:ss</str>
<str>yyyy-MM-dd HH:mm:ss 'UTC'</str>
</arr>
</processor>
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Then connect the updateRequestProcessorChain to the update request handler:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">parse-field-types</str>
</lst>
</requestHandler>
You may be able to define a format here that works for your data.
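Following that suggestion, one option (untested here) is a yyyy-MM pattern, either added to the format list above or, as sketched below, in a dedicated processor placed before RunUpdateProcessorFactory. It assumes a release whose parser is Joda-Time based (Solr 4.x through 6.x), where fields missing from the pattern fall back to 1970-01-01T00:00, so 1973-01 should be indexed as 1973-01-01T00:00:00Z. The UTC time zone is also an assumption.

```xml
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <!-- Only touch the pubdate field from the question -->
  <str name="fieldName">pubdate</str>
  <!-- Assumption: the source dates carry no zone, so treat them as UTC -->
  <str name="defaultTimeZone">UTC</str>
  <arr name="format">
    <!-- Matches values like 1973-01; day and time are filled in by the parser -->
    <str>yyyy-MM</str>
  </arr>
</processor>
```

With this pattern the values would be sent unchanged (1973-01), without the Z suffix, since the pattern contains no zone.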

Solr can not process rich documents but already fetched

Recently I have been trying to use Solr to search rich document files (e.g. .pdf, .doc, .xls, etc.).
When I try to import all the files from disk using the Solr admin UI (localhost:18983/solr/#/local.info/dataimport//dataimport), the message always shows "Index Completed", but no documents are added or updated.
Data Import Messages Screenshot
I have also followed the official online guide for indexing a directory of rich files (lucene.apache.org/solr/quickstart.html#indexing-a-directory-of-rich-files).
It produced these error messages:
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: localhost:8983/solr/local.info/update/extract?resource.name=%2Fvar%2Fsolr%2Fdata%2Flocal.info%2Frich_documents%2FNEWS.PDF&literal.id=%2Fvar%2Fsolr%2Fdata%2Flocal.info%2Frich_documents%2FNEWS.PDF
SimplePostTool: WARNING: Response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">71</int>
</lst>
<lst name="error">
<str name="msg">Invalid UUID String: '/var/solr/data/local.info/rich_documents/NEWS.PDF'</str>
<int name="code">400</int></lst>
</response>
Here are my configs (data-config.xml, solrconfig.xml, schema.xml):
Configs Link
Does anyone have an idea how to fix this problem?
Thanks
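The 400 says the value sent as literal.id, the file path, is not a valid UUID, which suggests the uniqueKey field id is declared with a UUID field type in the linked schema.xml. The configs are not reproduced here, so that is an assumption; if it holds, a sketch of declaring id as a plain string instead:

```xml
<!-- Assumption: the original schema declared id with a solr.UUIDField-based type -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>
```

Drop the fieldType line if your schema already defines string. After changing the schema, reload the core and index the files again.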

How to get solr result's doc fields in str rather than arr?

I have created an index, secondCore, with the fields {id, resid, title, name, cat, role, exp}. When I execute a query, the fields in each result doc are returned as arrays (<arr name="fid"><long>6767</long></arr>), but I want them returned as single values, the way id is (<str name="id">1</str>).
Where can I make the change? I have multiple cores, and each core has a separate schema file (say server/solr/firstCore/conf/fcschema.xml and server/solr/secondCore/conf/scschema.xml). In each core's core.properties I have set the schema file name, e.g. schema=fcschema.xml.
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">status:inbox</str>
<str name="_">1444301939167</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="3" start="0">
<doc>
<str name="id">1</str>
<arr name="fid">
<long>6767</long>
</arr>
<arr name="resid">
<long>384</long>
</arr>
<arr name="status">
<str>inbox</str>
</arr>
<long name="_version_">1514456876026167296</long></doc>
...
</result>
</response>
Entries in schema file:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="resid" type="int" indexed="true" stored="true" multiValued="false" />
<field name="title" type="string" indexed="true" stored="true" multiValued="false" />
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="cat" type="string" indexed="true" stored="true" multiValued="true" />
<field name="role" type="string" indexed="true" stored="true" multiValued="true" />
<field name="exp" type="float" indexed="true" stored="true" multiValued="false" />
So I wanted to ask:
Where can I do the changes to get result in string rather than array?
How can I verify that, my core is using specified schema file?
To find docs whose status is "inbox filter", I currently have to search for exactly status:"inbox search", but I want this doc to match when I search for status:inbox or status:filter. How can I do that? I think this problem will be solved once the first one is.
Although this is unrelated to the topic: where can I set the default output format to XML rather than JSON? I tried in solrconfig.xml but couldn't get it to work.
PS: I restarted Solr after every change to any of the XML files, and I'm using Solr 5.3.
Please feel free to ask for clarification if the question is unclear. Thanks in advance. :)
Although I had made changes in schema.xml, I noticed they were not being reflected. Later I learned that Solr 5.3.x implicitly creates a managed-schema file, and editing that solved all my problems. Check here:
Why is solr returning result with only exact search?
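If you would rather keep editing your own schema file, solrconfig.xml can switch a core back to the classic, hand-edited schema. A sketch for Solr 5.x; with it, the file named by schema= in core.properties (default schema.xml) should be the one loaded:

```xml
<!-- solrconfig.xml: load the hand-edited schema file instead of managed-schema -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

The field-guessing update chain (add-unknown-fields-to-the-schema) needs a mutable managed schema, so it has to be disabled as well if your solrconfig.xml uses it. To check which fields a core actually loaded, the Schema API endpoint /solr/secondCore/schema/fields lists them.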
But problem #4 is still pending. I have tried <str name="wt">xml</str> and also added a response writer, <queryResponseWriter name="xml" class="solr.XMLResponseWriter" />, but couldn't resolve it. Adding default="true" didn't help either. Can anyone offer a suggestion?
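One likely reason a wt default appears to have no effect: the Admin UI query form sends wt=json explicitly, and request parameters override a handler's defaults. A sketch that forces XML through invariants, assuming you query the /select handler (requests that omit wt already get XML on 5.x):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
  <!-- invariants cannot be overridden by the request, so a wt=json parameter is ignored -->
  <lst name="invariants">
    <str name="wt">xml</str>
  </lst>
</requestHandler>
```

The trade-off is that clients can no longer ask this handler for JSON at all; putting wt in defaults instead keeps that option for requests that pass their own wt.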
I had the same issue today: while migrating from SOLR 4.x to 5.x, I noticed after loading the data that all of the objects had their values nested inside arrays. Not being sure whether the issue was with Haystack or the load script, I tried inserting a few new records via the SOLR dashboard. Same thing, but I noticed a few SOLR-specific fields were loading fine.
This seems to be related to the field type you specify. "tstrings" (I believe this is the default via Haystack) stores the data nested inside arrays, but the "string" type works just fine. Below is an example of a field specification that let me go from array values to string values.
<field name="external_id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
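The field above uses type string. In the Solr 5.x data-driven managed-schema, string and the plural strings differ only in multiValued, roughly as below (a sketch; check the definitions in your own schema):

```xml
<!-- Single-valued: stored values come back as <str name="..."> -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- Multi-valued: the same values come back wrapped in <arr name="..."> -->
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"/>
```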
It seems the Haystack schema.xml generator needs some work to adapt to new conventions with Solr 5.x.
It took some time, but the best way I found to fix all of my fields was insert a JSON record and check whether each field came in with the correct format. Go one by one until they're all working properly.
If I find some time I'll look at Haystack's SOLR schema generator and see what might have changed.
Hope this helps someone!
I had the same problem while migrating from 4.9 to 6.x. I noticed fields defined as text_general returned data as an array, while the same fields returned strings in Solr 4.9. Interestingly, some fields were not converted to arrays in Solr 6.x. I did not use the "managed-schema"; I was using the classic schema.xml.
To solve the problem, I took the schema.xml from Solr 4.9 and moved it to the conf/ directory of my new Solr core, so all the field definitions came from Solr 4.9. I used the solrconfig.xml from Solr 6.x but disabled the updateRequestProcessorChain, as I am not going to use "field guessing" etc. Once I restarted Solr and reindexed the content, the problem was solved: no data element is returned as an array unless it's a multi-valued field.
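For reference, the data-driven solrconfig.xml shipped with Solr 5.x/6.x attaches the field-guessing chain to the update handlers through initParams, roughly as below; removing or commenting out the update.chain default is one way to disable it (a sketch; compare with the block in your own solrconfig.xml):

```xml
<initParams path="/update/**">
  <lst name="defaults">
    <!-- Field guessing disabled; without this default /update uses the default chain -->
    <!-- <str name="update.chain">add-unknown-fields-to-the-schema</str> -->
  </lst>
</initParams>
```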

Trying to implement scoped autosuggestions with solr

I am trying to implement scoped autosuggestions like those on e-commerce websites such as Amazon.
For example,
if I type Lego, the suggestions should look like:
Legolas in Names
Lego in Toys
where Names and Toys are Solr field names.
The closest help I found is this discussion:
solr autocomplete with scope is it possible?
which told me that it isn't possible with the suggester I am currently using.
So far, using the suggester, I am able to get autosuggestions from a single Solr field [the autosuggest field, following the guidelines in the suggester documentation].
Any ideas or links to help me with this?
Update
I tried to achieve autosuggestions using facets. My query looks something like:
http://localhost:8983/solr/core1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=field1&facet.field=field2&facet.prefix=i
This gives me all the facet values starting with the letter 'i' for field1 and field2.
This gave me the idea.
Any comments?
I am assuming you are storing the Names or Toys label in a field; let's call it category.
You can configure the payloadField parameter in the searchComponent definition and pass the category data into it. Later in the application when you receive the suggestion results from solr, show first suggestion from each category or which ever strategy suits better for your use case.
You can find more information in the Solr Suggester documentation.
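A minimal sketch of that setup, assuming a stored field suggest_text holding the suggestion text, a stored category field holding the scope label (Names, Toys) and a popularity field for weights; all three names are hypothetical:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">scopedSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <!-- Hypothetical stored field with the text to suggest -->
    <str name="field">suggest_text</str>
    <!-- Hypothetical numeric field used to rank suggestions -->
    <str name="weightField">popularity</str>
    <!-- Hypothetical stored field; its value ("Names", "Toys") is returned as each suggestion's payload -->
    <str name="payloadField">category</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```

Each suggestion in the response then carries its category, which the application can use to group suggestions by scope.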
Suggester component seems useful but in payload field, one can only return a single field which may not satisfy many of the use cases.
With facet prefixing, you cannot get suggestions from a word in the middle of a value. So "Lego" will suggest a product whose name field value is "Legolas Sample" but not one whose value is "Sample Legolas".
A third way to implement autosuggest is to use an index analyzer with an EdgeNGramFilterFactory layer and then search on the required prefix.
So the Solr schema would look like:
<field name="names" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="toys" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="names_ngram" type="text_suggest_ngram" multiValued="false" indexed="true" stored="false"/>
<field name="toys_ngram" type="text_suggest_ngram" multiValued="false" indexed="true" stored="false"/>
and the field type would be defined as:
<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="2"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
and these _ngram fields would be populated with copyField:
<copyField source="names" dest="names_ngram"/>
<copyField source="toys" dest="toys_ngram"/>
Once you have reindexed your data, querying for "Lego" will return results from both "Sample Legolas" and "Legolas Sample". However, if you need to categorize these results according to the n fields they matched, that takes n different queries, which is usually not a problem.
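Each of those n queries could go through a dedicated handler per scope. A sketch for the Names scope, with a hypothetical handler name; the Toys handler would use toys_ngram and toys instead:

```xml
<requestHandler name="/suggest_names" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Match the typed prefix against the edge n-grams built at index time -->
    <str name="qf">names_ngram</str>
    <!-- Return the stored original value for display -->
    <str name="fl">names</str>
    <int name="rows">5</int>
  </lst>
</requestHandler>
```

A request such as /solr/core1/suggest_names?q=Lego would then return names values containing a word that starts with "lego".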
You can add multiple suggesters to the suggest component, one for each field. E.g.:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">namesSuggester</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">Names</str>
<str name="weightField">Popularity</str>
<str name="indexPath">namesSuggesterIndexDir</str>
<str name="suggestAnalyzerFieldType">suggester</str>
</lst>
<lst name="suggester">
<str name="name">toysSuggester</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">Toys</str>
<str name="weightField">Popularity</str>
<str name="indexPath">toysSuggesterIndexDir</str>
<str name="suggestAnalyzerFieldType">suggester</str>
</lst>
</searchComponent>
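Both suggesters can then be queried in one request by listing both dictionaries. A sketch of a matching handler, modeled on the /suggest handler in the Solr reference guide:

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">5</str>
    <!-- Query both suggesters; the response groups suggestions by dictionary name -->
    <str name="suggest.dictionary">namesSuggester</str>
    <str name="suggest.dictionary">toysSuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Each suggester's index has to be built once (for example with suggest.build=true on a request) before it returns anything, and the suggester field type referenced by suggestAnalyzerFieldType must exist in the schema.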

Solr - identical search result scores for multiple search terms?

I would like to know how to get different scores for a multiple-term search.
Certain results in Solr have the same score even when there are multiple terms in the query, as you will see in the example below.
I have two documents in Solr, each containing: id, first_name, last_name
Each document looks like the following:
<doc>
<str name="id">1</str>
<str name="last_name">fisher</str>
<str name="name">john</str>
</doc>
<doc>
<str name="id">2</str>
<str name="last_name">darby</str>
<str name="name">john</str>
</doc>
When I query just "john" both results come up. That is perfect.
However, when I query "john fisher" both results come up but with the same score.
What I want is different scores based on the relevancy of the search terms.
Here is the result for the following query
http://localhost:8983/solr/select?q=john+fisher%0D%0A&rows=10&fl=*%2Cscore
<response>
...
<result name="response" numFound="2" start="0" maxScore="0.85029894">
<doc>
<float name="score">0.85029894</float>
<str name="id">1</str>
<str name="last_name">fisher</str>
<str name="name">john</str>
</doc>
<doc>
<float name="score">0.85029894</float>
<str name="id">2</str>
<str name="last_name">darby</str>
<str name="name">john</str>
</doc>
</result>
</response>
Any help would be greatly appreciated
Your best bet is to understand and analyse how different factors affect your document score. Lucene has a helpful Explanation feature, which Solr leverages to show how a score is calculated; you can use 'debugQuery' in Solr to see how it is derived:
?q=john&fl=score,*&rows=2&debugQuery=on
Example response:
<lst name="debug">
<str name="rawquerystring">john</str>
<str name="querystring">john</str>
<str name="parsedquery">+DisjunctionMaxQuery((text:john))</str>
<str name="parsedquery_toString">+(text:john)</str>
<lst name="explain">
<!-- Score calculation for Result#1 -->
<str>
2.1536596 = (MATCH) fieldWeight(text:john in 36722), product of:
1.0 = tf(termFreq(text:john)=1)
8.614638 = idf(docFreq=7591, maxDocs=15393998)
0.25 = fieldNorm(field=text, doc=36722)
</str>
<!-- Score calculation for Result#2 -->
<str>
2.1536596 = (MATCH) fieldWeight(text:john in 36724), product of:
1.0 = tf(termFreq(text:john)=1)
8.614638 = idf(docFreq=7591, maxDocs=15393998)
0.25 = fieldNorm(field=text, doc=36724)
</str>
</lst>
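Besides this, you can use explainOther to see how other documents, selected by a separate query, score against the main query, e.g. to find out why a certain document did not match or ranked lower:
?q=john&fl=score,*&rows=2&debugQuery=on&explainOther=id:2
Further reading: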
Solr Relevancy
Lucene Scoring
It looks to me like you are only searching on the "name" field. That's why the scores are the same. If you use (e)DisMax you can easily search on both fields, and the most relevant document will get a higher score.
e.g.
<str name="defType">edismax</str>
<str name="qf">name last_name</str>
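Those two parameters would typically sit in the defaults of the search handler. A sketch, assuming the default /select handler:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Each query term is matched against both fields, so a document matching
         both "john" and "fisher" scores higher than one matching only "john" -->
    <str name="qf">name last_name</str>
  </lst>
</requestHandler>
```

Field boosts such as last_name^2 can be added to qf if one field should weigh more.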
Another way is to combine the 2 fields into 1 field with copyField and only search in the newly created field.
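A sketch of that approach, with a hypothetical full_name field that the query then targets, e.g. with df=full_name or q=full_name:(john fisher):

```xml
<!-- Hypothetical combined field; text_general tokenizes, so "john fisher" matches both terms -->
<field name="full_name" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="full_name"/>
<copyField source="last_name" dest="full_name"/>
```

As with any copyField, existing documents need reindexing before the combined field is populated.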
Thanks for the quick replies, guys, I appreciate that.
From the explain output I was able to confirm that the search was indeed being performed on a single field.
I saw that it is possible to copy multiple fields into the same field for searching.
In the schema.xml I added the following:
<copyField source="last_name" dest="text"/>
The results now come up as expected when using more than one search term.
