Solr: exact word search for alphanumeric words

I want to configure my Solr search engine so that I can get exact word searches, where words can contain alphanumeric characters. I first tried the following field definitions:
<field name="id" type="TextField" indexed="true" stored="true" required="true" multiValued="false" />
<field name="freq" type="long" indexed="true" stored="true" />
When I try to post values using java -jar post.jar vocab500.xml, I get the error:
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
POSTing file vocab500.xml
SimplePostTool: WARNING: Solr returned an error #400 Bad Request
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update
I found some answers on Stack Overflow and changed the field type to textgen:
<field name="id" type="textgen" indexed="true" stored="true" required="true" multiValued="false" />
<field name="freq" type="long" indexed="true" stored="true" />
but I still got the same error.
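For reference, a sketch of one way to set this up (not taken from the original thread). In schema.xml a field's type attribute must name a <fieldType> declared in the same schema: TextField is a class (solr.TextField), not a type name, and textgen only works if the schema declares it. If each id holds exactly one word, the built-in string type already matches it exactly, though case-sensitively. The fragment below assumes the string and long types are declared as in the stock example schema; the type name text_exact_word and the field id_search are hypothetical. It keeps the unique key untokenized and adds a searchable copy that splits on whitespace only, so alphanumeric words stay whole, and lowercases them.

```xml
<!-- Hypothetical type: WhitespaceTokenizer splits only on whitespace, so
     alphanumeric words such as "abc123" stay whole; LowerCaseFilter makes
     matching case-insensitive. One <analyzer> serves both indexing and querying. -->
<fieldType name="text_exact_word" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Unique key stays an untokenized string -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- Hypothetical searchable copy of the id, analyzed with the type above -->
<field name="id_search" type="text_exact_word" indexed="true" stored="false" />
<field name="freq" type="long" indexed="true" stored="true" />

<!-- Fill id_search from id at index time -->
<copyField source="id" dest="id_search"/>
```

With this in place, a query such as id_search:abc123 should match that word regardless of case; existing documents need to be reindexed after the schema change.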

Related

Solr: top results when the query (word/words) matches the title, followed by results where the search matches the description

I have a Java application that uses Solr and SolrJ API for search.
I have an object with a title, a description and some other fields, which have been imported into Solr.
I want the top results to be those where the query words match the title; once there are no more title matches, the following results should be those whose description matches.
I have title and description fields set in schema.xml
<field name="title" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="description" type="text_general" indexed="true" stored="true" multiValued="true"/>
I have tried boosting them in solrconfig.xml, but I'm not sure it will give me the desired results, and so far it has not.
Any help would be much appreciated!
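A sketch of the usual approach, assuming the edismax query parser is acceptable: list both fields in qf and give title a higher boost. The factor 10 is an arbitrary example to tune, and a boost only changes relative scores, so a document with many description matches can still outrank a weak title match. Most solrconfig.xml files already define /select, so in practice these entries would go into its existing defaults.

```xml
<!-- solrconfig.xml sketch: edismax scores each query against both fields,
     with matches in title weighted ten times higher than matches in description -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^10 description</str>
  </lst>
</requestHandler>
```

The same defType and qf parameters can also be sent per request from SolrJ instead of being set in solrconfig.xml.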

Solr not returning results

I am very new to Apache Solr and currently trying to understand the concepts. I am using version 6.3. I have created a schema and uploaded a file with a bunch of documents. I do see that 1388 documents are available.
When I enter "coursetitle:biztalk" in the q field of the Admin UI, I get the relevant results back, but not when I enter just "biztalk". I thought I did not need to provide the field name?
Here is the schema:
<field name="courseid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="coursetitle" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="coursetitlesearch" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="durationinseconds" type="int" indexed="true" stored="true" />
<field name="releasedate" type="date" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="assessmentstatus" type="text_general" indexed="true" stored="true"/>
<field name="iscourseretired" type="text_general" indexed="true" stored="true"/>
<field name="tag" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="course-author" type="string" multiValued="true" indexed="true" stored="true"/>
You need to specify the field unless you want to search the default field.
When you do not specify any field, Solr searches the default field, which you can configure in the schema with:
<defaultSearchField>coursetitle</defaultSearchField>
So if you put the above in schema.xml and then search for something like biztalk in the q parameter, Solr will search it as coursetitle:biztalk.
If you want all your fields to be searched without having to specify a field name, look into copy fields.
I recommend going through https://wiki.apache.org/solr/SchemaXml to see the various field options.
Usually the important fields are copied into one field that Solr searches by default, so I suggest you use the same copyField approach.
Example:
<defaultSearchField>SEARCHINDEX</defaultSearchField>
<copyField source="AUTHOR" dest="SEARCHINDEX"/>
<copyField source="coursetitle" dest="SEARCHINDEX"/>
<copyField source="coursetitlesearch" dest="SEARCHINDEX"/>
<copyField source="SUBTITLE" dest="SEARCHINDEX"/>
Now you can use the SEARCHINDEX field to search the content of all the other fields.
Since defaultSearchField is deprecated, set "df" in your request handler in solrconfig.xml instead; it takes precedence:
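One detail the example leaves implicit: the copyField destination must itself be declared as a field. A sketch for the course schema, assuming text_general is declared as in the stock schema; the choice of source fields is only an illustration. The destination is multiValued because several sources are copied into it, and stored="false" because it is used only for searching.

```xml
<!-- Catch-all destination: multiValued because several source fields feed it,
     not stored because it only needs to be searchable -->
<field name="SEARCHINDEX" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copied at index time, so existing documents must be reindexed -->
<copyField source="coursetitle" dest="SEARCHINDEX"/>
<copyField source="coursetitlesearch" dest="SEARCHINDEX"/>
<copyField source="description" dest="SEARCHINDEX"/>
```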
<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">text</str>
</lst>
</initParams>
After doing a little research, it looks like with edismax we can indeed pass a space-separated list of default fields, using the qf (query fields) parameter rather than df, e.g.:
qf=courseid coursetitle course-author
This way, we do not need to use copyField!
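A sketch of that edismax setup in solrconfig.xml. For edismax, the parameter that accepts a space-separated list of fields (each with an optional ^boost) is qf; df names a single default field that edismax falls back to only when qf is absent. The handler shown is an assumption: most configs already define /select, and these entries would go into its defaults.

```xml
<!-- edismax searches every field listed in qf, so an unqualified query
     such as "biztalk" is matched against all three fields -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">courseid coursetitle course-author</str>
  </lst>
</requestHandler>
```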

How to get solr result's doc fields in str rather than arr?

I have made an index, secondCore {id, resid, title, name, cat, role, exp}. When I execute a query, the result fields in doc are returned as arrays (<arr name="fid"><long>6767</long></arr>), but I want them returned as plain values, as is the case for id (<str name="id">1</str>).
Where can I make the changes? I have multiple cores, and each core has a separate schema file (say server/solr/firstCore/conf/fcschema.xml and server/solr/secondCore/conf/scschema.xml). In the core.properties of each core, I have set the schema file name, e.g. schema=fcschema.xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">status:inbox</str>
<str name="_">1444301939167</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="3" start="0">
<doc>
<str name="id">1</str>
<arr name="fid">
<long>6767</long>
</arr>
<arr name="resid">
<long>384</long>
</arr>
<arr name="status">
<str>inbox</str>
</arr>
<long name="_version_">1514456876026167296</long></doc>
...
</result>
</response>
Entries in schema file:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="resid" type="int" indexed="true" stored="true" multiValued="false" />
<field name="title" type="string" indexed="true" stored="true" multiValued="false" />
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="cat" type="string" indexed="true" stored="true" multiValued="true" />
<field name="role" type="string" indexed="true" stored="true" multiValued="true" />
<field name="exp" type="float" indexed="true" stored="true" multiValued="false" />
So I wanted to ask:
1. Where can I make the changes to get results as strings rather than arrays?
2. How can I verify that my core is using the specified schema file?
3. To find the docs whose status is inbox filter, I have to search for status:"inbox search" exactly, but I want this doc when I search for status:inbox or status:filter. How can I do this? I think this problem will be solved once the first one is.
4. Although this question is not related to the topic, where can I set the default output format to XML rather than JSON? I tried in solrconfig.xml, but couldn't get it to work.
PS: I restarted Solr after every change to any of the XML files, and I'm using solr-5.3.
Please feel free to ask for clarification if the question is unclear. Thanks in advance. :)
Although I had made changes in schema.xml, I noticed they were not being reflected. Later I learned that Solr 5.3.x implicitly creates a managed-schema file, and editing that solved all my problems. Check here:
Why is solr returning result with only exact search?
But problem #4 is still pending. I have tried <str name="wt">xml</str> and also registered the response writer <queryResponseWriter name="xml" class="solr.XMLResponseWriter" />, but couldn't resolve it. Adding default="true" didn't help either! Can anyone give me a suggestion?
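One possible explanation, offered as a hedged suggestion: a wt value sent with the request overrides anything under defaults, so if the client sends wt=json (the Admin UI query form, for instance, has its own wt selector), the XML default is never used. Putting wt in an invariants list makes the handler ignore the request's value. A sketch for the existing /select handler, assuming the rest of its configuration stays as it is:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- Parameters under "invariants" override whatever the request sends,
       so every response from this handler is written as XML -->
  <lst name="invariants">
    <str name="wt">xml</str>
  </lst>
</requestHandler>
```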
I had the same issue today: I was migrating from SOLR 4.x to 5.x and, after loading the data, suddenly saw that all of the objects had their values nested inside arrays. Not being sure whether the issue was with Haystack or the load script, I tried inserting a few new records via the SOLR dashboard. Same thing, but I noticed that a few SOLR-specific fields were loading fine.
This bug seems to be related to the field type you specify. "tstrings" (I believe this is the default via Haystack) stores the data nested inside arrays, but the "string" type works just fine. Below is an example of a field specification that let me go from array values to string values.
<field name="external_id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
It seems the Haystack schema.xml generator needs some work to adapt to new conventions with Solr 5.x.
It took some time, but the best way I found to fix all of my fields was insert a JSON record and check whether each field came in with the correct format. Go one by one until they're all working properly.
If I find some time I'll look at Haystack's SOLR schema generator and see what might have changed.
Hope this helps someone!
I had the same problem while migrating from 4.9 to 6.x. I noticed that fields defined as text_general returned data as an array, while the same fields returned a string in Solr 4.9. Interestingly, some fields were not converted to arrays in Solr 6.x. I did not use the "managed-schema"; I was using the classic schema.xml.
To solve the problem, I took the schema.xml from Solr 4.9 and moved it to the conf/ directory of my new Solr core, so all the field definitions came from Solr 4.9. I used the solrconfig.xml from Solr 6.x, but disabled the updateRequestProcessorChain, as I am not going to use "field guessing" etc. Once I restarted Solr and reindexed the content, the problem was solved: no data element was returned as an array unless it was a multi-valued field.
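A sketch of why the wrapping happens, as we understand the XML response writer: it writes a stored value inside <arr> whenever the field is multiValued, even if there is only one value, and a field inherits multiValued from its fieldType unless the field overrides it. The Solr 5.x data-driven (field-guessing) configuration declares multiValued types such as strings and tlongs for the fields it creates, which would be consistent with the <arr> output above for fields like fid and status that are not in the hand-written schema. The status definition below is a hypothetical example.

```xml
<!-- Single-valued type: stored values are written as <str name="..."> -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- Multi-valued type: values are written inside <arr name="...">, even when there is only one -->
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"/>

<!-- The field can also state multiValued explicitly, overriding its type's default -->
<field name="status" type="string" indexed="true" stored="true" multiValued="false"/>
```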

CopyField in Solr Doesnt Seem to Work

I am trying to use the copyField directive in Solr to copy some fields into a catch-all field for searching. Unfortunately the field does not seem to be populated via the copyField directives at all.
Here are my source fields:
<field name="firstName" type="text_general" indexed="true" stored="true" required="false" />
<field name="lastName" type="text_general" indexed="true" stored="true" required="false" />
<field name="postCode" type="text_general" indexed="true" stored="true" required="false" />
<field name="emailAddress" type="text_general" indexed="true" stored="true" required="false" />
<!-- suggest field -->
<field name="name_Search" type="textSuggest" indexed="true" stored="true" multiValued="true" />
And here are my copyField directives:
<!-- copy fields -->
<copyfield source="firstName" dest="name_Search" />
<copyfield source="lastName" dest="name_Search" />
<copyfield source="emailAddress" dest="name_Search" />
<copyfield source="postCode" dest="name_Search" />
Now running a query on the "name_Search" field does not yield any results, and the field does not appear in the schema browser.
Do I need to do anything else to get copyField working? I am running Solr v5.2.1.
EDIT
Here is the textSuggest field type used for the catch-all field:
<fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
In solrconfig.xml, I have configured the suggest handler as follows:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">name_Search</str>
<str name="suggestAnalyzerFieldType">textSuggest</str>
<str name="buildOnStartup">true</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
I know the suggest handler works, as if I explicitly fill the 'name_Search' field, then I can get results as expected.
In your schema, use copyField instead of copyfield (with a capital F).
Source: Solr documentation
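For completeness, the corrected directives. Element names in schema.xml are case-sensitive, and the lowercase copyfield elements are apparently ignored rather than reported, which fits the symptoms in the question. Since copying happens at index time, the core needs to be reloaded and the documents reindexed after the fix; with buildOnCommit set, the suggester should then be rebuilt from the populated name_Search field.

```xml
<!-- copy fields: note the capital F in copyField -->
<copyField source="firstName" dest="name_Search" />
<copyField source="lastName" dest="name_Search" />
<copyField source="emailAddress" dest="name_Search" />
<copyField source="postCode" dest="name_Search" />
```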

Search a multi-valued field with a self join

I have the following (simplified) solr schema:
<schema name="documents" version="1.1">
<uniqueKey>id</uniqueKey>
...
<fields>
<field
name="id"
type="string"
indexed="true"
stored="true"
required="true"/>
<field
name="documentReferences"
type="string"
indexed="true"
stored="false"
multiValued="true"
required="false"/>
</fields>
</schema>
The values which will be in this documentReferences field are all ids of other documents which are indexed in this solr core.
The search I want to accomplish (in english):
Documents whose id is not in any other document's documentReferences field
Is this possible? I don't have a problem indexing another field if it would help answer this question.
One of the solutions I was thinking of:
Index each document's id into its own documentReferences field. That guarantees every id appears at least once, so a document that is not referenced by any other document will have a count of exactly one.
Search for all documents, facet on documentReferences, and then keep the facet values with a count of 1; these are the ids not referred to by any other document.
I would have loved to use a facet maxcount parameter, which would have limited the results out of the box.
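A sketch of the two steps, assuming the simplified schema above; the handler name /referenceCounts is hypothetical. The copyField belongs at the top level of <schema>, outside <fields>, as in the stock schemas. The handler returns every documentReferences value with its count; since standard field faceting has no maxcount parameter, picking the values whose count is 1 is left to the client. facet.limit=-1 returns all values, which can be expensive on a large index.

```xml
<!-- schema.xml, step 1: also index each document's own id into documentReferences -->
<copyField source="id" dest="documentReferences"/>

<!-- solrconfig.xml, step 2: hypothetical handler listing every referenced id with its count.
     Sorted by count (descending), so the count-1 entries appear at the end. -->
<requestHandler name="/referenceCounts" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">*:*</str>
    <str name="rows">0</str>
    <str name="facet">true</str>
    <str name="facet.field">documentReferences</str>
    <str name="facet.limit">-1</str>
    <str name="facet.mincount">1</str>
    <str name="facet.sort">count</str>
  </lst>
</requestHandler>
```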
