Solr copyField order - search

In Solr, does the order of the copyField declarations in schema.xml make any difference to search results?
Ex:
<copyField source="name" dest="text" />
<copyField source="title" dest="text" />
<copyField source="description" dest="text" />
What if I have
<copyField source="description" dest="text" />
<copyField source="title" dest="text" />
<copyField source="name" dest="text" />
Does the order in which the text ends up stored in 'text' have any effect when we perform a full-text search on the 'text' field?
Thanks in advance

The search results do not depend on the order of the copyField declarations, unless you have a maximum field length defined, in which case terms beyond the limit are discarded.
Even then, which terms are discarded is not controlled by the order of the declarations but by the order in which the data is fed to Solr, or in which Solr orders it internally.
Otherwise, the search behaves the same either way.
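To make the point concrete, here is a toy Python model (not Solr's actual implementation). The function copy_into and the sample document are made up for illustration, and the token limit stands in for a maximum field length. Without a limit, both orders yield the same set of searchable terms; with a limit, the surviving terms depend on the order in which the values arrive.

```python
def copy_into(dest_limit, values):
    """Toy model of a copyField destination: collect tokens, keep at most dest_limit."""
    tokens = []
    for value in values:
        tokens.extend(value.lower().split())
    if dest_limit is not None:
        # Everything past the limit is discarded, as with a maximum field length.
        tokens = tokens[:dest_limit]
    return tokens


# Hypothetical document, invented for this sketch.
doc = {
    "name": "Acme hammer",
    "title": "Steel claw hammer",
    "description": "Heavy duty tool",
}

order_a = [doc["name"], doc["title"], doc["description"]]
order_b = [doc["description"], doc["title"], doc["name"]]

# No limit: the set of searchable terms is identical for both orders.
unlimited_a = set(copy_into(None, order_a))
unlimited_b = set(copy_into(None, order_b))

# Limit of 4 tokens: which terms survive depends on arrival order.
limited_a = set(copy_into(4, order_a))
limited_b = set(copy_into(4, order_b))

print(unlimited_a == unlimited_b)
print(sorted(limited_a), sorted(limited_b))
```

In this model a search for "acme" would match under the first order but not the second once the limit is applied, which is the only situation where order matters.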


How to get solr result's doc fields in str rather than arr?

I have created an index, secondCore {id, resid, title, name, cat, role, exp}. When I execute a query, the fields in each result doc are returned as arrays (<arr name="fid"><long>6767</long></arr>), but I want them returned as plain values, the way id is (<str name="id">1</str>).
Where can I make the changes? I have multiple cores, and each core has a separate schema file (say server/solr/firstCore/conf/fcschema.xml and server/solr/secondCore/conf/scschema.xml). In the core.properties of each core, I have set the schema file name, e.g. schema=fcschema.xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">status:inbox</str>
<str name="_">1444301939167</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="3" start="0">
<doc>
<str name="id">1</str>
<arr name="fid">
<long>6767</long>
</arr>
<arr name="resid">
<long>384</long>
</arr>
<arr name="status">
<str>inbox</str>
</arr>
<long name="_version_">1514456876026167296</long></doc>
...
</result>
</response>
Entries in schema file:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="resid" type="int" indexed="true" stored="true" multiValued="false" />
<field name="title" type="string" indexed="true" stored="true" multiValued="false" />
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="cat" type="string" indexed="true" stored="true" multiValued="true" />
<field name="role" type="string" indexed="true" stored="true" multiValued="true" />
<field name="exp" type="float" indexed="true" stored="true" multiValued="false" />
So I wanted to ask:
1. Where can I make the changes to get the result as a string rather than an array?
2. How can I verify that my core is using the specified schema file?
3. To find the docs whose status is "inbox filter", I have to search for status:"inbox search" exactly, but I want this doc returned when I search for status:inbox or status:filter. How can I do that? I think this problem will be solved once the first one is.
4. Although this question is not related to this topic: where can I set the default output format to XML rather than JSON? I tried in solrconfig.xml, but couldn't get it to work.
PS: I restarted Solr after every change to any of the XML files, and I'm using solr-5.3.
Please feel free to ask for clarification in case the question is unclear. Thanks in advance. :)
Although I had made changes in schema.xml, I noticed that they were not being reflected. I later found out that Solr 5.3.x implicitly creates a managed-schema file, and editing that file solved all my queries. Check here:
Why is solr returning result with only exact search?
But problem #4 is still pending. I tried <str name="wt">xml</str> and also added a response writer, <queryResponseWriter name="xml" class="solr.XMLResponseWriter" />, but that did not resolve it. Neither did adding default="true". Can anyone offer a suggestion?
I had the same issue today: I was migrating from Solr 4.x to 5.x and, after loading the data, suddenly saw that all of the objects had their values nested inside arrays. Not being sure whether the issue was with Haystack or the load script, I tried inserting a few new records via the Solr dashboard. Same thing, but I noticed that a few Solr-specific fields were loading fine.
This seems to be related to the field type that you specify. "tstrings" (I believe this is the default via Haystack) stores the data nested inside arrays, but the "string" type works just fine. Below is an example of a field specification that allowed me to go from array values to string values.
<field name="external_id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
It seems the Haystack schema.xml generator needs some work to adapt to the new conventions in Solr 5.x.
It took some time, but the best way I found to fix all of my fields was to insert a JSON record and check whether each field came back in the correct format, going one by one until they were all working properly.
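That field-by-field check can be scripted. Below is a minimal Python sketch, assuming you already have a result doc as parsed JSON; the helper name fields_returned_as_arrays is made up, and the sample doc simply mirrors the XML response shown in the question.

```python
import json


def fields_returned_as_arrays(doc, expected_single):
    """Return the fields expected to be single-valued that came back as lists."""
    return sorted(
        name for name in expected_single
        if isinstance(doc.get(name), list)
    )


# Hypothetical doc, shaped like the response in the question (wt=json).
response_doc = json.loads(
    '{"id": "1", "fid": [6767], "resid": [384], "status": ["inbox"]}'
)

suspect = fields_returned_as_arrays(response_doc, ["id", "fid", "resid", "status"])
print(suspect)
```

Any field it reports is one whose definition (type or multiValued setting) is worth checking in the schema the core actually loads.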
If I find some time I'll look at Haystack's SOLR schema generator and see what might have changed.
Hope this helps someone!
I had the same problem while migrating from 4.9 to 6.x. I noticed that fields defined as text_general returned data as an array, whereas the same field returned a string in Solr 4.9. Interestingly, some fields were not converted to arrays in Solr 6.x. I was not using the "managed-schema"; I was using the classic schema.xml.
To solve the problem, I took the schema.xml from Solr 4.9 and moved it to the conf/ directory of my new Solr core, so all the field definitions came from Solr 4.9. I used the solrconfig.xml from Solr 6.x but disabled the updateRequestProcessorChain, as I am not going to use "field guessing" etc. Once I restarted Solr and reindexed the content, the problem was solved: no data element was returned as an array unless it was a multi-valued field.

Search a multi-valued field with a self join

I have the following (simplified) solr schema:
<schema name="documents" version="1.1">
<uniqueKey>id</uniqueKey>
...
<fields>
<field
name="id"
type="string"
indexed="true"
stored="true"
required="true"/>
<field
name="documentReferences"
type="string"
indexed="true"
stored="false"
multiValued="true"
required="false"/>
</fields>
</schema>
The values which will be in this documentReferences field are all ids of other documents which are indexed in this solr core.
The search I want to accomplish (in English):
Documents whose id is not in any other document's documentReferences field
Is this possible? I don't have a problem indexing another field if it would help answer this question.
One solution I was thinking of:
1. Index each document's own id in documentReferences along with its references. That guarantees that a document not referenced by any other document has a count of exactly one.
2. Search for all documents, facet on documentReferences, and then filter the facets for a count of 1. That gives the list of ids not referred to by any other document.
It would have been nice to have a facet maxcount parameter, which would have limited the results out of the box.
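The counting logic behind that idea can be sketched in plain Python. This only models what the facet counts would look like; it does not talk to Solr, and the function name and sample data are made up for illustration.

```python
from collections import Counter


def unreferenced_ids(docs):
    """docs maps each document id to the list of ids it references."""
    counts = Counter()
    for doc_id, references in docs.items():
        # Step 1: the document's own id is indexed together with its references.
        # A set is used because a facet counts each document at most once per value.
        counts.update(set(references) | {doc_id})
    # Step 2: a count of 1 means only the document itself mentions that id.
    return sorted(doc_id for doc_id in docs if counts[doc_id] == 1)


# Hypothetical data: "d" is the only document nobody else references.
docs = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["a"],
}

print(unreferenced_ids(docs))
```

With real facet output, the same filter (keep values whose count is 1) would have to be applied on the client, since there is no maximum-count facet parameter to do it server-side.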

XSD for XML with all unique node names

My XML file looks something like this:
<Fields>
<Humanities>
<Performing_Arts>
<Dance />
<Music />
</Performing_Arts>
<Visual_Arts>
<Painting />
<Sculpture />
</Visual_Arts>
</Humanities>
<Social_Sciences>
<Psychology>
<Cultural_Psychology />
<Social_Psychology />
</Psychology>
</Social_Sciences>
</Fields>
I want to write an XML Schema, for this file, so that no two nodes, irrespective of location in the file can have duplicate names.
Any node in this file should be allowed to have unlimited child nodes, to any sub-level.
How might I achieve this goal?
skaffman is quite right: you need to enclose your values as either attributes or elements. If you are unsure, W3Schools has a great tutorial on this:
http://www.w3schools.com/xml/xml_elements.asp
http://www.w3schools.com/xml/xml_attributes.asp
An example of a possible xml representation of your data might be:
<fields>
<department name="Humanities">
<subject name="Performing Arts">
<topic name="Dance"/>
<topic name="Music"/>
</subject>
<subject name="Visual Arts">
<topic name="Painting"/>
<topic name="Sculpture"/>
</subject>
</department>
<department name="Social Sciences">
<subject name="Psychology">
<topic name="Cultural Psychology"/>
<topic name="Social Psychology"/>
</subject>
</department>
</fields>
Notes:
You can see that this is roughly equivalent to a database with three tables: department, subject and topic, with FK relationships between the parent and children. This is really what XML encapsulates, but in text form, and is the sort of thing to bear in mind while you design your layout.
I've used all lower-case names for elements and attributes. This is a personal preference: XSL/XPath are case-sensitive, so making everything lowercase avoids the opportunity for horrid bugs later.
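With the names moved into attributes, the "no duplicate names anywhere" rule becomes easy to check. The sketch below is a programmatic check in Python using the standard library, not an XSD; the function name is made up, and the sample deliberately repeats "Music" so there is something to report.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the attribute-based layout, with "Music" duplicated on purpose.
SAMPLE = """
<fields>
  <department name="Humanities">
    <subject name="Performing Arts">
      <topic name="Dance"/>
      <topic name="Music"/>
    </subject>
  </department>
  <department name="Social Sciences">
    <subject name="Psychology">
      <topic name="Music"/>
    </subject>
  </department>
</fields>
"""


def duplicate_names(xml_text):
    """Return every name attribute value that occurs more than once, at any depth."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        element.get("name")
        for element in root.iter()
        if element.get("name") is not None
    )
    return sorted(name for name, seen in counts.items() if seen > 1)


duplicates = duplicate_names(SAMPLE)
print(duplicates)
```

Because the check walks the whole tree, it does not care how deeply the elements are nested, which matches the "any sub-level" requirement in the question.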

Complex query in Solr, is it possible?

Hey guys, I am new to Solr and want to accomplish the scenario below, but I am not sure whether Solr can handle cases like this:
The problem is very straightforward: I want to build a price comparison search. These are my relational DB tables:
t_company:
company_id
company_name
t_product:
product_id
product_price
t_company_product:
company_product_id
company_id
product_id
In Solr, I want to perform the following search: get all companies that offer one or more of a set of specific products, ordered by the lowest TOTAL price (so if you select screws, nails, and sheet rock, I want to return the lowest total purchase price).
When I set up my schema, I made the company the main entity, with product_ids and product_prices as two multivalued fields.
Can I query like that? How would I compute the sum?
Here are my schema.xml and data-config.xml:
<document name="companies">
<entity name="company" dataSource="dsCompany"
query="select
newid() as row_id,
company_id,
company_name
from
t_company WITH (NOLOCK)">
<field column="row_id" name="row_id" />
<field column="company_id" name="company_id" />
<field column="company_name" name="company_name" />
<entity name="products" query="select
company_product_id,
product_id,
price
from
t_company_product WITH (NOLOCK)
where
company_id='${company.company_id}'"
dataSource="dsCompany">
<field name="company_product_id" column="company_product_id" />
<field name="product_id" column="product_id" />
<field name="price" column="price" />
</entity>
</entity>
</document>
<fields>
<field name="row_id" type="string" indexed="true" stored="true" required="true"/>
<field name="company_id" type="integer" indexed="true" stored="true" required="true" />
<field name="company_name" type="text" indexed="true" stored="true"/>
<field name="service_id" type="integer" indexed="true" stored="true" required="true" />
<field name="price" type="tfloat" indexed="true" stored="true" required="true" />
</fields>
Any feedback will be greatly appreciated!!!
You can use a function query to sort the results by a sum, see here.
In my last project we used a nightly build of 4.0 and it worked fine. It contains so much more functionality than 1.4 that it is worth the small risk you take by using an unreleased version.
Update:
To use the sum, you could try adding a dynamic field for each product price (I don't know whether sum can be used with multivalued fields).
Add to data-config
<field name="price_${products.product_id}" column="price" />
Add to schema.xml
<dynamicField name="price_*" type="decimal" indexed="false" stored="true" />
and, if I understand it correctly, you should be able to use a query like:
q=*:*&sort=sum(price_"id for nails",price_"id for screws",price_"id for ...") asc
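As a rough illustration of that approach, the Python sketch below builds the sort expression from a list of product ids and, separately, computes the same ordering on the client. The function names, product ids, and prices are all made up; the client-side version assumes a company must offer every selected product to be ranked.

```python
def sort_query(product_ids):
    """Build the sort parameter sum(price_<id>,...) asc for the selected product ids."""
    fields = ",".join(f"price_{pid}" for pid in product_ids)
    return f"sum({fields}) asc"


def cheapest_first(companies, product_ids):
    """Client-side equivalent: total the selected prices per company, lowest first."""
    totals = {
        name: sum(prices[pid] for pid in product_ids)
        for name, prices in companies.items()
        # Assumption: skip companies that do not offer every selected product.
        if all(pid in prices for pid in product_ids)
    }
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical data: company name -> {product_id: price}.
companies = {
    "Acme": {101: 4.0, 102: 2.5, 103: 11.0},
    "Bolt": {101: 3.5, 102: 2.0, 103: 13.0},
    "Core": {101: 5.0, 103: 9.0},
}

print(sort_query([101, 102]))
print(cheapest_first(companies, [101, 102]))
```

The second function is also a fallback if sorting by a function turns out not to be available in your Solr version: fetch the matching companies with their stored price fields and total them yourself.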
In 1.4.1, probably; in the current trunk (4.0), no, or at least not easily.
In Solr 1.4 there is field collapsing, which can perform aggregates over the records returned. In trunk (Solr 4.0) this has turned into a grouping option that can only perform min/max-type queries (as far as I'm aware).
The documentation can be found here:
http://wiki.apache.org/solr/FieldCollapsing
Remember you'll have to expand out the relationships (consider it as one big denormalised view over the tables involved).
Solr is not intended to replace a relational database. If you still want to index relational content, it needs to be denormalized and will therefore contain redundant data. As a result, the result count will be off for most queries; for example, a search on just the company name will yield a higher total number of results than expected. With field collapsing you can get around this. If you use faceting, however, eliminating the duplicates there is not possible, AFAIK.
If you form a single schema with all the data you mentioned, then you can perform relational queries to a certain extent. Google "solr issue 2272" for the details. It is currently possible only within a single schema.
Performing a summation within a search engine is not possible at this time, I believe. I might be wrong, and if someone knows a way to do it, I would be very interested too.
I think you might be asking about how to customize scoring. Here's an example in lucene.
http://sujitpal.blogspot.com/2010/10/custom-scoring-with-lucene-payloads.html
From LucidImagination
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

Solr exact word search

I want to configure my Solr search engine so I get an exact match for the search term I enter.
eg. 'taxes' should return documents with 'taxes' and not 'tax', 'taxation' etc.
Any help or tips would be appreciated.
I presume your field is a TextField. By default, Solr analyzes such a field (tokenizing and typically stemming it), so matching is not exact. What you want is to set up your field as a string field, with no tokenizer; then you'll get an exact match.
You can even combine the exact search with an analyzed search and use DisMax to boost the relative weights.
Example (schema.xml) :
<field name="name" type="text" indexed="true" stored="false" required="true" />
<field name="nameString" type="string" indexed="true" stored="false" required="true" />
<copyField source="name" dest="nameString"/>
Example (solrconfig.xml) :
<requestHandler name="accounts" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="qf">
nameString^10.0 name^5.0 description^1.0
</str>
<str name="tie">0.1</str>
</lst>
</requestHandler>
To turn off stemming in your schema.xml, you can define text field like this:
<types>
<!-- other fields definition -->
<fieldType name="text_no_stem" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<!-- other fields definition -->
</types>
<fields>
<!-- other fields definition -->
<dynamicField name="*_nostem" type="text_no_stem" indexed="true" stored="true"/>
<!-- other fields definition -->
</fields>
I'm using sunspot to integrate solr with Ruby on Rails. With this in the schema.xml I define my searchable block like this:
searchable do
text(:wants, as: :wants_nostem)
end
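To see why a chain without a stemmer gives word-level exact matches, here is a toy Python stand-in for the text_no_stem analysis above. It is only an approximation: the regular expression is a rough substitute for the real tokenizer, and the function names and sample strings are invented.

```python
import re


def analyze(text):
    """Toy stand-in for the chain above: split into word tokens, then lowercase. No stemming."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def matches(query_term, field_value):
    """A single-term query matches when its analyzed form equals one of the indexed tokens."""
    query_tokens = analyze(query_term)
    return bool(query_tokens) and query_tokens[0] in analyze(field_value)


print(matches("taxes", "Income Taxes for 2010"))
print(matches("taxes", "A guide to taxation and tax law"))
```

The first call matches because lowercasing is still applied; the second does not, because without a stemmer "tax" and "taxation" are never reduced to the same term as "taxes".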
Turn off stemming.
Use quotes to get exact-match results.
Example:
Core name: core1
Key: namestring
http://localhost:8983/solr/core1/select?q=namestring:"taxes"&wt=json&indent=true
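If you build that URL in code, the quotes and colon need to be percent-encoded. A minimal Python sketch using only the standard library follows; the helper name is made up, and the backslash escaping of embedded quotes is an assumption about what you want for terms that themselves contain quotes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def exact_match_url(base_url, field, term):
    """Build a select URL whose q parameter is field:"term", percent-encoded."""
    # Assumption: escape backslashes and quotes inside the term with a backslash.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "wt": "json", "indent": "true"}
    return f"{base_url}/select?{urlencode(params)}"


url = exact_match_url("http://localhost:8983/solr/core1", "namestring", "taxes")
print(url)

# Decoding the query string again recovers the quoted query.
print(parse_qs(urlsplit(url).query)["q"])
```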
Use a Solr string field, which will do an exact value search, e.g.
<fieldType class="solr.StrField" name="string" omitNorms="true" sortMissingLast="true" />
