We are trying to use Solr to search our document contents, but I also want to be able to find documents whose fields match each other internally. I have looked but cannot find anything on self-referential or inner joins.
So for example:
<doc>
<field name="id">12345</field>
<field name="author">Smith</field>
<field name="last_edit">Smith</field>
...
</doc>
Obviously a (author:Smith AND last_edit:Smith) would work, but I would like to be able to search for all documents where author and last_edit are the same, not necessarily a fixed value. Defining a new field is fine.
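Since defining a new field is acceptable, one option is to compute the comparison at index time and store the result as a boolean. The following is a minimal sketch of that idea in the indexing client, not a built-in Solr feature. The field name author_is_last_editor is hypothetical, and the sketch assumes both fields are single-valued strings that should be compared exactly.

```python
def add_same_editor_flag(doc):
    """Return a copy of doc with a boolean flag comparing author and last_edit."""
    flagged = dict(doc)
    author = doc.get("author")
    # True only when both fields are present and identical (exact, case-sensitive).
    flagged["author_is_last_editor"] = (
        author is not None and author == doc.get("last_edit")
    )
    return flagged


docs = [
    {"id": "12345", "author": "Smith", "last_edit": "Smith"},
    {"id": "12346", "author": "Smith", "last_edit": "Jones"},
]
# These are the documents that would be sent to Solr for indexing.
indexed = [add_same_editor_flag(d) for d in docs]
```

With a boolean field of that name declared in the schema, the search would then reduce to a filter such as fq=author_is_last_editor:true.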
Related
I am trying to find the document with the latest date in Solr. Is there an efficient way of doing this with the Solr query syntax?
For now, I have been reading the documentation, but I am only getting "numFound": 0 when I try to query any date in the fq parameter, even though that date does exist in the document.
Example query:
q box:
*:*
fq box:
date_published:"2019-02-28T11:57:29.926Z"
I defined in the schema this field like so:
<field name="date_published" type="pdate" indexed="true" stored="true" required="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
The above date does exist in the document, but the query returns "numFound":0. This issue is just a first step; what I would actually like is to find the document with the latest date.
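For the second half of the question (finding the latest document), a common approach is to sort on the date field in descending order and ask for a single row, rather than filtering on an exact timestamp. The sketch below only builds the request parameters; it does not diagnose the numFound: 0 result above, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def latest_document_params(date_field="date_published"):
    """Build Solr select parameters that return only the newest document."""
    return {
        "q": "*:*",                     # match every document
        "sort": f"{date_field} desc",   # newest date first
        "rows": 1,                      # keep only the top hit
    }


params = latest_document_params()
# Query string to append to the collection's /select handler URL.
query_string = urlencode(params)
```

In the admin UI this corresponds to leaving q as *:*, putting "date_published desc" in the sort box, and setting rows to 1.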
I have an email ID field in my table, with Solr wildcard search enabled on it.
For an email such as abc.xyz#pqr.com:
Whenever I search for abc.xyz* I get results, and if I search for pqr.com* I get results, but whenever I search for abc.xyz#pqr.com* I don't get any results.
Below is the XML configuration of the field:
<field indexed="true" multiValued="false"
name="user_email_id" stored="true" type="TextField"/>
Below is the generated query:
SELECT * FROM example WHERE
solr_query='{"q":"user_email_id:Shubha.Sao#techdata.com*","start":0}' LIMIT 50;
The problem is that your email is split into tokens, so instead of the full email you most probably get 2 tokens: Shubha.Sao and techdata.com. You can check how the text is split by your current tokenizer on the Analysis screen of the Solr UI.
Instead of the TextField with its default StandardAnalyzer, you need to use either StrField or a custom analyzer that avoids tokenizing the email. For example, you can use KeywordTokenizer, which leaves the email intact but still lets you apply additional filters such as LowerCaseFilter. Or you can use UAX29URLEmailTokenizer.
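To see why the full-address prefix fails, here is a rough simulation of the two tokenization strategies. It is only an illustration: the splitting rule is a crude stand-in for a word-splitting tokenizer, not Lucene's actual implementation, and all function names are hypothetical.

```python
import re


def split_tokens(text):
    """Crude stand-in for a word-splitting tokenizer: break on anything
    that is not a letter, digit, or dot, then lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9.]+", text) if t]


def keyword_tokens(text):
    """Stand-in for a keyword tokenizer plus lowercasing: one token only."""
    return [text.lower()]


def prefix_matches(tokens, prefix):
    """A wildcard query like 'abc*' matches if any single token starts with it."""
    prefix = prefix.lower()
    return any(token.startswith(prefix) for token in tokens)


email = "abc.xyz#pqr.com"
split = split_tokens(email)      # two separate tokens
whole = keyword_tokens(email)    # the address kept intact
```

With the split tokens, prefix_matches(split, "abc.xyz") and prefix_matches(split, "pqr.com") both succeed, while prefix_matches(split, "abc.xyz#pqr.com") fails because no single token starts with the whole address. With the keyword-style token, the full-address prefix matches.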
I'm developing a search application using Solr that is required to search 'books' that are split into chapters. A book might look like this:
title: "book title"
author: "mr whoever"
chapters: [
{
title: "some chapter title"
text: "blah blah blah"
},
{
title: "some other title"
text: "blah blah blah"
},
... etc.
]
Requirements for the search:
The user is searching for books not chapters, so the top results must be the most relevant books overall, given all the chapter text inside.
The user needs to see which chapters from a book have matched, information about those chapters and how many matches there were per chapter.
Progress:
Multivalued fields
Solr supports multivalued fields (i.e. multiple chapters per book), but it isn't possible for each value of a multivalued field to carry two sub-fields (title and text) on the book document.
Solr "Join"
I don't know if this is necessary. Each chapter will only be owned by one book so it seems like we could just put them all in one document without too much repetition.
Dynamic fields
Have fields like "chapter1text_txt", "chapter1title_txt" and "chapter2text_txt", for example, and join up the per-chapter information outside of Solr, so Solr doesn't know that "chapter1text_txt" and "chapter1title_txt" are part of the same thing.
What is the proper way of configuring schema.xml to support and search this type of document?
Document structure
So far the best solution has been using multivalued fields for both chapter_title and chapter_text, and enforcing a consistent ordering of these values in the upload documents, so the first chapter_title always corresponds to the first chapter_text and so on.
Here's the section of schema.xml:
<field name="report_title"
type="text_en" indexed="true" stored="true"/>
<field name="chapter_title"
type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="chapter_text"
type="text_en" indexed="true" stored="true" multiValued="true"/>
This is a compromise because the index cannot know about this relationship between chapter_title and chapter_text, so it is impossible to ask for "chapters with X in the title and Y in the text".
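The ordering convention described above can be sketched on the client side as follows. This is a minimal illustration, assuming the book structure from the question and the three schema fields shown; the function names are hypothetical, and re-pairing depends entirely on the upload code keeping both lists in the same order.

```python
def book_to_solr_doc(book):
    """Flatten a book into parallel multivalued fields, one entry per chapter."""
    chapters = book["chapters"]
    return {
        "report_title": book["title"],
        # Both lists are built from the same iteration, so index i in one
        # always refers to the same chapter as index i in the other.
        "chapter_title": [c["title"] for c in chapters],
        "chapter_text": [c["text"] for c in chapters],
    }


def chapters_from_solr_doc(doc):
    """Re-pair titles and texts by position when reading a document back."""
    return list(zip(doc["chapter_title"], doc["chapter_text"]))


book = {
    "title": "book title",
    "chapters": [
        {"title": "some chapter title", "text": "blah blah blah"},
        {"title": "some other title", "text": "more blah"},
    ],
}
solr_doc = book_to_solr_doc(book)
pairs = chapters_from_solr_doc(solr_doc)
```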
Match Counts
I still haven't found a way of doing this, but I'm considering using highlighting and counting the number of highlighted terms after asking for one large snippet covering the whole document.
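The counting idea could look roughly like this on the client. It is an untested sketch under two assumptions that need checking against your Solr version: that highlighted terms are wrapped in the <em> marker, and that the highlighter can be configured to return one snippet per chapter value, in chapter order. The sample snippets are invented for illustration.

```python
def matches_per_chapter(titles, snippets, marker="<em>"):
    """Count highlighted terms in each chapter's snippet.

    titles and snippets are assumed to be aligned by position.
    Returns a list of (title, count) pairs in chapter order.
    """
    return [(title, snippet.count(marker)) for title, snippet in zip(titles, snippets)]


# Hypothetical highlighting output for one book, searching for "blah".
titles = ["some chapter title", "some other title"]
snippets = [
    "<em>blah</em> <em>blah</em> <em>blah</em>",
    "nothing relevant here",
]
counts = matches_per_chapter(titles, snippets)
```

Returning pairs rather than a dictionary keeps chapters with identical titles from overwriting each other.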
I am indexing a collection of XML documents with the following structure:
<mydoc>
<id>1234</id>
<name>Some Name</name>
<experiences>
<experience years="10" type="Java"/>
<experience years="4" type="Hadoop"/>
<experience years="1" type="Hbase"/>
</experiences>
</mydoc>
Is there any way to create a Solr index that would support the following query:
find all docs with experience type "Hadoop" and years>=3
So far my best idea is to put delimited years||type into multiValued string field, search for all docs with type "Hadoop" and after that iterate through the results to select years>=3. Obviously this is very inefficient for a large set of docs.
I think there is no obvious solution for indexing data that comes from a many-to-many relationship. In this case I would go with dynamic fields: http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
Field definition in schema.xml:
<dynamicField name="experience_*" type="integer" indexed="true" stored="true"/>
So, using your example you would end up with something like this:
<mydoc>
<id>1234</id>
<name>Some Name</name>
<experience_Java>10</experience_Java>
<experience_Hadoop>4</experience_Hadoop>
<experience_Hbase>1</experience_Hbase>
</mydoc>
Then you can use the following query (note that TO must be uppercase in a range): fq=experience_Hadoop:[3 TO *]
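The conversion from the nested experiences to dynamic fields, and the matching range filter, can be sketched as below. The helper names are hypothetical, and the sketch assumes experience types contain only characters that are safe in a field name.

```python
def experiences_to_fields(experiences, prefix="experience_"):
    """Turn [{'type': ..., 'years': ...}] into one integer field per type."""
    return {f"{prefix}{e['type']}": int(e["years"]) for e in experiences}


def min_years_filter(skill, min_years, prefix="experience_"):
    """Build a range filter meaning 'at least min_years of skill'."""
    # Solr range syntax requires an uppercase TO; * leaves the upper end open.
    return f"{prefix}{skill}:[{min_years} TO *]"


experiences = [
    {"years": "10", "type": "Java"},
    {"years": "4", "type": "Hadoop"},
    {"years": "1", "type": "Hbase"},
]
fields = experiences_to_fields(experiences)
fq = min_years_filter("Hadoop", 3)
```

The fields dictionary would be merged into the document alongside id and name before indexing, and fq is passed as the filter query.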
I have two fields:
title
body
and I want to search for two words
dog
OR
cat
in each of them.
I have tried q=*:dog OR cat
but it doesn't work.
How should I write it?
PS. Could I somehow set the default search field to ALL fields in schema.xml?
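As Mauricio noted, using a copyField (see http://wiki.apache.org/solr/SchemaXml#Copy_Fields) is one way to allow searching across multiple fields without specifying them in the query string. In that scenario, you define the destination field and then declare which fields get copied to it. The destination should use a tokenized text type (a string field only matches on the whole value) and be multiValued, since it receives more than one source; text_general below stands for whatever text type your schema defines.
<field name="mysearchfield" type="text_general" indexed="true" stored="false" multiValued="true"/>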
...
<copyField source="title" dest="mysearchfield"/>
<copyField source="body" dest="mysearchfield"/>
Once you've done that, you could do your search like:
q=mysearchfield:dog OR mysearchfield:cat
If your default query operator is OR (the usual default), that could be simplified by grouping the terms under the field:
q=mysearchfield:(dog cat)
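If "mysearchfield" is going to be your standard search, you can simplify things even further by defining that copyField as the defaultSearchField in the schema. Note that defaultSearchField is deprecated in newer Solr releases, where the df request parameter serves the same purpose.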
<defaultSearchField>mysearchfield</defaultSearchField>
After that, the query would just become:
q=dog cat
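If adding a copyField is not an option, the same search can also be spelled out per field in the query itself. The helper below is a small sketch with a hypothetical name; it does not escape special query characters, so it assumes plain single-word terms.

```python
def any_term_in_fields(fields, terms):
    """Build a query matching any of the terms in any of the fields."""
    # Group the terms once, e.g. "(dog OR cat)", then apply the group to each field.
    term_group = "(" + " OR ".join(terms) + ")"
    return " OR ".join(f"{field}:{term_group}" for field in fields)


query = any_term_in_fields(["title", "body"], ["dog", "cat"])
# query is: title:(dog OR cat) OR body:(dog OR cat)
```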