Searching for a phrase and getting its match count with Solr

I am using Solr to index documents, and now I need to search those documents for an exact phrase and sort the results by the number of times the phrase appears in each document. I also have to show the user how many times the phrase was matched.
I was using the following query (here I am searching for the word SAP):
{
  :params => {
    :wt => "json",
    :indent => "on",
    :rows => 100,
    :start => 0,
    :q => "((content:SAP) AND (doc_type:ClientContact) AND (environment:production))",
    :sort => "termfreq(content,SAP) desc",
    :fl => "id,termfreq(content,SAP)"
  }
}
This hash is only a representation of the actual query; it is converted into a query string at runtime.
I managed to get the search working by using content:"the query here" instead of content:the query here, but the hard part is returning and sorting by the termfreq.
Any ideas on how I could make this work?
Note: I am using Ruby, but this is a legacy application and I can't use any RubyGems, so I am using Solr's HTTP interface here.

I was able to make it work by adding a ShingleFilter to my schema.xml.
In my case I had started using Sunspot, so I just had to make the following change:
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- This is the line I added -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>
</fieldType>
After making that change, restarting Solr, and reindexing, I was able to use termfreq(content, "the query here") in the query (q=), in the returned fields (fl=), and even in sorting (sort=).
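The request from the question, adapted to a phrase, could be assembled roughly as follows. This is only a sketch, written in JavaScript rather than the Ruby of the legacy application, to show the parameter values. `buildPhraseParams` is a hypothetical helper; the field names and filters are the ones from the question, and phrases containing a double quote are simply rejected to keep the sketch small.

```javascript
// Hypothetical helper: builds the select parameters for an exact-phrase
// search that also returns and sorts by the phrase frequency.
function buildPhraseParams(phrase, rows = 100, start = 0) {
  // Assumption: quotes inside the phrase are not supported by this sketch.
  if (phrase.includes('"')) {
    throw new Error("phrases containing double quotes are not handled here");
  }
  const quoted = `"${phrase}"`;
  const freq = `termfreq(content,${quoted})`;
  return {
    wt: "json",
    indent: "on",
    rows: String(rows),
    start: String(start),
    q: `((content:${quoted}) AND (doc_type:ClientContact) AND (environment:production))`,
    sort: `${freq} desc`,
    fl: `id,${freq}`,
  };
}

const params = buildPhraseParams("the query here");
// URLSearchParams percent-encodes the quotes, parentheses and spaces.
const queryString = new URLSearchParams(params).toString();
console.log(queryString);
```

The same key/value pairs can be placed in the Ruby hash shown in the question.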

Add debug=results to the end of the Solr URL.
The response will then also include the phrase frequency.

Related

How to use more than one multivalued field in solr search

I have documents in Solr that have multivalued fields, and I want to search on those fields.
When I query with:
http://localhost:8983/solr/demo/select?q=*:*&fq=id:FEAE38C2-ABFF-4F0C-8AFD-9B8F51036D8A
I get the following result:
"response": {
  "numFound": 1,
  "start": 0,
  "docs": [
    {
      "created_date": "2016-03-23T13:47:46.55Z",
      "solr_index_date": "2016-04-01T08:21:59.78Z",
      "TitleForUrl": "it-s-a-wonderful-life",
      "modified_date": "2016-03-30T08:45:44.507Z",
      "id": "FEAE38C2-ABFF-4F0C-8AFD-9B8F51036D8A",
      "title": "It's a wonderful life",
      "article": "An angel helps a compassionate but despairingly frustrated businessman by showing what life would have been like if he never exis",
      "Cast": [
        "James Stewart",
        "Donna Reed",
        "Lionel Barrymore"
      ],
      "IsCastActive": [
        "false",
        "true",
        "true"
      ]
    }
  ]
}
As you can see, I have two multivalued fields named "Cast" and "IsCastActive".
My problem appears when I add filters such as Cast:"James Stewart" AND IsCastActive:"true", as in the following:
http://localhost:8983/solr/demo/select?q=*:*&fq=id:FEAE38C2-ABFF-4F0C-8AFD-9B8F51036D8A&fq=Cast:"James Stewart"&fq=IsCastActive:"true"
Solr still returns the same result, even though "James Stewart" is not active in the document. I don't want Solr to return any document for this query.
I think I'm doing something wrong. What is the correct way to do it?
This does not look possible in a straightforward manner in Solr. I think a more effective approach is to use the cast member's name as the key, associate it with the value true or false, and then filter on that key, something like James Stewart:["true"]. Alternatively, you can use a single field that stores each cast member's name and activity status delimited by a colon, something like castInfo:["James Stewart:false","John Sanders:true"]. You can then filter on it with something like fq=castInfo:"James Stewart:false".
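The single-field idea can be sketched as follows. Each fq is evaluated against the whole list of values in a multivalued field, so Cast:"James Stewart" and IsCastActive:"true" can be satisfied by different entries; pairing name and status in one value avoids that. The sketch assumes castInfo is a multivalued, untokenized string field, and `toCastInfo` and `castFilter` are hypothetical helper names.

```javascript
// Combine the two parallel arrays into one "name:status" value per cast member.
function toCastInfo(cast, isCastActive) {
  if (cast.length !== isCastActive.length) {
    throw new Error("Cast and IsCastActive must have the same length");
  }
  return cast.map((name, i) => `${name}:${isCastActive[i]}`);
}

// Build the filter query for one cast member with a given status.
function castFilter(name, active) {
  return `castInfo:"${name}:${active}"`;
}

// Values taken from the document in the question.
const doc = {
  Cast: ["James Stewart", "Donna Reed", "Lionel Barrymore"],
  IsCastActive: ["false", "true", "true"],
};

const castInfo = toCastInfo(doc.Cast, doc.IsCastActive);
console.log(castInfo);
console.log(castFilter("James Stewart", true));
```

With the values indexed this way, the filter for an active James Stewart has no matching entry in this document, which is the behaviour the question asks for.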
I want to propose an alternative solution to your problem: store true/false as integer payloads. The idea is to have a field called cast with a schema definition like:
<field name="cast" type="payloads" indexed="true" stored="true"/>
<fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="integer"/>
  </analyzer>
  <similarity class="payloadexample.PayloadSimilarityFactory" />
</fieldtype>
The content can be indexed for instance as:
James Stewart|0
Donna Reed|1
where 0 stands for false and 1 for true, matching the document in the question.
Using payloads would also allow you to read directly from the posting list, improving performance on the relevant queries.
Here you can find an example explaining how to achieve what I explained above.

CAML person or Group field with Multiple values

I have a field named TargetedPeople in a SharePoint list. It is a Person or Group field that can hold multiple values.
The CAML I used for querying is:
siteDataQuery.Query = @"<Where><Includes><FieldRef Name='TargetedPeople' LookupId='TRUE'/><Value Type='Integer'>" + webInContext.CurrentUser.ID + "</Value></Includes></Where>";
This works fine if Allow Multiple Selections is set to "No" for the field, but it does not seem to work when it is set to "Yes".
Please tell me how to query a multi-value field.
After some changes I was able to figure this out. If the field has multiple values, the value Type should be defined as LookupMulti. The following is the working code sample:
siteDataQuery.Query = @"<Where><Contains><FieldRef Name='TargetedPeople' LookupId='TRUE'/><Value Type='LookupMulti'>" + webInContext.CurrentUser.ID + "</Value></Contains></Where>";
This also works and looks simpler (tested on SharePoint Online):
<Contains>
  <FieldRef Name='TargetedPeople' />
  <Value Type="Integer">
    <UserID Type="Integer" />
  </Value>
</Contains>

Sitecore Lucene search indexing and subfolders

How can I make Lucene include results that are indexed outside the site root, e.g. items rooted at "/sitecore/content/stuff" rather than placed under "/sitecore/content/Home"?
Looking at SearchManager.cs in "/sitecore modules/LuceneSearch/", the SiteRoot is defined from Sitecore.Context.Site.StartPath, but making changes to this file doesn't seem to have any effect.
Note:
I am only using the LuceneResults .ascx and .cs files.
----- Question updated, as I narrowed down what the problem might be -----
I'm trying to create an index of a specific set of items, for use in a Lucene search.
In web.config, I have specified an index containing:
...
<root>/sitecore/content/Home/Subfolder</root>
...
and that works flawlessly, returning all the subitems when doing a search.
I then copied exactly the same items to a new location and updated my web.config as follows:
...
<root>/sitecore/content/newSubfolder/Subfolder/Subfolder</root>
...
Now my searches never find anything!
Does anyone have an idea what the problem could be?
Note:
- I have rebuilt the search index database after every change.
- In "Luke" the index seems fine, and a search there yields the proper results.
Complete Index:
<index id="faqindex" type="Sitecore.Search.Index, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <param desc="folder">__faq</param>
  <Analyzer ref="search/analyzer"/>
  <locations hint="list:AddCrawler">
    <resources type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
      <database>master</database>
      <root>/sitecore/content/MyContent/Snippets/FAQ</root>
      <include hint="list:IncludeTemplate">
        <faqblock>{3340AAAE-B2F8-4E22-8B7B-F3EDDB48587E}</faqblock>
      </include>
      <tags>faqblock</tags>
      <boost>1.0</boost>
    </resources>
  </locations>
</index>
It sounds like you are using the Lucene Search module from Sitecore Marketplace. The code for this module limits the search results to the site root and its children:
public SearchManager(string indexName)
{
    SearchIndexName = indexName;
    Database database = Factory.GetDatabase("master");
    var item = Sitecore.Context.Site.StartPath;
    SiteRoot = database.GetItem(item);
}
[...]
public SearchResultCollection Search(string searchString)
{
    //Getting index from the web.config
    var searchIndex = Sitecore.Search.SearchManager.GetIndex(SearchIndexName);
    using(IndexSearchContext context = searchIndex.CreateSearchContext())
    {
        SearchHits hits = context.Search(searchString, new SearchContext(SiteRoot));
sitecore modules\Lucene Search\SearchManager.cs
Assuming that the "website" node in the sites section of Web.config has startItem="/home", results outside of the "home" hierarchy will not be returned.
If you download the source code for this project, and edit the line that populates SiteRoot to the following, the new items will be returned:
SiteRoot = database.GetItem("/sitecore/content");
Remember to copy the new LuceneSearch.dll to the bin directory of the website project.

SOLR sorting based on date : storing date as toISOString(), whereas toUTCString fails

When storing the date field in Solr, I convert the Date with toISOString() and Solr accepts it. I tried storing it with toUTCString(), but that fails.
Now, when searching, I sort on the date. I do get results, but they are not in descending order; they come back in mixed order.
I tried specifying a range, using [NOW-1YEAR/DAY TO NOW/DAY+1DAY], but the result is still the same: first I get a 6-day-old document, then a 30-minute-old one, and then a 2-month-old one.
What is the right approach?
EDIT:
Here is the date field that I added to schema.xml:
<field name="message_date" type="date" indexed="true" stored="false" />
and here are the parameters I send with each search:
var query = "*:*";
var options = {
    fq: '{!geofilt}',
    sfield: 'location',
    pt: latitude + ',' + longitude,
    d: 10,
    sort: ["message_date desc", "geodist() asc"],
    start: 0,
    rows: 10
};
solrclient.query(query, options, function(err, solrRes){
    ....
});
This is server-side JavaScript (Node.js) code.
The code above is fine and works. The problem was that, after retrieving the results from Solr, I run a finer search in my database to get more details, and that result was not sorted.
So I sorted the results from MongoDB after retrieving the results from Solr, and that worked.
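One way to restore the Solr ordering after the database lookup is to reorder the fetched records by the position of their IDs in the Solr response. This is a sketch under assumptions: the records carry the same identifier as the Solr documents in a property called `id`, the sample data is invented, and `orderLikeSolr` is a hypothetical helper, not part of node-solr or MongoDB.

```javascript
// Reorder database records so they follow the order of the IDs returned by Solr.
function orderLikeSolr(solrIds, dbRecords) {
  // Index the records by ID for constant-time lookup.
  const byId = new Map(dbRecords.map((rec) => [String(rec.id), rec]));
  return solrIds
    .map((id) => byId.get(String(id)))
    // Drop IDs that the database lookup did not return.
    .filter((rec) => rec !== undefined);
}

// IDs in the order Solr returned them (newest first).
const solrIds = ["m3", "m1", "m2"];
// Records as they might come back from the database, in arbitrary order.
const dbRecords = [
  { id: "m1", text: "six days old" },
  { id: "m2", text: "two months old" },
  { id: "m3", text: "thirty minutes old" },
];

const ordered = orderLikeSolr(solrIds, dbRecords);
console.log(ordered.map((rec) => rec.id));
```

This keeps the secondary geodist() ordering from Solr as well, since the whole Solr order is reused rather than re-sorting by date alone.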
I am using Node.js, and the Solr module is https://github.com/gsf/node-solr
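On the storage point from the question: Solr date fields expect the ISO 8601 UTC form (for example 1995-12-31T23:59:59Z), which is what toISOString() produces, while toUTCString() produces an HTTP-date style string. The small sketch below illustrates the difference; `looksLikeSolrDate` is a hypothetical check written for this example, not part of any Solr client.

```javascript
// Hypothetical check for the ISO 8601 UTC shape that Solr date fields accept.
function looksLikeSolrDate(text) {
  return /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/.test(text);
}

// A fixed instant so the output does not depend on the current time.
const when = new Date(Date.UTC(2016, 2, 23, 13, 47, 46, 550));

const iso = when.toISOString();
const utc = when.toUTCString();

console.log(iso, looksLikeSolrDate(iso)); // 2016-03-23T13:47:46.550Z true
console.log(utc, looksLikeSolrDate(utc));
```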

How can I write a SPQuery to filter items based on a LinkFieldValue?

I need to select a single value from a SharePoint list based on a field value. The type of the field is LinkFieldValue. How should I write the CAML query?
When I select the items with an empty query, I receive all the items in the list as expected.
When I add constraints to the query, it returns an empty result. I have tried constructing the query as follows:
string.Format("<Where><Eq><FieldRef Name=\"PollInstancePoll\" /><Value "
    + "Type=\"Text\">{0}</Value></Eq></Where>",
    new LinkFieldValue { NavigateUrl = "/az/Lists/Polls/DispForm.aspx?ID=1",
                         Text = "example poll" });
which results in the following query text:
<Where><Eq><FieldRef Name="PollInstancePoll" />
<Value Type="Text">example poll</Value>
</Eq></Where>
I have solved my problem with the following query:
new SPQuery
{
    Query =
        CAML.Where(
            CAML.And(
                CAML.Contains(
                    CAML.FieldRef("PollInstancePoll"),
                    CAML.Value(pollPath)),
                CAML.Contains(
                    CAML.FieldRef("PollInstancePage"),
                    CAML.Value(pagePath))))
};
Essentially, I am checking only the URL part of the Link field and providing the comparison value as Type="Text". It is important to remember that SharePoint always stores these values in the database as server-relative URLs.
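Because the comparison value has to be server-relative, an absolute URL would need its scheme and host stripped before it is passed to the query. A minimal sketch of that normalization follows, shown in JavaScript for illustration only; `toServerRelative` and the host names are hypothetical, and how pollPath is actually computed is not shown in the answer.

```javascript
// Reduce an absolute or already-relative URL to its server-relative form.
function toServerRelative(url) {
  // The base is only used when the input is already relative.
  const parsed = new URL(url, "http://placeholder.invalid");
  return parsed.pathname + parsed.search;
}

console.log(
  toServerRelative("http://intranet.example.com/az/Lists/Polls/DispForm.aspx?ID=1")
); // /az/Lists/Polls/DispForm.aspx?ID=1
console.log(toServerRelative("/az/Lists/Polls/DispForm.aspx?ID=1"));
```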

Resources