RavenDB full-text search - search

Can you please tell how to perform simple full-text search in RavenDb. The database is stored document: Movie {Name = "Pirates of the Carribean"}. I wish that this document was found on the search phrase "Pirates Carribean" or any other combination of words.

Boris,
Rob's answer has the right index, but it is a bit awkward for querying. You can do that using:
session.Query<Movie, Movie_ByName>()
.Search(x=>x.Name, searchTerms)
.ToList()
That will

What you are worrying about isn't anything to do with full text - by default Lucene works on an OR basis and what you want is an AND
If I were you I'd do
String[] terms = searchTerm.Split(" "); // Or whatever the string.split method is
and
.Where("Name:(" + String.Join(" AND ", terms) + ")");
Your index should look something like
public class Movie_ByName : AbstractIndexCreationTask
{
public override IndexDefinition CreateIndexDefinition()
{
return new IndexDefinitionBuilder<Movie>
{
Map = movies => from movie in movies
select new { movie.Name, market.Id },
Indexes =
{
{x => x.Name, FieldIndexing.Analyzed}
}
}
.ToIndexDefinition(DocumentStore.Conventions);
}
You don't need storage, you're not requesting the data from lucene directly at any time. You might not even want an index (You might actually want FieldIndexing.Analyzed, and might get away with just using dynamic queries here)
Up to you though.

Here's how I acheived an "ANDing" term search.
First, make sure that your field is indexed and analyzed:
public class MyIndex: AbstractIndexCreationTask<MyDocument>
{
public MyIndex()
{
Map = docs => from d in docs
select new { d.MyTextField };
Index(x => x.MyTextField, FieldIndexing.Analyzed);
}
}
Then query from the client:
var query = session.Query<MyDocument, MyIndex>();
query = theSearchText
.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries)
.Aggregate(query, (q, term) =>
q.Search(x => x.MyTextField, term, options: SearchOptions.And));

Related

Distinct values in Azure Search Suggestions?

I am offloading my search feature on a relational database to Azure Search. My Products tables contains columns like serialNumber, PartNumber etc.. (there can be multiple serialNumbers with the same partNumber).
I want to create a suggestor that can autocomplete partNumbers. But in my scenario I am getting a lot of duplicates in the suggestions because the partNumber match was found in multiple entries.
How can I solve this problem ?
The Suggest API suggests documents, not queries. If you repeat the partNumber information for each serialNumber in your index and then suggest based on partNumber, you will get a result for each matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search will eliminate duplicates within the same document, but not across documents. You will have to do that on the client side, or build a secondary index of partNumbers just for suggestions.
See this forum thread for a more in-depth discussion.
Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.
I'm facing this problem myself. My solution does not involve a new index (this will only get messy and cost us money).
My take on this is a while-loop adding 'UserIdentity' (in your case, 'partNumber') to a filter, and re-search until my take/top-limit is met or no more suggestions exists:
public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
var suggestions = new List<MachineSuggestionDTO>();
var sp = new SuggestParameters
{
UseFuzzyMatching = true,
Top = 100 // Get maximum result for a chance to reduce search calls.
};
// Add searchfields if set
if (searchFields != null && searchFields.Count() != 0)
{
sp.SearchFields = searchFields;
}
// Loop until you get the desired ammount of suggestions, or if under desired ammount, the maximum.
while (suggestions.Count < take)
{
if (!await DistinctSuggestMachineUser(searchText, take, searchFields, suggestions, indexClientMachine, sp))
{
// If no more suggestions is found, we break the while-loop
break;
}
}
// Since the list might me bigger then the take, we return a narrowed list
return suggestions.Take(take).ToList();
}
private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, string[] searchFields, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);
if(response.Results.Count > 0){
// Fix filter if search is triggered once more
if (!string.IsNullOrEmpty(sp.Filter))
{
sp.Filter += " and ";
}
foreach (var result in response.Results.DistinctBy(r => new { r.Document.UserIdentity, r.Document.UserName, r.Document.UserCode}).Take(take))
{
var d = result.Document;
suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });
// Add found UserIdentity to filter
sp.Filter += $"UserIdentity ne '{d.UserIdentity}' and ";
}
// Remove end of filter if it is run once more
if (sp.Filter.EndsWith(" and "))
{
sp.Filter = sp.Filter.Substring(0, sp.Filter.LastIndexOf(" and ", StringComparison.Ordinal));
}
}
// Returns false if no more suggestions is found
return response.Results.Count > 0;
}
public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
SuggestParameters sp = new SuggestParameters()
{
UseFuzzyMatching = fuzzy,
Top = 100
};
if (highlights)
{
sp.HighlightPreTag = "<em>";
sp.HighlightPostTag = "</em>";
}
var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);
// Convert the suggest query results to a list that can be displayed in the client.
return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}
After getting top 100 and using distinct it works for me.
You can use the Autocomplete API for that where does the grouping by default. However, if you need more fields together with the result, like, the partNo plus description it doesn't support it. The partNo will be distinct though.

Elastic Search-Search string having spaces and special characters in it using C#

I am looking for ElasticSearch nest query which will provide exact match on string having spaces in it using C#.
for example - I want to search for a word like 'XYZ Company Solutions'. I tried querystring query but it gives me all the records irrespective of search result. Also i read on the post and found that we have to add some mappings for the field. I tried 'Not_Analyzed' analyzer on the field but still it does not worked.
Here is my code of C#
var indexDefinition = new RootObjectMapping
{
Properties = new Dictionary<PropertyNameMarker, IElasticType>(),
Name = elastic_newindexname
};
var notAnalyzedField = new StringMapping
{
Index = FieldIndexOption.NotAnalyzed
};
indexDefinition.Properties.Add("Name", notAnalyzedField);
objElasticClient.DeleteIndex(d => d.Index(elastic_newindexname));
var reindex = objElasticClient.Reindex<dynamic>(r => r.FromIndex(elastic_oldindexname).ToIndex(elastic_newindexname).Query(q => q.MatchAll()).Scroll("10s").CreateIndex(i => i.AddMapping<dynamic>(m => m.InitializeUsing(indexDefinition))));
ReindexObserver<dynamic> o = new ReindexObserver<dynamic>(onError: e => { });
reindex.Subscribe(o);**
**ISearchResponse<dynamic> ivals = objElasticClient.Search<dynamic>(s => s.Index(elastic_newindexname).AllTypes().Query(q => q.Term("Name","XYZ Company Solutions")));** //this gives 0 records
**ISearchResponse<dynamic> ivals1 = objElasticClient.Search<dynamic>(s => s.Index(elastic_newindexname).AllTypes().Query(q => q.Term(u => u.OnField("Name").Value("XYZ Company Solutions"))));** //this gives 0 records
**ISearchResponse<dynamic> ivals = objElasticClient.Search<dynamic>(s => s.Index(elastic_newindexname).AllTypes().Query(#"Name = 'XYZ Company Solutions'"));** //this gives all records having fields value starting with "XYZ"
If anyone have complete example or steps in C# then can you please share with me?
Have you tried the match_phrase query?
The query DSL the request is the following:
"query": {
"match_phrase": {
"title": "XYZ Company Solutions"
}
}
In C# try the following:
_client.Search<T>(s => s
.Index(IndexName)
.Types(typeof (T))
.Query(q => q.MatchPhrase(m => m
.OnField(f => f.Name)
.Query("XYZ Company Solutions"))));
Check the official documentation for more information:
http://www.elastic.co/guide/en/elasticsearch/guide/master/phrase-matching.html#phrase-matching
Please refer the below code ,I think this will meet your requirements.
Here I have created and mapped index with dynamic template and then did the XDCR.
Now all string fields will be not_analysed.
IIndicesOperationResponse result = null;
if (!objElasticClient.IndexExists(elastic_indexname).Exists)
{
result = objElasticClient.CreateIndex(elastic_indexname, c => c.AddMapping<dynamic>(m => m.Type("_default_").DynamicTemplates(t => t
.Add(f => f.Name("string_fields").Match("*").MatchMappingType("string").Mapping(ma => ma
.String(s => s.Index(FieldIndexOption.NotAnalyzed)))))));
}
Thanks
Mukesh Raghuwanshi
It looks like you just need to refresh the new index following the re-indexing operation.
Using your code example (and your first term query), I was seeing the same result -- 0 hits.
Adding the following Refresh call after the reindex.Subscribe() call results in a single hit being returned:
objElasticClient.Refresh(new RefreshRequest() { });

Sitecore HOWTO: Search item bucket for items with specific values

I have an item bucket with more then 30 000 items inside. What I need is to quickly search items that have particular field set to particular value, or even better is to make something like SELECT WHERE fieldValue IN (1,2,3,4) statement. Are there any ready solutions?
I searched the web and the only thing I found is "Developer's Guide to Item
Buckets and Search" but there is no code examples.
You need something like this. The Bucket item is an IIndexable so it can be searched using Sitecore 7 search API.
This code snippet below can easily be adapted to meet your needs and it's just a question of modifying the where clause.if you need any further help with the sitecore 7 syntax just write a comment on the QuickStart blog post below and I'll get back to you.
var bucketItem = Sitecore.Context.Database.GetItem(bucketPath);
if (bucketItem != null && BucketManager.IsBucket(bucketItem))
{
using (var searchContext = ContentSearchManager.GetIndex(bucketItem as IIndexable).CreateSearchContext())
{
var result = searchContext.GetQueryable<SearchResultItem().Where(x => x.Name == itemName).FirstOrDefault();
if(result != null)
Context.Item = result.GetItem();
}
}
Further reading on my blog post here:
http://coreblimey.azurewebsites.net/sitecore-7-search-quick-start-guide/
Using Sitecore Content Editor:
Go to the bucket item then In search tab, start typing the following (replace fieldname and value with actual field name and value):
custom:fieldname|value
Then hit enter, you see the result of the query, you can multiple queries at once if you want.
Using Sitecore Content Search API:
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.ContentSearch.Linq.Utilities
ID bucketItemID = "GUID of your bucket item";
ID templateID = "Guid of your item's template under bucket";
string values = "1,2,3,4,5";
using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
var predicate = PredicateBuilder.True<SearchResultItem>();
predicate = PredicateBuilder.And(item => item.TemplateId == new ID(templateID)
&& item.Paths.Contains(bucketItemID));
var innerPredicate = PredicateBuilder.False<SearchResultItem>();
foreach(string val in values.Split(','))
{
innerPredicate = PredicateBuilder.False<SearchResultItem>();
innerPredicate = innerPredicate.Or(item => item["FIELDNAME"] == val);
}
predicate = predicate.And(innerPredicate);
var result = predicate.GetResults();
List<Item> ResultsItems = new List<Item>();
foreach (var hit in result.Hits)
{
Item item = hit.Document.GetItem();
if(item !=null)
{
ResultsItems .Add(item);
}
}
}
The following links can give good start with the Search API:
http://www.fusionworkshop.co.uk/news-and-insight/tech-lab/sitecore-7-search-a-quickstart-guide#.VPw8AC4kWnI
https://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/06/sitecore-7-poco-explained.aspx
https://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/05/sitecore-7-predicate-builder.aspx
Hope this helps!

Getting a list of documents with a maximum field value from CouchDB view

Let's say I have blog entries like these in my CouchDB database:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Mary", "postdate":"20110411", "subject":"and this", "message":"blah-blah"}
{"name":"Joe", "postdate":"20110410", "subject":"And other thing", "message":"yada-yada"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
It's pretty easy to get a list of all posts. But how do I get a list of latest posts from all the users?
Like that:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
Try with this map function:
function(doc) {
if (doc.postdate && doc.name) {
emit([doc.name, doc.postdate], 1);
}
}
and the following reduce function:
function(keys, values, rereduce) {
var max = 0,
ks = rereduce ? values : keys;
for (var i = 1, l = ks.length; i < l; ++i) {
if (ks[max][0][1] < ks[i][0][1]) max = i;
}
return ks[max];
}
and query it with group_level=1. It gives you the _id of the posts, then you can retrieve them all with a single query with the keys parameter or using a POST.
I am not sure if this is the best approach, but it seems to work.
UPDATE: fixed map to handle rereduce correctly.
You're going to emit the postdate as the key because keys are sorted. For example, this is what your map function will look like...
function(doc) {
if(doc.postdate) {
emit(doc.postdate, doc);
}
}
That will give you all the docs sorted ascending by postdate. If you want descending then query with ?descending=true
Cheers.

Select top/latest 10 in couchdb?

How would I execute a query equivalent to "select top 10" in couch db?
For example I have a "schema" like so:
title body modified
and I want to select the last 10 modified documents.
As an added bonus if anyone can come up with a way to do the same only per category. So for:
title category body modified
return a list of latest 10 documents in each category.
I am just wondering if such a query is possible in couchdb.
To get the first 10 documents from your db you can use the limit query option.
E.g. calling
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?limit=10
You get the first 10 documents.
View rows are sorted by the key; adding descending=true in the querystring will reverse their order. You can also emit only the documents you are interested using again the querystring to select the keys you are interested.
So in your view you write your map function like:
function(doc) {
emit([doc.category, doc.modified], doc);
}
And you query it like this:
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?startkey=["youcategory"]&endkey=["youcategory", date_in_the_future]&limit=10&descending=true
here is what you need to do.
Map function
function(doc)
{
if (doc.category)
{
emit(['category', doc.category], doc.modified);
}
}
then you need a list function that groups them, you might be temped to abuse a reduce and do this, but it will probably throw errors because of not reducing fast enough with large sets of data.
function(head, req)
{
% this sort function assumes that modifed is a number
% and it sorts in descending order
function sortCategory(a,b) { b.value - a.value; }
var categories = {};
var category;
var id;
var row;
while (row = getRow())
{
if (!categories[row.key[0]])
{
categories[row.key[0]] = [];
}
categories[row.key[0]].push(row);
}
for (var cat in categories)
{
categories[cat].sort(sortCategory);
categories[cat] = categories[cat].slice(0,10);
}
send(toJSON(categories));
}
you can get all categories top 10 now with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories
and get the docs with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories?include_docs=true
now you can query this with a multiple range POST and limit which categories
curl -X POST http://localhost:5984/database/_design/doc/_list/top_ten/by_categories -d '{"keys":[["category1"],["category2",["category3"]]}'
you could also not hard code the 10 and pass the number in through the req variable.
Here is some more View/List trickery.
slight correction. it was not sorting untill I added the "return" keyword in your sortCategory function. It should be like this:
function sortCategory(a,b) { return b.value - a.value; }

Resources