Lucene.net multi searcher highlight issue - c#-4.0

I am using Lucene.Net 2.9.4 (cannot upgrade at the moment). I am also using Highlighter.Net from the Lucene.Net contrib package. It works fine when I am searching a single index; my code looks like this:
QueryScorer fragmentScorer = new QueryScorer(query.Rewrite(searcher.GetIndexReader()));
Highlighter highlighter = new Highlighter(this.HighlightFormatter, fragmentScorer);
Lucene.Net.Analysis.TokenStream tokenStream = this.HighlightAnalyzer.TokenStream(highlightField, new System.IO.StringReader(value));
return highlighter.GetBestFragments(tokenStream, value, this.MaxNumHighlights, this.Separator);
return highlightField;
The issue is that when my searcher object is a MultiSearcher, I do not have the GetIndexReader method.
A MultiSearcher uses more than one reader under the hood, so it makes sense that there is no GetIndexReader.
Is it even possible to highlight with a MultiSearcher? If not, is there another way to do this?

Related

Fuzzy search with redis om & node js

I have been trying to implement fuzzy search in RediSearch, using Redis OM with Node.js.
I've gone through articles like this but I have not managed to fix it.
This is my code sample of the search that I am currently implementing.
let searchResults = await repository.search()
.where("country").equal(correctCountry)
.where("city").equal(city.toLocaleLowerCase())
.and("descriptionAndStreet")
.matches(placedescription + "*").return.page(0, 20)
I would like to implement fuzzy search when searching the "placedescription".
Any assistance would be greatly appreciated.
Found the solution
Redis OM does not have a fluent interface for fuzzy matching. However, you can always do a raw search (https://github.com/redis/redis-om-node/#running-raw-searches) and pass in pretty much any query you want:
let query = `@country:{${correctCountry}} @city:{${city}} @descriptionAndStreet:%Whatyouwanttosearch%`
let places = await placeRepository.searchRaw(query).return.page(0, 10)
If you want to search with more than one word, i.e. space-separated:
let query = `@country:{${correctCountry}} @city:{${city}} @descriptionAndStreet:%What% %you% %want% %to% %search%`
If you run into issues with that, you can try removing the spaces in between:
%What%%you%%want%%to%%search%
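For the multi-word case, a small helper can build the fuzzy part of the query from user input instead of typing the % signs by hand. This is only a sketch: toFuzzyTerms is a made-up name (it is not part of Redis OM), and it assumes the input has no characters that need escaping in a RediSearch query. Each %word% asks RediSearch for a fuzzy match at Levenshtein distance 1.

```javascript
// Hypothetical helper (not part of Redis OM): wraps every whitespace-separated
// word in % so RediSearch treats it as a fuzzy term.
// Assumption: the words contain no characters that need escaping.
function toFuzzyTerms(text) {
  return text
    .trim()
    .split(/\s+/)                         // split on any run of whitespace
    .filter((word) => word.length > 0)    // drop the empty entry produced by ""
    .map((word) => `%${word}%`)           // %word% = fuzzy match on that word
    .join(" ");
}

const fuzzy = toFuzzyTerms("What you want to search");
// fuzzy is "%What% %you% %want% %to% %search%"
```

The resulting string can then be interpolated after descriptionAndStreet: in the raw query passed to searchRaw.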

Support to upload dictionary of synonyms in azure search

I was looking for a way to upload a text file containing a dictionary of synonyms to Azure Search. The nearest I could find was:
https://azure.microsoft.com/en-in/blog/azure-search-synonyms-public-preview/
https://learn.microsoft.com/en-us/azure/search/search-synonyms
I know it is not a good idea to compare products from different companies, but if there is a way to upload a dictionary of synonyms to Azure Search, as Elasticsearch supports, it would be of great help and might save a lot of time and rework.
Please let me know how to upload a synonym dictionary to Azure Search.
The latest .NET SDK for Azure Cognitive Search has this capability. From this sample:
// Create a new SearchIndexClient
Uri endpoint = new Uri(Environment.GetEnvironmentVariable("SEARCH_ENDPOINT"));
AzureKeyCredential credential = new AzureKeyCredential(
Environment.GetEnvironmentVariable("SEARCH_API_KEY"));
SearchIndexClient indexClient = new SearchIndexClient(endpoint, credential);
// Create a synonym map from a file containing country names and abbreviations
// using the Solr format with entry on a new line using \n, for example:
// United States of America,US,USA\n
string synonymMapName = "countries";
string synonymMapPath = "countries.txt";
SynonymMap synonyms;
using (StreamReader file = File.OpenText(synonymMapPath))
{
synonyms = new SynonymMap(synonymMapName, file);
}
await indexClient.CreateSynonymMapAsync(synonyms);
The SDKs for Java, Python, and JavaScript also support creating synonym maps. The Java SDK accepts a string rather than a file stream, so you'd have to read the file contents yourself. Unfortunately the Python and JavaScript SDKs seem to require a list of strings (one for each line of the file), which is something we should improve. I'm following up with the Azure SDK team to make these improvements.
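Until that changes, turning the file into the list of strings those SDKs expect is a one-liner or two. The sketch below only covers the splitting step; toSynonymRules is a made-up helper name, and it assumes the file is in the Solr format shown above with one rule per line. Reading the file (for example with fs.readFileSync) and passing the array to the SDK are left out.

```javascript
// Hypothetical helper: turns the contents of a Solr-format synonyms file into
// an array with one rule per entry, skipping blank lines.
function toSynonymRules(fileContents) {
  return fileContents
    .split(/\r?\n/)                       // handle both \n and \r\n line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);   // ignore empty lines, e.g. a trailing newline
}

const rules = toSynonymRules("United States of America,US,USA\nUnited Kingdom,UK,GB\n");
// rules has two entries, one per non-empty line of the input
```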

Search phrase in a sentence using Lucene 5.5

Purpose: To build a dictionary (sample dictionary taken from Project Gutenberg). The application should be able to return the "word" when part of its meaning is provided. Example:
CONSOLE
Con*sole", v. t. [imp. & p.p. Consoled; p.pr. & vb.n. Consoling.]
Etym: [L. consolari,. p.p. consolatus; con- + solari to console, comfort: cf. F. consoler. See Solace.]
Defn: To cheer in distress or depression; to alleviate the grief and raise the spirits of; to relieve; to comfort; to soothe. And empty heads console with empty sound. Pope. I am much consoled by the reflection that the religion of Christ has been attacked in vain by all the wits and philosophers, and its triumph has been complete. P. Henry.
Syn. -- To comfort; solace; soothe; cheer; sustain; encourage; support. See Comfort.
So if my query is "To cheer in distress", it should return me "Console" as the output.
I am trying to build this tool using Lucene 5.5 (lower versions won't do for now). This is what I tried:
Indexing:
Document doc = new Document();
doc.add(new Field(MEANING, meaningOfWord, Store.YES, Field.Index.ANALYZED));
doc.add(new Field(WORD, word, Store.YES, Field.Index.ANALYZED));
indexWriter.addDocument(doc);
Analyzing:
Analyzer analyzer = new WhitespaceAnalyzer();
QueryParser parser = new QueryParser(MEANING, analyzer);
parser.setAllowLeadingWildcard(true);
parser.setAutoGeneratePhraseQueries(true);
Query query = parser.parse(".*" + searchString + ".*");
TopDocs tophits = isearcher.search(query, null, 1000);
This (tophits) is not returning what I want. (I have only been using Lucene for about a week, so please excuse me if this is very naive.) Any clues?
Sounds like a different analyzer was used when the documents were indexed. Probably KeywordAnalyzer or something. You (usually) need to pass the same analyzer to IndexWriter when indexing your documents as the one you will use when searching. Also, bear in mind, after correcting the IndexWriter's analyzer, you will need to reindex your documents in order for them to be indexed correctly.
Wrapping what should be a simple phrase query in wildcards is an extremely poor substitute for analyzing correctly.
Found the solution: use WildcardQuery, like this:
WildcardQuery wildCardQ = new WildcardQuery(new Term(MEANING, searchString));
But for incorrect words or phrases, it sometimes takes a long time to come back with an answer.

Sitecore 7 Search, cannot access a disposed object

I've been working with some Sitecore 7 search code. Example below.
using (var context = Index.CreateSearchContext())
{
// ....Build predicates
var query = context.GetQueryable<SearchResultItem>().Where(predicate);
return query.GetResults();
}
This works fine in SOLR, but when used with standard Lucene, whenever I reference a property in the SearchResults<SearchResultItem> returned by GetResults(), Sitecore errors with "Cannot access a disposed object". It appears that GetResults() doesn't enumerate and still hangs on to the searchcontext.
Has anyone come across this before and know how to fix it? I've seen some articles suggesting keeping the SearchContext in application state, but ideally I want to avoid this.
Thanks
Ian
It seems that SearchResults<T> holds a reference to SearchHit and the LuceneSearchProvider doesn't hold a reader open. The new version of Lucene automatically closes the reader. I think you might be returning the wrong type. You should probably do it like this:
var query = context.GetQueryable<SearchResultItem>().Where(predicate);
return query.ToList();
However, make sure that you don't return too many results. You should probably use Take() etc.
Is GetResults() returning a List or IEnumerable/IQueryable?
Try to return a list in case it isn't already.
return query.GetResults().ToList();
Cheers

Lucene analyzer for first name

Is there a Lucene analyzer out there that tokenizes name parts with their short name equivalents (e.g. Mike and Michael, Rich and Richard, Suzie and Susan), etc?
Fuzzy match on Levenshtein distance is a solution I know, and some implementors seem to pair fuzzy match with the soundex algorithm. Surely somebody has made a swipe at just plain listing all of these short names somewhere?
EDIT: The toughest part of this question is where to get the synonym data from?
I am not aware of any specific nickname filter out there.
A SynonymFilter would make it reasonably easy to generate though, if you had a data source for it. This appears to be a good source of nickname data:
https://code.google.com/p/nickname-and-diminutive-names-lookup/
You would need to generate the SynonymMap to pass into the SynonymFilter ctor, which should look something like this (I think):
SynonymMap.Builder builder = new SynonymMap.Builder(true);
builder.add(new CharsRef("Mike"), new CharsRef("Michael"), false);
builder.add(new CharsRef("Rich"), new CharsRef("Richard"), false);
builder.add(new CharsRef("Suzie"), new CharsRef("Susan"), false);
SynonymMap map = builder.build();
