Print the nearby terms in Lucene Search

After searching my Lucene index with IndexSearcher, how can I print the terms that appear next to the search term?
QueryParser qp = new QueryParser("body", new StandardAnalyzer());
String queryStr = "search term";
Query q1 = qp.parse(queryStr);
TopDocs hits = searcher.search(q1, 1);
System.out.println(hits.totalHits + " docs found for the query \"" + q1.toString() + "\"");
The code above only prints the number of matching documents and the query itself, but I want to print the terms that appear next to the search term instead.

Related

How to increase Azure search performance

I am new to Azure services. In my project there is a function that searches for one or more words. For example, if I search for "best phase", the search should return the data related to "best" and "phase".
Sample code is below. Note: searchParameters is used to sort and order my data by date.
string searchText = "best phase";
string[] temp = searchText.Contains(" ") ? searchText.Split(' ') : new string[] { searchText};
var documentSearch = _indexClient.Documents.Search("\"" + searchText + "\"^2, \"|" + searchText + "|\", +" + searchText + ", +" + string.Join(", +", temp) , searchParameters);
The current implementation takes 15-20 seconds or more per search, which is too slow. Any ideas on how to make it faster?
You can use Azure Search as explained here.
You can also use Full-Text Search of SQL Azure as explained here.
To decide which one to use, read about the differences between them in this article.
Hope this helps.

Lucene Index Search without stoppers

I am running queries over a Lucene index, and right now I am looking for Latin phrases. The problem is that some of these phrases include words that I think are treated as stop words. For example, if my search term is "a contrario sensu" I get zero results, but if I search only for "contrario sensu" I get over 100 matches.
The question is: how can I search without these stop words being removed?
My code looks like this:
public IEnumerable<TesisIndx> Search(string searchTerm)
{
    List<TesisIndx> results = new List<TesisIndx>();
    IndexSearcher searcher = new IndexSearcher(FSDirectory.GetDirectory(indexPath));
    QueryParser parser = new QueryParser("Rubro", analyzer);
    // Build the phrase by hand, one term per whitespace-separated word
    PhraseQuery q = new PhraseQuery();
    String[] words = searchTerm.Split(' ');
    foreach (string word in words)
    {
        q.Add(new Term("Rubro", word));
    }
    //Query query = parser.Parse(searchTerm);
    Hits hitsFound = searcher.Search(q);
    TesisIndx sampleDataFileRow = null;
    for (int i = 0; i < hitsFound.Length(); i++)
    {
        sampleDataFileRow = new TesisIndx();
        Document doc = hitsFound.Doc(i);
        sampleDataFileRow.Ius = int.Parse(doc.Get("Ius"));
        sampleDataFileRow.Rubro = doc.Get("Rubro");
        sampleDataFileRow.Texto = doc.Get("Texto");
        results.Add(sampleDataFileRow);
    }
    // The original snippet omitted the return statement
    return results;
}
I use a StandardAnalyzer both to build the index and to perform the search.
"a" is a stop word. However, when it comes to phrase queries, that doesn't mean it isn't considered at all. If you try printing your query after parsing, you should see something like:
Rubro:"? contrario sensu"
That question mark represents a position increment, in this case a removed stop word. So it's looking for the phrase with a gap where a stop word has been removed at the beginning.
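To make the gap concrete, here is a toy model in plain Java. It is not Lucene code: the class StopWordPositions and its methods are hypothetical names invented for this sketch, and the tokenizer is just a lower-casing whitespace split. It only illustrates how removing a stop word while still advancing the position counter yields the "?" placeholder.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of stop-word removal with position gaps; hypothetical, not the Lucene API. */
final class StopWordPositions {

    /** Maps each surviving term to the position it held in the original text. */
    static SortedMap<Integer, String> analyze(String text, Set<String> stopWords) {
        SortedMap<Integer, String> terms = new TreeMap<>();
        int position = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!stopWords.contains(word)) {
                terms.put(position, word);
            }
            position++; // advance even for removed words; this is what leaves the gap
        }
        return terms;
    }

    /** Renders the terms like a parsed phrase query, with "?" for every skipped position. */
    static String describe(SortedMap<Integer, String> terms) {
        StringBuilder sb = new StringBuilder();
        int expected = 0;
        for (Map.Entry<Integer, String> entry : terms.entrySet()) {
            for (; expected < entry.getKey(); expected++) {
                sb.append("? ");
            }
            sb.append(entry.getValue()).append(' ');
            expected = entry.getKey() + 1;
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // With "a" as a stop word: prints "? contrario sensu"
        System.out.println(describe(analyze("a contrario sensu", Set.of("a"))));
        // With an empty stop set: prints "a contrario sensu"
        System.out.println(describe(analyze("a contrario sensu", Set.of())));
    }
}
```

A hand-built PhraseQuery that adds "a" at position 0 asks for a term that was never indexed, which is consistent with the zero results in the question.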
You can disable position increments in the query parser with QueryParser.setEnablePositionIncrements(false), though you should be aware this could cause problems for you if you still have position increments in the index, and run into a stop word in the middle of a phrase.
The StandardAnalyzer will exclude a set of stop words including "a" (see the end of https://github.com/apache/lucenenet/blob/3.0.3-2/src/core/Analysis/StopAnalyzer.cs for a full list).
It is important that the analysis used when querying is compatible with the analysis used when indexing. That is why your PhraseQuery only matches without the "a": the indexing step removed it.
You can use the StandardAnalyzer constructor that takes ISet<string> stopWords and pass in an empty set, something like:
new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>())
This means that all words will be included in the token stream for the field.
Use this analyzer when indexing and querying and you will get better results.
However, you should note that the StandardAnalyzer also fiddles with the words somewhat. It is designed to be "a good tokenizer for most European-language documents". See the comments at the beginning of https://github.com/apache/lucenenet/blob/3.0.3-2/src/core/Analysis/Standard/StandardTokenizer.cs for some more info and check if it's compatible with your use case.
It may be worth your time to investigate different analyzers for the kind of text you are indexing.

C# Dynamics For each giving me an error when multiple products

I'm trying to update a field with each line item on the invoice (without overwriting what is already there), using a QueryExpression to get the data needed for the update.
So far this works fine when only one line item is present. But whenever I test it against multiple line items I get the "The given key was not present in the dictionary." error.
Any help or nudge in the right direction?
QueryExpression lineitem = new QueryExpression("invoicedetail");
lineitem.ColumnSet = new ColumnSet("quantity", "productid", "description");
lineitem.Criteria.AddCondition("invoiceid", ConditionOperator.Equal, invoiceid);
EntityCollection results = server.RetrieveMultiple(lineitem);
Invoice.Attributes["aspb_bookmarksandk12educational"] = "Purchases";
Invoice.Attributes["aspb_bookmarksandk12educational"] += "\n";
Invoice.Attributes["aspb_bookmarksandk12educational"] += "Product" + " " + "Quantity";
Invoice.Attributes["aspb_bookmarksandk12educational"] += "\n";
foreach (var a in results.Entities)
{
    string name = a.Attributes["description"].ToString();
    string quantity = a.Attributes["quantity"].ToString();
    Invoice.Attributes["aspb_bookmarksandk12educational"] += " " + name + " ";
    Invoice.Attributes["aspb_bookmarksandk12educational"] += quantity;
    Invoice.Attributes["aspb_bookmarksandk12educational"] += "\n";
}
The error "The given key was not present in the dictionary." suggests the problem lies in how you access attribute values, not in the number of entities returned. An attribute that has no value is typically left out of the Attributes collection, so indexing it directly throws. Check that the attribute exists before reading its value, like so:
if (a.Attributes.ContainsKey("description"))
{
    var name = a.Attributes["description"] as string;
}
Better still, use the SDK's GetAttributeValue<T> method, which does the check for you and returns the type's default value when the attribute is missing:
var name = a.GetAttributeValue<string>("description");
var quantity = a.GetAttributeValue<decimal>("quantity");
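The same "read with a fallback" pattern can be sketched outside the Dynamics SDK. The example below is plain Java rather than C#, and everything in it (LineItemSummary, attributeOrDefault, the fallback text) is a hypothetical illustration, not SDK code. Note one difference: a Java Map returns null for a missing key instead of throwing, so the failure there would show up one step later, on the ToString-style call.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of reading optional attributes with a fallback value. */
final class LineItemSummary {

    /** Returns the attribute as the requested type, or the fallback if it is absent or of another type. */
    static <T> T attributeOrDefault(Map<String, Object> attributes, String key, Class<T> type, T fallback) {
        Object value = attributes.get(key);
        return type.isInstance(value) ? type.cast(value) : fallback;
    }

    /** Builds the multi-line summary, tolerating line items that lack a description or quantity. */
    static String summarize(List<Map<String, Object>> lineItems) {
        StringBuilder sb = new StringBuilder("Purchases\nProduct Quantity\n");
        for (Map<String, Object> item : lineItems) {
            String name = attributeOrDefault(item, "description", String.class, "(no description)");
            BigDecimal quantity = attributeOrDefault(item, "quantity", BigDecimal.class, BigDecimal.ZERO);
            sb.append(' ').append(name).append(' ').append(quantity.toPlainString()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> items = List.of(
                Map.<String, Object>of("description", "Bookmark", "quantity", new BigDecimal("2")),
                Map.<String, Object>of("quantity", new BigDecimal("5"))); // no description on this line item
        System.out.print(summarize(items));
    }
}
```

Building the text in one buffer and assigning it to the field once also avoids repeatedly appending to the attribute inside the loop.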

Grails search mechanism

For my website, I need to build a search mechanism in which some of the input fields would be: Country, City, Between Dates (with or without the year), Keywords, etc.
My problem is that the user decides what to search for. For example, they may enter just a date, or a date and a city, or a city and a keyword, and so on. I know how to search for one thing at a time, but I'm not sure how to do this all in one.
a) Would I need something like a chain of if-else blocks, writing the code for each combination, or is there an easier way?
b) By the way, my search mechanism is done the following way (I've never built a search mechanism before, so I don't know if it is the best approach, and I would appreciate comments and suggestions here as well):
class book{
    String a
    String b
    ...
    Date z
    String allAttributesTogether() {
        a + b + c + ... + z
    }
}
Then in my controller, I use a nested for loop to cross-match the words entered for the search against the result of allAttributesTogether().
Thanks in advance, VA
Check out the filter pane plugin.
When you say "search", search engines come to mind. But I think you are asking about querying the database, right?
If you are talking about search mechanisms, search engines are a great tool. You can take a look at Lucene, Compass, and ElasticSearch (ES), to name a few. Compass and ES are based on Lucene, but work at a much higher level of abstraction (easier to use).
I have been using ElasticSearch with great satisfaction.
If you are talking about querying the database, then you can just build an HQL query dynamically. The method below should be in a Controller, as it uses the params attribute. Note that it is untested.
List allAttributesTogether() {
    def query = " select book from Book book "
    def queryParams = [:]
    def needsAnd = false
    // Only add a where clause if at least one filter was supplied
    if(params.a || params.b || params.z ){
        query += " where "
    }
    if(params.a){
        query += " book.a = :a "
        queryParams['a'] = params.a
        needsAnd = true
    }
    if(params.b){
        if(needsAnd) query += " and "
        query += " book.b = :b "
        queryParams['b'] = params.b
        needsAnd = true
    }
    // Fixed: this branch must test params.z, not params.a
    if(params.z){
        if(needsAnd) query += " and "
        query += " book.z = :z "
        queryParams['z'] = params.z
    }
    return Book.executeQuery(query, queryParams)
}
There is also the alternative of using the Criteria builder, where you can likewise use "if" statements to add restrictions conditionally.
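One way to avoid tracking a needsAnd flag is to collect the conditions in a list and join them at the end. The sketch below shows that idea in plain Java; BookQueryBuilder, BuiltQuery, and the ALLOWED_FIELDS whitelist are hypothetical names for illustration, and the field names a, b, and z simply mirror the placeholders used above. It only builds the query text and named parameters and does not call Hibernate or GORM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: builds an HQL string from whichever filters were supplied. */
final class BookQueryBuilder {

    /** Only these property names may appear in the query text (assumed to match the Book class). */
    static final List<String> ALLOWED_FIELDS = List.of("a", "b", "z");

    /** The generated HQL together with its named parameters. */
    static final class BuiltQuery {
        final String hql;
        final Map<String, Object> params;

        BuiltQuery(String hql, Map<String, Object> params) {
            this.hql = hql;
            this.params = params;
        }
    }

    static BuiltQuery build(Map<String, Object> filters) {
        List<String> conditions = new ArrayList<>();
        Map<String, Object> params = new LinkedHashMap<>();
        for (String field : ALLOWED_FIELDS) {
            Object value = filters.get(field);
            if (value == null || value.toString().isEmpty()) {
                continue; // the user left this field blank
            }
            conditions.add("book." + field + " = :" + field);
            params.put(field, value); // values are bound as parameters, never concatenated
        }
        String hql = "select book from Book book";
        if (!conditions.isEmpty()) {
            hql += " where " + String.join(" and ", conditions);
        }
        return new BuiltQuery(hql, params);
    }

    public static void main(String[] args) {
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("a", "Lisbon");
        filters.put("z", "2012-05-01");
        BuiltQuery query = build(filters);
        // prints: select book from Book book where book.a = :a and book.z = :z
        System.out.println(query.hql);
    }
}
```

Iterating over a fixed whitelist rather than over the request parameters keeps user input out of the query text, so supporting a new search field means adding one entry to the list.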

FullTextSqlQuery RowLimit setting defaulted when adding WHERE criteria

We are experiencing an issue where a FullTextSqlQuery is only returning the default 100 results whenever certain criteria are added in the WHERE clause. We are setting the RowLimit to int.MaxValue, and when a wide-open search is done, we are receiving the max results. It's only an issue when tacking-on CONTAINS clauses. Has anyone else seen this issue? I wasn't able to dig up anything on Google/Bing.
FullTextSqlQuery kRequest = new FullTextSqlQuery(ServerContext.Current);
kRequest.KeywordInclusion = KeywordInclusion.AnyKeyword;
kRequest.ResultTypes = ResultType.RelevantResults;
kRequest.TrimDuplicates = false;
kRequest.RowLimit = int.MaxValue;
kRequest.Timeout = 120000;
ResultTableCollection resultTbls = kRequest.Execute();
Query Code:
string query = "SELECT Title, Path, Facility, OwnerDepartment, " +
    "FacilityActiveDate, FacilityInactiveDate, ScheduledReviewDate, DocID, " +
    "Version FROM SCOPE() WHERE ";
query += "Path like '%" + site.Url + "%'";
// The else branch below is an example of a case that
// returns only 100 results
if (FacilitySelectedIndex == 1)
{
query += " AND Facility IS NOT NULL";
}
else
{
query += " AND CONTAINS(Facility, '\"*" + FacilityShortName.Trim() + "*\"')";
queryText.Add("Facility=" + FacilityShortName);
}
}
If I recall correctly, the maximum RowLimit value is actually less than int.MaxValue, but the API won't tell you that. Try setting RowLimit to some arbitrary large number that is still smaller than int.MaxValue, such as 99999.
