Lucene Phrase Query and Tokenized Indexing - search

I have a weird problem searching a lucene tokenized index with a phrase query.
I create the index in the following way
Document doc = new Document();
FieldType ft = new FieldType(StringField.TYPE_STORED);
ft.setTokenized(true);
Field field1 = new Field("key", "T-Test 1", ft);
doc.add(field1)
Field field2 = new Field("key", "T-Test 2", ft);
doc.add(field2)
Field field3 = new Field("key", "T-Test 3", ft);
doc.add(field3)
Field field4 = new Field("key", "T-Test", ft);
doc.add(field4)
I use the WhitespaceAnalyzer to tokenize the values.
If I want now search the String "T-Test" I get as result the values
T-Test 1,T-Test 2, T-Test 3, T-Test
I create the query in the following way.
PhraseQuery query = new PhraseQuery();
query.add(new Term("key","T-Test"));
query.setSlop(0);
BooleanQuery mainQuery = new BooleanQuery();
mainQuery.add(query, Occur.MUST);
I tryed also useing the ComplexPhraseQueryParser with the same effect. I just get all T-Test values. But I just want the T-Test and not the "T-Test 1,T-Test 2 and T-Test 3"
Could anyone help me please.
Im close to become desperate
Thanks

I just use the Stringfield for the problem. This solution works perfectly fine.
FieldType ft = new FieldType(StringField.TYPE_STORED);
ft.setTokenized(true);
StringField field = new StringField(key, value,Field.Store.YES);

Related

Casting Strint to integer in liferay dynamic query

I have been using dynamic query for a project.
Here is an scenario for which I am facing problem.
For a table xyz the column version is stored as varchar (I know it's a poor design, but it's too late to change now) and has values as 9,12.
For the query :
select max(version)
from xyz
where something = 'abc';
I am getting the output as 9 instead of 12.
The dynamic query for the same is:
ClassLoader classLoader = PortletBeanLocatorUtil.getBeanLocator(ClpSerializer.getServletContextName()).getClassLoader();
DynamicQuery dynamicQuery = DynamicQueryFactoryUtil.forClass(xyz.class, classLoader);
dynamicQuery.setProjection(ProjectionFactoryUtil.max("version"));
dynamicQuery.add(PropertyFactoryUtil.forName("something").eq("abc"));
List<Object> list = xyzLocalServiceUtil.dynamicQuery(dynamicQuery);
The query which is giving the correct value is :
select max(cast(version as signed))
from xyz
where something = 'abc';
Now, I want it to be in the dynamic query, how can I do that?
I am using liferay-6.2-ce
Try using ProjectionFactoryUtil.sqlProjection method.
That method allows using functions that are executed by SQL engine.
For example, I am using following code in order to get the max length of a string column called 'content':
Projection maxSizeProjection = ProjectionFactoryUtil.sqlProjection(
"max(length(content)) as maxSize", new String[] {"maxSize"},
new Type[] {Type.BIG_DECIMAL});
The same thing can be done with dynamic query criterions using RestrictionsFactoryUtil.sqlRestriction in case you want to use a SQL function in a condition.
In your case try following code:
import com.liferay.portal.kernel.dao.orm.ProjectionFactoryUtil;
import com.liferay.portal.kernel.dao.orm.Type;
...
Projection maxSizeProjection = ProjectionFactoryUtil.sqlProjection(
"max(cast(version as signed)) as maxVersion",
new String[] {"maxVersion"}, new Type[] {Type.BIG_DECIMAL});
dynamicQuery.setProjection(maxSizeProjection);

Highlighter.net returns no matches

I am using lucene.net 2.9.4 and lucene.net contrib 2.9.4 my lucene query looks like:
+contents:umbraco*
I get results for this query. My highlighter code to get fragments looks like:
public string GetHighlight(string value, string highlightField, IndexSearcher searcher, string luceneRawQuery)
{
var query = GetQueryParser(highlightField).Parse(luceneRawQuery);
var scorer = new QueryScorer(searcher.Rewrite(query));
var highlighter = new Highlighter(HighlightFormatter, scorer);
var tokenStream = HighlightAnalyzer.TokenStream(highlightField, new StringReader(value));
return highlighter.GetBestFragments(tokenStream, value, MaxNumHighlights, Separator);
}
In my scorer object the property termsToFind is 0 I would expect that to at least be one? Anyone any ideas or suggestions on how to fix / debug?
Regards
Ismail
Ok figured this out I was passing in the wrong values to the highlighter function. I was passing the query search term and field name. What i needed to pass in was the content of the contents field for each document match and the query term. All working now.

Lucene wild card search

How can I perform a wildcard search in Lucene ?
I have the text: "1997_titanic"
If I search like "1997_titanic", it is returning a result, but I am not able to do below two searches:
1) If I search with only 1997 it is not returning any results.
2) Also if there is a space, such as in "spider man", that is not finding any results.
I retrieve all movie information from a DB and store it in Lucene Documents:
public Document createMovieDoc(Movie m){
document.add(new StoredField("moviename", m.getName()));
TextField field = new TextField("movienameSearch", m.getName().toLowerCase(), Store.NO);
field.setBoost(5.0f);
document.add(field);
}
And to search, I have this method:
public List searh(String txt){
PhraseQuery phQuery= new PhraseQuery();
Term term = new Term("movienameSearch", txt.toLowerCase());
BooleanQuery b = new BooleanQuery();
b.add(phQuery, Occur.SHOULD);
TopFieldDocs tp= searcher.search(b, 20, ..);
for(int i=0;i<tp.length;i++)
{
int mId = tp[i].doc;
Document d = searcher.doc(mId);
String moviename = d.get("moviename");
list.add(moviename);
}
return list;
}
I'm not sure what analyzer you are using to index. Sounds like maybe WhitespaceAnalyzer? It sounds like, when indexing "1997_titanic" remains a single token, while "spider man" is split into the token "spider" and "man".
Could also be SimpleAnalyzer which uses a LetterTokenizer. This would make it impossible to search for "1997", since that tokenizer will eliminate all numbers for the indexed representation of the text.
Your search method doesn't look right. You aren't adding any terms to your PhraseQuery, so I wouldn't expect it to find anything. You must add some terms in order for anything to be found. You create a Term in what you've provided, but nothing is ever done with that Term. Maybe this has something to do with how you've pick your excerpts, or something? Not sure, I'm a bit confused by that.
In order to manually construct a PhraseQuery you must add each term individually, so to search for "spider man", you would do something like:
PhraseQuery phQuery= new PhraseQuery();
phQuery.add(new Term("movienameSearch", "spider"));
phQuery.add(new Term("movienameSearch", "man"));
This requires you to know what the analyzer was doing at index time, and tokenize the input yourself to suit. The simpler solution is to just use the QueryParser:
//With whatever analyzer you like to use.
QueryParser parser = new QueryParser(Version.LUCENE_46, "defaultField", analyzer);
Query query = parser.parse("movienameSearch:\"" + txt.toLowerCase() + "\"");
TopFieldDocs tp= searcher.search(query, 20);
This allows you to rely on the same analyzer to index and query, so you don't have to know how to tokenize your phrases to suit.
As far as finding "1997" and "titanic" individually, I would recommend just using StandardAnalyzer. It will tokenize those into discrete tokens, allowing them to be searched very easily, with a simple query like: movienameSearch:1997.

How can I search on list of values using Lucene Query interface

Simplistic Problem description:
Lucene index has two fields per document: ID and NAME.
I want to make a query using the Lucene Query interface such that I can find all the documents where ID is 1 OR 2 OR 3 OR so on. The IDs to be searched will be in a list and can potentially have upto 30 elements.
If I was using the query parser I would have done something like
ID:(1 OR 2 OR 3)
But the application is already heavily committed to the Query interface and I want to follow the current pattern. Only way I can think of doing this with Query interface is create n term queries and group them using the Boolean query as below
BooleanQuery booleanQuery = new BooleanQuery();
(String searchId : lstIds)
{
booleanQuery.add(new TermQuery(new Term("ID", searchId)), BooleanClause.Occur.SHOULD);
}
But is there a better/more efficient way of doing this?
Combining queries togetheer with a BooleanQuery is the correct way to reproduce a query like ID:(1 OR 2 OR 3). The query parser will generate a BooleanQuery similar to what you provided for that syntax, so you are absolutely doing the right thing here.
You might be able to make use of PrefixQuery, NumericRangeQuery or TermRangeQuery to simplify matters, if they actually suit your needs in practice, but there is nothing wrong with what you are doing already.
BooleanQuery is the solution for handling OR operator as you have shown in the code but if you want simple alternative of the it you could also use simple Query and pass the IDs as "1 OR 2 OR 3".
Here is the code snippet lucene 7.
Query query = new QueryParser("ID", analyzer).parse("1 OR 2 OR 3");
TopDocs topDocs = searcher.search(query, 10);
OR if you have all the OR you could also use QueryParser default Operator.
Here is the code snippet for lucene 7.
QueryParser queryParser = new QueryParser("ID", analyzer);
queryParser.setDefaultOperator(QueryParser.Operator.OR);
Query query = queryParser.parse("1 2 3");
TopDocs topDocs = searcher.search(query, 10);
I hope that work for you.

linq to entity Contains() and nested queries

i've a trouble with linq, i'll explain on example :
i have a database table called Employee which got FirstName and LastName columns,
and a method to search employees which gets a nameList list as argument, the elements in this list are names formatted like this one "Fred Burn", or this1 "Grim Reaper",
already tryed these approaches with no luck =[
//just all employees
var allData = from emp in Context.Employee select emp;
var test1 = from emp in allData
where(emp.FirstName + " " + emp.LastName).Contains
("" + ((from n in nameList select n).FirstOrDefault()))
select emp;
var test2 = (from emp in allData
where (emp.FirstName + " " + emp.LastName)
== ((from n in nameList select n).FirstOrDefault())
select emp);
var test3 = from emp in allData
where (from n in nameList select n).Contains
(emp.FirstName + " " + emp.LastName)
select emp;
first and second queries give : {"Unable to create a constant value of type 'Closure type'. Only primitive types ('such as Int32, String, and Guid') are supported in this context."} exceptionand third : {"LINQ to Entities does not recognize the method 'Boolean Contains[String](System.Collections.Generic.IEnumerable`1[System.String], System.String)' method, and this method cannot be translated into a store expression."}
would be glad to hear your suggestions :)
Thank You!
p.s.
yea i know it's possible to split names in list and compare them separately, but still curious why wont these queries work.
I assume nameList in this case is an in memory collection and that you are trying to use the LINQ to SQL trick creating a SQL "IN" clause inside of the Entity Framework. Unfortunately, EF doesn't support this syntax (yet). Depending on the number of records you are trying to match, you may be able to run separate queries for each child you are desiring. Alternatively, you could build an entitysql query using concatenation to append the multiple items from the nameList as separate OR clauses in the WHERE operation.

Resources