Lucene multiphrasequery search with wildcard - search

I have been trying to write a Lucene query where entering "Foo B" returns "Foo Bar", "Foo Bear", "Foo Build", etc., but does not return a record with an ID of "Foo" and the word "Bar" in, say, its 'description' field.
I have looked into MultiPhraseQuery, but it never returns any results. Below is what I have been trying:
Term firstTerm = new Term("jobTitle", "Entry");
Term secondTerm = new Term("jobTitle", "Artist");
// Term asdTerm = new Term(fld); // incomplete and unused: fld is undefined, so this line is commented out
Term[] tTerms = new Term[]{firstTerm, secondTerm};
MultiPhraseQuery multiPhrasequery = new MultiPhraseQuery();
multiPhrasequery.add( tTerms );
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(multiPhrasequery, this.type).setSort(sort);
results = hibQuery.list();

The most likely problem is capitalization. "Entry" and "Artist" are not passed through a query parser, so they are never run through an analyzer, and the terms are matched case-sensitively. The field you are indexing is probably analyzed with an analyzer that includes a LowerCaseFilter, so the indexed terms would not contain leading capitals. Without knowing how you index your documents I can't say with certainty that lowercasing the terms will fix it, but it seems the most likely cause.
With that fixed, the query you've created should match anything with either the term "entry" or "artist" in the jobTitle field.
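To illustrate why the capitalized terms miss, here is a small Python sketch (not Lucene code) of an index whose analyzer lowercases tokens. The analyze function and the toy inverted index are hypothetical stand-ins for an analyzer that includes a LowerCaseFilter; the only point is that a raw term bypasses analysis and must already be in its indexed form.

```python
from collections import defaultdict

def analyze(text):
    """Toy stand-in for an analyzer: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]

def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def term_lookup(index, term):
    """Exact, case-sensitive lookup, like a raw Term that skips the analyzer."""
    return index.get(term, set())

# Hypothetical sample data, not taken from the question.
docs = {1: "Entry Level Artist", 2: "Senior Artist"}
index = build_index(docs)

raw_hits = term_lookup(index, "Entry")            # misses: the index holds "entry"
fixed_hits = term_lookup(index, "Entry".lower())  # finds document 1
# Two alternatives at the same position behave like "entry" OR "artist".
either_hits = term_lookup(index, "entry") | term_lookup(index, "artist")
```

In the Java code above, the equivalent change would presumably be to build the Term objects from lowercased strings, assuming the jobTitle field really is indexed with a lowercasing analyzer.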

Related

SPARQL query to retrieve data with different datatype (string)

I am writing a SPARQL query to retrieve answers for a competency question. I want to retrieve all persons who have a distress level of "not too disturbing".
select *
where
{
?person ocd:hasInsight ?insight;
ocd:hasThought ?thought;
ocd:hasEmotion ?emotion;
ocd:hasDistressLevel ?severitycontrol.
FILTER (?severitycontrol = ocd:Not too disturbing)
}
I am new at this and could not figure out how to fix that.
If the value is a string (e.g., "Not too disturbing"):
FILTER (?severitycontrol = "Not too disturbing") .
If the value is language-tagged in your RDF, you have to append that same language tag:
FILTER (?severitycontrol = "Not too disturbing"@en) .
String matching is case-sensitive. You can use ucase/lcase to make a string uppercase/lowercase.
If you only want to match a partial string, you can use strStarts/strEnds, contains, and more.
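If the FILTER is assembled in application code, the two literal forms above can be produced by a small helper. This is a minimal Python sketch; sparql_literal and equality_filter are hypothetical names, and the escaping covers only backslashes, double quotes, and line breaks.

```python
from typing import Optional

def sparql_literal(value: str, lang: Optional[str] = None) -> str:
    """Render a plain or language-tagged SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")   # escape backslashes first
             .replace('"', '\\"')     # then double quotes
             .replace("\n", "\\n")
             .replace("\r", "\\r")
    )
    literal = f'"{escaped}"'
    # A language-tagged literal only equals another literal with the same tag.
    return f"{literal}@{lang}" if lang else literal

def equality_filter(var: str, value: str, lang: Optional[str] = None) -> str:
    """Build a FILTER that compares ?var against the literal."""
    return f"FILTER (?{var} = {sparql_literal(value, lang)})"

plain = equality_filter("severitycontrol", "Not too disturbing")
tagged = equality_filter("severitycontrol", "Not too disturbing", "en")
```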

Arango wildcard query

I am working on building a simple Arango query where, if the user enters "foo bar" (starting to type Foo Barber), the query returns results. The issue I am running into is going from a normal single-space-separated string (i.e. imagine LET str = "foo barber" at the top) to multiple wildcard queries like the ones shown below.
I am also open to other queries that would work for this, i.e. LIKE, PHRASE or similar.
The goal is that, given a single string like 'foo bar', search results are returned for Foo Barber and similar.
FOR doc IN movies SEARCH PHRASE(doc.name,
[
{WILDCARD: ["%foo%"]},
{WILDCARD: ["%bar%"]}
], "text_en") RETURN doc
If you want to find Black Knight but not Knight Black if the search phrase is black kni, then you should probably avoid tokenizing Analyzers such as text_en.
Instead, create a norm Analyzer that removes diacritics and allows for case-insensitive searching. In arangosh:
var analyzers = require("@arangodb/analyzers");
analyzers.save("norm_en", "norm", {"locale": "en_US.utf-8", "accent": false, "case": "lower"}, []);
Add the Analyzer in the View definition for the desired field (should be title and not name, shouldn't it?). You should then be able to run queries like:
FOR doc IN movies SEARCH ANALYZER(STARTS_WITH(doc.title, TOKENS("Black Kni", "norm_en")[0]), "norm_en") RETURN doc
FOR doc IN movies SEARCH ANALYZER(LIKE(doc.title, TOKENS("Black Kni%", "norm_en")[0]), "norm_en") RETURN doc
FOR doc IN movies SEARCH ANALYZER(LIKE(doc.title, CONCAT(TOKENS(SUBSTITUTE("Black Kni", ["%", "_"], ["\\%", "\\_"]), "norm_en")[0], "%")), "norm_en") RETURN doc
The search phrase Black Kni is normalized to black kni and then used for a prefix search, either using STARTS_WITH() or LIKE() with a trailing wildcard %. The third example escapes user-entered wildcard characters.
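The escaping in the third query can also be done on the client before the pattern is passed in as a bind parameter. Below is a minimal Python sketch of that step; escape_like and prefix_pattern are hypothetical helper names, and lower() is only a rough stand-in for what the norm_en Analyzer does (it does not strip accents). Like the query, it does not treat backslashes in the input specially.

```python
def escape_like(text):
    """Escape the LIKE wildcards % and _ so user input is matched literally."""
    return text.replace("%", "\\%").replace("_", "\\_")

def prefix_pattern(user_input):
    """Lowercase the input, escape wildcards, then append a trailing %."""
    normalized = user_input.lower()  # rough stand-in for the norm_en Analyzer
    return escape_like(normalized) + "%"

simple = prefix_pattern("Black Kni")
tricky = prefix_pattern("50%_Off")
```

The first call yields the same pattern that the second query writes by hand; the second shows that wildcard characters typed by the user end up escaped and only the appended % stays active.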

Is it possible to find documents using incomplete words and ArangoSearch?

For example, let's pretend my document contained the attribute "description" and the value of it was "Quick brown fox". Could ArangoSearch use the input, "Quic" and be able to find the document that contains the description, "Quick brown fox"?
As far as I know, ArangoSearch can only find matches if the token/word is completed. Is this true?
Here's some query code to show what I'm talking about. If the bind variable, @searchInputValue, takes the value "Quic", it won't find the document, but if it takes the value "Quick", it does.
FOR document IN v_test
SEARCH ANALYZER(
(
document.description IN TOKENS('@searchInputValue', 'text_en')
)
, 'text_en'
)
RETURN document
You can use the FULLTEXT function of AQL:
https://docs.arangodb.com/3.0/AQL/Functions/Fulltext.html
However, you can't write the prefix syntax directly in AQL when using input parameters. You have to format searchInputValue yourself and pass:
Quic,+prefix:Quic
So you can write your query as:
FOR res IN FULLTEXT(v_test, "description", @searchInputValue)
RETURN res

Sitecore 7 content search Starts with function

I am working with Sitecore 7 content search.
var webIndex = ContentSearchManager.GetIndex("sitecore_web_index");
using (var context = webIndex.CreateSearchContext())
{
var results = context.GetQueryable<SearchResultItem>().Where(i =>
i.Content.Contains(mysearchterm));
}
Sitecore performs a Contains operation on the content string. Content holds the whole content of the page, so the query does not return the results I expect: for example, searching for "hr" also returns results containing "through". I tried StartsWith, but that only matches the start of the whole content string. I tried "Equal", but that matches the whole word. Is there any way to search content for words that start with the search term?
Put '^' as the first character of the search phrase; it means "starts with". For example, to find all terms starting with "hr", add '^' to the search keyword, like this: "^hr".
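The difference between the Contains behavior described in the question and the desired "a word starts with" behavior can be shown with a small Python sketch. This is not Sitecore code; both function names are hypothetical, and the word boundary used is the regular-expression \b, which assumes the search term begins with a letter or digit.

```python
import re

def contains(content, term):
    """Substring match anywhere in the content, ignoring case."""
    return term.lower() in content.lower()

def has_word_starting_with(content, term):
    """True if some word in the content begins with the term, ignoring case."""
    pattern = r"\b" + re.escape(term)
    return re.search(pattern, content, flags=re.IGNORECASE) is not None

page = "Walk through the door"
substring_hit = contains(page, "hr")                 # matches inside "through"
word_start_hit = has_word_starting_with(page, "hr")  # no word starts with "hr"
hr_page_hit = has_word_starting_with("Contact the HR team", "hr")
```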

Is there a way to prevent partial word matching using Sitecore Search and Lucene?

Is there a way when using Sitecore Search and Lucene to not match partial words? For example when searching for "Bos" I would like to NOT match the word "Boston". Is there a way to require the entire word to match? Here is a code snippet. I am using FieldQuery.
bool _foundHits = false;
_index = SearchManager.GetIndex("product_version_index");
using (IndexSearchContext _searchContext = _index.CreateSearchContext())
{
QueryBase _query = new FieldQuery("title", txtProduct.Text.Trim());
SearchHits _hits = _searchContext.Search(_query, 1000);
...
}
You may want to try something like this to get the query you want to run. It will put the + in (indicating a required term) and quote the term, so it should match exactly what you're looking for, provided you pass in BooleanClause.Occur.MUST. It has worked for me.
protected BooleanQuery GetBooleanQuery(string fieldName, string term, BooleanClause.Occur occur)
{
QueryParser parser = new QueryParser(fieldName, new StandardAnalyzer());
BooleanQuery query = new BooleanQuery();
query.Add(parser.Parse(term), occur);
return query;
}
Essentially, your query ends up being parsed to +title:"Bos". You could also download Luke and play around with the query syntax there; it's easier if you know what the syntax should be and then work backwards to see which query objects will generate it.
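If you want to see the target query string before wiring up the query objects, it can be assembled by hand. Here is a minimal Python sketch; required_phrase is a hypothetical helper, it assumes the field name needs no escaping, and it escapes only backslashes and double quotes inside the quoted term.

```python
def required_phrase(field, term):
    """Build a required, quoted clause such as +title:"Bos"."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'+{field}:"{escaped}"'

query_text = required_phrase("title", "Bos")
quoted_text = required_phrase("title", 'a"b')
```

The resulting string can be pasted into Luke or handed to a query parser to check which query objects it produces.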
You have to place the query in double quotes for exact-match results. Lucene supports many such operators and boolean parameters, which are described here: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
It depends on the field type. If you have a memo or text field, partial matching is applied. If you want exact matching, use a string field instead. You can find some details here: https://www.cmsbestpractices.com/bug-how-to-fix-solr-exact-string-matching-with-sitecore/ .
