Arango wildcard query - arangodb

I am working on building a simple ArangoDB query that returns results when the user enters "foo bar" (i.e. has started typing Foo Barber). The issue I am running into is getting from a normal, single space-separated string (e.g. imagine LET str = "foo barber" at the top) to multiple wildcard queries like the one shown below.
I am also open to other queries that would work for this, e.g. LIKE, PHRASE or similar.
The goal is that, given a single string like 'foo bar', search results are returned for Foo Barber and similar.
FOR doc IN movies SEARCH PHRASE(doc.name,
[
{WILDCARD: ["%foo%"]},
{WILDCARD: ["%bar%"]}
], "text_en") RETURN doc
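To go from the single string to the array of WILDCARD objects, one option is to build the phrase parts in client code and pass them in as a bind parameter. The Python sketch below assumes a bind parameter named phrase (a hypothetical name), i.e. a query of the form PHRASE(doc.name, @phrase, "text_en"), and assumes the server accepts the phrase array that way. Words are lowercased to line up with the lowercase tokens text_en produces, and % or _ typed by the user are escaped so they match literally.

```python
import json


def wildcard_phrase_parts(user_input: str) -> list:
    """Turn 'foo bar' into [{"WILDCARD": ["%foo%"]}, {"WILDCARD": ["%bar%"]}].

    Assumption: split on whitespace, lowercase each word, and escape the
    LIKE/WILDCARD metacharacters (backslash first, then % and _).
    """
    parts = []
    for word in user_input.lower().split():
        escaped = word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        parts.append({"WILDCARD": [f"%{escaped}%"]})
    return parts


# Hypothetical bind variables for: PHRASE(doc.name, @phrase, "text_en")
bind_vars = {"phrase": wildcard_phrase_parts("foo bar")}
print(json.dumps(bind_vars))
# {"phrase": [{"WILDCARD": ["%foo%"]}, {"WILDCARD": ["%bar%"]}]}
```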

If you want to find "Black Knight" but not "Knight Black" when the search phrase is "black kni", then you should probably avoid tokenizing Analyzers such as text_en.
Instead, create a norm Analyzer that removes diacritics and allows for case-insensitive searching. In arangosh:
var analyzers = require("@arangodb/analyzers");
analyzers.save("norm_en", "norm", {"locale": "en_US.utf-8", "accent": false, "case": "lower"}, []);
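A rough local model, in Python, of what this Analyzer does to a string. It uses NFD decomposition plus removal of combining marks as a stand-in for the ICU-based accent folding, so it is an approximation and not ArangoDB's code. Note that norm does not split text into words: a whole title becomes a single token, which is why the queries below take TOKENS(...)[0].

```python
import unicodedata


def norm_en(text: str) -> str:
    """Approximate a norm Analyzer with accent=false and case=lower.

    Assumption: NFD + dropping combining marks is close enough to ICU accent
    removal for Latin text; the result is one string, not a list of words.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


print(norm_en("Black Kni"))     # black kni
print(norm_en("Crème Brûlée"))  # creme brulee
```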
Add the Analyzer to the View definition for the desired field (it should be title and not name, shouldn't it?). You should then be able to run queries like:
FOR doc IN movies SEARCH ANALYZER(STARTS_WITH(doc.title, TOKENS("Black Kni", "norm_en")[0]), "norm_en") RETURN doc
FOR doc IN movies SEARCH ANALYZER(LIKE(doc.title, TOKENS("Black Kni%", "norm_en")[0]), "norm_en") RETURN doc
FOR doc IN movies SEARCH ANALYZER(LIKE(doc.title, CONCAT(TOKENS(SUBSTITUTE("Black Kni", ["%", "_"], ["\\%", "\\_"]), "norm_en")[0], "%")), "norm_en") RETURN doc
The search phrase "Black Kni" is normalized to "black kni" and then used for a prefix search, either with STARTS_WITH() or with LIKE() and a trailing wildcard %. The third example additionally escapes any % and _ characters entered by the user so that they are matched literally.
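To see what the escaping in the third query buys you, here is a Python emulation of LIKE semantics (% matches any run of characters, _ exactly one character, a backslash escapes the next character), applied to already-normalized strings. It is an illustration only, not ArangoDB's implementation, and unlike the AQL above it also escapes backslashes.

```python
import re


def escape_like(user_text: str) -> str:
    """Escape LIKE wildcards typed by the user, mirroring the SUBSTITUTE() call.

    The backslash is escaped first so the later escapes are not doubled.
    """
    return user_text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into an equivalent regular expression."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            out.append(".*")  # any run of characters
        elif ch == "_":
            out.append(".")   # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def prefix_match(title_token: str, user_text: str) -> bool:
    """Like LIKE(title, CONCAT(escaped_input, "%")) on normalized strings."""
    return like_to_regex(escape_like(user_text) + "%").fullmatch(title_token) is not None


print(prefix_match("black knight", "black kni"))  # True
print(prefix_match("knight black", "black kni"))  # False
print(prefix_match("1000 pure", "100%"))          # False: % was escaped
```

Without the escaping, a user typing 100% would produce the pattern 100%%, which also matches "1000 pure".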

Related

Is it possible to find documents using incomplete words and ArangoSearch?

For example, suppose my documents have a "description" attribute and one of them has the value "Quick brown fox". Could ArangoSearch take the input "Quic" and find the document whose description is "Quick brown fox"?
As far as I know, ArangoSearch can only find matches if the token (word) is complete. Is this true?
Here's some query code to show what I'm talking about. If the bind variable @searchInputValue has the value "Quic", the query doesn't find the document, but if it has the value "Quick", it does.
FOR document IN v_test
SEARCH ANALYZER(
(
document.description IN TOKENS(@searchInputValue, 'text_en')
)
, 'text_en'
)
RETURN document
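The behaviour described in the question can be modelled in a few lines of Python: IN TOKENS(...) compares whole tokens, so "quic" never equals "quick", whereas a prefix comparison finds it. The tokenizer here is a crude stand-in for text_en (it does no stemming) and only serves to illustrate the difference.

```python
import re


def tokenize(text: str) -> list:
    """Rough stand-in for text_en: lowercase and split on non-word characters.

    The real text_en Analyzer also stems words, which is ignored here.
    """
    return [t for t in re.split(r"\W+", text.lower()) if t]


def exact_token_match(description: str, user_input: str) -> bool:
    # Mirrors `description IN TOKENS(input)`: some whole token must be equal.
    return bool(set(tokenize(description)) & set(tokenize(user_input)))


def prefix_token_match(description: str, user_input: str) -> bool:
    # Every input token must be the beginning of some indexed token.
    doc_tokens = tokenize(description)
    return all(any(d.startswith(q) for d in doc_tokens) for q in tokenize(user_input))


print(exact_token_match("Quick brown fox", "Quic"))   # False
print(exact_token_match("Quick brown fox", "Quick"))  # True
print(prefix_token_match("Quick brown fox", "Quic"))  # True
```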
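You can use the FULLTEXT function of AQL. It needs a fulltext index on the description attribute and takes a collection, not an ArangoSearch View, as its first argument, so v_test in the query below stands for that collection: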
https://docs.arangodb.com/3.0/AQL/Functions/Fulltext.html
Since the search value comes from a bind parameter, the prefix syntax has to be part of the value itself, so format searchInputValue before passing it. For example, to match words starting with "Quic", pass:
prefix:Quic
So you can write your query as:
FOR res IN FULLTEXT(v_test, "description", @searchInputValue)
RETURN res
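If the input can contain several words, a small helper can add the prefix: modifier to each one before the value is bound. A Python sketch, assuming words are separated by whitespace and that commas typed by the user should be dropped, because FULLTEXT uses commas to separate search words (which are AND-combined by default):

```python
def fulltext_prefix_query(user_input: str) -> str:
    """Build a FULLTEXT() search string that prefix-matches every typed word.

    Hypothetical helper: commas are replaced by spaces so the user cannot
    inject extra search words, then each word gets the "prefix:" modifier.
    """
    words = user_input.replace(",", " ").split()
    return ",".join(f"prefix:{w}" for w in words)


print(fulltext_prefix_query("Quic"))       # prefix:Quic
print(fulltext_prefix_query("quick bro"))  # prefix:quick,prefix:bro
```

The returned string is what would be bound to @searchInputValue.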

Sitecore 7 content search Starts with function

I am working with sitecore 7 content search.
var webIndex = ContentSearchManager.GetIndex("sitecore_web_index");
using (var context = webIndex.CreateSearchContext())
{
var results = context.GetQueryable<SearchResultItem>().Where(i =>
i.Content.Contains(mysearchterm));
}
Sitecore performs a contains operation on the content string. Content holds the whole content of the page, so the results are not what I expect: for example, searching for "hr" also returns results containing "through". I tried StartsWith, but that only matches the start of the whole content string, and Equals matches the whole word. Is there any way to search the content for a word that starts with the search term?
Use '^' as the first character of the search phrase; it means "starts with". For example, to find all terms starting with "hr", add '^' to the search keyword like this: "^hr".
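The difference between the two behaviours can be shown in plain Python: a substring test (what the Contains() call effectively did) versus checking whether any word begins with the term. This only models the matching semantics; it does not use the Sitecore or Lucene APIs.

```python
import re


def contains(content: str, term: str) -> bool:
    # Raw substring search over the whole content string.
    return term.lower() in content.lower()


def has_word_starting_with(content: str, term: str) -> bool:
    # Split the content into words and test each word for the prefix.
    words = re.findall(r"\w+", content.lower())
    return any(w.startswith(term.lower()) for w in words)


page = "We walked through the park"
print(contains(page, "hr"))                                  # True ("through")
print(has_word_starting_with(page, "hr"))                    # False
print(has_word_starting_with("Contact HR for details", "hr"))  # True
```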

How to create a search query for partial string matches in Mongoose?

I'm new to Mongoose.js and I'm wondering how to create a simple Mongoose query that returns values containing the characters in the order that they were submitted.
This will be for an autocomplete form which needs to return cities with names that contain characters input into the search field. Should I start with a .where query?
You could search by regular expression, which lets you search in a flexible (although not extremely fast) way. The code would be something like:
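var input = 'ln'; // the input from your auto-complete box
// escape regex metacharacters so the user's text is matched literally
var escaped = input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
cities.find({name: new RegExp(escaped, "i")}, function(err, docs) {
  if (err) { return console.error(err); }
  // docs: cities whose name contains the input (case-insensitive)
});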
Of course, you could preprocess the string to make it match from the start (prepend ^), from the end (append $), etc. Just note that matching against arbitrary parts of long strings may be slow.
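The same advice expressed with Python's re module, as a language-neutral illustration: escape the raw input so characters like . or ( match literally, then optionally anchor it with ^ for a starts-with search. The helper name city_pattern and the sample city list are made up, and the pattern is used here only to filter a local list.

```python
import re


def city_pattern(user_input: str, from_start: bool = True) -> re.Pattern:
    """Case-insensitive regex built from raw user input (hypothetical helper)."""
    body = re.escape(user_input)        # '.', '(', '*' etc. now match literally
    prefix = "^" if from_start else ""  # '^' anchors the match at the start
    return re.compile(prefix + body, re.IGNORECASE)


cities = ["Berlin", "Lincoln", "Kolno", "St. Louis"]  # made-up sample data
print([c for c in cities if city_pattern("lin").search(c)])                    # ['Lincoln']
print([c for c in cities if city_pattern("lin", from_start=False).search(c)])  # ['Berlin', 'Lincoln']
```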

Lucene multiphrasequery search with wildcard

I have been trying to write a Lucene search query where entering "Foo B" would return "Foo Bar", "Foo Bear", "Foo Build", etc., but would not return a record with an ID of "Foo" and the word "Bar" in, say, its 'description' field.
I have looked into MultiPhraseQuery but it never returns any results; below is what I have been trying:
Term firstTerm = new Term("jobTitle", "Entry");
Term secondTerm = new Term("jobTitle", "Artist");
Term[] tTerms = new Term[]{firstTerm, secondTerm};
MultiPhraseQuery multiPhrasequery = new MultiPhraseQuery();
multiPhrasequery.add( tTerms );
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(multiPhrasequery, this.type).setSort(sort);
results = hibQuery.list();
The likely problem that I see is capitalization. "Entry" and "Artist" are not passed through a query parser, so they are not run through an analyzer and are therefore case sensitive. The field you are indexing is probably analyzed with an analyzer that includes a LowerCaseFilter, so the indexed terms would not contain leading capitals. Without knowing how you index your documents, I can't say with any certainty that this will fix it, but it seems the most likely cause.
With that fixed, the query you've created should match anything with either the term "entry" or the term "artist" in the jobTitle field.
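For the original goal ("Foo B" matching "Foo Bar", "Foo Bear", ...), Lucene's MultiPhraseQuery documentation describes building a prefix phrase such as "Microsoft app*" by putting the fixed term at the first position and, at the last position, adding every indexed term that starts with the prefix as one array (the add(Term[]) call). Below is a plain-Python simulation of that matching logic, not Lucene code; the documents and vocabulary are made up for illustration.

```python
def expand_last_prefix(query: str, vocabulary: set) -> list:
    """'foo b' -> [['foo'], ['bar', 'bear', 'build']]: one list of allowed terms per position.

    The last word is treated as a prefix and expanded against the indexed terms.
    """
    words = query.lower().split()
    if not words:
        return []
    positions = [[w] for w in words[:-1]]
    positions.append(sorted(t for t in vocabulary if t.startswith(words[-1])))
    return positions


def phrase_matches(field_tokens: list, positions: list) -> bool:
    """True if some run of consecutive tokens uses an allowed term at each position."""
    n = len(positions)
    if n == 0:
        return False
    for start in range(len(field_tokens) - n + 1):
        if all(field_tokens[start + i] in positions[i] for i in range(n)):
            return True
    return False


docs = {1: "Foo Bar", 2: "Foo Bear", 3: "Foo Build", 4: "Bar Foo", 5: "Foo Qux"}
vocabulary = {t for text in docs.values() for t in text.lower().split()}
positions = expand_last_prefix("Foo B", vocabulary)
hits = [doc_id for doc_id, text in docs.items() if phrase_matches(text.lower().split(), positions)]
print(positions)  # [['foo'], ['bar', 'bear', 'build']]
print(hits)       # [1, 2, 3]
```

Because the phrase is matched within a single field, "Foo" in one field and "Bar" in another would not match.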

examine stripping out search words

I'm using Umbraco and I have Examine up and running; however, my query is having words stripped out.
For example:
I am searching on "man on the moon" with the following line of code, the variable "searchTerm" should contain "man on the moon":
var Searcher = ExamineManager.Instance.SearchProviderCollection["MySearcher"];
var searchCriteria = Searcher.CreateSearchCriteria();
var query = searchCriteria.Field("Name", searchTerm).Compile();
however, the query is generated as this when I debug:
{ SearchIndexType: , LuceneQuery: +Name:"man moon" }
Notice how it has removed the words "on the" from the searchTerm?
Presumably these are because they are deemed as STOP/reserved words. However, this means I do not get the search results I expect.
How can I get around this?
Internally the StopAnalyzer class is used by the StandardAnalyzer as part of the standard indexing process. The StopAnalyzer (http://lucenenet.apache.org/docs/3.0.3/d7/df5/_stop_analyzer_8cs_source.html#l00054) contains a method which allows you to substitute a different set of stopwords as an ISet type parameter rather than use the standard ENGLISH_STOP_WORDS_SET (line 134).
And I read here (http://webcache.googleusercontent.com/search?q=cache:sA-uyAC015UJ:our.umbraco.org/m%3Fmode%3Dtopic%26id%3D25600+&cd=2&hl=en&ct=clnk&gl=uk) that you can get Examine to use an empty set of stopwords by adding the following line to your application_start method in global.asax
Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET = new System.Collections.Hashtable();
So with an empty set of stopwords, your "man on the moon" should be back.
A bit of an odd idea, but as an alternative you could also add a StopAnalyzer to ExamineSettings.config to create an index of docs containing only the stop words, and then AND them with your StandardAnalyzer result set?
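To see why the query turns into "man moon", here is a Python model of a stop-word filter. The set below is assumed to mirror Lucene's classic English stop-word list, and the tokenizer is simplified to whitespace splitting. Keep in mind that the stop words also have to survive at index time, so after changing the set the index needs rebuilding before "on" and "the" become searchable.

```python
# Assumed to mirror Lucene's classic ENGLISH_STOP_WORDS_SET.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def analyze(text: str, stop_words: set) -> list:
    """Lowercase, split on whitespace and drop stop words, like a StopFilter."""
    return [t for t in text.lower().split() if t not in stop_words]


print(analyze("man on the moon", ENGLISH_STOP_WORDS))  # ['man', 'moon']
print(analyze("man on the moon", set()))               # ['man', 'on', 'the', 'moon']
```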
