I'm creating a Sitecore.Ecommerce.Search.Query using FieldQuery objects. I'm then converting the Sitecore query to a Lucene.Net.Search.Query using the LuceneQueryBuilder class. Everything with the query works fine except for fields where I am trying to match on a empty string.
So... this works:
new FieldQuery(FieldName, "1", MatchVariant.NotEquals)
but this does not:
new FieldQuery(FieldName, string.Empty, MatchVariant.NotEquals)
I have reflected through both the Sitecore.Ecommerce assembly and the Lucene.Net assembly as well but I have not found any obvious issues. But, when I look at the Term that is created and used in the Lucene query, it looks like this:
-FieldName:
which I believe is incorrect... but maybe it is correct and I just don't have the correct field indexes setup... I'm not sure to be honest.
Any help is greatly appreciated.
Thanks!
Lucene does not really support searching for null/empty values. There is nothing indexed for it to find, after all. Lucene uses an inverted index, which makes certain kinds of queries, including pure negation queries and searching for nulls, difficult or even impossible.
If you need to search for documents in which certain fields are null, you should store a default value in the field (for instance "NULL") which you can search for.
That said, you could create
new RangeQuery(FieldName, null, null, true, true);
Which constructs a range query with open upper and lower bounds, so it matches anything that has a value.
Not a good way to do it, but neither is querying with only a negation.
Related
Does anyone know how to ensure we can return normal result as well as accented result set via the azure search filter. For e.g the below filter query in Azure search returns a name called unicorn when i check for record with name unicorn.
var result= searchServiceClient.Documents.SearchAsync<myDto>("*",new SearchParameters
{
SearchFields = new List<string> {"Name"},
Filter = "Name eq 'unicorn'"
});
This is all good but what i want is i want to write a filter such that it returns record named unicorn as well as record named únicorn (please note the first accented character) provided that both record exist.
This can be achieved when searching for such name via the search query using language or Standard ASCII folding search analyzer as mentioned in this link. What i am struggling to find out is how can we implement the same with azure filters?
Please let me know if anyone has got any solutions around this.
Filters are applied on the non-analyzed representation of the data, so I don’t think there’s any way to do any type of linguistic analysis on filters. One way to work around this is to manually create a field which only do lowercasing + asciifloding (no tokenization) and then search lucene queries that look like this:
"normal search query terms" AND customFilterColumn:"filtérValuèWithÄccents"
Basically the document would both need to match the search terms in any field AND also match the filter term in the “customFilterColumn”. This may not be sufficient for your needs, but at least you understand the art of the possible.
Using filters it won't work unless you specify in advance all the possibilities:
for example:
$filter=name eq 'unicorn' or name eq 'únicorn'
You'd better work with a different analyzer that will change accents to it's root form. As another possibility, you can try fuzzy search:
search=unicorn~&highlight=Name
This question/answer dealt with a pretty similar topic, but I couldn't find the solution I was searching for.
How to practially use a keywordanalyzer in azure-search?
Starting situation:
I created a resource with multiple indexes. One of these indexes contains a Collection(Edm.String) field.
From this field i only want to get documents which exactly contain the search term. For example the field contains documents like these: "Hovercraft zero", "Hovercraft one", "Hovercraft two".
If the search term is "Hover" all three documents should be returned. If the search term is "craft zer" only the document "Hovercraft zero" should be returned. The document shouldn't get a higher score, the desired behaviour is that I only get the "Hovercraft zero" document as result.
Further information:
It is not possible to set the searchmode to all (like it was recommended in the question on the top) because I just want to set this behaviour for this specific field and not for all search queries. It also is not possible to let the responsibility on the user to enter the search term with quotes.
What I have tried so far:
Use the keyword analyzer like it was described in the question on
top: no success
Use an indexanalyzer with specific token filters (ngram,
lowercase) and a searchanalyzer as a keyword analyzer: no success
Use Charfilters to manipulate the search term and manually set the
quotes on the first and last position (craft zer -> "craft zer").
Like Yahnoosh explained in the question on top, the query parser
processes the query string before the analyzers are applied. So:
no success
Is there any solution for this issue?
Or is there a other approach to achieve the desired behaviour?
Hopefully someone can help.
Thanks in advance!
Using your example with three documents: "Hovercraft zero", "Hovercraft one", "Hovercraft two"
Issue a prefix query to find all documents that contain terms that start with "Hover"
search=Hover*
To match the term "craft zer", you need to use the keyword analyzer (or the keyword tokenizer with the lowercase token filter) at indexing time to make sure elements of your string collection are not tokenized. Then at query time you can issue a regex query (note regex queries are much slower than term or prefix queries)
search=/.craft zer./&queryType=full
Also, please use the Analyze API to test your custom analyzer configurations. It will help you make sure the analyzer produces the terms you expect.
Thanks #Yahnoosh for your answer, I found a solution that worked for me.
Short example:
I have an index including three fields (field1, field2, field3). From field3 I want a result where documents exactly contain the search term. From field1 and field2 I want do get a "standard" result.
Solution:
I manipulated the searchquery to ->
field1:{searchterm} || field2:{searchterm} || field3:"{searchterm}" &queryType=full
Using this searchquery field1 and field2 are queried in the "standard" way and field3 is queried with the behaviour i was searching for. Of course there are more efficient and elegant ways out there to solve this issue, but it worked for me.
If anybody has a better solution let me know ;)
I have a set of indexed fields such as these:
submitted_form_2200FA17-AF7A-4E44-9749-79D3A391A1AF:true
submitted_form_2398389-2-32-43242423:true
submitted_form_54543-32SDf-3242340-32422:true
And I get that it's possible to wildcard queries such as
submitted_form_2398389-2-32-43242423:t*e
What I'm trying to do is get "any" submitted form via something like:
submitted_form_*:true
Is this possible? Or will I have to do a stream of "OR"s on the known forms (which seems quite heavy)
That's not the intended use of fields, I think. Field names aren't supposed to be the searchable values, field values are. Field names are supposed to be known a priori.
My suggestion is (if possible) to store the second part of the name as the field value, for instance: submitted_form:2398389-2-32-43242423. submitted_from would be the field known a priori, and the value could eventually be searched with a PrefixQuery.
Anyway, you could access the collection of fields' names using IndexReader.getFieldNames() in Lucene 3.x and this in Lucene 4.x. I wouldn't expect search performance there.
I read some articles about searching in indexes by numbers, but it does not work for me yet.
more::
i need to search in my documents by number but it does not work.
i create the docuemnt:
$doc1->addField(Zend_Search_Lucene_Field::UnIndexed('id', $id));
and i search the index:
$index->find("id:123");
but it does not work and the result is empty! i have to do that.
i tested this by changing index type to keyword,unstored,text, and unindexed
Here is my Bootstrap::
Zend_Search_Lucene_Analysis_Analyzer::setDefault
(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault
(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
Zend_Search_Lucene_Analysis_Analyzer::setDefault
(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum());
I am using this on searching and indexing. Also I commented other settings, but they also didn't work.
Type of id filed should be "Keyword", so:
$doc1->addField(Zend_Search_Lucene_Field::Keyword('id', $id));
UnIndexed fields are not searchable.
Read section: Understanding Field Types Zend Lucene
I use solr to search for documents and when trying to search for documents using this query "id:*", I get this query parser exception telling that it cannot parse the query with * or ? as the first character.
HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
type Status report
message org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
description The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery).
Is there any patch for getting this to work with just * ? Or is it very costly to do such a query?
If you want all documents, do a query on *:*
If you want all documents with a certain field (e.g. id) try id:[* TO *]
Lucene doesn't allow you to start WildcardQueries with an asterisk by default, because those are incredibly expensive queries and will be very, very, very slow on large indexes.
If you're using the Lucene QueryParser, call setAllowLeadingWildcard(true) on it to enable it.
If you want all of the documents with a certain field set, you are much better off querying or walking the index programmatically than using QueryParser. You should really only use QueryParser to parse user input.
id:[a* TO z*] id:[0* TO 9*] etc.
I just did this in lukeall on my index and it worked, therefore it should work in Solr which uses the standard query parser. I don't actually use Solr.
In base Lucene there's a fine reason for why you'd never query for every document, it's because to query for a document you must use a new indexReader("DirectoryName") and apply a query to it. Therefore you could totally skip applying a query to it and use the indexReader methods numDocs() to get a count of all the documents, and document(int n) to retrieve any of the documents.
If you are just trying to get all documents, Solr does support the *:* query. It's the only time I know of that Solr will let you begin a query with an *. I'm sure you've probably seen this as the default query in the Solr admin page.
If you are trying to do a more specific query with an * as the first character, like say id:*456 then one of the best ways I've seen is to index that field twice. Once normally (field name: id), and once with all the characters reversed (field name: reverse_id). Then you could essentially do the query id:456 by sending the query reverse_id:654 instead. Hope that makes sense.
You can also search the Solr user group mailing list at http://www.mail-archive.com/solr-user#lucene.apache.org/ where questions like this come up quite often.
The following Solr issue is a request to be able to configure the default lucene query parser.
https://issues.apache.org/jira/browse/SOLR-218
In this issue you can find the following description how to 'patch' Solr. This modification would allow you to start queries with a *.
Jonas Salk: I've basically updated only one Java file: SolrQueryParser.java.
public SolrQueryParser(IndexSchema schema, String defaultField) {
...
setAllowLeadingWildcard(true);
setLowercaseExpandedTerms(true);
...
}
...
public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
...
setAllowLeadingWildcard(true);
setLowercaseExpandedTerms(true);
...
}
I'm not sure if setLowercaseExpandedTerms is needed...
I'm assuming with id:* you're just trying to match all documents, right?
I've never used solr before, but in my Lucene experience, when ingesting data, we've added a hidden field to every document, then when we need to return every record we do a search for the string constant in that field that's the same for every record.
If you can't add a field like that in your situation, you could use a RegexQuery with a regex that would match anything that could be found in the id field.
Edit: actually answering the question. I've never heard of a patch to get that to work, but I would be surprised if it could even be made to work reasonably well. See this question for a reason why unconstrained PrefixQuery's can cause a problem.
Actually, I have been using a workaround for this. I append a character to the id, eg: A1, A2, etc.
With such values in the field, it is possible to search using the query id:A*
But would love to find whether a true solution exists.