Modify haystack query syntax? - django-haystack

Is it possible to modify or extend how haystack understands a query?
For example, I'm looking at integrating haystack with an OSQA-based site to get SO-style search -- a search where regular keywords search question/answer/comment text, but where syntax like "[tag]" is understood to be restricted to the question's tags field. At some point we might want to add other goodies like "user:eternicode" and "score:0", but for now keywords and tags are the must-haves.
Unfortunately, it's not as simple as regexing the tags out of the query string and using that to filter on the tags field, because we want all the complexity of AND, OR, NOT, and arbitrary grouping to apply.
Is this possible with haystack? Better yet, has anyone done it before?

It seems there is no way to customize how Haystack's auto_query works, so what we ended up doing was preparsing the search query to extract tag and other custom syntaxes, perform the auto_query with the leftovers, and then apply the custom syntaxes as extra filters on the auto_query results.
In order to do this, though, we had to simplify our requirements and drop the OR requirement, so all terms are only ANDed now -- that simplified a lot of things (for example, grouping is now unnecessary).

Related

Returning accented as well as normal result set via azure search filters

Does anyone know how to ensure we can return normal result as well as accented result set via the azure search filter. For e.g the below filter query in Azure search returns a name called unicorn when i check for record with name unicorn.
var result= searchServiceClient.Documents.SearchAsync<myDto>("*",new SearchParameters
{
SearchFields = new List<string> {"Name"},
Filter = "Name eq 'unicorn'"
});
This is all good but what i want is i want to write a filter such that it returns record named unicorn as well as record named únicorn (please note the first accented character) provided that both record exist.
This can be achieved when searching for such name via the search query using language or Standard ASCII folding search analyzer as mentioned in this link. What i am struggling to find out is how can we implement the same with azure filters?
Please let me know if anyone has got any solutions around this.
Filters are applied on the non-analyzed representation of the data, so I don’t think there’s any way to do any type of linguistic analysis on filters. One way to work around this is to manually create a field which only do lowercasing + asciifloding (no tokenization) and then search lucene queries that look like this:
"normal search query terms" AND customFilterColumn:"filtérValuèWithÄccents"
Basically the document would both need to match the search terms in any field AND also match the filter term in the “customFilterColumn”. This may not be sufficient for your needs, but at least you understand the art of the possible.
Using filters it won't work unless you specify in advance all the possibilities:
for example:
$filter=name eq 'unicorn' or name eq 'únicorn'
You'd better work with a different analyzer that will change accents to it's root form. As another possibility, you can try fuzzy search:
search=unicorn~&highlight=Name

ArangoSearch: how to search without specifying the document field?

I'm looking into ArangoSearch for the first time and it looks like a pretty good functionality.
However, in all the tutorials, despite having the ability to tell it to index all fields, one cannot do a 'blind' search across all fields of the document. Like when we look at the example below:
FOR d in myView SEARCH d.text IN ["quick", "brown"] RETURN d
I don't seem to have the ability to just search d entirely without specifying each individual field that I want to include in my search. Is that correct and if so, why is that and are there workarounds? I'm dealing with a lot of different collections with a lot of different fields that can contain a relevant term, it would be a shame if I'd have to tabulate all of them to make an expansive search.

wildcard searches on specific elements only

I am looking for a way to do wildcard search only on specific elements when executing a search:search. Specifically, I might have documents that look like the following:
<pdbe:person-envelope xmlns:pdbe="http://schemas.abbvienet.com/people-db/envelope">
<person xmlns="http://schemas.abbvienet.com/people-db/model">
<costcenter>
<code>0000601775</code>
<name>DISC-PLAT INFORM</name>
</costcenter>
<displayName>Tj Tang</displayName>
<upi>10025613</upi>
<firstName>
<preferred>TJ</preferred>
<given>Tze-John</given>
</firstName>
<lastName>
<preferred>Tang</preferred>
<given>Tang</given>
</lastName>
<title>Principal Research Scientist</title>
</person>
<pdbe:raw/>
</pdbe:person-envelope>
When searches happen, I want the search text to be automatically wildcarded, but only for certain elements like displayName, firstName, lastName, but NOT for upi or code. As I understand it, I would have certain wildcard related indexes enabled in the database, but then I would need to have a custom query parser that rewrite the query into multiple cts:element-query and cts:element-value-query statements for each element that I want to wildcard search on, OR'd with the originally parsed search query. Or I can create field constraints, and rewrite the query to use field contraints.
Is there another way to conditionally search using wildcard on some elements but not others, when the user is entering as simple search query?, i.e. partial first and last name, "TJ Tan", but no partial hits when I search "100256".
You are on the right track. Lets take an element (or maybe field) query on "TS Tan"
With cts:tokenize, you can break this up (read about cs:tokenize - it is not just a normal tokenizer).
Then I have "TS" and "Tan"
You can the do things like apply business rules on which word should be wild-carded and which not and build the appropriate cts query (probably individual word queries in an and statement - or a near query - tuning depends on your need).
Now with search phrase tokenized, you can also consider that you may find building your results relies not on a wildcard index, but on a an element word lexicon - where you do your term-expansion with word-matches and those terms are then sent to the query.
We sometimes take that further and combine the query building with xdmp:estimate and make the query less restrictive if we don't get enough results early on.
Where to put this logic?
You mention search:search, so in this case, I would suggest you package this into a custom constraint.

Distinguishing synonym hits from regular hits in elastic search

We are using Elastic Search and as part of a requirement we want to be able to distinguish hits generated by the synonym filter from those that are not because of synonyms.
For example if we had a query such as:
(car AND red) AND (NOT ford)
With synonym: color <-> red
Then we want to know:
[the red car] is a simple hit.
But,
[the color of the car] is a hit caused by the synonym filter.
Our synonym filter is defined as follows:
synonym_filter :
type : synonym
synonyms_path : synonyms.txt
ignore_case : true
expand : true
format : solr
Since the synonym filter does its work by modifying the token stream at index time there might not be a straightforward way to do this. Perhaps by using the highlighting functionality there might be an algorithm.
I was wondering if anybody has experience with this kind of solution or if a clever solution exists for this requirement. Thank you in advance.
I believe the best solution would be to search content with synonyms separately from content without.
That is, if you are applying the SynonymFilter at index time, then index the content twice, once without synonyms, and once with synonyms (and possibly any other filters to facilitate a broader search). You could then either run separate queries against the two fields, or you could run a single query with matches against the more direct field significantly boosted.

Expression Engine - Is it possible for exp:channel:entries tag to have complex filters?

Coming from Codeigniter and new to Expression Engine, I am at a loss on how to do complex filters on the exp:channel:entries tag.
I am interested in this filters
status
start_on
stop_before
How do you do you implement filters for a complex condition like this?
(status=X|Y|Z AND start_on=A AND stop_before=B) OR (status=X AND start_on=C AND stop_before=D)
Is this even possible?
You can only use the search= parameter to search on “Text Input”, “Textarea”, and “Drop-down Lists” fields unfortunately. So you'd need to use the query module for this.
If you're just querying on those parameters you should be able to get the entry id's you need from the exp_channel_titles table, then use something like the Stash plugin to feed the entry_id's of the results into a regular channel entries tag. Yes it's nominally one more query that way but as EE abstracts the db schema fairly heavily the alternative is to get lost in a mess of JOINs.
So something like (pseudocode, won't work as is):
Get the entries, statuses are just a string in exp_channel_titles, entry_date is the date column you want - which is stored as a unix timestamp, so you'll need to select it with something like DATE( FROM_UNIXTIME(entry_date)) depending on the format of your filter data.
{exp:stash:set name="filtered_ids"}{exp:query sql="SELECT entry_id
FROM exp_channel_titles
WHERE status LIKE ...<your filter here>"
backspace="1"
}{entry_id}|{/exp:query}{/exp:stash:set}
Later in template:
{exp:channel:entries
entry_id="{exp:stash:get name="filtered_ids"}"
}
{!--loop --}
{/exp:channel:entries}
Yes it's a mess compared to what you're probably used to in pure CI, but the trade off is all the stuff you get for free from EE (CP, templating, member management etc).
Stash is awesome by the way - can be used to massively mitigate most EE performance issues/get around parse order issues
You can get a lot of this functionality using the search= parameter in your {exp:channel:entries...} loop.
It's not immediately clear to me how you'll get the complexity you seek, so you might end up resorting to the query module though.
If you're working with dates you might find the DT plugin useful.

Resources