Apache Solr: Filter results by minimum relevancy in Drupal 6.x - drupal-6

We need to filter results by a minimum relevancy score. I tried to modify the query in hook_apachesolr_query_alter() in the following ways and failed; can someone help?
1) Add the filter score:[0.5 TO *] with exclude set to false - it displays no items.
2) Add the filter score:[0.5 TO *] with exclude set to true - there is no change in the result; it displays all items.
3) Add the parameter q = {!frange l=5}query($qq) with qq = <actual_query> - getting a 400 error (see the parameter sketch after this list).
4) Add {!frange l=5}query($qq) with qq in the sort string - there is no change in the result; it displays all items.
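For what it's worth, the frange approach is usually applied as a filter query (fq) rather than as a replacement for q. A minimal raw-parameter sketch, assuming the standard {!frange} parser and a minimum score of 0.5 as in attempts 1 and 2 (qq here is just a copy of the main query used as a local-param reference):
q=<actual_query>
qq=<actual_query>
fq={!frange l=0.5}query($qq)
In hook_apachesolr_query_alter() this would mean adding qq and fq as extra parameters rather than overwriting q.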

Related

Azure DevOps API WIQL: get the maximum work item ID

I'm using WIQL to query for a list of work items in Azure DevOps. However, Azure DevOps will return a maximum of 20,000 work items in a single query. If the query results contain more than 20,000 items, an error code is returned instead of the work items. To get a list of all work items matching my query, I modified the query to filter by ID and then loop, building up the list of work items over multiple queries. The issue is that there is apparently no way to know when I have reached the end of my loop, because I don't know the maximum work item ID in the system.
idlist = []
max_items_per_query = 19000
counter = 0
done = False
while not done:
    # Query only the IDs in the window [counter, counter + max_items_per_query)
    wiql = ("SELECT [System.Id] FROM WorkItems "
            "WHERE [System.WorkItemType] IN ('User Story','Bug') "
            "AND [System.AreaPath] UNDER 'mypath' "
            "AND [System.Id] >= {} AND [System.Id] < {}"
            .format(counter, counter + max_items_per_query))
    url = base_url + 'wiql'
    params = {'api-version': '4.0'}
    body = {'query': wiql}
    request = session.post(url, auth=('', personal_access_token), params=params, json=body)
    response = request.json()
    new_items = [w['id'] for w in response['workItems']]
    idlist.extend(new_items)
    counter += max_items_per_query
    if not new_items:
        # An empty window is treated as the end of the data
        done = True
This works in most cases, but the loop exits prematurely if it encounters a gap in work item IDs under the specified area path. Ideally, I could make this work if there were a way to query for the maximum work item ID in the system and then exit when the counter reaches that value. However, I can't find a way to do this. Is there a way to query for this number, or possibly another solution that will allow me to get a list of all work items matching specific criteria?
You can use the $top parameter to get the last one.
Something like below (this is just a sample - you can extend it to your query):
SELECT [System.Id] FROM workitems WHERE [System.Id] > 0 ORDER BY [System.Id] DESC with $top = 1
This will return the maximum System.Id, as it arranges the items in descending order.
Suggestion:
You can also change your logic to something like this:
SELECT [System.Id] FROM workitems WHERE [System.Id] > 0 ORDER BY [System.Id] ASC with $top = 5000
Get the 5000th item's System.Id; let's assume it is 5029.
The next query would be :
SELECT [System.Id] FROM workitems WHERE [System.Id] > 5029 ORDER BY [System.Id] ASC with $top = 5000
You will get the next 5000 items, starting after System.Id 5029.
You can loop the above logic.
For the exit case of the loop, you can check the number of items returned as part of the iteration - if it is less than 5000, then that would be the end of the iteration.
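As a minimal sketch of that paging loop, written in the same style as the question's code (reusing base_url, session and personal_access_token, and assuming the wiql endpoint accepts a $top query parameter, as recent API versions do):
idlist = []
last_id = 0
page_size = 5000
done = False
while not done:
    # Page through IDs in ascending order, always continuing after the last ID seen;
    # add your own WorkItemType / AreaPath criteria to the WHERE clause as needed.
    wiql = ("SELECT [System.Id] FROM WorkItems "
            "WHERE [System.Id] > {} ORDER BY [System.Id] ASC".format(last_id))
    params = {'api-version': '5.1', '$top': page_size}
    body = {'query': wiql}
    response = session.post(base_url + 'wiql', auth=('', personal_access_token),
                            params=params, json=body).json()
    ids = [w['id'] for w in response['workItems']]
    idlist.extend(ids)
    if len(ids) < page_size:
        done = True   # fewer than a full page means we've reached the end
    else:
        last_id = ids[-1]
Because the exit condition is the size of the returned page rather than an absolute ID range, gaps in the work item IDs no longer cause a premature exit.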

Cosmos DB paginated query with custom order by clause

I want to do a select query in Cosmos DB that returns a maximum number of results (say 50) and then gives me the continuation token so I can continue the search where I left off.
Now let's say my query has 3 equality conditions in its where clause, e.g.
where prop1 = "a" and prop2 = "w" and prop3 = "g"
In the results that are returned, I want the records that satisfy prop1 = "a" to appear first, followed by the results that have prop2 = "w" followed by the ones with prop3 = "g".
Why do I need this? While I could just pull all the data into my application and sort it there, I obviously can't pull all records, as that would mean pulling in too much data. If I can't order the results this way in Cosmos itself, the page of results I get back might contain only records that don't have prop1 = "a" at all. I could keep retrying until I get the ones with prop1 = "a" (I need this because I want to show the results with prop1 = "a" as the first set of results to the user), but I might have to pull 100 times to get the first record, since I have a huge dataset sitting in my Cosmos DB.
How can I handle this scenario in Cosmos? Thanks!
So if I am understanding your question correctly, you want to accomplish this:
SELECT * FROM c
WHERE
c.prop1 = 'a'
AND
c.prop2 = 'b'
AND
c.prop3 = 'c'
ORDER BY
c.prop1, c.prop2, c.prop3
OFFSET 0 LIMIT 25
Luckily, you can now do this in Cosmos DB SQL. But there is a caveat: you have to set up a composite index on your collection to allow for this.
So, for this collection, my composite index would look like this:
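A sketch of the compositeIndexes section of the indexing policy that supports this ORDER BY, using the question's prop1/prop2/prop3 paths (illustrative, not copied from the original answer):
"compositeIndexes": [
  [
    { "path": "/prop1", "order": "ascending" },
    { "path": "/prop2", "order": "ascending" },
    { "path": "/prop3", "order": "ascending" }
  ]
]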
Now, if I wanted to change it to this:
SELECT * FROM c
WHERE
c.prop1 = 'a'
AND
c.prop2 = 'b'
AND
c.prop3 = 'c'
ORDER BY
c.prop1 DESC, c.prop2, c.prop3
OFFSET 0 LIMIT 25
I could add another composite index to cover that use-case. You can see in your settings it's an array of arrays so you can add as many combinations as you'd like.
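For instance, the combination covering the DESC variant above would be another inner array along these lines (again illustrative):
[
  { "path": "/prop1", "order": "descending" },
  { "path": "/prop2", "order": "ascending" },
  { "path": "/prop3", "order": "ascending" }
]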
This should get you to where you need to be if I understood your question correctly.

How to check if ArangoDB query is not empty?

I would like the equivalent of a PostgreSQL EXISTS query.
Let's say I have a Q ArangoDB query (AQL). How can I check if Q returns any result?
Example:
Q = "For u in users FILTER 'x#example.com' = u.email"
What is the best way to do it (most performant)?
I have ideas, but couldn't find an easy way to measure the performance:
Idea 1: using Length:
RETURN LENGTH(%Q RETURN 1) > 0
Idea 2: using First:
RETURN First(%Q RETURN 1) != null
Above, %Q is a substitution for the query defined at the beginning.
I think the best way to achieve this for a generic selection query with a structure like
Q = "For u in users FILTER 'x#example.com' = u.email"
is to first add a LIMIT clause to the query, and only make it return a constant value (in contrast to the full document).
For example, the following query returns a single match if there is such document or an empty array if there is no match:
FOR u IN users FILTER 'x#example.com' == u.email LIMIT 1 RETURN 1
(please note that I also changed the operator from = to == because otherwise the query won't parse).
Please note that this query may benefit a lot from creating an index on the search attribute, i.e. email. Without the index the query will do a full collection scan and stop at the first match, whereas with the index it will just read at most a single index entry.
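A sketch of creating such an index from arangosh (assuming a persistent index suits your ArangoDB version; older 3.x releases would use a hash index instead):
db.users.ensureIndex({ type: "persistent", fields: ["email"] });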
Finally, to answer your question, the template for the EXISTS-like query will then become
LENGTH(%Q LIMIT 1 RETURN 1)
or fleshed out via the example query:
LENGTH(FOR u IN users FILTER 'x#example.com' == u.email LIMIT 1 RETURN 1)
LENGTH(...) will return the number of matches, which in this case will be either 0 or 1. It can also be used in filter conditions, as follows:
FOR ....
FILTER LENGTH(...)
RETURN ...
because LENGTH(...) will be either 0 or 1, which in context of a FILTER condition will evaluate to either false or true.
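For example (using a hypothetical orders collection purely for illustration), this keeps only the users that have at least one order:
FOR u IN users
  FILTER LENGTH(FOR o IN orders FILTER o.user == u._id LIMIT 1 RETURN 1)
  RETURN u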
Do you need an AQL solution?
Only the count:
// note: == (a single = won't parse) and a RETURN so that count() has something to count
var q = "FOR u IN users FILTER 'x#example.com' == u.email RETURN 1";
var res = db._createStatement({query: q, count: true}).execute();
var ct = res.count();
This is the fastest I can think of.

Using multi-layer queries in Solr

The Solr "qf" parameter works as follows:
Let's say I have: query = "sid" and qf = [field1, field1_edge, field2, field2_edge].
The Solr score is calculated as follows:
max(f1, f1_e, f2, f2_e) + tie * (sum of the other 3 fields), where "tie" lies in [0, 1].
Let's call: winner1 = field with max(f1, f1_e) and
winner2 = field with max(f2, f2_e)
I would like to score a given query in Solr as follows:
score1 = winner1_score + tie_1 * loser1_score
score2 = winner2_score + tie_1 * loser2_score
final score = score1 + tie_2 * score2
Effectively, I want to apply qf in two layers (taking tie_1 = 0 and tie_2 = 1). What are my options to implement this idea of relevance? I think neither "qf" parameter nor function boosts support this.
Thanks!
It seems to me that the way to do this is to use the query() function, which allows you to embed queries inside function queries.
You combine this with nested query parsers, which allow you to run multiple dismax queries.
You can do something like this (where you set tie1 and tie2 according to what you want):
q=_val_:"add(query($qq1),product(query($qq2),${tie2}))"
qq1={!edismax qf='field1 field1_edge' v='sid' tie=${tie1}}
qq2={!edismax qf='field2 field2_edge' v='sid' tie=${tie1}}
tie1=0.5
tie2=0.3
If you use Solr 7.2 (or higher), you also need to set uf=_query_ * in order for the _val_ hook to work.
P.S.: it should be possible (though I haven't tested it) to move the content of q into the qf parameter, so that you don't have to use the _val_ hook:
qf=add(query($qq1),product(query($qq2),${tie2}))

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
def view_results(request, arg):
    # django_ids list is first calculated using arg...
    sqs = SearchQuerySet().facet('example_facet')  # STEP_1
    sqs = sqs.filter(django_id__in=django_ids)     # STEP_2
    view = search_view_factory(
        view_class=SearchView,
        template='search/search-results.html',
        searchqueryset=sqs,
        form_class=FacetedSearchForm
    )
    return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one or more of the methods of SearchView in haystack/views.py, but which? Can anyone suggest a means of achieving what is required here?
After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory... line, so I needed to create my own SearchView class and override the get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
class MySearchView(SearchView):
    def get_results(self):
        search = self.form.search()
        # The ID I need for the database search is at the end of the URL,
        # but this may have some search parameters on it and need cleaning up.
        view_id = self.request.path.split("/")[-1]
        view_query = MyView.objects.filter(id=view_id.split("&")[0])
        # At this point the django_ids of the required objects can be found.
        if len(view_query) > 0:
            view_item = view_query[0]
            django_ids = []
            for thing in view_item.things.all():
                django_ids.append(thing.id)
            search = search.filter_and(django_id__in=django_ids)
        return search
Using search.filter_and rather than search.filter at the end also turned out to be essential; plain filter didn't do what I needed when the filtering was performed before reaching the SearchView.
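As a usage sketch, the view factory call from the question then just swaps in the subclass; the django_id filter is dropped from the initial queryset since get_results() now applies it:
view = search_view_factory(
    view_class=MySearchView,  # the subclass above, which filters after the search runs
    template='search/search-results.html',
    searchqueryset=SearchQuerySet().facet('example_facet'),
    form_class=FacetedSearchForm
)
return view(request)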
