ArangoDB: How to rank results from two different Views

With the following ArangoSearch query, I'm trying to search for the term "Clim Do" in fields of multiple Views.
LET QR1 = (
  FOR doc_asview_global1 IN asview_global1
    SEARCH
      ANALYZER(
        LIKE(doc_asview_global1.f_name, "%Clim%") OR
        LIKE(doc_asview_global1.f_name, "%Do%"), "identity") OR
      ANALYZER(
        LEVENSHTEIN_MATCH(doc_asview_global1.f_name, "Clim", 1, true) OR
        LEVENSHTEIN_MATCH(doc_asview_global1.f_name, "Do", 1, true), "text_en") OR
      ANALYZER(
        PHRASE(doc_asview_global1.f_name, "Clim Do"), "text_en")
    LIMIT 10000
    SORT BM25(doc_asview_global1) DESC
    RETURN doc_asview_global1)
LET QR2 = (
  FOR doc_asview_global2 IN asview_global2
    SEARCH
      ANALYZER(
        LIKE(doc_asview_global2.f_cause, "%Clim%") OR
        LIKE(doc_asview_global2.f_cause, "%Do%"), "identity") OR
      ANALYZER(
        LEVENSHTEIN_MATCH(doc_asview_global2.f_cause, "Clim", 1, true) OR
        LEVENSHTEIN_MATCH(doc_asview_global2.f_cause, "Do", 1, true), "text_en") OR
      ANALYZER(
        PHRASE(doc_asview_global2.f_cause, "Clim Do"), "text_en")
    LIMIT 10000
    SORT BM25(doc_asview_global2) DESC
    RETURN doc_asview_global2)
RETURN UNION_DISTINCT(QR1, QR2)
The problem here is that the results of each subquery are ranked separately. Is there any way either to merge this search into a single query or to rank the merged results together?
Also, could this query be optimized?
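One possible way to rank the merged results together, sketched here in Python rather than AQL: have each subquery return the document along with its score (for example RETURN { doc: doc_asview_global1, score: BM25(doc_asview_global1) }), then merge the two lists and sort by score. The row shape with "doc" and "score" keys, the sample data, and the merge_ranked helper are assumptions for illustration and are not part of the original query. BM25 scores computed in different Views may also not be strictly comparable, so the combined order should be treated as approximate.

```python
from typing import Dict, List


def merge_ranked(*result_sets: List[dict]) -> List[dict]:
    """Merge scored result lists into one list sorted by score, descending.

    Each row is assumed (hypothetically) to look like {"doc": {...}, "score": float}.
    A document that appears in several lists is kept once, with its best score.
    """
    best: Dict[str, dict] = {}
    for results in result_sets:
        for row in results:
            key = row["doc"]["_id"]
            if key not in best or row["score"] > best[key]["score"]:
                best[key] = row
    return sorted(best.values(), key=lambda row: row["score"], reverse=True)


# Made-up sample rows standing in for the output of QR1 and QR2.
qr1 = [
    {"doc": {"_id": "a/1", "f_name": "Clim Do"}, "score": 2.5},
    {"doc": {"_id": "a/2", "f_name": "Clim"}, "score": 1.0},
]
qr2 = [
    {"doc": {"_id": "b/7", "f_cause": "Do"}, "score": 1.8},
    {"doc": {"_id": "a/2", "f_cause": "Clim"}, "score": 0.4},
]

merged = merge_ranked(qr1, qr2)
merged_ids = [row["doc"]["_id"] for row in merged]
print(merged_ids)
```

The same idea could be kept inside AQL by sorting the combined rows on the score attribute before returning them.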

Related

Prioritise query values over others using another query for values to be prioritised

I have the following query of Olympic countries in Power Query, which I wish to sort using another query containing "prioritised countries" (the current top 10). I wish to sort the original query so that any country on the prioritised list appears at the top, sorted alphabetically.
Below visually shows what I am trying to achieve:
The best I have been able to do is merge the queries; however, this removes countries that are not in the prioritised query. I appreciate that I can create a second query of the original, append this to the prioritised countries and then remove duplicates, but I am looking for a more elegant solution, as this would require refreshing the data twice.
Let Q be the query to sort and P be the priority list. Then you can get your desired result by appending the intersection Q ∩ P with the set difference Q \ P.
Here's one way to do this in M:
let
    Source =
        Table.FromList(
            List.Combine(
                {
                    List.Sort( List.Intersect( { P[Country], Q[Country] } ) ),
                    List.Sort( List.RemoveItems( Q[Country], P[Country] ) )
                }
            ),
            null,
            {"Country"}
        )
in
    Source
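The same set logic can be illustrated outside of M. The following is a minimal Python sketch, assuming plain lists of country names with no duplicates in Q; the function name prioritise and the sample data are made up for the example.

```python
def prioritise(q, p):
    """Return the items of q with those also in p first; each group sorted alphabetically."""
    priority = set(p)
    top = sorted(c for c in q if c in priority)       # Q ∩ P
    rest = sorted(c for c in q if c not in priority)  # Q \ P
    return top + rest


# Hypothetical sample data, not taken from the original queries.
countries = ["Brazil", "Australia", "China", "Kenya", "Japan", "USA"]
prioritised = ["USA", "China", "Japan"]

ordered = prioritise(countries, prioritised)
print(ordered)
```

Countries that appear only in the priority list are ignored, which matches taking the intersection rather than appending P as a whole.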

Getting AutoQuery pagination to work with left join

In my AutoQuery request I have a left join specified so I can query on properties in the joined table.
public class ProductSearchRequest : QueryDb<Book>
, ILeftJoin<Book, BookAuthor>, ILeftJoin<BookAuthor, Author>
{}
If I use the standard AutoQuery approach, like so:
var q = AutoQuery.CreateQuery(request, base.Request);
var results = AutoQuery.Execute(request, q);
and 100 results are requested, then often fewer than 100 will be returned, as the Take() is applied to the rows produced by the left join.
To remedy this I am doing this instead:
var q = AutoQuery.CreateQuery(request, base.Request);
q.OrderByExpression = null; //throws error if orderby exists
var total = Db.Scalar<int>(q.Select(x => Sql.CountDistinct(x.Id))); //returns 0
var q1 = AutoQuery.CreateQuery(request, base.Request).GroupBy(x => x);
var results = Db.Select<Book>(q1);
return new QueryResponse<Book>
{
    Offset = q1.Offset.GetValueOrDefault(0),
    Total = total,
    Results = results
};
The group by appears to return the correct number of results, so paging works, but the Total returns 0.
I also tried:
var total2 = (int)Db.Count(q1);
But even though q1 has a GroupBy(), it returns the number of rows including the left join, not the number of results of the actual query.
How can I get the true total of the query?
(Getting some official docs on how to do paging and totals with autoquery & left join would be very helpful as right now it's a bit confusing)
Your primary issue stems from trying to return a different total than that of the actual query AutoQuery executes. If you have multiple left joins, the total is the total number of results of the query it executes, not the number of rows in your source table.
So you're not looking for the "true total"; rather, you're looking to execute a different query to get a different total than the query that's executed, while still deriving it from the original query. First consider using normal INNER JOINs (IJoin<>) instead of LEFT JOINs, so that it only returns results for related rows in joined tables, which the total will reflect accordingly.
Your total query that returns 0 is likely returning no results, so I'd inspect it in an SQL profiler to see the query that's actually executed. You can also log OrmLite queries by enabling Debug logging and adding this to your AppHost:
OrmLiteUtils.PrintSql();
Also note that a GroupBy() of the entire table is unusual; you would normally group by one or more explicitly selected columns, e.g.:
.GroupBy(x => x.Id);
.GroupBy(x => new { x.Id, x.Name });

How to get the total number of results of an ArangoDB AQL query for paging

AQL supports basic paging with LIMIT offset, count. But I need the total number of results of the query in order to know the total number of pages. How can I get the total count of the query?
I know the LENGTH function returns the count of a collection, but it doesn't seem to suit the following:
FOR v IN 2 ANY 'Collection/id1' GRAPH 'graph-name' FILTER ... LIMIT 10 RETURN DISTINCT v
I want to get the total number, but I can't get it with RETURN DISTINCT LENGTH(v).
For now I can implement this in an inelegant way:
LET nodeList = (FOR v IN 2 ANY 'Collection/id1' GRAPH 'graph-name' FILTER ... RETURN DISTINCT v)
FOR v IN 2 ANY 'Collection/id1' GRAPH 'graph-name' FILTER ... LIMIT 10 RETURN DISTINCT {'nodes': v, 'total': LENGTH(nodeList)}
Is there a better way to do this?
I found this answer in the ArangoDB Spring Data project.
AqlQueryOptions has a fullCount() option that makes the query return its total count.
You can then return a PageImpl, which contains the query content and the pagination info.
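Once a total is available (for instance from fullCount), the page count follows from simple arithmetic. A minimal sketch in Python; page_info and its zero-based page parameter are hypothetical names chosen for the example, not part of any ArangoDB API.

```python
import math


def page_info(total: int, page_size: int, page: int) -> dict:
    """Derive paging values from a total result count (page is zero-based)."""
    total_pages = math.ceil(total / page_size) if total else 0
    return {
        "total_pages": total_pages,
        "offset": page * page_size,          # value to pass as the LIMIT offset
        "has_next": page + 1 < total_pages,
    }


first = page_info(total=95, page_size=10, page=0)
last = page_info(total=95, page_size=10, page=9)
print(first, last)
```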

Parse GeoPoint query slow and timed out using javascript sdk in node.js

I have the following Parse query, which times out when the number of records is large.
var query = new Parse.Query("UserLocation");
query.withinMiles("geo", geo, MAX_LOCATION_RADIUS);
query.ascending("createdAt");
if (createdAt !== undefined) {
  query.greaterThan("createdAt", createdAt);
}
query.limit(1000);
It runs fine when the UserLocation table is small, but the query times out from time to time when the table has ~100k records:
[2015-07-15 21:03:30.879] [ERROR] [default] - Error while querying for locations: [latitude=39.959064, longitude=-75.15846]: {"code":124,"message":"operation was slow and timed out"}
The UserLocation table has a (latitude, longitude) pair and a radius. Given a geo point (latitude, longitude), I'm trying to find the UserLocations whose circle (lat, long) + radius covers the given point. It doesn't seem possible to use the value of another column for the distance query (something like query.withinMiles("geo", inputGeo, "radius"), where "geo" and "radius" are the column names for the GeoPoint and the radius). There is also the restriction that "limit" combined with "skip" can return a maximum of 10,000 records (1000 records at a time, skipping 10 times). So I had to do an almost full table scan, using "createdAt" as a filter criterion and querying repeatedly until the query returns no more results.
Is there any way I can improve the algorithm so that it doesn't time out on a large data set?
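The per-record test described above (does the circle around a stored location cover the given point?) can be sketched as a great-circle distance check. This is an illustrative Python sketch, not Parse SDK code: the dictionary keys lat, lon and radius, the helper names, and the sample locations are assumptions, and the radius is taken to be in miles.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def covers(location, lat, lon):
    """True if the point lies within the location's own radius (miles)."""
    return haversine_miles(location["lat"], location["lon"], lat, lon) <= location["radius"]


# Hypothetical records; the query point is the one from the error log above.
nearby = {"lat": 39.9526, "lon": -75.1652, "radius": 1.0}
far_away = {"lat": 40.7128, "lon": -74.0060, "radius": 5.0}
point = (39.959064, -75.15846)

matches = [loc for loc in (nearby, far_away) if covers(loc, *point)]
print(matches)
```

A check like this would run on the candidates returned by the coarse withinMiles query, since that query can only use a single fixed radius.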

Error in Linq: The text data type cannot be selected as DISTINCT because it is not comparable

I have a problem with LINQ. A third-party database that I need to connect to uses the now deprecated text data type (I can't change this), and I need to apply a distinct clause in my LINQ query to results that contain this field.
I don't want to do a ToList() before executing the Distinct() as that will result in thousands of records coming back from the database that I don't require and will annoy the client as they get charged for bandwidth usage. I only need the first 15 distinct records.
Anyway, the query is below:
var query = (from s in db.tSearches
             join sc in db.tSearchIndexes on s.GUID equals sc.CPSGUID
             join a in db.tAttributes on sc.AttributeGUID equals a.GUID
             where s.Notes != null && a.Attribute == "Featured"
             select new FeaturedVacancy
             {
                 Id = s.GUID,
                 DateOpened = s.DateOpened,
                 Notes = s.Notes
             });
return query.Distinct().OrderByDescending(x => x.DateOpened);
I know I can use a subquery to do the same thing as above (tSearches contains unique records), but I'd rather have a more straightforward solution if one is available, as I need to change a number of similar queries throughout the code to get this working.
There were no answers on how to do this, so I went with my first suggestion: I retrieved the unique records from tSearch first, then constructed a subquery with the non-unique records and filtered the search results by this subquery. Answer below:
var query = (from s in db.tSearches
             where s.DateClosed == null && s.ConfidentialNotes != null
             orderby s.DateOpened descending
             select new FeaturedVacancy
             {
                 Id = s.GUID,
                 Notes = s.ConfidentialNotes
             });
/* Now filter by our 'Featured' attribute */
var subQuery = from sc in db.tSearchIndexes
               join a in db.tAttributes on sc.AttributeGUID equals a.GUID
               where a.Attribute == "Featured"
               select sc.CPSGUID;
query = query.Where(x => subQuery.Contains(x.Id));
return query;
