Many-to-many AQL query - ArangoDB

I have two collections, USERS and FILES, and one edge collection, FILES_USERS.
I'm trying to get all FILES documents that have the field "what" set to "video" for a specific user, and also embed in each result the documents, also from the FILES collection, whose "what" is set to "trailer" and that belong to that "video".
I have tried the code below but it's not working correctly: I'm getting a lot of duplicate results... it's a mess. I'm definitely doing it wrong.
FOR f IN files
FILTER f.what=="video"
LET trailer = (
FOR f2 IN files
FILTER f2.parent_key==f._key
AND f2.what=="trailer"
RETURN f2
)
FOR x IN files_users
FILTER x._from=="users/18418062"
AND x.owner==true
RETURN DISTINCT {f,trailer}

There may be a better way to do this with graph query syntax, but try this. Adjust the UNIQUE calls based on your data model.
LET user_files = UNIQUE(FOR u IN files_users
FILTER u._from == "users/18418062" AND u.owner
RETURN u._to)
FOR uf IN user_files
FOR f IN files
// _to holds a document handle such as "files/123", so compare it with _id, not _key
FILTER f._id == uf AND f.what == "video"
LET trailers = UNIQUE(FOR t IN files
FILTER t.parent_key == f._key AND t.what == "trailer"
RETURN t)
RETURN {"video": f, "trailers": trailers}
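The logic of this answer can be mirrored in plain Python to check the expected shape of the result. This is only a sketch: the lists `files` and `files_users` are hypothetical stand-ins for the collections, and only the attributes the query uses are modelled. Edge `_to` values are document handles such as "files/1", so the sketch compares them with `_id`. The loop order (files outer) differs from the AQL, which only affects result order.

```python
# Hypothetical stand-ins for the FILES and FILES_USERS collections.
files = [
    {"_key": "1", "_id": "files/1", "what": "video"},
    {"_key": "2", "_id": "files/2", "what": "trailer", "parent_key": "1"},
    {"_key": "3", "_id": "files/3", "what": "trailer", "parent_key": "1"},
    {"_key": "4", "_id": "files/4", "what": "video"},
]
files_users = [
    {"_from": "users/18418062", "_to": "files/1", "owner": True},
    {"_from": "users/18418062", "_to": "files/1", "owner": True},  # duplicate edge
    {"_from": "users/99", "_to": "files/4", "owner": True},
]


def videos_with_trailers(user_id):
    # Like LET user_files = UNIQUE(...): the distinct _to handles of owned edges.
    user_files = {e["_to"] for e in files_users if e["_from"] == user_id and e["owner"]}
    result = []
    for f in files:
        # _to holds a handle such as "files/1", so it is compared with _id.
        if f["_id"] in user_files and f["what"] == "video":
            # Like the trailers subquery: children pointing at this video's _key.
            trailers = [t for t in files
                        if t.get("parent_key") == f["_key"] and t["what"] == "trailer"]
            result.append({"video": f, "trailers": trailers})
    return result


rows = videos_with_trailers("users/18418062")
print(len(rows), [t["_key"] for t in rows[0]["trailers"]])  # prints: 1 ['2', '3']
```

The duplicate edge does not produce a second row because `user_files` is deduplicated, which is the role UNIQUE plays in the AQL version.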

Well, check whether you have duplicate data, as TMan suggested, but check your query syntax too. There is no condition linking the outer loop variable f to the edge variable x in the inner loop, so every video is paired with every matching edge. That can return a lot of duplicates if there are multiple records in the files_users collection for user users/18418062.
Try adding a join in the main query. Something like:
FOR x IN files_users
FILTER x._from=="users/18418062"
AND x.owner==true
AND x._to == f._id
RETURN DISTINCT {f,trailer}
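Why the missing link produces duplicates can be seen by counting rows. The following Python sketch, with made-up data (`videos` and `edges` are hypothetical), evaluates the nested loops once without and once with the `x._to == f._id` condition.

```python
# Hypothetical data: three videos and two owned edges for one user.
videos = [{"_id": "files/1"}, {"_id": "files/2"}, {"_id": "files/3"}]
edges = [
    {"_from": "users/18418062", "_to": "files/1", "owner": True},
    {"_from": "users/18418062", "_to": "files/2", "owner": True},
]


def edge_rows(link):
    out = []
    for f in videos:                        # FOR f IN files FILTER f.what == "video"
        for x in edges:                     # FOR x IN files_users
            if x["_from"] != "users/18418062" or not x["owner"]:
                continue                    # FILTER x._from == ... AND x.owner == true
            if link and x["_to"] != f["_id"]:
                continue                    # AND x._to == f._id
            out.append(f["_id"])            # one result row per (f, x) pair
    return out


unlinked = edge_rows(link=False)
linked = edge_rows(link=True)
print(len(unlinked), linked)  # prints: 6 ['files/1', 'files/2']
```

Without the link, each of the three videos is repeated once per edge, and files/3 appears even though the user has no edge to it. RETURN DISTINCT would collapse the repeats, but it would still keep files/3.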
On a related note, if you run into performance issues with the subquery for trailers, you could instead try a join combined with array expansion and see whether that works for your case.
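As a rough analogy for the performance point (this is Python, not AQL array expansion), a per-video lookup scans every file once per video, whereas grouping trailers by `parent_key` in a single pass lets each video fetch its list with a dictionary lookup. The data and names below are hypothetical; the sketch only checks that both strategies give the same answer.

```python
from collections import defaultdict

# Hypothetical files: two videos and three trailers.
files = [
    {"_key": "1", "what": "video"},
    {"_key": "2", "what": "video"},
    {"_key": "t1", "what": "trailer", "parent_key": "1"},
    {"_key": "t2", "what": "trailer", "parent_key": "1"},
    {"_key": "t3", "what": "trailer", "parent_key": "2"},
]


def trailers_subquery(video):
    # Subquery style: scans all files again for every video.
    return [t["_key"] for t in files
            if t["what"] == "trailer" and t.get("parent_key") == video["_key"]]


# Join style: one pass groups trailers by parent_key.
by_parent = defaultdict(list)
for t in files:
    if t["what"] == "trailer":
        by_parent[t["parent_key"]].append(t["_key"])

videos = [f for f in files if f["what"] == "video"]
joined = {v["_key"]: by_parent.get(v["_key"], []) for v in videos}
per_video = {v["_key"]: trailers_subquery(v) for v in videos}
print(joined == per_video)  # prints: True
```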

How to check if an ArangoDB query is not empty?

I would like to make an EXISTS query, like in PostgreSQL.
Let's say I have an ArangoDB (AQL) query Q. How can I check whether Q returns any result?
Example:
Q = "For u in users FILTER 'x#example.com' = u.email"
What is the best (most performant) way to do it?
I have ideas, but couldn't find an easy way to measure the performance:
Idea 1: using LENGTH:
RETURN LENGTH(%Q RETURN 1) > 0
Idea 2: using FIRST:
RETURN First(%Q RETURN 1) != null
Above, %Q is a substitution for the query defined at the beginning.
I think the best way to achieve this for a generic selection query with a structure like
Q = "For u in users FILTER 'x#example.com' = u.email"
is to first add a LIMIT clause to the query and make it return only a constant value (instead of the full document).
For example, the following query returns an array with a single element if there is such a document, or an empty array if there is no match:
FOR u IN users FILTER 'x#example.com' == u.email LIMIT 1 RETURN 1
(please note that I also changed the operator from = to == because otherwise the query won't parse).
Please note that this query may benefit a lot from an index on the search attribute, i.e. email. Without the index, the query will do a full collection scan that stops at the first match, whereas with the index it will read at most a single index entry.
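The effect of LIMIT 1 and of an index can be pictured with a Python analogy: a linear scan that stops at the first match, versus a dictionary keyed by email standing in for an index on `email`. The `users` data is hypothetical, and the counter only models how many documents are examined, not ArangoDB's actual cost.

```python
# Hypothetical users collection.
users = [{"_key": str(i), "email": f"user{i}@example.com"} for i in range(1000)]


def scan_first_match(email):
    """Full scan with early exit, like FILTER ... LIMIT 1 RETURN 1 without an index."""
    examined = 0
    for u in users:
        examined += 1
        if u["email"] == email:
            return 1, examined   # stop at the first match
    return 0, examined           # no match: every document was read


# A dict keyed by email stands in for an index on the email attribute.
email_index = {u["email"]: u["_key"] for u in users}


def index_lookup(email):
    # A single lookup instead of a scan.
    return 1 if email in email_index else 0


print(scan_first_match("user10@example.com"))   # prints: (1, 11)
print(scan_first_match("missing@example.com"))  # prints: (0, 1000)
print(index_lookup("missing@example.com"))      # prints: 0
```

The worst case for the scan is a missing value, which is exactly the case an existence check has to handle.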
Finally, to answer your question, the template for the EXISTS-like query then becomes:
LENGTH(%Q LIMIT 1 RETURN 1)
or fleshed out via the example query:
LENGTH(FOR u IN users FILTER 'x#example.com' == u.email LIMIT 1 RETURN 1)
LENGTH(...) will return the number of matches, which in this case will be either 0 or 1. It can also be used in filter conditions, as follows:
FOR ....
FILTER LENGTH(...)
RETURN ...
because LENGTH(...) will be either 0 or 1, which in context of a FILTER condition will evaluate to either false or true.
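The template `LENGTH(%Q LIMIT 1 RETURN 1)` maps naturally onto a Python helper that takes at most one item from a lazy sequence of matches. The `exists` helper, `users` and `orders` are hypothetical; the final comprehension plays the role of `FOR ... FILTER LENGTH(...) RETURN ...`, relying on 0 being falsy and 1 truthy.

```python
from itertools import islice

# Hypothetical collections.
users = [{"email": "a@example.com"}, {"email": "b@example.com"}]
orders = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "z@example.com"}]


def exists(matches):
    # Like LENGTH(... LIMIT 1 RETURN 1): take at most one match, so the result is 0 or 1.
    return len(list(islice(matches, 1)))


def user_exists(email):
    # The generator is lazy, so the scan stops after the first match.
    return exists(1 for u in users if u["email"] == email)


# Like FOR o IN orders FILTER LENGTH(...) RETURN o.id
kept = [o["id"] for o in orders if user_exists(o["email"])]
print(user_exists("a@example.com"), user_exists("q@example.com"), kept)  # prints: 1 0 [1]
```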
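Do you need an AQL solution?
If you only need the count, in arangosh:
// "=" must be "==" and the query needs a RETURN, otherwise it does not parse.
var q = "FOR u IN users FILTER 'x#example.com' == u.email RETURN 1";
var res = db._createStatement({query: q, count: true}).execute();
var ct = res.count();
This is the fastest approach I can think of.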

Can I filter multiple collections?

I want to filter multiple collections to return only the documents that meet those requirements. The problem is that when there is more than one matching value in one collection, the elements shown are repeated.
FOR TurmaA IN TurmaA
FOR TurmaB IN TurmaB
FILTER TurmaA.Disciplinas.Mat >10
FILTER TurmaB.Disciplinas.Mat >10
RETURN {TurmaA,TurmaB}
Screenshot of the problem
What your query does is iterate over all documents of the first collection and, for each of them, iterate over the whole second collection. The filters reduce the number of results, but this approach is highly inefficient.
Do you actually want to return the union of the matches from both collections (SELECT ... UNION SELECT ... in SQL)?
What you get with your current approach are all possible combinations of the documents from both collections. I believe what you want is:
LET a = (FOR t IN TurmaA FILTER t.Disciplinas.Mat > 10 RETURN t)
LET b = (FOR t IN TurmaB FILTER t.Disciplinas.Mat > 10 RETURN t)
FOR doc IN UNION(a, b)
RETURN doc
Both collections are filtered individually in sub-queries, then the results are combined and returned.
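The difference between the nested loops and the UNION approach shows up as soon as results are counted. In this Python sketch with hypothetical TurmaA/TurmaB data, the nested loops yield every pairing of matching documents, while concatenating the two filtered lists (like AQL's UNION, which also keeps duplicates; UNION_DISTINCT removes them) yields each matching document once.

```python
from itertools import product

# Hypothetical documents in the two collections.
turma_a = [{"name": "Ana", "Disciplinas": {"Mat": 12}},
           {"name": "Rui", "Disciplinas": {"Mat": 14}},
           {"name": "Eva", "Disciplinas": {"Mat": 8}}]
turma_b = [{"name": "Joao", "Disciplinas": {"Mat": 15}},
           {"name": "Ines", "Disciplinas": {"Mat": 11}}]


def passes(t):
    return t["Disciplinas"]["Mat"] > 10


# Nested FOR loops with both filters: every combination of matches.
pairs = [(a, b) for a, b in product(turma_a, turma_b) if passes(a) and passes(b)]

# Filter each collection separately, then concatenate (like UNION(a, b)).
a = [t for t in turma_a if passes(t)]
b = [t for t in turma_b if passes(t)]
union = a + b

print(len(pairs), [t["name"] for t in union])  # prints: 4 ['Ana', 'Rui', 'Joao', 'Ines']
```

With two matches on each side the nested loops return 2 × 2 = 4 pairs, so every document is shown twice; the union returns 2 + 2 = 4 documents, each once.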
Another solution would be to store all documents in one collection, Turma, with an additional attribute, e.g. Type, set to "A" or "B". Then the query would be as simple as:
FOR t IN Turma
FILTER t.Disciplinas.Mat > 10
RETURN t
If you want to return TurmaA documents only, you would do:
FOR t IN Turma
FILTER t.Disciplinas.Mat > 10 AND t.Type == "A"
RETURN t
By the way, I recommend giving variables names that differ from the collection names, e.g. t instead of Turma if there is a collection named Turma.

Using COLLECT in an ArangoDB INSERT to create new documents

I have a collection called prodSampleNew with documents that have hierarchy levels as fields in arangodb:
{
prodId: 1,
LevelOne: "clothes",
LevelTwo: "pants",
LevelThree: "jeans",
... etc....
}
I want take the hierarchy levels and convert them into their own documents, so I can eventually build a proper graph with the hierarchy.
I was able to extract the first level of the hierarchy and put it in a new collection using the following:
for i IN [1]
let HierarchyList = (
For prod in prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel in HierarchyList
INSERT {"name": hierarchyLevel}
IN tmp
However, having to put a FOR i IN [1] at the top seems wrong, and there should be a better way (yes, I am fairly new to AQL).
Any pointers on a better way to do this would be appreciated.
Not sure what you are trying to achieve exactly.
However, the FOR i IN [1] seems unnecessary, so you could start your AQL query directly with the subquery that computes the distinct values of hierarchy level 1:
LET HierarchyList = (
FOR prod IN prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel IN HierarchyList
INSERT {"name": hierarchyLevel} IN tmp
The result should be the same.
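What the COLLECT-then-INSERT query produces can be sketched in Python: one new document `{"name": ...}` per distinct `LevelOne` value. The sample documents are hypothetical, the list `tmp` only stands in for the target collection, and the sketch sorts the distinct values simply to get a stable order.

```python
# Hypothetical documents from prodSampleNew.
prod_sample_new = [
    {"prodId": 1, "LevelOne": "clothes", "LevelTwo": "pants", "LevelThree": "jeans"},
    {"prodId": 2, "LevelOne": "clothes", "LevelTwo": "shirts", "LevelThree": "polo"},
    {"prodId": 3, "LevelOne": "shoes", "LevelTwo": "boots", "LevelThree": "hiking"},
]

# Like COLLECT LevelOneUnique = prod.LevelOne: one entry per distinct value.
hierarchy_list = sorted({prod["LevelOne"] for prod in prod_sample_new})

# Like FOR hierarchyLevel IN HierarchyList INSERT {"name": hierarchyLevel} IN tmp
tmp = []
for hierarchy_level in hierarchy_list:
    tmp.append({"name": hierarchy_level})

print(tmp)  # prints: [{'name': 'clothes'}, {'name': 'shoes'}]
```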
If the question is more like "how can I get all distinct names of levels from all hierarchies", then you could use something like
LET HierarchyList = UNIQUE(FLATTEN(
FOR prod IN prodSampleNew
RETURN [ prod.LevelOne, prod.LevelTwo, prod.LevelThree ]
))
...
to produce an array with the unique names of the hierarchy levels for level 1-3.
If this doesn't answer your question, please describe the result the query should produce.
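The UNIQUE(FLATTEN(...)) variant above can be mirrored in Python as well: build one `[LevelOne, LevelTwo, LevelThree]` list per product, flatten the list of lists by one level (FLATTEN's default depth is 1), and drop duplicates. The data is hypothetical, and the sketch keeps first-seen order, which should not be read as a statement about the order AQL returns.

```python
from itertools import chain

# Hypothetical documents from prodSampleNew.
prod_sample_new = [
    {"prodId": 1, "LevelOne": "clothes", "LevelTwo": "pants", "LevelThree": "jeans"},
    {"prodId": 2, "LevelOne": "clothes", "LevelTwo": "shirts", "LevelThree": "polo"},
    {"prodId": 3, "LevelOne": "shoes", "LevelTwo": "boots", "LevelThree": "hiking"},
]

# Like RETURN [prod.LevelOne, prod.LevelTwo, prod.LevelThree] for every product.
per_product = [[p["LevelOne"], p["LevelTwo"], p["LevelThree"]] for p in prod_sample_new]

# Like FLATTEN(...): remove one level of nesting.
flat = list(chain.from_iterable(per_product))

# Like UNIQUE(...): dict.fromkeys drops duplicates and keeps first-seen order.
hierarchy_list = list(dict.fromkeys(flat))
print(hierarchy_list)
```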

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
from haystack.forms import FacetedSearchForm
from haystack.query import SearchQuerySet
from haystack.views import SearchView, search_view_factory

def view_results(request, arg):
    # django_ids list is first calculated using arg...
    sqs = SearchQuerySet().facet('example_facet')  # STEP_1
    sqs = sqs.filter(django_id__in=django_ids)  # STEP_2
    view = search_view_factory(
        view_class=SearchView,
        template='search/search-results.html',
        searchqueryset=sqs,
        form_class=FacetedSearchForm
    )
    return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one or more of the methods of SearchView in haystack/views.py, but which ones? Can anyone suggest a way of achieving what is required here?
After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory(...) line, so I needed to create my own SearchView class and override its get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
from haystack.views import SearchView

# MyView is the asker's own Django model; import it from your app's models.
class MySearchView(SearchView):
    def get_results(self):
        search = self.form.search()
        # The ID I need for the database search is at the end of the URL,
        # but this may have some search parameters on and need cleaning up.
        view_id = self.request.path.split("/")[-1]
        view_query = MyView.objects.filter(id=view_id.split("&")[0])
        # At this point the django_ids of the required objects can be found.
        if len(view_query) > 0:
            view_item = view_query[0]
            django_ids = [thing.id for thing in view_item.things.all()]
            # AND the ID restriction with the results of the user's search.
            search = search.filter_and(django_id__in=django_ids)
        return search
Using search.filter_and rather than search.filter at the end was another thing which turned out to be essential, but which didn't do what I needed when the filtering was being performed before getting to the SearchView.
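The difference between the symptom and the goal can be stated in terms of sets. Using plain Python sets with hypothetical IDs, the behaviour described in the question looks like the ID restriction and the search term being combined with OR (a union), whereas filtering the search results with `filter_and` keeps only records satisfying both (an intersection). This models the outcome only, not how Haystack builds the SOLR query.

```python
# Hypothetical django_ids restricted by the view (STEP_2).
restricted_ids = {2, 3, 4, 5}
# Hypothetical django_ids of records matching the user's search term.
term_matches = {1, 3, 5, 7, 9}

# If the two conditions end up OR-ed together, records outside the
# restricted set still come back as long as they match the term.
ored = restricted_ids | term_matches

# AND-ing the restriction with the search results keeps only records
# that satisfy both conditions, which is what was wanted.
anded = term_matches & restricted_ids

print(sorted(ored), sorted(anded))  # prints: [1, 2, 3, 4, 5, 7, 9] [3, 5]
```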

Search query with Subsonic

Ok,
Today I am trying to learn Subsonic. Pretty cool stuff.
I am trying to build some search functionality into my website but am struggling with how I might achieve this in Subsonic.
I have one search field that could contain multiple keywords. I want to return results that match all of the keywords. The target on the search is a single text column.
So far I have this (it runs but never returns results):
return new SubSonic.Select().From(Visit.Schema)
.InnerJoin(InfopathArchive.VisitIdColumn, Visit.VisitIdColumn)
.Where(InfopathArchive.XmlDocColumn).Like(keywords)
.ExecuteTypedList<Visit>();
There is a one-to-one mapping between the Visit table and the InfopathArchive table. I just want to return the collection of Visits that have the keywords in the related XmlDoc column.
If I could get that working, it would be great. The second problem is that if someone searches for 'australia processmodel', the above code would only match that exact phrase. How can I create a query that splits up my search term so that it returns only documents that contain ALL of the individual search terms?
Any help appreciated.
Edit: OK, so the basic search works, but the multiple-keyword search doesn't. I did what Adam suggested, but it seems Subsonic uses only one parameter for the query.
Here is the code:
List<string> wordsInQueryList = keywords.Split(' ').ToList();
SqlQuery q = Select.AllColumnsFrom<Visit>()
.InnerJoin(InfopathArchive.VisitIdColumn, Visit.VisitIdColumn)
.Where(Visit.IsDeletedColumn).IsEqualTo(false);
foreach(string wordInQuery in wordsInQueryList)
{
q = q.And(InfopathArchive.XmlDocColumn).Like("%" + wordInQuery + "%");
}
return q.ExecuteTypedList<Visit>();
Then if I look at the query that Subsonic generates:
SELECT (bunch of columns)
FROM [dbo].[Visit]
INNER JOIN [dbo].[InfopathArchive] ON [dbo].[Visit].[VisitId] = [dbo].[InfopathArchive].[VisitId]
WHERE [dbo].[Visit].[IsDeleted] = @IsDeleted
AND [dbo].[InfopathArchive].[XmlDoc] LIKE @XmlDoc
AND [dbo].[InfopathArchive].[XmlDoc] LIKE @XmlDoc
So only the last keyword ends up being searched for, because both LIKE conditions share the same parameter name (XmlDoc).
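Why only the last keyword survives can be modelled in Python: the generated SQL names every LIKE parameter after the column, so if parameters are collected into a dictionary keyed by name, each keyword overwrites the previous one. The dictionary model is an assumption about the mechanism, not SubSonic's actual code; the `build` helper is hypothetical, and numbering the names shows how distinct parameters avoid the collision.

```python
keywords = "australia processmodel".split(" ")


def build(unique_names):
    clauses, params = [], {}
    for i, word in enumerate(keywords):
        # Name the parameter after the column, optionally with a counter.
        name = f"@XmlDoc{i}" if unique_names else "@XmlDoc"
        clauses.append(f"[XmlDoc] LIKE {name}")
        params[name] = f"%{word}%"  # with a repeated name, the previous value is overwritten
    return " AND ".join(clauses), params


sql, params = build(unique_names=False)
print(sql, params)    # both clauses use @XmlDoc, which is bound to '%processmodel%'
sql2, params2 = build(unique_names=True)
print(sql2, params2)  # @XmlDoc0 -> '%australia%', @XmlDoc1 -> '%processmodel%'
```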
Any ideas?
First question:
return new SubSonic.Select().From(Visit.Schema)
.InnerJoin(InfopathArchive.VisitIdColumn, Visit.VisitIdColumn)
.Where(InfopathArchive.XmlDocColumn).Like("%" + keywords + "%")
.ExecuteTypedList<Visit>();
Second question:
Pass the list of words in your query to a function that builds a SubSonic query as follows:
SqlQuery query = DB.Select().From(Visit.Schema)
.InnerJoin(InfopathArchive.VisitIdColumn, Visit.VisitIdColumn)
.Where("1=1");
foreach(string wordInQuery in wordsInQueryList)
{
query = query.And(InfopathArchive.XmlDocColumn).Like("%" + wordInQuery + "%");
}
return query.ExecuteTypedList<Visit>();
Obviously this is untested but it should point you in the right direction.
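The intended semantics of the multi-keyword search (every word must occur somewhere in the XmlDoc text, i.e. a chain of LIKE '%word%' conditions joined with AND) can be checked in Python. `matches_all` and the sample XML strings are hypothetical, and the lower-casing assumes a case-insensitive collation on the SQL side.

```python
def matches_all(text, search):
    # split() without arguments ignores repeated spaces, so there are no empty keywords.
    words = search.lower().split()
    lowered = text.lower()
    # Equivalent to LIKE '%w1%' AND LIKE '%w2%' AND ... under a case-insensitive collation.
    return all(w in lowered for w in words)


# Hypothetical XmlDoc contents keyed by VisitId.
docs = {
    1: "<visit><country>Australia</country><type>ProcessModel</type></visit>",
    2: "<visit><country>Australia</country></visit>",
}
hits = [visit_id for visit_id, xml in docs.items() if matches_all(xml, "australia processmodel")]
print(hits)  # prints: [1]
```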
You can do what Adam is suggesting, or with 2.2 you can simply use Contains() instead of Like("%...%"). We also support StartsWith() and EndsWith() :)