Limit edges used on named graph traversal - arangodb

Q: Can I limit the edge collections the system will try to use when traversing named graphs in AQL?
Scenario:
If I have a named graph productGraph with two vertex collections and two edge collections:
Vertices: product, price
Edges:
prodParentOf (product A is parent of product B)
prodHasPrice (product A has a price of $X)
If I now want the child products of product A (and no prices), I would like to do something like this:
WITH product
FOR v, e, p IN OUTBOUND 'product/A'
GRAPH 'productGraph'
RETURN {vertice:v, edge:e, path: p}
However, if I look at the explain plan, I see that the system attempted to use the indexes for both prodParentOf and prodHasPrice (even though I explicitly put the product collection in the WITH clause):
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
2 edge prodHasPrice false false 75.00 % [ `_from`, `_to` ] base OUTBOUND
2 edge prodParentOf false false 65.37 % [ `_from`, `_to` ] base OUTBOUND
Can I limit the edge collections the system will try to use when querying named graphs? Or do I have to use edge collections in the query instead? (In my mind, that would mean it is generally better to traverse edge collections than named graphs.)
Here is the same query using an edge collection:
FOR v, e, p IN OUTBOUND 'product/A'
prodParentOf
RETURN {vertice:v, edge:e, path: p}

The WITH clause does not restrict which collections of your named graph will be used in a traversal. It is mainly for traversals in a cluster, to declare which collections will be involved. This helps to avoid deadlocks, which may occur if collections are lazily locked at query runtime.
If you use a single server instance, then the WITH clause is optional. It does not have an effect on the result. If you want to exclude collections from a traversal, you can either use collection sets instead of the named graph, or use FILTERs together with IS_SAME_COLLECTION(). Using collection sets is more efficient, because with fewer edge collections there are fewer edges to traverse, whereas filters are applied after the traversal in most cases.
FOR v, e, p IN 1..5 OUTBOUND 'verts/start' GRAPH 'named-graph'
FILTER (FOR id IN p.edges[*]._id RETURN IS_SAME_COLLECTION('edgesX', id)) ALL == true
RETURN p
If your traversal has a depth of 1 only, then a filter query is simpler:
FOR v, e, p IN OUTBOUND 'product/A' GRAPH 'productGraph'
FILTER IS_SAME_COLLECTION('prodParentOf', e)
RETURN {vertex: v, edge: e, path: p}
A way to prune paths may come in the future, which should also help with your named graph scenario.
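Later ArangoDB releases (3.4.5 onward, as far as I know) did add a PRUNE clause for traversals. The following is a minimal sketch, assuming such a version and the productGraph from the question; the depth range 1..5 is only illustrative. Note that PRUNE stops a path from being expanded further, but it does not change which edge collections are read at each vertex, so collection sets remain the more efficient choice.

```aql
FOR v, e, p IN 1..5 OUTBOUND 'product/A' GRAPH 'productGraph'
  // Stop expanding a path as soon as it follows an edge from another collection.
  // e can be null while the start vertex is evaluated, hence the guard.
  PRUNE e != null AND NOT IS_SAME_COLLECTION('prodParentOf', e)
  // PRUNE still emits the vertex it stopped at, so drop it here.
  FILTER IS_SAME_COLLECTION('prodParentOf', e)
  RETURN { vertex: v, edge: e, path: p }
```

Because every path containing a foreign edge is cut off at that edge, checking only the last edge in the FILTER is enough to keep paths made up entirely of prodParentOf edges.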

Related

ArangoDB: Traversal condition on related document

Been stuck for days with this concern, trying to accomplish this:
See the provided picture.
The black is the start vertex. Trying to get:
1: All child parts OUTBOUND (from) the start vertex
2: Condition: The children MUST have an INBOUND edge "types" whose other end is a document of the type "type" with a variable set to "true".
3: When a document of type "part" fails to meet this requirement (an INBOUND document of type "type" with an attribute set to "true"), the expansion of that path stops then and there.
4: The documents that failed aren't included in the result either.
5: Should be compatible with any depths.
6: No subqueries (if possible without).
Example of graph
With the given information, the data model seems questionable. Why are there true and false vertices instead of a boolean edge attribute per partScrew? Is there a reason why it is modeled like this?
Using this data model, I don't see how this would be possible without subqueries. The traversal down a path can be stopped early with PRUNE, but that does not support subqueries. That only leaves FILTER for post-filtering as an option, but be careful: you need to check all vertices on the path, and not just the emitted vertex, for an inbound false type.
Not sure if it works as expected in all cases, but here is what I came up with and the query result, which looks good to me:
LET startScrew = FIRST(FOR doc IN screw LIMIT 1 RETURN doc) // Screw A
FOR v,e,p IN 1..2 OUTBOUND startScrew partScrew
  FILTER (
    FOR v_id IN SHIFT(p.vertices[*]._id) // ignore start vertex
      FOR v2 IN 1..1 INBOUND v_id types
        RETURN v2.value
  ) NONE == false
  RETURN {
    path: CONCAT_SEPARATOR(" -- ", p.vertices[*].value)
  }
path
Screw A -- Part D
Screw A -- Part E
Screw A -- Part E -- Part F
Dump with test data: https://gist.github.com/Simran-B/6bd9b154d1d1e2e74638caceff42c44f
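For comparison, here is a rough sketch of what the query could look like if the data model were changed as suggested above, so that each partScrew edge carries a boolean attribute. The attribute name valid is hypothetical and does not exist in the test data; with it, PRUNE can stop the expansion without any subquery inside the traversal.

```aql
LET startScrew = FIRST(FOR doc IN screw LIMIT 1 RETURN doc)
FOR v, e, p IN 1..2 OUTBOUND startScrew partScrew
  // "valid" is a hypothetical boolean edge attribute, not part of the test data.
  // Stop expanding below any part reached over an edge that is not valid.
  PRUNE e != null AND e.valid != true
  // The pruned vertex itself is still emitted, so filter it out.
  FILTER e.valid == true
  RETURN { path: CONCAT_SEPARATOR(" -- ", p.vertices[*].value) }
```

Since paths are never extended past an invalid edge, checking the last edge in the FILTER covers the whole path.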

ArangoDB - Get edge information while using traversals

I'm interested in using traversals to quickly find all the documents linked to an initial document. For this I'd use:
let id = 'documents/18787898'
for d in documents
filter d._id == id
for i in 1..1 any d edges
return i
This generally provides me with all the documents related to the initial ones. However, say that in these edges I have more information than just the standard _from and _to. Say it also contains order, in which I indicate the order in which something is to be displayed. Is there a way to also grab that information at the same time as making the traversal? Or do I now have to make a completely separate query for that information?
You are very close, but your graph traversal is slightly incorrect.
The way I read the documentation, it shows that you can return vertex, edge, and path objects in a traversal:
FOR vertex[, edge[, path]]
IN [min[..max]]
OUTBOUND|INBOUND|ANY startVertex
edgeCollection1, ..., edgeCollectionN
I suggest adding the edge variable e to your FOR statement. You also do not need to find document/vertex matches first (given that id is a single string), so the FOR/FILTER pair can be eliminated:
LET id = 'documents/18787898'
FOR v, e IN 1 ANY id edges
RETURN e
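If both the related document and the edge data are needed, they can be returned together. A small sketch, assuming the edges carry the order attribute described in the question; the result keys document and order are just illustrative names:

```aql
LET id = 'documents/18787898'
FOR v, e IN 1..1 ANY id edges
  // Sort by the display order stored on the edge.
  SORT e.order
  RETURN { document: v, order: e.order }
```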

Count distinct nodes from traversal in AQL

I am able to get all distinct nodes from a query, but not the count:
FOR v in 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
return DISTINCT v._key
I want to get only the count of the result. I tried to use LENGTH(DISTINCT v._key) as suggested in the docs, but that is not valid AQL syntax:
syntax error, unexpected DISTINCT modifier near 'DISTINCT v._key)'
The naive solution is to get all keys and count it on the client side, but I am curious how to do it on the server side?
What RETURN DISTINCT does is to remove duplicate values, but only after the traversal.
You can set traversal options to eliminate paths during the traversal, which can be more efficient especially if you have a highly interconnected graph and a high traversal depth:
RETURN LENGTH(
  FOR v IN 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
    OPTIONS { uniqueVertices: "global", bfs: true }
    RETURN v._key
)
The traversal option uniqueVertices can be set to "global" so that you don't get the same vertex returned twice from this traversal. The option for breadth-first search bfs needs to be enabled to use uniqueVertices: "global". The reason why depth-first search does not support this uniqueness option is that the result would not be deterministic, hence this combination was disabled.
Inspired by this blogpost http://jsteemann.github.io/blog/2014/12/12/aql-improvements-for-24/ I prepared the solution using LET:
LET result = (FOR v in 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
return DISTINCT v._key)
RETURN LENGTH(result)
It might not be the optimal solution, but it works as I expected.
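Another way to count on the server side is an aggregation. This sketch assumes a reasonably recent ArangoDB version in which COUNT_DISTINCT is available as a COLLECT AGGREGATE function; the variable name total is arbitrary:

```aql
FOR v IN 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
  // Count each key once, without materializing the list of keys first.
  COLLECT AGGREGATE total = COUNT_DISTINCT(v._key)
  RETURN total
```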

What's the best method to filter graph edges by type in AQL

I have the following super-simple graph :
What I am trying to do is:
Select all questions where there is a property on the question document called firstQuestion with a value of true.
Select any options that are connected to the question via an outbound edge of type with_options
The following query works, but it feels like there must be a better way to inspect the edge type without string operations. Specifically, I recreate the edge _id value by concatenating the edge type I want with the key. Is this the best way to inspect the type of an edge?
FOR question IN questions
  FILTER question.firstQuestion == true
  LET options = (
    FOR v, e IN 1..1 OUTBOUND question._id GRAPH 'mygraph'
      FILTER CONCAT('with_options/', e._key) == e._id
      RETURN v
  )
  RETURN {question: question, options: options}
We're currently introducing IS_SAME_COLLECTION for that specific purpose with ArangoDB 2.8.1.
The DOCUMENT function is also worth mentioning in this context.
FOR question IN questions
FILTER question.firstQuestion == true
LET options = (FOR v, e IN 1..1 OUTBOUND question._id GRAPH 'mygraph'
FILTER IS_SAME_COLLECTION('with_options', e._id)
RETURN v)
RETURN {question: question, options: options}
However, the best solution in this special case is not to use the named graph interface, but to specify the list of edge collections that the traversal should consider in the first place:
FOR question IN questions
FILTER question.firstQuestion == true
LET options = (FOR v, e IN 1..1 OUTBOUND question._id with_options RETURN v)
RETURN {question: question, options: options}
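If staying on the named graph is preferred, newer ArangoDB versions (3.7 and later, to my knowledge) accept an edgeCollections traversal option that restricts which of the graph's edge collections are followed. A sketch under that assumption, reusing the names from the question:

```aql
FOR question IN questions
  FILTER question.firstQuestion == true
  LET options = (
    FOR v IN 1..1 OUTBOUND question._id GRAPH 'mygraph'
      // Only follow edges stored in with_options;
      // other edge collections of the graph are ignored.
      OPTIONS { edgeCollections: ['with_options'] }
      RETURN v
  )
  RETURN { question: question, options: options }
```

On versions without this option, the collection-set form shown above remains the way to go.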

How can I determine root objects in an arangodb tree graph?

I have a document collection containing tree nodes and an edge collection containing "is child of" like this:
Folders=[
{_key:"1",name:"Root1"},
{_key:"2",name:"Root2"},
{_key:"3",name:"Root1.Node1"},
{_key:"4",name:"Root1.Node2"}]
FolderRelations=[
{_from:"Folders/3",_to:"Folders/1"},
{_from:"Folders/4",_to:"Folders/1"}
]
Now I would like to determine which Folder items are root objects in that tree (all objects that have no outbound relation).
Maybe I am a bit stuck thinking in SQL, but I would like to carry out something like:
SELECT *
FROM Folders
WHERE NOT EXISTS (SELECT * FROM FolderRelations WHERE FolderRelations.FromKey=Folders.Key)
For using the traversal and path functionality, I have no vertex to start with.
Here is an AQL example that should solve your problem:
FOR f IN Folders
  FILTER LENGTH(EDGES(FolderRelations, f._id, "outbound")) == 0
  RETURN f
You will get a list of all vertices that have no folder above them in the hierarchy.
But be aware:
saving {key:1} will not have the desired effect; you have to set:
{_key: "1"}
_key is the internal key attribute, and it has to be a string.
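The EDGES() function was removed in ArangoDB 3.0, so on current versions the same check has to be written differently. One possible sketch uses a depth-1 traversal as a subquery; the variable name parent is arbitrary:

```aql
FOR f IN Folders
  // Look for at most one outgoing "is child of" edge; none means f is a root.
  FILTER LENGTH(
    FOR parent IN 1..1 OUTBOUND f FolderRelations
      LIMIT 1
      RETURN 1
  ) == 0
  RETURN f
```

With the sample data from the question, this should return the folders with _key "1" and "2", since only folders 3 and 4 have outbound relations.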
