ArangoDB: Traversal condition on related document - arangodb

I've been stuck on this for days. Here is what I'm trying to accomplish:
See the provided picture.
The black vertex is the start vertex. I'm trying to get:
1: All child parts OUTBOUND from the start vertex.
2: Condition: each child MUST have an INBOUND "types" edge whose other end is a document of type "type" with an attribute set to "true".
3: When a document of type "part" fails to meet this requirement (an INBOUND document of type "type" with the attribute set to "true"), expansion of that path stops then and there.
4: The documents that failed aren't included in the result either.
5: It should work at any depth.
6: No subqueries (if possible).
Example of graph

With the given information, the data model seems questionable. Why are there true and false vertices instead of a boolean edge attribute per partScrew? Is there a reason why it is modeled like this?
Using this data model, I don't see how this would be possible without subqueries. The traversal down a path can be stopped early with PRUNE, but PRUNE does not support subqueries. That leaves only FILTER for post-filtering as an option. Be careful, though: you need to check every vertex on the path, not just the emitted vertex, for an inbound false type.
Not sure if it works as expected in all cases, but here is what I came up with and the query result, which looks good to me:
LET startScrew = FIRST(FOR doc IN screw LIMIT 1 RETURN doc) // Screw A
FOR v,e,p IN 1..2 OUTBOUND startScrew partScrew
  FILTER (
    FOR v_id IN SHIFT(p.vertices[*]._id) // ignore start vertex
      FOR v2 IN 1..1 INBOUND v_id types
        RETURN v2.value
  ) NONE == false
  RETURN {
    path: CONCAT_SEPARATOR(" -- ", p.vertices[*].value)
  }
path
Screw A -- Part D
Screw A -- Part E
Screw A -- Part E -- Part F
Dump with test data: https://gist.github.com/Simran-B/6bd9b154d1d1e2e74638caceff42c44f
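To make the filter logic easier to follow, here is a minimal Python sketch of the same idea on a toy graph. The data below is hypothetical (it is not the gist data), and the names part_screw, types_inbound and valid_paths are made up for illustration. Unlike the AQL query, the sketch stops expanding as soon as a vertex has an inbound false type. That yields the same set of paths as post-filtering every vertex on each path, because a path passes only if all of its vertices after the start pass.

```python
# Hypothetical toy data; the real test data lives in the linked gist.
part_screw = {            # OUTBOUND partScrew edges: parent -> children
    "Screw A": ["Part B", "Part D", "Part E"],
    "Part B": ["Part C"],
    "Part E": ["Part F"],
}
types_inbound = {         # values of the vertices reached via INBOUND types edges
    "Part B": [False],
    "Part C": [True],
    "Part D": [True],
    "Part E": [True],
    "Part F": [True],
}

def valid_paths(start, max_depth):
    """Return paths (vertex lists) on which no vertex after the start has an inbound false type."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        for child in part_screw.get(path[-1], []):
            if False in types_inbound.get(child, []):
                continue  # drop the vertex and stop expanding this branch
            new_path = path + [child]
            results.append(new_path)
            if len(new_path) - 1 < max_depth:
                stack.append(new_path)
    return sorted(results)

paths = [" -- ".join(p) for p in valid_paths("Screw A", 2)]
for line in paths:
    print(line)
```

Part C never shows up here even though its own type is true, because its parent Part B fails the check. That is the "check all vertices on the path" point from the answer.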

Related

ArangoDB - Get edge information while using traversals

I'm interested in using traversals to quickly find all the documents linked to an initial document. For this I'd use:
let id = 'documents/18787898'
for d in documents
filter d._id == id
for i in 1..1 any d edges
return i
This generally provides me with all the documents related to the initial ones. However, say that in these edges I have more information than just the standard _from and _to. Say it also contains order, in which I indicate the order in which something is to be displayed. Is there a way to also grab that information at the same time as making the traversal? Or do I now have to make a completely separate query for that information?
You are very close, but your graph traversal is slightly incorrect.
The way I read the documentation, it shows that you can return vertex, edge, and path objects in a traversal:
FOR vertex[, edge[, path]]
IN [min[..max]]
OUTBOUND|INBOUND|ANY startVertex
edgeCollection1, ..., edgeCollectionN
I suggest adding the edge variable e to your FOR statement. You also do not need to find document/vertex matches first (given that id is a single string), so the FOR/FILTER pair can be eliminated:
LET id = 'documents/18787898'
FOR v, e IN 1 ANY id edges
RETURN e

Count distinct nodes from traversal in AQL

I am able to get all distinct nodes from a query, but not the count:
FOR v in 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
return DISTINCT v._key
I want to get only the count of the result. I tried to use LENGTH(DISTINCT v._key) as suggested in the docs, but that is not valid AQL syntax:
syntax error, unexpected DISTINCT modifier near 'DISTINCT v._key)'
The naive solution is to get all keys and count it on the client side, but I am curious how to do it on the server side?
What RETURN DISTINCT does is to remove duplicate values, but only after the traversal.
You can set traversal options to eliminate paths during the traversal, which can be more efficient especially if you have a highly interconnected graph and a high traversal depth:
RETURN LENGTH(
FOR v IN 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
OPTIONS { uniqueVertices: "global", bfs: true }
RETURN v._key
)
The traversal option uniqueVertices can be set to "global" so that you don't get the same vertex returned twice from this traversal. The option for breadth-first search bfs needs to be enabled to use uniqueVertices: "global". The reason why depth-first search does not support this uniqueness option is that the result would not be deterministic, hence this combination was disabled.
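One caveat worth checking on your own data: as far as I understand the documentation, with uniqueVertices: "global" and a minimum depth above 1, a vertex that was already visited at a shallower depth may not be returned at all, so the count can differ from the RETURN DISTINCT variant. The Python sketch below emulates both behaviours on a hypothetical toy graph; the function names are made up and this is only a model of the semantics, not ArangoDB's implementation.

```python
from collections import deque

# Hypothetical toy graph: adjacency list of OUTBOUND edges.
graph = {
    "start": ["a", "b"],
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}

def distinct_at_depth(start, depth):
    """Emulate RETURN DISTINCT: follow every path, deduplicate the end vertices afterwards."""
    frontier = [start]
    for _ in range(depth):
        frontier = [nxt for v in frontier for nxt in graph.get(v, [])]
    return set(frontier)

def bfs_global_unique(start, depth):
    """Emulate bfs: true with uniqueVertices: "global": each vertex is visited at most once."""
    visited = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        v, d = queue.popleft()
        if d == depth:
            found.add(v)
            continue
        for nxt in graph.get(v, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, d + 1))
    return found

count_distinct = len(distinct_at_depth("start", 2))
count_global = len(bfs_global_unique("start", 2))
print(count_distinct, count_global)
```

In this toy graph, b is reachable at depth 1 and at depth 2. The deduplicate-afterwards variant counts it at depth 2, while the global-unique BFS has already consumed it at depth 1.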
Inspired by this blogpost http://jsteemann.github.io/blog/2014/12/12/aql-improvements-for-24/ I prepared the solution using LET:
LET result = (FOR v in 2..2 OUTBOUND "starting_node" GRAPH "some_graph"
return DISTINCT v._key)
RETURN LENGTH(result)
It might not be the optimal solution, but it works as I expected.

Limit edges used on named graph traversal

Q: Can I limit the edge collections the system will try to use when traversing named graphs in AQL?
Scenario:
If I have a named graph productGraph with two vertices collections and two edge collections:
Vertices: product, price
prodParentOf (product A is parent of product B)
prodHasPrice (product A has a price of $X)
If I now want the child products of product A (and no prices), I would like to do something like this:
WITH product
FOR v, e, p IN OUTBOUND 'product/A'
GRAPH 'productGraph'
RETURN {vertice:v, edge:e, path: p}
However, if I look at the explain plan, I see that the system attempted to use the indexes for both prodParentOf and prodHasPrice (even if I explicitly put the product collection in the 'With' clause):
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
2 edge prodHasPrice false false 75.00 % [ `_from`, `_to` ] base OUTBOUND
2 edge prodParentOf false false 65.37 % [ `_from`, `_to` ] base OUTBOUND
Can I limit the edge collections the system will try to use when querying named graphs? Or do I have to use edge collections in the query instead (which, in my mind, would mean that it is generally better to traverse edge collections than named graphs)?
Here is the same query using an edge collection
FOR v, e, p IN OUTBOUND 'product/A'
prodParentOf
RETURN {vertice:v, edge:e, path: p}
The WITH clause does not impose restrictions on which collections that are part of your named graph will be used in a traversal. It is mainly for traversals in a cluster, to declare which collections will be involved. This helps to avoid deadlocks, which may occur if collections are lazily locked at query runtime.
If you use a single server instance, then the WITH clause is optional. It does not have an effect on the result. If you want to exclude collections from a traversal, you can either use collection sets instead of the named graph, or use FILTERs together with IS_SAME_COLLECTION(). Using collection sets is more efficient, because with fewer edge collections there are fewer edges to traverse, whereas filters are applied after the traversal in most cases.
FOR v, e, p IN 1..5 OUTBOUND 'verts/start' GRAPH 'named-graph'
FILTER (FOR id IN p.edges[*]._id RETURN IS_SAME_COLLECTION('edgesX', id)) ALL == true
RETURN p
If your traversal has a depth of 1 only, then a filter query is simpler:
FOR v, e, p IN INBOUND 'product/A' GRAPH 'productGraph'
FILTER IS_SAME_COLLECTION('prodParentOf', e)
RETURN {vertex: v, edge: e, path: p}
A way to prune paths may come in the future, which should also help with your named graph scenario.
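For readers unfamiliar with IS_SAME_COLLECTION(), here is a rough Python stand-in for what the two filters above check. It relies on the fact that a document _id has the form collection/key. The helper name is_same_collection and the sample edges are hypothetical and only model the behaviour.

```python
def is_same_collection(collection, doc_or_id):
    """Rough stand-in for AQL's IS_SAME_COLLECTION: compare the collection part of an _id."""
    doc_id = doc_or_id["_id"] if isinstance(doc_or_id, dict) else doc_or_id
    return doc_id.split("/", 1)[0] == collection

# Hypothetical path with two edges from different edge collections.
path_edges = [
    {"_id": "prodParentOf/1", "_from": "product/A", "_to": "product/B"},
    {"_id": "prodHasPrice/7", "_from": "product/B", "_to": "price/X"},
]

# Equivalent of: (FOR id IN p.edges[*]._id RETURN IS_SAME_COLLECTION(...)) ALL == true
only_parent_edges = all(is_same_collection("prodParentOf", e["_id"]) for e in path_edges)

# Equivalent of the depth-1 filter on the single edge e
first_edge_ok = is_same_collection("prodParentOf", path_edges[0])

print(only_parent_edges, first_edge_ok)
```

The whole path is rejected because one of its edges comes from prodHasPrice, while the first edge on its own passes.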

How to structure a Cypher query to return different results for nonexistent paths and nonexistent nodes

In my application I have set up roles which provide users with different levels of access to other users' assets.
I have this query to return an asset BobsPrivate where user requesting is Bob
MATCH (u:User {name: 'Bob' })
MATCH (n:Asset:Album {name:'BobsPrivate'})
WHERE (u)-[:CREATED|:FRIENDS_CAN_READ]->(n) OR (n)<-[:CAN_READ]-()<-[:BELONGS_TO]-(u)
RETURN n
All my queries are over the REST API from node.
This works as expected and returns the asset because one of the relationships is present. If I pass a nonexistent asset name such as foo, it also works as expected and does not return anything.
When I pass user James and BobsPrivate it also returns nothing, as you would expect, but I would like to return something different.
My problem is that I get the same result for a nonexistent asset and a nonexistent relationship, with the latter being equivalent to not having the proper access level.
How can I structure my query such that I can return two different results so that I can handle the HTTP response differently in my controller (404, 403)? I would also need to use this same principle in my UPDATE and DELETE methods.
EDIT:
I changed my query a little and it gives me what I'm looking for, but it does introduce another MATCH, so I'm still open to suggestions:
OPTIONAL MATCH (u:User {name: 'Bob' })
OPTIONAL MATCH (n:Asset:Album {name:'BobsPrivate'})
WHERE (u)-[:CREATED|:FRIENDS_CAN_READ]->(n) OR (n)<-[:CAN_READ]-()<-[:BELONGS_TO]-(u)
WITH n
OPTIONAL MATCH (l:Asset:Album {name:'BobsPrivate'})
RETURN n AS ASSET, l IS NOT NULL AS ASSET_EXISTS, CASE WHEN n IS NOT NULL AND l IS NOT NULL THEN true ELSE
What this lets me do is return the Asset as is, without any additional drilling down based on relationship so I can return a helpful boolean that my controller can use.
I'm new to NEO4J and I'm pretty sure there will be a better way than this so if you do know, then I would greatly appreciate it.
My 5-minute attempt to achieve desired behaviour:
MATCH (u:User {name: 'Bob' })
MATCH (n:Asset:Album {name:'BobsPrivate'})
RETURN
n,
EXISTS((u)-[:CREATED|:FRIENDS_CAN_READ]->(n)) as isDirectAccessible,
EXISTS((n)<-[:CAN_READ]-()<-[:BELONGS_TO]-(u)) as isIndirectAccessible
We retrieve all the facts separately:
The asset itself
A boolean flag indicating whether the asset can be accessed directly
A boolean flag indicating whether the asset can be accessed indirectly
Then, on the client side, we can decide what to do with them. For example:
If no data is returned at all: return 404
If the asset exists in the database but both flags are false: return 403
If the asset exists and at least one flag is true: return 200 and the data
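The client-side decision could look roughly like the Python sketch below. It assumes the driver hands back the result as a list of dict-like rows keyed by the RETURN aliases (that shape, and the function name status_for, are assumptions). It also assumes access is granted when either flag is true, which mirrors the OR in the question's WHERE clause.

```python
def status_for(rows):
    """Map the rows returned by the read query to an HTTP status code."""
    if not rows:
        return 404  # no row at all: the asset (or the user) did not match
    row = rows[0]
    if row["isDirectAccessible"] or row["isIndirectAccessible"]:
        return 200  # at least one access path exists
    return 403      # asset exists, but this user has no access path to it

# Hypothetical result sets for illustration.
print(status_for([]))
print(status_for([{"n": {"name": "BobsPrivate"},
                   "isDirectAccessible": False,
                   "isIndirectAccessible": False}]))
print(status_for([{"n": {"name": "BobsPrivate"},
                   "isDirectAccessible": True,
                   "isIndirectAccessible": False}]))
```

Note that an empty result also covers the case where the user itself does not exist, since both MATCH clauses have to succeed for a row to be returned.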
Delete query example:
MATCH (u:User {name: 'Bob' })
MATCH (n:Asset:Album {name:'BobsPrivate'})
WITH
n,
EXISTS((u)-[:CREATED|:FRIENDS_CAN_READ]->(n)) as isDirectAccessible,
EXISTS((n)<-[:CAN_READ]-()<-[:BELONGS_TO]-(u)) as isIndirectAccessible
WITH n, isDirectAccessible, isIndirectAccessible,
(CASE
WHEN isDirectAccessible OR isIndirectAccessible THEN n
ELSE null
END) as deletableObject
DETACH DELETE deletableObject
RETURN (deletableObject IS NOT NULL) as isDeleted, isDirectAccessible, isIndirectAccessible
The same principles apply here:
If nothing is returned, the asset does not exist
If something is returned: check the flags
Note: I am feeling that this might not be the best approach. But, hey, it works.

What's the best method to filter graph edges by type in AQL

I have the following super-simple graph:
What I am trying to do is:
Select all questions where there is a property on the question document called firstQuestion with a value of true.
Select any options that are connected to the question via an outbound edge of type with_options
The following query works, but it feels like there must be a better way to inspect the edge type without using string operations. Specifically, I use a concatenation to recreate the edge _id value by joining the key with the edge type I want. Is this the best way to inspect the type of an edge?
FOR question IN questions
FILTER question.firstQuestion == true
let options =
(FOR v, e IN 1..1 OUTBOUND question._id GRAPH 'mygraph'
FILTER CONCAT('with_options/', e._key) == e._id
RETURN v)
RETURN {question: question, options: options}
We're currently introducing IS_SAME_COLLECTION for that specific purpose with ArangoDB 2.8.1.
The DOCUMENT function is also worth mentioning in this context.
FOR question IN questions
FILTER question.firstQuestion == true
LET options = (FOR v, e IN 1..1 OUTBOUND question._id GRAPH 'mygraph'
FILTER IS_SAME_COLLECTION('with_options', e._id)
RETURN v)
RETURN {question: question, options: options}
However, the best solution in this special case is not to use the named graph interface, but to specify the list of edge collections the traversal should consider in the first place:
FOR question IN questions
FILTER question.firstQuestion == true
LET options = (FOR v, e IN 1..1 OUTBOUND question._id with_options RETURN v)
RETURN {question: question, options: options}

Resources