AQL's PRUNE: How to combine conditions? - arangodb

I am running ArangoDB 3.4.5 and I've been playing around with the PRUNE statements. I am having some difficulties combining conditions.
Assume that some vertices v on my path p have an integer attribute ia and some have a boolean attribute ba. The vertices at even indexes along p, such as p.vertices[2], all have ba.
PRUNE HAS(v, "ia") AND v.ia != 5
works by itself.
PRUNE p.vertices[2].ba == false OR p.vertices[4].ba == false
also works by itself.
I observe that I cannot combine them in one query, neither with multiple PRUNE statements nor by putting both into a single
PRUNE (condition_1) OR (condition_2). I also cannot put one in a PRUNE and the other in a FILTER statement.
Is anyone else experiencing this or is it just me?
UPDATE:
The FILTER and PRUNE statements did not return the desired results, but the reason was the missing OPTIONS {uniqueEdges: "none"}. Unlike for uniqueVertices, "none" is not the default for uniqueEdges (its default is "path").

I can't reproduce your issue in ArangoDB 3.4.5
If you create collections edge and vertex and populate these with an example tree:
FOR n in 0..100000
INSERT {_key: TO_STRING(n), val: n, modulo: n%2} INTO vertex
FILTER n > 0
INSERT {_from: CONCAT("vertex/", FLOOR((n-1)/2)), _to: NEW._id} INTO edge
Now I run a traversal:
WITH vertex
FOR v,e,p IN 0..5 OUTBOUND "vertex/0" edge
RETURN TO_STRING(p.vertices[*].val)
Result:
[
"[0]",
"[0,1]",
"[0,1,3]",
"[0,1,3,7]",
"[0,1,3,7,15]",
"[0,1,3,7,15,31]",
"[0,1,3,7,15,32]",
"[0,1,3,7,16]",
"[0,1,3,7,16,33]",
"[0,1,3,7,16,34]",
"[0,1,3,8]",
"[0,1,3,8,17]",
"[0,1,3,8,17,35]",
"[0,1,3,8,17,36]",
"[0,1,3,8,18]",
"[0,1,3,8,18,37]",
"[0,1,3,8,18,38]",
"[0,1,4]",
...
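To make the traversal order above easier to follow, here is a small in-memory Python model of the example tree (not ArangoDB code). It assumes that a vertex's children are visited in ascending key order, which matches the listing above; ArangoDB's actual edge order may differ.

```python
import json
from typing import List

MAX_KEY = 100000  # the example inserts vertices 0..100000


def children(key: int) -> List[int]:
    """Vertex n > 0 is linked from FLOOR((n-1)/2), so key k points to 2k+1 and 2k+2."""
    return [c for c in (2 * key + 1, 2 * key + 2) if c <= MAX_KEY]


def traverse(start: int, min_depth: int, max_depth: int) -> List[List[int]]:
    """Pre-order depth-first enumeration of all paths with min_depth..max_depth edges."""
    paths: List[List[int]] = []

    def dfs(path: List[int]) -> None:
        depth = len(path) - 1
        if depth >= min_depth:
            paths.append(list(path))
        if depth == max_depth:
            return
        for child in children(path[-1]):  # assumed: lower key first
            path.append(child)
            dfs(path)
            path.pop()

    dfs([start])
    return paths


result = traverse(0, 0, 5)
for p in result[:8]:
    # Same formatting as TO_STRING(p.vertices[*].val), e.g. "[0,1,3]"
    print(json.dumps(p, separators=(",", ":")))
```

A full binary tree cut at depth 5 has 1 + 2 + 4 + 8 + 16 + 32 = 63 paths, so the model returns 63 entries for 0..5.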
Next, I add "stop": true and "hide": 1 to the vertex with _key 7, and some other combinations to vertices 17 and 18. Now PRUNE should stop the traversal wherever the condition is met. Be careful: the matching vertex itself is still included in the results.
WITH vertex
FOR v,e,p IN 0..5 OUTBOUND "vertex/0" edge
PRUNE v.hide == 1 AND v.stop == true
RETURN TO_STRING(p.vertices[*].val)
Result:
[
"[0]",
"[0,1]",
"[0,1,3]",
"[0,1,3,7]", <-- stop: true, hide: 1
"[0,1,3,8]",
"[0,1,3,8,17]", <-- stop: true, hide: 1
"[0,1,3,8,18]",
"[0,1,3,8,18,37]",
"[0,1,3,8,18,38]",
...
The PRUNE condition can use AND / OR, but only one PRUNE statement is supported per traversal (in contrast to FILTER).
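The PRUNE behaviour shown above can be sketched as a small Python model (not ArangoDB code): a vertex that matches the prune condition is still emitted, but its children are not expanded, and a single predicate combines conditions with and/or. The attributes for vertex 18 are a hypothetical choice, since the answer only says "some other combinations".

```python
import json
from typing import Callable, Dict, List

MAX_KEY = 100000  # the example tree has vertices 0..100000

# Attributes added in the answer. Vertex 18's values are hypothetical:
# they match neither "hide == 1", but they do have "stop == true".
ATTRS: Dict[int, dict] = {
    7: {"stop": True, "hide": 1},
    17: {"stop": True, "hide": 1},
    18: {"stop": True, "hide": 0},
}

Prune = Callable[[dict, List[int]], bool]


def children(key: int) -> List[int]:
    return [c for c in (2 * key + 1, 2 * key + 2) if c <= MAX_KEY]


def traverse_with_prune(start: int, max_depth: int, prune: Prune) -> List[str]:
    """Pre-order DFS; a vertex matching `prune` is emitted but not expanded."""
    out: List[str] = []

    def dfs(path: List[int]) -> None:
        vertex = ATTRS.get(path[-1], {})
        out.append(json.dumps(path, separators=(",", ":")))
        if prune(vertex, path) or len(path) - 1 == max_depth:
            return
        for child in children(path[-1]):
            path.append(child)
            dfs(path)
            path.pop()

    dfs([start])
    return out


# One prune expression per traversal; AND / OR combine conditions inside it.
def both(v: dict, path: List[int]) -> bool:
    return v.get("hide") == 1 and v.get("stop") is True


def either(v: dict, path: List[int]) -> bool:
    return v.get("hide") == 1 or v.get("stop") is True


print(traverse_with_prune(0, 5, both)[:10])
print(traverse_with_prune(0, 5, either)[:8])
```

With "both", the model reproduces the listing above (7 and 17 are leaves, 18 is expanded); with "either", vertex 18 is pruned as well, so the walk jumps from [0,1,3,8,18] straight to [0,1,4].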

Related

Filter neighbour's INBOUND vertices with path labels in ArangoDB

I have the following graph:
I'd like to write an AQL query that, starting from the start vertex colored GREEN, returns all vertices colored RED that are INBOUND vertices of its neighbors.
I tried the following AQL to retrieve red vertices from the green vertex.
WITH collection_A, collection_W
LET A_Neighbors = (FOR t IN collection_edges
FILTER t._to == 'collection_W/W'
RETURN t._from)
LET all_w = []
FOR item IN A_Neighbors
LET sub_w = (FOR v1 IN collection_edges
FILTER v1._to == item
RETURN v1)
RETURN APPEND(all_w, sub_w)
Is there a better solution than this? I'm not sure this gives the correct values for the start vertex collection_W/W.
My collection_edges contains the following two kinds of documents:
{
_from: "collection_W/w",
_to: "collection_A/a",
label: "INBOUND"
}
and
{
_from: "collection_A/a",
_to: "collection_W/w",
label: "OUTBOUND"
}
Given the diagram, I would suggest using a graph traversal with a specific [min[..max]] depth value, like this (using an anonymous graph):
WITH collection_A, collection_W
FOR vertex IN 2 ANY 'collection_W/W' // green start node
collection_edges
RETURN vertex
The [min[..max]] value can be a range (1..3) or it can be a single value (1).
0 will return the start node
1 will return adjacent nodes
2 will skip the adjacent nodes and return only nodes at the next level (if any)
2..999 will return all nodes that are 2 to 999 hops away from the start node
Further, if you want to make sure that you're only returning nodes from a specific collection, add a filter for that:
WITH collection_A, collection_W
FOR vertex IN 2 ANY 'collection_W/W' // green start node
collection_edges
FILTER IS_SAME_COLLECTION('collection_W', vertex)
RETURN vertex
You can also filter on edges (if you've added a specific attribute/value to your edges):
WITH collection_A, collection_W
FOR vertex, edge IN 2 ANY 'collection_W/W' // green start node
collection_edges
FILTER edge.someProperty == 'someValue' // only return vertices that are beyond matching edges
RETURN vertex
Or limit the traversal with PRUNE:
WITH collection_A, collection_W
FOR vertex, edge IN 2 ANY 'collection_W/W' // green start node
collection_edges
PRUNE edge.someProperty == 'someValue' // stop traversal when this is matched
RETURN vertex

AQL - graph traversal - filtering on path with complex condition

I would like to ask how best to traverse a graph and return only the subgraph defined by a complex condition that must be satisfied by all nodes from the root to the leaves.
In other words, I need a mechanism such that when the condition is not met at any intermediate level, the traversal is stopped (no nested node is processed or returned in the output).
Let's say I have the following graph:
A -> B -> C (active=false) -> D
where I deactivated node C (note that the flag active=false means the whole subgraph is deactivated, including C and D).
According to the documentation, I can easily construct such a filter by filtering on the path with the wildcard [*] and the ALL keyword, which also stops the traversal when the condition on C is not met. With a simple condition this works great:
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
filter p.vertices[*].active ALL != false return v
// returns A, B
Now I have another graph where each node is either fixed or has validity timespan attributes (from, to):
A (type="fixed") -> B (from=2,to=3) -> C (from=1, to=5) -> D (type="fixed")
Now I would like to return only the subgraph in which all (intermediate) nodes are either fixed or satisfy the time condition from>=2 and to<=3. I need A and B to be returned.
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
filter p.vertices[*].type ALL == 'fixed' or
(p.vertices[*].from ALL >= 2 and p.vertices[*].to ALL <= 3)
return v
However, this is obviously wrong (it returns only A). Logically, I need to put the ALL keyword at the beginning of the condition (the condition must be applied at each level, and the traversal must stop when it is not met), but this is not supported:
filter ALL(p.vertices[*].type == 'fixed' or
(p.vertices[*].from >= 2 and p.vertices[*].to <= 3))
The classical approach of filtering on vertices does not meet my needs, because it doesn't stop the traversal when the condition is not met: the following returns A, B, D (C is skipped, but I also need to prune the subtree of C so that D is not in the output):
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
filter v.type == 'fixed' or
(v.from >= 2 and v.to <= 3)
return v
Any ideas? Thank you.
The AQL PRUNE feature was introduced in ArangoDB 3.4.5 and 3.5.0. With the AQL keyword PRUNE, the traversal stops when a condition on the vertex, the edge, the path, or any previously defined variable is met.
Pruning is the easiest way to formulate conditions that reduce the amount of data checked during a search. It can therefore improve query performance and reduce the overhead generated by the query.
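The difference between FILTER and PRUNE for the A -> B -> C -> D example can be sketched with a small Python model (not ArangoDB code): keep stands in for a FILTER on v, and prune for a PRUNE condition. Pruning on the negated condition stops the walk at C, and filtering on the condition itself removes C from the output, because a pruned vertex is still emitted. In AQL this corresponds to the shape PRUNE NOT (condition) followed by FILTER condition; that query is not run here, so treat it as a pattern to verify.

```python
from typing import Callable, Dict, List, Optional

# The chain from the question: A -> B -> C -> D
VERTICES: Dict[str, dict] = {
    "A": {"type": "fixed"},
    "B": {"from": 2, "to": 3},
    "C": {"from": 1, "to": 5},
    "D": {"type": "fixed"},
}
CHILDREN: Dict[str, List[str]] = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}


def ok(v: dict) -> bool:
    """Per-vertex condition: fixed, or inside the validity window from>=2, to<=3."""
    if v.get("type") == "fixed":
        return True
    return "from" in v and "to" in v and v["from"] >= 2 and v["to"] <= 3


Pred = Optional[Callable[[dict], bool]]


def traverse(start: str, prune: Pred = None, keep: Pred = None) -> List[str]:
    """Depth-first walk: `keep` plays the role of FILTER, `prune` of PRUNE."""
    out: List[str] = []

    def visit(name: str) -> None:
        v = VERTICES[name]
        if keep is None or keep(v):
            out.append(name)
        if prune is not None and prune(v):
            return  # do not descend below a pruned vertex
        for child in CHILDREN[name]:
            visit(child)

    visit(start)
    return out


print(traverse("A", keep=ok))                             # FILTER only
print(traverse("A", prune=lambda v: not ok(v), keep=ok))  # PRUNE + FILTER
```

The FILTER-only walk yields A, B, D (the problem described in the question), while PRUNE plus FILTER yields A, B.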
This video tutorial shows the difference between FILTER and the new PRUNE with a hands-on example. You can find more details in the documentation.

ArangoDB: Get every node, which is in any way related to a selected node

I have a simple node-links graph in ArangoDB. How can I traverse from one preselected node and return all nodes that are related to it?
For example:
A→B, B→C, C→D, C→E, F→B, F→E
Selecting any of them should return the same result (all of them).
I am very new to ArangoDB.
What you need is AQL graph traversal, available since ArangoDB 2.8. Older versions provided a set of graph-related functions, but native AQL traversal is faster, more flexible and the graph functions are no longer available starting with 3.0.
AQL traversal lets you follow edges connected to a start vertex, up to a variable depth. Each encountered vertex can be accessed, e.g. for filtering or to construct a result, as can the edge that led you to this vertex and the full path from start to finish, including both vertices and edges.
In your case, only the names of the visited vertices need to be returned. You can run the following AQL queries, assuming there's a document collection node and an edge collection links and they contain the data for this graph:
// follow edges ("links" collection) in outbound direction, starting at A
FOR v IN OUTBOUND "node/A" links
// return the key (node name) for every vertex we see
RETURN v._key
This will only return [ "B" ], because the traversal depth is implicitly 1..1 (min=1, max=1). If we increase the max depth, then we can include nodes that are indirectly connected as well:
FOR v IN 1..10 OUTBOUND "node/A" links
RETURN v._key
This will give us [ "B", "C", "D", "E"]. If we look at the graph, this is correct: we only follow edges that point from the vertex we come from to another vertex (direction of the arrow). To do the reverse, we could use INBOUND, but in your case, we want to ignore the direction of the edge and follow anyway:
FOR v IN 1..10 ANY "node/A" links
RETURN v._key
The result might be a bit surprising at first:
[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]
We see duplicate nodes returned. The reason is that there are multiple paths from A to C for instance (via B and also via B-F-E), and the query returns the last node of every path as variable v. (It doesn't actually process all possible paths up to the maximum depth of 10, but you could set the traversal option OPTIONS {uniqueEdges: "none"} to do so.)
It can help to return formatted traversal paths to better understand what is going on (i.e. how nodes are reached):
FOR v, e, p IN 1..10 ANY "node/A" links OPTIONS {uniqueEdges: "path"}
RETURN CONCAT_SEPARATOR(" - ", p.vertices[*]._key)
Result:
[
"A - B",
"A - B - C",
"A - B - C - D",
"A - B - C - E",
"A - B - C - E - F",
"A - B - C - E - F - B",
"A - B - F",
"A - B - F - E",
"A - B - F - E - C",
"A - B - F - E - C - D",
"A - B - F - E - C - B"
]
There is a cycle in the graph, but there can't be an infinite loop because the traversal never goes beyond the maximum depth of 10 hops. As you can see above, it doesn't even reach depth 10; it stops earlier because the (default) option is not to follow the same edge twice within a path (uniqueEdges: "path").
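The path listing above can be reproduced with a small Python model (not ArangoDB code) that enumerates paths depth-first and refuses to reuse an edge within a single path, which is what uniqueEdges: "path" means. For ANY it lists a vertex's outbound edges before its inbound ones, each in insertion order; that ordering is an assumption chosen because it matches the output shown, and ArangoDB does not necessarily guarantee it. With direction "outbound", the same function gives the earlier results: [ "B" ] for 1..1 and [ "B", "C", "D", "E" ] for 1..10.

```python
from typing import FrozenSet, List, Tuple

# Edge list from the question, in (assumed) insertion order.
EDGES: List[Tuple[str, str]] = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("F", "B"), ("F", "E"),
]


def neighbours(vertex: str, direction: str) -> List[Tuple[int, str]]:
    """(edge index, other vertex) pairs; for "any", outbound edges come first."""
    out = [(i, b) for i, (a, b) in enumerate(EDGES) if a == vertex]
    if direction == "outbound":
        return out
    inbound = [(i, a) for i, (a, b) in enumerate(EDGES) if b == vertex]
    return out + inbound


def paths(start: str, min_depth: int, max_depth: int, direction: str) -> List[List[str]]:
    """Depth-first enumeration of paths, never reusing an edge within one path."""
    found: List[List[str]] = []

    def dfs(path: List[str], used: FrozenSet[int]) -> None:
        if len(path) - 1 >= min_depth:
            found.append(list(path))
        if len(path) - 1 == max_depth:
            return
        for edge_id, nxt in neighbours(path[-1], direction):
            if edge_id not in used:
                path.append(nxt)
                dfs(path, used | {edge_id})
                path.pop()

    dfs([start], frozenset())
    return found


for p in paths("A", 1, 10, "any"):
    print(" - ".join(p))
```

Taking the last vertex of each path gives exactly the duplicate-laden result B, C, D, E, F, B, F, E, C, D, B discussed above.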
Anyway, this is not the desired result. A cheap trick would be to use RETURN DISTINCT, COLLECT or something like that to remove duplicates. But we are better off tweaking the traversal options, to not follow edges unnecessarily.
uniqueEdges: "global" would still include the B node twice, but uniqueVertices: "global" gives the desired result. In addition, bfs: true for breadth-first search can be used in this case. The difference is that the path to the F node is shorter (A-B-F instead of A-B-C-E-F). In general, the exact options you should use largely depend on the dataset and the questions you have.
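Global vertex uniqueness can be modelled as a single visited set shared by all paths. The following Python sketch is a model, not ArangoDB's implementation, and it assumes a neighbour order of outbound edges first, then inbound, each in insertion order. It compares depth-first and breadth-first search and reconstructs the path by which F is first reached.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("F", "B"), ("F", "E")]
Parents = Dict[str, Optional[str]]


def any_neighbours(v: str) -> List[str]:
    out = [b for a, b in EDGES if a == v]
    inbound = [a for a, b in EDGES if b == v]
    return out + inbound


def dfs_unique_vertices(start: str, max_depth: int) -> Tuple[List[str], Parents]:
    visited = {start}  # shared by all paths: each vertex is reached only once
    order: List[str] = [start]
    parent: Parents = {start: None}

    def visit(v: str, depth: int) -> None:
        if depth == max_depth:
            return
        for n in any_neighbours(v):
            if n not in visited:
                visited.add(n)
                parent[n] = v
                order.append(n)
                visit(n, depth + 1)

    visit(start, 0)
    return order, parent


def bfs_unique_vertices(start: str, max_depth: int) -> Tuple[List[str], Parents]:
    visited = {start}
    order: List[str] = [start]
    parent: Parents = {start: None}
    queue = deque([(start, 0)])
    while queue:
        v, depth = queue.popleft()
        if depth == max_depth:
            continue
        for n in any_neighbours(v):
            if n not in visited:
                visited.add(n)
                parent[n] = v
                order.append(n)
                queue.append((n, depth + 1))
    return order, parent


def path_to(parent: Parents, target: Optional[str]) -> str:
    hops: List[str] = []
    while target is not None:
        hops.append(target)
        target = parent[target]
    return "-".join(reversed(hops))


dfs_order, dfs_parent = dfs_unique_vertices("A", 10)
bfs_order, bfs_parent = bfs_unique_vertices("A", 10)
print(dfs_order, path_to(dfs_parent, "F"))  # depth-first
print(bfs_order, path_to(bfs_parent, "F"))  # breadth-first
```

Both searches visit A through F exactly once; only the route to F differs (A-B-C-E-F depth-first versus A-B-F breadth-first).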
There's one more problem to solve: the traversal does not include the start vertex (other than in p.vertices[0] for every path). This can easily be solved using ArangoDB 3.0 or later by setting the minimum depth to 0:
FOR v IN 0..10 ANY "node/A" links OPTIONS {uniqueVertices: "global"}
RETURN v._key
[ "A", "B", "C", "D", "E", "F" ]
To verify that all nodes from A through F are returned, regardless of the start vertex, we can issue the following test query:
FOR doc IN node
RETURN (
FOR v IN 0..10 ANY doc links OPTIONS {uniqueVertices: "global"}
SORT v._key
RETURN v._key
)
All sub-arrays should look the same. Remove the SORT operation if you want the node names returned in traversal order. Hope this helps =)
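The test query above amounts to checking that every start vertex reaches the same set of vertices when edge direction is ignored. A small Python sketch of that check (not ArangoDB code) uses breadth-first search with a depth cap standing in for 0..10, so the cap counts shortest distances; on a graph this small it never kicks in.

```python
from collections import deque
from typing import Dict, List

EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("F", "B"), ("F", "E")]


def build_undirected(edges) -> Dict[str, List[str]]:
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def reachable_any(adj: Dict[str, List[str]], start: str, max_depth: int) -> List[str]:
    """Sorted keys of all vertices within max_depth hops of start (start included)."""
    seen = {start: 0}  # vertex -> distance; each vertex is visited once
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if seen[v] == max_depth:
            continue
        for n in adj[v]:
            if n not in seen:
                seen[n] = seen[v] + 1
                queue.append(n)
    return sorted(seen)


adj = build_undirected(EDGES)
results = {start: reachable_any(adj, start, 10) for start in sorted(adj)}
for start, keys in results.items():
    print(start, keys)
```

Every line should show the same six keys, which is the connected-component property the AQL test query is checking.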

Check if Graph is Connected Upon Removal of Vertices

I would appreciate advice/algorithms for the following problem:
Consider a graph with V vertices connected by E edges (V, E <= 10^5). When a vertex is removed, all the edges connected to that vertex are removed. The vertices are labeled 1, 2, ..., V.
The first line contains V and E as space-separated integers. The next E lines each contain two space-separated vertex numbers representing an edge between those two vertices. The next V lines are a permutation of 1, 2, ..., V, giving the order in which the vertices are removed. The output is V lines stating whether the graph is connected (i.e. there is a path between every pair of vertices) after each step.
For example, consider the following example input (edges are undirected):
5 5
1 2
3 1
2 3
2 4
5 4
3
4
1
2
5
The first line indicates that there are 5 vertices and 5 edges. The next 5 lines describe the edges (which are undirected, i.e. an edge from 1 to 2 can also be taken from 2 to 1). The 5 lines after that give the order in which the vertices are removed.
For this example, we would get the output as follows:
When vertex 3 is removed, the graph is connected, since we can go from any of 1, 2, 4, 5 to any other of 1, 2, 4, 5. When vertex 4 is removed, the graph is disconnected because there are no connections out of vertex 5. When vertex 1 is removed the same problem exists. When vertex 2 is removed only 5 is left, so the graph is connected. When all vertices are removed the graph is connected.
I tried a naive recursive approach as follows to check if it is possible to go from a start vertex to an end vertex:
bool dfs(int curr, int end, bool visited[]):
    if (curr == end):
        return true
    visited[curr] = true
    for (int v : edges[curr]):
        if (!visited[v] && dfs(v, end, visited)):
            return true
    return false
Checking at each step whether every vertex A can reach every other vertex B with this search is far too slow: one check costs O(V + E), so checking all pairs after each of the V removals costs on the order of O(V^3 (V + E)). The algorithm should ideally be O(V log V), or maybe O(V^2), to run in about one second.
For each step, store the number of partitions (connected components) the graph has. For the initial graph, this is done with a full traversal (DFS or BFS) from an initial vertex, then repeating the traversal from a vertex that is not yet covered, until every vertex is covered. If the number of partitions is <= 1, then the graph is connected. (Below, the degree of a vertex counts only edges to vertices that have not been removed yet.)
If the removed vertex has degree 0, then the number of partitions decreases by one.
If the removed vertex has degree 1, then the number of partitions stays the same.
If the removed vertex has a degree larger than 1, then the number of partitions can increase by at most degree - 1. That is checked by a similar traversal from the neighbours of the removed vertex: start from one neighbour and find all neighbours that are connected to it, then repeat the traversal from a neighbour that has not been visited yet.
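A different technique (not the one described in this answer) is to process the removals offline in reverse: start from the empty graph, add the vertices back in reverse removal order, and merge components with a union-find (disjoint-set) structure. The answer for step i is whether the vertices still present after removal i form at most one component. A minimal Python sketch, assuming the input format from the question; with path halving alone, each lookup is O(log V) amortized, so the whole run is roughly O((V + E) log V).

```python
from typing import List, Sequence, Tuple


def connected_after_each_removal(
    n: int, edges: Sequence[Tuple[int, int]], order: Sequence[int]
) -> List[bool]:
    """answers[i] tells whether the graph is connected after removing order[0..i]."""
    adj: List[List[int]] = [[] for _ in range(n + 1)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    parent = list(range(n + 1))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    present = [False] * (n + 1)
    components = 0
    answers = [False] * n
    # Walk the removal order backwards, re-adding one vertex per step.
    for i in range(n - 1, -1, -1):
        # Before re-adding order[i], exactly order[i+1:] is present,
        # i.e. the graph as it looks right after removal number i.
        answers[i] = components <= 1
        v = order[i]
        present[v] = True
        components += 1
        for u in adj[v]:
            if present[u]:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
    return answers


def solve(text: str) -> List[bool]:
    """Parse the question's format: "V E", then E edge lines, then V removal lines."""
    tokens = [int(t) for t in text.split()]
    v, e = tokens[0], tokens[1]
    flat = tokens[2 : 2 + 2 * e]
    edges = list(zip(flat[0::2], flat[1::2]))
    order = tokens[2 + 2 * e : 2 + 2 * e + v]
    return connected_after_each_removal(v, edges, order)


EXAMPLE = """5 5
1 2
3 1
2 3
2 4
5 4
3
4
1
2
5
"""

print(solve(EXAMPLE))
```

For the example input this prints [True, False, False, True, True], matching the step-by-step explanation in the question (connected, disconnected, disconnected, connected, connected).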

retrieve vertices with no linked edge in arangodb

What is the best way to retrieve all vertices that do not have an edge in a related edge_collection?
I've tried the following code, but it has become incredibly slow since ArangoDB 2.8 (it was not really fast in previous versions either, but roughly 10 times faster than now). It takes more than 30 seconds on collections of around 1000 edges and around 3000 vertices.
FOR v IN vertex_collection
FILTER LENGTH( EDGES(edge_collection, v._id, "outbound"))==0
RETURN v._id
UPDATE:
After playing around a bit I came to the following query
LET vIDs = (FOR v IN vertex_collection
RETURN v._id)
LET vEdgesFrom = (FOR e IN edge_collection
FILTER e._from IN vIDs
RETURN e._from)
FOR v IN vertex_collection
FILTER v._id IN MINUS(vIDs, vEdgesFrom)
RETURN v._id
This one is much faster (around 0.05s) but still looks like a workaround (just think of having more than one edge collection to query against).
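The MINUS approach boils down to a set difference: collect every _from value once, then keep the vertex ids that are not in that set. A minimal Python sketch of the idea (not ArangoDB code; the vertex ids and the second edge collection are hypothetical) shows that extending it to several edge collections only means collecting _from values from each of them.

```python
from typing import Iterable, List, Set


def vertices_without_outbound(vertex_ids: Iterable[str], *edge_collections: Iterable[dict]) -> List[str]:
    """Vertex ids that never appear as _from in any of the given edge collections."""
    has_outbound: Set[str] = set()
    for edges in edge_collections:
        has_outbound.update(edge["_from"] for edge in edges)
    return [vid for vid in vertex_ids if vid not in has_outbound]


vertices = ["v/1", "v/2", "v/3"]
edges_a = [{"_from": "v/1", "_to": "v/2"}]
edges_b = [{"_from": "v/3", "_to": "v/1"}]  # hypothetical second edge collection
print(vertices_without_outbound(vertices, edges_a))
print(vertices_without_outbound(vertices, edges_a, edges_b))
```

Adding each edge's _to to the same set would turn this into "vertices with no edge in either direction".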
So I'm still looking for the best method to find vertices having no edge in specific edge collections.
My suggestion was going to be similar: rather use joins than graph features.
FOR oneEdge IN edges
LET matches = (FOR oneVertex IN vertices
FILTER oneEdge._from == oneVertex._id OR
oneEdge._to == oneVertex._id
RETURN 1)
FILTER LENGTH(matches) < 2
RETURN {v: matches, e: oneEdge}
to find all edges where _from or _to points to a vertex that does not exist, and then delete them.
Note the RETURN 1 which will reduce the amount of data passed up from the inner query.
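The same dangling-edge check can be sketched in Python as a set-membership test (a model, not ArangoDB code; the ids are hypothetical). It flags an edge when either endpoint is missing from the set of existing vertex ids.

```python
from typing import Iterable, List


def dangling_edges(vertex_ids: Iterable[str], edges: Iterable[dict]) -> List[dict]:
    """Edges whose _from or _to does not refer to an existing vertex."""
    existing = set(vertex_ids)
    return [e for e in edges if e["_from"] not in existing or e["_to"] not in existing]


vertices = ["v/1", "v/2"]
edges = [
    {"_from": "v/1", "_to": "v/2"},  # both ends exist
    {"_from": "v/1", "_to": "v/9"},  # v/9 is missing
    {"_from": "v/2", "_to": "v/2"},  # self-loop on an existing vertex
]
print(dangling_edges(vertices, edges))
```

One difference to keep in mind: counting matching vertex documents, as the AQL above does, would also flag the self-loop, because only one vertex document matches it, while this membership test does not.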
