I have the following graph structure. All vertexes are in the same collection, and all edges are in the same collection. From a particular start vertex (F), I want to return all the vertexes that are the result of going outwards once, then inwards once, so that I end up with, in the example, D and E.
Well after fooling with it for a while, this is what I came up with. Seems to work. Posting this in case someone else searches for a similar question.
FOR v IN 1..1 OUTBOUND "Vertex/F" edges
  FOR vv IN 1..1 INBOUND v edges
    FILTER vv._key != "F"
    COLLECT uniqueKeys = vv._key
    RETURN uniqueKeys
The query takes almost a millisecond for a small 8-vertex database, but I don't think I can do better.
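For anyone who wants to check the logic outside the database, here is a minimal Python sketch of the same "one hop out, then one hop in" idea. The edge list is hypothetical (the original graph picture is not reproduced here), and the function name out_then_in is my own.

```python
from collections import defaultdict

def out_then_in(edges, start):
    """Return vertices reached by one outbound hop from start, then one inbound hop."""
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    for src, dst in edges:
        out_neighbors[src].add(dst)
        in_neighbors[dst].add(src)
    result = set()
    for mid in out_neighbors[start]:
        # every vertex with an edge pointing at the same intermediate vertex
        result |= in_neighbors[mid]
    result.discard(start)  # same role as FILTER vv._key != "F"
    return sorted(result)  # the set plays the role of COLLECT (deduplication)

# Hypothetical edges (source, target); not the graph from the question
edges = [("F", "A"), ("D", "A"), ("E", "A"), ("B", "C")]
print(out_then_in(edges, "F"))
```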
Say I have a graph with several nodes. I need to design an algorithm which randomly creates directed edges between nodes while satisfying the following conditions:
each node has exactly one edge pointing to it
each node has exactly one edge pointing away from it
no node points to itself
For example, say my graph had three nodes, the following scenarios would be acceptable:
Node A points to B, B points to C, C points to A
Node A points to C, C points to B, B points to A
Does anyone know what the most efficient way of doing this would be? I'm using nodejs btw. For argument's sake, we can say that I am starting with an array containing the names of the nodes.
Thanks
Let's say you have an array of vertices V = {v} with |V| = N. First, shuffle the array using any random shuffle algorithm:
V = [v_0, v_1, v_2, ..., v_(N-1)]
Now define N edges E, where e[i] = (v[i] to v[i + 1]) for i = 0..N-2, and the last edge is (v[N-1] to v[0]), which closes the cycle.
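The shuffle-and-link idea above can be sketched as follows. This is only an illustration in Python (the question uses nodejs, but the logic carries over directly); the function name random_cycle_edges is made up for this example.

```python
import random

def random_cycle_edges(nodes, rng=random):
    """Shuffle the nodes and link each one to the next, closing the loop at the end."""
    order = list(nodes)
    if len(order) < 2:
        raise ValueError("need at least two nodes to avoid a self-loop")
    rng.shuffle(order)
    n = len(order)
    # edge i goes from order[i] to order[i + 1]; the last edge wraps around to order[0]
    return [(order[i], order[(i + 1) % n]) for i in range(n)]

edges = random_cycle_edges(["A", "B", "C"])
print(edges)
```

Note that this always produces one single cycle through all nodes. That satisfies the three conditions, but it never generates arrangements made of several separate cycles (for example A <-> B and C <-> D with four nodes).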
In my situation I have territory objects. Each territory knows which other territories it is connected to through an array. Here is a visualization of said territories as they would appear on a map:
If you were to map out the connections on a graph, they would look like this:
So say I have a unit stationed in territory [b] and I want to move it to territory [e], I'm looking for a method of searching through this graph and returning a final array that represents the path my unit in territory [b] must take. In this scenario, I would be looking for it to return
[b, e].
If I wanted to go from territory [a] to territory [f] then it would return:
[a, b, e, f].
I would love examples, but even just posts pointing me in the right direction are appreciated. Thanks in advance! :)
Have you heard of Breadth-First Search (BFS) before?
Basically, you simply put your initial territory, "a" in your example, into an otherwise empty queue Q.
The second data structure you need is an array of booleans with as many elements as you have territories, in this case 9. It helps with remembering which territories we have already checked. We call it V (for "visited"). It needs to be initialized as follows: all elements equal false except the one corresponding to the initial territory. That is, for all territories t, we have V[t] = false, but V[a] = true because "a" is already in the queue.
The third and final data structure you need is an array to store the parent nodes (i.e. which node we are coming from). It also has as many elements as you have territories. We call it P (for "parent") and every element points to itself initially, that is for all t in P, set P[t] = t.
Then, it is quite simple:
while Q is not empty:
    t = front element in the queue (remove it also from Q)
    if t = f we can break from while loop //because we have found the goal
    for all neighbors s of t for which V[s] = false:
        add s into the back of Q //we have never explored the territory s yet as V[s] = false
        set V[s] = true //we do not have to visit s again in the future
        //we found s because there is a connection from t to s
        //therefore, we need to remember that in s we are coming from the node t
        //to do this we simply set the parent of s to t:
        P[s] = t
How do you read the solution now?
Simply check the parent of f, then the parent of that, and so on until you find the beginning. You will know you have reached the beginning once you find an element that has itself as the parent (remember that we let all elements point to themselves initially), or you can also just compare it to a.
Basically, you just need an empty list L, add f into it and then
while f != a:
    f = P[f]
    add f into L
Note that this obviously fails if there exists no path because f will never equal a.
Therefore, this:
while f != P[f]:
    f = P[f]
    add f into L
is a bit nicer. It exploits the fact that initially all territories point to themselves in P.
If you try this on paper with your example above, then you will end up with
L = [f, e, b, a]
If you simply reverse this list, then you have what you wanted.
I don't know C#, so I didn't bother to use C# syntax. I assume that you know it is easiest to index your territories with integers and then use an array to access them.
You will realize quite quickly why this works. It's called breadth-first search because you consider only neighbors of the territory "a" first, with trivially shortest path to them (only 1 edge) and only once you processed all these, then territories that are further away will appear in the queue (only 2 edges from the start now) and so on. This is why we use a queue for this task and not something like a stack.
Also, this is linear in the number of territories and edges because you only need to look at every territory and edge (at most) once (though edges from both directions).
The algorithm I have given to you is basically the same as https://en.wikipedia.org/wiki/Breadth-first_search with only the P data structure added to keep track where you are coming from to be able to figure out the path taken.
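To make the pseudocode above concrete, here is a minimal runnable sketch in Python (not C#). It uses a dictionary and a set instead of integer-indexed arrays, which is my own simplification, and the adjacency map below is hypothetical: it is only chosen so that the path from a to f goes through b and e as in the question, since the original map picture is not available here.

```python
from collections import deque

def shortest_path(neighbors, start, goal):
    """Breadth-first search that records parents; returns the path or None."""
    visited = {start}           # plays the role of V
    parent = {start: start}     # plays the role of P; only the start is its own parent
    queue = deque([start])      # plays the role of Q
    while queue:
        t = queue.popleft()
        if t == goal:
            break
        for s in neighbors[t]:
            if s not in visited:
                visited.add(s)
                parent[s] = t   # we reached s coming from t
                queue.append(s)
    if goal not in parent:
        return None             # the goal was never reached
    path = [goal]
    while path[-1] != parent[path[-1]]:
        path.append(parent[path[-1]])
    path.reverse()
    return path

# Hypothetical connections between territories
territories = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b"],
    "e": ["b", "f"],
    "f": ["e"],
    "z": [],  # isolated territory, to show the no-path case
}
print(shortest_path(territories, "a", "f"))
print(shortest_path(territories, "a", "z"))
```

Returning None when the goal was never reached avoids the ambiguity of getting back a one-element list for an unreachable territory.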
I would like to ask how to best traverse graph and return only subgraph based on complex condition which must be satisfied by all nodes from root to leaves.
In other words, I need some mechanism such that when the condition is not met on any intermediate level, traversal stops there (no nested node is processed or returned in the output).
Let's say I have the following graph:
A -> B -> C (active=false) -> D
where I deactivated node C (note that the flag active=false means the whole subgraph is deactivated, including C and D).
According to the documentation, I can easily construct such a filter by filtering on the path with the wildcard [*] and the ALL keyword, which also stops traversal when the condition on C is not met. With a simple condition this works great:
for v,e,p in 1..100 outbound 'test/A' graph 'testGraph'
filter p.vertices[*].active ALL != false return v
// returns A, B
Now I have another graph where each node is either fixed or has some validity timespan (from, to) attributes:
A (type="fixed") -> B (from=2,to=3) -> C (from=1, to=5) -> D (type="fixed")
Now I would like to return only the subgraph where all (intermediate) nodes are either fixed or satisfy the time condition from>=2 and to<=3. I need A and B to be returned.
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
filter p.vertices[*].type ALL == 'fixed' or
(p.vertices[*].from ALL >= 2 and p.vertices[*].from ALL <= 3)
return v
However, this is obviously wrong (it returns only A). Logically, I need to put the ALL keyword at the beginning of the condition, so that the condition is applied on each level and traversal stops when it is not met. However, this is not supported:
filter ALL(p.vertices[*].type == 'fixed' or
(p.vertices[*].from >= 2 and p.vertices[*].from <= 3))
The classical approach of filtering on vertices does not meet my needs, because it doesn't stop traversal when the condition is not met. That is, the following returns A, B, D (C is skipped, but I also need to prune the subtree of C so that D is not in the output):
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
filter v.type == 'fixed' or
(v.from >= 2 and v.from <= 3)
return v
Any ideas? Thank you.
The AQL PRUNE feature was introduced in ArangoDB versions 3.4.5 and 3.5.0. With the AQL keyword PRUNE, traversal is stopped when a condition on the vertex, the edge, the path, or any variable defined before it is met.
Pruning is the easiest way to formulate conditions that reduce the amount of data to be checked during a search. It therefore improves query performance and reduces the overhead generated by the query. Pruning can be applied to the vertex, the edge, the path, and any variable defined before.
This video tutorial shows the difference between FILTER and the new PRUNE with a hands-on example. You can find more details in the documentation.
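To illustrate the difference in behavior (this is not AQL, just a small Python simulation under my own assumptions), the sketch below walks the A -> B -> C -> D example from the question twice: once only filtering the emitted vertices, and once also refusing to descend below a vertex that fails the condition. The function names and the data layout are hypothetical, and the condition follows the one stated in prose in the question (from >= 2 and to <= 3).

```python
def traverse(children, attrs, start, keep, prune):
    """Depth-first outbound traversal.

    keep(attributes) decides whether a vertex is emitted.
    If prune is True, the traversal does not descend below a vertex that fails keep.
    """
    result = []
    stack = [start]
    while stack:
        v = stack.pop()
        ok = keep(attrs[v])
        if ok:
            result.append(v)
        if ok or not prune:
            # reversed() keeps the children in their listed order on the stack
            stack.extend(reversed(children.get(v, [])))
    return result

def is_valid(a):
    """Vertex is fixed, or its validity span lies within [2, 3]."""
    if a.get("type") == "fixed":
        return True
    return "from" in a and "to" in a and a["from"] >= 2 and a["to"] <= 3

children = {"A": ["B"], "B": ["C"], "C": ["D"]}
attrs = {
    "A": {"type": "fixed"},
    "B": {"from": 2, "to": 3},
    "C": {"from": 1, "to": 5},
    "D": {"type": "fixed"},
}
print(traverse(children, attrs, "A", is_valid, prune=False))  # filter only
print(traverse(children, attrs, "A", is_valid, prune=True))   # filter and prune
```

With filtering alone, D still shows up because the traversal continues past C; with pruning, the subtree below C is never visited.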
What is the best way to retrieve all vertices that do not have an edge in a related edge_collection?
I've tried the following code, but it has become incredibly slow since ArangoDB 2.8 (it was not really fast in previous versions, but roughly 10 times faster than now). It takes more than 30 seconds with around 1000 edges and around 3000 vertices.
FOR v IN vertex_collection
FILTER LENGTH( EDGES(edge_collection, v._id, "outbound"))==0
RETURN v._id
...
update
...
After playing around a bit I came to the following query
LET vIDs = (FOR v IN vertex_collection
RETURN v._id)
LET vEdgesFrom = (FOR e IN edge_collection
FILTER e._from IN vIDs
RETURN e._from)
FOR v IN vertex_collection
FILTER v._id IN MINUS(vIDs, vEdgesFrom)
RETURN v._id
This one is much faster (around 0.05s), but it still looks like a workaround (just think of having to query against more than one edge collection).
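The idea behind that query is a plain set difference, which can be sketched outside the database like this. It is a Python illustration only; the function name and the sample documents are hypothetical, and accepting several edge lists is my own addition to show how more than one edge collection could be handled.

```python
def vertices_without_outbound(vertex_ids, *edge_collections):
    """Return the vertex ids that never appear as _from in any given edge collection."""
    sources = set()
    for edges in edge_collections:
        sources.update(e["_from"] for e in edges)
    # keep the original vertex order; membership tests on a set are cheap
    return [v for v in vertex_ids if v not in sources]

# Hypothetical sample data
vertex_ids = ["v/1", "v/2", "v/3", "v/4"]
edges_a = [{"_from": "v/1", "_to": "v/2"}]
edges_b = [{"_from": "v/3", "_to": "v/1"}]
print(vertices_without_outbound(vertex_ids, edges_a))
print(vertices_without_outbound(vertex_ids, edges_a, edges_b))
```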
So I'm still looking for the best method to find vertices having no edge in specific edge collections.
My suggestion was going to be similar: use joins rather than graph features.
FOR oneEdge IN edges
  LET vertices = (FOR oneVertex IN vertices
    FILTER oneEdge._from == oneVertex._id OR
           oneEdge._to == oneVertex._id
    RETURN 1)
  FILTER LENGTH(vertices) < 2
  RETURN {v: vertices, e: oneEdge}
This finds all edges where _from or _to points to a vertex that does not exist, so that they can subsequently be deleted.
Note the RETURN 1 which will reduce the amount of data passed up from the inner query.
I have a number of nodes connected through intermediate nodes of another type, as in the picture. There can be multiple middle nodes. I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D, it should return node E (4 links) followed by node F (3 links). Is this possible? If not, maybe it can be done using multiple requests? I was thinking about using the SHORTEST_PATH function, but it seems it can only find a path between nodes from the same collection?
Very nice question, it challenged the AQL part of my brain ;)
Good news: it is totally possible with only one query utilizing GRAPH_COMMON_NEIGHBORS and a portion of math.
Common neighbors counts, for each cross, how many pairs of your selected vertices it connects. Because ordering matters (A-E-B is different from B-E-A), a cross connected to a of your vertices appears in c = a*(a-1) pairs, where c is the count computed by the query. Solving a^2 - a - c = 0 with the p/q (quadratic) formula gives a = 0.5 + SQRT(0.25 + c), the number of vertices in your set that are connected to the cross.
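As a quick sanity check of that bit of math, here is a tiny Python sketch (the function name is made up for this example). For the dataset in the question, E connects 4 of the selected vertices, so it shows up in 4*3 = 12 ordered pairs, and F connects 3, so it shows up in 3*2 = 6.

```python
import math

def connected_vertices(ordered_pairs):
    """Invert c = a * (a - 1): recover a from the number of ordered pairs c."""
    return 0.5 + math.sqrt(0.25 + ordered_pairs)

print(connected_vertices(12))  # pairs counted for E
print(connected_vertices(6))   # pairs counted for F
```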
If the type of vertex is encoded in an attribute of the vertex object
the resulting AQL looks like this:
FOR x IN (
  (
    LET nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    FOR n IN GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes)
      FOR f IN VALUES(n)
        FOR s IN VALUES(f)
          FOR candidate IN s
            FILTER candidate.type == "cross"
            COLLECT crosses = candidate._key INTO counter
            RETURN {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
SORT x.connections DESC
RETURN x
If you put the crosses in a different collection and filter by collection name, the query becomes even more efficient, because we do not need to open any vertices that are not of type cross at all.
FOR x IN (
  (
    LET nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    FOR n IN GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes,
        {"vertexCollectionRestriction": "crosses"}, {"vertexCollectionRestriction": "crosses"})
      FOR f IN VALUES(n)
        FOR s IN VALUES(f)
          FOR candidate IN s
            COLLECT crosses = candidate._key INTO counter
            RETURN {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
SORT x.connections DESC
RETURN x
Both queries will yield the following result on your dataset:
[
  {
    "crosses": "E",
    "connections": 4
  },
  {
    "crosses": "F",
    "connections": 3
  }
]