What is the best way to retrieve all vertices that do not have an edge in a related edge_collection?
I've tried the following code, but it has become incredibly slow since ArangoDB 2.8. (It was not really fast in previous versions either, but it was roughly 10 times faster than it is now.) It takes more than 30 seconds on collections of around 1000 edges and around 3000 vertices.
FOR v IN vertex_collection
  FILTER LENGTH(EDGES(edge_collection, v._id, "outbound")) == 0
  RETURN v._id
...
update
...
After playing around a bit I came to the following query
LET vIDs = (
  FOR v IN vertex_collection
    RETURN v._id
)
LET vEdgesFrom = (
  FOR e IN edge_collection
    FILTER e._from IN vIDs
    RETURN e._from
)
FOR v IN vertex_collection
  FILTER v._id IN MINUS(vIDs, vEdgesFrom)
  RETURN v._id
This one is much faster (around 0.05s) but still looks like a workaround (just think of having to query against more than one edge collection).
So I'm still looking for the best method to find vertices having no edge in specific edge collections.
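To make the idea behind the faster query explicit outside of AQL, here is a minimal Python sketch of the same set difference. The function name and the sample data are made up for illustration; documents are assumed to be plain dicts with `_from` and `_to` keys, and any number of edge collections can be passed in.

```python
def vertices_without_outbound_edges(vertex_ids, *edge_collections):
    """Return the vertex ids that never appear as the _from of any edge."""
    # Collect every id that is the source of at least one edge.
    sources = set()
    for edges in edge_collections:
        sources.update(edge["_from"] for edge in edges)
    # Set difference: all vertices minus those that have an outbound edge.
    return sorted(set(vertex_ids) - sources)


# Hypothetical sample data, not taken from the question.
vertex_ids = ["v/1", "v/2", "v/3", "v/4"]
edges_a = [{"_from": "v/1", "_to": "v/2"}]
edges_b = [{"_from": "v/2", "_to": "v/3"}]

isolated = vertices_without_outbound_edges(vertex_ids, edges_a, edges_b)
print(isolated)  # ['v/3', 'v/4']
```

Each edge is touched once and each lookup is a hash lookup, which is roughly why the subquery version beats calling EDGES() once per vertex.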
My suggestion was going to be similar: use joins rather than graph features.
FOR oneEdge IN edges
  LET vertices = (
    FOR oneVertex IN vertices
      FILTER oneEdge._from == oneVertex._id OR
             oneEdge._to == oneVertex._id
      RETURN 1
  )
  FILTER LENGTH(vertices) < 2
  RETURN {v: vertices, e: oneEdge}
This finds all edges where either _from or _to points to a missing vertex, so that they can subsequently be deleted.
Note the RETURN 1, which reduces the amount of data passed up from the inner query.
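The same dangling-edge check can be sketched in Python. This is only an illustration under assumptions: the function name and sample data are invented, and edges are plain dicts with `_from` and `_to` keys. Unlike the join above, it tests each endpoint separately instead of counting matches.

```python
def dangling_edges(edges, vertex_ids):
    """Return the edges whose _from or _to does not point to a known vertex."""
    known = set(vertex_ids)
    return [
        edge for edge in edges
        if edge["_from"] not in known or edge["_to"] not in known
    ]


# Hypothetical sample data: "v/9" does not exist as a vertex.
vertex_ids = ["v/1", "v/2"]
edges = [
    {"_key": "e1", "_from": "v/1", "_to": "v/2"},
    {"_key": "e2", "_from": "v/2", "_to": "v/9"},
]

broken = dangling_edges(edges, vertex_ids)
print([edge["_key"] for edge in broken])  # ['e2']
```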
Related
For this kind of problem I thought it would be better to write a BFS-like implementation. I don't know why, but after some running-time tests the resulting plot resembles an exponential function.
So I began to doubt my code's correctness: is it really efficient? Can you help me with a running-time analysis of a good algorithm?
I've plotted the x-axis as a logarithm in base 1.5 (since the code uses the first 30 powers of 1.5 as the number of vertices fed to a random graph generator). It still looks exponential...
import collections

def bfs_short(graph, source, target):
    visited = set()
    queue = collections.deque([source])
    d = {}
    d[source] = 0
    while queue:
        u = queue.pop()
        if u == target:
            return d[target]
        if u not in visited:
            visited.add(u)
            for w in graph[u]:
                if w not in visited:
                    queue.appendleft(w)
                    d[w] = d[u] + 1
The thing is... I haven't posted the benchmark inputs, which may also cause problems, but first of all I want to be sure that the code works correctly. Solving the problems related to testing is my final goal.
A problem in your code is that your queue does not take into account the distance to the origin when choosing the next vertex to pop.
Also, collections.deque is not a priority queue, and thus will not give you the closest unvisited vertex seen so far when you pop an element from it.
This should do it, using heapq, a built-in heap implementation:
import heapq

def bfs_short(graph, source, target):
    visited = set()
    queue = [(0, source)]
    heapq.heapify(queue)
    while queue:  # loop while there are still vertices to process
        dist, vertex = heapq.heappop(queue)
        if vertex == target:
            return dist
        if vertex not in visited:
            visited.add(vertex)
            for neighbor in graph[vertex]:
                if neighbor not in visited:
                    heapq.heappush(queue, (dist + 1, neighbor))
In this version, pairs are pushed in the queue. To evaluate and compare tuples, Python tries to compare their first element, then the second in case of equality, and on and on.
So by pushing dist(vertex, origin) as first member of the tuple, the smallest couple (dist, vertex) in the heap will also be the closest to the origin vertex.
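A small sketch of that tuple ordering, using made-up (dist, vertex) pairs: the heap always hands back the pair with the smallest distance first, and falls back to comparing the vertex names when two distances are equal.

```python
import heapq

# Hypothetical (dist, vertex) pairs pushed in arbitrary order.
heap = []
for item in [(2, "c"), (0, "source"), (1, "b"), (1, "a")]:
    heapq.heappush(heap, item)

# Pop everything: pairs come out sorted by dist first, then by vertex name.
popped = [heapq.heappop(heap) for _ in range(len(heap))]
print(popped)  # [(0, 'source'), (1, 'a'), (1, 'b'), (2, 'c')]
```

One consequence worth keeping in mind: on a tie in distance the vertices themselves get compared, so they need to be comparable with each other (strings or ints are fine).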
I have the following graph structure. All vertexes are in the same collection, and all edges are in the same collection. From a particular start vertex (F), I want to return all the vertexes that are the result of going outwards once, then inwards once, so that I end up with, in the example, D and E.
Well after fooling with it for a while, this is what I came up with. Seems to work. Posting this in case someone else searches for a similar question.
FOR v IN 1..1 OUTBOUND "Vertex/F" edges
  FOR vv IN 1..1 INBOUND v edges
    FILTER vv._key != "F"
    COLLECT uniqueKeys = vv._key
    RETURN uniqueKeys
The query takes almost a millisecond for a small 8-vertex database, but I don't think I can do better.
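For anyone who wants to check the logic outside the database, the "one step outbound, then one step inbound" idea can be sketched in Python. The edge list below is invented, since the original picture is not reproduced here; edges are assumed to be (from, to) pairs of vertex keys.

```python
from collections import defaultdict


def out_then_in(edges, start):
    """Vertices reached by one outbound hop from start, then one inbound hop."""
    outbound = defaultdict(set)
    inbound = defaultdict(set)
    for src, dst in edges:
        outbound[src].add(dst)
        inbound[dst].add(src)

    result = set()
    for middle in outbound[start]:
        result |= inbound[middle]
    result.discard(start)  # same effect as FILTER vv._key != "F"
    return sorted(result)  # the set already deduplicates, like COLLECT


# Hypothetical edges: F, D and E all point at shared targets; G does not.
edges = [("F", "X"), ("D", "X"), ("E", "X"), ("F", "Y"), ("E", "Y"), ("G", "Z")]

siblings = out_then_in(edges, "F")
print(siblings)  # ['D', 'E']
```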
I would like to ask how best to traverse a graph and return only the subgraph in which a complex condition is satisfied by all nodes from the root to the leaves.
In other words, I need a mechanism such that when the condition is not met at any intermediate level, the traversal stops (no nested node is processed or returned in the output).
Let's say I have the following graph:
A -> B -> C (active=false) -> D
where I deactivated node C (note that the flag active=false means that the whole subgraph is deactivated, including C and D).
According to the documentation, I can easily construct such a filter by filtering on the path with the wildcard [*] and the ALL keyword, which also stops the traversal when the condition on C is not met. With a simple condition this works great:
for v,e,p in 1..100 outbound 'test/A' graph 'testGraph'
  filter p.vertices[*].active ALL != false
  return v
// returns A, B
Now I have another graph where each node is either fixed or has some validity timespan (from, to) attributes:
A (type="fixed") -> B (from=2,to=3) -> C (from=1, to=5) -> D (type="fixed")
Now I would like to return only the subgraph where all (intermediate) nodes are either fixed or satisfy the time condition from>=2 and to<=3. I need A and B to be returned.
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
  filter p.vertices[*].type ALL == 'fixed' or
         (p.vertices[*].from ALL >= 2 and p.vertices[*].from ALL <= 3)
  return v
However, this is obviously wrong (it returns only A). Logically, I need to put the ALL keyword at the beginning of the condition (the condition must be applied at each level, and the traversal must stop when it is not met), but this is not supported:
filter ALL(p.vertices[*].type == 'fixed' or
           (p.vertices[*].from >= 2 and p.vertices[*].from <= 3))
The classical approach of filtering on vertices does not meet my needs, because it doesn't stop the traversal when the condition is not met; i.e. the following returns A, B, D (C is skipped, but I also need to prune the subtree of C so that D is not in the output):
for v,e,p in 0..100 outbound 'test/A' graph 'testGraph'
  filter v.type == 'fixed' or
         (v.from >= 2 and v.from <= 3)
  return v
Any ideas? Thank you.
The AQL PRUNE feature was introduced in ArangoDB versions 3.4.5 and 3.5.0. With the AQL keyword PRUNE, the traversal stops when a condition on the vertex, the edge, the path or any previously defined variable is met.
Pruning is the easiest way to formulate conditions that reduce the amount of data to be checked during a search. It therefore improves query performance and reduces the overhead generated by the query. Pruning can be applied to the vertex, the edge, the path and any previously defined variable.
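To illustrate the pruning idea on the example graph from the question, here is a small Python sketch. It is not AQL and not how ArangoDB implements PRUNE; the function names and the dict-based graph layout are assumptions made for the example. The point is only that a vertex failing the condition is never expanded, so its whole subtree is skipped. This sketch also drops the failing vertex itself, which is what the question asks for.

```python
def is_valid(vertex):
    """Condition from the question: fixed, or from >= 2 and to <= 3."""
    if vertex.get("type") == "fixed":
        return True
    return (
        "from" in vertex and "to" in vertex
        and vertex["from"] >= 2 and vertex["to"] <= 3
    )


def traverse_with_pruning(graph, vertices, start, condition):
    """Depth-first traversal that never descends below a failing vertex."""
    result = []
    seen = set()
    stack = [start]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        if not condition(vertices[key]):
            continue  # prune: skip this vertex and everything below it
        result.append(key)
        stack.extend(reversed(graph.get(key, [])))
    return result


# Example graph from the question: A -> B -> C -> D
graph = {"A": ["B"], "B": ["C"], "C": ["D"]}
vertices = {
    "A": {"type": "fixed"},
    "B": {"from": 2, "to": 3},
    "C": {"from": 1, "to": 5},
    "D": {"type": "fixed"},
}

kept = traverse_with_pruning(graph, vertices, "A", is_valid)
print(kept)  # ['A', 'B']
```

A plain per-vertex filter over the same data would keep A, B and D, because D on its own satisfies the condition; the difference comes purely from not expanding C.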
This video tutorial shows the difference between FILTER and the new PRUNE with a hands-on example. You can find more details in the documentation.
I have a number of nodes connected through intermediate nodes of another type, as in the picture. There can be multiple middle nodes. I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D, it should return node E (4 links) followed by node F (3 links). Is this possible? If not, could it be done using multiple requests? I was thinking about using the SHORTEST_PATH function, but it seems it can only find paths between nodes from the same collection?
Very nice question, it challenged the AQL part of my brain ;)
Good news: it is totally possible with only one query, using GRAPH_COMMON_NEIGHBORS and a bit of math.
Common neighbors counts, for each pair of your selected vertices, the cross vertices that connect them (ordering matters: A-E-B is different from B-E-A). By combinatorics, a cross connected to a of the selected vertices appears in a*(a-1) = c ordered pairs, where c is computed by the query. We then use the p-q formula to solve for a (the number of vertices from your set that are connected to the cross).
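Spelled out, the step from the pair count c back to the number of connected vertices a is just the quadratic formula; the derivation below is a sketch of where the expression 0.5 + SQRT(0.25 + c) in the queries comes from.

```latex
\begin{align*}
a(a-1) &= c \\
a^2 - a - c &= 0 \\
a &= \tfrac{1}{2} \pm \sqrt{\tfrac{1}{4} + c}
\end{align*}
```

Only the positive root makes sense as a count, so a = 1/2 + sqrt(1/4 + c). For example, c = 12 gives a = 1/2 + 7/2 = 4, and c = 6 gives a = 1/2 + 5/2 = 3, which corresponds to a cross with four and three links respectively.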
If the type of vertex is encoded in an attribute of the vertex object
the resulting AQL looks like this:
FOR x in (
  (
    let nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    for n in GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes)
      for f in VALUES(n)
        for s in VALUES(f)
          for candidate in s
            filter candidate.type == "cross"
            collect crosses = candidate._key into counter
            return {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
  sort x.connections DESC
  return x
If you put the crosses in a different collection and filter by collection name, the query becomes even more efficient, since we do not need to open any vertices that are not of type cross at all.
FOR x in (
  (
    let nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    for n in GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes,
        {"vertexCollectionRestriction": "crosses"}, {"vertexCollectionRestriction": "crosses"})
      for f in VALUES(n)
        for s in VALUES(f)
          for candidate in s
            collect crosses = candidate._key into counter
            return {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
  sort x.connections DESC
  return x
Both queries yield the following result on your dataset:
[
  {
    "crosses": "E",
    "connections": 4
  },
  {
    "crosses": "F",
    "connections": 3
  }
]
I'm developing a multiplayer game with Node.js. Every second I get the coordinates (X, Y, Z) of every player. How can I get, for each player, a list of all players located closer than a given distance from that player?
Any idea to avoid a O(n²) calculation?
You are not looking for clustering algorithms.
Instead, you are looking for a database index that supports radius queries.
Examples:
R*-tree
kd-tree
M-tree
Gridfile
Octree (for 3d, quadtree for 2d)
Any of these should do the trick and, in theory, yield O(n log n) performance. In practice, it's not as easy as this. If all your objects are really close, "closer than a given distance" may mean every object, i.e. O(n^2).
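To show what such a radius query looks like in code, here is a minimal Python sketch built on a uniform grid of cells. It is a deliberately simplified stand-in for the index structures listed above, not an implementation of any of them, and the function name and sample players are made up. Each point is bucketed into a cube whose edge equals the search radius, so only the 27 surrounding cells have to be checked per player.

```python
import math
from collections import defaultdict


def neighbors_within(points, radius):
    """Map each point name to the sorted names of other points within radius."""
    def cell_of(position):
        return tuple(math.floor(c / radius) for c in position)

    # Bucket every point into a cubic cell whose edge length is the radius.
    cells = defaultdict(list)
    for name, position in points.items():
        cells[cell_of(position)].append(name)

    result = {}
    for name, position in points.items():
        cx, cy, cz = cell_of(position)
        found = []
        # Anything within radius lies in this cell or one of its 26 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for other in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        if other != name and math.dist(position, points[other]) <= radius:
                            found.append(other)
        result[name] = sorted(found)
    return result


# Hypothetical player positions (x, y, z).
players = {"alice": (0, 0, 0), "bob": (3, 4, 0), "carol": (50, 50, 50)}

close = neighbors_within(players, 10)
print(close)  # {'alice': ['bob'], 'bob': ['alice'], 'carol': []}
```

The caveat from the answer still applies: if everybody stands in the same few cells, every pair gets compared and the work degrades towards O(n^2).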
What you are looking for is a quadtree in 3 dimensions, i.e. an octree. An octree is basically the same as the binary tree, but instead of two children per node, it has 2^D = 2^3 = 8 children per node, where D is the dimension.
For example, imagine a cube. In order to create the next level of the root, you actually have every node representing the 8 sub-cubes inside the cube and so on.
This tree will yield fast lookups, but be careful not to use it in higher dimensions. I built a polymorphic quadtree and wouldn't go beyond 8-10 dimensions, because it was becoming too flat.
The other approach would be the kd-tree, where actually you halve the dataset (the players) at every step.
You could use a library that provides nearest neighbour searching.
I'm answering my own question because I have the answer now. Thanks to G. Samaras and Anony-Mousse:
I use a kd-tree algorithm:
First I build the tree with all the players
Then for each player I calculate the list of all the players within a given range around this player
This is very fast and easy with the npm module kdtree: https://www.npmjs.org/package/kdtree
var kd = require('kdtree');
var tree = new kd.KDTree(3); // A new tree for 3-dimensional points
var players = loadPlayersPosition(); // players is an array containing all the positions
var RANGE = 1000; // 1 km range
var p;
for (p = 0; p < players.length; p++) { // let's build the tree
  tree.insert(players[p].x, players[p].y, players[p].z, players[p].username);
}
var nearest = [];
for (p = 0; p < players.length; p++) { // let's look for neighbors
  var close = tree.nearestRange(players[p].x, players[p].y, players[p].z, RANGE);
  nearest.push(close);
}
It returns nearest, an array containing, for each player, all of that player's neighbors within a range of 1000 m. I ran some tests on my PC with 100,000 simulated players. It takes only 500 ms to build the tree and another 500 ms to find the nearest-neighbor pairs. I find that very fast for such a large number of players.
Bonus: if you need to do this with latitude and longitude instead of x, y, z, just convert lat, lon to Cartesian x, y, z, because for short distances the chord distance on a sphere approximates the great-circle distance.
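A minimal sketch of that conversion in Python, assuming a perfectly spherical Earth with a mean radius of 6,371 km (both the constant and the function name are my own choices for the example, not part of the kdtree module):

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, spherical model


def lat_lon_to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Convert latitude/longitude in degrees to x, y, z on a sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z


# Two hypothetical points on the equator, 0.01 degrees of longitude apart.
a = lat_lon_to_cartesian(0.0, 0.0)
b = lat_lon_to_cartesian(0.0, 0.01)

chord = math.dist(a, b)                      # straight-line distance
arc = EARTH_RADIUS_M * math.radians(0.01)    # great-circle distance
print(round(chord, 3), round(arc, 3))
```

The resulting x, y, z values can then be inserted into the tree as before, with the range given in metres.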