Find the cross node for a number of nodes in ArangoDB?

I have a number of nodes connected through intermediate nodes of another type, like in the picture. There can be multiple middle nodes. I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D it should return node E (4 links) followed by node F (3 links). Is this possible? If not, maybe it can be done using multiple requests? I was thinking about using the SHORTEST_PATH function, but it seems it can only find a path between nodes from the same collection?

Very nice question, it challenged the AQL part of my brain ;)
Good news: it is totally possible with only one query, utilizing GRAPH_COMMON_NEIGHBORS and a bit of math.
Common neighbors will count for how many of your selected vertices a cross is the connecting component, taking ordering into account (A-E-B is different from B-E-A). Using combinatorics we end up with a*(a-1) = c many combinations, where c is the count we observe. We then use the quadratic (p-q) formula to identify a, the number of connected vertices from your given set.
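To make that step explicit: if a cross connects a of the given vertices, it shows up once per ordered pair, so c = a*(a-1). Solving a^2 - a - c = 0 for a gives a = 1/2 + sqrt(1/4 + c), which is exactly the 0.5 + SQRT(0.25 + LENGTH(counter)) expression in the queries below.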
If the type of vertex is encoded in an attribute of the vertex object
the resulting AQL looks like this:
FOR x in (
  (
    let nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    for n in GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes)
      for f in VALUES(n)
        for s in VALUES(f)
          for candidate in s
            filter candidate.type == "cross"
            collect crosses = candidate._key into counter
            return {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
sort x.connections DESC
return x
If you put the crosses in a different collection and restrict by collection name, the query gets even more efficient: we do not need to open any vertices that are not of type cross at all.
FOR x in (
  (
    let nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    for n in GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes,
        {"vertexCollectionRestriction": "crosses"}, {"vertexCollectionRestriction": "crosses"})
      for f in VALUES(n)
        for s in VALUES(f)
          for candidate in s
            collect crosses = candidate._key into counter
            return {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
sort x.connections DESC
return x
Both queries will yield the result on your dataset:
[
  {
    "crosses": "E",
    "connections": 4
  },
  {
    "crosses": "F",
    "connections": 3
  }
]

Related

Creating random directed edges between nodes in nodejs

Say I have a graph with several nodes. I need to design an algorithm which randomly creates directed edges between nodes while satisfying the following conditions:
each node has exactly one edge pointing to it
each node has exactly one edge pointing away from it
no node points to itself
For example, say my graph had three nodes, the following scenarios would be acceptable:
Node A points to B, B points to C, C points to A
Node A points to C, C points to B, B points to A
Does anyone know what the most efficient way of doing this would be? I'm using nodejs btw. For argument's sake, we can say that I am starting with an array containing the names of the nodes.
Thanks
Let's say you have an array of vertices V = {v}, with |V| = N. We can shuffle this array using any random shuffle algorithm (e.g. Fisher-Yates):
V = [v_1, v_2, v_3, ..., v_n]
Now we define N-1 edges e[i] = (v[i] -> v[i + 1]), plus one final edge (v[N-1] -> v[0]) that closes the cycle.
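A small Node.js sketch of that idea; the function name and the choice of a Fisher-Yates shuffle are assumptions of mine, not something given in the question:
// Fisher-Yates shuffle, then connect consecutive nodes into one directed cycle.
// Assumes at least two node names, so no self-loops can occur.
function randomCycleEdges(nodeNames) {
  const v = nodeNames.slice();                       // work on a copy
  for (let i = v.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));   // pick an index in 0..i
    [v[i], v[j]] = [v[j], v[i]];                     // swap
  }
  const edges = [];
  for (let i = 0; i < v.length; i++) {
    edges.push({ from: v[i], to: v[(i + 1) % v.length] });  // last edge wraps back to v[0]
  }
  return edges;
}
// e.g. randomCycleEdges(["A", "B", "C"]) might give
// [{from: "B", to: "A"}, {from: "A", to: "C"}, {from: "C", to: "B"}]
The shuffle is O(N), and each node ends up with exactly one incoming and one outgoing edge, satisfying all three conditions.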

TSP / CPP variant - subtour constraint

I'm developing an optimization problem that is a variant on Traveling Salesman. In this case, you don't have to visit all the cities, there's a required start and end point, there's a min and max bound on the tour length, you can traverse each arc multiple times if you want, and you have a nonlinear objective function that is associated with the arcs traversed (and number of times you traverse each arc). Decision variables are integers, how many times you traverse each arc.
I've developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However I didn't put in subtour constraints and my results are two disconnected subtours.
I can find integer programming formulations of TSP that say how to formulate subtour constraints, but this is a little different from the standard TSP and I'm trying to figure out how to start. Any help that can be provided would be greatly appreciated.
EDIT: problem formulation
50 arcs, not exhaustive pairs between nodes. 50 decision variables N_ab, integers >= 0, corresponding to how many times you traverse from a to b. There is a length and a profit associated with each N_ab. There are two constraints keeping the sum of length_ab * N_ab over all ab between a min and a max distance. I have a constraint that the sum of N_ab into each node equals the sum of N_ab out of the node, so you can either not visit a node at all or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant for the subtour issue).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm, the formulation isn't applicable since I am not required to visit all cities, and may not be able to. For example, say I have 20 nodes and 50 arcs (all arcs of length 10). The distance constraints require a tour of exactly length 30, which means I can visit at most three nodes (start at A -> B -> C -> A = length 30). So I will not visit the other nodes at all. TSP subtour elimination would require edges from the node subgroup ABC to the subgroup of non-visited nodes, which isn't needed for my problem.
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] counts how many times we go from i to j, and M is a sufficiently large number, perhaps equal to the maximum number of times you can traverse one edge.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S, k in V \ S} (N[j,k] + N[k,j]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
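A minimal Pyomo sketch of the linking constraints for x[i], under assumptions that are mine rather than from the post: the model already has a node set model.V, an arc set model.A of (i, j) tuples, and the integer traversal variables model.N indexed over model.A; depot and M are placeholders you would supply.
from pyomo.environ import Var, Constraint, Binary

def add_visit_constraints(model, depot, M):
    # x[i] = 1 if node i is visited at least once, 0 otherwise
    model.x = Var(model.V, domain=Binary)

    # x[i] <= sum_j N[i, j]: x[i] cannot be 1 if no arc leaves node i
    def x_upper_rule(m, i):
        return m.x[i] <= sum(m.N[a, b] for (a, b) in m.A if a == i)
    model.x_upper = Constraint(model.V, rule=x_upper_rule)

    # M * x[i] >= N[i, j]: any used arc out of i forces x[i] = 1
    def x_lower_rule(m, i, j):
        return M * m.x[i] >= m.N[i, j]
    model.x_lower = Constraint(model.A, rule=x_lower_rule)

    # the depot (node 1 in the text) must be on the tour
    model.depot_on_tour = Constraint(expr=model.x[depot] == 1)
The subtour-elimination constraints themselves are indexed by subsets S, so there are exponentially many of them; the usual approach is to solve without them, check the solution for disconnected subtours, add only the constraints violated by those specific sets S, and re-solve.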

Find path following edges with greatest value in ArangoDB

Let's say that in my graph I've got edges that have a field called value. After selecting a start vertex I would like to find a path by always selecting the edge with the highest value. Unfortunately I can't figure out how to write a proper query. Is this possible in ArangoDB?
Hi, I am unsure what you would like to achieve; there are a few possible scenarios I can imagine from your description:
First: Shortest Path
The use-case here is you know the starting vertex and the target vertex, and you want to find the shortest (or cheapest) path between those two.
The built-in SHORTEST_PATH feature (https://docs.arangodb.com/3.1/AQL/Graphs/ShortestPath.html#shortest-path-in-aql) can serve this if you define the weight attribute in the options like this:
FOR v IN OUTBOUND SHORTEST_PATH #start TO #end ##edgeCollections OPTIONS {weightAttribute: "value", defaultWeight: 1}
  RETURN v
This will give you all vertices on the path from start to end that has the lowest sum of value attributes. If you need the path with the highest values, you could copy the value and store 1/value in a different field, then use that field as the weight, so the cheapest path with respect to the inverted weight corresponds to the highest total value.
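A minimal sketch of that inversion step, assuming the edges live in an edge collection called edgeCollection, value is numeric and never zero, and invValue is a made-up attribute name:
FOR e IN edgeCollection
  UPDATE e WITH { invValue: 1 / e.value } IN edgeCollection
The shortest-path query above could then use weightAttribute: "invValue" instead.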
Second: Sorting of edges
The use case is you only have the starting vertex and want to get the connected vertices, ordered by the value on the edges. There you can simply combine the traversal statement (https://docs.arangodb.com/3.1/AQL/Graphs/Traversals.html#graph-traversals-in-aql) with a SORT:
FOR v, e IN OUTBOUND #start ##edgeCollection
  SORT e.value DESC
  LIMIT 1 /* Only pick the highest one */
  RETURN {v: v, e: e}
Third use-case: Iterating several depths, only using the highest value
The AQL from use-case 2 can be chained up to an arbitrary depth, which has to be known a priori. Say you would like to iterate 3 steps, only using the edge with the highest value:
FOR v1, e1 IN OUTBOUND #start ##edgeCollection
  SORT e1.value DESC
  LIMIT 1 /* Only pick the highest one */
  /* Depth 1 done. Now depth 2 */
  FOR v2, e2 IN OUTBOUND v1 ##edgeCollection
    SORT e2.value DESC
    LIMIT 1 /* Only pick the highest one */
    FOR v3, e3 IN OUTBOUND v2 ##edgeCollection
      SORT e3.value DESC
      LIMIT 1 /* Only pick the highest one */
      RETURN [v1, v2, v3]
Fourth use-case: Iterating with unknown depth
The depth is not known a priori; in this case pure AQL in the currently released version (3.1) cannot formulate this. It is easier to use a Foxx service (https://docs.arangodb.com/3.1/Manual/Foxx/#foxx) together with the traversal module (https://docs.arangodb.com/3.1/Manual/Graphs/Traversals/UsingTraversalObjects.html#getting-started), which is a bit more flexible but has to be implemented in JavaScript.
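A rough sketch of how such a service could walk the graph step by step from JavaScript (arangosh or Foxx); the edge collection name, the value attribute, the function name, and the depth limit are placeholders of mine, not from the question:
// Repeatedly follow the single outgoing edge with the highest "value",
// until there is no outgoing edge or a safety limit is reached.
const db = require("@arangodb").db;

function followHighestValue(startId, maxSteps) {
  const path = [];
  let current = startId;
  for (let i = 0; i < maxSteps; i++) {
    const result = db._query(
      "FOR v, e IN OUTBOUND @start edgeCollection " +
      "SORT e.value DESC LIMIT 1 RETURN {vertex: v, edge: e}",
      { start: current }
    ).toArray();
    if (result.length === 0) break;   // dead end: no outgoing edges, stop
    path.push(result[0]);
    current = result[0].vertex._id;
  }
  return path;
}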

Spark Accumulator vs Count

I have a use case where I want to count types of elements in an RDD matching some filter.
e.g. RDD.filter(F1) and RDD.filter(!F1)
I have 2 options
Use accumulators: e.g.
LongAccumulator l1 = sparkContext.longAccumulator("Count1");
LongAccumulator l2 = sparkContext.longAccumulator("Count2");
RDD.foreachPartition(partition -> partition.forEachRemaining(record -> {
    if (F1(record)) l1.add(1);
    else l2.add(1);
}));
Use Count
RDD.filter(F1).count(); RDD.filter(!F1).count()
One benefit of the first approach is that we only need to iterate the data once (useful since my data set is tens of TB).
What is the use of count() if the same effect can be achieved using accumulators?
The major difference is what happens on failure: if your code fails or a task is retried during a transformation, the accumulators may still be updated (possibly more than once), while the count() result is not affected.
Another option is to use pure map-reduce:
val counts = rdd.map(x => (F1(x), 1)).reduceByKey(_ + _).collectAsMap()
The network cost should also be low, as only a few numbers are sent back. The map creates pairs of (F1(x), 1) and reduceByKey then sums the ones, giving you the number of items for both F1(x) and !F1(x) in the counts map.
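For completeness, reading the two totals back out of that map could look like this (a small sketch, assuming F1 returns a Boolean):
val trueCount  = counts.getOrElse(true, 0)   // number of items where F1(x) holds
val falseCount = counts.getOrElse(false, 0)  // number of items where !F1(x) holds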

retrieve vertices with no linked edge in arangodb

What is the best way to retrieve all vertices that do not have an edge in a related edge collection?
I've tried the following code, but it has become incredibly slow since ArangoDB 2.8 (it was not really fast in previous versions either, but roughly 10 times faster than now). It takes more than 30 seconds on collection sizes of around 1000 edges and around 3000 vertices.
FOR v IN vertex_collection
  FILTER LENGTH(EDGES(edge_collection, v._id, "outbound")) == 0
  RETURN v._id
Update
After playing around a bit I came up with the following query:
LET vIDs = (FOR v IN vertex_collection
RETURN v._id)
LET vEdgesFrom = (FOR e IN edge_collection
FILTER e._from IN vIDs
RETURN e._from)
FOR v IN vertex_collection
FILTER v._id IN MINUS(vIDs, vEdgesFrom)
RETURN v._id
This one is much faster (around 0.05 s), but it still looks like a workaround (especially when thinking of more than one edge collection we need to query against).
So I'm still looking for the best method to find vertices that have no edge in specific edge collections.
My suggestion was going to be similar: rather use joins than graph features.
FOR oneEdge IN edges
  LET linkedVertices = (FOR oneVertex IN vertices
    FILTER oneEdge._from == oneVertex._id OR
           oneEdge._to == oneVertex._id
    RETURN 1)
  FILTER LENGTH(linkedVertices) < 2
  RETURN {v: linkedVertices, e: oneEdge}
This finds all edges where _from or _to points to a non-existent vertex, so that you can subsequently delete them.
Note the RETURN 1 which will reduce the amount of data passed up from the inner query.
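For the case of several edge collections mentioned in the question's update, the MINUS approach there extends naturally, since MINUS accepts more than two arrays; a sketch with placeholder collection names:
LET vIDs = (FOR v IN vertex_collection RETURN v._id)
LET usedFrom1 = (FOR e IN edge_collection1 RETURN e._from)
LET usedFrom2 = (FOR e IN edge_collection2 RETURN e._from)
FOR id IN MINUS(vIDs, usedFrom1, usedFrom2)
  RETURN id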
