Find path following edges with greatest value in ArangoDB - arangodb

Let's say that in my graph I have edges with a field called value. After selecting a start vertex, I would like to find a path by always following the edge that has the highest value. Unfortunately, I can't figure out how to write a proper query. Is this possible in ArangoDB?

Hi, I am unsure what you would like to achieve. There are a few possible scenarios that I can imagine from your description:
First: Shortest Path
The use-case here is you know the starting vertex and the target vertex, and you want to find the shortest (or cheapest) path between those two.
The built in SHORTEST_PATH (https://docs.arangodb.com/3.1/AQL/Graphs/ShortestPath.html#shortest-path-in-aql) feature can serve it by defining the distance attribute in the options like this:
FOR v IN OUTBOUND SHORTEST_PATH @start TO @end @@edgeCollections OPTIONS {weightAttribute: "value", defaultWeight: 1}
RETURN v
This will give you all vertices on the path from start to end that has the lowest sum of value attributes. If you need the "highest value" instead, you could store 1/value in a separate attribute and use that attribute as the weight, so that edges with a high value become cheap and the search favors short paths over high-value edges.
Second: Sorting of edges
The use case is you only have the starting vertex and want to get the connected vertices, ordered by the value on the edges. There you can simply combine the traversal statement with a simple sort. (https://docs.arangodb.com/3.1/AQL/Graphs/Traversals.html#graph-traversals-in-aql):
FOR v, e IN OUTBOUND @start @@edgeCollection
  SORT e.value DESC
  LIMIT 1 /* Only pick the highest one */
  RETURN {v: v, e: e}
Third use-case: Iterating several depth only using the highest value
The AQL in Use-case 2 can be chained up to an arbitrary depth which has to be known a-priori. So say you would like to iterate 3 steps only using the edge with highest value:
FOR v1, e1 IN OUTBOUND @start @@edgeCollection
  SORT e1.value DESC
  LIMIT 1 /* Only pick the highest one */
  /* Depth 1 done. Now depth 2 */
  FOR v2, e2 IN OUTBOUND v1 @@edgeCollection
    SORT e2.value DESC
    LIMIT 1 /* Only pick the highest one */
    FOR v3, e3 IN OUTBOUND v2 @@edgeCollection
      SORT e3.value DESC
      LIMIT 1 /* Only pick the highest one */
      RETURN [v1, v2, v3]
Fourth use-case:
The depth is not known a-priori. In this case pure AQL in the currently released version (3.1) cannot formulate the query. It will be easier to use a Foxx service (https://docs.arangodb.com/3.1/Manual/Foxx/#foxx) with the traversal module (https://docs.arangodb.com/3.1/Manual/Graphs/Traversals/UsingTraversalObjects.html#getting-started), which is a bit more flexible but can only be used from JavaScript.
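To make the fourth use-case concrete, here is a minimal sketch of the greedy walk itself, written in plain Python over an in-memory adjacency map rather than against ArangoDB. The `edges` structure and the `greedy_path` name are hypothetical, and skipping already visited vertices (so the walk cannot loop forever) is an assumption that the question does not spell out.

```python
def greedy_path(edges, start):
    """Follow the highest-value outgoing edge until no unvisited target is left.

    `edges` maps a vertex to a list of (target, value) pairs. This layout is
    an assumption made for the sketch, not ArangoDB's storage format.
    """
    path = [start]
    visited = {start}
    current = start
    while True:
        # Only consider edges that lead to vertices not yet on the path.
        candidates = [(value, target)
                      for target, value in edges.get(current, [])
                      if target not in visited]
        if not candidates:
            return path
        # Pick the edge with the greatest value.
        _, current = max(candidates, key=lambda pair: pair[0])
        visited.add(current)
        path.append(current)


# Hypothetical example graph.
example_edges = {
    "A": [("B", 5), ("C", 9)],
    "C": [("D", 1), ("A", 7)],
    "D": [],
}
example_path = greedy_path(example_edges, "A")
```

A Foxx service would run the same loop, issuing the depth-1 query from use-case 2 once per step instead of reading from a dictionary.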

Related

Reaching nth Stair

Find the total number of ways to reach the nth floor with the following types of moves:
Type 1: in a single move you can go from floor i to floor i+1. You can use this move any number of times.
Type 2: in a single move you can go from floor i to floor i+2. You can use this move any number of times.
Type 3: in a single move you can go from floor i to floor i+3, but you can use this move at most k times.
I know how to count the ways to reach the nth floor when all three moves can be used any number of times, using DP like dp[i]=dp[i-1]+dp[i-2]+dp[i-3]. I am stuck on the condition that the Type 3 move may be used at most k times.
Can someone tell me the approach here?
While modeling any recursion or dynamic programming problem, it is important to identify the goal, constraints, states, state function, state transitions, possible state variables and initial condition aka base state. Using this information we should try to come up with a recurrence relation.
In our current problem:
Goal: Our goal here is to somehow calculate number of ways to reach floor n while beginning from floor 0.
Constraints: We can move from floor i to i+3 at most K times. We name it as a special move. So, one can perform this special move at most K times.
State: In this problem, our situation of being at a floor could be one way to model a state. The exact situation can be defined by the state variables.
State variables: State variables are properties of the state and are important to identify a state uniquely. Being at a floor i alone is not enough in itself as we also have a constraint K. So to identify a state uniquely we want to have 2 state variables: i indicating floor ranging between 0..n and k indicating number of special move used out of K (capital K).
State functions: In our current problem, we are concerned with finding number of ways to reach a floor i from floor 0. We only need to define one function number_of_ways associated with corresponding state to describe the problem. Depending on problem, we may need to define more state functions.
State Transitions: Here we identify how can we transition between states. We can come freely to floor i from floor i-1 and floor i-2 without consuming our special move. We can only come to floor i from floor i-3 while consuming a special move, if i >=3 and special moves used so far k < K.
In other words, possible state transitions are:
state[i,k] <== state[i-1,k] // doesn't consume special move k
state[i,k] <== state[i-2,k] // doesn't consume special move k
state[i,k+1] <== state[i-3,k] // consumes one special move, only if k < K and i >= 3
We should now be able to form following recurrence relation using above information. While coming up with a recurrence relation, we must ensure that all the previous states needed for computation of current state are computed first. We can ensure the order by computing our states in the topological order of directed acyclic graph (DAG) formed by defined states as its vertices and possible transitions as directed edges. It is important to note that it is only possible to have such ordering if the directed graph formed by defined states is acyclic, otherwise we need to rethink if the states are correctly defined uniquely by its state variables.
Recurrence Relation:
number_of_ways[i,k] = ((number_of_ways[i-1,k] if i >= 1 else 0) +
                       (number_of_ways[i-2,k] if i >= 2 else 0) +
                       (number_of_ways[i-3,k-1] if i >= 3 and k >= 1 else 0))
Base cases:
Base cases or solutions to initial states kickstart our recurrence relation and are sufficient to compute solutions of remaining states. These are usually trivial cases or smallest subproblems that can be solved without recurrence relation.
We can have as many base conditions as we require and there is no specific limit. Ideally we would want to have a minimal set of base conditions, enough to compute solutions of all remaining states. For the current problem, after initializing all not computed solutions so far as 0,
number_of_ways[0, 0] = 1
number_of_ways[0,k] = 0 where 0 < k <= K
Our required final answer will be sum(number_of_ways[n,k], for all 0<=k<=K).
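The state definition above can be sketched as a top-down, memoized function. This is only an illustration under the definitions in this answer: the names `count_ways` and `max_special` are made up for the sketch, and k counts the special moves used so far.

```python
from functools import lru_cache


def count_ways(n, max_special):
    """Number of ways to reach floor n with at most `max_special` +3 moves."""

    @lru_cache(maxsize=None)
    def ways(i, k):
        # Base state: standing on floor 0 with no special move used.
        if i == 0:
            return 1 if k == 0 else 0
        total = 0
        if i >= 1:
            total += ways(i - 1, k)      # +1 move, no special move consumed
        if i >= 2:
            total += ways(i - 2, k)      # +2 move, no special move consumed
        if i >= 3 and k >= 1:
            total += ways(i - 3, k - 1)  # +3 move consumes one special move
        return total

    # "At most K" is the sum over every exact count 0..K.
    return sum(ways(n, k) for k in range(max_special + 1))
```

For large n the recursion depth grows linearly with n, so a bottom-up table is the safer choice there.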
You can use two-dimensional dynamic programming:
dp[i,j] is the number of ways to reach floor i when exactly j Type-3 steps are used. Then
dp[i,j]=dp[i-1,j]+dp[i-2,j]+dp[i-3,j-1], where any term with a negative index counts as 0, and the initial values are dp[0,0]=1 and dp[0,j]=0 for j>0. You can build up first the dp[i,0] values, then the dp[i,1] values, etc. Or you can use a different order, as long as all necessary values are already computed. The answer for "at most k" is the sum of dp[n,j] over j=0..k.
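Following @LaszloLadanyi's approach, below is a code snippet in Python:
def solve(self, n, k):
    # dp[i][j]: number of ways to reach floor i using exactly j Type-3 moves
    dp = [[0 for _ in range(k + 1)] for _ in range(n + 1)]
    dp[0][0] = 1
    for j in range(k + 1):
        for i in range(1, n + 1):
            dp[i][j] += dp[i - 1][j]
            if i > 1:
                dp[i][j] += dp[i - 2][j]
            if i > 2 and j > 0:
                dp[i][j] += dp[i - 3][j - 1]
    # "at most k" Type-3 moves: sum over every exact count 0..k
    return sum(dp[n])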

"unique" crossover for genetic algorithm - TSP

I am creating a Genetic Algorithm to solve the Traveling Salesman Problem.
Currently, two 2D lists represent the two parents that need to be crossed:
path_1 = np.random.permutation(np.arange(12).reshape(6, 2))
path_2 = np.arange(12).reshape(6,2)
Suppose each element in the list represents an (x, y) coordinate on a cartesian plane, and the 2D list represents the path that the "traveling salesman" must take (from index 0 to index -1).
Since the TSP requires that all points are included in a path, the resulting child of this crossover must have no duplicate points.
I have little idea how to write such a crossover so that the resulting child is representative of both parents.
You need to use an ordered crossover operator, like OX1.
OX1 is a fairly simple permutation crossover.
Basically, a swath of consecutive alleles from parent 1 drops down,
and remaining values are placed in the child in the order which they
appear in parent 2.
I used to run TSP with these operators:
Crossover: Ordered Crossover (OX1)
Mutation: Reverse Sequence Mutation (RSM)
Selection: Roulette Wheel Selection
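As a rough illustration of the quoted description, the sketch below copies a slice of parent 1 into the child and fills the remaining positions, left to right, with the cities of parent 2 in the order they appear there. The function name and the explicit `start`/`end` cut points are assumptions for the example (a GA would normally draw them at random), and textbook OX1 variants often start filling after the second cut point and wrap around instead.

```python
def ordered_crossover(p1, p2, start, end):
    """Simplified OX1-style crossover for permutations of hashable cities."""
    child = [None] * len(p1)
    # The swath p1[start:end] drops down unchanged.
    child[start:end] = p1[start:end]
    taken = set(p1[start:end])
    # Remaining cities, in the order they appear in parent 2.
    fill = (city for city in p2 if city not in taken)
    for i in range(len(child)):
        if child[i] is None:
            child[i] = next(fill)
    return child


# Cities are represented by integer ids here; (x, y) tuples work the same way.
parent_1 = [0, 1, 2, 3, 4, 5]
parent_2 = [5, 3, 1, 0, 4, 2]
child = ordered_crossover(parent_1, parent_2, 2, 4)
```

Because every city is taken either from the slice or from the leftover cities of parent 2, the child is always a permutation with no duplicates.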
You can do something like this:
Choose half (or any random number between 0 and (length - 1)) of the coordinates from one parent using any approach, let's say where i % 2 == 0.
These can be positioned in the child using multiple approaches: randomly, all at the start (or end), or at alternating positions.
The remaining coordinates need to come from the 2nd parent: traverse the 2nd parent and, if a coordinate has not been chosen yet, add it to one of the empty spaces.
For example,
I am choosing the even-positioned coordinates from parent 1 and putting them at the even indices of the child, and then traversing parent 2 to put the remaining coordinates at the odd indices of the child.
def crossover(p1, p2, number_of_cities):
    chk = {}
    for i in range(number_of_cities):
        chk[i] = 0
    child = [-1] * number_of_cities
    for x in range(len(p1)):
        if x % 2 == 0:
            child[x] = p1[x]
            chk[p1[x]] = 1
    y = 1
    for x in range(len(p2)):
        if chk[p2[x]] == 0:
            child[y] = p2[x]
            y += 2
    return child
This approach preserves the order of cities visited from both parents.
Also, since the operation is not symmetric, p1 and p2 can be switched to give 2 children, and the better one (or both) can be chosen.

Searching through a multi-branch graph and returning a path [C#]

In my situation I have territory objects. Each territory knows which other territories it is connected to through an array. Here is a visualization of said territories as they would appear on a map:
If you were to map out the connections on a graph, they would look like this:
So say I have a unit stationed in territory [b] and I want to move it to territory [e], I'm looking for a method of searching through this graph and returning a final array that represents the path my unit in territory [b] must take. In this scenario, I would be looking for it to return
[b, e].
If I wanted to go from territory [a] to territory [f] then it would return:
[a, b, e, f].
I would love examples, but even just posts pointing me in the right direction are appreciated. Thanks in advance! :)
Have you heard of Breadth-First Search (BFS) before?
Basically, you simply put your initial territory, "a" in your example, into an otherwise empty queue Q.
The second data structure you need is an array of booleans with as many elements as you have territories, in this case 9. It helps with remembering which territories we have already checked. We call it V (for "visited"). It needs to be initialized as follows: all elements equal false except the one corresponding to the initial territory. That is, for all territories t, we have V[t] = false, but V[a] = true because "a" is already in the queue.
The third and final data structure you need is an array to store the parent nodes (i.e. which node we are coming from). It also has as many elements as you have territories. We call it P (for "parent") and every element points to itself initially, that is for all t in P, set P[t] = t.
Then, it is quite simple:
while Q is not empty:
    t = front element in the queue (remove it also from Q)
    if t = f we can break from while loop //because we have found the goal
    for all neighbors s of t for which V[s] = false:
        add s into the back of Q //we have never explored the territory s yet as V[s] = false
        set V[s] = true //we do not have to visit s again in the future
        //we found s because there is a connection from t to s
        //therefore, we need to remember that in s we are coming from the node t
        //to do this we simply set the parent of s to t:
        P[s] = t
How do you read the solution now?
Simply check the parent of f, then the parent of that and then the parent of that and so on until you find the beginning. You will know what the beginning is once you have found an element which has itself as the parent (remember that we let them point to itself initially) or you can also just compare it to a.
Basically, you just need an empty list L, add f into it and then
while f != a:
    f = P[f]
    add f into L
Note that this obviously fails if there exists no path because f will never equal a.
Therefore, this:
while f != P[f]:
    f = P[f]
    add f into L
is a bit nicer. It exploits the fact that initially all territories point to themselves in P.
If you try this on paper with your example above, then you will end up with
L = [f, e, b, a]
If you simply reverse this list, then you have what you wanted.
I don't know C#, so I didn't bother to use C# syntax. I assume that you know it is easiest to index your territories with integers and then use an array to access them.
You will realize quite quickly why this works. It's called breadth-first search because you consider only neighbors of the territory "a" first, with trivially shortest path to them (only 1 edge) and only once you processed all these, then territories that are further away will appear in the queue (only 2 edges from the start now) and so on. This is why we use a queue for this task and not something like a stack.
Also, this is linear in the number of territories and edges because you only need to look at every territory and edge (at most) once (though edges from both directions).
The algorithm I have given to you is basically the same as https://en.wikipedia.org/wiki/Breadth-first_search with only the P data structure added to keep track where you are coming from to be able to figure out the path taken.
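For reference, the pseudocode above could look roughly like this in Python (not C#). The adjacency map is hypothetical, since the original pictures are not available, and a dictionary of parents stands in for both the V and P arrays: a territory counts as visited once it has a parent entry.

```python
from collections import deque


def find_path(neighbors, start, goal):
    """Breadth-first search that returns the path from start to goal, or None."""
    parent = {start: start}  # the start points to itself, as in the text
    queue = deque([start])
    while queue:
        t = queue.popleft()
        if t == goal:
            break
        for s in neighbors[t]:
            if s not in parent:  # s has not been visited yet
                parent[s] = t    # remember where we came from
                queue.append(s)
    if goal not in parent:
        return None              # the goal was never reached
    # Walk the parents back from the goal, then reverse.
    path = [goal]
    while path[-1] != parent[path[-1]]:
        path.append(parent[path[-1]])
    path.reverse()
    return path


# Hypothetical connections, chosen only to reproduce the two example paths.
territories = {
    "a": ["b", "c"],
    "b": ["a", "e"],
    "c": ["a", "d"],
    "d": ["c"],
    "e": ["b", "f"],
    "f": ["e"],
    "g": [],
}
```

Returning None when the goal has no parent entry covers the "no path exists" case mentioned earlier.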

HyperLogLog intersection: why not use min?

When doing a union between two compatible HyperLogLog objects, you can just take the maximum bucket to do a lossless union that doesn't introduce any new error:
Union.Bucket[i] = Max(A.Bucket[i], B.Bucket[i])
When doing an intersection though, you have to use the inclusion-exclusion principle:
IntersectionCountEstimate = A.CountEstimate() + B.CountEstimate() - Union.CountEstimate()
Why is it that using the minimum bucket value doesn't work as an effective intersection?
Intersection.Bucket[i] = Min(A.Bucket[i], B.Bucket[i])
The cause is that the relationship between two instances of the HyperLogLog statistic is not very intuitive:
Consider two corresponding buckets A[i] and B[i] from separate HyperLogLog structures A and B (which have the same number of buckets and use the same hash function), and for simplicity's sake assume the data in A and in B are independently drawn from the same distribution. Let's assume we first draw all the elements for A, and only then draw elements for B.
For every element we observe reaching B[i], what is the probability that it is in the intersection of A and B, i.e. what is the probability that it is already in A[i]? Well, that depends: how "full" is A[i]? If A[i] is completely "full" (i.e., A[i] "contains" ALL the elements from the distribution which can reach A[i]), then the probability is 1. In that case, the cardinality of the intersection of A[i] and B[i] would indeed be the cardinality of B[i]. However, it is almost NEVER the case that A[i] is "full", so the intersection is MUCH SMALLER than the cardinality of B[i].
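A small experiment can make this tangible. The sketch below is a toy HyperLogLog, not a production one: the 64-bucket size, the SHA-256 based hash, and the raw estimator without range corrections are all choices made for the demo. It builds registers for two disjoint sets, so the true intersection is empty, and compares the bucket-wise max and min.

```python
import hashlib

BUCKET_BITS = 6                 # 2**6 = 64 buckets (assumed demo size)
NUM_BUCKETS = 1 << BUCKET_BITS
REST_BITS = 64 - BUCKET_BITS


def hash64(item):
    """Deterministic 64-bit hash derived from SHA-256."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def registers(items):
    """Build the HyperLogLog registers for an iterable of items."""
    regs = [0] * NUM_BUCKETS
    for item in items:
        h = hash64(item)
        bucket = h >> REST_BITS               # top bits choose the bucket
        rest = h & ((1 << REST_BITS) - 1)     # remaining bits give the rank
        rank = REST_BITS - rest.bit_length() + 1  # leading zeros + 1
        regs[bucket] = max(regs[bucket], rank)
    return regs


def raw_estimate(regs):
    """Raw HyperLogLog estimate (no small/large range corrections)."""
    alpha = 0.709  # bias constant for 64 buckets
    return alpha * NUM_BUCKETS ** 2 / sum(2.0 ** -r for r in regs)


set_a = range(0, 10000)
set_b = range(10000, 20000)     # disjoint from set_a: true intersection is 0
regs_a, regs_b = registers(set_a), registers(set_b)

union_regs = [max(x, y) for x, y in zip(regs_a, regs_b)]
min_regs = [min(x, y) for x, y in zip(regs_a, regs_b)]
min_estimate = raw_estimate(min_regs)
```

The max registers are exactly the registers of the combined set, which is why the union is lossless. The min registers stay well above zero even though no element is shared, so `min_estimate` is far from the true intersection size of 0.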

Find the cross node for number of nodes in ArangoDB?

I have a number of nodes connected through an intermediate node of another type, as in the picture. There can be multiple middle nodes. I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D, it should return node E (4 links) followed by node F (3 links). Is this possible? If not, maybe it can be done using multiple requests? I was thinking about using the SHORTEST_PATH function, but it seems it can only find paths between nodes from the same collection?
Very nice question, it challenged the AQL part of my brain ;)
Good news: it is totally possible with only one query utilizing GRAPH_COMMON_NEIGHBORS and a portion of math.
Common neighbors will count, for every pair of your selected vertices, which cross is the connecting component (taking ordering into account, so A-E-B is different from B-E-A). By combinatorics, a cross connected to a of your vertices shows up in a*(a-1)=c ordered pairs, where c is the computed count. Solving this quadratic for a with the p-q formula gives a = 0.5 + SQRT(0.25 + c), the number of vertices from your set that are connected to the cross.
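The inversion can be checked with a few lines of Python. This only illustrates the arithmetic, and the helper name `connected_count` is made up for the example.

```python
import math


def connected_count(ordered_pairs):
    """Solve a*(a-1) = c for the positive root a."""
    return 0.5 + math.sqrt(0.25 + ordered_pairs)


# A cross linked to 4 vertices appears in 4*3 = 12 ordered pairs,
# one linked to 3 vertices appears in 3*2 = 6.
links_e = connected_count(12)
links_f = connected_count(6)
```

Here `links_e` comes out as 4.0 and `links_f` as 3.0, matching the link counts expected for E and F in the question.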
If the type of vertex is encoded in an attribute of the vertex object
the resulting AQL looks like this:
FOR x in (
  (
    let nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    for n in GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes)
      for f in VALUES(n)
        for s in VALUES(f)
          for candidate in s
            filter candidate.type == "cross"
            collect crosses = candidate._key into counter
            return {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
sort x.connections DESC
return x
If you put the crosses in a different collection and filter by collection name, the query becomes even more efficient, because we do not need to open any vertices that are not of type cross at all.
FOR x in (
  (
    let nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    for n in GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes,
        {"vertexCollectionRestriction": "crosses"}, {"vertexCollectionRestriction": "crosses"})
      for f in VALUES(n)
        for s in VALUES(f)
          for candidate in s
            collect crosses = candidate._key into counter
            return {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
sort x.connections DESC
return x
Both queries will yield this result on your dataset:
[
  {
    "crosses": "E",
    "connections": 4
  },
  {
    "crosses": "F",
    "connections": 3
  }
]
