UniqueVertices: path or global? - arangodb

What is the difference between UniqueVertices: Path and UniqueVertices: Global?
According to the ArangoDB documentation:
“path” – it is guaranteed that there is no path returned with a duplicate vertex
“global” – it is guaranteed that each vertex is visited at most once during the traversal, no matter how many paths lead from the start vertex to this one. If you start with a min depth > 1 a vertex that was found before min depth might not be returned at all (it still might be part of a path). Note: Using this configuration the result is not deterministic any more. If there are multiple paths from startVertex to vertex, one of those is picked. It is required to set bfs: true because with depth-first search the results would be unpredictable.
What does UniqueVertices global actually do? What does it entail that the vertex is visited at most once during the traversal?

Consider this graph with 5 vertices which contains a cycle B -> D -> E -> B:
An outbound traversal without uniqueness restrictions will run into this loop and visit some of the vertices and edges multiple times on a single path, until the maximum traversal depth is reached:
FOR v,e,p IN 1..10 OUTBOUND "vert/A" edge
OPTIONS { uniqueVertices: "none", uniqueEdges: "none" }
RETURN CONCAT_SEPARATOR(" --> ", p.vertices[*]._key)
[
"A --> B",
"A --> B --> C",
"A --> B --> D",
"A --> B --> D --> E",
"A --> B --> D --> E --> B",
"A --> B --> D --> E --> B --> C",
"A --> B --> D --> E --> B --> D",
"A --> B --> D --> E --> B --> D --> E",
"A --> B --> D --> E --> B --> D --> E --> B",
"A --> B --> D --> E --> B --> D --> E --> B --> C",
"A --> B --> D --> E --> B --> D --> E --> B --> D",
"A --> B --> D --> E --> B --> D --> E --> B --> D --> E",
"A --> B --> D --> E --> B --> D --> E --> B --> D --> E --> B"
]
By default, uniqueVertices is "none" and uniqueEdges is "path". These options prevent the same edge from being followed twice on one path. This stops the traversal from entering the cycle a second time (via the edge from B to D), but the last two paths include B twice nonetheless:
FOR v,e,p IN 1..10 OUTBOUND "vert/A" edge
OPTIONS { uniqueVertices: "none", uniqueEdges: "path" }
RETURN CONCAT_SEPARATOR(" --> ", p.vertices[*]._key)
[
"A --> B",
"A --> B --> C",
"A --> B --> D",
"A --> B --> D --> E",
"A --> B --> D --> E --> B",
"A --> B --> D --> E --> B --> C"
]
By restricting the uniqueness to "path" for both vertices and edges, no vertex or edge will be visited twice per path. The edge from E to B is followed, because it's the first time that this edge is encountered on the path (whether uniqueEdges is "none" or "path" does not matter with this particular graph). But the vertex B was already visited in the beginning (A --> B --> D), so the traversal stops there and does not include B a second time:
FOR v,e,p IN 1..10 OUTBOUND "vert/A" edge
OPTIONS { uniqueVertices: "path", uniqueEdges: "path" }
RETURN CONCAT_SEPARATOR(" --> ", p.vertices[*]._key)
[
"A --> B",
"A --> B --> C",
"A --> B --> D",
"A --> B --> D --> E"
]
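The three result sets above can be reproduced with a small simulation. This is only a Python sketch of the per-path uniqueness rules, not ArangoDB's implementation: the function name `traverse`, the edge list, and the edge order are assumptions, with the edge list reconstructed from the cycle B -> D -> E -> B and the paths shown. Edges are identified by their (source, target) pair, so parallel edges are not modelled.

```python
# Assumed edge list for the 5-vertex example graph (reconstructed from the results above).
CYCLE_EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"), ("E", "B")]


def traverse(edges, start, max_depth, unique_vertices="none", unique_edges="path"):
    """Enumerate outbound paths depth-first, applying per-path uniqueness rules."""
    outgoing = {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)

    results = []

    def walk(path_vertices, path_edges):
        # Stop extending once the path already has max_depth edges.
        if len(path_edges) >= max_depth:
            return
        current = path_vertices[-1]
        for nxt in outgoing.get(current, []):
            edge = (current, nxt)
            if unique_edges == "path" and edge in path_edges:
                continue  # this edge was already followed on this path
            if unique_vertices == "path" and nxt in path_vertices:
                continue  # this vertex already occurs on this path
            results.append(" --> ".join(path_vertices + [nxt]))
            walk(path_vertices + [nxt], path_edges + [edge])

    walk([start], [])
    return results


no_limits = traverse(CYCLE_EDGES, "A", 10, unique_vertices="none", unique_edges="none")
edges_only = traverse(CYCLE_EDGES, "A", 10, unique_vertices="none", unique_edges="path")
both = traverse(CYCLE_EDGES, "A", 10, unique_vertices="path", unique_edges="path")

print(len(no_limits))   # 13 paths, the cycle is walked until depth 10
print(edges_only[-1])   # A --> B --> D --> E --> B --> C
print(both)             # 4 paths, B is never repeated
```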
A different example graph is required to show the effect of uniqueVertices: "global", with 4 vertices in a diamond shape:
An outbound traversal starting at F leads via G and H to the same vertex I. No vertex or edge can possibly be visited twice, but each discovered path may include I:
FOR v,e,p IN 1..10 OUTBOUND "vert/F" edge
OPTIONS { bfs: true, uniqueVertices: "path", uniqueEdges: "path" }
RETURN CONCAT_SEPARATOR(" --> ", p.vertices[*]._key)
[
"F --> G",
"F --> H",
"F --> G --> I",
"F --> H --> I"
]
If we now change the uniqueness for vertices to "global", only one path can end in I. Which route is taken (via G or H) is undefined:
FOR v,e,p IN 1..10 OUTBOUND "vert/F" edge
OPTIONS { bfs: true, uniqueVertices: "global", uniqueEdges: "path" }
RETURN CONCAT_SEPARATOR(" --> ", p.vertices[*]._key)
[
"F --> G",
"F --> H",
"F --> G --> I"
]
Note that uniqueVertices: "global" requires bfs: true (breadth-first search). While it doesn't make the traversal deterministic, it does make it more predictable. With a depth-first search, it would follow down a path until the end (or until the maximum depth is reached) before other edges at the same depth would be followed. The traversal would then be stopped early on some paths if they contained an already visited vertex, but the order in which paths are discovered is undefined and that could result in different sets of vertices being returned for the same query. A breadth-first search on the other hand ensures that the same set of vertices is returned, even though the edges followed may vary between runs.
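The diamond example can be simulated in the same spirit. The following Python sketch assumes the edges F -> G, F -> H, G -> I, H -> I and a hypothetical helper `bfs_paths`; it is not how ArangoDB is implemented. In this sketch the edge order is fixed, so the route to I is always via G, whereas in ArangoDB the chosen route is undefined.

```python
from collections import deque

# Assumed edge list for the diamond-shaped example graph.
DIAMOND_EDGES = [("F", "G"), ("F", "H"), ("G", "I"), ("H", "I")]


def bfs_paths(edges, start, max_depth, unique_vertices="path"):
    """Enumerate outbound paths breadth-first with "path" or "global" vertex uniqueness."""
    outgoing = {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)

    visited = {start}            # only consulted for "global"
    queue = deque([[start]])     # each queue entry is a path (list of vertices)
    results = []
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue
        for nxt in outgoing.get(path[-1], []):
            if unique_vertices == "path" and nxt in path:
                continue
            if unique_vertices == "global":
                if nxt in visited:
                    continue     # vertex was already reached by some other path
                visited.add(nxt)
            results.append(" --> ".join(path + [nxt]))
            queue.append(path + [nxt])
    return results


per_path = bfs_paths(DIAMOND_EDGES, "F", 10, unique_vertices="path")
globally = bfs_paths(DIAMOND_EDGES, "F", 10, unique_vertices="global")

print(per_path)   # both F --> G --> I and F --> H --> I are returned
print(globally)   # only one path ends in I
```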


How to get all vertices in ArangoDB graph traversal without max?

I want to get all vertices from a graph, as shown below:
a -> b -> c -> d -> e -> f -> ...
b2 -> c2 -> cd -> ....
I want to get all vertices, and use syntax like:
[WITH vertexCollection1[, vertexCollection2[, ...vertexCollectionN]]]
FOR vertex[, edge[, path]]
IN [min[..max]]
OUTBOUND|INBOUND|ANY startVertex
GRAPH graphName
[PRUNE pruneCondition]
[OPTIONS options]
As you can see, I have to define the value of max first (IN [min[..max]]). How can I get all vertices without providing a value for max when the depth is unknown?
This isn't possible: a max value must either be provided, or it will default to the min value (which in turn defaults to 1).
There's nothing to stop you from setting this to an arbitrarily large number though (e.g. FOR v IN 1..99999999) - you'll get all vertices as long as max exceeds your max depth.

Prolog ways to compare variables

I was trying to implement some graph algorithms in Prolog. I came up with an idea to use unification to build a tree from the graph structure:
The graph would be defined as follows:
A list of Vertex-Variable pairs where Vertex is a constant representing the vertex and Variable is a corresponding variable, that would be used as a "reference" to the vertex. e.g.:
[a-A, b-B, c-C, d-D]
A list of VertexVar-NeighboursList pairs, where the VertexVar and the individual neighbours in the NeighboursList are the "reference variables". e.g.:
[A-[B, C, D], B-[A, C], C-[A, B], D-[A]] meaning b, c, d are neighbours of a etc.
Then before some graph algorithm (like searching for components, or simple DFS/BFS etc.) that could use some kind of tree built from the original graph, one could use some predicate like unify_neighbours that unifies the VertexVar-NeighbourList pairs as VertexVar = NeighboursList. After that, the vertex variables may be interpreted as lists of its neighbours, where each neighbour is again a list of its neighbours.
So this would result in good performance when traversing the graph, as there is no need for a linear search for some vertex and its neighbours for every vertex in the graph.
But my problem is: How to compare those vertex variables? (To check if they're the same.) I tried to use A == B, but there are some conflicts. For the example above, (with the unify_neighbours predicate) Prolog interprets the graph internally as:
[a-[S_1, S_2, S_3], b-S_1, c-S_2, d-S_3]
where:
S_1 = [[S_1, S_2, S_3], S_2]
S_2 = [[S_1, S_2, S_3], S_1]
S_3 = [[S_1, S_2, S_3]]
The problem is with S_1 and S_2 (aka b and c) as X = [something, Y], Y = [something, X], X == Y is true. The same problem would be with vertices, that share the same neighbours. e.g. U-[A, B] and V-[A, B].
So my question is: Is there any other way to compare variables, that could help me with this? Something that compares "the variables themselves", not the content, like comparing addresses in procedural programming languages? Or would that be too procedural and break the declarative idea of Prolog?
Example
graph_component(Vertices, Neighbours, C) :-
% Vertices and Neighbours as explained above.
% C is some component found in the graph.
vertices_refs(Vertices, Refs),
% Refs are only the variables from the pairs.
unify_neighbours(Neighbours), % As explained above.
rec_(Vertices, Refs, [], C).
rec_(Vertices, Refs, Found, RFound) :-
% Vertices as before.
% Refs is a stack of the vertex variables to search.
% Found are the vertices found so far.
% RFound is the resulting component found.
[Ref|RRest] = Refs,
vertices_pair(Vertices, Vertex-Ref),
% Vertex is the corresponding Vertex for the Ref variable
not(member(Vertex, Found)),
% Go deep:
rec_(Vertices, Ref, [Vertex|Found], DFound),
list_revpush_result([Vertex|Found], DFound, Found1),
% Go wide:
rec_(Vertices, RRest, Found1, RFound).
rec_(Vertices, Refs, Found, []) :-
% End of recursion.
[Ref|_] = Refs,
vertices_pair(Vertices, Vertex-Ref),
member(Vertex, Found).
This example doesn't really work, but it's the idea. (Also, checking whether the vertices were found is done linearly, so the performance is still not good, but it's just for demonstration.) Now the predicate, that finds the corresponding vertex for the variable is implemented as:
vertices_pair([Vertex-Ref|_], Vertex-Ref).
vertices_pair([_-OtherRef|Rest], Vertex-Ref) :-
Ref \== OtherRef,
vertices_pair(Rest, Vertex-Ref).
where the \== operator is not really what I want and it creates those conflicts.
It is an intrinsic feature of Prolog that, once you have bound a variable to a term, it becomes indistinguishable from the term itself. In other words, if you bind two variables to the same term, you have two identical things, and there is no way to tell them apart.
Applied to your example: once you have unified every vertex-variable with the corresponding neighbours-list, all the variables are gone: you are left simply with a nested (and most likely circular) data structure, consisting of a list of lists of lists...
But as you suggest, the nested structure is an attractive idea because it gives you direct access to adjacent nodes. And although Prolog systems vary somewhat in how well they support circular data structures, this need not stop you from exploiting this idea.
The only problem with your design is that a node is identified purely by the (potentially deeply nested and circular) data structure that describes the sub-graph that is reachable from it. This has the consequence that
two nodes that have the same descendants are indistinguishable
it can be very expensive to check whether two "similar looking" sub-graphs are identical or not
A simple way around that is to include a unique node identifier (such as a name or number) in your data structure. To use your example (slightly modified to make it more interesting):
make_graph(Graph) :-
Graph = [A,B,C,D],
A = node(a, [C,D]),
B = node(b, [A,C]),
C = node(c, [A,B]),
D = node(d, [A]).
You can then use that identifier to check for matching nodes, e.g. in a depth-first traversal:
dfs_visit_nodes([], Seen, Seen).
dfs_visit_nodes([node(Id,Children)|Nodes], Seen1, Seen) :-
( member(Id, Seen1) ->
Seen2 = Seen1
;
writeln(visiting(Id)),
dfs_visit_nodes(Children, [Id|Seen1], Seen2)
),
dfs_visit_nodes(Nodes, Seen2, Seen).
Sample run:
?- make_graph(G), dfs_visit_nodes(G, [], Seen).
visiting(a)
visiting(c)
visiting(b)
visiting(d)
G = [...]
Seen = [d, b, c, a]
Yes (0.00s cpu)
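For readers more used to imperative languages, the same idea (nodes that hold direct references to their neighbours, plus an explicit identifier used for the "seen" check) can be sketched in Python. This is only an illustrative port under assumed names (`Node`, `ident`, `children`), not part of the Prolog solution. The list it returns is in visiting order, whereas the Prolog Seen list above is built in reverse.

```python
class Node:
    """A graph node with an explicit identifier and direct references to neighbours."""

    def __init__(self, ident):
        self.ident = ident
        self.children = []


def make_graph():
    # Same shape as the Prolog make_graph/1 above.
    a, b, c, d = (Node(name) for name in "abcd")
    a.children = [c, d]
    b.children = [a, c]
    c.children = [a, b]
    d.children = [a]
    return [a, b, c, d]


def dfs_visit_nodes(nodes, seen):
    """Depth-first visit; nodes are compared by identifier, never by structure."""
    for node in nodes:
        if node.ident in seen:
            continue
        seen.append(node.ident)
        dfs_visit_nodes(node.children, seen)
    return seen


visit_order = dfs_visit_nodes(make_graph(), [])
print(visit_order)  # ['a', 'c', 'b', 'd']
```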
Thanks, @jschimpf, for the answer. It clarified a lot of things for me. I just got back to some graph problems with Prolog and thought I'd give this recursive data structure another try and came up with the following predicates to construct this data structure from a list of edges:
The "manual" creation of the data structure, as proposed by @jschimpf:
my_graph(Nodes) :-
Vars = [A, B, C, D, E, F],
Nodes = [
node(a, [edgeTo(1, B), edgeTo(5, D)]),
node(b, [edgeTo(1, A), edgeTo(4, E), edgeTo(2, C)]),
node(c, [edgeTo(2, B), edgeTo(6, F)]),
node(d, [edgeTo(5, A), edgeTo(3, E)]),
node(e, [edgeTo(3, D), edgeTo(4, B), edgeTo(1, F)]),
node(f, [edgeTo(1, E), edgeTo(6, C)])
],
Vars = Nodes.
Where edgeTo(Weight, VertexVar) represents an edge to some vertex with a weight associated with it. The weight is just to show that this can be customized for any additional information. node(Vertex, [edgeTo(Weight, VertexVar), ...]) represents a vertex with its neighbours.
A more "user-friendly" input format:
[edge(Weight, FromVertex, ToVertex), ...]
With optional list of vertices:
[Vertex, ...]
For the example above:
[edge(1, a, b), edge(5, a, d), edge(2, b, c), edge(4, b, e), edge(6, c, f), edge(3, d, e), edge(1, e, f)]
This list can be converted to the recursive data structure with the following predicates:
% make_directed_graph(+Edges, -Nodes)
make_directed_graph(Edges, Nodes) :-
vertices(Edges, Vertices),
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
nodes(Pairs, Edges, Nodes),
Vars = Nodes.
% make_graph(+Edges, -Nodes)
make_graph(Edges, Nodes) :-
vertices(Edges, Vertices),
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
directed(Edges, DirectedEdges),
nodes(Pairs, DirectedEdges, Nodes),
Vars = Nodes.
% make_graph(+Vertices, +Edges, -Nodes)
make_graph(Vertices, Edges, Nodes) :-
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
directed(Edges, DirectedEdges),
nodes(Pairs, DirectedEdges, Nodes),
Vars = Nodes.
% make_directed_graph(+Vertices, +Edges, -Nodes)
make_directed_graph(Vertices, Edges, Nodes) :-
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
nodes(Pairs, Edges, Nodes),
Vars = Nodes.
The binary versions of these predicates assume that every vertex can be obtained from the list of edges alone, i.e. that there are no "edge-less" vertices in the graph. The ternary versions take an additional list of vertices for exactly these cases.
make_directed_graph assumes the input edges to be directed, make_graph assumes them to be undirected, so it creates additional directed edges in the opposite direction:
% directed(+UndirectedEdges, -DirectedEdges)
directed([], []).
directed([edge(W, A, B)|UndirectedRest], [edge(W, A, B), edge(W, B, A)|DirectedRest]) :-
directed(UndirectedRest, DirectedRest).
To get all the vertices from the list of edges:
% vertices(+Edges, -Vertices)
vertices([], []).
vertices([edge(_, A, B)|EdgesRest], [A, B|VerticesRest]) :-
vertices(EdgesRest, VerticesRest),
\+ member(A, VerticesRest),
\+ member(B, VerticesRest).
vertices([edge(_, A, B)|EdgesRest], [A|VerticesRest]) :-
vertices(EdgesRest, VerticesRest),
\+ member(A, VerticesRest),
member(B, VerticesRest).
vertices([edge(_, A, B)|EdgesRest], [B|VerticesRest]) :-
vertices(EdgesRest, VerticesRest),
member(A, VerticesRest),
\+ member(B, VerticesRest).
vertices([edge(_, A, B)|EdgesRest], VerticesRest) :-
vertices(EdgesRest, VerticesRest),
member(A, VerticesRest),
member(B, VerticesRest).
To construct uninitialized variables for every vertex:
% vars(+List, -Vars)
vars([], []).
vars([_|ListRest], [_|VarsRest]) :-
vars(ListRest, VarsRest).
To pair up vertices and vertex variables:
% pairs(+ListA, +ListB, -Pairs)
pairs([], [], []).
pairs([AFirst|ARest], [BFirst|BRest], [AFirst-BFirst|PairsRest]) :-
pairs(ARest, BRest, PairsRest).
To construct the recursive nodes:
% nodes(+Pairs, +Edges, -Nodes)
nodes(Pairs, [], Nodes) :-
init_nodes(Pairs, Nodes).
nodes(Pairs, [EdgesFirst|EdgesRest], Nodes) :-
nodes(Pairs, EdgesRest, Nodes0),
insert_edge(Pairs, EdgesFirst, Nodes0, Nodes).
First, a list of empty nodes for every vertex is initialized:
% init_nodes(+Pairs, -EmptyNodes)
init_nodes([], []).
init_nodes([Vertex-_|PairsRest], [node(Vertex, [])|NodesRest]) :-
init_nodes(PairsRest, NodesRest).
Then the edges are inserted one by one:
% insert_edge(+Pairs, +Edge, +Nodes, -ResultingNodes)
insert_edge(Pairs, edge(W, A, B), [], [node(A, [edgeTo(W, BVar)])]) :-
vertex_var(Pairs, B, BVar).
insert_edge(Pairs, edge(W, A, B), [node(A, EdgesTo)|NodesRest], [node(A, [edgeTo(W, BVar)|EdgesTo])|NodesRest]) :-
vertex_var(Pairs, B, BVar).
insert_edge(Pairs, edge(W, A, B), [node(X, EdgesTo)|NodesRest], [node(X, EdgesTo)|ResultingNodes]) :-
A \= X,
insert_edge(Pairs, edge(W, A, B), NodesRest, ResultingNodes).
To get a vertex variable for a given vertex: (This actually works in both directions.)
% vertex_var(+Pairs, +Vertex, -Var)
vertex_var(Pairs, Vertex, Var) :-
member(Vertex-Var, Pairs).
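As a rough cross-check of what these predicates are meant to build, here is a Python sketch of the undirected construction. It is an analogue under assumed names (`Node`, `edges_to`, `make_graph`), not a translation of the Prolog code: object references play the role of the vertex variables, and a dictionary replaces the pairs list.

```python
class Node:
    """A vertex with a list of (weight, Node) entries, mirroring node/2 and edgeTo/2."""

    def __init__(self, vertex):
        self.vertex = vertex
        self.edges_to = []


def make_graph(edges, vertices=()):
    """Build nodes from (weight, from, to) triples, treating edges as undirected."""
    nodes = {v: Node(v) for v in vertices}  # optional edge-less vertices
    for weight, a, b in edges:
        node_a = nodes.setdefault(a, Node(a))
        node_b = nodes.setdefault(b, Node(b))
        node_a.edges_to.append((weight, node_b))
        node_b.edges_to.append((weight, node_a))  # opposite direction
    return nodes


# The example edge list from above.
EDGES = [(1, "a", "b"), (5, "a", "d"), (2, "b", "c"), (4, "b", "e"),
         (6, "c", "f"), (3, "d", "e"), (1, "e", "f")]

graph = make_graph(EDGES)
b_neighbours = sorted((w, n.vertex) for w, n in graph["b"].edges_to)
print(b_neighbours)  # [(1, 'a'), (2, 'c'), (4, 'e')]
```

Each neighbour entry is a reference to the node object itself, so following an edge does not require a lookup.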
This, of course, brings additional time overhead, but you can do this once and then just copy this data structure every time you need to perform some graph algorithm on it and access neighbours in constant time.
You can also add additional information to the node predicate. For example:
node(Vertex, Neighbours, OrderingVar)
Where the uninitialized variable OrderingVar can be "assigned" (initialized) in constant time with information about the vertex' position in a partial ordering of the graph, for example. So this may be used as output. (As sometimes denoted by +- in Prolog comments - an uninitialized variable as a part of an input term, that is yet to be initialized by the used predicate and provides output.)

What does heuristic h: {1, ... , N} --> R mean?

I want to know what a heuristic h: {1, ... , N} --> R, with the goal state always being 1, means.
The states are represented as points in a 2D Cartesian system, with coordinates (x,y).
Wikipedia describes this notation of a function. In your specific case:
h: {1, ..., N} --> R
we have:
h: The function's symbol (h for heuristic)
{1, ..., N}: the domain of your function, in this case the set of all integers from 1 up to and including N. This is the ''input'' that your function can take. Note that this means that your function h(x) is not, for example, defined for x = 1.5. It can only take integers between 1 and N (both inclusive) as input.
R: The codomain of your function, in this case the set R which is probably supposed to denote the set of all real numbers. Your function can produce any real number as output.
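To make the notation concrete, here is a small Python sketch of one possible such function. Everything specific in it is an assumption for illustration: the coordinate table `COORDS`, the choice of N = 3, and the use of straight-line (Euclidean) distance to the goal state 1 as the heuristic. The question does not say which heuristic is intended.

```python
import math

# Hypothetical coordinates for states 1..N (here N = 3); state 1 is the goal.
COORDS = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}
GOAL = 1


def h(state):
    """Map a state in {1, ..., N} to a real number (its distance to the goal)."""
    if state not in COORDS:
        raise ValueError(f"h is not defined for {state!r}")
    x, y = COORDS[state]
    gx, gy = COORDS[GOAL]
    return math.hypot(x - gx, y - gy)


print(h(1))  # 0.0
print(h(2))  # 5.0
```

Calling h(1.5) raises a ValueError here, which mirrors the point that the function is only defined on the integers 1 to N.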

ArangoDB: Get every node, which is in any way related to a selected node

I have a simple node-links graph in ArangoDB. How can I traverse from 1 preselected node and return all nodes which are related to it?
For example:
A→B, B→C, C→D, C→E, F→B, F→E
Selecting any of them should return the same result (all of them).
I am very new to ArangoDB.
What you need is AQL graph traversal, available since ArangoDB 2.8. Older versions provided a set of graph-related functions, but native AQL traversal is faster, more flexible and the graph functions are no longer available starting with 3.0.
AQL traversal lets you follow edges connected to a start vertex, up to a variable depth. Each encountered vertex can be accessed, e.g. for filtering or to construct a result, as well as the edge that led you to this vertex and the full path from start to finish, including both vertices and edges.
In your case, only the names of the visited vertices need to be returned. You can run the following AQL queries, assuming there's a document collection node and an edge collection links and they contain the data for this graph:
// follow edges ("links" collection) in outbound direction, starting at A
FOR v IN OUTBOUND "node/A" links
// return the key (node name) for every vertex we see
RETURN v._key
This will only return [ "B" ], because the traversal depth is implicitly 1..1 (min=1, max=1). If we increase the max depth, then we can include nodes that are indirectly connected as well:
FOR v IN 1..10 OUTBOUND "node/A" links
RETURN v._key
This will give us [ "B", "C", "D", "E"]. If we look at the graph, this is correct: we only follow edges that point from the vertex we come from to another vertex (direction of the arrow). To do the reverse, we could use INBOUND, but in your case, we want to ignore the direction of the edge and follow anyway:
FOR v IN 1..10 ANY "node/A" links
RETURN v._key
The result might be a bit surprising at first:
[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]
We see duplicate nodes returned. The reason is that there are multiple paths from A to C for instance (via B and also via B-F-E), and the query returns the last node of every path as variable v. (It doesn't actually process all possible paths up to the maximum depth of 10, but you could set the traversal option OPTIONS {uniqueEdges: "none"} to do so.)
It can help to return formatted traversal paths to better understand what is going on (i.e. how nodes are reached):
FOR v, e, p IN 1..10 ANY "node/A" links OPTIONS {uniqueEdges: "path"}
RETURN CONCAT_SEPARATOR(" - ", p.vertices[*]._key)
Result:
[
"A - B",
"A - B - C",
"A - B - C - D",
"A - B - C - E",
"A - B - C - E - F",
"A - B - C - E - F - B",
"A - B - F",
"A - B - F - E",
"A - B - F - E - C",
"A - B - F - E - C - D",
"A - B - F - E - C - B"
]
There is a cycle in the graph, but there can't be an infinite loop because the maximum depth is exceeded after 10 hops. But as you can see above, it doesn't even reach the depth of 10, it rather stops because the (default) option is to not follow edges twice per path (uniqueEdges: "path").
Anyway, this is not the desired result. A cheap trick would be to use RETURN DISTINCT, COLLECT or something like that to remove duplicates. But we are better off tweaking the traversal options, to not follow edges unnecessarily.
uniqueEdges: "global" would still include the B node twice, but uniqueVertices: "global" gives the desired result. In addition, bfs: true for breadth-first search can be used in this case. The difference is that the path to the F node is shorter (A-B-F instead of A-B-C-E-F). In general, the exact options you should use largely depend on the dataset and the questions you have.
There's one more problem to solve: the traversal does not include the start vertex (other than in p.vertices[0] for every path). This can easily be solved using ArangoDB 3.0 or later by setting the minimum depth to 0:
FOR v IN 0..10 ANY "node/A" links OPTIONS {uniqueVertices: "global"}
RETURN v._key
[ "A", "B", "C", "D", "E", "F" ]
To verify that all nodes from A through F are returned, regardless of the start vertex, we can issue the following test query:
FOR doc IN node
RETURN (
FOR v IN 0..10 ANY doc links OPTIONS {uniqueVertices: "global"}
SORT v._key
RETURN v._key
)
All sub-arrays should look the same. Remove the SORT operation if you want the node names returned in traversal order. Hope this helps =)
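The expected outcome of that test query can also be checked outside the database. The Python sketch below assumes the six links from the question and a hypothetical helper `reachable_any`; it mimics a breadth-first ANY traversal with globally unique vertices and a minimum depth of 0, without a maximum depth, and is not ArangoDB's implementation.

```python
from collections import deque

# The links from the question: A→B, B→C, C→D, C→E, F→B, F→E.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("F", "B"), ("F", "E")]


def reachable_any(links, start):
    """Return every vertex reachable from start when edge direction is ignored."""
    neighbours = {}
    for src, dst in links:
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, []).append(src)  # ANY: follow both directions

    seen = [start]              # depth 0 includes the start vertex itself
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in seen:  # each vertex is visited at most once
                seen.append(nxt)
                queue.append(nxt)
    return seen


per_start = {start: sorted(reachable_any(LINKS, start)) for start in "ABCDEF"}
print(per_start["A"])  # ['A', 'B', 'C', 'D', 'E', 'F']
```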

Is Transactional Locking 2 algorithm serializable?

Consider two threads A and B
A.readset intersects with B.writeset
B.readset NOT intersect with A.writeset
A.writeset NOT intersect with B.writeset
They commit at the same time: A.lock --> A.validation --> B.lock --> B.validation --> (A B installs updates)
Is this not serializable because B may overwrite A's reads before A commits?
It is serializable because the value written to Transaction A's writeset depends on the cached value of A's readset, which was confirmed during validation. The overwriting of A's readset by B does not affect the cached values of A's readset, which A's writes are based on. The values written to Transaction A's writeset are exactly the same values that would have been written if transaction A ran to completion before transaction B started, so it's serializable.
EXAMPLE
We have a transactional memory consisting of 3 variables X, Y, Z = X1, Y1, Z1
Transaction A reads X and writes Y with a value that depends on X (X + P)
Transaction B reads Z and writes X with a value that depends on Z (Z + Q)
SERIALIZED EXECUTION
Transaction A: locks Y, validates X = X1.
Transaction A: sets Y = X1 + P and commits.
Transaction B: locks X, validates Z = Z1.
Transaction B: sets X = Z1 + Q and commits
Final result: (X, Y, Z) = (Z1 + Q, X1 + P, Z1)
INTERLEAVED EXECUTION
Transaction A: locks Y, validates X = X1.
Transaction B: locks X, validates Z = Z1.
Transaction B: sets X = Z1 + Q and commits (writes A's readset X before A commits)
Transaction A: sets Y = X1 + P and commits (uses cached value of X not the latest value)
Final result: (X, Y, Z) = (Z1 + Q, X1 + P, Z1) (same result as serial execution)
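The two schedules can be replayed with plain assignments to confirm that they end in the same state. This Python sketch only models the order of reads and writes from the example; it does not model TL2's locks, version clocks, or validation, and the concrete numbers are arbitrary assumptions.

```python
def serial(x1, y1, z1, p, q):
    """A runs to completion, then B."""
    mem = {"X": x1, "Y": y1, "Z": z1}
    a_cached_x = mem["X"]          # A reads X
    mem["Y"] = a_cached_x + p      # A writes Y and commits
    b_cached_z = mem["Z"]          # B reads Z
    mem["X"] = b_cached_z + q      # B writes X and commits
    return mem["X"], mem["Y"], mem["Z"]


def interleaved(x1, y1, z1, p, q):
    """B installs its write to X before A installs its write to Y."""
    mem = {"X": x1, "Y": y1, "Z": z1}
    a_cached_x = mem["X"]          # A reads X (validated as X1)
    b_cached_z = mem["Z"]          # B reads Z (validated as Z1)
    mem["X"] = b_cached_z + q      # B commits, overwriting A's readset
    mem["Y"] = a_cached_x + p      # A commits using its cached value of X
    return mem["X"], mem["Y"], mem["Z"]


# Arbitrary sample values for X1, Y1, Z1, P, Q.
print(serial(10, 20, 30, 1, 2))       # (32, 11, 30)
print(interleaved(10, 20, 30, 1, 2))  # (32, 11, 30)
```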
