I was trying to implement some graph algorithms in Prolog. I came up with an idea to use unification to build a tree from the graph structure:
The graph would be defined as follows:
A list of Vertex-Variable pairs where Vertex is a constant representing the vertex and Variable is a corresponding variable, that would be used as a "reference" to the vertex. e.g.:
[a-A, b-B, c-C, d-D]
A list of VertexVar-NeighboursList pairs, where the VertexVar and the individual neighbours in the NeighboursList are the "reference variables". e.g.:
[A-[B, C, D], B-[A, C], C-[A, B], D-[A]] meaning b, c, d are neighbours of a etc.
Then before some graph algorithm (like searching for components, or simple DFS/BFS etc.) that could use some kind of tree built from the original graph, one could use some predicate like unify_neighbours that unifies the VertexVar-NeighbourList pairs as VertexVar = NeighboursList. After that, the vertex variables may be interpreted as lists of its neighbours, where each neighbour is again a list of its neighbours.
So this would result in a good performance when traversing the graph, as there is no need in linear search for some vertex and its neighbours for every vertex in the graph.
But my problem is: How to compare those vertex variables? (To check if they're the same.) I tried to use A == B, but there are some conflicts. For the example above, (with the unify_neighbours predicate) Prolog interprets the graph internally as:
[a-[S_1, S_2, S_3], b-S_1, c-S_2, d-S_3]
where:
S_1 = [[S_1, S_2, S_3], S_2]
S_2 = [[S_1, S_2, S_3], S_1]
S_3 = [[S_1, S_2, S_3]]
The problem is with S_1 and S_2 (aka b and c) as X = [something, Y], Y = [something, X], X == Y is true. The same problem would be with vertices, that share the same neighbours. e.g. U-[A, B] and V-[A, B].
So my question is: Is there any other way to compare variables, that could help me with this? Something that compares "the variables themselves", not the content, like comparing addresses in procedural programming languages? Or would that be too procedural and break the declarative idea of Prolog?
Example
graph_component(Vertices, Neighbours, C) :-
% Vertices and Neighbours as explained above.
% C is some component found in the graph.
vertices_refs(Vertices, Refs),
% Refs are only the variables from the pairs.
unify_neighbours(Neighbours), % As explained above.
rec_(Vertices, Refs, [], C).
rec_(Vertices, Refs, Found, RFound) :-
% Vertices as before.
% Refs is a stack of the vertex variables to search.
% Found are the vertices found so far.
% RFound is the resulting component found.
[Ref|RRest] = Refs,
vertices_pair(Vertices, Vertex-Ref),
% Vertex is the corresponding Vertex for the Ref variable
not(member(Vertex, Found)),
% Go deep:
rec_(Vertices, Ref, [Vertex|Found], DFound),
list_revpush_result([Vertex|Found], DFound, Found1),
% Go wide:
rec_(Vertices, RRest, Found1, RFound).
rec_(Vertices, Refs, Found, []) :-
% End of reccursion.
[Ref|_] = Refs,
vertices_pair(Vertices, Vertex-Ref),
member(Vertex, Found).
This example doesn't really work, but it's the idea. (Also, checking whether the vertices were found is done linearly, so the performance is still not good, but it's just for demonstration.) Now the predicate, that finds the corresponding vertex for the variable is implemented as:
vertices_pair([Vertex-Ref|_], Vertex-Ref).
vertices_pair([_-OtherRef|Rest], Vertex-Ref) :-
Ref \== OtherRef,
vertices_pair(Rest, Vertex-Ref).
where the \== operator is not really what I want and it creates those conflicts.
It is an intrinsic feature of Prolog that, once you have bound a variable to a term, it becomes indistinguishable from the term itself. In other words, if you bind two variables to the same term, you have two identical things, and there is no way to tell them apart.
Applied to your example: once you have unified every vertex-variable with the corresponding neighbours-list, all the variables are gone: you are left simply with a nested (and most likely circular) data structure, consisting of a list of lists of lists...
But as you suggest, the nested structure is an attractive idea because it gives you direct access to adjacent nodes. And although Prolog system vary somewhat in how well they support circular data structures, this need not stop you from exploiting this idea.
The only problem with your design is that a node is identified purely by the (potentially deeply nested and circular) data structure that describes the sub-graph that is reachable from it. This has the consequence that
two nodes that have the same descendants are indistinguishable
it can be very expensive to check whether two "similar looking" sub-graphs are identical or not
A simple way around that is to include a unique node identifier (such as a name or number) in your data structure. To use your example (slightly modified to make it more interesting):
make_graph(Graph) :-
Graph = [A,B,C,D],
A = node(a, [C,D]),
B = node(b, [A,C]),
C = node(c, [A,B]),
D = node(d, [A]).
You can then use that identifier to check for matching nodes, e.g. in a depth-first traversal:
dfs_visit_nodes([], Seen, Seen).
dfs_visit_nodes([node(Id,Children)|Nodes], Seen1, Seen) :-
( member(Id, Seen1) ->
Seen2 = Seen1
;
writeln(visiting(Id)),
dfs_visit_nodes(Children, [Id|Seen1], Seen2)
),
dfs_visit_nodes(Nodes, Seen2, Seen).
Sample run:
?- make_graph(G), dfs_visit_nodes(G, [], Seen).
visiting(a)
visiting(c)
visiting(b)
visiting(d)
G = [...]
Seen = [d, b, c, a]
Yes (0.00s cpu)
Thanks, #jschimpf, for the answer. It clarified a lot of things for me. I just got back to some graph problems with Prolog and thought I'd give this recursive data structure another try and came up with the following predicates to construct this data structure from a list of edges:
The "manual" creation of the data structure, as proposed by #jschimpf:
my_graph(Nodes) :-
Vars = [A, B, C, D, E],
Nodes = [
node(a, [edgeTo(1, B), edgeTo(5, D)]),
node(b, [edgeTo(1, A), edgeTo(4, E), edgeTo(2, C)]),
node(c, [edgeTo(2, B), edgeTo(6, F)]),
node(d, [edgeTo(5, A), edgeTo(3, E)]),
node(e, [edgeTo(3, D), edgeTo(4, B), edgeTo(1, F)]),
node(e, [edgeTo(1, E), edgeTo(6, C)])
],
Vars = Nodes.
Where edgeTo(Weight, VertexVar) represents an edge to some vertex with a weight assosiated with it. The weight is just to show that this can be customized for any additional information. node(Vertex, [edgeTo(Weight, VertexVar), ...]) represents a vertex with its neighbours.
A more "user-friendly" input format:
[edge(Weight, FromVertex, ToVertex), ...]
With optional list of vertices:
[Vertex, ...]
For the example above:
[edge(1, a, b), edge(5, a, d), edge(2, b, c), edge(4, b, e), edge(6, c, f), edge(3, d, e), edge(1, e, f)]
This list can be converted to the recursive data structure with the following predicates:
% make_directed_graph(+Edges, -Nodes)
make_directed_graph(Edges, Nodes) :-
vertices(Edges, Vertices),
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
nodes(Pairs, Edges, Nodes),
Vars = Nodes.
% make_graph(+Edges, -Nodes)
make_graph(Edges, Nodes) :-
vertices(Edges, Vertices),
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
directed(Edges, DiretedEdges),
nodes(Pairs, DiretedEdges, Nodes),
Vars = Nodes.
% make_graph(+Edges, -Nodes)
make_graph(Edges, Nodes) :-
vertices(Edges, Vertices),
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
directed(Edges, DiretedEdges),
nodes(Pairs, DiretedEdges, Nodes),
Vars = Nodes.
% make_directed_graph(+Vertices, +Edges, -Nodes)
make_directed_graph(Vertices, Edges, Nodes) :-
vars(Vertices, Vars),
pairs(Vertices, Vars, Pairs),
nodes(Pairs, Edges, Nodes),
Vars = Nodes.
The binary versions of these predicates assume, that every vertex can be obtained from the list of edges only - There are no "edge-less" vertices in the graph. The ternary versions take an additional list of vertices for exactly these cases.
make_directed_graph assumes the input edges to be directed, make_graph assumes them to be undirected, so it creates additional directed edges in the opposite direction:
% directed(+UndirectedEdges, -DiretedEdges)
directed([], []).
directed([edge(W, A, B)|UndirectedRest], [edge(W, A, B), edge(W, B, A)|DirectedRest]) :-
directed(UndirectedRest, DirectedRest).
To get all the vertices from the list of edges:
% vertices(+Edges, -Vertices)
vertices([], []).
vertices([edge(_, A, B)|EdgesRest], [A, B|VerticesRest]) :-
vertices(EdgesRest, VerticesRest),
\+ member(A, VerticesRest),
\+ member(B, VerticesRest).
vertices([edge(_, A, B)|EdgesRest], [A|VerticesRest]) :-
vertices(EdgesRest, VerticesRest),
\+ member(A, VerticesRest),
member(B, VerticesRest).
vertices([edge(_, A, B)|EdgesRest], [B|VerticesRest]) :-
vertices(EdgesRest, VerticesRest),
member(A, VerticesRest),
\+ member(B, VerticesRest).
vertices([edge(_, A, B)|EdgesRest], VerticesRest) :-
vertices(EdgesRest, VerticesRest),
member(A, VerticesRest),
member(B, VerticesRest).
To construct uninitialized variables for every vertex:
% vars(+List, -Vars)
vars([], []).
vars([_|ListRest], [_|VarsRest]) :-
vars(ListRest, VarsRest).
To pair up verticies and vertex variables:
% pairs(+ListA, +ListB, -Pairs)
pairs([], [], []).
pairs([AFirst|ARest], [BFirst|BRest], [AFirst-BFirst|PairsRest]) :-
pairs(ARest, BRest, PairsRest).
To construct the recursive nodes:
% nodes(+Pairs, +Edges, -Nodes)
nodes(Pairs, [], Nodes) :-
init_nodes(Pairs, Nodes).
nodes(Pairs, [EdgesFirst|EdgesRest], Nodes) :-
nodes(Pairs, EdgesRest, Nodes0),
insert_edge(Pairs, EdgesFirst, Nodes0, Nodes).
First, a list of empty nodes for every vertex is initialized:
% init_nodes(+Pairs, -EmptyNodes)
init_nodes([], []).
init_nodes([Vertex-_|PairsRest], [node(Vertex, [])|NodesRest]) :-
init_nodes(PairsRest, NodesRest).
Then the edges are inserted one by one:
% insert_edge(+Pairs, +Edge, +Nodes, -ResultingNodes)
insert_edge(Pairs, edge(W, A, B), [], [node(A, [edgeTo(W, BVar)])]) :-
vertex_var(Pairs, B, BVar).
insert_edge(Pairs, edge(W, A, B), [node(A, EdgesTo)|NodesRest], [node(A, [edgeTo(W, BVar)|EdgesTo])|NodesRest]) :-
vertex_var(Pairs, B, BVar).
insert_edge(Pairs, edge(W, A, B), [node(X, EdgesTo)|NodesRest], [node(X, EdgesTo)|ResultingNodes]) :-
A \= X,
insert_edge(Pairs, edge(W, A, B), NodesRest, ResultingNodes).
To get a vertex variable for a given vertex: (This actually works in both directions.)
% vertex_var(+Pairs, +Vertex, -Var)
vertex_var(Pairs, Vertex, Var) :-
member(Vertex-Var, Pairs).
```Prolog
This, of course, brings additional time overhead, but you can do this once and then just copy this data structure every time you need to perform some graph algorithm on it and access neighbours in constant time.
You can also add additional information to the `node` predicate. For example:
```Prolog
node(Vertex, Neighbours, OrderingVar)
Where the uninitialized variable OrderingVar can be "assigned" (initialized) in constant time with information about the vertex' position in a partial ordering of the graph, for example. So this may be used as output. (As sometimes denoted by +- in Prolog comments - an uninitialized variable as a part of an input term, that is yet to be initialized by the used predicate and provides output.)
I have a simple node-links graph in ArangoDB. How can I traverse from 1 preselected node and return all nodes which are related to it?
For example:
A→B, B→C, C→D, C→E, F→B, F→E
Selecting any of them should return the same result (all of them).
I am very new to ArangoDB.
What you need is AQL graph traversal, available since ArangoDB 2.8. Older versions provided a set of graph-related functions, but native AQL traversal is faster, more flexible and the graph functions are no longer available starting with 3.0.
AQL traversal let's you follow edges connected to a start vertex, up to a variable depth. Each encountered vertex can be accessed, e.g. for filtering or to construct a result, as well as the edge that led you to this vertex and the full path from start to finish including both, vertices and edges.
In your case, only the names of the visited vertices need to be returned. You can run the following AQL queries, assuming there's a document collection node and an edge collection links and they contain the data for this graph:
// follow edges ("links" collection) in outbound direction, starting at A
FOR v IN OUTBOUND "node/A" links
// return the key (node name) for every vertex we see
RETURN v._key
This will only return [ "B" ], because the traversal depth is implicitly 1..1 (min=1, max=1). If we increase the max depth, then we can include nodes that are indirectly connected as well:
FOR v IN 1..10 OUTBOUND "node/A" links
RETURN v._key
This will give us [ "B", "C", "D", "E"]. If we look at the graph, this is correct: we only follow edges that point from the vertex we come from to another vertex (direction of the arrow). To do the reverse, we could use INBOUND, but in your case, we want to ignore the direction of the edge and follow anyway:
FOR v IN 1..10 ANY "node/A" links
RETURN v._key
The result might be a bit surprising at first:
[ "B", "C", "D", "E", "F", "B", "F", "E", "C", "D", "B" ]
We see duplicate nodes returned. The reason is that there are multiple paths from A to C for instance (via B and also via B-F-E), and the query returns the last node of every path as variable v. (It doesn't actually process all possible paths up to the maximum depth of 10, but you could set the traversal option OPTIONS {uniqueEdges: "none"} to do so.)
It can help to return formatted traversal paths to better understand what is going on (i.e. how nodes are reached):
FOR v, e, p IN 1..10 ANY "node/A" links OPTIONS {uniqueEdges: "path"}
RETURN CONCAT_SEPARATOR(" - ", p.vertices[*]._key)
Result:
[
"A - B",
"A - B - C",
"A - B - C - D",
"A - B - C - E",
"A - B - C - E - F",
"A - B - C - E - F - B",
"A - B - F",
"A - B - F - E",
"A - B - F - E - C",
"A - B - F - E - C - D",
"A - B - F - E - C - B"
]
There is a cycle in the graph, but there can't be an infinite loop because the maximum depth is exceeded after 10 hops. But as you can see above, it doesn't even reach the depth of 10, it rather stops because the (default) option is to not follow edges twice per path (uniqueEdges: "path").
Anyway, this is not the desired result. A cheap trick would be to use RETURN DISTINCT, COLLECT or something like that to remove duplicates. But we are better off tweaking the traversal options, to not follow edges unnecessarily.
uniqueEdges: "global" would still include the B node twice, but uniqueVertices: "global" gives the desired result. In addition, bfs: true for breadth-first search can be used in this case. The difference is that the path to the F node is shorter (A-B-F instead of A-B-C-E-F). In general, the exact options you should use largely depend on the dataset and the questions you have.
There's one more problem to solve: the traversal does not include the start vertex (other than in p.vertices[0] for every path). This can easily be solved using ArangoDB 3.0 or later by setting the minimum depth to 0:
FOR v IN 0..10 ANY "node/A" links OPTIONS {uniqueVertices: "global"}
RETURN v._key
[ "A", "B", "C", "D", "E", "F" ]
To verify that all nodes from A through F are returned, regardless of the start vertex, we can issue the following test query:
FOR doc IN node
RETURN (
FOR v IN 0..10 ANY doc links OPTIONS {uniqueVertices: "global"}
SORT v._key
RETURN v._key
)
All sub-arrays should look the same. Remove the SORT operation if you want the node names returned in traversal order. Hope this helps =)