How can I get all possible paths between two vertices (e.g. X and Y) with maxDepth = 2?
I tried TRAVERSAL, but it takes around 10 seconds to execute. Here is the query:
FOR p IN TRAVERSAL(locations, connections, "X", "outbound", { minDepth: 1, maxDepth: 2, paths: true })
FILTER p.destination._key == "Y"
RETURN p.path.vertices[*].name
The locations (vertices) collection has 23753 documents, and the connections (edges) collection has 123414 documents.
You can speed up the query considerably by moving the destination filter into TRAVERSAL itself. The filterVertices option takes examples of the vertices the traversal should match, and vertexFilterMethod defines what happens to all vertices that do not match an example.
In your query you only want to match the target vertex "Y"; all other vertices should be passed through but not included in the result, which is what the exclude method does.
This makes the later FILTER obsolete.
Right now the internal optimizer is not able to do that automagically but this magic is on our roadmap.
This is a query containing the optimization:
FOR p IN TRAVERSAL(locations, connections, "X", "outbound", { minDepth: 1, maxDepth: 2, paths: true, filterVertices: [{_key: "Y"}], vertexFilterMethod: ["exclude"]})
RETURN p.path.vertices[*].name
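To make the idea concrete, here is a minimal Python sketch of what such a traversal computes: every vertex is still expanded, but only paths ending at the target are collected. It is not how ArangoDB implements TRAVERSAL; the function name paths_up_to_depth and the tiny edge list are made up for illustration.

```python
from collections import defaultdict

def paths_up_to_depth(edges, start, target, max_depth=2):
    """Collect outbound paths of length 1..max_depth from start that end at target."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)

    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        depth = len(path) - 1
        # Only paths ending at the target are emitted ("exclude" for all others) ...
        if depth >= 1 and path[-1] == target:
            found.append(path)
        # ... but every vertex is still passed through until max_depth is reached.
        if depth < max_depth:
            for nxt in out[path[-1]]:
                stack.append(path + [nxt])
    return found

# Hypothetical sample data, not taken from the question's collections.
edges = [("X", "A"), ("X", "Y"), ("A", "Y"), ("A", "B"), ("B", "Y")]
result = sorted(paths_up_to_depth(edges, "X", "Y"))
print(result)
```

The path X -> A -> B -> Y is not reported because it needs three hops, which exceeds the depth limit of 2.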
Been stuck for days with this concern, trying to accomplish this:
See the provided picture.
The black is the start vertex. Trying to get:
1: All child parts OUTBOUND (from) the start vertex
2: Condition: The children MUST have an INBOUND edge "types" whose other end is a document of the type "type" with a variable set to "true".
3: When a document of type "part" fails to meet this requirement (an INBOUND document of type "type" with an attribute set to "true"), the expansion of that path stops then and there.
4: The documents that failed are not included in the result either.
5: Should be compatible with any depths.
6: No subqueries (if possible without).
Example of graph
With the given information, the data model seems questionable. Why are there true and false vertices instead of a boolean edge attribute per partScrew? Is there a reason why it is modeled like this?
Using this data model, I don't see how this would be possible without subqueries. The traversal down a path can be stopped early with PRUNE, but that does not support subqueries. That only leaves FILTER for post-filtering as option, but be careful, you need to check all vertices on the path and not just the emitted vertex whether it has an inbound false type.
Not sure if it works as expected in all cases, but here is what I came up with and the query result, which looks good to me:
LET startScrew = FIRST(FOR doc IN screw LIMIT 1 RETURN doc) // Screw A
FOR v,e,p IN 1..2 OUTBOUND startScrew partScrew
  FILTER (
    FOR v_id IN SHIFT(p.vertices[*]._id) // ignore start vertex
      FOR v2 IN 1..1 INBOUND v_id types
        RETURN v2.value
  ) NONE == false
  RETURN {
    path: CONCAT_SEPARATOR(" -- ", p.vertices[*].value)
  }
path
Screw A -- Part D
Screw A -- Part E
Screw A -- Part E -- Part F
Dump with test data: https://gist.github.com/Simran-B/6bd9b154d1d1e2e74638caceff42c44f
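The post-filter logic of the query above can be sketched in Python as follows. This is only a model of the check, not of the AQL engine. The dictionaries children and type_values are hypothetical stand-ins for the partScrew and types edges; the actual test data lives in the linked dump and may differ.

```python
def valid_paths(children, type_values, start, max_depth=2):
    """Return paths whose vertices (start excluded) have no inbound type valued False."""
    results = []

    def walk(path):
        depth = len(path) - 1
        if depth >= 1:
            checked = path[1:]  # ignore the start vertex, like SHIFT() in the query
            # Every vertex on the path is checked, not just the last one.
            if all(False not in type_values.get(v, []) for v in checked):
                results.append(path)
        # Like FILTER (and unlike PRUNE), the walk continues below failing vertices.
        if depth < max_depth:
            for child in children.get(path[-1], []):
                walk(path + [child])

    walk([start])
    return results

# Hypothetical data: Part B has an inbound "false" type, so B and C drop out.
children = {
    "Screw A": ["Part B", "Part D", "Part E"],
    "Part B": ["Part C"],
    "Part E": ["Part F"],
}
type_values = {
    "Part B": [False],
    "Part C": [True],
    "Part D": [True],
    "Part E": [True],
    "Part F": [True],
}
results = valid_paths(children, type_values, "Screw A")
for p in results:
    print(" -- ".join(p))
```

Part C itself has a "true" type, but it is still rejected because Part B on its path fails the check. This is the reason all vertices on the path have to be inspected.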
Q: Can I limit the edge collections the system will try to use when traversing named graphs in AQL?
Scenario:
If I have a named graph productGraph with two vertex collections and two edge collections:
Vertices: product, price
Edges:
prodParentOf (product A is parent of product B)
prodHasPrice (product A has a price of $X)
If I now want the child products of product A (and no prices), I would like to do something like this:
WITH product
FOR v, e, p IN OUTBOUND 'product/A'
GRAPH 'productGraph'
RETURN {vertice:v, edge:e, path: p}
However, if I look at the explain plan, I see that the system attempted to use the indexes for both prodParentOf and prodHasPrice (even if I explicitly put the product collection in the 'With' clause):
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
2 edge prodHasPrice false false 75.00 % [ `_from`, `_to` ] base OUTBOUND
2 edge prodParentOf false false 65.37 % [ `_from`, `_to` ] base OUTBOUND
Can I limit the edge collections the system will try to use when querying named graphs? Or do I have to use edge collections in the query instead (which, in my mind, would mean that it is generally better to traverse edge collections than named graphs)?
Here is the same query using an edge collection
FOR v, e, p IN OUTBOUND 'product/A'
prodParentOf
RETURN {vertice:v, edge:e, path: p}
The WITH clause does not restrict which collections of your named graph will be used in a traversal. It is mainly for traversals in a cluster, to declare which collections will be involved. This helps to avoid deadlocks, which may occur if collections are lazily locked at query runtime.
If you use a single server instance, then the WITH clause is optional. It does not have an effect on the result. If you want to exclude collections from a traversal, you can either use collection sets instead of the named graph, or use FILTERs together with IS_SAME_COLLECTION(). Using collection sets is more efficient, because with fewer edge collections there are fewer edges to traverse, whereas filters are applied after the traversal in most cases.
FOR v, e, p IN 1..5 OUTBOUND 'verts/start' GRAPH 'named-graph'
FILTER (FOR id IN p.edges[*]._id RETURN IS_SAME_COLLECTION('edgesX', id)) ALL == true
RETURN p
If your traversal has a depth of 1 only, then a filter query is simpler:
FOR v, e, p IN OUTBOUND 'product/A' GRAPH 'productGraph'
FILTER IS_SAME_COLLECTION('prodParentOf', e)
RETURN {vertex: v, edge: e, path: p}
A way to prune paths may come in the future, which should also help with your named graph scenario.
I'm attempting to find all unique paths from one friend to another.
When I use uniqueVertices: 'global', it only returns one path, because the end vertex is treated as part of the globally unique set.
FOR v,e,p
IN 1..6
ANY "entities/foo"
GRAPH "friendGraph"
OPTIONS {
bfs: true,
uniqueVertices: 'path'
}
SORT e.weight ASC
FILTER v._id == "entities/bar"
RETURN p
Is there a way to have uniqueVertices: 'global' ignore the end vertices? I know there isn't a way to specifically do that. But is there a way to accomplish the same thing?
'path' resulted in way too many results.
Thank you.
In order to use globally unique vertices except for the last one, you could add the last step in the path manually like so:
FOR v,e,p
IN 0..5
ANY "entities/foo"
GRAPH "friendGraph"
OPTIONS {
bfs: true,
uniqueVertices: 'global'
}
FILTER p.vertices[*]._id ALL != "entities/bar"
FOR w,f
IN 1..1
ANY v
GRAPH "friendGraph"
FILTER w._id == "entities/bar"
SORT f.weight ASC
RETURN { edges: APPEND(p.edges, [f]), vertices: APPEND(p.vertices, [w]) }
I'd like to note two things:
The SORT operation you added might not achieve what you want: it sorts the paths by the weight of each path's last edge.
This does not find all unique paths between the two vertices. For that, the option uniqueVertices: 'path' would be correct, and there might well be a lot of them.
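The difference between the two uniqueness modes, and the effect of adding the last hop manually, can be seen in a small Python model. This is a sketch of the semantics only and not ArangoDB's traversal engine. The four-vertex graph and the helper names bfs_paths and with_manual_last_hop are invented for the illustration.

```python
from collections import deque

def bfs_paths(adj, start, max_depth, unique="path"):
    """Breadth-first enumeration of paths with 'path' or 'global' vertex uniqueness."""
    seen = {start}
    queue = deque([[start]])
    out = []
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            out.append(path)
        if len(path) - 1 == max_depth:
            continue
        for nxt in adj.get(path[-1], []):
            if unique == "global":
                if nxt in seen:      # a vertex is visited at most once overall
                    continue
                seen.add(nxt)
            elif nxt in path:        # a vertex may not repeat within one path
                continue
            queue.append(path + [nxt])
    return out

def with_manual_last_hop(adj, start, target, max_depth):
    """Global uniqueness for the prefix, then one extra hop to the target."""
    prefixes = [[start]] + [
        p for p in bfs_paths(adj, start, max_depth - 1, "global") if target not in p
    ]
    return [p + [target] for p in prefixes if target in adj.get(p[-1], [])]

# Hypothetical undirected friend graph: foo - a - bar and foo - b - bar.
adj = {"foo": ["a", "b"], "a": ["foo", "bar"], "b": ["foo", "bar"], "bar": ["a", "b"]}

global_hits = [p for p in bfs_paths(adj, "foo", 3, "global") if p[-1] == "bar"]
path_hits = [p for p in bfs_paths(adj, "foo", 3, "path") if p[-1] == "bar"]
manual_hits = with_manual_last_hop(adj, "foo", "bar", 3)
print(global_hits, path_hits, manual_hits)
```

With global uniqueness only one path reaches bar, because bar is marked as visited the first time it is seen. The manual last hop recovers the second path in this small graph, but in larger graphs it can still miss paths that share intermediate vertices, which is the caveat stated above.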
For example, I want to query exactly this graph, starting from Dave with a depth limit of 2.
Now, if I want to get the nodes connected to Dave within a depth of 2, I would use:
For v,c in 0..2
ANY "persons/dave" knows
OPTIONS {uniqueVertices: "global",bfs: true }
return v
This would return:
Dave-Bob-Charlie-Eve-Alice (everyone in the graph)
But I do not know how to write a query that returns the correct set of relations, meaning:
The Eve-to-Alice edge must not be missing.
If the graph is bigger, an edge from Alice to someone else should not be in the result.
My current solution below does not return Eve-to-Alice:
For v,c in 1..2
ANY "persons/dave" knows
OPTIONS {uniqueEdges: "global",bfs: true }
return c
In this case, Eve-to-Alice is a third level of traversal if you start at Dave. You can write the query:
for v, e, p in 1..3 ANY "persons/dave" knows options {uniqueEdges: "path",bfs: true}
return {vertex: v, edge: e, path: p}
This will give you every edge, including the one between Eve and Alice. Does this answer your question?
If you need to limit this to paths only between second level nodes, you need to create a filter.
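One way to think about such a filter is sketched below in Python: compute which vertices lie within two hops of the start, then keep only the edges whose two endpoints are both in that set. This is an illustration of the idea rather than an AQL solution. The edge list is an assumption modelled on the names in the question, and the alice-frank edge is added purely to show an edge that should be left out.

```python
from collections import deque

def edges_within(edges, start, max_depth):
    """Keep edges whose endpoints are both at most max_depth hops from start."""
    adj = {}
    for a, b in edges:  # treat edges as undirected, like ANY in the traversal
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    return [(a, b) for a, b in edges if a in dist and b in dist]

# Assumed sample edges; "frank" is hypothetical and lies three hops from dave.
edges = [
    ("alice", "bob"), ("bob", "charlie"), ("bob", "dave"),
    ("eve", "alice"), ("eve", "bob"), ("alice", "frank"),
]
kept = edges_within(edges, "dave", 2)
print(kept)
```

In this model the Eve-to-Alice edge is kept because both endpoints are two hops from Dave, while the edge from Alice to the more distant vertex is dropped.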
Let's say we have four documents with a tags field. It can contain multiple strings, let's say foo, bar and baz.
docA.tags = ['foo']
docB.tags = ['bar']
docC.tags = ['foo', 'bar']
docD.tags = ['foo', 'baz']
I query the docs using aggregations so I get the four documents and a list of three buckets with the count that matches the specific tag.
buckets = [
{key: 'bar', doc_count: 2}, // docB, docC
{key: 'foo', doc_count: 3}, // docA, docC, docD
{key: 'baz', doc_count: 1} // docD
]
If I now run another query and add one of those tags (let's say foo) as a terms filter to the query, I only get the docs (docA, docC, docD) that have this tag. That's what I want.
But I also get another list of possible aggregations with updated counts.
buckets = [
{key: 'bar', doc_count: 1}, // docC
{key: 'baz', doc_count: 1}, // docD
]
But these counts don't really match what's happening. They reflect the count of documents that match both of the tags, the one I selected in the first place (foo) AND the one of the bucket (bar or baz).
But if I then select a second tag – let's say baz – I get documents that have been tagged with foo OR baz. That's because I use the terms filter.
So what I really want is this
buckets = [
{key: 'bar', doc_count: 1}, //docB
{key: 'baz', doc_count: 0},
]
How can I make the counts appropriate? They should reflect the number of documents that would be added if I selected the second tag. An example of this is here.
I already tried post_filter, but that always gives me the first result. Then I tried a min_doc_count flag on the aggs, but this only shows me the combinations that would result in count=0.
I have a solution for this, but it seems pretty complicated to me. I would have to run another request for each aggregation where I invert the filter criteria. So in the example above I would have to query all docs that don't have the tag foo and match the rest of the query. The aggregation results would be exactly what I need.
It sounds like you're trying to do something a little atypical for facets/aggregations.
(However, it's not invalid... it makes a lot of sense to understand how the size of your selection will change through the application of a filter)
What I think you're asking for is:
Display results for: QUERY AND FILTER
Display term aggregation counts for: QUERY NOT FILTER
You mentioned you're doing subsequent request(s) for counts? You should be able to construct this aggregation request inside your main search request.
Structurally it's:
match: (QUERY) or match_all
aggregations:
  filter: { not: (FILTER) }
    aggregations: { terms: ... }
post_filter: (FILTER)
That post_filter is executed after the aggregations are calculated (but still applied to the search results) so your results will be what you expect.
The aggregations work in the scope of the search query alone. (The post_filter has not been applied yet.)
The filter aggregation excludes all documents matching FILTER from the search query results before the Terms Aggregation calculates the counts.
(giving you the left outside edge of the Venn shown above, but just for the counts)
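As a rough illustration, the structure above could be written as the request body below, followed by a small in-memory model that reproduces the expected counts for the four example documents. The aggregation names not_selected and tags_outside are made up, and expressing the not filter as bool with must_not is an assumption about the Elasticsearch version in use. The search function is only a simulation and does not talk to a cluster.

```python
from collections import Counter

# Hypothetical request body: hits are narrowed by post_filter, while the
# terms aggregation only sees documents that do NOT match the selected tag.
request_body = {
    "query": {"match_all": {}},
    "aggs": {
        "not_selected": {
            "filter": {"bool": {"must_not": {"terms": {"tags": ["foo"]}}}},
            "aggs": {"tags_outside": {"terms": {"field": "tags"}}},
        }
    },
    "post_filter": {"terms": {"tags": ["foo"]}},
}

# The four documents from the question.
docs = {
    "docA": ["foo"],
    "docB": ["bar"],
    "docC": ["foo", "bar"],
    "docD": ["foo", "baz"],
}

def search(docs, selected):
    """Simulate: hits = QUERY AND FILTER, bucket counts = QUERY NOT FILTER."""
    hits = sorted(d for d, tags in docs.items() if selected & set(tags))
    outside = [tags for tags in docs.values() if not (selected & set(tags))]
    counts = Counter(tag for tags in outside for tag in tags)
    return hits, counts

hits, counts = search(docs, {"foo"})
print(hits, dict(counts))
```

For the selection foo, the simulated hits are docA, docC and docD, and the only non-zero bucket is bar with a count of 1 (docB), which matches the counts asked for in the question.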