Take US cities for example and say I want the traversal of all cities and roads that go through NYC, Chicago and Seattle.
This can be done with the TRAVERSAL AQL function (using filterVertices). However, this function only takes the ID and not a vertex example, as GRAPH_TRAVERSAL does.
GRAPH_TRAVERSAL doesn't seem to have a filter option, so my question is: is there a way to filter the results using graph operations?
The feature is actually there but was somehow not documented. I added it to our documentation, which should be updated soon. Sorry for the inconvenience.
filterVertices takes a list of vertex examples.
Note: for more specific filtering, you can also give the name of a custom AQL function with the signature function(config, vertex, path).
vertexFilterMethod defines what should be done with all other vertices:
"prune" will not follow edges attached to these vertices. (Used here)
"exclude" will not include this specific vertex.
["prune", "exclude"] both of the above. (default)
An example query for your question is the following (airway is my graph):
FOR x in GRAPH_TRAVERSAL("airway", "a/SFO", "outbound", {filterVertices: [{_key: "SFO"}, {_key: "NYC"}, {name: "Chicago"}, {name: "Seattle"}], vertexFilterMethod: "prune"}) RETURN x
Hint: Make sure you include the start vertex in the filter as well. Otherwise the query will always return an empty array (the first visited vertex is pruned immediately).
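To make the difference between "prune" and "exclude" concrete, here is a minimal in-memory sketch in Python. It only models the two options as they are described above and is not ArangoDB's implementation; the toy graph, the DEN vertex and the helper names matches and traverse are assumptions made up for illustration.

```python
from collections import deque

def matches(vertex, examples):
    """True if the vertex carries all key/value pairs of at least one example."""
    return any(all(vertex.get(k) == v for k, v in ex.items()) for ex in examples)

def traverse(vertices, edges, start, examples, method=("prune", "exclude")):
    """Breadth-first outbound walk; `method` applies to non-matching vertices."""
    visited, seen, queue = [], {start}, deque([start])
    while queue:
        key = queue.popleft()
        ok = matches(vertices[key], examples)
        if ok or "exclude" not in method:
            visited.append(key)      # "exclude" drops non-matching vertices
        if not ok and "prune" in method:
            continue                 # "prune" does not follow their edges
        for nxt in edges.get(key, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited

# Hypothetical toy graph: SFO -> NYC -> SEA and SFO -> DEN -> CHI
vertices = {
    "SFO": {"_key": "SFO"}, "NYC": {"_key": "NYC"}, "DEN": {"_key": "DEN"},
    "SEA": {"_key": "SEA", "name": "Seattle"},
    "CHI": {"_key": "CHI", "name": "Chicago"},
}
edges = {"SFO": ["NYC", "DEN"], "NYC": ["SEA"], "DEN": ["CHI"]}
examples = [{"_key": "SFO"}, {"_key": "NYC"}, {"name": "Chicago"}, {"name": "Seattle"}]

pruned = traverse(vertices, edges, "SFO", examples, method=("prune",))
excluded = traverse(vertices, edges, "SFO", examples, method=("exclude",))
both = traverse(vertices, edges, "SFO", examples)
```

In this model, pruning stops at DEN so CHI is never reached, while excluding hides DEN but still walks through it to CHI.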
For some context: I am currently using Azure Cosmos DB with the Gremlin API. Because of its storage-scaling architecture, a '.out()' operation is much less expensive than a '.in()' operation, so I always create edges in both directions and choose which one to follow with '.out()' depending on the direction I want to query.
We use the graph to associate events with users. Whenever a user 'U' raises an event 'E', we create two edges:
g.V('U').addE('raisedEvent').to(g.V('E'))
g.V('E').addE('raisedByUser').to(g.V('U'))
Very rarely, one of these queries fails for one reason or another and we end up with only a single edge between the two vertices. I've been trying to find a way to query for all vertices that have only a uni-directional relationship given a set of 'paired' edge-labels, in order to find these errors and re-create the missing edge.
Basically I need a query where...
given a pair of edge labels E1 (for outgoing, V1-->V2), E2 (for incoming V1<--V2)
finds all vertices V1 where, for some outgoing edge E1 to another vertex V2, V2 doesn't have an edge E2 going back to V1; and vice versa
Example:
// given a graph
g.addV('user').property('id','user_1')
g.addV('user').property('id','user_2')
g.addV('user').property('id','user_3')
g.addV('user').property('id','user_4')
g.addV('event').property('id','event_1')
g.addV('event').property('id','event_2')
g.addV('event').property('id','event_3')
g.addV('event').property('id','event_4')
g.V('user_1').addE('raisedEvent').to(g.V('event_1')).V('event_1').addE('raisedByUser').to(g.V('user_1'))
g.V('user_2').addE('raisedEvent').to(g.V('event_2')).V('event_2').addE('raisedByUser').to(g.V('user_2'))
g.V('user_2').addE('raisedEvent').to(g.V('event_3'))
g.V('event_4').addE('raisedByUser').to(g.V('user_3'))
// i.e.
// (user_1) <--> (event_1)
// (event_2) <--> (user_2) ---> (event_3)
// (event_4) ---> (user_3)
// (user_4)
// Then, the query should match with user_2 and user_3...
// ...as they contain uni-directional links to events
Edit: Note that the Cosmos DB implementation of the 'is()' operation doesn't accept traversal results as input, i.e. queries such as
where(_.outE('raisedEvent').count().is(__.out('raisedEvent').outE('raisedByUser').count()))
are currently unsupported in Cosmos DB.
If possible, it would also be great to get a list of which pairs of vertices have a bad link (e.g. in this case [(user_2, event_3), (user_3, event_4)]), but just knowing which vertices have a bad link will be very useful already.
Thanks to Kelvin Lawrence, I ended up using this pattern to get a list of vertex id pairs that are only uni-directionally connected from a to b:
g.V().hasLabel("user").as('a').out('raisedEvent').where(__.not(out('raisedByUser').as('a'))).as('b').select('a','b').by('id')
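The same check can be sketched outside the database, for example on an exported edge list. The following is a minimal Python sketch, assuming edges are available as (source, label, target) tuples; the function name one_way_pairs is hypothetical, and the data mirrors the example graph from the question.

```python
def one_way_pairs(edges, forward, backward):
    """Return sorted (a, b) pairs with a -forward-> b but no b -backward-> a."""
    present = set(edges)
    return sorted(
        (a, b)
        for (a, label, b) in present
        if label == forward and (b, backward, a) not in present
    )

# Edge list matching the diagram in the question
edges = [
    ("user_1", "raisedEvent", "event_1"), ("event_1", "raisedByUser", "user_1"),
    ("user_2", "raisedEvent", "event_2"), ("event_2", "raisedByUser", "user_2"),
    ("user_2", "raisedEvent", "event_3"),
    ("event_4", "raisedByUser", "user_3"),
]

# Users whose event does not point back, and events whose user does not point back
missing_back = one_way_pairs(edges, "raisedEvent", "raisedByUser")
missing_forward = one_way_pairs(edges, "raisedByUser", "raisedEvent")
```

Swapping the two labels covers the opposite direction, which is the counterpart of running the Gremlin pattern a second time starting from the event vertices.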
I am new to Arango and I am trying to understand the 'right' way to write some queries. I read (https://www.arangodb.com/docs/stable/graphs-traversals-using-traversal-objects.html) and (http://jsteemann.github.io/blog/2015/01/28/using-custom-visitors-in-aql-graph-traversals/), since they always popped up when searching for what I am trying to do.
In particular, I have a graph where a given node has a single path (via a certain 'type' of edge) from that node to a leaf. Something like x -a-> y -a-> z, where a is the edge type and x, y, z are the nodes. These paths can be of arbitrary length. I would like to write an AQL query that returns the single 'longest' path from the starting node to the leaf node. I find that I always get every sub-path and would then have to do some post-processing. The traversal objects looked like they supplied a solution to this issue, but it seems they are now deprecated.
Is there a correct way to do this in AQL? Is there a document that shows how to do what steemann does in his article, but only using AQL? Is there some great AQL documentation on graph queries other than what is on the arangodb site (all of which I have already read, including the graph presentation and the udemy course)? If not, I would be happy to write something to share with the community, but I am not sure yet how to do this myself, so I'd need some pointers to material that can get me started.
Long story short, I'd just like to know how to run my query to find the path from node to leaf. However, I'd be happy to contribute once I see how things should be done without traversal objects. Thank you for your help.
Taking a traversal in OUTBOUND direction as an example, you can do a second traversal with depth = 1 to check if you reached a leaf node (no more outgoing edges).
Based on this information, the “short” paths can be filtered out.
Note that a second condition is required:
it is possible that the last node in a traversal is not a leaf node if the maximum traversal depth is exceeded.
Thus, you also need to let through paths that contain as many edges as the maximum number of hops in the traversal (here: 5).
LET maxDepth = 5
FOR v, e, p IN 1..maxDepth OUTBOUND "verts/s" edges
  LET next = (
    FOR vv, ee IN OUTBOUND v edges
      //FILTER ee != e // needed if traversing edges in ANY direction
      LIMIT 1 // optimization, no need to check for more than a single neighbor
      RETURN true // using a constant here, which is not actually used
  )
  FILTER LENGTH(next) == 0 || LENGTH(p.edges) == maxDepth
  RETURN CONCAT_SEPARATOR(" -> ", p.vertices[*]._key)
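The two filter conditions can be checked on a small in-memory graph. This is a minimal Python sketch of the same logic, not of how ArangoDB evaluates the query; the adjacency dict, the vertex names and the function leaf_paths are assumptions for illustration, and the graph is assumed to be tree-like.

```python
def leaf_paths(edges, start, max_depth):
    """Collect paths from start that end at a leaf or use exactly max_depth edges."""
    results = []

    def walk(path):
        children = edges.get(path[-1], [])
        depth = len(path) - 1  # number of edges on the path so far
        # Same two conditions as the FILTER: no next edge, or depth limit hit
        if depth >= 1 and (not children or depth == max_depth):
            results.append(" -> ".join(path))
        if depth < max_depth:
            for child in children:
                walk(path + [child])

    walk([start])
    return results

# Hypothetical outbound adjacency: s -> a -> c -> d and s -> b
edges = {"s": ["a", "b"], "a": ["c"], "c": ["d"], "b": []}

full = leaf_paths(edges, "s", 5)
cut = leaf_paths(edges, "s", 2)
```

With a depth limit of 2, the path through a and c is cut short but still reported, which is what the second condition is for.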
Cross-post from https://groups.google.com/g/arangodb/c/VAo_i_1UHbo/m/ByIUTqIfBAAJ
I'm looking for the right keywords/nomenclature for the following problem, since I cannot find anything on google to this topic:
I have a graph where each edge and each node is assigned to a certain class/color or whatever you call it. Now I want to find a path between a start and a goal node, subject to some constraints on the path. For example, I'd like to have as few "blue" nodes on the path as possible, or at most 2 "red" edges, or a combination of those things. Of course there are also the usual edge costs, which have to be minimized in addition to the fixed path constraints.
What is this kind of problem called, or what do I have to search for?
Best regards
Mark
I do not think that a name for such a general problem exists. However, I'm pretty certain you can re-model your graph and solve this problem via a simple Dijkstra search:
Trying to avoid a certain (type of) vertex: Say you have a vertex that is to be avoided, and that has k neighbors. Replace it by a K_k (i.e. a clique with k vertices), and connect each neighbor to one of the k new vertices. Then set the weight of all the clique edges to something large. Now every path passing over the original vertex will have to pass through the clique and "pay the fee", i.e. it will be avoided if possible.
Trying to avoid certain edges: Just raise their edge weight accordingly.
Then, run a simple Dijkstra search. If you have multiple classes that are to be avoided, you can even set the weights so as to determine priorities for avoiding each of them.
Hope that helps,
Lukas
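The re-weighting idea from the answer above can be sketched in Python. Note the substitution: instead of building the clique, this sketch charges a surcharge whenever a path enters a vertex of an avoided class, which has the same "pay the fee" effect on path costs. The graph, the class names and the function penalized_dijkstra are assumptions for illustration.

```python
import heapq

def penalized_dijkstra(edges, node_class, start, goal, edge_penalty, node_penalty):
    """Dijkstra where avoided classes add a surcharge to the plain edge cost."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost, cls in edges.get(u, []):
            # plain cost + fee for the edge class + fee for entering vertex v
            nd = d + cost + edge_penalty.get(cls, 0) + node_penalty.get(node_class.get(v), 0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical graph: the cheap route A-B-D passes through the "blue" vertex B
edges = {
    "A": [("B", 1, "plain"), ("C", 2, "plain")],
    "B": [("D", 1, "plain")],
    "C": [("D", 2, "plain")],
}
node_class = {"B": "blue"}

plain_route = penalized_dijkstra(edges, node_class, "A", "D", {}, {})
avoiding_blue = penalized_dijkstra(edges, node_class, "A", "D", {}, {"blue": 10})
```

Penalties like these only express a preference. A hard limit such as "at most 2 red edges" is not enforced by re-weighting alone.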
My graph has an edge of kind pm_child that forms a tree structure. Below is a picture showing an example tree:
When doing the following AQL command:
FOR v,e,p IN 1..50 OUTBOUND 'pmsite/482149696650' pm_child RETURN p
It returns only 14 of the 21 possible "project" kinds. This seems like a bug but I wanted to verify I didn't do something silly before I report it.
This question is a red herring. Arangojs introduced a default LIMIT value, which truncated the results. With some direction from ArangoDB support, I was able to construct a better way to pull out what I needed without the umpteen intermediate traversal results.
I have the following information in a Titan graph database. I am trying to make sense of it by sending queries from the Gremlin shell. The graph database that I am investigating models a network. There are two types of vertices:
- `Switch`
- `Port`
I am trying to figure out the relationship between these two types of vertices.
g = TitanFactory.open("/tmp/cassandra.titan")
To see the list of vertices of each type
$ g.V('type', 'switch')
==>v[228]
==>v[108]
==>v[124]
==>v[92]
==>v[156]
==>v[140]
$ g.V('type', 'port')
==>v[160]
==>v[120152]
==>v[164]
==>v[120156]
==>v[560104]
==>v[680020]
==>v[680040]
==>v[112]
==>v[120164]
==>v[560112]
==>v[680012]
==>v[680004]
==>v[144]
==>v[680032]
==>v[236]
==>v[100]
==>v[560128]
==>v[128]
==>v[680028]
==>v[232]
==>v[96]
To find the relation between the switch and port.
g.v(108).out
==>v[560104]
==>v[680004]
==>v[112]
What is this "out"? As I understand it, there is an outward arrow pointing from the switch represented by vertex 108 to the ports represented by vertices 560104, 680004 and 112.
What are in and out? Are they something specific to graph databases? Also, what is a label in a graph database? Are in and out labels?
The use of in and out is descriptive of the direction of the edge going from one vertex to another. In your case, you have this:
switch --> port
When you write:
g.v(108).out
you are telling Gremlin to find the vertex at 108, then walk along edges that point out or away from it. You might also think of out as starting from the tail of the arrow and walking to the head. Given your schema, those lead to "ports".
Similarly, in simply means to have Gremlin walk along edges that point in to the vertex. You might also think of in as starting from the head of the arrow and walking to the tail. Given your schema, switches will have no in edges and hence will always return no results. However if you were to start from a "port" vertex and traverse in:
g.v(560104).in
you would at least get back vertex 108 as vertex "560104" has at least one edge with an arrow pointing to it (given what I know of your sample data).
By now you've gathered that in and out are "directions" and not "labels". A label has a different purpose; it categorizes an edge. For example, you might have the following schema:
switch --connectsTo--> port
company --manufactures--> switch
switch --locatedIn--> rack
In other words you might have three edge labels representing different ways that a "switch" relates to other parts of your schema. In this way your queries can be more descriptive about what you want. Given your previous example and this revised schema you would have to write the following to get the same result you originally showed:
g.v(108).out("connectsTo")
==>v[560104]
==>v[680004]
==>v[112]
Graph databases will typically take advantage of these labels to help improve performance of queries.
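The distinction between directions and labels can be illustrated with a minimal Python sketch over a plain edge list. This is not Titan's or Gremlin's API; the functions out and in_ and the "acme" company vertex are hypothetical, while the vertex ids come from the sample output above.

```python
# Each edge is (tail, label, head): the arrow points from tail to head
edges = [
    (108, "connectsTo", 560104),
    (108, "connectsTo", 680004),
    (108, "connectsTo", 112),
    ("acme", "manufactures", 108),  # hypothetical company vertex
]

def out(vertex, label=None):
    """Follow edges pointing away from vertex, optionally only those with label."""
    return [head for tail, lbl, head in edges
            if tail == vertex and label in (None, lbl)]

def in_(vertex, label=None):
    """Follow edges pointing into vertex, optionally only those with label."""
    return [tail for tail, lbl, head in edges
            if head == vertex and label in (None, lbl)]

ports = out(108)                       # direction only
ports_by_label = out(108, "connectsTo")  # direction plus label
switches = in_(560104)
makers = in_(108, "manufactures")
```

The direction picks which end of the arrow to start from, and the label narrows down which arrows are followed.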