Labels, vertices and edges in TitanDB - Cassandra

I have the following information in a Titan graph database. I am trying to make sense of it by running queries in the Gremlin shell. The graph database that I am investigating models a network. There are two types of vertices:
- `Switch`
- `Port`
I am trying to figure out the relationship between these two types of vertices.
g = TitanFactory.open("/tmp/cassandra.titan")
To see the list of vertices of each type
$ g.V('type', 'switch')
==>v[228]
==>v[108]
==>v[124]
==>v[92]
==>v[156]
==>v[140]
$ g.V('type', 'port')
==>v[160]
==>v[120152]
==>v[164]
==>v[120156]
==>v[560104]
==>v[680020]
==>v[680040]
==>v[112]
==>v[120164]
==>v[560112]
==>v[680012]
==>v[680004]
==>v[144]
==>v[680032]
==>v[236]
==>v[100]
==>v[560128]
==>v[128]
==>v[680028]
==>v[232]
==>v[96]
To find the relation between the switch and port.
g.v(108).out
==>v[560104]
==>v[680004]
==>v[112]
What is this "out"? As I understand it, there is an outward arrow pointing from the switch represented by vertex 108 to the ports represented by vertices 560104, 680004 and 112.
What are in and out? Are they something specific to graph databases? Also, what is a label in a graph database? Are in and out labels?

The use of in and out is descriptive of the direction of the edge going from one vertex to another. In your case, you have this:
switch --> port
When you write:
g.v(108).out
you are telling Gremlin to find the vertex at 108, then walk along edges that point out or away from it. You might also think of out as starting from the tail of the arrow and walking to the head. Given your schema, those lead to "ports".
Similarly, in simply means to have Gremlin walk along edges that point in to the vertex. You might also think of in as starting from the head of the arrow and walking to the tail. Given your schema, switches have no incoming edges, so traversing in from a switch will always return no results. However, if you were to start from a "port" vertex and traverse in:
g.v(560104).in
you would at least get back vertex 108 as vertex "560104" has at least one edge with an arrow pointing to it (given what I know of your sample data).
By now you've gathered that in and out are "directions" and not "labels". A label has a different purpose; it categorizes an edge. For example, you might have the following schema:
switch --connectsTo--> port
company --manufactures--> switch
switch --locatedIn--> rack
In other words you might have three edge labels representing different ways that a "switch" relates to other parts of your schema. In this way your queries can be more descriptive about what you want. Given your previous example and this revised schema you would have to write the following to get the same result you originally showed:
g.v(108).out("connectsTo")
==>v[560104]
==>v[680004]
==>v[112]
Graph databases will typically take advantage of these labels to help improve performance of queries.
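To make the difference between a direction and a label concrete, here is a minimal in-memory sketch in Python. It is not how Titan stores anything. The `Edge` tuple, the `out` and `in_` helpers, the "locatedIn" edge and the rack vertex 9001 are all made up for illustration; only the three edges from vertex 108 come from the question.

```python
from collections import namedtuple

# A directed, labeled edge: it leaves `tail` and arrives at `head`.
Edge = namedtuple("Edge", ["tail", "label", "head"])

# Only the 108 -> port edges come from the question; the rack edge is invented.
EDGES = [
    Edge(108, "connectsTo", 560104),
    Edge(108, "connectsTo", 680004),
    Edge(108, "connectsTo", 112),
    Edge(108, "locatedIn", 9001),  # hypothetical rack vertex
]

def out(vertex, label=None):
    """Follow edges that leave `vertex`, optionally restricted to one label."""
    return [e.head for e in EDGES
            if e.tail == vertex and (label is None or e.label == label)]

def in_(vertex, label=None):
    """Walk edges that arrive at `vertex` backwards, from head to tail."""
    return [e.tail for e in EDGES
            if e.head == vertex and (label is None or e.label == label)]

print(out(108))                # every outgoing neighbour, whatever the label
print(out(108, "connectsTo"))  # only the ports
print(in_(560104))             # the switch pointing at this port
print(in_(108))                # nothing points at the switch
```

The direction decides which end of the edge you start from; the label only narrows down which edges are considered.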

Related

Gremlin Query to check for pairs of edges on vertices

For some context: I am currently using Azure Cosmos DB with the Gremlin API. Because of its storage-scaling architecture, a '.out()' operation is much less expensive than a '.in()' operation, so I always create edges in both directions and pick whichever one lets me use '.out()' for the direction I want to query.
We use the graph to associate events with users. Whenever a user 'U' raises an event 'E', we create two edges:
g.V('U').addE('raisedEvent').to(g.V('E'))
g.V('E').addE('raisedByUser').to(g.V('U'))
Very rarely, one of these queries fails for one reason or another and we end up with only a single edge between the two vertices. I've been trying to find a way to query for all vertices that have only a uni-directional relationship given a set of 'paired' edge-labels, in order to find these errors and re-create the missing edge.
Basically I need a query where...
given a pair of edge labels E1 (for outgoing, V1-->V2), E2 (for incoming V1<--V2)
finds all vertices V1 where, for at least one outgoing edge E1 to another vertex V2, V2 doesn't have an edge E2 going back to V1; and vice versa
Example:
// given a graph
g.addV('user').property('id','user_1')
g.addV('user').property('id','user_2')
g.addV('user').property('id','user_3')
g.addV('user').property('id','user_4')
g.addV('event').property('id','event_1')
g.addV('event').property('id','event_2')
g.addV('event').property('id','event_3')
g.addV('event').property('id','event_4')
g.V('user_1').addE('raisedEvent').to(g.V('event_1')).V('event_1').addE('raisedByUser').to(g.V('user_1'))
g.V('user_2').addE('raisedEvent').to(g.V('event_2')).V('event_2').addE('raisedByUser').to(g.V('user_2'))
g.V('user_2').addE('raisedEvent').to(g.V('event_3'))
g.V('event_4').addE('raisedByUser').to(g.V('user_3'))
// i.e.
// (user_1) <--> (event_1)
// (event_2) <--> (user_2) ---> (event_3)
// (event_4) ---> (user_3)
// (user_4)
// Then, the query should match with user_2 and user_3...
// ...as they contain uni-directional links to events
Edit: Note - the Cosmos DB implementation of the 'is()' step doesn't support taking a traversal result as input, i.e. queries such as
where(__.outE('raisedEvent').count().is(__.out('raisedEvent').outE('raisedByUser').count()))
are currently unsupported in Cosmos DB.
If possible, it would also be great to get a list of which pairs of vertices have a bad link (e.g. in this case [(user_2, event_3), (user_3, event_4)]), but just knowing which vertices have a bad link will be very useful already.
Thanks to Kelvin Lawrence, I ended up using this pattern to get a list of vertex id pairs that are only uni-directionally connected from a to b:
g.V().hasLabel("user").as('a').out('raisedEvent').where(__.not(out('raisedByUser').as('a'))).as('b').select('a','b').by('id')
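For anyone who wants to sanity-check the result outside the database, the same check can be sketched in plain Python over an exported edge list. This is only an illustration of the logic, not a Cosmos DB or Gremlin API; the `unpaired_edges` function and the tuple layout (source, label, target) are assumptions, and the sample edges follow the diagram in the question.

```python
def unpaired_edges(edges, forward="raisedEvent", backward="raisedByUser"):
    """Return every (source, label, target) edge whose mirror edge is missing."""
    edge_set = set(edges)
    mirror = {forward: backward, backward: forward}
    missing = []
    for src, label, dst in edges:
        # The mirror of src -label-> dst is dst -mirror[label]-> src.
        if label in mirror and (dst, mirror[label], src) not in edge_set:
            missing.append((src, label, dst))
    return missing

# Edges as drawn in the question's diagram.
edges = [
    ("user_1", "raisedEvent", "event_1"), ("event_1", "raisedByUser", "user_1"),
    ("user_2", "raisedEvent", "event_2"), ("event_2", "raisedByUser", "user_2"),
    ("user_2", "raisedEvent", "event_3"),
    ("event_4", "raisedByUser", "user_3"),
]

for src, label, dst in unpaired_edges(edges):
    print(src, "-" + label + "->", dst)
```

Because it checks both labels, it reports the pair (user_2, event_3) as well as (event_4, user_3), whereas the Gremlin pattern above only looks from the user side.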

Add dimension between two elements that are not inside the family editor?

I've seen examples using the NewDimension method to dimension between two points and two lines, I assume in the family editor, but I want to add a dimension to two family instances in the model, such as a pipe tap's centerline and a pipe end. Then the dimension would 'drive' the distance if the user edits it, moving the outlet along the pipe, just like it does if a user created the dimension using the Revit UI.
I just don't know what way Revit wants me to try to do this:
- Finding the family instance ID, going into each family ID, and finding a line/plane/point in the family to use as a dimension point when you use NewDimension. Hopefully this would work outside the family editor when trying to make a dimension between two different family instances (pipe end and pipe tap).
- Finding the x,y,z location of the points you want to snap to, and creating a dimension (using the NewDimension method, for example) between those two x,y,z locations. If the x,y,z locations fall on appropriate points, like a pipe end and the centerline of a pipe tap, then perhaps Revit automatically makes it a 'smart' dimension that 'drives' the location of the pipe tap.
Here are some promising methods I found in the API; I'm not sure which of them I should be using, though.
NewDimension
AlignedDimension
AddListeningDimensionBendToBend
AddListeningDimensionSegmentToBend
AddListeningDimensionSegmentToSegment
SetElementsToDimension
Look at the two Building Coder samples showing how to Dimension Walls by Iterating Faces and Dimension Walls using FindReferencesByDirection.
The approach used for walls works with standard family instances as well.
Note that the FindReferencesByDirection method has now been replaced by the `ReferenceIntersector` class.

Graph search with constraints on edge type

I'm looking for the right keywords/nomenclature for the following problem, since I cannot find anything on Google on this topic:
I have a graph where each edge and each node is assigned to a certain class/color or whatever you call it. Now I want to find a path between a start and a goal node, with some constraints on the path. For example, I'd like to have as few "blue" nodes on the path as possible, or max. 2 "red" edges, or a combination of those things. Of course there are also the usual edge costs, which have to be minimized in addition to the fixed path constraints.
What is this kind of problem called, or what do I have to search for?
Best regards
Mark
I do not think that a name for such a general problem exists. However, I'm pretty certain you can re-model your graph and solve this problem via a simple Dijkstra search:
Trying to avoid a certain (type of) vertex: Say you have a vertex that is to be avoided, and that has k neighbors. Replace it by a K_k (i.e. a clique with k vertices), and connect each neighbor to a distinct one of the k new vertices. Then set the weight of all the clique edges to something large. Now every path passing over the original vertex will have to pass through the clique and "pay the fee", i.e. it will be avoided if possible.
Trying to avoid certain edges: Just raise their edge weight accordingly.
Then, run a simple Dijkstra search. If you have multiple classes that are to be avoided, you can even set the weights so as to determine priorities for avoiding each of them.
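A rough Python sketch of the idea, under a few assumptions: the adjacency-list format, the `cheapest_path` function and the colour names are all hypothetical. Instead of literally building the clique, this version charges the fee whenever a path enters a penalised vertex. That is a simplification, not the construction described above (it also charges when the goal itself is penalised, which adds the same constant to every path).

```python
import heapq

def cheapest_path(graph, node_colour, start, goal, edge_fee, node_fee):
    """Dijkstra on graph[u] = [(v, cost, edge_colour), ...] with colour fees."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost, colour in graph.get(u, []):
            # Base cost, plus the fee for the edge colour, plus the fee for entering v.
            nd = d + cost + edge_fee.get(colour, 0.0) + node_fee.get(node_colour.get(v), 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Tiny made-up graph: the short route uses a blue node and a red edge.
graph = {
    "S": [("A", 1, "black"), ("B", 2, "black")],
    "A": [("G", 1, "red")],
    "B": [("G", 2, "black")],
}
node_colour = {"A": "blue"}

print(cheapest_path(graph, node_colour, "S", "G", {}, {}))
print(cheapest_path(graph, node_colour, "S", "G", {"red": 10}, {"blue": 10}))
```

Without fees the search takes S, A, G; with a fee of 10 on red edges and blue nodes it switches to S, B, G. The returned cost includes the fees, so subtract them if you need the plain edge cost. The relative size of the fees is what sets the priority between the things to avoid.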
Hope that helps,
Lukas

Neo4j shortest-path query hangs (does not work) when the graph has two-way relationships and the nodes are interrelated

I built a graph with relationships in both directions: if A knows B, then B knows A. Every node has a unique Id and Name along with other properties. So my graph looks like
If I run a simple query
MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}),path = (p1)-[:NAVIGATE_TO*]-(p2) RETURN path
it does not return any response and consumes 100% of the machine's CPU and RAM.
UPDATED
After reading other posts and the comments on this post, I simplified the model and the relationships. Now it ends up as
Each relationship has a weight; to simplify, consider horizontal and vertical connections to have weight 1 and diagonal relations weight 1.5.
In my database there are more than 85,000 nodes and 0.3 million relationships.
The shortest-path query never returns a result. It gets stuck processing and the CPU goes to 100%.
I'm afraid you won't be able to do much here. Your graph is very specific, with each node related only to its closest neighbours. That's too bad, because Neo4j is fine for exploring a few relationships around a starting point, not for walking the whole graph with each query.
It means that once you are 2 nodes away, the computational complexity rises to:
8 relationships per node
distance 2
8 + 8^2
In general, the worst-case complexity for a distance n is
O(8 + 8^n) // in case all affected nodes have 8 connections
You say you have ~80,000 nodes. This means (correct me if I'm wrong) a longest distance of ~280 (from √80000). Let's suppose your nodes
(p1:SearchableNode {name: "Ishaan"}),
(p2:SearchableNode {name: "Garima"}),
are only 140 hops away. This gives a complexity of 8^140 ≈ 10^126; I'm not sure any computer in the world can handle this.
Sure, not all nodes have 8 connections, only those "in the middle"; our example graph would have ~500,000 relationships. You have ~300,000, which is maybe 2 times fewer, so let's suppose the overall complexity for an average distance of 70 (out of 140, a very relaxed lower estimate) for nodes having 4 relationships on average (down from 8; 80,000 * 4 = 320,000) to be
O(4 + 4^70) = ~10^42
One 1 GHz CPU should be able to calculate this at:
- 1 000 000 per second
10^42 == 10^36 * 1 000 000 -> 10^36 seconds
Let's suppose we have a cluster of 100 10 GHz CPU servers, 1000 GHz in total.
That's still 10^33 * 1 000 000 000 -> 10^33 seconds
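The orders of magnitude are easy to double-check. Here is a small Python sketch that works with base-10 logarithms so nothing overflows; the `magnitude` helper is just an illustrative name, and the throughput of 1 000 000 steps per second is the assumption used in this answer, not a measured figure.

```python
import math

def magnitude(branching, hops):
    """Return log10(branching ** hops), i.e. the power of ten of the path count."""
    return hops * math.log10(branching)

print(round(magnitude(8, 140), 1))  # power of ten for 8^140
print(round(magnitude(4, 70), 1))   # power of ten for 4^70

# Assumed throughput from the estimate above: 1 000 000 path steps per second.
seconds_exponent = magnitude(4, 70) - math.log10(1_000_000)
print(round(seconds_exponent, 1))   # power of ten of the running time in seconds
```

The exponents land at roughly 126 and 42, and the running time at roughly 10^36 seconds, so the rough figures hold.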
I would suggest staying away from allShortestPaths and looking only for the first path available. Using Gremlin instead of Cypher, it is possible to implement your own algorithms with some heuristics, so you can actually cut the time down to maybe seconds or less.
Example: using one direction only = down to 10^16 seconds.
An example heuristic: check the ids of the nodes. The larger the difference node2.id - node1.id, the larger the actual distance (considering the node creation order, i.e. nodes with similar ids are close together). In that case you can either skip the query or jump a few relationships ahead with something like MATCH (n1)-[:RELATED*5]->(q)-[:RELATED*]->(n2) (in Cypher, *5 means exactly five hops), which will (should) jump straight to nodes 5 hops away that are closer to the n2 node = complexity down from 4^70 to 4^65. So if you can calculate the distance exactly from the node ids, you can even match ... [:RELATED*65] ... which will cut the complexity to 4^5, and that's just a matter of milliseconds for a CPU.
It's possible I'm completely wrong here. It has been some time since I was in school, and it would be nice to ask a mathematician (graph theory) to confirm this.
Let's consider what your query is doing:
MATCH (p1:SearchableNode {name: "Ishaan"}),
(p2:SearchableNode {name: "Garima"}),
path = (p1)-[:NAVIGATE_TO*]-(p2)
RETURN path
If you run this query in the console with EXPLAIN in front of it, the DB will give you its plan for how it will answer. When I did this, the query compiler warned me:
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
You have two issues going on with your query - first, you're assigning p1 and p2 independent of one another, possibly creating this cartesian product. The second issue is that because all of your links in your graph go both ways and you're asking for an undirected connection you're making the DB work twice as hard, because it could actually traverse what you're asking for either way. To make matters worse, because all of the links go both ways, you have many cycles in your graph, so as cypher explores the paths that it can take, many paths it will try will loop back around to where it started. This means that the query engine will spend a lot of time chasing its own tail.
You can probably immediately improve the query by doing this:
MATCH p=shortestPath((p1:SearchableNode {name:"Ishaan"})-[:NAVIGATE_TO*]->(p2:SearchableNode {name:"Garima"}))
RETURN p;
Two modifications here. First, p1 and p2 are bound to each other immediately; you don't separately match them. Second, notice the [:NAVIGATE_TO*]-> part, with that last arrow ->; we're matching the relationship ONE WAY ONLY. Since you have so many reciprocal links in your graph, either direction would work fine, but whichever you choose, you cut the work the DB has to do in half. :)
This may still not perform well, because traversing that graph is still going to involve a lot of cycles, which will send the DB chasing its tail trying to find the best path. As a modeling choice, you usually shouldn't have relationships going both ways unless you need separate properties on each relationship. A relationship can be traversed in both directions, so it doesn't make sense to have two (one in each direction) unless the information each relationship captures is semantically different.
Often you'll find with query performance that you can do better by reformulating the query and thinking about it, but there's major interplay between graph modeling and overall performance. With the graph set up with so many bi-directional links, there will only be so much you can do to optimize path-finding.
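To get a feel for why enumerating every path blows up while a shortest-path search does not, here is a small Python sketch on a tiny 8-connected grid. It has nothing to do with Neo4j's internals; the function names are made up, and the grid only assumes the layout described in the question. Counting paths that never revisit a node is a lower bound on what an unbounded pattern has to consider, since Cypher only forbids repeating a relationship, which allows even more paths.

```python
from collections import deque

def neighbours(node, size):
    """8-connected neighbours of (x, y) on a size x size grid."""
    x, y = node
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < size and 0 <= ny < size:
                yield (nx, ny)

def bfs_hops(start, goal, size):
    """Fewest hops from start to goal; each node is queued at most once."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops, len(seen)
        for nxt in neighbours(node, size):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None, len(seen)

def count_simple_paths(node, goal, size, visited):
    """Count every path from node to goal that never revisits a node."""
    if node == goal:
        return 1
    total = 0
    for nxt in neighbours(node, size):
        if nxt not in visited:
            total += count_simple_paths(nxt, goal, size, visited | {nxt})
    return total

size = 3  # keep it tiny: the path count grows explosively with the grid size
start, goal = (0, 0), (size - 1, size - 1)
print(bfs_hops(start, goal, size))                      # (hops, nodes touched)
print(count_simple_paths(start, goal, size, {start}))   # paths to enumerate
```

Even on a 3 x 3 grid the number of distinct paths is far larger than the number of nodes the breadth-first search touches, and the gap widens very quickly as the grid grows.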
MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}),path = (p1)-[:NAVIGATE_TO*]->(p2) RETURN path
Or:
MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}), (p1)-[path:NAVIGATE_TO*]->(p2) RETURN path

ArangoDB GRAPH TRAVERSAL through specific nodes

Take US cities for example and say I want the traversal of all cities and roads that go through NYC, Chicago and Seattle.
This can be done with TRAVERSAL AQL function (using filterVertices). However this function only takes the ID and not the vertex example as in GRAPH_TRAVERSAL.
GRAPH_TRAVERSAL doesn't have a filter option, so my question is: is there a way to filter the results using graph operations?
The feature is actually there but was somehow not documented. I added it to our documentation, which should be updated soon. Sorry for the inconvenience.
filterVertices takes a list of vertex examples.
Note: for more specific filtering, you can also give the name of a custom AQL function with the signature function(config, vertex, path).
vertexFilterMethod defines what should be done with all other vertices:
"prune" will not follow edges attached to these vertices. (Used here)
"exclude" will not include this specific vertex.
["prune", "exclude"] both of the above. (default)
An example query for your question is the following (airway is my graph):
FOR x in GRAPH_TRAVERSAL("airway", "a/SFO", "outbound", {filterVertices: [{_key: "SFO"}, {_key: "NYC"}, {name: "Chicago"}, {name: "Seattle"}], vertexFilterMethod: "prune"}) RETURN x
Hint: Make sure you include the start vertex in the filter as well. Otherwise it will always return with an empty array (the first visited vertex is directly pruned)

Resources