I would like to store data that is not directly related in separate named graphs in ArangoDB. However, there might be cases where I would want to query data from more than one of these graphs at a time.
I know that you can perform a graph traversal as below, particularly using the 'GRAPH' keyword, but is it possible to do something like 'GRAPH graphName1, graphName2' to query both at the same time?
FOR vertex[, edge[, path]]
IN [min[..max]]
OUTBOUND|INBOUND|ANY startVertex
GRAPH graphName
[OPTIONS options]
I know I could "union" the results of multiple of the above graph traversals, but given that only the graphName would be different, it would be great if I could make it concise instead of repeating redundant code.
You cannot traverse multiple named graphs in a single traversal. Instead you can either:
Create a new named graph with all the necessary vertex and edge collections
Or (better)
Traverse the collections directly without using named graphs. The performance is the same. Here is the syntax:
FOR vertex[, edge[, path]]
IN [min[..max]]
OUTBOUND|INBOUND|ANY startVertex
edgeCollection1, ..., edgeCollectionN
[PRUNE pruneCondition]
[OPTIONS options]
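Since only the list of edge collections varies, the query text can be generated instead of repeated. Below is a minimal Python sketch of that idea. The helper `build_traversal`, the edge collection names `edgesA` and `edgesB`, and the bind parameter `@start` are all assumptions made for illustration, and the name check is deliberately stricter than ArangoDB's actual naming rules.

```python
import re

# Conservative name check (an assumption: stricter than ArangoDB's real rules).
_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")
_DIRECTIONS = ("OUTBOUND", "INBOUND", "ANY")


def build_traversal(edge_collections, direction="OUTBOUND", min_depth=1, max_depth=1):
    """Build an AQL traversal over several edge collections at once."""
    if direction not in _DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if not edge_collections:
        raise ValueError("at least one edge collection is required")
    for name in edge_collections:
        if not _NAME.match(name):
            raise ValueError(f"suspicious collection name: {name!r}")
    collections = ", ".join(edge_collections)
    return (
        f"FOR v, e, p IN {min_depth}..{max_depth} {direction} @start "
        f"{collections} RETURN v"
    )


# edgesA and edgesB are hypothetical edge collections from two named graphs.
query = build_traversal(["edgesA", "edgesB"], max_depth=3)
print(query)
```

The start vertex stays a bind parameter, so only the collection names are interpolated into the query text.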
Related
I have a graph representing levels in a game. Some examples of types of nodes I have are "Level", "Room", "Chair". The types are represented by VertexCollections where one collection is called "Level" etc. They are connected by edges from separate EdgeCollections, for example "Level" -roomEdge-> "Room" -chairEdge-> "Chair". Rooms can also contain rooms etc. so the depth is unknown.
I want to start at an arbitrary "Level"-vertex and traverse the whole subtree and find all chairs that belong to the level.
I'm trying to see if ArangoDB would work better than OrientDB for me. In OrientDB I use the query:
SELECT FROM (TRAVERSE out() FROM startNode) WHERE @class = 'Chair'
I have tried the AQL query:
FOR v IN 1..6 OUTBOUND @startVertex GRAPH 'testDb' FILTER IS_SAME_COLLECTION('Chair', v) == true RETURN v
It does, however, seem to execute much slower than the OrientDB query (~1 second vs. ~0.1 second).
The code I'm using for the query is the following:
String statement = "FOR v IN 1..6 OUTBOUND @startVertex GRAPH 'testDb' FILTER IS_SAME_COLLECTION('Chair', v) == true RETURN v";
timer.start();
ArangoCursor<BaseDocument> cursor = db.query(statement, new MapBuilder().put("startVertex", "Level/"+startNode.getKey()).get(), BaseDocument.class);
timer.saveTime();
Both solutions are running on the same hardware without any optimization done, both databases are used "out of the box". Both cases use the same data (~1 million vertices) and return the same result.
So my question is if I'm doing things correctly in my AQL query or is there a better way to do it? Have I misunderstood the concept of VertexCollections and how they're used?
Is there a reason you have multiple collections for each entity type, e.g. one collection for Rooms, one for Levels, one for Chairs?
One option is to have a single collection that contains all your entities, and to identify each entity's type with a Type: "Chair" or Type: "Level" key on the document.
You then have a single relationship collection whose edges have both _from and _to pointing at documents in the entity collection.
You can then start at a given node (for example a Level) and find all entities of type Chair that it is connected to with a query like:
FOR v, e, p IN 1..6 OUTBOUND Level_ID Relationship_Collection
FILTER p.vertices[-1].Type == 'Chair'
RETURN v
You could return v (the final vertex), e (the final edge) or p (the full path from the start vertex to v).
I'm not sure you need a graph object at all. You can instead use a relationships collection that adds relationships to your entity collection.
Graphs are good if you need them, but they are not necessary for traversal queries. Read the ArangoDB documentation to see if you need them. I usually don't use them, as using a graph can slow performance down a little bit.
Remember to look at indexes, and use the 'Explain' feature in the UI to see how your indexes are being used. Maybe add a hash index to the 'Type' key.
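To make the traverse-then-filter shape of these queries concrete, here is a small in-memory Python sketch. It is not how ArangoDB executes a traversal. The function `traverse_outbound` and the sample ids are hypothetical, and the filter uses the collection prefix of the id (as in the question's IS_SAME_COLLECTION); filtering on a Type attribute would follow the same pattern.

```python
def traverse_outbound(edges, start, min_depth, max_depth):
    """Yield every vertex reached from `start` in min_depth..max_depth outbound hops."""
    stack = [(start, 0)]
    while stack:
        vertex, depth = stack.pop()
        if depth >= min_depth:
            yield vertex
        if depth < max_depth:
            for target in edges.get(vertex, []):
                stack.append((target, depth + 1))


# Hypothetical sample data: document ids are "<collection>/<key>".
edges = {
    "Level/1": ["Room/1", "Room/2"],
    "Room/1": ["Chair/1", "Room/3"],  # rooms may contain rooms
    "Room/3": ["Chair/2"],
}

# Walk the whole subtree (depth 1..6), then keep only the chairs.
chairs = sorted(
    v for v in traverse_outbound(edges, "Level/1", 1, 6)
    if v.split("/")[0] == "Chair"
)
print(chairs)
```

In this sketch the filter runs after the walk, so every vertex in the subtree is still visited even though only chairs are returned.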
Take US cities for example and say I want the traversal of all cities and roads that go through NYC, Chicago and Seattle.
This can be done with the TRAVERSAL AQL function (using filterVertices). However, this function only takes the ID and not a vertex example, as GRAPH_TRAVERSAL does.
GRAPH_TRAVERSAL doesn't have a filter option, so my question is: is there a way to filter the results using graph operations?
The feature is actually there but was somehow not documented. I added it to our documentation, which should be updated soon. Sorry for the inconvenience.
filterVertices takes a list of vertex examples.
Note: for more specific filtering, you can also give the name of a custom AQL function with the signature function(config, vertex, path).
vertexFilterMethod defines what should be done with all other vertices:
"prune" will not follow edges attached to these vertices. (Used here)
"exclude" will not include this specific vertex.
["prune", "exclude"] both of the above. (default)
An example query for your question is the following (airway is my graph):
FOR x in GRAPH_TRAVERSAL("airway", "a/SFO", "outbound", {filterVertices: [{_key: "SFO"}, {_key: "NYC"}, {name: "Chicago"}, {name: "Seattle"}], vertexFilterMethod: "prune"}) RETURN x
Hint: Make sure you include the start vertex in the filter as well. Otherwise the query will always return an empty array (the first visited vertex is directly pruned).
The job of the layout is to place vertexes at given locations. If the layout is iterative, then its job is to iterate through an algorithm, moving the vertexes with each step, until the final layout configuration is achieved.
I have a multi-level graph - say 100 objects of type A; each A object has 10 objects as children; call the children type B objects.
I would like the layout location placement algos to operate on objects of type A only (let's say) - and ignore the B objects.
The cleanest way to achieve this objective might be to define a transform to expose those elements that should participate in the 'algo' placement operation via the step method.
Currently, the step methods, assuming they respect the lock flag at all, do their calculations including the locked vertexes first - so lock/unlock won't work in this case.
Is it possible to do this somehow without resorting to multiple graph objects?
If you want to ignore the B objects entirely, then the simplest option is to create a graph consisting only of the A objects, lay it out, and use the locations from that layout.
That said, it's not clear how you intend to assign locations to the B objects. And if the A objects aren't connected to each other at all, then this approach won't make much sense. (OTOH, if they aren't connected to each other then you're really just laying out a bunch of trees.)
I have the following information in a Titan graph database, and I am trying to make sense of it by sending queries through the Gremlin shell. The graph database that I am trying to investigate models a network. There are two types of vertices:
- `Switch`
- `Port`
I am trying to figure out the relationship between these two types of vertices.
g = TitanFactory.open("/tmp/cassandra.titan")
To see the list of vertices of each type
$ g.V('type', 'switch')
==>v[228]
==>v[108]
==>v[124]
==>v[92]
==>v[156]
==>v[140]
$ g.V('type', 'port')
==>v[160]
==>v[120152]
==>v[164]
==>v[120156]
==>v[560104]
==>v[680020]
==>v[680040]
==>v[112]
==>v[120164]
==>v[560112]
==>v[680012]
==>v[680004]
==>v[144]
==>v[680032]
==>v[236]
==>v[100]
==>v[560128]
==>v[128]
==>v[680028]
==>v[232]
==>v[96]
To find the relation between the switch and port.
g.v(108).out
==>v[560104]
==>v[680004]
==>v[112]
What is this "out"? As I understand it, there is an outward arrow pointing from the switch represented by vertex 108 to the ports represented by vertices 560104, 680004 and 112.
What are in and out? Are they something specific to graph databases? Also, what is a label in a graph database? Are in and out labels?
The use of in and out is descriptive of the direction of the edge going from one vertex to another. In your case, you have this:
switch --> port
When you write:
g.v(108).out
you are telling Gremlin to find the vertex at 108, then walk along edges that point out or away from it. You might also think of out as starting from the tail of the arrow and walking to the head. Given your schema, those lead to "ports".
Similarly, in simply means to have Gremlin walk along edges that point in to the vertex. You might also think of in as starting from the head of the arrow and walking to the tail. Given your schema, switches have no incoming edges, so calling in on a switch vertex will always return no results. However, if you were to start from a "port" vertex and traverse in:
g.v(560104).in
you would at least get back vertex 108 as vertex "560104" has at least one edge with an arrow pointing to it (given what I know of your sample data).
By now you've gathered that in and out are "directions" and not "labels". A label has a different purpose; it categorizes an edge. For example, you might have the following schema:
switch --connectsTo--> port
company --manufactures--> switch
switch --locatedIn--> rack
In other words you might have three edge labels representing different ways that a "switch" relates to other parts of your schema. In this way your queries can be more descriptive about what you want. Given your previous example and this revised schema you would have to write the following to get the same result you originally showed:
g.v(108).out("connectsTo")
==>v[560104]
==>v[680004]
==>v[112]
Graph databases will typically take advantage of these labels to help improve performance of queries.
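The difference between a direction and a label can also be shown with a tiny in-memory model. This is a Python sketch for illustration only, not Gremlin or Titan code. The vertex ids come from the question, while the labels and the rack vertex 900 are hypothetical.

```python
# Each edge is (tail, label, head): the arrow points from tail to head.
# Vertex ids come from the question; the labels and vertex 900 are hypothetical.
EDGES = [
    (108, "connectsTo", 560104),
    (108, "connectsTo", 680004),
    (108, "connectsTo", 112),
    (108, "locatedIn", 900),
]


def out(vertex, label=None):
    """Follow edges that point away from `vertex`, optionally only one label."""
    return [head for tail, edge_label, head in EDGES
            if tail == vertex and label in (None, edge_label)]


def in_(vertex, label=None):
    """Follow edges that point into `vertex` (`in` is a Python keyword)."""
    return [tail for tail, edge_label, head in EDGES
            if head == vertex and label in (None, edge_label)]


print(out(108))                # every outgoing neighbour, rack included
print(out(108, "connectsTo"))  # ports only
print(in_(560104))             # the switch this port belongs to
print(in_(108))                # the switch has no incoming edges
```

Direction decides which end of the arrow the walk starts from, and the label only narrows down which arrows are eligible.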
What is the best way to define a directed acyclic graph (DAG) of strings with a single root in Haskell?
I especially need to apply the following two functions on this data structure as fast as possible:
Find all (direct and indirect) ancestors of one element (including the parents of the parents etc.).
Find all (direct) children of one element.
I thought of [(String,[String])] where each pair is one element of the graph consisting of its name (String) and a list of strings ([String]) containing the names of (direct) parents of this element. The problem with this implementation is that it's hard to do the second task.
You could also use [(String,[String])] again while the list of strings ([String]) contain the names of the (direct) children. But here again, it's hard to do the first task.
What can I do? What alternatives are there? Which is the most efficient way?
EDIT: One more remark: I'd also like it to be easy to define. I have to define the instance of this data type myself "by hand", so I'd like to avoid unnecessary repetition.
Have you looked at the tree implementation in Martin Erwig's Functional Graph Library? Each node is represented as a context containing both its children and its parents. See the graph type class for how to access this. It might not be as easy as you requested, but it is already there, well-tested and easy to use. I have used it for more than a decade in a large project.
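The idea of keeping both directions available for every node can be sketched independently of any library. The following is a Python illustration only: it is neither Haskell nor the Functional Graph Library's API, and the node names are hypothetical. The graph is written down once as child lists, and the parent index is derived from it; in Haskell the two indexes could be held in `Data.Map` values in the same way.

```python
from collections import defaultdict

# The graph is written down once, as node -> direct children (hypothetical names).
CHILDREN = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
}

# Derive the reverse index so the definition is not repeated by hand.
PARENTS = defaultdict(list)
for parent, kids in CHILDREN.items():
    for kid in kids:
        PARENTS[kid].append(parent)


def children(node):
    """Direct children of `node`."""
    return CHILDREN.get(node, [])


def ancestors(node):
    """All direct and indirect ancestors of `node`."""
    seen = set()
    stack = list(PARENTS.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


print(children("b"))
print(sorted(ancestors("c")))
```

Both lookups become simple map accesses, at the cost of storing every edge twice.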