Blazegraph Tinkerpop 3 Indexing - tinkerpop3

I am trying to learn about Blazegraph. At the moment I am puzzled how I can optimise simple lookups.
Suppose all my vertices have a property id, which is unique. This property is set by the user. Is there any way to speed up finding a vertex of a particular id while still sticking to the Tinkerpop APIs?
Is the search API defined here the only way?
My previous experience is in TitanDB and in Titan's case it's possible to define an index which the Tinkerpop APIs integrate with flawlessly. Is there any way to achieve the same results in Blazegraph without using the Search API?

Whether a mid-traversal V() uses an index or not, depends on a)
whether suitable index exists and b) if the particular graph system
provider implemented this functionality.
Gremlin (TinkerPop) does not specify how to define indexes, although the documentation shows calls like the following:
graph.createIndex("username",Vertex.class)
But this may be specific to the TinkerGraph implementation; as a matter of fact, the documentation says:
Each graph system will have different mechanism by which indices and
schemas are defined. TinkerPop3 does not require any conformance in
this area. In TinkerGraph, the only definitions are around indices.
With other graph systems, property value types, indices, edge labels,
etc. may be required to be defined a priori to adding data to the
graph.
There is an example for Neo4j:
TinkerPop3 does not provide method interfaces for defining
schemas/indices for the underlying graph system. Thus, in order to
create indices, it is important to call the Neo4j API directly.
But the code is very specific to that plugin:
graph.cypher("CREATE INDEX ON :person(name)")
Note that for Blazegraph, the search uses a built-in full-text index.

Related

With ArangoDB what is the practical difference between 1 named graph with x edge definitions vs x named graphs with 1 edge definition?

Is the difference only logical / housekeeping related?
I did read this question but the answer there only deals with 1 edge definition vs multiple edge definitions within a graph which is now already covered in the documentation. So I'm curious.
I have used Arango for 6 years and don't use Graph objects, all my queries are just AQL queries, which means you don't need to use a Graph to use the benefits of graph databases and to perform traversals.
The way I think of a 'Graph' in Arango is that it's a limited / curated view of your collections that is query-able, but also is helpful if you want it to manage some level of integrity on deletes.
Overall, it slows down a Traversal, so I find it better to avoid them. A key driver for my decision is that I don't need views, and I don't need the system to handle the deletion of edges if I delete a connected vertex, but that's just my use case.

Arangodb using Java API: when a graph is created do all Edges need to be defined already?

As far as I can tell, you must specify the edge definitions at creation time and there does not seem to be a method for adding an edge definition later. But I also see examples written in Javascript (I think) where edge definitions can be added later. Am I right about this Java limitation and does that suggest that Javascript might be a better choice for programming language to interact with ArangoDB?
EDIT: Could the edgeDefinitions Collection be added to after the graph is created?
EDIT: Seems to me that since the Java API is making REST calls, adding to the Collection later would not work at all.
It is possible to add an edge definition to an existing graph by using the method addEdgeDefinition of the ArangoDB-Java-Driver.
An example is listed in the Java Driver documentation.
Similarly, it is possible to replace or remove an edge definition with replaceEdgeDefinition / removeEdgeDefinition.

Titan+Cassandra and String Vertex Ids

It looks like we can get Titan 1.0 to use custom long ids by setting "graph.set-vertex-id" to true. Is there some way to use non-long (i.e. String) ids as Vertex Ids? Seeing that the Tinkerpop api supports Strings, and there's a feature called "StringIds", is there some way of enabling that feature? I'm using Titan with Cassandra.
I think this goes against Titan's internal structure. One of the Titan devs recommends here to just use your own indexed property. This is reiterated here and here stating that unique indexed properties should be used.
I think the reason for this is that the internal ids actually refer to locations on the system. As stated here:
The (64 bit) vertex id (which Titan uniquely assigns to every vertex) is the key which points to the row containing the vertex’s adjacency list.
No, String identifiers are not supported in the StandardTitanGraph.features(). You could consider using an indexed String property as an alternative.

What is the difference between the AQL EDGES, NEIGHBORS, etc. and GRAPH_EDGES, GRAPH_NEIGHBORS

In ArangoDB, there seem to be two sets of functions for working with graphs. On one side you have EDGES, NEIGHBORS, TRAVERSAL, SHORTEST_PATH and more (https://docs.arangodb.com/Aql/GraphFunctions.html).
OTOH there are the graph operations (https://docs.arangodb.com/Aql/GraphOperations.html) that seem to have the same functions prefixed by GRAPH and with some different parameters, such as GRAPH_EDGES, GRAPH_NEIGHBORS, GRAPH_TRAVERSAL, GRAPH_SHORTEST_PATH.
What is the difference between these? Are they used in different scenarios? Are there performance differences, etc.?
There is no general recommendation which to choose over the other - it depends on your requirements.
The EDGES functions may work on collections that are not managed by the graph module, and thus may not be visible in the graph viewer (but you may use them on collections that are also managed). However, they have less overhead because they do no graph management.
The GRAPH_EDGES family is the more recent implementation. It only works on managed graphs that you can also browse in the graph viewer. As you already noted, the latter have many more options, e.g. to filter the graphs by examples.
With ArangoDB 3 the GRAPH_* family of functions was removed. We explain in this cookbook how their functionality can be achieved with AQL in ArangoDB 3.

CouchDB query for more dynamic values

I have multiple "Location documents" in my CouchDB, each with longitude and latitude fields. How can I find all location documents in the database whose distance to a provided latitude and longitude is less than a provided distance?
There is a way to achieve this using vanilla CouchDB, but it's a bit tricky.
You can use the fact that you can apply two map functions during one request. The second map function can be created using list mechanics.
Lists are not very efficient computationally: they can't cache results the way views do. But they have one unique feature: you can pass several arguments into a list. Moreover, one of your arguments can be, for example, JS code that is eval-ed inside the list function (risky!).
So the entire scheme looks like this:
1. Make a view that performs a coarse search.
2. Make a list that receives custom params and refines the data set.
3. Make a client-side API to ease querying this chain.
I can't provide exact code for your particular case, as many details are unclear, but it seems that the coarse search must group results into linearly enumerated squares, and the list then performs the more precise calculations.
Please note that this scheme might be inefficient for large datasets, since it's computationally hungry.
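The coarse step of this scheme could be sketched roughly as follows. This is only an illustration: cellKey, cellsAround, cellSize and radiusDeg are hypothetical names, the grid is measured in plain degrees, and wrap-around at longitude 180 or at the poles is not handled.

```javascript
// Map a coordinate to the square grid cell that contains it.
// cellSize is the side of a cell in degrees (an assumed parameter).
function cellKey(lat, lon, cellSize) {
  return [Math.floor(lat / cellSize), Math.floor(lon / cellSize)];
}

// List every cell that overlaps the bounding box of a search circle.
// radiusDeg is the search radius expressed in degrees (an assumption).
function cellsAround(lat, lon, radiusDeg, cellSize) {
  var min = cellKey(lat - radiusDeg, lon - radiusDeg, cellSize);
  var max = cellKey(lat + radiusDeg, lon + radiusDeg, cellSize);
  var cells = [];
  for (var i = min[0]; i <= max[0]; i++) {
    for (var j = min[1]; j <= max[1]; j++) {
      cells.push([i, j]);
    }
  }
  return cells;
}
```

Under these assumptions the view's map function would emit cellKey(...) as its key, the client would request the cells returned by cellsAround, and the list function would then drop the rows that fall outside the exact distance.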
Vanilla CouchDB isn't really built for geospatial queries.
Your best bet is to either use GeoCouch, CouchDB-Lucene or something similar.
Failing that, you could emit a Geohash from your map function, and do range queries over those.
Caveats apply. Queries around Geohash "fault lines" (equator, poles, longitude 180, etc.) can give too many or too few results.
There are multiple JavaScript libraries that can help convert to/from Geohash, as well as help with some of those caveats.
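For illustration, a minimal Geohash encoder could look like the sketch below. It follows the standard Geohash bit-interleaving scheme; geohashEncode and geohashRange are hypothetical helper names, and a maintained library is probably a better choice in practice. The "\ufff0" suffix is a common CouchDB idiom for "everything starting with this prefix".

```javascript
var BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode a latitude/longitude pair (degrees) as a Geohash of the given length.
function geohashEncode(lat, lon, precision) {
  var latRange = [-90, 90], lonRange = [-180, 180];
  var hash = "", bits = 0, value = 0, evenBit = true;
  while (hash.length < precision) {
    // Bits alternate between longitude and latitude, starting with longitude.
    var range = evenBit ? lonRange : latRange;
    var coord = evenBit ? lon : lat;
    var mid = (range[0] + range[1]) / 2;
    if (coord >= mid) { value = value * 2 + 1; range[0] = mid; }
    else { value = value * 2; range[1] = mid; }
    evenBit = !evenBit;
    // Every 5 bits become one base-32 character.
    if (++bits === 5) { hash += BASE32.charAt(value); bits = 0; value = 0; }
  }
  return hash;
}

// Build view query parameters that match all keys sharing a Geohash prefix.
function geohashRange(prefix) {
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}
```

A map function would then emit geohashEncode(...) of each document as its key, and a query for one cell would use the startkey and endkey from geohashRange. Neighbouring cells still need to be queried separately, which is where the caveats above come in.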
CouchDB is not built for dynamic queries, so there is no good/fast way of implementing it in vanilla CouchDB.
If you know beforehand which locations you want to calculate the distance from, you could create a view for each location and call it with the parameters ?startkey=0&endkey=max_distance
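function(doc) {
  // Placeholder: return the distance between points a and b in your chosen unit.
  function distance(a, b) { /* your function for calculating distance */ }
  var NY = {lat: 40, lon: -74}; // fixed reference point; west longitudes are negative
  emit(distance(NY, doc), doc._id);
}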
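One possible body for the distance placeholder is the haversine formula, sketched below. It assumes that both arguments are objects with lat and lon fields in degrees (so the documents would need fields with those names, or a small adapter), and it returns kilometres.

```javascript
// Great-circle (haversine) distance in kilometres between two points.
// Each point is assumed to be an object of the form {lat: degrees, lon: degrees}.
function distance(a, b) {
  var toRad = Math.PI / 180;
  var earthRadiusKm = 6371; // mean Earth radius
  var dLat = (b.lat - a.lat) * toRad;
  var dLon = (b.lon - a.lon) * toRad;
  var h = Math.pow(Math.sin(dLat / 2), 2) +
          Math.cos(a.lat * toRad) * Math.cos(b.lat * toRad) *
          Math.pow(Math.sin(dLon / 2), 2);
  return 2 * earthRadiusKm * Math.asin(Math.sqrt(h));
}
```

With keys in kilometres, a request such as ?startkey=0&endkey=50 would return the documents within roughly 50 km of the reference point.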
If you do not know the locations beforehand you could solve it by using a temporary view, but I would strongly advise against it since it's slow and should only be used for testing.

Resources