DSE Graph indexing on vertex label using Java - cassandra

I am using DataStax DSE Graph (5.x) with Java driver version 1.1.1 beta.
My use case is that I cannot allow more than one vertex to have the same vertex label.
For that, I want to create an index on the vertex label.
I found the code below on the official DataStax website: schema.vertexLabel('recipe').index('byRecipe').secondary().by('name').add()
But this DataStax tutorial lacks two things:
How to create an index on a vertex label
How to execute this using Java
My question is: how do I index a DSE Graph on vertex label using Java?

To implement this, you would execute the example as a graph statement. It is typically recommended to create your schema outside of your application's traversal code base.

If you are correct in stating that "you cannot allow more than one vertex to have the same vertex label", I think you need to reconsider your data model. A vertex label is intended to identify a group of vertices, with a vertex property distinguishing several vertices from one another.
If you created a vertex label "vtype" and a property "name" that identified each instance "vtype1, vtype2, etc.", then the index could be: schema.vertexLabel('vtype').index('byVType').secondary().by('name').add()
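
As a rough illustration of executing the schema line as a graph statement from Java, here is a minimal sketch. SchemaSetup, secondaryIndex and createIndex are hypothetical names made up for this example, and the Consumer<String> stands in for whatever submits Gremlin strings to the server. With the DSE Java driver that would presumably be the session's executeGraph method, but the sketch deliberately does not depend on the driver so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: SchemaSetup and its methods are hypothetical names, not driver API.
final class SchemaSetup {

    // Builds the Gremlin schema statement for a secondary index on one property.
    static String secondaryIndex(String vertexLabel, String indexName, String propertyKey) {
        return String.format(
            "schema.vertexLabel('%s').index('%s').secondary().by('%s').add()",
            vertexLabel, indexName, propertyKey);
    }

    // Hands the statement to whatever submits Gremlin strings to the server.
    // Assumption: with the DSE Java driver this would be session::executeGraph.
    static void createIndex(Consumer<String> submit, String vertexLabel,
                            String indexName, String propertyKey) {
        submit.accept(secondaryIndex(vertexLabel, indexName, propertyKey));
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();   // stand-in for a real session
        createIndex(sent::add, "vtype", "byVType", "name");
        System.out.println(sent.get(0));
    }
}
```

With a real session, main would pass the session's submit method instead of sent::add. Keeping the statement builder separate also makes it easy to run the same schema statements from a setup script rather than from the application itself.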

Related

How to traverse without indexed value?

I have a scenario where I want to query for all the vertices that do not have a specific property key and value. What I have tried is
g.V().hasLabel('Persona').hasNot('isTempDob').count()
resulted in
Could not find a suitable index to answer graph query and graph scans are disabled: [(~label = Persona)]:VERTEX
So I tried a simpler query from the documentation: getting the count of vertices with the label Persona. The query is
g.V('Persona').count()
resulted in
0
I don't know why I am not able to traverse without an indexed value. Any help would be greatly appreciated. Thanks in advance.
Perhaps it is just a mis-type but g.V('Persona').count() does not search labels - it looks for vertices with a unique identifier of "Persona". As a result I suspect that the answer of "0" was correct.
I don't know why I am not able to traverse without any indexed value?
JanusGraph (and other graph systems) prevent traversals without indexed values as a guardrail to stop users from accidentally executing long-running queries. If you have a small graph with just thousands of vertices, you will likely find this feature an annoyance, but when you have billions of vertices it can be a great help, because a query like the one you've written would take hours and hours to complete and would not be the recommended approach to get that answer. If you are doing a global scan of all the vertices in the graph, you should be executing your Gremlin over Spark to take advantage of its parallel processing capabilities, turning those hours and hours into minutes.
Going back to the case of a small graph, you can temporarily disable the guardrail by configuring JanusGraph with query.force-index=false to allow the query to execute. That should allow your query to work without that error/warning.
I tested in the gremlin console docker.
The identifier is a unique ID for each vertex. When creating a vertex, you can either specify an ID or generate one automatically. For example,
gremlin> saturn = g.addV("character").property(T.id, 1).property('name', 'saturn').property('age', 10000).property('type', 'titan').next();
==>v[1]
If you create a vertex in the above way, you'll get v[1] as the ID because you specified it.
The following does not specify an ID, so Gremlin generates one for you:
gremlin> g.addV("character").property('name', 'prometheus').property('age', 1000).property('type', 'god').next()
==>v[0]
Assuming it is the first vertex, its ID will be 0.
In your scenario, if you want to query all vertices that have the same label and exclude a specified name, you can try this (I'm using the JanusGraph toy graph of the gods):
gremlin> g.V().has('character','name',neq('hercules')).valueMap();
==>[name:[prometheus],type:[god],age:[1000]]
==>[name:[saturn],type:[titan],age:[10000]]
==>[name:[jupiter],type:[god],age:[5000]]
==>[name:[neptune],type:[god],age:[4500]]
==>[name:[alcmene],type:[human],age:[45]]
==>[name:[pluto],type:[god],age:[4000]]
==>[name:[nemean],type:[monster],age:[20]]
==>[name:[hydra],type:[monster],age:[0]]
==>[name:[cerberus],type:[monster],age:[0]]
The above query finds all vertices labeled character and excludes the name hercules. Alternatively, the following query does the same thing.
gremlin> g.V().hasLabel('character').has('name',neq('hercules')).valueMap();
The reason your query didn't work is that hasNot(key) must be given a property key rather than a value.
Let me know if it helps.

What is the difference between Gremlin in Neptune and Azure when selecting a single vertex?

We switched our database from Azure to Neptune. In Azure you could select one vertex and the Gremlin query returned the id, the label and all properties of this vertex. If you do the same on Neptune, just the id and the label are returned. How can I get Neptune to return the id, the label and all properties of a vertex? Is there an option you can choose in the Neptune configuration? If there is no option, which query do I have to execute to get the id, the label and all properties of a vertex?
The difference might have to do with a number of things. The first thing that comes to mind is that if you went from CosmosDB to Neptune you might be using bytecode-based traversals, in which case they don't return properties (just references, which means id and label, as you are seeing). If you didn't switch to bytecode, then it's possible that Neptune is more aligned with TinkerPop in terms of serialization semantics, which call for references only in newer versions.
Either way, it's considered a best practice to only return the data that you need in the form you want it, rather than a graph element with all properties. The reasoning is similar to why you wouldn't do SELECT * FROM table in SQL - you would specify the column names.

Why does inserting a vertex with the same property not throw an error in DSE Graph?

I am using DSE Graph version 5.x.
I have created a schema using DSE Studio, which uses Gremlin queries.
What I am trying to do is:
I want to index my graph based on a vertex property called 'name'.
Here is what I get when I do schema.describe()
Here is what I get when I do g.V()
As you can clearly see, I have indexed my vertex label type by the property name.
But when I insert multiple vertices (of label type) with the same name, it accepts them without error.
Ideally, because of the index, it should raise an error on inserting a vertex with the same property 'name'.
Indexing in DSE Graph is a performance optimization operation, not a referential integrity operation. Currently there is no mechanism that will "reject" creating a new vertex if one exists with the same property value. We have this feature request on our roadmap. In the interim, it is possible to achieve "upsert" style semantics with DSE Graph by leveraging Custom IDs as described here - http://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/using/createCustVertexId.html?hl=custom%2Cid
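
To make the upsert idea concrete, here is a small in-memory sketch in Java. It is not DSE Graph code: CustomIdUpsert and its methods are hypothetical, and the map only mimics what happens when the vertex ID is derived from the name, so that writing the same name twice addresses the same vertex rather than creating a second one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: an in-memory stand-in, not DSE Graph. All names are hypothetical.
final class CustomIdUpsert {

    // Vertices keyed by "label:name", mimicking a custom vertex ID built from the name.
    private final Map<String, Map<String, Object>> vertices = new LinkedHashMap<>();

    // Returns the existing vertex for (label, name) or creates it; never duplicates.
    Map<String, Object> upsert(String label, String name) {
        return vertices.computeIfAbsent(label + ":" + name, id -> {
            Map<String, Object> vertex = new LinkedHashMap<>();
            vertex.put("label", label);
            vertex.put("name", name);
            return vertex;
        });
    }

    int size() {
        return vertices.size();
    }

    public static void main(String[] args) {
        CustomIdUpsert graph = new CustomIdUpsert();
        graph.upsert("type", "alpha");
        graph.upsert("type", "alpha");   // same ID, so no second vertex is created
        graph.upsert("type", "beta");
        System.out.println(graph.size());   // prints 2
    }
}
```

A secondary index, by contrast, would correspond to a lookup table maintained next to a plain list of vertices: it speeds up finding vertices by name but does nothing to stop two entries with the same name from being added.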

Titan+Cassandra and String Vertex Ids

It looks like we can get Titan 1.0 to use custom long ids by setting "graph.set-vertex-id" to true. Is there some way to use non-long (i.e. String) ids as vertex ids? Seeing that the TinkerPop API supports Strings, and there's a feature called "StringIds", is there some way of enabling that feature? I'm using Titan with Cassandra.
I think this goes against Titan's internal structure. One of the Titan devs recommends here to just use your own indexed property. This is reiterated here and here stating that unique indexed properties should be used.
I think the reason for this is that the internal ids actually refer to locations on the system. As stated here:
The (64 bit) vertex id (which Titan uniquely assigns to every vertex) is the key which points to the row containing the vertex’s adjacency list.
No, String identifiers are not supported in the StandardTitanGraph.features(). You could consider using an indexed String property as an alternative.

Blazegraph Tinkerpop 3 Indexing

I am trying to learn about Blazegraph. At the moment I am puzzled how I can optimise simple lookups.
Suppose all my vertices have a property id, which is unique. This property is set by the user. Is there any way to speed up finding a vertex of a particular id while still sticking to the Tinkerpop APIs?
Is the search API defined here the only way?
My previous experience is in TitanDB and in Titan's case it's possible to define an index which the Tinkerpop APIs integrate with flawlessly. Is there any way to achieve the same results in Blazegraph without using the Search API?
Whether a mid-traversal V() uses an index or not, depends on a)
whether suitable index exists and b) if the particular graph system
provider implemented this functionality.
Gremlin (TinkerPop) does not specify how to set indexes, although the documentation presents things like the following:
graph.createIndex("username",Vertex.class)
But this may be reserved for the TinkerGraph implementation; as a matter of fact, the documentation says:
Each graph system will have different mechanism by which indices and
schemas are defined. TinkerPop3 does not require any conformance in
this area. In TinkerGraph, the only definitions are around indices.
With other graph systems, property value types, indices, edge labels,
etc. may be required to be defined a priori to adding data to the
graph.
There is an example for Neo4j:
TinkerPop3 does not provide method interfaces for defining
schemas/indices for the underlying graph system. Thus, in order to
create indices, it is important to call the Neo4j API directly.
But the code is very specific to that plugin:
graph.cypher("CREATE INDEX ON :person(name)")
Note that for Blazegraph the search uses a built-in full-text index.

Resources