ArangoDB using Java API: when a graph is created, do all edges need to be defined already?

As far as I can tell, you must specify the edge definitions at creation time, and there does not seem to be a method for adding an edge definition later. But I also see examples, written in JavaScript (I think), where edge definitions are added later. Am I right about this Java limitation, and does that suggest that JavaScript might be a better choice of programming language for interacting with ArangoDB?
EDIT: Could the edgeDefinitions Collection be added to after the graph is created?
EDIT: Seems to me that since the Java API is making REST calls, adding to the Collection later would not work at all.

It is possible to add an edge definition to an existing graph by using the method addEdgeDefinition of the ArangoDB-Java-Driver.
An example is listed in the Java Driver documentation.
Similarly, it is possible to replace or remove an edge definition via replaceEdgeDefinition/removeEdgeDefinition.
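For illustration, here is a minimal sketch, assuming a recent arangodb-java-driver where graphs are addressed through ArangoGraph; the database, graph, and collection names are placeholders:
import com.arangodb.ArangoDB;
import com.arangodb.ArangoGraph;
import com.arangodb.entity.EdgeDefinition;

// Sketch only: "mydb", "social", "worksIn", "employees" and "departments" are placeholders.
ArangoDB arango = new ArangoDB.Builder().build();
ArangoGraph graph = arango.db("mydb").graph("social");

EdgeDefinition worksIn = new EdgeDefinition()
        .collection("worksIn")      // edge collection
        .from("employees")          // allowed _from vertex collections
        .to("departments");         // allowed _to vertex collections

graph.addEdgeDefinition(worksIn);   // adds the definition to the already existing graph

// As mentioned above, the definition can later be swapped out or dropped again:
// graph.replaceEdgeDefinition(worksIn);
// graph.removeEdgeDefinition("worksIn");
The driver performs the corresponding REST calls for you, so adding to the graph after creation works the same way as at creation time.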

Related

How to overwrite generic ODATA expand handling functionality

We are currently working on performance issues with our provided OData interface, since UI5 issues a read request with multiple expand paths attached. Due to the generic handling of the request by the framework, this leads to additional processing per expand option, which we need to prevent.
Reading the blog about this topic there seems to be a way to overwrite the generic handling somehow:
https://blogs.sap.com/2018/03/19/sap-cloud-platform-sdk-for-service-development-create-odata-service-7-more-navigation-read-create-expand-sqo/
In this case it is us who need to decide if we can afford to rely on the FWK-functionality. Of course, such generic support cannot be performant. But for small amount of data it is just nice to get it for free.
Stay tuned to learn how to overwrite such generic FWK-functionality with own specific implementation.
However, there is no further blog post on this and looking through the framework, my only idea to overwrite this would be to configure and use an own com.sap.gateway.core.api.provider.data.IDataProvider implementation which handles the request in a custom way, although this would be an immense workaround.
So the question is: is there some leaner or easier approach to overriding this functionality that I have missed?
UPDATE:
I was able to create a custom data provider and register it with the RuntimeDelegate after servlet initialization. This custom data provider then checks for a custom annotation on the mapped method handler to see whether expand should be handled or not. If not, it just reads the entity but does not perform the generic expanded read. This works more or less fine, but what is of course missing is a way to pass the properties to be expanded in the ReadRequest. So far only a static implementation is possible, which solves our performance problem, but I would gladly take a hint if there is another, better solution for this...
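For what it's worth, the annotation part of that workaround could look roughly like the sketch below; the annotation name and the way the handler method is obtained are purely hypothetical and not part of the SAP Gateway SDK:
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker: handler methods carrying it opt out of the generic expand handling.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SkipGenericExpand {
}

class ExpandSwitch {
    // The custom data provider would call this with the mapped handler method
    // and fall back to a plain entity read when it returns false.
    static boolean useGenericExpand(Method handlerMethod) {
        return !handlerMethod.isAnnotationPresent(SkipGenericExpand.class);
    }
}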
At the time of this writing, no better approach exists.

Google Datastore returns incomplete data via official client library for nodejs

Here is some information about the context of the problem I am facing:
we have semi-structured data (JSON from a Node.js backend) in Datastore;
after saving an entity
and then getting a list of entities, both soon afterwards and even a while later,
the returned data is missing one indexed property,
yet I can find the entity by that property's value.
I use Google Datastore via the Node.js client library, @google-cloud/datastore: "^2.0.0".
How can this be possible? I understand that, due to eventual consistency, some updates can be read back incompletely, etc. But why do I get the same inconsistency - a whole property of the entity missing - for an entity saved, e.g., an hour ago?
I have gone through this scenario multiple times for the same kind.
I do not have such issues with other kinds or other properties of that kind.
How can I avoid this type of issue with Google Datastore?
An answer for anyone who may encounter such an issue.
We mostly do not use DTOs (data-transfer objects) or any other wrappers for most of our kinds in this project, but for this one a DTO has been used, mostly to be sure the result objects have default values for properties omitted/absent in the entity, which usually happens for entities created by an older version of the code.
After reviewing my own code more carefully, I found a piece of code that was out of sync with the other related pieces of code - there was no line copying this property from the entity to the DTO object.
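To make the class of bug concrete, here is a Java sketch purely for illustration (the actual project is Node.js and all names here are made up): a mapping layer that applies defaults but silently drops one property.
import java.util.Map;

// Illustration only: a DTO that supplies defaults for old entities but
// forgets to copy one property.
class AccountDto {
    String name = "";
    boolean active = false;   // default for entities written by older code
    String referrer = "";     // the "missing" indexed property

    static AccountDto fromEntity(Map<String, Object> entity) {
        AccountDto dto = new AccountDto();
        dto.name = (String) entity.getOrDefault("name", "");
        dto.active = (Boolean) entity.getOrDefault("active", Boolean.FALSE);
        // Bug of the kind described above: "referrer" is never copied,
        // so every result looks as if the entity lacked that property.
        return dto;
    }
}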
Side note: this whole situation reminds me of the story (or meme) about a guy who claimed he had found a bug in the compiler, simply because he could not find the mistake he had made in his own code.

Combine Traversal API and Gremlin

This might be a silly question.
I like the Traversal API because of the type safety I get from within my Java program; however, I am exploring the possibility of using it in conjunction with the Gremlin API.
Something roughly like below. I am guessing this is not possible, but I would like to know.
GraphTraversalSource g; // Get reference
g.V().has('author', 'name', 'Duke').injectGroovy('SomeExternalGroovy').toList()
I'll start with a side note: I wouldn't make a distinction between "Traversal API" and "Gremlin API" - that's not a comparison we typically make. There is a distinction between the Traversal API and the Graph API. The Traversal API is for users (like you) and the Graph API is for graph providers (like DSE Graph) who want to become TinkerPop enabled. The Traversal API is initialized through GraphTraversalSource, typically named g, whereas the Graph API is initialized through the Graph, typically named graph. You can see the javadoc for the related classes in the Traversal API here and the Graph API here. The Traversal API forms the steps that we think of as the Gremlin language.
The equivalent to injectGroovy("SomeExternalGroovy") is likely something akin to:
g.V().has('author','name','Duke').map(Lambda.function("it.get().value('name')"))
It basically passes a Groovy closure as a string into the map() step, and the server evaluates that string when it executes the traversal. I am assuming here, of course, that you are creating g through the DseGraph class like this:
GraphTraversalSource g = DseGraph.traversal();
If you were just submitting a string, then you could use Groovy directly and do:
dseSession.executeGraph("g.V().has('author','name','Duke').map{it.get().value('name')}");
Note that you will need to enable lambdas in DSE Graph for any of this to work, by issuing a command like:
graph.schema().config().option("graph_name.traversal_sources.g.restrict_lambda").set(false)
Before you do any of that though you should ask yourself why you need to use a lambda. TinkerPop generally advises that you avoid lambdas in your Gremlin and only use them when there are no other options available to you. Gremlin is quite expressive and in most cases you can usually find appropriate Gremlin steps to replicate what you are doing in a lambda.
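For the example above, for instance, the lambda isn't needed at all, because a plain values() step extracts the property directly:
// Lambda-free equivalent of the map(Lambda.function(...)) traversal above.
List<Object> names = g.V().has("author", "name", "Duke").values("name").toList();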

Blazegraph TinkerPop 3 Indexing

I am trying to learn about Blazegraph. At the moment I am puzzled about how to optimise simple lookups.
Suppose all my vertices have a property id, which is unique. This property is set by the user. Is there any way to speed up finding a vertex with a particular id while still sticking to the TinkerPop APIs?
Is the search API defined here the only way?
My previous experience is with TitanDB, and in Titan's case it is possible to define an index which the TinkerPop APIs integrate with flawlessly. Is there any way to achieve the same result in Blazegraph without using the Search API?
Whether a mid-traversal V() uses an index or not depends on a) whether a suitable index exists and b) whether the particular graph system provider implemented this functionality.
Gremlin (TinkerPop) does not specify how to set up indexes, although the documentation presents things like the following:
graph.createIndex("username",Vertex.class)
But this may be reserved for the TinkerGraph implementation; as a matter of fact, it says:
Each graph system will have different mechanism by which indices and schemas are defined. TinkerPop3 does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other graph systems, property value types, indices, edge labels, etc. may be required to be defined a priori to adding data to the graph.
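To make that concrete, here is a small sketch using the in-memory TinkerGraph reference implementation (shown only to illustrate the idea; Blazegraph does not expose this method, and the property name "userId" is a placeholder):
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

// TinkerGraph-specific: declare an index on the "userId" property of vertices.
TinkerGraph graph = TinkerGraph.open();
graph.createIndex("userId", Vertex.class);

GraphTraversalSource g = graph.traversal();
g.addV("user").property("userId", "u-42").property("name", "Duke").iterate();

// This has() lookup can now be answered from the index instead of a full vertex scan.
Vertex duke = g.V().has("userId", "u-42").next();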
There is an example for Neo4j:
TinkerPop3 does not provide method interfaces for defining schemas/indices for the underlying graph system. Thus, in order to create indices, it is important to call the Neo4j API directly.
But the code is very specific to that plugin:
graph.cypher("CREATE INDEX ON :person(name)")
Note that for Blazegraph, the search uses a built-in full-text index.

Which (in_memory) graph DB to choose if data modeling is the focus?

I am out of ideas and hope to get some useful input. I am using this question to compress my experiences and share them, hoping to inspire some vendors to take the next step and treat data modeling for graph databases as a first-class concern.
I have been evaluating graph database solutions usable from Node.js for a few weeks. My use case is to save interactions of different social network user accounts. The need is to use CPU and memory in the most efficient way.
My most important requirements are:
in_memory (at least for indexing)
open source (and free to use)
JavaScript/Node.js performance as a first-class citizen
comfortable query and modeling language
Neo4J
I really like Cypher, so my best choice would be Neo4j.
But the major issue with Neo4j is that JavaScript access is non-native. It uses the REST API, which is about ten times (10x) slower than direct Java access. So I took a look at node-neo4j-embedded, but it has been inactive for more than two years. It looks like its author isn't active at all (a bad sign).
ArangoDB
The really nice core developers of ArangoDB answered my question about internals. In short, JavaScript is a first-class citizen because native queries can be pushed out of JS. Looking at the open-source benchmarks, I think that is fair. But I am afraid they didn't use node-neo4j-embedded for their benchmark; the benchmarks compare the REST APIs (edited because of #weinberger's comment). I wish they had compared the native APIs (maybe someone is snoopy enough to give it a try! - let us know!). Update: As I noticed now, OrientDB has answered the benchmark with a new node.js driver (using the Command Cache by starting the server with -Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3, which isn't fair, because it wasn't a query-cache benchmark!)
Because I would like to use ArangoDB as a graph database, I would have 3 choices (source: FAQ):
traverse JS objects
using AQL's graph functions
using the REST API
In general it isn't as comfortable as Cypher. And I am not sure how to compare them, or what the right way to model data is (something Neo4j explains very well). I'd love to have something like this for ArangoDB graphs. It feels like ArangoDB is focused on graph operations, while Neo4j better fits the needs of using graphs if you have more relations than rows (the reason to use graphs instead of relations with joins).
MongoDB
The document-based MongoDB isn't optimized for graph operations, but it has lately gained an experimental in_memory storage engine. There are also some projects that are either in_memory or graph related, but nothing really compelling. And judging from this discussion, it looks like MongoDB isn't what I want to use.
OrientDB
Because there is a comparison of OrientDB vs. MongoDB available (from OrientDB), I thought about using this one. "OrientDB has a hybrid Document-Graph engine" using SQL. I am a former PHP/MySQL expert. But where is the modeling part? Their chapter on working with graphs is not Cypher-like; it is like using SQL for graphs. There is nothing wrong with that, but having used Cypher before, I miss the modeling-like feeling.
If someone has gone through a modeling process with OrientDB and graphs, maybe you could write a tutorial like the one Neo4j has done.
Update: Regarding JavaScript access as a first-class citizen, there is news:
"In the next release the speed of this driver will be comparable to the native Java one." The forked node.js driver has been fixed in the last few days.
Update: Before choosing OrientDB, one might want to read this article about some issues, and the discussions linked from there. The article touches a sensitive issue and should be approached with a critical mind. Note from the author of this update: I'm new to editing SO and don't have enough reputation to put this in the comments. I believe this information is a valid point for discussion; I'm just not sure how to place it here according to SO rules.
LokiJS
Before I looked at Neo4j, ArangoDB and MongoDB, I played around with a JavaScript-based in_memory database called LokiJS, which seems to follow the strategy of ignoring everything that slows down performance and efficiency. LokiJS is trying to complete its Mongo-style API (see the roadmap). The major issue is its limited ability to scale. Of course it isn't a graph database, but it was an interesting solution at the beginning of my project. It also wasn't a great feeling to chase down the scattered documentation (maybe they should reboot with GitBook).
All in all, LokiJS is a very interesting project and I hope it will keep moving forward!
LevelDB
Previously, when I wrote my degree paper, I was looking at LevelDB. Remembering this while writing this post, I searched for "LevelDB in_memory" and got a promising result called MemDown (see also). I haven't tested this find, but maybe someone has experience working with and modeling for this solution. Maybe it would be the most efficient option if none of the others fit, because I would simply write a lightweight Cypher clone with the goal of staying as lightweight as possible.
Edit: As requested in a comment, here is a link to LevelGraph. As an idea for implementing a Cypher parser on top of LevelGraph/LevelDB, your starting point would be to compare
Cypher:
CREATE (subject:Node {value: "a"})-[predicate:PREDICATE {value: "b"}]->(object:Node {value: "c"})
RETURN subject, predicate, object
LevelGraph:
var triple = { subject: "a", predicate: "b", object: "c" };
db.put(triple, function (err) {
  // ..
});
Conclusion
As you have likely noticed, I am not a superhero when it comes to graphs, but this is my initial dive into the topic and I'm trying to get an overview. I assume there are a lot of people out there who want to ask the same questions as me but don't have the time. I hope this post will help a lot of people and, through comments and answers, will evolve into a well-done overview of how to model data for graphs.
#editors: You are welcome.
#commenters: This is the result of my personal research - if you have been on a similar journey, please answer with a short summary like I have done for each DB I've evaluated (and don't forget to address my 4 goals).
The idea of combining Node-style performance through native features (e.g. streams) with a high-level query language like Cypher is actually quite neat.
What you likely won't get is any kind of low-level API, since this is rather rare among DB authors and, supposedly, not wanted in their designs. So long-running TCP connections will have to serve just fine.
cypher-stream seems to incorporate all of this, while (judged superficially) maintaining a good style.
Since you likely won't get much further with your search, I'd suggest sending its author a pull request if any other features are needed :)
You should take a look at GunDB: https://github.com/amark/gun
It's open source and has a very active and helpful lead developer.
Join us at https://gitter.im/amark/gun
