How to traverse without indexed value? - node.js

I have a scenario where I want to query for all the vertices that do not have a specific property key and value. What I have tried is
g.V().hasLabel('Persona').hasNot('isTempDob').count()
resulted in
Could not find a suitable index to answer graph query and graph scans are disabled: [(~label = Persona)]:VERTEX
So I tried something simpler, following the documentation: getting the count of vertices with the label Persona. The query is
g.V('Persona').count()
resulted in
0
I don't know why I am not able to traverse without an indexed value. Any help would be greatly appreciated. Thanks in advance.

Perhaps it is just a typo, but g.V('Persona').count() does not search labels; it looks for vertices with a unique identifier of "Persona". As a result, I suspect that the answer of "0" was correct.
I don't know why I am not able to traverse without any indexed value?
JanusGraph (and other graph systems) prevent traversals without indexed values as a guardrail against accidentally executing long-running queries. If you have a small graph with just thousands of vertices, you will likely find this feature an annoyance, but when you have billions of vertices it can be a great help: a query such as the one you've written would take hours and hours to complete and would not be the recommended way to get that answer. If you are doing a global scan of all the vertices in the graph, you should be executing your Gremlin over Spark to enjoy its parallel processing capabilities, turning those hours and hours into minutes.
Going back to the case of a small graph, you can temporarily disable the guardrail by configuring JanusGraph with query.force-index=false to allow the query to execute. That should allow your query to work without that error/warning.

I tested this in the Gremlin Console Docker image.
The identifier is a unique ID for each vertex. When creating a vertex, you can either specify an ID or have one generated automatically. For example,
gremlin> saturn = g.addV("character").property(T.id, 1).property('name', 'saturn').property('age', 10000).property('type', 'titan').next();
==>v[1]
If you create a vertex this way, you get v[1] because you specified the ID.
The following does not specify an ID, so one is generated for you:
gremlin> g.addV("character").property('name', 'prometheus').property('age', 1000).property('type', 'god').next()
==>v[0]
Assuming it is the first vertex, its ID will be 0.
In your scenario, if you want all vertices that have the same label but exclude a specific name, you can try this (I'm using the JanusGraph toy graph gods):
gremlin> g.V().has('character','name',neq('hercules')).valueMap();
==>[name:[prometheus],type:[god],age:[1000]]
==>[name:[saturn],type:[titan],age:[10000]]
==>[name:[jupiter],type:[god],age:[5000]]
==>[name:[neptune],type:[god],age:[4500]]
==>[name:[alcmene],type:[human],age:[45]]
==>[name:[pluto],type:[god],age:[4000]]
==>[name:[nemean],type:[monster],age:[20]]
==>[name:[hydra],type:[monster],age:[0]]
==>[name:[cerberus],type:[monster],age:[0]]
The above query finds all vertices labeled character and excludes the one named hercules. Alternatively, the following query does the same thing:
gremlin> g.V().hasLabel('character').has('name',neq('hercules')).valueMap();
The reason your query didn't work is that hasNot(key) takes only a property key, not a value.
Let me know if it helps.
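To make the distinctions in the answers above concrete, here is a minimal plain-JavaScript model. It is not the Gremlin API: the `vertices` array and every helper name are made up for illustration. It assumes the usual TinkerPop semantics, where g.V(x) is an ID lookup, hasLabel filters on the label, hasNot(key) keeps vertices that lack the key, and has(key, neq(value)) keeps only vertices that have the key with a different value.

```javascript
// Hypothetical in-memory vertices; this is a model, not a graph database.
const vertices = [
  { id: 1, label: 'Persona', props: { name: 'a', isTempDob: true } },
  { id: 2, label: 'Persona', props: { name: 'b' } },
  { id: 3, label: 'Persona', props: { name: 'c', isTempDob: false } },
  { id: 4, label: 'Address', props: { city: 'x' } },
];

// Models g.V(id): an ID lookup that has nothing to do with labels.
const byId = (vs, id) => vs.filter(v => v.id === id);

// Models hasLabel(label).
const hasLabel = (vs, label) => vs.filter(v => v.label === label);

// Models hasNot(key): the property key must be absent.
const hasNot = (vs, key) => vs.filter(v => !(key in v.props));

// Models has(key, neq(value)): the key must be present with another value.
const hasNeq = (vs, key, value) =>
  vs.filter(v => key in v.props && v.props[key] !== value);

// Models not(has(key, value)): the key is absent OR holds another value.
const notHas = (vs, key, value) =>
  vs.filter(v => !(key in v.props && v.props[key] === value));

const personas = hasLabel(vertices, 'Persona');
const ids = vs => vs.map(v => v.id);

console.log(byId(vertices, 'Persona').length);        // 0: no vertex has the ID 'Persona'
console.log(ids(hasNot(personas, 'isTempDob')));      // vertex 2 only
console.log(ids(hasNeq(personas, 'isTempDob', true))); // vertex 3 only
console.log(ids(notHas(personas, 'isTempDob', true))); // vertices 2 and 3
```

If the goal is "vertices that do not have this key with this value", the last variant is the closest match in this model; in Gremlin that would correspond to something like not(has('isTempDob', true)), still subject to the index guardrail described above.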

Related

What is the difference between gremlin in neptune to azure selecting a single vertex?

We switched our database from Azure to Neptune. In Azure you could select one vertex and the Gremlin query returned the id, the label and all properties of this vertex. If you do the same on Neptune, just the id and the label are returned. How can I get Neptune to return the id, the label and all properties of a vertex? Is there an option you can choose in the Neptune configuration? If there is no option, which query do I have to execute to get the id, the label and all properties of a vertex?
The difference might have to do with a number of things. The first thing that comes to mind is that if you went from CosmosDB to Neptune you might now be using bytecode-based traversals, which don't return properties (just references, meaning id and label, as you are seeing). If you didn't make that switch, then it's possible that Neptune is more aligned with TinkerPop in terms of serialization semantics, which call for references only in newer versions.
Either way, it's considered a best practice to only return the data that you need in the form you want it, rather than a graph element with all properties. The reasoning is similar to why you wouldn't do SELECT * FROM table in SQL - you would specify the column names.
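If you do want every property back, TinkerPop provides valueMap(true) and, from version 3.4.4 on, elementMap(), for example g.V(id).elementMap(); whether these are available depends on the TinkerPop version your Neptune engine supports. Such steps return a map rather than a vertex. The following JavaScript sketch shows one way to turn a map-shaped result into a plain object. It assumes the result arrives as a JS `Map` whose property values may be single-element lists (as valueMap produces); the helper name `toPlainObject` and the sample data are illustrative and not part of any driver.

```javascript
// Convert a map-shaped traversal result into a plain object.
// Single-element lists are unwrapped; longer lists are kept as arrays.
function toPlainObject(resultMap) {
  const out = {};
  for (const [key, value] of resultMap) {
    // Keys are coerced to strings; how the id/label token keys are
    // represented depends on the driver, so check that for your setup.
    const name = String(key);
    out[name] = Array.isArray(value) && value.length === 1 ? value[0] : value;
  }
  return out;
}

// Hypothetical sample shaped like a valueMap-style result.
const sample = new Map([
  ['id', 'v-1'],
  ['label', 'person'],
  ['name', ['Simba']],
  ['alias', ['a', 'b']],
]);

console.log(toPlainObject(sample));
```

In line with the advice above, a projection of just the properties you need is usually preferable to fetching everything.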

Define Graph Schema in AWS Neptune to prevent data duplication

When using TinkerPop/JanusGraph I am able to define VertexLabels and Property Keys, which I can then use to create composite indexes. I read somewhere in the Neptune documentation that indexes are not necessary (or supported).
My question is then: how do I prevent duplication when loading data into the database? The only examples I found in the AWS documentation involve loading data where a unique ID is already provided for each record, which to me seems like I would need to first extract data from an RDBMS in order to have all the IDs and their relationships before I can load it.
Am I understanding this correctly, if not how could I solve this?
Yes, your understanding is correct. The uniqueness constraint for vertices and edges applies to their ~id property, i.e. IDs are unique.
There are two ways to insert data into Neptune. You can either use the loader interface (recommended) or insert via Gremlin.
Case#1: Insert via bulk loader (recommended)
Inserting via the loader only supports CSV format for now and, as you observed, it does require user-defined IDs for vertices and edges.
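Since uniqueness is enforced on ~id, one possible way to avoid duplicates without first exporting database-assigned IDs is to derive each ID deterministically from data you already have, such as the record's type and its primary key in the RDBMS. The sketch below is only an illustration of that idea in Node.js: the function names, the hashing scheme, and the sample keys are assumptions, not anything prescribed by Neptune.

```javascript
const crypto = require('crypto');

// Derive a stable vertex ID from a label and a natural key
// (for example a primary key from the source RDBMS).
function vertexId(label, naturalKey) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify([label, String(naturalKey)]))
    .digest('hex')
    .slice(0, 32);
}

// Derive a stable edge ID from its endpoints and label, so the same
// relationship always maps to the same ID.
function edgeId(fromId, edgeLabel, toId) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify([fromId, edgeLabel, toId]))
    .digest('hex')
    .slice(0, 32);
}

const simba = vertexId('Lion', 1);
const mufasa = vertexId('Lion', 2);
console.log(simba, mufasa, edgeId(simba, 'knows', mufasa));
```

Because the same input always yields the same ID, the relationships can be written out in the same pass as the vertices, and a record that is loaded twice maps to the same ~id rather than to a second vertex.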
Case#2: Insert via Gremlin
For insertion via Gremlin providing IDs is optional. If you do not provide an ID, then Neptune will automatically assign a unique ID to the vertex or the edge.
e.g. g.addV() adds a vertex and assigns a unique identifier to it.
Further regarding case#2, you can add the two vertices and the relationship in the same query. This does not require knowledge of the ID auto-assigned to the vertex by the database.
g.addV().as("node1").property("name","Simba").addV().as("node2").property("name","Mufasa").addE("knows").from("node1").to("node2")
Alternatively, use a unique property identifier to query for nodes from the DB:
g.addV().property("name","Simba");
g.addV().property("name","Mufasa");
g.V().has("name","Simba").as("node1").V().has("name","Mufasa").as("node2").addE("knows").from("node1").to("node2");

How to create a relationship between 2 nodes where the nodes' labels, properties and the relationship are variable using APOC

I'm having a really hard time querying Neo4j with Cypher and the APOC library. I was recommended the APOC library a few days ago to create nodes with a label based on a variable. Creating these nodes works great, but a few days have passed since and I still can't figure out how to create a relationship between these nodes.
The error messages I'm getting are the same as the ones I got before I started using APOC. The first character of the query is always seen as invalid input. Another one I have been getting is that the procedure call does not provide the required number of arguments.
I don't really understand the APOC documentation on how to create a relationship. I also tried CALL APOC.help('relationship') and saw that it's also possible to use apoc.merge. This can't be found in their documentation though. Furthermore I read about APOC's new summer release on Neo4J's blog, but I still really don't know how I can make this query work.
I've tried every possible tweak for the query I could think of, but the nodes just won't connect. I clearly don't know what I'm doing and missing out on something.
I really would like to be able to match 2 nodes and create a relationship between them. These nodes' labels and properties are variable since that's the way they were created. If possible, it would be great if the relationship type could be based on a variable too.
I'm working with NodeJS and the Neo4j driver, and I put the APOC JAR file successfully in Neo4j's plugin folder.
Here's one of the failed queries to get an idea of what I'm trying to do:
('CALL apoc.create.relationship([{labelParamN1}], {name: {nameParamN1}}, {relationParam}, [{labelParamN2}], {name: {nameParamN2}})',
{labelParamN1: labelParamN1, nameParamN1: nameParamN1, labelParamN2: labelParamN2, nameParamN2: nameParamN2, relationParam: relation})
Some help with this query would be really appreciated
You first have to use MATCH to get the required nodes (n1 and n2), and then use the apoc.create.relationship procedure. Provided that you do not want to add any properties on the relationship (and so you just pass {} for the third parameter), the following query should work:
MATCH (n1 {name: {nameParamN1}}), (n2 {name: {nameParamN2}})
CALL apoc.create.relationship(n1, {relationParam}, {}, n2)
YIELD rel
RETURN rel
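From Node.js, the query above can be assembled together with its parameter map before handing both to the driver. The following is a minimal sketch under a few assumptions: it uses the `$param` parameter syntax (the `{param}` form shown above is the older syntax and is not accepted by Neo4j 4.0 and later), it filters on variable labels with `labels()` because labels cannot be parameterized directly in a MATCH pattern, and the helper name `buildRelationshipQuery` is made up for illustration.

```javascript
// Build the Cypher text and parameter map for connecting two nodes.
// Nothing here talks to the database; it only prepares the inputs.
function buildRelationshipQuery({ label1, name1, label2, name2, relType }) {
  const query = [
    'MATCH (n1 {name: $name1}), (n2 {name: $name2})',
    'WHERE $label1 IN labels(n1) AND $label2 IN labels(n2)',
    'CALL apoc.create.relationship(n1, $relType, {}, n2) YIELD rel',
    'RETURN rel',
  ].join('\n');
  return { query, params: { label1, name1, label2, name2, relType } };
}

// Hypothetical sample values.
const { query, params } = buildRelationshipQuery({
  label1: 'Person',
  name1: 'Alice',
  label2: 'Company',
  name2: 'Acme',
  relType: 'WORKS_AT',
});

console.log(query);
console.log(params);
```

With the official neo4j-driver, the pair would then be passed as session.run(query, params). Keeping every variable value in the parameter map, instead of concatenating it into the query string, also avoids the "invalid input" errors that come from malformed query text.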

CouchDB query for more dynamic values

I have several "Location documents" in my CouchDB, each with longitude and latitude fields. How can I find all location documents in the database whose distance to a provided latitude and longitude is less than a provided distance?
There is a way to achieve this using vanilla CouchDB, but it's a bit tricky.
You can use the fact that you can apply two map functions during one request. The second map function can be created using list functions.
Lists are not very efficient computationally, and they can't cache results the way views do. But they have one unique feature: you can pass several arguments into a list. Moreover, one of your arguments can be, for example, JS code that is eval-ed inside the list function (risky!).
So entire scheme looks like this:
Make view, that performs coarse search
Make list, that receives custom params and refines data set
Make client-side API to ease up querying this chain.
I can't provide exact code for your particular case, since many details are not clear, but it seems that the coarse search must group results into somehow linearly enumerated squares, and the list then performs the more precise calculations.
Please note that this scheme might be inefficient for large datasets since it's computationally hungry.
Vanilla CouchDB isn't really built for geospatial queries.
Your best bet is to either use GeoCouch, CouchDB-Lucene or something similar.
Failing that, you could emit a Geohash from your map function, and do range queries over those.
Caveats apply. Queries around Geohash "fault lines" (equator, poles, longitude 180, etc.) can give too many or too few results.
There are multiple JavaScript libraries that can help convert to/from Geohash, as well as help with some of those caveats.
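For reference, the standard Geohash encoding is short enough to sketch by hand. The version below is a minimal illustration in JavaScript, not a replacement for a maintained library: it does no input validation and does nothing about the "fault line" caveats mentioned above. The function name `geohashEncode` and the default precision are arbitrary choices.

```javascript
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Encode a latitude/longitude pair as a Geohash string.
// Bits alternate between longitude and latitude, starting with longitude;
// every 5 bits become one base-32 character.
function geohashEncode(lat, lon, precision = 6) {
  let latMin = -90, latMax = 90;
  let lonMin = -180, lonMax = 180;
  let hash = '';
  let bits = 0;
  let index = 0;
  let useLon = true;

  while (hash.length < precision) {
    if (useLon) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { index = index * 2 + 1; lonMin = mid; }
      else { index = index * 2; lonMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { index = index * 2 + 1; latMin = mid; }
      else { index = index * 2; latMax = mid; }
    }
    useLon = !useLon;

    if (++bits === 5) {
      hash += BASE32[index];
      bits = 0;
      index = 0;
    }
  }
  return hash;
}

console.log(geohashEncode(57.64911, 10.40744, 5)); // u4pru
```

A map function could emit such a hash as the view key. Because nearby points tend to share a prefix, a range query such as startkey="u4p"&endkey="u4p\ufff0" would then return the documents inside that cell, which can be refined afterwards with an exact distance check.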
CouchDB is not built for dynamic queries, so there is no good/fast way of implementing it in vanilla CouchDB.
If you know beforehand which locations you want to calculate the distance from, you could create a view for each location and call it with the parameters ?startkey=0&endkey=max_distance:
function(doc) {
  // Placeholder: return the distance between two {lat, lon} points
  function distance(a, b) { /* your function for calculating distance */ }
  // Fixed reference point (roughly New York; west longitudes are negative)
  var NY = {lat: 40, lon: -73};
  emit(distance(NY, doc), doc._id);
}
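One common choice for the distance placeholder in the view above is the haversine great-circle formula. The sketch below is written in ES5-style JavaScript so that it could be pasted into a view function; it assumes points are objects with numeric `lat` and `lon` fields in degrees and uses a mean Earth radius of 6371 km, so distances come out in kilometres. The name `haversineKm` is illustrative.

```javascript
// Great-circle distance in kilometres between two {lat, lon} points
// given in degrees, using the haversine formula.
function haversineKm(a, b) {
  var R = 6371; // mean Earth radius in km
  var rad = Math.PI / 180;
  var dLat = (b.lat - a.lat) * rad;
  var dLon = (b.lon - a.lon) * rad;
  var h =
    Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(a.lat * rad) * Math.cos(b.lat * rad) *
    Math.pow(Math.sin(dLon / 2), 2);
  return 2 * R * Math.asin(Math.sqrt(h));
}

// One degree of longitude along the equator is roughly 111 km.
console.log(haversineKm({ lat: 0, lon: 0 }, { lat: 0, lon: 1 }));
```

With the distance emitted as the view key, ?startkey=0&endkey=50 would return the documents within 50 km of the fixed reference point.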
If you do not know the locations beforehand you could solve it by using a temporary view, but I would strongly advise against it since it's slow and should only be used for testing.

Implement site wide search with neo4j db using node-neo4j

I am using node-neo4j to communicate with my neo4j database. Following github.com/aseemk/node-neo4j-template was a real help in getting started. I am still learning my way around and am looking to solve a few issues; I'd appreciate any pointers you can give me.
Implement site wide search.
We have users indexed by their email IDs, and want to index stories/posts by tags or keywords. How do we search across all nodes? Do we maintain indices for all nodes of various types? What would be a good approach? Should I go with Google to enable this feature? How do I index the same node with multiple tags/keywords?
Specify custom IDs for nodes
We are fine with integer IDs for nodes, but since these IDs can be reused, we would like to identify nodes with unique IDs. Is there a way to make neo4j use UUIDs? Adding a uid attribute would do, but we want to avoid having to maintain two IDs.
Traversing nodes
How do we traverse nodes using node-neo4j? Cypher looks like the answer, but I have yet to get used to it. Does node-neo4j support this out of the box?
Transactions
I may sound silly, but can I do transactional operations with node-neo4j?
Too many questions, I feel most of my doubts would clear once I get more used to querying the db, but any input from you will give me a headstart.
You probably should have broken this up into separate questions. I can answer a couple of them but not all.
Yes, node-neo4j can handle Cypher out of the box, with the query method: https://github.com/thingdom/node-neo4j/blob/develop/lib/GraphDatabase._coffee#L179. For help with Cypher, you should watch this intro video: http://vimeopro.com/neo4j/webinars/video/48603403
For your uuid, you probably should add a separate attribute to the nodes, and have an index on it--just ignore the regular ids except during transient queries where it's more convenient. As far as I know there's no way to override the incrementing ID--that sure would be nice, though.
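As a rough illustration of the separate-attribute approach, the Node.js sketch below generates a UUID on the client and prepares a parameterized Cypher statement that stores it on the node. The label `Story`, the property names, and the helper `buildCreateStory` are hypothetical; `crypto.randomUUID` requires Node.js 14.17 or later, and the `$param` syntax assumes a Neo4j version that supports it (3.0 or later).

```javascript
const { randomUUID } = require('crypto'); // available in Node.js 14.17+

// Prepare a CREATE statement that stores a client-generated UUID in a
// separate `uid` property, leaving Neo4j's internal node ID untouched.
function buildCreateStory(title, tags) {
  return {
    query: 'CREATE (s:Story {uid: $uid, title: $title, tags: $tags}) RETURN s',
    params: { uid: randomUUID(), title: title, tags: tags },
  };
}

// Hypothetical sample values.
const story = buildCreateStory('Hello graph', ['neo4j', 'search']);
console.log(story.query);
console.log(story.params.uid);
```

The `uid` property can then be indexed and used for all lookups from application code, with the internal IDs ignored as suggested above.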
Hope that helps.

Resources