How to use UUIDs in Neo4j, to keep pointers to nodes elsewhere? - node.js

I figured out, thanks to some other questions, that Neo4j uses internal ids for its nodes that can get recycled when a node is deleted.
That's a real concern for me, as I need to store a reference to my node in another database (relational this time) in order to keep some sort of "pinned" nodes.
I've tried using https://github.com/graphaware/neo4j-uuid to generate them automatically, but I did not succeed: all my queries kept running indefinitely.
My new idea is to add a field to each of my nodes that I would fill manually with a UUID generated by the Node.js package uuid, through uuid.v4().
I also came across the concept of indexing multiple times, which is still totally unclear to me, but it seems that I should run this query:
CREATE INDEX ON :MyNodeLabel(myUUIDField)
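For concreteness, this is roughly the kind of code I have in mind (the connection details, label and property names are just placeholders, nothing I've settled on):

// Sketch only: generate the UUID in Node.js and store it as a property on the node.
const uuid = require('uuid');                      // uuid.v4() as mentioned above
const neo4j = require('neo4j-driver').v1;          // placeholder: official JS driver, 1.x API

const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'password'));
const session = driver.session();

session.run(
  'CREATE (n:MyNodeLabel {myUUIDField: {id}}) RETURN n.myUUIDField AS id',
  { id: uuid.v4() }                                // UUID generated application-side
).then(result => {
  console.log(result.records[0].get('id'));        // the value I would also store in the relational DB
  return session.close();
}).then(() => driver.close());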
If you think that it doesn't make sense at all don't hesitate to come up with another proposition. I am open to all kinds of suggestions.
Thanks for your help.

I would consider using the APOC library's apoc.uuid.install procedure.
Definitely create a unique constraint on the label and attribute you are going to use. This will not only create an index but also guarantee uniqueness of the attribute in the label namespace.
CREATE CONSTRAINT ON (mynode:MyNodeLabel) ASSERT mynode.myUUIDField IS UNIQUE
Then call the apoc.uuid.install procedure. This will create UUIDs in the myUUIDField attribute on all existing MyNodeLabel nodes and on any new ones.
CALL apoc.uuid.install('MyNodeLabel', {addToExistingNodes: true, uuidProperty: 'myUUIDField'}) yield label, installed, properties
NOTE: you will have to install APOC and set apoc.uuid.enabled=true in the neo4j.conf file.
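Once the handler is installed you can sanity-check it: create a node with that label and read the property back in a separate query (the handler fills the property in when the transaction commits; the name value below is just an example):

CREATE (:MyNodeLabel {name: 'uuid-check'})

MATCH (n:MyNodeLabel {name: 'uuid-check'}) RETURN n.myUUIDField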

Related

How do I create in-memory search indexes in Elixir

I am currently working on an Elixir/Phoenix project and I was wondering what is a good way to create a quick in-memory search index.
The index would be created on request and destroyed when the request is over; currently the data comes from a database via Ecto. Also, I would like to query it by different indexes, so not just by :id but by other keys, for example :user_id, so a flat key-value store may not be enough.
Are there any tools that would be helpful? I looked a bit into mnesia but when using it with ecto3_mnesia, a local file/folder was created and I would prefer if everything was in memory.
Thanks
I have no idea about ecto3_mnesia, but I am pretty sure raw :mnesia without any redundant wrapper is a good fit here (or even :ets, if you don't need a clustered solution).
:mnesia.create_table/2 accepts many options; the two you might be interested in are disc_copies and ram_copies. Simply initialize the former with an empty node list and the latter with your complete node list, and you are all set: no disk copies are created, everything stays in memory.
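Roughly like this (the table name, attributes and values are just an example):

# In-memory only: ram_copies on the local node, no disc_copies anywhere.
:mnesia.start()

:mnesia.create_table(:notes,
  attributes: [:id, :user_id, :body],
  disc_copies: [],        # nothing written to disk
  ram_copies: [node()],   # everything kept in RAM
  index: [:user_id]       # secondary index, so lookups by :user_id work too
)

:mnesia.dirty_write({:notes, 1, 42, "some note"})
:mnesia.dirty_index_read(:notes, 42, :user_id)
# => [{:notes, 1, 42, "some note"}]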

DDD: How to save the order of aggregates?

I have two aggregates, 'notebook' and 'note'.
When I follow the rule 'aggregates reference each other only by their ids', I think I have two options:
Notebook(List<NoteId>, [other properties])
Note([other properties])
or
Notebook([other properties])
Note(NotebookId, [other properties])
With the first option, I need two DB calls to show all notes of a notebook (one to get the list and the second to load the notes).
So my current favorite is the second option. Now I have a few options in mind to save the order of the notes, each of which has some disadvantages.
What is a good approach to solve my problem? Or is the first option better and the two DB calls are negligible?
Can anybody help?
Big THX
It looks like the order of the Notes is important, at least relative to the Notebook, so maybe it should be part of the domain. If so, I would suggest storing it together with the Note, or using some other information of the Note to derive an ordering when a list is loaded.
If not, why is the order relevant? I mean, the two entities have a related but separate lifecycle, or at least that is how it looks: one aggregate, the Notebook, has a list that only references the other, the Note. Hence no direct interaction is planned. But, given that the domain is correctly modelled (there's not enough information to say something about it), somewhere you need an ordered list of Notes. The only way to have it as you need it is to store the information (or use one already stored), otherwise the hypothesis (order is relevant) is not valid anymore.
Update after info about the number of Notes and their size
It looks like your domain is organized in this way:
a root entity, the Notebook, where the order of each Note, with only its ID, is also stored: any change in the order will be updated from here, not from the Note
another root entity, the Note, with its own lifecycle and its own 'actions' (operations that trigger a change in the entity)
Whenever you load the Notebook, you must also load the Notes and their order to show them correctly ordered. On the other side, when you change the order, this structure allows you to have a single action (or operation) on the Notebook, for example changeOrder(NoteId), that updates the order of the given Note and, if needed, changes the order of all the others. The trick here is that when you persist the Notebook you work just with the ID of the Note, so you don't have to load the whole entity, just a part of it, update it and save it again. So how big the Note entity is doesn't matter, because you don't use all of it. Hence, at every change you could trigger an update of all the pairs (NoteID, order) for that Notebook. You can't do differently. But to support this you need a single function in the repository where you load the ID of the Note and its order and save it again; that should not be so expensive.
On the other side, all the actions that operate directly on the Note should load it, hence you have to load it all. But in that case loading all and saving all is required, because you are changing the Note itself.
Anyway, the way you persist the order is totally delegated to the persistence layer, which is built over the domain. I mean, the domain has a Notebook and a set of Notes with order 1, 2, 3, etc.
Even if I don't think this needs such a complex solution, you could use a totally different way to store the order: you can, for example, use steps of 100 (so 100, 200, 300, etc.): each new Note is put in the middle of the two old ones, and is the only one to be saved each time. Every once in a while you run a job, or something else, that just normalizes all the values, restoring the 100 steps (or whatever you use to persist the order). As I said, this looks like an overcomplicated solution to the problem, but it also shows that the entities of the domain can be totally different from the persistence ones.
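Just to make that last idea concrete, a tiny sketch (the function names and the step size are only illustrative, not part of any domain model):

// Gap-based ordering: orders are spaced by 100, so inserting between two
// Notes only touches the new Note; a periodic job restores the gaps.
const STEP = 100;

function orderBetween(prevOrder, nextOrder) {
  // the new Note goes halfway between its two neighbours
  return Math.floor((prevOrder + nextOrder) / 2);
}

function renumber(notes) {
  // run every once in a while to restore the 100-wide gaps
  return notes
    .slice()
    .sort((a, b) => a.order - b.order)
    .map((note, i) => ({ ...note, order: (i + 1) * STEP }));
}

// a Note inserted between orders 100 and 200 gets order 150
console.log(orderBetween(100, 200)); // 150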

How to create a relationship between 2 nodes where the nodes' labels, properties and the relationship are variable using APOC

I'm having a really hard time querying Neo4j with Cypher and the APOC library. I've been recommended to use the APOC library a few days ago to create nodes with a label based on a variable. Creating these nodes works great, but a few days have past since and I still can't figure out how to create a relationship between these nodes.
The error messages I'm getting are the same as the ones I got before I started using APOC. The first character of the query is always seen as invalid input. Another one I have been getting is that the procedure call does not provide the required number of arguments.
I don't really understand the APOC documentation on how to create a relationship. I also tried CALL APOC.help('relationship') and saw that it's also possible to use apoc.merge, though this can't be found in their documentation. Furthermore, I read about APOC's new summer release on Neo4j's blog, but I still really don't know how to make this query work.
I've tried every possible tweak for the query I could think of, but the nodes just won't connect. I clearly don't know what I'm doing and missing out on something.
I really would like to be able to match 2 nodes and create a relationship between them. These nodes' labels and properties are variable since that's the way they were created. If possible, it would be great if the relationship type could be based on a variable too.
I'm working with NodeJS and the Neo4j driver, and successfully put the APOC JAR file in Neo4j's plugins folder.
Here's one of the failed queries to get an idea of what I'm trying to do:
('CALL apoc.create.relationship([{labelParamN1}], {name: {nameParamN1}}, {relationParam}, [{labelParamN2}], {name: {nameParamN2}})',
{labelParamN1: labelParamN1, nameParamN1: nameParamN1, labelParamN2: labelParamN2, nameParamN2: nameParamN2, relationParam: relation})
Some help with this query would be really appreciated
You first have to use MATCH to get the required nodes (n1 and n2), and then use the apoc.create.relationship procedure. Provided that you don't want to set any properties on the relationship (so you just pass {} as the third parameter), the following query should work:
MATCH (n1 {name: {nameParamN1}}), (n2 {name: {nameParamN2}})
CALL apoc.create.relationship(n1, {relationParam}, {}, n2)
YIELD rel
RETURN rel
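If it helps, here is roughly how you could run that from Node.js with the official neo4j-driver (the URL, credentials and parameter values are placeholders):

// Sketch only: parameters are passed as a plain object alongside the query string.
const neo4j = require('neo4j-driver').v1;
const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'password'));
const session = driver.session();

const query =
  'MATCH (n1 {name: {nameParamN1}}), (n2 {name: {nameParamN2}}) ' +
  'CALL apoc.create.relationship(n1, {relationParam}, {}, n2) ' +
  'YIELD rel RETURN rel';

session.run(query, { nameParamN1: 'node one', nameParamN2: 'node two', relationParam: 'RELATES_TO' })
  .then(result => {
    console.log(result.records.length + ' relationship(s) created');
    return session.close();
  })
  .then(() => driver.close());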

Implement site wide search with neo4j db using node-neo4j

I am using node-neo4j to communicate with my Neo4j database. Following github.com/aseemk/node-neo4j-template was a real help to get started. Still learning my way to get things done, I am looking to solve a few issues; I'd appreciate any heads-up you can give me.
Implement site wide search.
We have users indexed by their email ids, and want to index stories/posts by tags or keywords. How do we search across all nodes? Do we maintain indices for all nodes of the various types? What would be a good approach? Should I go with Google to enable this feature? How do I index the same node with multiple tags/keywords?
Specify custom id's for nodes
We are fine with integer ids for nodes, but since these ids can be re-used, we would like to identify nodes with unique ids. Is there a way to make Neo4j use UUIDs? Adding a uid attribute would do, but we want to avoid having to maintain two ids.
Traversing nodes
How do we traverse nodes using node-neo4j? Cypher looks like the answer, but I am yet to get used to it. Does node-neo4j help do this out of the box?
Transactions
I may sound silly, but can I do transactional operations with node-neo4j?
Too many questions, I know. I feel most of my doubts will clear up once I get more used to querying the db, but any input from you will give me a head start.
You probably should have broken this up into separate questions. I can answer a couple of them but not all.
Yes, node-neo4j can handle Cypher out of the box, with the query method: https://github.com/thingdom/node-neo4j/blob/develop/lib/GraphDatabase._coffee#L179. For help with Cypher, you should watch this intro video: http://vimeopro.com/neo4j/webinars/video/48603403
For your uuid, you probably should add a separate attribute to the nodes, and have an index on it--just ignore the regular ids except during transient queries where it's more convenient. As far as I know there's no way to override the incrementing ID--that sure would be nice, though.
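As a rough sketch of the query method with such a uuid attribute (the label, property and values are placeholders, and this assumes a Neo4j version that supports labels):

// Sketch only: node-neo4j (thingdom) exposes Cypher through db.query(query, params, callback).
var neo4j = require('neo4j');
var db = new neo4j.GraphDatabase('http://localhost:7474');

var query = 'MATCH (u:User {uuid: {uuid}}) RETURN u';
db.query(query, { uuid: 'some-uuid-value' }, function (err, results) {
  if (err) throw err;
  console.log(results);   // each result row contains the matched node
});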
Hope that helps.

Drupal: Avoid database when dealing with node type info?

I'm writing a Drupal module that deals with creating new nodes from CSV files. The way I've been doing it currently, the user provides a node type, and my module goes to the database to find the fields for that node.
After the user matches the node fields to the CSV fields, I want to validate the data. This requires finding out the types of the node fields. I'm not entirely sure how to do that. (Maybe look at the content_node_field table?)
Then, I have to create the nodes. Currently, the module creates a new StdClass object, populates it with the necessary data, and saves it.
But what if I could abstract away from the database entirely and avoid dealing with it? What if I asked the user to point me to a node of this type that already exists? I could node_load() this node and use it to determine the node's fields. When it comes time to save the nodes, I could use the "seed" node to figure out what the structure of the new nodes needs to be.
One downside: this requires at least one node of this type to exist before the module can function.
Also, would this be slower than accessing the db directly?
I fear that over time, db names could change, and content types could be defined across multiple tables. By working only from a pre-existing node, I could get around many of these issues. Right?
Surely node_load will be hitting the database anyway? The node fields are stored in the database, so if you need to get them, at some point you have to talk to the database. Given that some page loads on Drupal invoke hundreds (or even thousands!) of database queries, I really wouldn't worry about one or two!
Table names are unlikely to change and the schema should stay fixed between point versions of Drupal at least. It would be better practice to use the API to get the data you want if it is possible though, and this would give better protection against change. I don't know if that's possible.
