Neo4j: how to index and give IDs to nodes (Node.js)

I'm kind of new to Neo4j, and I want to start building an application with Neo4j and Node.js.
From what I understand, Neo4j adds an id to each node it creates, and this id should not be used outside the DB. So if I have users, looking up a user by id (the same id Neo4j gave that user when it was created) is not that smart.
So the questions are:
1) What would be the best way to look up a user? Email? Could be, but not everything in the application has a unique identifier the way a user has an email.
2) I saw a few posts about UUIDs; let's assume I'm using one. Can I save that field with the name id, or do I need some other name?
3) Do I need to do something special if I want to use that field as an index? (I want the search by id to be fast.)
4) A UUID is a very long string. Isn't indexing that string a bit of overhead? Indexing a number is faster, no?
5) If not UUIDs, what other option would you suggest?

1) Best way - UUID.
2) Yes.
3) No. You just need to add an index to the database, for example:
CREATE INDEX ON :User(uuid)
4) It's true that lookup by the internal id is faster, especially in Neo4j (due to its storage implementation). However, an index-backed lookup on a UUID performs very well, and most Neo4j users take this approach (if there is no other unique identifier in their domain).
5) UUID is the best option, especially when you take into account how to generate IDs in a clustered setup. UUIDs make it possible to generate unique identifiers without taking any global database locks.
There are existing Neo4j extensions that can generate UUIDs for you.
For example, GraphAware/neo4j-uuid.
In this extension you can configure the property name, which nodes/relationships UUIDs should be assigned to, and so on.
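A minimal Node.js sketch of this approach, assuming Neo4j 4.4 or later, where the CREATE INDEX ON :User(uuid) form shown above is deprecated in favor of FOR ... ON / FOR ... REQUIRE syntax (and it was removed in 5.x). The UUID is generated in the application with Node's built-in crypto.randomUUID() (Node 14.17+), and a uniqueness constraint, which also creates a backing index, keeps lookups by uuid fast. The constraint name user_uuid and both helper functions are hypothetical; the returned { query, params } pairs would be passed to the official neo4j-driver's session.run(query, params).

```javascript
const { randomUUID } = require('crypto');

// Schema statement (Neo4j 4.4+/5.x syntax). A uniqueness constraint also
// creates the index that makes MATCH ... {uuid: $uuid} lookups fast.
// The constraint name "user_uuid" is a hypothetical choice.
const CREATE_UUID_CONSTRAINT =
  'CREATE CONSTRAINT user_uuid IF NOT EXISTS FOR (u:User) REQUIRE u.uuid IS UNIQUE';

// Hypothetical helper: parameterized query creating a user with an
// application-generated UUID (random, version 4).
function createUserQuery(email) {
  return {
    query: 'CREATE (u:User {uuid: $uuid, email: $email}) RETURN u.uuid AS uuid',
    params: { uuid: randomUUID(), email },
  };
}

// Hypothetical helper: look a user up by UUID rather than by the internal node id.
function findUserQuery(uuid) {
  return {
    query: 'MATCH (u:User {uuid: $uuid}) RETURN u',
    params: { uuid },
  };
}
```

Keeping the values in parameters rather than concatenating them into the Cypher string lets Neo4j cache the query plan and avoids injection issues.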

Related

Is it better to generate "UUID" and "TIMESTAMP" within the NodeJS application or the database?

I am writing a TypeScript/Node.js application and want to handle object ids and the created_at TIMESTAMP within the Node.js application, instead of using the built-in UUID or TIMESTAMP generators of MySQL or Cassandra.
1- First of all, I'd like to know: is it a good idea to generate id and created_at values within a web server application instead of letting the database generate them?
2- Secondly, is there a commonly used library for generating those values (UUID and TIMESTAMP) inside a Node.js app, or should I write my own functions?
3- I am also wondering about any other considerations on this topic, such as which UUID version to use. (All I need is to generate a UNIQUE (maybe secure, if needed?) id for each object I put into the MySQL/Cassandra database, plus a general TIMESTAMP that I can also use/generate within my Flutter front-end application, which uses the DateTime class.)
4- Furthermore, is it a good idea to generate a version 1 UUID, which contains a TIMESTAMP, and then extract the first part as id and the TIMESTAMP part as created_at?
In my opinion,
Using the database to generate unique IDs and timestamps is generally a good option because it offers a more reliable and efficient method of doing so. However, if you have particular demands or limitations, it is still possible to handle them within the program.
The "uuid" library in Node.js (https://www.npmjs.com/package/uuid) is the most widely used library for creating UUIDs. The "Date" object that comes with JavaScript can be used to create timestamps.
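For illustration, a minimal sketch that generates both values in the application using only Node built-ins: crypto.randomUUID() (Node 14.17+, a random version 4 UUID from a cryptographically secure generator) in place of the uuid package, and an ISO 8601 UTC string from Date, a format Dart's DateTime.parse can read on the Flutter side. The newRecord helper and its field names are hypothetical.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: attach an app-generated id and creation timestamp.
// id and created_at are set after the spread so caller fields cannot override them.
function newRecord(fields) {
  return {
    ...fields,
    id: randomUUID(),                     // random (version 4) UUID
    created_at: new Date().toISOString(), // UTC, ISO 8601, ends in "Z"
  };
}
```

Storing the timestamp in UTC and converting to local time only for display avoids mismatches between the server, the database, and the Flutter client.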
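Version 4 (random) is the most frequently used UUID version. You might also consider version 1 (time-based) or version 5 (name-based, using SHA-1). Note that neither of these adds security: version 5 always produces the same UUID for the same namespace and name, and version 1 reveals its creation time (and, in classic implementations, the machine's MAC address). If the id should be hard to guess, use version 4.
It is possible to create a version 1 UUID and read its timestamp back, but splitting it into an id part and a created_at part is not a good idea. The first group of a version 1 UUID is actually the low 32 bits of the timestamp (time_low), so it is not unique on its own: objects created close together can share it, and it wraps around roughly every 7 minutes. Only the full UUID, which combines the timestamp with the clock sequence and node fields, is unique.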
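If you do use version 1 UUIDs, the creation time can be recovered from the whole UUID instead of by splitting it. The sketch below assumes the standard RFC 4122 layout: a 60-bit count of 100-nanosecond intervals since 1582-10-15, stored across time_low, time_mid and the low 12 bits of time_hi_and_version. The function name v1TimestampMs is hypothetical.

```javascript
// 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
const UUID_EPOCH_OFFSET = 122192928000000000n;

// Hypothetical helper: milliseconds since the Unix epoch encoded in a version 1 UUID.
function v1TimestampMs(uuid) {
  const [timeLow, timeMid, timeHiAndVersion] = uuid.split('-');
  // The first hex digit of the third group is the version number.
  if (timeHiAndVersion === undefined || timeHiAndVersion[0] !== '1') {
    throw new Error('not a version 1 UUID');
  }
  // Reassemble the 60-bit timestamp: time_hi (12 bits) | time_mid (16) | time_low (32).
  const timeHi = BigInt('0x' + timeHiAndVersion) & 0x0fffn;
  const ticks =
    (timeHi << 48n) | (BigInt('0x' + timeMid) << 32n) | BigInt('0x' + timeLow);
  // Shift to the Unix epoch and convert 100-ns ticks to milliseconds.
  return Number((ticks - UUID_EPOCH_OFFSET) / 10000n);
}
```

For example, new Date(v1TimestampMs(uuidv1())), with uuidv1 being the uuid package's v1 generator, should give approximately the current time. Even so, storing created_at as its own column is usually simpler to query than decoding it from the id.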

Redis: Delete user token by email ( find Key by Value )

I have followed this tutorial on how to create token-based authentication with Node.js: http://www.kdelemme.com/2014/08/16/token-based-authentication-with-nodejs-redis/
I got it all working, but I have one problem.
The way I store the token is:
KEY = TOKEN
VALUE = UserData (Username, email, etc.)
To prevent logins from multiple devices, I would like to invalidate the existing token and generate a new one. During login, I would like to check whether the user's token already exists. However, that means finding a key by its value (finding the TOKEN by email), and looking through the Redis documentation I couldn't find anything about finding a key by value.
Thank you very much :)
You basically have to choose one of two approaches: a full scan of the database or an index. A full scan, as proposed in another answer to this question, will be quite inefficient - you'll go over your entire keyspace (or at least all the tokens) and will need to fetch each one until you find a match to the email.
An index will allow you to get an answer to your query much faster, at the expense of some RAM and administrative overhead. While Redis doesn't provide indexing capabilities out of the box, you can easily devise them using regular Redis data structures and operations. For example, the straightforward way to accomplish what you want would be to store, for each token, another key whose name is the email and whose value is the token. This will let you get the token by email with a single GET operation.
Note that this indexing approach will effectively double the number of token-related keys, so in order to optimize your RAM consumption you may want to consider other types of indexing structures (e.g. using a Hash to group email-token pairs where the is used as a bucket).
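A minimal sketch of the email-to-token index described above. To keep it self-contained, a JavaScript Map stands in for Redis (each store.get/set/delete corresponds to a Redis GET/SET/DEL); the token: and email: key prefixes and the login/lookup helpers are hypothetical. With a real Redis client you would also want to send the writes in a MULTI/EXEC transaction so the two keys cannot drift apart.

```javascript
const crypto = require('crypto');

// In-memory stand-in for Redis (hypothetical; replace with real GET/SET/DEL calls).
const store = new Map();

// Log a user in, invalidating any token previously issued for the same email.
function login(user) {
  const indexKey = `email:${user.email}`;            // index key: email -> token
  const oldToken = store.get(indexKey);               // GET email:<email>
  if (oldToken !== undefined) {
    store.delete(`token:${oldToken}`);                // DEL token:<old token>
  }
  const token = crypto.randomBytes(32).toString('hex');
  store.set(`token:${token}`, JSON.stringify(user));  // SET token:<token> <user data>
  store.set(indexKey, token);                         // SET email:<email> <token>
  return token;
}

// Resolve a token to its user data, or null if it is unknown or invalidated.
function lookup(token) {
  const data = store.get(`token:${token}`);
  return data === undefined ? null : JSON.parse(data);
}
```

Logging in twice with the same email leaves exactly two keys: the newest token and its index entry.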
You would have to do a SCAN of some kind and iterate through the keys, searching each value. The redis module supports these commands, but if you need/want a streaming interface for SCAN, there are at least a couple of modules to do that: redis-scanstreams and redisscan (which technically uses a callback approach, so not a real stream implementation).

CouchDB - human readable id

I'm using CouchDB with Node.js. Right now there is one node involved, and even in the remote future there are no plans to change that. While I can remove most of the cases where a short, auto-increment-like ID is required (it can be sparse, but not random), there remains one place where users actually need to enter the ID of a product. I'd like to keep this ID as short as possible and in a more human-readable format than something like '4ab234acde242349b', as it sometimes has to be typed by hand.
However, in the database it can be stored with whatever ID pleases CouchDB (using the default auto-generated UUID), but it should be possible to give it a number that can be used to identify it as well. What I have thought about is creating a document that consists of an array with all the UUIDs from CouchDB. When I create a new product in Node, I would run an update handler that appends the new unique ID to the end of that document. To obtain a product's ID, I'd then query the array and, on the client side, use indexOf to get the index as a short ID.
I don't know if this is feasible. From the performance point of view I can say the following: there are more queries that go numerical ID -> UUID than UUID -> numerical ID. There will be at most 7000 new entries a year in the database. Also, there is no use case where a product can be deleted, yet I'd rather not rely on that.
Are there any other applicable ways to generate a shorter, more human-readable ID that can be associated with my document?
EDIT:
From a technical point of view, it seems to be working: I can do both conversions (number <-> UUID) and it seems to go well. I don't know if this works well with replication, but since everything is in that one array, I guess it should, right?
You have two choices here:
First, set your human-readable id as the _id field. You can simply set it in the create-document call to the DB, and CouchDB will accept it. This can be a more lightweight solution, but it comes with some limitations:
It has to be unique. You should also be careful about clients that try to create a document but end up hitting an existing one (CouchDB rejects a write to an existing _id with a 409 Conflict unless the current _rev is supplied).
It should only contain alphanumeric characters or a few special characters; in my experience, other character types are asking for trouble. (Ids starting with an underscore are reserved by CouchDB.)
It should stay under a length limit (CouchDB doesn't define one, but you should). Long ids increase the size of your views (indexes) considerably and may make them slower.
If these limitations are not a problem for you, then you should go with this solution.
Second, as you said yourself, let the _id be a UUID and store the human-readable id in another field. To reach the document by the human-readable id, you can create a view that emits the human-readable id as the key, and then either emit the document as the value or get the document via the include_docs=true option. Whenever the view is queried, CouchDB updates it incrementally and returns the results. This is essentially the same as creating a document with an array/object of ids inside it, except that with a CouchDB view you get better performance.
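A sketch of the view described in the second option, assuming product documents carry a hypothetical type: 'product' field and a hypothetical shortId field. CouchDB stores map functions in a design document as source text and calls them once per document; emit is supplied by CouchDB's JavaScript query server, not by Node. The design document name products and view name by_short_id are also hypothetical.

```javascript
// Map function: index products by their human-readable shortId.
// "emit" is provided by CouchDB's query server at indexing time.
const mapByShortId = function (doc) {
  if (doc.type === 'product' && doc.shortId !== undefined) {
    emit(doc.shortId, null); // no value; fetch the doc with include_docs=true
  }
};

// Design document to PUT at /<db>/_design/products.
// CouchDB expects the map function as a source string.
const designDoc = {
  _id: '_design/products',
  views: {
    by_short_id: { map: mapByShortId.toString() },
  },
};
```

After saving the design document, a product could be fetched with GET /<db>/_design/products/_view/by_short_id?key=42&include_docs=true. The key parameter is JSON-encoded, so a string id has to be passed in (URL-encoded) double quotes.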
This second option might also be slightly slower for querying and inserting. If the ids are inserted sequentially, it's fine; if not, CouchDB will take slightly more time to insert each one at the right place. This doesn't work well with huge volumes of inserts hitting the DB.
Querying shouldn't take more than about 10% longer than with the first option, and I think 10% is a generous figure; it will most probably be less than 5%. I remember that in my CouchDB application I switched from reading by _id to reading from a view by key, and the slowdown was so small that, from the user's end, it wasn't noticeable even when making 100 queries at the same time.
This is how people query documents by fields other than the id, for example querying a user document by email when the user logs in.
If you don't know how CouchDB views work, you should read the views chapter of the book CouchDB: The Definitive Guide.
Also make sure you stay away from documents with huge arrays inside them. CouchDB also caps document size (the max_document_size setting; the default was 4 GB in older releases and is 8 MB since CouchDB 3.0). I remember having many such documents, and querying took really long because the view had to iterate over each array item. In the end I created one document per array item instead, and it was way faster.

How to set a field containing unique key

I want to save data in CouchDB documents, and, as I am used to doing in an RDBMS, I want to create a field that can only contain a value that is unique in the database. If I save a document and there is already a document with that unique key, I expect an error from CouchDB.
I guess I can use the document ID and replace the auto-generated doc id with my value, but is there a way to make another field hold a unique key? Are there any best practices regarding unique keys?
As you said, the generated _id is enforced as unique. That is the only real unique constraint in CouchDB, and some people use it as such for their own applications.
However, this only applies to a single CouchDB instance. Once you start introducing replication and other instances, you can run into conflicts if the same _id is generated on more than one node (depending on how you generate your _ids, this may or may not be a problem).
As Dominic said, the _id is the only field that is almost assured to be unique. One thing that is certain is that you have to design your "database" differently. Keep in mind that the _id is database-wide: you can have only one document with a given _id.
The _id must be a string, which means you can't have an array or a number or anything else.
If you want to make the access public, you'll have to think about how to generate your ids in a way that won't mess with your system.
I came up with ids that looked like this:
"email:email#example.com"
It worked well in my particular case to prevent people from creating multiple auth entries with the same email. But as Dominic said, if you have multiple masters, you'll have to think about possible conflicts.
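A sketch of the email-as-_id idea in Node, assuming CouchDB's HTTP document API: a PUT to /<db>/<docid> without a _rev creates the document, and CouchDB answers 409 Conflict if that _id already exists, which is what enforces the uniqueness. The createUser helper and the type field are hypothetical, and fetchImpl is passed in (for example the global fetch in Node 18+) so the logic can be exercised without a server.

```javascript
// Hypothetical helper: create a user whose _id encodes the email, so a second
// signup with the same email is rejected by CouchDB's _id uniqueness.
async function createUser(fetchImpl, dbUrl, email, profile) {
  const id = `email:${email}`;
  const res = await fetchImpl(`${dbUrl}/${encodeURIComponent(id)}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'user', email, ...profile }),
  });
  if (res.status === 409) {
    return { created: false, id }; // a document with this _id already exists
  }
  if (!res.ok) {
    throw new Error(`CouchDB returned ${res.status}`);
  }
  return { created: true, id };
}
```

As noted above, this only guarantees uniqueness within one CouchDB instance; replicas can still end up with conflicting revisions of the same _id.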

Generating lexicographically ascending unique IDs

I want to generate IDs for use with CouchDB. I'd like the IDs to be lexicographically ascending by time so that I can sort on id without maintaining a separate timestamp field. I know that CouchDB will generate ids with this property, but I don't want the performance hit of querying the database; I'd rather just run an algorithm on my servers. I'd go with an implementation of RFC 4122, except that the results aren't lexicographically ascending. Is there any good reason I shouldn't just do:
(Date.now()) + 'x' + Math.round(Math.random() *1E18)
(I'm using Node.js.) Are there any costs of using a non-standard UUID, or of relying on JavaScript's built-in random function?
You have some choices when it comes to uuids.
The first choice is whether you want the _id generated client side (Node, browser, etc.) or by CouchDB. It sounds like you want to generate your own UUID on the client side. That is fine: just put the result of your function into the _id field of the doc you save to CouchDB, and CouchDB will use it.
Alternatively, you could have CouchDB create the id; it only generates an _id if you don't choose one yourself. By default, CouchDB uses a 'sequential' UUID generation algorithm. You can switch to other algorithms via Futon and the config: there is a section called 'uuids' with a key of 'algorithm'. You can see the source for these algorithms here:
https://github.com/apache/couchdb/blob/master/src/couchdb/couch_uuids.erl
With descriptions about them here:
http://wiki.apache.org/couchdb/HttpGetUuids?highlight=%28utc%5C_random%29
As you can see, the utc_random algorithm is very similar to your suggestion. If you really wanted your own, you could add your algorithm on the server side and recompile CouchDB.
The second part of your question is about the performance of different algorithms. I am going to quote Dave Cottlehuber from a user mailing list post:
CouchDB will have best insert time when your doc ids are continually increasing, as this minimises rewrites to the b~tree. This will also help your view build time for the same reason, and also minimises wasted doc space, although that would also be recovered during compaction.
So both your algorithm and utc_random should be fine, as their doc ids keep increasing thanks to time only moving in one direction.
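On the costs asked about in the question: Math.random() is not a cryptographically secure generator; Math.random() * 1E18 exceeds Number.MAX_SAFE_INTEGER (about 9.007e15), so the rounded result cannot carry 18 digits of randomness; and the decimal Date.now() prefix is only fixed-width (13 digits) between 2001 and 2286. A sketch modeled on CouchDB's utc_random layout (14 hex characters of microseconds since the Unix epoch followed by 18 random hex characters) sidesteps these by padding the time part to a fixed width and using crypto.randomBytes. The name sortableId is hypothetical, and because Date.now() only has millisecond resolution, ids created within the same millisecond sort randomly relative to each other.

```javascript
const crypto = require('crypto');

// Hypothetical generator modeled on CouchDB's utc_random layout:
// 14 hex chars of time (microseconds since the Unix epoch) + 18 random hex chars.
function sortableId(nowMs = Date.now()) {
  // Date.now() has millisecond resolution, so "microseconds" is ms * 1000.
  // This stays below Number.MAX_SAFE_INTEGER until roughly the year 2255.
  const micros = nowMs * 1000;
  // Fixed width means string order equals chronological order.
  const timePart = micros.toString(16).padStart(14, '0');
  // 9 random bytes = 18 hex chars from a cryptographically secure source.
  const randomPart = crypto.randomBytes(9).toString('hex');
  return timePart + randomPart;
}
```

The result can go straight into the _id field; sorting by _id (for example via _all_docs) then returns documents in creation order, up to the millisecond.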
I would recommend sticking with the UUID that CouchDB generates for you, but you can configure the server to use utc_random, which prefixes a timestamp that you can sort your records by.
http://wiki.apache.org/couchdb/HttpGetUuids
