Neo4j Get All Properties IFF Relationship 'Met' Exists, Partial Properties Otherwise - node.js

I am still learning Neo4j and am using the browser console with REST transactions to run queries. I have a question about how to accomplish a particular task in the following scenario:
I have 3 users in the database
2 users are connected by a relationship of type :Met.
The 3rd user does not have any relationship connections
I want to be able to create a Cypher query to do the following:
IFF a :Met relationship exists between the user with whom we are making the query context and the desired user, return all of the properties for the desired user.
If no relationship exists between the user with whom we are making the query context and the desired user, only return back a public subset of data (avatar, first name, etc.)
Is there a way to execute a single query that checks whether this relationship exists and returns all of the user's information if it does, and only a subset of the properties if it does not?
Thanks all!

In this query, p1 is the "query context", and p2 is all other people. A result row will only have the bar property if p1 and p2 have met.
MATCH (p1:Person { name: 'Fred' }),(p2:Person)
USING INDEX p1:Person(name)
WHERE p1 <> p2
RETURN
CASE WHEN (p1)-[:Met]-(p2)
THEN { name: p2.name, foo: p2.foo, bar: p2.bar }
ELSE { name: p2.name, foo: p2.foo }
END AS result;
For efficiency, this query uses the USING INDEX hint, which requires an index on :Person(name); create that index first, or the query will fail.
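To make the rule explicit, here is a minimal plain-JavaScript sketch of what the CASE expression computes. It is not how Neo4j evaluates the query: the in-memory users map, the met pairs, and the function names are hypothetical stand-ins for the graph, while name, foo, and bar follow the query above. Because the pattern (p1)-[:Met]-(p2) has no arrow, a relationship stored in either direction counts.

```javascript
// Hypothetical in-memory stand-in for the graph: users keyed by name,
// plus [from, to] pairs for :Met relationships.
const users = {
  Fred: { name: 'Fred', foo: 'f1', bar: 'b1' },
  Wilma: { name: 'Wilma', foo: 'f2', bar: 'b2' },
  Barney: { name: 'Barney', foo: 'f3', bar: 'b3' },
};
const met = [['Wilma', 'Fred']]; // stored as Wilma -> Fred

// Mirrors the undirected pattern (p1)-[:Met]-(p2): direction is ignored.
function haveMet(a, b) {
  return met.some(([x, y]) => (x === a && y === b) || (x === b && y === a));
}

// Mirrors the CASE expression: full map if they have met, public subset otherwise.
function project(viewerName, target) {
  return haveMet(viewerName, target.name)
    ? { name: target.name, foo: target.foo, bar: target.bar }
    : { name: target.name, foo: target.foo };
}

// Mirrors MATCH ... WHERE p1 <> p2 RETURN ... AS result
function resultsFor(viewerName) {
  return Object.values(users)
    .filter((u) => u.name !== viewerName)
    .map((u) => project(viewerName, u));
}

// Wilma has met Fred, so her row includes bar; Barney's row does not.
console.log(resultsFor('Fred'));
```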

You could do something like this: match the first person, then optionally match the second person connected to the first via a :Met relationship. If the relationship exists, the result set you return can contain the more sensitive data.
match (p1:Person {name: '1'})
with p1
// undirected, so a :Met relationship stored in either direction is found
optional match (p1)-[r:Met]-(p2:Person {name: '2'})
with p1, case when r is not null then
{ name: p1.name, birthday: p1.birthday }
else
{ name: p1.name }
end as data
return data
EDIT:
Or maybe this is a better fit: match both users, and if a relationship exists between them, return more data for the second person.
match (p1:Person {name: '1'}), (p2:Person {name: '2'})
with p1, p2
optional match (p1)-[r:Met]-(p2)
with p2, case when r is not null then
{ name: p2.name, birthday: p2.birthday }
else
{ name: p2.name }
end as data
return data

Related

Proper Sequelize flow to avoid duplicate rows?

I am using Sequelize in my Node.js server. I am getting validation errors because my code sometimes tries to insert the same record twice, instead of creating it once and then updating it because it already exists in the database (PostgreSQL).
This is the flow I use when the request runs:
const latitude = req.body.latitude;
var metrics = await models.user_car_metrics.findOne({ where: { user_id: userId, car_id: carId } })
if (metrics) {
metrics.latitude = latitude;
// .....
} else {
metrics = models.user_car_metrics.build({
user_id: userId,
car_id: carId,
latitude: latitude
// ....
});
}
var savedMetrics = await metrics.save();
return res.status(201).json(savedMetrics);
If the client calls the endpoint twice (or more) in quick succession, the code above tries to save two new rows in user_car_metrics with the same user_id and car_id, which are foreign keys to the user and car tables.
I have a constraint:
ALTER TABLE user_car_metrics DROP CONSTRAINT IF EXISTS user_id_car_id_unique, ADD CONSTRAINT user_id_car_id_unique UNIQUE (car_id, user_id);
The point is, there can only be one entry for a given user_id and car_id pair.
Without the constraint, the code above inserts duplicate rows into the table; I confirmed this by adding logs after I started seeing validation issues. With the constraint in place, I get validation errors when the code tries to insert the duplicate record.
So, how do I avoid this problem? How do I structure the code so that it won't try to create duplicate records? Is there a way to serialize this?
If you have a unique constraint, you can use upsert to either insert or update the record, depending on whether a record with the same primary key value, or the same values in the unique-constraint columns, already exists.
await models.user_car_metrics.upsert({
user_id: userId,
car_id: carId,
latitude: latitude
// ....
})
See upsert
PostgreSQL - Implemented with ON CONFLICT DO UPDATE. If update data contains PK field, then PK is selected as the default conflict key. Otherwise, first unique constraint/index will be selected, which can satisfy conflict key requirements.
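Why check-then-insert breaks and a single upsert does not can be shown without a database. The sketch below is a hypothetical in-memory table that enforces the (car_id, user_id) unique constraint; its findOne, insert, and upsert methods are illustrative stand-ins, not Sequelize's API. Two overlapping requests run the question's find-then-build flow and both see "no row" before either inserts, while the upsert does its existence check and write in one step.

```javascript
// Hypothetical in-memory table with a UNIQUE (car_id, user_id) constraint.
class MetricsTable {
  constructor() {
    this.rows = new Map(); // key: "carId:userId"
  }
  key(row) {
    return `${row.car_id}:${row.user_id}`;
  }
  // Simulated query latency, so concurrent requests can interleave.
  async findOne(where) {
    await new Promise((resolve) => setTimeout(resolve, 5));
    return this.rows.get(this.key(where)) || null;
  }
  async insert(row) {
    await new Promise((resolve) => setTimeout(resolve, 5));
    if (this.rows.has(this.key(row))) {
      throw new Error('unique constraint violation');
    }
    this.rows.set(this.key(row), { ...row });
  }
  // One atomic step, standing in for INSERT ... ON CONFLICT DO UPDATE:
  // there is no await between the existence check and the write.
  async upsert(row) {
    const existing = this.rows.get(this.key(row));
    this.rows.set(this.key(row), { ...existing, ...row });
  }
}

// The question's flow: check, then insert if missing.
async function checkThenInsert(table, row) {
  const found = await table.findOne(row);
  if (found) {
    found.latitude = row.latitude;
    return 'updated';
  }
  await table.insert(row);
  return 'inserted';
}

async function demo() {
  const row = { user_id: 1, car_id: 7, latitude: 52.1 };

  // Two overlapping requests with check-then-insert: one hits the constraint.
  const naive = new MetricsTable();
  const outcomes = await Promise.allSettled([
    checkThenInsert(naive, row),
    checkThenInsert(naive, { ...row, latitude: 52.2 }),
  ]);
  const failures = outcomes.filter((o) => o.status === 'rejected').length;

  // Two overlapping upserts: one row, holding the last write.
  const safe = new MetricsTable();
  await Promise.all([safe.upsert(row), safe.upsert({ ...row, latitude: 52.2 })]);

  return { failures, safeRows: safe.rows.size, safeLatitude: safe.rows.get('7:1').latitude };
}

demo().then((r) => console.log(r)); // { failures: 1, safeRows: 1, safeLatitude: 52.2 }
```

In the real code the atomicity comes from the single ON CONFLICT DO UPDATE statement that, per the documentation quoted above, Sequelize generates on PostgreSQL, which is why the unique constraint has to exist.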

Node.js avoid db race condition with cluster/pm2

I have a Node application which runs in cluster mode with pm2.
I also have a function that checks whether a specific row is in a DB table. If the row is missing, it creates the row; otherwise, it sets a value and saves the row.
I only need one row for each combination of userId and groupId.
function someFunction(userId, groupId) {
return Activation.findOne({ where: { userId: userId, groupId: groupId } })
.then(activationObject => {
if (!activationObject) {
return Activation.create({ userId: userId, groupId: groupId, activationTime: sequelize.fn('NOW') })
} else {
activationObject.activationTime = sequelize.fn('NOW');
return activationObject.save()
}
})
}
How can I avoid race conditions when running node in cluster mode?
Currently, if the first worker checks whether the row exists and the second worker checks at the same time, both get no result, and in the end we have two newly created rows instead of one.
I know that Sequelize provides a findOrCreate() method, but I wanted an easily understandable example.
The easiest way would be to add a UNIQUE constraint for the combination of userId and groupId with an ON CONFLICT REPLACE clause, and always create a new row instead of updating. This will cause a newly inserted row with the new activationTime to replace the old row.
You can additionally check the number of rows inserted to tell whether the insert succeeded or not.
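Example: UNIQUE (userId, groupId) ON CONFLICT REPLACE
Note that a conflict clause attached to a UNIQUE constraint is SQLite syntax. In PostgreSQL, declare a plain UNIQUE (userId, groupId) constraint and resolve the conflict in the statement itself with INSERT ... ON CONFLICT (userId, groupId) DO UPDATE, which is what Sequelize's upsert uses on PostgreSQL.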
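The REPLACE conflict resolution can be modeled in a few lines. This is a hypothetical in-memory table keyed by the (userId, groupId) pair, where an insert that hits an existing key deletes the old row and stores the new one; insertOrReplace is an illustrative name, not a database or Sequelize API. Because the check and the write happen in one synchronous step (a database does the equivalent inside a single statement), two workers inserting the same pair still end with one row.

```javascript
// Hypothetical model of UNIQUE (userId, groupId) ON CONFLICT REPLACE.
class ActivationTable {
  constructor() {
    this.rows = new Map();
  }
  // Always stores the new row. Returns 1 if an existing row with the same
  // (userId, groupId) was replaced, 0 for a fresh insert (illustrative only).
  insertOrReplace({ userId, groupId, activationTime }) {
    const key = JSON.stringify([userId, groupId]);
    const replaced = this.rows.delete(key) ? 1 : 0;
    this.rows.set(key, { userId, groupId, activationTime });
    return replaced;
  }
}

const table = new ActivationTable();
// Two workers record an activation for the same user/group pair.
const first = table.insertOrReplace({ userId: 1, groupId: 9, activationTime: 1000 });
const second = table.insertOrReplace({ userId: 1, groupId: 9, activationTime: 2000 });

console.log(first, second, table.rows.size); // 0 1 1
```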

How to update an index with new variables in Elasticsearch?

I have an index 'user' whose mapping has the fields "first", "last", and "email". The name fields get indexed at one point, and the email field gets indexed separately later. I want both writes to end up in the same document, with one id corresponding to a user_id parameter. So something like this:
function indexName(client, id, name) {
return client.update({
index: 'user',
type: 'text',
id: id,
body: {
doc: {
first: name.first,
last: name.last
}
}
})
}
function indexEmail(client, id, email) {
return client.update({
index: 'user',
type: 'text',
id: id,
body: {
doc: {
email: email
}
}
})
}
When running:
indexName(client, "Jon", { first: "Jon", last: "Snow" }).then(() => indexEmail(client, "Jon", "jonsnow@gmail.com"))
I get an error message saying that the document has not been created yet. How do I account for a document with a variable number of fields? And how do I create the index if it has not been created and then subsequently update it as I go?
The function you are using, client.update, updates part of an existing document. What you actually need is to first create the document using the client.create function.
To create an index, use the indices.create function.
As for the variable number of fields in a document type, that is not a problem, because Elasticsearch supports dynamic mapping. However, it is advisable to provide a mapping when creating the index and to stick to it. Elasticsearch's default dynamic mapping can cause problems later on, e.g. by analyzing UUIDs or email addresses, which then become difficult (or impossible) to search and match exactly.
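The difference between a partial update and create-then-update can be sketched with a hypothetical in-memory index; FakeIndex and its methods are stand-ins, not the official client. Its update method mimics the update API: it rejects a missing document unless docAsUpsert is set, modeling the real doc_as_upsert option of the Elasticsearch update API, which indexes the partial doc as a new document when none exists yet.

```javascript
// Hypothetical in-memory stand-in for one Elasticsearch index.
class FakeIndex {
  constructor() {
    this.docs = new Map();
  }
  // Like client.create: fails if a document with this id already exists.
  create(id, body) {
    if (this.docs.has(id)) throw new Error(`version conflict: ${id} exists`);
    this.docs.set(id, { ...body });
  }
  // Like client.update with a partial doc: merges fields into the existing
  // document; a missing document is an error unless docAsUpsert is true.
  update(id, { doc, docAsUpsert = false }) {
    const existing = this.docs.get(id);
    if (!existing && !docAsUpsert) throw new Error(`document missing: ${id}`);
    this.docs.set(id, { ...existing, ...doc });
  }
}

const users = new FakeIndex();

// The failing sequence from the question: the first write is a partial update.
let error = null;
try {
  users.update('Jon', { doc: { first: 'Jon', last: 'Snow' } });
} catch (e) {
  error = e.message;
}

// Option 1: create the document, then add fields with partial updates.
users.create('Jon', { first: 'Jon', last: 'Snow' });
users.update('Jon', { doc: { email: 'jonsnow@gmail.com' } });

// Option 2: let each partial update create the document if needed.
users.update('Arya', { doc: { email: 'arya@example.com' }, docAsUpsert: true });
users.update('Arya', { doc: { first: 'Arya', last: 'Stark' }, docAsUpsert: true });

console.log(error, users.docs.get('Jon'), users.docs.get('Arya'));
```

With the official client, option 2 corresponds to sending doc_as_upsert: true next to doc in the update request body, so the order in which the name and email writes arrive no longer matters.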

CSV to Mongo using mongoose schema

I'm attempting to get a CSV file to my mongodb collection (via mongoose) while checking for matches at each level of my schema.
So, for a given schema personSchema with a nested schema carSchema:
repairSchema = {
date: Date,
description: String
}
carSchema = {
make: String,
model: String,
repair: [repairSchema]
}
personSchema = {
first_name: String,
last_name: String,
car: [carSchema]
}
and an object that I am mapping the CSV data to:
mappingObject = {
first_name : 0,
last_name: 1,
car : {
make: 2,
model: 3,
repair: {
date: 4,
description: 5
}
}
}
I want to check my collection for a match, then check each nested schema for a match, or create the entire document, as appropriate.
Desired process:
I need to check if a person document matching first_name and last_name exists in my collection.
If such a person document exists, check if that person document contains a matching car.make and car.model.
If such a car document exists, check if that car document contains a matching car.repair.date and car.repair.description.
If such a repair document exists, do nothing, exact match to existing record.
If such a repair document does not exist, push this repair to the repair document for the appropriate car and person.
If such a car document does not exist, push this car to the car document for the appropriate person.
If such a person document does not exist, create the document.
The kicker
This same function will be used across many schemas, which may be nested many levels deep (current database has one schema that goes 7 levels deep). So it has to be fairly abstract. I can already get the data into the structure I need as a javascript object, so I just need to get from that object to the collection as described.
It also has to be synchronous, since multiple records from the CSV could have the same person, and asynchronous creation could mean that the same person gets created twice.
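The desired process can be stated schema-independently: a record's identity is its scalar fields, and every object-valued field is a nested level stored as an array. Below is a minimal in-memory sketch of that rule, not Mongoose code; isNested, sameIdentity, toStored, and findOrPush are hypothetical names, and the sample rows reuse the names from the answer below. Processing rows one at a time is what keeps the second row from creating a second person.

```javascript
// Hypothetical in-memory sketch of the desired find-or-push process.
// Identity of a record = its scalar (non-object) fields. Object-valued fields
// are nested levels, stored as arrays of subdocuments (like car: [carSchema]).
function isNested(value) {
  return value !== null && typeof value === 'object';
}

function sameIdentity(stored, incoming) {
  return Object.keys(incoming)
    .filter((key) => !isNested(incoming[key]))
    .every((key) => stored[key] === incoming[key]);
}

// Turns a mapped CSV row into stored form: each nested object becomes a one-element array.
function toStored(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key] = isNested(value) ? [toStored(value)] : value;
  }
  return out;
}

// Finds a matching record at this level and recurses into nested levels,
// or pushes the whole subtree when nothing matches.
function findOrPush(list, incoming) {
  const match = list.find((stored) => sameIdentity(stored, incoming));
  if (!match) {
    list.push(toStored(incoming));
    return;
  }
  for (const [key, value] of Object.entries(incoming)) {
    if (isNested(value)) {
      match[key] = match[key] || [];
      findOrPush(match[key], value);
    }
  }
}

const people = [];
const rows = [
  { first_name: 'adam', last_name: 'snider', car: { make: 'honda', model: 'civic', repair: { date: '2017-01-02', description: 'brakes' } } },
  { first_name: 'adam', last_name: 'snider', car: { make: 'honda', model: 'civic', repair: { date: '2017-03-04', description: 'tires' } } },
  { first_name: 'adam', last_name: 'snider', car: { make: 'honda', model: 'civic', repair: { date: '2017-03-04', description: 'tires' } } },
];
// Rows are processed one at a time, in order.
rows.forEach((row) => findOrPush(people, row));
console.log(JSON.stringify(people, null, 2));
```

The third row is an exact match at every level, so it changes nothing; the same rule works at any depth because it never names a field.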
Current solution
I run through each line of the CSV, map the data to my mappingObject, then step through each level of the object in javascript, checking non-object key-value pairs for a match using find, then pushing/creating or recursing as appropriate. This absolutely works, but it is painfully slow with such large documents.
Here's my full recursing function, which works:
saveObj is the object that I've mapped the CSV on to that matches my schema.
findPrevObj is initially false. path and topKey both are initially "".
lr is the line reader object, lr.resume simply moves on to the next line.
var findOrSave = function(saveObj, findPrevObj, path, topKey){
//the object used to search the collection
var findObj = {};
//if this is a nested schema, we need the previous schema search to match as well
if (findPrevObj){
for (var key in findPrevObj){
findObj[key] = findPrevObj[key];
}
}
//go through all the saveObj, compiling the findObj from string fields
for (var key in saveObj){
if (saveObj.hasOwnProperty(key) && typeof saveObj[key] === "string"){
findObj[path+key] = saveObj[key]
}
}
//search the DB for this record
ThisCollection.find(findObj).exec(function(e, doc){
//this level at least exists
if (doc.length){
//go through all the deeper levels in our saveObj
var i = 0;
for (var key in saveObj){
if (saveObj.hasOwnProperty(key) && typeof saveObj[key] === "object"){
i += 1;
findOrSave(saveObj[key], findObj, path+key+".", path+key);
}
}
//if there were no deeper levels (basically, full record exists)
if (!i){
lr.resume();
}
//this level doesn't exist, add new record or push to array
} else {
if (findPrevObj){
var toPush = {};
toPush[topKey] = saveObj;
ThisCollection.findOneAndUpdate(
findPrevObj,
{$push: toPush},
{safe: true, upsert: true},
function(err, doc) {
lr.resume();
}
)
} else {
// console.log("\r\rTrying to save: \r", saveObj, "\r\r\r");
ThisCollection.create(saveObj, function(e, doc){
lr.resume();
});
}
}
});
}
I'll update for clarity, but the person.find is to check if a person with a matching first and last name exists. If they do exist, I check each car for a match - if the car exists already, there's no reason to add this record. If the car doesn't exist, I push it to the car array for the matching person. If no person was matched, I'd save the entire new record.
Ah, what you want is to update with upsert:
replace
Person.find({first_name: "adam", last_name: "snider"}).exec(function(e, d){
//matched? check {first_name: "adam", last_name: "snider", car.make: "honda", car.model: "civic"}
//no match? create this record (or push to array if this is a nested array)
});
with
Person.update(
{first_name: "adam", last_name: "snider"},
{$push: {car: {make: 'whatever', model: 'whatever2'}}},
{upsert: true}
)
If a match is found, it will push this subdocument into the car array (creating the array if it does not exist yet): {make: 'whatever', model: 'whatever2'}.
If a match is not found, it will create a new document that looks like:
{first_name: "adam", last_name: "snider", car: [{make: 'whatever', model: 'whatever2'}]}
This cuts your total database round trips in half. For even more efficiency, you can use an ordered bulk operation: the updates are queued locally and sent to the server in large batches when you call execute(), instead of costing one round trip per update.
Here's what that would look like (using es6 here for concision...not a necessity):
const bulk = Person.collection.initializeOrderedBulkOp();
lr.on('line', function(line) {
  const [first_name, last_name, make, model, repair_date, repair_description] = line.split(',');
  // Ensure the person exists; $setOnInsert only applies when the upsert creates a new document, so an existing car array is left alone
  bulk.find({first_name, last_name}).upsert().updateOne({$setOnInsert: {car: []}});
  // Find a person with the existing make and model. This makes sure that if the car IS there, it matches the proper document structure
  bulk.find({first_name, last_name, 'car.make': make, 'car.model': model}).updateOne({$set: {'car.$.repair.date': repair_date, 'car.$.repair.description': repair_description}});
  // Now, if the car wasn't there, let's add it to the set. This will not push if we just updated because it should match exactly now.
  bulk.find({first_name, last_name}).updateOne({$addToSet: {car: {make, model, repair: {date: repair_date, description: repair_description}}}});
});
// Nothing is sent to the server until execute() is called
lr.on('end', function() {
  bulk.execute().then(result => console.log('bulk write done', result), err => console.error(err));
});
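The same three updates can also be expressed with bulkWrite(), the driver's array-based batch API (Mongoose models expose it too). The sketch below only builds the operation objects for one line, so it runs without a database; opsForLine is a hypothetical name, the comma-separated layout is the one assumed above, and the first step uses $setOnInsert so an existing person's car array is never overwritten.

```javascript
// Builds the three ordered updates for one CSV line in the shape accepted by
// collection.bulkWrite(). Pure function: no database access.
function opsForLine(line) {
  const [first_name, last_name, make, model, repair_date, repair_description] = line.split(',');
  const person = { first_name, last_name };
  return [
    // 1. Ensure the person exists (created with an empty car array if not).
    { updateOne: { filter: person, update: { $setOnInsert: { car: [] } }, upsert: true } },
    // 2. If the car is already there, overwrite its repair in place.
    {
      updateOne: {
        filter: { ...person, 'car.make': make, 'car.model': model },
        update: { $set: { 'car.$.repair.date': repair_date, 'car.$.repair.description': repair_description } },
      },
    },
    // 3. If the car was not there, add it; $addToSet skips an identical subdocument.
    {
      updateOne: {
        filter: person,
        update: { $addToSet: { car: { make, model, repair: { date: repair_date, description: repair_description } } } },
      },
    },
  ];
}

const ops = opsForLine('adam,snider,honda,civic,2017-01-02,brakes');
console.log(JSON.stringify(ops, null, 2));

// Hypothetical usage once every line has been read:
// const allOps = lines.flatMap(opsForLine);
// await Person.collection.bulkWrite(allOps, { ordered: true });
```

Keeping ordered: true matters here, because step 3 relies on steps 1 and 2 having already run for the same line.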

Retrieving Hierarchical/Nested Data From CouchDB

I'm pretty new to CouchDB, and even after reading http://wiki.apache.org/couchdb/How_to_store_hierarchical_data (now deleted; see the latest archived copy), specifically the approach 'Store the full path to each node as an attribute in that node's document', it's still not clicking just yet.
Instead of using the full path pattern as described in the wiki I'm hoping to keep track of children as an array of UUIDs and the parent as a single UUID. I'm leaning towards this pattern so I can maintain the order of children by their positions in the children array.
Here are some sample documents in Couch. Buckets can contain buckets and items; items can only contain other items. (UUIDs are abbreviated for clarity.)
{_id: 3944,
name: "top level bucket with two items",
type: "bucket",
parent: null,
children: [8989, 4839]
}
{_id: 8989,
name: "second level item with no sub items",
type: "item",
parent: 3944
}
{
_id: 4839,
name: "second level bucket with one item",
type: "bucket",
parent: 3944,
children: [5694]
}
{
_id: 5694,
name: "third level item (has one sub item)",
type: "item",
parent: 4839,
children: [5390]
}
{
_id: 5390,
name: "fourth level item",
type: "item",
parent: 5694
}
Is it possible to look up a document by an embedded document id within a map function?
function(doc) {
if(doc.type == "bucket" || doc.type == "item")
emit(doc, null); // still working on my key value output structure
if(doc.children) {
for(var i in doc.children) {
// can i look up a document here using ids from the children array?
doc.children[i]; // pseudo code
emit(); // the retrieved document would be emitted here
}
}
}
In an ideal world, the final JSON output would look something like this:
{"_id":3944,
"name":"top level bucket with two items",
"type":"bucket",
"parent":"",
"children":[
{"_id":8989, "name":"second level item with no sub items", "type":"item", "parent":3944},
{"_id": 4839, "name":"second level bucket with one item", "type":"bucket", "parent":3944, "children":[
{"_id":5694, "name":"third level item (has one sub item)", "type":"item", "parent": 4839, "children":[
{"_id":5390, "name":"fourth level item", "type":"item", "parent":5694}
]}
]}
]
}
You can find a general discussion on the CouchDB wiki.
I have no time to test it right now; however, your map function should look something like:
function(doc) {
if (doc.type === "bucket" || doc.type === "item")
emit([ doc._id, -1 ], 1);
if (doc.children) {
for (var i = 0, child_id; child_id = doc.children[i]; ++i) {
emit([ doc._id, i ], { _id: child_id });
}
}
}
You should query it with include_docs=true to get the documents, as explained in the CouchDB documentation: if your map function emits an object value which has {'_id': XXX} and you query view with include_docs=true parameter, then CouchDB will fetch the document with id XXX rather than the document which was processed to emit the key/value pair.
Add startkey=["3944"]&endkey=["3944",{}] to the query to get only the document with id "3944" together with its children.
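To see which rows such a query returns, here is a simplified simulation of the view in plain JavaScript. It uses the question's numeric sample ids, passes emit in as a parameter (in a real CouchDB view it is a global), and implements only the slice of CouchDB key collation this view needs (numbers before objects, shorter arrays before longer ones with the same prefix). The docs, queryView, and collate names are hypothetical.

```javascript
// Sample documents from the question (numeric ids stand in for UUIDs).
const docs = [
  { _id: 3944, name: 'top level bucket with two items', type: 'bucket', parent: null, children: [8989, 4839] },
  { _id: 8989, name: 'second level item with no sub items', type: 'item', parent: 3944 },
  { _id: 4839, name: 'second level bucket with one item', type: 'bucket', parent: 3944, children: [5694] },
  { _id: 5694, name: 'third level item (has one sub item)', type: 'item', parent: 4839, children: [5390] },
  { _id: 5390, name: 'fourth level item', type: 'item', parent: 5694 },
];
const byId = new Map(docs.map((d) => [d._id, d]));

// The answer's map function, with emit() passed in so it runs outside CouchDB.
function map(doc, emit) {
  if (doc.type === 'bucket' || doc.type === 'item') emit([doc._id, -1], 1);
  if (doc.children) {
    for (let i = 0; i < doc.children.length; ++i) emit([doc._id, i], { _id: doc.children[i] });
  }
}

// Simplified collation: numbers sort before objects; arrays compare element-wise.
function collate(a, b) {
  if (Array.isArray(a) && Array.isArray(b)) {
    for (let i = 0; i < Math.min(a.length, b.length); ++i) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  const rank = (v) => (typeof v === 'number' ? 0 : 1);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return rank(a) === 0 ? a - b : 0;
}

function queryView(startkey, endkey) {
  const rows = [];
  for (const doc of docs) map(doc, (key, value) => rows.push({ key, value, id: doc._id }));
  rows.sort((r1, r2) => collate(r1.key, r2.key));
  return rows
    .filter((r) => collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0)
    // include_docs=true: a value with an _id pulls in that document instead.
    .map((r) => ({ ...r, doc: byId.get(r.value && r.value._id ? r.value._id : r.id) }));
}

const rows = queryView([3944], [3944, {}]);
console.log(rows.map((r) => r.doc.name));
```

The parent comes first because of the -1 in its key, followed by its direct children in children-array order; grandchildren such as 5694 do not appear, since only their own parent's document emits them.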
EDIT: have a look at this question for more details.
Can you output a tree structure from a view? No. CouchDB view queries return a list of values, there is no way to have them output anything other than a list. So, you have to deal with your map returning the list of all descendants of a given bucket.
You can, however, plug a _list post-processing function after the view itself, to turn that list back into a nested structure. This is possible if your values know the _id of their parent — the algorithm is fairly straightforward, just ask another question if it gives you trouble.
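One way to do that rebuilding step, sketched in plain JavaScript: it takes a flat list of documents (for example, view rows fetched with include_docs=true) and nests each node's children in the order of its children array. Because the question's documents store ordered children arrays, this sketch follows those instead of grouping by parent, which is a substitution; the same logic could run in the application after the query, or inside a _list function that reads rows with getRow(). buildTree is a hypothetical name, and cycles are not handled.

```javascript
// Sample documents from the question (numeric ids stand in for UUIDs).
const docs = [
  { _id: 3944, name: 'top level bucket with two items', type: 'bucket', parent: null, children: [8989, 4839] },
  { _id: 8989, name: 'second level item with no sub items', type: 'item', parent: 3944 },
  { _id: 4839, name: 'second level bucket with one item', type: 'bucket', parent: 3944, children: [5694] },
  { _id: 5694, name: 'third level item (has one sub item)', type: 'item', parent: 4839, children: [5390] },
  { _id: 5390, name: 'fourth level item', type: 'item', parent: 5694 },
];

// Rebuilds the nested output from a flat list of documents. Children are
// ordered by each node's children array; ids missing from the list are skipped.
// Nodes are shallow copies, so the input documents are not modified.
function buildTree(flatDocs, rootId) {
  const byId = new Map(flatDocs.map((d) => [d._id, { ...d }]));
  const attach = (node) => {
    if (node.children) {
      node.children = node.children
        .map((id) => byId.get(id))
        .filter((child) => child !== undefined)
        .map(attach);
    }
    return node;
  };
  return byId.has(rootId) ? attach(byId.get(rootId)) : null;
}

const tree = buildTree(docs, 3944);
console.log(JSON.stringify(tree, null, 2));
```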
Can you grab a document by its id in the map function? No. There's no way to grab another document by its identifier from within a map function. The request must come from the application, either in the form of a standard GET on the document identifier, or by adding include_docs=true to a view request.
The technical reason for this is pretty simple: CouchDB only runs the map function when the document changes. If document A was allowed to fetch document B, then the emitted data would become invalid when B changes.
Can you output all descendants without storing the list of parents of every node? No. CouchDB map functions emit a set of key-value-id pairs for every document in the database, so the correspondence between the key and the id must be determined based on a single document.
If you have a four-level tree structure A -> B -> C -> D but only let a node know about its parent and children, then none of the nodes above know that D is a descendant of A, so you will not be able to emit the id of D with a key based on A and thus it will not be visible in the output.
So, you have three choices:
Grab only three levels (this is possible because B knows that C is a descendant of A), and grab additional levels by running the query again.
Somehow store the list of descendants of every node within the node (this is costly).
Store the list of parents of every node within the node.
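The third choice can be sketched as follows. Each document carries a hypothetical ancestors array (root first), and the map function emits one row per ancestor keyed by [ancestorId, depth], so D shows up under A even though A never references it. As before, emit is passed in only so the sketch runs outside CouchDB, and descendantsOf is a stand-in for querying the view with startkey=[id]&endkey=[id,{}].

```javascript
// Hypothetical documents that store their ancestor ids, root first.
const nodes = [
  { _id: 'A', ancestors: [] },
  { _id: 'B', ancestors: ['A'] },
  { _id: 'C', ancestors: ['A', 'B'] },
  { _id: 'D', ancestors: ['A', 'B', 'C'] },
];

// Map function: one row per ancestor, keyed by [ancestorId, depth below it].
function map(doc, emit) {
  doc.ancestors.forEach((ancestorId, i) => {
    emit([ancestorId, doc.ancestors.length - i], null);
  });
}

// Stand-in for a range query on the first key element, sorted by depth.
function descendantsOf(id) {
  const rows = [];
  for (const doc of nodes) map(doc, (key, value) => rows.push({ key, value, id: doc._id }));
  return rows
    .filter((r) => r.key[0] === id)
    .sort((a, b) => a.key[1] - b.key[1])
    .map((r) => r.id);
}

console.log(descendantsOf('A')); // [ 'B', 'C', 'D' ]
```

The cost of this layout shows up on writes: moving a subtree means rewriting the ancestors array of every node in it.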

Resources