Node.js: avoid DB race condition with cluster/pm2

I have a Node application which runs in cluster mode with pm2.
I also have a function which checks whether a specific row is in a DB table. If the row is missing it creates the row; otherwise a value is set and saved.
I only need one row for each combination of userId and groupId.
function someFunction(userId, groupId) {
  return Activation.findOne({ where: { userId: userId, groupId: groupId } })
    .then(activationObject => {
      if (!activationObject) {
        // No row for this (userId, groupId) pair yet: create one
        return Activation.create({ userId: userId, groupId: groupId, activationTime: sequelize.fn('NOW') });
      } else {
        // Row exists: refresh its activation time
        activationObject.activationTime = sequelize.fn('NOW');
        return activationObject.save();
      }
    });
}
How can I avoid race conditions when running Node in cluster mode?
Currently, if the first worker checks whether the row exists and a second worker checks at the same time, both get no result, and in the end we have two newly created rows instead of one.
I know that Sequelize provides a findOrCreate() method, but I wanted an easily understandable example.

The easiest way would be to add a UNIQUE constraint for the combination of userId and groupId with an ON CONFLICT REPLACE clause, and always create a new row instead of updating. This will cause a newly inserted row with the new activationTime to replace the old row.
You can additionally check the number of rows inserted to tell whether the insert succeeded or not.
Example: UNIQUE (userId, groupId) ON CONFLICT REPLACE
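A minimal sketch of that approach, assuming SQLite (the dialect where a conflict clause can be attached to the constraint itself) and raw queries through the existing sequelize instance; the table and column names mirror the question:

// Hedged sketch: with ON CONFLICT REPLACE declared on the constraint,
// a plain INSERT either creates the row or replaces the existing one,
// so concurrent workers cannot leave two rows behind.
await sequelize.query(`
  CREATE TABLE IF NOT EXISTS Activations (
    userId INTEGER NOT NULL,
    groupId INTEGER NOT NULL,
    activationTime TEXT,
    UNIQUE (userId, groupId) ON CONFLICT REPLACE
  )
`);

await sequelize.query(
  'INSERT INTO Activations (userId, groupId, activationTime) VALUES (?, ?, CURRENT_TIMESTAMP)',
  { replacements: [userId, groupId] }
);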

Related

Proper Sequelize flow to avoid duplicate rows?

I am using Sequelize in my Node.js server. I am ending up with validation errors because my code tries to write the record twice instead of creating it once and then updating it, since it is already in the DB (PostgreSQL).
This is the flow I use when the request runs:
const latitude = req.body.latitude;
var metrics = await models.user_car_metrics.findOne({ where: { user_id: userId, car_id: carId } });
if (metrics) {
  metrics.latitude = latitude;
  // ...
} else {
  metrics = models.user_car_metrics.build({
    user_id: userId,
    car_id: carId,
    latitude: latitude
    // ...
  });
}
var savedMetrics = await metrics.save();
return res.status(201).json(savedMetrics);
At times, if the client calls the endpoint twice or more in quick succession, the code above tries to save two new rows in user_car_metrics with the same user_id and car_id, both foreign keys on the user and car tables.
I have a constraint:
ALTER TABLE user_car_metrics DROP CONSTRAINT IF EXISTS user_id_car_id_unique, ADD CONSTRAINT user_id_car_id_unique UNIQUE (car_id, user_id);
Point is, there can only be one entry for a given user_id and car_id pair.
Because of that, I started seeing validation issues. After looking into it and adding logs, I realized the code above adds duplicates to the table (without the constraint). With the constraint in place, I get validation errors when the code above tries to insert the duplicate record.
Question is, how do I avoid this problem? How do I structure the code so that it won't try to create duplicate records? Is there a way to serialize this?
If you have a unique constraint, then you can use upsert to either insert or update the record, depending on whether a record already exists with the same primary key value or with column values covered by the unique constraint.
await models.user_car_metrics.upsert({
  user_id: userId,
  car_id: carId,
  latitude: latitude
  // ...
})
See the documentation for upsert:
PostgreSQL - Implemented with ON CONFLICT DO UPDATE. If update data contains PK field, then PK is selected as the default conflict key. Otherwise, first unique constraint/index will be selected, which can satisfy conflict key requirements.
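For upsert to pick the right conflict key, the unique constraint has to exist in the database. If the model is defined through Sequelize, a hedged sketch of declaring the composite index (the model shape is assumed from the question, not taken from the poster's actual code):

const { DataTypes } = require('sequelize');

// Hypothetical model definition: the composite unique index on
// (car_id, user_id) is what the ON CONFLICT clause generated by
// upsert can target.
const UserCarMetrics = sequelize.define('user_car_metrics', {
  user_id: DataTypes.INTEGER,
  car_id: DataTypes.INTEGER,
  latitude: DataTypes.FLOAT,
}, {
  indexes: [{ unique: true, fields: ['car_id', 'user_id'] }],
});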

Most performant way to Insert or Read (if record already exists) in Google Cloud Spanner

Assuming I have a cars table where vin is the primary key.
I want to insert a record (in a transaction) or read the record (if one already exists with the same PK).
What's the most performant way to insert the record or read it if one already exists with the same PK?
This is my current approach:
Case A: Record does not exist
Insert record
Return record
Case B: Record already exists
Insert record
Check if error is due to the record already existing
Read the record
Return record
const car = { vin: '123', make: 'honda', model: 'accord' };
spannerDatabase.runTransactionAsync(async (databaseTransaction) => {
  try {
    // Try to insert car
    await databaseTransaction.insert('cars', car);
    await databaseTransaction.commit();
    return car;
  } catch (error) {
    await databaseTransaction.end();
    // Spanner "row already exists" error. Insert failed because there is
    // already a record with the same vin (PK).
    if (error.code === 6) {
      // Since the record already exists, I want to read it and return it.
      // What's the most performant way to do this?
      // (carsTable is presumably spannerDatabase.table('cars'), defined elsewhere.)
      const existingRecord = await carsTable.read({
        columns: ['vin', 'make', 'model'],
        keys: [car.vin],
        json: true,
      });
      return existingRecord;
    }
  }
})
As @skuruppu mentioned in the comment above, your current example is mostly fine for what you are describing. It does, however, implicitly assume a couple of things, as you are not executing the read and the insert in the same transaction. That means that the two operations together are not atomic, and other transactions might update or delete the record between your two operations.
Also, your approach assumes that scenario A (record does not exist) is the most probable. If that is not the case, and it is just as probable that the record does exist, then you should execute the read in the transaction before the write.
You should also do that if there are other processes that might delete the record. Otherwise, another process might delete the record after you tried to insert the record, but before you try to read it (outside the transaction).
The above is only really a problem if there are other processes that might delete or alter the record. If that is not the case, and also won't be in the future, this is only a theoretical problem.
So to summarize:
1. Your example is fine if scenario A is the most probable and no other process will ever delete any records in the cars table.
2. You should execute the read before the write, using the same read/write transaction for both operations, if any of the conditions in 1 are not true.
3. The read operation that you are using in your example is the most efficient way to read a single row from a table.
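A hedged sketch of that read-before-write variant, reusing the car object and database handle from the question (method names as in the @google-cloud/spanner Node.js client; treat this as an illustration under those assumptions, not the definitive implementation):

const result = await spannerDatabase.runTransactionAsync(async (tx) => {
  // Read first, inside the same read/write transaction as the insert
  const [rows] = await tx.read('cars', {
    columns: ['vin', 'make', 'model'],
    keys: [car.vin],
    json: true,
  });
  if (rows.length > 0) {
    await tx.end(); // nothing to write, release the transaction
    return rows[0];
  }
  tx.insert('cars', car); // mutation is buffered until commit
  await tx.commit();
  return car;
});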

DynamoDB putItem written twice

I am new to AWS and I feel like I am missing something important.
I am using this code from a Lambda function in Node.js to create an entry in a DynamoDB table:
function recordUser(item) {
  return ddb.putItem({
    TableName: 'Users',
    Item: item,
    Expected: {
      username: { Exists: false }
    }
  }).promise();
}
username is the primary key of my table.
I thought the condition would prevent duplicates from appearing, but I still see some duplicated entries with the same username. What am I missing?
You are giving "Expected" a wrong interpretation. You seem to hope that it checks whether any existing item in the database has the given value for the "username" attribute. But this is not what Expected does. It does something very different: it reads one specific item, the item with the same key as the one you specified in "Item", and then checks whether for this specific item a value (any value!) exists for its "username" attribute.
To suggest how to fix your use case, we would need to know more about your data. The easiest solution is, of course, to have a table whose sole key is "username", which will allow just one item per username. But I don't know if this is good enough for your use case.
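If the table's sole key really is username, a conditional put makes a duplicate attempt fail loudly instead of silently overwriting. A hedged sketch using ConditionExpression (the newer form of the legacy Expected parameter), reusing the ddb client from the question:

function recordUser(item) {
  return ddb.putItem({
    TableName: 'Users',
    Item: item,
    // Rejects the put if an item with this key (username) already exists
    ConditionExpression: 'attribute_not_exists(username)',
  }).promise()
    .catch(err => {
      if (err.code === 'ConditionalCheckFailedException') {
        // A user with this username already exists; handle as appropriate
        return null;
      }
      throw err;
    });
}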

Query condition missed key schema element: Validation Error

I am trying to query dynamodb using the following code:
const AWS = require('aws-sdk');
let dynamo = new AWS.DynamoDB.DocumentClient({
  service: new AWS.DynamoDB({
    apiVersion: "2012-08-10",
    region: "us-east-1"
  }),
  convertEmptyValues: true
});
dynamo.query({
  TableName: "Jobs",
  KeyConditionExpression: 'sstatus = :st',
  ExpressionAttributeValues: {
    ':st': 'processing'
  }
}, (err, resp) => {
  console.log(err, resp);
});
When I run this, I get an error saying:
ValidationException: Query condition missed key schema element: id
I do not understand this. I have defined id as the partition key for the jobs table and need to find all the jobs that are in processing status.
You're trying to run a query using a condition that does not include the primary key, and queries in DynamoDB must include it. You would need to do a scan to get that information in your case; however, I don't think that is the best option.
I think you want to set up a global secondary index and use that to query for the processing status.
In another answer @smcstewart responded to this question, but he provides a link instead of explaining why this error occurs. I want to add a brief comment, hoping it will save you some time.
The AWS docs on Querying a Table state that you can do WHERE-condition queries (e.g. the SQL query SELECT * FROM Music WHERE Artist='No One You Know') in the DynamoDB way, but with one important caveat:
You MUST specify an EQUALITY condition for the PARTITION key, and you can optionally provide another condition for the SORT key.
Meaning you can only use key attributes with Query. Filtering on anything else would force a full Scan, which is NOT efficient, and far less efficient than using a global secondary index.
So if you need to query on non-key attributes, using Query is usually NOT an option; the best option is using Global Secondary Indexes, as suggested by @smcstewart.
I found this guide useful for creating a Global secondary index manually.
If you need to add it using CloudFormation, here is a relevant page.
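For completeness, a hedged sketch of adding such an index from code with the AWS SDK v2 updateTable call; the index name, attribute type, and throughput values here are illustrative assumptions, not taken from the question:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({ region: 'us-east-1' });

// Creates a GSI keyed on the sstatus attribute so Query can use an
// equality condition on it (the name "status-index" is made up here).
dynamodb.updateTable({
  TableName: 'Jobs',
  AttributeDefinitions: [{ AttributeName: 'sstatus', AttributeType: 'S' }],
  GlobalSecondaryIndexUpdates: [{
    Create: {
      IndexName: 'status-index',
      KeySchema: [{ AttributeName: 'sstatus', KeyType: 'HASH' }],
      Projection: { ProjectionType: 'ALL' },
      ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
    },
  }],
}, (err, data) => console.log(err, data));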
I was getting this error in a different scenario. Here it is (it's very unlikely that anyone else ends up with this case, but just in case):
I had a query working on a table, say Table A. Table A had partition key m_id and sort key u_id.
I had a query to fetch data using m_id, and it was working:
var queryParams = {
  ExpressionAttributeValues: {
    ':m_id': mId
  },
  KeyConditionExpression: 'm_id = :m_id',
  TableName: "A"
};
let connections = await docClient.query(queryParams).promise();
I created another table, say Table B. I made some errors in naming keys, so I simply deleted it and created a table with the same name again, Table B. Table B had partition key m_id and sort key s_id.
I copy-pasted the same query I was using for Table A, changing only the table name, since the partition key had the same name.
To my shock, I got this exception:
"ValidationException: Query condition missed key schema element"
I rechecked all the names and compared the query with the working query. Everything was fine.
I thought that maybe, because I had deleted and recreated Table B, it had something to do with that. So I created a fresh table with a new name, Table B2, with the same key names as Table B.
In the query that was throwing the exception, I changed only the table name from B to B2.
And the exception was gone.
If you are getting this on a fresh table, where no query has worked earlier, creating a new table with a new name is an option.
If you delete a table only to change partition key names, it may be safer to use a new name for the table as well (DynamoDB could be referring to metadata by table name rather than by internal identifiers; it is possible that old metadata stays around even after you delete a table. Just a guess, given that I faced this case).
EDIT (2022-07-12):
This error does not leave me. My own answer was helpful, but here is one more case: there was a trailing space in the name of a key in the table. And DynamoDB does not even check for spaces in key names.
You have to create a global secondary index for the status field.
Then your code could look something like this:
dynamo.query({
  TableName: "Jobs",
  IndexName: 'status',
  KeyConditionExpression: '#s = :st',
  ExpressionAttributeValues: {
    ':st': 'processing'
  },
  ExpressionAttributeNames: {
    '#s': 'status',
  },
}, (err, resp) => {
  console.log(err, resp);
});
Note: the scan operation is indeed very costly, especially if your table is large.
I solved the problem using AWS.DynamoDB.DocumentClient() with scan; for example (Node.js):
var docClient = new AWS.DynamoDB.DocumentClient();
var params = {
  TableName: "product",
  FilterExpression: "#cg = :data",
  ExpressionAttributeNames: {
    "#cg": "categoria",
  },
  ExpressionAttributeValues: {
    ":data": category,
  }
};

docClient.scan(params, onScan);

function onScan(err, data) {
  if (err) {
    // log the error on the server
    console.error("Unable to scan the table. Error JSON:", JSON.stringify(err, null, 2));
    res.json(err);
  } else {
    console.log("Scan succeeded.");
    res.json(data); // res is presumably the surrounding Express response
  }
}

Issue with adding a new row using the MongoDB driver

How can I add a new row with the update operation?
I am using the following code:
statuscollection.update({
  id: record.id
}, {
  id: record.id,
  ip: value
}, {
  upsert: true
}, function (err, result) {
  console.log(err);
  if (!err) {
    return context.sendJson([], 404);
  }
});
On the first call this adds a row with id: record.id.
Then, with id: value, and then id: ggh, I want a new row to be added each time.
How can I add a new row on every call of this function, one for each document I need to insert?
By the structure of your code you are probably missing a few concepts.
You are using update in a case where you probably do not need to.
You seem to be providing an id field when the primary key for MongoDB would be _id, if that is what you mean.
If you are intending to add a new document on every call then you probably should be using insert. Your use of update with upsert has an intended usage of matching a document with the query criteria, if the document exists update the fields as specified, if not then insert a new document with the fields specified.
Unless that upsert behaviour actually is your goal, insert is most certainly what you need. In that case you are likely to rely on the value of _id being populated automatically, or on supplying your own unique value yourself. Unless you specifically want another field as an identifier (one that is not unique), you will likely want to use the _id field as described before.
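A hedged sketch of the insert variant, reusing the statuscollection handle and values from the question (insertOne with a callback, matching the older driver style the question's code suggests; newer drivers return a promise instead):

statuscollection.insertOne({
  id: record.id,
  ip: value
}, function (err, result) {
  if (err) {
    console.log(err);
    return;
  }
  // MongoDB generates a fresh, unique _id for each inserted document
  // (available as result.insertedId)
});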
