All transactions fail in DynamoDB when the condition for one of the transactions fails (Node.js)

Data in the products table in DynamoDB:
[
{
productId:123,
quantity:50
},
{
productId:4565,
quantity:10
}
// more items
]
A client can order one or more products at once. Suppose the client orders products 123 and 4565 with quantities 30 and 12, respectively.
The client can purchase product 123, but not product 4565, because only 10 units of it are in stock and the client wants 12.
I am using the AWS DocumentClient and its transactWrite() method to achieve this. The problem with transactWrite is that if one of the conditions fails, the whole transaction fails and none of the updates are applied.
Implementation of Atomic Transactions in DynamoDB
ConditionExpression for transactWrite:
// #QN - placeholder for the quantity attribute
// :val - quantity entered by the client
ConditionExpression: '#QN >= :val'
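For reference, the transaction described above might be assembled like this. This is a minimal sketch, assuming a table named `products` with `productId` as its key (as in the sample data) and an order given as an array of `{ productId, quantity }`; `buildOrderTransaction` is a hypothetical helper name. It only builds the parameter object that would be passed to `docClient.transactWrite(params)`:

```javascript
// Hypothetical helper: builds DocumentClient transactWrite() params that
// decrement stock for every ordered product, each guarded by the same
// "#QN >= :val" condition. The table name "products" is an assumption.
function buildOrderTransaction(order, tableName = 'products') {
  return {
    TransactItems: order.map(({ productId, quantity }) => ({
      Update: {
        TableName: tableName,
        Key: { productId },
        // Subtract the requested amount only if enough stock remains
        UpdateExpression: 'SET #QN = #QN - :val',
        ConditionExpression: '#QN >= :val',
        ExpressionAttributeNames: { '#QN': 'quantity' },
        ExpressionAttributeValues: { ':val': quantity },
      },
    })),
  };
}

// The order from the question: 30 x product 123 and 12 x product 4565
const params = buildOrderTransaction([
  { productId: 123, quantity: 30 },
  { productId: 4565, quantity: 12 },
]);
console.log(JSON.stringify(params, null, 2));
```

Because both updates travel in one `TransactItems` list, DynamoDB applies all of them or none: with the sample data the condition for 4565 fails (10 < 12), the call fails with a `TransactionCanceledException`, and the update for 123 is not applied either. That is exactly the behaviour the question runs into.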
Basically, I want to update the products that have enough quantity available and return some information about the ones that do not.
Is there any way to achieve this, or do I have to call documentClient.update() manually for every product?
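The per-product alternative mentioned in the question could look roughly like this. It is a sketch, assuming the AWS SDK for JavaScript v2 `DocumentClient`, where `update(params).promise()` returns a promise and a failed condition rejects with `err.code === 'ConditionalCheckFailedException'` (with SDK v3 you would send an update command instead and check `err.name`). The table name `products` and the function name `placeOrderPerItem` are hypothetical. Each update is atomic on its own, but the set of updates is not, so some products can succeed while others are rejected:

```javascript
// Sketch: one conditional update per product, run in parallel.
// `client` is assumed to be an AWS SDK v2 DocumentClient (or anything with the
// same update(params).promise() shape); "products" is an assumed table name.
async function placeOrderPerItem(client, order, tableName = 'products') {
  const results = await Promise.allSettled(
    order.map(({ productId, quantity }) =>
      client
        .update({
          TableName: tableName,
          Key: { productId },
          UpdateExpression: 'SET #QN = #QN - :val',
          ConditionExpression: '#QN >= :val',
          ExpressionAttributeNames: { '#QN': 'quantity' },
          ExpressionAttributeValues: { ':val': quantity },
          ReturnValues: 'UPDATED_NEW', // return the new quantity
        })
        .promise()
    )
  );

  const updated = [];  // products whose stock was decremented
  const rejected = []; // products with insufficient stock
  const failed = [];   // other errors (throttling, network, ...)
  results.forEach((result, i) => {
    const { productId, quantity } = order[i];
    if (result.status === 'fulfilled') {
      updated.push({ productId, remaining: result.value.Attributes.quantity });
    } else if (result.reason && result.reason.code === 'ConditionalCheckFailedException') {
      rejected.push({ productId, requested: quantity, reason: 'insufficient quantity' });
    } else {
      failed.push({ productId, error: result.reason });
    }
  });
  return { updated, rejected, failed };
}
```

`Promise.allSettled` is used instead of `Promise.all` so that one rejected product does not hide the outcome of the others; the `rejected` list is the "information about the transaction which has not enough quantity" that can be sent back to the client.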

The whole point of using a transaction is to ensure that if one operation fails, nothing is changed.
That's the very definition of "transaction" in a DB.
It seems like you should just use BatchWriteItem():
The individual PutItem and DeleteItem operations specified in BatchWriteItem are atomic; however BatchWriteItem as a whole is not. If any requested operations fail because the table's provisioned throughput is exceeded or an internal processing failure occurs, the failed operations are returned in the UnprocessedItems response parameter.
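Acting on the UnprocessedItems parameter mentioned in the quote usually means resubmitting those items. A minimal sketch, assuming an AWS SDK v2 `DocumentClient` whose `batchWrite(params).promise()` resolves to an object with an `UnprocessedItems` map shaped like `RequestItems`; `batchWriteWithRetry`, the retry limit and the backoff delays are illustrative choices, not prescribed values. Keep in mind that a single call accepts at most 25 put or delete requests, and that these requests cannot carry a condition such as `#QN >= :val`:

```javascript
// Sketch: resubmit UnprocessedItems until DynamoDB has accepted everything.
// `client` is assumed to be an AWS SDK v2 DocumentClient; maxAttempts and the
// exponential backoff are illustrative.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function batchWriteWithRetry(client, requestItems, maxAttempts = 5) {
  let pending = requestItems;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { UnprocessedItems = {} } = await client
      .batchWrite({ RequestItems: pending })
      .promise();
    if (Object.keys(UnprocessedItems).length === 0) {
      return; // every put/delete request was processed
    }
    pending = UnprocessedItems; // only retry what was left over
    if (attempt < maxAttempts) {
      await sleep(2 ** attempt * 50); // back off before retrying
    }
  }
  throw new Error('Some items were still unprocessed after retries');
}
```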

Related

Some trivial transactions take dozens of seconds to complete on Spanner microinstance

Here are some bits of context.
Node.js server, connecting to Cloud Spanner from a development machine.
Most of the time the queries take around 200-400 ms, including data transfer from the servers' location to my dev machine.
But sometimes these trivial transactions take 12-16 seconds, which is clearly not acceptable for our use case: session storage for a backend server.
In the local dev context, the sessions service runs on the same machine as the main backend; in staging and prod they run in the same Kubernetes cluster.
This is not about the amount of data: there is very little data in our staging Spanner database overall, a few MB across all tables and only about 10 rows in the table in question.
Spanner instance stats:
Processing units: 100
CPU utilization: 4.3% for the staging database and 10% overall for the instance.
The table looks like this (a few other small fields omitted):
CREATE TABLE sessions
(
id STRING(255) NOT NULL,
created TIMESTAMP,
updated TIMESTAMP,
status STRING(16),
is_local BOOL,
user_id STRING(255),
anonymous BOOL,
expires_at TIMESTAMP,
last_activity_at TIMESTAMP,
json_data STRING(MAX),
) PRIMARY KEY(id);
The transaction in question runs a single statement like this:
UPDATE ${schema.reportsTable}
SET ${statusCol.columnName} = @status_recycled
WHERE ${idCol.columnName} = @id_value
AND ${statusCol.columnName} = @status_active
with parameters like this:
{
"id_value": "some_session_id",
"status_active": "active",
"status_recycled": "recycled"
}
Yes, I know a STRING(16) status field with readable names instead of a boolean field is not ideal, but this design is inherited from older code. What concerns me is that we do not have much data there yet, only about 10 rows, so delays like this are unexpected at this scale.
Okay, I understand I am on the other side of the globe from the Spanner servers, but that usually results in delays of 200-1200 ms, not 12-16 seconds.
The delay happens rarely and randomly, but it seems to happen on queries like this one.
The delay occurs at commit, not when, e.g., sending the SQL statement itself or obtaining a transaction.
I tried a different query first, such as
DELETE FROM Sessions WHERE id = @id_value
and it was the same: a random, rare delay of 12-16 seconds on such a trivial query.
Thanks a lot for your help and time.
PS: Update: this 12-16 second delay can actually happen on any transaction in the described context, and all of these transactions are standard single-row CRUD operations.
Update 2:
The code that sends the transaction is our own wrapper over the standard @google-cloud/spanner client library for Node.js.
The wrapper just provides an easy-to-use layer around the Spanner instance, database, and transaction.
The Spanner instance and database objects are long-lived singletons, i.e. they are not recreated from scratch for every transaction.
The main purpose of the wrapper is to provide logic like this:
let result = await useDataContext(async(ctx) => {
let sql = await ctx.getSQLRunner();
return await sql.runSQLUpdate({
sql: `Some SQL Trivial Statement`,
parameters: {
param1: 1,
param2: true,
param3: "some string"
}
});
});
Its purpose is to guarantee that if any changes were made to the data, transaction.commit() is always called; if no changes were made, transaction.end() is called; and if an error is thrown in the called code (e.g. invalid SQL is generated, or some variable is undefined or null), a transaction rollback is initiated.
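The commit/end/rollback behaviour described above could be sketched roughly as follows. This is not the author's wrapper: it assumes a `database` object shaped like the one from `@google-cloud/spanner` (`getTransaction()` resolving to `[transaction]`, `transaction.runUpdate({ sql, params })` resolving to `[rowCount]`, plus `commit()`, `rollback()` and `end()`), and `ctx.runSQLUpdate` is a simplified, hypothetical stand-in for `getSQLRunner()`. Timing the commit on its own is one way to confirm how often the 12-16 second delay happens; the 1000 ms threshold is arbitrary:

```javascript
// Hypothetical sketch of a useDataContext-style wrapper: commit when rows
// changed, end() when nothing changed, roll back when the callback throws.
async function useDataContext(database, work) {
  const [transaction] = await database.getTransaction();
  let changedRows = 0;
  const ctx = {
    // Simplified stand-in for getSQLRunner().runSQLUpdate()
    async runSQLUpdate({ sql, parameters }) {
      const [rowCount] = await transaction.runUpdate({ sql, params: parameters });
      changedRows += rowCount;
      return rowCount;
    },
  };

  try {
    const result = await work(ctx);
    if (changedRows > 0) {
      const started = Date.now();
      await transaction.commit();
      // Log slow commits separately, since that is where the delay appears
      const elapsedMs = Date.now() - started;
      if (elapsedMs > 1000) console.warn(`slow commit: ${elapsedMs} ms`);
    } else {
      transaction.end(); // nothing to commit: just release the transaction
    }
    return result;
  } catch (err) {
    try {
      await transaction.rollback();
    } catch (rollbackErr) {
      // Keep the original error; the rollback failure is only logged
      console.error('rollback failed', rollbackErr);
    }
    throw err;
  }
}
```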

How to write a function in chaincode that simply counts the total records and returns the total number (Hyperledger Fabric)

For example, we have bank records, and we use a query to get all of them. I just want to create a function that simply returns the total number of bank records, as a number only.

Do you mean the total number of records in CouchDB or just a particular type of record?
Anyhow, I'll propose solutions for both assuming you're using CouchDB as your state DB.
Reading the total number of records present in CouchDB from chaincode is a big overhead. You can simply make a GET API call such as http://couchdb.server.com/mydatabase, and you'd get JSON back that looks something like this:
{
"db_name":"mydatabase",
"update_seq":"2786-g1AAAAFreJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8riYGB0RuPuiQFIJlkD1Naik-pA0hpPExpDj6lCSCl9TClwXiU5rEASYYGIAVUPR-sPJqg8gUQ5fvBygMIKj8AUX4frDyOoPIHEOUQt0dlAQB32XIg",
"sizes":{
"file":13407816,
"external":3760750,
"active":4059261
},
"purge_seq":0,
"other": {
"data_size":3760750
},
"doc_del_count":0,
"doc_count":2786,
"disk_size":13407816,
"disk_format_version":6,
"data_size":4059261,
"compact_running":false,
"instance_start_time":"0"
}
From here, you can simply read the doc_count value.
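From a Node.js client outside the chaincode, that lookup could be a few lines. A sketch, assuming Node.js 18+ (global `fetch`) and a CouchDB that is reachable without extra authentication; the URL is the example one from above and `getCouchDocCount` is a hypothetical name. `fetchFn` is a parameter only so the function can be exercised without a live server:

```javascript
// Sketch: read doc_count from CouchDB's database info endpoint (GET /{db}).
async function getCouchDocCount(dbUrl, fetchFn = globalThis.fetch) {
  const response = await fetchFn(dbUrl);
  if (!response.ok) {
    throw new Error(`CouchDB responded with HTTP ${response.status}`);
  }
  const info = await response.json();
  return info.doc_count; // e.g. 2786 in the response shown above
}

// Usage: getCouchDocCount('http://couchdb.server.com/mydatabase').then(console.log);
```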
However, if you want to read the total number of docs in chaincode, I should mention that it'll be a very costly operation, and you might get a timeout error if the number of records is very high. For a particular type of record, you can use CouchDB selector syntax (a rich query via getQueryResult).
If you want to read all the records, you can use the getStateByRange(startKey, endKey) method and count all the records.
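Inside chaincode, either approach comes down to walking an iterator and counting. A sketch, assuming Node.js chaincode (`fabric-contract-api`/`fabric-shim`), where `ctx.stub.getStateByRange(startKey, endKey)` and `ctx.stub.getQueryResult(query)` resolve to iterators with `next()` (returning `{ value, done }`) and `close()`. The `docType: 'bank'` field is a hypothetical naming convention for your records, and `getQueryResult` only works when CouchDB is the state database:

```javascript
// Counts every result an iterator yields, then closes it.
async function countIterator(iterator) {
  let count = 0;
  try {
    while (true) {
      const res = await iterator.next();
      if (res.value) count++; // a key/value pair was returned
      if (res.done) break;
    }
  } finally {
    await iterator.close(); // always release the iterator
  }
  return count;
}

// All keys of this chaincode: empty start and end keys mean an open range.
async function countAllRecords(ctx) {
  return countIterator(await ctx.stub.getStateByRange('', ''));
}

// Only one type of record, using a CouchDB selector (hypothetical docType field).
async function countBankRecords(ctx) {
  const query = JSON.stringify({ selector: { docType: 'bank' } });
  return countIterator(await ctx.stub.getQueryResult(query));
}
```

As noted above, both functions read every matching record, so their cost grows with the data set.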

NodeJS and Mongo live who's online

TL;DR
Logging online users and reporting back a count (based on a Mongo find)
We've got a SaaS app for schools and students, and as part of this I've been wanting a 'live' who's-online ticker.
Teachers from the schools will see the counter, and the students and parents will trigger it.
I've got a socket.io connection from the web app to a NodeJS app.
When there is a lot of traffic, the Node/Mongo servers can't handle it, and rather than throw more resources at it, I figured it's better to optimise the code, because I don't know what I'm doing :D
With each student page load:
Create a socket.io connection with the following object:
{
'name': 'student or caregiver name',
'studentID': 123456,
'schoolID': 123,
'role': 'student', // ( or 'mother' or 'father' )
'page': window.location
}
In my Node script:
io.on('connection', function(client) {
  const query = client.handshake.query;
  // Computed here so the disconnect handler can use it too
  // (a `let` declared inside the if block is not visible there)
  const reference = query.schoolID + query.studentID + query.role;

  // Sends the current number of online students of this school to this client
  function emitOnlineCount() {
    db.collection('students')
      .countDocuments({ 'offline': null, 'schoolID': query.schoolID })
      .then(function(studentsConnected) {
        client.emit('online_users', studentsConnected);
      })
      .catch(function(err) { console.error(err); });
  }

  // if it's a student connection..
  if (query.studentID) {
    // copy of that student object, plus the status fields
    const student = Object.assign({}, query, { reference: reference, online: new Date(), offline: null });
    db.collection('students')
      .updateOne({ "reference": reference }, { $set: student }, { upsert: true })
      .catch(function(err) { console.error(err); });
  }
  // IF STAFF::: just show count!
  if (query.staffID) {
    emitOnlineCount();
  }
  client.on('disconnect', function() {
    // then if the student leaves the page..
    if (query.studentID) {
      db.collection('students')
        .updateMany({ "reference": reference }, { $set: { "offline": new Date() } })
        .catch(function(err) { console.error(err); });
    }
    // IF STAFF::: just show updated count!
    if (query.staffID) {
      emitOnlineCount();
    }
  });
});
What Mongo indexes would you add? Would you store online students differently (and in a different collection) from a 'page tracking' type deal like this?
(This logs the page and duration, so I have another call later that pulls that, but it's not heavily used or causing the issue.)
If stored separately, should I insert on connect and then delete on disconnect?
For the emit() to staff users, how can I emit only to staff with the same schoolID as the students?
Thanks!
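For the last question (emitting only to staff of the same school), Socket.IO rooms are the usual tool. A sketch, assuming Socket.IO v3 or later (where `socket.join(room)` and `io.to(room).emit(...)` are available); the `school:<id>` room naming and the `countOnline(schoolID)` callback are hypothetical, and in the real handler the broadcast should run after the student's upsert or offline update has finished so the count is current:

```javascript
// Sketch: staff sockets join a per-school room, and every student
// connect/disconnect pushes a fresh count to that room only.
// `countOnline(schoolID)` is a hypothetical async function returning a number.
function staffRoom(schoolID) {
  return `school:${schoolID}`;
}

function registerOnlineTicker(io, countOnline) {
  async function broadcastCount(schoolID) {
    const count = await countOnline(schoolID);
    io.to(staffRoom(schoolID)).emit('online_users', count);
  }

  io.on('connection', (client) => {
    const { staffID, studentID, schoolID } = client.handshake.query;
    if (staffID) {
      client.join(staffRoom(schoolID)); // only staff of this school get updates
      countOnline(schoolID)
        .then((count) => client.emit('online_users', count)) // initial value
        .catch(console.error);
    }
    if (studentID) {
      broadcastCount(schoolID).catch(console.error);
      client.on('disconnect', () => {
        broadcastCount(schoolID).catch(console.error);
      });
    }
  });
}
```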

You have given a brief description of the issue but no diagnosis of why it is happening. Based on a few assumptions, I will try to answer your question.
First of all, you mentioned that you'd like suggestions on which indexes can help. Based on your description, it's a write-heavy system, and indexes in principle slow down writes, because on every write the B-tree that backs each index has to be updated too. Reads, however, become much faster, especially on a huge collection with a lot of data.
So an index can help you a lot if your collection has, let's say, 1 million documents: thanks to the B-tree, it lets you read only the required data without scanning the whole collection.
An index should be created specifically based on the read calls you make.
For example:
{"student_id" : "studentID", "student_fname" : "Fname"}
If the read call here filters on student_id, create an index on that field. If multiple fields are involved (equality, sort, range), create a compound index on those fields, putting equality fields first, then sort fields, then range fields.
Now the second part of the question: what would be better in this scenario?
This is a subjective thing and I'm sure everyone will have a different approach to this. My solution is based on a few assumptions.
Assumption(s)
The system needs to cater to a specific feature where a student's online status is updated at some time interval and that data is available for reads by parents, teachers, etc.
If the sockets you are using stay connected all the time, that is as many concurrent connections to the server; whether that is required or not, I don't know. But, as you probably know, concurrent connections are heavy for the server, so unless they are 100% needed, try a mixed approach.
If it would be okay for you to disconnect for a while, or to keep the connection to the server open only for a short interval, please consider that. This basically means you connect, send data, disconnect from the server gracefully, and repeat.
Or just adopt a heartbeat system where your frontend app calls an API at a set time interval to ping the server; based on that, you can decide whether the student is online or not. There is a small delay, yes, but it is easily scalable.
Please use Redis or any other in-memory data store for such frequent writes, especially when you don't need to persist the data for long.
For example, let's say we use a Redis list for every class/section and only update a user's timestamp (epoch) when their latest heartbeat is received from the frontend.
In a class with 60 students, sort the students based on student_id or something like that.
Create a list for that class (e.g. with RPUSH, one entry per student), because LSET can only overwrite an index that already exists.
For the student_id that comes first in the ascending student list, update the epoch like this (the value is an epoch timestamp in milliseconds):
LSET mylist 0 "1266126162661"
Index 0 is your first student and 59 is your 60th student; update the entry on every heartbeat, either via an API or the same socket system you have, depending on your use case.
When a read call is needed:
LRANGE classname/listname 0 59
Now you have the epochs of all users. Maintain the list of students either in the database or in another list, where you can simply match each index with a specific student (the value can be the student id or any other data; I am just trying to explain the logic):
LSET studentList 0 "student_id"
On the frontend, once you have the epochs, compare them with the current time according to your use case. For example, let's say a student should count as online if their last heartbeat was received within the last 5 minutes.
If the current timestamp minus the stored timestamp is less than 5 minutes (300000 ms, since the epochs above are in milliseconds), the student is online; otherwise they are offline.
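The read-side check described above could be sketched in JavaScript, the language of the question's server. It assumes the LRANGE result and the parallel list of student ids have already been fetched with whatever Redis client you use; `onlineStudents` and the 5-minute window are illustrative:

```javascript
// Sketch of the read side: combine the epochs from LRANGE with the parallel
// student list and keep the students whose last heartbeat is recent enough.
// Epochs are assumed to be milliseconds (like "1266126162661" above).
const ONLINE_WINDOW_MS = 5 * 60 * 1000; // 5 minutes

function onlineStudents(studentIds, epochs, now = Date.now()) {
  const online = [];
  studentIds.forEach((studentId, index) => {
    const lastSeen = Number(epochs[index]); // Redis returns strings
    if (Number.isFinite(lastSeen) && now - lastSeen < ONLINE_WINDOW_MS) {
      online.push(studentId);
    }
  });
  return online;
}
```

Storing milliseconds on both sides keeps the comparison in one unit; mixing seconds and milliseconds is an easy way to mark everyone online or offline by mistake.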

This won't be a complete answer without discussing the problem some more, but I figured I'd post some general suggestions.
First, we should figure out where the performance bottlenecks are. Is it a particular query? Is it too many simultaneous connections to MongoDB? Is it even just too much round trip time per query (if the two servers aren't within the same data center)? There's quite a bit to narrow down here. How many documents are in the collection? How much RAM does the MongoDB server have access to? This will give us an idea of whether you should be having scaling issues at this point. I can edit my answer later once we have more information about the problem.
Based on what we know currently, without making any model changes, you could consider indexing the reference field in order to make the upsert call faster (if that's the bottleneck). That could look something like:
db.collection('students').createIndex({
"reference": 1
},
{ background: true });
If the querying is the bottleneck, you could create an index like:
db.collection('students').createIndex({
"schoolID": 1
},
{ background: true });
I'm not confident (without knowing more about the data) that including offline in the index would help, because optimizing for "not null" can be tricky. Depending on the data, that may lead to storing the data differently (like you suggested).

How to fix a race condition in node js + redis + mongodb web application

I am building a web application that will process many transactions a second. I am using an Express server with Node.js. On the database side, I am using Redis to store attributes of a user that fluctuate continuously based on stock prices, and MongoDB to store semi-permanent attributes like order configuration, user configuration, etc.
I am hitting a race condition when multiple orders placed by a user are processed at the same time, even though only one of them is eligible: a check on the Redis attribute that stores the margin would not have allowed both transactions.
The other issue is that my application logic interleaves Redis and MongoDB read and write calls. So how would I go about solving the race condition across both databases?
I am thinking of trying WATCH and MULTI + EXEC on Redis to make sure only one transaction happens at a time for a given user.
Or I could set up a queue on Node/Redis that processes orders one by one. I am not sure which is the right approach, or how to go about implementing it.
This is all pseudocode. The application logic is a lot more complex, with multiple conditions.
I feel like my entire application logic is a critical section (which I think is a bad thing).
// The server receives a request from the client to place an order
getAvailableMargin(user.username).then((margin) => { // Redis call to fetch the user's margin. It fluctuates a lot, so I store it in Redis
  if (margin > 0) {
    const o = { // Prepare an order
      user: user.username,
      price: orderPrice,
      symbol: symbol
    };
    const order = new Order(o);
    order.save((err, savedOrder) => { // Create new Order in MongoDB
      if (err) {
        return next(err);
      }
      User.findByIdAndUpdate(user._id, {
        $inc: {
          balance: pl
        }
      }).exec(); // Update balance in MongoDB
      decreaseMargin(user.username); // Decrease the user's margin in Redis
    });
  }
});
Suppose the margin is 1 and each new order decreases it by 1.
If two requests are received simultaneously, both read a margin of 1 from Redis, which causes a race condition. As a result, two orders are now open in MongoDB, when in fact the margin should have become 0 after the first order and the second order should have been rejected.
Another issue is that we have now updated the user's balance in MongoDB twice, once for each order.
The expectation is that one of the orders should not execute, and a retry should happen after checking the new margin in Redis. The user's balance should also be updated only once.
Basically, would I need to implement a watch on both Redis and MongoDB and somehow retry a transaction if any of the watched fields/docs change?
Is that even possible? Or is there a much simpler solution that I might be missing?
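One way to picture the "process orders one by one" option from the question is a per-user promise chain, so that the margin check and the decrement for one user can never interleave. This is a sketch only: it serializes work inside a single Node.js process, so with several server instances you would still need something shared (a Redis-backed queue or an atomic check-and-decrement in Redis). `createKeyedQueue`, `placeOrder` and the in-memory `margins` map are hypothetical stand-ins for the real Redis and MongoDB calls:

```javascript
// Runs tasks for the same key strictly one after another; tasks for
// different keys (different users) still run concurrently.
function createKeyedQueue() {
  const tails = new Map(); // key -> promise of the last queued task

  return function run(key, task) {
    const previous = tails.get(key) || Promise.resolve();
    // Start only after the previous task settled, whether it succeeded or failed
    const current = previous.catch(() => {}).then(task);
    tails.set(key, current);
    // Forget the key once nothing else is queued behind this task
    current
      .finally(() => {
        if (tails.get(key) === current) tails.delete(key);
      })
      .catch(() => {});
    return current;
  };
}

// Hypothetical order handler: the margin lives in a Map here instead of Redis.
const margins = new Map([['alice', 1]]);
const runForUser = createKeyedQueue();

function placeOrder(username) {
  return runForUser(username, async () => {
    const margin = margins.get(username); // stands in for getAvailableMargin()
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated DB latency
    if (margin <= 0) return 'rejected';
    margins.set(username, margin - 1); // stands in for decreaseMargin()
    return 'placed';
  });
}
```

Without the queue, two simultaneous `placeOrder('alice')` calls would both read a margin of 1 during the simulated latency; with it, the second call only reads the margin after the first has decremented it, so it is rejected.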

Update Parent Standard Object Getting Sum Value from Child in APEX

I am new to Apex. I want to write a before insert trigger in Apex. I have two standard objects (Contact, Opportunity).
SELECT sum(amount), Bussiness__c FROM opportunity
WHERE stagename='Closed Won' and id='006i000000Kt683AAB' GROUP BY Bussiness__c
When the trigger runs, I want it to get the SUM(Amount) and Bussiness__c values and then update the Contact's Total_Business__c with the SUM(Amount) value. Here, Bussiness__c is the Contact Id on the Opportunity object.
Thanks in advance, and waiting for your positive response.

I'm assuming you don't have currencies enabled in your org (if you see "CurrencyIsoCode" somewhere on your objects, you'll have to modify this design a bit).
I am a lazy person, and you didn't write anything about the amount of data you expect. What I've written will work when there's a reasonable number of Opportunities per contact. If you start hitting the governor limit of 50K query rows, it would have to be done differently (I'll write a bit about it at the end).
I am not going to give you a ready solution, because a "homemade rollup summary" is one of the assignments you might encounter during the SF DEV 501 certification. I'll just outline some pointers and food for thought.
I wouldn't do it before insert; it's easier in after insert and after update (you didn't think about recalculation when the Amount changes, did you?). Something should also be said about after delete and after undelete if your users are allowed to delete Opportunities.
First thing is to build a set of "contacts we'll have to recalculate":
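Set<Id> contactIds = new Set<Id>();
// Trigger.old is null on insert/undelete, Trigger.new is null on delete
if (Trigger.old != null) {
    for (Opportunity o : Trigger.old) {
        contactIds.add(o.Business__c);
    }
}
if (Trigger.new != null) {
    for (Opportunity o : Trigger.new) {
        contactIds.add(o.Business__c);
    }
}
contactIds.remove(null); // ignore opportunities without a contact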
This forces recalculation for all related contacts and ignores opportunities without a contact. It will always fire, which is not ideal: on insert, delete and undelete you'd want it to fire always, but on update you'd want it to fire only when the Amount or the Contact changes (trigger.old will hold a different contact than trigger.new). You can control these scenarios with things like Trigger.isUpdate; read up about it.
Anyway, now you have a unique set of Contact Ids. I said I'd do it in an "after" trigger because at that point the new Amount is already saved to the database and you can query it back:
SELECT Business__c, SUM(Amount) sumAmount
FROM Opportunity
WHERE Business__c IN :contactIds
GROUP BY Business__c
This type of query returns AggregateResult rows that you'll have to parse like this:
List<Contact> contactsToUpdate = new List<Contact>();
for (AggregateResult ar : [SELECT Business__c, SUM(Amount) sumAmount
                           FROM Opportunity
                           WHERE Business__c IN :contactIds
                           GROUP BY Business__c]) {
    System.debug(ar);
    // SUM() over a currency field is returned as a Decimal
    contactsToUpdate.add(new Contact(
        Id = (Id) ar.get('Business__c'),
        Total_Business__c = (Decimal) ar.get('sumAmount')
    ));
}
update contactsToUpdate;
As I said, it's a basic outline; it should get you started.
This code queries all opportunities for the given contacts. Your trigger receives at most 200 Opportunities at a time. Imagine a situation where you change the contact on all 200 Opps: that gives you 400 contacts to update (to clear/fix the old value and to set the new value). With the 50K query rows limit, and assuming no other business logic is triggered (an update of Accounts? an action that started because some Opportunity Products were added?), you run into problems when on average 1 contact is involved in 125 Opps (50,000 / 400). It sounds like a ridiculous problem, but there are scenarios where you need to do it differently.
In such cases you can attack it from another angle. You don't really need to query all Opps for a given Contact; that's the lazy way. You could instead read the current value of Total_Business__c (use 0 if it happens to be null) and then add/subtract all changes to the amount as needed, looking only at your trigger.old and trigger.new. It takes more code and more planning up front, but performance improves significantly and the solution scales as the number of Opps grows (it only ever looks at the current maximum of 200 Opps in the trigger's scope).
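The incremental idea can be illustrated independently of Apex. The following is a JavaScript sketch of the arithmetic only (it is not Apex and would need to be ported into the trigger): given the old and new versions of the Opportunities in one trigger batch, it computes how much each Contact's total should change, without querying any other Opportunities. Field names follow the code above; `rollupDeltas` is a hypothetical name, and treating a missing Amount as 0 is an assumption:

```javascript
// Computes per-contact deltas for a "homemade rollup" from trigger.old/trigger.new.
// oldRecords/newRecords are arrays of { Id, Business__c, Amount }; either may be
// empty (no old records on insert/undelete, no new records on delete).
function rollupDeltas(oldRecords, newRecords) {
  const deltas = new Map(); // contactId -> amount to add to Total_Business__c
  const add = (contactId, amount) => {
    if (!contactId) return; // opportunity without a contact
    deltas.set(contactId, (deltas.get(contactId) || 0) + (amount || 0));
  };
  oldRecords.forEach((o) => add(o.Business__c, -o.Amount)); // remove old contribution
  newRecords.forEach((o) => add(o.Business__c, o.Amount)); // add new contribution
  // Contacts whose total did not change need no update
  for (const [contactId, delta] of deltas) {
    if (delta === 0) deltas.delete(contactId);
  }
  return deltas;
}
```

The new total for each Contact is then its current Total_Business__c (or 0 if null) plus its delta, so only the Contacts present in the map have to be updated.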
Another approach would be to accept some delay in this rollup summary and write a batch job for it.
