I'm designing an accounting/bookkeeping application and my table has the following columns:
Transaction ID
Transaction Details
Amount
Closing Balance
All the transactions are fed to a Node.js function one by one through a queue.
However, while saving each transaction, I have to fetch the previous transaction to get the last closing balance, then add the current transaction's amount to it to get the new closing balance.
But since I have to use async/await to fetch the previous transaction, the event loop is free for a few milliseconds, during which the function can receive a new transaction event from the user. This causes a lot of inconsistencies in the data: sometimes two rows are inserted with the same closing balance.
const prevTransaction = await Transaction.findOne({
  where: { userId },
  order: [['createdAt', 'DESC']]
});

await Transaction.create({
  userId,
  amount,
  closingBalance: prevTransaction ? prevTransaction.closingBalance + amount : amount,
  transactionDate
});
Now if the system receives a lot of events in bulk, there can be inconsistencies in the data due to the gap between the SELECT and the INSERT.
In some scenarios, this is the data that's inserted:
ID  Amount  ClosingBalance
1   20      20
2   20      40
3   20      40
4   10      50
5   20      60
When ideally it should be:

ID  Amount  ClosingBalance
1   20      20
2   20      40
3   20      60
4   10      70
5   20      90
Is there any particular way I could tweak the above code to get this sequential effect?
The reason I'm using a BullJS queue is so that transactions are processed one by one. But the issue persists because of the two await calls.
I have temporarily solved this problem by pausing the job queue whenever a transaction is being processed and resuming it once the transaction is inserted - link
I would love to hear about alternate approaches, since this one relies on a third-party library.
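One library-free option is to serialize each user's writes on a per-user promise chain, so the read of the last balance and the insert always run back to back. This is a minimal sketch under assumptions: `runExclusive` is a hand-rolled keyed mutex (not from any library), and the Sequelize model is simulated here with an in-memory `rows` array so the idea can be run standalone.

```javascript
// Minimal per-key mutex: chains each user's work onto a promise queue,
// so the "read last balance" + "insert" pair never interleaves per user.
const queues = new Map();

function runExclusive(key, task) {
  const prev = queues.get(key) || Promise.resolve();
  const next = prev.then(task, task); // run even if the previous task failed
  queues.set(key, next.catch(() => {})); // keep the chain alive after errors
  return next;
}

// In-memory stand-in for the Transaction model (assumption, for the demo).
const rows = [];

async function saveTransaction(userId, amount) {
  return runExclusive(userId, async () => {
    // Simulate the async findOne round-trip.
    await new Promise(r => setTimeout(r, Math.random() * 10));
    const prev = [...rows].reverse().find(r => r.userId === userId);
    const closingBalance = prev ? prev.closingBalance + amount : amount;
    // Simulate the async insert.
    await new Promise(r => setTimeout(r, Math.random() * 10));
    rows.push({ userId, amount, closingBalance });
    return closingBalance;
  });
}

// Fire five concurrent events; balances still come out sequential.
Promise.all([20, 20, 20, 10, 20].map(a => saveTransaction('u1', a)))
  .then(() => console.log(rows.map(r => r.closingBalance).join(','))); // 20,40,60,70,90
```

In a real setup you could instead wrap the findOne + create pair in a database transaction with a row lock, which pushes the serialization into the database rather than the Node process.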
I have one user who has multiple tasks with hours and minutes, which I store in milliseconds.
User's tasks schema:
tasks: {
  userId: String, // taken from the session
  time: { type: Number }
}
The user's input looks like this:
time: {
  hrs: 10,
  min: 45
}
I converted the user input to milliseconds using the date.getTime() function after setting the hours and minutes, and that millisecond value is what gets stored in the schema.
Supposing there are multiple tasks, the final output should be their sum, e.g. 10 hrs + 10 hrs = 20 hrs.
So I want to sum the time of all the tasks belonging to a specific user (userId) stored in tasks.
Should I use the aggregation framework for this problem?
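Yes, a $match + $group pipeline fits this. A sketch, assuming userId is stored as a plain string as in the schema above; the explicit conversion helper avoids date.getTime() entirely, since going through a Date drags in today's date for no benefit:

```javascript
// Convert the user's { hrs, min } input to milliseconds directly.
function toMillis({ hrs, min }) {
  return (hrs * 60 + min) * 60 * 1000;
}

// Aggregation pipeline summing one user's task time
// (field names follow the schema in the question).
const userId = 'someUserId'; // assumption: placeholder value
const pipeline = [
  { $match: { userId } },
  { $group: { _id: '$userId', totalMs: { $sum: '$time' } } }
];
// With Mongoose this would run as: Task.aggregate(pipeline)

console.log(toMillis({ hrs: 10, min: 45 })); // 38700000
```

The $group stage returns one document per user with totalMs; dividing by 3600000 gives the total in hours for display.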
I am working with several APIs in my app, and a few of them have limits that are not simply per-second.
For example, one of my APIs has the following limits:
Max 100 requests per 2 minutes
Max 20 requests per 1 second
So I have tried implementing this library https://github.com/aishek/axios-rate-limit in the following way:
axiosRateLimit(baseAxios.create(), {
  maxRequests: 1, // 1 request
  perMilliseconds: 1200 // per 1.2 seconds
  // 100 requests per 2 minutes = 50 requests per 60 seconds = 1 request per 1.2 seconds
});
But this can't take advantage of the 20-requests-per-second limit: to adhere to 100 requests per 2 minutes I have to limit it to 1 request per 1.2 seconds, because if I limited it to 20 per second I could make 2400 requests in 2 minutes.
So how can I implement both conditions and have them working together?
And what if I only need to make 50 requests every 2 minutes? With the current implementation it will take 1 minute for all of them, and I am not taking advantage of the 20 per second (if I did, I could finish in 3 seconds instead of 1 minute).
Is there a way to accomplish this with this library? Initially I thought that the maxRequests works with perMilliseconds and maxRPS can be used to handle the other case, so when all 3 are supplied I thought it would be like:
{
  maxRequests: 100, // 100 max requests
  perMilliseconds: 2 * 60 * 1000, // per 2 minutes
  maxRPS: 20 // 20 max per second
}
But the docs say:
// sets max 2 requests per 1 second, other will be delayed
// note maxRPS is a shorthand for perMilliseconds: 1000, and it takes precedence
// if specified both with maxRequests and perMilliseconds
const http = rateLimit(axios.create(), { maxRequests: 2, perMilliseconds: 1000, maxRPS: 2 })
So obviously it doesn't work the way I expected it to. Is there a way to achieve what I want?
Are there any other libraries that can do what I want?
Do I have to implement it from scratch on my own?
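If you do end up rolling it yourself, enforcing several sliding-window limits at once is not much code. The sketch below is entirely hand-rolled (MultiWindowLimiter and its methods are made-up names, not axios-rate-limit API) and uses virtual time so the behaviour is easy to verify; in practice you would wrap each request in a setTimeout until nextSlot():

```javascript
// A scheduler that respects several sliding-window limits simultaneously,
// e.g. max 20 per 1 s AND max 100 per 2 min.
class MultiWindowLimiter {
  constructor(limits) {
    // Each limit keeps the timestamps of recent request starts.
    this.limits = limits.map(l => ({ ...l, stamps: [] }));
  }

  // Earliest time >= now at which the next request may start.
  nextSlot(now) {
    let at = now;
    for (const l of this.limits) {
      l.stamps = l.stamps.filter(t => t > now - l.perMs); // drop expired
      if (l.stamps.length >= l.max) {
        // Wait until enough old stamps fall out of this window.
        at = Math.max(at, l.stamps[l.stamps.length - l.max] + l.perMs);
      }
    }
    return at;
  }

  record(t) {
    for (const l of this.limits) l.stamps.push(t);
  }
}

// Simulate 50 requests against both limits using a virtual clock.
const limiter = new MultiWindowLimiter([
  { max: 20, perMs: 1000 },
  { max: 100, perMs: 2 * 60 * 1000 }
]);
let clock = 0;
const startTimes = [];
for (let i = 0; i < 50; i++) {
  clock = limiter.nextSlot(clock);
  limiter.record(clock);
  startTimes.push(clock);
}
// 50 requests fit in three one-second bursts of 20/20/10.
console.log(startTimes[49]); // 2000
```

Note how the 2-minute window never binds for 50 requests, so they finish in about 2 seconds of virtual time instead of 1 minute, which is exactly the behaviour the single-window configuration could not express.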
I have documents in MongoDB, and these documents have an event field of type Date. The year, month, and day do not matter; only the time of day does. I want a cron script that runs every day and aggregates from MongoDB the documents whose event (Date) field falls within the nearest 10 minutes of the script's run time. What is the right way to implement this?
db.mytable.find({
  "event": {
    $gt: new Date(new Date().getTime() - (10 * 60 * 1000))
  }
})
This query will find all documents that have an "event" property with a value within the past 10 minutes. new Date() without arguments returns a Date representing "right now". We pull the numeric epoch time in milliseconds from that and subtract 10 minutes. More specifically, we subtract (10 minutes * 60 seconds per minute * 1000 milliseconds per second), so that we convert to the correct units. We then use that value to construct another new Date(...), and this is the one that goes into the $gt (greater-than) filtering condition.
You mentioned a need for "aggregation". If so, this same filter can be used as a $match stage in any aggregation pipeline that you need.
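The cutoff arithmetic can be checked in isolation. A small sketch that builds the same filter and applies the predicate to an in-memory array standing in for the collection (the sample dates are invented for illustration):

```javascript
// Same "last 10 minutes" cutoff the query uses.
function cutoff(now = new Date()) {
  return new Date(now.getTime() - 10 * 60 * 1000);
}

// The find() filter, as a plain object:
const filter = { event: { $gt: cutoff() } };

// Apply the same predicate to in-memory docs (stand-in for the collection).
const now = new Date('2024-01-01T12:00:00Z');
const docs = [
  { event: new Date('2024-01-01T11:55:00Z') }, // 5 min ago  -> matches
  { event: new Date('2024-01-01T11:45:00Z') }  // 15 min ago -> filtered out
];
const recent = docs.filter(d => d.event > cutoff(now));
console.log(recent.length); // 1
```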
I have to run 100 iterations with 50 users. The total duration of the test is 1 hour. 1 user can do 2 iterations and the number of transactions in the script is 6.
How to calculate pacing time?
Example:
1000 Users, 10000 Full Iterations per hour
10,000/1,000 = 10 iterations per user per hour
3600 seconds per hour / 10 iterations per user per hour = one iteration every 360 seconds (six minutes) on average.
The random algorithm in LoadRunner is based upon the C rand() function, which is approximately (but not exactly) uniform for large datasets. So I take the average pacing interval from the start of one iteration to the next and then adjust it by plus/minus 20%.
So your 360-second (0:06:00) pacing becomes a range from 288 seconds (0:04:48) to 432 seconds (0:07:12).
You would run these calculations for each business process you want to stage.
For think time, look to your production logs for information on how long users take from page X to page X+1. This is easily achievable since each top-level page records the REFERER, the previous page it came from. Comparing the timestamps grouped by client IP can provide the range you need for think times.
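The arithmetic in the example above is simple enough to script. A sketch of the same calculation (function name and the 20% jitter default are just illustrative choices):

```javascript
// Pacing from a throughput goal: iterations per user per hour,
// then an average start-to-start interval, widened by +/-20%.
function pacing(totalIterationsPerHour, users, jitter = 0.2) {
  const perUser = totalIterationsPerHour / users; // iterations/user/hour
  const avg = 3600 / perUser;                     // seconds between iteration starts
  return { avg, min: avg * (1 - jitter), max: avg * (1 + jitter) };
}

console.log(pacing(10000, 1000)); // { avg: 360, min: 288, max: 432 }
```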
Always apply Little's Law to calculate pacing, think time, and the number of VUsers.
From Little's Law: No. of VUsers = Throughput * (Response_Time + Think_Time)
Example:
Throughput = Total No. of Transactions / Time in Seconds
Pacing = Response_Time + Think_Time
From your requirements:
Total no. of iterations is 100, and 1 iteration has 6 transactions, so the total no. of transactions = 600.
Throughput for 1 minute: 600/60 = 10
Throughput for 1 second: 0.16
According to the formula: 50 = 0.16 * Pacing
Pacing = 312.5 seconds
To achieve 100 iterations in 1 hour you have to set a pacing of 312.5 seconds. Make sure Pacing = Response_Time + Think_Time.
Pacing is the 'inter-iteration' gap, and it is used to control the rate of iterations during the test. If the goal for 1 user is to complete 2 iterations per hour, that results in a pacing of 1800 seconds (Little's Law, mentioned above). Now, as long as the sum of the response times of those 6 transactions plus the think time between them is less than 1800 s, you will be able to achieve the desired rate.
NOTE: an iteration is not equal to a transaction, unless the iteration has just one transaction. Refer to this for a pictorial understanding:
https://theperformanceengineer.com/2013/09/11/loadrunner-how-to-calculate-transaction-per-second-tps/
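Applying the same reasoning to the numbers in the question (50 users, 100 total iterations, 1 hour) gives the 1800 s figure directly:

```javascript
// Pacing from the iteration rate per user.
const users = 50;
const totalIterations = 100;
const durationSec = 3600; // 1 hour test

const iterationsPerUser = totalIterations / users; // 2 iterations per user
const pacingSec = durationSec / iterationsPerUser; // one iteration start every 1800 s
console.log(pacingSec); // 1800
```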
Pacing is the wait time between iterations, so I agree with @CyberNinja: in your use case the pacing is 1800 s, because it's the maximum duration of your script that achieves your goal of producing 100 iterations with 50 users in an hour.
Pacing is not Response_Time + Think_Time!
According to Little's Law:
No. of Concurrent Users (N) = Throughput or TPS (X) * [Response Time (RT) + Think Time (TT) + Pacing (P)]
Here RT + TT is the script execution time (SET), which you can calculate by running the script once and adding up all the transaction response times and all the think times.
Assume SET to be 60 seconds.
As per your question:
Total transactions in 1 hr = 100 (iterations) * 50 (users) * 2 (iterations per user) * 6 (transactions) = 60000 transactions/hr
Converting to TPS: 60000/3600 = 16.66
Now putting all values into Little's Law:
50 = 16.66 (60 - Pacing)
Pacing = 60 - 50/16.66
Pacing = 57 secs (approx).
I have about 3 million rows of data in Azure Table Storage, which come from log files. Each row in the table is a detection of a certain event (this may be 1 or 100 rows of data per client; we don't know until it's there), and there are a number of different events.
For each event I need to find the duration of the event from the timestamps of the rows for each client. If there is a gap between end and start time, it counts as a new event. EventId is the PartitionKey for the row, and a composite key of the timestamp as an epoch plus the client ID makes up the RowKey.
The Azure Table Storage looks like the following, with some example data:
PartitionKey  RowKey        ClientId  Epoch       Additional
1             1370966492_1  1         1370969592  34
1             1370967792_1  1         1370967792  63
2             1370969592_1  1         1370969592  34
1             1370972592_1  2         1370972592  47
1             1370973542_1  1         1370969592  44
2             1370976562_1  1         1370976562  18
1             1370978592_1  2         1370978592  92
3             1370981542_1  2         1370981542  34
2             1370982562_1  1         1370982562  37
1             1370982592_1  1         1370982592  73
And the output I need is (example not related to the data above):
EventId ClientId StartTime EndTime Max(additional)
1 1 1370966492 1370973492 78
1 2 1370967834 1370979536 29
What would be the most efficient way to process this data? Would it be to keep the data in Table Storage? Once I have processed these logs, it is possible to change the import procedure for Table Storage if need be.
You may need a different format for the date in your RowKey. The problem is that RowKey and PartitionKey are both strings, so the comparison is always ordinal. You must use a format that represents all dates with the same number of characters, so that ordinal comparison agrees with date comparison, e.g. 20130805122200 (yyyyMMddhhmmss).
The other thing is that Table Storage queries work as follows:
- Search for a given PartitionKey
- For each partition that matches, search for a given RowKey
- For each entity that matches, filter on any other criteria
So in the example above you use the date and the event in the RowKey. If you always search by date, I recommend you include this property in the PartitionKey too.
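A lexicographically sortable key of that shape is easy to produce; the zero-padding is what makes ordinal string comparison agree with chronological order (function name is just illustrative):

```javascript
// Format a Date as yyyyMMddHHmmss so that ordinal (string) comparison
// of RowKeys matches chronological order.
function toSortableKey(d) {
  const pad = n => String(n).padStart(2, '0');
  return (
    d.getUTCFullYear() +
    pad(d.getUTCMonth() + 1) +
    pad(d.getUTCDate()) +
    pad(d.getUTCHours()) +
    pad(d.getUTCMinutes()) +
    pad(d.getUTCSeconds())
  );
}

const a = toSortableKey(new Date(Date.UTC(2013, 7, 5, 12, 22, 0)));
const b = toSortableKey(new Date(Date.UTC(2013, 7, 5, 12, 22, 59)));
console.log(a);     // 20130805122200
console.log(a < b); // true: string order matches time order
```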