Prisma times out with large datasets - Node.js

I have an app (Node.js (Fastify), Postgres with Prisma) that writes sales from an external API into the Postgres DB based on dates. Once the sales have been written, the timestamps are written to a table so I can later check whether that date has already been queried (so if we request the sales for October 2019, it checks whether October 2019 has been queried before and returns the sales from the DB if that's the case; otherwise it fetches them from the external API, writes them to the DB, and writes October 2019 to the date table for the next time).
My issue is when trying to get all the sales, which can span several years. The way I do it right now is as follows (please note that the only endpoint the API offers is year/month, so I have no choice but to iterate my requests month by month):
Get the amount of months between first and last sale (for example, 97)
Loop over each month and check whether or not this month has been queried before
If it has been queried before, do nothing
If it has not been queried before, fetch this year/month combination from external API and write it on db
Once the loop has finished, get all the sales from the DB between those two dates
The issue I have is that, while I paginated my endpoint, Prisma times out with some stores while upserting. Some months can have thousands of sales, with relations for the products sold, and I suspect that's where the issue is.
Here is the error message:
Timed out fetching a new connection from the connection pool. More info: http://pris.ly/d/connection-pool (Current connection pool timeout: 10, connection limit: 10)
My question is: is my logic bad and should it be redone, or should I simply not write that many objects to the database? Is there a best practice I'm missing?
I did not provide code as it is working and I feel the issue lies in the logic more than in the code itself, but I will happily provide code if needed.
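For illustration, a minimal sketch of the loop described above. The model names (queriedMonth, sale) and the fetchSalesFromApi() helper are hypothetical, and the real code upserts with relations, which this simplifies to chunked createMany calls; the point is that months are processed sequentially and rows written in small batches, one await at a time, so the work never needs more simultaneous connections than the pool can hand out:
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();
// Hypothetical helper: call the external year/month endpoint here.
async function fetchSalesFromApi(year, month) {
  return [];
}
async function syncSales(months) {
  for (const { year, month } of months) {
    const alreadyQueried = await prisma.queriedMonth.findFirst({ where: { year, month } });
    if (alreadyQueried) continue; // this month was imported before, nothing to do
    const sales = await fetchSalesFromApi(year, month);
    // Write in modest chunks so no single request floods the connection pool.
    const chunkSize = 500;
    for (let i = 0; i < sales.length; i += chunkSize) {
      await prisma.sale.createMany({
        data: sales.slice(i, i + chunkSize),
        skipDuplicates: true,
      });
    }
    // Record that this year/month has now been imported.
    await prisma.queriedMonth.create({ data: { year, month } });
  }
}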

Prisma has a connection pool, and you need to tell it about Heroku's connection limit.
You'll need a ".profile" file in your root folder containing:
export DATABASE_URL="$DATABASE_URL?connection_limit=10&pool_timeout=0"
".profile" is like .bashrc or .zshrc. Its content will be executed on startup of your server. The line above will overwrite the standard env variable for databases on heroku.

Related

Handling timezones between Client and Server

I'm developing a client and server application and I'm using luxon and postgres.
Considering the current time as I'm writing this post (2022-11-15T22:55:27.374-03):
On the local server/DB, the dates are saved in UTC-3, like the above (2022-11-15T22:55:27.374-03);
On the hosted server/DB, the dates are saved in UTC-0 (2022-11-16T01:55:27+00).
My problem is that I need to query data between the beginning and end of a day. While this works locally, it doesn't on the server, because there the day starts at 03:00 and finishes at 02:59 the next day.
I tried hardcoding a conditional offset based on the environment, but I don't believe that's the best solution...
Is there a proper way to handle this timezone difference?
You need to define what beginning and end of day mean. There are a total of 48 hours during which some timezone can be in a particular date, so you'll have to decide whether your queries mean "today" in the server's timezone or in the client's local timezone. If your queries mean the server's "today", you can send plain timestamps in the request and let the receiver decide what bounds to use. The flip side is to send the actual date string you want, e.g. 2020-01-01, and let the server calculate the bounds in its timezone.
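One way to make the bounds explicit with luxon is to compute the start and end of the requested calendar day in a single, agreed-upon timezone and query with the resulting UTC instants. A minimal sketch (the zone name and date literal are just examples):
const { DateTime } = require('luxon');
// The timezone that defines "a day" for your data (an assumption for this example).
const zone = 'America/Sao_Paulo';
// The client sends the calendar date it cares about, e.g. '2022-11-15'.
const day = DateTime.fromISO('2022-11-15', { zone });
// Bounds of that day in the chosen zone, converted to UTC instants.
const startUtc = day.startOf('day').toUTC().toISO(); // 2022-11-15T03:00:00.000Z
const endUtc = day.endOf('day').toUTC().toISO();     // 2022-11-16T02:59:59.999Z
// Use startUtc/endUtc as the query bounds, regardless of the server's own timezone.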

How to copy managed database?

AFAIK there is no REST API providing this functionality directly, so I am using a restore for this (there are other ways, but those don't guarantee transactional consistency and are more complicated) via a Create request.
Since it is not possible to turn off short-term backup (retention has to be at least 1 day), this should be reliable. I am using the current time for the ‘properties.restorePointInTime’ property in the request. This works fine for most databases, but one DB returns this error (from the async operation request):
"error": {
"code": "BackupSetNotFound",
"message": "No backups were found to restore the database to the point in time 6/14/2021 8:20:00 PM (UTC). Please contact support to restore the database."
}
I know I am not out of range, because if the restore time is before ‘earliestRestorePoint’ (this can be found with a GET request on the managed database) or in the future, I get a ‘PitrPointInTimeInvalid’ error instead. Nevertheless, I found some information that I shouldn't use the current time but rather the current time minus 6 minutes at most. The same is true when done via the Azure Portal (where it fails with the same error, by the way), which doesn't allow a time newer than the current time minus 6 minutes. After a few tries, I found that the current time minus roughly 40 minutes starts to work. But 40 minutes is a lot, and I didn't find any way to determine what time will work other than trying and waiting for the result of the async operation.
My question is: Is there a way to find what is the latest time possible for restore?
Or is there a better way to do a ‘copy’ of a managed database which guarantees transactional consistency and is reasonably quick?
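For reference, the restore-based copy described above boils down to a single Create request with createMode set to PointInTimeRestore. A minimal sketch in Node.js (Node 18+ for fetch); the resource names are placeholders, the api-version should be checked against the current Microsoft.Sql versions, getAzureToken() is a hypothetical helper, and the safety margin on restorePointInTime reflects the behaviour discussed in this question:
async function copyViaPitr(getAzureToken) {
  const apiVersion = '2021-05-01-preview'; // verify the current Microsoft.Sql API version
  const url =
    'https://management.azure.com/subscriptions/<sub>/resourceGroups/<rg>' +
    '/providers/Microsoft.Sql/managedInstances/<instance>/databases/<target-db>' +
    `?api-version=${apiVersion}`;
  // Stay a safe margin behind "now" when picking the restore point.
  const restorePointInTime = new Date(Date.now() - 15 * 60 * 1000).toISOString();
  const res = await fetch(url, {
    method: 'PUT',
    headers: {
      Authorization: `Bearer ${await getAzureToken()}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      location: '<region>',
      properties: {
        createMode: 'PointInTimeRestore',
        sourceDatabaseId:
          '/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Sql' +
          '/managedInstances/<instance>/databases/<source-db>',
        restorePointInTime,
      },
    }),
  });
  return res; // 202 Accepted starts the async restore operation
}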
EDIT:
The issue I was describing was reported to MS. It was occurring when:
There is a custom time zone format, e.g. UTC + 1 hour.
Backups are skipped for the source database at the desired point in time because the database is inactive (no active transactions).
This should be fixed as of now (25th of August 2021), and I was not able to reproduce it with the current time minus 10 minutes. I was also told there should be a new API which would allow making a copy without using PITR (no sooner than Q1/22).
To answer your first question "Is there a way to find what is the latest time possible for restore?"
Yes. Via SQL. The only way to find this out is by using extended event (XEvent) sessions to monitor backup activity.
The process to start logging the backup_restore_progress_trace extended event and report on it is described here: https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/backup-activity-monitor
Including the SQL here in case the link goes stale.
This is for storing in the ring buffer (max last 1000 records):
CREATE EVENT SESSION [Verbose backup trace] ON SERVER
ADD EVENT sqlserver.backup_restore_progress_trace(
    WHERE (
        [operation_type] = (0) AND (
            [trace_message] LIKE '%100 percent%' OR
            [trace_message] LIKE '%BACKUP DATABASE%' OR
            [trace_message] LIKE '%BACKUP LOG%'))
)
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF, STARTUP_STATE = ON);
ALTER EVENT SESSION [Verbose backup trace] ON SERVER
STATE = START;
Then to see output of all backup events:
WITH
a AS (SELECT xed = CAST(xet.target_data AS xml)
      FROM sys.dm_xe_session_targets AS xet
      JOIN sys.dm_xe_sessions AS xe
        ON (xe.address = xet.event_session_address)
      WHERE xe.name = 'Verbose backup trace'),
b AS (SELECT
      d.n.value('(@timestamp)[1]', 'datetime2') AS [timestamp],
      ISNULL(db.name, d.n.value('(data[@name="database_name"]/value)[1]', 'varchar(200)')) AS database_name,
      d.n.value('(data[@name="trace_message"]/value)[1]', 'varchar(4000)') AS trace_message
      FROM a
      CROSS APPLY xed.nodes('/RingBufferTarget/event') d(n)
      LEFT JOIN master.sys.databases db
        ON db.physical_database_name = d.n.value('(data[@name="database_name"]/value)[1]', 'varchar(200)'))
SELECT * FROM b
NOTE: This tip came to me via Microsoft support when I had the same issue of point-in-time restores failing at what seemed like random times. They do not give any SLA for log backups. I found that on a busy database the log backups seemed to happen every 5-10 minutes, but on a quiet database only hourly. Recovery of a database this way can be slow depending on the number of transaction logs and the amount of activity to replay, etc. (https://learn.microsoft.com/en-us/azure/azure-sql/database/recovery-using-backups)
To answer your second question: "Or is there a better way to do ‘copy’ of managed database which guarantees transactional consistency and is reasonably quick?"
I'd have to agree with Thomas: if you're after guaranteed transactional consistency and speed, you need to look at creating a failover group: https://learn.microsoft.com/en-us/azure/azure-sql/database/auto-failover-group-overview?tabs=azure-powershell#best-practices-for-sql-managed-instance and https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/failover-group-add-instance-tutorial?tabs=azure-portal
A failover group for a managed instance will have a primary server and a failover server, with the same user databases on each kept in sync.
But yes, whether this suits your needs depends on the question Thomas asked: what is the purpose of the copy?

How to run something automatically in a MongoDB server after a certain time (e.g. 60 days) and change a document value without any user interaction

The conditions are like this:
1. Allow a user to insert a document (data) into a collection (table) which expires after 90 days. On the 85th and 90th days the server will send a notification as a reminder of the expiry date; on the 90th day the document will be sent to the trash for 30 days. While in the trash, the server will send another notification on the 28th day (day 118 of the data's total life), and then on the 30th day it will send a final notification that the document will be deleted from the database, and delete it.
2. How can I automatically change a document value? For example, on the 60th day a document in the database should go from (visible: true) to (visible: false); if the user then extends the time from 60 to 90 days, it should change back to (visible: true). The problem here is just changing the value to false automatically on the 60th day, without any user interaction with the app.
I am currently trying to do this in Node.js using Express.
The core MongoDB server does not implement scheduling, but if you use Atlas you can use Atlas Triggers.
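For the second requirement, a scheduled Atlas Trigger can run a small function periodically and flip the flag without any user interaction. A minimal sketch of the function body, assuming a linked data source named "mongodb-atlas" and hypothetical database, collection, and field names:
// Runs as an Atlas Function attached to a scheduled trigger (e.g. once a day).
exports = async function () {
  const coll = context.services
    .get("mongodb-atlas")   // name of the linked cluster (assumption)
    .db("mydb")             // hypothetical database
    .collection("posts");   // hypothetical collection
  const sixtyDaysAgo = new Date(Date.now() - 60 * 24 * 60 * 60 * 1000);
  // Hide documents created more than 60 days ago that have not been extended.
  await coll.updateMany(
    { createdAt: { $lte: sixtyDaysAgo }, visible: true, extendedUntil: { $exists: false } },
    { $set: { visible: false } }
  );
};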

Running a repetitive task in Node.js for each row in a postgres table on a different interval for each row

What would be a good approach to running a repetitive task for each row in a large Postgres DB table, on a different per-row interval, in Node.js?
To give you some more context, here's a quick description of the application:
It's a chat based customer support app.
It consists of teams, which can be either a client team or a support team. Teams have users, which can be either client users or support users.
Client users send messages to a support team and wait for one of that team's users to answer their question.
When there's an unanswered client message waiting for a response, every agent for the receiving support team will receive a notification every n seconds (n being set on a per-team basis by the team admin).
So this task needs to infinitely loop through the rows in the teams table and send notifications if:
The team has messages waiting to be answered.
N seconds have passed since the last notification was sent (N being the number of seconds set by the team admin).
There might be a better approach to this condition altogether.
So my questions are:
What is an efficient way to infinitely loop through a postgres table with no upper limit on the number of rows?
Should I load 1 row at a time? Several at a time?
What would be a good way to do this in Node?
I'm using Knex. Does Knex provide a mechanism for lazy loading a table and iterating through the rows?
A) Running a repetitive task in Node can be done with the built-in function 'setInterval'.
// run intervalFnc() every 5 seconds
const timerId = setInterval(intervalFnc, 5000);
function intervalFnc() { console.log("Hello"); }
// to stop running it:
clearInterval(timerId);
Then your interval function can do the actual work. An alternative would be to use cron (Linux), or some other OS process scheduler, to trigger the function. I would use setInterval if you want to run it every minute, and a cron job if you want to run it every hour (anything in between becomes more debatable).
B) An efficient way...
B-1) Retrieving a block of records from a DB will be more efficient than one at a time. Knex has .offset and .limit clauses to choose a group of records to retrieve. A sample from the knex doc:
knex.select('*').from('users').limit(10).offset(30)
B-2) Database indexed access is important for performance if your tables are very large. I would recommend including a status-flag field in your table to note which records are 'in-process', and also a "next_review_timestamp" field, with both fields indexed. Retrieve the records that have status_flag = 'in-process' AND next_review_timestamp <= now(). Sample:
knex('users').where('status_flag', 'in-process').whereRaw('next_review_timestamp <= now()')
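Putting A and B-2 together, a minimal polling loop might look like the sketch below. The table and column names (teams, has_unanswered, next_review_timestamp, notify_interval_seconds) and the sendNotifications() helper are assumptions for illustration:
const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });
// Hypothetical helper: your actual notification logic goes here.
async function sendNotifications(team) {}
async function checkTeams() {
  // Only fetch teams that are actually due, in batches rather than row by row.
  const due = await knex('teams')
    .where('has_unanswered', true)
    .whereRaw('next_review_timestamp <= now()')
    .limit(100);
  for (const team of due) {
    await sendNotifications(team);
    // Push the next review out by the team's own interval.
    await knex('teams')
      .where('id', team.id)
      .update({
        next_review_timestamp: knex.raw("now() + (notify_interval_seconds || ' seconds')::interval"),
      });
  }
}
// Poll every 5 seconds; adjust to taste or move to a cron job.
setInterval(() => checkTeams().catch(console.error), 5000);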
Hope this helps!

Execute a function after a particular time in Node.js, and its lifetime

In my Node application with MongoDB I have a feature where users can post books for rent and other users can request them with a "whenDate". One post is mapped to exactly one book.
Consider a user requesting a book for one week, starting 5 days from now. In this case I want to lock the book for that week so that no one else can request it during that period.
1) How can I make a function execute in Node.js after a certain time, considering that I will have many of them? In the above case this function would execute after 5 days to lock the particular book document. Please consider question 2 as well.
2) I don't want these timers to be lost if I restart my application. How can I achieve this?
Thanks in advance.
You can use the TTL feature in MongoDB to discard records automatically after their time to live.
Let's say you keep a collection with the booking requests and set a TTL according to the booking duration. MongoDB can then remove these booking records once the TTL is reached, so your Node.js application does not need to trigger any job.
Refer: https://docs.mongodb.com/manual/tutorial/expire-data/
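A minimal sketch of this idea with the Node.js driver, assuming a hypothetical "bookings" collection with an "expireAt" Date field:
const { MongoClient } = require('mongodb');
async function setup() {
  const client = await MongoClient.connect(process.env.MONGO_URL);
  const bookings = client.db('library').collection('bookings');
  // Documents are removed shortly after the instant stored in "expireAt".
  await bookings.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });
  // A booking that locks a book for one week, starting five days from now:
  const start = new Date(Date.now() + 5 * 24 * 60 * 60 * 1000);
  const end = new Date(start.getTime() + 7 * 24 * 60 * 60 * 1000);
  await bookings.insertOne({ bookId: 'some-book-id', lockedFrom: start, expireAt: end });
}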