AFAIK there is no REST API providing this functionality directly. So, I am using restore for this (there are other ways but those don’t guarantee transactional consistency and are more complicated) via Create request.
Since it is not possible to turn off short time backup (retention has to be at least 1 day) it should be reliable. I am using current time for ‘properties.restorePointInTime’ property in request. This works fine for most databases. But one db returns me this error (from async operation request):
"error": {
"code": "BackupSetNotFound",
"message": "No backups were found to restore the database to the point in time 6/14/2021 8:20:00 PM (UTC). Please contact support to restore the database."
}
I know I am not out of range because if the restore time is before ‘earliestRestorePoint’ (this can be found in GET request on managed database) or in future I get ‘PitrPointInTimeInvalid’ error. Nevertheless, I found some information that I shouldn’t use current time but rather current time - 6 minutes at most. This is also true if done via Azure Portal (where it fails with the same error btw) which doesn’t allow to input time newer than current - 6 minutes. After few tries, I found out that current time - circa 40 minutes starts to work fine. But 40 minutes is a lot and I didn’t find any way to find out what time works before I try and wait for result of async operation.
My question is: Is there a way to find what is the latest time possible for restore?
Or is there a better way to do ‘copy’ of managed database which guarantees transactional consistency and is reasonably quick?
EDIT:
The issue I was describing was reported to MS. It was occuring when:
there is a custom time zone format e.g. UTC + 1 hour.
Backups are skipped for the source database at the desired point in time because the database is inactive (no active transactions).
This should be fixed as of now (25th of August 2021) and I were not able to reproduce it with current time - 10 minutes. Also I was told there should be new API which would allow to make copy without using PITR (no sooner than 1Q/22).
To answer your first question "Is there a way to find what is the latest time possible for restore?"
Yes. Via SQL. The only way to find this out is by using extended event (XEvent) sessions to monitor backup activity.
Process to start logging the backup_restore_progress_trace extended event and report on it is described here https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/backup-activity-monitor
Including the SQL here in case the link goes stale.
This is for storing in the ring buffer (max last 1000 records):
CREATE EVENT SESSION [Verbose backup trace] ON SERVER
ADD EVENT sqlserver.backup_restore_progress_trace(
WHERE (
[operation_type]=(0) AND (
[trace_message] like '%100 percent%' OR
[trace_message] like '%BACKUP DATABASE%' OR [trace_message] like '%BACKUP LOG%'))
)
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
ALTER EVENT SESSION [Verbose backup trace] ON SERVER
STATE = start;
Then to see output of all backup events:
WITH
a AS (SELECT xed = CAST(xet.target_data AS xml)
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe
ON (xe.address = xet.event_session_address)
WHERE xe.name = 'Verbose backup trace'),
b AS(SELECT
d.n.value('(#timestamp)[1]', 'datetime2') AS [timestamp],
ISNULL(db.name, d.n.value('(data[#name="database_name"]/value)[1]', 'varchar(200)')) AS database_name,
d.n.value('(data[#name="trace_message"]/value)[1]', 'varchar(4000)') AS trace_message
FROM a
CROSS APPLY xed.nodes('/RingBufferTarget/event') d(n)
LEFT JOIN master.sys.databases db
ON db.physical_database_name = d.n.value('(data[#name="database_name"]/value)[1]', 'varchar(200)'))
SELECT * FROM b
NOTE: This tip came to me via Microsoft support when I had the same issue of point in time restores failing what seemed like randomly. They do not give any SLA for log backups. I found that on a busy database the log backups seemed to happen every 5-10 minutes but on a quiet database hourly. Recovery of a database this way can be slow depending on number of transaction logs and amount of activity to replay etc. (https://learn.microsoft.com/en-us/azure/azure-sql/database/recovery-using-backups)
To answer your second question: "Or is there a better way to do ‘copy’ of managed database which guarantees transactional consistency and is reasonably quick?"
I'd have to agree with Thomas - if you're after guaranteed transactional consistency and speed you need to look at creating a failover group https://learn.microsoft.com/en-us/azure/azure-sql/database/auto-failover-group-overview?tabs=azure-powershell#best-practices-for-sql-managed-instance and https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/failover-group-add-instance-tutorial?tabs=azure-portal
A failover group for a managed instance will have a primary server and failover server with the same user databases on each kept in synch.
But yes, whether this suits your needs depends on the question Thomas asked of what is the purpose of the copy.
Related
What would be a good approach to running a repetitive task for each row in a large postgres db table on a different per row interval in Node.js.
To give you some more context, here's a quick description of the application:
It's a chat based customer support app.
It consists of teams, which can be either a client team or a support team. Teams have users, which can be either client users or support users.
Client users send messages to a support team and wait for one of that team's users to answer their question.
When there's an unanswered client message waiting for a response, every agent for the receiving support team will receive a notification every n seconds (n being set on a per-team basis by the team admin).
So this task needs to infinitely loop through the rows in the teams table and send notifications if:
The team has messages waiting to be answered.
N seconds have passed since the last notification was sent (N being the number of seconds set by the team admin).
There might be a better approach to this condition altogether.
So my questions are:
What is an efficient way to infinitely loop through a postgres table with no upper limit on the number rows?
Should I load 1 row at a time? Several at a time?
What would be a good way to do this in Node?
I'm using Knex. Does Knex provide a mechanism for lazy loading a table and iterating through the rows?
A) Running a repetitive task via node can be done via a the js built-in function 'setInterval'.
// run the intervalFnc() every 5 seconds
const timerId = setTimeout(intervalFnc, 5000);
function intervalFnc() { console.log("Hello"); }
// to quit running it:
clearTimeout(timerId);
Then your interval function can do the actual work. An alternative would be to use cron (linux), or some OS process scheduler to trigger the function. I would use this method if you want to do it every minute, and a cron job if you want to do it every hour (in between these times becomes more debatable).
B) An efficient way...
B-1) Retrieving a block of records from a DB will be more efficient than one at a time. Knex has .offset and .limit clauses to choose a group of records to retrieve. A sample from the knex doc:
knex.select('*').from('users').limit(10).offset(30)
B-2) Database indexed access is important for performance if your tables are very large. I would recommend including an status flag field in your table to note which records are 'in-process', and also include a "next-review-timestamp" field with both fields being both indexed. Retrieve the records that have status_flag='in-process' AND next_review_timestamp <= now(). Sample:
knex('users').where('status_flag', 'in-process').whereRaw('next_review_timestamp <= now()')
Hope this helps!
I have a Client Request for my Data Factory Solution
They want to run my Data-Factory when ever the i/p file is available in the Blob Storage/any location.To be very clear they doesn't want to run the solution in an schedule basis,because some day the file won't shows up.So i want an intelligence to search whether the file is available to be process in the location or not.If yes then i have to run my Data factory Solution to process that file,else no need to run the Data factor
Thanks in Advance
Jay
I think you've currently got 3 options to dealing with this. None of which are exactly what you want...
Option 1 - use C# to create a custom activity that does some sort of checking on the directory before proceeding with other downstream pipelines.
Option 2 - Add a long delay to the activity so the processing retires for the next X days. Sadly only a maximum of 10 long retires is allowed currently.
Option 3 - Wait for a newer version of Azure Data Factory that might allow the possibility of more event driven activities, rather than using a scheduled time slice approach.
Apologies this isn't exactly the answer you want. But this gives you current options.
Note
This is getting quite long so I will try and re-edit parts through the day.
These databases are no long active, which means I can play with them to work out what is going wrong.
The only thing left to answer: Given two databases running on Azure Databases at S3 (100 DTU). Should any secondary ever get significantly behind the primary database? Even while the DTU is hammered to 100% for over half the day. The reason for the DTU being hammered being IO writes mostly.
The Start: a few problems.
DTU limits were hit on Monday, Tuesday and to some extent on Wednesday for a significant amount of time. 3PM UTC - 6AM UTC.
Problem 1 (lag in data on the secondary): This had appeared to have caused a lag of data in the secondary of about 9 1/2 hours. The servers were effectively being spammed with updates causing a lot of IO updates. 6-8 million records on one table for the 24 hour period for example. This problem drove the reason for the post:
Shouldn't these be much more in sync?
The data became out of sync on Monday morning and continued out of sync until Friday. On Thursday some new databases were started up to replace these standard SQL databases and so they were left to rot. Well, for me to experiment with at least.
The application causing the redundant queries couldn't be fixed for a few days (I'm doubting they will ever fix it now) so that leaves changing the instance type. That action was attempted on the current instance but, the instance must disconnect with all standard replicas to increase to the performance tier. This led to the second problem (see below). The replica taking its time to be removed. Began on Wednesday morning and did not complete until Friday.
Problem 2 (can't remove the replica):
(Solved itself after two days)
Disconnecting the secondary database process began ~ Wed 8UTC (when the primary was at about 80GBs). The secondary being about 11GB behind in size at this point.
Setup
The databases (primary and secondary) are S3 that is geo-replicated (North + West Europe).
It has an audit log table(which I read from the secondary - normally with an SQL query), but this is currently 9 1/2 hours behind the last entry for the primary database. Running the query again on the secondary a few seconds later it is slowly catching up, but appears to be relative to the refresh rather than playing catch-up.
Both primary and secondary (read-only) databases are S3 (about to be bumped to P2).
the azure documentation states:
Active Geo-Replication (readable secondaries) is now available for all databases in all service tiers. In April 2017, the non-readable secondary type will be retired and existing non-readable databases will automatically be upgraded to readable secondaries.
How has the secondary has got so far behind? seconds to minutes would be acceptable. Hours not so much. The link above describes it as slightly:
While at any given point, the secondary database might be slightly behind the primary database, the secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database.
Given the secondary is about to be destroyed and replaced by a higher level (need to remove replicas when upgrading from standard -> premium). I'm curious to know if it will happen again as well as what the definition of slight might be in this instance?
Notes: The primary did reach maximum DTU for a few hours but didn't harm the performance significantly, which is where the 9-hour difference was noticed.
Stats
Update for TheGameiswar:
I can't query it right now as it started removing itself as a replica (to be able to move the primary up to the P2 level, but that began hours ago at ~8.30UTC and 5 hours later it is still going). I think it's quite broken now.
Query - nothing special:
SELECT TOP 1000 [ID]
,[DateCreated]
,[SrcID]
,[HardwareID]
,[IP_Port]
,[Action]
,[ResponseTime]
,[ErrorCode]
,[ExtraInfo]
FROM [dbo].[Audit]
order by datecreated desc
I can't compare the tables anymore as it's quite stuck and refusing connections.
The 586 hours (10-14GB) are inserts into the primary database audit table. It was similar yesterday when noticing the 9 1/2 hour difference in data.
When the attempt to remove the replica (another person stated the process) it had about 10GB difference in size.
Cant compare data but can show DB-size at equivalent time
Primary DB Size (last 24 hours):
Secondary DB Size (last 24 hours):
Primary database size - week view
Secondary database size - week view
As mentioned ... it is being removed as a replica... but is still playing catch up with the DB size if you observe the charts above.
Stop replication errored for serverName: ---------, databaseName: Cloud4
ErrorCode: 400
ErrorMessage: A terminate operation is already in progress for database 'Cloud4'.
Update 2 - Replication - dm_continuous_copy_status
Replication is still removing ... moving on...
select * from sys.dm_continuous_copy_status
sys.dm_exec_requests
Querying from Thursday
Appears to be quite empty. The only record being
Replica removed itself at last.
The replica has removed itself after 2 days. At the 80GB mark that I predicted. It waited to replay the data in the transactions (till the point it was removed as a replica) before it would remove the replica.
A Week after on the P2 databases
DTU is holding between 20-40% at busy periods and currently performing ~12 million data entries every 24 hours (a similar amount for reads, but writing is much worse on the indexes and the table). 70-100% inserts extra in a week. This time, the replica is not struggling to keep up, which is good but that is likely due to it not reaching 100% DTU.
Conclusion
The replicas are useful but not in this case. This one caused degraded performance for several days that could have been averted. A simple increase to the performance tier until the cause of the problem could be fixed. IF the replica looks like it is dragging behind and you are on the border of Basic -> Standard or Standard -> Performance it would be safe to remove the replica as soon as possible and increase to a different tier.
Now we are on P2. The database is increasing at 20GB a day... and they say they have fixed the problem that sends 15 thousand redundant updates per minute. Thanks to the query performance insight for highlight that as querying the table is extremely painful on the DTU (even querying the last minute of data in that table is bad on the DTU. ~15 thousand new records every minute).
62617: insert ...
62618: select ...
62619: select ...
A positive from the above is that it's moved from 586 hours combined time for the insert statements (7.5 million entry rows per day) on S3 to 3 hours on P2 (12.4 million row rows per day). An extremely significant decrease in processing time. It did start with an empty table on Thursday but that has surpassed the previous size in a week whereas the previous one took a few months to get there.
It's doing well on the new tier. It should be ~5% if the applications were using the database responsibly and the secondary is up to date.
Spoke too soon. Now on P2
Someone thought it was a good idea to run an SQL query that repeats itself that deletes a thousand rows at a time. 12 million new rows a day.
10AM - 12AM it's managed to remove about 5.2 million rows. Now the database is showing signs of being in the same state as last week. Im curious if that is what happened now.
Can anyone please tell me why it might be taking 12+ seconds to insert 1000 rows into a SQL database hosted on Azure? I'm just getting started with Azure, and this is (obviously) absurd...
Create Table xyz (ID int primary key identity(1,1), FirstName varchar(20))
GO
create procedure InsertSomeRows as
set nocount on
Declare #StartTime datetime = getdate()
Declare #x int = 0;
While #X < 1000
Begin
insert into xyz (FirstName) select 'john'
Set #X = #X+1;
End
Select count(*) as Rows, DateDiff(SECOND, #StartTime, GetDate()) as SecondsPassed
from xyz
GO
Exec InsertSomeRows
Exec InsertSomeRows
Exec InsertSomeRows
GO
Drop Table xyz
Drop Procedure InsertSomeRows
Output:
Rows SecondsPassed
----------- -------------
1000 11
Rows SecondsPassed
----------- -------------
2000 13
Rows SecondsPassed
----------- -------------
3000 14
It's likely the performance tier you are on that is causing this. With a Standard S0 tier you only have 10 DTUs (Database throughput units). If you haven't already, read up on the SQL Database Service Tiers. If you aren't familiar with DTUs it is a bit of a shift from on-premises SQL Server. The amount of CPU, Memory, Log IO and Data IO are all wrapped up in which service tier you select. Just like on premises if you start to hit the upper bounds of what your machine can handle things slow down, start to queue up and eventually start timing out.
Run your test again just as you have been doing, but then use the Azure Portal to watch the DTU % used while the test is underway. If you see that the DTU% is getting maxed out then the issue is that you've chosen a service tier that doesn't have enough resources to handle you've applied without slowing down. If the speed isn't acceptable, then move up to the next service tier until the speed is acceptable. You pay more for more performance.
I'd recommend not paying too close attention to the service tier based on this test, but rather on the actual load you want to apply to the production system. This test will give you an idea and a better understanding of DTUs, but it may or may not represent the actual throughput you need for your production loads (which could be even heavier!).
Don't forget that in Azure SQL DB you can also scale your Database as needed so that you have the performance you need but can then back down during times you don't. The database will be accessible during most of the scaling operations (though note it can take a time to do the scaling operation and there may be a second or two of not being able to connect).
Two factors made the biggest difference. First, I wrapped all the inserts into a single transaction. That got me from 100 inserts per second to about 2500. Then I upgraded the server to a PREMIUM P4 tier and now I can insert 25,000 per second (inside a transaction.)
It's going to take some getting used to using an Azure server and what best practices give me the results I need.
My theory: Each insert is one log IO. Here, this would be 100 IOs/sec. That sounds like a reasonable limit on an S0. Can you try with a transaction wrapped around the inserts?
So wrapping the inserts in a single transaction did indeed speed this up. Inside the transaction it can insert about 2500 rows per second
So that explains it. Now the results are no longer catastrophic. I would now advise looking at metrics such as the Azure dashboard DTU utilization and wait stats. If you post them here I'll take a look.
one way to improve performance ,is to look at Wait Stats of the query
Looking at Wait stats,will give you exact bottle neck when a query is running..In your case ,it turned out to be LOGIO..Look here to know more about this approach :SQL Server Performance Tuning Using Wait Statistics
Also i recommend changing while loop to some thing set based,if this query is not a Psuedo query and you are running this very often
Set based solution:
create proc usp_test
(
#n int
)
Begin
begin try
begin tran
insert into yourtable
select n ,'John' from
numbers
where n<#n
commit
begin catch
--catch errors
end catch
end try
end
You will have to create numbers table for this to work
I had terrible performance problems with updates & deletes in Azure until I discovered a few techniques:
Copy data to a temporary table and make updates in the temp table, then copy back to a permanent table when done.
Create a clustered index on the table being updated (partitioning didn't work as well)
For inserts, I am using bulk inserts and getting acceptable performance.
I just ran some code which reports its performance on an Azure Web Sites instance; the result seemed a little off. I re-ran the operation, and indeed it seems consistent: System.Diagnostics.Stopwatch sees an execution time of 12 seconds for an operation that actually took more than three minutes (at least 3m16s).
Debug.WriteLine("Loading dataset in database ...");
var stopwatch = new Stopwatch();
stopwatch.Start();
ProcessDataset(CurrentDataSource.Database.Connection as SqlConnection, parser);
stopwatch.Stop();
Debug.WriteLine("Dataset loaded in database ({0}s)", stopwatch.Elapsed.Seconds);
return (short)stopwatch.Elapsed.Seconds;
This process runs in the context of a WCF Data Service "action" and seeds test data in a SQL Database (this is not production code). Specifically, it:
Opens a connection to an Azure SQL Database,
Disables a null constraint,
Uses System.Data.SqlClient.SqlBulkCopy to lock an empty table and load it using a buffered stream that retrieves a dataset (2.4MB) from Azure Blob Storage via the filesystem, decompresses it (GZip, 4.9MB inflated) and parses it (CSV, 349996 records, parsed with a custom IDataReader using TextFieldParser),
Updates a column of the same table to set a common value,
Re-enables the null constraint.
No less, no more; there's nothing particularly intensive going on, I figure the operation is mostly network-bound.
Any idea why time is slowing down?
Notes:
Interestingly, timeouts for both the bulk insert and the update commands had to be increased (set to five minutes). I read that the default is 30 seconds, which is more than the reported 12 seconds; hence, I conclude that SqlClient measures time differently.
Reports from local execution seem perfectly correct, although it's consistently faster (4-6s using LocalDB) so it may just be that the effects are not noticeable.
You used stopwatch.Elapsed.Seconds to get total time but it is wrong. Elapsed.Seconds is the seconds component of the time interval represented by the TimeSpan structure. Please try stopwatch.Elapsed.TotalSeconds instead.