Azure SQL Database serverless "auto-pause": How fast is resuming?

In Azure, there is an "auto-pause" feature for serverless SQL Databases.
Question:
Is there a quantitative measure (as opposed to a qualitative one) of how fast "Resume" is on a serverless SQL Database on Azure?
I'm asking because, in my experience with DTU-based tiers (S0, S1, S2, etc.), changing from one tier to another takes around 2-3 minutes, and within that interval all queries fail with timeout errors.
I want to know whether "resuming" offers a similar experience (I wouldn't want the query that triggers the resume to behave erratically).

According to the documentation (https://learn.microsoft.com/en-us/azure/azure-sql/database/serverless-tier-overview?view=azuresql), the latency to auto-resume and auto-pause a serverless database is generally on the order of 1 minute to auto-resume and 1-10 minutes after the expiration of the delay period to auto-pause.
From personal experience, a 1 GB database takes between 1 and 2 minutes to resume. During this time any query, including the one that prompted the resume, will time out.
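
Given that any query issued while the database is resuming is expected to time out, the practical mitigation is to retry the first connection for a minute or two before giving up. Here is a minimal sketch in Python with pyodbc, assuming a resume window of roughly one minute; the connection string is a placeholder, and the check for error 40613 ("database is not currently available") is an assumption about which code the driver surfaces while the database is still waking up.

import time
import pyodbc

# Placeholder connection string - substitute your server, database and credentials.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;UID=myuser;PWD=mypassword;Encrypt=yes;"
)

def connect_with_resume_retry(retries=6, delay_seconds=15):
    # Retry the first connection while a paused serverless database resumes.
    # 6 attempts x 15 s comfortably covers a ~1 minute resume latency.
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return pyodbc.connect(CONN_STR, timeout=30)  # timeout here is the login timeout
        except pyodbc.Error as err:
            last_error = err
            if "40613" in str(err):
                print(f"Database still resuming (attempt {attempt}); waiting {delay_seconds}s")
            else:
                print(f"Connection failed (attempt {attempt}): {err}")
            time.sleep(delay_seconds)
    raise last_error

conn = connect_with_resume_retry()
print(conn.execute("SELECT 1").fetchone())
conn.close()

With something like this in place, the query that triggers the resume still fails on its first attempt, but the caller sees a short delay rather than an error.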

Related

Create database within Azure SQL elastic pool takes almost 10 minutes to complete

We use Azure SQL databases and an elastic pool (level "Standard").
Usually the creation of a new customer database takes approximately 1-2 minutes but suddenly it started taking way longer (up to 10 minutes) and I have no idea why this is happening. I checked the pool in the Azure portal and everything seems fine. We are still far away from reaching the given limits (257/500 databases; ~11GB/200GB data size). Upscaling for a short period of time has no effect.
Is there anything else I can do?
I think there are some ongoing issues with Microsoft cloud services. Check whether your issue is related to those; if it is, the problem should be temporary.

SQL Azure Premium tier is unavailable for more than a minute at a time and we're around 10-20% utilization, if that

We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd party web services and custom generated images. Our service and code is mature, we've been running this for years. A lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for 1-2 full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It occurs less often after switching from the Standard level to the Premium level, but we're nowhere near our DB's DTU capacity and we're getting throttled hard far too often.
Our SQL Azure DB is Premium P1 and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for half an hour, as the graph suggests, but it was down for more than 2 full minutes).
Azure reports the size of our DB at a little over 12GB (in our own SQL Server installation, the DB is under 1GB - that's another of many questions, why is it reported as 12GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely (sketched below). Controllers are all async, and most of our various external service calls are async. DB access is still largely synchronous, but our heaviest queries are async. We heavily utilize in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
Aside from batching up those request logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has something like 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on the Premium tier we should be guaranteed our DTU level second by second. But, again, we go more than an entire solid minute without being able to get a database connection. What is going on?
I can also say that after we experience one of these longer delays, our stats seem to reset (comparing the portal metrics from a couple of minutes before and after a 1-minute-plus delay).
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses, something in front of it cuts off all traffic and returns 502s (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.
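
For reference, the retry schedule described above (2, 4, 8, 16, 32, then 48 seconds) corresponds to a pattern like the following. This is a sketch in Python with pyodbc rather than the poster's actual library code, and the connection string and query passed in are placeholders.

import time
import pyodbc

BACKOFF_SECONDS = [2, 4, 8, 16, 32, 48]  # about 110 s of waiting before failing completely

def execute_with_retry(conn_str, sql):
    # Run a query, backing off on transient connection failures (failovers, throttling).
    last_error = None
    for delay in [0] + BACKOFF_SECONDS:
        if delay:
            time.sleep(delay)
        conn = None
        try:
            conn = pyodbc.connect(conn_str, timeout=30)
            return conn.execute(sql).fetchall()
        except pyodbc.Error as err:
            last_error = err
        finally:
            if conn is not None:
                conn.close()
    raise last_error  # give up after the final 48 s wait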

Occasional delays in response from Azure cloud service

I maintain an Azure cloud service. It is set to auto-scale based on load. To monitor the health of this service, I have another service that pings it every 2 minutes. The usual response time is around 100 ms.
Once or twice a week I see that the service does not respond. It is not really a worry for me because it happens quite infrequently, but I am still trying to figure out what could be causing it. I do not think the problem is with the pinging service - I don't see any issues with the other services it pings (not on Azure, but on other servers).
What could be causing these occasional delays? Are any other Azure service owners seeing such delays?
I'm having quite similar problems, but I use Application Insights, so I have some statistics - for example, that response time increases together with SQL Azure access time and CPU usage. My average response time according to Application Insights is about 600 ms and my average RPS is about 0.6. During these problems RPS is usually higher than average - up to 1.5 - but average response time grows to up to 1 minute! (During the day my RPS can grow to 3 or even higher without any response time growth.) Since I have a 1-minute SQL connection timeout and I see a dramatic growth in total SQL Azure access time during these periods, I can assume the problem is caused by SQL Azure. This also happens once a day or two, for about 10-15 minutes at most, and my ping service also always reports that the service doesn't respond.
So my advice here: install Application Insights to analyze what happens during these response delays. It would be great if you shared your results here.
P.S. I also use autoscale based on load, though it doesn't really help in these particular situations.
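
If you want numbers without relying on Application Insights, a very small probe along these lines can log response times and make the delay windows visible. This is a sketch in Python with the requests library; the URL, interval, and timeout are placeholders.

import time
from datetime import datetime, timezone

import requests

URL = "https://myservice.example.com/health"  # placeholder health endpoint

def probe(interval_seconds=120):
    # Ping the service every couple of minutes and record how long each request takes.
    while True:
        started = time.monotonic()
        stamp = datetime.now(timezone.utc).isoformat()
        try:
            status = requests.get(URL, timeout=90).status_code
            elapsed_ms = (time.monotonic() - started) * 1000
            print(f"{stamp} HTTP {status} in {elapsed_ms:.0f} ms")
        except requests.RequestException as err:
            print(f"{stamp} FAILED: {err}")
        time.sleep(interval_seconds)

probe()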

Detect if SQL Azure is throttling

I have an Azure worker role that inserts a batch of records into a table. Yesterday, it took at most 5 minutes to insert the records, but today it has been taking up to a couple of hours. I suspect that the process is being throttled, but I don't get any exceptions. Does SQL Azure always return an error if you are being throttled, or is there another way to detect if you are being throttled?
In the case of CPU throttling, SQL Database will not throw an error but will slow down the operation. At this time there is no mechanism to determine whether this form of throttling is taking place other than looking at the query stats, which will show that the work is proceeding slowly (your CPU time will be lower than usual). Check this link for details about this behavior: the performance and elasticity guide (look for "Performance Thresholds Monitored by Engine Throttling").
One of the newer capabilities is the ability to monitor the number of outstanding requests a SQL Azure database has. You can do this with this query:
select count(*) from sys.dm_exec_requests
As you will see in this documentation, reaching the limit of worker threads is a key reason for being throttled. It is also documented that as you approach 180 worker threads you can expect to be throttled.
This is one of the things used in the Cotega monitoring service for SQL Azure to detect issues. [Disclaimer: I work on this service]
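
A minimal polling wrapper around that query might look like the following sketch in Python with pyodbc; the connection string, the 60-second polling interval, and the choice to warn at 80% of the 180-worker figure are assumptions for illustration.

import time
import pyodbc

# Placeholder connection string - substitute your server, database and credentials.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;UID=myuser;PWD=mypassword;Encrypt=yes;"
)

WORKER_LIMIT = 180    # limit cited above
WARN_FRACTION = 0.8   # arbitrary warning threshold

def poll_outstanding_requests(interval_seconds=60):
    # Count outstanding requests periodically and warn as the worker limit nears.
    while True:
        conn = pyodbc.connect(CONN_STR, timeout=30)
        try:
            count = conn.execute("select count(*) from sys.dm_exec_requests").fetchone()[0]
        finally:
            conn.close()
        if count >= WORKER_LIMIT * WARN_FRACTION:
            print(f"WARNING: {count} outstanding requests (limit is around {WORKER_LIMIT})")
        else:
            print(f"{count} outstanding requests")
        time.sleep(interval_seconds)

poll_outstanding_requests()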

SQL Azure query termination

I remember reading somewhere that SQL Azure is going to terminate long-running queries. Is there a time limit on how long a query can run against a database before it is terminated? Where I work, I run complex queries against large tables that take about 5 minutes each.
SQL Azure connection limits are documented in the MSDN Library and on the TechNet wiki:
http://msdn.microsoft.com/en-us/library/ee336245.aspx#cc
http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management.aspx
For example,
SQL Azure kills all transactions after they run for 24 hours. If you lose a connection due to this reason, you will receive error code 40549.
and
Large transactions, transactions executed in large batches, or large sorts might consume a significant tempdb space. When a session uses more than 5 GB of tempdb space (= 655,360 pages), the session is terminated.
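
When one of these limits is hit, the termination shows up in the application as a lost connection. A small sketch in Python with pyodbc for recognizing the 24-hour kill looks like this; the connection string is a placeholder, and matching on "40549" in the error text is an assumption about how the ODBC driver reports the code.

import pyodbc

def run_long_query(conn_str, sql):
    # Run a potentially long query and surface SQL Azure's 24-hour termination code.
    try:
        conn = pyodbc.connect(conn_str, timeout=30)
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()
    except pyodbc.Error as err:
        if "40549" in str(err):
            raise RuntimeError("Transaction exceeded the 24-hour limit (error 40549)") from err
        raise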
Azure will terminate web requests that are idle for more than 1 minute and that are made over the load balancer (meaning from the outside to its web servers).
Azure will also throttle your slow-running queries, but only if they take resources away from the other tenants in the SQL Server instance that your database is in. I do not believe there are any published statistics on the precise metrics that trigger such throttling.