SQL Azure query termination

I remember reading somewhere that SQL Azure is going to terminate long-running queries. Is there a time limit on how long a query can run against a database before it is terminated? Where I work, I run complex queries against large tables that take about 5 minutes each.

SQL Azure connection limits are documented in the MSDN Library and on the TechNet wiki:
http://msdn.microsoft.com/en-us/library/ee336245.aspx#cc
http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management.aspx
For example:
SQL Azure kills all transactions after they run for 24 hours. If you lose a connection due to this reason, you will receive error code 40549.
and
Large transactions, transactions executed in large batches, or large sorts might consume significant tempdb space. When a session uses more than 5 GB of tempdb space (= 655,360 pages), the session is terminated.
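Given those limits, one common mitigation is to break a long-running job into many short transactions so that no single transaction approaches the 24-hour kill or the 5 GB tempdb cap. A minimal Python sketch under that assumption; `run_batch` is a hypothetical callable (not a real API) that executes one small transaction:

```python
# Sketch: keep transactions short so they never hit the 24-hour kill
# (error 40549) or the 5 GB tempdb cap. `run_batch` is a hypothetical
# placeholder for a callable that executes one small transaction.

def chunked(ids, size):
    """Yield successive fixed-size chunks of a list of row ids."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def process_in_small_transactions(ids, run_batch, chunk_size=1000):
    """Run one short transaction per chunk instead of one huge one."""
    done = 0
    for chunk in chunked(ids, chunk_size):
        run_batch(chunk)   # hypothetical: commits one small transaction
        done += len(chunk)
    return done
```

Each chunk commits independently, so a failure late in the job loses at most one chunk of work rather than the whole batch.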

Azure will terminate web requests that are idle for more than 1 minute and that are made over the load balancer (that is, from the outside to its web servers).
Azure will also throttle your slow-running queries, but only if they take resources away from other tenants in the SQL Server instance that hosts your database. I do not believe there are any published figures on the precise thresholds at which such throttling occurs.

Related

Azure SQL Server Database serverless "Auto-pause": How fast is resuming?

In Azure, there is this "auto-pause" feature for serverless SQL Server Databases
Question:
Is there a quantitative measure (as opposed to qualitative) of how fast is "Resume" on a serverless SQL Database on Azure?
I'm asking this because in my experience with DTU based tiers (S0, S1, S2, etc.), changing from one tier to another takes around 2-3 minutes, and within that interval all queries fail with timeout errors.
I want to know if "resuming" offers a similar experience (I wouldn't like the query that triggers the resume to be erratic).
According to the documentation here - https://learn.microsoft.com/en-us/azure/azure-sql/database/serverless-tier-overview?view=azuresql
The latency to auto-resume and auto-pause a serverless database is generally on the order of 1 minute to auto-resume and 1-10 minutes after the expiration of the delay period to auto-pause.
From personal experience, a 1 GB DB takes between 1-2 minutes to resume. During this time any queries, including the one that prompted the resume, will time out.
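Since the resume-triggering query can itself time out, one defensive pattern is to retry it until the database is warm, bounded by an overall deadline. A hedged Python sketch; `run_query` and the transient exception type are hypothetical stand-ins for your data-access call and whatever timeout/connection error your driver raises while the database resumes:

```python
import time

# Sketch: retry the query that triggers auto-resume until the database
# is warm, with an overall deadline. `run_query` and `transient_exc`
# are hypothetical stand-ins for your data-access call and the
# timeout exception your driver raises while resuming.

def query_with_resume_grace(run_query, transient_exc, deadline_s=180,
                            retry_interval_s=10, sleep=time.sleep,
                            clock=time.monotonic):
    start = clock()
    while True:
        try:
            return run_query()
        except transient_exc:
            if clock() - start >= deadline_s:
                raise          # database still not resumed; give up
            sleep(retry_interval_s)
```

The deadline here (180 s) is an illustrative assumption sized against the 1-2 minute resume times reported above, not a documented limit.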

How do I improve query execution parallelization for a single user in Azure SQL Datawarehouse?

We have a new report dashboard that's loaded in our web application, where the data is sourced from Azure SQL Data Warehouse.
The dashboard is made up of ~8-10 tiles, each displaying a different metric, loaded by a different query.
The various queries are executed from the webapp using some straightforward ADO.NET code, to connect to the DW with a dashboard user account.
I've read both articles on Memory & Concurrency Limits and Resource Classes but there's something I'm just not understanding.
For our DW service level (Gen2 - DW200c), the server should support running 8 concurrent queries.
Similarly, we've added our dashboard user to the staticrc80 resource class, which should give it access to all 8 concurrency slots.
But this doesn't seem to help. Am I right in understanding that regardless of these resource configurations, it'll still only execute a single query for a single user at a time?
And that multiple queries executed under the same user account would still be queued up?
One alternative seems to be that I could have a different user account for each tile, make 8 separate connections, and run 8 separate queries, where each query account is assigned to the staticrc10 role.
Am I missing something fundamental here? This DW is dedicated to a single app, with a single reader user account. How do I configure that account, in terms of resource classes etc., to make full use of the 8 parallel queries / 200 DWU resource allocation?
According to the documentation, the static resource class staticrc80 at DW200c uses 8 resource slots; since DW200c has a maximum of 8 concurrency slots, a single connection in that class consumes all of them, and therefore your concurrent queries will queue, one at a time.
Consider switching your user to staticrc10 which will allow up to 8 concurrent queries. No need to make 8 different users.
Can I ask, are you using Power BI? Also, DW200c is pretty low for any workload; it's really just for keeping things ticking over.
multiple queries executed under the same user account would still be queued up?
Your observed behavior may have nothing to do with the concurrency slots. It could be that the client is not sending all the queries in parallel. A client connection to SQL Server (or Synapse) can only process one query at a time. A client is free to open as many connections as it wants, but they typically don't. Two connections per client is the most you typically see.
Stepping back, if you're working on improving performance of a dashboard, have you looked at Result Set Caching? It's intended to improve the response time for common queries, which often happens with dashboard tiles.
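On the client side, the tiles will only run concurrently if the app actually opens one connection per query rather than serializing them on a single session. A rough Python sketch of that fan-out; the `run_tile` callable standing in for the real per-connection data access is a hypothetical placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: fire the dashboard-tile queries in parallel, one connection
# per query, instead of serially on a single connection. `run_tile`
# is a hypothetical placeholder for code that opens its own
# connection and executes one tile's SQL.

def load_dashboard(tile_queries, run_tile, max_workers=8):
    """Run every tile query concurrently; each worker gets its own
    connection, so queries are not serialized behind one session."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_tile, sql)
                   for name, sql in tile_queries.items()}
        return {name: f.result() for name, f in futures.items()}
```

With staticrc10 assigned to the user, up to 8 such connections could in principle run concurrently at DW200c, matching the concurrency-slot math in the answer above.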

SQL Azure Premium tier is unavailable for more than a minute at a time and we're around 10-20% utilization, if that

We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd party web services and custom generated images. Our service and code is mature, we've been running this for years. A lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for 1-2 full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It occurs less often since switching from the Standard tier to Premium, but we're nowhere near our DB's DTU capacity and we're getting throttled hard far too often.
Our SQL Azure DB is Premium P1 and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for half an hour, as the graph suggests, but it was down for more than 2 full minutes):
Azure reports the size of our DB at a little over 12GB (in our own SQL Server installation, the DB is under 1GB - that's another of many questions, why is it reported as 12GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries are all implementing retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, most of our various external service calls are async. DB access is still largely synchronous but our heaviest queries are async. We heavily utilize in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
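The retry schedule just described (2, 4, 8, 16, 32, then 48 seconds before failing completely) can be sketched as follows; `op` and the transient exception type are placeholders for the real DB call and whichever driver errors you treat as retryable:

```python
import time

# Sketch of the retry schedule described above: wait 2, 4, 8, 16, 32,
# then 48 seconds between attempts before failing completely. `op`
# and `transient_exc` are hypothetical stand-ins for the DB call and
# the driver's transient-error exception type.

RETRY_DELAYS_S = (2, 4, 8, 16, 32, 48)

def with_retries(op, transient_exc, delays=RETRY_DELAYS_S,
                 sleep=time.sleep):
    for delay in delays:
        try:
            return op()
        except transient_exc:
            sleep(delay)
    return op()  # final attempt; let any exception propagate
```

Note that with this schedule a single request can block for up to 110 seconds of cumulative waiting, which is consistent with the 1-2 minute stalls described during a failover.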
Aside from batching up those request-logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has around 2,000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on premium tier we should be guaranteed our DTU level second-by-second. But, again, we go more than an entire solid minute without being able to get a database connection. What is going on?
I can also say that after we experience one of these longer delays, our stats seem to reset. The above image was a couple minutes before a 1 min+ delay and this is a couple minutes after:
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses something in front of it is cutting off all traffic and returning 502's (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.

Massive test against Azure getting connection refused or service unavailable

We have a cloud service that gets requests from users, passes the data (two params) to table entities, and puts them into cloud tables (using batch table operations to InsertOrReplace rows). The method is that simple, trying to keep it light and fast (partition key and partition-key/row-key pair issues are controlled).
We need the cloud service to cope with about 10k to 15k "concurrent" requests. We first used queues to get user data and a worker role to process queue messages and put them into SQL. Although no errors arose and no data was lost, processing was too slow for our needs. Now we are trying cloud tables to see if we can process data faster. With smaller numbers of requests, processing is fast, but as we get more requests, errors occur and data is lost.
I've set up a few virtual machines for testing on the same virtual network that the cloud service is on, to prevent the firewall from stopping requests. A JMeter test with 1000 threads and 5 loops gets 0% errors. The same test from 2 virtual machines is fine too. Adding a third machine causes the first errors (0.14% of requests get Service Unavailable 503 errors). Massive tests from 10 machines, 1000 threads and 2 loops get massive 503 and/or connection-refused errors. We have tried scaling the cloud service up to 10 instances, but that makes little difference to the results.
I'm a bit stuck with this issue, and don't know if I'm approaching the problem with the right tools. Any suggestion will be highly welcome.
The issue may be related to throttling at the storage level. Please look at the scalability targets specified by the Windows Azure Storage team here: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/11/04/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx. You may want to rerun the load test taking these scalability targets into consideration.
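Because those targets are enforced per partition as well as per account, one way to stay under them is to fan inserts out across many partition keys instead of concentrating load on a few hot partitions. A minimal sketch, assuming a stable hash of the user id is an acceptable partitioning scheme for your data (the bucket count of 32 is purely illustrative):

```python
import hashlib

# Sketch: spread inserts across many partition keys so no single
# partition bears all the load. The bucket count (32) is an
# arbitrary illustration, not a recommended value; pick it based
# on your own throughput needs.

def spread_partition_key(user_id, buckets=32):
    """Derive a stable partition-key bucket from the user id."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return f"p{digest[0] % buckets:02d}"
```

Entities for the same user always land in the same partition (so batch operations on that user still work), while different users are spread across up to 32 partitions.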

Detect if SQL Azure is throttling

I have an Azure worker role that inserts a batch of records into a table. Yesterday, it took at most 5 minutes to insert the records, but today it has been taking up to a couple of hours. I suspect that the process is being throttled, but I don't get any exceptions. Does SQL Azure always return an error if you are being throttled, or is there another way to detect if you are being throttled?
In case of CPU throttling, SQL Database will not throw an error but will slow down the operation. At this time there is no mechanism to determine whether this form of throttling is taking place, other than possibly looking at the query stats showing that work is proceeding slowly (if your CPU time is lower than usual). Check this link for details about this behavior: the performance and elasticity guide (look for "Performance Thresholds Monitored by Engine Throttling").
One of the newer capabilities is the ability to monitor the number of outstanding requests a SQL Azure database has. You can do this with this query:
select count(*) from sys.dm_exec_requests
As you will see in this documentation, reaching the limit of worker threads is a key reason for being throttled. It is also documented there that as you approach 180 worker threads you can expect to be throttled.
This is one of the things used in the Cotega monitoring service for SQL Azure to detect issues. [Disclaimer: I work on this service]
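A crude poller built on that query might look like the sketch below; `fetch_request_count` is a hypothetical callable that runs the `select count(*)` above and returns the number, and 180 is the worker-thread ceiling cited in this answer:

```python
import time

# Sketch: periodically sample the outstanding-request count from
# sys.dm_exec_requests and flag samples near the worker-thread limit.
# `fetch_request_count` is a hypothetical callable that runs
# `select count(*) from sys.dm_exec_requests` and returns the count;
# 180 is the worker-thread ceiling cited above.

WORKER_LIMIT = 180

def poll_worker_pressure(fetch_request_count, samples, interval_s=60,
                         warn_ratio=0.8, sleep=time.sleep):
    """Collect request counts and return the samples that exceeded
    warn_ratio of the worker limit (i.e. likely throttling risk)."""
    warnings = []
    for _ in range(samples):
        count = fetch_request_count()
        if count >= WORKER_LIMIT * warn_ratio:
            warnings.append(count)
        sleep(interval_s)
    return warnings
```

Any nonempty result would suggest you are approaching the point at which throttling can be expected.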
