I have an Azure SQL Elastic Pool with 800 eDTUs and 400GB of Storage.
In the past I had 400 eDTUs, and when I scaled to 800 eDTUs the website saw a brief downtime (~15-30 seconds), so before making any changes to the storage I'd like to check whether I should expect downtime there as well.
Does anyone know?
We scale eDTUs and DTUs in real time and have observed no downtime.
URL: https://learn.microsoft.com/en-us/azure/azure-sql/database/scale-resources
I don't think you need downtime when you increase DTUs. There is, however, some impact, which you can read about here:
Regardless, the switch can result in a brief service interruption when databases are unavailable, generally for less than 30 seconds and often for only a few seconds.
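If you want to guard against that brief window, a simple connection retry on the application side is usually enough. A minimal sketch with pyodbc, assuming a placeholder connection string and arbitrary retry count and delay (these are not values from the docs):

```python
import time

import pyodbc

# Placeholder connection string - substitute your own server, database, and credentials.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;UID=myuser;PWD=mypassword;Encrypt=yes"
)

def connect_with_retry(attempts=5, delay_seconds=10):
    """Retry the connection so a sub-30-second scale switchover is absorbed."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return pyodbc.connect(CONN_STR, timeout=15)  # 15 s login timeout
        except pyodbc.Error as err:
            last_error = err
            print(f"Connect attempt {attempt} failed: {err}; retrying in {delay_seconds}s")
            time.sleep(delay_seconds)
    raise last_error

conn = connect_with_retry()
```

The idea is simply that a handful of short retries comfortably covers an interruption of under 30 seconds.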
Related
We use Azure SQL databases and an elastic pool (level "Standard").
Usually the creation of a new customer database takes approximately 1-2 minutes but suddenly it started taking way longer (up to 10 minutes) and I have no idea why this is happening. I checked the pool in the Azure portal and everything seems fine. We are still far away from reaching the given limits (257/500 databases; ~11GB/200GB data size). Upscaling for a short period of time has no effect.
Is there anything else I can do?
I think there are some ongoing issues with Microsoft cloud services. Check whether your issue is related to that; if it is, it should be temporary.
We currently have an elastic pool of databases in Azure that we would like to scale based on high eDTU usage. There are 30+ databases in the pool and they currently use 100GB of storage (although this is likely to increase).
We were planning on increasing the eDTUs allocated to the pool when we detect high eDTU usage. However, a few posts online have made me question how well this will work. The following quote is taken from the Azure docs - https://learn.microsoft.com/en-us/azure/sql-database/sql-database-resource-limits
The duration to rescale pool eDTUs can depend on the total amount of storage space used by all databases in the pool. In general, the rescaling latency averages 90 minutes or less per 100 GB.
If I am understanding this correctly, it means that if we want to increase the eDTUs we will have to wait, on average, 90 minutes per 100GB. If that is the case, scaling dynamically won't be suitable for us, as waiting 90 minutes for an increase in performance is far too long.
Can anyone confirm whether what I have said above is correct? And are there any alternative recommendations to increase eDTUs dynamically without having to wait for such a long period of time?
This would also mean that if we wanted to scale on a schedule, e.g. scale up eDTUs at 8am, we would actually have to initiate the scaling at 6:30am to allow for the estimated 90 minutes of scaling time - if my understanding of this is correct.
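For reference, this is roughly how we would trigger the early scale-up from a scheduled job. It is only a sketch using the azure-mgmt-sql Python SDK; the subscription and resource names, the "StandardPool" SKU name, and the capacity value are assumptions for illustration, so please check them against the SDK reference.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import ElasticPoolUpdate, Sku

# Placeholder identifiers - substitute your own subscription and resource names.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-resource-group"
SERVER_NAME = "my-sql-server"
POOL_NAME = "my-elastic-pool"
TARGET_EDTUS = 400  # placeholder target; pick the peak-hours eDTU level you need

client = SqlManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off the rescale well before the morning peak (e.g. 6:30am for an 8am target),
# since the operation itself can take a long time to complete.
poller = client.elastic_pools.begin_update(
    RESOURCE_GROUP,
    SERVER_NAME,
    POOL_NAME,
    ElasticPoolUpdate(sku=Sku(name="StandardPool", tier="Standard", capacity=TARGET_EDTUS)),
)
poller.result()  # blocks until Azure reports the pool has finished rescaling
```

The `poller.result()` call waits for the operation to complete, which is exactly the delay we are worried about.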
When you scale the pool eDTUs, Azure may have to migrate data (this is a shared database service), which takes time when it is required. I have seen scaling be instant and I have seen it take a long time. I think Microsoft's intent is to offer cost savings via Elastic Pools rather than the ability to quickly change eDTUs.
The following is the answer provided by a Microsoft Azure SQL Database manager:
For rescaling a Basic/Standard pool within the same tier, some service optimizations have occurred so that the rescaling latency is now generally proportional to the number of databases in the pool and independent of their storage size. Typically, the latency is around 30 seconds per database for up to 8 databases in parallel, provided pool utilization isn't too high and there aren't long-running transactions. For example, a Standard pool with 500 databases, regardless of size, can often be rescaled in around 30+ minutes (i.e., ~500 databases * 30 seconds / 8 databases in parallel).
In the case of a Premium pool, the rescaling latency is still proportional to size-of-data.
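Based on the rule quoted above, a rough estimator for Basic/Standard pools might look like the sketch below; the 30-seconds-per-database and 8-databases-in-parallel figures come straight from the quote, and the rest is plain arithmetic.

```python
import math

def estimate_standard_pool_rescale_minutes(database_count,
                                            seconds_per_database=30,
                                            parallel_databases=8):
    """Estimate Basic/Standard pool rescale latency from the rule quoted above."""
    batches = math.ceil(database_count / parallel_databases)
    return batches * seconds_per_database / 60

# The example from the quote: 500 databases -> roughly half an hour.
print(estimate_standard_pool_rescale_minutes(500))  # ~31.5 minutes
```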
This Azure SQL Database manager promised to update Azure documentation as soon as they finish implementing more improvements.
Thank you for your patience waiting for this answer.
We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd party web services and custom generated images. Our service and code is mature, we've been running this for years. A lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for one to two full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It occurs less often since switching from the Standard tier to Premium, but we're nowhere near our DB's DTU capacity and we're getting throttled hard far too often.
Our SQL Azure DB is Premium P1 and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for half an hour, as the graph suggests, but it was down for more than 2 full minutes):
Azure reports the size of our DB at a little over 12GB (in our own SQL Server installation, the DB is under 1GB - that's another of many questions: why is it reported as 12GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries are all implementing retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, most of our various external service calls are async. DB access is still largely synchronous but our heaviest queries are async. We heavily utilize in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
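For reference, that retry policy amounts to a fixed backoff schedule along these lines (a stripped-down Python sketch, not our actual library code; the wrapped operation and the exception type are placeholders):

```python
import time

import pyodbc

# The delays we wait before giving up, as described above.
RETRY_DELAYS_SECONDS = [2, 4, 8, 16, 32, 48]

def run_with_backoff(operation):
    """Run a DB operation, retrying on driver errors with the fixed backoff schedule."""
    for delay in RETRY_DELAYS_SECONDS:
        try:
            return operation()
        except pyodbc.Error:
            time.sleep(delay)
    # Final attempt after the last wait; let the exception propagate ("failing completely").
    return operation()
```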
Aside from batching up those request-logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has around 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on the Premium tier we should be guaranteed our DTU level second by second. But, again, we go more than an entire minute without being able to get a database connection. What is going on?
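For anyone who wants to reproduce the check, this is roughly what we run against sys.dm_db_resource_stats (a sketch with a placeholder connection string, not our production code):

```python
import pyodbc

# Placeholder connection string - point it at the database in question.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;UID=myuser;PWD=mypassword;Encrypt=yes"
)

# sys.dm_db_resource_stats reports resource use in 15-second intervals.
QUERY = """
SELECT TOP (20)
    end_time,
    avg_cpu_percent,
    avg_data_io_percent,
    avg_log_write_percent,
    avg_memory_usage_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(row.end_time, row.avg_cpu_percent, row.avg_data_io_percent,
              row.avg_log_write_percent, row.avg_memory_usage_percent)
```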
I can also say that after we experience one of these longer delays, our stats seem to reset. The above image was a couple minutes before a 1 min+ delay and this is a couple minutes after:
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses something in front of it is cutting off all traffic and returning 502's (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.
I maintain an Azure cloud service. It is set to auto-scale based on load. To monitor the health of this service I have another service which pings it every 2 minutes. The usual response time from this service is around 100ms.
Once or twice a week I see that the service does not respond. It is not really a worry for me, because it happens quite infrequently, but I am still trying to figure out what could be causing the service to not respond. I do not think the problem is with the pinging service - I don't see any issues with the other services (not on Azure, but on other servers) that it pings.
What could be causing these occasional delays? Are any other Azure service owners seeing such delays?
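For context, the health check is essentially just a timed HTTP GET along the lines of the sketch below; the URL and timeout are placeholders, not the real monitor code:

```python
import requests

# Placeholder endpoint - the real monitor pings the service URL every 2 minutes.
HEALTH_URL = "https://my-cloud-service.example.com/health"

def ping():
    """Return the response time in milliseconds, or None if the service didn't respond."""
    try:
        resp = requests.get(HEALTH_URL, timeout=30)
        resp.raise_for_status()
        return resp.elapsed.total_seconds() * 1000
    except requests.RequestException:
        return None

print(ping())  # usually around 100 ms; None means "did not respond"
```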
I'm having quite similar problems, but I use Application Insights, so I have some statistics - for example, that response time increases together with SQL Azure access time and CPU usage. My average response time according to Application Insights is about 600ms and my average RPS is about 0.6. During these problems RPS is usually higher than average - up to 1.5 - but average response time grows to up to 1 minute! (During the day my RPS can grow to 3 or even higher without any response time growth.) As I have a 1-minute SQL connection timeout and I see a dramatic growth in total SQL Azure access time during these periods, I can assume the problem is caused by SQL Azure. This also happens once every day or two, for about 10-15 minutes at most, and my ping service also always reports that the service doesn't respond.
So my advice here: install Application Insights to analyze what happens during these response delays. It would be great if you shared your results here.
P.S. I also use autoscale based on load, though it doesn't really help in these particular situations.
I remember reading somewhere that SQL Azure is going to terminate long-running queries. Is there a time limit on how long a query can run against a database before it is terminated? Where I work, I run complex queries against large tables that take about 5 minutes each.
SQL Azure connection limits are documented in the MSDN Library and the TechNet wiki:
http://msdn.microsoft.com/en-us/library/ee336245.aspx#cc
http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management.aspx
For example,
SQL Azure kills all transactions after they run for 24 hours. If you lose a connection due to this reason, you will receive error code 40549.
and
Large transactions, transactions executed in large batches, or large sorts might consume a significant tempdb space. When a session uses more than 5 GB of tempdb space (= 655,360 pages), the session is terminated.
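So a 5-minute query is nowhere near these service limits; what may bite first is the client-side query timeout. A minimal sketch with pyodbc, where the connection string, the 10-minute timeout, and the table name are placeholder values:

```python
import pyodbc

# Placeholder connection string - replace with your own.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;UID=myuser;PWD=mypassword;Encrypt=yes"
)

conn = pyodbc.connect(CONN_STR)
conn.timeout = 600  # allow queries up to 10 minutes before the driver cancels them

try:
    # 'dbo.BigTable' is a placeholder name for one of the large tables mentioned above.
    rows = conn.cursor().execute("SELECT COUNT(*) FROM dbo.BigTable").fetchall()
except pyodbc.Error as err:
    # 40549 is the error code quoted above for a transaction killed after 24 hours.
    if "40549" in str(err):
        print("Session terminated by the service (long-running transaction).")
    raise
```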
Azure will terminate web requests that are idle for more than 1 minute and that are made over the load balancer (meaning from the outside to its web servers).
Azure will also throttle your slow-running queries, but only if they take resources away from the other tenants in the SQL Server instance that your database is in. I do not believe there are any published statistics on the precise thresholds at which such throttling occurs.