We currently have an elastic pool of databases in Azure that we would like to scale based on high eDTU usage. There are 30+ databases in the pool, and they currently use 100 GB of storage (although this is likely to increase).
We were planning on increasing the eDTUs allocated to the pool when we detect high eDTU usage. However, a few posts online have made me question how well this will work. The following quote is taken from the Azure docs - https://learn.microsoft.com/en-us/azure/sql-database/sql-database-resource-limits
The duration to rescale pool eDTUs can depend on the total amount of storage space used by all databases in the pool. In general, the rescaling latency averages 90 minutes or less per 100 GB.
If I am understanding this correctly, this means that if we want to increase the eDTUs we will have to wait, on average, 90 minutes per 100 GB. If this is the case, scaling dynamically won't be suitable for us, as 90 minutes to wait for an increase in performance is far too long.
Can anyone confirm whether what I have said above is correct? And are there any alternative recommendations for increasing eDTUs dynamically without having to wait for such a long period of time?
This would also mean that if we wanted to scale on a schedule, e.g. scale up eDTUs at 8am, we would actually have to initiate the scaling at 6:30am to allow for the estimated 90 minutes of scaling time - if my understanding is correct.
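To make the plan concrete, the change itself would just be an eDTU update on the pool, which (as far as I can tell) could be issued from a timer job through the Azure Resource Manager REST API. The following is only a rough sketch: the api-version, the "dtu" property and all of the names are assumptions on my part, and the bearer token would have to come from Azure AD.

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Rough sketch: request a new eDTU limit for an elastic pool via the Azure
// Resource Manager REST API. The api-version and the "dtu" property are
// assumptions; all names/IDs are placeholders, and the bearer token is
// assumed to come from Azure AD.
class PoolScaler
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task RequestEdtuChangeAsync(
        string accessToken,      // Azure AD token for https://management.azure.com/
        string subscriptionId,
        string resourceGroup,
        string serverName,
        string poolName,
        int targetEdtu)
    {
        var url =
            "https://management.azure.com/subscriptions/" + subscriptionId +
            "/resourceGroups/" + resourceGroup +
            "/providers/Microsoft.Sql/servers/" + serverName +
            "/elasticPools/" + poolName +
            "?api-version=2014-04-01";                   // assumed api-version

        var body = "{ \"properties\": { \"dtu\": " + targetEdtu + " } }";

        var request = new HttpRequestMessage(new HttpMethod("PATCH"), url)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Authorization", "Bearer " + accessToken);

        // The call returns quickly; the rescale itself continues in the
        // background, which is where the documented latency applies.
        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```

The question is whether the background rescale that follows this request really takes 90 minutes per 100 GB.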
When you scale the pool eDTUs, Azure may have to migrate data (this is a shared database service). If that is required, it will take time. I have seen scaling be instant, and I have seen it take a long time. I think Microsoft's intent is to offer cost savings via elastic pools, not the ability to quickly change eDTUs.
The following is the answer provided by a Microsoft Azure SQL Database manager:
For rescaling a Basic/Standard pool within the same tier, some service optimizations have occurred so that the rescaling latency is now generally proportional to the number of databases in the pool and independent of their storage size. Typically, the latency is around 30 seconds per database for up to 8 databases in parallel, provided pool utilization isn't too high and there aren't long-running transactions. For example, a Standard pool with 500 databases, regardless of size, can often be rescaled in around 30+ minutes (i.e., ~500 databases * 30 seconds / 8 databases in parallel).
In the case of a Premium pool, the rescaling latency is still proportional to size-of-data.
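Taking that rule of thumb at face value, the Basic/Standard rescale time works out to roughly ceil(databases / 8) * 30 seconds. A quick sketch (the constants come straight from the quote above, and the estimate assumes low pool utilization and no long-running transactions, as the quote cautions):

```csharp
using System;

// Rough estimator based on the quoted rule of thumb for Basic/Standard pools:
// ~30 seconds per database, up to 8 databases rescaled in parallel.
class RescaleEstimate
{
    static TimeSpan EstimateBasicStandardRescale(int databaseCount)
    {
        const int secondsPerDatabase = 30;
        const int parallelism = 8;
        int batches = (databaseCount + parallelism - 1) / parallelism;  // ceiling
        return TimeSpan.FromSeconds(batches * secondsPerDatabase);
    }

    static void Main()
    {
        // ~30 databases (the pool in the question): 4 batches * 30 s = 2 minutes.
        Console.WriteLine(EstimateBasicStandardRescale(30));
        // 500 databases (the quoted example): 63 batches * 30 s = ~31.5 minutes.
        Console.WriteLine(EstimateBasicStandardRescale(500));
    }
}
```

For a 30-database Basic/Standard pool, that is on the order of a couple of minutes rather than 90 minutes per 100 GB.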
This Azure SQL Database manager promised to update Azure documentation as soon as they finish implementing more improvements.
Thank you for your patience waiting for this answer.
Related
I am designing a learning management system, and traffic to the website is higher at some times and lower at others. I would like to know how to scale the vCPUs back down after a stipulated time once they have been scaled up. I found a document about scaling up but didn't find a way to scale back down.
Any help is appreciated.
Autoscaling is available for the normal services in Azure Cloud Services, meaning you can increase or decrease capacity for a stipulated time as described in the link.
For vCPUs, however, this cannot be done automatically. vCPUs can be scaled up based on a request, and in the same manner you need to ask the support team to scale them back down to normal.
There is no specific procedure for autoscaling vCPU capacity. You can increase the number of cores, but to reduce it back to normal you need to go through the support system for a manual change. For example, you can move from 10 cores to the next level of 16 cores, but there is no automatic scale-down from 16 cores back to 10 cores.
I have an Azure SQL Elastic Pool with 800 eDTUs and 400 GB of storage.
In the past I had 400 eDTUs, and when I scaled to 800 eDTUs there was a small amount of downtime on the website (~15-30 seconds), so before making any changes to the storage I'd like to check whether I should expect downtime then as well.
Does anyone know?
We do real-time scaling of eDTUs and DTUs and have observed no downtime.
URL: https://learn.microsoft.com/en-us/azure/azure-sql/database/scale-resources
I don't think you need downtime when you increase DTUs. There is definitely some impact, though, which you can read about here:
Regardless, the switch can result in a brief service interruption when databases are unavailable, generally for less than 30 seconds and often for only a few seconds.
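To ride out that brief interruption without surfacing errors to users, the usual approach is to wrap database calls in retry logic. Below is a minimal sketch using ADO.NET with exponential backoff; the error numbers treated as transient are a commonly cited subset, not an exhaustive or authoritative list.

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

// Minimal sketch of retry logic for the brief interruption that can occur
// while a database or pool is being rescaled. The error numbers below are a
// commonly cited subset of transient Azure SQL errors, not an exhaustive list.
static class TransientRetry
{
    static readonly int[] TransientErrorNumbers = { 4060, 40197, 40501, 40613, 10928, 10929 };

    public static T Execute<T>(string connectionString, Func<SqlConnection, T> action, int maxRetries = 5)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                using (var connection = new SqlConnection(connectionString))
                {
                    connection.Open();
                    return action(connection);
                }
            }
            catch (SqlException ex)
            {
                bool transient = Array.IndexOf(TransientErrorNumbers, ex.Number) >= 0;
                if (!transient || attempt >= maxRetries)
                    throw;
                // Exponential backoff: 2, 4, 8, 16, 32 seconds between attempts.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt + 1)));
            }
        }
    }
}
```

With each unit of work wrapped like TransientRetry.Execute(connStr, conn => RunMyQuery(conn)), a rescale-induced drop costs one short delay instead of an error page.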
We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd party web services and custom generated images. Our service and code is mature, we've been running this for years. A lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for 1-2 full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It occurs less often since switching from the Standard tier to the Premium tier, but we're nowhere near our DB's DTU capacity and we're getting throttled hard far too often.
Our SQL Azure DB is Premium P1, and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for half an hour, as its graph suggests, but it was down for more than 2 full minutes).
Azure reports the size of our DB at a little over 12GB (in our own SQL Server installation, the DB is under 1GB - that's another of many questions, why is it reported as 12GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, and most of our various external service calls are async. DB access is still largely synchronous, but our heaviest queries are async. We heavily utilize in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
Aside from batching up those request-logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has around 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on the Premium tier we should be guaranteed our DTU level second by second. But, again, we go more than an entire solid minute without being able to get a database connection. What is going on?
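For reference, the kind of check I mean is querying sys.dm_db_resource_stats, which reports usage as a percentage of the tier's limits in roughly 15-second slices. Something along these lines (a sketch only; the connection string is a placeholder):

```csharp
using System;
using System.Data.SqlClient;

// Sketch of the resource-stats check referred to above: dump the most recent
// rows of sys.dm_db_resource_stats, which Azure SQL populates in roughly
// 15-second intervals as a percentage of the service tier's limits.
class ResourceStatsCheck
{
    static void Main()
    {
        var connectionString = "<your Azure SQL connection string>";  // placeholder

        const string query = @"
            SELECT TOP (20) end_time,
                   avg_cpu_percent,
                   avg_data_io_percent,
                   avg_log_write_percent
            FROM sys.dm_db_resource_stats
            ORDER BY end_time DESC;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(query, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}  cpu={1}%  io={2}%  log={3}%",
                        reader["end_time"],
                        reader["avg_cpu_percent"],
                        reader["avg_data_io_percent"],
                        reader["avg_log_write_percent"]);
                }
            }
        }
    }
}
```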
I can also say that after we experience one of these longer delays, our stats seem to reset: comparing the stats captured a couple of minutes before a 1 min+ delay with those captured a couple of minutes after shows the reset.
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses, something in front of it cuts off all traffic and returns 502s (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.
I am using MVC3, ASP.NET 4.5, C#, Razor, EF6.1, and SQL Azure.
I have been doing some load testing using JMeter, and I have found some surprising results.
I have a test of 30 concurrent users, ramping up over 10 secs. The test plan is fairly simple:
1) Login
2) Navigate to page
3) Do query
4) Navigate back
5) Logout.
I am using "small" "standard" instances.
I have found that when my Azure setup is configured to "autoscale", it behaves like my test with one "small" instance with no autoscale. When I setup two "small" instances with no autoscale, it goes twice as fast, or rather the average process time per request is 2x, over the test. So it appears that it is NOT autoscaling. I have tried setting the CPU trigger to a lower target ie 40-70. Still no joy.
On further investigation, when "Autoscale" was first introduced, it seems it evaluated the metrics over the previous hour, and now I see references to "10 minutes". I thought that once the CPU started hitting the target value, then it immediately triggered the new instance, which must be the whole point of "autoscale". If I have a burst of concurrent usage, I need the extra instances now, hence a reason for using a PAAS . Since my test took less than 10 minutes, "Autoscale" never kicked in. So what should be the time that Autoscale takes to kick in?
Thanks.
Azure checks the CPU metric every 5 minutes and, if it exceeds the configured threshold, increases the instance count at that point.
Interestingly, Azure will decrease instance counts after 2 hours of remaining below the threshold.
Source: How to Scale Websites
Quoted relevant section:
Note: When Scale by Metric is enabled, Microsoft Azure checks the CPU of your website once every five minutes and adds instances as needed at that point in time. If CPU usage is low, Microsoft Azure will remove instances once every two hours to ensure that your website remains performant. Generally, putting the minimum instance count at 1 is appropriate. However, if you have sudden usage spikes on your website, be sure that you have a sufficient minimum number of instances to handle the load. For example, if you have a sudden spike of traffic during the 5 minute interval before Microsoft Azure checks your CPU usage, your site might not be responsive during that time. If you expect sudden, large amounts of traffic, set the minimum instance count higher to anticipate these bursts.
It is now possible in the new Azure portal (https://portal.azure.com) to configure scaling based on several different metrics:
CPU
Memory usage
Data in/out
HTTP queue length
Disk queue length
You can also configure the scale-up and scale-down times. The graph shows the current number of instances (solid line) versus your configured maximum (dashed line), together with your configured metrics. When a metric exceeds its line (i.e. the configured scale-up threshold for that metric), it will scale up, and vice versa.
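If you prefer to set this up outside the portal, the same autoscale setting can be created through the Azure Resource Manager REST API. The sketch below is illustrative only: the api-version, the "CpuPercentage" metric name and the exact JSON shape are assumptions based on the Microsoft.Insights/autoscaleSettings resource, so verify against the current documentation before relying on it.

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hedged sketch: create or update an autoscale setting for an App Service plan
// through the Azure Resource Manager REST API. The api-version, the
// "CpuPercentage" metric name and the JSON shape are assumptions; verify
// against current documentation before use.
class AutoscaleRuleSketch
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task PutCpuScaleOutRuleAsync(
        string accessToken, string subscriptionId, string resourceGroup,
        string settingName, string appServicePlanResourceId, string location)
    {
        var url =
            "https://management.azure.com/subscriptions/" + subscriptionId +
            "/resourceGroups/" + resourceGroup +
            "/providers/Microsoft.Insights/autoscaleSettings/" + settingName +
            "?api-version=2015-04-01";                       // assumed api-version

        // One profile: keep 2-10 instances, add one instance when average CPU
        // over the last 5 minutes exceeds 70%, then wait 10 minutes (cooldown).
        var bodyTemplate = @"{
          ""location"": ""__LOCATION__"",
          ""properties"": {
            ""enabled"": true,
            ""targetResourceUri"": ""__PLAN_ID__"",
            ""profiles"": [{
              ""name"": ""default"",
              ""capacity"": { ""minimum"": ""2"", ""maximum"": ""10"", ""default"": ""2"" },
              ""rules"": [{
                ""metricTrigger"": {
                  ""metricName"": ""CpuPercentage"",
                  ""metricResourceUri"": ""__PLAN_ID__"",
                  ""timeGrain"": ""PT1M"",
                  ""statistic"": ""Average"",
                  ""timeWindow"": ""PT5M"",
                  ""timeAggregation"": ""Average"",
                  ""operator"": ""GreaterThan"",
                  ""threshold"": 70
                },
                ""scaleAction"": {
                  ""direction"": ""Increase"",
                  ""type"": ""ChangeCount"",
                  ""value"": ""1"",
                  ""cooldown"": ""PT10M""
                }
              }]
            }]
          }
        }";

        var body = bodyTemplate
            .Replace("__LOCATION__", location)
            .Replace("__PLAN_ID__", appServicePlanResourceId);

        var request = new HttpRequestMessage(HttpMethod.Put, url)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Authorization", "Bearer " + accessToken);

        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```

In practice you would pair this with a matching scale-in rule (direction "Decrease" when CPU drops below a lower threshold); otherwise the instance count never comes back down.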
I currently have a web application deployed to "Web Sites". It is configured in Standard mode, and it performs really well from what I have seen so far.
I have a few questions:
1) My instance size is currently Small; however, I can scale out to 10 instances. Does this mean that if I change my instance size to Medium or Large, I can still have 10 instances?
2) What is the maximum number of instances I can have for an Azure web site?
3) Is there an SLA for a single Azure instance?
4) Is it possible to change the instance size programmatically, or is it better to just change the instance count?
1) Yes
2) 10 for standard.
3) Yes, for Websites Basic and Standard, MS guarantees 99.9% monthly availability.
4) It depends on a lot of factors. The real question is "Is it better for your app to scale up or scale out?"
Yes, the default limit is 10 instances regardless of the size.
The default limit is 10 instances, but you can contact Azure Support to have the limit increased. Default and "real" limits for Azure services are documented here.
According to the Websites pricing page Free and Shared sites have no SLA and Basic and Standard sites have 99.9% uptime SLA. Having a single instance means that during the 0.1% outage time (43.8 minutes per month) your site will be down. If you have multiple instances then most likely at least one will be up at any given time.
Typically, instance autoscaling is used to handle variation in demand, while instance size is used for application performance. If you only get 100 requests per day but each request is slow because it maxes out the CPU, then adding more instances won't help you. Conversely, if you're getting millions of requests that are processed quickly but the sheer volume is maxing out your resources, then adding more instances is probably the better solution.