AzureML: Autoscale ML Endpoint - azure-machine-learning-service

I have my model hosted on ACI compute. I'm trying to investigate what it would take to support auto-scaling of the underlying instances. If auto-scaling isn't possible, is there documentation on how to manually scale the endpoint?
Basically, I need to support high availability on this model endpoint.
A thought I had was to manually publish the model to 2 endpoints and then add a Load Balancer in front. Seems a little hacky...
Thanks!

We usually recommend deploying to AKS for high availability. https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-azure-kubernetes-service
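If you do move to AKS, the v1 azureml-core SDK lets you enable replica autoscaling directly in the deployment configuration. A minimal configuration sketch (the replica bounds and target utilization below are illustrative values, and it assumes you already have a workspace, a registered model, and an AKS compute target):

```python
from azureml.core.webservice import AksWebservice

# Deployment configuration with replica autoscaling enabled.
# The numbers here are illustrative; tune them to your traffic.
aks_config = AksWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    autoscale_enabled=True,
    autoscale_min_replicas=1,        # keep at least one replica for availability
    autoscale_max_replicas=4,        # cap scale-out
    autoscale_target_utilization=70, # add replicas above ~70% utilization
)
```

Passing this configuration to `Model.deploy` with an AKS target gives you autoscaling and high availability without a hand-rolled load balancer.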

Related

Azure SignalR Auto-scaling

I am using Azure SignalR service instance. SignalR service currently only supports 1000 concurrent connections per service instance per unit. If the number of concurrent SignalR connections exceed 1000, the service instances will have to be increased manually, and reduced manually as the users decrease.
Looking for a suitable solution to auto-scale (scale up and scale down) the SignalR service instances based on the demand.
If you have any ideas, please share. Thanks.
Azure SignalR service doesn't support any auto-scaling capabilities out of the box.
If you want to automatically increase or decrease the number of units based on the current number of concurrent connections, you will have to implement your own solution. You may for example try to do this using a Logic App as suggested here.
The common approach is otherwise to increase the number of units manually using the portal, the REST API or the Azure CLI.
They solved the disconnection issue when scaling, according to https://github.com/Azure/azure-signalr/issues/1096#issuecomment-878387639
And as for the auto-scaling feature, they are working on it; in the meantime, here are two ways of doing so:
Using Powershell function https://gist.github.com/mattbrailsford/84d23e03cd18c7b657e1ce755a36483d
Using Logic App https://staffordwilliams.com/blog/2019/07/13/auto-scaling-signalr-service-with-logic-apps/
Azure SignalR Service supports autoscale as of 2022 if you select the Premium pricing tier.
Go to Scale up on the SignalR Service and select the Premium pricing tier.
Go to Scale out and create a custom autoscale.
The example says that you can scale up if the metric "Connection Quota Utilization" is over 70% (about 700 out of your 1,000 connections for your first unit). You can also scale down with a similar rule; the example says to scale down when the connection quota utilization is under 20%.
The 20% from the example seems a bit restrictive, but I guess it's there to avoid unneeded scaling. Client connections are closed and reconnected while scaling down, so doing so very frequently is probably a bad idea.
https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-howto-scale-autoscale
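The rule above can be sketched as a small decision function (a hypothetical helper for illustration, not part of any SignalR SDK), assuming 1,000 connections per unit and the 70%/20% thresholds from the example:

```python
def decide_units(current_units: int, connections: int,
                 per_unit: int = 1000,
                 up_at: float = 0.70, down_at: float = 0.20) -> int:
    """Return the new unit count for the given concurrent connection count.

    Scales out one unit above `up_at` quota utilization and scales in
    one unit below `down_at`, never dropping below a single unit.
    """
    utilization = connections / (current_units * per_unit)
    if utilization > up_at:
        return current_units + 1
    if utilization < down_at and current_units > 1:
        return current_units - 1
    return current_units
```

For example, 750 connections on one unit (75% utilization) triggers a scale-out to two units, while 300 connections on two units (15%) triggers a scale-in back to one.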

Scaling Azure Computer Vision Service

I am experimenting with the Azure Computer Vision service and it is working well.
The only issue is that I want to be able to process 6,000 files relatively quickly.
Azure imposes a 10-transactions-per-second (TPS) limit, which is fine, but how can I scale the service?
I was thinking of spinning up multiple instances, but it seems like an ugly way to solve this issue.
Please let me know the best way to tackle this issue.
Thanks!
Please contact Azure support about scaling your Azure Computer Vision resource; you can ask support directly to raise the upper limit. Submit a support request from the Azure portal for that.
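In the meantime, rather than spinning up multiple instances, a common client-side approach is to pace your requests so you stay under the 10 TPS quota. A minimal sketch in Python, where `call_vision` is a placeholder for whatever function you use to send one file to the service (it is not a real SDK call):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_all(files, call_vision, tps_limit=10):
    """Submit at most tps_limit requests per second, running them in parallel.

    files       -- iterable of inputs (e.g. file paths)
    call_vision -- function that sends one file to the service (placeholder)
    """
    futures = []
    with ThreadPoolExecutor(max_workers=tps_limit) as pool:
        for i, f in enumerate(files):
            if i and i % tps_limit == 0:
                time.sleep(1.0)  # wait before starting the next batch of tps_limit calls
            futures.append(pool.submit(call_vision, f))
        # results come back in submission order
        return [fut.result() for fut in futures]
```

At 10 TPS, 6,000 files take roughly ten minutes; a raised quota from support shortens that proportionally.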

What are the options to host Orleans on Azure without using the Cloud Services?

I want to host an Orleans project on Azure, but I don't want to use the (classic) Cloud Services model (I want an ARM template project). The web app sample uses the old web/worker model, so what is the best option? There is a Service Fabric sample; is that the best route? The nearest equivalent to the web/worker model is VM Scale Sets; is that a well-tested option?
IMO, App Service is the closest to a web role.
For the worker role, however, it depends on your point of view.
From a system architecture point of view, I think a Scale Set is the closest. You get an identical set of VMs running your application. However, you lose all the management features: how your cluster handles application configuration, workloads on each node, and service interruptions from server failures or deployments is pretty much DIY. You also need to provision the VMs with your application's dependencies.
From an operations point of view, I think Service Fabric is the closest. It handles the problems above, but then you are dealing with design/implementation changes and the learning curve from the added fabric layer in the architecture. That could be small or big depending on the complexity of your project. Besides, Service Fabric is still relatively new and nothing is certain. Best case, you follow the sample, change a few lines of code, and it works like a charm. Worst case, you may have to completely refactor your Orleans solution into a Service Fabric solution.
App Service would be the easiest of the three. If it doesn't meet your requirements, I would personally try Service Fabric, for the same reason people are moving to the cloud and you would opt for an ARM solution.

How to properly scale Azure cloud service with worker roles for performance?

I split one Azure cloud service into multiple cloud services, assuming this would help with performance issues, but now that I'm done with the implementation and am increasing the number of instances (i.e., scaling out every cloud service), performance is still worse than with a single cloud service. Scaling out doesn't seem to have any positive effect. I have different queues across the cloud services, which is the one area I can think of that might be causing this problem, but why doesn't scaling out work at all?
As others have noted, there is not enough information in your question for us to help. One guide I can point you to for storage is the scalability and performance checklist, which has some good tips in it. See here: https://azure.microsoft.com/en-us/documentation/articles/storage-performance-checklist/.

Azure API Management Scalability

Azure API Management promises 1,000 requests per second per instance. (I don't know if this is the correct rate, but let's assume it is.) My question is: how can we scale a web service just by scaling the API Management instance, without scaling the service's own infrastructure?
For example, if Azure API Management supports 1,000 requests per second per instance, then the backend service also has to support the same request-handling threshold in its infrastructure. If that's the case, what is really meant by scaling up the web service via Azure API Management?
By using Azure API Management you can turn on caching easily, which can significantly reduce the traffic to your backend. In addition, your API Management instance can easily be scaled out to put more VMs behind it. However, if the backend cannot handle the traffic (even after caching), then you might need a more scalable backend :)
Miao is correct. However, remember that API Management caching only works for GET requests. Also, the cache size provided by API Management is only 1 GB as of today (it may increase in the future), with no monitoring as of today. So if you need monitoring of the API Management cache, use an external cache like Redis.
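To see what response caching buys you, here is a toy cache-aside sketch in plain Python (this is not APIM's actual policy engine; the class, TTL, and names are purely illustrative). Repeated lookups for the same key are served from the cache instead of hitting the backend:

```python
import time

class CachedGateway:
    """Toy model of a caching gateway sitting in front of a backend."""

    def __init__(self, backend, ttl_seconds=300):
        self.backend = backend          # function: key -> response
        self.ttl = ttl_seconds
        self.cache = {}                 # key -> (stored_at, value)
        self.backend_calls = 0          # how often the backend was actually hit

    def get(self, key):
        now = time.monotonic()
        entry = self.cache.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]             # cache hit: backend is not touched
        self.backend_calls += 1
        value = self.backend(key)
        self.cache[key] = (now, value)
        return value
```

With a high cache-hit ratio, the backend sees only a fraction of the gateway's traffic, which is why caching at the API Management layer can let you absorb more requests without scaling the backend.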
When you talk about scalability, it applies at all layers. The API Management Consumption plan can be a good option to consider for autoscaling. Then think of Azure VMSS or App Service autoscale for scaling the backend APIs. And if your backend APIs talk to a database, consider autoscaling for the database on Azure, such as Azure SQL Hyperscale.
So scalability is not only at the API Management level; think carefully about all layers.
Sample implementation of Cache in API Management is here - https://sanganakauthority.blogspot.com/2019/09/improve-azure-api-management.html
