A VM scale set can be used to create multiple VMs based on business requirements, and Azure Batch can also be used to execute jobs on multiple VMs.
What exactly is the difference between Azure Batch and a VM scale set?
Azure Batch is a Platform as a Service offering that provides an entire platform for scheduling and submitting tasks and for obtaining their results. Jobs and their tasks run on pools of compute nodes, and those pools can be built on VMSS compute resources.
A VMSS, by contrast, is an Infrastructure as a Service offering that provides compute resources for any intended purpose. While you can spin up your own VMSS for running tasks, you would also have to implement your own job, task, and compute coordinator service around it to replicate what the Azure Batch service offers.
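To make "implement your own coordinator" concrete, here is a minimal Python sketch of just one piece of it: running tasks on nodes and retrying a failed task on a node it has not tried yet. The Task class, run_job function, and node names are hypothetical, and the retry policy is a simplified assumption. A real coordinator on a bare VMSS would also need parallel dispatch, node health tracking, persistent state, and output collection.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    max_retries: int = 2                          # retries after the first attempt
    attempts: list = field(default_factory=list)  # nodes tried, in order


def run_job(tasks, nodes, execute):
    """Run each task on a node; on failure, retry it on a node it has not tried.

    `execute(task, node)` is supplied by the caller and returns True on success.
    Tasks run one after another here purely to keep the sketch short.
    """
    results = {}
    for task in tasks:
        status = "failed"
        for _ in range(task.max_retries + 1):
            untried = [n for n in nodes if n not in task.attempts]
            if not untried:
                break  # every node has already failed this task
            node = untried[0]
            task.attempts.append(node)
            if execute(task, node):
                status = "succeeded"
                break
        results[task.task_id] = status
    return results


# Usage: pretend node "vm-0" is unhealthy, so every task fails there first.
nodes = ["vm-0", "vm-1", "vm-2"]
tasks = [Task("t1"), Task("t2")]
results = run_job(tasks, nodes, lambda task, node: node != "vm-0")
print(results)            # {'t1': 'succeeded', 't2': 'succeeded'}
print(tasks[0].attempts)  # ['vm-0', 'vm-1']
```

This is the kind of bookkeeping Azure Batch does for you as a managed service.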
At a high level, Azure Batch provides two fundamental pieces for scheduling Batch and HPC workloads in the cloud:
Managed infrastructure
Cloud-native job scheduling
Azure Batch presents infrastructure at a managed layer "above" VMSS and Cloud Services. Azure Batch orchestrates the pieces underneath to provide a concept called Batch pools, which offer potentially higher scale (multiple deployments can be orchestrated together transparently) and higher resilience to failures, because Batch automatically recovers virtual machines or cloud service instances that have degraded.
Additionally, and just as important, Azure Batch provides cloud-native job scheduling. This portion is fully managed, i.e., you don't have to run a scheduler yourself. In a nutshell, Azure Batch provides concepts for job queues and tasks, which you can define through the programmatic interfaces (API/SDK) or the available tooling. Azure Batch operates on these concepts to execute the work you define (e.g., a command line with dependencies or a Docker container); tasks can even span multiple nodes (e.g., MPI jobs). Azure Batch can retry failed tasks, potentially on different nodes within the pool. Azure Batch also provides an autoscale system that lets you dynamically resize your infrastructure (Batch pools) in response to node metrics and the number of jobs/tasks executing in the system.
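Real Batch autoscaling is configured as a formula in Batch's own formula language (using variables such as `$PendingTasks` and `$TargetDedicatedNodes`), not in Python. The sketch below only approximates the kind of rule such a formula expresses: size the pool so each pending task gets a slot, clamped to a minimum and maximum. The function name and default bounds are assumptions.

```python
import math


def target_node_count(pending_tasks, task_slots_per_node, min_nodes=0, max_nodes=10):
    """Pick a pool size that gives every pending task a slot, within [min_nodes, max_nodes]."""
    if task_slots_per_node <= 0:
        raise ValueError("task_slots_per_node must be positive")
    needed = math.ceil(pending_tasks / task_slots_per_node)
    return max(min_nodes, min(max_nodes, needed))


for pending in (0, 5, 40, 500):
    print(pending, target_node_count(pending, task_slots_per_node=4))
# 0 0
# 5 2
# 40 10
# 500 10
```

The upper clamp is what keeps a sudden flood of tasks from scaling the pool (and the bill) without limit.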
Please refer to the technical overview as a starting point.
Azure Batch is intended for running jobs, and VMSS for running workloads. Technically they overlap a fair bit, but a job is relatively short-lived or bursty, whereas a workload has to run all the time.
A VM scale set provides automatic scaling and traffic load balancing for an application. Scale sets are a good fit for web applications and API-based workloads, where automatic scaling of the application and load balancing of its traffic are handled for you.
Azure Batch is for tasks, job scheduling, and running intrinsically parallel and tightly coupled workloads. It can scale and balance load across the nodes/VMs used for a computationally heavy job. It is probably not a suitable target for long-running services. A common scenario for Batch involves scaling out intrinsically parallel work, such as rendering images of 3D scenes, on a pool of compute nodes.
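Rendering is intrinsically parallel because each frame can be rendered independently. A minimal Python sketch of the splitting step: each (start, end) chunk would become one Batch task, for example a command line that passes the range to a renderer. The function name and the idea of a renderer taking start/end frames are illustrative assumptions.

```python
def frame_ranges(first_frame, last_frame, frames_per_task):
    """Split an inclusive frame range into independent (start, end) chunks, one per task."""
    if frames_per_task < 1:
        raise ValueError("frames_per_task must be at least 1")
    ranges = []
    start = first_frame
    while start <= last_frame:
        end = min(start + frames_per_task - 1, last_frame)
        ranges.append((start, end))
        start = end + 1
    return ranges


print(frame_ranges(1, 100, 25))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Smaller chunks spread the work more evenly across nodes at the cost of more per-task overhead.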
I have a workload that needs to run a few times per week. It requires heavy computational work and runs for about one hour (with 16 cores and 32 GB of memory). It can run in a container.
Azure offers many different ways to run containers. (I have no knowledge of most Azure services, so my conclusions might be wrong.) At first, I thought Azure Container Instances would be perfect for this scenario, but it only offers containers with up to 4 vCPUs and 16 GB of memory. A single container needs no orchestration, so Azure Kubernetes Service and Azure Service Fabric come with too much overhead. Similarly, Azure Batch offers compute clusters, which are not needed for a single workload.
Which Azure service is the best fit for this use case?
A "best fit" question is likely to be closed, but here's a suggestion anyway.
Don't dismiss AKS. You can easily create a one-node cluster using a VM size that fits your required configuration. Without the paid Uptime SLA (the Free tier), you don't pay for the control plane, and you can stop your cluster after each run so you are no longer charged for the node. There is no need to bother about orchestration: treat it as a VM that has everything needed to run your container, and use it the way you would use ACI.
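The start/run/stop cycle described above could be scripted roughly as follows. This is a sketch under assumptions: the resource group, cluster name, manifest file, and Job name are hypothetical, and job.yaml is assumed to define a Kubernetes Job that runs your container. By default it is a dry run that only prints the `az`/`kubectl` commands.

```python
import subprocess


def run_once(resource_group, cluster, manifest, job_name, dry_run=True):
    """Start the cluster, run one Kubernetes Job to completion, then stop the cluster."""
    executed = []

    def sh(cmd):
        executed.append(cmd)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if the command fails

    cluster_args = ["--resource-group", resource_group, "--name", cluster]
    sh(["az", "aks", "start", *cluster_args])
    try:
        sh(["az", "aks", "get-credentials", *cluster_args, "--overwrite-existing"])
        sh(["kubectl", "apply", "-f", manifest])
        sh(["kubectl", "wait", "--for=condition=complete", f"job/{job_name}", "--timeout=90m"])
        sh(["kubectl", "delete", "job", job_name])
    finally:
        # Always stop the cluster, even if a step failed, so the node stops accruing compute charges.
        sh(["az", "aks", "stop", *cluster_args])
    return executed


# Dry run: prints the commands without executing them.
commands = run_once("my-rg", "my-aks", "job.yaml", "heavy-job")
```

The try/finally is the important design choice: a failed run should not leave a 16-core node running until someone notices. Note that if the Job fails, `kubectl wait --for=condition=complete` simply waits until its timeout.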
When would I prefer Azure Functions over Azure Container Instances, given that both can run run-once tasks and both bill on consumption?
Also, from this Microsoft Learn module:
Serverless compute can be thought of as a function as a service (FaaS), or a microservice that is hosted on a cloud platform.
Azure Functions is a platform that lets you run plain code (instead of containers). Its strength is the rich set of input and output bindings it supports. If you want to execute a piece of code when something happens (e.g. a blob is added to a storage account, a timer fires, ...), I would definitely go with Azure Functions.
If you want to run a container-based workload for a short period of time and you don't have an orchestrator (like Azure Kubernetes Service) in place, Azure Container Instances makes sense.
Take a look at this from the Microsoft docs:
Source: https://learn.microsoft.com/en-us/dotnet/architecture/modernize-with-azure-containers/modernize-existing-apps-to-cloud-optimized/choosing-azure-compute-options-for-container-based-applications
If you want to simplify your application development model, with a microservice architecture so granular that each piece of functionality is reduced to a single function, Azure Functions are worth considering.
If the solution extends an existing Azure application with event-triggered use cases, Azure Functions can be the better choice. Here, a specific piece of code (the function) is invoked only for a specific event or trigger, and function instances are created and destroyed on demand (compute on demand, i.e. Function as a Service (FaaS)).
Event-driven architecture is especially common in IoT, where you typically define a specific trigger that causes an Azure function to execute, so Azure Functions have their place in the IoT ecosystem as well.
If the solution needs fast bursting and scaling, container instances can be used, whereas if scaling is predictable, VMs can be used.
Azure Functions avoid allocating extra resources (VMs), and you pay only while the function is processing work. You don't need to take care of infrastructure such as where the code executes, server configuration, memory, etc. ACI is billed per second, based on the time the container runs (CaaS, Container as a Service).
ACI lets you quickly spawn a container to perform an operation and delete it when done, so you pay for only a few hours of usage rather than for a dedicated VM, which would cost much more. ACI also lets you run a container without depending on an orchestrator like Kubernetes in scenarios where you don't need orchestration features such as service discovery, service mesh, and coordination.
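The cost argument is simple arithmetic: a per-second container is billed only while it runs, while a dedicated VM is billed around the clock. A small Python sketch of that comparison; the prices below are placeholders, not real Azure rates, and the simplified billing model (per-second vCPU plus memory for ACI, hourly for the VM) is an assumption to check against current pricing.

```python
def aci_cost(seconds, vcpus, memory_gb, price_per_vcpu_second, price_per_gb_second):
    """ACI-style billing: pay per second for the vCPUs and memory while the container runs."""
    return seconds * (vcpus * price_per_vcpu_second + memory_gb * price_per_gb_second)


def dedicated_vm_cost(hours, price_per_hour):
    """A VM that is kept running is billed for every hour it is allocated."""
    return hours * price_per_hour


# Placeholder prices, NOT real Azure rates: plug in current prices for your region.
burst = aci_cost(seconds=2 * 3600, vcpus=2, memory_gb=4,
                 price_per_vcpu_second=0.00001, price_per_gb_second=0.000001)
always_on = dedicated_vm_cost(hours=24, price_per_hour=0.10)
print(f"ACI, 2 h/day: {burst:.4f}  dedicated VM, 24 h/day: {always_on:.2f}")
```

The gap comes almost entirely from the hours the VM sits idle; if the work ran close to 24 hours a day, the comparison could flip.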
The key difference is that with Azure Functions the function is the unit of work, whereas with container instances the entire container is the unit of work. So Azure functions start and end based on event triggers, whereas microservices in containers run for as long as the container is running.
Processing/execution time also plays a critical role: if an event handler takes 10 minutes or more to execute, it is better hosted on a VM, because the maximum configurable timeout for functions on the Consumption plan is 10 minutes.
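The rules of thumb in this answer can be restated as a tiny decision helper: event-triggered and short goes to Functions, bursty work goes to ACI, and steady work goes to VMs. This only encodes the heuristics above and is not official Azure guidance; the function and its parameters are hypothetical.

```python
CONSUMPTION_TIMEOUT_MINUTES = 10  # max configurable timeout on the Functions Consumption plan


def suggest_host(event_triggered: bool, expected_minutes: float, bursty: bool) -> str:
    """Restate this answer's rules of thumb as code (not an official decision tree)."""
    if event_triggered and expected_minutes < CONSUMPTION_TIMEOUT_MINUTES:
        return "Azure Functions"
    if bursty:
        return "Azure Container Instances"
    return "Virtual machines"


print(suggest_host(event_triggered=True, expected_minutes=2, bursty=False))    # Azure Functions
print(suggest_host(event_triggered=True, expected_minutes=45, bursty=True))    # Azure Container Instances
print(suggest_host(event_triggered=False, expected_minutes=45, bursty=False))  # Virtual machines
```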
Typical solutions use both: an Azure function is triggered for minimal processing or decision making and in turn invokes a container instance for specific burst processing or for the complete processing.
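A minimal sketch of that split, written as plain Python rather than a real Azure Functions handler: the function-side code makes the cheap decision and delegates heavy work to a container launcher passed in by the caller. `start_container`, the image name, the `INPUT_URL` variable, and the 50 MB threshold are all hypothetical; in a real solution the launcher would wrap the ACI API or SDK.

```python
from typing import Callable, Dict


def handle_event(event: Dict, start_container: Callable[[str, Dict[str, str]], str],
                 inline_limit_mb: int = 50) -> str:
    """Function-side logic: do small jobs inline, hand large ones to a container."""
    if event["size_mb"] <= inline_limit_mb:
        return "processed inline"
    # start_container is a hypothetical wrapper around whatever creates the container instance.
    run_id = start_container("heavy-processor:latest", {"INPUT_URL": event["url"]})
    return f"delegated to {run_id}"


# Usage with a fake launcher standing in for the real ACI call.
launched = []


def fake_start(image, env):
    launched.append((image, env))
    return f"run-{len(launched)}"


print(handle_event({"size_mb": 10, "url": "https://example.com/a"}, fake_start))   # processed inline
print(handle_event({"size_mb": 900, "url": "https://example.com/b"}, fake_start))  # delegated to run-1
```

Injecting the launcher keeps the decision logic testable without any Azure resources.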
ACI together with AKS also forms a powerful deployment model for microservices: AKS handles the typical deployment of microservices, while ACI handles burst workloads. This reduces the challenges of managing scaling and makes effective use of the per-second billing model.
Can we use Azure Functions along with Azure Batch? Please advise.
I am working on a POC to decide which one to use for our background processes.
I was in a similar dilemma until I tried both of them for my use case.
The major difference between the two is that Azure Functions has a hard timeout limit (10 minutes on the Consumption plan) that you cannot exceed: if your script runs longer than that, the Functions runtime kills it automatically.
Azure Batch, on the other hand, is essentially a configuration of pools of VMs on which you can run long-running jobs without worrying about execution time. The nodes are ordinary VMs, and they can be cheap if you use low-priority nodes. The difference between Batch and plain Azure VMs is that in Batch you can configure periodic jobs, whereas on plain Azure VMs you have to write your code so that it executes like a periodic job.
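To show what "code it so that it executes like a periodic job" involves on a plain VM, here is a small Python sketch of the scheduling arithmetic you would otherwise get from a Batch job schedule: given a start time and an interval, compute the next run times after now. The function name is hypothetical; a real VM-side loop would sleep until the next run and also handle restarts and missed runs.

```python
from datetime import datetime, timedelta


def next_runs(start, interval, now, count=3):
    """Return the next `count` scheduled run times strictly after `now`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        first = start
    else:
        first = start + ((now - start) // interval + 1) * interval
    return [first + i * interval for i in range(count)]


runs = next_runs(start=datetime(2024, 1, 1, 2, 0), interval=timedelta(days=1),
                 now=datetime(2024, 1, 3, 10, 0))
print(runs[0])  # 2024-01-04 02:00:00
```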
And yes, it is possible to use Functions with Azure Batch. You can expose your script as an HTTP-triggered function and call it (GET/POST) from the Azure Batch VMs.
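A minimal sketch of the calling side, as it might run inside a Batch task, using only the Python standard library. The app name, function name, and payload are hypothetical; the URL follows the usual `https://<app>.azurewebsites.net/api/<function>` pattern, and the `x-functions-key` header is how a function key is passed. The function itself still runs under its own plan's timeout.

```python
import json
import urllib.request


def build_function_request(app_name, function_name, function_key, payload):
    """Build a POST request to an HTTP-triggered Azure function."""
    url = f"https://{app_name}.azurewebsites.net/api/{function_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "x-functions-key": function_key},
    )


def call_function(request, timeout_seconds=60):
    """Send the request and return (status, body); raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status, response.read().decode("utf-8")


# Hypothetical names; in a Batch task the key would come from a secret, not source code.
req = build_function_request("my-func-app", "process", "<function-key>", {"blob": "input/1.csv"})
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the URL and headers easy to inspect before any network call happens.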
Hope it helps.
Maybe we should expand this topic to Azure services for batch processing in general. I did come across an article from Microsoft that goes through these options (including WebJobs and Kubernetes options).
But frankly, even after reading the article, the confusion remains. For example, Azure Batch jobs can be scheduled, but I'm not sure whether they can be triggered by other Azure services the way Azure WebJobs can. I get the feeling that Azure Batch is pitched at scenarios that need highly parallel computing at low cost, because none of the other options directly lets you use low-priority, low-cost compute instances. Please correct me!
#AzureBatch #AzureWebJobs #AzureAKS #AzureFunctions
I am running an optimization model (using Google.OrTools) that I built in .NET Framework. When I ran it locally, the application used more than 99% CPU, so my team decided to move it to an Azure scale set, where I have one VM and configured it to scale out to 10 VMs. The problem is that CPU stays above 99% only on my main VM: even though new VMs have been added (scaled out), CPU on those VMs is below 1%. I am now confused about how scale sets work in Azure.
In this case, I think the job is not being shared with the other VMs. How can I resolve this?
Please note that I am running my application as a console application, the job rarely accesses the database or the disk, and it is a purely mathematical problem.
Customers typically use Azure VMSS as the front end (or as a load balancer backend pool).
Azure VMSS autoscale reduces the management overhead of monitoring and tuning your scale set as customer demand changes over time.
Azure VMSS uses Azure Load Balancer to route traffic to all VMSS instances, so CPU usage stays roughly even across instances.
If your service runs at 99% CPU without serving any other requests or connections, you should resize that VM to a larger size.
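Why the extra instances stay idle: a load balancer spreads separate requests, but one unit of work can only land on one instance. The sketch below uses round-robin as a simplified stand-in (Azure Load Balancer actually distributes flows by hashing connection tuples), and the instance names are hypothetical. In the question's case the console app receives no requests at all, so there is nothing to distribute.

```python
from itertools import cycle


def distribute(requests, instances):
    """Count how many requests each instance receives under round-robin assignment."""
    load = {name: 0 for name in instances}
    targets = cycle(instances)
    for _ in requests:
        load[next(targets)] += 1
    return load


instances = ["vm-0", "vm-1", "vm-2"]
print(distribute(range(9), instances))         # {'vm-0': 3, 'vm-1': 3, 'vm-2': 3}
print(distribute(["solve-model"], instances))  # {'vm-0': 1, 'vm-1': 0, 'vm-2': 0}
```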
First, your preferences and your budget don't determine whether your workload can scale out rather than scale up.
An Azure scale set includes some backend VMs and a load balancer. The load balancer distributes requests to the backend servers.
Your workload can take advantage of an Azure scale set if it consists of multiple, independent requests. The canonical example of this kind of workload is a web server. Running this kind of workload on an Azure scale set doesn't usually require any changes to code.
You might be able to run your workload on a scale set if you have a single request that can be broken down into smaller pieces that can be processed independently. For this kind of parallel processing to work, you'd probably have to rewrite some of your code. The load balancer would see these smaller pieces as multiple requests.
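A minimal sketch of that decomposition, using a stand-in CPU-heavy problem (counting primes) because the question's OR-Tools model isn't shown. Whether your model can be split this way is the real question; this only illustrates the shape. The range is cut into independent pieces, each piece is solved on its own, and the partial answers are combined. Threads stand in for separate VMs here; in CPython they won't speed up CPU-bound work because of the GIL, but each piece could just as well be sent to a different machine.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(lo, hi):
    """One independent piece of work: count primes in [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split(lo, hi, pieces):
    """Cut [lo, hi) into at most `pieces` contiguous, non-overlapping ranges."""
    if hi <= lo or pieces < 1:
        return []
    step = -(-(hi - lo) // pieces)  # ceiling division
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]


chunks = split(0, 10_000, 8)
# Each chunk could be sent to a different VM; threads stand in for the VMs here.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(lambda chunk: count_primes(*chunk), chunks))
total = sum(partial_counts)
print(total)  # 1229, the same answer as count_primes(0, 10_000)
```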
Other ways to improve mathematical performance include
using a different, more appropriate language,
running your code on a GPU rather than a CPU, or
leveraging a third-party system, like Wolfram Mathematica.
I'm sure there are other ways.
Imagine you have 10 physical machines in the lab. How would you split up this task to run faster, on all the machines?
A scale set is a collection of VMs. To make use of scale sets, and autoscale, your compute intensive job needs to be parallelizable. For example, if you can split it into many sub-tasks, then each VM in the scale set can request a sub-task, compute it, send the result somewhere for aggregation, and request another task.
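The request/compute/aggregate loop described above can be sketched in Python, with threads standing in for scale set VMs and in-process queues standing in for a shared queue service (for example, Azure Queue Storage or Service Bus in a real deployment). The worker names and the sum-of-squares workload are illustrative assumptions.

```python
import queue
import threading


def worker(name, tasks, results):
    """One 'VM': take sub-tasks until it receives the None sentinel."""
    while True:
        item = tasks.get()
        if item is None:
            break
        task_id, numbers = item
        results.put((task_id, name, sum(n * n for n in numbers)))


def run_pool(subtasks, worker_count=3):
    tasks, results = queue.Queue(), queue.Queue()
    for item in subtasks:
        tasks.put(item)
    for _ in range(worker_count):
        tasks.put(None)  # one stop signal per worker, queued after all real work
    threads = [threading.Thread(target=worker, args=(f"vm-{i}", tasks, results))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    partials = [results.get() for _ in range(results.qsize())]
    total = sum(value for _, _, value in partials)  # the aggregation step
    return total, partials


# Sum of squares of 1..1000, split into ten independent sub-tasks.
subtasks = [(i, range(i * 100 + 1, i * 100 + 101)) for i in range(10)]
total, partials = run_pool(subtasks)
print(total)  # 333833500
```

Because workers pull work rather than having it pushed to them, new VMs added by autoscale would simply start taking sub-tasks from the queue.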
Here is an example of a compute intensive task running on 1000 VMs in a scale set: https://techcommunity.microsoft.com/t5/Microsoft-Ignite-Content-2017/The-journey-to-provision-and-manage-a-thousand-VM-application/td-p/99113
Azure Cloud Services can autoscale based on CPU or queue length. We have a set of machines running an API for uploading and processing files. Although we moved the processing to a Worker Role that scales based on queue size, the web servers still handle the uploads while also serving other operations such as downloads.
Right now we're running more machines than we need, just in case, but we want a way to scale that is cost-efficient while still giving our users a great upload experience.
What approach would you take to detect network usage across all machines in the same Cloud Service and autoscale when necessary?
I would:
1) Create metrics that measure how long it takes to download/upload a file.
2) Aggregate the metrics in some persistence layer (we have plenty in Azure)
3) Create a service that monitors those metrics
4) Check them against thresholds
5) Use the Management Libraries for .NET to trigger scaling on the Cloud Service(s) affected.
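Step 4 can be sketched as a pure function that turns a window of transfer-time samples into a desired instance count. Step 5, the call to the management libraries, is deliberately left out; the returned count is what you would pass to it. The threshold values, the one-instance step size, and the min/max bounds are assumptions to tune for your workload.

```python
from statistics import mean


def scaling_decision(transfer_seconds, current_instances, scale_out_above=8.0,
                     scale_in_below=2.0, min_instances=2, max_instances=10):
    """Return the desired instance count for one window of upload/download durations."""
    if not transfer_seconds:
        return current_instances  # no data: leave the deployment alone
    avg = mean(transfer_seconds)
    if avg > scale_out_above:
        return min(current_instances + 1, max_instances)
    if avg < scale_in_below:
        return max(current_instances - 1, min_instances)
    return current_instances


print(scaling_decision([10, 12, 9], current_instances=3))  # 4
print(scaling_decision([1, 1.5], current_instances=3))     # 2
```

Having a gap between the two thresholds gives a dead band that keeps the service from flapping between sizes when the metric hovers near a single cutoff.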
This approach also scales with your solution. You can eventually separate the scaling part from the checking part and run them as two different services that communicate asynchronously.
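A minimal sketch of that separation, assuming an in-process `asyncio.Queue` as a stand-in for whatever durable queue the two services would share. The checker only publishes decisions and the scaler only applies them. The `decide` rule and the `apply_scale` callback are hypothetical placeholders; in practice `apply_scale` would call the management libraries.

```python
import asyncio


async def checker(windows, decisions, decide):
    """Turn each window of metrics into a target size and publish it."""
    for window in windows:
        await decisions.put(decide(window))
    await decisions.put(None)  # tell the scaler there is nothing more to do


async def scaler(decisions, apply_scale):
    """Consume target sizes and apply them, independently of the checker."""
    applied = []
    while (target := await decisions.get()) is not None:
        apply_scale(target)
        applied.append(target)
    return applied


def decide(window):
    # Hypothetical rule: 4 instances when average transfer time exceeds 8 s, else 2.
    return 4 if sum(window) / len(window) > 8 else 2


async def main():
    decisions = asyncio.Queue()
    windows = [[3.0, 4.0], [9.0, 11.0], [1.0, 1.5]]
    log = []
    _, applied = await asyncio.gather(checker(windows, decisions, decide),
                                      scaler(decisions, log.append))
    return applied


applied = asyncio.run(main())
print(applied)  # [2, 4, 2]
```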
There is also an old project, now open source, that does some of this for you, so you don't have to reinvent the wheel. It's called WASABi. Be careful, though: it is no longer maintained, so use it as inspiration rather than as a dependency.