Azure DevOps build using Docker becoming progressively slower

I'm building multiple projects using a single docker build, generating an image and pushing that into AWS ECR. I've recently noticed that builds that were taking 6-7 minutes are now taking on the order of 25 minutes. The Docker build portion of the process that checks out git repos and does the project builds takes ~5 minutes, but what is really slow are the individual Docker build commands such as COPY, ARG, RUN, ENV, LABEL etc. Each one is taking a very long time resulting in an additional 18 minutes or so. The timings vary quite a bit, even though the build remains generally the same.
When I first noticed this degradation Azure was reporting that their pipelines were impacted by "abuse", which I took as a DDOS against the platform (early April 2021). Now, that issue has apparently been resolved, but the slow performance continues.
Are Azure DevOps builds assigned random agents? Should we be running some kind of cleanup process such as docker system prune etc?

Based on your description:
The timings vary quite a bit, even though the build remains generally the same.
This still looks like a performance problem on the hosted agent itself.
By design, every run of a pipeline on a Microsoft-hosted agent is matched to a fresh agent from the pool. Because each build lands on a brand-new agent, there is nothing left over from previous runs to clean up, so a step such as docker system prune will not help here.
To verify this, you could set up a self-hosted (private) agent and check whether the build time still varies that much from run to run (the first build may be a bit longer because there is no local cache yet).
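If you do run a self-hosted agent, disk usage and the Docker caches become your responsibility. A minimal cleanup sketch you might schedule between builds (the 72-hour retention, and whether you also prune the BuildKit cache, are assumptions to adjust):

#!/usr/bin/env bash
# Periodic cleanup for a self-hosted build agent (run from cron or a scheduled pipeline).
set -euo pipefail
# Remove stopped containers, dangling images and unused networks older than 72 hours.
docker system prune --force --filter "until=72h"
# Optionally reclaim BuildKit build cache as well; skip this if you rely on it for fast builds.
docker builder prune --force --filter "until=72h"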
By the way, if you still want to confirm whether declining hosted-agent performance is causing your problem, you should contact the product team directly; they can check the region where your organization is located and determine whether there is any degradation there.

Related

Azure Pipelines: How to block pipeline A if pipeline B is running

I have two pipelines (also called "build definitions") in Azure Pipelines, one executing system tests and one executing performance tests. Both use the same test environment. I have to make sure that the performance pipeline is not triggered while the system test pipeline is running, and vice versa.
What I've tried so far: I can access the Azure DevOps REST API to check whether a build is running for a certain definition. So it would be possible for me to implement a job that executes a script before the actual pipeline runs. The script then just polls the build status of the other pipeline via the REST API each second and times out after e.g. 1 hour.
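A minimal sketch of that script (the definition ID 42, the ORG and PROJECT values, and authenticating with System.AccessToken are assumptions; it calls the Builds - List endpoint and only proceeds once no run of the other definition is in progress):

# Block until pipeline definition 42 (hypothetical ID) has no in-progress runs; give up after 1 hour.
URL="https://dev.azure.com/${ORG}/${PROJECT}/_apis/build/builds?definitions=42&statusFilter=inProgress&api-version=6.0"
for i in $(seq 1 3600); do
  count=$(curl -s -u ":${SYSTEM_ACCESSTOKEN}" "$URL" | jq -r '.count')
  [ "$count" = "0" ] && exit 0
  sleep 1
done
echo "Timed out waiting for the other pipeline" >&2
exit 1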
However, this seems quite hacky to me. Is there a better way to block a build pipeline while another one is running?
If your project is private, the Microsoft-hosted CI/CD parallel job limit is one free parallel job that can run for up to 60 minutes each time, until you've used 1,800 minutes (30 hours) per month.
The self-hosted CI/CD parallel job limit is one self-hosted parallel job. Additionally, for each active Visual Studio Enterprise subscriber who is a member of your organization, you get one additional self-hosted parallel job.
At the moment there isn't a setting to control a separate parallel job limit per agent pool. There is, however, a similar question on the community forum with a marked answer; I recommend checking whether that answer helps in your case. Here is the link.

How are cloud services provisioned (and billed) once a new deployment is requested via the Azure REST API?

I'm using the Azure REST API to create, deploy and start a Cloud Service (classic) (cspkg hosted in Azure Storage) with hundreds of instances. I'm noticing that the time Azure takes to provision and start the requested instances is really heterogeneous. The first instances might start in 6-7 minutes, but the last ones might take up to 15-20 minutes, about 10 minutes longer than the first ones. So my questions are:
Is this the expected behaviour? If so, what's the logic behind this? Could I do anything to speed things up?
How is Azure billing this? Is it counting the total count of instances since the very initial time when Cloud Service is deployed? or is it taking into account the specific timing on each individual instance?
UPDATE: I've been testing more scenarios and I've found a puzzling surprise. If I replace all the processes that my Cloud Service instances should run with a simple wait of a few minutes (a .bat file running the timeout command), then all the instances start almost at the same time (about 15 seconds between the fastest and slowest instance). It was not just luck or random behaviour; I've verified that this behaviour is repeatable, and I can't even begin to explain the root cause.
I also checked this a few weeks ago. The startup time depends on the size of the machine: a larger size has more resources, so it boots faster. Also, if there is any error or exception during startup, the VM will recycle until it can start successfully. I searched around but did not find any way to speed this up, so I don't think there is anything you can do about the startup time. In the background, every deployment creates a Windows Server VM, boots it, deploys your package onto it and puts your web roles behind the load balancer; that is why it takes so long, because a lot of things are happening.
Billing is also not great for classic cloud services: you pay even during startup and recycling, and even when the deployment is turned off. So once you are done with your update, delete the VMs from your staging slot or scale them down, because otherwise you keep paying for them.

How can I decrease deployment time of Node app on Google App Engine

Right now a deployment takes around 10 minutes, but my app only spends 2 minutes on npm install (which App Engine runs on every deploy) and then starts in about 5 seconds. Why does it take so long, and are there any tricks to bring this down?
I have heard elsewhere that this is because of changing routes, and that Docker slows things down. But I would think a company like Google could manage to cut this down to at least a third of the current time.
There are some older questions, but I would like an up-to-date answer:
Google cloud deploy so slow
why does google appengine deployment take several minutes to update service
https://groups.google.com/forum/#!topic/google-appengine/hZMEkmmObDU
At the moment, App Engine Flexible deployments are indeed quite slow, but as stated in the links you provided (this still holds true), most of the deployment time is incurred by actions you can't act upon (load balancer and network configuration, etc.). What you CAN do to speed it up is to:
limit the size of the app you're deploying (see the .gcloudignore sketch after this list)
limit the complexity of the build necessary in the Dockerfile, if present
ensure you have a fast and reliable internet connection during deployment
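On the first point, a .gcloudignore file keeps gcloud from uploading files the remote build doesn't need. For a Node.js app, something like this is a reasonable starting point (a sketch, assuming a reasonably recent Cloud SDK; node_modules can be skipped because npm install runs on the App Engine side anyway):

cat > .gcloudignore <<'EOF'
# Files that gcloud app deploy should not upload.
.gcloudignore
.git
.gitignore
node_modules/
*.log
EOF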
Now, there is one option to bypass most of the setting-up overhead during development: specify an already existing version name as a parameter during deployment, together with the --no-promote flag:
gcloud app deploy --version <existing-version-number> --no-promote
I've tried it myself and it drastically reduced the deployment time, down to ~1m30s for a Hello World app. It does an in-place replacement of that version instead of creating a new one. Of course, most of the saved time comes from the skipped overhead, and you'll have to direct traffic to that version manually. Versioning clarity will also suffer, which is why I wouldn't recommend it for production deployments.
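If you go that route, pointing traffic at the redeployed version afterwards can be done along these lines (the default service name and the version placeholder are assumptions):

gcloud app services set-traffic default --splits <existing-version-number>=1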

Multiple instances of continuous Webjob on single VM in Azure

I have a continuous Webjob running on my Azure Website. It is responsible for doing some work after retrieving items from a QueueTrigger. I am attempting to increase the rate in which the items are processed off the Queue. As I scale out my App Service Plan, the processing rate increases as expected.
My concern is that it seems wasteful to pay for additional VMs just to run additional instances of my Webjob. I am looking for options/best practices to run multiple instances of the same Webjob on a single server.
I've tried starting multiple JobHosts in individual threads within Main(), but either that doesn't work or I was doing something wrong... the Webjob would fail to run due to what looks like each thread trying to access 'WebJobSdk.marker'. My current solution is to publish my Webjob multiple times, each time modifying 'webJobName' slightly in 'webjob-publish-settings.json' so that the same project is considered a different Webjob at publish time. This works great so far, except that it creates a lot of additional work each time I need to make any update.
Ultimately, I'm looking for some advice on what the recommended way of accomplishing this would be. Ideally, I would like to get the multiple instances running via code, and only have to publish once when I need to update the code.
Any thoughts out there?
You can use the JobHostConfiguration.QueuesConfiguration.BatchSize and NewBatchThreshold settings to control the concurrency level of your queue processing. The latter NewBatchThreshold setting is new in the current in-progress beta1 release; however, by enabling "prerelease" packages in your NuGet package manager, you'll see the new release if you'd like to try it. Raising the NewBatchThreshold setting increases the concurrency level - e.g. setting it to 100 means that once the number of currently running queue functions drops below 100, a new batch of messages will be fetched for concurrent processing.
The marker file bug was fixed in this commit a while back, and is likewise part of the current in-progress v1.1.0 release.

Deploying Projects on EC2 vs. Windows Azure

I've been working with Windows Azure and Amazon Web Services EC2 for a good many months now (almost getting to the years range) and I've seen something over and over that seems troubling.
When I deploy a .NET build into a Windows Azure web role (or worker role), it usually takes 6-15 minutes to start up. On AWS EC2 it takes about the same to boot the image and then a minute or two to deploy the app to IIS (provided, of course, it is already set up).
However, when I boot up an AWS instance with SUSE Linux and Mono to run .NET, I get one of these booted and code deployed to it in about 2-3 minutes (again, provided it is set up).
What is going on with Windows OS images that cause them to take soooo long to boot up in the cloud? I don't want FUD, I'm curious about the specific details of what goes on that causes this. Any specific technical information regarding this would be greatly appreciated! Thanks.
As announced at PDC, Azure will soon start to offer full IIS on Azure web roles. Somewhere in the keynote demo by Don Box, he showed that this allows you to use the standard "publish" options in Visual Studio to deploy to the cloud very quickly.
If I recall correctly, part of what happens when starting a new Azure role is configuring the network components, and I remember a speaker at a conference once mentioning that this was very time-consuming. This might explain why adding additional instances to an already running role is usually faster (but not always: I have seen this take much more than 15 minutes as well on occasion).
Edit: also see this PDC session.
I don't think the EC2 behavior is specific to the cloud. Just compare boot times of Windows and Linux on a local system - in my experience, Linux simply boots faster. Typically, this is because the number of services/daemons launched is smaller, as is the number of disk accesses each of them needs to make during startup.
As for Azure launch times: it's difficult to tell, and not comparable to machine boots (IMO). Nobody knows what Azure does when launching an application. It might be that they need to assemble the VM image first, or that a lot of logging/reporting happens that slows down things.
Don't forget, there is a Fabric controller that needs to check for fault zones and deploy your VMs across multiple fault zones (to give you high availability, at least when there are more than two instances). I can't say for sure, but that logic itself might take some extra time. This might also explain why network setup could be a little complicated.
This would of course explain the difference (if any) between boot times in the cloud and boot times for Windows locally or on Amazon. Any difference between operating systems is completely dependent on the way the OS is built!
