Jenkins: Queue jobs until an Azure VM is available

Situation:
I have a pipeline job that executes tests in parallel. I use Azure VMs that I start and stop on each build of the job through PowerShell. Before the job runs, it checks whether any VMs are available on Azure (offline VMs) and uses those VMs for that build. If no VMs are available, I fail the job. One of my requirements now is that, instead of failing the build, I queue the job until one of the nodes is offline/available and then use those nodes.
Problem:
Is there any way for me to do this? Is there an existing plugin or build wrapper that will let me queue the job based on the status of the nodes? I am forced to do this because we need to stop the Azure VMs to reduce cost.
At the moment I am still researching whether this is possible, or whether there is another way to achieve it. I am thinking of a Groovy script that checks the nodes and, if none are available, manually adds the job to the build queue until at least one is available. The closest plugin I have found is the Run Condition plugin, but I don't think it will work.
I am open to any approach that will help me achieve this. Thanks
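One way to picture the "wait instead of fail" behaviour is a simple polling loop. The sketch below is only an illustration in Python, not a Jenkins plugin or API: `wait_for_free_node` and the `list_free_nodes` callable are hypothetical names, and `list_free_nodes` stands in for whatever check the job already performs through PowerShell.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_free_node(
    list_free_nodes: Callable[[], Iterable[str]],
    poll_seconds: float = 30.0,
    timeout_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until list_free_nodes() reports at least one node, then return it.

    list_free_nodes is a hypothetical callable that wraps the existing
    availability check. Raises TimeoutError if timeout_seconds elapses first.
    """
    waited = 0.0
    while True:
        free = list(list_free_nodes())
        if free:
            return free[0]  # take the first available node
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError("no node became available in time")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Fake status source: nothing is free on the first two checks.
    responses = iter([[], [], ["vm-2"]])
    print(wait_for_free_node(lambda: next(responses), poll_seconds=0.01))
```

Note that a polling step like this keeps the build running while it waits, rather than placing it in the Jenkins build queue, so it is an approximation of the requirement rather than true queuing.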

Related

Azure Automation Use Case

I have a Python script that needs to be automated and is relatively memory- and CPU-intensive. As part of a monthly process it runs ~300 times, and each run takes somewhere between 10 and 24 hours to complete, depending on the input. It takes certain CSV file(s) as input and produces certain file(s) as output after processing. Each run is independent.
We need to use configs and be able to pass command-line arguments to the script. Certain imports that are not default Python packages need to be installed as well (requirements.txt). We also need to take care of setting up the logging pipeline (EFK): Elasticsearch and Kibana can be centralised, but where should the log files and the fluentd config be kept?
The last bit is monitoring: will we be able to restart in case of unexpected closure?
What is the best way to automate this, and with which tools and technologies?
My thoughts
Create a Docker image of the whole setup (Python script, fluentd config, Python packages, etc.). Then somehow auto-deploy this image (on a VM, or something else?), execute the Python process, save the output files to some central location (a data lake, e.g.), and destroy the instance upon successful completion of the process.
So, is what I'm thinking possible in Azure? If it is, what are the cloud components I need to explore -- answer to my somehows and somethings? If not, what is probably the best solution for my use case?
Any lead would be much appreciated. Thanks.
Normally, for short-lived jobs I'd say use an Azure Function. The thing is, they have a maximum runtime of 10 minutes unless you put them on an App Service plan. But that will cost more unless you manually stop/start the App Service plan.
If you can containerize the whole thing, I recommend using Azure Container Instances, because you then only pay for what you actually use. You can use an Azure Function to start the container, based on an HTTP request, a timer, or something like that.
You can set a restart policy to indicate what should happen in case of unexpected failures, see the docs.
Configuration can be passed from the Azure Function to the container instance or you could leverage the Azure App Configuration service.
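To make the restart-on-failure idea concrete, here is a minimal local sketch in Python. It is not how Azure Container Instances implements its restart policy; it only mimics the general "run again if the process exited with an error" behaviour. The function name `run_with_restart` and the `max_attempts` limit are assumptions made for illustration.

```python
import subprocess
import sys
from typing import Sequence


def run_with_restart(command: Sequence[str], max_attempts: int = 3) -> int:
    """Run command and re-run it while it exits non-zero.

    Gives up after max_attempts runs and returns the last exit code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    returncode = 1
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(list(command)).returncode
        if returncode == 0:
            break  # success: no restart needed
        print(f"attempt {attempt} exited with code {returncode}", file=sys.stderr)
    return returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the real long-running script.
    code = run_with_restart([sys.executable, "-c", "print('processing done')"])
    print("final exit code:", code)
```

Command-line arguments for the real script would simply be appended to the `command` list.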
Though I don't know all the details, this sounds like a good candidate for Azure Batch. There is no additional charge for using Batch. You only pay for the underlying resources consumed, such as the virtual machines, storage, and networking. Batch works well with intrinsically parallel (also known as "embarrassingly parallel") workloads.
The following high-level workflow is typical of nearly all applications and services that use the Batch service for processing parallel workloads:
Basic Workflow
1. Upload the data files that you want to process to an Azure Storage account. Batch includes built-in support for accessing Azure Blob storage, and your tasks can download these files to compute nodes when the tasks are run.
2. Upload the application files that your tasks will run. These files can be binaries or scripts and their dependencies, and are executed by the tasks in your jobs. Your tasks can download these files from your Storage account, or you can use the application packages feature of Batch for application management and deployment.
3. Create a pool of compute nodes. When you create a pool, you specify the number of compute nodes for the pool, their size, and the operating system. When each task in your job runs, it's assigned to execute on one of the nodes in your pool.
4. Create a job. A job manages a collection of tasks. You associate each job to a specific pool where that job's tasks will run.
5. Add tasks to the job. Each task runs the application or script that you uploaded to process the data files it downloads from your Storage account. As each task completes, it can upload its output to Azure Storage.
6. Monitor job progress and retrieve the task output from Azure Storage.
(source)
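The pool/job/task shape of this workflow can be sketched locally. The example below does not use the Azure Batch SDK; it is a small Python analogue, using a thread pool in place of compute nodes, and the names `run_job`, `process`, and `pool_size` are hypothetical. It only illustrates that independent tasks are handed to a fixed-size pool and that their outputs are collected as each one completes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_job(
    task_inputs: Iterable[str],
    process: Callable[[str], str],
    pool_size: int = 4,
) -> Dict[str, str]:
    """Run process(input) for every input on a fixed-size pool.

    Returns a mapping of input name to output. Each task is independent,
    so the order in which tasks finish does not matter.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # "Add tasks to the job": one future per input file.
        futures = {pool.submit(process, name): name for name in task_inputs}
        # "Monitor job progress": collect outputs as tasks complete.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Hypothetical inputs; the real task would run the Python script on a CSV.
    inputs = [f"input_{i}.csv" for i in range(5)]
    outputs = run_job(inputs, lambda name: name.replace(".csv", ".out.csv"), pool_size=2)
    for name in inputs:
        print(name, "->", outputs[name])
```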
I would go with Azure DevOps and a custom agent pool. This agent pool could include some virtual machines (maybe only one) with Docker installed. I would then install all the necessary packages that you mentioned in the Docker container, along with the DevOps agent (it is needed to communicate with the agent pool).
You could pass every parameter the build container agents need through Azure DevOps tasks, and also have a common storage layer for the build and release pipelines. This way you could manipulate/process your files in the build pipeline and then, using the same folder, create a task in the release pipeline to export/upload those files somewhere.
As this script should run many times through the month, you could have many containers so that more than one job can run at a given time.
I follow the same procedure in a corporate environment. I keep a VM running Windows with multiple Docker machines to compile different code frameworks. Each container includes different tools and is registered to a custom agent pool. Jobs are distributed across those containers, and the build and release pipelines integrate with multiple processing.
You should probably use Azure Data Factory for moving and transforming the data.
You can then also use ADF to call Azure Batch, which will run the Python script.
https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory
Adding more information to the question could lead to better suggestions.

Docker container runs great locally. Now I need it on a schedule in the cloud

I've containerized some logic that I have to run on a schedule. If I do my docker run locally (whether my image is local or I'm using the one from the hub), everything works great.
Now, though, I need to run that "docker run" on a scheduled basis, in the cloud.
Azure would be preferred, but honestly, I'm looking for the easiest and cheapest way to achieve this goal.
Moreover, my schedule can change, so maybe today that job runs once a day, in the future that can change.
What do you suggest?
You can create an Azure Logic App to trigger the start of an Azure Container Instance. As you have a "run-once" (every N minutes/hours/...) container, the restart policy should be set to "Never", so that the container executes once per scheduled start and then stops.
The Logic App needs permission to start the container, so add a role assignment on the ACI for the managed identity of the Logic App.
Screenshot shows the workflow with a Recurrence trigger, that starts an existing container every minute.
This should be quite cheap, and it uses only Azure services, without any custom infrastructure.
Professionally, I have used four ways to run cron jobs / scheduled builds. Here is a quick summary of each, with its pros and cons.
GitLab scheduled builds (free)
My personal preference would be to set up a scheduled pipeline in GitLab. Simply add the script to a .gitlab-ci.yml, configure the scheduled build, and you are done. This is the lightweight option and works in most cases, if the execution time is not too long. I used this approach for scraping simple pages.
Jenkins scheduled builds (not-free)
I used the same approach as GitLab with Jenkins. But Jenkins comes with more overhead and you have to configure the entire Jenkins on multiple machines.
Kubernetes CronJob (expensive)
My third approach would be using a Kubernetes CronJob. However, I would only use this if the job consumes a lot of memory or has a long execution time. I used this approach for dumping really large data sets.
Run a cron job from a container (expensive)
My last option would be to deploy a Docker container on either a VM or a Kubernetes cluster and configure a cron job from within that Docker container. You can even use docker-in-docker for that. This gives maximum flexibility, but comes with some challenges. Personally, I like the separation of concerns when it comes to downtime etc. That's why I never run a cron job as the main process.

Azure web app deployment using vscode is faster than devops pipeline

Currently, I am working on a Django-based project that is deployed to Azure App Service. When deploying to the App Service there were two options: one via Azure DevOps and another via the VS Code plugin. Both work fine, but strangely, deploying via DevOps is slower than deploying via VS Code. Via DevOps it usually takes around 17-18 minutes, whereas via VS Code it takes less than 14 minutes.
Is there any reason for this?
Assuming you're using Microsoft hosted build agents, the following statements are true:
With Microsoft-hosted agents, maintenance and upgrades are taken care of for you. Each time you run a pipeline, you get a fresh virtual machine. The virtual machine is discarded after one use.
and
Parallel jobs represents the number of jobs you can run at the same time in your organization. If your organization has a single parallel job, you can run a single job at a time in your organization, with any additional concurrent jobs being queued until the first job completes. To run two jobs at the same time, you need two parallel jobs.
Microsoft provides a free tier of service by default in every organization that includes at least one parallel job. Depending on the number of concurrent pipelines you need to run, you might need more parallel jobs to use multiple Microsoft-hosted or self-hosted agents at the same time.
This first statement might cause an Azure Pipeline to be slower because it does not have any cached information about your project. If you're only talking about deploying, the pipeline first needs to download (and extract?) an artifact to be able to deploy it. If you're also building, it might need to bring in the entire source code and/or external packages before being able to build.
The second statement might make it slower because there might be less parallelization possible than on the local machine.
Besides these two possible reasons, the agents will most probably not have the specs of your development machine, so they run tasks more slowly than your local machine can.
You could look into hosting your own agents to eliminate these possible reasons.
Do self-hosted agents have any performance advantages over Microsoft-hosted agents?
In many cases, yes. Specifically:
If you use a self-hosted agent, you can run incremental builds. For example, if you define a pipeline that does not clean the repo and does not perform a clean build, your builds will typically run faster. When you use a Microsoft-hosted agent, you don't get these benefits because the agent is destroyed after the build or release pipeline is completed.
A Microsoft-hosted agent can take longer to start your build. While it often takes just a few seconds for your job to be assigned to a Microsoft-hosted agent, it can sometimes take several minutes for an agent to be allocated depending on the load on our system.
More information: Azure Pipelines Agents
When you deploy via a DevOps pipeline, you go through many more steps:
Process the pipeline --> Request agents (wait for an available agent to be allocated to run the jobs) --> Download all the tasks needed to run the job --> Run each step in the job (download source code, restore, build, publish, deploy, etc.).
If you deploy the project in a release pipeline, the above process is repeated in the release pipeline.
You can check the document Pipeline run sequence for more information.
However, when you deploy via the VS Code plugin, your project is restored and built on your local machine, and then deployed to the Azure web app directly from your local machine. So deploying via the VS Code plugin is faster, since far fewer steps are needed.

Setting for running pipelines in sequence - Azure Devops

Is there a parameter or a setting for running pipelines in sequence in azure devops?
I currently have a single dev pipeline in my azure DevOps project. I use this for infrastructure because I build, test, and deploy using scripts in multiple stages in my pipeline.
My issue is that my stages are sequential, but my pipelines are not. If I run my pipeline multiple times back-to-back, agents will be assigned to every run and my deploy scripts will therefore run in parallel.
This is an issue if our developers commit close together because each commit kicks off a pipeline run.
You can reduce the number of parallel jobs to 1 in your project settings.
I swear there was a setting on the pipeline as well, but I can't find it. You could also make an API call as part of your build/release to pause and start the pipeline: pause as the first step and start as the last step. This will ensure the active pipeline is the only one running.
There is a new update to Azure DevOps that will allow sequential pipeline runs. All you need to do is add a lockBehavior parameter to your YAML.
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/sprint-190-update
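The underlying idea, that concurrent runs take an exclusive lock so only one deploy executes at a time, can be illustrated with a small Python sketch. This is not Azure DevOps code and says nothing about how the service implements its locking; `run_sequentially` and `deploy` are hypothetical names, and unlike a real queue, a plain `threading.Lock` does not guarantee that runs proceed in the order they were started.

```python
import threading
from typing import Callable, Iterable, List


def run_sequentially(run_ids: Iterable[int], deploy: Callable[[int], None]) -> List[int]:
    """Start one thread per pipeline run, but let only one deploy execute at a time.

    Returns the run ids in the order their deploys completed.
    """
    lock = threading.Lock()
    completed: List[int] = []

    def worker(run_id: int) -> None:
        with lock:  # exclusive lock: other runs wait here
            deploy(run_id)
            completed.append(run_id)

    threads = [threading.Thread(target=worker, args=(run_id,)) for run_id in run_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


if __name__ == "__main__":
    # Three runs triggered close together, as with back-to-back commits.
    order = run_sequentially([1, 2, 3], lambda run_id: print(f"deploying run {run_id}"))
    print("completion order:", order)
```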
Bevan's solution can achieve what you want, but it has a disadvantage: you need to change the number of parallel jobs manually, back and forth, if you sometimes need parallel jobs and at other times need to run in sequence. This is a little inconvenient.
Until now, there has been no direct configuration to prevent pipeline runs from running in parallel. But there is a workaround that uses a parameter to limit the agent used: you can set a demand in the pipeline.
Once it is set, you no longer need to change the number of parallel jobs back and forth. Just define the demand to limit the agent used. When the pipeline runs, it will pick up the matching agent to execute the pipeline.
However, this still has a disadvantage: it also limits job parallelism.
I think this feature should be added to Azure DevOps so that users have a better experience. You can raise the suggestion in our official Suggestion forum and then vote for it. Our product group and PMs will review it and consider adding it to the next quarter's roadmap.

Manage multiple systems for a job using Jenkins

I'm pretty new to Jenkins and I'm not sure if it can be used for the following:
We have multiple branches (let's say 5 branches)
For each branch we have to run a test suite that requires 2 servers, 1 linux client and 1 windows client
We want to share these resources between the jobs, say that we have a pool of 6 servers, 3 linux clients and 3 windows clients
Is it possible to manage this through Jenkins? The job can be kicked off through a simple shell script, but it has to "reserve" the resources and pass these as parameters to the shell script. It should also queue the test suite jobs if no resources are currently available.
I looked into the Jenkins basics but so far only found the "build slave" model, where you run jobs on managed clients. I haven't found any solution for managing multiple resources yet. Is that possible through Jenkins?
Thanks in advance!
Yes, this is possible through Jenkins. You need to assign labels to your slave nodes, so that jobs can be designed to use pairs of Linux and Windows machines. When configuring the job, enter the labels of your slaves in the "Label Expression" text box under "Restrict where this project can be run".
The help icon next to that field describes the various operators that can be used to build conditions.
Also take reference from here if you are new to Jenkins
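The reserve-or-queue behaviour described in the question can be sketched independently of Jenkins. The Python below is a minimal illustration, not a Jenkins feature or plugin: `ResourcePool`, `try_reserve`, `release`, and `schedule` are hypothetical names, and the resource names are made up. It uses the numbers from the question (6 servers, 3 linux clients, 3 windows clients; each job needs 2 servers, 1 linux client, and 1 windows client).

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Reservation = Dict[str, List[str]]


class ResourcePool:
    """Tracks free resources by kind and hands them out all-or-nothing."""

    def __init__(self, resources: Dict[str, List[str]]) -> None:
        self._free = {kind: list(names) for kind, names in resources.items()}

    def try_reserve(self, needs: Dict[str, int]) -> Optional[Reservation]:
        """Reserve everything in needs, or return None if anything is short."""
        if any(len(self._free.get(kind, [])) < count for kind, count in needs.items()):
            return None
        return {
            kind: [self._free[kind].pop(0) for _ in range(count)]
            for kind, count in needs.items()
        }

    def release(self, reserved: Reservation) -> None:
        """Return previously reserved resources to the pool."""
        for kind, names in reserved.items():
            self._free[kind].extend(names)


def schedule(
    pool: ResourcePool, jobs: Iterable[str], needs: Dict[str, int]
) -> Tuple[Dict[str, Reservation], List[str]]:
    """Start jobs in order while resources last; return (started, still queued)."""
    queue = deque(jobs)
    started: Dict[str, Reservation] = {}
    while queue:
        reserved = pool.try_reserve(needs)
        if reserved is None:
            break  # not enough resources: remaining jobs stay queued
        started[queue.popleft()] = reserved
    return started, list(queue)


if __name__ == "__main__":
    pool = ResourcePool({
        "server": [f"server-{i}" for i in range(1, 7)],
        "linux": [f"linux-{i}" for i in range(1, 4)],
        "windows": [f"windows-{i}" for i in range(1, 4)],
    })
    needs = {"server": 2, "linux": 1, "windows": 1}
    started, queued = schedule(pool, [f"branch-{i}" for i in range(1, 6)], needs)
    print("started:", sorted(started))
    print("queued:", queued)
```

The reserved names in each entry of `started` are what would be passed as parameters to the shell script; calling `release` when a job finishes frees them for the next queued job.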

Resources