Another Azure ML bug caused by new Compute Common Runtime

Many of my Azure ML Studio Designer pipelines began failing today. I was able to make a minimum repro:
Simply excluding columns with the Select Columns In Dataset node will fail with a JobConfigurationMaxSizeExceeded error.
This appears to be a bug introduced by Microsoft's rollout of their new Compute Common Runtime.
If I go into any node failing with the JobConfigurationMaxSizeExceeded exception and manually set AZUREML_COMPUTE_USE_COMMON_RUNTIME:false in its Environment JSON field, it subsequently runs correctly. This is not documented anywhere that I could find; I stumbled on the fix through trial and error, and I wasted many hours today trying to repair our failing pipelines.
Does anyone know where I can find a list of the possible effects of the Compute Common Runtime migration in Azure ML? I could not find any documentation on the migration or on how it might affect existing Azure ML pipelines.

The runtime environment variable should be set on the run configuration (RunConfiguration.environment_variables); the corresponding property on the Environment object is deprecated:
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.runconfig.runconfiguration?view=azure-ml-py#variables
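For what it's worth, if the affected work can be submitted as a standalone job rather than a Designer node, the same variable can also be set per job. Below is a minimal sketch using an Azure ML CLI v2 job YAML, which is a different route from the v1 RunConfiguration linked above; the script, code folder, environment and compute names are placeholders:

# command-job.yml - opts a single job out of the Common Runtime.
# The code folder, environment and compute names below are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py
code: ./src
environment: azureml:my-training-env@latest
compute: azureml:cpu-cluster
environment_variables:
  AZUREML_COMPUTE_USE_COMMON_RUNTIME: "false"

Submit it with az ml job create --file command-job.yml (plus the usual --resource-group / --workspace-name arguments).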

Related

Azure DevOps artefact retention

I’ve got a mono repo, which has 10 separate CICD pipelines written in yaml.
I’ve noticed lately that we’ve lost a vast number of runs, and some of them had successful production releases.
Am I right in thinking that the project retention settings apply to all pipelines, rather than to individual ones?
I’ve been reading the Microsoft docs, and I think that in order to retain runs going forward I have to use the API via a PowerShell script.
I assume the said script needs to run after a successful deployment to production.
I’m quite surprised that there isn’t a global option to say ‘keep all production releases’
The project-level retention policy settings apply to all pipeline runs, not to individual pipelines, so you cannot use them to retain specific successful production releases directly.
To achieve this, you can use a PowerShell script with a condition to retain those specific runs. Add the PowerShell script as the last task of your deployment, so it runs only when the release should be retained. Refer to this official doc: https://learn.microsoft.com/en-us/azure/devops/pipelines/build/run-retention?view=azure-devops
Here is an example that retains a run forever, based on a condition:
- powershell: |
    $contentType = "application/json";
    # Hashtables use @{ }; the Bearer token comes from the pipeline's System.AccessToken.
    $headers = @{ Authorization = 'Bearer $(System.AccessToken)' };
    $rawRequest = @{ daysValid = 365000; definitionId = $(System.DefinitionId); ownerId = 'User:$(Build.RequestedForId)'; protectPipeline = $false; runId = $(Build.BuildId) };
    $request = ConvertTo-Json @($rawRequest);
    $uri = "$(System.CollectionUri)$(System.TeamProject)/_apis/build/retention/leases?api-version=6.0-preview.1";
    Invoke-RestMethod -uri $uri -method POST -Headers $headers -ContentType $contentType -Body $request;
  displayName: 'PowerShell Script'
  condition: <your custom condition>   # see the example below
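For instance, a concrete condition could limit the lease to runs where everything so far succeeded and the build came from the production branch (the branch name here is just an illustration; use whatever marks a production release in your setup):

# Drop-in replacement for the condition placeholder above.
condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))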

Any tool to check the result of a migration from TFS 2015 to Azure DevOps, to ensure that everything was successfully migrated?

We are planning to migrate from TFS 2015 to Azure DevOps, and the task assigned to me is to find a way to compare, after the migration, what we have on TFS with what is in Azure DevOps, to ensure that all the tasks, bugs, etc. were successfully migrated. I've checked the Azure migration guide and found nothing about this kind of post-migration checking and comparison. Is there any tool for this, or can we only do the whole check and comparison manually?
There is no such tool.
I have never experienced a partial migration. Due to the way the import works that is also VERY unlikely. Either the complete import fails, or the data is going to be there. I've done many of these migrations as well as server migrations/upgrades and the kind of data-loss you're worried about has never happened.
The one thing you'd need to be careful of are the changes to the retention policies.

Compile time vs Runtime Azure Pipelines

I come across the terms "compile time" and "run time" pretty much everywhere when I learn about Azure Pipelines.
However, I still haven't found a clear explanation of them.
I have found this page in Microsoft's documentation, but it doesn't explain these terms very clearly.
I would be happy if someone could explain these terms in the context of the whole
run sequence of Azure Pipelines.
Thanks!
When you use YAML Azure DevOps pipelines, your pipeline is defined as code. Compile time happens before runtime: you can pass parameters into your YAML before it is compiled (parsed, really), and expressions are evaluated and substituted into the YAML before any task starts. At runtime, the "compiled" YAML will, for example, read variables from your Azure DevOps pipeline.
Here is an example from the Microsoft docs:
https://learn.microsoft.com/en-us/azure/devops/pipelines/process/expressions?view=azure-devops
Expressions are probably where the difference between compile time and run time shows up most.
There is also a pretty nice article about this:
https://adamtheautomator.com/azure-devops-variables-complete-guide/
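To make the difference concrete, here is a minimal sketch of a pipeline that uses both kinds of expansion (the parameter and variable names are made up):

parameters:
- name: configuration
  type: string
  default: Release

variables:
  buildLabel: 'nightly'

steps:
# ${{ }} is a template (compile-time) expression: it is resolved while the
# YAML is being parsed, before any agent picks up a job.
- script: echo "Compile-time value ${{ parameters.configuration }}"
  displayName: 'Compile-time expansion'

# $( ) is macro syntax: it is substituted just before the task runs, so it can
# see values that only exist at runtime, such as predefined agent variables.
- script: echo "Runtime values $(buildLabel) and $(Agent.OS)"
  displayName: 'Runtime expansion'

If you override the parameter when queuing the run, the first echo already has the new value baked into the expanded YAML, whereas $(buildLabel) and $(Agent.OS) are only substituted when the script step executes.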

Data Factory DevOps SSIS-IntegrationRuntime

We're planning to use CI/CD pipelines for Data Factory.
In one of our pipelines we call SSIS packages. To call an SSIS package you need to specify the Azure-SSIS IR that must be used.
The Azure-SSIS IR has a different name in every environment.
It is not possible to set this value dynamically (the option "Add dynamic content [Alt+P]" is not available on this field).
Is there a simple solution to change the Azure-SSIS IR during the deployment?
Thanks in advance
Your linked services aren't named by environment, are they? (They most definitely should not be.)
The default out of the box cloud runtime is also not named by environment.
Your runtimes should not be named by environment either.
IMHO your naming convention is incorrect. You should challenge it - there's no reason to include an environment designator in any runtime names.
Yes, your parent data factory should definitely have a different name per environment. That's where the distinction is made. Your runtimes should not.
In direct answer to your question: the way I have dealt with this in the past is to add a PowerShell script task to the build part of DevOps that transforms the deployment asset (basically a find/replace on the IR name) and then delivers the result as a build artifact.
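As a rough sketch of that approach (the template path, the dev IR name and the variable carrying the target name are all assumptions; adjust them to whatever your factory export actually contains):

steps:
- powershell: |
    # Assumed location of the exported ADF ARM template in the sources.
    $templatePath = "$(Build.SourcesDirectory)/adf/ARMTemplateForFactory.json"
    $content = Get-Content -Path $templatePath -Raw
    # Swap the dev Azure-SSIS IR name for the name used in the target environment.
    $content = $content -replace 'ssis-ir-dev', '$(SsisIntegrationRuntimeName)'
    Set-Content -Path $templatePath -Value $content
  displayName: 'Replace Azure-SSIS IR name'

# Hand the transformed template to the release as a build artifact.
- publish: $(Build.SourcesDirectory)/adf
  artifact: adf-templates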

Azure Data Factory V2 multiple environments like in SSIS

I'm coming from a long SSIS background, and we're looking to use Azure Data Factory v2, but I'm struggling to find any (clear) way of working with multiple environments. In SSIS we would have project parameters tied to the Visual Studio project configuration (e.g. development/test/production etc...), and if there were 2 parameters for SourceServerName and DestinationServerName, these would point to different servers depending on whether we were in development or test.
From my initial playing around I can't see any way to do this in Data Factory. I've searched Google, of course, but any information I've found seems to be about CI/CD, then talks about Git 'branches', and is difficult to follow.
I'm basically looking for a very simple explanation and example of how this would be achieved in Azure data factory v2 (if it is even possible).
It works differently. You create an instance of data factory per environment and your environments are effectively embedded in each instance.
So here's one simple approach:
Create three data factories: dev, test, prod
Create your linked services in the dev environment pointing at dev sources and targets
Create the same named linked services in test, but of course these point at your test systems
Now when you "migrate" your pipelines from dev to test, they use the same logical name (just like a connection manager)
So you don't designate an environment at execution time or map variables or anything... everything in test just runs against test, because that's the way the linked services have been defined.
That's the first step.
The next step is to connect only the dev ADF instance to Git. If you're a newcomer to Git it can be daunting but it's just a version control system. You save your code to it and it remembers every change you made.
Once your pipeline code is in git, the theory is that you migrate code out of git into higher environments in an automated fashion.
If you go through the links provided in the other answer, you'll see how you set it up.
I do have an issue with this approach though: you have to look up all of your environment values in a keystore, which to me is silly, because why do we need to specify the test server's hostname every time we deploy to test?
One last thing: if you have a pipeline that doesn't use a linked service (say a REST pipeline), I haven't found a way to make it environment aware. I ended up building logic around the current data factory's name to dynamically change endpoints.
This is a bit of a brain dump, but feel free to ask questions.
Although it's not recommended - yes, you can do it.
Take a look at the Linked Service; in this case, I have a connection to an Azure SQL Database.
You can use dynamic content for both the server name and the database name.
Just add a parameter to your pipeline, pass it to the Linked Service, and use it in the required field.
Let me know whether I explained it clearly enough.
Yes, it's possible, although not as simple as it was in Visual Studio for SSIS.
1) First of all: there is no desktop application for developing ADF, only the browser.
Therefore developers should make their changes in the DEV environment and, for many reasons, the best way to do that is to work with a Git repository connected.
2) Then you "only" need to:
a) publish the changes (this creates/updates the adf_publish branch in Git)
b) with Azure DevOps, deploy the code from adf_publish, replacing the required parameters for the target environment.
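A minimal sketch of step b), assuming an ARM template deployment of the adf_publish output (the service connection, resource group, factory folder and parameter names are placeholders):

steps:
- checkout: self   # assumes this pipeline is pointed at the adf_publish branch

- task: AzureResourceManagerTemplateDeployment@3
  displayName: 'Deploy ADF ARM template to test'
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: 'my-azure-service-connection'   # placeholder
    subscriptionId: '$(SubscriptionId)'
    resourceGroupName: 'rg-adf-test'                                # placeholder
    location: 'West Europe'
    csmFile: '$(Build.SourcesDirectory)/my-dev-adf/ARMTemplateForFactory.json'
    csmParametersFile: '$(Build.SourcesDirectory)/my-dev-adf/ARMTemplateParametersForFactory.json'
    overrideParameters: >-
      -factoryName "adf-test"
      -AzureSqlLinkedService_connectionString "$(TestSqlConnectionString)"

The generated ARMTemplateParametersForFactory.json lists every parameter ADF exposes, so you can see exactly which values need an environment-specific override; -factoryName is the one that points the deployment at the test factory.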
I know that at the beginning it sounds horrible, but the sooner you set up an environment like this the more time you save while developing pipelines.
How to do these things step by step?
I describe all the steps in the following posts:
- Setting up Code Repository for Azure Data Factory v2
- Deployment of Azure Data Factory with Azure DevOps
I hope this helps.
