Run Python Code within Azure Devops Pipeline and Export output to folder in Devops Repos ($(Build.SourcesDirectory)) - python-3.x

How could I rewrite this Python script so that it runs within an Azure DevOps pipeline and exports the dataframe as a CSV to the DevOps repository? I can do this locally but would like to do it remotely.
Put differently, how can I export a pandas dataframe to a folder in a DevOps repo as a CSV file using an Azure DevOps pipeline task? Below is the Python script that needs to run as a pipeline task.
local_path in this case should be an Azure DevOps path.
from azureml.core import Workspace, Dataset
local_path = 'data/prepared.csv'
dataframe.to_csv(local_path)

⚠️ You really should not do this. Azure Pipelines is for building code, not for processing data. (I am assuming you mean Azure DevOps Pipelines, as opposed to Azure ML Pipelines.)
You also should not commit data to your repository.
If you still want to proceed, here is an example of what you are trying to achieve. Note that for the last line, i.e. git push, you need to give the agent permission to write to the repository. See Run Git commands in a script for approximate ☹️ documentation on how to do that for your account.
trigger: none
pool:
  vmImage: 'ubuntu-latest'
steps:
- checkout: self
  persistCredentials: true
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.8'
    addToPath: true
    architecture: 'x64'
- script: |
    python your_data_generating_script.py
    git config --global user.email "you@example.com"
    git config --global user.name "Your Name"
    git add data/prepared.csv
    git commit -m'test commit'
    git push origin HEAD:master
  displayName: 'push data to master'
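For the script side, here is a minimal sketch of how your_data_generating_script.py might resolve the output path. It assumes the agent exposes Build.SourcesDirectory to scripts as the environment variable BUILD_SOURCESDIRECTORY (Azure Pipelines maps predefined variables to upper-case environment variables with dots replaced by underscores). The helper name resolve_output_path is hypothetical.

```python
import os
from pathlib import Path


def resolve_output_path(relative_path: str) -> Path:
    """Return an absolute path for relative_path inside the checked-out repo.

    Falls back to the current working directory when the script runs
    outside a pipeline (for example on a developer machine).
    """
    # BUILD_SOURCESDIRECTORY is set by the agent; locally it is usually absent.
    root = Path(os.environ.get("BUILD_SOURCESDIRECTORY", os.getcwd()))
    target = root / relative_path
    # Create the parent folder (data/ here) so that writing the CSV does not fail.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Usage inside the data-generating script (pandas assumed to be installed):
#   local_path = resolve_output_path("data/prepared.csv")
#   dataframe.to_csv(local_path)
```

Because the git add step in the pipeline uses the relative path data/prepared.csv, this only lines up if the script step runs from the repository root, which is the default for a single checkout.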

Related

Publish button on Azure Synapse using code

Hello guys, I'm currently working with Azure Synapse Studio. My situation is as follows:
I have 3 environments: Dev, Test and Prod. Each has an Azure Synapse workspace, but I can only access the Dev one. I need to make changes from Dev for the other 2 environments as well (SQL scripts, pipelines, etc.) and then publish them there without touching those environments directly.
So I think Azure DevOps could be the solution.
From the Dev Synapse Studio workspace I created 3 branches, one per environment, all linked to an Azure DevOps repo. Test and Prod are also linked to the same repo.
The problem is that the code in the Test and Prod workspaces could differ from the code in Dev, so I can't use the same ARM template (generated by publishing to the publish branch of the workspace) for all 3 environments. One option would be to trigger the Publish button on the other environments without using the portal, for example through a REST API. Is that possible?
So far I have only set up the 3-branch solution so that I can manage the 3 environments directly from Dev, but I doubt this is the right solution. Are changes applied in the other environments? Can I run SQL scripts or pipelines manually from the other environments?
This is my current situation. On the other environments I asked for the collaboration and publish branches to be set to the same value as the environment branch name (test-test-test and prod-prod-prod).
With the new version (V2) of the Synapse workspace deployment task (in preview as of 2022-06), it is now possible to deploy from any branch using Azure DevOps, so there is no need for a workspace_publish branch or the Publish button.
Just make the object JSON files available as artifacts to the release pipeline, and select "Validate and deploy" as the Operation Type.
I am working with Microsoft directly, building a Synapse warehouse myself for a large corporation. We have the same issue, in that the Publish button must be pressed manually for the ARM templates to be generated. Microsoft have confirmed that there is no automatic method for this available right now; we had hoped to receive a Preview AzDevOps deployment task this month, but it turns out that it simply allows us to validate the JSON assets - it still deploys using the ARM template.
We have also looked at using Azure Data Factory tools to deploy from the JSON component files, but we run into issues with the dedicated pool stored procedure tasks being unsupported. :(
The only standard option to achieve this is to create a GitHub repository, set up continuous integration, and either create a self-hosted Azure DevOps VM agent or use an Azure DevOps hosted agent.
You can then set up release pipelines in Azure DevOps to work with the different environments. You still need to commit the changes to the GitHub repository for each environment, though; there is no equivalent of the Publish button.
Refer to Continuous integration and delivery for an Azure Synapse Analytics workspace for more details.
This was bothering me as well, so I put together the following, to be run once any PR is approved to merge into the Synapse collaboration branch (in our case, "main").
For your case, you can modify it to target the relevant workspaces.
See the Azure DevOps pipeline code below.
What it does:
- Runs the Synapse workspace validation task, which also generates the workspace template JSONs as an artifact; these need to be published to the workspace_publish branch.
- Checks out your publish branch, then commits and pushes the templates generated by the previous task.
- Updates the workspace configuration to reflect the latest commit ID from the workspace COLLABORATION branch (main in this example) that was used to generate what was pushed to the PUBLISH branch in the previous step. This is needed so that the workspace UI does not think there are unpublished changes when you click the "Publish" button.
Any suggestions/improvements welcome. Hope this helps.
name: $(TeamProject)_$(Build.DefinitionName)_$(SourceBranchName)_$(Date:yyyyMMdd)$(Rev:.r) # sets Build.BuildNumber
trigger:
  branches:
    include:
    - main
  paths:
    include:
    - synapse/*
resources:
  repositories:
  - repository: 'Synapse-Publish'
    type: git
    name: Synapse # update to the name of your repo
    ref: workspace_publish # update to the name of your synapse PUBLISH branch
variables:
  repoName: $(Build.Repository.Name)
  azureSubscription: your_subscription
  azureTenantId: your_tenant_guid
  adoOrg: your_azure_devops_org_name
  adoProject: your_azure_devops_project_name
  SourceWorkspaceName: your_synapse_workspace_name
  workspacePublishBranch: workspace_publish # should be the same for you but update if not
stages:
- stage: build_stage
  displayName: Build, Run Validations, Publish NonProd if merged to main
  jobs:
  # other jobs excluded from this snippet
  - job: publish_workspace_artifacts_job
    displayName: Publish for $(SourceWorkspaceName) $(workspacePublishBranch)
    # only kick off workspace publish job for non-PR builds
    condition: and(not(or(failed(), canceled())), ne(variables['Build.Reason'], 'PullRequest'))
    pool:
      name: 'linux-vmss' # update this for whatever you need
    steps:
    - checkout: self # main
      clean: true
      persistCredentials: true
    - task: Synapse workspace deployment@2
      displayName: Generate workspace artifact templates
      condition: true
      continueOnError: false
      inputs:
        operation: 'validate' # despite this name, it also generates the templates
        ArtifactsFolder: '$(Build.SourcesDirectory)/$(repoName)/synapse'
        TargetWorkspaceName: $(SourceWorkspaceName)
    - checkout: 'Synapse-Publish' # workspace_publish
      clean: true
      persistCredentials: true
    - task: CmdLine@2
      displayName: 'Set git user'
      inputs:
        workingDirectory: '$(System.DefaultWorkingDirectory)'
        failOnStderr: true
        script: |
          git config --global user.email "whatever.you.want@your_org.com"
          git config --global user.name "Whatever You Want"
    - task: AzurePowerShell@5
      displayName: Publish to $(SourceWorkspaceName) $(workspacePublishBranch)
      condition: true
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: InlineScript
        Inline: |
          # the output from the workspace validate step above are saved here, also published as artifact with name = the synapse workspace name
          # Get-ChildItem $(Build.SourcesDirectory)/ExportedArtifacts -Name
          cd $(Build.SourcesDirectory)/$(repoName)
          git pull origin $(workspacePublishBranch)
          git switch $(workspacePublishBranch)
          Move-Item -Path $(Build.SourcesDirectory)/ExportedArtifacts/*.json -Destination $(Build.SourcesDirectory)/$(repoName)/$(SourceWorkspaceName) -Force -Verbose
          git add $(Build.SourcesDirectory)/$(repoName)/$(SourceWorkspaceName)/*.json
          $diff = git diff --cached
          $status = git status
          if (!($status.ToLower() -like "*nothing to commit*"))
          {
            echo "##[section]git push changes to repo";
            git commit -m "Update $(workspacePublishBranch) for source workspace $(SourceWorkspaceName) [skip ci]";
            git pull --rebase;
            git push origin $(workspacePublishBranch);
          }
          else
          {
            echo "##[warning]No new changes to push for source workspace $(SourceWorkspaceName) templates";
            git reset --hard origin/$(workspacePublishBranch)
            git clean -fxd
          }
        azurePowerShellVersion: 'LatestVersion'
    - task: AzurePowerShell@5
      displayName: Update $(SourceWorkspaceName) Git Config # this is required so when you click "Publish" within the workspace it doesn't think there are any changes vs. what's already published
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: InlineScript
        Inline: |
          # get latest version of this module which now has the LastCommitId parameter that we need
          Install-Module -Name Az.Synapse -Confirm:$false -RequiredVersion 1.5.0 -Force
          Import-Module -Name Az.Synapse -MinimumVersion 1.5.0
          cd $(Build.SourcesDirectory)/$(repoName)
          [String] $latestCommitHash = git log -n 1 origin/main --pretty=format:"%H" # format to get only the hash value of the latest commit
          $config = New-AzSynapseGitRepositoryConfig `
            -RepositoryType AzureDevOpsGit `
            -TenantId $(azureTenantId) `
            -AccountName $(adoOrg) `
            -ProjectName $(adoProject) `
            -RepositoryName $(repoName) `
            -CollaborationBranch main `
            -RootFolder "/synapse" `
            -LastCommitId $latestCommitHash
          echo "##[section] Updating $(SourceWorkspaceName) git configuration to point to the latest main branch commit ID"
          # see https://learn.microsoft.com/en-us/powershell/module/az.synapse/update-azsynapseworkspace?view=azps-8.0.0
          Update-AzSynapseWorkspace -Name $(SourceWorkspaceName) -GitRepository $config
        azurePowerShellVersion: 'LatestVersion'
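The commit-or-skip decision in the publish step matches on the human-readable text of git status. As a rough alternative sketch, written in Python rather than the PowerShell used in the pipeline, the same decision can be based on the machine-readable output of git status --porcelain. The function names are hypothetical, and the sketch assumes git is on the agent's PATH and that the files have already been staged with git add.

```python
import subprocess


def has_staged_changes(porcelain_output: str) -> bool:
    """Return True if any line of `git status --porcelain` shows a staged change.

    In the short format the first column is the index (staged) status;
    a space, '?' or '!' there means nothing is staged for that path.
    """
    for line in porcelain_output.splitlines():
        if line and line[0] not in (" ", "?", "!"):
            return True
    return False


def commit_and_push_if_changed(repo_dir: str, branch: str, message: str) -> bool:
    """Commit and push staged changes; return False when there is nothing staged."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    if not has_staged_changes(status):
        return False
    # Same sequence as the PowerShell branch above: commit, rebase, push.
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    subprocess.run(["git", "pull", "--rebase"], cwd=repo_dir, check=True)
    subprocess.run(["git", "push", "origin", branch], cwd=repo_dir, check=True)
    return True
```

Checking the porcelain format avoids depending on the exact wording of the "nothing to commit" message, which is not meant to be parsed by scripts.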

Azure Pipelines. Run script from resource repo

I have a YAML file for an Azure pipeline in one repo, and I need to run a PowerShell script from a different repo.
As far as I understand, I can add the other repo to the resources section of the YAML and then use task: ShellScript@2 with the scriptPath parameter. But as I understand it, that path is resolved relative to the repo containing the YAML, and I'm not sure how to access a file from a different repo.
Yes, you have to use a repository resource and check out that repo as follows:
resources:
  repositories:
  - repository: devops
    type: github
    name: kmadof/devops-templates
    endpoint: kmadof
steps:
- checkout: self
- checkout: devops
- task: ShellScript@2
  inputs:
    scriptPath: $(Agent.BuildDirectory)/devops/scripts/some-script.sh

Is it possible to checkout Gitlab repository in YML which sits in Github?

So I'm trying to learn deployment with Azure DevOps. I have an Angular app sitting in GitLab which already has a CI/CD pipeline with Jenkins to a Kubernetes cluster, so I was thinking of doing the same with Azure DevOps via YAML. According to the Azure docs, that is not possible directly from GitLab.
So what I'm trying to do is create a CI pipeline from GitHub which checks out the UI repo from GitLab and builds it for deployment.
I have created a repository resource in my pipeline YAML file below. Azure gives me an error saying:
Repository JpiPipeline references endpoint https://gitlab.com/myusername/myUiRepo.git which does not exist or is not authorized for use
trigger:
- master
resources:
  repositories:
  - repository: UiPipeline. #alias
    type: git
    name: repository_name
    # ref: refs/heads/master # ref name to use; defaults to 'refs/heads/master'
    endpoint: https://@gitlab.com/myusername/myUiRepo.git # <-- Is this possible
stages:
- stage: Checkout
  jobs:
  - job: Build
    pool:
      vmImage: 'Ubuntu-16.04'
    continueOnError: true
    steps:
    - checkout: JpiPipeline
    - script: echo "hello to my first Build"
The gitlab repository type is not supported in YAML pipelines yet. The currently supported types are git, github, and bitbucket; see supported types.
The workaround to get the GitLab repo sources is to run git commands inside a script task.
In the example YAML pipeline below:
Use - checkout: none to avoid checking out the GitHub source.
Use git clone https://username:password@gitlab.com/useraccount/reponame.git to clone the GitLab repo inside a script task.
stages:
- stage: Checkout
  jobs:
  - job: Build
    pool:
      vmImage: 'Ubuntu-latest'
    steps:
    - checkout: none
    - script: |
        git clean -ffdx
        git clone https://username:password@gitlab.com/useraccount/reponame.git
        #if your password or username contain @ replace it with %40
Your GitLab repo will be cloned to the folder $(system.defaultworkingdirectory)/reponame.
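The %40 substitution mentioned in the script comment is ordinary URL percent-encoding. Here is a small sketch of building such a clone URL in Python; the function name build_clone_url is hypothetical, and in a real pipeline the credentials would presumably come from secret variables rather than literals.

```python
from urllib.parse import quote


def build_clone_url(username: str, password: str, host: str, repo_path: str) -> str:
    """Build an HTTPS clone URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character,
    # including '@' (-> %40) and '/' (-> %2F).
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"https://{user}:{secret}@{host}/{repo_path}"


# Example with made-up credentials:
url = build_clone_url("me@example.com", "p@ss/word", "gitlab.com", "useraccount/reponame.git")
print(url)
# https://me%40example.com:p%40ss%2Fword@gitlab.com/useraccount/reponame.git
```

Encoding the whole username and password this way also covers other reserved characters such as ':' and '/', not just the at sign.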
Another workaround is to use a classic UI pipeline; the GitLab repo type is supported there.
You can choose Use the classic editor to create a classic UI pipeline.
When you get to the Select a source page, choose Other Git and click Add connection to add your GitLab repo URL. The pipeline will then get the sources from your GitLab repo.

Azure build pipeline

When I run the build pipeline I get ##[error]File not found: 'git'. I have an agent running on a server, and I installed Git on that server. The pipeline uses this agent and is tied to an Azure repo. I am using a simple script, shown below. Please advise.
trigger:
- master
pool: 'build agent'
vmImage: 'ubuntu-latest'
steps:
- script: echo Hello, world!
  displayName: 'Run a one-line script'
- script: |
    echo Add other tasks to build, test, and deploy your project.
    echo See https://aka.ms/yaml
  displayName: 'Run a multi-line script'
Here is all that you have to do if you want to create your Azure Pipeline:
Browse to Azure Pipelines and click on New Pipeline
Select Azure Repo when asked about the source of your codebase
Select your repository
Review your Pipeline YAML and click on Run
And voila, you have your first build running!
For customizing your build pipeline further, please check the various built-in build and release tasks.
Here is the YAML schema for your reference.

Is it possible to reference files inside Azure DevOps pipeline templates when these templates reside in a standalone repo?

I'm setting up several pipelines in Azure DevOps. To make my team's life easier, I'm using job templates.
These job templates live in a repository of their own.
For every pipeline I define the repository to get the templates from.
Some tasks in these templates run PowerShell code, and I want this code to be in a script file, so that it is reusable and stored in the same repo as the template.
When the pipeline runs, the template is embedded, and it tries to locate the PowerShell script inside the project repo actually being built/deployed.
How can I achieve this?
The workaround is to have inline code, which I really don't want.
Any constructive answer will be much appreciated.
Thanks
After some digging I couldn't find any way to specify a script file as the source for a PowerShell task in a template.
Inside pipeline definition:
resources:
  repositories:
  - repository: templates
    type: git
    name: deploy-templates
variables:
  artifactName: 'Trade Data ETL - $(Build.SourceBranchName)'
stages:
- stage: Build
  displayName: Build
  variables:
  - group: DEV-Credential-Group
  - group: COMMON-Settings-Group
  jobs:
  - template: ssis/pipelines/stage-build.yml@templates # Template reference
    parameters:
      artifactName: '$(artifactName)'
Inside template file:
- task: PowerShell@2
  inputs:
    filePath: ssis/pipelines/scripts/build-ssis-project.ps1
    arguments: '-ProjectToBuild "tradedata-ldz-ssis/tradedata-ldz-ssis.dtproj"'
    pwsh: true
Update 2021
According to learn.microsoft.com, you can now also check out multiple repositories without custom scripting.
If you check out more than one repository, a separate folder containing the repository is created below $(Build.SourcesDirectory).
You can define multiple repositories like this:
resources:
  repositories:
  - repository: devops
    type: git
    name: DevOps
    ref: main
  - repository: infrastructure
    type: git
    name: Infrastructure
    ref: main
And in the steps simply check them out as follows:
steps:
- checkout: self
- checkout: devops
- checkout: infrastructure
# List all available repositories
- script: ls
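As a rough illustration of how a script could then locate a file in one of those checkouts: my understanding of the multi-checkout layout is that each repository lands in a folder named after the repository, directly under $(Build.SourcesDirectory). Treat that as an assumption and confirm it with the ls step above. The helper name path_in_checkout and the example file path are hypothetical.

```python
import os
from pathlib import Path


def path_in_checkout(repo_name: str, relative_path: str) -> Path:
    """Return the expected location of a file inside a checked-out repository.

    Assumes a job with more than one checkout step, where each repository
    is placed in <Build.SourcesDirectory>/<repo_name>.
    """
    # The agent exposes Build.SourcesDirectory as BUILD_SOURCESDIRECTORY;
    # fall back to the current directory when run outside a pipeline.
    sources_dir = Path(os.environ.get("BUILD_SOURCESDIRECTORY", os.getcwd()))
    return sources_dir / repo_name / relative_path


# Usage (repository name taken from the resources block above):
#   script = path_in_checkout("DevOps", "scripts/build.ps1")
```

With a single checkout step the repository contents sit directly in the sources directory, so the extra repo_name segment would not apply there.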
Original Answer
Currently, resources only supports YAML files in other repositories. However, you could simply check out the repository in a task and then run the desired PowerShell script.
steps:
- task: PowerShell@2
  inputs:
    targetType: inline
    script: |
      git clone -b <your-desired-branch> https://azuredevops:$($env:token)@dev.azure.com/<your-organization>/<your-project>/_git/<your-repository> <target-folder-name>
      ./<target-folder-name>/foo.ps1
  env:
    token: $(System.AccessToken)
This script would check out an arbitrary branch and execute a script foo.ps1 in the root of the target repository.
Call - checkout: templates inside the template file. This might only work when you insert a template, but it successfully sees the repository resource and pulls it down.
You can copy the script files from the source directory. Currently, you have not specified the root folder:
ssis/pipelines/scripts/build-ssis-project.ps1
Assuming you are building on a repo where the PowerShell script resides, try:
- task: PowerShell@1
  inputs:
    scriptName: '$(ScriptsDir)/ssis/pipelines/scripts/build-ssis-project.ps1'
Pass the value of ScriptsDir, which could be the build source directory or the build working directory.
