Hello guys, I'm currently working with Azure Synapse Studio. My situation is as follows:
I have 3 environments: Dev, Test and Prod. Each has an Azure Synapse workspace, but I can only access the Dev one. I need to make changes from Dev for the other 2 environments as well (SQL scripts, pipelines, etc.) and then publish them to those environments without touching them directly.
So I think Azure DevOps can be the solution.
From the Dev Synapse Studio workspace I created 3 branches, 1 per environment, all linked to an Azure DevOps repo. The Test and Prod workspaces are linked to the same repo.
The problem is that the code on the Test and Prod workspaces could be different from the code on Dev, so I can't use the same ARM template (generated by publishing to the publish branch of the workspace) for all 3 environments. One option would be to trigger the Publish button on the other environments without using the portal, for example through a REST API. Is that possible?
For now I have only set up the 3-branch solution so that I can manage the 3 environments directly from Dev, but I don't think this is the right solution. Are changes applied on the other environments? Can I run SQL scripts or pipelines manually from the other environments?
This is my current situation: on the other environments I asked for the collaboration and publish branches to be set to the same value as the environment branch name (test-test-test and prod-prod-prod).
With the new version (V2) of the Synapse workspace deployment task (in preview as of 2022-06), it is now possible to deploy from any branch using Azure DevOps, so there is no need for a workspace_publish branch or the Publish button.
Just make the object JSON files available as artifacts to the release pipeline, and select "Validate and deploy" as the Operation Type.
I am working with Microsoft directly, building a Synapse warehouse myself for a large corporation. We have the same issue, in that the Publish button must be pressed manually for the ARM templates to be generated. Microsoft have confirmed that there is no automatic method for this available right now; we had hoped to receive a Preview AzDevOps deployment task this month, but it turns out that it simply allows us to validate the JSON assets - it still deploys using the ARM template.
We have also looked at using Azure Data Factory tools to deploy from the JSON component files, but we run into issues with the dedicated pool stored procedure tasks being unsupported. :(
The only standard option to achieve this is to create a GitHub repository, set up continuous integration, and use either a self-hosted Azure DevOps VM agent or an Azure DevOps hosted agent.
You can then set up release pipelines in Azure DevOps to work with the different environments. You still need to commit the changes to the GitHub repository for each environment, though; there is no equivalent of the Publish button for this.
Refer to "Continuous integration and delivery for an Azure Synapse Analytics workspace" for more details.
This was bothering me as well, so I put together the following, to be run once any PR is approved to merge into the Synapse collaboration branch (in our case, "main").
For your case, you can modify it to target the relevant workspaces.
See the Azure DevOps pipeline code below.
What it does:
1. Runs the Synapse workspace validation task, which also generates the workspace template JSON files as an artifact. These need to be published to the workspace_publish branch.
2. Checks out your publish branch, then commits and pushes the templates generated by the previous task.
3. Updates the workspace Git configuration to reflect the latest commit ID from the workspace COLLABORATION branch (main in this example) that was used to generate what was pushed to the PUBLISH branch in the previous step. This is so that the workspace UI does not think there are unpublished changes when you click the "Publish" button.
Any suggestions/improvements welcome. Hope this helps.
name: $(TeamProject)_$(Build.DefinitionName)_$(SourceBranchName)_$(Date:yyyyMMdd)$(Rev:.r) # sets Build.BuildNumber
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - synapse/*
resources:
  repositories:
    - repository: 'Synapse-Publish'
      type: git
      name: Synapse # update to the name of your repo
      ref: workspace_publish # update to the name of your synapse PUBLISH branch
variables:
  repoName: $(Build.Repository.Name)
  azureSubscription: your_subscription
  azureTenantId: your_tenant_guid
  adoOrg: your_azure_devops_org_name
  adoProject: your_azure_devops_project_name
  SourceWorkspaceName: your_synapse_workspace_name
  workspacePublishBranch: workspace_publish # should be the same for you but update if not
stages:
  - stage: build_stage
    displayName: Build, Run Validations, Publish NonProd if merged to main
    jobs:
      # other jobs excluded from this snippet
      - job: publish_workspace_artifacts_job
        displayName: Publish for $(SourceWorkspaceName) $(workspacePublishBranch)
        # only kick off workspace publish job for non-PR builds
        condition: and(not(or(failed(), canceled())), ne(variables['Build.Reason'], 'PullRequest'))
        pool:
          name: 'linux-vmss' # update this for whatever you need
        steps:
          - checkout: self # main
            clean: true
            persistCredentials: true
          - task: Synapse workspace deployment@2
            displayName: Generate workspace artifact templates
            condition: true
            continueOnError: false
            inputs:
              operation: 'validate' # despite this name, it also generates the templates
              ArtifactsFolder: '$(Build.SourcesDirectory)/$(repoName)/synapse'
              TargetWorkspaceName: $(SourceWorkspaceName)
          - checkout: 'Synapse-Publish' # workspace_publish
            clean: true
            persistCredentials: true
          - task: CmdLine@2
            displayName: 'Set git user'
            inputs:
              workingDirectory: '$(System.DefaultWorkingDirectory)'
              failOnStderr: true
              script: |
                git config --global user.email "whatever.you.want@your_org.com"
                git config --global user.name "Whatever You Want"
          - task: AzurePowerShell@5
            displayName: Publish to $(SourceWorkspaceName) $(workspacePublishBranch)
            condition: true
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: InlineScript
              Inline: |
                # the outputs from the workspace validate step above are saved here, also published as an artifact with name = the synapse workspace name
                # Get-ChildItem $(Build.SourcesDirectory)/ExportedArtifacts -Name
                cd $(Build.SourcesDirectory)/$(repoName)
                git pull origin $(workspacePublishBranch)
                git switch $(workspacePublishBranch)
                Move-Item -Path $(Build.SourcesDirectory)/ExportedArtifacts/*.json -Destination $(Build.SourcesDirectory)/$(repoName)/$(SourceWorkspaceName) -Force -Verbose
                git add $(Build.SourcesDirectory)/$(repoName)/$(SourceWorkspaceName)/*.json
                $diff = git diff --cached
                $status = git status
                if (!($status.ToLower() -like "*nothing to commit*"))
                {
                  echo "##[section]git push changes to repo";
                  git commit -m "Update $(workspacePublishBranch) for source workspace $(SourceWorkspaceName) [skip ci]";
                  git pull --rebase;
                  git push origin $(workspacePublishBranch);
                }
                else
                {
                  echo "##[warning]No new changes to push for source workspace $(SourceWorkspaceName) templates";
                  git reset --hard origin/$(workspacePublishBranch)
                  git clean -fxd
                }
              azurePowerShellVersion: 'LatestVersion'
          - task: AzurePowerShell@5
            displayName: Update $(SourceWorkspaceName) Git Config # this is required so when you click "Publish" within the workspace it doesn't think there are any changes vs. what's already published
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: InlineScript
              Inline: |
                # get latest version of this module which now has the LastCommitId parameter that we need
                Install-Module -Name Az.Synapse -Confirm:$false -RequiredVersion 1.5.0 -Force
                Import-Module -Name Az.Synapse -MinimumVersion 1.5.0
                cd $(Build.SourcesDirectory)/$(repoName)
                [String] $latestCommitHash = git log -n 1 origin/main --pretty=format:"%H" # format to get only the hash value of the latest commit
                $config = New-AzSynapseGitRepositoryConfig `
                  -RepositoryType AzureDevOpsGit `
                  -TenantId $(azureTenantId) `
                  -AccountName $(adoOrg) `
                  -ProjectName $(adoProject) `
                  -RepositoryName $(repoName) `
                  -CollaborationBranch main `
                  -RootFolder "/synapse" `
                  -LastCommitId $latestCommitHash
                echo "##[section] Updating $(SourceWorkspaceName) git configuration to point to the latest main branch commit ID"
                # see https://learn.microsoft.com/en-us/powershell/module/az.synapse/update-azsynapseworkspace?view=azps-8.0.0
                Update-AzSynapseWorkspace -Name $(SourceWorkspaceName) -GitRepository $config
              azurePowerShellVersion: 'LatestVersion'
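One possible refinement of the change check in the publish step above: it matches the English phrase "nothing to commit" in the output of git status, which depends on the wording git prints. A minimal sketch of an alternative, written in Python purely for illustration, relies on git status --porcelain, which prints one line per changed path and nothing at all when the tree is clean. The function names has_pending_changes and repo_has_pending_changes are hypothetical and not part of the pipeline above.

```python
import subprocess


def has_pending_changes(porcelain_output: str) -> bool:
    """Return True if `git status --porcelain` output lists at least one path.

    The porcelain format emits one line per changed or untracked path and
    an empty string for a clean working tree, so any non-blank line means
    there is something to commit.
    """
    return any(line.strip() for line in porcelain_output.splitlines())


def repo_has_pending_changes(repo_dir: str) -> bool:
    """Run `git status --porcelain` in repo_dir and interpret the result.

    Assumes git is installed on the agent and repo_dir is a git checkout.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return has_pending_changes(result.stdout)
```

The same idea carries over to the inline PowerShell: an empty result from git status --porcelain means there is nothing to commit, so the commit and push can be skipped.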
How to authorize a variable in a YAML template in another repo to be used in a different repo? In other words, how to declare variables in a template once and use them in multiple repos in Azure DevOps
I am trying to migrate from classic pipelines to YAML in Azure DevOps, so I am trying to set up a repo to host all YAML templates so that they can be referenced and reused by multiple repos for builds, etc.
I wrote this YAML pipeline as a sample prototype:
`name: FirstPL
trigger:
  - my_test_branch
pool: my-agent
resources:
  repositories:
    - repository: blah
      type: git
      name: foo/bar
      ref: refs/heads/poc
variables:
  - template: pipeline_vars.yml@blah
steps:
  - script: echo $(variable_from_pipeline_vars)
`
However, when I run this I get the following error:
An error occurred while loading the YAML build pipeline. Variable group was not found or is not authorized for use. For authorization details, refer to https://aka.ms/yamlauthz.
How can I declare my variables and variable groups once, in a template in a repo dedicated to hosting those templates, and then reuse them in multiple repos using the resources syntax above? I also tried to find a way to authorize the variables template but couldn't find anything to enable this.
You can add a variable group directly in the Library tab of your Azure DevOps project and save all the variables from pipeline_vars.yml in it.
You can then access this variable group from the YAML pipelines of multiple repos, like below:
# Starter pipeline
# Start with a minimal pipeline that you can customize to build and deploy your code.
# Add steps that build, run tests, deploy, and more:
# https://aka.ms/yaml
pool:
  vmImage: ubuntu-latest
workspace:
  clean: all
resources:
  repositories:
    - repository: repo_a
      type: git
      name: InternalProjects/repo_a
      trigger:
        - main
        - release
    - repository: repo_b
      type: git
      name: InternalProjects/repo_b
      trigger:
        - main
variables:
  - group: SharedVariables
steps:
  - checkout: repo_a
  - checkout: repo_b
  - script: |
      echo $(databaseServerName)
  - task: AzureCLI@2
    inputs:
      azureSubscription: 'xxx subscription(xxxxxxxxx-f598-44d6-b4fd-xxxxxxxxxxxx)'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: 'az resource list --location uksouth'
On the first run, the pipeline asks you to approve permission for the variable group.
I tried the same with another repo, repo_b, in the project, and it asks to approve access for both the repositories and the variable group.
If you want the variable group to be accessible to multiple stages/repos/pipelines within the project without an authorization prompt, you can click Security at the top and allow it.
I also created a variables template and referenced it in the YAML pipeline, to use across multiple repos by checking out another repo, like below:
# Starter pipeline
# Start with a minimal pipeline that you can customize to build and deploy your code.
# Add steps that build, run tests, deploy, and more:
# https://aka.ms/yaml
pool:
  vmImage: ubuntu-latest
workspace:
  clean: all
resources:
  repositories:
    - repository: repo_a
      type: git
      name: InternalProjects/repo_a
      trigger:
        - main
        - release
    - repository: repo_b
      type: git
      name: InternalProjects/repo_b
      trigger:
        - main
variables:
  - template: pipeline_vars.yml
steps:
  - checkout: repo_a
  - checkout: repo_b
  - script: |
      echo $(environmentName)
  - task: AzureCLI@2
    inputs:
      azureSubscription: 'xxx subscription(xxxxxxxx-f598-44d6-b4fd-e2b6e97xxxxxx)'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: 'az resource list --location uksouth'
I tried to reference the same template from another repo where it does not exist; the pipeline could not read pipeline_vars.yml because the file is not in that repo.
In that case, you can use a variable group as shown above to reference the variables in the pipeline.
One possible reason is that the project hosting the repository with the variables does not allow access to its repositories from YAML pipelines.
To verify, go to your project's Settings -> Pipelines -> Settings and check "Protect access to repositories in YAML pipelines". This setting is enabled by default. You can turn it off or add a checkout step to your pipeline YAML. See here for more information.
I am setting up my Azure DevOps pipeline (an Azure CLI task) with the intention of deploying a resource group and several resources within it. So far I have been able to deploy and validate from my local PC with no issues. However, when I configure my pipeline in DevOps, I get the following error message:
C:\devops_work\11\s\main_v2.bicep(55,29) : Error BCP091: An error occurred reading file. Could not find a part of the path 'C:\devops_work\11\isv-bicep\storage_account.bicep'.
For context, 'main_v2.bicep' is my "main file" where the modules are called, in this case, "storage_account.bicep"
The same error occurs for all other modules. A couple of details regarding my pipeline:
I am using my own agent pool
My code sits in an Azure Repository
I have tried checking 'Checkout submodules' (Any nested submodules within)
The files all sit at the root level of the repository
My pipeline is not a YAML pipeline
Any help or insight into this is duly appreciated
You need to specify the working directory and ensure that your repository is being cloned.
steps:
  - checkout: self
  - powershell: |
      az deployment group create `
        -f "your-bicep-file.bicep" `
        -g "your-resource-group-name"
    workingDirectory: $(Build.SourcesDirectory)
If the pipeline is not in the same repository as your Bicep files, replace self in the checkout step with the repository alias and extend the working directory with the matching path. (By default the repo is cloned into $(Build.SourcesDirectory), but if you check out more than one repo, each one goes into its own subdirectory.)
steps:
  - checkout: <your repo alias>
  - powershell: |
      az deployment group create `
        -f "your-bicep-file.bicep" `
        -g "your-resource-group-name"
    workingDirectory: $(Build.SourcesDirectory)/<your repo alias>
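The directory rule mentioned above can be sketched as a small function. This is only an illustration of the default behavior as I understand it: with a single checkout step the sources land directly in $(Build.SourcesDirectory), and with several checkout steps each repo gets its own subfolder, normally named after the repository. The function name checkout_dir and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def checkout_dir(sources_dir: str, repo_folder: str, checkout_count: int) -> str:
    """Return the folder where a repo's files are expected after checkout.

    sources_dir    -- value of $(Build.SourcesDirectory) on the agent
    repo_folder    -- subfolder used for this repo (assumed to be the repo name)
    checkout_count -- number of checkout steps in the job
    """
    base = PurePosixPath(sources_dir)
    if checkout_count <= 1:
        # Single checkout: files sit directly in the sources directory.
        return str(base)
    # Multiple checkouts: each repo is placed in its own subfolder.
    return str(base / repo_folder)


print(checkout_dir("/agent/_work/1/s", "isv-bicep", 1))
print(checkout_dir("/agent/_work/1/s", "isv-bicep", 2))
```

If the Bicep modules are still not found, printing the contents of the working directory in a pipeline step is a quick way to confirm where the files actually ended up.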
I want to give my pipeline (Project A) access to my private feed, but every time I run dotnet restore to restore a .NET project with NuGet packages from the private feed (a project-scoped feed in Project B), I get:
error NU1301: Unable to load the service index for source <<url_to_my_feed>>
My pipeline.yml looks like:
[previous jobs]
- task: NuGetAuthenticate@0
- task: Docker@2
  displayName: Build image
  inputs:
    command: build
    containerRegistry: $(dockerRegistryServiceConnection)
    repository: $(contentHostRepositoryName)
    Dockerfile: "$(Build.SourcesDirectory)/src/modules/content/src/Dockerfile"
    arguments: '--build-arg PAT=$(VSS_NUGET_ACCESSTOKEN)'
    tags: |
      $(tag)
[next jobs]
My dockerfile looks like:
...
ARG PAT
RUN dotnet nuget add source <<url_to_my_feed>> --name <<name>> --username <<username>> --password $PAT --store-password-in-clear-text
...
When I replace '--build-arg PAT=$(VSS_NUGET_ACCESSTOKEN)' with '--build-arg PAT=<<pat_token>>', where <<pat_token>> is a token that I generated manually for my personal account in Azure DevOps, everything works fine.
What I have tried:
using $(System.AccessToken) instead of $(VSS_NUGET_ACCESSTOKEN)
setting Contributor permission on the feed for the Project A Build Service
in Project B, disabling "Limit job authorization scope to current project for non-release pipelines" and "Limit job authorization scope to current project for release pipelines"
using both NuGetAuthenticate@0 and NuGetAuthenticate@1
in Pipelines security in Project A, allowing everything for the Project Build Service
mapping $(System.AccessToken) into a variable before using it
Add the Project Collection Build Service (the organization-level identity) to the feed permissions and grant it the required permission.
Because you use a YAML pipeline and have disabled "Limit job authorization scope to current project for non-release pipelines", the pipeline runs as the Project Collection Build Service account instead of the project-scoped build service account.
Also, please use $(System.AccessToken) to authenticate.
I'm attempting to create a Scheduled Azure Pipeline where I clone a self hosted BitBucket git repository using a Service Connection and mirror it to an existing Azure git repository.
A client keeps a repository of code on their own BitBucket server. I'd like to set up a pipeline that pulls any changes from that repo into my own Azure repository at a scheduled interval, so I can set up automated deployments.
I keep getting hung up on the Service Connection part. The Service Connection is set up as "Other Git" and contains all of the credentials I need to access the remote BitBucket server.
trigger: none
schedules:
  - cron: "*/30 * * * *" # RUN EVERY 30 MINUTES
    displayName: Scheduled Build
    branches:
      include:
        - my-branch
    always: true # RUNS ALWAYS REGARDLESS OF CHANGES MADE
pool:
  name: Azure Pipelines
steps:
  - task: AzureCLI@2
    name: setVariables
    displayName: Set Output Variables
    continueOnError: false
    inputs:
      azureSubscription: "Service Connection Name"
      scriptType: ps
      scriptLocation: inlineScript
      addSpnToEnvironment: true
      inlineScript: |
        Write-Host "##vso[task.setvariable variable=username;isOutput=true]$($env:username)"
        Write-Host "##vso[task.setvariable variable=password;isOutput=true]$($env:password)"
  - powershell: |
      # Use the variables from above to pull latest from
      # BitBucket then change the remote origin and push
      # everything to my Azure repo
    displayName: 'PowerShell Script'
When I run this I end up getting an error stating:
The pipeline is not valid. Job: setVariables input connectedServiceNameARM
expects a service connection of type AzureRM but the provided service connection is of type git.
How can I access variables from a git service connection in my YAML pipeline?
The AzureCLI task only accepts service connections of the Azure Resource Manager type, so the git connection you are using doesn't work.
For your needs, you can check out the repo first. There is a Bitbucket Cloud service connection type for Bitbucket repositories. You can use it to check out multiple repositories in your pipeline if you keep the YAML files in the Azure repo.
Here is a sample YAML:
resources:
  repositories:
    - repository: MyBitbucketRepo
      type: bitbucket
      endpoint: MyBitbucketServiceConnection
      name: MyBitbucketOrgOrUser/MyBitbucketRepo
trigger: none
schedules:
  - cron: "*/30 * * * *" # RUN EVERY 30 MINUTES
    displayName: Scheduled Build
    branches:
      include:
        - my-branch
    always: true # RUNS ALWAYS REGARDLESS OF CHANGES MADE
pool:
  name: Azure Pipelines
steps:
  - checkout: MyBitbucketRepo
  - powershell: |
      # Use the variables from above to pull latest from
      # BitBucket then change the remote origin and push
      # everything to my Azure repo
    displayName: 'PowerShell Script'
How could I rewrite this Python script so that it runs within an Azure DevOps pipeline and exports the dataframe as a CSV to the DevOps repository? I can do this locally but would like to do it remotely.
Put differently, how can I export a pandas dataframe as a CSV file to a folder in an Azure DevOps repo using an Azure DevOps pipeline task? Below is the Python script that needs to run as a pipeline task.
local_path in this case should be an Azure DevOps path.
from azureml.core import Workspace, Dataset
local_path = 'data/prepared.csv'
dataframe.to_csv(local_path)
⚠️ You really should not do this. Azure Pipelines is for building code, not for processing data (assuming you mean Azure DevOps Pipelines, as opposed to Azure ML Pipelines).
Also, you should not commit data to your repository.
If you still want to proceed, here is an example of what you are trying to achieve. Note that for the last line, i.e. git push, you need to give the agent permission to write to the repository. See "Run Git commands in a script" for approximate ☹️ documentation on how to do that for your account.
trigger: none
pool:
  vmImage: 'ubuntu-latest'
steps:
  - checkout: self
    persistCredentials: true
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.8'
      addToPath: true
      architecture: 'x64'
  - script: |
      python your_data_generating_script.py
      git config --global user.email "you@example.com"
      git config --global user.name "Your Name"
      git add data/prepared.csv
      git commit -m 'test commit'
      git push origin HEAD:master
    displayName: 'push data to master'
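On the question of what local_path should be: with a single checkout step like the one above, the repository is cloned into the agent's sources directory, which Azure Pipelines exposes to scripts as the BUILD_SOURCESDIRECTORY environment variable. A minimal sketch of how the data-generating script could resolve its output path is shown below; resolve_output_path is a hypothetical helper, and it falls back to the current directory so the same script still runs locally.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_path(relative_path: str,
                        env: Optional[Mapping[str, str]] = None,
                        create_parent: bool = True) -> Path:
    """Resolve a repo-relative path against the pipeline's sources directory.

    On an Azure Pipelines agent BUILD_SOURCESDIRECTORY points at the checkout;
    when the variable is missing (e.g. a local run) the current directory is used.
    """
    env = os.environ if env is None else env
    root = Path(env.get("BUILD_SOURCESDIRECTORY", "."))
    target = root / relative_path
    if create_parent:
        # Make sure e.g. the data/ folder exists before writing the CSV.
        target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Intended use inside the data-generating script (dataframe comes from your own code):
# local_path = resolve_output_path("data/prepared.csv")
# dataframe.to_csv(local_path)
```

Since the script step above already runs from the sources directory by default, the plain relative path 'data/prepared.csv' would usually work as well; resolving it explicitly just makes the script independent of the working directory.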