git pull in Azure Data Factory

git pull in Azure Data Factory - azure

When working with the regular source code, (Java, C++, etc..) there are things like
git pull ..
git fetch ..
git push ..
to synch your remote git repo branch with your local branch.
What is the equivalent of such in the Azure Data Factory world ?
So, I am using azure data factory with the Azure git repo.
I am working in the particular feature branch - "fefature branch"
And my pipeline has a copy activity that hits a data set in its "Sink" stage.
Here is a screen shot but .. it's pretty simple and seems right
I see that my code for Data set definition (Json) in the remote Git repository is different from what I see in the Azure portal gui (being pointed to that same remote branch). ADF Gui in the Azure Portal is correct, the one in the git repo contains some stuff that I already deleted, but it does not gets deleted there (Why??)
So, when I 'Debug' pipeline I get errors which indicate this discrepancy as a problem. I want ty sync the environments and .. given that I do not understand how the discrepancies came about, I don't know how to fix an issue?. Any help is appreciated.

In the ADF world, we use publish and create a new pull request to merge the new changes from a feature branch to the main branch.
it seems like your git repository version is not up to date with the live ADF.
If there are any pending changes in your main branch, then you can click on Publish button to merge the changes
And if you are working on the feature branches, you can merge the changes using the new pull request.
If you have multiple feature branches, then you will need to manually compare the different versions to resolve these conflicts.

Related

Can I run Azure DevOps pipeline without committing it?

I am planning to experiment building a pipeline using Azure DevOps. One thing that I noticed early on is, after azure-pipelines.yml created, I have to commit this first before being able to run it. But I want to experiment on it which revolves around trial and error. Doing multiple commit just to test things out are not feasible.
In Jenkins I can just define my steps and try to run it without committing the file.
Is this also possible to do in Azure DevOps?

But I want to experiment on it which revolves around trial and error. Doing multiple commit just to test things out are not feasible.
Yes it is - you just use a different code branch. That will allow you the freedom to make as many changes as you need, while putting the pipeline together and trying it out, without committing to the master branch.
Then when you're happy with the way the pipeline is running, you can merge your branch into the master branch which the pipeline normally uses.

You cannot run YAML pipelines without committing them, but you can create classic pipelines and run them without committing anything pipeline-related to the repository (except for the source code you want to build). Classic pipelines can later be turned (or copy-pasted, to be exact) into yaml pipelines with view YAML -option.
https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops#define-pipelines-using-the-classic-interface

If you're on your own branch, or in a repository without any other developers making changes then you can
Make a change
use git commit --amend to overwrite your previous commit with the new file
use git push --force-with-lease to push that up to Azure DevOps
That will hide your commit history while experimenting

ADF publish confusion in git mode

In git mode, when we want to test a pipeline, ADF forces us to publish first.
Publish action does two things is my understanding
Saves to the local ADF (DEV) as given here
Creates arm templates in a branch (adf_publish/the branch we
specify)
But to get the 'Publish' button enabled, we need to be in collaboration branch. This means no two people can work at the same time on a DEV ADF. As both people will be asked to publish by ADF before they could test the pipeline they are building.
If this is the case then why is there an option for us to connect another branch other than collaboration branch? (by changing it from the drop down)
Also what is a 'working branch'?

As we know, we only can 'Publish' in collaboration branch and changes are being pushed to to "adf_publish" branch by default. By default, the collaboration branch is named main.
If you want team work, you need to create several branches.
Working on the own branch, we can validate and debug the pipeline to make sure everything is ok.
Then click save all, it will commit on the own branch. If we want publish, we need to creat a pull request to the main branch.
4.After merged to the main branch, we can publish to the adf_publish branch.

ADF source integration issues with multiple developers

We have two developers using the same ADF. Each developer creates a git branch and starts working on it. Each developer can save the changes to their own git branch but there can only be one collaboration branch and this branch decides the publishing branch. This is causing a blockade (for one of the developer. How can we solve this ?
ADF publish branch can be set using a publish_config.json but now there is an option to set this in the adf itself. which one takes precedence? What is the best practice here?

You need to manage the work of each developer with standard git branch/merge processes. When one dev is done with work in their feature branch, then they will create a pull request to merge changes into your collaboration branch.
If the second dev has not created a feature branch yet, they can just do so after the pull request from the first dev is complete and then continue work from there. If the second dev has already created a feature branch, then they will need to merge the new changes from the collaboration branch into their feature branch to continue work before later committing to git and creating a pull request to merge changes from their feature branch back into the collaboration branch. From there, you can publish as needed.
This git work can be done through the ADF editor as well as through any other git interface you have. It's up to you.
This article discusses the process in specific detail using the ADF editor.
EDIT:
I believe you now have answers for this from 3 of the other 5 questions you posted about this same topic in the past day.
ADF publish confusion in git mode
Azure data factory working-branch confusion
When ADF publish branch is git protected how to publish?
Here is another article which describes the fundamental git process for ADF to help bring you up to speed with the fundamentals of how the different branches work, and how you can switch publish branches on the fly if needed.

How to setup authoring env to publish site to remote git repo?

I downloaded and started authoring environment (crafter-cms-authoring.zip)
Created site backed by remote git repo as described in: Create site based on a blueprint then push to remote bare git repository
Created a content type, new page.
Published everything
Now, I would expect, that I can see my changes in the remote repo. But all I can see are the initial commits from the 2. step above. No new content type, no new page, no branch "live". (The content items are however visible in the local repo)
What is missing?
Edit:
Since Creafter can by set up in many ways, in order to clarify my deployment scenario, I am adding deployment diagram + short description.
There are 3 hosts - one for each environment + shared git repo.
Authoring
This is where studio is located and content authors make changes. Each change is saved to the sandbox local git repository. When a content is published, the changes are pulled to the published local git repository. These two local repos are not accessible from other hosts.
Delivery
This is what provides published content to the end user/application.
Deployer is responsible for getting new publications to the delivery instance. It does so by polling (periodically pulling from) specific git repository. When it pulls new changes, it updates the local git repository site, and Solr indexes.
Gitlab
This hosts git repository site. It is accessible from both - Authoring and Delivery hosts. After its creation, the new site is pushed to this repo. The repo is also polled for new changes by Deployers of Delivery instances.
In order for this setup to work, the published changes must somehow end up in Gitlab's site repo, but they do not (the red communication path from Authoring Deployer to the Gitlab's site)
Solution based on #summerz answer
I implemented GitPushProcessor and configured new deployment target in authoring Deployer, adding mysite-live.yaml to /opt/crafter-cms-authoring/data/deployer/target/:
target:
env: live
siteName: codelists
engineUrl: http://localhost:9080
localRepoPath: /opt/crafter-cms-authoring/data/repos/sites/mysite/published
deployment:
pipeline:
- processorName: gitPushProcessor
remoteRepo:
url: ssh://path/to/gitlab/site/mysite

I think you might have confused push with publish.
On Publishing
Authoring (Studio) publishes to Delivery (Engine) after an approval workflow that makes content go live. Authoring is where content (and code if you like) is managed and previewed safely, then that is published to the live delivery nodes for delivery to the end-user.
On DevOps
A site's local git repository can be pushed/pulled to/from remote repositories. This means:
Code can flow from a developer's workstation to Studio (via a github, gitlab, bitbucket etc.) <== this is code moving forward (and can flow via environments like QA, Load Testing, etc.)
Content can flow back, from Studio to the developer's local workstation in a similar manner <== this is content moving backward (you can have production content on your laptop if you want)
When code flows forward from a developer to Studio, that's when Studio pulls from the remote git repo.
When content flows backward from Studio to the developer, that's when Studio pushes to the remote git repo.
Documentation
A good bird's eye view of the architecture of the system relating to publishing can be found here: http://docs.craftercms.org/en/3.0/developers/architecture.html
A good article that explains the DevOps workflow/Git stuff is here: http://docs.craftercms.org/en/3.0/developers/developer-workflow.html
Update based on the expanded question
My new understanding based on your question is: You can't allow the deployers in Delivery to access Authoring's published repo to poll due to some constraint (even over SSH and even with limits on the source IP). You'd like to use GitLab as a form of content depot that's accessible as a push from Authoring and pull from Delivery.
If my understanding is correct, I can think of two immediate solutions.
Set up a cron job in authoring to push to GitLab periodically.
You'll need to add GitLab as a remote repo in published and then set up a cron like this:
* * * * * git --git-dir /opt/crafter/data/repos/sites/{YOUR_SITE}/published/.git push 2>&1
Test it out by hand first, then cron it.
Write a deployer processor that can push content out to an end-point upon a change or, wait for the ticket: https://github.com/craftercms/craftercms/issues/2017.
Once this is built, you'll need to configure another deployer in Authoring that will push to GitLab.
In either case, beware not to update things in GitLab since you're using published and not sandbox. (See DevOps notes above to learn why.)

How to achieve gated check-in for GitLab Repository?

My requirement is whenever developer try to do check-in existing GitLab repository then before doing check-in in repository,build should trigger (Jenkins build) and Junit test case should run on new check-in and if passes then it should go forward and will allow developer to do check-in in main repository.
I am not sure but is pre-hook commit can achieve this requirement?

While you could achieve this with pre-commit hooks, it's more common to do so with post-commit hooks on the server-side.
You can achieve this by operating a branch based workflow, there are multiple to choose from - I would recommend reading through this guidance by Atlassian.
Developers will create branches from a 'main' branch (often master, but can be a 'dev' branch working towards a release for instance), then develop code on that branch. They will then push their branch and commits to the remote repository (GitLab). When ready to merge into the main branch, your developers can open a merge request onto the main branch.
On GitLab you can setup a webhook to trigger Jenkins builds when a push event occurs. I would recommend this guide to guide you through it.
In the GitLab project settings you can require a passing build before merge requests are allowed to merge.
Furthermore, your understanding of Git seems incorrect - check in is not a term used in Git. Please take a look at the Git documentation. In Git a developer creates commits against a local copy of the repository, then pushes these to a remote repository (GitLab/GitHub etc.). There is no direct equivalent of the 'check in' used in various centralised version control systems e.g. SVN.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string