Efficiently fetching dependent git repos for CI jobs in GitLab CI

I'm trying to set up a CI job that requires dependent repositories to be placed alongside the repository for which I'm enabling CI. By dependent, I mean that my main repo needs the code in the dependent repo, but there is no build or test dependency between the two repos.
I found a way to clone a dependent repository using this command in the job's script:
git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.mycompany.com/path_to_my/dependent_repo.git
The problem is that the repo is freshly cloned every time the job runs, which takes far too long because the dependent repo is quite large.
Is there a way to "fetch" a dependent repo as efficiently as GitLab CI fetches its own repo (against which CI will run), basically performing a pull instead of a clone?
Should I use cache?

If you have one main repo that depends on a few other repos, I would add these as submodules. That makes them easier to handle in GitLab, and your colleagues can easily find the correct versions of these repos to clone. If you have some specific need not to use submodules, then I understand!
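If you do go the submodule route, GitLab CI can fetch the submodules for you as part of its normal, efficient checkout; a minimal sketch (assuming .gitmodules uses relative URLs so the CI token applies):

variables:
  GIT_SUBMODULE_STRATEGY: recursive   # clone/update submodules before each job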
In GitLab, there are a few different ways of handling this. The simplest one is to use the GIT_STRATEGY variable:
https://docs.gitlab.com/ee/ci/yaml/#git-strategy
You can set it to fetch like this:
my_job:
  variables:
    GIT_STRATEGY: fetch
  script:
    - echo test
GitLab will then try to reuse an existing working directory instead of always cloning a new one.
I have had a case myself where I used a git clone flag called --reference:
https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt
It has some strange special cases that you have to think about. What the flag does is have git copy objects from a local copy of a repository instead of always fetching them over the network. This can greatly speed up clone operations in some cases.
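As a minimal sketch (the mirror path below is hypothetical and has to already exist on the machine running the clone):

# Copy objects from a local mirror instead of fetching them all over the network
git clone --reference /mirrors/dependent_repo.git \
    https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.mycompany.com/path_to_my/dependent_repo.git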
In addition to these suggestions, GitLab has a page with their suggestions to handle large repositories:
https://docs.gitlab.com/ee/ci/large_repositories/
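And to answer the cache question directly: yes, the cache can give you pull-like behavior for the dependent repo. A minimal sketch, with an assumed job name and directory layout (it keeps the clone in the cache and fetches into it on later runs):

my_job:
  cache:
    key: dependent-repo
    paths:
      - dependent_repo/
  script:
    # Reuse the cached clone when present; otherwise clone once
    - |
      if [ -d dependent_repo/.git ]; then
        git -C dependent_repo pull --ff-only
      else
        git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.mycompany.com/path_to_my/dependent_repo.git dependent_repo
      fi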

Related

How does GitLab CI/CD determine if two branches can be merged?

For example, if the dev branch is behind the build branch when the two branches are merged, the merge request can still be created, but it clearly cannot be merged, because it is not possible to merge a branch that has fallen behind into the current branch. In this case I want to use the .gitlab-ci.yml configuration to determine whether the dev branch is behind the build branch. Can this be done, and if so, how should the .gitlab-ci.yml file be configured?
Basically, GitLab attempts the merge using git to determine if there's a conflict. It actually calls out to Gitaly to do this.
The process goes like this:
The rails app in repository.rb calls the Gitaly conflict service to check for conflicts
The conflict service's list_conflict_file in Gitaly calls into git2go
The git2go conflicts subcommand essentially performs the git merge operation and returns any conflicts that were encountered; this eventually makes its way back as the response to the call from rails in step (1)
So, if you wanted to do something similar in your CI/CD pipeline, you could use git (or a programmatic API to git in your favorite language) to attempt a local merge of the two branches in order to detect conflicts.
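A minimal sketch of such a job, using the dev and build branch names from the question (the job name is made up, and GIT_DEPTH: 0 disables GitLab's shallow clone so the full history is available):

check_merge:
  variables:
    GIT_DEPTH: 0   # merge checks need full history, not a shallow clone
  script:
    - git fetch origin dev build
    # Exit code 0 means dev is an ancestor of build, i.e. dev is behind
    - if git merge-base --is-ancestor origin/dev origin/build; then echo "dev is behind build"; fi
    # Attempt the merge without committing, purely to surface conflicts
    - git checkout origin/build
    - git merge --no-commit --no-ff origin/dev || { echo "merge conflict detected"; exit 1; }
    - git merge --abort || true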

How to update repository with built project?

I’m trying to set up GitLab CI/CD for an old client-side project that makes use of Grunt (https://github.com/yeoman/generator-angular).
Up to now the deployment worked like this:
run '$ grunt build' locally, which built the project and created files in a 'dist' folder in the root of the project
commit changes
changes pulled onto production server
After creating the .gitlab-ci.yml and making a commit, the GitLab CI/CD job passes, but the files in the 'dist' folder in the repository are not updated. If I define an artifact, I get the changed files in the download. However, I would prefer the files in the 'dist' folder in the repository to be updated so we can carry on with the same workflow, which suits us. Is this achievable?
I don't think committing into your repo from inside a pipeline is a good idea. Version control wouldn't be as clear, and since some people trigger pipelines automatically when the repo is pushed, that could set off an endless loop of pipelines.
Instead, you might reorganize your environment to use Docker; there are numerous reasons for using Docker in professional and development environments. To name just a few: it would let you save the freshly built project into a registry and reuse it whenever needed, at exactly the version you require and with the desired /dist inside, so you can easily run it in multiple places, scale it, manage it, etc.
If you switched to Docker, you wouldn't actually have to do a thing to keep dist persistent; just push the image to the registry after the build is done.
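For example, a minimal sketch of building and pushing an image to GitLab's built-in registry (all the $CI_* variables are predefined by GitLab; a Dockerfile in the repo root is assumed):

build_image:
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # The built dist/ ends up baked into the image instead of committed to git
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"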
But to actually answer your question:
There is a feature request for exactly this problem that has been open for a very long time. Currently there is no safe and professional way to do it, as GitLab team members state. You can, however, push changes back, as one of the GitLab members (Kamil Trzciński) suggested:
git push http://gitlab.com/group/project.git HEAD:my-branch
Just put it in the script section of your .gitlab-ci.yml file.
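As a minimal sketch of such a job (the CI_PUSH_TOKEN variable, branch name, and bot identity are assumptions; the job token generally cannot push, so a project or personal access token stored as a CI variable is assumed here):

update_dist:
  script:
    - grunt build
    - git config user.email "ci-bot@example.com"
    - git config user.name "CI bot"
    - git add dist
    # "[skip ci]" keeps this push from triggering yet another pipeline
    - git commit -m "Update dist [skip ci]" || echo "nothing to commit"
    - git push "https://oauth2:${CI_PUSH_TOKEN}@gitlab.com/group/project.git" HEAD:my-branch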
There are more hacky methods presented there, but be sure to acknowledge the risks that come with them (pipelines become more error prone and, if configured the wrong way, they might, for example, publish confidential information or trigger an infinite loop of pipelines).
I hope you found this useful.

Bitbucket Pipelines access other node repository

I have enabled Bitbucket Pipelines in one of my node.js repositories to have it run the build on every commit. My repository depends on another node.js repository. For development I've linked the one to the other using npm link.
I've tried adding a git clone of that repository to the bitbucket-pipelines.yml file, but the build gets stuck on that command. I guess it's because git is asking for authentication at that point.
Is there a way to allow the container to access other repositories in the same team? Or is there a better way altogether on how to solve this? I'd also be fine with switching to another CI tool if Bitbucket Pipelines aren't capable of this – the only requirement is that it's free for teams < 5 people.
Btw. I'd like to avoid paying for npm private packages if possible.
Thanks!
You can set up access to the other repo with an SSH key, as described in the official docs: https://confluence.atlassian.com/bitbucket/access-remote-hosts-via-ssh-847452940.html
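A minimal sketch of what that can look like (the team and repository names are hypothetical, and it assumes the pipeline's SSH key has been registered as an access key on the dependency repository):

# bitbucket-pipelines.yml
image: node
pipelines:
  default:
    - step:
        script:
          # Clone the sibling repo over SSH instead of using npm link
          - git clone git@bitbucket.org:myteam/dependency.git
          - npm install ./dependency
          - npm install
          - npm test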

Gitlab repository vs project vs submodule

I started exploring GitLab for version control management and ran into an issue at the very first step. Whenever I create a project, it creates a new repository. I have a few web applications that are independent of each other; in that case, do I need a different repository for every project?
What I am looking for is an explanation of what each term means and when to use which, but I was not able to find out what a repository is and what a project is, either on the GitLab website or through other sources.
I also came across the term submodule; when can it be used? Can I create one global project and have all the web applications as different submodules?
Can anyone please help me understand the difference between those three and when to use each, based on their intended usage? Please also point me to a good learning site where I can find how to do basic version control operations in GitLab.
Thanks.
GitLab manages projects: a project has many features in addition to the Git repo it includes:
issues: powerful, but lightweight issue tracking system.
merge requests: you can review and discuss code before it is merged in the branch of your code.
wiki: separate system for documentation, built right into GitLab
snippets: Snippets are little bits of code or text.
So for each repo you create, you get additional features in its associated project.
And you can manage users associated to that project.
See GitLab documentation for more.
The Git repo and Git submodule are pure Git notions.
In your case, a submodule might not be needed, unless you want a convenient way to record the exact versions of the different webapp repos in one parent repo.
But if that is the case, then yes, you can create one global project and have all the webapplications as different submodules.
Each of those submodules would have their own GitLab project (and Git repo).
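A minimal sketch of that layout (group and repository names are hypothetical):

# In a new parent repo that tracks all the web applications:
git submodule add https://gitlab.com/mygroup/webapp1.git webapp1
git submodule add https://gitlab.com/mygroup/webapp2.git webapp2
git commit -m "Track web applications as submodules"

# Cloning the parent later brings in the recorded versions:
git clone --recurse-submodules https://gitlab.com/mygroup/parent.git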

git workflow with multiple remotes and order of operations

I have a bare git repository that I use to push to and pull from on a Linux machine (let's call the bare repository the remote originlinux). From my working repository, which has originlinux as a remote, I push and pull until I finally decide to put the project on GitHub. I create the repository on GitHub's web GUI and add it as a remote on my working repository (let's call it origingithub) using the git remote add command, followed by git pull --rebase, then git push (pull before push, since I wasn't allowed to simply push to a newly created GitHub repository without getting one of these: 'hint: Updates were rejected because the tip of your current branch is behind'. I figure this has something to do with their option to create a readme file).
And here's the issue: after performing these steps, the originlinux repository is completely out of sync with the origingithub repository, even though they have exactly the same commits and were pushed to from the same working repository. Could someone please explain in detail why this is occurring, and what I could do differently to prevent it without reordering how I create my remote repositories? It seems like the workflow or order of operations I'm using doesn't make sense in git land, but how else would you keep multiple remote repositories synced from one working copy?
Thanks!
The two repositories do not have the same commits.
When you did git pull --rebase, you rewrote the project's history on top of the commit that created the readme file, so every one of your commits now contains it and has a different SHA-1 identifier.
There are a couple of ways that you may be able to recover from this.
First, you could revert the state of your local repository to match the state of your first (non-GitHub) remote. This would eliminate the readme file that you created on GitHub (you can copy it to some other location and add it back into git later if desired), along with any changes that you hadn't pushed to the first remote (including changes that haven't been committed).
git reset --hard originlinux/master
git push -f origingithub
The -f option causes the push to be forced even though it removes some commits. Force-pushing should be avoided in general, but it is sometimes necessary, as in this case.
The other option would be to just do a force push to your first remote, accepting the new history caused by the rebase.
git push -f originlinux
If the three repositories you mentioned are the only ones, it shouldn't matter much which of these methods you use. If there are other repositories, you may want to try to determine which version of the history is more widely known, and keep that version.
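Going forward, one way to keep both remotes in sync from a single working copy (the remote names and URLs below are assumptions) is to give one remote two push URLs, so a single push updates both:

# The first --add --push replaces the implicit push URL, so list both targets
git remote set-url --add --push origin user@linuxbox:/srv/git/project.git
git remote set-url --add --push origin git@github.com:user/project.git
git push origin master   # now pushes to both remotes in one step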
