git workflow with multiple remotes and order of operations - linux

I have a bare git repository that I use to push to and pull from on a linux machine (let's call the bare git repository remote originlinux). From my working repository that has originlinux as a remote I push and pull until finally I decide to put it on github. I add the repository for github on their web gui and add the remote repository on my working repository (let's call the remote origingithub) using the git remote add command followed by git pull --rebase, then git push (pull before push since I wasn't allowed to simply push to a newly created github repository without getting one of these: 'hint: Updates were rejected because the tip of your current branch is behind'. I figure this has something to do with their option to create a readme file). And here's the issue: after performing these steps, the originlinux repository is completely out of sync with the origingithub repository even though they have exactly the same commits and were pushed to from the same exact working repository. Could someone please explain in good detail why this is occurring and also what I could do differently to prevent this from happening without reordering how I create my remote repositories? It seems like the workflow or order of operations I'm using doesn't make sense in git land, but how else would you keep multiple remote repositories sync'd on one working copy?
Thanks!

The two repositories do not have the same commits.
When you did git pull --rebase, you rewrote the entire history of the project so that every revision contains that readme file. So every commit in the history will have a different SHA1 identifier.
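To see the effect for yourself, here is a minimal sketch that reproduces the situation in throwaway directories. The repository names (seed, work) and file names are made up for the demonstration, and a local repository stands in for the GitHub repository that was created with a readme. It runs git fetch followed by git rebase, which is roughly what git pull --rebase does.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Stand-in for the GitHub repository that was created with a readme.
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
echo readme > "$tmp/seed/README"
git -C "$tmp/seed" add README
git -C "$tmp/seed" commit -q -m "Initial README"

# Working repository with one commit of its own.
git init -q "$tmp/work"
git -C "$tmp/work" symbolic-ref HEAD refs/heads/master
echo code > "$tmp/work/main.txt"
git -C "$tmp/work" add main.txt
git -C "$tmp/work" commit -q -m "Add main.txt"
before=$(git -C "$tmp/work" rev-parse HEAD)

# Fetch, then replay the local commit on top of the readme commit.
git -C "$tmp/work" remote add origingithub "$tmp/seed"
git -C "$tmp/work" fetch -q origingithub
git -C "$tmp/work" rebase -q origingithub/master
after=$(git -C "$tmp/work" rev-parse HEAD)

echo "before rebase: $before"
echo "after rebase:  $after"
```

The two printed identifiers differ even though the change to main.txt is the same, because the rebased commit now has the readme commit as its parent.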
There are a couple of ways that you may be able to recover from this.
First, you could revert the state of your local repository to match the state of your first (non-github) remote. This would eliminate the readme file that you created on github (you can copy that to some other location and add it back in to git later if desired), along with any changes that you hadn't pushed to the first remote (including changes that haven't been committed).
git reset --hard originlinux/master
git push -f origingithub
The -f option there causes the push to be forced even though that is removing some commits. This is something that should be avoided in general, but it is sometimes necessary such as in this case.
The other option would be to just do a force push to your first remote, accepting the new history caused by the rebase.
git push -f originlinux
If the three repositories that you mentioned are the only ones, it shouldn't matter much which of these methods you use. If there are other repositories, you may want to try to determine which version of the history is more widely known, and keep that version.
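As for avoiding the problem next time: if the second remote starts out empty (on github, that means creating it without a readme), a plain push works and both remotes end up with identical history. The sketch below assumes two local bare repositories standing in for originlinux and origingithub, and uses git ls-remote to compare the branch tips. The paths are invented for the demonstration.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q --bare "$tmp/originlinux.git"
git init -q --bare "$tmp/origingithub.git"   # empty: no readme commit

git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
echo hello > file.txt
git add file.txt
git commit -q -m "First commit"

git remote add originlinux "$tmp/originlinux.git"
git remote add origingithub "$tmp/origingithub.git"
git push -q originlinux master
git push -q origingithub master   # no pull needed, nothing to be behind

# Compare the tip of master on both remotes.
linux_tip=$(git ls-remote originlinux refs/heads/master | cut -f1)
github_tip=$(git ls-remote origingithub refs/heads/master | cut -f1)
if [ "$linux_tip" = "$github_tip" ]; then
    echo "remotes are in sync"
else
    echo "remotes have diverged"
fi
```

The same ls-remote comparison can be run against the real remotes to check whether a forced push has brought them back in line.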

Related

In GitLab how to change file contents automatically when doing a push to a feature branch and then undo that change when merge into master?

I use GitLab on-premise and need to change the contents of a file automatically when I push into my feature branch so when tests run and code is executed, one of the files called from within the repo itself has modified content. When I merge that branch into master, I need to undo that change.
It's not enough to override the file contents just when the test runs, because of how the application works. It will end up pulling the repo from GitLab on its own and execute one of the files contained within.
I have been looking into hooks a little bit, but I can't find any references or examples on how to accomplish something like this.
Currently I am changing the file manually so my CI tests run accurately. If the tests then pass, I can manually change the file back and skip the CI tests on the final push, and then merge into master.
There's not really a default way to automatically change, commit and push back into GitLab from a pipeline, as the pipeline does not have authorization to write into the repo.
However, you can provide a "Personal Access Token" (PAT) for one of GitLab's users (or even a special service account created for that purpose): either commit that to your repo (which is quite unsafe) or provide it through the "CI/CD Variables" setting from within GitLab.
Your pipeline will then need to do something like:
# change file.txt
# add the remote with credentials authorized to commit; do not fail if the remote already exists
git remote remove pushorigin || true
git remote add pushorigin https://commituser:${PAT}@gitlab.local/path/to/project.git
# add and commit the file
git add file.txt
git commit -m "Commit Message"
# move the remote's branch tip:
git push pushorigin "HEAD:${CI_COMMIT_REF_NAME}"
I don't have a clue how to revert that automatically when/before finally merging the branch. Don't you mind those testing commits being merged back into main, possibly creating conflicts on those files?
I guess overall, the application should be modified to better support testing on those branches.

How to merge a Git branch using a different identity?

We are using Git for a website project where the develop branch will be the source of the test server, and the master branch will serve as the source for the live, production site. The reason being to keep the git-related steps (switching branches, pushing and pulling) to a minimum for the intended user population. It should be possible for these (not extremely technical) users to run a script that will merge develop into master, after being alerted that this would be pushed to live. master cannot be modified by normal users, only one special user can do the merge.
This is where I'm not sure how to integrate this identity change into my code below:
https://gist.github.com/jfix/9fb7d9e2510d112e83ee49af0fb9e27f
I'm using the simple-git npm library. But more generally, I'm not sure whether what I want to do is actually possible as I can't seem to find information about this anywhere.
My intention would be of course to use a Github personal token instead of a password.
Git itself doesn't do anything about user or permission management. So, the short answer is, don't try to do anything sneaky. Rather, use Github's user accounts the way they were intended.
What I suggest is to give this special user their own Github account, with their own copy of the repo. Let's say the main repo is at https://github.com/yourteam/repo, and the special repo is at https://github.com/special/repo.
The script will pull changes from the team repo's develop branch, merge them into its own master branch, and push to https://github.com/special/repo.
Then, it will push its changes to the team's master branch. This step can optionally be a forced push, since no one else is supposed to mess with master, anyway. (In case someone does, using a forced push here means they have to fix their local repo to match the team repo later on, rather than having the script fail until someone fixes the team repo.)
At the same time, your CI software will notice that master has changed at https://github.com/special/repo, and will publish as you normally would. This is the linchpin: the CI doesn't pay attention to the team repo, so although your team has permission to change it, those changes don't make it into production.
This special user will need commit access to the team repo, in addition to its own GitHub repo. The easiest way is probably to use an SSH key, and run the git push command from the script, rather than trying to use the GitHub API.
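A rough sketch of that flow, using plain git commands rather than simple-git. Everything here is an assumption made for illustration: two local bare repositories stand in for the team repo and the special user's repo, and the identities (developer, release-bot) are set through environment variables instead of separate Github accounts and SSH keys.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
team="$tmp/team.git"        # stand-in for https://github.com/yourteam/repo
special="$tmp/special.git"  # stand-in for https://github.com/special/repo
for repo in "$team" "$special"; do
    git init -q --bare "$repo"
    git --git-dir="$repo" symbolic-ref HEAD refs/heads/master
done

# A regular developer seeds master and adds work on develop.
export GIT_AUTHOR_NAME=developer GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=developer GIT_COMMITTER_EMAIL=dev@example.com
git init -q "$tmp/dev"
cd "$tmp/dev"
git symbolic-ref HEAD refs/heads/master
echo v1 > index.html
git add index.html
git commit -q -m "Initial site"
git push -q "$team" master
git checkout -q -b develop
echo v2 > index.html
git commit -q -am "New content on develop"
git push -q "$team" develop

# The release script runs under the special user's identity.
export GIT_AUTHOR_NAME=release-bot GIT_AUTHOR_EMAIL=release@example.com
export GIT_COMMITTER_NAME=release-bot GIT_COMMITTER_EMAIL=release@example.com
git clone -q "$team" "$tmp/release"
cd "$tmp/release"
git remote add special "$special"
git merge -q --no-ff -m "Release develop to master" origin/develop
git push -q special master   # the repository the CI watches
git push -q origin master    # keep the team's master in step

git --git-dir="$special" log -1 --format='%an: %s' master
```

The merge commit on master carries the special user's name, and only a push to the special repo would trigger publishing in the setup described above.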

How to use git namespace to hide branches

Background
I'm working with a large team using git for version control. The normal flow is:
People selecting a ticket from the "backlog queue".
Working on the issue via a local branch (i.e. git checkout -b my_feature_branch).
Making several commits as they go (i.e. git commit).
Pushing local changes to a remote branch in order to "backup" their work so it lives on more than one machine, in case the laptop is damaged or stolen (i.e. git push -u origin my_feature_branch).
Eventually creating a code review on our private github page, and doing a squashed merge from the feature branch to master.
In addition to the remote feature branches created by employees on an as-needed basis, we have several dozen release branches that are used to create the "gold builds" we ship to customers, i.e. 1.00, 1.01, 2.00, 2.01, 2.02, etc.
Problem
Some developers have begun to complain that there are too many branches, and I tend to agree. Some developers haven't been diligent about cleaning up old branches when they are no longer needed (even though github provides a one-button delete feature for this once the code review is complete).
Question
Is there a way to configure our company github deployment so that, when people use git branch via the CLI:
Only our "important/release/gold" branches appear.
The one-off developer (temporary) branches only appear via git branch -a?
The main goal of this is to reduce clutter.
Edit: I found a similar question, but the only answer is not at all applicable (don't use remote branches), which violates my key constraint of allowing people to push to remote branches as a form of data backup. The concept of private namespaces, as hinted by @Mort, seems to be exactly what I'm looking for. Now, how do I accomplish that?
Long story short: you can - but it may be a bit tricky.
You should use the namespace concept (take a look at the gitnamespaces documentation).
Quoting from the docs:
Git supports dividing the refs of a single repository into multiple namespaces, each of which has its own branches, tags, and HEAD. Git can expose each namespace as an independent repository to pull from and push to, while sharing the object store
and
Storing multiple repositories as namespaces of a single repository avoids storing duplicate copies of the same objects, such as when storing multiple branches of the same source.
To activate a namespace you can simply:
export GIT_NAMESPACE=foo
or
git --namespace=foo clone/pull/push
When a namespace is active, through git remote show origin you can see only the remote branches created in the current namespace. If you deactivate it (unset GIT_NAMESPACE), you will see again the main remote branches.
A possible workflow in your situation may be:
Create a feature branch and work on it
export GIT_NAMESPACE=foo
git checkout -b feature_branch
# ... do the work ...
git commit -a -m "Fixed my ticket from backlog"
git push origin feature_branch # (will push into the namespace and create the branch there)
Merging upstream
unset GIT_NAMESPACE
git checkout master
git pull # (just to have the latest version)
git merge --squash --allow-unrelated-histories feature_branch
git commit -a -m "Merged feature from backlog"
git push # (will push into the main refs)
The tricky part
Namespaces provide complete isolation of branches, but you need to activate and deactivate the namespace each time.
Pay attention
Pay attention when pushing. Git will push in the current namespace. If you are working in the feature branch and you forgot to activate the namespace, when pushing, you will create the feature branch in the main refs.
It seems as if the simplest solution here, since you're using GitHub and a pull-request workflow, is that developers should be pushing to their own fork of the repository rather than to a shared repository. This way their remote feature branches aren't visible to anybody else, so any "clutter" they see will be entirely their own responsibility.
If everything lives in a single repository, another option would be to set up a simple web service that receives notifications from github when you close a pull request (responding to the PullRequest event). You could then have the service delete the source branch corresponding to the pull request.
This is substantially less simple than the previous solution, because it involves (a) writing code and (b) running a persistent service that is (c) accessible to github webhooks and that (d) has appropriate permissions on the remote repository.
The first answers are good. If you can fork repositories and use pull-requests or just keep the branches for yourself, do it.
I will however put my 2 cents in case you are in my situation : a lot of WIP branches that you have to push to a single repository since you work on multiple workstations, don't have fork possibilities, and don't want to annoy your fellow developers.
Make branches starting with a specific prefix, i.e. wip/myuser/, fetch from / push to a custom refspec, i.e. refs/x-wip/myuser/*.
Here is a standard remote configuration after a clone:
[remote "origin"]
url = file:///c/temp/remote.git
fetch = +refs/heads/*:refs/remotes/origin/*
To push branches starting with wip/myuser/ to refs/x-wip/myuser/, you will add:
push = refs/heads/wip/myuser/*:refs/x-wip/myuser/*
This will however override the default push rule for the normal branches. To restore it, you will add:
push = refs/heads/*:refs/heads/*
Finally, to fetch your WIP branches that are now outside the conventional refs/heads/* refspec, you will add:
fetch = +refs/x-wip/myuser/*:refs/remotes/origin/wip/myuser/*
You will end up with this 2nd remote configuration:
[remote "origin"]
url = file:///c/temp/remote.git
fetch = +refs/x-wip/myuser/*:refs/remotes/origin/wip/myuser/*
fetch = +refs/heads/*:refs/remotes/origin/*
push = refs/heads/wip/myuser/*:refs/x-wip/myuser/*
push = refs/heads/*:refs/heads/*
(Git evaluates fetch / push rules from the top to the bottom, and stops as soon as one matches; this means you want to order your rules from the most to the least specific rule.)
People using the standard remote configuration will only fetch branches from refs/heads/*, while you will fetch branches from both refs/heads/* and refs/x-wip/myuser/* with the 2nd configuration.
When your branch is ready to be "public", remove the wip/myuser/ prefix.
The refspec internal documentation was useful to make it.
Please note that once you have push rules in your remote configuration, running the command...
git push
... with no arguments will no longer only push your current branch, nor use any strategy defined with the push.default configuration. It will push everything according to your remote push rules.
You will either need to always specify the remote and the branch you want to push, or use an alias as suggested in this answer.
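Here is a small sketch that tries the push rules out against a throwaway bare repository. The paths and the branch name wip/myuser/feature are made up for the demonstration, and the rules are added with git config --add instead of editing the config file by hand. It also clones the remote the way a colleague with the standard configuration would.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q --bare "$tmp/remote.git"
git --git-dir="$tmp/remote.git" symbolic-ref HEAD refs/heads/master

git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/remote.git"
echo base > file.txt
git add file.txt
git commit -q -m "Base"
git checkout -q -b wip/myuser/feature
echo wip >> file.txt
git commit -q -am "Work in progress"

# Push rules, most specific first; the extra fetch rule brings WIP refs back.
git config --add remote.origin.push 'refs/heads/wip/myuser/*:refs/x-wip/myuser/*'
git config --add remote.origin.push 'refs/heads/*:refs/heads/*'
git config --add remote.origin.fetch '+refs/x-wip/myuser/*:refs/remotes/origin/wip/myuser/*'

# With push rules configured, a bare "git push origin" applies them all.
git push -q origin

# Refs as stored on the remote side.
git --git-dir="$tmp/remote.git" for-each-ref --format='%(refname)'

# A colleague with the standard configuration only gets refs/heads/*.
git clone -q "$tmp/remote.git" "$tmp/colleague"
git -C "$tmp/colleague" branch -r
```

On the remote, the WIP branch lives under refs/x-wip/myuser/ rather than refs/heads/, so it does not show up among the colleague's remote-tracking branches.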

Keeping an auto-updating clone of a bare git repository

Here is what I did
cd /git
git init --bare repo
What I want to do
I want an auto-updating non-bare clone of the repository to be available at a different location e.g. /srv/web/. What I mean is that every time someone does a git push the contents in /srv/web/ should automatically update. Similarly, if the git repository is reverted back, then the files in /srv/web should also revert to that.
What I mean is that every time someone does a git push [to the bare repository in /git/repo,] the contents in [non-bare] /srv/web/ should automatically update. Similarly, if the git repository is reverted back, then the files in /srv/web should also revert to that.
You have, in essence, two choices:
Make /git/repo actively update /srv/web. This /git/repo -> /srv/web path is a "push update" (not the same as git push, but might as well be): it has the "mastering" repository update the "slaving" one whenever there is an update available on the master side.
Make /srv/web actively update from /git/repo. This /git/repo <- /srv/web path is a "pull update" (not the same as git pull, unless you implement it that way, but might as well be): it has the slaving repository update from the mastering one at regular intervals.
Your second requirement ("if the git repository is reverted back") is rather mysterious. A bare repository, by definition, has no work-tree; so no one can do any work in it. It can only be updated by bringing in new commits from some other Git repository. If someone wants to do a git revert, they do it in some other repository, and then git push. So all updates to the bare repository should happen via git push and you should not need this second requirement.
Hence, I'll just ignore the second requirement entirely.
When and why to use one or the other
There's no particularly strong reason to favor either approach, but note that each has a different flaw.
If you use push updates, and the receiver is down, the update never happens. The master tries but fails to update the slave. When the slave comes back up, the master just sits around until there's a new update.
(If everything is on a single server, this problem goes away, and this method becomes the clear winner.)
If you use pull updates, there is a time-lag: however long the pull interval is, the slave can remain out of date. Furthermore, if the master goes down just before an update, the slave can remain out of date even longer than that.
Making /srv/web actively update from /git/repo (pull style update)
This is conceptually simpler. You just have your /srv/web poll your /git/repo for any interesting updates. The poll frequency / interval determines how long it takes for the update to make it from point A to point B. To make this faster, you could poll infrequently, but also have a triggering mechanism that you invoke from, e.g., a post-receive script: "I just got some important update; please poll now." In other words, you use a hybrid of pull-and-push.
You can literally just run git pull from a crontab entry, for instance (though I recommend not using git pull ever, including here: break it up into git fetch followed by another Git command).
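As a sketch of the fetch-plus-another-command variant: the function below is what a crontab entry could run at each interval. It assumes the deployment clone should always mirror origin/master exactly, which is why it uses git reset --hard as the second command. The function name poll_update and the temporary paths standing in for /git/repo and /srv/web are invented for the demonstration.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Stand-ins for /git/repo (bare) and /srv/web (non-bare clone).
git init -q --bare "$tmp/repo.git"
git --git-dir="$tmp/repo.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/repo.git"
echo v1 > index.html
git add index.html
git commit -q -m "v1"
git push -q origin master
git clone -q "$tmp/repo.git" "$tmp/web"

# What the polling job would run: fetch, then move the work-tree to match.
poll_update() {
    git -C "$1" fetch -q origin
    git -C "$1" reset -q --hard origin/master
}

# Someone pushes an update; the next poll picks it up.
echo v2 > index.html
git commit -q -am "v2"
git push -q origin master
poll_update "$tmp/web"
cat "$tmp/web/index.html"
```

Because reset --hard discards local changes, this only suits a deployment directory that nobody edits by hand.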
Making /git/repo actively update /srv/web (push style update)
[Edit: I got interrupted while writing the original answer, and mixed up the update and post-update hooks; this is now fixed.]
This is relatively straightforward, using a post-receive or post-update hook. There is also an update hook but that's the wrong place to do this. The difference between them all is I think illustrated best with an example: What happens in /git/repo if I, as someone with push access to it, do this from my own Git clone:
git push origin 1234567:refs/heads/zorg 8888888:refs/tags/lucky
Here, I am telling my Git to contact your server Git (my origin = your /git/repo) and deliver my commit 1234567 to your Git. My Git does so, along with any other objects required to make 1234567 useful. I am also telling my Git to deliver commit-or-tag 8888888 to your Git, so my Git does that, along with any other objects required to make 8888888 useful.
Once your Git has all those objects, my Git asks your Git:
Please set your branch zorg (refs/heads/zorg) to 1234567.
Please set your tag lucky (refs/tags/lucky) to 8888888.
At this point, your Git will invoke your pre-receive hook, if you have one. It delivers the old and new hash IDs for refs/heads/zorg and refs/tags/lucky on standard input. Your pre-receive hook's job is to examine these and decide yea-or-nay: "allow all these updates to proceed to the next step" or "forbid any of these updates from occurring at all."
Next, your Git will invoke your update hook twice (again, if you have one). One of these will say "hey, someone is asking to change refs/heads/zorg, here's the old and new hash values, should we let him?" The other will say "hey, someone is asking to change refs/tags/lucky, here's the old and new hash values, should we let him?" Your hook's job is to examine this one update and decide yea-or-nay: allow the update, or reject it. If you allow one and reject the other, the one update occurs and the other fails.
Finally, after all of the updates have been accepted or rejected, for whatever updates actually did occur, your Git invokes your post-receive and post-update hooks (if those exist). Your Git delivers to your post-receive hook, on standard input, one line for each update that did occur, in the same form it used in the pre-receive hook. Your post-receive hook can do whatever it wants with these input lines, but it's too late to stop the updates from happening: they are already done. Your zorg branch now points to commit 1234567 and your lucky tag now points to commit 8888888, assuming your pre-receive and update hooks did not reject these. Your Git delivers to your post-update hook, as arguments, one argument for each updated reference: refs/heads/zorg and refs/tags/lucky.
You may now take any action you like.
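To watch this sequence happen, here is a sketch that installs all four hooks in a throwaway bare repository, each one appending its name and arguments to a log file, and then pushes one branch and one tag in a single git push. The log path and repository paths are assumptions for the demonstration, and HEAD is used in place of the placeholder hash IDs above.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
log="$tmp/hooks.log"

git init -q --bare "$tmp/repo.git"
mkdir -p "$tmp/repo.git/hooks"
for hook in pre-receive update post-receive post-update; do
    # $hook and $log are expanded now; \$* is left for the hook itself.
    cat > "$tmp/repo.git/hooks/$hook" <<EOF
#!/bin/sh
# Record which hook ran and with which arguments.
echo "$hook \$*" >> "$log"
# The two receive hooks get their "old new ref" lines on stdin; drain them.
case "$hook" in
    pre-receive|post-receive) cat > /dev/null ;;
esac
exit 0
EOF
    chmod +x "$tmp/repo.git/hooks/$hook"
done

git init -q "$tmp/work"
cd "$tmp/work"
echo hello > file.txt
git add file.txt
git commit -q -m "A commit"

# One push that updates two references.
git push -q "$tmp/repo.git" HEAD:refs/heads/zorg HEAD:refs/tags/lucky
cat "$log"
```

The log shows pre-receive once, update once per reference (with the reference name and the old and new hash IDs as arguments), and then the two post hooks, with post-update receiving both reference names as arguments.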
The obvious action to take, in the post-receive or post-update hook, is to trigger /srv/web to pick up the new commit(s) on any branch(es) you want it to update. (The update hook is not suitable as, at hook time, the actual change has not yet happened, so if your /srv/web is very fast, it might not be able to get the new objects from your /git/repo yet: they may still be in the process of being cemented into place.)
The actual implementation could be as simple as: "Ditch $GIT_DIR environment variable, cd into slave repository, and run git pull." The reason to unset GIT_DIR is that any Git hook is run with this variable set, and it contains a relative path to the Git repository, which interferes with using other repositories. As before, I recommend avoiding git pull entirely.
Also, be aware that the user-ID (i.e., privileges) of the user that is running the post-receive script depends on the authentication method used to do the git push in the first place. This affects all deployment methods, even if the post-receive script simply sends a message (e.g., a packet on a socket port) to some independent process that does the slave-side update, since the privileges available to send a message may depend on user-ID.
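A minimal sketch of such a post-receive hook, tried out in temporary directories that stand in for /git/repo and /srv/web. It unsets GIT_DIR and then uses git fetch followed by git reset --hard rather than git pull. The choice to react only to refs/heads/master, and the assumption that the deployment clone has the bare repository as its origin, are mine.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q --bare "$tmp/repo.git"
git --git-dir="$tmp/repo.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/repo.git"
echo v1 > index.html
git add index.html
git commit -q -m "v1"
git push -q origin master
git clone -q "$tmp/repo.git" "$tmp/web"   # the deployment clone

# Install the hook; $tmp is expanded now, the \$ variables at hook time.
mkdir -p "$tmp/repo.git/hooks"
cat > "$tmp/repo.git/hooks/post-receive" <<EOF
#!/bin/sh
# Hooks run with GIT_DIR pointing at the bare repository; clear it so the
# commands below operate on the deployment clone instead.
unset GIT_DIR
while read -r old new ref; do
    [ "\$ref" = "refs/heads/master" ] || continue
    git -C "$tmp/web" fetch -q origin
    git -C "$tmp/web" reset -q --hard origin/master
done
EOF
chmod +x "$tmp/repo.git/hooks/post-receive"

# A push to master now updates the deployment clone as a side effect.
echo v2 > index.html
git commit -q -am "v2"
git push -q origin master
cat "$tmp/web/index.html"
```

In a real setup the hook runs as whichever user performed the push, so that user needs write access to the deployment directory.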
Final note: do you really need a Git repository in the deployment area?
If your server is a typical Web server, it doesn't need a Git repository. You can simply update the equivalent of a work-tree. If your web server is on a different system, using a Git repository may be the simplest or most convenient way to achieve this, but if it is all on one machine, you can just run git --work-tree=/path/to/work-tree checkout ... from the bare repository.
(Note that what gets checked out, and how the update happens, depends on what is in the index and HEAD in the actual repository, and how the index compares to the supplied work-tree. Additional arguments to git checkout may change which branch is to be checked-out, which will update HEAD correspondingly.)
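For the single-machine case, a sketch of that checkout with temporary directories standing in for /git/repo and the deployment area. The -f flag and the explicit master branch are my choices for the demonstration. The same command could be placed in a post-receive hook.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q --bare "$tmp/repo.git"
git --git-dir="$tmp/repo.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
echo v1 > index.html
git add index.html
git commit -q -m "v1"
git push -q "$tmp/repo.git" master

# Check the files out of the bare repository into a plain directory.
mkdir "$tmp/web"
git --git-dir="$tmp/repo.git" --work-tree="$tmp/web" checkout -q -f master

ls -A "$tmp/web"
```

The deployment directory ends up with the tracked files only; there is no .git directory inside it.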
Git is not actually a perfect fit for the scenario you envision, for a couple of reasons.
First, you are completely reversing the normal use of git. A git repository is a logical picture of your project. There might be branches in the project, so this logical picture is much more complex than just the latest version. You need to get the actual branch you want into a working copy and work on it. This is what non-bare repositories are about: they are a repository and a working copy. It is not the intended use of git to push the latest version to a working copy.
Second, there are technical difficulties with pushing to a non-bare repository. By default, git refuses a push that updates the currently checked-out branch of a non-bare repository. There are ways to configure your non-bare repository to allow it, but that configuration is only feasible if you never modify the working copy there. If you begin to modify the working copy in the non-bare repository, you will definitely start having problems.
Third, if you're going to serve your working copy on the web, keep in mind that the .git directory will be served too. This might cause vulnerabilities. If you do this, I at least recommend serving a subfolder of your project if possible. This way .git is left out.
However, I recommend another method for doing all this. Instead of initializing a directory under the web tree as a repository, you can simply auto-copy your whole working copy (without the repository, i.e. the .git folder) to the desired directory. Since you are only interested in serving the files, that is a more suitable method.
At your repository /git/repo, there is a folder named hooks. Create file /git/repo/hooks/post-receive under this directory with the content
#!/bin/bash
rm -rf /srv/web/*
git archive master | tar -x -C /srv/web
Also you need to give execute permission to this file.
chmod +x /git/repo/hooks/post-receive
Then after each push to this bare repo, HEAD of branch master will be copied to the directory of your choice without any repository information.
Update: I think the initial solution in the answer was not valid. So I removed it, alternative solution is still ok though.
Update 2: As @torek noticed, this solution causes a small window of invalid content in the web directory. Since you indicated you'll serve the web content on a local network, I guess that is not a problem. Moreover, this is basically a kind of poor man's deployment scenario and should not be used in any production deployment. However, this can be improved with a temporary staging directory.
Replace the post-receive hook with the script below. This script reduces the time your /srv/web directory stays empty. Since rm -rf and mv are pretty fast (if your temp directory is on the same disk drive), and since repository size does not affect either command, the invalid content window will be smaller.
#!/bin/bash
STAGING=`mktemp -d`
git archive master | tar -x -C $STAGING
rm -rf /srv/web
mv $STAGING /srv/web
Or you can use a swap instead of deleting the folder first, as @torek suggested.
#!/bin/bash
STAGING=`mktemp -d`
SWAP=`mktemp -d`
git archive master | tar -x -C $STAGING
mv /srv/web $SWAP
mv $STAGING /srv/web
rm -rf $SWAP
However note that you are deleting or swapping /srv/web and you'll lose any ownership, permission or ACL information of the folder if you follow this method.
You can alternatively use rsync which will still copy the files, but since it will operate selectively whole content will not be deleted at any instant. Also rsync can be tuned to preserve ownership, permissions, etc.
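#!/bin/bash
STAGING=`mktemp -d`
git archive master | tar -x -C $STAGING
# The trailing slashes make rsync copy the contents of $STAGING into /srv/web,
# rather than creating a copy of the staging directory inside it.
# Note: -a also applies the staging directory's own mode to /srv/web.
rsync -a --delete $STAGING/ /srv/web/
rm -rf $STAGING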

Can I recover a GIT stash from a fetch

I've got a git repo, and my colleague had a clone of that on his PC. For whatever reason, we've lost his repository due to technical issues.
A short while before we lost his repo, he stashed some work, and I did a git fetch followed by a git merge master.
Is it possible to get the content of the stash? Would the git fetch command have pulled the stash over as well?
I can view all the remote branches with git branch -a but I need the stashed data.
We're running git version 1.6.3.3 on Ubuntu 9.10 Karmic
Sorry, I don't believe you can recover from this. (Arguably, if you could, it would be a security risk; someone might have stashed a password inside of a configuration file or something.)
From the documentation (git fetch --help): Fetches named heads or tags from one or more other repositories, along with the objects necessary to complete them.
Key word: named heads. Sadly, the stash isn't a named head (or a tag).
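A small sketch that illustrates the point with throwaway repositories: the stash lives under refs/stash, which is neither a branch nor a tag, so a clone (or fetch) over the normal transport does not bring its commit along. The directory names are made up, and a file:// URL is used so that git goes through the regular transfer protocol instead of copying the object directory wholesale.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# The colleague's repository, with some stashed work.
git init -q "$tmp/colleague"
cd "$tmp/colleague"
git symbolic-ref HEAD refs/heads/master
echo notes > notes.txt
git add notes.txt
git commit -q -m "Notes"
echo "unfinished idea" >> notes.txt
git stash > /dev/null
stash=$(git rev-parse refs/stash)

# My copy, obtained over the normal transport.
git clone -q "file://$tmp/colleague" "$tmp/mine"

if git -C "$tmp/mine" cat-file -e "$stash" 2>/dev/null; then
    echo "stash commit was transferred"
else
    echo "stash commit is not in my copy"
fi
```

The check uses git cat-file -e, which only tests whether an object with that ID exists in the repository.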
