How can I get a *complete* list of changed files between two commits in GitHub?

(I know similar questions have been asked e.g. GitHub API - how to compare 2 commits but I don't think this is a duplicate)
As part of our build process we need to compare two commits in GitHub and iterate through every file changed between them. There is a lovely API for comparing commits, but it silently maxes out at 300 file changes, and while the API supports pagination, you can only page through the list of commits, not the associated list of files. All my googling suggests that neither the gh CLI nor the GraphQL API supports diffing commit IDs either.
As best I can tell, my options are:
Clone the whole repo on every build and run git diff $lastReleaseHash...$newReleaseHash --name-status at the command line, which just seems inefficient.
Call GitHub's compare API but ignore the list of files, since it maxes out at 300; instead page through all commits, request the list of changed files for each commit, then manually stitch them together into an overall diff (i.e. tracking renames between commits and ignoring files that were created and then deleted within the range of commits), which is super tedious and error-prone.
Surely there are better options?!

You can use
git clone --bare
to clone the repository with just the version-control data (no checked-out files). Then run git diff in that clone.
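A minimal sketch of that approach in Python, assuming a bare clone already exists on the build machine. The function names and the repo_dir argument are illustrative, not part of any existing tool, and the parser assumes plain tab-separated --name-status output (paths with unusual characters may be quoted by Git unless -z is used).

```python
import subprocess


def parse_name_status(output):
    """Turn `git diff --name-status` output into (status, old_path, new_path) tuples."""
    changes = []
    for line in output.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        status = fields[0][0]  # "R100" -> "R", "M" -> "M"
        if status in "RC":
            # Renames and copies list the old path, then the new path.
            changes.append((status, fields[1], fields[2]))
        else:
            # Added, modified, deleted, etc. list a single path.
            changes.append((status, fields[1], fields[1]))
    return changes


def changed_files(repo_dir, old, new):
    """Run the three-dot diff from the question inside an existing (bare) clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", f"{old}...{new}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

Keeping the bare clone around between builds and running git fetch in it before each diff avoids re-cloning every time.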

Related

Compare two commits using gitlab api and get response in `.diff` format

I want to compare two commits in Gitlab and get the response in .diff format.
I tried the APIs listed in Gitlab doc https://docs.gitlab.com/ee/api/repositories.html#compare-branches-tags-or-commits. But it only returns the diff in JSON format.
How can I get the data in the git diff format?
That was initially requested in issue 23285 five years ago (Oct. 2016):
GitHub seems have undocumented ways to download the unified diff file before create a pull request, that is: by add ".diff" or ".patch" suffix when comparing two branches.
But GitLab didn't have that feature yet.
That particular feature is requested in issue 217206.
Issue 20688 suggests:
The only possible workaround, I do right now is to get the data from compare API, and then use grep to get diff, and append all of them.
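That workaround can be sketched in Python instead of grep. This assumes the compare response has a "diffs" list whose entries carry old_path, new_path, new_file, deleted_file and a diff string, as in the GitLab documentation linked in the question; whether the diff string already includes the ---/+++ header lines may vary, so the sketch only adds them when they are missing. The function name is made up for illustration.

```python
def compare_to_unified_diff(compare):
    """Stitch the per-file entries of a GitLab compare response into one diff text."""
    parts = []
    for entry in compare.get("diffs", []):
        old_path, new_path = entry["old_path"], entry["new_path"]
        body = entry.get("diff", "")
        parts.append(f"diff --git a/{old_path} b/{new_path}\n")
        if body and not body.startswith("--- "):
            # Add the file header lines only when the API left them out.
            old_label = "/dev/null" if entry.get("new_file") else f"a/{old_path}"
            new_label = "/dev/null" if entry.get("deleted_file") else f"b/{new_path}"
            parts.append(f"--- {old_label}\n+++ {new_label}\n")
        if body:
            parts.append(body if body.endswith("\n") else body + "\n")
    return "".join(parts)
```

The result approximates git diff output; mode changes and binary files are not represented, so treat it as a starting point rather than a byte-for-byte equivalent.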

Change a folder path for commit in Gitlab

Is there a way to modify folder structure (path) for pushed commits in Gitlab?
For example, my old path is homework, and now I want to add a parent folder above it, i.e. superfolder/homework. Thanks; any help is welcome.
If you want to do so while keeping the history of homework, you need to install the Python-based tool newren/git-filter-repo.
That tool has an option for renaming based on paths.
In your case:
git filter-repo --path-rename homework:superfolder/homework
Note that it changes the history of the repository, which will have to be force-pushed (git push --force): not a big deal if you are the only one working on it.

My git internal files changed. How to pull them with the changes?

So I've set up a git server and I am using the folder where it puts the files when you
push to this server for deployment (it's the public, website folder).
I used this guide:
https://www.diffuse.nl/blog/how-to-set-up-git-for-automated-deployments
The problem I ran into is that I changed those internal files, in the server folder, but now when I pull from the server to my local machine, it returns the old version of the repo, without the changes I made from the server (I vim-ed into the files and made the changes without telling git for those changes..)
I want to reload the git server repository so that it recognizes and "sees" the changes I made to the files in its "belly".
Thanks in advance!
The site you linked talks about setting up a server side automatic deployment, push-to-deploy model, using a Git post-receive hook that consists of a single git checkout command with --git-dir and --work-tree options and -f to force a checkout from a bare repository into a working tree that, to Git, is simply a temporary working tree.
This uses Git as a deployment tool. Git makes a pretty poor deployment tool, but for some cases, it's sufficient. If your case is one of those cases, this is okay. It's not great, but it can work. You do, however, need to understand its limitations.
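For reference, the command that such a hook runs can be sketched as follows. The directory paths are hypothetical placeholders, and the optional branch argument is an assumption added here to make explicit which commit gets deployed.

```python
def build_deploy_command(git_dir, work_tree, branch=None):
    """Build the forced checkout that a push-to-deploy post-receive hook runs."""
    command = [
        "git",
        f"--git-dir={git_dir}",      # the bare repository that received the push
        f"--work-tree={work_tree}",  # the directory the web server serves
        "checkout",
        "-f",                        # overwrite whatever is in the work tree
    ]
    if branch is not None:
        command.append(branch)
    return command


# Hypothetical paths, for illustration only.
print(" ".join(build_deploy_command("/home/user/site.git", "/var/www/site", "master")))
```

The -f is the reason edits made directly in the deployed folder never reach the repository: the next deployment simply overwrites them.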
The problem I ran into is that I changed those internal files, in the server folder, but now when I pull from the server to my local machine, it returns the old version of the repo ...
Git largely is not interested in files. Git is really all about commits. A bare repository, which is the sort you set up on the server, doesn't have any working tree at all. (That's why the post-receive hook temporarily sets one up, just long enough to use Git as a crappy deployment tool.)
This means you must understand exactly what a commit is, and what one does for you. A commit is a sort of package deal, containing two things. Each commit is also numbered, with a unique, big ugly number expressed in hexadecimal. Git desperately needs these numbers: that's the only way that Git can find the commit, by its number. But we—humans—are bad at the numbers, so Git gives us a secondary system, by which we can have Git store a number in a name, such as a branch name. Then we have Git store the right number, and we give Git the name: master, or main, or whatever name we like.
The two parts of a commit are these:
One commit contains one snapshot of every file, as a frozen-for-all-time archive. So even though Git only stores commits, you can get files out of a commit, because a commit contains files. The files that are inside the commit—in this frozen archive—are in a special format in which the files are compressed and de-duplicated. This takes care of the fact that most commits mostly have the same files as most other commits. If you have a big file—tens or hundreds of megabytes is still pretty big these days, although it's getting smaller—in a thousand or more commits, there's really only one copy of that file, thanks to the de-duplication.
And, each commit stores some metadata, or information about the commit itself. For instance, each commit stores the name and email address of its author. It stores committer information as well. There are two date-and-time stamps, and there's a free-form area where you, when you make a commit, get to supply a commit message for future reference.
The entire commit, not just the snapshot, is frozen for all time. That's how the commit number is generated. It's impossible to change anything about a commit, because if you do take a commit out of the big database of all commits, indexed by their numbers in Git's key-value store, and then change something in it and put it back, what you get is a new commit, with a new and different unique number. The old commit is still in there, under its old number. So a Git repository never shrinks: it only ever adds new commits.[1]
Besides this metadata, though, each commit also stores the commit number, or hash ID, of some earlier commit or commits.[2] Most commits, which we call "ordinary" commits, have exactly one previous commit number. This is the parent of the commit, and it means that commits in a branch form a simple backwards-looking string:
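To make the "change anything and you get a new number" point concrete, here is a small sketch of how object IDs are computed in a repository that uses SHA-1 (the long-standing default). The author line and timestamps are made-up values; only the hashing rule itself (type, length, a NUL byte, then the content) is Git's.

```python
import hashlib

# Well-known ID of the empty tree in SHA-1 repositories.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(kind, content):
    """Hash "<kind> <length>\\0<content>" the way SHA-1 Git repositories do."""
    header = f"{kind} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def commit_body(message):
    """Build hypothetical commit content in which only the message varies."""
    return (
        f"tree {EMPTY_TREE}\n"
        "author A U Thor <author@example.com> 1700000000 +0000\n"
        "committer A U Thor <author@example.com> 1700000000 +0000\n"
        "\n"
        f"{message}\n"
    ).encode()


first = git_object_id("commit", commit_body("initial commit"))
second = git_object_id("commit", commit_body("initial commit!"))
print(first != second)  # True: one extra character yields a different commit ID
```

Because the parent's hash ID is part of a commit's content too, changing an old commit would change the ID of every commit after it.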
... <-F <-G <-H
Here H stands in for the hash ID of the last commit in the chain of commits. Commit H has, in its metadata, the actual (big ugly) hash ID of earlier commit G, which in turn stores the hash ID of still-earlier commit F, and so on. In this way, Git only needs to be told the hash ID of the last commit in some chain of commits. It can find all the earlier commits on its own. And, as I mentioned earlier, we humans typically use a branch name to store this hash ID:
...--F--G--H <-- master
Here the name master stores the hash ID for us, so that we don't have to memorize it. If we add a new commit, Git simply automatically updates our name master for us:
...--F--G--H--I <-- master
The name master now points to commit I, instead of commit H, and we again just use the name master to mean get me the latest commit in the chain.
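The same picture as a toy model in Python. This is not how Git stores anything internally; single letters stand in for hash IDs, and F is treated as the first commit purely to keep the example short.

```python
# Each commit records its parent; a branch name records only the last commit.
commits = {"F": None, "G": "F", "H": "G"}
branches = {"master": "H"}


def history(branch):
    """Walk backwards from the commit the branch name points to."""
    chain = []
    current = branches[branch]
    while current is not None:
        chain.append(current)
        current = commits[current]
    return chain


def add_commit(branch, new_id):
    """Add a commit whose parent is the current tip, then move the name forward."""
    commits[new_id] = branches[branch]
    branches[branch] = new_id


print(history("master"))  # ['H', 'G', 'F']
add_commit("master", "I")
print(history("master"))  # ['I', 'H', 'G', 'F']
```

Note that add_commit never touches the existing entries: the old commits stay exactly as they were, and only the name moves.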
Merge commits are commits with two, or technically two or more, parents; these bring two separate chains of commits together, so that one branch name can find more than one previous commit:
          I--J
         /    \
...--G--H      M--N   <-- master
         \    /
          K--L
This chain "split apart" after H—became two branches—but then got "rejoined" at M when we merged one of the branches back into master. We then deleted the other branch name, because we did not need it any more, and now we have a simple chain on either side of the merge blob.
All of this is built out of two things: names, which find a commit from which we can work backwards, and commits, which store archived snapshots and metadata. And that's what Git is all about. It's not about files, it's about commits, which we find using other commits—when we work backwards—and/or names, when we want to go straight to some particular commits. And that's all!
When we hook up two Gits to each other, we pick one Git as a sender and one as a receiver. The sender and receiver talk with each other to figure out which commits they both already have. They do this with the commit numbers, because those are universal. Thanks to the magic of cryptographic hash functions, one Git and another Git can be sure that they have the exact same commit if they have the same commit number. If one Git has a commit number that the other Git lacks, the lacking Git needs to get that commit from the having Git. As long as the having Git is the sender, and we've told it to send the commit,[3] the receiving Git gets it.
[1] It is possible to take old, dead commits out of a repository. It's pretty tricky to do that on purpose, but certain actions generate dead commits naturally, and Git will sweep them away eventually, on its own. The nature of distributed version control, however, means that if you've given some commit to some other repository, it can come back even after you thought you'd killed it.
(I had no luck finding an image suitable for "zombie commit" here...)
[2] There is of course an exception to this rule. At least one commit, in any repository that has commits, can't have a parent commit, because it was the first commit. So that commit simply has no parent commit number. You can create additional parent-less commits (and in fact git stash sometimes does so for its own purposes); such commits are root commits.
[3] In general, commits can't be restricted, though some Git add-ons try. But when we use git push we pick which final commits we'll ask our Git to send (their ancestor commits will come along for the ride if needed), and when we use git fetch, which is the actual opposite of git push, we can have our receiving Git not bother to ask the sender for some commits, if we like. The usual default, though, is to get everything on fetch.
Where the files come in
The thing about commits and their archives is that, on their own, they are pretty useless. Only Git can read these things, and nothing at all can write them. They're frozen for all time: we can make new ones, but we cannot overwrite old ones.
To get any work done, though, we need files. That's how our computers operate today, using files. So we have to have two things:
We need a way to tell Git: extract this commit. That's git checkout (or, since Git 2.23, git switch, though git checkout still works too).
And, we need a way to make new commits.
Both of these operations require what Git calls a working tree. That working tree is where the files are. You use git checkout or git switch to create an initial set. Then you work on/with them, use git add to update Git's copies—this involves Git's index, which I am not going to go into here—and git commit to make a new commit: a new snapshot plus metadata. This adds a new commit to your history; your existing history, consisting of all the existing Git commits, remains undisturbed.
The working tree on your server is a temporary working tree. Your server Git forgets its existence the moment after your push-to-deploy. So nothing you do in this working tree can ever make it back into the Git repository on the server.[4] The fact is that you're not intended, with this setup, to do any work on the server itself.
Instead, what you are supposed to do is do work on your laptop (or whatever computer you use at your end of the connection to your server). You make any updates you want there, and use git commit to make the new commit(s). You then use git push to send the new commit(s) from your laptop to the server. The server's Git accepts these new commits,[5] and then that crappy (but serviceable) post-receive hook acts to deploy the latest master commit.[7]
In short, you're updating your files on the wrong machine.
[4] That's not quite true, since you can run git --git-dir=... to temporarily hook that working tree back up to that Git repository. But that's outside the design of Git, and the scope of this answer.
[5] Or, it rejects them, based on whatever acceptance and rejection criteria you set up. This is all quite controllable, though you need to do a lot of programming on the server side, if you choose to control it. That's why hosting sites like GitHub and Bitbucket have simplified controls. They reserve the full power, and the great responsibility that comes with it,[6] to themselves, and dole out little, easier-to-use snippets of it to you.
[6] Yes, that's a Spider-Man reference. The concept, however, goes back quite a bit further.
[7] The actual deployed commit is determined by the branch name you give to the git checkout command in the script. Did you give a branch name in your script? If not, read the git checkout documentation to figure out which branch name Git will use here.

GitHub/GitLab REST API - Get diff between two branches

We want to get the difference between two GitLab/GitHub branches through the REST API. We saw that Git has a command for this, but it doesn't seem to be available through the REST API. Is there any API support for this?
git diff --name-status firstbranch..yourBranchName
git diff --name-status origin/develop..origin/master
Showing which files have changed between two revisions
GitHub has a dedicated URL (non-REST) for comparing branches.
Example:
https://github.com/octocat/linguist/compare/master...octocat:an-example-comparison-for-docs
Same for GitLab:
https://gitlab.com/gitlab-org/gitlab-foss/compare?from=master&to=master
Note that the result can differ from that of a plain git diff.
The REST API for GitHub would be: "Compare two commits"
GET /repos/:owner/:repo/compare/:base...:head
The response also includes details on the files that were changed between the two commits.
This includes the status of the change (for example, if a file was added, removed, modified, or renamed), and details of the change itself.
For example, files with a renamed status have a previous_filename field showing the previous filename of the file, and files with a modified status have a patch field showing the changes made to the file.
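A minimal sketch of calling that endpoint from Python with only the standard library. The helper names and the optional token parameter are assumptions for illustration; the fields read from each file entry (status, filename, previous_filename) are the ones described above.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def compare_url(owner, repo, base, head):
    """Build the "Compare two commits" endpoint URL."""
    return f"{API_ROOT}/repos/{owner}/{repo}/compare/{base}...{head}"


def fetch_compare(owner, repo, base, head, token=None):
    """Fetch the comparison as a dict (performs a network request)."""
    request = urllib.request.Request(
        compare_url(owner, repo, base, head),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_files(compare):
    """Reduce the response to (status, previous_filename, filename) tuples."""
    return [
        (entry["status"], entry.get("previous_filename"), entry["filename"])
        for entry in compare.get("files", [])
    ]
```

For example, summarize_files(fetch_compare("octocat", "linguist", "master", "some-branch")) would list one tuple per changed file, where "some-branch" is a placeholder. For very large comparisons the files list may be incomplete, so do not rely on it as an exhaustive list.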
For GitLab: compare branch API
GET /projects/:id/repository/compare?from=master&to=feature
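A short sketch of building that request URL in Python. It assumes the v4 API prefix and that the project is identified either by its numeric ID or by its namespace/path, which must be URL-encoded; the function name is illustrative.

```python
from urllib.parse import quote, urlencode


def gitlab_compare_url(base_url, project, from_ref, to_ref):
    """Build the compare endpoint URL for a GitLab instance."""
    project_id = quote(str(project), safe="")  # "group/repo" -> "group%2Frepo"
    query = urlencode({"from": from_ref, "to": to_ref})
    return f"{base_url}/api/v4/projects/{project_id}/repository/compare?{query}"


print(gitlab_compare_url("https://gitlab.com", "gitlab-org/gitlab-foss", "master", "feature"))
```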
As @VonC says.
But watch out for a gotcha if you are working from a list of all commits in a PR.
The result for each commit in the list includes commit_hash['commit']['url'], which has the format
https://api.github.com/repos/myorg/myrepo/git/commits/7358c0d4bd18a4b7b6f30a3e3e7b34xxxxxe22e9
If you use this URL to request a single commit, you won't get the files!
You need to remove /git from the URL, or use commit_hash['url'], which is correct.
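That fix, as a small helper. The name commit_detail_url is made up for illustration, and commit_entry stands for one element of the PR's commit list as described above.

```python
def commit_detail_url(commit_entry):
    """Return the URL whose response includes the list of changed files."""
    if "url" in commit_entry:
        # The top-level URL already points at the endpoint that lists files.
        return commit_entry["url"]
    # Fall back to rewriting the Git-data URL into the commits URL.
    return commit_entry["commit"]["url"].replace("/git/commits/", "/commits/", 1)
```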

How can I have a post-commit hook that is only called when commits are made to TRUNK?

I have a repository that has the following directories:
branches
tags
trunk
The trunk directory contains the main line of development. I have created a post-commit hook script for the repository that updates a working copy (of trunk) when a user commits back to repository.
It looks something like this:
/usr/bin/svn update /path/to/a/working/copy
I've just created a branch of the code as I'm about to start some major changes, but I noticed that when I commit my changes to the branch, it calls the post-commit hook and updates the working copy (the copy of trunk).
Is there a way I can modify either my post-commit hook script or a setting that I can make that would only update the working copy if the commit was made to the trunk directory and not any other directory?
As you can see in this documentation, parameters are passed to the post-commit script.
The repository passes two arguments to this program: the path to the repository, and the new revision number that was created.
The post-commit hook could be any program of any type: a bash script, a C program, a Python script, and so on. What happens is that the shell launches this program with the two parameters.
You can find a list of interesting scripts here. A good beginning would be this python script, which uses the python svn libs.
Please note that the path provided is not the same as the path to the file that you are checking in (see Paul's answer). But using this information with the revnum should help you to get the list of the changes, from which you can determine if operations have been done on trunk or not.
In addition to the answer from Bishiboosh, it is worth noting that the hooks can be any program. That is, if you wanted to, you could write the program in C. The parameters that are passed are described in the doc.
For a good repository of scripts to get inspiration from, have a look at the subversion tools page. In general, if you want to do some conditional processing based on the contents of the transaction, and you do, since you only want to process if the files are in trunk, then it will be easiest to use Python, since that comes with a bunch of tools to examine the transactions. This script is a good place to start looking for inspiration.
Note that the path passed as a parameter is not the same as the path to the file that you are checking in. You could have multiple files in the checkin after all… What you are passed is the location of the repository and the revision of the change. Using these two pieces of information, you can get the information about the change from the repository, and use that information to decide whether to perform an action or not in the post-commit hook.
Here is another example (in Perl) that explicitly checks the paths of the files in the checkin. This is a much more complicated script, but most likely the salient parts can be ripped out and reused.
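Putting the advice above together, here is a minimal sketch of a trunk-only post-commit hook in Python. It shells out to svnlook changed rather than using the Python svn libraries mentioned above, and it assumes that each line of svnlook's output has the path starting in the fifth column. The working-copy path is the placeholder from the question.

```python
import subprocess


def touches_trunk(changed_output):
    """Return True if any path reported by `svnlook changed` is under trunk/."""
    for line in changed_output.splitlines():
        path = line[4:]  # skip the status columns, e.g. "U   "
        if path.startswith("trunk/"):
            return True
    return False


def main(argv):
    """Entry point: Subversion passes the repository path and the new revision."""
    repos, revision = argv[1], argv[2]
    changed = subprocess.run(
        ["svnlook", "changed", "-r", revision, repos],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    if touches_trunk(changed):
        subprocess.run(
            ["/usr/bin/svn", "update", "/path/to/a/working/copy"],
            check=True,
        )
    return 0

# In the actual hook file, finish with: import sys; sys.exit(main(sys.argv))
```

A commit that only touches branches/ or tags/ then leaves the working copy alone, while a commit that touches both trunk and a branch still triggers the update.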
