How does Git parse the contents of tree objects?

I know that Git stores trees as objects. But how does it extract the information from such an object when needed? How does it parse that file to get at its contents?
I mean, how does Git have access to the pointers (trees, commits, blobs) all the time?
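One way to see what Git actually reads is to ask it with its plumbing commands. A minimal sketch (any repository with at least one commit will do):

git cat-file -p 'HEAD^{tree}'   # pretty-print the tree of the current commit
# each entry line: <mode> <type> <hash>    <name>, e.g.
#   100644 blob a1b2c3...    README.md
#   040000 tree d4e5f6...    src
git cat-file -t 'HEAD^{tree}'   # -> tree
git cat-file -s 'HEAD^{tree}'   # -> size of the stored object in bytes

Under the hood, each loose object is a zlib-deflated file whose content starts with a header such as "tree <size>\0"; a tree's body is then a series of "<mode> <name>\0" entries, each followed by the raw binary hash of the blob or sub-tree it points to. That stored hash is exactly the "pointer" you're asking about: to follow it, Git simply looks up that hash in its object database.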

Related

How to post review with non-binary and binary file?

When I create a review request with non-binary and binary files using the rbt post command, the review request is generated. But when I check the list of files in the diff, I don't see the binary files.
Logically, since I added the binary file to my workspace for the first time, a diff wouldn't be available for it. But how do I make my reviewers aware that I'm going to submit the binary file? Most importantly, how do I make sure the binary file will be available in the Jenkins build process (prior to the main Perforce submit)?
I'm using RBTools 0.7.5 with Perforce source control.

How can I get a *complete* list of changed files between two commits in GitHub

(I know similar questions have been asked e.g. GitHub API - how to compare 2 commits but I don't think this is a duplicate)
As part of our build process we need to compare two commits in GitHub and iterate through every file changed between them. There is a lovely API for comparing commits, but it silently maxes out at 300 file changes, and while the API supports pagination, you can only page through the list of commits, not the associated list of files. All my googling suggests that neither the gh CLI nor the GraphQL API supports diffing commit IDs either.
As best I can tell, my options are:
clone the whole repo on every build and run git diff $lastReleaseHash...$newReleaseHash --name-status at the command line, which just seems inefficient
Call github's compare API but ignore the list of files as it will max out at 300, instead page through all commits, then for each commit request the list of changed files, then manually stitch them together into an overall diff (i.e. tracking renames between commits, ignoring files that were created and then deleted within the range of commits, super tedious and error-prone)
Surely there are better options?!
You can use
git clone --bare
to clone the repository with just the version-control data (no working-tree files). Then run git diff there.
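A sketch of that approach, keeping the variable names from the question ($REPO_URL is a placeholder):

git clone --bare "$REPO_URL" repo.git
cd repo.git
git diff --name-status "$lastReleaseHash" "$newReleaseHash"

A bare repository has no working tree, but diffing two commits only needs the object database, so this works as-is. If your server supports partial clone, adding --filter=blob:none to the clone defers downloading file contents entirely (rename detection may still fetch a few blobs on demand).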

My git internal files changed. How to pull them with the changes?

So I've set up a Git server, and I'm using the folder where it puts the files when you push to this server for deployment (it's the public website folder).
I used this guide:
https://www.diffuse.nl/blog/how-to-set-up-git-for-automated-deployments
The problem I ran into is that I changed those internal files in the server folder, but now when I pull from the server to my local machine, I get the old version of the repo, without the changes I made on the server (I vim-ed into the files and made the changes without telling Git about them...).
I want to reload the Git server repository so that it recognizes and "sees" the changes I made to the files in its "belly".
Thanks in advance!
The site you linked talks about setting up server-side automatic deployment, a push-to-deploy model, using a Git post-receive hook that consists of a single git checkout command, with --git-dir and --work-tree options and -f to force a checkout from the bare repository into what is, to Git, simply a temporary working tree.
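For reference, such a hook boils down to one line. A minimal sketch, with hypothetical paths that must match your own server layout:

#!/bin/sh
# hooks/post-receive in the bare repository on the server
git --git-dir=/home/git/site.git --work-tree=/var/www/site checkout -f master

The -f forces the checkout, overwriting whatever is currently in /var/www/site with the files from the latest master commit.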
This uses Git as a deployment tool. Git makes a pretty poor deployment tool, but for some cases, it's sufficient. If your case is one of those cases, this is okay. It's not great, but it can work. You do, however, need to understand its limitations.
The problem I ran into is that I changed those internal files, in the server folder, but now when I pull from the server to my local machine, it returns the old version of the repo ...
Git largely is not interested in files. Git is really all about commits. A bare repository, which is the sort you set up on the server, doesn't have any working tree at all. (That's why the post-receive hook temporarily sets one up, just long enough to use Git as a crappy deployment tool.)
This means you must understand exactly what a commit is, and what one does for you. A commit is a sort of package deal, containing two things. Each commit is also numbered, with a unique, big ugly number expressed in hexadecimal. Git desperately needs these numbers: that's the only way that Git can find the commit, by its number. But we—humans—are bad at the numbers, so Git gives us a secondary system, by which we can have Git store a number in a name, such as a branch name. Then we have Git store the right number, and we give Git the name: master, or main, or whatever name we like.
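You can watch this name-to-number mapping directly; a quick sketch, assuming a branch named master exists:

git rev-parse master   # prints the big ugly hexadecimal number
git show-ref master    # prints the stored hash alongside the full reference name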
The two parts of a commit are these:
One commit contains one snapshot of every file, as a frozen-for-all-time archive. So even though Git only stores commits, you can get files out of a commit, because a commit contains files. The files that are inside the commit—in this frozen archive—are in a special format in which the files are compressed and de-duplicated. This takes care of the fact that most commits mostly have the same files as most other commits. If you have a big file—tens or hundreds of megabytes is still pretty big these days, although it's getting smaller—in a thousand or more commits, there's really only one copy of that file, thanks to the de-duplication.
And, each commit stores some metadata, or information about the commit itself. For instance, each commit stores the name and email address of its author. It stores committer information as well. There are two date-and-time stamps, and there's a free-form area where you, when you make a commit, get to supply a commit message for future reference.
The entire commit, not just the snapshot, is frozen for all time. That's how the commit number is generated. It's impossible to change anything about a commit, because if you do take a commit out of the big database of all commits, indexed by their numbers in Git's key-value store, and then change something in it and put it back, what you get is a new commit, with a new and different unique number. The old commit is still in there, under its old number. So a Git repository never shrinks: it only ever adds new commits.1
Besides this metadata, though, each commit also stores the commit number—or hash ID—of some earlier commit or commits.2 Most commits, which we call "ordinary" commits, have exactly one previous commit number. This is the parent of the commit, and it means that commits in a branch form a simple backwards-looking string:
... <-F <-G <-H
Here H stands in for the hash ID of the last commit in the chain of commits. Commit H has, in its metadata, the actual (big ugly) hash ID of earlier commit G, which in turn stores the hash ID of still-earlier commit F, and so on. In this way, Git only needs to be told the hash ID of the last commit in some chain of commits. It can find all the earlier commits on its own. And, as I mentioned earlier, we humans typically use a branch name to store this hash ID:
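You can see all of this metadata with a plumbing command; a sketch, run in any repository:

git cat-file -p HEAD
# tree 8f2b...      <- hash of the frozen snapshot
# parent 9c41...    <- hash of the previous commit (absent in a root commit)
# author A U Thor <author@example.com> 1700000000 +0000
# committer A U Thor <author@example.com> 1700000000 +0000
#
# the commit message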
...--F--G--H <-- master
Here the name master stores the hash ID for us, so that we don't have to memorize it. If we add a new commit, Git simply automatically updates our name master for us:
...--F--G--H--I <-- master
The name master now points to commit I, instead of commit H, and we again just use the name master to mean get me the latest commit in the chain.
Merge commits are commits with two, or technically two or more, parents; these bring two separate chains of commits together, so that one branch name can find more than one previous commit:
          I--J
         /    \
...--G--H      M--N   <-- master
         \    /
          K--L
This chain "split apart" after H—became two branches—but then got "rejoined" at M when we merged one of the branches back into master. We then deleted the other branch name, because we did not need it any more, and now we have a simple chain on either side of the merge blob.
All of this is built out of two things: names, which find a commit from which we can work backwards, and commits, which store archived snapshots and metadata. And that's what Git is all about. It's not about files, it's about commits, which we find using other commits—when we work backwards—and/or names, when we want to go straight to some particular commits. And that's all!
When we hook up two Gits to each other, we pick one Git as a sender and one as a receiver. The sender and receiver talk with each other to figure out which commits they both already have. They do this with the commit numbers, because those are universal. Thanks to the magic of cryptographic hash functions, one Git and another Git can be sure that they have the exact same commit if they have the same commit number. If one Git has a commit number that the other Git lacks, the lacking Git needs to get that commit from the having Git. As long as the having Git is the sender, and we've told it to send the commit,3 the lacking Git will receive it, along with any other commits it needs to go with it.
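In command terms, a sketch of the two roles, assuming the usual remote name origin:

git push origin master                # our Git is the sender
git fetch origin                      # our Git is the receiver
GIT_TRACE_PACKET=1 git fetch origin   # shows the have/want negotiation, if you're curious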
1It is possible to take old, dead commits out of a repository. It's pretty tricky to do that on purpose, but certain actions generate dead commits naturally, and Git will sweep them away eventually, on its own. The nature of distributed version control, however, means that if you've given some commit to some other repository, it can come back even after you thought you'd killed it.
(I had no luck finding an image suitable for "zombie commit" here...)
2There is of course an exception to this rule. At least one commit, in any repository that has commits, can't have a parent commit, because it was the first commit. So that commit simply has no parent commit number. You can create additional parent-less commits (and in fact git stash sometimes does so for its own purposes); such commits are root commits.
3In general, commits can't be restricted, though some Git add-ons try. But when we use git push we pick which final commits we'll ask our Git to send—and their ancestor commits will come along for the ride if needed—and when we use git fetch, which is the actual opposite of git push, we can have our receiving Git not bother to ask the sender for some commits, if we like. The usual default, though, is to get everything on fetch.
Where the files come in
The thing about commits and their archives is that, on their own, they are pretty useless. Only Git can read these things, and nothing at all can write them. They're frozen for all time: we can make new ones, but we cannot overwrite old ones.
To get any work done, though, we need files. That's how our computers operate today, using files. So we have to have two things:
We need a way to tell Git: extract this commit. That's git checkout (or, since Git 2.23, git switch, though git checkout still works too).
And, we need a way to make new commits.
Both of these operations require what Git calls a working tree. That working tree is where the files are. You use git checkout or git switch to create an initial set. Then you work on/with them, use git add to update Git's copies—this involves Git's index, which I am not going to go into here—and git commit to make a new commit: a new snapshot plus metadata. This adds a new commit to your history; your existing history, consisting of all the existing Git commits, remains undisturbed.
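A minimal sketch of that cycle (the file name is hypothetical):

git switch master                  # populate the working tree from a commit
# ... edit index.html ...
git add index.html                 # copy the updated file into Git's index
git commit -m "Update home page"   # freeze a new snapshot plus metadata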
The working tree on your server is a temporary working tree. Your server Git forgets its existence the moment after your push-to-deploy. So nothing you do in this working tree can ever make it back into the Git repository on the server.4 The fact is that you're not intended, with this setup, to do any work on the server itself.
Instead, what you are supposed to do is do work on your laptop (or whatever computer you use at your end of the connection to your server). You make any updates you want there, and use git commit to make the new commit(s). You then use git push to send the new commit(s) from your laptop to the server. The server's Git accepts these new commits,5 and then that crappy (but serviceable) post-receive hook acts to deploy the latest master commit.7
In short, you're updating your files on the wrong machine.
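Concretely, assuming origin points at the server repository from the guide, the deployment step from your laptop is just:

git push origin master   # the server's post-receive hook then deploys master

After the hook runs, the server's work-tree folder holds the files from the new commit, and your local clone and the server agree on history.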
4That's not quite true, since you can run git --git-dir=... to temporarily hook that working tree back up to that Git repository. But that's outside the design of Git, and the scope of this answer.
5Or, it rejects them, based on whatever acceptance and rejection criteria you set up. This is all quite controllable, though you need to do a lot of programming on the server side, if you choose to control it. That's why hosting sites like GitHub and Bitbucket have simplified controls. They reserve the full power—and the great responsibility that comes with it6—to themselves, and dole out little, easier-to-use snippets of it to you.
6Yes, that's a Spider-Man reference. The concept, however, goes back quite a bit further.
7The actual deployed commit is determined by the branch name you give to the git checkout command in the script. Did you give a branch name in your script? If not, read the git checkout documentation to figure out which branch name Git will use here.

Github API compare a single directory between two SHAs

I'm trying to programmatically get a list of commits between two SHAs that have touched a certain directory.
Github's API supports comparison between two SHAS:
https://api.github.com/repos/github/linguist/compare/96d29b7662f148842486d46117786ccb7fcc8018...a20631af040b4901b7341839d9e76e31994adda3
Used to drive the UI comparison: https://github.com/github/linguist/compare/96d29b7662f148842486d46117786ccb7fcc8018...a20631af040b4901b7341839d9e76e31994adda3
Is there a way to add a path parameter to either of these? I would like just the comparison in the /lib directory, for example.
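If a local clone is an option, Git itself can restrict both the log and the diff to one directory; using the two hashes from the question inside a clone of github/linguist:

git log --oneline 96d29b7662f148842486d46117786ccb7fcc8018..a20631af040b4901b7341839d9e76e31994adda3 -- lib/
git diff --stat 96d29b7662f148842486d46117786ccb7fcc8018 a20631af040b4901b7341839d9e76e31994adda3 -- lib/

On the API side, the compare endpoint takes no path parameter as far as I can tell, but the list-commits endpoint does accept one (it walks history from a single starting point rather than comparing two SHAs):

curl "https://api.github.com/repos/github/linguist/commits?sha=a20631af040b4901b7341839d9e76e31994adda3&path=lib"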

Perforce and submodules

At work we are using Perforce, and I wonder whether it's possible to do something like submodules with it, with versioning.
For example I have library A used by projects B and C.
I want to make it so that when I get a revision of B, I also get A in a subfolder:
B
---=> A(v1)
The same goes for project C, but it would need a newer version of the library.
C
---=> A(v1.2)
I know this kind of thing is possible with Git, but I could not find anything on it for Perforce.
Thanks,
Leonty
Perforce really handles this sort of thing with views and paths. These let you assemble the right set of files to put into a workspace (or branch or label). Since a Perforce repository can contain all of the components or modules for all your products, you just select which ones you want in a working data set. You don't need the submodule (or SVN external) concept to pull in data from another repository.
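For example, a client view can map a specific branch or version path of the library into the project's workspace. A hypothetical excerpt of a client spec (edited via p4 client), with made-up depot paths:

View:
    //depot/projB/...      //b-workspace/B/...
    //depot/libA/v1/...    //b-workspace/B/A/...

Project C's workspace would instead map //depot/libA/v1.2/... to the same relative location.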
You can use template workspaces to make sure that developers get the right set of files to work on. You can be a little more rigorous and write some custom tools (possibly in the Perforce broker) to provide some structure.
The closest equivalent to using submodules is found in Perforce streams, where the paths define what goes into a stream. Stream paths are inherited by child streams. This isn't a direct equivalent though.
