Git - max depth in gitignore/exclude - linux

A short story: I recently made a clean install of Arch Linux on my PC because my old install had become very bloated with unnecessary packages and config directories. Now I want to keep my home directory clean and simple. I decided to use git to supervise every file and folder there, but I can't just exclude every log (or any other constantly updating directory/file) individually, as that is too much of a hassle.
The idea is to include only the first level of files and directories in $HOME/, $HOME/.config/, and $HOME/.local/share/. For instance, include .config/foo/ but exclude its contents, i.e., .config/foo/*, so that when I uninstall a package I can check the git log to see which directories it created, and remove them manually (if I won't use them anymore, of course).
I tried to accomplish this by adding the following to my .git/info/exclude:
*/*
*/*/*
*/*/*/*
*/*/*/*/*
.local/share/*/*
.local/share/*/*/*
.local/share/*/*/*/*
.local/share/*/*/*/*/*
.config/*/*
.config/*/*/*
.config/*/*/*/*
.config/*/*/*/*/*
because I read that git needs a separate wildcard for every directory level. As you have probably already guessed, it didn't work.
So, the question is: how can I monitor only the files and directories in $HOME/, $HOME/.config/, and $HOME/.local/share/ without monitoring their contents? Thanks!

TL;DR
What you'll want is to use .gitignore to specifically ignore certain files and subdirectories:
*/
!.config
!.config/*
.config/*/
!.local
!.local/*
.local/*/
To see how this works, and what it does (and doesn't do) for you, read the long version. (The !.config/* is almost certainly unnecessary; I put it in when I had * as part of not saving any top-level files, which isn't quite what you asked for. The same holds for !.local/*. Without actually testing it, though, I'm not sure whether .config/afile matches the .config rule. It shouldn't need to: no entry here ignores plain files, since */, .config/*/, and .local/*/ match only directories.)
(But note that you probably do want to source-control additional .config files. I also recommend doing this an entirely different way, using symlinks for the .foorc type files—that's what I do.)
Long
There isn't any maximum depth, other than any system-imposed maximum (which varies depending on your OS). But there's a big problem here: Git doesn't store directories.1
What Git does store, underneath its top level storage item which is the commit, are files (which Git calls blobs), with associated path names. If you ask Git to extract commit #1234567..., Git looks inside that commit, finds the path names of the various blobs, and creates directories (new, empty ones) if and when necessary to hold the specific blobs (i.e., files) that Git is extracting from that commit with the names they have as stored in that commit.
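To make this concrete, here is a minimal JavaScript sketch, a toy model and not Git's code, that treats a commit's contents as nothing more than a list of path names and works out which directories a checkout would have to create. The function name directoriesToCreate is hypothetical. Note that an empty directory contributes no path to the list, so nothing in this model ever creates it.

```javascript
// Toy model: a commit's files are just path names (blob contents omitted).
// Directories are never stored; they are derived from the paths.
function directoriesToCreate(paths) {
  const dirs = new Set();
  for (const path of paths) {
    const parts = path.split("/");
    // Every proper prefix of a path is a directory that must exist.
    for (let i = 1; i < parts.length; i++) {
      dirs.add(parts.slice(0, i).join("/"));
    }
  }
  return [...dirs].sort();
}

console.log(directoriesToCreate([".bashrc", ".config/Trolltech.conf", ".config/git/ignore"]));
```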
This doesn't mean that your idea is doomed, just that you're starting with a misconception. Git won't save the directory .config at all. It will just save files such as .config/Trolltech.conf. If Git has saved that file in some commit, and you git checkout that specific commit, Git will create a new, empty .config if required. If the directory already exists, Git won't do anything about it. In some cases, such as moving from a commit in which that file exists to one in which it does not, Git will remove the directory as well, but in other cases it won't, and you will need to use git clean -d to make Git really remove it (if that's possible, i.e., if it's empty).
Having saved that particular file, if Git is being instructed to ignore the subdirectory .config/git, Git may not save the file .config/git/ignore. This is where things get complicated. You need to understand how Git commits work, what the index is and how (to some extent) it works, and what Git does to work with, and maintain, a work-tree.
1Git does store tree entries, which could work as a flag by which to save empty directories, but other parts of Git combine in strange ways to make this whole concept fail.
Git is built around the concept of commits
As we noted above, what Git stores, fundamentally, is the commit. A commit is a complete, mostly-standalone snapshot of some set of files, which Git calls blobs. (This deliberately ignores symbolic links and submodules. Symbolic links are stored as blobs too, using a tree-entry mode that distinguishes them from plain files; a submodule is stored as a special tree entry, a gitlink, that records a commit hash ID rather than a blob.) I say "mostly-standalone" because each commit records some number of parent commit hash IDs, though most commonly just one. A commit that stores three parent hash IDs depends on those three parent commits' existence: a repository that's missing the three parents is somehow incomplete.2 The parent linkage is not important for this particular application, but it's good to know how this works.
There is, though, one particularly difficult event in the life of a commit: creating it. Once a commit is created, it is read-only. It has a unique hash ID, determined solely by the commit's content (including all its parent hash IDs). But what files go into a commit? This is the key question, and it is where .gitignore eventually comes into the picture.
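Because an object's hash ID is computed from its content, identical content always gets the same ID. The sketch below uses Node's built-in crypto module to compute an object ID the way Git does in a repository using the default SHA-1 object format: the SHA-1 of a header "<type> <size>\0" followed by the content. It only hashes objects whose content you pass in; assembling a real commit object (tree, parent, author, and committer lines) is left out, and SHA-256 repositories would give different IDs. hashObject is a hypothetical helper name.

```javascript
const crypto = require("crypto");

// Git object ID (SHA-1 object format): sha1("<type> <byte length>\0" + content).
function hashObject(type, content) {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`${type} ${body.length}\0`, "utf8");
  return crypto.createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// Same content, same ID; one changed byte gives an unrelated ID.
console.log(hashObject("blob", "hello\n"));
console.log(hashObject("blob", "hello!\n"));
```

You can compare the first result with the output of echo hello | git hash-object --stdin in any SHA-1 repository.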
2This is the essence of a shallow clone. A clone that is not shallow (and hence is complete) starts with the tip commits of each branch (and any tagged commits or annotated tag objects). These commits (or annotated tag objects) point back to earlier, ancestor, commits through their parent hash IDs. Since the repository is complete, those objects exist as well; they contain their parent hash IDs, and those commit objects exist, and so on. The whole process stops only when we reach some commit(s) that have no parent. Usually this is the first commit ever made, which obviously can't have a parent. Such a commit is called a root commit, and in any non-empty but complete repository, there is always at least one root commit.
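The backwards walk described in this footnote can be sketched in a few lines of JavaScript. This is a toy model: commits live in a Map from a made-up hash ID to a list of parent IDs, and findReachable is a hypothetical name. A parent that cannot be found is reported as an error, which is what "incomplete" means here.

```javascript
// Toy object database: commit ID -> { parents: [...] }.
function findReachable(commits, tips) {
  const seen = new Set();
  const roots = [];
  const stack = [...tips];
  while (stack.length > 0) {
    const id = stack.pop();
    if (seen.has(id)) continue;
    const commit = commits.get(id);
    if (commit === undefined) {
      // A parent we cannot find: the repository is incomplete (for example, shallow).
      throw new Error(`missing commit ${id}`);
    }
    seen.add(id);
    if (commit.parents.length === 0) roots.push(id); // no parent: a root commit
    stack.push(...commit.parents);
  }
  return { reachable: seen, roots: roots.sort() };
}

const commits = new Map([
  ["A", { parents: [] }],
  ["B", { parents: ["A"] }],
  ["C", { parents: ["B"] }],
  ["D", { parents: [] }],
  ["M", { parents: ["C", "D"] }], // a merge of two unrelated histories
]);
console.log(findReachable(commits, ["M"]).roots);
```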
The files in a new commit are set up in the index
Besides the repository itself—the repository being a database of Git objects, i.e., commits and blobs and the intermediate thing Git calls a tree (these store the files' names, among other data)—Git has this key data structure with three different names. It's variously called the index, the staging area, and the cache.
The index is normally pretty much invisible. There is one Git command, git ls-files, that can show you the contents of the index directly (git ls-files --stage, or even more verbosely, git ls-files --debug), but it's not really useful to end users. A good top-level description of the index, though, is that it's where you build your next commit.
When you run git commit, Git takes every file that is currently in the index, in whatever form it currently has in the index, and makes a new commit out of that. Those are the files stored in the new commit. The new commit's author and committer are you; the time stamp is "now"; and the parent of the new commit is whatever commit you had checked out before; but the files—the blobs and their associated names—are entirely set by whatever is in the index.3 Likewise, when you use git checkout to extract some particular commit, what Git does first is to copy that commit's files into the index.
Note that when you do make a new commit, that new commit becomes the current commit. When that happens, Git updates the current branch name—the branch you have checked out, such as master—so that it records the new commit. In fact, each branch name records just one hash ID. Git calls this the tip of the branch. As we saw in footnote 2 above, Git works backwards, starting from branch tips, to find all the commits contained within a branch. So making a new commit shoves the new commit's hash ID into the branch name table.
3Even if you use git commit -a or git commit <file>, Git really just copies files into the index—or sometimes, an (auxiliary) index—and builds the commit from that index.
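Here is a toy JavaScript model of what making a commit does to the index and the branch name. It is not Git's implementation: makeCommit is a hypothetical function, the "hash IDs" are sequence numbers rather than real content hashes, and author, committer, and time stamp are omitted. What it does show is that the files come from the index (copied, so later index changes don't affect the commit), the parent is the current branch tip, and the branch name then moves to the new commit.

```javascript
// Toy repository: HEAD names a branch; refs maps full ref names to commit IDs.
function makeCommit(repo, index, message) {
  const branch = repo.head; // e.g. "refs/heads/master"
  const parent = repo.refs.get(branch); // undefined before the first commit
  const id = `c${repo.commits.size + 1}`; // stand-in for a real hash ID
  repo.commits.set(id, {
    parents: parent === undefined ? [] : [parent],
    files: new Map(index), // snapshot of the index, not of the work-tree
    message,
  });
  repo.refs.set(branch, id); // the branch name now records the new tip
  return id;
}

const repo = { head: "refs/heads/master", refs: new Map(), commits: new Map() };
const index = new Map([["README", "blob-1"]]);
makeCommit(repo, index, "first");
index.set("notes.txt", "blob-2");
makeCommit(repo, index, "second");
console.log(repo.refs.get("refs/heads/master"), repo.commits.get("c2").parents);
```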
The work-tree
All the files stored inside Git, both in the repository and in the index, are in a special, Git-only format. Few if any other programs on the computer can work with these files, so Git extracts each file into a usable version, where you can do work. This is your work-tree.
In general, every file that's in the current commit also appears in the work-tree. The current commit is, of course, the one you ran git checkout on. If you just ran git checkout master to check out the master branch, what you did in terms of current commit was to check out whatever commit the name master identifies: the tip commit of that branch.
As we mentioned above, all the files (blob objects) got copied into the index, at that point. Git was also able to use whatever was in the index to know what was in your work-tree before that point: for any file that was in the index (and hence in the work-tree) and now isn't in the index because of this checkout, Git should remove that file from the work-tree. And it does! For any file that Git has to replace in the index, or add to the index, Git should copy the index version to the work-tree—and it does.
What's in the index after the git checkout is exactly whatever blobs are (via any intermediate tree objects) in the commit you checked out. The work-tree versions of those files will match the index versions of those files, except that the work-tree versions are actually usable. The index versions of those files will match the commit's versions of those files—and in fact, they share the underlying storage, as the index stores just the path names and blob hash IDs.
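The bookkeeping in the last two paragraphs amounts to comparing two path-to-blob-ID tables. The sketch below is a simplified JavaScript model with a hypothetical name, planCheckout. It assumes the work-tree exactly matches the old index (real Git also has to check for uncommitted changes before overwriting or removing anything), and the blob IDs are placeholder strings.

```javascript
// oldIndex and target: Map of path -> blob hash ID.
function planCheckout(oldIndex, target) {
  const remove = [];
  const write = [];
  for (const path of oldIndex.keys()) {
    // In the old index but not in the commit being checked out: delete from the work-tree.
    if (!target.has(path)) remove.push(path);
  }
  for (const [path, blob] of target) {
    // New or different content: copy into the index, then into the work-tree.
    if (oldIndex.get(path) !== blob) write.push(path);
  }
  return { remove: remove.sort(), write: write.sort() };
}

const before = new Map([["a.txt", "b1"], ["gone.txt", "b2"], ["same.txt", "b3"]]);
const after = new Map([["a.txt", "b9"], ["new.txt", "b4"], ["same.txt", "b3"]]);
console.log(planCheckout(before, after));
```

Here same.txt is left alone because its blob ID is unchanged. An untracked file appears in neither table, so this model never touches it (real Git will, however, refuse to overwrite an untracked file that is in the way).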
Now, there may be files in the work-tree that Git doesn't know about. These files are, by definition, not in the index. These are untracked files. That is what an untracked file is, in Git: it's a file that's not in the index. There is nothing more to it.
(Well, you can remove a file from the index. Then it's not in the index, and hence untracked. That's not really anything more, but it's worth remembering.)
Ignoring untracked files
The problem with untracked files is that Git whines about them. :-) It's constantly griping at you, telling you that files A, B, and C are untracked. So this is where .gitignore comes in—but .gitignore is about the work-tree, and unlike commits, the work-tree does have directories.
You can list specific files in .gitignore. If those files are not in the index (are untracked), but are in the work-tree, Git would complain about them ... but then it sees that they're listed in .gitignore and shuts up.
You can also git add files en-masse, using git add . or git add --all. This has Git scan the work-tree for files, and upon finding them, git add each one to the index, to copy the work-tree version into the builds-the-next-commit index version. Clearly, if files A, B, and C are currently both untracked and ignored, though, Git shouldn't add them. So .gitignore also tells Git not to add existing untracked-and-ignored files to the index.
Existing files that are in the index are automatically tracked, so any en-masse git add that might potentially add those files will add them, regardless of what's listed in .gitignore. In other words, adding a tracked file to .gitignore has no effect on it. Being in .gitignore only affects untracked files.
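The cases above can be written as a small classification function. This is a toy JavaScript model with a hypothetical name (classifyWorkTree); the ignore check is passed in as a plain predicate instead of real .gitignore parsing. The key point is the order of the tests: the index is consulted first, so an ignore rule never applies to a tracked file.

```javascript
// workTreeFiles: paths found in the work-tree; index: Set of tracked paths.
function classifyWorkTree(workTreeFiles, index, isIgnored) {
  const result = { tracked: [], ignored: [], untracked: [] };
  for (const path of [...workTreeFiles].sort()) {
    if (index.has(path)) {
      result.tracked.push(path); // in the index: .gitignore is irrelevant
    } else if (isIgnored(path)) {
      result.ignored.push(path); // untracked and ignored: no whining, no en-masse add
    } else {
      result.untracked.push(path); // untracked and not ignored: git status reports it
    }
  }
  return result;
}

const files = ["a.log", "b.log", "notes.txt"];
const index = new Set(["a.log"]); // a.log was added before the ignore rule existed
const status = classifyWorkTree(files, index, (path) => path.endsWith(".log"));
// An en-masse "git add --all" would consider tracked + untracked, never ignored.
console.log(status);
```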
But that's files, not directories. This is where everything gets squirrelly. Files exist inside directories, in the normal file system (i.e., not in Git, but in the work-tree).
One of the big reasons Git has the index (and calls it the cache) is that looking at every file in a big file-tree tends to be extremely slow. Git can use the index to record information about all the tracked files, including information that speeds up en-masse git add --all style operations. That's fine for files that are in the index, but what about for whole subdirectories that (a) aren't in the index, so by definition they're untracked and (b) will be ignored, so they won't go into the index and will remain untracked?
Git can avoid scanning those subdirectories entirely. If .config/dir/ is going to be ignored, and Git has just come across the name .config/dir and it's a directory, why then, Git can just skip reading inside it. That's a lot faster than reading it and checking every file to see if it should be ignored.
When Git is scanning the work-tree, it starts at the top and reads the whole contents of the tree: all file names and all sub-directory names. It knows which are files and which are sub-directories, but it has not yet looked inside any of the subdirectories.
Now, Git checks all the files: are they in the index? If so, they're tracked: see if they should be updated. If not, they're untracked: see if Git should whine about them.
Next, Git checks all the sub-directories. For each sub-directory: are there any files for it that are in the index? If so, the sub-directory must be examined. But if not, is the sub-directory ignored? If so, don't even look inside it. Otherwise, look inside it, just as we would if there were files in the index.
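The scan order just described, including skipping ignored directories without reading them, can be modeled as a recursive walk. This is a toy JavaScript sketch, not Git's actual implementation: the work-tree is a nested object (null marks a file), the index is a Set of paths, and ignoreDirs is a caller-supplied predicate standing in for .gitignore matching. scanWorkTree is a hypothetical name. It records which directories were actually read, so the pruning is visible.

```javascript
// tree: { name: null } for a file, { name: {...} } for a sub-directory.
// index: Set of tracked paths. isIgnored(path, isDir) -> boolean.
function scanWorkTree(tree, index, isIgnored, prefix = "", result = { readDirs: [], candidates: [] }) {
  result.readDirs.push(prefix === "" ? "." : prefix.slice(0, -1));
  const names = Object.keys(tree).sort();
  // First the files in this directory.
  for (const name of names.filter((n) => tree[n] === null)) {
    const path = prefix + name;
    // Tracked files are always examined; untracked ones only if not ignored.
    if (index.has(path) || !isIgnored(path, false)) result.candidates.push(path);
  }
  // Then the sub-directories.
  for (const name of names.filter((n) => tree[n] !== null)) {
    const path = prefix + name;
    const hasTrackedFiles = [...index].some((p) => p.startsWith(path + "/"));
    // Tracked files inside force a scan; otherwise an ignored directory is never opened.
    if (hasTrackedFiles || !isIgnored(path, true)) {
      scanWorkTree(tree[name], index, isIgnored, path + "/", result);
    }
  }
  return result;
}

const workTree = {
  ".a": null,
  ".adir": { "cache.bin": null },
  ".config": { afile: null, git: { ignore: null } },
};
// Rough stand-in for the TL;DR rules: every directory except .config is ignored.
const ignoreDirs = (path, isDir) => isDir && path !== ".config";
console.log(scanWorkTree(workTree, new Set(), ignoreDirs));
console.log(scanWorkTree(workTree, new Set([".adir/cache.bin"]), ignoreDirs));
```

In the second call, the tracked file .adir/cache.bin forces Git (in this model) to read .adir even though the directory itself is ignored.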
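Now, for each file or sub-directory, there can be one or more .gitignore entries that match it. An entry without a trailing slash, such as .config/*, matches both files and directories. An entry with a trailing slash, such as */ or .config/*/, matches only directories. An entry starting with ! means: explicitly not ignored. When several entries match, the last one wins.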
So, suppose Git is scanning the top level and comes across the name .a, and it's a file. Git will look for any ignore entry matching .a. If there's an entry */, well, that doesn't match .a, because */ matches only directories; so .a is added, unless there's a later entry overriding it. There isn't, so we add the file .a.
Next, Git encounters .adir, which is a directory. There are no files under .adir in the index, so a scan isn't forced, and Git checks for an ignore entry matching .adir. Since */ is the only match, Git gets to ignore the directory. It will now not look inside .adir at all (unless and until you somehow add .adir/file to the index, which forces Git to read .adir to check whether .adir/file still exists).
When Git comes across .config (which is a directory), there's a */ that says to ignore it, but it's overridden by !.config, which says not to ignore it. There's also !.config/*, but the name being checked is just .config-the-directory, not .config/something, so that entry doesn't match. So !.config is the last applicable entry, and Git must scan .config.
Sooner or later,4 Git will look inside .config. It may find .config/afile; this matches !.config/*. The last entry that it matches tells Git that the file isn't ignored, so it will be added to the index. Then Git comes across .config/git, which is a directory. It matches !.config/*, then .config/*/; so it gets ignored. Git never looks inside .config/git at all.
This repeats for the rest of .config. There may be more .-files, which Git will process as usual, until Git comes across .local, which works just like .config here.
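The walkthrough above can be checked with a small JavaScript model of the matching rules it relies on: a leading ! negates, a trailing / restricts an entry to directories, an entry containing a slash is matched against the whole path relative to the .gitignore while one without is matched against the last path component at any depth, * does not match /, and the last matching entry wins. This is a deliberately simplified sketch with hypothetical names (compileRule, isIgnored). It does not handle **, character classes, escapes, or comments, and it does not model the fact that Git never looks inside an ignored directory.

```javascript
// Simplified .gitignore matching; a subset of the real rules.
function compileRule(line) {
  let pattern = line;
  const negate = pattern.startsWith("!");
  if (negate) pattern = pattern.slice(1);
  const dirOnly = pattern.endsWith("/");
  if (dirOnly) pattern = pattern.slice(0, -1);
  // A slash at the start or in the middle anchors the entry to the .gitignore's directory.
  const anchored = pattern.includes("/");
  if (pattern.startsWith("/")) pattern = pattern.slice(1);
  const source = [...pattern]
    .map((ch) => {
      if (ch === "*") return "[^/]*"; // "*" never crosses a "/"
      if (ch === "?") return "[^/]";
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return { negate, dirOnly, anchored, regex: new RegExp(`^${source}$`) };
}

function isIgnored(rules, path, isDir) {
  const base = path.slice(path.lastIndexOf("/") + 1);
  let ignored = false;
  for (const rule of rules) {
    if (rule.dirOnly && !isDir) continue;
    if (rule.regex.test(rule.anchored ? path : base)) ignored = !rule.negate; // last match wins
  }
  return ignored;
}

const rules = ["*/", "!.config", "!.config/*", ".config/*/", "!.local", "!.local/*", ".local/*/"].map(compileRule);
const samples = [[".a", false], [".adir", true], [".config", true], [".config/afile", false], [".config/git", true]];
for (const [path, isDir] of samples) {
  console.log(`${path}${isDir ? "/" : ""}: ${isIgnored(rules, path, isDir) ? "ignored" : "not ignored"}`);
}
```

In this model, dropping "!.config/*" from the list leaves .config/afile not ignored, because every entry that could ignore it applies only to directories.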
As always, remember that this cannot affect any existing commits. Checking out any existing commit that has some file that violates the .gitignore rules here will cause Git to extract that file, creating its parent directory or directories if needed. Moving from that commit to one that lacks that same file, Git will remove the file, and if the directory containing it goes empty, usually5 remove the directory as well.
4This is where depth-first vs breadth-first scan comes in. Git currently does ASCII-sorted, depth-first directory traversal (so it's actually "right now") because of the way Git organizes the index. It doesn't matter from our "what gets ignored and what doesn't" perspective, though.
5Every once in a while I see weird behavior here that convinces me that there must be some bugs in this. The occasional git clean -ndf to see what would be cleaned, perhaps followed by git clean -df to actually do the cleaning, is useful. But I can never reproduce it, and it's never important enough to try... :-)

Related

Remove one specific empty git commit with command-line tool

For a nodejs command-line tool I add an empty commit to a repo and then want to remove it later.
Later I have at least 3 commits. The first one is a merge commit, the second one is the empty one I created, and the third one is likely one from another, now merged, repo. Now that my tool has done its task, I want to remove the empty commit.
git rebase --onto emptyCommitID^ emptyCommitID
resulted in: fatal: Does not point to a valid commit 'emptyCommitID^'
(Since the ID is the correct one, I assume the commit is invalid because it is empty.)
git rebase --keep-base --onto thirdCommit^ headCommit
resulted in fatal: cannot combine '--keep-base' with '--onto'
Trying rebase -i HEAD~3 after the tool had done its main job resulted in: fatal: invalid upstream 'HEAD~3', which might be due to either the empty commit or the merged unrelated histories; I don't know which.
I do not want to use git filter-branch --prune-empty, because the tool shall leave other potentially empty commits untouched.
(The tool is for merging repos with unrelated histories. I create the empty commit so that files are staged when merged instead of being committed directly; in a freshly initialized repo without prior commits, they are committed directly even when the --no-commit flag is set.)
Maybe it is possible to solve this with git rebase --interactive, but I had the described problem with the invalid upstream, and I view this as very difficult to implement in a command-line tool that mostly uses exec to run its commands. Do you know a solution?
I think git rebase --onto emptyCommitID^ emptyCommitID should work.
fatal: Does not point to a valid commit 'emptyCommitID^' means that emptyCommitID has no parent. That contradicts your description, in which the second commit is the empty one and the first one is its parent.
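The reasoning in this answer can be illustrated with a toy JavaScript model of git rebase --onto X^ X on a linear history: the commits after X are replayed, as new commits, on top of X's parent. It's a sketch with a hypothetical name (dropCommit) that handles only single-parent chains and ignores merges and conflicts. The point is the failure case: if X has no parent, X^ cannot be resolved, which is the situation the error message reports.

```javascript
// Linear history, oldest first: [{ id, parent }]; parent is null for a root commit.
function dropCommit(history, dropId) {
  const at = history.findIndex((c) => c.id === dropId);
  if (at === -1) throw new Error(`unknown commit ${dropId}`);
  const newBase = history[at].parent;
  if (newBase === null) {
    // Same situation as "Does not point to a valid commit '<id>^'": there is no parent.
    throw new Error(`${dropId} is a root commit, so ${dropId}^ does not exist`);
  }
  const kept = history.slice(0, at);
  let parent = newBase;
  for (const c of history.slice(at + 1)) {
    // Replaying creates a new commit with a new ID and a new parent.
    const copy = { id: `${c.id}'`, parent };
    kept.push(copy);
    parent = copy.id;
  }
  return kept;
}

const history = [
  { id: "first", parent: null },
  { id: "empty", parent: "first" },
  { id: "third", parent: "empty" },
];
console.log(dropCommit(history, "empty"));
```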

NodeGit checkout a branch but Get error "HEAD detached at origin/branch"?

I am using NodeGit to clone a repository (or open an existing clone) and check out a branch to work on.
My code looks like this:
//repo is a Repository from Clone() or Open()
//branchName is your branch name, of course
repo.getBranch('refs/remotes/origin/' + branchName)
.then(function(reference) {
//checkout branch
return repo.checkoutRef(reference);
});
But after that, when I cd into the repository directory and type
git status
or git branch
it shows a red line like this:
HEAD detached at origin/branchname
How can I solve it?
Thanks
In general,1 origin/somebranch is not a branch name, and therefore git checkout origin/somebranch results in a detached HEAD, exactly as you are seeing.
Branch names, in Git, don't really do any good, to a first approximation.2 So there's no need to use them. To understand how and why this is the case, let's note that in Git, there are many kinds of names.
A branch name is simply a name which, when spelled out in full, begins with refs/heads/. The branch names master and main, for instance, are really refs/heads/master and refs/heads/main, respectively.
A tag name is a name which, when spelled out in full, begins with refs/tags/. So v2.1 is really refs/tags/v2.1.
Git calls these things—these names in general, before they're split into some particular classification—refs or references. Each reference holds one (1) hash ID. The hash ID is what really matters to Git. That's what Git needs. That's what git checkout requires: a hash ID. You can give it any valid commit hash ID, and it will check out that commit.
Hash IDs, though, are big and ugly and look random (though they're not). They are thoroughly unmemorable. What commit is 225365fb5195e804274ab569ac3cc4919451dc7f anyway? If I say v2.31.0-rc0, that probably means something—or at least, seems suggestive of something—to you; but if I say 2253blah you have probably forgotten the 2253 part long before I get to the dc7f part. So refs are for humans. They're not for Git, which only really cares about the hash IDs.
You only need a ref—such as a branch name—if you're a human. If you're a build system, a hash ID is fine. If you're writing part of a build system, just use a hash ID. If you're writing something for a human to use ... well, humans are hard.
Git has a special thing it calls "DWIM", or Do What I Mean, mode. If you run:
git checkout foobranch
and there is no branch named foobranch right now, Git will assume that you mean: Find me a name that resembles foobranch, such as origin/foobranch. Then use that name to make a branch name for me. You can disable this with git checkout --no-guess and sometimes that's a good idea. But sometimes this DWIM mode is exactly what you want.
Note, however, that if the pesky human went and made his own foobranch earlier, not related to origin/foobranch, this git checkout foobranch will get his foobranch, not related to origin/foobranch. So be careful: humans are tricky and weird. They do illogical, unexpected things.
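The lookup order and the DWIM guess described above can be sketched as follows. This is a simplified JavaScript model, not Git's code: refs is a Map from full reference names to hash IDs, resolveCheckout is a hypothetical name, and only a few cases are handled: an existing local branch, a guess from exactly one matching remote-tracking name, and otherwise a detached HEAD at a tag or remote-tracking name. Raw hash IDs are not handled. It mirrors git checkout's documented guess rule, which applies only when exactly one remote has a matching name.

```javascript
function resolveCheckout(refs, name, { guess = true } = {}) {
  // 1. An existing local branch always wins, even if it is unrelated to origin/<name>.
  const local = `refs/heads/${name}`;
  if (refs.has(local)) return { head: "attached", branch: local, commit: refs.get(local) };
  // 2. DWIM: exactly one remote-tracking name <remote>/<name> creates a new branch.
  if (guess) {
    const matches = [...refs.keys()].filter(
      (r) => r.startsWith("refs/remotes/") && r.split("/").slice(3).join("/") === name
    );
    if (matches.length === 1) {
      return { head: "attached", branch: local, created: true, upstream: matches[0], commit: refs.get(matches[0]) };
    }
  }
  // 3. Any other name that resolves to a commit gives a detached HEAD.
  for (const full of [`refs/tags/${name}`, `refs/remotes/${name}`]) {
    if (refs.has(full)) return { head: "detached", commit: refs.get(full) };
  }
  throw new Error(`cannot resolve ${name}`);
}

const refs = new Map([
  ["refs/heads/main", "aaa111"],
  ["refs/remotes/origin/main", "aaa111"],
  ["refs/remotes/origin/feature", "bbb222"],
  ["refs/tags/v2.1", "ccc333"],
]);
console.log(resolveCheckout(refs, "origin/feature").head); // the NodeGit question's case
console.log(resolveCheckout(refs, "feature"));
```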
Now, there's a reason humans often want to use a branch name, rather than any other name that results in the detached-HEAD mode. The primary reason they like this is because then, if they make a new commit, Git will change the hash ID stored in the branch name. The new commit will automatically link, backwards, to whatever commit was the one stored in the reference. Then Git will update the branch name so that it now stores the hash ID of the new commit.
This feature is exclusive to branch names. No other kind of name has this special feature. You can select detached-HEAD mode when using a branch name, by running git checkout --detach foobranch for instance. But the default is that when you use git checkout with a branch name—even one that DWIM mode is going to have to create—Git will instead go into attached HEAD mode.3 So that's why humans like branch names, and that's what Git does with them, that it doesn't do with any other reference name.
If you need to accommodate humans, you can do that by letting the DWIM mode do its thing here. That won't satisfy all humans, so watch out. It also won't work in some cases, so watch out. The detached-HEAD mode always works, though.
In NodeGit specifically, you have Branch.create as an async class method of Branch. You also have Branch.lookup. You could use Branch.lookup to look up a remote-tracking name, like origin/branchname, and use the result to create a new local branch, if that's your goal. But as before, watch out for all these various edge cases.
1It is possible to make a (local) branch named origin/somebranch, so that it is a branch name. The result is very confusing unless you carefully call out all names using their full spellings. Don't do this!
2Of course, they do do some good, so the first approximation is pretty rough.
3Git does not call it this, but what else could the right phrase for "opposite of detached HEAD" mode be?

Checking out to another commit with git while reading the file

I am creating a server that handles version control of files on the server and lets clients view them at a specific commit if they want.
The way I implement this is that when a user clicks a specific commit, I call checkout [hash of commit] to revert the file back to what it was, and then read from that file.
The problem is that two people may be trying to read different commits of the same repository at the same time, meaning the state of the file may change while it is being read.
I tried checking out another commit while reading from the file, and it seems to work okay, but I cannot be sure it will still work when this is scaled up.
I am using Node.js and Express for my server. When Node.js starts reading a file, will the file stay in the state it had when reading started, or will it change along with the changes that git forces if I check out another commit while the file is being read?
Instead of using checkout, consider show:
git show <commit id>:<filename>
This will print the contents of the file at that commit. If you absolutely need it in a file, generate a unique temporary filename and redirect the output:
git show <commit id>:<filename> > tmpfile_uniquesuffix
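From Node.js, the same idea can be wrapped with the built-in child_process module, so each request reads the file contents from Git's object database instead of from a shared work-tree that another request might be checking out at the same moment. This is a sketch under assumptions: readFileAtCommit and objectSpec are hypothetical names, git must be on the PATH, and the strict hash check is a design choice so that a client-supplied value starting with - can never be taken as a git option.

```javascript
const { execFile } = require("child_process");
const { promisify } = require("util");

const execFileAsync = promisify(execFile);

// Build "<commit>:<path>", refusing anything that is not a plain hex hash ID.
function objectSpec(commit, path) {
  if (!/^[0-9a-f]{4,64}$/i.test(commit)) throw new Error(`not a hash ID: ${commit}`);
  // Strip leading slashes; a path without a leading ./ is taken relative to the repository top.
  return `${commit}:${path.replace(/^\/+/, "")}`;
}

// Read one file as it was in one commit, without touching the work-tree.
async function readFileAtCommit(repoDir, commit, path) {
  const { stdout } = await execFileAsync("git", ["show", objectSpec(commit, path)], {
    cwd: repoDir,
    encoding: "buffer", // keep binary files intact
    maxBuffer: 64 * 1024 * 1024,
  });
  return stdout;
}
```

A request handler could then await readFileAtCommit(repoDir, hash, path) and send the resulting Buffer; concurrent requests for different commits don't interfere because nothing is checked out.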

GIT: git checkout --ours still showing "both modified"

I'm trying to resolve a merge conflict in some files:
both modified: myFile.h
I ran this command:
git checkout --ours myFile.h
after that I ran:
git status
It shows this:
both modified: myFile.h
Why does it still show "both modified"?
Some git checkout commands resolve a merge conflict, taking the version you checked out, and some don't. This is one of the cases that don't.
You must therefore mark the conflict as resolved manually, with: git add myFile.h.
Why this is the way it is
Merging (the action, i.e., merge-as-a-verb) is done through Git's index (also called the staging area or sometimes the cache). The index has one entry for each file that will go into the next commit you make. Normally, that one entry holds one file—but the index has four slots per entry, which are numbered. Slot zero (0) is the normal "no conflict, file is ready to commit" slot. Slots 1, 2, and 3 are used only during a merge conflict, and contain the merge base version (slot 1), --ours version (slot 2), and --theirs version (slot 3).
If slot zero is filled, the other three slots are empty. If any of the other slots are filled, slot 0 is empty and the file is in "conflicted" state. (It's possible to have just two of the other three slots filled, as is the case for add/add, modify/delete, and rename/delete conflicts.)
When you run git checkout commit -- path,1 Git copies a file from the given commit into slot zero, then from slot zero to the work-tree. Copying to slot zero wipes out slots 1-3 if they're filled, so the file is now resolved!
But, when you run git checkout --ours -- path, Git doesn't have to write anything to index slot zero, it can just get the file's contents from slot 2. So it copies from slot 2 to the work-tree, and the file is not resolved.
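A toy JavaScript model of one index entry makes the difference between the two checkout forms visible. It's a sketch with hypothetical function names; real index entries hold blob hash IDs plus file metadata, while here the "blobs" are just strings.

```javascript
// One path's index entry: slot 0 = resolved; slots 1-3 = base/ours/theirs during a conflict.
function isConflicted(entry) {
  return entry.slots[0] === undefined && [1, 2, 3].some((n) => entry.slots[n] !== undefined);
}

// "git checkout <commit> -- path": write slot 0 (which clears 1-3), then the work-tree.
function checkoutFromCommit(entry, blob) {
  entry.slots = { 0: blob };
  entry.workTree = blob;
}

// "git checkout --ours -- path": copy slot 2 to the work-tree; the index is not changed.
function checkoutOurs(entry) {
  entry.workTree = entry.slots[2];
}

// "git add path": stage the work-tree copy in slot 0, which marks the conflict resolved.
function stage(entry) {
  entry.slots = { 0: entry.workTree };
}

const entry = { slots: { 1: "base", 2: "ours", 3: "theirs" }, workTree: "<<<<<<< conflict >>>>>>>" };
checkoutOurs(entry);
console.log(isConflicted(entry)); // true: still "both modified"
stage(entry);
console.log(isConflicted(entry)); // false: resolved
```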
Note that this means you can do git checkout HEAD -- path to extract the file from the HEAD commit, writing to slot zero and thus resolving as well as writing to the work-tree. This is subtly different in another way, though. Suppose that during the merge, Git decided that the file was renamed as well as being modified. It's taken the rename into account: the file's new name is evil/zorg instead of the old name evil-zorg. If you git checkout --ours, Git will extract the old (HEAD) version of evil-zorg under the new name evil/zorg. If you git checkout HEAD, Git won't find the file under the new name!
(This is another case of Git letting implementation details show through—or, equivalently, of cramming too much stuff into one command.)
1The reason for the -- is to handle a file whose name looks like an option. (For instance, what if the file's name is --theirs?) If the path part doesn't look like an option, you don't need the --. It's a good habit to pick up, though: use the -- every time and you won't be surprised someday when your file name does resemble an option.

Layering projects on top of each other with git

Let there be:
There are different repositories, repoA, repoB and repoC, each following the same directory layout principles, which are to be merged onto the working directory of another repository, repoM (the "master" project).
repoM has an atypical setup: its work tree and its git directory are separate (as with --work-tree and --git-dir). repo[A-C] are cloned as bare, and are then configured with core.bare = false and core.worktree=<--work-dir-of-repoM>.
The requirements:
I need to always have an overview over the history of all files in repoM's work-dir, which could have stemmed from repo[A-C]. With this approach, I lose all that information.
Alternative:
I've been thinking about using git-subtree instead (git version 1.7.11.2, so it's already built-in), leaving repo[A-C] bare, and then
git pull -s subtree, or
git subtree ...
With the subtree pull strategy, I lose the history on a merge conflict (git blame says so).
I've never used subtree before, but from my understanding it's not possible to merge files from repo[A-C] directly into repoM's work-dir; those files must be put into a subdirectory (one per repository). This is definitely not what I need. Why? Because of the following ...
Problem statement:
You have different git repositories, each containing a different set of files, usually configuration files and some shell scripts. You want to put everything from all those repositories into the $HOME (which is <--work-dir-of-repoM>) directory. You should be able to see at all times where each file comes from, and edit, commit, and push changes to each one's origin. You've guessed it: it's something like Vundle, but generalized for any kind of configuration of any program, not just vim bundles. If a conflict occurs, one should be able to track down which two authors of the same file need to get in touch with each other and work out a deal (if one needs to be made).
This is for an open-source project for which I'm trying to get a prototype working, so any help is highly appreciated. Ideas about already existing projects that do this in a similar manner are also highly appreciated.
Note: the "master directory" does not necessarily have to be $HOME, I've used it as a possible hint on the kind of problem this could solve.
Why not simply use Git Submodules in your "master project"?
