I have a central .gitignore on my laptop and for each project that I create, I create a symbolic link to that central file so that I can keep a uniform policy across all of my projects.
All of my projects are like each other (technology-wise) and it makes sense to have a central .gitignore to reduce the burden of maintenance.
However, recently I see this message:
warning: unable to access '.gitignore': Too many levels of symbolic links
From what I found by searching, it seems that from Git 2.32 onward they decided not to support a symbolic link for .gitignore.
I have two questions. First, is there a way to force git to support symbolic links for .gitignore? And why on Earth do they not support it anymore? Does it not make sense to reuse code? Is half of Linux not reused through symbolic links?
First, is there a way to force git to support symbolic links for .gitignore?
No.
And why on Earth do they not support it anymore?
The gitattributes documentation now (as of Git 2.32) says this near the end:
NOTES
Git does not follow symbolic links when accessing a .gitattributes file in the working tree. This keeps behavior consistent when the file is accessed from the index or a tree versus from the filesystem.
While I'm not 100% sold on the reasoning here myself, it does make sense. (It seems to me that Git could just stuff the content of the .gitattributes file into the index and hence into the commits, although this would mean that on checkout it would destroy the symlink.)
Optional further reading / background
First, let's describe what a "symbolic link" is in the first place. To do this we must define what a file is (which is a pretty big job, so we'll just give it a very light bit of coverage): A file is a named entity, typically found in a file system (a systematic collection of files), that stores data for later retrieval. Being a named entity, a file has a name: for instance, README.txt, Makefile, and .gitconfig are all file names. Different OSes place different constraints on file names (e.g., Windows refuses to store a colon : character in a file name or to create any file named aux with or without a suffix, so that you cannot have a C or C++ include file named aux.h or aux.hpp). Git itself places very few constraints on file names: they can contain almost any character except an ASCII NUL (b'\0' in Python, \0 in C, etc.), and forward slashes / are slightly special, but other than that a name character is just a name character and there are very few restrictions.1
On most real OSes, files can have "types". The exact mechanisms here rapidly become OS-specific and can get very complicated,2 though traditional Unix-like hierarchical file systems just have a few types: "directory", "file", "block or character device", "symbolic link", and the like. Symbolic links are in fact one of these types.
A symbolic link is a type of file in which the file's content is another file name. This file name, on a Unix-like file system, can be absolute (/home/john/somefile, /Users/torek/somefile) or relative (./somefile, ../../somefile). On these systems, opening a symbolic link results in opening the file whose name is provided by the symbolic link's content. To read the content of the symbolic link—that is, to find out what file name the link contains—we use a different operation: readlink instead of open, for instance. Modern Unix systems also have an O_NOFOLLOW flag that can be used to forbid the open system call from following the link.3
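The difference between following a link and reading it can be seen from Python's os module. This is an illustration only: it assumes a Unix-like system where ordinary processes may create symlinks, and the names central-ignore and inspect_link are made up for the example. On Linux, opening a symlink with O_NOFOLLOW fails with ELOOP, whose message is "Too many levels of symbolic links", the same text as the warning in the question; that is consistent with Git now opening .gitignore without following links, although this sketch does not run Git itself.

```python
import errno
import os
import stat
import tempfile


def inspect_link(directory: str) -> dict:
    """Create .gitignore -> central-ignore inside `directory` and probe the link."""
    target = os.path.join(directory, "central-ignore")
    link = os.path.join(directory, ".gitignore")
    with open(target, "w") as f:
        f.write("*.o\n")
    os.symlink("central-ignore", link)  # the link's content is a relative name

    result = {
        # lstat describes the link itself; stat follows it to the target.
        "lstat_is_link": stat.S_ISLNK(os.lstat(link).st_mode),
        "stat_is_file": stat.S_ISREG(os.stat(link).st_mode),
        # readlink returns the link's content: the name it points to.
        "readlink": os.readlink(link),
    }
    with open(link) as f:  # a plain open follows the link to the target's data
        result["open_reads"] = f.read()

    try:
        # O_NOFOLLOW: refuse to open a path whose last component is a symlink.
        os.close(os.open(link, os.O_RDONLY | os.O_NOFOLLOW))
        result["nofollow_errno"] = None
    except OSError as e:
        result["nofollow_errno"] = e.errno
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        r = inspect_link(d)
    for key, value in r.items():
        print(f"{key}: {value!r}")
    if r["nofollow_errno"] == errno.ELOOP:
        print("O_NOFOLLOW open failed:", os.strerror(errno.ELOOP))
```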
Git stores a symlink as an entry with a special mode in the commit's tree: ordinary files are either mode 100644, meaning a non-executable file, or mode 100755, meaning an executable file. A symbolic link is stored as mode 120000, and Git stores the target name, found by calling readlink, as the content.4
1The one peculiar restriction is that you're not allowed to store anything named .git, in any mix of upper and/or lower case. This .git restriction actually applies to "name components", which are the parts between forward slashes. Due to Windows being Windows, Git-on-Windows will turn backslashes into forward slashes as necessary, and then applies the restriction to the components.
2Traditional OSes from the 1960s through 1980s, for instance, could impose things called access methods based in part on file types. Unix simplified things a lot here.
3This is sometimes important for various security aspects. The details are beyond the scope of this article.
4These odd mode values correspond closely to the struct stat st_mode field in a Unix/Linux stat system call. That's because when Linus Torvalds first wrote the initial versions of Git, he was dealing with it—at least in part—as a kind of file system. The ability to store full Unix file modes (9 bits of rwxrwxrwx flags) was left in, and initially Git actually stored group write permissions, but this turned out to be a mistake and was removed before the first public release. The 100000 part is S_IFREG, "Stat: Inode Format REGular file". The 120000 found in a Git symbolic link is S_IFLNK, or "Stat: Inode Format symbolic LiNK". We also have mode 040000 for directories from S_IFDIR, which should now be obvious. However, Git can't store a mode 040000 entry in its index / staging-area, for no particularly good reason, which leads to the problem described in How can I add a blank directory to a Git repository?
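The mode values in this footnote can be checked against the stat constants that Python's stat module exposes. A small sketch; the GIT_MODES table holds only the four modes named in the text, not every mode Git knows about:

```python
import stat

# The four tree-entry modes mentioned in the text, as `git ls-tree` prints them.
GIT_MODES = {
    "regular file": 0o100644,
    "executable file": 0o100755,
    "symbolic link": 0o120000,
    "directory": 0o040000,
}

for kind, mode in GIT_MODES.items():
    # S_IFMT keeps only the file-type bits; S_IMODE keeps the permission bits.
    print(f"{mode:06o}  {kind:15}  type={stat.S_IFMT(mode):06o}  perms={stat.S_IMODE(mode):03o}")
```

The type bits come out as S_IFREG, S_IFLNK, or S_IFDIR, and only the regular-file entries carry any permission bits.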
In other words, a symbolic link means "use another file"
Wherever a symbolic link is found, it means read or write some other file. So if README.txt is a symbolic link reading /tmp/fooledyou, any attempt to read README.txt actually reads /tmp/fooledyou instead; any attempt to write README.txt actually writes to /tmp/fooledyou.
Consider, though, when this redirection—from README.txt to /tmp/fooledyou—occurs. It doesn't happen at the time you make the symbolic link itself. You could have created this README.txt file last year. When I go to read README.txt, that's when the redirection occurs. So if you've changed /tmp/fooledyou since you created README.txt, I get the modern version, not the old one.
That, of course, is precisely why you wanted the symbolic link in the first place:
All of my projects are like each other (technology-wise) and it makes sense to have a central .gitignore to reduce the burden of maintenance.
In other words, you wanted to have one .gitignore, that is not version controlled, that always reflects what should be ignored based on what you learned up until right now, regardless of when it is that "right now" is.
This is the opposite of Git's normal purpose, which is to store a full snapshot of what your project looked like "back then", whenever "back then" was: the time at which you made a git commit snapshot.
My suggested possibility above is that when you run:
git add .gitignore
to update Git's idea of what should go in the .gitignore file that goes in the next commit, Git could follow the .gitignore indirection at that time, read the contents of the target of the symbolic link, and prepare that to be committed. You'd then make the commit—the snapshot and metadata—such that if, next year, you extract this particular historical commit, you'll get the historical snapshot, including the historical .gitignore.
The drawback to this is that by extracting the historical .gitignore, you "break the link": .gitignore is no longer a symbolic link at all. Instead, it is now an ordinary file, containing the historical snapshot. There's no way to get the link back except to remove the ordinary file and create a new symbolic link.
Before Git version 2.32, Git would notice when .gitignore was a symbolic link and would store, in its index / staging-area, the fact that .gitignore was a symlink (mode 120000) and use the readlink system call to find the target of the symlink, and store that in the commit. Running git commit then makes a snapshot that, when extracted, creates .gitignore as a (new) symbolic link: the existing file-or-symlink is removed, and the new one is installed instead. It redirects, in the usual symlink fashion, to the saved (committed) historical location—even if that's wrong now.
As of Git version 2.32, Git will still store a symbolic link .gitignore file:
$ mkdir z; cd z
$ ../git --version
git version 2.36.1.363.g9c897eef06
$ ../git init
[messages snipped; branch renamed to main, also snipped]
$ echo testing > README
$ ln -s foo .gitignore
$ git add README .gitignore
$ git commit -m initial
[main (root-commit) 08c6626] initial
2 files changed, 2 insertions(+)
create mode 120000 .gitignore
create mode 100644 README
$ ../git ls-tree HEAD
120000 blob 19102815663d23f8b75a47e7a01965dcdc96468c .gitignore
100644 blob 038d718da6a1ebbc6a7780a96ed75a70cc2ad6e2 README
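The blob ID that ls-tree shows for .gitignore is the hash of the link's target name (foo), not of any file's contents. A minimal sketch of Git's blob hashing in Python, assuming the default SHA-1 object format (not a SHA-256 repository). followed_entry is a hypothetical illustration of the "follow the link at git add time" idea discussed above; Git does not do this.

```python
from __future__ import annotations

import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """SHA-1 blob ID as `git hash-object` computes it: header 'blob <size>' + NUL + data."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def symlink_entry(path: str) -> tuple[str, str]:
    """Mode and blob ID Git records for a symlink: the blob holds the target *name*."""
    target = os.readlink(path)  # works even if the target does not exist
    return "120000", git_blob_id(os.fsencode(target))


def followed_entry(path: str) -> tuple[str, str]:
    """Hypothetical 'follow the link at git add time' behaviour: record a regular
    file whose blob holds the target's *contents*. Git does not do this."""
    with open(path, "rb") as f:  # open() follows the symlink
        return "100644", git_blob_id(f.read())
```

For the transcript above, git_blob_id(b"foo") should give the .gitignore ID 19102815663d23f8b75a47e7a01965dcdc96468c, and git_blob_id(b"testing\n") the README ID 038d718da6a1ebbc6a7780a96ed75a70cc2ad6e2.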
The same reasoning—that a Git commit, once it's made and stuffed into a repository, may contain a symbolic link that is no longer valid or correct—explains why Git 2.32 also refuses to follow symbolic links for .gitattributes and .mailmap files. Note that commands like git archive generally use the commit's version of .gitattributes to control archive substitutions, so a symbolic link stored in the repository is useless unless the target of the symbolic link is somehow correct. The repository and its commits get shipped around from one machine to another, but the targets of any committed symlinks in many cases don't.
I was confused, and I called git stash --all and git stash apply stash@{...} multiple times, and also deleted some of the untracked/ignored files.
How is it possible to check, if there are files which exist in one of the stashes, but not locally?
I guess you could run diff:
git diff --name-status stash@{10}
TL;DR
You may want to use git diff --name-only --diff-filter=D stash@{number}^3 on each valid stash@{number}. (To get a list of stashes, use git stash list.)
You may want to use git show --name-only stash@{number}^3 on each valid stash@{number}. Note that this is git show stash@..., not git stash show.
To understand which and why, read on.
Long
What git stash does is a bit complicated, but it can be summarized pretty simply:
git stash push (or the old spelling, git stash save) makes two or three commits, with none of the commits it makes being on any branch. Then it runs git reset or git clean or some combination of those, depending on flags used.
git stash apply merges some or all of the two or three commits in some stash with your existing index and work-tree.
git stash pop means run git stash apply, and then if that claims to succeed, run git stash drop.
Unfortunately, the above is actually pretty complex (it requires that you understand Git's usage of the index, for one thing) and yet is incomplete. It says that git stash push makes two commits (or sometimes three), but it does not say what is in those commits, nor what form they have in your repository. For the simplest usage of git stash, none of those matter too much, but for your case, they are crucial.
Commits
I'll just mention these briefly since, aside from the ones that git stash makes, we're not too concerned with them. Each commit holds a complete snapshot of files. Exactly which files, we'll see in a moment. Along with the snapshot, the commit contains some metadata, including who made the commit, when (date-and-time-stamp), and why (log message). Each commit has a unique hash ID, and as part of the metadata, each commit includes the hash ID of its parent—a link back to the previous commit.
A merge commit has links back to two or more parents. Commits that are linked this way are normally closely related—which is why the linkage is parent/child, after all—but unlike the snapshot-plus-metadata part, there's no hard requirement that the files in any one commit relate much to those in any other. We'll see that in a moment, with the stash commits, too.
Besides being identified by their hash IDs, commits are mostly permanent—though of course stash commits are intended to be non-permanent and will eventually go away after being dropped—and are completely read-only. This means that they cannot be used to do any new work, which is why Git has more than just commits: you need a work-tree.
The index and the work-tree
As we just noted, the commits are read-only. Not only that, the files stored in each commit are in a special, compressed, read-only format. This means two or more commits can share a file that's the same in both commits, which in turn means that even if you commit some version of a file hundreds of times, Git only has to store it once. I like to refer to these internal format files as freeze-dried.
In order for you to actually use or change your files, Git has to rehydrate them, turning them back into ordinary files that you can read and write. The area in which commits are rehydrated for your use is your work-tree. Git could stop here—with frozen commits holding dehydrated files, as permanent as the commits themselves, plus temporary, ephemeral, but useful work-tree files. Other version control systems do stop here: you have, at any time, the freeze-dried copy of a file in the current commit, plus the useful one in the work-tree. But for various reasons, Git adds a third copy of the file. This extra copy sits between the commit and the work-tree, in what Git calls, variously, the index or the staging area.
The extra copy of each file that's in the index, between the frozen dehydrated copy in the current commit and the useful copy in the work-tree, is also in the dehydrated form. The key difference between it and the current-commit copy is that it's not read-only. You can overwrite it—though technically this just moves the previous one out of the way—with a new freeze-dried copy at any time. That's what git add does: it freeze-dries the work-tree copy and uses that to overwrite the index copy.
This is why you have to git add files all the time. They're already there, in the index, ready to commit—but they match the one that came out of the last commit. You've modified the work-tree copy, but the frozen copy hasn't changed—of course not, it's frozen—and neither has the index copy, which matches the committed copy. So now you have to re-compress the updated file and replace the index copy. You run git add updated.ext and Git does just that. Now your index and work-tree match, and differ from the frozen copy.
When you run git commit, Git looks not at your work-tree but at your index. Whatever is in your index right then, Git packs those (already freeze-dried) files into a new commit, and that new commit becomes your current commit. Now your index and commit match, because the new commit was made from the index.
This is also what determines whether a file is tracked. If there is a copy in the index, the file is tracked. The tracked copy—the one in the index—will be in the next commit, if you make it right now. If you have a file in your work-tree that isn't in your index, that file is untracked. That file won't be in the next commit, if you make it right now. Hence the index is, in a way, the proposed next commit. Every time you update it with git add, you're proposing to commit something slightly different.
Files that are untracked normally make various Git commands—especially git status—complain. You can shut up these complaints, and also make git add --all not copy the files into the index, by listing some or all of these files in a .gitignore. Note that listing a tracked file has no effect: it's already there in the index, so there's no question of ignoring it: it's not ignored. Being listed in .gitignore only affects untracked files, and basically just makes it hard to accidentally track them, and shuts up git status about them.
You can put new files into the index at any time, using git add. If the file wasn't there before, git add creates it in the index, rather than displacing the previous copy. You can also remove files from the index at any time, using either git rm—this removes the file from the index and the work-tree—or git rm --cached, which removes the file from the index only. At git commit time, it doesn't matter how a file is or isn't in the index, only whether it is or is not there, and if it is there, with what freeze-dried content.
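To make the three copies concrete, here is a deliberately simplified Python model. The ToyRepo class and its method names are invented for illustration: each area is just a mapping from path to content, with no compression, hashing, or branches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model: each area maps path -> content. Real Git stores freeze-dried
    (compressed, hashed) blobs; this only tracks which copy lives where."""
    head: dict[str, str] = field(default_factory=dict)      # current commit (frozen)
    index: dict[str, str] = field(default_factory=dict)     # proposed next commit
    worktree: dict[str, str] = field(default_factory=dict)  # your editable files

    def add(self, path: str) -> None:
        # git add: copy the work-tree version into the index, creating or replacing it.
        self.index[path] = self.worktree[path]

    def rm_cached(self, path: str) -> None:
        # git rm --cached: remove from the index only; the work-tree file stays.
        del self.index[path]

    def rm(self, path: str) -> None:
        # git rm: remove from both the index and the work-tree.
        del self.index[path]
        del self.worktree[path]

    def commit(self) -> None:
        # git commit: the new commit is a snapshot of the index, not the work-tree.
        self.head = dict(self.index)

    def is_tracked(self, path: str) -> bool:
        # A file is tracked exactly when a copy of it is in the index.
        return path in self.index
```

Editing a work-tree file and committing without git add leaves the old version in the new commit, which is the behaviour the paragraphs above describe.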
It's worth a brief look at how git commit makes a new commit now. When you run git commit, as we already mentioned, Git stuffs all the tracked files into a new commit. First, though, Git gathers up the metadata: your name (from user.name), your email address (from user.email), the current date-and-time, and your log message. Git also knows which commit is the current commit. That commit's hash ID goes into the parent hash ID for the new commit. And then Git saves the index and makes the commit, which automatically gets a new, unique hash ID. As the last step of git commit, Git then writes the new commit's hash ID into the current branch name.
Hence if you used to have:
...--F--G--H <-- master (HEAD)
with commit H as the current commit, and you've just made new commit I, new commit I points back to H, and Git has shoved the hash ID of I into the branch name master, so now you have:
...--F--G--H--I <-- master
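The order of git commit's steps (metadata, parent link, store the object, then move the branch name) can be sketched in Python. This is a simplified illustration assuming the SHA-1 object format: make_commit and commit_on_branch are hypothetical helpers, the author string and timestamp in any usage are made-up values, and EMPTY_TREE is Git's well-known empty-tree ID. Real Git also builds the tree from the index and handles signatures, encodings, and more.

```python
from __future__ import annotations

import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID: hash of the header '<kind> <size>' plus a NUL byte, then the body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def make_commit(tree: str, parent: str | None, who: str, when: str,
                message: str) -> tuple[str, bytes]:
    """Build a commit object's text: snapshot (tree), parent link, metadata, message."""
    lines = [f"tree {tree}"]
    if parent is not None:  # a root commit has no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {who} {when}")
    lines.append(f"committer {who} {when}")
    body = ("\n".join(lines) + "\n\n" + message.rstrip("\n") + "\n").encode()
    return git_object_id("commit", body), body


def commit_on_branch(objects: dict[str, bytes], branches: dict[str, str], branch: str,
                     tree: str, who: str, when: str, message: str) -> str:
    """Mimic git commit on the branch that HEAD names."""
    parent = branches.get(branch)  # the current commit becomes the new parent
    oid, body = make_commit(tree, parent, who, when, message)
    objects[oid] = body            # store the new, read-only commit
    branches[branch] = oid         # last step: move the branch name to it
    return oid


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty-tree ID (SHA-1)
```

Calling commit_on_branch twice on the same branch gives a second commit whose text contains parent followed by the first commit's ID, and the branch name ends up holding the second ID, just as master moves from H to I above.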
Now we can look at what git stash push does
When git stash builds a new stash without --all, it:
Writes out the index as a commit. This is really easy since that's what git commit already does. All Git has to do is not update the name master (and supply a log message for you, which it does). Let's write out a commit i (lowercase) and not put it on master. Instead, we'll remember it with a temporary variable:
...--F--G--H   <-- master
           |
           i   <-- $tempvar
Writes out the work-tree as a commit. This is tricky to do efficiently, and also requires another special trick at the end. Without going into the details of how git stash manages writing the work-tree, it's worth saying that this only writes tracked files. The trick at the end is that git stash sets things up so that this new commit, which we'll call w, has two parents instead of one. The first parent of w will be H and the second parent of w will be i:
...--F--G--H   <-- master
           |\
           i-w   <-- stash
With these two commits written out, Git updates the special name refs/stash to remember the hash ID of commit w.
This stash has no untracked files, regardless of whether they're ignored. Commit i was made from the index, so it has no untracked files by definition. The process that Git uses to make w only stores files that are in the index, so it has no untracked files either.
If you use git stash push --all, git stash push --include-untracked, or the git stash save flavors of these same commands, Git modifies the saving process a bit. It makes commit i as usual, but then it makes a commit I call u, to hold the untracked files. This extra commit either holds just the untracked files, excluding the untracked-and-ignored files, or it holds all the untracked files including the ignored ones. This commit has no parent listed (which is a good trick, but easy to do when you use the Git plumbing commands, as git stash did before it was converted to C code recently); it just floats out there on its own:
...--F--G--H   <-- master
           |
           i   <-- $i_commit

           u   <-- $u_commit
Now git stash save goes back to its main path and makes commit w, but this time it gives w three parents: the current commit H, the index commit i, and the untracked-files commit u:
...--F--G--H   <-- master
           |\
           i-w   <-- stash
            /
           u
Quick recap: what's in i, w, and u
Commit i holds the index state. There are, by definition, no untracked files in i. Commit w holds the work-tree state, again without untracked files. If commit u exists—it's optional, after all—it holds the untracked files, but no tracked files: the stash code parsimoniously saves those only in i and w.
Now git stash push cleans up
Having saved files in two or three commits, the last step of git stash push is to reset the index and work-tree. If you told git stash to create commit u, it also removes from the work-tree any file it saved in commit u.
The reset of index and work-tree is normally done by a simple git reset --hard. This leaves the index and work-tree in a state that matches the current commit H. If you made the u commit, its files are now gone from the work-tree, otherwise those files are untouched in the work-tree.
However, git stash push (unlike git stash save) has the ability to reset less than the entire work-tree. In that case, it's all done by a more complex bit of code. You can also (well, instead) throw in the option --keep-index, in which case, instead of git reset --hard or similar, Git checks out what's in commit i, so that the work-tree matches i. (It leaves the index itself alone, so i and the index continue to match.) None of this affects your immediate task, but all of it affects the ability to restore one of these stashes.
Previous stashes get "stacked"
When git stash push is done, the new stash is the one identified by refs/stash, or just stash for short. You can also describe this as stash@{0}, if you like. Any existing stashes are moved up one, to stash@{1}, stash@{2}, and so on: what was stash@{1} becomes stash@{2}, etc.
The mechanism behind this is Git's reflogs, which apply to all references: branches have master@{1}, master@{2}, and so on, too. The stash code just (ab?)uses these to implement the stash stack. Other reflogs are insert-only: there's no "pop the n'th master" command.
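The renumbering can be pictured with a tiny list-based model. It is only a sketch (ToyStashStack is a made-up name): real stashes live in the reflog for refs/stash, and git stash drop rewrites that reflog rather than popping a Python list.

```python
from __future__ import annotations


class ToyStashStack:
    """Toy model of refs/stash plus its reflog: position 0 is stash@{0}, the newest."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def push(self, w_commit: str) -> None:
        # A new stash becomes stash@{0}; every older one moves up by one.
        self._entries.insert(0, w_commit)

    def ref(self, n: int = 0) -> str:
        # What stash@{n} names (IndexError if there is no such entry).
        return self._entries[n]

    def drop(self, n: int = 0) -> str:
        # Remove stash@{n}; the entries above it move down by one.
        return self._entries.pop(n)

    def __len__(self) -> int:
        return len(self._entries)
```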
Restoring a stash
When you choose to apply a stash—with git stash apply or git stash pop; remember that the latter is just apply-then-drop—you tell Git which stash to use, using stash@{number} for instance. This points directly to a w commit, but the w commit allows you to reach its i commit and, if it exists, its u commit. The easy way to do that is to use the gitrevisions syntax for graph-walking. For instance, to refer to commit i, which is the second parent of w, you can write:
stash^2
because stash points to commit w and w's second parent is i. If commit u exists in this stash, stash^3 will name it.
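To see all of a stash's parents at once, git rev-list --parents -n 1 prints a commit's hash followed by its parents' hashes in order. The Python sketch below labels them according to the layout described above (first parent is the commit that was current, second is i, third is u if present). The function names are hypothetical, and stash_parents assumes git is on PATH and that you run it inside the repository.

```python
from __future__ import annotations

import subprocess


def parse_stash_parents(line: str) -> dict[str, str | None]:
    """Label the output of `git rev-list --parents -n 1 <stash>`: the commit
    itself (w) followed by its parents in order (^1, ^2 and, if present, ^3)."""
    w, *parents = line.split()
    if len(parents) not in (2, 3):
        raise ValueError(f"not a stash-shaped commit: {len(parents)} parent(s)")
    return {
        "w": w,
        "base": parents[0],  # stash^1: the commit that was current (H)
        "i": parents[1],     # stash^2: the index commit
        "u": parents[2] if len(parents) == 3 else None,  # stash^3: untracked files
    }


def stash_parents(stash: str = "stash@{0}") -> dict[str, str | None]:
    """Run git in the current repository and label the parents of `stash`."""
    out = subprocess.run(
        ["git", "rev-list", "--parents", "-n", "1", stash],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_stash_parents(out)
```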
Hence, git stash apply first looks to see whether a u commit exists. If so, git stash insists on restoring it. Restoring a u commit requires that none of the files in u exist in the work-tree right now.
What this means is that if you have a bunch of untracked files, and you're not sure which ones are in u, you can just remove (or move-out-of-the-way) all untracked files. That's by far the simplest thing to do. If you want to be carefully selective, you will need to list out the names of the files that are in commit u, and there's no front-end user-facing command for that. You can do it though, in multiple ways, as we'll see in a moment.
In any case, Git definitely has commits i and w. (The git stash code makes sure there are two such commits, plus the optional third u commit, and rejects your command-line argument if not.) So git stash apply needs to go about restoring i and w too. The way it does this is:
1. Save the current index state. This prevents applying a stash if you're in a conflicted merge that you have not concluded.
2. Unless you used --index, completely ignore commit i. Otherwise, compare i vs w's first parent—the commit that was current at the time you saved the stash—using git diff. Send the diff to git apply --cached. Technically, the actual line of code in the old git-stash script is:
   git diff-tree --binary $s^2^..$s^2 | git apply --cached
   ($s is commit w so this uses i^ rather than w^ but i^ and w^ are the same; the diff uses diff-tree --binary so that it always works right, as plain diff won't diff binary files, and will use your per-user configuration, which is a bad idea here).
   The apply step may fail. If so, git stash apply --index fails and does nothing. If the apply step succeeds, save the resulting index for later, then use git reset to reset it to match the HEAD commit.
   There's also a fancy trick here: Git checks whether the saved index in step 1 matches the saved index in the stash. If so, the index already has the right contents and there's no point in doing the git apply --cached. This is not just an optimization; it's useful with git stash --keep-index: it makes git stash apply --index work for that case. (Of course, you could run git stash apply without --index, if your index already matches the stash's i, but my guess is someone thought that was too unfriendly.)
3. Use the merge machinery to combine commit w with your current work-tree, using w's first parent as the merge base. I won't go into details here, but this part can be quite messy. If there are merge conflicts here, and the current work-tree did not match the HEAD commit when you started, it can be extremely difficult to get back to the state you were in.
(This is one of several reasons that I recommend avoiding git stash in general. It is safe to use git stash in many cases, and if you really know what you're doing, you know how to make things safe for yourself in all cases. But git stash is advertised as a quick and easy solution for Git newbies, and in fact, it's not at all quick, nor easy, in these corner cases!)
Your case (finally!)
In your case, you have run several git stash push --all operations, so you have several to many stashes—stash@{0} through stash@{9}, say, or maybe even more—some or all of which have u commits, which you can access via stash@{number}^3.
These u commits have no parent, so if you run:
git show stash@{1}^3
for instance, Git will compare, i.e., git diff, the empty tree to the u commit for stash@{1}. This will show the files (and the contents of the files—add --name-only to get just the names) in that u commit.
That could be what you want! That shows you a list of which files are in the u commit for that stash. That's not quite what you asked for, though:
... if there are files which exist in one of the stashes, but not locally
By "locally" here I assume you mean in your existing work-tree as it stands right now, without adding or removing any files to it.
If we run:
git diff <commit-specifier>
with no additional options, Git will compare the contents of the specified commit to the contents of the work-tree. The current index plays no role in this diff, though the contents of .gitignore do. The files of interest are:
those in the given commit
those in the work-tree, whether or not they exist in the given commit, minus any files that (a) do exist in the work-tree and (b) are listed in .gitignore.
That is, suppose we name a commit—such as one of these u commits—that contains files a.ext, b.ext, and c.ext. Your current work-tree either has an a.ext or does not; the same goes for b.ext and c.ext. Your current work-tree also has d.ext and e.bin, neither of which is in this u commit.
If file a.ext does not exist in your current work-tree, git diff will claim that, to convert commit u to match your work-tree, you should delete a.ext. If file b.ext does exist in your work-tree and matches that in u, Git will say nothing about it. If file c.ext does exist, but doesn't match the copy in u, Git will say that c.ext is modified, and that to turn the c.ext in commit u into the c.ext in your work-tree, particular lines need to be added and/or deleted: that's the diff instruction output.
Since d.ext does exist in your work-tree, git diff will say that to convert u to match your work-tree, you should add d.ext, with its current contents. If *.bin is ignored, though, git diff won't tell you how to add it to commit u: the assumption here is that you don't want to make a new commit that's like u but has e.bin added, as e.bin is to be ignored. That's true even if e.bin is in your index right now. While git commit cares, this particular git diff doesn't.
Since commit u in each stash lists all (and only) the files saved in it, any instruction from Git that tells you that you must delete some file from u in order to make it match your work-tree, tells you that the file exists in u and not in your work-tree. So we use --diff-filter=D to make git diff mention file a.ext. This filter will exclude c.ext, as it does exist: it just has the wrong contents. The set of files that git diff would report therefore consists solely of a.ext, which is in u but not in your work-tree. The --name-only option makes git diff print just the file name, and not the actual instructions for converting the file.
There are still more ways to handle this problem, but these two—git show or git diff with --name-only and additional options if / as needed, plus a name for the stash's u commit—seem like the simplest.
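The same question can also be scripted without git diff at all: list every path stored in each stash's u commit, then ask the filesystem whether it exists. Because git diff <commit> works from the paths Git tracks in the index, files that are untracked locally may show up as deleted even when they are present, so checking the work-tree directly sidesteps that question. This is a sketch, assuming git is on PATH, that it runs inside the work-tree, and that git stash list --format=%gd prints selectors like stash@{0}; the helper names are made up.

```python
from __future__ import annotations

import os
import subprocess


def git(*args: str) -> str:
    """Run a git command and return its standard output."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout


def stash_refs() -> list[str]:
    """Stash selectors such as 'stash@{0}', newest first."""
    return git("stash", "list", "--format=%gd").split()


def untracked_commit(stash: str) -> str | None:
    """Hash of the stash's u commit (stash^3), or None if the stash has none."""
    proc = subprocess.run(["git", "rev-parse", "--verify", "--quiet", f"{stash}^3"],
                          capture_output=True, text=True)
    return proc.stdout.strip() or None


def files_in_commit(commit: str) -> list[str]:
    """Every file path stored in `commit`, relative to the top of the repository."""
    out = git("ls-tree", "-r", "-z", "--name-only", "--full-tree", commit)
    return [p for p in out.split("\0") if p]


def missing_locally(paths: list[str], root: str) -> list[str]:
    """Keep the paths that do not exist under `root` (a dangling symlink counts as existing)."""
    return [p for p in paths if not os.path.lexists(os.path.join(root, p))]


def main() -> None:
    root = git("rev-parse", "--show-toplevel").strip()
    for stash in stash_refs():
        u = untracked_commit(stash)
        if u is None:
            continue  # stash made without --all / --include-untracked
        for path in missing_locally(files_in_commit(u), root):
            print(f"{stash}: {path}")


if __name__ == "__main__":
    main()
```

Each output line names a stash and a file that is saved in that stash's u commit but absent from your work-tree right now.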
First, sorry for such a confusing, pesky title; I really can't find a better way to describe this (I would appreciate any suggested changes to the post).
The problem
I synced a GitHub repo and modified some files and code inside according to my needs. Now I want to resync and update my tree to the latest commits:
will my changes be overwritten?
Or will repo simply ignore modified files and move on to other files?
Or will there be a patching process (I don't think this would be the case, since the chances of problems with auto-patching are quite high)?
My guess is that it skips over modified files, and that I may need to get the new commits from the repo manually. But how do I determine which of the files I have modified have new commits? I just want to determine that, then probably fetch and modify them manually.
To clarify:
Consider files named "abc" and "def" which I modified.
The repo owner updated his repo with a lot of new commits.
I ran repo sync and it synced all files to newer commits except those I modified. Now how do I determine if the files that repo owner updated include "abc" and/or "def" too (assuming I myself modified a lot of files, so I can't manually check if each file has new commit or not)?
I don't want to see what files I have modified or a complete list of files with new commits, I just want to see if the files that I modified have new commits or not.
Is there any such possible way?
I do know how to determine which files are changed using git status,
but how do I check whether those changed files have any new commits?
When running repo sync, Repo will rebase any non-published topic branches (i.e. branches you haven't uploaded to Gerrit with repo upload).
Or will there be a patching process (I don't think this would be the case, since the chances of problems with auto-patching are quite high)?
Git will try, but if there's a conflict that it can't resolve by itself you have to step in and help out.
Consider files named "abc" and "def" which I modified. The repo owner updated his repo with a lot of new commits. I ran repo sync and it synced all files to newer commits except those I modified.
No. Either Repo rebases your branch (and updates/merges all files) or it doesn't do anything and it's up to you to rebase or merge from the upstream. Git never does partial updates.
I don't want to see what files I have modified or a complete list of files with new commits, I just want to see if the files that I modified have new commits or not.
I think you're asking the wrong question, but sure, you can list the commits that modify a particular set of files or compare two commits and only display the differences in a particular set of files. Both git diff and git log accept one or more paths to files that you want to restrict the output to. To find the files you can use git ls-files -mo to obtain dirty files and untracked files in your workspace, git diff-tree --name-only -r HEAD~..HEAD to get the files modified by the most recent commit, and so on.
Putting it all together, the following commands fetch the most recent state from the upstream and show the new commits (git log HEAD..origin/master) that touch upon files that you yourself have modified on the current branch since it forked from the upstream (git diff-tree --name-only -r $(git merge-base HEAD origin/master) HEAD; diffing against origin/master directly would, after the fetch, also list the files the upstream changed):
git fetch
git log HEAD..origin/master -- $(git diff-tree --name-only -r $(git merge-base HEAD origin/master) HEAD)
A Unix-like shell is assumed. On Windows, things may look different.
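If you would rather get a plain list of files than a log, a variant is to compare both sides against their merge base and intersect the two sets of changed paths. A sketch in Python, assuming you have already run git fetch, that the upstream is origin/master as in the commands above (with Repo the remote-tracking name may differ), and that git is on PATH; changed_files and overlap are hypothetical helpers. Passing no tip to changed_files compares the merge base with your work-tree, so uncommitted edits to tracked files count as yours too.

```python
from __future__ import annotations

import subprocess


def git(*args: str) -> str:
    """Run a git command and return its standard output."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout


def changed_files(base: str, tip: str | None = None) -> set[str]:
    """Paths that differ between `base` and `tip`; with no tip, between `base`
    and the work-tree (tracked files only, as with plain `git diff <commit>`)."""
    out = git("diff", "--name-only", "-z", base, *([tip] if tip else []))
    return {p for p in out.split("\0") if p}


def overlap(mine: set[str], theirs: set[str]) -> list[str]:
    """Files changed on both sides since the fork point: the ones worth a look."""
    return sorted(mine & theirs)


def main(upstream: str = "origin/master") -> None:
    base = git("merge-base", "HEAD", upstream).strip()
    for path in overlap(changed_files(base), changed_files(base, upstream)):
        print(path)


if __name__ == "__main__":
    main()
```

With the question's example, if you changed abc and def and the upstream changed def and xyz, only def would be printed.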
You can use a Git hook to track the list of files.
In your post-receive hook, search for the given file and do whatever you need to do.
Another option is to track it manually using the --follow flag:
git log --follow <path> prints the commits that changed the given file, following it across renames.