Checking out to another commit with git while reading the file - node.js

I am creating a server that handles version control of files in server and let the client view them at specific commit if they wanted.
The way I implement this is that when user clicks a specific commit, I call checkout [hash of commit] to revert the file back to what it was and then read from that file.
The problem is that two people may be trying to read different commits of the same repository at the same time, meaning the state of file may change while reading from the file.
I tried checking out another commit while reading from it and it seems to work okay but I cannot be sure when the it is scaled.
I am using nodeJS and express for my server. When nodeJS starts reading file, will it still be the same state as the point when it started reading or would it change along with the change that is forced by git if I checkout another commit while reading the file?

Instead of using checkout, consider show:
git show <commit id>:<filename>
This will print the contents of the file at that commit. If you absolutely need it in a file, generate a unique temporary filename and redirect the output:
git show <commit id>:<filename> > tmpfile_uniquesuffix

Related

Remove one specific empty git commit with command-line tool

For a nodejs command-line tool I add an empty commit to a repo and then want to remove it later.
Later I have at least 3 commits. The first one is a merge commit, the second one is the empty one I created and the third one is likely one from another now merged repo. Now that my tool has done it's task I want to remove the empty commit.
git rebase --onto emptyCommitID^ emptyCommitID
resulted in: fatal: Does not point to a valid commit 'emptyCommitID^'
(since the ID is the correct one I assume the commit is invalid due to it being empty)
git rebase --keep-base --onto thirdCommit^ headCommit
resulted in fatal: cannot combine '--keep-base' with '--onto'
trying rebase -i HEAD~3 after the tool had done it's main job resulted in:fatal: invalid upstream 'HEAD~3' which might be due to either the empty commit or the merged unrelated histories idk.
I do not want to use git filter-branch --prune-empty, because the tool shall leave other potentially empty commits untouched.
(The tool is for merging repos with unrelated histories. I create the empty commit so that files are staged when merged instead of committed directly which also happens when the --no-commit flag is set in an just initialized repo without prior commits)
maybe it is possible to solve this with git rebase --interactive, but I had the described problem with the invalid upstream and view this as very difficult to implement with a command line tool, mostly using exec to execute it's commands. Do you know a solution?
I think git rebase --onto emptyCommitID^ emptyCommitID should work.
fatal: Does not point to a valid commit 'emptyCommitID^' means that the emptyCommitID has no parent. It contradicts that the second one is the empty one and the first one is its parent.

Git - max depth in gitignore/exclude

A short story: I have recently made a clean install of Arch Linux on my PC because my old install got very bloated with unnecessary packages and config directories. Now I want to keep my home directory clean and simple. I decided to use git to supervise every file and folder there but I can't just exclude every log(or any other constantly updating dir/file) as it is too much of a hassle.
The idea is to include only the first level of files and directories in $HOME/, $HOME/.config/, and $HOME/.local/share/. For instance, include .config/foo/ and exclude its contents i.e. .config/foo/* so I could check the git log when I uninstall a package what directory(es) did it create and remove them manually(of course, if I won't use it anymore)
I tried to accomplish this by adding this to my .git/info/exclude
*/*
*/*/*
*/*/*/*
*/*/*/*/*
.local/share/*/*
.local/share/*/*/*
.local/share/*/*/*/*
.local/share/*/*/*/*/*
.config/*/*
.config/*/*/*
.config/*/*/*/*
.config/*/*/*/*/*
because I read that git needs a separate wildcard for every directory level. As you probably have already understood - it didn't work.
So, the question is - how can I monitor only the files and directories in $HOME/, $HOME/.config/, and $HOME/.local/share/ without monitoring their contents. Thanks!
TL;DR
What you'll want is to use .gitignore to specifically ignore certain files and subdirectories:
*/
!.config
!.config/*
.config/*/
!.local
!.local/*
.local/*/
To see how this works, and what it does (and doesn't do) for you, read the long version. (The !.config/* is almost certainly unnecessary; I put it in when I had * as part of not saving any top level files, which isn't quite what you asked for. The same holds for !.local/*. Without actually testing it, though, I'm not sure if .config/afile matches the .config rule.)
(But note that you probably do want to source-control additional .config files. I also recommend doing this an entirely different way, using symlinks for the .foorc type files—that's what I do.)
Long
There isn't any maximum depth, other than any system-imposed maximum (which varies depending on your OS). But there's a big problem here: Git doesn't store directories.1
What Git does store, underneath its top level storage item which is the commit, are files (which Git calls blobs), with associated path names. If you ask Git to extract commit #1234567..., Git looks inside that commit, finds the path names of the various blobs, and creates directories (new, empty ones) if and when necessary to hold the specific blobs (i.e., files) that Git is extracting from that commit with the names they have as stored in that commit.
This doesn't mean that your idea is doomed, just that you're starting with a misconception. Git won't save the directory .config at all, for instance. It will just save the file .config/Trolltech.conf, for instance. If Git has saved that file in some commit, and you git checkout that specific commit, Git will create a new, empty .config if required. If the directory already exists, Git won't do anything about that. In some cases, such as moving from a commit in which that file exists to one in which it does not, Git will remove the directory as well, but in some cases it won't, and you will need to use git clean -d to make Git really remove it (if that's possible, i.e., if it's empty).
Having saved that particular file, if Git is being instructed to ignore the subdirectory .config/git, Git may not save the file .config/git/ignore. This is where things get complicated. You need to understand how Git commits work, what the index is and how (to some extent) it works, and what Git does to work with, and maintain, a work-tree.
1Git does store tree entries, which could work as a flag by which to save empty directories, but other parts of Git combine in strange ways to make this whole concept fail.
Git is built around the concept of commits
As we noted above, what Git stores, fundamentally, is the commit. A commit is a complete, mostly-standalone snapshot of some set of files, which Git calls blobs. (This deliberately ignores submodules and symbolic links, but they're stored as blobs as well, using tree entries of a type that distinguishes them from plain files.) I say "mostly-standalone" because each commit records some number of parent commit hash IDs, though most commonly, just one. A commit that stores three parent hash IDs depends on those three parent commits' existence: a repository that's missing the three parents is somehow incomplete.2 The parent linkage is not important for this particular application, but it's good to know how this works.
There is, though, one particularly difficult event in the life a commit: creating it. Once a commit is created, it is read-only. It has a unique hash ID, determined solely by the commit's content (including all its parent hash IDs). But what files go into a commit? This is the key question and is where .gitignore eventually comes into the picture.
2This is the essence of a shallow clone. A clone that is not shallow (and hence is complete) starts with the tip commits of each branch (and any tagged commits or annotated tag objects). These commits (or annotated tag objects) point back to earlier, ancestor, commits through their parent hash IDs. Since the repository is complete, those objects exist as well; they contain their parent hash IDs, and those commit objects exist, and so on. The whole process stops only when we reach some commit(s) that have no parent. Usually this is the first commit ever made, which obviously can't have a parent. Such a commit is called a root commit, and in any non-empty but complete repository, there is always at least one root commit.
The files in a new commit are set up in the index
Besides the repository itself—the repository being a database of Git objects, i.e., commits and blobs and the intermediate thing Git calls a tree (these store the files' names, among other data)—Git has this key data structure with three different names. It's variously called the index, the staging area, and the cache.
The index is normally pretty much invisible. There is one Git command, git ls-files, that can show you the contents of the index directly (git ls-files --stage, or even more verbosely, git ls-files --debug), but it's not really useful to end users. A good top-level description of the index, though, is that it's where you build your next commit.
When you run git commit, Git takes every file that is currently in the index, in whatever form it currently has in the index, and makes a new commit out of that. Those are the files stored in the new commit. The new commit's author and committer are you; the time stamp is "now"; and the parent of the new commit is whatever commit you had checked out before; but the files—the blobs and their associated names—are entirely set by whatever is in the index.3 Likewise, when you use git checkout to extract some particular commit, what Git does first is to copy that commit's files into the index.
Note that when you do make a new commit, that new commit becomes the current commit. When that happens, Git updates the current branch name—the branch you have checked out, such as master—so that it records the new commit. In fact, each branch name records just one hash ID. Git calls this the tip of the branch. As we saw in footnote 2 above, Git works backwards, starting from branch tips, to find all the commits contained within a branch. So making a new commit shoves the new commit's hash ID into the branch name table.
3Even if you use git commit -a or git commit <file>, Git really just copies files into the index—or sometimes, an (auxiliary) index—and builds the commit from that index.
The work-tree
All the files stored inside Git, both in the repository and in the index, are in a special, Git-only format. Few if any other programs on the computer can work with these files, so Git extracts each file into a usable version, where you can do work. This is your work-tree.
In general, every file that's in the current commit also appears in the work-tree. The current commit is, of course, the one you ran git checkout on. If you just ran git checkout master to check out the master branch, what you did in terms of current commit was to check out whatever commit the name master identifies: the tip commit of that branch.
As we mentioned above, all the files (blob objects) got copied into the index, at that point. Git was also able to use whatever was in the index to know what was in your work-tree before that point: for any file that was in the index (and hence in the work-tree) and now isn't in the index because of this checkout, Git should remove that file from the work-tree. And it does! For any file that Git has to replace in the index, or add to the index, Git should copy the index version to the work-tree—and it does.
What's in the index after the git checkout is exactly whatever blobs are (via any intermediate tree objects) in the commit you checked out. The work-tree versions of those files will match the index versions of those files, except that the work-tree versions are actually usable. The index versions of those files will match the commit's versions of those files—and in fact, they share the underlying storage, as the index stores just the path names and blob hash IDs.
Now, there may be files in the work-tree that Git doesn't know about. These files are, by definition, not in the index. These are untracked files. That is what an untracked file is, in Git: it's a file that's not in the index. There is nothing more to it.
(Well, you can remove a file from the index. Then it's not in the index, and hence untracked. That's not really anything more, but it's worth remembering.)
Ignoring untracked files
The problem with untracked files is that Git whines about them. :-) It's constantly griping at you, telling you that files A, B, and C are untracked. So this is where .gitignore comes in—but .gitignore is about the work-tree, and unlike commits, the work-tree does have directories.
You can list specific files in .gitignore. If those files are not in the index (are untracked), but are in the work-tree, Git would complain about them ... but then it sees that they're listed in .gitignore and shuts up.
You can also git add files en-masse, using git add . or git add --all. This has Git scan the work-tree for files, and upon finding them, git add each one to the index, to copy the work-tree version into the builds-the-next-commit index version. Clearly, if files A, B, and C are currently both untracked and ignored, though, Git shouldn't add them. So .gitignore also tells Git not to add existing untracked-and-ignored files to the index.
Existing files that are in the index are automatically tracked, so any en-masse git add that might potentially add those files, will add them, regardless of what's listed in .gitignore. In other words, adding a tracked file to .gitignore has no effect on it. Being in .gitignore only affects untracked files.
But that's files, not directories. This is where everything gets squirrelly. Files exist inside directories, in the normal file system (i.e., not in Git, but in the work-tree).
One of the big reasons Git has the index (and calls it the cache) is that looking at every file in a big file-tree tends to be extremely slow. Git can use the index to record information about all the tracked files, including information that speeds up en-masse git add --all style operations. That's fine for files that are in the index, but what about for whole subdirectories that (a) aren't in the index, so by definition they're untracked and (b) will be ignored, so they won't go into the index and will remain untracked?
Git can avoid scanning those subdirectories entirely. If .config/dir/ is going to be ignored, and Git has just come across the name .config/dir and it's a directory, why then, Git can just skip reading inside it. That's a lot faster than reading it and checking every file to see if it should be ignored.
When Git is scanning the work-tree, it starts at the top and reads the whole contents of the tree: all file names and all sub-directory names. It knows which are files and which are sub-directories, but it has not yet looked inside any of the subdirectories.
Now, Git checks all the files: are they in the index? If so, they're tracked: see if they should be updated. If not, they're untracked: see if Git should whine about them.
Next, Git checks all the sub-directories. For each sub-directory: are there any files for it that are in the index? If so, the sub-directory must be examined. But if not, is the sub-directory ignored? If so, don't even look inside it. Otherwise, look inside it, just as we would if there were files in the index.
Now, for each file or sub-directory, there can be one or more .gitignore entries. An entry ending with * matches files and directories. An entry ending with */ matches directories. An entry starting with ! means: explictly not ignored.
So, suppose Git is scanning the top level and comes across the name .a, and it's a file. Git will look for any ignore entry matching .a. If there's an entry */, well, that doesn't match .a; so .a is added, unless there's a later entry overriding it. There isn't, so we add the file .a.
Next, Git encounters .adir, which is a directory. There are no .adir files in the index, so a scan isn't forced, so Git will check for an ignore entry matching .adir. Since */ is the only match, Git gets to ignore the directory. It will now not look inside .adir at all (unless and until you somehow add .adir/file to the index, which forces Git to read .adir to check whether .adir/file still exists).
When Git comes across .config (which is a directory), there's a */ that says to ignore it, but it's overridden by !.config which says not to ignore it. There's a .config/* but this is just .config-the-directory, not .config/something. So !.config is the last applicable entry, and Git must scan .config.
Sooner or later,4 Git will look inside .config. It may find .config/afile; this matches !.config/*. The last entry that it matches tells Git that the file isn't ignored, so it will be added to the index. Then Git comes across .config/git, which is a directory. It matches !.config/*, then .config/*/; so it gets ignored. Git never looks inside .config/git at all.
This repeats for the rest of .config. There may be more .-files, which Git will process as usual, until Git comes across .local, which works just like .config here.
As always, remember that this cannot affect any existing commits. Checking out any existing commit that has some file that violates the .gitignore rules here will cause Git to extract that file, creating its parent directory or directories if needed. Moving from that commit to one that lacks that same file, Git will remove the file, and if the directory containing it goes empty, usually5 remove the directory as well.
4This is where depth-first vs breadth-first scan comes in. Git currently does ASCII-sorted, depth-first directory traversal (so it's actually "right now") because of the way Git organizes the index. It doesn't matter from our "what gets ignored and what doesn't" perspective, though.
5Every once in a while I see weird behavior here that convinces me that there must be some bugs in this. The occasional git clean -ndf to see what would be cleaned, perhaps followed by git clean -df to actually do the cleaning, is useful. But I can never reproduce it, and it's never important enough to try... :-)

sync two vobs file (by clearfsimport) without checking in the updated file

I am using following command to sync B vob files from A vob
clearfsimport -master -follow -nsetevent -comment $2 /vobs/A/xxx/*.h /vobs/B/xxx/
It works fine. But it will check in all the changes automatically. Is there a way to do the same task but leave the update files in a check out status?
I want to update the file for B from A. Build my programme, and then re-cover the branch. So if the updated files is an check out status, I can do unco later. Well with my command before, everything is checked in. I cann't re-cover my branch then.
Thanks.
As VonC said, it's impossible to prevent "clearfsimport" to do the check in. And he suggested to use a label to recover back.
For me, the branch where I did "clearfsimport" is branched from a label.Let's call it LABEL_01. So I guess I can use that label for recovery. Is there an easy way (one command) to recover the files under /vobs/B/xxx/ to label LABEL_01 ? I want to do it in my bash script, so the less/easy the command is, the better.
Thanks.
After having a look at the man page for clearfsimport, no, it isn't possible to prevent the checkins.
I would set a label before the clearfsimport, and modify the config spec for the new version to be created in a branch (similar to this config spec).
That way, "re-cover" the initial branch would be easy: none of the new version would have been created in it.

Layering projects on top of each other with git

Let there be:
There are different repositories repoA, repoB and repoC each respecting the same directory layout principles, which are to be merged onto a third repoM's working directory (the "master" project).
repoM has an atypical setup (--work-dir and --git-dir are sepparate). repo[A-C] are cloned as bare, and they are set as core.bare = false and core.worktree=<--work-dir-of-repoM>.
The requirements:
I need to always have an overview over the history of all files in repoM's work-dir, which could have stemmed from repo[A-C]. With this approach, I lose all that information.
Alternative:
I've been thinking about using git-subtree instead (git version 1.7.11.2, so it's already built-in), leaving repo[A-C] bare, and then
git pull -s subtree, or
git subtree ...
With the subtree pull strategy, I lose the history on a merge conflict (git blame says so).
I've never used subtree before, but from my understanding it's not possible to merge files from repo[A-C] into repoM's work-dir, those files must be put into a subdirectory of repo[A-C]. This is definitely not what I need. Why? Because of the following ...
Problem statement:
You have different git repositories each containing different sets of files, usually configuration files and some shell scripts. You want to put everything in the $HOME (which is <--work-dir-of-repoM>) directory from all those repositories. You should be able to see at all time where each file comes from, edit, commit and push changes to each one's origin. You've guessed it, it something like vundle, but generalized for any kind of configuration of any program, not just vim bundles. If a conflict occures, one should be able to track down which two authors of the same file need to get in touch with each other and make up a deal (if one needs to be made).
This is for an open-source project I'm trying to get a prototype working, so any help is highly appreciated. Also ideas about already existing projects which do this in a similar manner are highly appreciated.
Note: the "master directory" does not necessarily have to be $HOME, I've used it as a possible hint on the kind of problem this could solve.
Why not simply use Git Submodules in your "master project"?

How to let git check for updates on the master server?

I have very poor knowledge about git and would like to ask for help.
I have a linux(-only) application which shall only be "downloaded" (i.e. cloned) with git. On startup, the app shall ask the git "master server" (github) for whether there are updates.
Does git offer a command to check for whether there is an update (without really updating - only checking)? Furthermore, can my app read the return value of that command?
If you do not want to merge, you can just git fetch yourremote/yourbranch, the remote/branch specification usually being origin/master. You could then parse the output of the command to see if new commits are actually present. You can refer to the latest fetched commit as either yourremote/yourbranch or possibly by the symref FETCH_HEAD.
Note: I was reminded that FETCH_HEAD refers to the last branch that was fetched. Hence in general you cannot rely on git fetch yourremote with FETCH_HEAD since the former fetches all tracked branches, thus the latter may not refer to yourbranch. Additionally,
you end up fetching more than strictly necessary.
also refer to Jefromi's answer to view but not actually downloaded changes
the following are not necessarily the most compact formats, just readable examples.
That being said, here are some options for checking for updates of a remote branch, which we will denote with yourremote/yourbranch:
0. Handling errors in the following operations:
0.1 If you attempt to git fetch yourremote, and git gives you an error like
conq: repository does not exist.
that probably means you don't have that remote-string defined. Check your defined remote-strings with git remote --verbose, then git remote add yourremote yourremoteURI as needed.
0.2 If git gives you an error like
fatal: ambiguous argument 'yourremote/yourbranch': unknown revision or path not in the working tree.
that probably means you don't have yourremote/yourbranch locally. I'll leave it to someone more knowledgeable to explain what it means to have something remote locally :-) but will say here only that you should be able to fix that error with
git fetch yourremote
after which you should be able to repeat your desired command successfully. (Provided you have defined git remote yourremote correctly: see previous item.)
1. If you need detailed information, git show yourremote/yourbranch and compare it to the current git show yourbranch
2. If you only want to see the differences, git diff yourbranch yourremote/yourbranch
3. If you prefer to make comparisons on the hash only, compare git rev-parse yourremote/yourbranch to git rev-parse yourbranch
4. If you want to use the log to backtrack what happened, you can do something like git log --pretty=oneline yourremote/yourbranch...yourbranch (note use of three dots).
If you really don't want to actually use bandwidth and fetch new commits, but just check whether there is anything to fetch, you can use:
git fetch --dry-run [remote]
where [remote] defaults to origin. You'll have to parse the output, though, which looks something like this:
From git://git.kernel.org/pub/scm/git/git
2e49dab..7f41b6b master -> origin/master
so it's really much easier to just fetch everything (git fetch [remote]), and then look at the diff/log e.g. between master and [remote]/master.
I'd say git fetch is a potential solution. It only updates the index, not working code. In cases of large commit sets, this would involve a download of compressed files/info, so it may be more than you want, but it is the most useful download you can do.

Resources