Is it possible to add the SHA of my current commit to the core file pattern? - linux

I'm looking to add the git sha to the core file pattern so I know exactly which commit was used to generate the core file.
Is there a way to do this?

It's not clear to me what you mean by "the core file pattern". (In particular, when a process crashes and the Linux kernel generates a core dump, it uses kernel.core_pattern. This setting is system-wide, not per-process. There is a way to run an auxiliary program—see How to change core pattern only for a particular application?—but that only gets you so far; you still have to write that program. See also https://wiki.ubuntu.com/Apport.) But there is a general problem here, which has some hacky solutions, all of which are variants on a pretty obvious method that is still a little bit clever.
The general problem
The hash of the commit you are about to make is not known until after you have made it. Worse, even if you can compute the hash of the commit you are about to make—which you can, it's just difficult—if you then change the content of some committed file that will go into the commit, so as to include this hash, you change the content of the commit you do make which means that you get a different actual commit hash.
In short, it is impossible to commit the commit hash of the commit inside the commit.
The hacky solution
The general idea is to write an untracked file that you use in your build process, so that the binary contains the commit hash somewhere easily found. For projects built with Make, see how to include git commit-number into a c++ executable? for some methods.
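As a minimal sketch of that idea (the file name version.h is illustrative, not something the linked answer prescribes), a build step can regenerate an untracked header from the current HEAD, and the rest of the build simply includes it:
echo "#define GIT_COMMIT \"$(git rev-parse HEAD)\"" > version.h   # version.h stays untracked (.gitignore it)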
The same kind of approach can be used when building tarballs. Git has the ability to embed the hash ID of a file (blob object) inside a work-tree file, using the ident filter, but this is the ID of the file, which is usually not useful. So, instead, if you use git archive to produce tar or zip files, you can use export-subst, as described in the gitattributes documentation and referred to in the git archive documentation. Note that the tar or zip archive also holds the commit hash ID directly.
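For example (a sketch, assuming a tracked file named VERSION): mark the file in .gitattributes with export-subst and put a format placeholder in it; git archive then expands the placeholder in the archived copy:
echo 'VERSION export-subst' >> .gitattributes
echo '$Format:%H$' > VERSION          # placeholder; expanded only in the archive, not in the work tree
git add .gitattributes VERSION && git commit -m "embed commit hash in archives"
git archive -o release.tar.gz HEAD    # the VERSION file inside release.tar.gz contains the commit hash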
Last, you can write your own custom smudge filter that embeds a commit hash ID into a work-tree file. This might be useful in languages where there is no equivalent of an external make process run to produce the binary. The problem here is that when the smudge filter reads HEAD, it's set to the value before the git checkout finishes, rather than the value after it finishes. This makes it much too difficult to extract the correct commit hash ID (if there is even a correct one—note that git describe will append -dirty if directed, to indicate that the work-tree does not match the HEAD commit, when appropriate).

Related

git forces refresh index after switching between Windows and Linux

I have a disk partition (format: NTFS) shared by Windows and Linux. It contains a git repository (about 6.7 GB).
If I only use Windows or only use Linux to manipulate the git repository everything is okay.
But every time I switch systems, the git status command refreshes the index, which takes about 1 minute. If I run git status again on the same system, it takes less than 1 second. Here is the result:
# Just after switch from windows
[#5#wangx#manjaro:duishang_design] git status # this command takes more than 60s
Refresh index: 100% (2751/2751), done.
On branch master
nothing to commit, working tree clean
[#10#wangx#manjaro:duishang_design] git status # this time the command takes less than 1s
On branch master
nothing to commit, working tree clean
[#11#wangx#manjaro:duishang_design] git status # this time the command takes less than 1s
On branch master
nothing to commit, working tree clean
I guess there is some problem with the git cache. For example: Windows and Linux both use the .git/index file as a cache file, but Git on Linux can't recognize a .git/index written by Windows. So it has to refresh the index and replace the .git/index file, which makes the next git status super fast and git status on Windows very slow (because Windows will refresh the index file again).
Is my guess correct? If so, how can I set a separate index file for each system? How can I solve the problem?
You are completely correct here:
The thing you're using here, which Git variously calls the index, the staging area, or the cache, does in fact contain cache data.
The cache data that it contains is the result of system calls.
The system call data returned by a Linux system is different from the system call data returned by a Windows system.
Hence, an OS switch completely invalidates all the cache data.
... how can I set a separate index file for each system?
Your best bet here is not to do this at all: make two different work-trees, or perhaps even two different repositories. But if that's more painful than the alternatives, try out these ideas:
The actual index file that Git uses merely defaults to .git/index. You can specify a different file by setting GIT_INDEX_FILE to some other (relative or absolute) path. So you could have .git/index-linux and .git/index-windows, and set GIT_INDEX_FILE based on whichever OS you're using (a sketch follows below).
Some Git commands use a temporary index. They do this by setting GIT_INDEX_FILE themselves. If they un-set it afterward, they may accidentally use .git/index at that point. So another option is to rename .git/index out of the way when switching OSes: keep a .git/index-windows and .git/index-linux as before, but rename whichever one is in use to .git/index while it's in use, then rename it back to its per-OS name before switching to the other system.
Again, I don't recommend attempting either of these methods, but they are likely to work, more or less.
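A minimal sketch of the first idea (per-OS GIT_INDEX_FILE), assuming a POSIX shell such as Git Bash on Windows, run from the top of the work tree (the file names are just examples):
case "$(uname -s)" in
  Linux*) export GIT_INDEX_FILE=.git/index-linux ;;
  *)      export GIT_INDEX_FILE=.git/index-windows ;;
esac
git status    # now reads and refreshes the OS-specific index file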
As torek mentioned, you probably don't want to do this. It's not generally a good idea to share a repo between operating systems.
However, it is possible, much like it's possible to share a repo between Windows and Windows Subsystem for Linux. You may want to try setting core.checkStat to minimal, and if that isn't sufficient, core.trustctime to false. That leads to the minimal amount of information being stored in the index, which means that the data is going to be as portable as possible.
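For reference, those two settings can be applied to just the shared repository like this:
git config core.checkStat minimal     # keep only minimal stat data in the index checks
git config core.trustctime false      # ignore ctime, which the two OSes report differently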
Note, however, that if your repository has symlinks, it's likely that nothing you do is going to prevent refreshes. Linux typically considers the length of a symlink to be its length in bytes, while Windows considers it to take one or more disk blocks, so there will be a mismatch in size between the operating systems. This isn't avoidable, since size is one of the attributes used in the index that can't be disabled.
This might not apply to the original poster, but if Linux is being used under the Windows Subsystem for Linux (WSL), then a quick fix is to use git.exe even on the Linux side. Use an alias or something to make it seamless. For example:
alias git=git.exe
The auto line-ending setting solved my issue, as in this discussion. I am referring to Windows, WSL2, a portable Linux OS, and Linux as well, all of which I have set up and working as my work requires. I will update this answer if I face any issue while using this approach for updating the code base from different filesystems (NTFS or a Linux file system).
git config --global core.autocrlf true

How to get all changes you made to your config files (since system install) in one shot?

I wonder if there is any way I could retrieve all changes I made to my various configuration files since install (residing in /etc and so on) in one shot.
I imagine some kind of loop that uses 'diff' to compare all those files to a 'standard installation' of Ubuntu. The output should be a single file with information about the changes that were made and a timestamp.
Perhaps there is even a way to put all that in a script and let it run regularly to automatically keep track of future config file changes.
If the files are already modified, I guess your only option is to diff your files against a fresh install. Keep in mind some files might be specific to your computer; I'm thinking of files that can hold device-specific values, like your MAC address (/etc/udev/rules.d/70-persistent-net.rules), your drives' UUIDs (/etc/fstab), etc.
If you're planning this ahead, there are at least two options you can consider:
use a VCS such as git (a minimal sketch follows this list).
use a filesystem that keeps a complete history of the changes made.
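A minimal sketch of the first option: put /etc itself under git (tools like etckeeper automate exactly this):
sudo git -C /etc init
sudo git -C /etc add -A && sudo git -C /etc commit -m "baseline after install"
# later: everything changed since the baseline, in one shot
sudo git -C /etc diff
sudo git -C /etc log -p    # per-change history with timestamps, once you commit regularly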

Obliterating already integrated feature branch

I work on a team that creates plenty of feature task streams.
Even though we delete the task stream after integration, the associated branch still exists in the depot and is somewhat cluttering various user interfaces.
I am tempted to ask the admin to obliterate them as we go along.
I have already read carefully : http://answers.perforce.com/articles/KB/2565
However, obliterate is always associated with the scary warning "please contact Perforce Support first". So before going down that path, I would like to know what the risks are, apart from erasing the wrong branch.
1. What will happen to files that were initially created in feature branches? Will obliterating the original transform the lazy copy into a full-fledged file? Since the lazy copy is in the mainline, will the oldest revision now point to the one in the mainline?
2. Will it interfere with the "interchange" command? If I have two "dev" branches moving in parallel, I believe it will still work, because I will actually be comparing the "merge changelists", which won't be affected by the removal of the task branch.
3. What happens if a file is renamed in the feature branch? Will I lose the full range of history, and will the two files look "disconnected"?
4. Is there any other risk I have not taken into account?
Issue 3 is particularly dangerous, and could be a good reason to not go on with the plan.
I currently believe it is "safe" to obliterate an already integrated feature branch if conditions 1 and 2 are true:
1. No move/add/delete has been done in the branch (this can be checked via the fstat headAction field; a command sketch follows after this question).
2. No sub-branch has been created from the branch (since we are using task streams this is enforced by default).
Please correct me if I am wrong.
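For reference, the check in condition 1 could look something like this (the stream path is hypothetical; -F filters the output server-side):
p4 fstat -F "headAction=move/add" -T depotFile,headAction //Streams/my-task/...
p4 fstat -F "headAction=move/delete" -T depotFile,headAction //Streams/my-task/...
# any output means files were moved in the branch, so by the rule above it is not "safe" to obliterate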
In general, if a file has been integrated elsewhere, obliterating only the file in the task stream is safe, and you will still have the file by its other name.
But, the record of the changes (add/edit/delete, rename, further branching, etc.) that occurred to the file in the task stream will indeed be removed if you obliterate the task stream's history of the file, and so the overall history can end up being confusing and harder to read.
Myself, I prefer to maintain the entire history of those files, but I understand the view that, in the abstract, more history is not always better history.
When you are done with your task stream, are you deleting the stream spec? This will cause the unmodified files from the task stream to disappear, leaving you with only the history of the files that were actually modified in the task stream, which is typically a much smaller set of files.
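For what it's worth, deleting the spec for a finished task stream is a single command, assuming no client workspaces still use it (the stream path is hypothetical):
p4 stream -d //Streams/my-task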

Can git use patch/diff based storage?

As I understand it, git stores a full copy of each file for each revision committed. Even though it's compressed, there's no way that can compete with, say, storing compressed patches against one original full revision of the file. It's especially an issue with poorly compressible binary files like images, etc.
Is there a way to make git use a patch/diff based backend for storing revisions?
I get why the main use case of git does it the way it does but I have a particular use case where I would like to use git if I could but it would take up too much space.
Thanks
Git does use diff based storage, silently and automatically, under the name "delta compression". It applies only to files that are "packed", and packs don't happen after every operation.
git-repack docs:
A pack is a collection of objects, individually compressed, with delta compression applied, stored in a single file, with an associated index file.
Git Internals - Packfiles:
You have two nearly identical 22K objects on your disk. Wouldn’t it be nice if Git could store one of them in full but then the second object only as the delta between it and the first?
It turns out that it can. The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.
Later:
The really nice thing about this is that it can be repacked at any time. Git will occasionally repack your database automatically, always trying to save more space, but you can also manually repack at any time by running git gc by hand.
"The woes of git gc --aggressive" (Dan Farina), which describes that delta compression is a byproduct of object storage and not revision history:
Git does not use your standard per-file/per-commit forward and/or backward delta chains to derive files. Instead, it is legal to use any other stored version to derive another version. Contrast this to most version control systems where the only option is simply to compute the delta against the last version. The latter approach is so common probably because of a systematic tendency to couple the deltas to the revision history. In Git the development history is not in any way tied to these deltas (which are arranged to minimize space usage) and the history is instead imposed at a higher level of abstraction.
Later, quoting Linus, about the tendency of git gc --aggressive to throw out old good deltas and replace them with worse ones:
So the equivalent of "git gc --aggressive" - but done properly - is to do (overnight) something like
git repack -a -d --depth=250 --window=250
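You can watch the packing happen in any repository; a minimal sketch (paths are Git's defaults):
git count-objects -v                                    # loose objects vs. objects already in packs
git gc                                                  # packs loose objects; delta compression happens here
git verify-pack -v .git/objects/pack/pack-*.idx | head  # delta'd entries list their chain depth and base object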

Perforce: How do files get stored with branching?

This is a very basic question about branching and duplicating resources. I have had discussions like this due to the size of our main branch, but that aside, it is great to know how this really works.
Consider the problem of branching dozens of GB.
What happens when you create a branch of this massive amount of information?
I am reading the official docs here and here, but am still confused about how the files are stored for each branch on the server.
Say a file A.txt exists in main branch.
When creating the branch (Xbranch), and considering A.txt won't have changes, will the Perforce server duplicate A.txt (one copy keeping the main-branch changes and another for Xbranch)?
For a massive amount of data this matters, because it would mean duplicating dozens of GB. So how does this really work?
Some notes in addition to Bryan Pendleton's answer (and the questions from it)
To really check your understanding of what is going on, it is good to try with a test repository with a small number of files and to create checkpoints after each major action and then compare the checkpoints to see what actual database rows were written (as well as having a look at the archive files that the server maintains). This is very quick and easy to setup. You will notice that every branched file generates records in db.integed, db.rev, db.revcx and db.revhx - let alone any in db.have.
You also need to be aware of which server version you are using as the behavior has been enhanced over time. Check the output of "p4 help obliterate":
Obliterate is aware of lazy copies made when 'p4 integrate' creates
a branch, and does not remove copies that are still in use. Because
of this, obliterating files does not guarantee that the corresponding
files in the archive will be removed.
Some other points:
The default flags for "p4 integrate" to create branches copied the files down to the client workspace and then copied them back to the server with the submit. This took time depending on how many and how big the files were. It has long been possible to avoid this using the -v (virtual) flag, which just creates the appropriate rows on the server and avoids updating the client workspace - usually hugely faster. The possible slight downside is you have to sync the files afterwards to work on them.
Newer releases of Perforce have the "p4 populate" command which does the same as an "integrate -v" but also does not actually require the target files to be mapped into the current client workspace - this avoids the dreaded "no target file(s) in client view" error which many beginners have struggled with! [In P4V this is the "Branch files..." command on right click menu, rather than "Merge/Integrate..."]
Streams have made branching a lot slicker and easier in many ways - well worth reading up on and playing with (the only potential flies in the ointment are the flat 2-level naming hierarchy, and potential challenges in migrating existing branches with existing relationships into streams)
Task streams are pretty nifty and save lots of space on the server
Obliterate has had an interesting flag -b for a few releases, which makes it possible to quickly and easily remove unchanged branch files - like retro-creating a task stream. It can potentially save millions of database rows in larger installations with lots of branching (see the command sketches after this list).
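Hedged sketches of some of the commands mentioned in this list (depot paths are hypothetical):
p4 integrate -v //depot/main/... //depot/rel1/...   # branch on the server only; still needs a submit
p4 submit -d "Create rel1 branch"
p4 populate -d "Create rel1 branch" //depot/main/... //depot/rel1/...   # same idea, no open/submit and no client view needed
p4 obliterate -b //depot/rel1/...                    # preview: unchanged branched files only
p4 obliterate -b -y //depot/rel1/...                 # actually remove them (admin access required)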
In general, branching a file does not create a copy of the file's contents; instead, the Perforce server just writes an additional database record describing the new revision, but shares the single copy of the file's contents.
Perforce refers to these as "lazy copies"; you can learn more about them here: http://answers.perforce.com/articles/KB_Article/How-to-Identify-a-Lazy-Copy-of-a-File
One exception is if you use the "+S" filetype modifier, as in this case each branch will have its own copy of the content, so that the +S semantics can be performed properly on each branch independently.
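For example, that modifier is applied via the filetype (the path is hypothetical):
p4 edit -t binary+S //depot/main/generated/big.bin   # +S: server keeps only the head revision's content, per branch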

Resources