git forces refresh index after switching between Windows and Linux - linux

I have a disk partition (format: NTFS) shared by Windows and Linux. It contains a git repository (about 6.7 GB).
If I only use Windows or only use Linux to manipulate the git repository everything is okay.
But everytime I switch the system, the git status command will refresh the index, which takes about 1 minute. If I run the git status in the same system again, it only take less than 1 second. Here is the result
# Just after switch from windows
[#5#wangx#manjaro:duishang_design] git status # this command takes more than 60s
Refresh index: 100% (2751/2751), done.
On branch master
nothing to commit, working tree clean
[#10#wangx#manjaro:duishang_design] git status # this time the command takes less than 1s
On branch master
nothing to commit, working tree clean
[#11#wangx#manjaro:duishang_design] git status # this time the command takes less than 1s
On branch master
nothing to commit, working tree clean
I guess there is some problem with the git cache. For example: Windows and Linux all use the .git/index file as cache file, but the git in Linux system can't recognize the .git/index changed by Windows. So it can only refresh the index and replace the .git/index file, which makes the next git status super fast and git status in Windows very slow (because the Windows system will refresh the index file again).
Is my guess correct? If so, how can I set the index file for different system? How can I solve the problem?

You are completely correct here:
The thing you're using here, which Git variously calls the index, the staging area, or the cache, does in fact contain cache data.
The cache data that it contains is the result of system calls.
The system call data returned by a Linux system is different from the system call data returned by a Windows system.
Hence, an OS switch completely invalidates all the cache data.
... how can I use set the index file for different system?
Your best bet here is not to do this at all. Make two different work-trees, or perhaps even two different repositories. But, if that's more painful than this other alternative, try out these ideas:
The actual index file that Git uses merely defaults to .git/index. You can specify a different file by setting GIT_INDEX_FILE to some other (relative or absolute) path. So you could have .git/index-linux and .git/index-windows, and set GIT_INDEX_FILE based on whichever OS you're using.
Some Git commands use a temporary index. They do this by setting GIT_INDEX_FILE themselves. If they un-set it afterward, they may accidentally use .git/index at this point. So another option is to rename .git/index out of the way when switching OSes. Keep a .git/index-windows and .git/index-linux as before, but rename whichever one is in use to .git/index while it's in use, then rename it to .git/index-name before switching to the other system.
Again, I don't recommend attempting either of these methods, but they are likely to work, more or less.

As torek mentioned, you probably don't want to do this. It's not generally a good idea to share a repo between operating systems.
However, it is possible, much like it's possible to share a repo between Windows and Windows Subsystem for Linux. You may want to try setting core.checkStat to minimal, and if that isn't sufficient, core.trustctime to false. That leads to the minimal amount of information being stored in the index, which means that the data is going to be as portable as possible.
Note, however, that if your repository has symlinks, that it's likely that nothing you do is going to prevent refreshes. Linux typically considers the length of a symlink to be its length in bytes, and Windows considers it to take one or more disk blocks, so there will be a mismatch in size between the operating systems. This isn't avoidable, since size is one of the attributes used in the index that can't be disabled.

This might not apply to the original poster, but if Linux is being used under the Windows Subsystem for Linux (WSL), then a quick fix is use git.exe even on the Linux side. Use an alias or something to make it seamless. For example:
alias git=git.exe

Auto line ending setting solved my issue as in this discussion. I am referring to Windows, WSL2, Portable Linux OS, and Linux as well which I have setup and working as my work requirement. I will update in case I face any issue while preferring this approach for updating code base from different filesystems (NTFS or Linux File System).
git config --global core.autocrlf true

Related

How to get all changes you made to your config files (since system install) in one shot?

I wonder if there is any way i could retrieve all changes i made to my various configuration files since install(residing in /etc and so on) in one shot?
I imagine some kind of loop, that uses 'diff' to compare all those files to a 'standard installation' of ubuntu. Output should be a single file with information regarding the changes that were made and a timestamp.
Perhaps there is even a way to put all that in a script and let it run regularly to automatically keep track of future config file changes.
If the files are already modified, I guess your only option is to diff your files with a fresh install. Keep in mind some files might be specific to you computer, I'm thinking of files that can hold device-specific values like your mac address udev/rules.d/70-persistent-net.rules, your drives uuid /etc/fstab, etc.
If you're planning this ahead, there are at least two options you can consider:
use a VCS such as git.
use a filesystem that keeps a complete history of the changes made.

Can git use patch/diff based storage?

As I understand it, git stores full files of each revision committed. Even though it's compressed there's no way that can compete with, say, storing compressed patches against one original revision full file. It's especially an issue with poorly compressible binary files like images, etc.
Is there a way to make git use a patch/diff based backend for storing revisions?
I get why the main use case of git does it the way it does but I have a particular use case where I would like to use git if I could but it would take up too much space.
Thanks
Git does use diff based storage, silently and automatically, under the name "delta compression". It applies only to files that are "packed", and packs don't happen after every operation.
git-repack docs:
A pack is a collection of objects, individually compressed, with delta compression applied, stored in a single file, with an associated index file.
Git Internals - Packfiles:
You have two nearly identical 22K objects on your disk. Wouldn’t it be nice if Git could store one of them in full but then the second object only as the delta between it and the first?
It turns out that it can. The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server.
Later:
The really nice thing about this is that it can be repacked at any time. Git will occasionally repack your database automatically, always trying to save more space, but you can also manually repack at any time by running git gc by hand.
"The woes of git gc --aggressive" (Dan Farina), which describes that delta compression is a byproduct of object storage and not revision history:
Git does not use your standard per-file/per-commit forward and/or backward delta chains to derive files. Instead, it is legal to use any other stored version to derive another version. Contrast this to most version control systems where the only option is simply to compute the delta against the last version. The latter approach is so common probably because of a systematic tendency to couple the deltas to the revision history. In Git the development history is not in any way tied to these deltas (which are arranged to minimize space usage) and the history is instead imposed at a higher level of abstraction.
Later, quoting Linus, about the tendency of git gc --aggressive to throw out old good deltas and replace them with worse ones:
So the equivalent of "git gc --aggressive" - but done properly - is to
do (overnight) something like
git repack -a -d --depth=250 --window=250

Fast kernel recompile

I'm trying to automate the process of recompile a upgraded kernel. (I mean version upgrade)
What I do:
Backup the object files (*.o) with rsync
Remove the directory and make mrproper
Extract new source and patch
Restore object files with rsync
But I found it doesn't make sense. Since skip compiled things need to get a hash, this should removed it.
Question: What file do I need to keep? or it doesn't exists?
BTW: I already know ccache but it broke with a little config change.
You're doing it wrong™ :-)
Keep the kernel tree as-is and simply patch it using the appropriate incremental patch. For example, for 3.x, you find these patches here:
https://www.kernel.org/pub/linux/kernel/v3.x/incr/
If you currently have 3.18.11 built and want to upgrade to 3.18.12, download the 3.18.11-12 patch:
https://www.kernel.org/pub/linux/kernel/v3.x/incr/patch-3.18.11-12.xz
(or the .gz file, if you don't have the xz utilities installed.)
and apply it. Then "make oldconfig" and "make". Whatever needs to be rebuilt will be rebuilt.
However, it's actually best to not rely on the object file dependency mechanism. Who knows if something might end up not being rebuilt even though it should due to a bug. So I'd recommend starting clean every time with a "make clean" before applying the patch, even though it will rebuild everything.
Are you really in such a big need to save build time? If yes, it might be a better idea to configure the kernel ("make menuconfig") and disable all functionality you don't need (like device drivers for hardware you don't have, file systems you don't care about, networking features you will not use, etc.) Such a kernel that's optimized for my needs only takes about 3 or 4 minutes to build (normally, the full kernel with everything enabled would need over half an hour; or even more these days, it's been a very long time since I've built non-optimized kernels.)
Some more info on kernel patches:
https://www.kernel.org/doc/Documentation/applying-patches.txt
The incremental patch is a good way since it updates time stamps properly.
(GNU) Make use time stamps to identify rebuild so just keep the time stamps to avoid rebuild.
If we need rsync, we should use it with -t option.
Also for a patch doesn't have incremental patches, we can make it manually by comparing patched files.

Implementing an update/upgrade system for embedded Linux devices

I have an application that runs on an embedded Linux device and every now and then changes are made to the software and occasionally also to the root file system or even the installed kernel.
In the current update system the contents of the old application directory are simply deleted and the new files are copied over it. When changes to the root file system have been made the new files are delivered as part of the update and simply copied over the old ones.
Now, there are several problems with the current approach and I am looking for ways to improve the situation:
The root file system of the target that is used to create file system images is not versioned (I don't think we even have the original rootfs).
The rootfs files that go into the update are manually selected (instead of a diff)
The update continually grows and that becomes a pita. There is now a split between update/upgrade where the upgrade contains larger rootfs changes.
I have the impression that the consistency checks in an update are rather fragile if at all implemented.
Requirements are:
The application update package should not be too large and it must also be able to change the root file system in the case modifications have been made.
An upgrade can be much larger and only contains the stuff that goes into the root file system (like new libraries, kernel, etc.). An update can require an upgrade to have been installed.
Could the upgrade contain the whole root file system and simply do a dd on the flash drive of the target?
Creating the update/upgrade packages should be as automatic as possible.
I absolutely need some way to do versioning of the root file system. This has to be done in a way, that I can compute some sort of diff from it which can be used to update the rootfs of the target device.
I already looked into Subversion since we use that for our source code but that is inappropriate for the Linux root file system (file permissions, special files, etc.).
I've now created some shell scripts that can give me something similar to an svn diff but I would really like to know if there already exists a working and tested solution for this.
Using such diff's I guess an Upgrade would then simply become a package that contains incremental updates based on a known root file system state.
What are your thoughts and ideas on this? How would you implement such a system? I prefer a simple solution that can be implemented in not too much time.
I believe you are looking wrong at the problem - any update which is non atomic (e.g. dd a file system image, replace files in a directory) is broken by design - if the power goes off in the middle of an update the system is a brick and for embedded system, power can go off in the middle of an upgrade.
I have written a white paper on how to correctly do upgrade/update on embedded Linux systems [1]. It was presented at OLS. You can find the paper here: https://www.kernel.org/doc/ols/2005/ols2005v1-pages-21-36.pdf
[1] Ben-Yossef, Gilad. "Building Murphy-compatible embedded Linux systems." Linux Symposium. 2005.
I absolutely agree that an update must be atomic - I have started recently a Open Source project with the goal to provide a safe and flexible way for software management, with both local and remote update. I know my answer comes very late, but it could maybe help you on next projects.
You can find sources for "swupdate" (the name of the project) at github.com/sbabic/swupdate.
Stefano
Currently, there are quite a few Open Source embedded Linux update tools growing, with different focus each.
Another one that is worth being mentioned is RAUC, which focuses on handling safe and atomic installations of signed update bundles on your target while being really flexible in the way you adapt it to your application and environment. The sources are on GitHub: https://github.com/rauc/rauc
In general, a good overview and comparison of current update solutions you might find on the Yocto Project Wiki page about system updates:
https://wiki.yoctoproject.org/wiki/System_Update
Atomicity is critical for embedded devices, one of the reasons highlighted is power loss; but there could be others like hardware/network issues.
Atomicity is perhaps a bit misunderstood; this is a definition I use in the context of updaters:
An update is always either completed fully, or not at all
No software component besides the updater ever sees a half installed update
Full image update with a dual A/B partition layout is the simplest and most proven way to achieve this.
For Embedded Linux there are several software components that you might want to update and different designs to choose from; there is a newer paper on this available here: https://mender.io/resources/Software%20Updates.pdf
File moved to: https://mender.io/resources/guides-and-whitepapers/_resources/Software%2520Updates.pdf
If you are working with the Yocto Project you might be interested in Mender.io - the open source project I am working on. It consists of a client and server and the goal is to make it much faster and easier to integrate an updater into an existing environment; without needing to redesign too much or spend time on custom/homegrown coding. It also will allow you to manage updates centrally with the server.
You can journal an update and divide your update flash into two slots. Power failure always returns you to the currently executing slot. The last step is to modify the journal value. Non atomic and no way to make it brick. Even it if fails at the moment of writing the journal flags. There is no such thing as an atomic update. Ever. Never seen it in my life. Iphone, adroid, my network switch -- none of them are atomic. If you don't have enough room to do that kind of design, then fix the design.

Putting a complete filesystem into revision control

I am currently working on a PXE bootable environment that I would like to put into revision control.
The filesystem and server will both be Linux (SLES if you must know).
I've considered using some kind of hack that stores file ownership/permissions via getfacl -R -P, but this doesn't cover symlinks or devices. And it's kind of ugly.
Tricky things that I need to be covered:
file ownership
file permissions (ACLs are not necessary)
devices
symlinks
Are there any revision control systems that will cover my needs for this?
Note: Rather than a "block volume", I need to put a "set of files" into revision control and keep individual changes.
I'm pretty sure that Perforce will handle all of those things. I believe another group at work uses it for nearly the exact same purpose that you list (storing a variant of linux for an embedded device).
etckeeper is a tool that puts the contents of /etc under revision control (with a choice of revision control systems). It has a layer on top that takes care of permissions, symlinks, etc. I'm not sure it if handles device nodes - probably not - but it could probably easily be extended to do so.
You may find that this tool can be adapted to handle an entire filesystem.
You can use versioning filesystems, that is, filesystems with version control built into the filesystem code itself.
Some examples:
NILFS
Ext3cow
In principle, any filesystem that supports snapshots could be used.
More information: Versioning file system on Wikipedia.
If your file system will not be extremely huge, you might just store it all in a single file, version control that binary file, and then mount that to work with it. It won't give you pretty file-level control, but it might work acceptably for some tasks.

Resources