git-annex use a file from a different location - linux

My understanding is that when I perform git annex add somefile, it creates a symlink for that file and places it in the .git/annex/objects folder. Then, when I initialize git-annex in some different location and sync it with the previous one, it downloads a broken symlink, unless I do git annex sync --content, which makes a full copy of the file.
I need to have large files in one location, let's say on a USB drive, and multiple git repositories that use the large files. So I want to have just the symlinks to the large files in those git repos. How do I perform the sync so that git-annex downloads a valid symlink that points to a file in a single location?

There are two ways to do that.
The first uses hard links, the second uses symlinks. I recommend hard links if all your files are going to be on the same filesystem/volume/partition. Hard links cannot span filesystems, so otherwise you end up with the good ol' cp copying the entire thing.
Using hard-links:
git clone --shared main_repo/ new_repo/
Explained by git-annex author himself
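As a plain-coreutils illustration of what a hard link buys you (this is not git-annex itself, and the file names are placeholders): a hard link is a second name for the same inode, so it takes no extra space, but it only works within one filesystem.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; both files live on the same filesystem.
work=$(mktemp -d)
cd "$work"
echo "large file contents" > original.bin
# cp --link (GNU cp) creates a hard link instead of copying the data.
cp --link original.bin linked.bin
# Both names report the same inode number and a link count of 2.
stat -c '%n inode=%i links=%h' original.bin linked.bin
```

The stat -c format is GNU-specific, which matches the Linux setup in the question.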
Using symlinks:
On main_repo:
git worktree add -b branch_name path_to_new_repo/
Since git-worktree uses a pointer file (which git-annex replaces with a symlink), this will work across different file systems. Changes to "different repos" will be stored in different branches. If you want them all to remain in sync, keep them in sync with standard git commands like git merge. Or you could only make changes to the master branch and git rebase master from the different branches frequently.
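A minimal sketch of the worktree approach with plain git only (no git-annex involved), run in a throwaway directory. The names main_repo, new_repo and branch_name are placeholders. It shows that the worktree's .git is a small pointer file, and that a commit made in the worktree is visible from the main repository without any fetch.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; all names below are placeholders for illustration.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q main_repo
cd main_repo
echo "hello" > README
git add README
git commit -q -m "initial commit"

# Create a second working tree on a new branch, outside main_repo.
git worktree add -b branch_name ../new_repo

# In the worktree, .git is a pointer file, not a directory.
cat ../new_repo/.git

# Commit in the worktree...
cd ../new_repo
echo "change" >> README
git commit -q -am "change made in the worktree"

# ...and the branch shows up in the main repository straight away.
cd ../main_repo
git log --oneline branch_name
```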

Related

How to prevent Git from storing copies of LFS files in .git dir?

It seems that Git is storing copies of LFS files in .git/lfs. This takes up twice the space. I know this is the typical way Git handles files, but I'm still wondering if there is a way to prevent Git from caching copies of them and instead just download them from the cloud when reverting the files.
If the files are in the lfs folder, it is because git needed them at some point to populate your working directory.
So, no, there is no way to prevent git from caching them (except maybe by doing a sparse checkout, if you really don't need the files handled by git-lfs in your working directory).
But there is an easy way to clean this cache directory (git will keep only the currently used files and delete the unused ones) with the command:
git lfs prune

Git thinks a file within a symlinked directory has been deleted after recreating the symlink, how can I fix it?

I have a symlinked directory within my repository, which links to files elsewhere on the filesystem. For whatever reason, the symlink breaks every now and then, and it turns into a regular empty folder. So I deleted the empty folder, and recreated the symlink with ln -s ../../ ext, which appears to have worked as I can browse that folder and see the contents. But when I run git status, it appears all the files that should be visible within the ext folder are missing. How can I make git see that they are there again, within the symlinked directory?
This is on Ubuntu 18 by the way.
Your setup is odd, because Git does not follow symlinks, it just stores them.
That is, if you have a symbolic link ext -> ../.. and you run git add ext, Git creates, in the index, an entry with mode 120000 (symlink) to store the blob contents "../..". Committing will create a commit that, when extracted, will create the symbolic link ext pointing to "../..". Git will not store any files within ext when it is storing this symbolic link.
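This can be checked in a throwaway repository. The sketch below (the directory names are placeholders) adds a symlink named ext and then inspects the index entry and the stored blob.

```shell
#!/bin/sh
set -e
# Throwaway sandbox for illustration.
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
# A symlink named ext pointing two levels up, as in the question.
ln -s ../.. ext
git add ext
# The index entry has mode 120000 (symlink); nothing "inside" ext is tracked.
git ls-files -s
# The blob's contents are just the link target text.
git cat-file -p :ext
```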
If, on the other hand, you have an existing commit that contains files named ext/foo and ext/bar, and you clone this repository at this commit, or extract this commit into a new and otherwise empty work-tree, Git will see that in order to write to files named ext/foo and ext/bar, your OS requires that ext exists as a directory. It will therefore create the empty directory ext, in which it will then create files foo and bar as your OS requires, so as to create files that to Git are merely named ext/foo and ext/bar. These two names, ext/foo and ext/bar, will now be in the index, so that the next commit you make will also contain these two files.
It sounds like you:
cloned a repository (perhaps with git clone --no-checkout?);
manually created a symbolic link in the work-tree named ext, pointing to some existing directory (perhaps one with some files inside it);
convinced git checkout to create ext/foo and ext/bar without first removing the symbolic link ext and replacing it with a directory ext.
This is not a supported mode of operation [1] and you should not be surprised when it goes wrong.
[1] It leads to security issues: Git is meant not to write any files "outside" the work-tree area, and writing to files "under" a symbolic link to a directory outside the work-tree would allow this to occur. Rather than carefully limit symbolic link usage, Git just generally doesn't store files "beyond" any link in the first place, though it's probably possible, through careful manipulation of the index and, at the OS level, the file system in which your work-tree resides, to trick Git manually.
Just don't put a repo in a repo; it's not worth it.

Git clone without including top/parent folder

We have a repo in git where the project is contained in a folder called Project. We'd like to be able to release the code to a production server, by cloning the repo, without including the "Project" folder, but with everything below it. Is this possible? The destination directory name is /var/www, which is unrelated to anything in the project. Unfortunately I can't just do a symbolic link because of the nature of our hosting provider (which we'll change soon).
My answer assumes that you have a git repository with the following content:
/.gitignore
/Project
/Project/index.php
/ProjectB
/ProjectB/pom.xml
If you don't need history at all in that copy of your repository, there is the git archive command, which can do what you want except that it outputs its data in tar or zip format:
git archive [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
[-o <file> | --output=<file>] [--worktree-attributes]
[--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
[<path>…]
Like:
git archive --format=tar --remote=git#foobar.git master -- Project | tar -xf -
However, the git clone command cannot clone just a path inside a repository, and I think exporting only one tree of some branch is not really Git-like. You would probably need a submodule, making Project an independent git repository, or, as in the git archive example, get only what you want but without versioning (which can be questionable on a production server).
Instead, you can do this:
Clone your repository to whatever path, say /opt/foobar.
Create a symbolic link at /var/www that points to /opt/foobar/Project.
Or reference the /opt/foobar/Project in your apache configuration (to avoid the symlink) instead of plain /var/www.
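For the git archive route, here is a hedged sketch that drops the Project folder itself. It assumes a local repository rather than --remote, and uses a temporary directory as a stand-in for /var/www. The tree-ish HEAD:Project names the tree of the Project folder, so its files land at the top level of the destination.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; foobar and www are placeholders for illustration.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q foobar
cd foobar
mkdir Project ProjectB
echo '<?php echo "hi";' > Project/index.php
echo '<project/>' > ProjectB/pom.xml
git add .
git commit -q -m "initial commit"

# Stand-in for /var/www.
dest="$work/www"
mkdir "$dest"

# HEAD:Project is the tree of the Project folder, so the archive holds
# index.php at its top level rather than Project/index.php.
git archive --format=tar HEAD:Project | tar -xf - -C "$dest"

ls "$dest"
```

As with any git archive export, the result carries no history, so each release means re-running the export.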

Remove git-annex repository from file tree

I tried installing git-annex yesterday to backup my files. I ran git annex add . in the root of my repository tree and then a git commit. So far everything is fine.
What I didn't know git-annex was doing was turning my entire file tree into a whole bunch of symlinks. Every single file in my whole tree is now symlinked into .git/annex/objects! This is messing up my application which depends on files not being symlinks.
My question is, how do I get rid of git-annex and restore my file system to its original state? For a normal git repo I could do rm -r .git, but I'm afraid that won't do the job in git-annex. Thanks in advance.
Okay, so I stumbled upon some docs for git-annex, and they give two commands that achieve what I wanted to do:
unannex [path ...]
Use this to undo an accidental git annex add command. You can use git annex unannex to move content out of the annex at any point, even if you've already committed it.
This is not the command you should use if you intentionally annexed a file and don't want its contents any more. In that case you should use git annex drop instead, and you can also git rm the file.
uninit
Use this to stop using git annex. It will unannex every file in the repository, and remove all of git-annex's other data, leaving you with a git repository plus the previously annexed files.
I started running git annex uninit, but my god was it slow. It took about 5 minutes to "unannex" just a single file. My filesystem tree is about 200,000 files, so that was just unacceptable.
What I ended up doing was actually surprisingly simple and worked well. I ran cp with the -rL flags to duplicate my file tree and dereference all symlinks in the copy, so every symlink became a regular file holding the real content. And it was blazing fast: around 30 seconds for my entire file tree. The only problem was that the file permissions were not retained from my original state, so I needed to run some chmod and chcon commands to fix up the permissions.
This second method worked for me because there were no other symlinks in my tree. If you do have symlinks in your tree beyond those created by git-annex, then my little shortcut probably isn't the right choice for you, and you should consider sticking with just git annex uninit.
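A small sketch of the cp -rL shortcut on a mock tree (the directory names are made up, and this is not a real annex). It also adds -p, which is an assumption on my part rather than something the answer used: with GNU cp it keeps mode, ownership and timestamps where possible, which might avoid the chmod step. SELinux contexts would need --preserve=context instead.

```shell
#!/bin/sh
set -e
# Throwaway sandbox for illustration.
work=$(mktemp -d)
cd "$work"
# Mimic an annexed tree: real content in one place, a symlink in the tree.
mkdir -p tree objects
echo "payload" > objects/blob
chmod 640 objects/blob
ln -s ../objects/blob tree/data.txt

# -r recurse, -L follow symlinks and copy what they point to,
# -p keep mode, ownership and timestamps where possible.
cp -rLp tree restored

# data.txt is now a regular file, not a symlink.
ls -l restored
```

In a real repository the copy would also include .git with the annexed objects, so the duplicate needs a cleanup pass before it replaces the original.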
I would like to include my own experience of using git annex uninit, in addition to OP's answer.
I didn't have the full repository annexed, only about 40 bigger files. After deciding that I got no particular benefit from using git-annex, I tried unannexing several files, and it took several seconds per file. Then I ran git annex uninit, and it took more than a minute only for really huge files (more than a few GB). Overall, it was done in about 20 minutes, which was acceptable in my case.
So it seems that the time needed for unannexing grows with the size of the annexed file tree.
If you have a v6 repository, you can do the following:
git annex unannex . --fast
which replaces the symlinks with hard links instead of slowly replacing the symlinks with copies of the original files.
Only v6 repositories can execute the git annex unannex command on uncommitted changes, so it may be necessary to upgrade the git-annex repo to a v6 repository.
See the Official Upgrade Guide.
In my case I had to upgrade v5 -> v6 and I only had to execute
git annex upgrade
which took a few seconds and I was done.
Have you tried using git-annex in direct mode?
Just switch your repository with
git annex direct
This will no longer use symlinks, but some git commands do not work with such annex repositories.
Check the explanations on their website to see if this scheme fits your purposes.
Maybe the conversion process is faster than the previously mentioned tips.
I haven't tried it myself with big repositories.

How to conveniently sync a file between two git repositories

I have two git local repositories. Both share an identical file, under a different path and under a different name. Currently, when I make changes I have to copy the file from one directory to another.
Is there an alternative way to keep them in sync without manually overwriting the file? I don't want to create a separate repository for this file. I thought one of the following things would work, but apparently, they don't:
git submodule
git subtree
symlink soft
symlink hard
What else is there?
The only other alternative would be a post-commit hook on repoA, which would, on each commit:
check if the file is part of said commit
copy it in repoB with the right path.
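The two steps above could be sketched as the hook below. The file names shared.conf and lib/settings.conf and the relative location of repoB are hypothetical; the script builds two throwaway repositories only to show the hook firing.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; repoA and repoB sit side by side.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repoA
git init -q repoB
mkdir -p repoB/lib repoA/.git/hooks

# Install the hook in repoA. SRC and DST are hypothetical paths;
# post-commit runs from the top of repoA's working tree.
cat > repoA/.git/hooks/post-commit <<'EOF'
#!/bin/sh
SRC="shared.conf"
DST="../repoB/lib/settings.conf"
# List the files touched by the commit that was just created
# (--root makes this work for the very first commit too).
if git diff-tree --root --no-commit-id --name-only -r HEAD | grep -qxF "$SRC"; then
    cp "$SRC" "$DST"
fi
EOF
chmod +x repoA/.git/hooks/post-commit

cd repoA
echo "key=value" > shared.conf
git add shared.conf
git commit -q -m "add shared.conf"

# The hook has copied the file into repoB under its other name.
cat "$work/repoB/lib/settings.conf"
```

The copy arrives in repoB as an uncommitted change, so it still has to be committed there. Merge commits are not covered by this diff-tree call and would need extra handling.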
