Remove git-annex repository from file tree


I tried installing git-annex yesterday to back up my files. I ran git annex add . in the root of my repository tree and then a git commit. So far, everything was fine.
What I didn't know was that git-annex turns the annexed files into symlinks. Every single file in my whole tree is now a symlink into .git/annex/objects! This is breaking my application, which depends on files not being symlinks.
My question is, how do I get rid of git-annex and restore my file system to its original state? For a normal git repo I could do rm -r .git, but I'm afraid that won't do the job in git-annex. Thanks in advance.

Okay, so I stumbled upon some docs for git-annex, and they give two commands that achieve what I wanted to do:
unannex [path ...]
Use this to undo an accidental git annex add command. You can use git annex unannex to move content out of the annex at any point, even if you've already committed it.
This is not the command you should use if you intentionally annexed a file and don't want its contents any more. In that case you should use git annex drop instead, and you can also git rm the file.
uninit
Use this to stop using git annex. It will unannex every file in the repository, and remove all of git-annex's other data, leaving you with a git repository plus the previously annexed files.
I started running git annex uninit, but my god was it slow. It took about 5 minutes to "unannex" just a single file. My filesystem tree is about 200,000 files, so that was just unacceptable.
What I ended up doing was surprisingly simple and worked well. I used cp -rL to duplicate my file tree: the -L flag dereferences symlinks, so every symlink in the copy is replaced by the content of the file it points to. It was blazing fast: around 30 seconds for my entire file tree. The only problem was that the original file permissions were not retained, so I had to run some chmod and chcon commands to fix them up.
This second method worked for me because there were no other symlinks in my tree. If you have symlinks of your own beyond those created by git-annex, cp -rL will dereference those too, so my little shortcut probably isn't the right choice for you, and you should consider sticking with git annex uninit.
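If you prefer to script the cp -rL copy described above, here is a minimal Python sketch of the same idea: shutil.copytree with symlinks=False writes the content each symlink points to instead of the link itself, which is roughly what cp -rL does. Skipping .git and the tiny demo layout (one object file named KEY) are assumptions made for illustration; like cp -rL, this dereferences every symlink, including any you created yourself.

```python
import os
import shutil
import tempfile


def dereference_copy(src: str, dst: str) -> None:
    """Copy the tree at src to dst, writing the content that each symlink
    points to instead of the symlink itself (roughly `cp -rL src dst`)."""
    shutil.copytree(
        src,
        dst,
        symlinks=False,  # follow symlinks: copy target content, not links
        ignore=shutil.ignore_patterns(".git"),  # assumption: skip .git
    )


# Demo on a tiny stand-in for an annexed tree: the content lives under
# .git/annex/objects and the working-tree file is a relative symlink.
root = tempfile.mkdtemp()
repo = os.path.join(root, "repo")
objects = os.path.join(repo, ".git", "annex", "objects")
os.makedirs(objects)
with open(os.path.join(objects, "KEY"), "w") as f:
    f.write("hello\n")
os.symlink(os.path.join(".git", "annex", "objects", "KEY"),
           os.path.join(repo, "data.txt"))

plain = os.path.join(root, "plain")
dereference_copy(repo, plain)
copied = os.path.join(plain, "data.txt")
print(os.path.islink(copied), open(copied).read().strip())  # False hello
```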
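If your tree does contain symlinks of your own, one option is to replace only the links that resolve into .git/annex/objects. The Python sketch below does that in place; it is not part of git-annex, and it assumes a POSIX system with the annexed content present locally (links whose content is missing are skipped). Each link is replaced by copying to a temporary file in the same directory and renaming it over the link. The new files get tempfile's owner-only default mode, so you may still need a chmod afterwards, and git status will report the replaced paths as changed because Git recorded them as symlinks.

```python
import os
import shutil
import tempfile


def replace_annex_symlinks(repo: str) -> list:
    """Replace, in place, only the symlinks that resolve into
    .git/annex/objects with regular copies of their content.
    Other symlinks are left untouched. Returns the replaced paths."""
    objects = os.path.realpath(os.path.join(repo, ".git", "annex", "objects"))
    replaced = []
    for dirpath, dirnames, filenames in os.walk(repo):
        if ".git" in dirnames:
            dirnames.remove(".git")  # never walk into git's own data
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if os.path.commonpath([target, objects]) != objects:
                continue  # a symlink of your own: keep it
            if not os.path.exists(target):
                continue  # annexed content not present locally
            # Copy next to the link, then atomically rename over it.
            fd, tmp = tempfile.mkstemp(dir=dirpath)
            os.close(fd)
            shutil.copyfile(target, tmp)
            os.replace(tmp, path)
            replaced.append(os.path.relpath(path, repo))
    return replaced


# Demo: one annexed file plus one user-created symlink.
root = tempfile.mkdtemp()
objects_dir = os.path.join(root, ".git", "annex", "objects")
os.makedirs(objects_dir)
with open(os.path.join(objects_dir, "KEY"), "w") as f:
    f.write("annexed\n")
with open(os.path.join(root, "notes.txt"), "w") as f:
    f.write("mine\n")
os.symlink(os.path.join(".git", "annex", "objects", "KEY"),
           os.path.join(root, "big.bin"))
os.symlink("notes.txt", os.path.join(root, "shortcut.txt"))  # user symlink

print(replace_annex_symlinks(root))  # ['big.bin']
```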

I would like to include my own experience of using git annex uninit, in addition to OP's answer.
I didn't have the full repository annexed, only about 40 bigger files. After deciding that git-annex gave me no particular benefit, I tried unannexing several files, and it took several seconds per file. Then I ran git annex uninit, and it spent more than a minute only on the really huge files (more than a few GB). Overall, it was done in about 20 minutes, which was acceptable in my case.
So it seems that the time needed to unannex grows with the size of the annexed files.

If you have a v6 repository, you can do the following:
git annex unannex . --fast
which replaces the symlinks with hard links instead of slowly replacing them with full copies of the original files.
Only v6 repositories can run git annex unannex on uncommitted changes, so you may need to upgrade your git-annex repository to v6 first.
See the Official Upgrade Guide.
In my case I had to upgrade v5 -> v6 and I only had to execute
git annex upgrade
which took a few seconds and I was done.
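The speed of --fast comes from not copying any data. As a rough illustration of the idea (not git-annex's actual code), this Python sketch swaps a symlink for a hard link to its target; the temporary file name is a hypothetical choice. Because both names then share one inode, an in-place edit through either name changes the other, and os.link fails with EXDEV if the two paths are on different filesystems.

```python
import os
import tempfile


def symlink_to_hardlink(path: str) -> None:
    """Replace the symlink at path with a hard link to the file it
    points to. No data is copied; both names then share one inode."""
    target = os.path.realpath(path)
    tmp = path + ".tmp-hardlink"  # hypothetical temporary name
    os.link(target, tmp)   # raises OSError (EXDEV) across filesystems
    os.replace(tmp, path)  # atomically swap the symlink for the hard link


root = tempfile.mkdtemp()
obj = os.path.join(root, "object")
with open(obj, "w") as f:
    f.write("content\n")
link = os.path.join(root, "file")
os.symlink("object", link)
symlink_to_hardlink(link)
print(os.path.islink(link), os.path.samefile(link, obj))  # False True
```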

Have you tried to use git-annex in direct mode?
Just change your repository with
git annex direct
This will not use symlinks any longer, but some git commands do not work with such annex repositories.
Check out the explanations on their website to see if this scheme fits your purposes.
Maybe the conversion process is faster than the previously mentioned tips.
I haven't tried it myself with big repositories.

Related

git-annex use a file from a different location

My understanding is that when I run git annex add somefile, git-annex moves the file into the .git/annex/objects folder and leaves a symlink in its place. Then, when I initialize git-annex in a different location and sync it with the first one, I get a broken symlink, unless I run git annex sync --content, which makes a full copy of the file.
I need to keep the large files in one location, let's say on a USB drive, and have multiple git repositories that use them. So I want those git repos to contain only symlinks to the large files. How can I sync so that git-annex creates a valid symlink pointing to the file in that single location?

There are two ways to do that.
The first uses hard links, the second uses symlinks. I recommend hard links if all your files are going to be on the same filesystem/volume/partition; otherwise the good ol' cp --link is just going to copy the entire thing.
Using hard-links:
git clone --shared main_repo/ new_repo/
Explained by git-annex author himself
Using symlinks:
On main_repo:
git worktree add -b branch_name path_to_new_repo/
Since git-worktree uses a pointer file (which git-annex replaces with a symlink), this will work across different file systems. Changes to "different repos" will be stored in different branches. If you want them all to remain in sync, keep them in sync with standard git commands like git merge. Or you could only make changes to the master branch and git rebase master from the different branches frequently.

Git checkout untracked issue

I'm collaborating with a few other people on a Drupal website that we version-control with Git. We set up a local Git repository containing our commits.
After a colleague pushed some updates and I fetched and merged into my local dev branch, I began experiencing the following problems:
user#server:/var/www/Intranet/sites/intranet/modules/custom$ git checkout dev
error: The following untracked working tree files would be overwritten by checkout:
themes/bigcompany/panels/layouts/radix_bryant_flipped/radix-bryant-flipped.png
themes/bigcompany/panels/layouts/radix_bryant_flipped/radix-bryant-flipped.tpl.php
themes/bigcompany/panels/layouts/radix_bryant_flipped/radix_bryant_flipped.inc
Please move or remove them before you can switch branches.
Aborting
This typically happens whenever I try to check out another branch: the checkout fails, and I am effectively trapped in my current branch.
This question suggests that my issue is related to the .gitignore file. However, my .gitignore has nothing indicating that any part of my themes directory should be ignored, as the following shows:
# .gitignore for a standard Drupal 7 build based in the sites subdirectory.
# Drupal
files
settings.php
settings.*.php
# Sass.
.sass-cache
# Composer
vendor/
# Migrate source files
modules/custom/haringeygovuk_migrate/source_data
As mentioned above, my attempts to run git checkout for any branch fail with the message above. I decided to force it with the -f switch and successfully switched to my target branch, but I lost a couple of hundred lines of code, which I'd love to avoid going forward.
I work in an Ubuntu Linux VirtualBox VM, while my colleagues prefer a WAMP setup and use the Git Bash terminal emulator to run Git commands. Could the difference in environments be causing these serious issues?
How can I resolve this issue?

Well, the situation is rather simple. In your current branch, certain files are not under Git's control, but at the same time those files exist in your working tree. The branch you're trying to switch to does track those files, so Git would need to overwrite files in the working tree to perform the checkout.
To prevent possible data loss, Git stops switching branches and tells you to either add those files to Git in a separate commit on your current branch and only then switch, or simply move those files out of Git's way.
Forcing the checkout with -f effectively took the second route: Git overwrote those files with the versions from the target branch, which is likely where your code went. Generally, you should "force" an operation only if you really understand what you're doing.
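If you would rather move the files out of Git's way than lose them, a small Python sketch like the following can parse the file list from the error message and move each file into a backup directory, preserving its relative path. The function names are hypothetical, the parsing relies on the exact wording of the error shown in the question, and the paths are assumed to be relative to the directory you pass as repo (here, the repository root).

```python
import os
import shutil
import tempfile


def conflicting_paths(error_text: str) -> list:
    """Extract the file list from Git's "untracked working tree files
    would be overwritten by checkout" error message."""
    paths, collecting = [], False
    for line in error_text.splitlines():
        stripped = line.strip()
        if stripped.endswith("would be overwritten by checkout:"):
            collecting = True
        elif stripped.startswith("Please move or remove"):
            collecting = False
        elif collecting and stripped:
            paths.append(stripped)
    return paths


def move_aside(repo: str, paths: list, backup: str) -> None:
    """Move each path (relative to repo) under backup, keeping its layout."""
    for rel in paths:
        dest = os.path.join(backup, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(os.path.join(repo, rel), dest)


LAYOUT = "themes/bigcompany/panels/layouts/radix_bryant_flipped/"
error_text = (
    "error: The following untracked working tree files would be "
    "overwritten by checkout:\n"
    f"\t{LAYOUT}radix-bryant-flipped.png\n"
    f"\t{LAYOUT}radix_bryant_flipped.inc\n"
    "Please move or remove them before you can switch branches.\n"
    "Aborting\n"
)

# Demo: create the untracked files in a scratch repo, then move them aside.
repo, backup = tempfile.mkdtemp(), tempfile.mkdtemp()
paths = conflicting_paths(error_text)
for rel in paths:
    os.makedirs(os.path.join(repo, os.path.dirname(rel)), exist_ok=True)
    open(os.path.join(repo, rel), "w").close()
move_aside(repo, paths, backup)
print(len(paths))  # 2
```

After the move, the checkout no longer conflicts, and you can compare the backed-up files with the branch's versions at leisure.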

Ignoring files in Git after they have been committed

I have recently switched to git.
In my first commit since switching to git, I noticed that it also listed the .pyc files. I didn't think anything of it and committed and pushed them.
Now I realize they keep getting updated during development, and it's very annoying seeing them in the list. It just produces too much noise.
I did some research and ran echo "*.pyc" >> .gitignore in the project directory.
This didn't help, though, as the .pyc files are still being shown. Could it be that because I previously committed those .pyc files, I can no longer ignore them (since they are tracked now and their state has changed again)? If so, am I doomed forever, or is there still a way to remove the files from the repository after the fact?
Thanks

Just git rm the .pyc files, and make sure your .gitignore is set to ignore them from now on. You are correct that git will not ignore the committed files, because they've already been added. If you want to keep the files on disk, use git rm --cached instead: it removes them only from the index, without deleting them from the disk.
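To see which tracked files a new pattern covers (and would therefore need git rm --cached), git itself can list them with git ls-files -ci --exclude-standard. The Python sketch below mimics that only for the simple case: patterns without a slash, such as *.pyc, matched against each file's base name. Anchored patterns, negation and directory rules are deliberately not handled, and the function name is hypothetical.

```python
import fnmatch
import posixpath


def tracked_but_ignored(tracked: list, patterns: list) -> list:
    """Return tracked paths whose base name matches one of the given
    slash-free .gitignore patterns (e.g. "*.pyc")."""
    return [
        path
        for path in tracked
        if any(fnmatch.fnmatchcase(posixpath.basename(path), pat)
               for pat in patterns)
    ]


# `tracked` would normally come from the output of `git ls-files`.
tracked = ["app.py", "pkg/__init__.py", "pkg/util.pyc", "main.pyc"]
print(tracked_but_ignored(tracked, ["*.pyc"]))  # ['pkg/util.pyc', 'main.pyc']
```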

In a Mercurial repository, how do I find the 10 most recently committed files in a subdirectory?

In a Mercurial repository, how do I find the 10 most recently committed files in a subdirectory? I want to check because I'm a little worried that some files were committed by mistake.

Using revsets is probably the best approach.
A close approximation may be hg log -r "last(file('subdirectory/*'), 10)".
This command returns the last 10 commits that touched any file in subdirectory. From there, you could review each commit for the files affected.
If --template "{files}\n" is added to the command, it will list the files touched in each of the commits. However, the list would include files outside subdirectory as well. See hg help templates for details.
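To narrow that output down to the subdirectory, you could post-process it. This Python sketch assumes the lines come from --template "{files}\n" ordered newest first (for example by wrapping the revset in reverse(...) if needed), and that file names contain no spaces, since {files} separates names with spaces. The function name is hypothetical; it returns up to n distinct paths under the given prefix.

```python
def last_files_under(log_lines: list, prefix: str, n: int = 10) -> list:
    """Return up to n distinct paths under prefix, in the order they are
    first seen (newest first if the log lines are newest first)."""
    prefix = prefix.rstrip("/") + "/"
    seen, result = set(), []
    for line in log_lines:
        for path in line.split():  # {files} is space-separated
            if path.startswith(prefix) and path not in seen:
                seen.add(path)
                result.append(path)
                if len(result) == n:
                    return result
    return result


# Example input, newest commit first; in practice, capture the hg log output.
log_lines = ["src/a.c docs/intro.txt", "src/b.c src/a.c", "README"]
print(last_files_under(log_lines, "src", n=2))  # ['src/a.c', 'src/b.c']
```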

Can't Hard Link the gitconfig File

I am attempting to create a git repository to store all of my dotfiles and config files. My idea was to simply create hard links to all of the files I cared about and store those links in their own directory that I could turn into a repository.
I've hit a bit of a snag, though, with my ~/.gitconfig file. It seems that whenever I run a 'git config' command, the link that I created no longer points to the right file, i.e. the file in the repository no longer updates.
Here is an example using the shell and interactive Ruby (pry) to check whether the files are linked.
# Create the link
$ ln .gitconfig .conf_files/gitconfig
# The files are in fact linked
[1] pry(main)> File.identical?('.gitconfig', '.conf_files/gitconfig')
=> true
# Update the gitconfig file by running a 'git config' command
$ git config --global alias.last 'log -1 HEAD'
# The files are no longer linked.
[2] pry(main)> File.identical?('.gitconfig', '.conf_files/gitconfig')
=> false
I assume this has something to do with the way that git is writing the .gitconfig file. Does anyone know why this would happen, or have any creative ideas for a workaround?

Try Eli Barzilay's solution in his comment at http://www.xxeo.com/archives/2010/02/16/dotfiles-in-git-finally-did-it.html:
So I’ve finally found a solution that takes the best of both: put the repo in a subdirectory, and instead of symlinks, add a configuration option for “core.worktree” to be your home directory. Now when you’re in your home directory you’re not in a git repo (so the first problem is gone), and you don’t need to deal with fragile symlinks as in the second case. You still have the minor hassle of excluding paths that you don’t want versioned (eg, the “*” in “.git/info/exclude” trick), but that’s not new.

This is completely normal, and is in fact the recommended way to overwrite config files. Git creates a temporary file, writes out the config, and then moves the new file over the old one. This way, you don't get an incomplete config file (data loss) if Git gets interrupted.
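The write-then-rename pattern explains the broken hard link: after the rename, ~/.gitconfig names a brand-new inode, while the repository copy still names the old one. This Python sketch reproduces the effect with a simplified stand-in for Git's behavior (Git itself writes a .lock file next to the config rather than using tempfile); os.path.samefile plays the role of Ruby's File.identical? from the question.

```python
import os
import tempfile


def atomic_write(path: str, text: str) -> None:
    """Write text to a temp file in the same directory, then rename it
    over path, so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # path now names a brand-new inode


home = tempfile.mkdtemp()
config = os.path.join(home, ".gitconfig")
repo_copy = os.path.join(home, "gitconfig")
with open(config, "w") as f:
    f.write("[user]\n")
os.link(config, repo_copy)                  # like `ln .gitconfig ...`
print(os.path.samefile(config, repo_copy))  # True
atomic_write(config, "[user]\n[alias]\n")
print(os.path.samefile(config, repo_copy))  # False
```

The old content survives under the repository name, which is exactly the "no longer updates" symptom from the question.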
You can always write a script to copy or link your config files into your central repository.
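A minimal sketch of such a script in Python, assuming a flat list of dotfile names and a hypothetical naming scheme that drops the leading dot (.gitconfig becomes gitconfig, as in the question's .conf_files directory). Copying rather than linking means Git's rename-based rewrites cannot break anything; you just rerun the script before committing.

```python
import os
import shutil
import tempfile


def copy_into_repo(home: str, repo: str, names: list) -> list:
    """Copy each dotfile from home into repo under a name without the
    leading dot, and return the destination paths."""
    copied = []
    for name in names:
        dst = os.path.join(repo, name.lstrip("."))
        shutil.copy2(os.path.join(home, name), dst)  # keeps mode and mtime
        copied.append(dst)
    return copied


home = tempfile.mkdtemp()
repo = os.path.join(home, ".conf_files")
os.makedirs(repo)
for name, text in [(".gitconfig", "[user]\n"), (".bashrc", "export EDITOR=vim\n")]:
    with open(os.path.join(home, name), "w") as f:
        f.write(text)
copied = copy_into_repo(home, repo, [".gitconfig", ".bashrc"])
print([os.path.basename(p) for p in copied])  # ['gitconfig', 'bashrc']
```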

Check out this answer; perhaps it will help:
https://stackoverflow.com/a/3731139/1431696
In the meantime, have you considered doing the links in reverse? Create your repository full of config files, etc, and then in the place that you actually use your files, create a hard link to the 'real' file, which sits in the repository.

Thanks to Dietrich Epp's answer and advice, I decided to approach this problem from a different angle: I created the repository at the root of my filesystem and used .gitignore to track only the files I am interested in.
My .gitignore file now looks like this:
/*
!/etc/
/etc/*
# etc files
!/etc/rc.conf
!/etc/asound.conf
!/etc/mercurial/
!/home/
!/home/matt/
/home/matt/*
# Home files
!/home/matt/.xinitrc
!/home/matt/.gitconfig
!/home/matt/.bashrc
# Vim files
!/home/matt/.vimrc
!/home/matt/.vim/
.netrwhist
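The whitelist above follows a fixed shape: ignore everything at the top, then for each ancestor directory re-include it with ! and ignore its contents again, and finally re-include the file itself. A hypothetical Python helper can generate that shape from a list of absolute paths (a trailing slash marks a directory to keep whole). It is slightly stricter than the file above, since it also emits /home/* so that only the listed user directory is re-included.

```python
def whitelist_gitignore(paths: list) -> list:
    """Build .gitignore lines that ignore everything except the given
    absolute paths (a trailing "/" keeps a whole directory)."""
    lines = ["/*"]  # ignore everything at the top level
    opened = set()
    for path in paths:
        is_dir = path.endswith("/")
        parts = path.strip("/").split("/")
        prefix = ""
        for part in parts[:-1]:
            prefix += "/" + part
            if prefix not in opened:
                opened.add(prefix)
                lines.append("!" + prefix + "/")  # re-include the directory
                lines.append(prefix + "/*")       # but ignore its contents
        lines.append("!/" + "/".join(parts) + ("/" if is_dir else ""))
    return lines


print("\n".join(whitelist_gitignore(["/etc/rc.conf", "/home/matt/.vim/"])))
```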
In addition to not having to copy the files separately and keep them in two locations, this has the benefit that, should I need to, I can easily revert changes without having to copy files back by hand.
Thanks for the help guys!
