Dump directory data to a file for later new/modified comparison on a Linux server

Is it possible to take some kind of "dump" of a directory on a Linux (Ubuntu) server that I can later use to compare against for new/modified files?
The idea is something like this:
1. Dump the directory data (such as file hashes) to a file.
2. 24 hours later, take another dump and compare it against #1 to find new or modified files.
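The two steps in the question can also be done with standard tools and no Git at all. This is a minimal sketch, assuming GNU coreutils and findutils as shipped with Ubuntu (sha256sum, sort -z, xargs -r), and file names without newlines or backslashes (sha256sum escapes those, which this simple parser does not handle). The function names snapshot and compare_snapshots are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers: snapshot and compare_snapshots are illustration
# names, not existing commands. Assumes GNU sha256sum, find and sort -z.

# snapshot DIR OUTFILE
# Writes one "<sha256>  ./relative/path" line per regular file under DIR,
# sorted by path. Keep OUTFILE outside DIR, or it will list itself.
snapshot() {
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$2"
}

# compare_snapshots OLD NEW
# Prints "NEW      path" for files that appear only in NEW and
# "MODIFIED path" for files whose hash changed between OLD and NEW.
compare_snapshots() {
  awk '
    {
      hash = substr($0, 1, 64)   # the SHA-256 hex digest
      path = substr($0, 67)      # skip the two separator characters
    }
    FILENAME == ARGV[1] { old[path] = hash; next }
    !(path in old)      { print "NEW      " path; next }
    old[path] != hash   { print "MODIFIED " path }
  ' "$1" "$2"
}
```

For example, run snapshot /var/www /root/www-day1.sha256 today and snapshot /var/www /root/www-day2.sha256 tomorrow, then compare_snapshots /root/www-day1.sha256 /root/www-day2.sha256 (the paths are only examples). Relative paths are recorded so two snapshots stay comparable even if the directory is later mounted elsewhere.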

Well, this is not the answer you might be looking for, but I would use Git to track the changes, or maybe even git-annex if, for example, the files are too big.
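Initialize a Git repository in the directory you want to track: git init
Tell Git to track all files: git add .
Commit the files: git commit -a -m "initial commit"
After 24 hours, run git status to list new (untracked) and modified files, and git diff to see what changed inside the modified ones (git diff alone does not show new, untracked files).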
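To tell new files apart from modified ones in the Git approach, the output of git status --porcelain can be post-processed. A minimal sketch, assuming porcelain v1 output and ordinary file names (Git quotes unusual ones); summarize_status is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: summarize_status is an illustration name.
# Reads `git status --porcelain` (v1) lines on stdin, each shaped "XY PATH",
# and prints one labelled line per new, modified or deleted file.
summarize_status() {
  while IFS= read -r line; do
    xy=${line%"${line#??}"}   # first two characters: the status code
    path=${line#???}          # everything after "XY "
    case $xy in
      '??') printf 'NEW      %s\n' "$path" ;;
      *M*)  printf 'MODIFIED %s\n' "$path" ;;
      *D*)  printf 'DELETED  %s\n' "$path" ;;
    esac
  done
}
```

Usage, from inside the tracked directory: git status --porcelain --untracked-files=all | summarize_status. The --untracked-files=all option makes Git list files inside brand-new directories individually instead of just the directory name.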

Related

List LFS tracked files in pre-receive hook

I am trying to write a git pre-receive hook that rejects LFS files larger than a certain size (among other things).
I am trying to execute git lfs ls-files -l -s <new-ref-value> in my script, but it returns
2ec20be70bb1be824e124a61eabac40405d60de62c76d263eff9923f18c098ed - binary.dll (63 B)
Could not scan for Git LFS tree: missing object: a405ce05ac78ea1b820d036676831a474ddf8f90
I cannot even ignore the error message because it stops after the first file.
I guess that the problem has to do with the fact that the commits have not been "validated" on the remote yet. The frustrating thing is that the information that I need (new file paths + sizes) is accessible since it's printed for the first file.
Is there a way to run the git lfs ls-files command with the new ref value successfully at this stage?
Can I obtain the list of the added file paths and sizes in any other way?
EDIT: If that's relevant in any way, the Git server is a GitLab instance in its default configuration.

How to fix merge conflicts for a lot of files in git?

I am using the git mergetool command to fix conflicts. However, I have thousands of conflicts. Is there a way to simplify this so that I get everything from the remote?
I am asked to enter c, d or a in the command.
{local}: deleted
{remote}: created file
Use (c)reated or (d)eleted file, or (a)bort?
Since I have thousands of files, I don't want to keep typing c. Is there a way to just do this in bulk?
You can solve this outside of git mergetool: run git status --porcelain to get a list of all unmerged files and their states in machine-readable format.
If your Git is new enough, it will support --porcelain=v2. See the git status documentation for details on the output formats. Output format v2 is generally superior for all purposes, but you should be able to make do with either one.
Next, you must write a program. Unfortunately Git has no supplied programs for this. Your program can be fairly simple depending on the specific cases you want to solve, and you can use shell scripting (sh or bash) as the programming language, to keep it easy.
Since you're concerned about the cases where git mergetool says:
Use (c)reated or (d)eleted file, or (a)bort?
you are interested in those cases where the file name is missing in the stage 1 ("base") version and also missing in the stage 2 ("local") version, but exists in the stage 3 ("remote") version. (See the git status documentation again and look at examples of your git status --porcelain=v2 output to see how to detect these cases. Two of the three modes will be zero.) For those particular path names, simply run git add on the path name to mark the file as resolved in favor of the created file.
Once you have marked all such files, you can go back to running git mergetool to resolve additional conflicts, if there are any.
Note that your "program" can consist of running:
git status --porcelain=v2 > /tmp/commands.sh
and then editing /tmp/commands.sh to delete all but the lines containing files that you want to git add. Then change all of those lines to read git add <filename> where <filename> is the name of the file. Exit the editor and run sh /tmp/commands.sh to execute all the git add commands. That's your program!
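The same "program" can also be automated instead of hand-edited. A sketch, assuming porcelain v2 output and file names without characters Git would quote; added_by_them_paths and resolve_added_by_them are hypothetical helper names. It selects unmerged (u) records whose base (m1) and local (m2) modes are 000000 while the remote (m3) mode is not, and marks each of those paths resolved with git add.

```sh
#!/bin/sh
# Hypothetical helpers: added_by_them_paths / resolve_added_by_them are
# illustration names. Unmerged porcelain v2 records look like:
#   u XY sub m1 m2 m3 mW h1 h2 h3 path
added_by_them_paths() {
  awk '$1 == "u" && $4 == "000000" && $5 == "000000" && $6 != "000000" {
         # drop the 10 fixed fields so paths containing spaces survive
         for (i = 1; i <= 10; i++) sub(/^[^ ]+ /, "")
         print
       }'
}

# Runs in a subshell so the cd does not leak into the caller.
resolve_added_by_them() (
  # Porcelain paths are relative to the repository root, so work from there.
  cd "$(git rev-parse --show-toplevel)" || exit 1
  git status --porcelain=v2 | added_by_them_paths |
    while IFS= read -r path; do
      git add -- "$path"   # keep the created (remote) file
    done
)
```

Only the path selection is pure text processing; resolve_added_by_them then does exactly the git add step described above, after which git mergetool handles whatever conflicts remain.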
Supposing you want their changes to win over yours wherever the two conflict, you can pull like this:
git pull -X theirs
For more details, see other Stack Overflow answers about git pull -X, and the documentation on git merge strategies, which will help you understand the other merge strategies for the future.
If you want to set aside all the changes you made and get in sync with the remote, do the following:
git stash
git pull
If you later want to restore your changes, run:
git stash pop
Basically, git stash saves your uncommitted changes in a stash entry (not a separate repository) and reverts your working directory to match the last commit.
You can learn more in NDP Software's Git Cheatsheet.

git-annex use a file from a different location

My understanding is that when I run git annex add somefile, it moves the file's content into the .git/annex/objects folder and replaces the file with a symlink to it. Then, when I initialize git-annex in a different location and sync it with the first one, I get a broken symlink, unless I run git annex sync --content, which makes a full copy of the file.
I need to keep the large files in one location, let's say on a USB drive, and have multiple Git repositories that use them. So I want only symlinks to the large files in those Git repos. How can I perform the sync so that git-annex creates a valid symlink that points to the file in that single location?
There are two ways to do that.
The first is using hard links, the second is using symlinks. I recommend hard links if all your files are going to be on the same filesystem/volume/partition; otherwise hard-linking (the good ol' cp --link) is not possible and the entire file just gets copied.
Using hard links:
git clone --shared main_repo/ new_repo/
(as explained by the git-annex author himself)
Using symlinks:
On main_repo:
git worktree add -b branch_name path_to_new_repo/
Since git-worktree uses a pointer file (which git-annex replaces with a symlink), this will work across different file systems. Changes to "different repos" will be stored in different branches. If you want them all to remain in sync, keep them in sync with standard git commands like git merge. Or you could only make changes to the master branch and git rebase master from the different branches frequently.

Git ignore and changing the history (on Windows)

I've already read several posts about this here (like Git ignore & changing the past, How to remove files that are listed in the .gitignore but still on the repository?, and Applying .gitignore to committed files), but they have several problems:
Commands that only work on Linux.
Incomplete commands (like the first post I've linked to).
Only for one file.
I have pretty much no experience with Git so I was hoping for some help here.
What I'm basically trying to do is rescue one of my projects history. It's currently Hg and I converted it to Git with Hg-Git (all very easy) and it includes the history (great!). However, I also added a .gitignore file and added several new files & folders that I want completely gone from the history (like the bin and obj folders, but also files from ReSharper). So I'm looking for a way to apply the .gitignore file to all of my history. The commands should work on Windows as I have no intention of installing Linux for this.
There is no need to add the .gitignore to the history (it adds no value); just add it for your future commits.
To remove files and directories from your history, use BFG Repo-Cleaner, which is fast, easy, and works very well on Windows (it's written in Scala).
It will do the job for you!
This is working for me:
Install hg-git.
cd HgFolder
hg bookmark -r default master
mkdir ../GitFolder
cd ../GitFolder
git init --bare
cd ../HgFolder
hg push ../GitFolder
Move all files from GitFolder to a '.git' folder (in this GitFolder) and set this folder to hidden (not the subfolders and files).
cd ../GitFolder
git init
git remote add origin https://url.git
Copy all current content (including .gitignore) to GitFolder.
git add .
git commit -m "Added existing content and .gitignore"
git filter-branch --index-filter "git rm --cached -r --ignore-unmatch 'LINES' 'FROM' 'GITIGNORE'" --prune-empty --tag-name-filter cat -- --all
(where 'LINES' 'FROM' 'GITIGNORE' stands for the paths listed in your .gitignore)
git rm -r --cached .
git add .
git gc --prune=now --aggressive
git push origin master --force
There is probably an easier way to do this and it might not be perfect but this had the result I wanted.

How can I remove last two commits in git and get my commit 1 as changes in working directory?

Here is the scenario:
On a smallish Ubuntu web server I have a Git server running. The website is checked out from a Git repository. I had a bulk of changes that I added and committed. git diff showed 50+ code files updated and 20000+ image files. I did not pay much attention to this, thinking these would be ignored; my fault. A bit stupid, but I thought it was quickest to just commit all changes in bulk. Let's call it commit A.
# Commit A
git add .
git commit -m "Changes so far in this year"
I then discovered that I had forgotten to exclude working/output files (a huge number of generated images). Other than these files (around 20000), I had about 50+ files with code changes.
After reading online and the Git manual, I understood that the best bet was to update .gitignore, generate a list of the files to remove, and remove them from the index (git rm --cached). This should remove them from the commit but not from the local folder. Let this be commit B.
# Commit B
vi .gitignore
git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached
git add .
git commit -m "Cleanup of generated files from commit history"
The trouble is that my git push now fails with the following error:
git push origin master
Counting objects: 19219, done.
error: pack-objects died of signal 9
error: pack-objects died with strange error
error: child process died of signal 9
error: died of signal 9
error: failed to push some refs to '/srv/gitrepositories/xxxx.git'
Answers to this question about signal 9 suggest it might be due to Git running out of memory.
My options?
Are commit A and commit B made up of a huge number of objects, as the count above seems to suggest?
Is there a better way to clean this mess up with possible option to remove commit A & commit B altogether from history and get my changes intact?
Ideally, I want to go back to the stage where git diff reports only the 50+ code files. The images are now ignored by .gitignore.
Can I delete a git commit but keep the changes? Is this what it sounds like? Can I do it twice for both commit A and commit B?
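Yes, you can use git reset HEAD~2 to remove the last 2 commits from the branch history while keeping their changes in the working directory. Since your push failed, commits A and B never reached the remote, so a normal push of your new commits is enough; only if they had already been pushed would you need git push -f to force your rewritten history onto the remote.
If your repo is shared with others, it's not advisable to change your commit history.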
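To see that git reset HEAD~2 only moves the branch and leaves the changes on disk, here is a hedged demo in a throwaway repository. The function name demo_reset_keeps_changes and the file names are hypothetical, and the -c options supply a temporary identity so it runs without any global Git configuration.

```sh
#!/bin/sh
# Hypothetical demo: a throwaway repo with a base commit plus two commits
# standing in for "commit A" and "commit B", then a reset of both.
# Runs in a subshell so the cd does not leak into the caller.
demo_reset_keeps_changes() (
  set -e
  cd "$(mktemp -d)"
  git init -q
  # Throwaway identity so the demo does not depend on global git config.
  commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

  echo 'v1' > code.php
  git add code.php && commit -m 'base'
  echo 'v2' > code.php
  git add code.php && commit -m 'commit A'
  echo '*.png' > .gitignore
  git add .gitignore && commit -m 'commit B'

  # Default (--mixed) reset: moves the current branch back two commits and
  # resets the index, but leaves the files in the working directory alone.
  git reset -q HEAD~2

  git log --format=%s      # only "base" is left in the history
  git status --porcelain   # code.php is modified, .gitignore is untracked
)
```

Afterwards the changes from both commits are simply uncommitted edits again, so you can re-add only the code files and commit them as one clean commit.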
