Git Diff and Copy - linux

I am wanting to run a Static Code Analysis (PMD) report against the files that have been added or modified as part of a pull request on bitbucket. The files that have been modified etc are available locally within the pipeline image, however I need to do a git diff to identify the changes ONLY between the source branch (pulling from) and the target branch (to be merged into). I will then be executing the PMD CLI (with rulesets etc) against a directory that will contain only the "changed files" to highlight any issues with those files specifically as part of the change.
I basically want to copy out the files indicated in the git diff result. I hope this provides some more context.
I have tried finding some examples and done testing however I am just not getting it right due to my lack of understanding on these crazy linux commands :)
So far I have the command below, but it results in an empty folder.
git diff --name-only --pretty $BITBUCKET_PR_DESTINATION_BRANCH $BITBUCKET_BRANCH | xargs -i {} cp {} -t ~/branch-diff/

xargs might have problems with a large number of files, as the argument list could become too big. I propose something like
for name in $(git diff --name-only --pretty $BITBUCKET_PR_DESTINATION_BRANCH $BITBUCKET_BRANCH); do cp $name ~/branch-diff/; done
As a result you will have all these files in one directory (without the directory tree). Whether that is really what you need is another question.

Firstly, the issues with your current solution:
xargs doesn't play nicely with filenames which have spaces in. You may not have that problem now, and you can work around it, but it's better to just avoid this if possible.
cp does not build a directory tree - which you can trivially verify - so it wouldn't do what you asked anyway.
git does not produce pathnames relative to the current path, but to the working tree base.
The filenames produced by git diff $BITBUCKET_PR_DESTINATION_BRANCH $BITBUCKET_BRANCH don't even have to exist in your working tree, but only in (at least one of) the branches.
If there's a diff between the two versions of a file, you haven't said which one you want copied!
A functional script using standard file tools would look something like:
#!/bin/bash
# diffcopy.sh
#
DESTDIR="$1"
BRANCH1="$2"
BRANCH2="$3"
# so relative paths match git output
SRCDIR="$(git rev-parse --show-toplevel)"
# choose the branch whose files we want to copy
git checkout "$BRANCH2"
# make the output directory
mkdir -p "$DESTDIR"
# sync the changed files
rsync -a --files-from=<(git diff --name-only "${BRANCH1}".."${BRANCH2}") "$SRCDIR" "$DESTDIR"
# restore working copy
git checkout -
There may be a better way to do this purely in git, but I don't know it.
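One way to stay closer to pure git, sketched here under some assumptions: rather than checking out a branch and copying from the working tree, read each changed file straight out of the second revision with git show. The helper name copy_changed and its argument order are made up for illustration. --diff-filter=d skips files that were deleted in the second revision, since there is nothing to copy for those.

```shell
#!/usr/bin/env bash
# copy_changed BASE HEAD DEST   (hypothetical helper, run from inside the repository)
# Writes HEAD's version of every file that differs between BASE and HEAD
# into DEST, recreating the directory tree.
copy_changed() {
  local base="$1" head="$2" dest="$3" path
  mkdir -p "$dest"
  # -z gives NUL-terminated names, so spaces and newlines in paths are safe.
  git diff --name-only -z --diff-filter=d "$base" "$head" |
    while IFS= read -r -d '' path; do
      mkdir -p "$dest/$(dirname "$path")"
      # REV:PATH is resolved relative to the repository root,
      # which matches the paths that git diff prints.
      git show "$head:$path" > "$dest/$path"
    done
}
```

For the pipeline in the question this would be called as copy_changed "$BITBUCKET_PR_DESTINATION_BRANCH" "$BITBUCKET_BRANCH" ~/branch-diff, assuming both names resolve to revisions in the local clone. If the destination branch has moved on since the PR branch was created, diffing against the merge base (git diff A...B, three dots) may be closer to what the pull request shows.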

If you have the GNU variant of cp and xargs, you can do this:
git diff --name-only -z $BITBUCKET_PR_DESTINATION_BRANCH $BITBUCKET_BRANCH |
xargs -0 cp --target-directory="$HOME/branch-diff/" --parents
This does not spawn a cp per file, but copies as many files as possible with one cp process. By specifying --target-directory, the destination can come first on the cp command, and xargs can paste as many source file names at the end of the cp command as it likes. --parents keeps the directory names of the source files.
The -z in git diff separates file names by a NUL character instead of line breaks, and the -0 of xargs knows how to take the NUL separated path list apart without stumbling over whitespace characters in file names.
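To see the NUL handling in isolation, here is a small sketch that needs no repository: printf stands in for git diff --name-only -z, and the scratch directory and file names are invented for the demo. It assumes GNU cp and xargs, as above.

```shell
#!/usr/bin/env bash
# Scratch layout (hypothetical): a source tree with a space in a directory
# name and in a file name, plus an empty destination.
work="$(mktemp -d)"
mkdir -p "$work/src/app/sub dir" "$work/dest"
echo 'class A {}' > "$work/src/app/A.java"
echo 'class B {}' > "$work/src/app/sub dir/B file.java"

# Paths are given relative to the current directory, the same way git
# prints them relative to the top of the working tree.
cd "$work/src" || exit 1
printf '%s\0' 'app/A.java' 'app/sub dir/B file.java' |
  xargs -0 cp --parents --target-directory="$work/dest"

# Show what arrived; the directory structure is preserved under dest/.
find "$work/dest" -type f | sort
```

Because the paths are relative, the real command has to be run from the top of the working tree (or the paths will not resolve), which ties in with the point above that git prints names relative to the working tree base.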

Related

How to pipe only recent git changes to flake8

I would like to pipe only recent git changes to flake8 (technically, flake8_nb) on the command line, including a number of flags, passing through grep to get only files that match a certain pattern.
What I have tried is this:
git diff --name-only | grep "ipynb$" | flake8_nb --config=.flake8_nb_lite
However, this runs flake8 on all the files in the directory instead of just the ones that are the output of grep and ones that don't have recent changes. The problem is definitely the last step, as everything up to the final pipe is correct.
you're looking for xargs
git diff --name-only -- '*.ipynb' | xargs flake8_nb --config=.flake8_nb_lite
I also made one other simplification: remove the grep since you can use git's file matching directly
xargs will take arguments as input and turn them into positional arguments
if your arguments may contain spaces you might want to use -z (git diff option) and -0 (xargs option) to break on null byte characters instead
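One more detail that may matter here, assuming GNU xargs: when the list of names is empty, xargs still runs the command once with no arguments, and a linter given no arguments typically checks everything. The -r (--no-run-if-empty) flag suppresses that. In this sketch echo LINT is only a stand-in for flake8_nb, and printf fakes the output of git diff --name-only -z.

```shell
#!/usr/bin/env bash
# Empty input: plain GNU xargs runs the command once anyway.
without_r="$(printf '' | xargs -0 echo LINT)"
# With -r the command is skipped when there is nothing to pass on.
with_r="$(printf '' | xargs -0 -r echo LINT)"
printf 'without -r: [%s]\nwith -r:    [%s]\n' "$without_r" "$with_r"

# Non-empty input: each NUL-terminated name becomes exactly one argument,
# even when it contains a space.
names="$(printf '%s\0' 'a b.ipynb' 'c.ipynb' | xargs -0 -r printf '<%s>\n')"
printf '%s\n' "$names"
```

Applied to the command above, that would be git diff --name-only -z -- '*.ipynb' | xargs -0 -r flake8_nb --config=.flake8_nb_lite.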

renaming and replacing by string in git repo

I am working on a project named XXX.
I want to replace every instance of XXX to YYY. (I wish to replace the string inside all files and also rename files/directories that contain the string XXX to YYY).
What I have done and where I'm stuck:
git checkout -b renameFix
// in zsh
sed -i -- 's/XXX/YYY/g' **/*(D.) // replace
zmv '(**/)(*XXX*)' '$1${2//XXX/YYY}' // rename files and dirs
Now, when I run git status, I get "fatal: unknown index entry format 0x2f700000" error.
Is there a different approach I can use?
Your problem is that your glob (**/*(D.)) traverses the .git directory.
You can either remove the D qualifier, to avoid globbing "hidden" files (files that start with a period), or add another qualifier to filter out paths that start with .git.
Something like this might work:
ignore_git() { ! [[ $REPLY =~ "^.git" ]] }
printf '%s\n' **/*(D.+ignore_git)
I have added the printf so that you can verify that you list the files that you want.
You can also take a look at git ls-files which can produce a list of files that git tracks.
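Building on that idea, here is a minimal sketch that lets git choose the files, so .git is never touched. It uses git grep rather than git ls-files so that only tracked files actually containing the string are edited. The helper name replace_in_tracked is hypothetical, sed -i is used in its GNU form, and the strings are assumed to be plain text with no characters that are special to sed (no slashes, dots, ampersands and so on). It only covers the content replacement, not the renaming of files and directories.

```shell
#!/usr/bin/env bash
# replace_in_tracked OLD NEW   (hypothetical helper, run from inside the repository)
# Replaces OLD with NEW in every tracked text file that contains OLD.
replace_in_tracked() {
  local old="$1" new="$2"
  # -l: list file names only   -z: NUL-terminate them
  # -I: skip binary files      -F: treat OLD as a fixed string
  git grep -lzI -F -e "$old" |
    xargs -0 -r sed -i "s/$old/$new/g"
}
```

After replace_in_tracked XXX YYY, git status and git diff should show exactly the files that were edited, which makes it easy to review the change before committing.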

How do I check for a specific string in a specific file through all the commits of each release branch? [duplicate]

I have deleted a file or some code in a file sometime in the past. Can I grep in the content (not in the commit messages)?
A very poor solution is to grep the log:
git log -p | grep <pattern>
However, this doesn't return the commit hash straight away. I played around with git grep to no avail.
To search for commit content (i.e., actual lines of source, as opposed to commit messages and the like), you need to do:
git grep <regexp> $(git rev-list --all)
git rev-list --all | xargs git grep <expression> will work if you run into an "Argument list too long" error.
If you want to limit the search to some subtree (for instance, "lib/util"), you will need to pass that to the rev-list subcommand and grep as well:
git grep <regexp> $(git rev-list --all -- lib/util) -- lib/util
This will grep through all your commit text for regexp.
The reason for passing the path in both commands is because rev-list will return the revisions list where all the changes to lib/util happened, but also you need to pass to grep so that it will only search in lib/util.
Just imagine the following scenario: grep might find the same <regexp> on other files which are contained in the same revision returned by rev-list (even if there was no change to that file on that revision).
Here are some other useful ways of searching your source:
Search working tree for text matching regular expression regexp:
git grep <regexp>
Search working tree for lines of text matching regular expression regexp1 or regexp2:
git grep -e <regexp1> [--or] -e <regexp2>
Search working tree for lines of text matching regular expression regexp1 and regexp2, reporting file paths only:
git grep -l -e <regexp1> --and -e <regexp2>
Search working tree for files that have lines of text matching regular expression regexp1 and lines of text matching regular expression regexp2:
git grep -l --all-match -e <regexp1> -e <regexp2>
Search working tree for changed lines of text matching pattern:
git diff --unified=0 | grep <pattern>
Search all revisions for text matching regular expression regexp:
git grep <regexp> $(git rev-list --all)
Search all revisions between rev1 and rev2 for text matching regular expression regexp:
git grep <regexp> $(git rev-list <rev1>..<rev2>)
You should use the pickaxe (-S) option of git log.
To search for Foo:
git log -SFoo -- path_containing_change
git log -SFoo --since=2009.1.1 --until=2010.1.1 -- path_containing_change
See Git history - find lost line by keyword for more.
-S (named pickaxe) comes originally from a git diff option (Git v0.99, May 2005).
Then -S (pickaxe) was ported to git log in May 2006 with Git 1.4.0-rc1.
As Jakub Narębski commented:
this looks for differences that introduce or remove an instance of <string>.
It usually means "revisions where you added or removed line with 'Foo'".
the --pickaxe-regex option allows you to use extended POSIX regex instead of searching for a string.
Example (from git log): git log -S"frotz\(nitfol" --pickaxe-regex
As Rob commented, this search is case-sensitive - he opened a follow-up question on how to search case-insensitive.
Hi Angel notes in the comments:
Executing a git log -G<regexp> --branches --all (the -G is same as -S but for regexes) does same thing as the accepted one (git grep <regexp> $(git rev-list --all)), but it soooo much faster!
The accepted answer was still searching for text after ≈10 minutes of me running it, whereas this one gives results after ≈4 seconds 🤷‍♂️.
The output here is more useful as well
My favorite way to do it is with git log's -G option (added in version 1.7.4).
-G<regex>
Look for differences whose added or removed line matches the given <regex>.
There is a subtle difference between the way the -G and -S options determine if a commit matches:
The -S option essentially counts the number of times your search matches in a file before and after a commit. The commit is shown in the log if the before and after counts are different. This will not, for example, show commits where a line matching your search was moved.
With the -G option, the commit is shown in the log if your search matches any line that was added, removed, or changed.
Take this commit as an example:
diff --git a/test b/test
index dddc242..60a8ba6 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-hello hello
+hello goodbye hello
Because the number of times "hello" appears in the file is the same before and after this commit, it will not match using -Shello. However, since there was a change to a line matching hello, the commit will be shown using -Ghello.
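The difference can be reproduced in a throwaway repository. The sketch below (the function name pickaxe_demo and the commit messages are invented) creates the file with "hello hello", then makes the commit shown above, and counts how many commits each option reports.

```shell
#!/usr/bin/env bash
# pickaxe_demo: build a two-commit scratch repository and compare -S with -G.
pickaxe_demo() {
  local repo
  repo="$(mktemp -d)" || return 1
  (
    cd "$repo" || exit 1
    git init -q . 2>/dev/null
    commit() {
      git add -A &&
        git -c user.name=demo -c user.email=demo@example.com \
            -c commit.gpgsign=false commit -q -m "$1"
    }
    echo 'hello hello' > test && commit 'add test'
    echo 'hello goodbye hello' > test && commit 'insert goodbye'
    # -S only reports commits that change the number of occurrences.
    echo "-S matches: $(git log --oneline -Shello | wc -l)"
    # -G reports commits whose added or removed lines match.
    echo "-G matches: $(git log --oneline -Ghello | wc -l)"
  )
}
```

Calling pickaxe_demo should report one commit for -S (the one that first added the file) and two for -G, since the second commit touches a line containing hello without changing how often it occurs.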
git log can be a more effective way of searching for text across all branches, especially if there are many matches, and you want to see more recent (relevant) changes first.
git log -p --all -S 'search string'
git log -p --all -G 'match regular expression'
These log commands list commits that add or remove the given search string/regex, (generally) more recent first. The -p option causes the relevant diff to be shown where the pattern was added or removed, so you can see it in context.
Having found a relevant commit that adds the text you were looking for (for example, 8beeff00d), find the branches that contain the commit:
git branch -a --contains 8beeff00d
If you want to browse code changes (see what actually has been changed with the given word in the whole history) go for patch mode - I found a very useful combination of doing:
git log -p
# Hit '/' for search mode.
# Type in the word you are searching.
# If the first search is not relevant, hit 'n' for next (like in Vim ;) )
Search in any revision, any file (Unix/Linux):
git rev-list --all | xargs git grep <regexp>
Search only in some given files, for example XML files:
git rev-list --all | xargs -I{} git grep <regexp> {} -- "*.xml"
The result lines should look like this:
6988bec26b1503d45eb0b2e8a4364afb87dde7af:bla.xml: text of the line it found...
You can then get more information like author, date, and diff using git show:
git show 6988bec26b1503d45eb0b2e8a4364afb87dde7af
I took Jeet's answer and adapted it to Windows (thanks to this answer):
FOR /F %x IN ('"git rev-list --all"') DO @git grep <regex> %x > out.txt
Note that for me, for some reason, the actual commit that deleted this regex did not appear in the output of the command, but rather one commit prior to it.
For simplicity, I'd suggest using a GUI: gitk, the Git repository browser. It's pretty flexible: it can search code and search files, it supports regular expressions, and you can navigate through the results using the up/down arrows.
Whenever I find myself at your place, I use the following command line:
git log -S "<words/phrases i am trying to find>" --all --oneline --graph
Explanation:
git log - Need I write more here; it shows the logs in chronological order.
-S "<words/phrases i am trying to find>" - It shows all those Git commits where any file (added/modified/deleted) has the words/phrases I am trying to find without '<>' symbols.
--all - To enforce and search across all the branches.
--oneline - It compresses the Git log in one line.
--graph - It creates the graph of chronologically ordered commits.
For anyone else trying to do this in Sourcetree, there is no direct command in the UI for it (as of version 1.6.21.0). However, you can use the commands specified in the accepted answer by opening Terminal window (button available in the main toolbar) and copy/pasting them therein.
Note: Sourcetree's Search view can partially do text searching for you. Press Ctrl + 3 to go to Search view (or click Search tab available at the bottom). From far right, set Search type to File Changes and then type the string you want to search. This method has the following limitations compared to the above command:
Sourcetree only shows the commits that contain the search word in one of the changed files. Finding the exact file that contains the search text is again a manual task.
RegEx is not supported.
I was kind of surprised here and maybe I missed the answer I was looking for, but I came here looking for a search on the heads of all the branches. Not for every revision in the repository, so for me, using git rev-list --all is too much information.
In other words, for me the variation most useful would be
git grep -i searchString $(git branch -r)
or
git branch -r | xargs git grep -i searchString
or
git branch -r | xargs -n1 -i{} git grep -i searchString {}
And, of course, you can try the regular expression approach here. What's cool about the approach here is that it worked against the remote branches directly. I did not have to do a check out on any of these branches.
Adding more to the answers already present.
If you know the file in which you might have made the change, do this:
git log --follow -p -S 'search-string' <file-path>
--follow: lists the history of a file
Inspired by the answer https://stackoverflow.com/a/2929502/6041515, I found that git grep seems to search the full code base at each commit, not just the diffs, so the results tend to be repetitive and long. The script below searches only the diffs of each git commit instead:
for commit in $(git rev-list --all); do
# search only lines starting with + or -
if git show "$commit" | grep "^[+|-].*search-string"; then
git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short $commit
fi
done
Example output, the bottom git commit is the one that first introduced the change I'm searching for:
csshx$ for commit in $(git rev-list --all); do
> if git show "$commit" | grep "^[+|-].*As csshX is a command line tool"; then
> git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short $commit
> fi
> done
+As csshX is a command line tool, no special installation is needed. It may
987eb89 2009-03-04 Gavin Brock Added code from initial release
Jeet's answer works in PowerShell.
git grep -n <regex> $(git rev-list --all)
The following displays all files, in any commit, that contain a password.
# Store intermediate result
$result = git grep -n "password" $(git rev-list --all)
# Display unique file names
$result | select -unique { $_ -replace "(^.*?:)|(:.*)", "" }
Okay, twice just today I've seen people wanting a closer equivalent for hg grep, which is like git log -pS but confines its output to just the (annotated) changed lines.
Which I suppose would be handier than /pattern/ in the pager if you're after a quick overview.
So here's a diff-hunk scanner that takes git log --pretty=%h -p output and spits annotated change lines. Put it in diffmarkup.l, say e.g. make ~/bin/diffmarkup, and use it like
git log --pretty=%h -pS pattern | diffmarkup | grep pattern
%option main 8bit nodefault
// vim: tw=0
%top{
#define _GNU_SOURCE 1
}
%x commitheader
%x diffheader
%x hunk
%%
char *afile=0, *bfile=0, *commit=0;
int aline,aremain,bline,bremain;
int iline=1;
<hunk>\n ++iline; if ((aremain+bremain)==0) BEGIN diffheader;
<*>\n ++iline;
<INITIAL,commitheader,diffheader>^diff.* BEGIN diffheader;
<INITIAL>.* BEGIN commitheader; if(commit)free(commit); commit=strdup(yytext);
<commitheader>.*
<diffheader>^(deleted|new|index)" ".* {}
<diffheader>^"---".* if (afile)free(afile); afile=strdup(strchrnul(yytext,'/'));
<diffheader>^"+++".* if (bfile)free(bfile); bfile=strdup(strchrnul(yytext,'/'));
<diffheader,hunk>^"@@ ".* {
BEGIN hunk; char *next=yytext+3;
#define checkread(format,number) { int span; if ( !sscanf(next,format"%n",&number,&span) ) goto lostinhunkheader; next+=span; }
checkread(" -%d",aline); if ( *next == ',' ) checkread(",%d",aremain) else aremain=1;
checkread(" +%d",bline); if ( *next == ',' ) checkread(",%d",bremain) else bremain=1;
break;
lostinhunkheader: fprintf(stderr,"Lost at line %d, can't parse hunk header '%s'.\n",iline,yytext), exit(1);
}
<diffheader>. yyless(0); BEGIN INITIAL;
<hunk>^"+".* printf("%s:%s:%d:%c:%s\n",commit,bfile+1,bline++,*yytext,yytext+1); --bremain;
<hunk>^"-".* printf("%s:%s:%d:%c:%s\n",commit,afile+1,aline++,*yytext,yytext+1); --aremain;
<hunk>^" ".* ++aline, ++bline; --aremain; --bremain;
<hunk>. fprintf(stderr,"Lost at line %d, Can't parse hunk.\n",iline), exit(1);
git rev-list --all | xargs -n 5 git grep EXPRESSION
is a tweak to Jeet's solution, so it shows results while it searches and not just at the end (which can take a long time in a large repository).
So are you trying to grep through older versions of the code looking to see where something last exists?
If I were doing this, I would probably use git bisect. Using bisect, you can specify a known good version, a known bad version, and a simple script that does a check to see if the version is good or bad (in this case a grep to see if the code you are looking for is present). Running this will find when the code was removed.
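A rough sketch of that bisect approach, under these assumptions: the text is missing at HEAD, it is present at some known older revision, and the working tree is clean. The helper name find_removal is made up. git grep -q serves as the check script: it exits 0 when the text is found (which bisect treats as good) and 1 when it is not (bad).

```shell
#!/usr/bin/env bash
# find_removal GOOD_REV NEEDLE   (hypothetical helper, run from inside the repository)
# Prints the hash of the first commit in which NEEDLE no longer appears
# in any tracked file.
find_removal() {
  local good="$1" needle="$2" first_bad
  # HEAD is the known bad revision (text gone), GOOD_REV the known good one.
  git bisect start HEAD "$good" >/dev/null || return 1
  # bisect checks out each candidate and runs the command on it.
  git bisect run git grep -q -F -e "$needle" >/dev/null
  # Once bisect has finished, refs/bisect/bad points at the first bad commit.
  first_bad="$(git rev-parse --verify -q refs/bisect/bad)"
  # Return to the branch that was checked out before.
  git bisect reset >/dev/null 2>&1
  printf '%s\n' "$first_bad"
}
```

For instance, find_removal v1.0 'some_function(' would print the commit that removed the last occurrence of that text, where v1.0 is a placeholder for whatever revision is known to still contain it. Bisect assumes the text disappears once and stays gone; if it was removed and re-added several times, the result is one of the removals, not necessarily the last.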
Scenario: You did a big clean up of your code by using your IDE.
Problem: The IDE cleaned up more than it should and now your code does not compile (missing resources, etc.)
Solution:
git grep --cached "text_to_find"
It will find the file where "text_to_find" was changed.
You can now undo this change and compile your code.
A. Full, unique, sorted, paths:
# Get all unique filepaths of files matching 'password'
# Source: https://stackoverflow.com/a/69714869/10830091
git rev-list --all | (
while read revision; do
git grep -F --files-with-matches 'password' $revision | cat | sed "s/[^:]*://"
done
) | sort | uniq
B. Unique, sorted, filenames (not paths):
# Get all unique filenames matching 'password'
# Source: https://stackoverflow.com/a/69714869/10830091
git rev-list --all | (
while read revision; do
git grep -F --files-with-matches 'password' $revision | cat | sed "s/[^:]*://"
done
) | xargs basename | sort | uniq
This second command is useful for BFG, because it only accepts file names and not repo-relative/system-absolute paths.
Check out my full answer here for more explanation.
Command to search in git history
git log -S"alter" --author="authorname" --since=2021.1.1 --until=2023.1.1 -- .

Centos copy file into another file, if exists, create a version

Does anyone know of a way to (via bash) setup a "versioning" copy of a file into another? For example: I am copying file into file.bak. If file.bak exists, I am currently overwriting. What I'd like to do is set it up so that it creates multiple files: file, file.bak, file.bak.1, file.bak.2, etc...
Right now, I'm using:
cp -rf file file.bak
This currently overwrites the file(as expected)
or:
cp --backup=t file1 file2
repeat few times to see the result...
see https://www.gnu.org/software/coreutils/manual/html_node/cp-invocation.html
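To see what "repeat a few times" produces, here is a small sketch in a scratch directory (the file names are just for the demo, and GNU cp is assumed). --backup=numbered is the long spelling of --backup=t.

```shell
#!/usr/bin/env bash
work="$(mktemp -d)"
cd "$work" || exit 1

# First copy: file.bak does not exist yet, so nothing is backed up.
echo v1 > file; cp --backup=numbered file file.bak
# Second copy: the existing file.bak is renamed to file.bak.~1~ first.
echo v2 > file; cp --backup=numbered file file.bak
# Third copy: the existing file.bak is renamed to file.bak.~2~.
echo v3 > file; cp --backup=numbered file file.bak

ls -1
```

Note the naming: GNU cp uses file.bak.~1~, file.bak.~2~ and so on rather than file.bak.1, and file.bak itself always holds the most recent copy.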
Simply use a test
[ -e file.bak ] && cp -r file file.bak.$(date +%s) || cp -r file file.bak
This will create a unique backup if file.bak already exists in the form file.bak.1411505497
There are many ways to skin this cat.
Since you're using Linux, it's likely you've got the GNU mv command, which may include a --backup option. You could wrap this in a shell function:
bkp() {
file="$1"
if [ -f "$file" ]; then
/bin/mv -v --backup=numbered "$(mktemp ${file}XXX)" "$file"
#/bin/rm "$file"
fi
}
You can put this in your .bashrc, for example. Then you can use this as follows:
# bkp foo
This will copy foo to numbered backup files. You can uncomment the rm if this is, for example, a log file that you're rotating.
Another option, which is more portable to operating systems that don't use GNU tools (e.g. FreeBSD, OSX), might be something like this quick-and-dirty solution:
bkp() {
file="$1"
if [ -f "$file" ]; then
# increment existing files up to 10
for n in {9..1}; do
if [ -f "$file.$n" ]; then
# remove -v if you want less noise.
mv -v "${file}.$n" "${file}.$((n+1))"
fi
done
# move the original to first backup position
mv "$file" "$file.1"
else
echo "Not found: $file" >&2
fi
}
It suffers in that it won't compact your list of files (and will throw errors) if some numbers are missing, but that's stuff you can add if it's important. You'd use it pretty much the same way, changing the final mv to a cp if you need to keep the original in place.
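If the exact naming from the question (file.bak, file.bak.1, file.bak.2, ...) matters more than rotation, a different approach is to leave existing backups alone and copy to the first free name. This is only a sketch: the function name backup_copy is made up, and unlike the rotation above the highest number is the newest backup.

```shell
#!/usr/bin/env bash
# backup_copy FILE   (hypothetical helper)
# Copies FILE to FILE.bak, or to the first unused FILE.bak.N if that exists.
backup_copy() {
  local src="$1" dst="$1.bak" n=1
  if [ -e "$dst" ]; then
    # Walk up the numbers until a free slot is found.
    while [ -e "$dst.$n" ]; do
      n=$((n + 1))
    done
    dst="$dst.$n"
  fi
  # -p keeps mode and timestamps; -- protects names starting with a dash.
  cp -p -- "$src" "$dst"
}
```

Calling backup_copy file three times leaves file, file.bak, file.bak.1 and file.bak.2 side by side, with the original untouched.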
Final option I'll mention is in comments as well. Since you've said that you're using this solution to back up "system files" (which I assume you mean to be things in /etc/) you should consider using an actual version control system to control your versions of these files.
Many options exist, but I'd recommend RCS for its simplicity and low overhead. Simply install the package, mkdir /etc/RCS to keep your /etc directory clean, read the man pages for rcs, ci, co, rlog, rcsdiff and perhaps rcsintro, and you're good to go. You'll get better control of diffs and history, opportunity for comments, none of the overhead of a repository for a large VCS like SVN or Git. I've been using this on various servers for years, as RCS is still built in to the base system in FreeBSD. :)

rsync selected sub folders

I want to transfer selective sub folders from a range of parent folders:
/home/user/sample_rsync/
FolderA/sub1
FolderA/sub2
FolderA/sub3
FolderB/sub1
FolderB/sub2
FolderB/sub3
FolderC/sub1
FolderC/sub2
FolderC/sub3
Say from the above example I want to copy just sub1 from each directory. i.e. in my destination I want the following folders to be created (along with the files they contain)
/destination/
sample_rsync/FolderA/sub1
sample_rsync/FolderB/sub1
sample_rsync/FolderC/sub1
How do I go about doing this?
I tried out
rsync -avh -f"- *" -f"+ *sub1/*" /home/user/sample_rsync /destination/
In an attempt to exclude everything and then just include the sub1 folders, but it didn't work.
Any way I can get this working?
Assuming your source folders are in a file called "sources" as typed in your first code segment (without trailing / characters)
for s in $(cat sources)
do
rsync -av ${s} /destination/sample_rsync/$(echo ${s}| awk -F "/" '{print $1}')
done
Of course, this is only valid if the directories in your sources file all sit at the same depth. If the depth of the directories to be copied changes, this script will need to be heavily modified. But at least it is a starting point, I hope.
Following your question below, you might want to use something like this instead (ignore the code segment above; I just left it there for history purposes):
cd /home/user/sample_rsync
for dir in $(find ./ -type d -name sub1)
do
dest=$(echo ${dir} | sed -e "1,1s+/sub1++")
mkdir -p /destination/sample_rsync/${dest}
rsync -av ${dir} /destination/sample_rsync/${dest}
done
Please do not take it as gospel. I have not tested the code at all, so it might yield some unexpected results. Please test it on a system where you wouldn't mind problems if it goes haywire.
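For comparison, the filter-rule attempt from the question can probably be made to work in a single rsync call. rsync applies the first rule that matches, so putting "- *" first excludes everything before the include is ever consulted; the includes have to come first, and every parent directory on the way down to sub1 has to be included as well. The sketch below assumes rsync 2.6.7 or later (for the trailing ***), and the helper name sync_named_subdirs is made up.

```shell
#!/usr/bin/env bash
# sync_named_subdirs SRC_PARENT DEST NAME   (hypothetical helper)
# Copies SRC_PARENT/*/NAME (with contents) into DEST, keeping the
# SRC_PARENT/<folder>/NAME structure and skipping everything else.
sync_named_subdirs() {
  local src="${1%/}" dest="$2" name="$3" top
  top="$(basename "$src")"
  # Rules are checked in order and the first match wins:
  #   1. the top directory itself
  #   2. its immediate sub-directories (FolderA, FolderB, ...)
  #   3. NAME inside those, plus everything below it
  #   4. exclude whatever is left
  # --prune-empty-dirs drops folders that end up with nothing in them.
  rsync -a --prune-empty-dirs \
    --include="/$top/" \
    --include="/$top/*/" \
    --include="/$top/*/$name/***" \
    --exclude='*' \
    "$src" "$dest"
}
```

With the layout from the question this would be called as sync_named_subdirs /home/user/sample_rsync /destination/ sub1. Adding -n (dry run) and -v to the rsync call is a cheap way to check what would be transferred before copying anything.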

Resources