Find total size of uncommitted or untracked files in git - linux

I have a big horrible pile of code and I am setting it up in version control.
I would like a command I can run on Linux to give me the total size of the files that would be committed and pushed if I ran git add -A && git commit -am 'initial commit'
The total size is needed; a breakdown by folder would also be handy.
I will then use this to build up my ignores so that I can get the repo to a realistic size before I push it up.

I think I have answered my own question:
for f in `git status --porcelain | sed 's#^...##'`; do du -cs $f | head -n 1; done | sort -nr; echo "TOTAL:"; du -cs .
However I'm open to any better ideas or useful tricks. My current output is 13GB :)
The above command is basically there: it gives me the size line by line for each file from git status, but it doesn't give me the summed total. I'm currently getting the total of all files in the directory at the end, which is not correct. I tried some use of bc but couldn't get it to work.
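Since bc was mentioned, here is a minimal sketch of summing the per-file sizes that way (it assumes GNU xargs/du and paths that git does not need to quote):
# sizes are in 1K blocks, the same unit as plain du -s
git status --porcelain | sed 's/^...//' | xargs -d '\n' du -s 2>/dev/null | cut -f1 | paste -sd+ - | bc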

I adapted the answer of edmondscommerce by adding a simple awk statement which sums the output of the for loop and prints the sum (divided by 1024*1024 to convert to Mb)
for f in `git status --porcelain | sed 's#^...##'`; do du -cs $f | head -n 1; done | sort -nr | awk ' {tot = tot+$1; print } END{ printf("%.2fMb\n",tot/(1024*1024)) }'
Note that --porcelain prints pathnames relative to the root of the git repository, so if you do this in a subdirectory the du statement will not be able to find the files.
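If you do need to run it from a subdirectory, one workaround (a minimal sketch, assuming a POSIX-ish shell) is to jump to the repository root first:
# run this first so the --porcelain paths resolve correctly
cd "$(git rev-parse --show-toplevel)" || exit 1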
(whoppa; my first answer in SoF, may the force be with it)

I've used a modified version of this, because I had files with spaces in them which made it crash. I was also unsure about the size calculations and removed a useless head:
git status --porcelain | sed 's/^...//;s/^"//;s/"$//' | while IFS= read -r path; do
    du -bs "$path"
done | sort -n | awk '{ tot = tot+$1; print } END { printf("%.2fMB\n", tot/(1024*1024)) }'
I prefer to use while as it's slightly safer than for. It can still do nasty things with files that have newlines in them, so I wish there was a way to pass null-separated files yet still be able to grep for the status, but I couldn't find a nice way to do that.
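For what it's worth, here is a sketch of that idea using git status --porcelain -z, which NUL-separates the entries (it assumes bash and GNU du, and it ignores the extra record that renamed entries produce):
git status --porcelain -z | while IFS= read -r -d '' entry; do
    path=${entry:3}                    # drop the two status letters and the space
    du -bs -- "$path" 2>/dev/null
done | sort -n | awk '{ tot += $1; print } END { printf("%.2fMB\n", tot/(1024*1024)) }'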

Since version 2.11, git provides a handy "count-objects" command:
git count-objects -H
If this is not enough, I would recommend git-sizer from github :
https://github.com/github/git-sizer
git-sizer --verbose
Detailed usage here: https://github.com/github/git-sizer/#usage

Since you're just adding everything, I don't see any reason to go via Git. Just use the ordinary Unix tools: du, find, &c.
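A sketch of that approach, which also gives the per-folder breakdown asked for (assumes GNU du and sort):
# size of each top-level directory, largest first; the "." line is the grand total
du -h --max-depth=1 . | sort -hr | head -n 20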

Related

How do I check for a specific string in a specific file through all the commits of each release branch? [duplicate]

I have deleted a file or some code in a file sometime in the past. Can I grep in the content (not in the commit messages)?
A very poor solution is to grep the log:
git log -p | grep <pattern>
However, this doesn't return the commit hash straight away. I played around with git grep to no avail.
To search for commit content (i.e., actual lines of source, as opposed to commit messages and the like), you need to do:
git grep <regexp> $(git rev-list --all)
git rev-list --all | xargs git grep <expression> will work if you run into an "Argument list too long" error.
If you want to limit the search to some subtree (for instance, "lib/util"), you will need to pass that to the rev-list subcommand and grep as well:
git grep <regexp> $(git rev-list --all -- lib/util) -- lib/util
This will grep through all your commit text for regexp.
The reason for passing the path to both commands is that rev-list will return the list of revisions where changes to lib/util happened, but you also need to pass it to grep so that it will only search in lib/util.
Just imagine the following scenario: grep might find the same <regexp> on other files which are contained in the same revision returned by rev-list (even if there was no change to that file on that revision).
Here are some other useful ways of searching your source:
Search working tree for text matching regular expression regexp:
git grep <regexp>
Search working tree for lines of text matching regular expression regexp1 or regexp2:
git grep -e <regexp1> [--or] -e <regexp2>
Search working tree for lines of text matching regular expression regexp1 and regexp2, reporting file paths only:
git grep -l -e <regexp1> --and -e <regexp2>
Search working tree for files that have lines of text matching regular expression regexp1 and lines of text matching regular expression regexp2:
git grep -l --all-match -e <regexp1> -e <regexp2>
Search working tree for changed lines of text matching pattern:
git diff --unified=0 | grep <pattern>
Search all revisions for text matching regular expression regexp:
git grep <regexp> $(git rev-list --all)
Search all revisions between rev1 and rev2 for text matching regular expression regexp:
git grep <regexp> $(git rev-list <rev1>..<rev2>)
You should use the pickaxe (-S) option of git log.
To search for Foo:
git log -SFoo -- path_containing_change
git log -SFoo --since=2009.1.1 --until=2010.1.1 -- path_containing_change
See Git history - find lost line by keyword for more.
-S (named pickaxe) comes originally from a git diff option (Git v0.99, May 2005).
Then -S (pickaxe) was ported to git log in May 2006 with Git 1.4.0-rc1.
As Jakub Narębski commented:
this looks for differences that introduce or remove an instance of <string>.
It usually means "revisions where you added or removed a line with 'Foo'".
the --pickaxe-regex option allows you to use extended POSIX regex instead of searching for a string.
Example (from git log): git log -S"frotz\(nitfol" --pickaxe-regex
As Rob commented, this search is case-sensitive - he opened a follow-up question on how to search case-insensitive.
As Hi Angel notes in the comments:
Executing a git log -G<regexp> --branches --all (the -G is the same as -S but for regexes) does the same thing as the accepted answer (git grep <regexp> $(git rev-list --all)), but it is soooo much faster!
The accepted answer was still searching for text after ≈10 minutes of me running it, whereas this one gives results after ≈4 seconds 🤷‍♂️.
The output here is more useful as well
My favorite way to do it is with git log's -G option (added in version 1.7.4).
-G<regex>
Look for differences whose added or removed line matches the given <regex>.
There is a subtle difference between the way the -G and -S options determine if a commit matches:
The -S option essentially counts the number of times your search matches in a file before and after a commit. The commit is shown in the log if the before and after counts are different. This will not, for example, show commits where a line matching your search was moved.
With the -G option, the commit is shown in the log if your search matches any line that was added, removed, or changed.
Take this commit as an example:
diff --git a/test b/test
index dddc242..60a8ba6 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-hello hello
+hello goodbye hello
Because the number of times "hello" appears in the file is the same before and after this commit, it will not match using -Shello. However, since there was a change to a line matching hello, the commit will be shown using -Ghello.
git log can be a more effective way of searching for text across all branches, especially if there are many matches, and you want to see more recent (relevant) changes first.
git log -p --all -S 'search string'
git log -p --all -G 'match regular expression'
These log commands list commits that add or remove the given search string/regex, (generally) more recent first. The -p option causes the relevant diff to be shown where the pattern was added or removed, so you can see it in context.
Having found a relevant commit that adds the text you were looking for (for example, 8beeff00d), find the branches that contain the commit:
git branch -a --contains 8beeff00d
If you want to browse code changes (i.e., see what actually was changed with the given word in the whole history), go for patch mode. I found this a very useful combination:
git log -p
# Hit '/' for search mode.
# Type in the word you are searching.
# If the first search is not relevant, hit 'n' for next (like in Vim ;) )
Search in any revision, any file (Unix/Linux):
git rev-list --all | xargs git grep <regexp>
Search only in some given files, for example XML files:
git rev-list --all | xargs -I{} git grep <regexp> {} -- "*.xml"
The result lines should look like this:
6988bec26b1503d45eb0b2e8a4364afb87dde7af:bla.xml: text of the line it found...
You can then get more information like author, date, and diff using git show:
git show 6988bec26b1503d45eb0b2e8a4364afb87dde7af
I took Jeet's answer and adapted it to Windows (thanks to this answer):
FOR /F %x IN ('"git rev-list --all"') DO @git grep <regex> %x >> out.txt
Note that for me, for some reason, the actual commit that deleted this regex did not appear in the output of the command, but rather one commit prior to it.
For simplicity, I'd suggest using a GUI: gitk, the Git repository browser. It's pretty flexible.
You can search the commits for code or for files, regular expressions are supported too, and you can navigate through the results using the up/down arrows.
Whenever I find myself at your place, I use the following command line:
git log -S "<words/phrases i am trying to find>" --all --oneline --graph
Explanation:
git log - Need I write more here; it shows the logs in chronological order.
-S "<words/phrases i am trying to find>" - It shows all those Git commits where any file (added/modified/deleted) has the words/phrases I am trying to find without '<>' symbols.
--all - To enforce and search across all the branches.
--oneline - It compresses the Git log in one line.
--graph - It creates the graph of chronologically ordered commits.
For anyone else trying to do this in Sourcetree, there is no direct command in the UI for it (as of version 1.6.21.0). However, you can use the commands specified in the accepted answer by opening a Terminal window (the button is available in the main toolbar) and copy/pasting them there.
Note: Sourcetree's Search view can partially do text searching for you. Press Ctrl + 3 to go to the Search view (or click the Search tab at the bottom). On the far right, set the Search type to File Changes and then type the string you want to search for. This method has the following limitations compared to the above command:
Sourcetree only shows the commits that contain the search word in one of the changed files. Finding the exact file that contains the search text is again a manual task.
RegEx is not supported.
I was kind of surprised here, and maybe I missed the answer I was looking for, but I came here looking for a search across the heads of all the branches, not for every revision in the repository, so for me using git rev-list --all is too much information.
In other words, for me the variation most useful would be
git grep -i searchString $(git branch -r)
or
git branch -r | xargs git grep -i searchString
or
git branch -r | xargs -n1 -i{} git grep -i searchString {}
And, of course, you can try the regular expression approach here. What's cool about the approach here is that it worked against the remote branches directly. I did not have to do a check out on any of these branches.
Adding more to the answers already present.
If you know the file in which you might have made the change, do this:
git log --follow -p -S 'search-string' <file-path>
--follow: lists the history of a file
Inspired by the answer https://stackoverflow.com/a/2929502/6041515, I found that
git grep seems to search the full code base at each commit, not just the diffs, so the results tend to be repetitive and long. The script below searches only the diff of each git commit instead:
for commit in $(git rev-list --all); do
    # search only lines starting with + or -
    if git show "$commit" | grep "^[+|-].*search-string"; then
        git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short "$commit"
    fi
done
Example output, the bottom git commit is the one that first introduced the change I'm searching for:
csshx$ for commit in $(git rev-list --all); do
> if git show "$commit" | grep "^[+|-].*As csshX is a command line tool"; then
> git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short $commit
> fi
> done
+As csshX is a command line tool, no special installation is needed. It may
987eb89 2009-03-04 Gavin Brock Added code from initial release
Jeet's answer works in PowerShell.
git grep -n <regex> $(git rev-list --all)
The following displays all files, in any commit, that contain a password.
# Store intermediate result
$result = git grep -n "password" $(git rev-list --all)
# Display unique file names
$result | select -unique { $_ -replace "(^.*?:)|(:.*)", "" }
Okay, twice just today I've seen people wanting a closer equivalent for hg grep, which is like git log -pS but confines its output to just the (annotated) changed lines.
Which I suppose would be handier than /pattern/ in the pager if you're after a quick overview.
So here's a diff-hunk scanner that takes git log --pretty=%h -p output and spits out annotated change lines. Put it in diffmarkup.l, build it (e.g. make diffmarkup and put the binary somewhere on your path, such as ~/bin/diffmarkup), and use it like:
git log --pretty=%h -pS pattern | diffmarkup | grep pattern
%option main 8bit nodefault
// vim: tw=0
%top{
#define _GNU_SOURCE 1
}
%x commitheader
%x diffheader
%x hunk
%%
    char *afile=0, *bfile=0, *commit=0;
    int aline,aremain,bline,bremain;
    int iline=1;
<hunk>\n ++iline; if ((aremain+bremain)==0) BEGIN diffheader;
<*>\n ++iline;
<INITIAL,commitheader,diffheader>^diff.* BEGIN diffheader;
<INITIAL>.* BEGIN commitheader; if(commit)free(commit); commit=strdup(yytext);
<commitheader>.*
<diffheader>^(deleted|new|index)" ".* {}
<diffheader>^"---".* if (afile)free(afile); afile=strdup(strchrnul(yytext,'/'));
<diffheader>^"+++".* if (bfile)free(bfile); bfile=strdup(strchrnul(yytext,'/'));
<diffheader,hunk>^"## ".* {
BEGIN hunk; char *next=yytext+3;
#define checkread(format,number) { int span; if ( !sscanf(next,format"%n",&number,&span) ) goto lostinhunkheader; next+=span; }
checkread(" -%d",aline); if ( *next == ',' ) checkread(",%d",aremain) else aremain=1;
checkread(" +%d",bline); if ( *next == ',' ) checkread(",%d",bremain) else bremain=1;
break;
lostinhunkheader: fprintf(stderr,"Lost at line %d, can't parse hunk header '%s'.\n",iline,yytext), exit(1);
}
<diffheader>. yyless(0); BEGIN INITIAL;
<hunk>^"+".* printf("%s:%s:%d:%c:%s\n",commit,bfile+1,bline++,*yytext,yytext+1); --bremain;
<hunk>^"-".* printf("%s:%s:%d:%c:%s\n",commit,afile+1,aline++,*yytext,yytext+1); --aremain;
<hunk>^" ".* ++aline, ++bline; --aremain; --bremain;
<hunk>. fprintf(stderr,"Lost at line %d, Can't parse hunk.\n",iline), exit(1);
git rev-list --all | xargs -n 5 git grep EXPRESSION
is a tweak to Jeet's solution, so it shows results while it searches and not just at the end (which can take a long time in a large repository).
So are you trying to grep through older versions of the code looking to see where something last exists?
If I were doing this, I would probably use git bisect. Using bisect, you can specify a known good version, a known bad version, and a simple script that does a check to see if the version is good or bad (in this case a grep to see if the code you are looking for is present). Running this will find when the code was removed.
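A minimal sketch of that bisect flow (the tag and the search string below are placeholders):
git bisect start
git bisect bad HEAD                  # the code is already gone here
git bisect good v1.0                 # the code was still present here
# exit 0 (pattern found) marks a revision good, non-zero marks it bad
git bisect run sh -c 'git grep -q "code you are looking for"'
git bisect reset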
Scenario: You did a big clean up of your code by using your IDE.
Problem: The IDE cleaned up more than it should have, and now your code does not compile (missing resources, etc.)
Solution:
git grep --cached "text_to_find"
It will find the file where "text_to_find" was changed.
You can now undo this change and compile your code.
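The undo step could then look like this (the path is just a hypothetical example):
# restore the working-tree copy from the index, discarding the IDE's cleanup of that file
git checkout -- src/main/res/values/strings.xml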
A. Full, unique, sorted, paths:
# Get all unique filepaths of files matching 'password'
# Source: https://stackoverflow.com/a/69714869/10830091
git rev-list --all | (
    while read revision; do
        git grep -F --files-with-matches 'password' "$revision" | sed "s/[^:]*://"
    done
) | sort | uniq
B. Unique, sorted, filenames (not paths):
# Get all unique filenames matching 'password'
# Source: https://stackoverflow.com/a/69714869/10830091
git rev-list --all | (
    while read revision; do
        git grep -F --files-with-matches 'password' "$revision" | sed "s/[^:]*://"
    done
) | xargs -n1 basename | sort | uniq
This second command is useful for BFG, because it only accepts file names and not repo-relative/system-absolute paths.
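For context, a hedged example of feeding one of those names to BFG afterwards (it assumes the BFG jar is available and that you are working on a fresh mirror clone):
java -jar bfg.jar --delete-files secrets.txt my-repo.git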
Check out my full answer here for more explanation.
Command to search in git history
git log -S"alter" --author="authorname" --since=2021.1.1 --until=2023.1.1 -- .

Why does this git/grep/vim shortcut bash code not work as advertised?

At the end of this bash tutorial video link, Spencer Krum shows a neat hack: how to open a file in vim at the line number where an immediately preceding 'git grep -n' search string was located. It seemed neat, but his code does not work as described on two different Linux boxes I tried it with. It just always opens a blank file. The original code is here.
My steps:
First, I made sure to have git initialized and files inside the directory to grep for. For example, I had a somefile.txt with three lines of words:
its
a
mystery
Then I ran: git init;git add .; git commit
Then, modify your .bashrc file to redefine vim as shown below. Make sure to source the .bashrc file when complete: source ~/.bashrc
Next, run the git grep -n command with a search term you know is in a file in your git directory (which is your current working directory). Finally, run vim. It should open your file with the cursor at the search term's line, but it doesn't: git grep -n mystery; vim
# Spencer's original
vim () {
    last_command=$(history | tail -n 2 | head -n 1)
    if [[ $last_command =~ 'git grep' ]] && [[ "$*" =~ :[0-9]+:$ ]]; then
        line_number=$(echo $* | awk -F: '{print $(NF-1)}')
        /usr/bin/vim +${line_number} ${*%:${line_number}:}
    else
        /usr/bin/vim "$@"
    fi
}
To get the desired result, I had to simplify the regex in the second clause of the first if statement, and also create some temp variables to run additional eval logic on.
Below is my revised code:
# My revised version
vim () {
    last_command=$(history | tail -n 2 | head -n 1)
    rempws="${last_command#*" "}"
    remtws="${rempws%*" "}"
    file_name="$(eval $remtws | awk -F: '{print $(NF-2)}')"
    line_number="$(eval $remtws | awk -F: '{print $(NF-1)}')"
    if [[ $last_command =~ 'git grep' ]] && [[ $line_number =~ [0-9] ]]
    then
        /usr/bin/vim +${line_number} ${file_name}
    else
        /usr/bin/vim "$@"
    fi
}
Is this related to a bash update since 2015? I don't think it's git related, as my git grep -n command does return a string in the form 'somefile.txt:3:mystery'. Everybody in the audience loved it, and it's still on GitHub in the non-functioning form, so I am worried that I am missing something fundamental about bash.
Show me why I'm dumb.
Ok, I finally understood what this is about.
His code was never really meant to somehow access the output of the git grep command.
Instead, it's a realization that after a git grep you're likely to want to open one of the results.
But, in his workflow, he still relies on copy & paste of the git grep results into the Vim command-line.
The only difference is that, right after a git grep, you can pass Vim a filename with a line number separated by : rather than having to pass two separate arguments.
In his example, the result of git grep started with:
CHANGELOG.rst:75: ...
So, if at the next command you execute:
$ vim CHANGELOG.rst:75:
(Assuming you copied the last part from the git grep results.)
Then the bash function would trigger this command instead:
$ vim +75 CHANGELOG.rst
Which will open this file on line 75.
If you like the idea of this feature, a much cleaner way to implement that is to install and enable the bogado/file-line Vim plug-in, which implements support for that kind of filename + line number arguments in Vim itself.
(Also, there's really no good reason to only recognize the file:line syntax right after a git grep. If you want that behavior, it's better to always get it, not just sometimes. It should be consistent.)
An even better alternative is to use the quickfix feature of Vim, which was conceived for exactly this kind of situation. You can either set 'grepprg' manually, to invoke git grep with the appropriate arguments, or you can adopt a plug-in such as vim-fugitive, which implements a :Ggrep command that calls git grep and presents the results using the quickfix list.
As pointed out by @romainl, you might also want to look into git jump, which is a git command you can enable on your system to have git find interesting locations (such as git grep or git diff output) and have git itself open those results in Vim (using the quickfix list when appropriate).
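For completeness, a small usage sketch once git jump is installed from git's contrib/ directory:
# open the editor with a quickfix list of every match in the working tree
git jump grep 'pattern_to_find'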

Tortoise SVN pre-commit script allow commit that contains defined string

Creating a pre-commit script that will only allow commits that contain a specific string somewhere in the file Test.cfg.
Currently I have it working in that it looks through every file committed and blocks commits that contain a specified string:
REPOS="$1"
TXN="$2"
SVNLOOK=/usr/bin/svnlook
$SVNLOOK diff -t "$TXN" "$REPOS" | \
    grep -i "String to search here" > /dev/null && { echo "String exists so block commit" 1>&2; exit 1; }
What I am after is for the above code to do pretty much the complete opposite: if the string exists, allow the commit, and if not, prevent the commit. It would also be nice if I could specify which file should be searched, as currently it searches every file and some commits can contain thousands of files.
I beg your pardon, but svnlook diff is an ugly, stupid way to do this in your case. Re-read the svnlook subcommands topic in the SVNBook, paying attention to svnlook tree / svnlook changed + svnlook cat.
The full business logic of your test can (or has to) be something like the following (I'm too lazy to write the full bash here; that will be your duty):
IF $FILENAME exists in the transaction (I'd prefer svnlook tree --full-paths ... just because svnlook changed ... would require an additional | gawk '{print $2}' for a clean filename) AND $FILENAME contains $STRING (svnlook cat ... "$FILENAME" | grep "$STRING" ...) DO SOMETHING
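A rough sketch of that logic for the reversed requirement in the question (the path, the string, and the repository layout are assumptions you will need to adjust):
#!/bin/sh
REPOS="$1"
TXN="$2"
SVNLOOK=/usr/bin/svnlook
FILE="trunk/Test.cfg"                # hypothetical path inside the repository
STRING="String to search here"

# svnlook cat -t shows the file as it will exist after the commit,
# so only this one file is inspected instead of diffing every change
if ! $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" 2>/dev/null | grep -q "$STRING"; then
    echo "Commit blocked: $FILE does not contain \"$STRING\"" 1>&2
    exit 1
fi
exit 0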
Don't forget to also handle possible edge cases:
$FILENAME doesn't exist in the transaction, but is present in the working copy with the correct $STRING, and the file is not modified according to svn status
The same as above, but modified
Points 1-2, but with a disallowed $STRING
Given the notes above, I'd also recommend exploring the possibility of replacing the file+string check with a test of a custom revision property in the hook (shorter, easier, more manageable).
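A tiny sketch of that revision-property direction (the property name is made up, and it assumes the client commits with --with-revprop):
# client side:  svn commit --with-revprop "review:approved=yes" -m "..."
# hook side:
if ! $SVNLOOK propget --revprop -t "$TXN" "$REPOS" review:approved 2>/dev/null | grep -qx "yes"; then
    echo "Commit blocked: missing review:approved=yes revision property" 1>&2
    exit 1
fi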

perforce: how to find the changelist which deletes a line for a file?

So I just found that someone removed a line from a "global" file and the removal is very likely wrong. I need to trace which changelist did the removal, but it is a global file that everyone edits from many branches. I randomly picked a couple of changelists; they both have that line. Any suggestions for doing this more systematically?
Time-lapse view is a really good tool for this. You can check out this video for a better idea of how it works.
I would suggest collecting all the change numbers of the file, then doing a binary search: grab each change and grep for the specific line you are looking for together with the character '-' or '<' (depending on your diff setting) at the start of the line.
The line below will give you all the changes:
p4 filelog yourfile.cpp | egrep "^... \#[0-9]+ change" | cut '-d ' -f 4
If you do not want to do the binary search manually or write code to do it in shell or anything else, then I would suggest brute force: scan all changes in search of that line.
For example:
p4 filelog yourfile.cpp | egrep "^... \#[0-9]+ change" | cut '-d ' -f 4 | while read change ; do
    p4 describe $change | egrep "^<.*your line that was deleted"
    [ $? = 0 ] && echo $change
done
Output in my example:
< /* remove the confirmation record for the outstanding write if found */
234039
Where 234039 is the change number that contains your deletion.
I hope it will help.

Linux command most recent non soft link file

Linux command: I am using the following command, which returns the name of the latest file in the directory.
ls -Art | tail -n 1
When I run this command it returns the latest changed file, which is actually a soft link. I want to ignore soft links in my result and get the names of files other than soft links. How can I do that? Any quick help is appreciated.
Maybe I can specify a regex to match the latest file. The file name is:
rum-12.53.2.war
-- Latest file in directory without softlink
ls -ArtL | tail -n 1
-- Latest file without extension
ls -ArtL | sed 's/\(.*\)\..*/\1/' | tail -n 1
The -L option for ls dereferences the link, i.e. you'll see the information of the referenced file instead of the link itself. Is this what you want? Or would you like to completely ignore links?
If you want to ignore links completely you can use this solution, although I am sure there exists an easier one:
a=$( ls -Artl | grep -v "^l" | tail -1 )
aa=()
for i in $(echo $a | tr " " "\n")
do
    aa+=($i)
done
aa_length=${#aa[@]}
echo ${aa[aa_length-1]}
First you store the output of your ls in a variable called a. By grepping for "^l" you choose only symbolic links, and with the -v option you invert this selection. So you basically have what you want; the only downside is that you need the -l option for ls, as otherwise there's nothing to grep "^l" against. In the second part you split the variable a by " " and fill an array called aa (sorry for the bad naming). Then you need only the last item in aa, which should be the filename.
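For what it's worth, an easier alternative might look like this (a sketch assuming GNU find):
# newest regular file; symbolic links are type 'l', so they are excluded
find . -maxdepth 1 -type f -printf '%T@ %f\n' | sort -n | tail -n 1 | cut -d' ' -f2-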
