find all lines with specific phrase in specific commit - linux

Let's assume I have a commit with a known hash, and the commit touches 1000 of the project's 5000 files.
In some of those files the commit added the log message LOG_WARNING(...);, let's say 500 times, which I want to replace with LOG_INFO(...);.
I don't want to replace every LOG_WARNING(...); in the project (let's say it has 10000 of them), just the ones added by the specified commit.
I'm prepared to walk over each of the 500 lines I have to modify, but I'm trying to avoid walking over the 10000 existing log lines in the codebase.
What is the best way (practice) to do this?

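I would do it this way:
git diff-tree --no-commit-id --name-only -r <commit> | xargs sed -i 's/LOG_WARNING/LOG_INFO/g'
The git command lists the files that are part of the commit. (Plain git show --name-only would also print the commit header and message, which xargs would then pass to sed as file names.)
xargs passes these file names to sed, which replaces the pattern. Note that this rewrites every LOG_WARNING in those files, not only the ones the commit added.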

What helped me:
git diff (...) > patchfile -- extract all changes of the commit into a patch file
edit patchfile -- using any editing tool or script; in the patch file I only had to deal with the LOG_WARNING lines of the specified commit.
git reset --hard -- to get rid of the commit I'm going to modify (give it the parent of that commit as the target; without a target it only discards uncommitted changes)
git apply patchfile -- applies the patch, which contains exactly my commit, but with the replacement I wanted.
It does the job, and relatively quickly.
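The "edit patchfile" step can itself be scripted. Here is a minimal sketch, assuming the patch is in unified diff format (what git diff produces); the helper name rewrite_added_lines is made up for this illustration. It changes LOG_WARNING only on lines the patch adds (those starting with +) and leaves context lines, removed lines, and the +++ file headers alone.

```shell
# Hypothetical helper: read a unified diff on stdin, write the edited diff to stdout.
# Only added lines (leading "+") are rewritten; "+++ b/file" headers are skipped.
# Caveat: an added line whose own text starts with "++ " looks like a header
# and is skipped as well.
rewrite_added_lines() {
  sed -e '/^+++ /!{' -e '/^+/s/LOG_WARNING/LOG_INFO/g' -e '}'
}
```

Usage would be along the lines of rewrite_added_lines < patchfile > patchfile.new, after which patchfile.new is the file to hand to git apply.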

Related

How to "git add -all", but limit it to maximum 100 at a time?

There are about 3000 files that I need to commit to a repo. Most are images. My problem is if I do what I normally do:
git add --all
... then I can't push because the git server has various limits that it just keeps hitting. I tried adding workarounds for these limits, but the truth is, I don't normally do such big commits, so I would prefer to not change the settings.
Instead I was hoping there is a way to ONLY add the first 100 untracked files and then stop. Then I can do a "git commit" and a "git push" and all should be well with the world.
Any idea how to do this?
If you have bash available, this should work: list all untracked files, select the first 100 to pass to git add as argument.
git ls-files --others --exclude-standard | head -n 100 | xargs git add
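To work through all of the files without repeating that command by hand, the same idea can be wrapped in a loop. This is only a sketch under a few assumptions: the helper name in_batches is invented here, and xargs -0 (available in GNU and BSD xargs, though not required by POSIX) is assumed to exist. NUL-delimited names keep paths containing spaces intact.

```shell
# Hypothetical helper: read NUL-delimited paths on stdin and run the given
# command once per batch of at most $1 paths (the paths are appended to the
# command as arguments).
in_batches() {
  batch_size=$1
  shift
  xargs -0 -n "$batch_size" "$@"
}
```

A possible invocation (the commit message is a placeholder):
git ls-files --others --exclude-standard -z | in_batches 100 sh -c 'git add -- "$@" && git commit -q -m "Add a batch of files" && git push' batch
Note that xargs normally carries on with the next batch even if one batch fails, so the output is worth checking.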

Commit without add and how to see remote branch log

1. I'm new to git and would like to know what happens if I have a file which was modified and already staged in the past (but has now been modified again), and I want to commit the file using a command such as:
git commit -m "yada yada" ~/home/Dan/project/file_to_commit.py
Is this equivalent to doing:
a. git add ~/home/Dan/project/file_to_commit.py
b. git commit -m "yada yada"
If not, please explain.
2. How can I see the commit logs (pushes) of a remote branch without doing git pull?
Thanks
This might be better as two separate questions, and the second question is already answered correctly, but I'll take a stab at answering both. Before I start, though, I want to say that the path ~/home/Dan/project/file_to_commit.py is pretty suspect: it suggests that your git directory is /.git, which is not a good idea. I'm going to assume, below, that the .git directory is much further down and you're just adding project/file or file (and I'll trim the paths in the question).
Note that the TL;DR version of the first answer is that they're almost the same, and you only need to know about the difference in some edge cases. (Hence the existing answer from eleventhend is probably good enough for most purposes.)
Q1: Add and commit vs git commit path
... I've a file which was modified and already staged in the past (but now modified again), and I want to commit the file using a command such as:
git commit -m "yada yada" file_to_commit.py
Is this equivalent to doing:
git add file_to_commit.py
git commit -m "yada yada"
If not please explain.
No, it's not exactly equivalent. This is a little bit tricky and it will help a lot if we define some terms and get a few things pinned down a bit more.
Also, you say "already staged in the past (but now modified again)", which leaves a bit of ambiguity: did you do a git commit in between these operations? I'll address both the "yes" and "no" cases by describing the full, general case.
The index
First, we need to define git's index or staging area (it has even a few more names, e.g., cache as in git diff --cached, but "index" and "staging area" are the two most common terms). Git has a (one, single) standard index—let's call this "the" index, and when we want to refer to another, different index, we'll spell out which other one we mean. When you run most normal git commands, including git add, git updates this index.1
Next, we need to take some notes on what's actually in the index. Aside from some interesting but not relevant here cases like merges, and one thing I'm leaving out on purpose, in essence, the index has one entry per file that will be in the next commit.2 That is, suppose you've made a commit, or checked out some branch, so that your current commit, which is now in your work tree, has 100 files in it. (Your work tree may have additional untracked and/or ignored files, as long as it also has those 100 files.)
Using git add
When you run git add, git stores a new copy of each of the files being added into the repository, which it calls blob objects. It calculates a hash value for each blob as it adds it to the repository, then puts the new hash into the index.
When you run git commit—at least, without either paths or -a—git reads the index and uses that to form the new commit. If the new commit would have the same tree as the previous commit,3 git requires that you add the --allow-empty flag. (This doesn't mean that the index is empty, but rather that the index matches the index you'd get by re-checking-out the current commit. So --allow-empty might be the wrong name for this flag; it maybe should have been called --allow-same or allow-unchanged or some such.)
Hence, if you do git add path and then git commit -m message, you'll get a commit that uses the index as updated by the git add, which may have additional updates from before that git add. Since there's just the one entry per path, though, if you do:
... hack on foo.py ...
$ git add foo.py
$ echo '# add a final comment' >> foo.py
$ git add foo.py
$ git commit -m 'update foo'
there will only be one update to foo.py in the new commit.
So what's the difference?
I claimed above that git commit path (and git commit -a) is not exactly the same as doing the git add and then git commit. Let's look at the precise difference.
When you give path names (or -a) to git commit, git uses a new, different, temporary index. It starts by copying something—exactly what gets a bit complicated—to the temporary index, then it adds each file that is to be committed, then it makes a commit from the temporary index. Once the commit is done, git goes back to using the normal, ordinary index, and it updates the index. That is, after adding stuff to the temporary index and committing, it also adds to the regular index.
To see how this really works we need some examples. Suppose we have two files and we git add a change to one of them:
# assume file1 and file2 are in the HEAD commit
echo add stuff to file1 >> file1
git add file1
echo add stuff to file2 too >> file2
At this point, git status will tell us that we have changes to file1 that are staged to be committed, and changes to file2 that are not staged to be committed.
If we run git add file2; git commit, we'll get both updates in the new commit. Once the commit is done, git status will show there is nothing to commit. But if, instead, we do:
git commit -m message file2
and then run git status, we'll see that file1 is still staged and ready to commit. The commit we just made has only the change to file2.
This is because the git commit file2 command started by making a temporary index using the HEAD commit, adding file2, making the commit, and then updating the normal index with the new file2. This last bit is important: if git didn't copy the change back into the (regular) index, the index would still have the old version of file2, and the next commit would undo the change we just committed.
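The file1/file2 scenario above can be replayed in a throwaway repository. This is a sketch, assuming git is installed; the temporary directory, the demo identity, and the variable names are made up for the illustration.

```shell
# Build a scratch repository with file1 and file2 in the first commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.invalid
echo one > file1
echo two > file2
git add file1 file2
git commit -q -m 'initial'

# Stage a change to file1, leave a change to file2 unstaged.
echo add stuff to file1 >> file1
git add file1
echo add stuff to file2 too >> file2

# Commit by path: only file2 goes into the new commit.
git commit -q -m 'message' file2

committed=$(git diff --name-only HEAD~1 HEAD)   # paths changed by the new commit
still_staged=$(git diff --cached --name-only)   # paths staged but not yet committed
unstaged=$(git diff --name-only)                # paths modified but not staged
echo "committed:    $committed"
echo "still staged: $still_staged"
```

The new commit contains only file2, while file1 stays staged in the regular index, waiting for the next commit.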
This copy-back step has an important side effect. Suppose that we have a complicated change to foo.py. For instance, suppose we went to fix a bug, and along the way we refactored a few functions. We've tested the fix and it works, so we did git add foo.py and were about to commit it:
... hack on foo.py ...
... test, hack more, test, until fixed ...
git add foo.py
git commit -m 'refactor foo.py and then fix a bug'^C
At this point we realized that we shouldn't commit both changes together: we should commit the refactored code first, and then commit the bug fix.4
Well, we've already git add-ed the refactored and fixed code, so it's safely stashed away in the index, right? (No, WRONG! Danger Will Robinson! But let's proceed, since this is an example.) So we can just undo the fix part, leaving only the refactoring, and commit that first:
... edit foo.py to remove just the fix ...
git commit -m 'refactor code to prep for bug fix' foo.py
Once that commit is done, we can commit the staged version:
git commit -m 'fix bug #12345' foo.py
Alas, git now tells us that there's nothing to commit. What happened to the carefully staged full-fix version of foo.py?
The answer is, git commit foo.py wiped it out. Git first added the refactor-only foo.py to a temporary index and committed that; but then it copied the refactor-only foo.py back to the regular index, losing our carefully staged full-fix version. We can either re-create it, or fish around in the repository for the "dangling blob" that is left behind because of this.
(This should probably be considered a bug in git, although it's not clear what to do with the staged but uncommitted version.)
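The loss of the staged version can be reproduced in a scratch repository. This is a sketch, assuming git is installed; the file contents, the demo identity, and the variable names are invented. It records the hash of the staged blob up front so the blob can be read back afterwards; without that hash you would have to search for it, for instance with git fsck --lost-found.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.invalid
echo 'original' > foo.py
git add foo.py
git commit -q -m 'initial'

# Stage the full version (refactoring plus fix) and remember its blob hash.
echo 'refactored and fixed' > foo.py
git add foo.py
staged_blob=$(git rev-parse :foo.py)

# Strip the fix from the work tree and commit by path.
echo 'refactored only' > foo.py
git commit -q -m 'refactor code to prep for bug fix' foo.py

left_staged=$(git diff --cached --name-only)   # empty: the index entry was overwritten
recovered=$(git cat-file -p "$staged_blob")    # the old blob is still in the object store
echo "staged after commit: '$left_staged'"
echo "recovered blob:      $recovered"
```

After the path commit nothing is staged any more, but the blob written by the earlier git add can still be read as long as it has not been garbage-collected.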
git commit -a and/or paths: --only vs --include
Here I need to quote myself from just a moment ago. When using -a or paths, git commit starts by copying something—exactly what gets a bit complicated. If you look closely at the git commit documentation, you will find the -i or --include option (and a corresponding, but default, -o / --only option). These control what goes into the temporary index.
When using --include, git commit creates its temporary index from the current index. When using the default --only mode, git commit creates its temporary index from the HEAD commit.
(Because of the copy-back step at the end, the only way to see that both commands are in fact using a temporary index is to view the index in the middle of the commit operation. If we use --include and check after the commit is done, the regular index will match the new HEAD commit, as if git commit were adding to the regular index rather than to the temporary index. Fortunately it's very easy to view the regular index "in the middle", by not supplying the -m flag, so that git commit fires up the editor. While that's going on, run git status in another window, or using job control. Here's an example:
# At this point I've modified both a.py and mxgroup.py
# but have not `git add`ed either one.
$ git add a.py
$ git status --short
M  a.py
 M mxgroup.py
# We see that a plain "git commit" would commit a.py,
# but not mxgroup.py.
$ git commit -i mxgroup.py
# now waiting in the editor
# Now, in another window:
$ git status -s
M  a.py
 M mxgroup.py
This shows that the (regular) index is still set up the way we had it. Once we write the message out and exit the editor, the commit process will update the regular index for the new mxgroup.py entry, and the now-committed a.py change is also in the new HEAD commit, so git status will show neither file.)
Q2: Viewing logs
How can I see the remotes branch commits logs?(pushes) without doing git pull?
Git itself operates almost entirely locally. You may be able to do this directly with web viewers, but it's pretty convenient to just do locally, by first running git fetch.
The git pull command is actually meant as a convenience. There are two steps needed to incorporate other people's commits:
1. obtain those commits so that you have them locally; and
2. merge or rebase using those commits.
These two steps are handled by different git commands: git fetch does step 1, and git merge or git rebase (you must pick one of the two) does whichever version of step 2 you want.
The git pull command simply does step 1, then does step 2. It chooses merge unless you instruct it otherwise. For historical reasons, it has multiple ways of choosing which operation to run in step 2.
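Running step 1 on its own is enough to look at the incoming commits. The sketch below, which assumes git is installed, sets up two scratch repositories (the names upstream and clone, the demo identity, and the commit messages are made up) and shows that after git fetch the new upstream commit is visible in the log while the local branch has not moved.

```shell
work=$(mktemp -d)

# An "upstream" repository with one commit, and a clone of it.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m 'first'
git clone -q "$work/upstream" "$work/clone"

# Someone adds a commit upstream after we cloned.
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m 'second'

# Step 1 only: obtain the new commits. No merge or rebase happens.
git -C "$work/clone" fetch -q

# Commits that the upstream branch has and the local branch does not.
incoming=$(git -C "$work/clone" log --format=%s 'HEAD..@{upstream}')
local_head=$(git -C "$work/clone" log -1 --format=%s)
echo "incoming:   $incoming"
echo "local HEAD: $local_head"
```

In an ordinary clone the equivalent would be git fetch followed by something like git log HEAD..origin/<branch>.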
My recommendation is that as a newbie to git, you avoid git pull entirely. There are two reasons for this, only one of which is valid today (unless you're stuck with very old versions of git):
1. Traditionally, git pull has had various bugs, some of which can even lose work. (As far as I know these are all fixed since git 2.0.)
2. While it is convenient, git pull obscures what's happening and makes you choose merge-vs-rebase too early. It is true that rebase is almost always the right answer, but git pull defaults to doing merge, which means that its default is wrong for new users. Plus, of course, there's that "obscures what's happening" issue. If you knew about fetch and then rebase as separate steps, this question probably would not even have come up.
1The index is just a file, and you can find it in .git/index. You can make git use a different index by setting GIT_INDEX_FILE in the environment, but this is mainly meant for use by scripts like git stash.
2The cases I'm leaving out are:
Conflicted merges, which record up to three entries per path, using non-zero stage numbers. Once you resolve the conflict and git add the result, the nonzero stages are cleaned out and the normal stage-0 result is stored instead, and we're back to the normal, ready-to-commit case for that index entry.
Removing a file that is in the current commit (git rm, with or without --cached). This writes a special stage-0 entry marking the file as to-be-omitted, rather than simply removing the entry.
3If you're committing a merge, git allows the tree to match those of any or all parents, since the merge commit needs to record the multiple parents. The "empty" test is thus applied only to non-merge, single-parent commits. This is documented much better in modern git than it was in old versions of git, but it still has the wrong name.
4This has nothing to do with git itself. The idea here is to commit small, readable, understandable, and most importantly testable changes. Any time you find yourself writing up a commit as "do A and B, and fix C, and add D and E" it's an indication that you should probably split this into one commit per thing—in this case, about 5 separate commits.
[updated]
It is actually equivalent. When you commit a file directly, using git commit <filepath>, it stages the current file before committing. You do have to stage the file the first time the file is added before committing it (tell the repository to start tracking the file) using git add <file>.
Sample workflow:
git add <file>
Make some changes, yada yada
git commit -m "yada yada" <file>
Make some more changes, blah blah
git commit -m "blah blah" <file>
2.
To see the commit log of a remote branch, first run git fetch, then run git log <remote>/<branch> (for example, git log origin/master) to view the log.
See here: https://github.com/abhikp/git-test/wiki/View-the-commit-log-of-a-remote-branch

What's the easiest way to use the output paths from a git command in a subsequent git command?

I far too frequently use the mouse to do things like this:
/home/me-$ git log --name-status -1
commit a10e63af1f4b1b2c28055fed55d4f2bb3225a541
Author: Me <me#me.com>
Date: Tue Aug 18 13:04:04 2015 -0400
XYZ-376 make ctors public
M x/y/z/Class1.java
M x/y/z/Class2.java
/home/me-$ git checkout -- x/y/z/Class2.java # <-- copy/paste with the mouse
I know that some git commands accept wildcards, which mitigates the problem somewhat, but I'm wondering if there is a way to specifically reference pathspecs, etc. from previous commands.
How can I run commands like this without using the mouse, and without retyping long paths by hand?
I typically use a subshell ($(<command in subshell here...>)) for this.
For example, sometimes I had many files deleted and I had to git rm every one of them.
There's the command git ls-files --deleted that returns the names of all the missing files. I can combine it with git rm like this:
git rm $(git ls-files --deleted)
This is a somewhat bad example, because (as I discovered later) this operation can be achieved much more easily with git add --all. But I think it illustrates the point.
In your case, if you wanted to checkout all files that have been changed in the previous commit, it would be hard to parse the output of git log --name-status, because it contains additional information, but you could use something like git diff HEAD^ --name-only instead.
So:
git checkout $(git diff HEAD^ --name-only)
will do it in your example.
One nice thing that I noticed using the $(...) syntax is that it works both in Bash and in PowerShell.
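If you do want to reuse the output of git log --name-status itself, the extra information can be filtered away. A small sketch, assuming the default log format (commit message lines indented, status lines of the form status, tab, path); the helper name paths_from_name_status is invented for this example.

```shell
# Hypothetical helper: keep only the paths from `git log --name-status` output.
# Status lines look like "M<TAB>path"; for renames and copies
# ("R100<TAB>old<TAB>new") the new path is printed.
paths_from_name_status() {
  awk -F '\t' '/^[ACDMRT][0-9]*\t/ { print $NF }'
}
```

For example, git log --name-status -1 | paths_from_name_status prints one path per line, which can then be used inside $(...) as above. Like the $(...) examples, this relies on word splitting, so it is not safe for paths containing spaces.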
This'd be the kind of thing you run a shell under emacs for, run all your shells in it and have a command to walk back through the buffer looking for patterns in the output.
For retrieving output from a previous command that you didn't capture inside the shell session, you're going to have to get it from your terminal emulator's buffers somehow. The xterm family has a configurable "copy the whole scrollback buffer" thingy, then xclip -o will print the selection and you can pipe it through an extraction filter.
But it's either capture the output within the session or scrape it from the output buffers afterwards, that's everywhere the data's ever been.
On OS X, "Mouseless Copy" supported by iTerm2 (and probably some other terminal emulators) is a workable solution: https://www.iterm2.com/features.html
search for some portion of the string (⌘F)
expand selection right (tab) or left (shift-tab)
paste selection with (option-enter) or copy/paste in the usual way

Git: How to list all (staged) files attempting to be committed

I wrote a bash script for the prepare-commit-msg git hook. It lists all staged files that exist, but I only want the staged files that are about to be committed (an example of the desired input/output is at the bottom of this question).
My script's job is to prevent a commit from happening if the files attempting to be committed did not follow a certain commenting convention (i.e. think java docs). Not only this, but it edits and auto formats the comments to meet my commenting convention. This is extremely important to note because I can't just grab the SHA-1 of the commit because this script needs to happen before that key is ever created.
This works perfectly when I execute commit -a (i.e. commit all files). However, I run into problems when I want to just commit a few of my staged files.
Is there a way I can catch only the staged files that are attempting to be committed, not just every single staged file that exists?
For example, let's say my staged files were the following:
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
When I execute git commit file1.txt file2.txt file3.txt, I want to catch file1.txt file2.txt file3.txt in my script...but not file4.txt and file5.txt.
Is there any way to do this?
EDIT: Definitely not a duplicate. The solution to the "duplicate" question is definitely not what I'm asking for.
$ git status -s -uno
M  E
A  R
The file E is modified, the file R is staged (added).
An unstaged file has the action marker in the second column (after git reset E, to unstage the file E):
$ git status -s -uno
 M E
A  R
These can be dropped with grep -v '^ ' for example.
Here is a complete proof in my test directory:
Tracked Files
~/test/ed $ git ls-tree HEAD
100644 blob 96bf192a9be8d1cecc314f66bb1ef5961564e983 E
100644 blob 11470e37f3d22a2548ce5c85040a44c9581d7727 I
100644 blob 8f2f9e95d9b00595d1588ccef91495c06295f5fa O
Filesystem Files (all, as in git commit -a)
~/test/ed $ ls -l .
total 16
-rw-r--r-- 1 ingo ingo 140 25. Jun 05:48 E
-rw-r--r-- 1 ingo ingo 143 25. Jun 05:39 I
-rw-r--r-- 1 ingo ingo 106 25. Jun 05:29 O
-rw-r--r-- 1 ingo ingo 157 25. Jun 05:28 R
Status of the working directory: Changes against HEAD and staged files
~/test/ed $ git status -s -uno
 M E
A  R
The output without the modified files that are not yet, or no longer (after git reset), in the index (that is, not staged or unstaged)
~/test/ed $ git status -s -uno|grep -v '^ '
A  R
Staged filenames only, without the operation flag
~/test/ed $ git status -s -uno|grep -v '^ '|awk '{print $2}'
R
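The grep and awk steps can be folded into one filter that also copes with file names containing spaces. This is only a sketch: the helper name staged_paths is made up, and it assumes the two-column short status format shown above (index status in column 1, work tree status in column 2, path from column 4). git can also list staged paths directly with git diff --cached --name-only.

```shell
# Hypothetical helper: read `git status -s` output on stdin and print the
# paths whose index column (first character) shows a staged change.
# Lines starting with " " (unstaged only), "?" (untracked) or "!" (ignored)
# are dropped. Renames are printed as "old -> new", exactly as git shows them.
staged_paths() {
  awk 'substr($0, 1, 1) !~ /[ ?!]/ { print substr($0, 4) }'
}
```

Usage: git status -s -uno | staged_paths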
Git commit operation, status and control
Git introduces its own terminology. Some of these words have been used incorrectly here; I will describe the misunderstood concepts and the problematic commands that led to the erroneous formulation.
Luckily git has a very strong, well-defined language, where each term has an exact meaning; some of them can be seen in git help gitglossary. To understand the concepts git uses, the git help git page is worth reading 5-50 times, together with the introductory pages that are linked from there.
If you installed a git version without the documentation, slap your system administrator. I assume that most people who actively read questions, answers and articles are their own administrators, so slap yourself, but not too hard ;) Of course the docs can be found on the net, but they are an integral part of a to-be-used git installation.
Luckily git was initiated, and its core was completely written, by one of the most excellent minds of our days, or at least by one who uses the strictest logical concepts, instead of applying killer tools, to write and control his software development: Linus Torvalds.
That makes it possible to use the same terms with defined meanings when we talk about git and operations in a git repository. I won't go too deep though, as some of the concepts were developed with quite advanced theoretical computer science terms in mind.
The git repository
There are two main types of git repositories, called bare and non-bare, or I sometimes say checked-out (git help init). In this article I just talk about non-bare repositories, where the tracked files of the repository live in the working directory
from gitglossary(7):
working tree
The tree of actual checked out files. The working tree normally
contains the contents of the HEAD commit’s tree, plus any local
changes that you have made but not yet committed.
Note for the Noobs: gitglossary(7) means the manual page with the name "gitglossary" in section 7. With man this page can be reached with man -s7 gitglossary. With git help gitglossary exactly the same will show, with git help --web gitglossary you see a well formatted document in your browser, if your system is configured to remote call a html page into your browser session. With Windows, where there is no man you will always be directed into the browser. For git commands such as add the manual page is man 1 git-add or git-add(1).
Tracked Files
We have seen here that the term tracked means that the git repository knows and controls that file. The definition does not come from gitglossary(7), but from git-add(1), option
-u, --update
Update the index just where it already has an entry matching
<pathspec>. This removes as well as modifies index entries to
match the working tree, but adds no new files.
If no <pathspec> is given when -u option is used, all tracked
files in the entire working tree are updated (old versions of
Git used to limit the update to the current directory and
its subdirectories).
The command git add --update is one of the most important operations for understanding how git handles files in the working tree.
This is where the problem with git commit file1.txt file2.txt file4.txt shows up, but let's first define some more terms.
Staged Files or Index
The set of staged files builds the index (see gitglossary(7) for index, but ignore the several merge levels or the unmerged index). For our purpose
The index is a stored version of your working tree.
namely the stored version of your working tree that is prepared to be committed as one commit (again gitglossary(7)):
commit
As a noun: A single point in the Git history;
... "revision" or "version" are synonyms from other version control systems. As Git users we say "commit".
... to be continued (26.Friday) ...

change mtime on git pull

Does anybody know a way to set the mtime of added/updated files to the repo commit time (or any other time that depends on commit metadata)?
We have some logic which tests file mtimes, but the backend servers end up with different mtimes on files which were changed, and this causes bugs.
Assuming you receive the updated/added files when you do a git fetch, you can create a git-rebase-and-touch script that does the rebase for you and touches all files/directories changed in each new revision.
The script would look like:
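#!/bin/bash
# Rebase onto the fetched upstream, then set the mtime of every path changed
# by each newly applied revision to that revision's author date.
saveIFS=${IFS}
IFS=$'\n'
startrev=$(git rev-parse HEAD)
git rebase
for rev in $(git rev-list --reverse "${startrev}..HEAD"); do
    # Author date of this revision (RFC 2822 format)
    stamp=$(git log -1 --pretty="%aD" "${rev}")
    IFS=$'\n'
    for filename in $(git diff --name-only "${rev}~" "${rev}"); do
        # Walk the path component by component so that parent
        # directories are touched as well as the file itself.
        file=""
        IFS='/'
        for part in ${filename}; do
            file=${file}/${part}
            file=${file#/}
            # -c: do not create paths that no longer exist (GNU touch)
            touch -c --date="${stamp}" "${file}"
        done
    done
done
IFS=${saveIFS}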
If you currently use git pull, switch to git fetch followed by this script.
It's bloody dangerous tweaking file timestamps, and it's even more dangerous to assume, as you're doing here, that a timestamp means something other than what it ordinarily means. With anything, not just timestamps, doing that hurts reliability and maintainability, it makes comprehension and auditing difficult. The files changed for a legitimate reason, and your system broke.
The timestamps you want to check are recorded in commit metadata, and getting to them isn't efficient enough. Switch to extracting the timestamps to an index file or some such and check them there. Otherwise you're reduced to telling people learning your setup that "not everything is what it seems to be".
