perforce: how to find the changelist which deleted a line in a file?

So I just found that someone removed a line from a "global" file, and the removal is very likely wrong. I need to trace which changelist did the removal, but it is a global file and everyone edits it from many branches. I randomly picked a couple of changelists; they both still have that line. Any suggestion for doing this more systematically?

Time-lapse view is a really good tool for this. You can check out this video for a better idea of how it works.

I would suggest collecting all the change numbers of the file, then doing a binary search: grab a change, and grep its diff for the specific line you are looking for, with the character '-' or '<' (depending on whether you ask for unified diffs with -du) as the first character of the line.
The line below will give you all the changes:
p4 filelog yourfile.cpp | egrep "^... \#[0-9]+ change" | cut -d ' ' -f 4
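If you want to script the binary search itself, here is a minimal sketch over the file's revisions (it assumes the line still existed at revision #1 and is gone at the head revision; the file name and the deleted line are placeholders):
file=yourfile.cpp
line='your line that was deleted'
lo=1
hi=$(p4 filelog "$file" | egrep -c "^\.\.\. #[0-9]+ change")   # number of revisions
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if p4 print -q "$file#$mid" | grep -qF "$line"; then
        lo=$mid     # line still present here, so the deletion happened later
    else
        hi=$mid     # line already gone, so the deletion is at or before this revision
    fi
done
p4 filelog "$file" | egrep "^\.\.\. #$hi "    # revision $hi is where the line disappeared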
If you do not want to do the binary search at all, then I would suggest brute force: scan all the changes in search of that line.
For example:
p4 filelog yourfile.cpp | egrep "^... \#[0-9]+ change" | cut -d ' ' -f 4 | while read change ; do
    # print the diff for this change and look for the deleted line
    p4 describe "$change" | egrep "^<.*your line that was deleted"
    [ $? = 0 ] && echo "$change"
done
Output in my example:
< /* remove the confirmation record for the outstanding write if found */
234039
Where 234039 is the change number that contains your deletion.
I hope this helps.

Related

Find total size of uncommitted or untracked files in git

I have a big horrible pile of code and I am setting it up in version control.
I would like a command I can run on Linux to give me the total size of the files that would be committed and pushed if I ran git add -A && git commit -am 'initial commit'
The total size is needed; a breakdown by folder would also be handy.
I will then use this to build up my ignores so that I can get the repo to a realistic size before I push it up.
I think I have answered my own question:
for f in `git status --porcelain | sed 's#^...##'`; do du -cs $f | head -n 1; done | sort -nr; echo "TOTAL:"; du -cs .
However I'm open to any better ideas or useful tricks. My current output is 13GB :)
The above command is basically there: it gives me the sizes line by line from git status, but not the total sum. I'm currently appending the total of the whole directory at the end, which is not correct. I tried using bc but couldn't get it to work.
I adapted the answer of edmondscommerce by adding a simple awk statement which sums the output of the for loop and prints the sum (divided by 1024*1024 to convert to Mb)
for f in `git status --porcelain | sed 's#^...##'`; do du -cs $f | head -n 1; done | sort -nr | awk ' {tot = tot+$1; print } END{ printf("%.2fMb\n",tot/(1024*1024)) }'
Note that --porcelain prints pathnames relative to the root of the git repo. So, if you run this in a subdirectory, the du statement will not be able to find the files.
(whoppa; my first answer in SoF, may the force be with it)
I've used a modified version of this, because I had files with spaces in them which made it crash. I was also unsure about the size calculations and removed a useless head:
git status --porcelain | sed 's/^...//;s/^"//;s/"$//' | while read path; do
    du -bs "$path"
done | sort -n | awk '{ tot = tot + $1; print } END { printf("%.2fMB\n", tot/(1024*1024)) }'
I prefer to use while as it's slightly safer than for. It can still do nasty things with files that have newlines in their names, so I wish there were a way to pass null-separated file names yet still be able to grep for the status, but I couldn't find a nice one.
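For what it's worth, here is a sketch of a null-separated variant (it assumes git's --porcelain -z output and GNU du; rename entries, which emit a second NUL-separated path, are not handled):
git status --porcelain -z | while IFS= read -r -d '' entry; do
    du -bs "${entry:3}"        # strip the two status characters and the space
done | sort -n | awk '{ tot += $1; print } END { printf("%.2fMB\n", tot/(1024*1024)) }'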
Since version 2.11, git provides a handy "count-objects" command:
git count-objects -H
If this is not enough, I would recommend git-sizer from GitHub:
https://github.com/github/git-sizer
git-sizer --verbose
Detailed usage here: https://github.com/github/git-sizer/#usage
Since you're just adding everything, I don't see any reason to go via Git. Just use the ordinary Unix tools: du, find, &c.
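For example, a rough total and per-folder breakdown with plain du might look like this (a sketch; --exclude and sort -h assume GNU tools):
du -sh --exclude=.git .                  # grand total, ignoring git's own metadata
du -sh --exclude=.git ./*/ | sort -hr    # breakdown by top-level folder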

rsync verbose with final stats but no file list

I see that when I use rsync with the -v option, it prints the changed files list and some useful info at the end, like the total transfer size.
Is it somehow possible to cut out the first (long) part and just print the stats? I am using it in a script, and the log shouldn't be so long. Only the stats are useful.
Thank you.
Since I was looking for an answer myself and came across this question:
rsync also supports the --stats option.
Best solution for now, I think:
rsync --info=progress0,name0,flist0,stats2 ...
progress0 hides progress
progress2 displays progress
name0 hides file names
stats2 displays stats at the end of the transfer
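For example (the source and destination paths here are placeholders):
rsync -a --info=progress0,name0,flist0,stats2 /path/to/src/ user@host:/path/to/dest/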
This solution is more of a "hack" than the right way to do it, because the output is still generated and only filtered afterwards. You can use the option --out-format.
rsync ... --out-format="" ... | grep -v -E "^sending|^created" | tr -s "\n"
The grep filter should probably be updated with whatever unwanted lines you see in the output. The tr is there to squeeze the resulting runs of newlines.
grep -E for extended regexes
grep -v to invert the match. "Selected lines are those not matching any of the specified patterns."
tr -s to squeeze repeated newlines into a single one

Help needed to nab the malware viral activity using awk

I am facing issues with my server, as malware sometimes adds its code at the end or start of files. I have fixed the security loopholes to the extent of my knowledge. My hosting provider has told me that the security is adequate now, but I have become paranoid about the viral/malware activity on my site. I have a plan, but I am not well versed in Linux tools like sed, awk, or gawk, so I need help from your side. I could do this with my PHP knowledge, but that would be very resource intensive.
Since malware/viruses add code at the start or end of a file (so that the website does not show any error), can you please let me know how to write a command which would recursively look into all .php files (I will use the help to make changes to other types of files) in the parent and all sub-directories and add a particular tag at the start and end of each file, say XXXXXX_START and YYYYYY_END.
Then I need a script which would read all the .php files, check whether the first line is XXXXXX_START and the last line is YYYYYY_END, and report any file that is found to be different.
I will set up a cron job to check all the files and email the report to me if any discrepancy is found.
I know this is not 100% foolproof, as a virus may add its data after the commented lines, but this is the best option I could think of.
I have tried the following command to add data at the start -
sed -i -r '1i add here' *.txt
but this isn't recursive and it adds the line only to files in the current directory.
Then I found this -
BEGIN and END are special patterns. They are not used to match input records. Rather, they are used for supplying start-up or clean-up information to your awk script. A BEGIN rule is executed, once, before the first input record has been read. An END rule is executed, once, after all the input has been read. For example:
awk 'BEGIN { print "Analysis of \"foo\"" }
/foo/ { ++foobar }
END { print "\"foo\" appears " foobar " times." }' BBS-list
But unfortunately, I could not decipher anything.
Any help on the above is highly appreciated. Any other suggestions are welcome.
Regards,
Nitin
You can use the following to modify the files (also creates backup files called .bak):
find . -name "*.php" | xargs sed -i.bak '1iSTART_XXXX
$aEND_YYYY'
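If any of the paths may contain spaces, a null-separated variant of the same pipeline should be safer (a sketch, assuming GNU find and xargs):
find . -name "*.php" -print0 | xargs -0 sed -i.bak '1iSTART_XXXX
$aEND_YYYY'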
You could use the following shell script for checking the files:
for f in `find . -name "*.php" -print`
do
START_LINE=`head -1 $f`
END_LINE=`tail -1 $f`
if [[ $START_LINE != "START_XXXX" ]]
then
echo "$f: Mismatched header!"
fi
if [[ $END_LINE != "END_YYYY" ]]
then
echo "$f: Mismatched footer!"
fi
done
Use version control and/or backups; in the event of suspicious activity, zap the live site and reinstall from backups or your version control source.
$ find . -type f | grep "txt$" | xargs sed -i -r '1i add here'
Will apply that command to all matching .txt files in or under the current directory. You could probably fold the grep logic into find, but I like simple incantations.
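For reference, folding the filter into find could look like this (a sketch, assuming GNU find and sed):
find . -type f -name "*.txt" -exec sed -i -r '1i add here' {} +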

need to remove lines from syslog that have a certain string of data duplicate

Hey guys, wondering if anyone can help with this little dilemma.
I'm trying to remove lines from a syslog text file that contain duplicate strings.
Mar 10 06:51:11[http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360]
then a few lines down
Mar 10 06:52:03 [http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360
They have the same u: number, but the issue is that I need to remove the duplicates and leave just one, and the file has multiple duplicates of different u: numbers and is 14,000 lines long.
Can anyone tell me whether I can use awk, sed, or sort for something like this, i.e. removing lines that contain a certain string which is a duplicate?
I basically need to de-dupe, but the problem is that only one little part of the string is the indicator.
Any help is appreciated! Thanks
There is probably a better way to do this, but here's my first stab at it:
First, create a new file, call it uvalues.txt
Read the file line by line; for each line grep for "u:" and store the result in $u
If $u exists in uvalues.txt, ignore this line
If $u does not exist in uvalues.txt, write this line to another file and write $u to uvalues.txt
Repeat
Code would be something like this:
#!/bin/bash
touch uvalues.txt
# read the log line by line (a for loop over `cat` would split lines on spaces)
while IFS= read -r l; do
    # pull out the u:<number> token; the timestamp also contains colons,
    # so grab the token itself instead of cutting the whole line on ':'
    uvalue=`echo "$l" | grep -o "u:[0-9]*" | head -1 | cut -f2 -d':'`
    # if uvalue is not empty, check it against our temp file
    if [ -n "$uvalue" ]; then
        existing_value=`grep "$uvalue" uvalues.txt`
        # if it is empty, it means it's not a duplicate
        if [ -z "$existing_value" ]; then
            echo "$l" >> save.txt
            echo "$uvalue" >> uvalues.txt
        fi
    fi
done < file.txt
rm uvalues.txt
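Since the question asks about awk, here is a one-liner sketch of the same idea (file names match the script above; lines without a u: value are passed through unchanged):
awk 'match($0, /u:[0-9]+/) { if (seen[substr($0, RSTART, RLENGTH)]++) next }
     { print }' file.txt > save.txt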

Egrep acts strange with -f option

I've got a strangely acting egrep -f.
Example:
$ egrep -f ~/tmp/tmpgrep2 orig_20_L_A_20090228.txt | wc -l
3
$ for lines in `cat ~/tmp/tmpgrep2` ; do egrep $lines orig_20_L_A_20090228.txt ; done | wc -l
12
Could someone give me a hint what could be the problem?
No, the files did not change between executions. The expected answer for the egrep line count is 12.
UPDATE on file contents: the searched file contains about 13000 lines, each 500 characters long; the pattern file contains 12 lines, each 24 characters long. The pattern always (and only) occurs at a fixed position in the searched file (columns 26-49).
UPDATE on pattern contents: every pattern in tmpgrep2 is a 24-character-long number.
If the search patterns are found on the same lines, then you can get the result you see:
Suppose you look for:
abc
def
ghi
jkl
and the data file is:
abcdefghijklmnoprstuvwxzy
then the one-time command will print 1 and the loop will print 4.
Could it be that the lines read contain something that the shell is expanding/substituting for you in the second version? Then that doesn't get done by grep when it reads the patterns itself, thus leading to a different set of patterns being matched.
I'm not totally sure whether the shell does any expansion on the variable value in an invocation like that, but it's an idea at least.
EDIT: Nope, it doesn't seem to do any substitutions. But it could be a quoting issue: if your patterns contain whitespace, the for loop will step through each token, not through each line. Take a look at the read bash builtin.
Do you have any duplicates in ~/tmp/tmpgrep2? Egrep will only use the dupes one time, but your loop will use each occurrence.
Get rid of dupes by doing something like this:
$ for lines in `sort < ~/tmp/tmpgrep2 | uniq` ; do egrep $lines orig_20_L_A_20090228.txt ; done | wc -l
I second #unwind.
Why don't you run without wc -l and see what each search is finding?
And maybe:
for lines in `cat ~/tmp/tmpgrep2` ; do echo $lines ; done
Just to see how the shell is handling $lines?
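A sketch along those lines, printing how many lines each pattern matches (file names taken from the question):
while read -r p; do
    printf '%s\t%d\n' "$p" "$(egrep -c "$p" orig_20_L_A_20090228.txt)"
done < ~/tmp/tmpgrep2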
The others have already come up with most of the things I would look at. The next thing I would check is the environment variable GREP_OPTIONS, or whatever it is called on your machine. I've gotten the strangest error messages or behaviors when using a command line argument that interfered with the environment settings.
