In my codebase, I see that line <123> is in a specific file, and I'd like to see when it was introduced. I can do 'p4 annotate' to find when it was last modified, but I'm sure there is a way to step back to its introduction. I'm using 2009.2, not the latest, if that matters...
-Chris
edit
This was probably a bad question; I solved my problem by walking back the revision history until I found where the line was added, basically:
p4 annotate myFile | grep
p4 annotate myFile#rev-1 | grep
p4 annotate myFile#rev-2 | grep
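Roughly, that walk-back can be scripted like this (the file name and search pattern are placeholders, and the way the current revision number is pulled out of the p4 files output is an assumption):
file=myFile
pattern='the line in question'
# current head revision of the file
rev=$(p4 files "$file" | sed 's/.*#\([0-9]*\) - .*/\1/')
# step back one revision at a time while the line is still present
while [ "$rev" -ge 1 ] && p4 annotate "$file#$rev" | grep -q "$pattern"; do
    rev=$((rev - 1))
done
echo "Line first appears in revision #$((rev + 1))"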
If you have P4V installed, you should use the Time-lapse View. That will give you an accounting of all lines in the file, who introduced or changed those lines, in which changelist, etc. Time-lapse View is an awesome tool and will give you what you need without resorting to grepping through old versions.
Solution in Ruby:
file = "/path/to/your/file"
period = "#8,10" # could be #2012/10/01,#now

$all_authors = []

def find_changed_lines file, period
  puts "File: #{file}#{period}"
  ls = `p4 annotate -a #{file}#{period}`.split("\n")
  a = []; b = []; prevrev = ""; linen = 0; authors = []
  # find [first, last] aka [min, max] file revisions for the given period
  ls.each{ |l| l[ /^([0-9]+)-([ 0-9]+):/ ]; a << $1.to_i if $1; b << $2.to_i if $2 }
  first = a.min.to_s; last = b.max.to_s
  # find changed lines
  ls.each{ |l|
    l[ /^([0-9]+)-([ 0-9]+):/ ]
    # find line number
    linen += 1 if $2 == last
    # reject lines that are unchanged or only an intermediate change
    unless ($1 == first and $2 == last) or ($1 != first and $2 != last)
      # find rev#
      rev = $2 == last ? $1 : ($2.to_i + 1).to_s
      # print changelist description based on rev#, unless already printed for the previous line
      if prevrev != rev
        cldesc = `p4 filelog -m 1 #{file}##{rev} | head -2 | tail -1`
        puts cldesc
        # collect change author
        authors << cldesc[/by (.*)#/, 1]
        prevrev = rev
      end
      # print changed line with line number
      puts l.sub(/^.*:/, "#{linen}:")
    end
  }
  puts "Change authors: #{authors.uniq.join(', ')}"
  $all_authors += authors
end

find_changed_lines file, period
Hello, let's say I have a file such as:
$OUT some text
some text
some text
$OUT
$OUT
$OUT
How can I use sed to replace the three $OUT lines with "replace-thing" and get
$OUT some text
some text
some text
replace-thing
With sed:
sed -n '1h; 1!H; ${g; s/\$OUT\n\$OUT\n\$OUT/replace-thing/g; p;}' file
GNU sed does not require the semicolon after p.
With commentary
sed -n ' # without printing every line:
# next 2 lines read the entire file into memory
1h # line 1, store current line in the hold space
1!H # not line 1, append a newline and current line to hold space
# now do the search-and-replace on the file contents
${ # on the last line:
g # replace pattern space with contents of hold space
s/\$OUT\n\$OUT\n\$OUT/replace-thing/g # do replacement
p # and print the revised contents
}
' file
This is the main reason I only use sed for very simple things: once you start using the lesser-used commands, you need extensive commentary to understand the program.
Note the commented version does not work on the BSD-derived sed on MacOS -- the comments break it, but removing them is OK.
In plain bash:
pattern=$'$OUT\n$OUT\n$OUT' # using ANSI-C quotes
contents=$(< file)
echo "${contents//$pattern/replace-thing}"
And the perl one-liner:
perl -0777 -pe 's/\$OUT(\n\$OUT){2}/replace-thing/g' file
For this particular task, I recommend using awk instead (hope that's an option too).
Update: to replace all 3 $OUT lines, use the following (thanks to @thanasisp and @glenn jackman):
cat input.txt | awk '
BEGIN {
    i = 0
    p = "$OUT"           # pattern to match
    n = 3                # N matches
    r = "replace-thing"
}
$0 == p {
    ++i
    if (i == n) {
        print(r)
        i = 0            # reset counter (optional)
    }
}
$0 != p {
    i = 0
    print($0)
}'
If you just want to replace the 3rd $OUT occurrence, use:
cat input.txt | awk '
BEGIN {
    i = 0
    p = "\\$OUT"         # pattern to match
    n = 3                # Nth match
    r = "replace-thing"
}
$0 ~ p {
    ++i
    if (i == n) {
        print(r)
    }
}
i != n || $0 !~ p {
    print($0)
}'
This might work for you (GNU sed):
sed -E ':a;N;s/[^\n]*/&/3;Ta;/^(\$OUT\n?){3}$/d;P;D' file
Gather up 3 lines in the pattern space and if those 3 lines each contain $OUT, delete them. Otherwise, print/delete the first line and repeat.
log.txt looks like the example below: ID data, each with its own timestamp (detection_time), continuously appended to the log.txt file. The ID is an unpredictable number; it could be anything from 0000-9999, and the same ID can appear in log.txt again.
My goal is to use a shell script to filter the IDs that appear again in log.txt within 15 seconds of their first appearance. Can anyone help me with this?
ID = 4231
detection_time = 1595556730
ID = 3661
detection_time = 1595556731
ID = 2654
detection_time = 1595556732
ID = 3661
detection_time = 1595556733
To be more clear: in log.txt above, the ID 3661 first appears at time 1595556731 and then appears again at 1595556733, which is just 2 seconds after the first appearance. So it matches my condition of an ID appearing again within 15 seconds, and I would like this ID 3661 to be filtered out by my shell script.
The output after running the shell script will be
ID = 3661
My problem is that I don't know how to develop the algorithm in a shell script.
Here's what I tried, using ID_new and ID_previous variables, but ID_previous=$(ID_new) and detection_previous=$(detection_new) are not working:
input="/tmp/log.txt"
ID_previous=""
detection_previous=""
while IFS= read -r line
do
    ID_new=$(echo "$line" | grep "ID =" | awk -F " " '{print $3}')
    echo $ID_new
    detection_new=$(echo "$line" | grep "detection_time =" | awk -F " " '{print $3}')
    echo $detection_new
    ID_previous=$(ID_new)
    detection_previous=$(detection_new)
done < "$input"
EDIT
In log.txt the data actually comes in sets containing ID, detection_time, Age, and Height. Sorry for not mentioning this in the first place.
ID = 4231
detection_time = 1595556730
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735
Age = 24
Height = 184
I've tried the Awk solution. The result is
4231 3661 2654 3852 4231
which are all the IDs in log.txt. The correct output should be 4231 3661.
From this, I think the Age and Height data might affect the Awk solution, because they are inserted between the fields of interest, ID and detection_time.
Assuming the time stamps in the log file are increasing monotonically, you only need a single pass with Awk. For each id, keep track of the latest time it was reported (use an associative array t where the key is the id and the value is the latest timestamp). If you see the same id again and the difference between the time stamps is less than 15, report it.
For good measure, keep a second array p of the ones we have already reported so we don't report them twice.
awk '/^ID = / { id=$3; next }
# Skip if this line is neither ID nor detection_time
!/^detection_time = / { next }
(id in t) && (t[id] >= $3-15) && !(p[id]) { print id; ++p[id]; next }
{ t[id] = $3 }' /tmp/log.txt
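For reference, if the program above is saved as filter_repeats.awk (a name chosen here just for illustration), running it against the sample log from the edit should print the repeated IDs in the order their repeats are seen:
$ awk -f filter_repeats.awk /tmp/log.txt
3661
4231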
If you really insist on doing this natively in Bash, I would refactor your attempt to
declare -A dtime printed
while read -r field _ value
do
case $field in
ID) id=$value;;
detection_time)
if [[ dtime["$id"] -ge $((value - 15)) ]]; then
[[ -v printed["$id"] ]] || echo "$id"
printed["$id"]=1
fi
dtime["$id"]=$value ;;
esac
done < /tmp/log.txt
Notice how read -r can easily split a line on whitespace just as well as Awk can, as long as you know how many fields to expect. But a while read -r loop is typically an order of magnitude slower than Awk, and you'll have to agree that the Awk attempt is more succinct and elegant, as well as portable to older systems.
(Associative arrays were introduced in Bash 4.)
Tangentially, anything that looks like grep 'x' | awk '{ y }' can be refactored to awk '/x/ { y }'; see also useless use of grep.
Also, notice that $(foo) attempts to run foo as a command. To simply refer to the value of the variable foo, the syntax is $foo (or, optionally, ${foo}, but the braces add no value here). Usually you will want to double-quote the expansion "$foo"; see also When to wrap quotes around a shell variable
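A quick illustration of the difference (using a throwaway variable name):
foo=hello
echo "$foo"       # variable expansion: prints "hello"
echo "$(foo)"     # command substitution: tries to run a command named foo and fails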
Your script would only remember a single earlier event; the associative array allows us to remember all the ID values we have seen previously (until we run out of memory).
Nothing prevents us from using human-readable variable names in Awk either; feel free to substitute printed for p and dtime for t to have complete parity with the Bash alternative.
Hopefully someone out there in the world can help me, and anyone else with a similar problem, find a simple solution to capturing data. I have spent hours trying to write a one-liner to solve something I thought was a simple problem involving awk, a CSV file, and saving the output as a bash variable. In short, here's the nut...
The Missions:
1) To output every other column, starting from the LAST COLUMN, with a specific iteration count.
2) To output every other column, starting from NEXT TO LAST COLUMN, with a specific iteration count.
The Data (file.csv):
#12#SayWhat#2#4#2.25#3#1.5#1#1#1#3.25
#7#Smarty#9#6#5.25#5#4#4#3#2#3.25
#4#IfYouLike#4#1#.2#1#.5#2#1#3#3.75
#3#LaughingHard#8#8#13.75#8#13#6#8.5#4#6
#10#AtFunny#1#3#.2#2#.5#3#3#5#6.5
#8#PunchLines#7#7#10.25#7#10.5#8#11#6#12.75
Desired results for Mission 1:
2#2.25#1.5#1#3.25
9#5.25#4#3#3.25
4#.2#.5#1#3.75
8#13.75#13#8.5#6
1#.2#.5#3#6.5
7#10.25#10.5#11#12.75
Desired results for Mission 2:
SayWhat#4#3#1#1
Smarty#6#5#4#2
IfYouLike#1#1#2#3
LaughingHard#8#8#6#4
AtFunny#3#2#3#5
PunchLines#7#7#8#6
My Attempts:
The closest I have come to solving any of the above problems is an ugly pipe (which is OK for skinning a cat) for Mission 1. However, it doesn't use any declared iteration count (which should be 5). Also, I'm completely lost on solving Mission 2.
Any help simplifying the below and solving Mission 2 will be HELLA appreciated!
outcome=$( awk 'BEGIN {FS = "#"} {for (i = 0; i <= NF; i += 2) printf ("%s%c", $(NF-i), i + 2 <= NF ? "#" : "\n");}' file.csv | sed 's/##.*//g' | awk -F# '{for (i=NF;i>0;i--){printf $i"#"};printf "\n"}' | sed 's/#$//g' | awk -F# '{$1="";print $0}' OFS=# | sed 's/^#//g' );
Also, if doing a loop for a specific number of iterations is helpful in solving this problem, the magic number is 5. Maybe a solution could be a for loop that counts from right to left, treating every other column as one iteration, with the starting column declared as an awk variable (just a thought; I have no way of knowing how to do it).
Thank you for looking over this problem.
There are certainly more elegant ways to do this, but I am not really an awk person:
Part 1:
awk -F# '{ x = ""; for (f = NF; f > (NF - 5 * 2); f -= 2) { x = x ? $f "#" x : $f ; } print x }' file.csv
Output:
2#2.25#1.5#1#3.25
9#5.25#4#3#3.25
4#.2#.5#1#3.75
8#13.75#13#8.5#6
1#.2#.5#3#6.5
7#10.25#10.5#11#12.75
Part 2:
awk -F# '{ x = ""; for (f = NF - 1; f > (NF - 5 * 2); f -= 2) { x = x ? $f "#" x : $f ; } print x }' file.csv
Output:
SayWhat#4#3#1#1
Smarty#6#5#4#2
IfYouLike#1#1#2#3
LaughingHard#8#8#6#4
AtFunny#3#2#3#5
PunchLines#7#7#8#6
The literal 5 in each of those is your "number of iterations."
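If you would rather pass that count in from the shell than hard-code it, the 5 can become an awk variable (shown here for Mission 1; for Mission 2 start the loop at NF - 1):
awk -F# -v n=5 '{ x = ""; for (f = NF; f > (NF - n * 2); f -= 2) x = x ? $f "#" x : $f; print x }' file.csv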
Sample data:
$ cat mission.dat
#12#SayWhat#2#4#2.25#3#1.5#1#1#1#3.25
#7#Smarty#9#6#5.25#5#4#4#3#2#3.25
#4#IfYouLike#4#1#.2#1#.5#2#1#3#3.75
#3#LaughingHard#8#8#13.75#8#13#6#8.5#4#6
#10#AtFunny#1#3#.2#2#.5#3#3#5#6.5
#8#PunchLines#7#7#10.25#7#10.5#8#11#6#12.75
One awk solution:
NOTE: OP can add logic to validate the input parameters.
$ cat mission
#!/bin/bash
# format: mission { 1 | 2 } { number_of_fields_to_display }
mission=${1} # assumes user inputs "1" or "2"
offset=$(( mission - 1 )) # subtract one to determine awk/NF offset
iteration_count=${2} # assume for now this is a positive integer
awk -F"#" -v offset=${offset} -v itcnt=${iteration_count} 'BEGIN { OFS=FS }
{ # we will start by counting fields backwards until we run out of fields
# or we hit "itcnt==iteration_count" fields
loopcnt=0
for (i=NF-offset ; i>=0; i-=2) # offset=0 for mission=1; offset=1 for mission=2
{ loopcnt++
if (loopcnt > itcnt)
break
fstart=i # keep track of the field we want to start with
}
# now printing our fields starting with field # "fstart";
# prefix the first printf with a empty string, then each successive
# field is prefixed with OFS=#
pfx = ""
for (i=fstart; i<= NF-offset; i+=2)
{ printf "%s%s",pfx,$i
pfx=OFS
}
# terminate a line of output with a linefeed
printf "\n"
}
' mission.dat
Some test runs:
###### mission #1
# with offset/iteration = 4
$ mission 1 4
2.25#1.5#1#3.25
5.25#4#3#3.25
.2#.5#1#3.75
13.75#13#8.5#6
.2#.5#3#6.5
10.25#10.5#11#12.75
#with offset/iteration = 5
$ mission 1 5
2#2.25#1.5#1#3.25
9#5.25#4#3#3.25
4#.2#.5#1#3.75
8#13.75#13#8.5#6
1#.2#.5#3#6.5
7#10.25#10.5#11#12.75
# with offset/iteration = 6
$ mission 1 6
12#2#2.25#1.5#1#3.25
7#9#5.25#4#3#3.25
4#4#.2#.5#1#3.75
3#8#13.75#13#8.5#6
10#1#.2#.5#3#6.5
8#7#10.25#10.5#11#12.75
###### mission #2
# with offset/iteration = 4
$ mission 2 4
4#3#1#1
6#5#4#2
1#1#2#3
8#8#6#4
3#2#3#5
7#7#8#6
# with offset/iteration = 5
$ mission 2 5
SayWhat#4#3#1#1
Smarty#6#5#4#2
IfYouLike#1#1#2#3
LaughingHard#8#8#6#4
AtFunny#3#2#3#5
PunchLines#7#7#8#6
# with offset/iteration = 6;
# notice we pick up field #1 = empty string so output starts with a '#'
$ mission 2 6
#SayWhat#4#3#1#1
#Smarty#6#5#4#2
#IfYouLike#1#1#2#3
#LaughingHard#8#8#6#4
#AtFunny#3#2#3#5
#PunchLines#7#7#8#6
This is probably not what you're asking, but perhaps it will give you an idea.
$ awk -F_ -v skip=4 -v endoff=0 '
BEGIN {OFS=FS}
{offset=(NF-endoff)%skip;
for(i=offset;i<=NF-endoff;i+=skip) printf "%s",$i (i>=(NF-endoff)?ORS:OFS)}' file
112_116_120
122_126_130
132_136_140
142_146_150
You specify the column stride (skip) and the end offset as input variables. Here, the end offset is set to zero (count from the last column) and the stride is 4.
For clarity I used the input file
$ cat file
_111_112_113_114_115_116_117_118_119_120
_121_122_123_124_125_126_127_128_129_130
_131_132_133_134_135_136_137_138_139_140
_141_142_143_144_145_146_147_148_149_150
changing FS for your format should work.
In an attempt to debug Apache on a very busy server, we have used strace to log all our processes. Now I have thousands of individual straces in a folder and I need to find the ones that contain a value of 1.0 or greater. This is the command we used to generate the straces:
mkdir /strace; ps auxw | grep httpd | awk '{print"-p " $2}' | xargs strace -o /strace/strace.log -ff -s4096 -r
This has generated files with the name strace.log.29382 (Where 29382 is the PID of the process).
Now, if I run this command:
for i in `ls /strace/*`; do echo $i; cat $i | cut -c6-12 | sort -rn | head -c 8; done
it will output the filename and top runtime value. i.e.
/strace/strace.log.19125
0.13908
/strace/strace.log.19126
0.07093
/strace/strace.log.19127
0.09312
What I am looking for is only to output those with a value of 1.0 or greater.
Sample data: https://pastebin.com/Se89Jt1i
This data does not contain anything 1.0 or above, but it's the first set of numbers I'm trying to filter against.
What I do not want to have show up
0.169598 close(85) = 0
What I do want to find
1.202650 accept4(3, {sa_family=AF_INET, sin_port=htons(4557), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, [16], SOCK_CLOEXEC) = 85
My cat sorts the values so the highest value in the file is always first.
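In other words, building on the same cut/sort pipeline, the goal is something along these lines (a rough sketch; the -c6-12 column range is reused from above and may need adjusting):
# print only the strace logs whose largest relative timestamp is >= 1.0
for f in /strace/strace.log.*; do
    top=$(cut -c6-12 "$f" | sort -rn | head -n1)
    # bash cannot compare floats, so let awk do the comparison
    if awk -v t="$top" 'BEGIN { exit !(t >= 1.0) }'; then
        echo "$f"
        echo "$top"
    fi
done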
As I am more used to Perl, here is a solution in Perl, which should be possible to translate to awk.
One-liner
perl -ane 'BEGIN{@ARGV=</strace/*>}$max=$F[0]if$F[0]>$max;if(eof){push@A,$ARGV if$max>1;$max=0};END{print"$_\n"for@A}'
No need to sort the files to get the maximum value; it is just stored in a variable. The part which can be interesting to modify to get more information:
push@A,$ARGV
can be changed to
push@A,"$ARGV:$max"
to get the value.
How it works:
-a flag: from perl -h: autosplit mode with -n or -p (splits $_ into @F), by default delimited by one or more spaces.
BEGIN{} and END{} blocks are executed at the beginning and the end; the part which is not in those blocks is executed for each line, as with awk.
</strace/*> is a glob which gives a list of files
@ARGV is a special array which contains the command line arguments (here, the list of files to process)
eof is a function which returns true when the current line is the last of the current file
$ARGV is the current file name
push appends elements to an array
The script version with warnings, which are useful for fixing bugs:
#!/usr/bin/perl
use strict;
use warnings;

sub BEGIN {
    use File::Glob ();
    @ARGV = glob('/strace/*');
}
my (@A, @F);
my $max = 0;
while (defined($_ = readline ARGV)) {
    @F = split(' ', $_, 0);
    $max = $F[0] if $F[0] > $max;
    if (eof) {
        push @A, "${ARGV}:$max" if $max > 1;
        $max = 0;
    }
}
print "$_\n" foreach (@A);
I have thousands of text files that I have imported that contain a piece of text that I would like to remove.
It is not just a block of text but a pattern.
<!--
# Translator(s):
#
# username1 <email1>
# username2 <email2>
# usernameN <emailN>
#
-->
If the block appears, it will have one or more users listed with their email addresses.
I have another small awk program that accomplishes the task in very few lines of code. It can be used to remove patterns of text from a file; both the start and the stop regexp can be set.
# This block is a range pattern and captures all lines between( and including )
# the start '<!--' to the end '-->' and stores the content in record $0.
# Record $0 contains every line in the range pattern.
# awk -f remove_email.awk yourfile
# The if statement is not needed to accomplish the task, but may be useful.
# It says - if the range patterns in $0 contains a '#' then it will print
# the string "Found an email..." if uncommented.
# The 'next' command discards the content of the current record and moves
# on to the next record; the awk program then starts over from the top
# for that record.
/<!--/, /-->/ {
#if( $0 ~ /#/ ){
# print "Found an email and removed that!"
#}
next
}
# This line prints the body of the file to standard output - if not captured in
# the block above.
1 {
print
}
Save the code in 'remove_email.awk' and run it by:
awk -f remove_email.awk yourfile
This sed solution might work:
sed '/^<!--/,/^-->/{/^<!--/{h;d};H;/^-->/{x;/^<!--\n# Translator(s):\n#\(\n# [^<]*<email[0-9]\+>\)\+\n#\n-->$/!p};d}' file
An alternative (perhaps better solution?):
sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' file
This gathers up the lines that start with <!-- and end with -->, then pattern-matches on the collection: the second line is # Translator(s):, the third line is #, the fourth and perhaps more lines follow # username <email address>, the penultimate line is #, and the last line is -->. If a match is made, the entire collection is deleted; otherwise it is printed as normal.
For this task you need look-ahead, which is normally done with a parser.
Another solution, though not very efficient, would be:
sed "s/-->/&\n/;s/<!--/\n&/" file | awk 'BEGIN {RS = "";FS = "\n"}/username/{print}'
HTH Chris
perl -i.orig -00 -pe 's/<!--\s+#\s*Translator.*?\s-->//gs' file1 file2 file3
Here is my solution, if I understood your problem correctly. Save the following to a file called remove_blocks.awk:
# See the beginning of the block, mark it
/<!--/ {
    state = "block_started"
}

# At the end of the block, if the block does not contain an email, print
# out the whole block.
/^-->/ {
    if (!block_contains_user_email) {
        for (i = 0; i < count; i++) {
            print saved_line[i]
        }
        print
    }
    count = 0
    block_contains_user_email = 0
    state = ""
    next
}

# Inside a block: save the lines and wait until the end of the block
# to decide if we should print it out
state == "block_started" {
    saved_line[count++] = $0
    if (NF >= 3 && $3 ~ /@/) {
        block_contains_user_email = 1
    }
    next
}

# For everything else, print the line
1
Assume that your text file is in data.txt (or many files, for that matter):
awk -f remove_blocks.awk data.txt
The above command will print out everything in the text file, minus the blocks which contain user email.
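If you need to rewrite the files on disk rather than print to standard output, one cautious way to apply it across many files (the glob below is a placeholder for wherever your imported files live) is to write to a temporary file and move it back:
for f in /path/to/files/*; do
    awk -f remove_blocks.awk "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done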