AWK epoch diff with current and previous line - linux

I have a file like this called Sample:-
206,,,206,14.9,0,2012/04/24 00:00:05
206,,,206,14.9,0,2012/04/24 00:00:21
205,,,205,14.9,0,2012/04/24 00:00:23
205,,,205,14.9,0,2012/04/24 00:00:29
207,,,207,14.9,0,2012/04/24 00:00:34
205,,,205,14.9,0,2012/04/24 00:00:40
204,,,204,14.9,0,2012/04/24 00:00:46
202,,,202,14.9,0,2012/04/24 00:00:52
201,,,201,14.9,0,2012/04/24 00:01:00
202,,,202,14.9,0,2012/04/24 00:01:04
And the following AWK command:-
awk -F, '{ gsub("/"," ",$7); gsub(":"," ",$7); t+=(mktime($7)-mktime(p)); printf ("%s,%s,%s\n",mktime($7),mktime(p),t); p=$7 }' Sample
Giving the following output:-
1335222005,-1,1335222006
1335222021,1335222005,1335222022
1335222023,1335222021,1335222024
1335222029,1335222023,1335222030
1335222034,1335222029,1335222035
1335222040,1335222034,1335222041
1335222046,1335222040,1335222047
1335222052,1335222046,1335222053
1335222060,1335222052,1335222061
1335222064,1335222060,1335222065
For each line, the 7th column is converted to an epoch timestamp, and the difference from the timestamp on the previous line is calculated and added to t.
On the first line, p is not yet a date, so mktime(p) returns -1, which throws out my figures.
What I want is to tell the AWK script that, when line 1 is being processed, the difference should be assumed to be 6. At the moment it is subtracting -1 from 1335222005, resulting in 1335222006.
In other words: start t at 6, then from the second line onwards work out the difference in epoch seconds from the previous line and increment t by that amount.

You just need to do something special for line 1.
awk -F, '
{gsub(/[\/:]/," ",$7); this_time = mktime($7)}
NR != 1 {t += this_time - prev_time; print this_time, prev_time, t}
{prev_time = this_time}
' Sample
Given your input data, this prints
1335240021 1335240005 16
1335240023 1335240021 18
1335240029 1335240023 24
1335240034 1335240029 29
1335240040 1335240034 35
1335240046 1335240040 41
1335240052 1335240046 47
1335240060 1335240052 55
1335240064 1335240060 59
Alternatively, a convenient way to initialize a variable is with awk's -v option:
awk -v t=6 '... same as before ...'
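Putting the two ideas together, a minimal sketch might look like the following. It assumes an awk that provides mktime() (a gawk extension, also found in some other implementations); the function name cumulative_gap and its argument order are made up for illustration. Unlike the command above, it also prints the first line, with t still at its starting value.

```shell
# Hypothetical helper: cumulative_gap START FILE
# Prints "epoch t" per line, where t starts at START and grows by the
# gap in seconds between each line and the one before it.
cumulative_gap() {
  awk -F, -v t="$1" '
    # "2012/04/24 00:00:05" -> "2012 04 24 00 00 05", the format mktime() expects
    { gsub(/[\/:]/, " ", $7); this_time = mktime($7) }
    # From the second line on there is a previous timestamp to subtract
    NR > 1 { t += this_time - prev_time }
    { print this_time, t; prev_time = this_time }
  ' "$2"
}
```

Called as cumulative_gap 6 Sample, the second column for the question's data should run 6, 22, 24, 30 and so on, ending at 65. The epoch values in the first column depend on the local time zone.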

In awk you can initialize a variable in a BEGIN block. There are also two variables that hold the current line number, FNR and NR; either is useful in your case:
BEGIN { t = 6 }
or
FNR == 1 { t = 6 }

Would using BEGIN help?
That lets you initialize the t variable to whatever you want. You also need to skip the subtraction on the first line, where p is still empty and mktime(p) returns -1. Something like
awk -F, 'BEGIN {t=6} { gsub("/"," ",$7); gsub(":"," ",$7); if (NR>1) t+=(mktime($7)-mktime(p)); printf ("%s,%s,%s\n",mktime($7),mktime(p),t); p=$7 }' Sample

Related

Shell script that prints every line between the first number (the first line) and the second number (the last line)?

The program has to be written with head and tail, but I have no clue how.
example.txt:
This is line 0
This is line 1
This is line 2
...
This is line 10
Expected result:
$ ./program.sh 1 3 example.txt
This is line 1
This is line 2
This is line 3
There also has to be an error message if the first number is greater than the second.
program.sh can be as simple as:
awk -v first="$1" -v last="$2" 'NR >= first && NR <= last' "$3"
Check from man awk:
NR: The total number of input records seen so far.
-v var=val: assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN rule of an AWK program.
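Since the question asks for head and tail specifically, here is a minimal sketch of that variant. It assumes 1-based, inclusive line numbers (the same convention as NR in the awk version); the function name print_range is made up for illustration.

```shell
# Hypothetical helper: print_range FIRST LAST FILE
# Prints lines FIRST..LAST (1-based, inclusive) using only head and tail.
print_range() {
  first=$1 last=$2 file=$3
  if [ "$first" -gt "$last" ]; then
    echo "error: first line ($first) is greater than last line ($last)" >&2
    return 1
  fi
  # head keeps lines 1..last; tail -n +first then drops everything before first
  head -n "$last" "$file" | tail -n "+$first"
}
```

In a script, the body could end with print_range "$1" "$2" "$3" so that ./program.sh 1 3 example.txt prints the first three lines of the file.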

How to write code for more than one file in awk

I wrote a script in AWK called exc7
./exc7 file1 file2
In every file there is a matrix
file1 :
2 6 7
10 5 4
3 8 4
file2:
-60 10
10 -60
The code that I wrote is :
#!/usr/bin/awk -f
{
    for (i=1;i<=NF;i++)
        A[NR,i]=$i
}
END{
    for (i=1;i<=NR;i++){
        sum += A[i,1]
    }
    for (i=1;i<=NF;i++)
        sum2 += A[1,i]
    for (i=0;i<=NF;i++)
        sum3 += A[NR,i]
    for (i=0;i<=NR;i++)
        sum4 += A[i,NF]
    print sum,sum2,sum3,sum4
    if (sum==sum2 && sum==sum3 && sum==sum4)
        print "yes"
}
It should check, for every file, whether the sums of the first column, the last column, the first row and the last row are the same. It prints the four sums and says yes if they are equal. Then it should print the largest sum of all numbers across all the files.
When I try it on one file it works; for example, on file1 it prints:
15 15 15 15
yes
But when I try it on two or more files, such as file1 file2, the output is:
-35 8 -50 -31
You should use FNR instead of NR, and with gawk you can use ENDFILE instead of END. However, this should work with any awk:
awk 'function sumline(last,rn) {n=split(last,lr);
for(i=1;i<=n;i++) rn+=lr[i];
return rn}
function printresult(c1,r1,rn,cn) {print c1,r1,rn,cn;
print (r1==rn && c1==cn && r1==c1)?"yes":"no"}
FNR==1{if(last)
{rn=sumline(last);
printresult(c1,r1,rn,cn)}
rn=cn=c1=0;
r1=sumline($0)}
{c1+=$1;cn+=$NF;last=$0}
END {rn=sumline(last);
printresult(c1,r1,rn,cn)}' file1 file2
15 15 15 15
yes
-50 -50 -50 -50
yes
Essentially, instead of checking for the end of a file, you check for the start of the next file and print out the previous file's results. The first file needs to be treated differently, and you still need the END block to handle the last file.
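To see why NR breaks the original script with two files, it may help to print both counters side by side. A small sketch (the function name show_counters is made up for illustration):

```shell
# Hypothetical helper: print file name, NR and FNR for every input line.
# NR keeps counting across files; FNR restarts at 1 for each new file.
show_counters() {
  awk '{ print FILENAME, NR, FNR }' "$@"
}
```

For the question's file1 (3 lines) and file2 (2 lines), show_counters file1 file2 should report NR running from 1 to 5 while FNR goes 1, 2, 3 and then 1, 2. That is why A[NR,i] and the loops up to NR end up mixing the two matrices together.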
UPDATE
Based on the questions you asked, I think it's better for you to keep your script as is and change the way you call it.
for file in file1 file2;
do echo "$file"; ./exc7 "$file";
done
You'll be calling the script once for each file, so all the complications go away.

How to delete first 10 lines containing certain string?

Suppose I have a file with this structure:
1001.txt
1002.txt
1003.txt
1004.txt
1005.txt
2001.txt
2002.txt
2003.txt
...
Now how can I delete the first 10 lines that start with '2'? There might be more than 10 lines starting with '2'.
I know I can use grep '^2' file | wc -l to find the number of lines that start with '2'. But how do I delete the first 10 of those lines?
You can pipe your list through this Perl one-liner:
perl -p -e '$_="" if (/^2/ and $i++ < 10)'
Another in awk. I tested with the value 2, as your data only had 3 lines of related data. Replace the 2 in ++c<=2 with 10.
$ awk '/^2/ && ++c<=2 {next} 1' file
1001.txt
1002.txt
1003.txt
1004.txt
1005.txt
2003.txt
.
.
.
Explained:
$ awk '/^2/ && ++c<=2 { # if it starts with a 2 and counter still got iterations left
next # skip to the next record
} 1 # (else) output
' file
awk alternative:
awk '{ if (substr($0,1,1)=="2") { count++ } if ( count > 10 || substr($0,1,1)!="2") { print $0 } }' filename
If the first character of the line is 2, increment a counter. Then only print the line if count is greater than 10 or the first character isn't 2.
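The counter idea generalizes to any count and prefix. A minimal sketch, assuming the prefix is a literal string rather than a regular expression (the function name drop_first_matches and its argument order are made up for illustration):

```shell
# Hypothetical helper: drop_first_matches N PREFIX FILE
# Deletes the first N lines that begin with PREFIX and prints everything else.
drop_first_matches() {
  awk -v n="$1" -v prefix="$2" '
    # index() == 1 means the line starts with prefix; skip the first n such lines
    index($0, prefix) == 1 && ++seen <= n { next }
    { print }
  ' "$3"
}
```

For the question, drop_first_matches 10 2 file would remove the first ten lines starting with 2 and keep any later ones.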

awk print output on same line

I am counting nucleotides in the contigs of a fasta file. My file looks like
>1
ATACCTACTA
ATTTACGTCA
GTA
>2
ATATTCGTAT
GTCTCGATCT
A
>3
etc.
My command is
awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0; } { seqlen += length($0)}END{print seqlen}'
The output is now like
>1
23
>2
21
How to get the output on the same line, like
>1 23
>2 21
With a few more changes (thanks to @Ed Morton):
awk '/^>/ {if(seqlen)print k,seqlen; seqlen=0; k=$0; next;} { seqlen += length($0);}END{print k,seqlen;}' filename
This one works for me:
awk '/^>/ && NR>1 { printf " %d\n", x; x=0 } /^>/ { printf "%s", $0 } !/^>/ { x += length($0) } END { printf " %d\n", x }' file
I hope it works now as expected.
Try:
awk '/^>/{printf("%s ",$0);getline;printf("%s\n",length($0))}' Input_file
This checks whether a line starts with >; if so, it prints that line, uses getline to move to the next line, and prints the length of that line followed by a newline. Note that this only counts the first sequence line after each header.
EDIT:
awk '/^>/{if(VAL){print Q OFS VAL;Q=VAL="";Q=$0;next};Q=$0;next} {VAL=VAL?VAL+length($0):length($0)} END{print Q,VAL}' Input_file
For each line starting with >: if VAL is non-empty, print Q and VAL, reset VAL, store the new header in Q and skip the remaining statements with next; otherwise just store the header in Q and skip with next. For every other line, VAL accumulates the line's length. The END block prints the final values of Q and VAL.

AWK interpretation awk -F'AUTO_INCREMENT=' 'NF==1{print "0";next}{sub(/ .*/,"",$2);print $2}'

I've been going through some simple bash scripts at work that someone else wrote months ago, and I found this line:
| awk -F'AUTO_INCREMENT=' 'NF==1{print "0";next}{sub(/ .*/,"",$2);print $2}'
Can someone help me interpret this line in simple words? Thank you!
awk -F'AUTO_INCREMENT=' ' # Set 'AUTO_INCREMENT=' as a field separator
NF==1 { # If there is only one field, i.e. the separator does not occur in the line
print "0"; # print '0'
next # Go to next record i.e. skip following code
}
{
sub(/ .*/,"",$2); # Delete anything after a space in the second field
print $2 # Print the second field
}'
Example
Sample inputs
AUTO_INCREMENT=3
some line without the separator
AUTO_INCREMENT=10 20 30 foo bar
Output
3
0
10
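To try the command on something closer to what such a script might see, here is a small sketch. The function name and the two input lines in the usage note (loosely modelled on the tail of a CREATE TABLE statement) are assumptions for illustration, not taken from the original script.

```shell
# Hypothetical wrapper around the one-liner from the question; reads stdin.
extract_auto_increment() {
  awk -F'AUTO_INCREMENT=' '
    # Only one field: the separator is absent from this line, so report 0
    NF == 1 { print "0"; next }
    # Otherwise keep the text right after the separator, up to the first space
    { sub(/ .*/, "", $2); print $2 }
  '
}
```

Feeding it the lines ") ENGINE=InnoDB AUTO_INCREMENT=42 DEFAULT CHARSET=utf8" and ") ENGINE=InnoDB DEFAULT CHARSET=utf8" should print 42 and then 0.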
