I have the task below to achieve. I completed it using awk/sed/tail/grep together, but I believe it's doable using awk alone, so I'm asking for your kind help:
What will be awk syntax for -
Get last line from file A (csv format)
LAST=$(tail -n1 A)
Check if the line from file A exists in file B (csv as well); if yes...
NO=$(grep -nw "$LAST" B | awk -F: '{print $1}')
Check if there are newer lines in file B, if yes...
BELOW=$(expr $NO + 1)
if awk "NR==$BELOW" B; then
Delete everything in file B from $NO to the 2nd row
sed -i "2,$NO d" B; fi
BIG THANKS for any help - appreciated!
Something like this might work:
awk 'FNR == NR { last = $0; next }
!newer && $0 == last { newer = 1; next }
newer || FNR == 1' A B
Basically FNR == NR { last = $0; next } sets last to each line as long as we are in the first file, so in the end it will be the first file's last line.
In the second file, if we are not yet in the newer lines and the current line equals last, everything after it is newer than what was in the first file: !newer && $0 == last { newer = 1; next }.
And when we are either in the first line or the newer lines of the second file, it is printed: newer || FNR == 1.
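As a quick sanity check, here is the script run against two hypothetical sample files (names and contents invented for illustration), where A's last line appears in B followed by one newer line:

```shell
# Hypothetical sample data: A's last line "2,b" appears in B,
# with one newer line ("3,c") after it.
printf '1,a\n2,b\n' > A
printf 'header\n1,a\n2,b\n3,c\n' > B

# Prints B's header line plus every line of B after the match.
awk 'FNR == NR { last = $0; next }
     !newer && $0 == last { newer = 1; next }
     newer || FNR == 1' A B
# → header
# → 3,c
```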
This differs from the original in that it prints out the newer lines of B, instead of modifying B in place. Of course you can redirect the output to a temporary file in the shell and then move it over B if it contains more than one line. Or have Awk return an exit status and use that, e.g.,:
tmpf=`mktemp B'.XXXXXX'`
awk 'FNR == NR { last = $0; next }
!newer && $0 == last { newer = NR; next }
newer || FNR == 1 { print }
END { exit (!newer || NR == newer) }' \
A B >"$tmpf" && mv "$tmpf" B || rm -f "$tmpf"
Admittedly not entirely in Awk anymore, but I'd say close enough and better in practice.
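An end-to-end sketch of the temp-file variant, again with invented sample CSVs; on success, B is replaced by its header plus the newer lines:

```shell
printf '1,a\n2,b\n' > A
printf 'header\n1,a\n2,b\n3,c\n' > B

tmpf=`mktemp B'.XXXXXX'`
awk 'FNR == NR { last = $0; next }
     !newer && $0 == last { newer = NR; next }
     newer || FNR == 1 { print }
     END { exit (!newer || NR == newer) }' \
    A B >"$tmpf" && mv "$tmpf" B || rm -f "$tmpf"

cat B
# → header
# → 3,c
```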
I have the following file:
1,A
2,B
3,C
10000,D
20,E
4000,F
I want to select the lines having a count greater than 10 and less than 5000. The output should be E and F. In C++ or any other language it is a piece of cake. I really wanted to know how I can do it with a Linux command.
I tried the following command
awk -F ',' '{$1 >= 10 && $1 < 5000} { count++ } END { print $1,$2}' test.txt
But it is only giving 4000,F.
just do:
awk -F',' '$1 >= 10 && $1 < 5000' test.txt
You put a boolean check inside {....} and don't use the result at all, which doesn't make any sense. You should write either {if(...) ...} or booleanExpression{do...}
The count++ is useless.
You have only a print statement in END, so only the last line was printed out.
Your script actually does this:
print the last line of test.txt, no matter what it is.
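Running the corrected one-liner on the sample file confirms the expected output:

```shell
printf '1,A\n2,B\n3,C\n10000,D\n20,E\n4000,F\n' > test.txt
awk -F',' '$1 >= 10 && $1 < 5000' test.txt
# → 20,E
# → 4000,F
```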
I am trying to write an awk script that, before anything is done, tells the user how many lines are in the file. I know how to do this in the END section but have been unable to do so in the BEGIN section. I have searched SE and Google but have only found a half dozen ways to do this in the END section or as part of a bash script, not how to do it before any processing has taken place at all. I was hoping for something like the following:
#!/usr/bin/awk -f
BEGIN{
print "There are a total of " **TOTAL LINES** " lines in this file.\n"
}
{
if($0==4587){print "Found record on line number "NR; exit 0;}
}
But have been unable to determine how to do this, if it is even possible. Thanks.
You can read the file twice:
awk 'NR!=1 && FNR==1 {print NR-1} <some more code here>' file{,}
In your example:
awk 'NR!=1 && FNR==1 {print "There are a total of "NR-1" lines in this file.\n"} $0==4587 {print "Found record on line number "NR; exit 0;}' file{,}
You can use file file instead of file{,} (the brace expansion just makes the filename show up twice).
NR!=1 && FNR==1 will be true only on the first line of the second file.
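A minimal sketch of just the counting idiom (file name and contents invented here); passing the file twice means NR keeps growing while FNR restarts:

```shell
printf 'a\nb\nc\n' > demo.txt
# On the first line of the second pass, NR-1 is the total line count.
awk 'NR!=1 && FNR==1 { print "lines:", NR-1 }' demo.txt demo.txt
# → lines: 3
```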
To use an awk script containing:
#!/usr/bin/awk -f
NR!=1 && FNR==1 {
print "There are a total of "NR-1" lines in this file.\n"
}
$0==4587 {
print "Found record on line number "NR; exit 0
}
call:
awk -f myscript file{,}
To do this robustly and for multiple files you need something like:
$ cat tst.awk
BEGINFILE {
numLines = 0
while ( (getline line < FILENAME) > 0 ) {
numLines++
}
print "----\nThere are a total of", numLines, "lines in", FILENAME
}
$0==4587 { print "Found record on line number", FNR, "of", FILENAME; nextfile }
$
$ cat file1
a
4587
c
$
$ cat file2
$
$ cat file3
d
e
f
4587
$
$ awk -f tst.awk file1 file2 file3
----
There are a total of 3 lines in file1
Found record on line number 2 of file1
----
There are a total of 0 lines in file2
----
There are a total of 4 lines in file3
Found record on line number 4 of file3
The above uses GNU awk for BEGINFILE. Any other solution is difficult to implement such that it will handle empty files (you need an array to track the files being parsed and print the info in the FNR==1 and END sections after the empty file has been skipped).
Using getline has caveats and should not be used lightly, see http://awk.info/?tip/getline, but this is one of the appropriate and robust uses of it. You can also test for non-readable files in BEGINFILE by testing ERRNO and skipping the file (see the gawk manual) - that situation will cause other scripts to abort.
BEGIN {
s="cat your_file.txt|wc -l";
s | getline file_size;
close(s);
print file_size
}
This will put the size of the file named your_file.txt into the awk variable file_size and print it out.
If your file name is dynamic you can pass the filename on the commandline and change the script to use the variable.
E.g. my.awk
BEGIN {
s="cat "VAR"|wc -l";
s | getline file_size;
close(s);
print file_size
}
Then you can call it like this:
awk -v VAR="your_file.txt" -f my.awk
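For example (file name invented; note that wc -l's output may be space-padded on some systems, so adding 0 normalizes it to a plain number):

```shell
printf 'x\ny\nz\n' > your_file.txt
awk -v VAR="your_file.txt" 'BEGIN {
  s = "cat " VAR " | wc -l"
  s | getline file_size
  close(s)
  print file_size + 0   # + 0 strips any leading whitespace from wc output
}'
# → 3
```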
If you use GNU awk and need a robust, generic solution that accommodates multiple, possibly empty input files, use Ed Morton's solution.
This answer uses portable (POSIX-compliant) code. Within the constraints noted, it is robust, but Ed's GNU awk solution is both simpler and more robust.
Tip of the hat to Ed Morton for his help.
With a single input file, it is simpler to handle line counting with a shell command in the BEGIN block, which has the following advantages:
on invocation, the filename doesn't have to be specified twice, unlike in the accepted answer
the solution also works with an empty input file
Also note that the accepted answer doesn't work as intended (as of this writing); the correct form is (see the comments on that answer for an explanation):
awk 'NR==FNR {next} FNR==1 {print NR-1} $0==4587 {print "Found record on line number "NR; exit 0}' file{,}
In terms of performance, this approach is either only slightly slower than reading the file twice in awk, or even a little faster, depending on the awk implementation used:
awk '
BEGIN {
# Execute a shell command to count the lines and read
# result into an awk variable via <cmd> | getline <varname>.
# If the file cannot be read, abort. (The shell has already printed an error msg.)
cmd="wc -l < \"" ARGV[1] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
printf "There are a total of %s lines in this file.\n\n", count
}
$0==4587 { print "Found record on line number " NR; exit 0 }
' file
Assumptions:
The filename is passed as the 1st operand (non-option argument) on the command line, accessed as ARGV[1].
The filename doesn't contain embedded " chars.
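A quick run of the above with an invented three-line sample file:

```shell
printf 'a\n4587\nc\n' > file
awk '
BEGIN {
  # Count the lines via the shell before any processing.
  cmd="wc -l < \"" ARGV[1] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
  printf "There are a total of %s lines in this file.\n\n", count
}
$0==4587 { print "Found record on line number " NR; exit 0 }
' file
```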
The following solutions deal with multiple files and make analogous assumptions:
All operands passed are filenames. That is, all arguments after the program must be filenames, and not variable assignments such as var=value.
No filename contains embedded " chars.
No processing is to take place if any of the input files do not exist or cannot be read.
It's not hard to generalize this to handling multiple files, but the following solution doesn't print the line count for empty files:
awk '
BEGIN {
# Loop over all input files and store their line counts in an array.
for (i=1; i<ARGC; ++i) {
cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
counts[ARGV[i]] = count
}
}
# At the beginning of every (non-empty) file, print the line count.
FNR==1 { printf "There are a total of %s lines in file %s.\n\n", counts[FILENAME], FILENAME }
# $0==4587 { printf "%s: Found record on line number %d\n", FILENAME, NR; exit 0 }
' file1 file2 # ...
Things get a little trickier if you want the line count to be printed for empty files also:
awk '
BEGIN {
# Loop over all input files and store their line counts in an array.
for (i=1; i<ARGC; ++i) {
cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
counts[ARGV[i]] = count
}
fileCount = ARGC - 1
fmtStringCount = "There are a total of %s lines in file %s.\n\n"
}
# At the beginning of every (non-empty) file, print the line count.
FNR==1 {
++fileIndex
# If there were intervening empty files, print their counts too.
while (ARGV[fileIndex] != FILENAME) {
printf fmtStringCount, 0, ARGV[fileIndex++]
}
printf fmtStringCount, counts[FILENAME], FILENAME
}
# Process input lines
$0==4587 { printf "%s: Found record on line number %d\n", FILENAME, NR; exit 0 }
# If there are any remaining empty files at the end, print their counts too.
END {
while (fileIndex < fileCount) { printf fmtStringCount, 0, ARGV[++fileIndex] }
}
' file1 file2 # ...
You can get the number of lines with wc and cut, set it as an awk variable with the -v option, and then use the variable in the awk script.
cat awk.txt \
| awk -v FNC=`wc -l awk.txt | cut -wf 2` \
'BEGIN { print "FNC: " FNC } { print $0 }'
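Note that cut -w is BSD-specific (GNU cut has no -w); a portable sketch of the same idea reads the count with a plain shell command substitution instead (file contents invented):

```shell
printf 'a\nb\nc\n' > awk.txt
# wc -l < file prints only the count (possibly space-padded);
# FNC+0 normalizes it to a plain number.
awk -v FNC="$(wc -l < awk.txt)" \
    'BEGIN { print "FNC: " FNC+0 } { print $0 }' awk.txt
# → FNC: 3
# → a
# → b
# → c
```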
I have a column
1
1
1
2
2
2
I would like to insert a blank line when the value in the column changes:
1
1
1
<- blank line
2
2
2
I would recommend using awk:
awk -v i=1 'NR>1 && $i!=p { print "" }{ p=$i } 1' file
On any line after the first, if the value of the "i"th column is different from the previous value, print a blank line. Always set the value of p. The 1 at the end evaluates to true, which means that awk prints the line. i can be set to the column number of your choice.
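For example, with the sample column in a file (name invented):

```shell
printf '1\n1\n1\n2\n2\n2\n' > col.txt
# Prints the three 1s, a blank line, then the three 2s.
awk -v i=1 'NR>1 && $i!=p { print "" }{ p=$i } 1' col.txt
```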
while read L; do [[ "$L" != "$PL" && "$PL" != "" ]] && echo; echo "$L"; PL="$L"; done < file
awk(1) seems like the obvious answer to this problem:
#!/usr/bin/awk -f
BEGIN { prev = "" }
/./ {
if (prev != "" && prev != $1) print ""
print
prev = $1
}
You can also do this with SED:
sed '{N;s/^\(.*\)\n\1$/\1\n\1/;tx;P;s/^.*\n/\n/;P;D;:x;P;D}'
The long version with explanations is:
sed '{
N # read second line; (terminate if there are no more lines)
s/^\(.*\)\n\1$/\1\n\1/ # try to replace two identical lines with themselves
tx # if replacement succeeded then goto label x
P # print the first line
s/^.*\n/\n/ # replace first line by empty line
P # print this empty line
D # delete empty line and proceed with input
:x # label x
P # print first line
D # delete first line and proceed with input
}'
One thing I like about using (GNU) sed (though whether that is useful to you is not clear from your question) is that you can easily apply changes in place with the -i switch, e.g.
sed -i '{N;s/^\(.*\)\n\1$/\1\n\1/;tx;P;s/^.*\n/\n/;P;D;:x;P;D}' FILE
You could use the getline function in Awk to match the current line against the following line:
awk '{f=$1; print; getline}f != $1{print ""}1' file
Hi, I am looking for an awk command that can find two patterns and print the data between them to a file, but only if a third pattern appears in the middle.
for example:
Start
1
2
middle
3
End
Start
1
2
End
And the output will be:
Start
1
2
middle
3
End
I found awk '/patterns1/, /patterns2/' path > text.txt on the web, but I need only the output that has the third pattern in the middle.
And here is a solution without flags:
$ awk 'BEGIN{RS="End"}/middle/{printf "%s", $0; print RT}' file
Start
1
2
middle
3
End
Explanation: The RS variable is the record separator, so we set it to "End", so that each record is separated by "End".
Then we filter the records that contain "middle" with the /middle/ pattern, and for the matched records we print the current record with $0 and the separator with print RT.
This awk should work:
awk '$1=="Start"{ok++} ok>0{a[b++]=$0} $1=="middle"{ok++} $1=="End"{if(ok>1) for(i=0; i<length(a); i++) print a[i]; ok=0;b=0;delete a}' file
Start
1
2
middle
3
End
Expanded:
awk '$1 == "Start" {
ok++
}
ok > 0 {
a[b++] = $0
}
$1 == "middle" {
ok++
}
$1 == "End" {
if (ok > 1)
for (i=0; i<length(a); i++)
print a[i];
ok=0;
b=0;
delete a
}' file
Just use some flags with awk:
/Start/ {
start_flag=1
}
/middle/ {
mid_flag=1
}
start_flag {
n=NR;
lines[NR]=$0
}
/End/ {
if (start_flag && mid_flag)
for(i=n;i<NR;i++)
print lines[i]
start_flag=mid_flag=0
delete lines
}
A modified version of user000001's awk:
awk '/middle/{printf "%s%s\n",$0,RT}' RS="End" file
EDIT:
Added test for Start tag
awk '/Start/ && /middle/{printf "%s%s\n",$0,RT}' RS="End" file
This will work with any modern awk:
awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' file
The solutions that set RS to "End" are gawk-specific, which may be fine but it's definitely worth mentioning.
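A quick check of the portable version against the sample input (file name invented):

```shell
printf 'Start\n1\n2\nmiddle\n3\nEnd\nStart\n1\n2\nEnd\n' > blocks.txt
# Only the first Start...End block contains "middle", so only it is printed.
awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' blocks.txt
```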
I have a huge text file on Linux, approx. 400,000 lines, 80 characters wide.
I need to "unfold" the file, merging four lines into one,
ending up with 1/4 of the lines, each line 80*4 characters long.
Any suggestions?
perl -pe 'chomp if (++$i % 4);'
An easier way to do it with awk would be:
awk '{ printf "%s", $0 } (NR % 4 == 0) { print }' filename
Although if you wanted to protect against ending up without a trailing newline it gets a little more complicated:
awk '{ printf "%s", $0 } (NR % 4 == 0) { print } END { if (NR % 4 != 0) print }' filename
I hope I understood your question correctly. You have an input file like this (except your lines are longer):
abcdef
ghijkl
mnopqr
stuvwx
yz0123
456789
ABCDEF
You want output like this:
abcdefghijklmnopqrstuvwx
yz0123456789ABCDEF
The following awk program should do it:
{ line = line $0 }
(NR % 4) == 0 { print line; line = "" }
END { if (line != "") print line }
Run it like this:
awk -f merge.awk data.txt
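With data.txt holding the sample lines above, the run looks like this (one-liner form of the same program shown for easy testing):

```shell
printf 'abcdef\nghijkl\nmnopqr\nstuvwx\nyz0123\n456789\nABCDEF\n' > data.txt
awk '{ line = line $0 }
     (NR % 4) == 0 { print line; line = "" }
     END { if (line != "") print line }' data.txt
# → abcdefghijklmnopqrstuvwx
# → yz0123456789ABCDEF
```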