input.txt
ABC
CDE
EFG
XYZ
ABC
PQR
EFG
From the above file, I want to print the lines from each 'ABC' to the first occurrence of 'EFG' after it.
Expected output:
ABC
CDE
EFG
ABC
PQR
EFG
How can I print lines from one word to the first occurrence of a second word?
EDIT: If you want to print every block of lines from ABC to EFG and skip the other lines, try the following.
awk '/ABC/{found=1} found;/EFG/{found=""}' Input_file
Could you please try the following? It prints only the first block, from the first ABC up to the first EFG.
awk '/ABC/{flag=1} flag && !count;/EFG/{count++}' Input_file
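A minimal sketch, assuming a POSIX shell with awk and a temporary file standing in for Input_file, that runs both answers on the sample input. The count-based command keeps only the first ABC...EFG block, while the flag-reset command keeps every block.

```shell
# Recreate the sample input in a temporary file (stands in for Input_file).
tmp=$(mktemp)
printf '%s\n' ABC CDE EFG XYZ ABC PQR EFG > "$tmp"

# First block only: nothing prints once the first EFG has bumped count to 1.
first=$(awk '/ABC/{flag=1} flag && !count;/EFG/{count++}' "$tmp")

# Every block: the flag is set on ABC and cleared right after each EFG is printed.
all=$(awk '/ABC/{found=1} found;/EFG/{found=""}' "$tmp")

printf 'first block only:\n%s\n\nall blocks:\n%s\n' "$first" "$all"
rm -f "$tmp"
```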
$ awk '/ABC/,/EFG/' file
Output:
ABC
CDE
EFG
ABC
PQR
EFG
This might work for you (GNU sed):
sed -n '/ABC/{:a;N;/EFG/!ba;p}' file
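Turn off implicit printing by using the -n option.
On each ABC line, gather up the following lines (N appends the next line, and /EFG/!ba loops back to label a until EFG has been read), then print the block. Repeat.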
If you only want to print from the first occurrence of ABC to EFG, use:
sed -n '/ABC/{:a;N;/EFG/!ba;p;q}' file
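A minimal sketch, assuming GNU sed and a POSIX shell, that runs both sed commands on the sample input so the gather-and-print loop and the q variant can be compared side by side:

```shell
# Assumes GNU sed, which accepts ';' after labels and branch commands.
tmp=$(mktemp)
printf '%s\n' ABC CDE EFG XYZ ABC PQR EFG > "$tmp"

# On each ABC line, N appends the next line to the pattern space and
# /EFG/!ba branches back to label a until the gathered text contains EFG;
# p then prints the whole block (-n suppresses everything else).
every=$(sed -n '/ABC/{:a;N;/EFG/!ba;p}' "$tmp")

# Same loop, but q quits right after the first block is printed.
first=$(sed -n '/ABC/{:a;N;/EFG/!ba;p;q}' "$tmp")

printf 'every block:\n%s\n\nfirst block:\n%s\n' "$every" "$first"
rm -f "$tmp"
```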
To print the second through fourth occurrences, use:
sed -En '/ABC/{:a;N;/EFG/!ba;x;s/^/x/;/^x{2,4}$/{x;p;x};x;}' file
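Here the hold space serves as a tally: after each block is gathered, x swaps the tally into the pattern space, s/^/x/ adds one x to it, and the block is swapped back and printed only when the tally is two to four x's long. A sketch assuming GNU sed, using five hypothetical ABC...EFG blocks whose middle lines are labelled one to five:

```shell
# Assumes GNU sed (-E for {2,4}, ';' after labels). Five hypothetical blocks.
tmp=$(mktemp)
printf '%s\n' ABC one EFG ABC two EFG ABC three EFG ABC four EFG ABC five EFG > "$tmp"

# After a block is gathered: x puts the tally in the pattern space,
# s/^/x/ adds one x, and only a tally of 2-4 x's swaps the block back to print it.
# The final x restores the block and stores the updated tally in the hold space.
middle=$(sed -En '/ABC/{:a;N;/EFG/!ba;x;s/^/x/;/^x{2,4}$/{x;p;x};x;}' "$tmp")

printf '%s\n' "$middle"
rm -f "$tmp"
```

Only the blocks containing two, three and four are printed.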
How do I delete a line if it has only one string, i.e. only a first column?
abc def geh
ijk
123 xyz 345
mno
Expected output:
abc def geh
123 xyz 345
A simple awk does the job without regex:
awk 'NF > 1' file
abc def geh
123 xyz 345
This also works when a line has leading or trailing spaces, or when a line contains only whitespace.
Many options are available. One of them could be this:
grep " " myfile.txt
The output matches the expected result. This command keeps only the lines that contain at least one space.
This works only if the single-string lines have no trailing space; if they might, this one works too:
awk 'NF > 1' myfile.txt
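A minimal sketch, assuming a POSIX shell with grep and awk and some hypothetical lines with stray whitespace, contrasting the two answers: grep keeps anything containing a space, while awk counts fields and ignores leading and trailing blanks.

```shell
# Hypothetical myfile.txt contents: a single word with a trailing space,
# a whitespace-only line, and a single word with leading spaces.
tmp=$(mktemp)
printf '%s\n' 'abc def geh' 'ijk ' '   ' '123 xyz 345' '  mno' > "$tmp"

# grep keeps every line that contains a space, so all five lines survive.
by_grep=$(grep ' ' "$tmp")

# awk splits on runs of blanks and ignores leading/trailing ones,
# so only lines with two or more fields survive.
by_awk=$(awk 'NF > 1' "$tmp")

printf 'grep:\n%s\n\nawk:\n%s\n' "$by_grep" "$by_awk"
rm -f "$tmp"
```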
I have a file which looks like:
AA
2
3
4
CCC
111
222
333
XXX
12
23
34
I am looking for an awk command that searches for the string 'CCC' in the above and prints all the lines that occur after 'CCC', but stops reading as soon as it reaches 'XXX'.
A very simple command does the reading for me but does not stop at XXX:
awk '$0 == "CCC" {i=1;next};i && i++' c.out
Could you please try the following.
Solution 1: with sed.
sed -n '/CCC/,/XXX/p' Input_file
Solution 2: with awk.
awk '/CCC/{flag=1} flag; /XXX/{flag=""}' Input_file
Solution 3: If you want to print the lines between CCC and XXX but not those strings themselves, do the following.
awk '/CCC/{flag=1;next} /XXX/{flag=""} flag' Input_file
"Do something between this and that" can easily be solved with a range pattern:
awk '/CCC/,/XXX/' # prints everything between CCC and XXX (inclusive)
But that is not exactly what you asked. You wanted to print everything after CCC and quit (stop reading) at XXX. This translates to:
awk '/XXX/{exit};f;/CCC/{f=1}'
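A minimal sketch, assuming a POSIX shell with awk, comparing the range pattern, the exclusive flag version (Solution 3) and the exit version on the sample file. The last two print the same lines, but only the exit version stops reading at XXX instead of scanning the rest of the file.

```shell
tmp=$(mktemp)
printf '%s\n' AA 2 3 4 CCC 111 222 333 XXX 12 23 34 > "$tmp"

# Range pattern: CCC through XXX, both ends included.
range=$(awk '/CCC/,/XXX/' "$tmp")

# Flag that skips the markers but keeps reading to the end of the file.
between=$(awk '/CCC/{flag=1;next} /XXX/{flag=""} flag' "$tmp")

# Lines after CCC only; exit stops reading input as soon as XXX is seen.
after=$(awk '/XXX/{exit};f;/CCC/{f=1}' "$tmp")

printf 'range:\n%s\n\nbetween:\n%s\n\nafter:\n%s\n' "$range" "$between" "$after"
rm -f "$tmp"
```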
cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...
My expected result is as below:
abc bcd abc ...
abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>
efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>
hig ...
I have found a way to deal with this, but my code is a little noisy:
cat file1.txt | uniq -c | sed -e 's/ \+/ /g' -e 's/^.//g' | awk '{print $0," ",$1}'| sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' |sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!> \n/g' -e 's/[1]$//g'
abc bcd abc ...
abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>
efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>
hig ...
I was wondering if you could show me a more efficient way to achieve this. Thanks a lot.
sort + uniq + sed solution:
sort file1.txt | uniq -c | sed -E 's/^ +1 (.+)/\1\n/;
s/^ +([2-9]|[0-9]{2,}) (.+)/\2\n<!!! pay attention, the above sentence has repeated \1 times !!!>\n/'
The output:
abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, the above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, the above sentence has repeated 2 times !!!>

hig ...
Or with awk:
sort file1.txt | uniq -c | awk '{ n=$1; sub(/^ +[0-9]+ +/,"");
printf "%s\n%s",$0,(n==1? ORS:"<!!! pay attention, the above sentence has repeated "n" times !!!>\n\n") }'
$ awk '
$0==prev { cnt++; next }
{ prt(); prev=$0; cnt=1 }
END { prt() }
function prt() {
if (NR>1) print prev (cnt>1 ? ORS "repeated " cnt " times" : "") ORS
}
' file
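A minimal sketch, assuming a POSIX shell with awk, that feeds the sample file1.txt lines to the prt() version. Because print already terminates its output with ORS, the extra ORS at the end of prt() leaves a blank line after each group.

```shell
# Sample file1.txt contents; identical lines are already adjacent.
tmp=$(mktemp)
printf '%s\n' 'abc bcd abc ...' 'abcd bcde cdef ...' 'abcd bcde cdef ...' \
  'abcd bcde cdef ...' 'efg fgh ...' 'efg fgh ...' 'hig ...' > "$tmp"

out=$(awk '
  $0==prev { cnt++; next }             # same as previous line: just count it
  { prt(); prev=$0; cnt=1 }            # new line: flush the previous group
  END { prt() }                        # flush the last group
  function prt() {
    # print already ends with ORS, so the trailing ORS yields a blank line
    if (NR>1) print prev (cnt>1 ? ORS "repeated " cnt " times" : "") ORS
  }
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```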
abc bcd abc ...

abcd bcde cdef ...
repeated 3 times

efg fgh ...
repeated 2 times

hig ...
If your lines are not already grouped, you could use:
awk '
NR == FNR {count[$0]++; next}
!seen[$0]++ {
print
if (count[$0] > 1)
print "... repeated", count[$0], "times"
}
' file1.txt file1.txt
This keeps a count for every distinct line, so it can consume a lot of memory if your file is very large. You might want to sort it first.
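A minimal sketch, assuming a POSIX shell with awk and a small hypothetical input, showing that the two-pass version reports each line at its first appearance, so the original order is kept even though the duplicates are scattered:

```shell
# Hypothetical ungrouped input: duplicates are not adjacent.
tmp=$(mktemp)
printf '%s\n' b a b c b a > "$tmp"

# Pass 1 (NR == FNR): count every distinct line.
# Pass 2: print each line the first time it is seen, plus its total count.
out=$(awk '
  NR == FNR { count[$0]++; next }
  !seen[$0]++ {
    print
    if (count[$0] > 1)
      print "... repeated", count[$0], "times"
  }
' "$tmp" "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```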
If the file looks like this:
ram_file
abc
123
end_file
tony_file
xyz
456
end_file
bravo_file
uvw
789
end_file
Now I want to access the text between ram_file and end_file, tony_file and end_file, and bravo_file and end_file, all in one pass. I tried the sed command, but I don't know how to specify *_file in it.
Thanks in advance.
This awk should do the job for you.
This solution treats end_file as the end of a block and every other xxxx_file line as the start of a block.
It does not print text that sits between blocks, such as the "do not print this" line in my example.
awk '/end_file/{f=0} f; /_file/ && !/end_file/ {f=1}' file
abc
123
xyz
456
uvw
789
cat file
ram_file
abc
123
end_file
do not print this
tony_file
xyz
456
end_file
nor this data
bravo_file
uvw
789
end_file
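A minimal sketch, assuming a POSIX shell with awk, that runs the command on the file above. The order of the three pattern-action pairs is what keeps the marker lines themselves, and the text between blocks, out of the output.

```shell
tmp=$(mktemp)
printf '%s\n' ram_file abc 123 end_file 'do not print this' tony_file xyz 456 \
  end_file 'nor this data' bravo_file uvw 789 end_file > "$tmp"

# end_file clears the flag before the flag test, so it is never printed;
# a start line sets the flag after the test, so it is not printed either.
out=$(awk '/end_file/{f=0} f; /_file/ && !/end_file/ {f=1}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```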
If you'd like some formatting, it can easily be done with awk:
awk -F_ '/end_file/{printf (f?RS:"");f=0} f; /file/ && !/end_file/ {f=1;print "-Block-"++c"--> "$1}' file
-Block-1--> ram
abc
123

-Block-2--> tony
xyz
456

-Block-3--> bravo
uvw
789
I'm trying to delete matching patterns, starting from the second occurrence, using sed or awk. The input file contains the information below:
abc
def
abc
ghi
jkl
abc
xyz
abc
I want to delete the pattern abc starting from the second instance. The output should be as below:
abc
def
ghi
jkl
xyz
Neat sed solution (note that 2,$ is a line-number range, so this relies on the first abc being on line 1):
sed '/abc/{2,$d}' test.txt
abc
def
ghi
jkl
xyz
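A sketch contrasting that line-number range with an occurrence-based address, assuming GNU sed (the 0,/regexp/ address is a GNU extension) and a hypothetical input whose first abc is not on line 1:

```shell
# Assumes GNU sed: the 0,/regexp/ address is a GNU extension.
# Hypothetical input whose first abc is on line 2, not line 1.
tmp=$(mktemp)
printf '%s\n' def abc ghi abc jkl > "$tmp"

# 2,$ counts line numbers, so the first abc (line 2) is deleted too.
by_line=$(sed '/abc/{2,$d}' "$tmp")

# 0,/abc/ spans line 1 through the first abc; '!' applies the delete
# only to the lines after it.
by_occurrence=$(sed '0,/abc/!{/abc/d}' "$tmp")

printf 'by line:\n%s\n\nby occurrence:\n%s\n' "$by_line" "$by_occurrence"
rm -f "$tmp"
```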
$ awk '$0=="abc"{c[$0]++} c[$0]<2; ' file
abc
def
ghi
jkl
xyz
Just change the 2 to N+1 (for example, 3 keeps the first two) to keep the first N occurrences instead of just the first one.
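A minimal sketch, assuming a POSIX shell with awk, that passes the limit as a hypothetical variable n (via -v) instead of editing the constant; with <= n, the value is simply the number of abc lines to keep.

```shell
tmp=$(mktemp)
printf '%s\n' abc def abc ghi jkl abc xyz abc > "$tmp"

# Hypothetical variable n: how many abc lines to keep.
# c[$0] is only incremented on abc lines, so other lines always have c[$0]==0.
out=$(awk -v n=2 '$0=="abc"{c[$0]++} c[$0]<=n' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```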
One way using awk:
$ awk 'f&&$0==p{next}$0==p{f=1}1' p="abc" file
abc
def
ghi
jkl
xyz
Just set p to the line whose first instance you want to print.
Taken from unix.com:
Using awk '!x[$0]++' removes duplicate lines. x is an array indexed by $0, and its elements start at 0. The first time a line is seen, x[$0]++ returns 0 (post-increment yields the old value) and then sets x[$0] to 1, so !x[$0]++ is true and awk prints $0 by default. If the line appears again, x[$0]++ returns a nonzero value, !x[$0]++ is false, and the line is not printed.
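A minimal sketch, assuming a POSIX shell with awk. The first command prints the value x[$0]++ returns for each line, and the second applies the idiom to the question's input. Unlike the abc-specific answers, it drops every repeated line, which gives the same result here only because abc is the sole duplicate.

```shell
# Show the value x[$0]++ returns: post-increment yields the old count.
trace=$(printf '%s\n' abc def abc | awk '{ print $0, x[$0]++ }')

# !x[$0]++ is true only when that old count is 0, i.e. on a first sighting.
out=$(printf '%s\n' abc def abc ghi jkl abc xyz abc | awk '!x[$0]++')

printf 'trace:\n%s\n\ndeduplicated:\n%s\n' "$trace" "$out"
```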