How to delete a matching pattern from a given occurrence onward - linux

I'm trying to delete matching patterns, starting from the second occurrence, using sed or awk. The input file contains the information below:
abc
def
abc
ghi
jkl
abc
xyz
abc
I want to delete the pattern abc from the second instance onward. The output should be as below:
abc
def
ghi
jkl
xyz

Neat sed solution:
sed '/abc/{2,$d}' test.txt
abc
def
ghi
jkl
xyz
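Note that the '2,$d' form keys on physical line numbers, so it assumes the first abc sits on line 1. A sketch of a more general GNU sed form, using the 0 address so every match after the first is deleted wherever the first one appears (filename is an example):

```shell
# Recreate the sample input from the question.
printf 'abc\ndef\nabc\nghi\njkl\nabc\nxyz\nabc\n' > test.txt

# GNU sed: '0,/abc/' spans from the start through the first match;
# the negated '!' block then deletes abc only on the lines after it.
pruned=$(sed '0,/abc/!{/abc/d}' test.txt)
echo "$pruned"
```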

$ awk '$0=="abc"{c[$0]++} c[$0]<2; ' file
abc
def
ghi
jkl
xyz
Just change the "2" to "3" (or, in general, to N+1) to keep the first N occurrences instead of just the first one.
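As a sketch of that generalization (the value of n and the filename are examples), you can also compare the count against n directly:

```shell
printf 'abc\ndef\nabc\nghi\njkl\nabc\nxyz\nabc\n' > occ.txt

# c[$0] counts abc lines before the print test, so 'c[$0] <= n' keeps
# exactly the first n matches; non-matching lines always have count 0.
kept=$(awk -v n=2 '$0=="abc"{c[$0]++} c[$0]<=n' occ.txt)
echo "$kept"
```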

One way using awk:
$ awk 'f&&$0==p{next}$0==p{f=1}1' p="abc" file
abc
def
ghi
jkl
xyz
Just set p to the pattern for which only the first instance should be printed.

Taken from : unix.com
Using awk '!x[$0]++' will remove duplicate lines. x is an associative array whose elements default to 0, indexed by the whole line, $0. The first time a line is seen, x[$0] is 0; because ++ here is a postfix increment, 0 is returned first and then incremented, so !x[$0] is true and the line is printed (the default action). If $0 appears again, x[$0] is now non-zero, so !x[$0] is false and the line is not printed.
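A quick check of that idiom (filename is an example): duplicates anywhere in the file are dropped, and the first occurrences keep their original order.

```shell
printf 'abc\ndef\nabc\nxyz\ndef\n' > dup.txt

# !x[$0]++ is true exactly once per distinct line (postfix ++ returns 0 first).
deduped=$(awk '!x[$0]++' dup.txt)
echo "$deduped"
```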

Shell - Delete line if the line has only one column? [duplicate]

How do I delete a line if it has only one string (a single column)?
abc def geh
ijk
123 xyz 345
mno
Expected output
abc def geh
123 xyz 345
A simple awk does the job without regex:
awk 'NF > 1' file
abc def geh
123 xyz 345
This also works when a line has leading or trailing spaces, or when there are lines consisting only of whitespace.
Many options are available. One of them could be this:
grep " " myfile.txt
The output corresponds to the expected result: this command keeps just the lines containing at least one space. It breaks, though, if a one-column line has a trailing space; in that case this one works too:
awk 'NF > 1' myfile.txt
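A sketch contrasting the two approaches (file contents are examples): a one-word line with a trailing space, and a line of only spaces, both have NF <= 1, so awk drops them where grep " " would keep them.

```shell
printf 'abc def geh\nijk \n   \n123 xyz 345\n' > cols.txt

# NF counts whitespace-separated fields, ignoring leading/trailing blanks.
multi=$(awk 'NF > 1' cols.txt)
echo "$multi"
```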

How to print all lines between a word and the first occurrence of another word?

input.txt
ABC
CDE
EFG
XYZ
ABC
PQR
EFG
From the above file I want to print the lines between 'ABC' and the first occurrence of 'EFG'.
Expected output :
ABC
CDE
EFG
ABC
PQR
EFG
How can I print lines from one word to the first occurrence of a second word?
EDIT: In case you want to print all occurrences of lines coming between ABC and EFG and leave the others, then try the following.
awk '/ABC/{found=1} found;/EFG/{found=""}' Input_file
Could you please try the following.
awk '/ABC/{flag=1} flag && !count;/EFG/{count++}' Input_file
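A quick check of that command against the question's input (filename is an example); only the first ABC..EFG block is printed:

```shell
printf 'ABC\nCDE\nEFG\nXYZ\nABC\nPQR\nEFG\n' > input.txt

# flag turns printing on at ABC; count, bumped at each EFG, shuts
# printing off permanently after the first block.
block=$(awk '/ABC/{flag=1} flag && !count;/EFG/{count++}' input.txt)
echo "$block"
```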
$ awk '/ABC/,/EFG/' file
Output:
ABC
CDE
EFG
ABC
PQR
EFG
This might work for you (GNU sed):
sed -n '/ABC/{:a;N;/EFG/!ba;p}' file
Turn off implicit printing by using the -n option.
Gather up lines between ABC and EFG and then print them. Repeat.
If you want to only print between the first occurrence of ABC to EFG, use:
sed -n '/ABC/{:a;N;/EFG/!ba;p;q}' file
To print the second through fourth occurrences, use:
sed -En '/ABC/{:a;N;/EFG/!ba;x;s/^/x/;/^x{2,4}$/{x;p;x};x;}' file

Compare one field, Remove duplicate if value of another field is greater

Trying to do this at linux command line. Wanting to combine two files, compare values based on ID, but only keeping the ID that has the newer/greater value for Date (edit: equal to or greater than). Because the ID 456604 is in both files, wanting to only keep the one from File 2 with the newer date: "20111015 456604 tgf"
File 1
Date ID Note
20101009 456604 abc
20101009 444444 abc
20101009 555555 abc
20101009 666666 xyz
File 2
Date ID Note
20111015 111111 abc
20111015 222222 abc
20111015 333333 xyz
20111015 456604 tgf
And then the output should have both files combined, but keeping only the second ID value, with the newer date. The order the rows are in does not matter; this is just an example of the output for the concept.
Output
Date ID Note
20101009 444444 abc
20101009 555555 abc
20101009 666666 xyz
20111015 111111 abc
20111015 222222 abc
20111015 333333 xyz
20111015 456604 tgf
$ cat file1.txt file2.txt | sort -ru | awk '!($2 in seen) { print; seen[$2] }'
Date ID Note
20111015 456604 tgf
20111015 333333 xyz
20111015 222222 abc
20111015 111111 abc
20101009 666666 xyz
20101009 555555 abc
20101009 444444 abc
Sort the combined files by descending date and only print a line the first time you see an ID.
EDIT
More compact edition, thanks to Steve:
cat file1.txt file2.txt | sort -ru | awk '!seen[$2]++'
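A minimal run of that pipeline (filenames and rows are examples), with one ID duplicated across the two files:

```shell
printf '20101009 456604 abc\n' > f1.txt
printf '20111015 456604 tgf\n' > f2.txt

# sort -ru orders rows newest-first; awk then keeps the first row per ID ($2).
newest=$(cat f1.txt f2.txt | sort -ru | awk '!seen[$2]++')
echo "$newest"
```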
You didn't specify how you'd like to handle the case were the dates are also duplicated, or even if this case could exist. Therefore, I have assumed that by 'greater', you really mean 'greater or equal to' (it also makes handling the header a tiny bit easier). If that's not the case, please edit your question.
awk code:
awk 'FNR==NR {
    a[$2] = $1
    b[$2] = $0
    next
}
a[$2] >= $1 {
    print b[$2]
    delete b[$2]
    next
}
1
END {
    for (i in b) {
        print b[i]
    }
}' file2 file1
Explanation:
Basically, we use an associative array, called a, to store the 'ID' and 'Date' as key and value, respectively. We also store the contents of file2 in memory using another associative array called b. When file1 is read, we test whether column two exists in our array a and whether that key's value is greater than or equal to column one. If it is, we print the corresponding line from array b, delete it from the array, and move on to the next line/record of input. The 1 on its lonesome always returns true, thereby printing any record where the previous two conditions were not met; this has the effect of printing any unmatched records from file1. Finally, we print what's left in array b.
Results:
Date ID Note
20111015 456604 tgf
20101009 444444 abc
20101009 555555 abc
20101009 666666 xyz
20111015 222222 abc
20111015 111111 abc
20111015 333333 xyz
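A trimmed-down run of that script (filenames and rows are examples), with just the header plus the one conflicting ID, so the greater-or-equal branch fires:

```shell
printf 'Date ID Note\n20101009 456604 abc\n' > file1
printf 'Date ID Note\n20111015 456604 tgf\n' > file2

# file2 is read first into a (date keyed by ID) and b (full line keyed
# by ID); rows of file1 then print file2's line whenever its date is >=.
merged=$(awk 'FNR==NR {a[$2]=$1; b[$2]=$0; next}
a[$2] >= $1 {print b[$2]; delete b[$2]; next}
1
END {for (i in b) print b[i]}' file2 file1)
echo "$merged"
```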
Another awk way
awk 'NR==1;FNR>1{a[$2]=(a[$2]<$1&&b[$2]=$3)?$1:a[$2]}
END{for(i in a)print a[i],i,b[i]}' file file2
Compares the date in the array to the previously stored value to determine which is higher, and also stores the third field if the current record is higher. Then prints out the stored date, the key (field 2), and the value stored for field 3.
Or shorter
awk 'NR==1;FNR>1{(a[$2]<$1&&b[$2]=$0)&&a[$2]=$1}END{for(i in b)print b[i]}' file file2

accessing text between specific words in UNIX multiple times

if the file is like this:
ram_file
abc
123
end_file
tony_file
xyz
456
end_file
bravo_file
uvw
789
end_file
Now I want to access the text between ram_file and end_file, tony_file and end_file, and bravo_file and end_file simultaneously. I tried the sed command, but I don't know how to specify *_file in it.
Thanks in advance
This awk should do the job for you.
This solution treats end_file as the end of a block, and every other xxxx_file as the start of a block.
It will not print text between blocks if there is any, such as do not print this in the example below.
awk '/end_file/{f=0} f; /_file/ && !/end_file/ {f=1}' file
abc
123
xyz
456
uvw
789
cat file
ram_file
abc
123
end_file
do not print this
tony_file
xyz
456
end_file
nor this data
bravo_file
uvw
789
end_file
If you would like some formatting, it can be done easily with awk:
awk -F_ '/end_file/{printf (f?RS:"");f=0} f; /file/ && !/end_file/ {f=1;print "-Block-"++c"--> "$1}' file
-Block-1--> ram
abc
123
-Block-2--> tony
xyz
456
-Block-3--> bravo
uvw
789
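A quick check of the unformatted version above (filename is an example), including a stray line between blocks that must not be printed:

```shell
printf 'ram_file\nabc\n123\nend_file\nskip me\ntony_file\nxyz\n456\nend_file\n' > blocks.txt

# end_file lowers the flag before the print test; any other *_file raises
# it after, so the marker lines themselves are never printed.
body=$(awk '/end_file/{f=0} f; /_file/ && !/end_file/ {f=1}' blocks.txt)
echo "$body"
```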

Multiline trimming

I have an HTML file that I want to trim. I want to remove a section from the beginning all the way to a given string, and from another string to the end. How do I do that, preferably using sed?
With GNU sed:
sed '/mark1/,/mark2/d;/mark3/,$d'
this
abc
def
mark1
ghi
jkl
mno
mark2
pqr
stu
mark3
vwx
yz
becomes
abc
def
pqr
stu
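A reproduction of that example on a slightly shortened input (filename is an example); the two range deletions also work in POSIX sed:

```shell
printf 'abc\ndef\nmark1\nghi\nmark2\npqr\nstu\nmark3\nvwx\nyz\n' > page.txt

# Delete everything from mark1 through mark2, then from mark3 to the end.
trimmed=$(sed '/mark1/,/mark2/d;/mark3/,$d' page.txt)
echo "$trimmed"
```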
You can use awk:
$ cat file
mark1 dsf
abc
def
before mark2 after
blah mark1
ghi
jkl
mno
wirds mark2 here
pqr
stu
mark3
vwx
yz
$ awk -vRS="mark2" '/mark1/{gsub("mark1.*","")}/mark3/{ gsub("mark3.*","");print;f=1 } !f ' file
after
blah
here
pqr
stu
