print a specific line if it matches the line after it - linux

I have a log file containing the following info:
<msisdn>37495989804</msisdn>
<address>10.14.14.26</address>
<msisdn>37495371855</msisdn>
<address>10.14.0.172</address>
<msisdn>37495989832</msisdn>
<address>10.14.14.29</address>
<msisdn>37495479810</msisdn>
<address>10.14.1.11</address>
<msisdn>37495429157</msisdn>
<address>10.14.0.213</address>
<msisdn>37495275824</msisdn>
<msisdn>37495739176</msisdn>
<address>10.14.2.86</address>
<msisdn>37495479840</msisdn>
<address>10.14.1.12</address>
<msisdn>37495706059</msisdn>
<msisdn>37495619889</msisdn>
<address>10.14.1.198</address>
<msisdn>37495574341</msisdn>
<address>10.14.1.148</address>
<msisdn>37495391624</msisdn>
<address>10.14.0.188</address>
<msisdn>37495989796</msisdn>
<address>10.14.14.24</address>
<msisdn>37495835940</msisdn>
<address>10.14.2.164</address>
<msisdn>37495743249</msisdn>
<address>10.14.2.94</address>
<msisdn>37495674117</msisdn>
<address>10.14.1.236</address>
<msisdn>37495754536</msisdn>
<address>10.14.2.120</address>
<msisdn>37495576434</msisdn>
<msisdn>37495823889</msisdn>
<address>10.14.2.159</address>
There are some lines where the 'msisdn' line is not followed by an 'address' line, like this:
<msisdn>37495576434</msisdn>
<msisdn>37495823889</msisdn>
I would like to write a script that outputs only the 'msisdn' lines that aren't followed by an 'address' line. Expected output:
<msisdn>37495275824</msisdn>
<msisdn>37495706059</msisdn>
<msisdn>37495576434</msisdn>
An awk or sed solution would be perfect.
Thanks.

One way with awk:
awk '/address/{p=0}p{print a;p=0}/msisdn/{a=$0;p=1}' log
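The awk one-liner above never prints an msisdn that happens to be the very last line of the file, because no following line triggers the print. A minimal sketch that also covers that case in an END block, assuming (as in the question) one <msisdn> or <address> tag per line; the function name unpaired_msisdn is hypothetical:

```shell
# unpaired_msisdn (hypothetical helper): print each <msisdn> line that is
# not immediately followed by an <address> line, reading files or stdin.
unpaired_msisdn() {
  awk '
    /<msisdn>/  { if (pending) print prev   # previous msisdn had no address
                  prev = $0; pending = 1; next }
    /<address>/ { pending = 0 }             # previous msisdn was paired
    END         { if (pending) print prev } # msisdn on the last line
  ' "$@"
}

# Demo on a few lines shaped like the sample log.
printf '%s\n' \
  '<msisdn>37495275824</msisdn>' \
  '<msisdn>37495739176</msisdn>' \
  '<address>10.14.2.86</address>' \
  '<msisdn>37495823889</msisdn>' | unpaired_msisdn
```

On this input it prints both <msisdn>37495275824</msisdn> and the trailing <msisdn>37495823889</msisdn>; the one-liner would print only the first.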

You can use pcregrep to match an msisdn line whose next line is not an address (i.e. is another msisdn), and use awk to show only the first line of each matched pair:
pcregrep -M '(.*</msisdn>)\n.*<msi' log | awk 'NR % 2 == 1'

This might work for you (GNU sed):
sed -r '$!N;/(<msisdn>).*\n.*\1/P;D' file
This reads two lines into the pattern space and tries to match the pattern <msisdn> in both of them. If the pattern matches, it prints the first line. The first line is then deleted and the cycle starts again; however, since the pattern space still contains the second line (now the first), the automatic reading of a new line is skipped and processing begins again at $!N.
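The capture group and \1 are not essential here, since both lines must contain the same literal tag. A sketch of the same N/P/D sliding window written as a POSIX basic regular expression, so it should not need GNU's -r; the function name unpaired_msisdn_sed is hypothetical. Like the original, it does not report an msisdn on the very last line:

```shell
# unpaired_msisdn_sed (hypothetical helper): two-line sliding window.
#   $!N  append the next line (skipped on the last line)
#   P    print up to the first newline when both lines are <msisdn> lines
#   D    drop the first line and rerun the script on what remains
unpaired_msisdn_sed() {
  sed '$!N;/<msisdn>.*\n.*<msisdn>/P;D' "$@"
}

printf '%s\n' \
  '<msisdn>37495706059</msisdn>' \
  '<msisdn>37495619889</msisdn>' \
  '<address>10.14.1.198</address>' | unpaired_msisdn_sed
```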

Perl has its own way to do this:
perl -lne 'if($prev && $_!~/\./){print $prev}unless(/\./){$prev=$_}else{undef $prev}' your_file
Tested below:
> perl -lne 'if($prev && $_!~/\./){print $prev}unless(/\./){$prev=$_}else{undef $prev}' temp
<msisdn>37495275824</msisdn>
<msisdn>37495706059</msisdn>
<msisdn>37495576434</msisdn>
>

Related

Replace a full line every Nth line in a text file

I'm trying to perform something that should be simple but I'm not quite understanding how to do it. I have a file that looks like this:
#A01182:104:HKNG5DSX3:3:1101:3947:1031 1:N:0:CATTGCCT+NATCTCAG
CNTCATAGCTGGTTGCACAGTTAACGTCGTTCAGGCCACGTTCCAGACCGTAGTTTGCCAGCGTCAGATCATAAACGGTGGTCACCAGGGCGGTGCTGCCA
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFF
#A01182:104:HKNG5DSX3:3:1101:7997:1031 1:N:0:CATTGCCT+NATCTCAG
GNCGATCCCTTCGCTGCTGCTGGCAATTATCGTTGTAGCGTTTGCCGGACCGAGTTTGTCTCACGCCATGTTTGCTGTCTGGCTGGCGCTGCTGCCGCGTA
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
#A01182:104:HKNG5DSX3:3:1101:5547:1047 1:N:0:CATTGCCT+NATCTCAG
GGTGATGATTGTCTTTGGCGCAACGTTAATGAAAGATGCGCCGAAGCAGGAAGTGAAAACCAGCAATGGTGTGGTGGAGAAGGACTACACCCTGGCAGAGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF
#A01182:104:HKNG5DSX3:3:1101:20726:1063 1:N:0:CATTGCCT+GATCTCAG
GGGACGCCCATTACGCTGGTGAATCTGGCAACCCATACCAGCGCCCTGCCCCGTGAACAGCCCGGTGGCGCGGCACATCGTCCGGTATTTGTCTGGCCAAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF
and it goes on for many more lines (the actual file is 2.5 GB). What I want to do is replace every fourth line (all those with lots of F's) with another string, the same one for all of them.
I have tried with sed, but I can't seem to get the script right: the output I produce has no changes.
Any help would be really appreciated!
This might work for you (GNU sed):
sed -i '4~4s/.*/another string/' file(s)
Starting at the 4th line and every 4 lines thereafter, replace the whole line with another string.
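The first~step address is a GNU extension. For a sed without it, a sketch of the same replacement uses n to print and skip three lines per cycle, so the s command only ever sees lines 4, 8, 12, and so on; the function name replace_every_4th is hypothetical:

```shell
# replace_every_4th (hypothetical helper): replace lines 4, 8, 12, ... with
# the fixed text in $1; remaining arguments are input files (or stdin).
replace_every_4th() {
  text=$1; shift
  # n prints the current line and reads the next; after three of them the
  # pattern space holds the 4th line of the group, which s replaces.
  sed "n;n;n;s/.*/$text/" "$@"
}

seq 8 | replace_every_4th 'another string'
```

Because the text is spliced into the s command, it must not contain an unescaped /, & or backslash.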
I'd use awk for this:
awk '
NR % 4 == 0 {print "new string"; next}
{print}
' file > file.new && mv file.new file

remove a character on the last line where a specific word appears

We have the following example file.
We want to remove the , character on the last line where the word topic appears.
more file
{"topic":"life_is_hard","partition":84,"replicas":[1006,1003]},
{"topic":"life_is_hard","partition":85,"replicas":[1001,1004]},
{"topic":"life_is_hard","partition":86,"replicas":[1002,1005]},
{"topic":"life_is_hard","partition":87,"replicas":[1003,1006]},
{"topic":"life_is_hard","partition":88,"replicas":[1004,1001]},
{"topic":"life_is_hard","partition":89,"replicas":[1005,1002]},
{"topic":"life_is_hard","partition":90,"replicas":[1006,1004]},
{"topic":"life_is_hard","partition":91,"replicas":[1001,1005]},
{"topic":"life_is_hard","partition":92,"replicas":[1002,1006]},
{"topic":"life_is_hard","partition":93,"replicas":[1003,1001]},
{"topic":"life_is_hard","partition":94,"replicas":[1004,1002]},
{"topic":"life_is_hard","partition":95,"replicas":[1005,1003]},
{"topic":"life_is_hard","partition":96,"replicas":[1006,1005]},
{"topic":"life_is_hard","partition":97,"replicas":[1001,1006]},
{"topic":"life_is_hard","partition":98,"replicas":[1002,1001]},
{"topic":"life_is_hard","partition":99,"replicas":[1003,1002]},
Expected output:
{"topic":"life_is_hard","partition":84,"replicas":[1006,1003]},
{"topic":"life_is_hard","partition":85,"replicas":[1001,1004]},
{"topic":"life_is_hard","partition":86,"replicas":[1002,1005]},
{"topic":"life_is_hard","partition":87,"replicas":[1003,1006]},
{"topic":"life_is_hard","partition":88,"replicas":[1004,1001]},
{"topic":"life_is_hard","partition":89,"replicas":[1005,1002]},
{"topic":"life_is_hard","partition":90,"replicas":[1006,1004]},
{"topic":"life_is_hard","partition":91,"replicas":[1001,1005]},
{"topic":"life_is_hard","partition":92,"replicas":[1002,1006]},
{"topic":"life_is_hard","partition":93,"replicas":[1003,1001]},
{"topic":"life_is_hard","partition":94,"replicas":[1004,1002]},
{"topic":"life_is_hard","partition":95,"replicas":[1005,1003]},
{"topic":"life_is_hard","partition":96,"replicas":[1006,1005]},
{"topic":"life_is_hard","partition":97,"replicas":[1001,1006]},
{"topic":"life_is_hard","partition":98,"replicas":[1002,1001]},
{"topic":"life_is_hard","partition":99,"replicas":[1003,1002]}
We tried to remove the , character from the last line that contains the word topic with the following sed command, but it does not remove the ,:
sed -i '${s/,[[:blank:]]*$//}' file
sed (GNU sed) 4.2.2
If you have control-M (carriage return) characters in your Input_file, remove them first:
tr -d '\r' < Input_file > temp && mv temp Input_file
Could you please try the following? From your question, I understand that you want to remove the comma from the very last line that contains the string topic; if that is the case, here is a tac + awk solution.
tac Input_file |
awk '/topic/ && ++count==1{sub(/,$/,"")} 1' |
tac
Once you are happy with the above results, append > temp && mv temp Input_file to the command to save the output into Input_file itself.
Explanation:
tac reads Input_file from the last line to the first and passes its output to awk, which removes the trailing comma from the first line containing topic that it sees (the last such line in the original file) and prints every line. The result is piped through tac again to restore the original order.
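If tac is not available (it comes from GNU coreutils and is not POSIX), a two-pass awk sketch can do the same job by reading the file twice: the first pass records the line number of the last line containing topic, and the second strips the trailing comma from that line only. It assumes a regular file rather than a pipe, and strip_last_topic_comma is a hypothetical name:

```shell
# strip_last_topic_comma (hypothetical helper): remove a trailing comma from
# the last line that contains "topic"; the file is named twice on purpose.
strip_last_topic_comma() {
  awk '
    NR == FNR   { if (/topic/) last = FNR; next }  # pass 1: find the line
    FNR == last { sub(/,$/, "") }                  # pass 2: edit that line
    1                                              # print every line
  ' "$1" "$1"
}
```

Usage would follow the same pattern as above, for example strip_last_topic_comma Input_file > temp && mv temp Input_file.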
You should use the address $ (last line):
sed '$s/,$//' file
Using awk:
$ awk '{if(NR>1)print p;p=$0}END{sub(/,$/,"",p);print p}' file
Output:
...
{"topic":"life_is_hard","partition":98,"replicas":[1002,1001]},
{"topic":"life_is_hard","partition":99,"replicas":[1003,1002]}

Replace text between two strings in file using linux bash

I have a file "acl.txt":
192.168.0.1
192.168.4.5
#start_exceptions
192.168.3.34
192.168.6.78
#end_exceptions
192.168.5.55
and another file, "exceptions":
192.168.88.88
192.168.76.6
I need to replace everything between #start_exceptions and #end_exceptions with the contents of the exceptions file. I have tried many solutions from this forum, but none of them work.
EDITED:
OK, if you want to retain the #start and #end lines, I will revert to awk:
awk '
BEGIN {p=1}
/^#start/ {print;system("cat exceptions");p=0}
/^#end/ {p=1}
p' acl.txt
Thanks to @fedorqui for tweaks in the comments below.
Output:
192.168.0.1
192.168.4.5
#start_exceptions
192.168.88.88
192.168.76.6
#end_exceptions
192.168.5.55
p is a flag that says whether or not to print lines. It starts at the beginning as 1, so all lines are printed till I find a line starting with #start. Then I cat the contents of the exceptions file and stop printing lines till I find a line starting with #end, at which point I set the p flag back to 1 so remaining lines get printed.
If you want output to a file, add "> newfile" to the very end of the command like this:
awk '
BEGIN {p=1}
/^#start/ {print;system("cat exceptions");p=0}
/^#end/ {p=1}
p' acl.txt > newfile
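A variant of the same p-flag approach, sketched here, reads the exceptions file with awk's own getline instead of running cat through system(), so it does not rely on awk flushing its output before the external command runs. The function name replace_between and its argument order are hypothetical:

```shell
# replace_between (hypothetical helper): $1 = main file, $2 = file whose
# lines replace everything between the #start and #end marker lines.
replace_between() {
  awk -v exc="$2" '
    BEGIN     { p = 1 }
    /^#start/ {
      print                                  # keep the start marker
      while ((getline line < exc) > 0) print line
      close(exc)                             # allow re-reading for another block
      p = 0
    }
    /^#end/   { p = 1 }                      # end marker is printed by the p rule
    p
  ' "$1"
}
```

For example, replace_between acl.txt exceptions > newfile should produce the same output as shown above.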
YET ANOTHER VERSION IF YOU REALLY WANT TO USE SED
If you really, really want to do it with sed, you can use nested address spaces, firstly to select the lines between #start_exceptions and #end_exceptions, then again to select the first line within that and also lines other than the #end_exceptions line:
sed '
/^#start/,/^#end/{
/^#start/{
n
r exceptions
}
/^#end/!d
}
' acl.txt
Output:
192.168.0.1
192.168.4.5
#start_exceptions
192.168.88.88
192.168.76.6
#end_exceptions
192.168.5.55
ORIGINAL ANSWER
I think this will work:
sed -e '/^#end/r exceptions' -e '/^#start/,/^#end/d' acl.txt
When it finds /^#end/, it reads in the exceptions file. It also deletes everything from /^#start/ to /^#end/, including both marker lines.
I have left the matching slightly "loose" for clarity of expressing the technique.
You can use the following, based on Replace string with contents of a file using sed:
$ sed $'/end/ {r exceptions\n} ; /start/,/end/ {d}' acl.txt
192.168.0.1
192.168.4.5
192.168.88.88
192.168.76.6
192.168.5.55
Explanation
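sed $'one_thing; another_thing' acl.txt performs the two actions. The bash $'...' quoting turns \n into a real newline, which is needed because the file name after r extends to the end of the line.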
/end/ {r exceptions\n} if the line contains end, then read the file exceptions and append it.
/start/,/end/ {d} from a line containing start to a line containing end, delete all the lines.
I had a problem with Mark Setchell's solution in MINGW: the caret was not matching the beginning of the line. Does detecting the separator really need it to be at the beginning of the line?
I came up with this awk alternative...
$ awk -v data="$(<exceptions)" '
BEGIN {p=1}
/#start_exceptions/ {print; print data;p=0}
/#end_exceptions/ {p=1}
p
' acl.txt

sed to insert on first match only

UPDATED:
Using sed, how can I insert (NOT SUBSTITUTE) a new line only at the first match of a keyword in each file?
Currently I have the following, but it inserts a line before every line containing Matched Keyword; I want it to insert the New Inserted Line only before the first match found in the file:
sed -i -e '/Matched Keyword/ i\New Inserted Line' *.*
For example:
Myfile.txt:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
should be changed to:
Line 1
Line 2
Line 3
New Inserted Line
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
You can sort of do this in GNU sed:
sed '0,/Matched Keyword/s//New Inserted Line\n&/'
But it's not portable. Since portability is good, here it is in awk:
awk '/Matched Keyword/ && !x {print "Text line to insert"; x=1} 1' inputFile
Or, if you want to pass a variable to print:
awk -v "var=$var" '/Matched Keyword/ && !x {print var; x=1} 1' inputFile
These both insert the text line before the first occurrence of the keyword, on a line by itself, per your example.
Remember that with both sed and awk, the matched keyword is a regular expression, not just a keyword.
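The question asks for the first match in each file, but with several file arguments a single awk run keeps x set across files. A sketch that runs awk once per file and writes each result back through a temporary file; both function names are hypothetical, and awk's -v also interprets backslash escapes in the inserted text:

```shell
# insert_before_first (hypothetical helper): print file $3 with line $2
# inserted before the first line matching the regular expression $1.
insert_before_first() {
  awk -v pat="$1" -v text="$2" '
    $0 ~ pat && !done { print text; done = 1 }  # only the first match
    1                                           # print every input line
  ' "$3"
}

# insert_in_files (hypothetical helper): apply the insert to every file
# argument in place, going through a temporary file for each one.
insert_in_files() {
  ins_pat=$1; ins_text=$2; shift 2
  for f in "$@"; do
    insert_before_first "$ins_pat" "$ins_text" "$f" > "$f.tmp" &&
      mv "$f.tmp" "$f"
  done
}
```

For example: insert_in_files 'Matched Keyword' 'New Inserted Line' *.txt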
UPDATE:
Since this question is also tagged bash, here's a simple solution that is pure bash and doesn't require sed:
#!/bin/bash
n=0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
  if [[ "$line" =~ 'Matched Keyword' && $n = 0 ]]; then
    echo "New Inserted Line"
    n=1
  fi
  printf '%s\n' "$line"
done
As it stands, this works as a filter in a pipe (reading standard input and writing standard output). You can easily wrap it in something that acts on files instead.
If you want one with sed*:
sed '0,/Matched Keyword/s//Matched Keyword\nNew Inserted Line/' myfile.txt
*only works with GNU sed
This might work for you:
sed -i -e '/Matched Keyword/{i\New Inserted Line' -e ':a;n;ba}' file
You're nearly there! Just create a loop to read from the Matched Keyword to the end of the file.
After inserting a line, the remainder of the file can be printed out by:
Introduce a loop placeholder :a (here a is an arbitrary name).
Print the current line and fetch the next one into the pattern space with the n command.
Redirect control back using the ba command, which is essentially a goto to the a placeholder. The end-of-file condition is naturally taken care of by the n command, which terminates any further sed commands if it tries to read past the end of the file.
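The same loop can be spelled with one -e per script line, which sidesteps two GNU conveniences the one-liner relies on (text on the same line as i\, and a ; right after a label name). This is a sketch; the function name insert_first_sed is hypothetical and the text is fixed to the question's example:

```shell
# insert_first_sed (hypothetical helper): insert a fixed line before the
# first "Matched Keyword" line; each -e below is one line of the script.
insert_first_sed() {
  sed -e '/Matched Keyword/{' \
      -e 'i\' \
      -e 'New Inserted Line' \
      -e ':a' \
      -e 'n' \
      -e 'ba' \
      -e '}' "$@"
}
```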
With a little help from bash, a true one liner can be achieved:
sed $'/Matched Keyword/{iNew Inserted Line\n:a;n;ba}' file
Alternative:
sed 'x;/./{x;b};x;/Matched Keyword/h;//iNew Inserted Line' file
This uses the Matched Keyword as a flag in the hold space and once it has been set any processing is curtailed by bailing out immediately.
If you want to append a line after the first match only, use awk instead of sed, as below:
awk '{print} /Matched Keyword/ && !n {print "New Inserted Line"; n++}' myfile.txt
Output:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
New Inserted Line
Line 4
This line contains the Matched Keyword and other stuff
Line 6

Extract K-th Line from Chunks Using Sed/AWK/Perl

I have some data that looks like this. It comes in chunks of four lines, and each chunk starts with a # character.
#SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
#SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
88888888888888888888888888
What I want to do is extract the last line of each chunk, yielding:
::::::::::::::::::::::::;;8
888888888888888888888888888
Note that the last line of a chunk may contain any standard ASCII character, including #.
Is there an effective one-liner to do it?
The following sed command will print the 3rd line after the pattern:
sed -n '/^#/{n;n;n;p}' file.txt
If there are no blank lines:
perl -ne 'print if $. % 4 == 0' file
$ awk 'BEGIN{RS="#";FS="\n"} NR>1{print $4}' file
::::::::::::::::::::::::;;8
88888888888888888888888888
If you always have exactly those 4 lines in a chunk, here are some other ways:
$ ruby -ne 'print if $.%4==0' file
::::::::::::::::::::::::;;8
88888888888888888888888888
$ awk 'NR%4==0' file
::::::::::::::::::::::::;;8
88888888888888888888888888
It also seems like your line always comes right after the line that starts with "+", so:
$ awk '/^\+/{getline;print}' file
::::::::::::::::::::::::;;8
88888888888888888888888888
$ ruby -ne 'gets && print if /^\+/' file
::::::::::::::::::::::::;;8
88888888888888888888888888
This prints the lines that come before lines starting with #, and also the last line. It works with non-uniformly sized chunks, but assumes that only a chunk's leading line starts with #.
sed -ne '1d;$p;/^#/!{x;d};/^#/{x;p}' file
Some explanation is in order:
First you don't need the first line so delete it 1d
Next you always need the last line, so print it $p
If you don't have a match swap it into the hold buffer and delete it x;d
If you do have match swap it out of the hold buffer, and print it x;p
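The same "print the line before each #, plus the last line" logic can be sketched in awk with a one-line buffer instead of the hold space. It shares the assumption that only chunk headers start with #, and last_of_chunks is a hypothetical name:

```shell
# last_of_chunks (hypothetical helper): print the last line of every chunk,
# where a chunk starts at a line beginning with "#".
last_of_chunks() {
  awk '
    /^#/ && NR > 1 { print prev }   # a new chunk starts: flush the previous one
    { prev = $0 }                   # remember the current line
    END { if (NR > 0) print prev }  # the final chunk has no header after it
  ' "$@"
}
```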
This works similarly to dogbane's answer
awk '/^#/ {mark = NR} NR == mark + 3 {print}' inputfile
And, like that answer, will work regardless of the number of lines in each chunk (as long as there are at least 4).
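The direct analog to that answer, however, would use getline, because awk's next starts processing the next record from the top of the script instead of reading it into the current rule like sed's n:
awk '/^#/ {getline; getline; getline; print}' inputfile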
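With GNU grep this can be done fairly easily too, assuming (as above) that the wanted line always follows the line starting with "+":
grep -A 1 --no-group-separator '^+' ./infile | grep -v '^+'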
This might work for you (GNU sed):
sed '/^#/,+2d' file
