Deleting c-style block comments from multiple files - linux

I have files with a C-style header comment that I need to remove. It is a block like this:
/***....*
.
.
.
***.....*/
I used this sed command to remove the block:
sed -i '0,/^\/*\*/d' filename
It only removes the first line of the block comment (i.e. /***....*), but I want it to remove the whole block.
I have also tried:
sed -i '/^.*\/\/*/d' filename
but that removes all occurrences of /*...*/.

This awk will remove the block:
cat file
Beginning
/***....*
.
.
.
***.....*/
End of block
Some data
awk '/^\/\*\*/ {f=1} !f; /^\*\*/ {f=0}' file
Beginning
End of block
Some data
/^\/\*\*/ {f=1} If line starts with /** set flag f to 1
!f; If flag is not set print the line
/^\*\*/ {f=0} If line starts with ** clear flag f (this runs after !f, so the closing line itself is not printed)
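The question mentions multiple files, and POSIX awk has no in-place option. Below is a minimal sketch that assumes `mktemp` is available, as it is on typical Linux systems. The hypothetical function `strip_header` runs the awk filter above on each file and writes the result back through a temporary file.

```shell
# Hypothetical helper: strip_header FILE...
# Runs the awk filter from above on each file and writes the result back.
strip_header() {
  for file in "$@"; do
    _tmp=$(mktemp) || return 1
    # Same filter as above: skip lines from "/**..." through "**...".
    if awk '/^\/\*\*/ {f=1} !f; /^\*\*/ {f=0}' "$file" > "$_tmp"; then
      # Copy back with cat (not mv) so the file keeps its permissions.
      cat "$_tmp" > "$file"
    else
      rm -f "$_tmp"
      return 1
    fi
    rm -f "$_tmp"
  done
}
```

Usage would be something like `strip_header *.c`. As with the one-liner, only lines starting with /** and ** are treated as the block boundaries.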

Related

Fasta file - line issues

I have a FASTA file test.fasta which has the following information:
>QWE2J2_DEFR00000200123 DEFR00000560077.11 DEFR00000100333.7 3:444563-33443(-
)
acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatc
tatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtgga
acccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtata
gagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaatt
atttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctcccct
tgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggcc
gtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgat
agttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg
I need to convert it to CSV in the following format:
>QWE2J2_DEFR00000200123,DEFR00000560077.11,DEFR00000100333.7,3:444563-33443(-),acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatctatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtggaacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtatagagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaattatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctccccttgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggccgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgatagttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg
I have tried in Linux terminal:
input_file=test.fasta; vim -c '0,$s/>\(.*\)\n/>\1,/' -c '0,$s/\(.*\)\n\([^>]\)/\1\2/' -c 'w! my-tmp.fasta.csv' -c 'q!' $input_file; mv my-tmp.fasta.csv $input_file.csv
However, it gives me wrong output:
>QWE2J2_DEFR00000200123 DEFR00000560077.11 DEFR00000100333.7 3:444563-33443(-,)acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatctatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtggaacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtatagagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaattatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctccccttgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggccgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgatagttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg
How can I create this CSV file?
Using awk with RS set to > is simple:
awk -v RS='>' 'NR>1{
    gsub(/ /, ",")
    sub(/\)\n/, "),")
    gsub("\n", "")
    print RS $0
}' file
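The sketch below checks the record splitting on more than one record. It wraps the same awk program in a hypothetical shell function, `fasta_to_csv`, and runs it on a tiny made-up sample (not the question's data). In the second record, the closing ) sits on the header line itself. The single sub() call handles both layouts, because in each case the ) is followed by a newline.

```shell
# Hypothetical wrapper around the RS='>' awk program above.
# Reads the named files, or stdin when none are given.
fasta_to_csv() {
  awk -v RS='>' 'NR > 1 {
    gsub(/ /, ",")        # spaces in the header become commas
    sub(/\)\n/, "),")     # the newline after the closing ) becomes a comma
    gsub("\n", "")        # join the remaining sequence lines
    print RS $0           # put the ">" back in front of the record
  }' "$@"
}
```

For the question's file, something like `fasta_to_csv test.fasta > test.fasta.csv` should produce one CSV line per record.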
GNU sed with -z looks simple too:
sed -z '
s/ /,/g
s/)\n/),/g
s/\n//g
s/>/\n>/g
s/^\n//
' file
The following sed script should also work:
sed -n '
# if line does not start with >
/^>/!{
# append the line to hold space
H
# if it's not the end of file, start over
$!b
}
# switch pattern space with hold space
x
# add a comma after )
s/)/),/
# remove all the newlines
s/\n//g
# print it all, if hold space not empty
/^$/!p
# switch pattern space with hold space
x
# replace spaces with comma
s/ /,/g
# hold the line
h
' file
Scripts written and tested on repl:
>QWE2J2_DEFR00000200123,DEFR00000560077.11,DEFR00000100333.7,3:444563-33443(-),acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatcacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtataatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctcccctgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgat
Prefer sed instead of vim.

delete a line after a pattern only if it is blank using sed or awk

Using sed or awk, I want to delete a blank line only if it comes right after the line matching my pattern.
For example, if I have:
G

O TO P999-ERREUR
END-IF.
The pattern in this case is G, and I want this output:
G
O TO P999-ERREUR
END-IF.
This will do the trick:
$ awk -v n=-2 'NR==n+1 && !NF{next} /G/ {n=NR}1' file
G
O TO P999-ERREUR
END-IF.
Explanation:
-v n=-2 # Set n=-2 before the script runs so that NR==n+1 cannot be true before the first match (with the default n=0, a blank first line would be skipped)
NR == n+1 # If the current line number is equal to the matching line + 1
&& !NF # And the line is empty
{next} # Skip the line (don't print it)
/G/ # The regular expression to match
{n = NR} # Save the current line number in the variable n
1 # Truthy value used as a shorthand to print every (non-skipped) line
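The same awk logic can take the pattern as a parameter instead of hard-coding /G/. Below is a minimal sketch with a hypothetical function name. It passes the regex in with -v, so any backslashes in the pattern would be processed as escape sequences first.

```shell
# Hypothetical wrapper: drop_blank_after REGEX [FILE...]
drop_blank_after() {
  pat=$1
  shift
  awk -v pat="$pat" -v n=-2 '
    NR == n + 1 && !NF { next }   # blank line right after a match: skip it
    $0 ~ pat { n = NR }           # remember the line number of the match
    1                             # print every line not skipped
  ' "$@"
}
```

Because !NF uses the default field splitting, a line containing only spaces or tabs also counts as blank here. That differs from the sed answer below.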
Using sed
sed '/GG/{N;s/\n$//}' file
If it sees GG, it reads the next line with N and removes the newline between them if that next line is empty.
Note that this removes only one blank line, and the line must be truly empty, i.e. it must not contain spaces or tabs.
This might work for you (GNU sed):
sed -r 'N;s/(G.*)\n\s*$/\1/;P;D' file
Keep a moving window of two lines throughout the length of the file and remove a newline (and any whitespace) if it follows the intended pattern.
Using ex (edit in-place):
ex +'/G/j' -cwq foo.txt
or print to the standard output (from file or stdin):
ex -s +'/GG/j|%p|q!' file_or_/dev/stdin
where:
/GG/j - joins the next line when the pattern is found
%p - prints the buffer
q! - quits
For conditional checking (if there is a blank line), try:
ex -s +'%s/^\(G\)\n/\1/' +'%p|q!' file_or_/dev/stdin

Remove all text from last dot in bash

I have a file named test.txt which has:
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
If I want to remove everything up to and including the first ., like:
cde.ccd.eed.12345.5678.txt
cdde.ccdd.eaed.12346.5688.txt
cade.cacd.eaed.13345.5078.txt
cdae.ccda.eaed.29345.1678.txt
cdae.cacd.eead.18145.2678.txt
cdve.cncd.ened.19945.2345.txt
Then I will do
for i in `cat test.txt`; do echo ${i#*.}; done
But if I want to remove the last . and everything after it, like:
abc.cde.ccd.eed.12345.5678
abcd.cdde.ccdd.eaed.12346.5688
aabc.cade.cacd.eaed.13345.5078
abzc.cdae.ccda.eaed.29345.1678
abac.cdae.cacd.eead.18145.2678
aabc.cdve.cncd.ened.19945.2345
What should I do?
With awk:
awk 'BEGIN{FS=OFS="."} NF--' file
As long as the file has no empty lines, this works. It sets the input and output field separators to a dot, then decreases the number of fields by one, so the last field is dropped and $0 is rebuilt without it. Because NF-- evaluates to the value NF had before the decrement, the condition is true for every non-empty line, and awk performs the default action, {print $0}.
With sed:
sed 's/\.[^.]*$//' file
This matches the last dot followed by any non-dot characters up to the end of the line, and replaces that with nothing. In other words, it removes it.
With rev and cut:
rev file | cut -d'.' -f2- | rev
rev reverses each line so that cut can print from the 2nd field to the end. A second rev then restores the original order.
With bash:
while IFS= read -r line
do
echo "${line%.*}"
done < file
This uses the parameter expansion ${line%.*}, which removes the shortest match of the pattern .* from the end of $line, i.e. the last dot and everything after it.
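The ${line%.*} form is one of four POSIX parameter-expansion operators, and the question's first loop uses another one (${i#*.}). The short sketch below applies all four to one of the question's names to show how they differ. The variable names are only for illustration.

```shell
# Sample value taken from the question's test.txt
name=abc.cde.ccd.eed.12345.5678.txt
without_last=${name%.*}   # abc.cde.ccd.eed.12345.5678 (shortest ".*" cut from the end)
before_first=${name%%.*}  # abc (longest ".*" cut from the end)
after_first=${name#*.}    # cde.ccd.eed.12345.5678.txt (shortest "*." cut from the start)
last_ext=${name##*.}      # txt (longest "*." cut from the start)
printf '%s\n' "$without_last" "$before_first" "$after_first" "$last_ext"
```

A name with no dot at all is left unchanged by ${name%.*}, because the pattern does not match.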
With grep:
grep -Po '.*(?=\.)' file
This uses a look-ahead to print only what comes before the last dot. Since .* is greedy, the match extends up to the last dot.
All of them return:
abc.cde.ccd.eed.12345.5678
abcd.cdde.ccdd.eaed.12346.5688
aabc.cade.cacd.eaed.13345.5078
abzc.cdae.ccda.eaed.29345.1678
abac.cdae.cacd.eead.18145.2678
aabc.cdve.cncd.ened.19945.2345

Replace text between two strings in file using linux bash

I have a file, "acl.txt":
192.168.0.1
192.168.4.5
#start_exceptions
192.168.3.34
192.168.6.78
#end_exceptions
192.168.5.55
and another file, "exceptions":
192.168.88.88
192.168.76.6
I need to replace everything between #start_exceptions and #end_exceptions with the content of the exceptions file. I have tried many solutions from this forum, but none of them work.
EDITED:
OK, if you want to keep the #start and #end lines, I will revert to awk:
awk '
BEGIN {p=1}
/^#start/ {print;system("cat exceptions");p=0}
/^#end/ {p=1}
p' acl.txt
Thanks to @fedorqui for tweaks in the comments below.
Output:
192.168.0.1
192.168.4.5
#start_exceptions
192.168.88.88
192.168.76.6
#end_exceptions
192.168.5.55
p is a flag that says whether or not to print lines. It starts at the beginning as 1, so all lines are printed till I find a line starting with #start. Then I cat the contents of the exceptions file and stop printing lines till I find a line starting with #end, at which point I set the p flag back to 1 so remaining lines get printed.
If you want output to a file, add "> newfile" to the very end of the command like this:
awk '
BEGIN {p=1}
/^#start/ {print;system("cat exceptions");p=0}
/^#end/ {p=1}
p' acl.txt > newfile
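Below is a variant sketch that avoids spawning cat through system(). The hypothetical function `replace_block` reads the exceptions file with awk's own getline instead. The flag logic is the same as above; only the way the file is copied differs.

```shell
# Hypothetical helper: replace_block EXCEPTIONS_FILE INPUT_FILE
replace_block() {
  awk -v f="$1" '
    BEGIN { p = 1 }
    /^#start_exceptions/ {
      print
      # copy the exceptions file line by line with getline
      while ((getline line < f) > 0) print line
      close(f)
      p = 0
    }
    /^#end_exceptions/ { p = 1 }
    p
  ' "$2"
}
```

For the question's files, this would be run as `replace_block exceptions acl.txt > newfile`.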
YET ANOTHER VERSION IF YOU REALLY WANT TO USE SED
If you really, really want to do it with sed, you can use nested address spaces, firstly to select the lines between #start_exceptions and #end_exceptions, then again to select the first line within that and also lines other than the #end_exceptions line:
sed '
/^#start/,/^#end/{
/^#start/{
n
r exceptions
}
/^#end/!d
}
' acl.txt
Output:
192.168.0.1
192.168.4.5
#start_exceptions
192.168.88.88
192.168.76.6
#end_exceptions
192.168.5.55
ORIGINAL ANSWER
I think this will work:
sed -e '/^#end/r exceptions' -e '/^#start/,/^#end/d' acl.txt
When it finds /^#end/, it reads in the exceptions file (r queues it for output at the end of the cycle). It also deletes everything from #start to #end inclusive, so the marker lines are removed too.
I have left the matching slightly "loose" for clarity of expressing the technique.
You can use the following, based on Replace string with contents of a file using sed:
$ sed $'/end/ {r exceptions\n} ; /start/,/end/ {d}' acl.txt
192.168.0.1
192.168.4.5
192.168.88.88
192.168.76.6
192.168.5.55
Explanation
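sed $'one_thing; another_thing' acl.txt performs both actions. The $'...' form is bash ANSI-C quoting, which turns \n into a real newline so the r filename ends there.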
/end/ {r exceptions\n} if the line contains end, then read the file exceptions and append it.
/start/,/end/ {d} from a line containing start to a line containing end, delete all the lines.
I had a problem with Mark Setchell's solution in MINGW: the caret was not matching the beginning of the line. Is the detection of the separator dependent on it being at the beginning of the line?
I came up with this awk alternative, which matches the markers anywhere on the line:
$ awk -v data="$(<exceptions)" '
BEGIN {p=1}
/#start_exceptions/ {print; print data;p=0}
/#end_exceptions/ {p=1}
p
' acl.txt

sed to insert on first match only

UPDATED:
Using sed, how can I insert (NOT SUBSTITUTE) a new line before only the first match of a keyword in each file?
Currently I have the following, but it inserts the line before every line containing Matched Keyword, and I want it to insert New Inserted Line only before the first match found in the file:
sed -ie '/Matched Keyword/ i\New Inserted Line' *.*
For example:
Myfile.txt:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
changed to:
Line 1
Line 2
Line 3
New Inserted Line
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
You can sort of do this in GNU sed:
sed '0,/Matched Keyword/s/.*Matched Keyword/New Inserted Line\n&/'
But it's not portable. Since portability is good, here it is in awk:
awk '/Matched Keyword/ && !x {print "Text line to insert"; x=1} 1' inputFile
Or, if you want to pass a variable to print:
awk -v "var=$var" '/Matched Keyword/ && !x {print var; x=1} 1' inputFile
These both insert the text line before the first occurrence of the keyword, on a line by itself, per your example.
Remember that with both sed and awk, the matched keyword is a regular expression, not just a keyword.
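The question asks for the first match in each file. When awk is given several files at once, the x flag above is never reset, so only the first file gets the new line. Below is a minimal sketch of a hypothetical helper. It resets its flag at FNR == 1 (the first line of each input file) and takes the regex and the text as variables.

```shell
# Hypothetical helper: insert_before_first REGEX TEXT [FILE...]
insert_before_first() {
  regex=$1
  text=$2
  shift 2
  awk -v re="$regex" -v text="$text" '
    FNR == 1 { done = 0 }                       # new input file: reset the flag
    !done && $0 ~ re { print text; done = 1 }   # only before the first match
    1                                           # print every input line
  ' "$@"
}
```

The output goes to stdout. To edit the files in place, loop over them with a temporary file, as in the header-stripping answer earlier.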
UPDATE:
Since this question is also tagged bash, here's a simple solution that is pure bash and doesn't require sed:
#!/bin/bash
n=0
while IFS= read -r line; do
    if [[ "$line" =~ 'Matched Keyword' && $n = 0 ]]; then
        echo "New Inserted Line"
        n=1
    fi
    echo "$line"
done
As it stands, this works as a filter in a pipe (reading standard input and writing standard output). You can easily wrap it in something that acts on files instead.
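The [[ ... =~ ... ]] test is bash-only. If the script has to run under a plain POSIX sh, a case pattern can do the same literal substring match. Below is a minimal sketch with a hypothetical function name.

```shell
# Hypothetical POSIX sh version of the bash filter above:
# reads stdin, writes stdout.
insert_first_sh() {
  _inserted=0
  while IFS= read -r line; do
    case $line in
      *"Matched Keyword"*)
        # Emit the new line only before the first match.
        if [ "$_inserted" -eq 0 ]; then
          printf '%s\n' "New Inserted Line"
          _inserted=1
        fi
        ;;
    esac
    printf '%s\n' "$line"
  done
}
```

Like the bash version, it is a filter, e.g. `insert_first_sh < Myfile.txt`. With `IFS= read -r`, leading whitespace and backslashes are preserved, but a final line without a trailing newline is dropped.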
If you want one with sed* that adds the new line after the first matching line:
sed '0,/Matched Keyword/s/.*Matched Keyword.*/&\nNew Inserted Line/' myfile.txt
*only works with GNU sed
This might work for you:
sed -i -e '/Matched Keyword/{i\New Inserted Line' -e ':a;n;ba}' file
You're nearly there! Just create a loop to read from the Matched Keyword to the end of the file.
After inserting a line, the remainder of the file can be printed out by:
Introducing a loop placeholder :a (here a is an arbitrary name).
Print the current line and fetch the next one into the pattern space with the n command.
Redirect control back with the ba command, which is essentially a goto to the a placeholder. The end-of-file condition is naturally taken care of by the n command, which terminates any further sed commands if it tries to read past the end of the file.
With a little help from bash, a true one liner can be achieved:
sed $'/Matched Keyword/{iNew Inserted Line\n:a;n;ba}' file
Alternative:
sed 'x;/./{x;b};x;/Matched Keyword/h;//iNew Inserted Line' file
This uses the matched line, saved in the hold space, as a flag. Once it has been set, every later line bails out immediately and is printed unchanged.
If you want to append a line after the first match only, you can use awk instead of sed:
awk '{print} /Matched Keyword/ && !n {print "New Inserted Line"; n++}' myfile.txt
Output:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
New Inserted Line
Line 4
This line contains the Matched Keyword and other stuff
Line 6
