We are trying to update a configuration file on multiple servers, adding new entries at a specific location, just before some specific lines. The example below contains part of the configuration file:
#
#
#
#
# The lines below should not be replaced in this file
# Contact Sysadmins before make changes to this line
I need to match these lines, including the "#" lines and newlines, and then add new entries above them, like this:
New entry 1
New entry 2
#
#
#
#
# The lines below should not be replaced in this file
# Contact Sysadmins before make changes to this line
I tried this as a Perl one-liner:
/usr/bin/perl -lne 'print "\nNew entry 1\nNew entry 2" if (/[#\n]*# The lines below should not be replaced in this file/); print $_' filename
My regex does not work. I am not an expert in Perl or in regex in any other language. On many of the servers there may be 2 or 3 "#" lines instead of 4 before the marker line. Any help would be greatly appreciated; I have to update the same file on 2000+ servers.
You are processing the file one line at a time, so [#\n]*# won't ever match anything but #.
One solution involves telling Perl to treat the entire file as one line and thus reading the entire file into memory.
perl -0777pe's/^([#\n]*# The lines below)/New entry 1\nNew entry 2\n$1/mg'
The other would involve postponing the print of lines starting with #.
perl -ne'
    $buf .= $_;
    next if /^#$/;
    print "New entry 1\nNew entry 2\n" if /^# The lines below/;
    print $buf;
    $buf = "";
    END { print $buf; }
'
Tested:
$ perl -0777pe's/^([#\n]*# The lines below)/New entry 1\nNew entry 2\n$1/mg' file
New entry 1
New entry 2
#
#
#
#
# The lines below should not be replaced in this file
# Contact Sysadmins before make changes to this line
$ perl -ne'
    $buf .= $_;
    next if /^#$/;
    print "New entry 1\nNew entry 2\n" if /^# The lines below/;
    print $buf;
    $buf = "";
    END { print $buf; }
' file
New entry 1
New entry 2
#
#
#
#
# The lines below should not be replaced in this file
# Contact Sysadmins before make changes to this line
(Other tests pass too.)
You can use the sed command to substitute the new entry lines in before the marker line (GNU sed, which supports \n in the replacement):
sed '/^# The lines below/ s/^/New Entry 1\nNew Entry 2\n/' input.txt
Note that this inserts the entries just above the matching line, i.e. below the leading "#" lines rather than above them.
I have a file with one column containing 2059 ID numbers.
I want to add a second column with the word 'pop1' for all the 2059 ID numbers.
The second column will just mean that the ID number belongs to population 1.
How can I do this in Linux using awk or sed?
The file currently has one column which looks like this
45958
480585
308494
I want it to look like:
45958 pop1
480585 pop1
308494 pop1
Maybe not the most elegant solution, and it doesn't use sed or awk, but I would do this:
while read -r line; do echo "$line pop1" >> newfile; done < test
This command appends to the file 'newfile', so be sure that it's empty or doesn't exist before executing the command.
Here is the resource I used, on reading a file line by line : https://www.cyberciti.biz/faq/unix-howto-read-line-by-line-from-file/
A Perl solution.
$ perl -lpi -e '$_ .= " pop1"' your-file-name
Command line options:
-l : remove newline from input and replace it on output
-p : put each line of input into $_ and print $_ at the end of each iteration
-i : in-place editing (overwrite the input file)
-e : run this code for each line of the input
The code ($_ .= " pop1") just appends your string to the input record.
I have a file which contains some strings. For example, in the file my_file.txt I have:
foo_eusa.r1
foo_chnc.r5
foo_deu.r10
.
.
.
Now I want to check whether a substring exists in the file; if it exists, I want to modify the entire string, and if it does not, I will add it.
For example, I have a new string foo_eusa.r4. I want to search for all occurrences of the substring foo_eusa in the file. If it exists (above it does: foo_eusa.r1), then replace r1 with r4 so the string becomes foo_eusa.r4 instead of foo_eusa.r1 in all occurrences. If foo_eusa does not exist, then the new string foo_eusa.r4 is to be added.
I tried checking with grep -q, but it only reports the first match, and I could not find a way to replace the substrings.
Perl to the rescue:
perl -i~ -pe 's/^foo_eusa\..*/foo_eusa.r4/ and $changed = 1;
END { print "foo_eusa.r4\n" unless $changed }' -- file
If you need to process more than one file, it's a bit more complex:
perl -i~ -ne '
    s/^foo_eusa\..*/foo_eusa.r4/ and $changed = 1;
    print;
    if (eof) {
        print "foo_eusa.r4\n" unless $changed;
        $changed = 0;
    }' -- file*
-i~ modifies the file "in place", leaving a backup with the ~ extension
-p reads the input line by line and outputs each line after processing
-n is like -p, but it doesn't output the line unless told to print
$changed is used as a flag. When the substitution is triggered, it's set to 1. If it's not 1 at the END (i.e. when the processing is finished), the string is added to the output. In case of several files, the flag must be handled for each file, so eof is used to indicate the end of file.
What is the best way to remove all lines from a text file starting at first empty line in Bash? External tools (awk, sed...) can be used!
Example
1: ABC
2: DEF
3:
4: GHI
Line 3 and 4 should be removed and the remaining content should be saved in a new file.
With GNU sed:
sed '/^$/Q' "input_file.txt" > "output_file.txt"
With AWK:
$ awk '/^$/{exit} 1' test.txt > output.txt
Contents of output.txt
$ cat output.txt
ABC
DEF
Walkthrough: for lines that match ^$ (start-of-line, end-of-line, i.e. an empty line), exit (the whole script). For all other lines, print the whole line -- of course, we won't get to this part once a line has made us exit.
Bet there are some more clever ways to do this, but here's one using bash's read builtin. The question asks us to keep the lines before the blank in one file and send the lines after the blank to another file. You could send some of standard output one place and some another using exec to reroute stdout mid-script, but I'll take a simpler approach and use a command-line argument to say where the post-blank data should go:
#!/bin/bash
# script takes as argument the name of the file to send data to once a blank
# line is found
found_blank=0
> "$1"   # start with an empty output file
while IFS= read -r stuff; do
    if [ -z "$stuff" ] ; then
        found_blank=1
    fi
    if [ "$found_blank" -eq 1 ] ; then
        echo "$stuff" >> "$1"
    else
        echo "$stuff"
    fi
done
run it like this:
$ ./delete_from_empty.sh rest_of_stuff < demo
output is:
ABC
DEF
and 'rest_of_stuff' has
GHI
if you want the before-blank lines to go somewhere else besides stdout, simply redirect:
$ ./delete_from_empty.sh after_blank < input_file > before_blank
and you'll end up with two new files: after_blank and before_blank.
Perl version
perl -e '
open $fh, ">","stuff";
open $efh, ">", "rest_of_stuff";
while(<>){
if ($_ !~ /\w+/){
$fh=$efh;
}
print $fh $_;
}
' demo
This creates two output files and iterates over the demo data. When it hits a blank line, it flips the output from one file to the other.
Creates
stuff:
ABC
DEF
rest_of_stuff:
<blank line>
GHI
Another awk would be:
awk -vRS= '1;{exit}' file
By setting the record separator RS to an empty string, we define the records as paragraphs separated by a sequence of empty lines. It is now easy to adapt this to select the nth block:
awk -vRS= '(FNR==n){print;exit}' file
There is a problem with this method when processing files with a DOS line-ending (CRLF). There will be no empty lines as there will always be a CR in the line. But this problem applies to all presented methods.
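One workaround (a sketch; dos.txt is a made-up sample file) is to strip the carriage returns before awk sees the data, so the separator line really is empty:

```shell
# build a small sample with DOS (CRLF) line endings
printf 'ABC\r\n\r\nGHI\r\n' > dos.txt
# delete the CR characters, then keep only the first paragraph
tr -d '\r' < dos.txt | awk -vRS= '1;{exit}'
# prints:
# ABC
```

Note that tr -d '\r' removes every CR in the stream, which is fine for text converted from DOS but would also touch any CR embedded mid-line.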
In Linux what command can I use to replace a single line of text with new multiple lines? I want to look for a keyword on a line and delete this line and replace it with multiple new lines. So in the text shown below I want to search for the line that contains "keyword" and replace the entire line with 3 new lines of text as shown.
For example replacing the line containing the keyword,
This is Line 1
This is Line 2 that has keyword
This is Line 3
changed to this:
This is Line 1
Inserted is new first line
Inserted is new second line
Inserted is new third line
This is Line 3
$ sed '/keyword/c\
> Inserted is new first line\
> Inserted is new second line\
> Inserted is new third line' input.txt
This is Line 1
Inserted is new first line
Inserted is new second line
Inserted is new third line
This is Line 3
($ and > are the bash prompts.)
Create a file, script.sed, containing:
/keyword/{i\
Inserted is new first line\
Inserted is new second line\
Inserted is new third line
d
}
Apply it to your data:
sed -f script.sed your_data
There are numerous variations on how to do it, using the c and a commands instead of i and/or d, but this is reasonably clean. It finds the keyword, inserts three lines of data, and then deletes the line containing the keyword. (The c command does all of that, but I didn't remember that it existed, and the a command appends the text and is essentially synonymous with i in this context.)
You can do it using shell builtins too:
STRING1_WITH_MULTIPLE_LINES="your
text
here"
STRING2_WITH_MULTIPLE_LINES="more
text"
OUTPUT=""
while IFS= read -r LINE || [ "$LINE" ]; do
case "$LINE" in
"Entire line matches this")OUTPUT="$OUTPUT$STRING1_WITH_MULTIPLE_LINES
";;
*"line matches this with extra before and/or after"*)OUTPUT="$OUTPUT$STRING2_WITH_MULTIPLE_LINES
";;
*)OUTPUT="$OUTPUT$LINE
";;
esac
done < file
echo "$OUTPUT" >file
I have thousands of text files that I have imported that contain a piece of text that I would like to remove.
It is not just a block of text but a pattern.
<!--
# Translator(s):
#
# username1 <email1>
# username2 <email2>
# usernameN <emailN>
#
-->
If the block appears, it will have 1 or more users listed with their email addresses.
I have a small awk program that accomplishes the task in very few lines of code. It can be used to remove patterns of text from a file; the start as well as the stop regexp can be set.
# This block is a range pattern and captures all lines between( and including )
# the start '<!--' to the end '-->' and stores the content in record $0.
# Record $0 contains every line in the range pattern.
# awk -f remove_email.awk yourfile
# The if statement is not needed to accomplish the task, but may be useful.
# It says - if the range patterns in $0 contains a '#' then it will print
# the string "Found an email..." if uncommented.
# The 'next' command discards the content of the current record and reads
# the next record.
# Processing of that record then restarts from the first rule of the program.
/<!--/, /-->/ {
    #if( $0 ~ /#/ ){
    #    print "Found an email and removed that!"
    #}
    next
}

# This line prints the body of the file to standard output - if not captured in
# the block above.
1 {
    print
}
Save the code in 'remove_email.awk' and run it by:
awk -f remove_email.awk yourfile
This sed solution might work:
sed '/^<!--/,/^-->/{/^<!--/{h;d};H;/^-->/{x;/^<!--\n# Translator(s):\n#\(\n# [^<]*<email[0-9]\+>\)\+\n#\n-->$/!p};d}' file
An alternative (perhaps better solution?):
sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' file
This gathers up the lines that start with <!-- and end with --> then pattern matches on the collection i.e. the second line is # Translator(s): the third line is #, the fourth and perhaps more lines follow # username <email address>, the penultimate line is # and the last line is -->. If a match is made the entire collection is deleted otherwise it is printed as normal.
For this task you need look-ahead, which is normally done with a parser.
Another solution, though not very efficient, would be:
sed "s/-->/&\n/;s/<!--/\n&/" file | awk 'BEGIN {RS = "";FS = "\n"}/username/{print}'
HTH Chris
perl -i.orig -00 -pe 's/<!--\s+#\s*Translator.*?\s-->//gs' file1 file2 file3
Here -00 reads the input in paragraph mode (records separated by blank lines), -i.orig edits the files in place while keeping .orig backups, and the /s modifier lets . match newlines so the whole comment block is removed in one substitution.
Here is my solution, if I understood your problem correctly. Save the following to a file called remove_blocks.awk:
# See the beginning of the block, mark it
/<!--/ {
    state = "block_started"
}

# At the end of the block, if the block does not contain email, print
# out the whole block.
/^-->/ {
    if (!block_contains_user_email) {
        for (i = 0; i < count; i++) {
            print saved_line[i];
        }
        print
    }
    count = 0
    block_contains_user_email = 0
    state = ""
    next
}

# Encounter a block: save the lines and wait until the end of the block
# to decide if we should print it out
state == "block_started" {
    saved_line[count++] = $0
    # a line like "# username1 <email1>" carries the <...> token in field 3
    if (NF >= 3 && $3 ~ /^<.*>$/) {
        block_contains_user_email = 1
    }
    next
}
# For everything else, print the line
1
Assume that your text file is in data.txt (or many files, for that matter):
awk -f remove_blocks.awk data.txt
The above command will print out everything in the text file, minus the blocks which contain user email.
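As a quick check of the approach, the script can be run against a made-up sample (file names and block contents below are illustrative, and the email test here looks for a <...> token in the third field):

```shell
# the script from the answer, condensed (long comments omitted)
cat > remove_blocks.awk <<'EOF'
/<!--/ { state = "block_started" }
/^-->/ {
    if (!block_contains_user_email) {
        for (i = 0; i < count; i++) print saved_line[i]
        print
    }
    count = 0
    block_contains_user_email = 0
    state = ""
    next
}
state == "block_started" {
    saved_line[count++] = $0
    if (NF >= 3 && $3 ~ /^<.*>$/) block_contains_user_email = 1
    next
}
1
EOF
# a sample file with one translator block
cat > data.txt <<'EOF'
before the block
<!--
# Translator(s):
#
# username1 <email1>
#
-->
after the block
EOF
awk -f remove_blocks.awk data.txt
# prints:
# before the block
# after the block
```

The translator block is buffered, found to contain an email line, and therefore dropped, while the surrounding text passes through untouched.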