linux sed remove block of code with linebreak - linux

Trying to remove this entire block of code from a script:
https://pastebin.com/gBnFBQSR
I am able to do so up until the linebreak and the closing }:
sed '/var gfjfgjk/,/appendChild(s);\n}/d'
How can I have it include the linebreak and the } at the end?

Thanks to @Wiktor Stribiżew's link:
for i in $(grep -rl gfjfgjk) ; do if head -n1 "$i" | grep -q gfjfgjk ; then echo "$i" ; sed -i -e '1,6d;7s/^}//' "$i" ; fi; done
The only difference really is that I wanted to make sure gfjfgjk matched the first line of the file, in case it was injected somewhere else in the script and sed would then remove the first 7 lines of legitimate code.

With sed, would you please try the following:
sed '
:l # define a label "l"
N # read next line and append to the pattern space
$!b l # goto "l" unless eof
s/var gfjfgjk.*appendChild(s);\n}//
# remove the specified block including newlines
' file
It first slurps all lines into the pattern space so we can process
multiple lines (including the newline characters) at once.
A possible problem: if the file contains the pattern appendChild(s);\n}
in more than one place, sed will take the longest match because
regex matching is greedy.
As an alternative, if perl is an option, you can also say:
perl -0777 -pe 's/var gfjfgjk.*?appendChild\(s\);\n}//s' file
The -0777 option tells perl to read all lines at once.
The regex .*? enables the shortest match.
The s option at the end makes the dot . match newlines. Otherwise
the dot in perl regex does not match newlines.

Related

Add spaces after punctuation marks with sed

I need to capitalize a txt file but I found some problems when I try to add a space after any punctuation mark with sed. For instance: "Hello,World" -> to "Hello, World"
I tried the following:
#!/bin/bash
if [ $# != 1 ]; then
echo "No parameter"
exit
fi
cp $1 $1.bak
ARCH1=/tmp/`basename $1`.$$
sed 's/[A-Z]*/\L&/g' $1 > $ARCH1
sed -i 's/^./\u&/' $ARCH1
sed 's/ */\ /g' $ARCH1 #Here I replace >= 2 spaces for 1
sed 's/, */, /g' $ARCH1
#These 2 lines don't work well
sed 's/. */. /g' $ARCH1
sed 's/; */; /g' $ARCH1
mv $ARCH1 $1
The script doesn't crash, but the output is not the one that I expect.
I believe the reason your script doesn't work is that you forgot to pass -i to sed in several calls, and also that you don't escape . in the regex, so that . matches any character.
I also believe that a simpler way to do what you're trying to do is
sed -i.bak 's/[A-Z]*/\L&/g; s/\([.,;]\) */\1 /g' "$1"
-i.bak edits the file in-place and creates a backup with the .bak extension, and the script is simply
s/[A-Z]*/\L&/g # lower-case everything (I got that from your code; \L is a GNU sed extension)
s/\([.,;]\) */\1 /g # put exactly one space after every period, comma or semicolon
Here
[.,;] is a character set matching period, comma or semicolon,
\(stuff\) captures stuff in a group for later use, and
\1 is a back reference referring to the first such capture.
Note that this is a very simple approach. If your text, for example, contains ellipses (...), it'll waltz right over that and make ... into . . ., and similar caveats apply for ?! and such.
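
A small sketch of the punctuation substitution on its own, with the g flag so that every punctuation mark on a line is handled. The helper name add_space and the sample strings are made up for illustration; the expression itself is plain POSIX sed.

```shell
# Insert exactly one space after each period, comma, or semicolon.
# The g flag applies the substitution to every match on the line.
add_space() {
  sed 's/\([.,;]\) */\1 /g'
}

spaced=$(printf '%s\n' 'Hello,World;foo.bar' | add_space)
ellipsis=$(printf '%s\n' 'wait...ok' | add_space)

printf '%s\n' "$spaced"    # Hello, World; foo. bar
printf '%s\n' "$ellipsis"  # wait. . . ok
```

The second line shows the ellipsis caveat: each dot is treated as a separate punctuation mark.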
Using GNU sed:
$ echo "foo;BAR,BaZ.qux" | sed -r 's/[[:punct:]]+/& /g; s/[[:alnum:]]+/\L\u&/g'
Foo; Bar, Baz. Qux
\L lower cases the whole word, then \u upper cases the first character.
See your regex(7) man page for regular expression documentation.

sed: replace block of text between markers INCLUDING the markers themselves

I have the following sed commands that replace a block of text between the start and end markers, /**##+ and /**##-*/ respectively, with the contents of a file:
sed -i -ne '/**##+/ {p; r block.txt' -e ':a; n; /**##-*/ {p; b}; ba}; p' -e '/**##+/d' test.txt && sed -i -e '/**##+/d' -e '/**##-*/d' test.txt
(Besides replacing text, the command also converts line endings.)
As it is, this leaves the start and end markers intact, but I want to get rid of those as well. My using the p command means that I can't have a d command in the same execution unit. I work around the problem by introducing a second set of commands that delete those markers, but I would like to have it all in one single sed command, if possible.
test.txt
start of file
/**##+
* the start marker is above
*/
this should get replaced
/**##-*/
end marker is above
block.txt
THIS IS THE REPLACEMENT
Results
Running the command should change test.txt like so:
start of file
THIS IS THE REPLACEMENT
end marker is above
I am looking for the shortest, single-line solution in sed.
This might work for you (GNU sed):
sed '/^\/\*\*##+/,/^\/\*\*##-\*/cThis is the replacement' file
This changes the lines between the range to the required string.
To replace a range with contents of a file use:
sed -e '/^\/\*\*##+/!b;:a;N;/^\/\*\*##-\*/M!ba;r replacementFile' -e 'd' file
On encountering the start of the range set up a loop to gather up the range in the pattern space, then read the replacement file into the standard output and delete the contents of the pattern space.
Your start and end tags contain regex meta characters and /. sed only searches an input by regex and you need to escape / and all of those meta-characters in sed.
It is much easier to handle this in awk as awk allows non-regex plain text search also:
awk -v st='/**##+' -v et='/**##-*/' -v repl="$(<block.txt)" '
$0 == st{del=1} $0 == et{$0 = repl; del=0} !del' file
start of file
THIS IS THE REPLACEMENT
end marker is above
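
For anyone who wants to try the awk approach without touching real files, here is a self-contained sketch. It recreates the question's test.txt and block.txt in a scratch directory (the directory and the variable names work and result are only for this demo) and uses $(cat ...) instead of $(<...) so it also runs in plain sh.

```shell
# Recreate the question's test.txt and block.txt in a scratch directory.
work=$(mktemp -d)
cat > "$work/test.txt" <<'EOF'
start of file
/**##+
 * the start marker is above
 */
this should get replaced
/**##-*/
end marker is above
EOF
printf '%s\n' 'THIS IS THE REPLACEMENT' > "$work/block.txt"

# Compare whole lines as plain strings: suppress output from the start
# marker onward, and print the replacement in place of the end marker.
result=$(awk -v st='/**##+' -v et='/**##-*/' \
             -v repl="$(cat "$work/block.txt")" '
  $0 == st { del = 1 }
  $0 == et { $0 = repl; del = 0 }
  !del
' "$work/test.txt")

printf '%s\n' "$result"
rm -rf "$work"
```

Because the markers are compared with ==, none of the regex metacharacters in them need escaping.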

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one byte (-1) from the end of the file in place, just as >> appends to a file in place.
The problem with this approach is that if the file ends with a newline, it is the newline that gets removed, not the last character.
The solution is:
if [ -n "$(tail -c1 file)" ] # if the file does not end with a newline
then
truncate -s-1 file # remove one byte, as the question requests
else
truncate -s-2 file # remove the last character and the newline
echo "" >> file # add the trailing newline back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
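
The same logic wrapped in a function, as a sketch that can be run against a scratch file. The function name remove_last_char and the demo file are made up for illustration; it assumes GNU truncate and that the last character is a single byte.

```shell
# Remove the last character of a file in place, keeping a trailing
# newline if the file had one. Assumes GNU truncate and a one-byte
# final character.
remove_last_char() {
  if [ -n "$(tail -c1 "$1")" ]; then
    truncate -s -1 "$1"    # no trailing newline: drop one byte
  else
    truncate -s -2 "$1"    # drop the character and the newline
    echo "" >> "$1"        # put the newline back
  fi
}

demo=$(mktemp)
printf 'sometext\nmoretext\nlastline\n' > "$demo"
remove_last_char "$demo"
cat "$demo"
rm -f "$demo"
```

The test relies on command substitution stripping a trailing newline, so an empty result from tail -c1 means the last byte was a newline.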
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads the whole file first (taking a long time with large files), then you need a temporary file (of the same size as the original):
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way I found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It stores all lines in an array, then prints them out, removing the last character of the last line.
Just a remark: sed will temporarily remove the file.
So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I created a test file on my Desktop containing your text, saved as "old_file.txt":
sometext
moretext
lastline
Afterwards I wrote a small script that takes the old file and removes the last character of the last line:
#!/bin/bash
# Copy every line except the last one.
sed '$d' '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
# Append the last line with its final character removed.
sed -n '$p' '/root/Desktop/old_file.txt'|sed 's/.$//' >> '/root/Desktop/my_new_file'
Opening my_new_file showed the following output:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed '$ s/.$//' filename | tee newFilename
This should do your job.
A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first read-whole-file-into-memory approach can be generally quite useful (less so for this particular problem). You can now do regex's which span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.

Match a string that contains a newline using sed

I have a string like this one:
#
pap
which basically translates to a \t#\n\tpap and I want to replace it with:
#
pap
python
which translates to \t#\n\tpap\n\tpython.
Tried this with sed in a lot of ways but it's not working maybe because sed uses new lines in a different way. I tried with:
sed -i "s/\t#\n\tpap/\t#\tpython\n\tpap/" /etc/freeradius/sites-available/default
...and many different other ways with no result. Any idea how can I do my replace in this situation?
Try this line with gawk:
awk -v RS="\0" -v ORS="" '{gsub(/\t#\n\tpap/,"yourNEwString")}7' file
If you want to let sed handle newlines, you have to read the whole file first:
sed ':a;N;$!ba;s/\t#\n\tpap/NewString/g' file
This might work for you (GNU sed):
sed '/^\t#$/{n;/^\tpap$/{p;s//\tpython/}}' file
If a line contains only \t# print it, then if the next line contains only \tpap print it too, then replace that line with \tpython and print that.
A GNU sed solution that doesn't require reading the entire file at once:
sed '/^\t#$/ {n;/^\tpap$/a\\tpython'$'\n''}' file
/^\t#$/ matches comment-only lines (matching \t# exactly), in which case (only) the entire {...} expression is executed:
n loads and prints the next line.
/^\tpap$/ matches that next line against \tpap exactly.
in case of a match, a\\tpython will then output \n\tpython before the following line is read - note that the spliced-in newline ($'\n') is required to signal the end of the text passed to the a command (you can alternatively use multiple -e options).
(As an aside: with BSD sed (OS X), it gets cumbersome, because
Control chars. such as \n and \t aren't directly supported and must be spliced in as ANSI C-quoted literals.
Leading whitespace is invariably stripped from the text argument to the a command, so a substitution approach must be used: s//&\'$'\n\t'python'/ replaces the pap line with itself plus the line to append:
sed '/^'$'\t''#$/ {n; /^'$'\t''pap$/ s//&\'$'\n\t'python'/;}' file
)
An awk solution (POSIX-compliant) that also doesn't require reading the entire file at once:
awk '{print} /^\t#$/ {f=1;next} f && /^\tpap$/ {print "\tpython"} {f=0}' file
{print}: prints every input line
/^\t#$/ {f=1;next}: sets flag f (for 'found') to 1 if a comment-only line (matching \t# exactly) is found and moves on to the next line.
f && /^\tpap$/ {print "\tpython"}: if a line is preceded by a comment line and matches \tpap exactly, outputs extra line \tpython.
{f=0}: resets the flag that indicates a comment-only line.
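
A quick way to check the flag logic is to feed the awk program two small inputs through printf. This is only a sketch; the wrapper name append_python and the sample inputs are made up for the demo.

```shell
# Append a "\tpython" line after every "\tpap" line that directly
# follows a "\t#" line; all other lines pass through unchanged.
append_python() {
  awk '{print} /^\t#$/ {f=1;next} f && /^\tpap$/ {print "\tpython"} {f=0}'
}

# "\tpap" right after "\t#": the extra line is added.
matched=$(printf '\t#\n\tpap\nother\n' | append_python)
# "\tpap" after some other line: nothing is added.
unmatched=$(printf 'x\n\tpap\nother\n' | append_python)

printf '%s\n' "$matched"
printf '%s\n' "$unmatched"
```

The second input shows that a \tpap line without a preceding \t# line is left alone.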
A couple of pure bash solutions:
Concise, but somewhat fragile, using parameter expansion:
in=$'\t#\n\tpap\n' # input string
echo "${in/$'\t#\n\tpap\n'/$'\t#\n\tpap\n\tpython\n'}"
Parameter expansion only supports patterns (wildcard expressions) as search strings, which limits the matching abilities:
Here the assumption is made that pap is followed by \n, whereas no assumption is made about what precedes \t#, potentially resulting in false positives.
If the assumption could be made that \t#\n\tpap is always enclosed in \n, echo "${in/$'\n\t#\n\tpap\n'/$'\n\t#\n\tpap\n\tpython\n'}" would work robustly; otherwise, see below.
Robust, but verbose, using the =~ operator for regex matching:
The =~ operator supports extended regular expressions on the right-hand side and thus allows more flexible and robust matching:
in=$'\t#\n\tpap' # input string
# Search string and string to append after.
search=$'\t#\n\tpap'
append=$'\n\tpython'
out=$in # Initialize output string to input string.
if [[ $in =~ ^(.*$'\n')?("$search")($'\n'.*)?$ ]]; then # perform regex matching
out=${out/$search/$search$append} # replace match with match + appendage
fi
echo "$out"
You can just translate the character \n to another one, then apply sed, then apply the reverse translation. If tr is used, it must be a 1-byte character, for instance \v (vertical tabulation, nowadays almost unused).
cat FILE|tr '\n' '\v'|sed 's/\t#\v\tpap/&\v\tpython/'|tr '\v' '\n'|sponge FILE
or, without sponge:
cat FILE|tr '\n' '\v'|sed 's/\t#\v\tpap/&\v\tpython/'|tr '\v' '\n' >FILE.bak && mv FILE.bak FILE

Bash script to remove 'x' amount of characters the end of multiple filenames in a directory?

I have a list of file names in a directory (/path/to/local). I would like to remove a certain number of characters from all of those filenames.
Example filenames:
iso1111_plane001_00321.moc1
iso1111_plane002_00321.moc1
iso2222_plane001_00123.moc1
In every filename I wish to remove the last 5 characters before the file extension.
For example:
iso1111_plane001_.moc1
iso1111_plane002_.moc1
iso2222_plane001_.moc1
I believe this can be done using sed, but I cannot determine the exact coding. Something like...
for filename in /path/to/local/*.moc1; do
mv $filname $(echo $filename | sed -e 's/.....^//');
done
...but that does not work. Sorry if I butchered the sed options, I do not have much experience with it.
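mv "$filename" "$(echo "$filename" | sed -e 's/.....\.moc1$/.moc1/')"
or
echo "${filename%%?????.moc1}.moc1"
%% is a shell parameter-expansion operator that removes the longest suffix matching the pattern; here each ? matches exactly one character.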
This sed command will work for all the examples you gave.
sed -e 's/\(.*\)_.*\.moc1/\1_.moc1/'
However, if you just want to specifically "remove 5 characters before the last extension in a filename" this command is what you want:
sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/'
You can implement this in your script like so:
for filename in /path/to/local/*.moc1; do
mv "$filename" "$(echo "$filename" | sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/')";
done
First Command Explanation
The first sed command works by grabbing all characters up to the last underscore (.* is greedy): \(.*\)_
Then it discards all characters until it finds .moc1: .*\.moc1
Then it replaces the text that it found with everything it grabbed at first inside the parentheses: /\1
And finally adds the underscore and the .moc1 extension back on the end and ends the regex: _.moc1/
Second Command Explanation
The second sed command works by grabbing all characters at first: \(.*\)
And then it is forced to stop grabbing characters so it can discard five characters, or more specifically, five characters that lie in the ranges 0-9, a-z, and A-Z: [0-9a-zA-Z]\{5\}
Then comes the dot '.' character to mark the last extension : \.
And then it looks for all non-dot characters. This ensures that we are grabbing the last extension: \([^.]*\)
Finally, it replaces all that text with the first and second capture groups, separated by the . character, and ends the regex: /\1.\2/
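
To try the second command safely, the following sketch renames empty files in a scratch directory instead of /path/to/local. The directory and the variable names dir, newname and renamed are assumptions for the demo; the sed expression is the one explained above.

```shell
# Scratch directory with the question's example names (empty files).
dir=$(mktemp -d)
touch "$dir/iso1111_plane001_00321.moc1" \
      "$dir/iso1111_plane002_00321.moc1" \
      "$dir/iso2222_plane001_00123.moc1"

# Work on bare file names so dots in the directory path cannot matter.
(
  cd "$dir" || exit 1
  for filename in *.moc1; do
    newname=$(printf '%s\n' "$filename" |
      sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/')
    mv -- "$filename" "$newname"
  done
)

renamed=$(ls "$dir")
printf '%s\n' "$renamed"
rm -rf "$dir"
```

Quoting "$filename" and "$newname" keeps the loop working for names that contain spaces.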
This might work for you (GNU sed):
sed -r 's/(.*).{5}\./\1./' file
