Use anything as input to sed or Perl regex?


If I do
comment () { sed -i "/$1/s/^/$2 /g" $3 }
comment /dev/sysmsg '#' /tmp/1
comment '*.err' '#' /tmp/1
with input file
*.err;kern.notice;auth.notice /dev/sysmsg
fff
ff
then it breaks, because / is used by sed as the delimiter, and * is also treated as a regex metacharacter.
Question
Is there a way to make it robust, so the input string in $1 can contain whatever I want? Or would I need to move to Perl?

Sure, use bash parameter substitution to escape the troublesome slash character:
comment () { sed -i "/${1//\//\\/}/ s/^/${2//\//\\/} /" "$3"; }
Notes:
You need to protect slashes in both the pattern and the replacement parts.
If the substitution is anchored (as s/^/.../ is), using "g" is pointless because the pattern can match at most once.
Quote all your variables: if the filename contains a space, your code breaks.
One-line functions require a semicolon before the closing brace.
Demo
$ cat file
test/1/
test/2/
test/3/
$ comment 'test/2' '//' file
$ cat file
test/1/
// test/2/
test/3/
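As a quick check of the slash escaping against the asker's own /dev/sysmsg case, here is a minimal, self-contained sketch. It assumes GNU sed (for -i without a suffix) and uses a throwaway file from mktemp instead of /tmp/1.

```shell
#!/usr/bin/env bash
# The function from the answer above: escape "/" in both the pattern and the leader.
comment () { sed -i "/${1//\//\\/}/ s/^/${2//\//\\/} /" "$3"; }

tmp=$(mktemp)   # throwaway copy of the sample file from the question
printf '%s\n' '*.err;kern.notice;auth.notice /dev/sysmsg' 'fff' 'ff' > "$tmp"

comment /dev/sysmsg '#' "$tmp"   # the slashes no longer terminate the sed address
out=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Only the first line gains the "# " prefix; fff and ff are left alone.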
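I realized I'm not escaping regex special characters. Escaping every non-alphanumeric character is not safe with GNU sed, where sequences such as \(, \{, \+ and \| are themselves special, so escape only the basic-regex metacharacters (plus the / delimiter) in the pattern, and /, & and \ in the replacement:
comment () {
    # Escape [ \ / . * ^ $ in the search text so it is matched literally
    local pattern=$(sed 's|[[\/.*^$]|\\&|g' <<<"$1")
    # Escape / & \ in the leader so it is inserted literally
    local replace=$(sed 's|[\/&]|\\&|g' <<<"$2")
    local file=$3
    sed -i "/$pattern/ s/^/$replace /" "$file"
}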
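Escaping every non-alphanumeric character can backfire with GNU sed, because escaped forms such as \(, \) and \+ are GNU extensions with a special meaning. The following self-contained check, which assumes GNU sed and uses a made-up search string, compares that escaping with one that escapes only the basic-regex metacharacters [ \ . * ^ $ and the / delimiter:

```shell
#!/usr/bin/env bash
# Assumes GNU sed, where \( \) and \+ are BRE extensions.
text='foo(bar)+1'   # made-up search string

# Escaping every non-alphanumeric character turns ( ) + into \( \) \+ ...
all=$(sed 's/[^[:alnum:]]/\\&/g' <<<"$text")
# ...while escaping only [ \ / . * ^ $ leaves them as literal characters.
meta=$(sed 's|[[\/.*^$]|\\&|g' <<<"$text")

# Use each escaped pattern to search the original text for itself.
by_all=$(sed -n "/$all/p" <<<"$text")
by_meta=$(sed -n "/$meta/p" <<<"$text")

printf 'all-escaped:   %s -> %s\n' "$all" "${by_all:-no match}"
printf 'metachar-only: %s -> %s\n' "$meta" "${by_meta:-no match}"
```

With the first escaping the text no longer matches itself (\(bar\)\+ becomes a repeated group); with the second it does.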
But since you want to do plain text matching, sed is probably not the best tool:
comment() {
perl -i -pse '$_ = "$leader $_" if index($_, $search) > -1' -- \
-search="$1" \
-leader="$2" \
"$3"
}

Best to avoid generating code from a shell script.
comment () {
perl -i -pe'BEGIN { ($s,$r)=splice(@ARGV,0,2) } $_ = "$r $_" if /\Q$s/' -- "$@"
}
or
comment () {
s="$1" r="$2" perl -i -pe'$_ = "$ENV{r} $_" if /\Q$ENV{s}/' -- "$3"
}
or
comment () {
perl -i -spe'$_ = "$r $_" if /\Q$s/' -- -s="$1" -r="$2" -- "$3"
}
Supports:
Arbitrary text for the search string (including characters that might normally be special in regex patterns, such as *). This is achieved by using quotemeta (as \Q) to convert the text into a regex pattern that matches that text.
Arbitrary file names (including those that contain spaces or start with -), thanks to proper quoting and the use of --.

Related

Using SED to replace long string - but got unterminated substitute in regular expression

Hi, I'm trying to replace the following string with a long one:
#x#
with string that I got from the command line:
read test
sed -i --backup 's/#x#/'${test}'/g' file.json README.md
but it only works for a single word; it fails if there are spaces between words, even between quotes:
sed: 1: "s/#x#/string test string: unterminated substitute in regular expression

If you run it on macOS and are struggling with "unterminated substitute in regular expression", there may be a simpler explanation:
macOS ships a BSD sed that differs slightly from the GNU sed usually found on Linux: its -i requires an argument (the backup suffix), and it has no long options such as --backup. If you don't want a backup, just add "" after -i (and double-quote the script so that ${test} stays one argument):
sed -i "" "s/#x#/${test}/g" file.json README.md
Similarly, to delete a line, this works on Linux but fails with “invalid command code” on macOS:
sed -i 39d filenamehere.log
whereas this works on macOS:
sed -i "" 39d filenamehere.log

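The problem originates from the way you are using the single quotes. The second single quote ends the quoted part of the script, so ${test} is expanded unquoted and split at every space; sed therefore receives only s/#x#/ plus the first word as its script, and the remaining words as extra arguments. That is what the error message is pointing at: the substitute command is unterminated.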
If you have a file with the following content:
foo #x# foo
Then you can replace the content, e.g. with the following command:
sed 's/#x#/bar foo bar/' foo.txt > foo2.txt
And get:
foo bar foo bar foo
If you need to pass in a variable, the comment from Gordon Davisson shows you the right way.
By the way, if you want to use the in-place option, on my Linux system you need to use the command like this:
sed -i.old "s/#x#/${test}/" foo.txt
But I think this might depend on your environment (macOS?).
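A minimal illustration of why double quotes fix the space problem, assuming the replacement text contains no /, & or backslash (those would still need escaping):

```shell
#!/usr/bin/env bash
# Double quotes keep the whole of $test in one sed argument, so spaces survive.
test='bar foo bar'
out=$(printf 'foo #x# foo\n' | sed "s/#x#/${test}/")
printf '%s\n' "$out"
```

This prints foo bar foo bar foo, the same result as the literal command above.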

sed doesn't understand strings where a string is a series of literal characters. It replaces a regexp (not a string) with a backreference-enabled "string" (also not a string) all within a set of delimiters (which ALSO require careful handling in both the regexp and the replacement). See Is it possible to escape regex metacharacters reliably with sed for more info.
To replace a string with another string the simplest approach is to just use a tool that understands strings such as awk:
$ cat file
before stuff
foo #x# bar
after stuff
$ cat tst.awk
BEGIN {
old = ARGV[1]
new = ARGV[2]
ARGV[1] = ARGV[2] = ""
}
s = index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
$ test='a/\t/&"b'
$ awk -f tst.awk '#x#' "$test" file
before stuff
foo a/\t/&"b bar
after stuff
The above will work no matter what characters test contains, even newlines:
$ test='contains a
newline'
$ awk -f tst.awk '#x#' "$test" file
before stuff
foo contains a
newline bar
after stuff
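The script above replaces only the first occurrence on each line. If every occurrence should be replaced, the same index()/substr() idea can loop over the rest of the line; replace_all below is a hypothetical helper sketched under that assumption, not part of the original answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace EVERY occurrence of a literal string, using
# index()/substr() so no character in OLD or NEW is treated specially.
replace_all() {   # usage: replace_all OLD NEW [FILE...]
    awk '
    BEGIN { old = ARGV[1]; new = ARGV[2]; ARGV[1] = ARGV[2] = "" }
    {
        out = ""; rest = $0
        # Copy text up to each match, append NEW, continue after the match.
        while (old != "" && (s = index(rest, old)) > 0) {
            out = out substr(rest, 1, s - 1) new
            rest = substr(rest, s + length(old))
        }
        print out rest
    }' "$@"
}

printf '%s\n' '#x# and #x#' | replace_all '#x#' 'a/\t/&"b'
```

Because the search continues after each inserted replacement, a NEW string that contains OLD cannot cause an endless loop, and the old != "" guard handles an empty search string.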

Two pattern match on same sed command

I have the following sed command:
sed -n '/^out(/{n;p}' ${filename} | sed -n '/format/ s/.*format=//g; s/),$//gp; s/))$//gp'
I tried to do it as one line as in:
sed -n '/^out(/{n;}; /format/ s/.*format=//g; s/),$//gp; s/))$//gp' ${filename}
But that also display the lines I don't want (those that do not match).
What I have is a file with some strings as in:
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),
...
I just want the format lines that come right after the out entry, with the trailing )) or ), removed.

You can use this sed command:
sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}' your_file
Once an out line is found, it advances to the next line (n) and uses the s command with the p flag to print only what is inside the parentheses.
Explanation:
I used [(] instead of \(. With -r (extended regular expressions), an unbracketed ( means grouping; if you want a literal (, you need to escape it as \( or put it inside brackets. Most RE special characters don't need escaping when put inside brackets.
([^)]+) is a group (the "(" here are RE metacharacters, not literal parentheses) consisting of one or more (+) characters that are not (^) ) (a literal closing parenthesis); the ^ inverts the character class [ ... ].
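A quick way to try the command, assuming a sed that accepts -E (the portable spelling of -r) and a slightly altered copy of the question's sample in which the wanted line carries the hypothetical value wanted=yes:

```shell
#!/usr/bin/env bash
# Only the format line directly after an out( line should be printed.
out=$(sed -nE '/^out[(]/ { n; s/.*[(]([^)]+)[)].*/\1/p; }' <<'EOF'
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(wanted=yes)),
EOF
)
printf '%s\n' "$out"
```

The first format line is skipped because it does not follow an out( line.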

Count total number of pattern between two pattern (using sed if possible) in Linux

I have to count all '=' between two patterns, i.e. '{' and '}'.
Sample:
{
100="1";
101="2";
102="3";
};
{
104="1,2,3";
};
{
105="1,2,3";
};
Expected Output:
3
1
1

A very cryptic perl answer:
perl -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ge'
The tr function returns the number of characters transliterated.
With the new requirements, we can make a couple of small changes:
perl -0777 -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ges'
-0777 reads the entire file/stream into a single string.
The s flag on the s/// operator lets . match newlines like any other character.

Perl to the rescue:
perl -lne '$c = 0; $c += ("$1" =~ tr/=//) while /\{(.*?)\}/g; print $c' < input
-n reads the input line by line
-l adds a newline to each print
/\{(.*?)\}/g is a regular expression. The ? makes the asterisk non-greedy (frugal), i.e. it matches the shortest possible string.
The (...) parentheses create a capture group, referred to as $1.
tr is normally used to transliterate (i.e. replace one character by another), but here it just counts the number of equal signs.
+= adds the number to $c.

Awk is here too
grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'
example
echo '{100="1";101="2";102="3";};
{104="1,2,3";};
{105="1,2,3";};'|grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'
output
3
1
1

First some test input (a line with a = outside the curly brackets and inside the content, one without brackets and one with only 2 brackets)
echo '== {100="1";101="2";102="3=3=3=3";} =;
a=b
{c=d}
{}'
Handle line without brackets (put a dummy char so you will not end up with an empty string)
sed -e 's/^[^{]*$/x/'
Handle line without equal sign (put a dummy char so you will not end up with an empty string)
sed -e 's/{[^=]*}/x/'
Remove stuff outside the brackets
sed -e 's/.*{\(.*\)}/\1/'
Remove stuff inside the double quotes (do not count fields there)
sed -e 's/"[^"]*"//g'
Use @repzero's method to count equal signs
awk -F "=" '{print NF-1}'
Combine stuff
echo -e '{100="1";101="2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$/x/' -e 's/{[^=]*}/x/' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{print NF-1}'
The ugly x placeholders, and the special handling of {}, can be avoided by dealing with empty lines inside awk:
echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{if (NF>0) c=NF-1; else c=0; print c}'
or shorter
echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{print (NF>0) ? NF-1 : 0; }'

No harder sed than done ... in.
Restricting this answer to the environment as tagged, namely:
linux shell unix sed wc
will actually not require the use of wc (or awk, perl, or any other app.).
Though echo is used, a file source can easily exclude its use.
As for bash, it is the shell.
The actual environment used is documented at the end.
NB. Exploitation of GNU specific extensions has been used for brevity
but appropriately annotated to make a more generic implementation.
Also brace bracketed { text } will not include braces in the text.
It is implicit that such braces should be present as {} pairs but
the text src. dangling brace does not directly violate this tenet.
This is a foray into the world of `sed`'ing to gain some fluency in its use for other purposes.
The ideas expounded upon here are used to cross-pollinate another SO problem's solution in order
to acquire more familiarity with vetting vagaries of vernacular version variances. Consequently
this pedantic exercise hopefully helps with the pedagogy of others beyond personal edification.
To test easily, at least in the environment noted below, judiciously highlight the appropriate
code section, carefully excluding a dangling pipe |, and then drag & drop, copy & paste, or
middle-click it onto the command line.
The other SO problem. linux - Is it possible to do simple arithmetic in sed addresses?
# _______________________________ always needed ________________________________
echo -e '\n
\n = = = {\n } = = = each = is outside the braces
\na\nb\n { } so therefore are not counted
\nc\n { = = = = = = = } while the ones here do count
{\n100="1";\n101="2";\n102="3";\n};
\n {\n104="1,2,3";\n};
a\nb\nc\n {\n105="1,2,3";\n};
{ dangling brace ignored junk = = = \n' |
# ____________ preparatory conditioning needed for final solutions _____________
sed ' s/{/\n{\n/g;
s/}/\n}\n/g; ' | # guarantee but one brace to a line
sed -n '/{/ h; # so sed addressing can "work" here
/{/,/}/ H; # use hHold buffer for only { ... }
/}/ { x; s/[^=]*//g; p } ' | # then make each {} set a line of =
# ____ stop code hi-lite selection in ^--^ here include quote not pipe ____
# ____ outputs the following exclusive of the shell " # " comment quotes _____
#
#
# =======
# ===
# =
# =
# _________________________________________________________________________
# ____________________________ "simple" GNU solution ____________________________
sed -e '/^$/ { s//0/;b }; # handle null data as 0 case: next!
s/=/\n/g; # to easily count an = make it a nl
s/\n$//g; # echo adds an extra nl - delete it
s/.*/echo "&" | sed -n $=/; # sed = command w/ $ counts last nl
e ' # who knew only GNU say you ah phoo
# 0
# 0
# 7
# 3
# 1
# 1
# _________________________________________________________________________
# ________________________ generic incomplete "solution" ________________________
sed -e '/^$/ { s//echo 0/;b }; # handle null data as 0 case: next!
s/=$//g; # echo adds an extra nl - delete it
s/=/\\\\n/g; # to easily count an = make it a nl
s/.*/echo -e & | sed -n $=/; '
# _______________________________________________________________________________
The paradigm used for the algorithm is instigated by the prolegomena study below.
The idea is to isolate groups of = signs between { } braces for counting.
These are found and each group is put on a separate line with ALL other adorning characters removed.
It is noted that sed can easily "count", actually enumerate, nl or \n line ends via =.
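To see the counting trick in isolation, here is a small sketch (assuming any POSIX sed and tr) that counts one group of = signs by turning each = into a line end and letting sed -n '$=' report the number of lines:

```shell
#!/usr/bin/env bash
# sed's "=" prints the current line number; the "$" address runs it only on the
# last line, so sed -n '$=' prints how many lines it read.
group='======='   # one group of = signs, like a line produced by the first stage
n=$(printf '%s' "$group" | tr '=' '\n' | sed -n '$=')   # each = becomes one line end
printf '%s\n' "$n"
```

printf '%s' adds no trailing newline, so seven = signs become exactly seven lines; an empty group produces no input at all, which is why the answer treats null data as a separate 0 case.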
The first "solution" uses these sed commands:
p to print
b (branch) without a label, which ends the current cycle
h/H (hold) for filling the hold buffer
x (exchange) to swap the hold and pattern buffers
= to print the current input line number
s/.../.../ (substitute), with the global flag s/.../.../g
and most particularly the GNU-specific
e (execute), which runs the pattern space as a shell command and replaces it with the output
The GNU-specific execute command is avoided in the generic code. That code does not print the answer but
instead produces commands that will print the answer; run them to observe. To fully automate this, many
mechanisms can be used, not the least of which is the sed w (write) command to put these lines in a
shell file to be executed, or even embedding the output in a bash command substitution $( ), etc.
Note also that various sed example scripts can "count" and these too can be used efficaciously.
The interested reader can entertain these other pursuits.
prolegomena:
concept from counting # of lines between braces
sed -n '/{/=;/}/=;'
to
sed -n '/}/=;/{/=;' |
sed -n 'h;n;G;s/\n/ - /;
2s/^/ Between sets of {} \n the nl # count is\n /;
2!s/^/ /;
p'
testing "done in":
linuxuser@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
linuxuser@ubuntu:~$ sed --version -----> sed (GNU sed) 4.4

And for giggles an awk-only alternative:
$ echo '{
100="1";
101="2";
102="3";
};
{
104="1,2,3";
};
{
105="1,2,3";
};' | awk 'BEGIN{RS="\n};";FS="\n"}{c=gsub(/=/,""); if(NF>2){print c}}'
3
1
1

Replace one character with another in Bash

I need to replace a space ( ) with a dot (.) in a string in bash.
I think this would be pretty simple, but I'm new so I can't figure out how to modify a similar example for this use.

Use inline shell string replacement. Example:
foo=" "
# replace first blank only
bar=${foo/ /.}
# replace all blanks
bar=${foo// /.}
See http://tldp.org/LDP/abs/html/string-manipulation.html for more details.

You could use tr, like this:
tr " " .
Example:
# echo "hello world" | tr " " .
hello.world
From man tr:
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.

In bash, you can do pattern replacement in a string with the ${VARIABLE//PATTERN/REPLACEMENT} construct. Use just / and not // to replace only the first occurrence. The pattern is a wildcard pattern, like file globs.
string='foo bar qux'
one="${string/ /.}" # sets one to 'foo.bar qux'
all="${string// /.}" # sets all to 'foo.bar.qux'
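Because the pattern is a glob, a bracket expression can replace several different characters in one pass. A short sketch (the tab in the sample string is just for illustration):

```shell
#!/usr/bin/env bash
# [[:space:]] is a glob bracket expression matching both the space and the tab.
string=$'foo bar\tqux'
dots="${string//[[:space:]]/.}"
printf '%s\n' "$dots"
```

This prints foo.bar.qux.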

Try this
echo "hello world" | sed 's/ /./g'

Use parameter substitution:
string=${string// /.}

Try this for paths:
echo \"hello world\"|sed 's/ /+/g'|sed 's/+/\\/g'|sed 's/\"//g'
It replaces the space inside the double-quoted string with a + sign, then replaces the + sign with a backslash, then removes the double quotes.
I had to use this to replace the spaces in one of my paths in Cygwin.
echo \"$(cygpath -u $JAVA_HOME)\"|sed 's/ /+/g'|sed 's/+/\\/g'|sed 's/\"//g'

The solution recommended by ShellCheck would be the following:
string="Hello World" ; echo "${string// /.}"
output: Hello.World

How do I reverse escape backslash encodings like "\ " and "\303\266" in bash?

I have a script that records files with UTF8 encoded names. However the script's encoding / environment wasn't set up right, and it just recoded the raw bytes. I now have lots of lines in the file like this:
.../My\ Folders/My\ r\303\266m/...
So there are spaces in the filenames, escaped as \ , and UTF-8 encoded characters like \303\266 (which is ö). I want to reverse this encoding. Is there some easy set of bash commands I can chain together to remove them?
I could write a sed command for every character, but that would take ages given all the non-ASCII characters we have. Or I could start parsing it in Python. But I'm hoping there's some trick I can do.

Here's a rough stab at the Unicode characters:
text="/My\ Folders/My\ r\303\266m/"
text="echo \$\'"$(echo "$text"|sed -e 's|\\|\\\\|g')"\'"
# the argument to the echo must not be quoted or escaped-quoted in the next step
text=$(eval "echo $(eval "$text")")
read text < <(echo "$text")
echo "$text"
This makes use of the $'string' quoting feature of Bash.
This outputs "/My Folders/My röm/".
As of Bash 4.4, it's as easy as:
text="/My Folders/My r\303\266m/"
echo "${text@E}"
This uses a new feature of Bash called parameter transformation. The E operator causes the parameter to be treated as if its contents were inside $'string' in which backslash escaped sequences, in this case octal values, are evaluated.
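Since a backslash-space is not one of the ANSI-C escapes that $'...' understands, the escaped spaces from the question are best turned into plain spaces before applying @E. A minimal sketch, assuming Bash 4.4 or later and a UTF-8 terminal:

```shell
#!/usr/bin/env bash
# Requires Bash 4.4+ for the ${parameter@E} transformation.
text='/My\ Folders/My\ r\303\266m/'   # the escaped form from the question
text=${text//\\ / }                    # turn every "\ " into a plain space
decoded=${text@E}                      # expand the octal escapes \303\266
printf '%s\n' "$decoded"
```

The two bytes \303\266 together form the UTF-8 encoding of ö, so the result is /My Folders/My röm/.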

It is not clear exactly what kind of escaping is being used. The octal character codes are C, but C does not escape space. The space escape is used in the shell, but it does not use octal character escapes.
Something close to C-style escaping can be undone using the command printf %b $escaped. (The documentation says that octal escapes start with \0, but that does not seem to be required by GNU printf.) Another answer mentions read for unescaping shell escapes, although if space is the only one that is not handled by printf %b then handling that case with sed would probably be better.

In the end I used something like this:
cat file | sed 's/%/%%/g' | while read -r line ; do printf "${line}\n" ; done | sed 's/\\ / /g'
Some of the files had % in them, which is a printf special character, so I had to double it up so that it would be escaped and passed straight through. The -r option stops read from interpreting the backslashes; however, that also means read doesn't turn "\ " into " ", so I needed the final sed.

Use printf to solve the issue with utf-8 text. Use read to take care of spaces (\ ).
Like this:
$ text='/My\ Folders/My\ r\303\266m/'
$ IFS='' read t < <(printf "$text")
$ echo "$t"
/My Folders/My röm/

The read builtin handles part of the problem (the escaped spaces):
$ echo "with\ spaces" | while read r; do echo $r; done
with spaces

Pass the file (line by line) to the following perl script.
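#!/usr/bin/perl
use strict;
use warnings;

# Print one line with octal escapes such as \303 turned back into raw bytes.
sub encode {
    my ($String) = @_;
    while ($String =~ /(\\[0-7]{1,3}|.)/g) {
        my $Match = $1;
        if ($Match =~ /^\\([0-7]+)$/) {
            print chr(oct($1));   # \303 -> byte 0xC3; \303\266 together is UTF-8 for ö
        } else {
            print $Match;
        }
    }
    print "\n";   # "." does not match the newline, so add it back
}

while (@ARGV) {
    my $File = shift @ARGV;
    open(my $F, '<', $File) or die "Cannot open $File: $!\n";
    while (my $String = <$F>) {
        $String =~ s/\\ / /g;   # "\ " -> " "
        encode($String);
    }
    close($F);
}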
Like this:
$ ./PerlEncode.pl Test.txt
Where Test.txt contains:
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
The line $String =~ s/\\ / /g; replaces "\ " with " ", and sub encode converts the octal escapes such as \303\266 into characters.
Hope this helps.
