sed doesn't replace variable - linux

I'm trying to replace a regex in an Apache config file.
I define:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
When I execute:
sed -i 's/$OLD1/$NEW1/g' demo.conf
There's no change.
I also tried:
sed -i "s/${OLD1}/${NEW1}/g" 001-kms.conf
sed -i "s/"$OLD1"/"$NEW1"/g" 001-kms.conf
sed -i "s~${OLD1}~${NEW1}~g" 001-kms.conf
I expect the file to have $OLD1 replaced with $NEW1.

OLD1="[0-9]*.[0-9]+"
Because [, * and . are all characters with special meaning in sed, we need to escape them. For a simple case like this, something like this could work:
OLD2=$(<<<"$OLD1" sed 's/[][\*\.]/\\&/g')
It will set OLD2 to \[0-9\]\*\.\[0-9\]+. Note that it doesn't handle all possible cases: for example, ^, $ and the / delimiter are not escaped, so a pattern containing them would still be misinterpreted. Implementing a proper regex to properly escape, well, other regex I leave as an exercise to others.
Now you can:
sed -i "s/$OLD2/$NEW1/g" demo.conf
Tested with:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
sed "s/$(sed 's/[][\*\.]/\\&/g' <<<"$OLD1")/$NEW1/g" <<<'XYZ="[0-9]*.[0-9]+"'
will output:
XYZ="[a-z]*.[0-9]"
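The sketch below generalizes the same idea. It assumes single-line strings and GNU or BSD sed, and the helper names escape_lhs and escape_rhs are hypothetical. It also escapes ^, $, \ and the / delimiter in the pattern. In the replacement it escapes \, & and /, because an unescaped & would otherwise insert the whole match:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; each argument is assumed to be a single line of text.

# Escape [ \ . * ^ $ and the / delimiter so the string matches literally
# as a basic regex on the left-hand side of s///.
escape_lhs() { printf '%s\n' "$1" | sed 's|[[\\.*^$/]|\\&|g'; }

# Escape \ & and the / delimiter so the string is inserted literally
# on the right-hand side of s///.
escape_rhs() { printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'; }

OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"

# Same test line as above; prints XYZ="[a-z]*.[0-9]"
printf '%s\n' 'XYZ="[0-9]*.[0-9]+"' |
  sed "s/$(escape_lhs "$OLD1")/$(escape_rhs "$NEW1")/g"
```

The | delimiter inside the helpers lets / appear in their bracket expressions without being read as the end of the s command.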

You need matching on the exact string
You would need something that can match on the exact string [0-9]*.[0-9]+, which sed does not support well: it has no option for fixed-string matching.
Therefore, I am instead using this pipeline, which replaces one character at a time (it is also easier to read, I think):
echo "[0-9]*.[0-9]+" | sed 's/0/a/' | sed 's/9/z/' | sed 's/+//'
To apply it to your files, you would have to cat them or use find with -exec.
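The three chained seds can also run as one sed with several -e expressions, applied in order to each line. That version can edit a file in place, so there is no need for cat. A sketch follows. demo.conf is the question's file, and -i without a suffix argument is GNU sed syntax (BSD/macOS sed needs -i ''):

```shell
#!/usr/bin/env bash
# One sed process, three expressions applied in order to each line.
echo "[0-9]*.[0-9]+" | sed -e 's/0/a/' -e 's/9/z/' -e 's/+//'
# prints [a-z]*.[0-9]

# Same edit in place on a file (GNU sed; demo.conf from the question):
# sed -i -e 's/0/a/' -e 's/9/z/' -e 's/+//' demo.conf
```

Be aware that on a real file this rewrites the first 0, 9 and + on every line, not only inside the target string.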
I had tried the following (from other SO answers):
- sed 's/\<exact_string\>/replacement/' doesn't work, as \< and \> are left and right word boundaries respectively.
- sed 's/(CB)?exact_string/replacement/' was suggested in one answer, but I found it nowhere in the documentation.
I used Win 10 bash, Git Bash, and online Linux tools, with the same results.
When I thought matching was on the pattern rather than the exact string:
The replacement cannot be a regex; at most it can reference the parts of the regex that matched. From man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
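This short sketch illustrates the man page excerpt by using & and back references in the replacement. The input is the same test string used below:

```shell
#!/usr/bin/env bash
# & inserts the whole match: wrap every run of digits in brackets.
echo "01.234--ZZZ" | sed 's/[0-9][0-9]*/[&]/g'
# prints [01].[234]--ZZZ

# \1 and \2 insert the first and second captured groups: swap them.
echo "01.234--ZZZ" | sed 's/\([0-9]*\)\.\([0-9]*\)/\2.\1/'
# prints 234.01--ZZZ
```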
Additionally, you have to escape some characters in your regex. A . matches any character, so write \. if you want to match a literal full stop (with or without -E). In the default basic regex, a bare + is a literal plus sign. To mean "one or more", write \+ (a GNU extension), or add the -E option for extended regex (as per the comment under your question) and use a plain +.
$ echo "01.234--ZZZ" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g'
REPLACEMENT--ZZZ
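Here is the same substitution with -E, as a sketch (both GNU sed and BSD sed accept -E). A second command shows that a bare + is literal in a basic regex:

```shell
#!/usr/bin/env bash
# With -E (extended regex), + means "one or more" without a backslash;
# the dot still has to be escaped to mean a literal full stop.
echo "01.234--ZZZ" | sed -E 's/[0-9]*\.[0-9]+/REPLACEMENT/g'
# prints REPLACEMENT--ZZZ

# Without -E, a bare + is a literal plus sign in a basic regex:
echo "1+1=2" | sed 's/1+1/two/'
# prints two=2
```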

Related

Getting Exact Pattern Match with grep and sed

I'm processing a bunch of text strings with grep and sed, and I want stdout to show only the data after package:, ending at the folder name without the trailing /.
For example:
data/app/com.android.chrome-DeX_54==
system/app/KeyChain
vendor/app/NlpService
This is the sample input:
package:data/app/com.android.chrome-DeX_54==/base.apk=com.android.chrome
package:data/dataapp/ExactCalculator/ExactCalculator.apk=com.android.calculator2
package:data/hw_init/cust/app/Email/Email.apk=com.android.email
package:system/app/KeyChain/KeyChain.apk=com.android.keychain
package:system/delapp/WallpaperBackup/WallpaperBackup.apk=com.android.wallpaperbackup
package:system/framework/framework-res.apk=android
package:system/priv-app/CIT/CIT.apk=com.ontim.cit
package:vendor/app/NlpService/NlpService.apk=com.mediatek.nlpservice
I'm not getting the exact output I want so any help would be appreciated.
P.S: I'm learning grep and sed just for fun.
Would you please try:
grep -Po '(?<=package:).+(?=/[^/]*$)' input.txt
Results:
data/app/com.android.chrome-DeX_54==
data/dataapp/ExactCalculator
data/hw_init/cust/app/Email
system/app/KeyChain
system/delapp/WallpaperBackup
system/framework
system/priv-app/CIT
vendor/app/NlpService
The -P option enables a Perl compatible regex.
The -o option tells grep to print only the matched substring(s).
The pattern (?<=package:) is a positive lookbehind assertion, and (?=/[^/]*$) is a positive lookahead assertion; the text they match is not included in the output of grep -o.
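A variant sketch, assuming GNU grep built with PCRE support. \K discards the text matched so far from the reported match. That does the same job as the lookbehind (?<=package:), and unlike a lookbehind it also allows a variable-length prefix. The greedy .+ followed by (?=/) stops at the last /:

```shell
#!/usr/bin/env bash
# Two sample lines from the question.
printf '%s\n' \
  'package:system/app/KeyChain/KeyChain.apk=com.android.keychain' \
  'package:system/framework/framework-res.apk=android' |
  grep -oP 'package:\K.+(?=/)'
# prints:
# system/app/KeyChain
# system/framework
```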
The sed alternative will be:
sed 's#\(^package:\)\(.\+\)\(/[^/]*$\)#\2#' input.txt
or
sed -E 's#(^package:)(.+)(/[^/]*$)#\2#' input.txt
The latter is more legible.
As you can see, the lookarounds can be replaced by sed back references: capture the surrounding parts in groups too, and keep only the wanted group (\2) in the replacement.
Hope this helps.
This might work for you (GNU sed):
sed -n 's#^package:\(.*\)/.*#\1#p' file
As this is a filtering operation, use the -n option and print results explicitly. The regexp starts with ^, which anchors package: to the start of the line, and then uses .* to greedily consume the remainder of the line. However, the next character it must match is a /, so the regexp engine backtracks to the last / on the line, and the following .* again swallows the remainder of the line. The escaped parentheses \(...\) capture the part between package: and that last /, which is represented in the RHS of the substitute command by \1, known as a back reference. The p flag at the end of the substitute command explicitly prints the amended line in its current state.
N.B. With the substitute command, the programmer can choose the delimiter. In documentation the command is usually written s/LHS/RHS/flags, where the delimiter is /, but it can be any character; in the above solution # was chosen to avoid having to quote the / character. LHS is the regexp on the left-hand side, RHS is the replacement, and flags are additional operations, such as g, meaning substitute globally throughout the line, and p, meaning print the line in its current state following a successful substitution (there are others; see the sed documentation).
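This sketch shows the delimiter choice in practice on two sample lines from the question. Both commands are equivalent; only the escaping of / differs:

```shell
#!/usr/bin/env bash
# Two sample lines from the question.
sample='package:system/app/KeyChain/KeyChain.apk=com.android.keychain
package:system/priv-app/CIT/CIT.apk=com.ontim.cit'

# With # as the delimiter, / needs no escaping...
printf '%s\n' "$sample" | sed -n 's#^package:\(.*\)/.*#\1#p'

# ...whereas with the conventional / delimiter every / must be escaped.
printf '%s\n' "$sample" | sed -n 's/^package:\(.*\)\/.*/\1/p'
# Both print:
# system/app/KeyChain
# system/priv-app/CIT
```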

Remove text between one string and 1st occurrence of another string

I have found several solutions to remove text between two strings but I guess my case is a little different.
I am trying to convert this:
/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume
To this:
/nz/kit/bin/adm/tools/hostaekresume
Basically remove the version specific information from the filename.
The solutions I have found remove everything from the word kit to the last occurrence of /. I need something to remove from kit to the first occurrence.
The most common solution I have seen is:
sed -e 's/\(kit\).*\(\/\)/\1\2/'
Which produces:
/nz/kit/hostaekresume
How can I remove only up to the first /? I assume this can be done with sed or awk, but I'm open to suggestions.
$ sed 's|\(kit\)[^/]*|\1|' <<< '/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
/nz/kit/bin/adm/tools/hostaekresume
This uses a different delimiter (| instead of /) so we don't have to escape the /. Then, since sed has no non-greedy matching, it uses [^/]* instead: any number of characters other than /, which matches everything between kit and the next /.
Alternatively, if you know that what you want to remove consists of dots and digits, and nothing else in the string contains them, you can use parameter expansion:
$ var='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
$ echo "${var//[[:digit:].]}"
/nz/kit/bin/adm/tools/hostaekresume
The syntax is ${parameter/pattern/string}, where pattern in the expanded parameter is replaced by string. If we use // instead of /, all occurrences instead of just the first are replaced.
In our case, parameter is var, the pattern is [[:digit:].] (digits or a dot – this is a glob pattern, not a regular expression, by the way), and we've skipped the /string part, which just removes the pattern (replaces it with nothing).
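The version part may contain characters other than digits and dots. For that case, here is a sketch that uses prefix/suffix removal expansions instead. It assumes kit occurs only once in the path and is followed by a /:

```shell
#!/usr/bin/env bash
var='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'

# ${var%%kit*} removes the longest suffix starting at "kit": "/nz/"
before=${var%%kit*}
# ${var#*kit} removes the shortest prefix ending in "kit": ".7.2.0.7/bin/..."
rest=${var#*kit}
# ${rest#*/} then drops everything up to and including the first "/"
rest=${rest#*/}

echo "${before}kit/${rest}"
# prints /nz/kit/bin/adm/tools/hostaekresume
```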
sed has no non-greedy quantifier, so for a non-greedy regex you need perl.
Also, use | as a delimiter, since / can cause confusion when it appears in your regex.
perl -pe 's|(kit).*?(/.*)|$1$2|'
The ? after the .* makes the quantifier non-greedy, so the match stops at the first /.
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | perl -pe 's|(kit).*?(/.*)|$1$2|'
returns
/nz/kit/bin/adm/tools/hostaekresume
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | awk '{sub(/.7.2.0.7/,"")}1'
/nz/kit/bin/adm/tools/hostaekresume
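The awk line above hardcodes the version, and its unescaped dots match any character. Below is a sketch of a version-independent awk equivalent of the [^/]* approach. The regex is passed as a string, so the / inside the bracket expression needs no escaping:

```shell
#!/usr/bin/env bash
# Replace "kit" plus everything up to (not including) the next "/" with "kit".
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" |
  awk '{ sub("kit[^/]*", "kit") } 1'
# prints /nz/kit/bin/adm/tools/hostaekresume
```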

Understanding sed expression 's/^\.\///g'

I'm studying Bash programming and I find this example but I don't understand what it means:
filtered_files=`echo "$files" | sed -e 's/^\.\///g'`
In particular the argument passed to sed after '-e'.
It's a bad example; you shouldn't follow it.
First, understanding the sed expression at hand.
s/pattern/replacement/flags is a sed command, described in detail in man sed. In this case, pattern is a regular expression; replacement is what that pattern gets replaced with when/where found; and flags describe details about how that replacement should be done.
In this case, the s/^\.\///g breaks down as follows:
s is the sed command being run.
/ is the sigil used to separate the sections of this command. (Any character can be used as a sigil, and the person who chose to use / for this expression was, to be charitable, not thinking about what they were doing very hard).
^\.\/ is the pattern to be replaced. The ^ anchors the match to the beginning of the line; \. matches only a period, vs . (which is regex for matching any character); and \/ matches only a / (vs /, which would go on to the next section of this sed command, being the selected sigil).
The next section is an empty string, which is why there's no content between the two following sigils.
g in the flags section indicates that more than one replacement can happen on each line. In conjunction with ^, this has no meaning, since there can only be one beginning-of-the-line per line; further evidence that the person who wrote your example wasn't thinking much.
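A small demonstration of what the expression does, with made-up input lines. Only a ./ at the very start of each line is removed, and the redundant g flag is dropped:

```shell
#!/usr/bin/env bash
printf '%s\n' './docs/readme.txt' 'src/./main.c' './a./b' |
  sed -e 's/^\.\///'
# prints:
# docs/readme.txt
# src/./main.c
# a./b
```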
Using the same data structures, doing it better:
All of the below are buggy when handling arbitrary filenames, because storing arbitrary filenames in scalar variables is buggy in general.
Still using sed:
# Use printf instead of echo to avoid bugginess if your "files" string is "-n" or "-e"
# Use "#" as your sigil to avoid needing to backslash-escape all the "/"s (g dropped: meaningless with ^)
filtered_files=$(printf '%s\n' "$files" | sed -e 's#^[.]/##')
Replacing sed with a bash builtin:
# This is much faster than shelling out to any external tool
# Caveat: unlike the sed version, this removes ./ anywhere, not only at the start of each line
filtered_files=${files//.\//}
Using better data structures
Instead of running
files=$(find .)
...instead:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -print0)
That stores files in an array; it looks complex, but it's far safer -- works correctly even with filenames containing spaces, quote characters, newline literals, etc.
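Here is a self-contained sketch of the array approach. It runs in a throwaway directory created with mktemp, and the directory and file names are made up for the demo:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the demo does not touch real files.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir 'dir with spaces.'
touch 'dir with spaces./bar'

# NUL-delimited read: safe for spaces, quotes and even newlines in names.
files=( )
while IFS= read -r -d '' filename; do
  files+=( "$filename" )
done < <(find . -print0)

# Strip a leading "./" from every element (the "." entry itself is kept).
filtered_files=( "${files[@]#./}" )
printf '<%s>\n' "${filtered_files[@]}"
```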
Also, this means you can do the following:
# Remove the leading ./ from each name; don't remove ./ at any other position in a name
filtered_files=( "${files[@]#./}" )
This means that a file named
./foo/this directory name (which has spaces) ends with a period./bar
will correctly be transformed to
foo/this directory name (which has spaces) ends with a period./bar
rather than
foo/this directory name (which has spaces) ends with a periodbar
...which is what the ${files//.\//} builtin version above would have produced.
man sed. In particular:
-e script, --expression=script
add the script to the commands to be executed
And:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
In this case, it replaces any occurrence of ./ at the beginning of a line with the empty string; in other words, it removes it.

How can I use sed to get an xml value

How can I use sed to get the SOMETHING in <version.suffix>SOMETHING</version.suffix>?
I tried sed 's#.*>\(.*\)\<version\.suffix\>#\1#', but it fails.
Try this one:
sed 's/<.*>\(.*\)<.*>/\1/'
It works for lines that contain a single element with plain text content.
If you need to eliminate the indentation, add \s* (a GNU sed extension) at the beginning, like this:
sed 's/\s*<.*>\(.*\)<.*>/\1/'
Alternatively if you only want version.suffix's value, you can make the command more specific like this:
sed 's/<version\.suffix>\(.*\)<.*>/\1/'
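Sometimes you want to print only the value and skip every other line of a larger file. The sketch below does that by combining -n, the p flag and [^<]*, which, unlike .*, cannot run past the start of the next tag. The XML content is a hypothetical example written to a temporary file:

```shell
#!/usr/bin/env bash
# Hypothetical input file, created in a temporary location for the demo.
xml=$(mktemp) || exit 1
trap 'rm -f "$xml"' EXIT
cat > "$xml" <<'EOF'
<project>
  <version.suffix>SOMETHING</version.suffix>
  <other>ignored</other>
</project>
EOF

# -n suppresses automatic printing; the p flag prints only lines on which
# the substitution succeeded.
sed -n 's/.*<version\.suffix>\([^<]*\)<\/version\.suffix>.*/\1/p' "$xml"
# prints SOMETHING
```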
You could use the below sed command,
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#^<[^>]*>\(.*\)<\/[^>]*>$#\1#'
SOMETHING
^<[^>]*> Matches the first tag string <version.suffix>.
\(.*\)<\/[^>]*>$: the characters up to the closing tag are captured, and the closing tag itself is matched by <\/[^>]*>.
Finally, the whole match is replaced by the contents of capture group 1.
Your regex is nearly correct; you just forgot the / inside the closing tag, and the angle brackets there must not be backslash-escaped (\< and \> have a special meaning):
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#.*>\(.*\)</version\.suffix>#\1#'
SOMETHING
Many ways possible, e.g:
with sed
echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#<[^>]*>##g'
or grep
echo '<version.suffix>SOMETHING</version.suffix>' | grep -oP '<version\.suffix>\K[^<]*(?=</version\.suffix>)'
Assuming the formatting of the question is accurate, when I run the example in the question as-is:
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#.*>\(.*\)\<version\.suffix\>#\1#'
I see the following output:
SOMETHING</>
In case my formatting skills fail me, this output ends with the trailing left angle bracket, a forward slash, and finally the right angle bracket.
So, why this "failure"? Well, on my system (Linux with GNU grep 2.14), grep(1) includes the following snippet:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
Other answers suggest good alternatives to extract the value in XML tag syntax; use them.
I just wanted to point out why the RE in the original problem fails on current Linux systems: in GNU tools such as grep and sed, some escaped symbols match no actual characters but empty boundaries instead. So, in this example, the angle brackets in the source are matched in unexpected ways:
the \(.*\) has matched SOMETHING</, to be printed by the \1 back-reference
the left-hand side of version.suffix is matched by \<
version.suffix is matched by version\.suffix
the right-hand side of version.suffix is matched by \>
the trailing > character remains in sed's pattern space and is printed.
TL;DR -"\X" does not mean "just match an X" for all X!
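A quick way to see that \< is a word boundary rather than a literal angle bracket (GNU sed):

```shell
#!/usr/bin/env bash
# \< matches the empty string at the start of a word: only "bar" is changed,
# because the "b" in "abc" is not at the start of a word.
echo 'abc bar' | sed 's/\<b/B/g'
# prints abc Bar

# To match a literal < or >, do not backslash it.
echo '<tag>' | sed 's/<tag>/matched/'
# prints matched
```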

sed help: matching and replacing a literal "\n" (not the newline)

I have a file which contains several instances of \n.
I would like to replace them with actual newlines, but sed doesn't recognize the \n.
I tried:
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
Is sed able to recognize a literal \n? If so, how?
Is there another program that can read the file interpreting the \n's as real newlines?
Can you please try this
sed -i 's/\\n/\n/g' input_filename
What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, things like capturing parentheses and repeating braces are "extended" features.)
On BSD platforms (including MacOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.
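Here is a sketch of the awk route mentioned above. gsub replaces every occurrence on the line, and in an awk string "\n" is a real newline:

```shell
#!/usr/bin/env bash
# printf turns '\\n' into a literal backslash followed by n.
printf 'first\\nsecond\\nthird\n' | awk '{ gsub(/\\n/, "\n") } 1'
# prints:
# first
# second
# third
```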
awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
Here you need to escape the \ using \\
$ echo "\n" | sed -e 's/[\\][n]/hello/'
sed works one line at a time, and the newline ending each line is removed when the line is read into the buffer (the pattern space), so a single line never contains \n. Use N (or H and G with the hold space) to get more than one line into the buffer; then \n appears inside. Be careful: ^ and $ then match the start and end of the whole buffer, not of each line.
Portably, \n is recognized in the search pattern, not in the replacement. Two ways of using it (samples):
sed 's/\(\n\)bla/\1blabla\1/'
sed 's/\nbla/\
blabla\
/'
The first reuses a \n already in the buffer via a back reference (shorter code in the replacement);
the second uses a real, backslash-escaped newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.
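A minimal sketch of the N approach. N appends the next input line to the pattern space after an embedded newline, which s can then match as \n:

```shell
#!/usr/bin/env bash
# Join each pair of lines: N appends the next line after a newline,
# and s/\n/ / replaces that embedded newline with a space.
printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /'
# prints:
# a b
# c d
```

An even number of input lines is used on purpose: when N finds no next line, GNU sed prints the pattern space, but POSIX allows quitting without printing it.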
