Understanding sed expression 's/^\.\///g' - linux

I'm studying Bash programming and I found this example, but I don't understand what it means:
filtered_files=`echo "$files" | sed -e 's/^\.\///g'`
In particular the argument passed to sed after '-e'.

It's a bad example; you shouldn't follow it.
First, understanding the sed expression at hand.
s/pattern/replacement/flags is a sed command, described in detail in man sed. Here, pattern is a regular expression; replacement is what that pattern gets replaced with when/where it is found; and flags describe details about how that replacement should be done.
In this case, the s/^\.\///g breaks down as follows:
s is the sed command being run.
/ is the sigil used to separate the sections of this command. (Any character can be used as a sigil, and the person who chose to use / for this expression was, to be charitable, not thinking about what they were doing very hard).
^\.\/ is the pattern to be replaced. The ^ anchors the match to the beginning of the line; \. matches only a literal period, unlike . (which in a regex matches any character); and \/ matches only a / (unlike a bare /, which would end this section of the sed command, since / is the selected sigil).
The next section, the replacement, is an empty string, which is why there's no content between the two following sigils.
g in the flags section indicates that more than one replacement can happen on each line. In conjunction with ^, this has no effect, since there can only be one beginning-of-the-line per line; further evidence that the person who wrote your example wasn't thinking much.
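To see that breakdown in action, here is a small sketch; the sample paths are made up, and any POSIX sed should behave the same way. It also shows that dropping the g flag changes nothing, because ^ can only match once per line (the second "./" in ././b.txt survives either way).

```shell
#!/usr/bin/env bash
# Hypothetical sample: one path starting with "./", one starting with "./" twice
files='./a.txt
././b.txt'

# The expression from the question, with the g flag
with_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///g')
# The same expression without g: ^ only matches at the start of each line anyway
without_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///')

printf '%s\n' "$with_g"
printf '%s\n' "$without_g"
```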
Using the same data structures, doing it better:
All of the below are buggy when handling arbitrary filenames, because storing arbitrary filenames in scalar variables is buggy in general.
Still using sed:
# Use printf instead of echo to avoid bugginess if your "files" string is "-n" or "-e"
# Use "#" as your sigil to avoid needing to backslash-escape all the "/"s, and [.] to match a literal period
filtered_files=$(printf '%s\n' "$files" | sed -e 's#^[.]/##')
Replacing sed with a bash builtin:
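# This is much faster than shelling out to any external tool, but unlike the sed
# version it removes every "./" in the string, not only one at the start of a line
filtered_files=${files//.\//}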
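A minimal sketch comparing the two approaches on a made-up two-line value: the anchored sed strips only a leading ./ on each line, while the builtin // substitution strips every ./ it finds, including the one inside dir./b.txt.

```shell
#!/usr/bin/env bash
# Hypothetical two-line value; the second name contains "./" in the middle
files='./a.txt
./dir./b.txt'

# sed: ^ anchors the match, so only a leading "./" on each line is removed
via_sed=$(printf '%s\n' "$files" | sed -e 's#^[.]/##')
# Builtin: // replaces every "./" anywhere in the string
via_builtin=${files//.\//}

printf '%s\n' "$via_sed" '---' "$via_builtin"
```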
Using better data structures
Instead of running
files=$(find .)
...use:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -print0)
That stores files in an array; it looks complex, but it's far safer -- works correctly even with filenames containing spaces, quote characters, newline literals, etc.
Also, this means you can do the following:
# Remove the leading ./ from each name; don't remove ./ at any other position in a name
filtered_files=( "${files[@]#./}" )
This means that a file named
./foo/this directory name (which has spaces) ends with a period./bar
will correctly be transformed to
foo/this directory name (which has spaces) ends with a period./bar
rather than
foo/this directory name (which has spaces) ends with a periodbar
...which is what the ${files//.\//} substitution would have produced.
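The array approach can be tried safely in a throwaway directory. This sketch assumes bash and a find that supports -mindepth and -print0 (GNU and BSD find both do); the directory name is hypothetical, chosen to contain spaces and a trailing period like the example above.

```shell
#!/usr/bin/env bash
# Throwaway directory so the demo never touches real files
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p 'this dir (has spaces) ends with a period./bar'

files=( )
while IFS= read -r -d '' filename; do
  files+=( "$filename" )
done < <(find . -mindepth 1 -print0)   # -mindepth 1 skips "." itself

# Strip only a leading "./" from each element
filtered_files=( "${files[@]#./}" )
printf '<%s>\n' "${filtered_files[@]}"
cd - >/dev/null || exit 1
```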

man sed. In particular:
-e script, --expression=script
add the script to the commands to be executed
And:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
In this case, it replaces any occurrence of ./ at the beginning of a line with the empty string; in other words, it removes it.

Related

sed doesn't replace variable

I'm trying to replace some regex text in an Apache config file.
I define:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
When I execute:
sed -i 's/$OLD1/$NEW1/g' demo.conf
there's no change.
This is what I tried:
sed -i "s/${OLD1}/${NEW1}/g" 001-kms.conf
sed -i "s/"$OLD1"/"$NEW1"/g" 001-kms.conf
sed -i "s~${OLD1}~${NEW1}~g" 001-kms.conf
I expect every occurrence of $OLD1 in the file to be replaced with $NEW1.
OLD1="[0-9]*.[0-9]+"
Because [, * and . all have special meaning in sed regexes, they need to be escaped. For a simple case like this, something like the following could work:
OLD2=$(<<<"$OLD1" sed 's/[][\*\.]/\\&/g')
It will set OLD2 to \[0-9\]\*\.\[0-9\]+. Note that it doesn't handle all possible cases; for example, OLD1='\.\[' will convert to OLD2='\\.\\[', which means something different. Writing a regex that properly escapes, well, other regexes I leave as an exercise for others.
Now you can:
sed "s/$OLD2/$NEW1/g"
Tested with:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
sed "s/$(sed 's/[][\*\.]/\\&/g' <<<"$OLD1")/$NEW1/g" <<<'XYZ="[0-9]*.[0-9]+"'
will output:
XYZ="[a-z]*.[0-9]"
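If you need something more general than the one-liner above, one option is to escape every character that is special in a GNU sed basic regex, plus the / delimiter, and to escape the replacement separately. This is a sketch under those assumptions (GNU sed, single-line values); escape_pattern and escape_replacement are hypothetical helper names, not standard tools.

```shell
#!/usr/bin/env bash
# Escape the characters special in a GNU sed basic regex (\ [ ] . * ^ $)
# plus "/", the delimiter used below. Backslashes are doubled first so the
# backslashes added by the second expression are not doubled again.
escape_pattern() {
  printf '%s\n' "$1" | sed -e 's|\\|\\\\|g' -e 's|[]./*^$[]|\\&|g'
}

# Escape the characters special in an s/// replacement: \ & and "/"
escape_replacement() {
  printf '%s\n' "$1" | sed -e 's|\\|\\\\|g' -e 's|[&/]|\\&|g'
}

OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
old_re=$(escape_pattern "$OLD1")
new_rep=$(escape_replacement "$NEW1")

# For a file you would run: sed -i "s/$old_re/$new_rep/g" demo.conf
result=$(printf '%s\n' 'XYZ="[0-9]*.[0-9]+"' | sed "s/$old_re/$new_rep/g")
printf '%s\n' "$result"
```

The "+" is left alone because a bare + is an ordinary character in GNU sed's basic regex syntax.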
You need matching on an exact string.
You would need something that can match the exact string [0-9]*.[0-9]+, which sed does not support well.
Therefore, I am instead using this pipeline, which replaces one character at a time (I think it is also easier to read):
echo "[0-9]*.[0-9]+" | sed 's/0/a/' | sed 's/9/z/' | sed 's/+//'
You would have to cat your files, or use find with -exec, to apply this pipeline to them.
I had tried the following (from other SO answers):
- sed 's/\<exact_string\>/replacement/' doesn't work, as \< and \> are left and right word boundaries, respectively.
- sed 's/(CB)?exact_string/replacement/' was found in one answer but is nowhere in the documentation.
I used Win 10 bash, git bash, and online Linux tools with the same results.
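Since sed has no fixed-string mode, one alternative (not from this answer) is awk, whose index() function searches for a plain substring. This sketch assumes OLD1 and NEW1 are exported so awk can read them via ENVIRON, which avoids the backslash processing that awk -v applies; to edit a file you would redirect to a temporary file and mv it back.

```shell
#!/usr/bin/env bash
# Values from the question, exported so awk can read them via ENVIRON
export OLD1='[0-9]*.[0-9]+'
export NEW1='[a-z]*.[0-9]'

result=$(printf '%s\n' 'XYZ="[0-9]*.[0-9]+"' | awk '
BEGIN { pat = ENVIRON["OLD1"]; rep = ENVIRON["NEW1"] }
{
  line = $0; out = ""
  # index() looks for a plain string, so no character in OLD1 is special
  while (pat != "" && (i = index(line, pat)) > 0) {
    out = out substr(line, 1, i - 1) rep
    line = substr(line, i + length(pat))
  }
  print out line
}')
printf '%s\n' "$result"
```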
When I thought the matching was on the pattern rather than on the exact string:
Replacement cannot be a regex - at most it can reference parts of the regex expression which matched. From man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
Additionally, you have to escape some characters in your regex: a . must be written \. if you want to match a literal full stop rather than any character, and in GNU sed's default basic regex syntax a + must be written \+ to mean "one or more". With option -E for extended regex (as per the comment under your question), a plain + means "one or more", but . still needs escaping.
$ echo "01.234--ZZZ" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g'
REPLACEMENT--ZZZ
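For comparison, a small sketch of the same substitution in both regex flavours, assuming GNU sed (where \+ in a basic regex is a GNU extension):

```shell
#!/usr/bin/env bash
# Basic regex (default): "." escaped for a literal period, "+" written as "\+"
bre=$(echo "01.234--ZZZ" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g')
# Extended regex (-E): "." still escaped, but a plain "+" means "one or more"
ere=$(echo "01.234--ZZZ" | sed -E 's/[0-9]*\.[0-9]+/REPLACEMENT/g')
printf '%s\n' "$bre" "$ere"
```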

Remove text between one string and 1st occurrence of another string

I have found several solutions to remove text between two strings but I guess my case is a little different.
I am trying to convert this:
/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume
To this:
/nz/kit/bin/adm/tools/hostaekresume
Basically remove the version specific information from the filename.
The solutions I have found remove everything from the word kit to the last occurrence of /. I need something to remove from kit to the first occurrence.
The most common solution I have seen is:
sed -e 's/\(kit\).*\(\/\)/\1\2/'
Which produces:
/nz/kit/hostaekresume
How can I only remove up to the first /? I assume this can be done with sed or awk, but I'm open to suggestions.
$ sed 's|\(kit\)[^/]*|\1|' <<< '/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
/nz/kit/bin/adm/tools/hostaekresume
This uses a different delimiter (| instead of /) so we don't have to escape the /. Then, since sed has no non-greedy matching, it uses [^/]*: any number of characters other than /, which matches everything between kit and the next /.
Alternatively, if you know that what you want to remove consists of dots and digits, and nothing else in the string contains them, you can use parameter expansion:
$ var='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
$ echo "${var//[[:digit:].]}"
/nz/kit/bin/adm/tools/hostaekresume
The syntax is ${parameter/pattern/string}, where the first match of pattern in the expanded parameter is replaced by string. If we use // instead of /, all matches are replaced instead of just the first.
In our case, parameter is var, the pattern is [[:digit:].] (digits or a dot – this is a glob pattern, not a regular expression, by the way), and we've skipped the /string part, which just removes the pattern (replaces it with nothing).
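If the part to remove isn't known to consist only of dots and digits, parameter expansion can still do it by splitting around the first kit. A sketch, assuming the string contains kit followed somewhere by a /; prefix and rest are just illustrative variable names:

```shell
#!/usr/bin/env bash
var='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
prefix=${var%%kit*}            # everything before the first "kit": /nz/
rest=${var#"${prefix}kit"}     # everything after that "kit": .7.2.0.7/bin/...
rest=${rest#*/}                # drop up to and including the next "/": bin/...
result="${prefix}kit/${rest}"
printf '%s\n' "$result"
```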
You need perl for non-greedy regex; sed doesn't support it.
Also, use | as a delimiter since / can cause confusion when you have it in your regex.
perl -pe 's|(kit).*?(/.*)|\1\2|'
The ? after the .* makes the pattern non-greedy and will match the first instance of /.
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | perl -pe 's|(kit).*?(/.*)|\1\2|'
returns
/nz/kit/bin/adm/tools/hostaekresume
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | awk '{sub(/.7.2.0.7/,"")}1'
/nz/kit/bin/adm/tools/hostaekresume
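The awk answer above hard-codes the version string. A variant that reuses the [^/]* idea from the sed answer, so it works for any version, could look like this (passing the regex as a string avoids escaping the / inside it):

```shell
#!/usr/bin/env bash
# Replace "kit" plus everything up to (not including) the next "/" with "kit"
result=$(echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | awk '{ sub("kit[^/]*", "kit") } 1')
printf '%s\n' "$result"
```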

Bash: String manipulation terminate on whitespace

I have a string variable $LIBRARIES="abc.so.1 def.so.1 hij.so.3.1" and I want to replace all the .so such that they look like this:
"abc.so* def.so* hij.so*"
How can I do this? I tried NEW_LIBRARIES=${LIBRARIES//.so*/.so$star} but it doesn't work. How can I tell it to end on whitespace?
Or, simpler:
${LIBRARIES//.[0-9]/*}
Names with two version components (like hij.so.3.1) will get two asterisks (hij.so**), but that should be fine.
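To stop exactly at the end of each version suffix, bash's extended globs can describe ".so followed by one or more .<digits> groups". A sketch, assuming bash with extglob enabled on a line before the expansion (the option affects how later lines are parsed):

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables +( ) patterns for the lines that follow
LIBRARIES="abc.so.1 def.so.1 hij.so.3.1"
# ".so" plus one or more ".<digits>" groups, replaced with a literal ".so*"
NEW_LIBRARIES=${LIBRARIES//.so+(.+([0-9]))/.so*}
printf '%s\n' "$NEW_LIBRARIES"
```

Quoting "$NEW_LIBRARIES" when using it keeps the shell from treating the * as a filename glob.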
This should do it:
LIBRARIES="abc.so.1 def.so.1 hij.so.3.1"
NEW_ALL_LIBRARIES=$(sed 's/\.so[^ ]*/\.so\*/g' <<< "$LIBRARIES")
echo "$NEW_ALL_LIBRARIES"
Output:
abc.so* def.so* hij.so*
Explanation:
LIBRARIES="...": When assigning a string to a variable, the variable is not prefixed with $
NEW_ALL_LIBRARIES=$(...): The $(...) notation is called command substitution; it runs the enclosed commands in a subshell and substitutes their standard output (here, saving it to NEW_ALL_LIBRARIES).
sed: invoke sed, the Stream EDitor tool
's/\.so[^ ]*/\.so\*/g': Use regular expressions (regex) to match patterns and substitute. Let's break this syntax down a bit further:
s/ "substitute"; For example: s/A/B/g replaces all occurrences of A with B
\.so[^ ]*/: Match any patterns that start with .so, and the [^ ]* part means "followed by zero or more non-space characters"
\.so\*/: Replace that with literally .so*
Some symbols such as [, ], . and * have special meaning in regex, so if you mean to use them literally, you just "escape" them by prefixing a \
g: Do so for all occurrences, not just the first.
<<< "$LIBRARIES": the <<< notation is called herestring: in this context it accomplishes the same thing as echo "$LIBRARIES" | sed ..., but it saves a subshell.

Bash - Changing configuration file with sed

I've been having some problems with a shell script that changes a configuration file named ".backup.conf".
The configuration file looks like this:
inputdirs=(/etc /etc/apm /usr/local)
outputdir="test_outputdir"
backupmethod="test_outputmethod"
loglocation="test_loglocation"
My script needs to change one of the configuration file variables, and I've had no trouble with the last 3 variables.
If I wanted to change the /etc entry of the variable "inputdirs" to /etc/perl, what expression should I use?
If I use echo with append, it only appends to the end of the file.
I've tried using sed in the following format:
sed -i 's/${inputdirs[$((izbor-1))]}/$novi/g' .backup.conf
where "izbor" is the position of the entry I want to change in inputdirs and "novi" is the new path (e.g. /etc/perl).
So, with the configuration file above and with izbor=1 and novi=/etc/perl, the first entry of inputdirs, /etc, should change to /etc/perl,
and the variable inputdirs should finally look like inputdirs=(/etc/perl /etc/apm /usr/local)
Thank you for your help!
You could try this:
enovi="$(printf '%s\n' "$novi" | sed -e 's/[\\&/]/\\&/g')"
izbor1="$(expr "$izbor" - 1)"
# Keep -i separate: GNU sed parses "-rie" as -r plus -i with backup suffix "e"
sed -r -i -e "s/([(]([^ ]* ){$izbor1})[^ )]*/\\1$enovi/" config.txt
A summary of the commands:
The first line generates a variable $enovi that holds the escaped contents of $novi. Basically, the following characters are escaped: &, \, and /. So /etc/perl becomes \/etc\/perl.
We create a new variable decrementing $izbor.
This is the actual substitute expression. I'll explain it in parts:
First we match the parenthesis character [(].
We now search for a sequence of non-spaces followed by a space ([^ ]* ).
This search (identified by grouping in the inner parenthesis) is repeated $izbor1 times ({$izbor1})
The previous expressions are grouped into an outer parenthesis group in order to be captured into an auxiliary variable \1.
We now match the word we want to replace: a sequence of characters that are neither spaces nor a closing parenthesis (excluding the parenthesis handles the case of the last word).
The replacement is formed by the captured value \1, followed by our new string.
Hope this helps =)
If you are trying to use $izbor as an index, it should probably be a numeric flag to s///. Assuming your input matches ^inputdirs=( (with no whitespace), you can probably get away with:
sed -i '/^inputdirs=(/{
s/(/( /; s/)/ )/; # Insert spaces inside parentheses
s# [^ ][^ ]*# '"$novi#$izbor"'; # Replace the $izbor-th space-prefixed word
s/( /(/; s/ )/)/; } # Remove inserted spaces
' .backup.conf
The first two expressions ensure that you have whitespace inside the parentheses, so they may not be necessary if your input already has whitespace there. It's a bit obfuscated above, but basically the replacement you are doing is something like:
s# [^ ][^ ]*# /etc/perl#2
where the 2 flag tells sed to replace only the second occurrence of the match. This is really fragile, since it requires no whitespace before inputdirs and whitespace inside the parens, and it does not handle tabs, but it should work for you. Also, some versions of sed allow [^ ][^ ]* to be written more simply as [^ ]+, but that is not universal.
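Because the file is a list of shell assignments, another option (not from either answer) is to let awk split the inputdirs line into words and rebuild it, so $novi never has to be escaped for a regex. This sketch assumes the line starts with inputdirs=( and ends with ), that 1 <= izbor <= the number of entries, and that novi contains no backslashes (awk -v interprets them); the temporary file stands in for .backup.conf.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for .backup.conf, removed when the script exits
conf=$(mktemp) || exit 1
trap 'rm -f "$conf" "$conf.tmp"' EXIT
cat > "$conf" <<'EOF'
inputdirs=(/etc /etc/apm /usr/local)
outputdir="test_outputdir"
EOF

izbor=1          # 1-based position of the entry to change
novi=/etc/perl   # new path

awk -v i="$izbor" -v new_dir="$novi" '
/^inputdirs=\(/ {
  inner = $0
  sub(/^inputdirs=\(/, "", inner)   # drop the leading "inputdirs=("
  sub(/\)$/, "", inner)             # drop the trailing ")"
  n = split(inner, dirs, " ")       # one element per whitespace-separated path
  dirs[i] = new_dir                 # replace the izbor-th path
  line = "inputdirs=(" dirs[1]
  for (k = 2; k <= n; k++) line = line " " dirs[k]
  print line ")"
  next
}
{ print }                           # every other line is copied unchanged
' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"

cat "$conf"
```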

Extract file basename without path and extension in bash [duplicate]

This question already has answers here:
Extract filename and extension in Bash
Given file names like these:
/the/path/foo.txt
bar.txt
I hope to get:
foo
bar
Why doesn't this work?
#!/bin/bash
fullfile=$1
fname=$(basename $fullfile)
fbname=${fname%.*}
echo $fbname
What's the right way to do it?
You don't have to call the external basename command. Instead, you could use the following commands:
$ s=/the/path/foo.txt
$ echo "${s##*/}"
foo.txt
$ s=${s##*/}
$ echo "${s%.txt}"
foo
$ echo "${s%.*}"
foo
Note that this solution should work in all recent (post-2004) POSIX-compliant shells (e.g. bash, dash, ksh).
Source: Shell Command Language 2.6.2 Parameter Expansion
More on bash String Manipulations: http://tldp.org/LDP/LG/issue18/bash.html
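The two expansions can be wrapped in a small function. This is a sketch for bash; stem is a hypothetical name, and the fallback for names like .bashrc (where ${name%.*} would return an empty string) is an extra choice, not part of the answer above.

```shell
#!/usr/bin/env bash
stem() {
  local name=${1##*/}   # strip everything up to the last "/"
  local base=${name%.*} # strip the shortest ".ext" suffix
  # A name that starts with its only period, like ".bashrc", would become empty
  [ -n "$base" ] || base=$name
  printf '%s\n' "$base"
}

stem /the/path/foo.txt
stem bar.txt
```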
The basename command has two different invocations; in one, you specify just the path, in which case it gives you the last component, while in the other you also give a suffix that it will remove. So, you can simplify your example code by using the second invocation of basename. Also, be careful to correctly quote things:
fbname=$(basename "$1" .txt)
echo "$fbname"
A combination of basename and cut works fine, even for a double extension like .tar.gz, although it also cuts at any other period in the name (my.file.txt becomes my):
fbname=$(basename "$fullfile" | cut -d. -f1)
It would be interesting to know whether this solution needs less processing power than Bash parameter expansion.
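For comparison, the cut -d. -f1 behaviour corresponds to the parameter expansion ${name%%.*} (remove the longest suffix starting with a period), while ${name%.*} removes only the last extension. A sketch with a made-up path:

```shell
#!/usr/bin/env bash
f=/the/path/archive.tar.gz
name=${f##*/}       # archive.tar.gz
first=${name%%.*}   # like cut -d. -f1: archive
last=${name%.*}     # strips only the final extension: archive.tar
printf '%s\n' "$first" "$last"
```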
Here are one-liners:
$(basename "${s%.*}")
$(basename "${s}" ".${s##*.}")
I needed this, the same as asked by bongbang and w4etwetewtwet.
Pure bash, no basename, no variable juggling. Set a string and echo:
p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"
Output:
foo
Note: the bash extglob option must be "on" (on Ubuntu it is usually already on in interactive shells, but not in scripts). If it's not, do:
shopt -s extglob
Walking through the ${p//+(*\/|.*)}:
${p -- start with $p.
// substitute every instance of the pattern that follows.
+( match one or more of the patterns in the parenthesized list (i.e. up to the closing ) below).
1st pattern: *\/ matches anything up to and including a literal "/" char.
pattern separator | which in this instance acts like a logical OR.
2nd pattern: .* matches a literal "." followed by anything; in a bash pattern the "." is just a period char, not a regex dot.
) end pattern list.
} end parameter expansion. With a string substitution, there's usually another / there, followed by a replacement string. But since there's no / there, the matched patterns are substituted with nothing; this deletes the matches.
Relevant man bash background:
pattern substitution:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
extended pattern matching:
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
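A quick way to try the pattern, as a bash sketch; strip_path_ext is a hypothetical helper name. Because .* can match from any period to the end of the string, a path that itself starts with a period, such as ./foo.txt, is deleted entirely.

```shell
#!/usr/bin/env bash
shopt -s extglob   # must be on before the function below is parsed

# Delete every "dir/" prefix and every ".suffix" from the argument
strip_path_ext() {
  printf '%s\n' "${1//+(*\/|.*)}"
}

strip_path_ext /the/path/foo.txt
strip_path_ext archive.tar.gz
strip_path_ext ./foo.txt
```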
Here is another (more complex) way of getting either the filename or the extension: first use the rev command to reverse the file path, cut at the first ., and then reverse the result again, like this:
filename=`rev <<< "$1" | cut -d"." -f2- | rev`
fileext=`rev <<< "$1" | cut -d"." -f1 | rev`
If you want to play nice with Windows file paths (under Cygwin) you can also try this:
fname=${fullfile##*[/\\]}
This will account for backslash separators when using Bash on Windows.
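A sketch to check a Windows-friendly pattern in bash; last_component is a hypothetical name. The bracket expression [/\\] matches either a slash or a backslash; a | inside the brackets would only add the pipe character as one more separator.

```shell
#!/usr/bin/env bash
last_component() {
  local fname
  fname=${1##*[/\\]}   # remove everything up to the last "/" or "\"
  printf '%s\n' "$fname"
}

last_component 'C:\Users\me\foo.txt'
last_component '/the/path/foo.txt'
last_component 'C:/mixed\style/foo.txt'
```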
Here is an alternative I came up with to extract an extension, combining the posts in this thread with the tools I'm more familiar with:
ext="$(rev <<< "$(cut -f "1" -d "." <<< "$(rev <<< "file.docx")")")"
Note: Please advise on my use of quotes; it worked for me but I might be missing something on their proper use (I probably use too many).
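For comparison, the same extension can be taken with a single parameter expansion and no extra processes; this sketch reuses the file.docx example (if the name has no period at all, the whole name is returned).

```shell
#!/usr/bin/env bash
f=file.docx
ext=${f##*.}   # remove the longest prefix ending in "."
printf '%s\n' "$ext"
```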
Use the basename command. Its manpage is here: http://unixhelp.ed.ac.uk/CGI/man-cgi?basename
