Remove substring matching pattern both in the beginning and the end of the variable - string

As the title says, I'm looking for a way to remove a defined pattern both at the beginning of a variable and at the end. I know I have to use # and % but I don't know the correct syntax.
In this case, I want to remove http:// at the beginning, and /score/ at the end of the variable $line which is read from file.txt.

Well, you can't nest ${var%}/${var#} operations, so you'll have to use a temporary variable.
Like here:
var="http://whatever/score/"
temp_var="${var#http://}"
echo "${temp_var%/score/}"
Alternatively, you can use regular expressions with (for example) sed:
some_variable="$( echo "$var" | sed -e 's#^http://##; s#/score/$##' )"
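For what it's worth, the two-step expansion above can be wrapped in a small function. This is only a sketch: strip_both is a made-up name, and the URL is the sample value from the answer. Quoting the prefix and suffix inside the expansion makes the shell treat them as literal text rather than as glob patterns.

```shell
#!/bin/sh
# strip_both is a hypothetical helper name, not a shell builtin.
# Usage: strip_both STRING PREFIX SUFFIX
strip_both() {
    tmp=${1#"$2"}        # drop PREFIX from the front, if present (tmp is global in plain sh)
    tmp=${tmp%"$3"}      # drop SUFFIX from the end, if present
    printf '%s\n' "$tmp"
}

result=$(strip_both 'http://whatever/score/' 'http://' '/score/')
printf '%s\n' "$result"   # prints: whatever
```

A string that lacks the prefix or the suffix passes through unchanged on that side, which matches how # and % behave when the pattern does not match.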

$ var='https://www.google.com/keep/score'
$ var=${var#*//} # removes everything up to and including the first //
$ var=${var%/*} # removes the last / and everything after it
$ echo $var
www.google.com/keep
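Since the question was about the syntax of # and %, here is a short sketch of how the single and doubled forms differ. The URL is the one from the answer above; the variable names are made up for illustration.

```shell
#!/bin/sh
url='https://www.google.com/keep/score'

short_prefix=${url#*/}    # shortest match of "*/" removed from the front
long_prefix=${url##*/}    # longest match of "*/" removed from the front
short_suffix=${url%/*}    # shortest match of "/*" removed from the end
long_suffix=${url%%/*}    # longest match of "/*" removed from the end

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
# prints:
#   /www.google.com/keep/score
#   score
#   https://www.google.com/keep
#   https:
```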

You have to do it in two steps:
$ string="fooSTUFFfoo"
$ string="${string%foo}"
$ string="${string#foo}"
$ echo "$string"
STUFF

There IS a way to do it one step using only built-in bash functionality (no running external programs such as sed) -- with BASH_REMATCH:
url=http://whatever/score/
re='https?://(.*)/score/'
[[ $url =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
This matches against the regular expression on the right-hand side of the =~ test, and puts the groups into the BASH_REMATCH array.
That said, it's more conventional to use two PE expressions and a temporary variable:
shopt -s extglob
url=http://whatever/score/
val=${url#http?(s)://}; val=${val%/score/}
printf '%s\n' "$val"
...in the above example, the extglob option is used to allow the shell to recognize "extglobs" -- bash's extensions to glob syntax (making glob-style patterns similar in power to regular expressions), among which ?(foo) means that foo is optional.
By the way, I'm using printf rather than echo in these examples because many of echo's behaviors are implementation-defined -- for instance, consider the case where the variable's contents are -e or -n.
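A tiny sketch of the printf point, assuming a value that happens to look like an echo option. With printf the value is only ever data for the %s placeholder, so it cannot be mistaken for a flag. The variable names are illustrative.

```shell
#!/bin/sh
value='-n'

# The format string is fixed; "$value" fills the %s slot and is never parsed as an option.
shown=$(printf '%s\n' "$value")
printf 'value is: %s\n' "$shown"   # prints: value is: -n
```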

how about
export x='https://www.google.com/keep/score';
var=$(perl -e 'if ( $ENV{x} =~ /(https:\/\/)(.+)(\/score)/ ) { print qq($2);}')

Related

awk regex compile failed

trying to do a regex replacement with a lookahead (thus awk and not sed) that removes all dots save the last one to preserve the extension eg: (my.big.file.avi > my-big-file.avi). here's my little bash script:
#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
newFile=$(printf $file | awk '{gsub(/\.(?=.*?\.)/"-");}1')
#ffmpeg -i "$newFile" -vcodec copy -acodec aac "${newFile%.*}_AAC.mp4"
printf "${file} ---> ${newFile}\n"
done
this gives me a regular expression compile failed (missing operand) error...
i can't see it. can someone point me to my mistake?
You don't need awk, or regular expressions, for any part of solving this problem; parameter expansion suffices.
#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
dirname=${file%/*} # we don't want to change the directory name
filename=${file##*/} # so split out just the filename
[[ $filename = *.*.* ]] || continue # no compound extension? do nothing
file_start=${filename%.*} # content up to last dot
file_ext=${filename##*.} # content after last dot
newFile=${dirname}/${file_start//./-}.${file_ext} # combine the two
# okay, got what we need, now we can work with it
#ffmpeg -i "$newFile" -vcodec copy -acodec aac "${newFile%.*}_AAC.mp4"
printf '%s ---> %s\n' "$file" "$newFile"
done
But if you want to use regular expressions:
#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
[[ $file =~ ^(.*)/([^/]+)[.]([^/.]+)$ ]] || continue
dirname=${BASH_REMATCH[1]}
file_start=${BASH_REMATCH[2]}
file_ext=${BASH_REMATCH[3]}
newFile=${dirname}/${file_start//./-}.${file_ext}
printf '%s ---> %s\n' "$file" "$newFile"
done
GNU AWK has limited support for lookaheads, namely $ for end of line and \> for end of word. Your task, namely
removes all dots save the last one to preserve the extension eg:
(my.big.file.avi > my-big-file.avi)
might be accomplished using GNU AWK's functions for working with strings, I would do it as follows, let file.txt content be
my.big.file.avi
i-do-not-need-change.mp3
name-without-dot
then
awk '{match($0,/[.][^.]*$/); print gensub(/[.]/,"-","g",substr($0,1,RSTART-1)) substr($0,RSTART)}' file.txt
output
my-big-file.avi
i-do-not-need-change.mp3
name-without-dot
Note: I added 2 test cases. Explanation: First I use match to look for a literal dot ([.]) followed by zero or more (*) non-dots ([^.]) and then the end of the line ($). This sets RSTART to the position of the last dot in the line. Then I use substr to get the part before the last dot and the part from the last dot onward. In the first part I replace all dots with -, the second part I leave unchanged, then I concatenate them and print. If you want to know more about the functions I used, read the String Functions docs.
(tested in GNU Awk 5.0.1)
Keep in mind that some files have 2 dots in their extension, for example file.tar.gz; my solution does not take that into account.
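If compound extensions matter, one possible way to handle them with plain parameter expansion is sketched below. dashify_name is a made-up helper, and the list of compound extensions (.tar.gz, .tar.bz2, .tar.xz) is just an assumption; extend it to whatever your files actually use. It works on a bare filename, not a full path.

```shell
#!/bin/sh
# dashify_name is a hypothetical helper: replace every dot except those in the extension.
# The set of compound extensions below is an assumption for illustration.
dashify_name() {
    name=$1
    case $name in
        *.tar.gz|*.tar.bz2|*.tar.xz) ext=.tar.${name##*.} ;;  # keep both parts
        *.*) ext=.${name##*.} ;;                              # ordinary extension
        *)   ext= ;;                                          # no dot at all
    esac
    stem=${name%"$ext"}                                       # name without the extension
    printf '%s%s\n' "$(printf '%s' "$stem" | tr '.' '-')" "$ext"
}

dashify_name 'my.big.file.avi'      # prints: my-big-file.avi
dashify_name 'backup.2020.tar.gz'   # prints: backup-2020.tar.gz
dashify_name 'name-without-dot'     # prints: name-without-dot
```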
(thus awk and not sed)
Scary warning: sed is Turing complete. Ramification: it can do anything any other Turing-complete language can accomplish. That said, the fact that it can does not mean you should use it.
2 verbose approaches in awk :
[m/g/n]awk 'BEGIN { OFS = "-"
FS = "[.]"
} ($NF="."$NF) \
&& \
sub(/\-\./,".")'
...versus...
[m/g/n]awk 'sub(/\.[^.]+$/,"\0&") + \
gsub(/\./, "-") + \
sub(/\0\-/, ".") + 1'
I chose \0 because the null byte isn't valid in files inside just about any file system, which makes it a safe choice for using as a temporary anchor (even better than awk SUBSEP, which isn't illegal in POSIX filesystems)
And an alternative nowhere near as elegant as Charles', but maybe also does the job ...
echo my.big.file.avi | sed -E 's/\./-/g;s/-([^-]+)$/.\1/'
my-big-file.avi

/bin/dash: Bad substitution

I need to do a string manipulation in a shell script (/bin/dash):
#!/bin/sh
PORT="-p7777"
echo $PORT
echo ${PORT/p/P}
the last echo fails with Bad substitution. When I change shell to bash, it works:
#!/bin/bash
PORT="-p7777"
echo $PORT
echo ${PORT/p/P}
How can I implement the string substitution in dash ?
The substitution you're using is not a basic POSIX feature (see here, in section 2.6.2 Parameter Expansion), and dash doesn't implement it.
But you can do it with any of a number of external helpers; here's an example using sed:
PORT="-p7777"
CAPITOLPORT=$(printf '%s\n' "$PORT" | sed 's/p/P/')
printf '%s\n' "$CAPITOLPORT"
BTW, note that I'm using printf '%s\n' instead of echo -- that's because some implementations of echo do unpredictable things when their first argument starts with "-". printf is a little more complicated to use (you need a format string, in this case %s\n) but much more reliable. I'm also double-quoting all variable references ("$PORT" instead of just $PORT), to prevent unexpected parsing.
I'd also recommend switching to lower- or mixed-case variables. There are a large number of all-caps variables that have special meanings, and if you accidentally use one of those it can cause problems.
Using parameter expansion:
$ cat foo.sh
#!/bin/sh
PORT="-p7777"
echo $PORT
echo ${PORT:+-P${PORT#-p}}
PORT=""
echo $PORT
echo ${PORT:+-P${PORT#-p}}
Run it:
$ /bin/sh foo.sh
-p7777
-P7777
Update:
$ man dash:
- -
${parameter#word} Remove Smallest Prefix Pattern.
$ echo ${PORT#-p}
7777
$ man dash
- -
${parameter:+word} Use Alternative Value.
$ echo ${PORT:+-P${PORT#-p}}
-P7777
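The same idea can be generalized a little. The sketch below emulates bash's ${var/search/replacement} (first occurrence only) using nothing but POSIX parameter expansion, so it should run under dash. replace_first is a made-up name, and the search text is treated literally, not as a pattern.

```shell
#!/bin/sh
# replace_first is a hypothetical helper, not part of dash.
# Usage: replace_first STRING SEARCH REPLACEMENT
replace_first() {
    case $1 in
        *"$2"*)
            before=${1%%"$2"*}   # everything before the first SEARCH
            after=${1#*"$2"}     # everything after the first SEARCH
            printf '%s\n' "$before$3$after"
            ;;
        *)
            printf '%s\n' "$1"   # SEARCH not found: leave the string alone
            ;;
    esac
}

port='-p7777'
replace_first "$port" p P   # prints: -P7777
```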

Extract property value in filename?

I have many file paths of the form:
dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
I am running a bash script to do some processing on these files, and I need to extract the value of p (in this case 1.2; in general it is a floating number) from each of these paths. Basically I am running a for loop over all the file paths, and for each path, I need to extract the value of p. How can I do this?
Parameter expansion is a useful tool for this kind of operation:
#!/bin/bash
# ^^^^ IMPORTANT: Not /bin/sh
f=dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
if [[ $f = *_p=* ]]; then # Check for substring in filename
val=${f##*_p=} # Trim everything before the last "_p="
val=${val%%_*} # Trim everything after first subsequent _
val=${val%.ext} # Trim extension, should it exist.
echo "Extracted $val from filename $f"
fi
Alternately, you could also use shell-native regex support:
#!/bin/bash
# ^^^^ again, NOT /bin/sh
f=dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
# assigning regex to a variable avoids surprising behavior with some older bash releases
p_re='_p=([[:digit:].]+)(_|[.]ext$)'
if [[ $f =~ $p_re ]]; then # evaluate regex
echo "Extracted ${BASH_REMATCH[1]}" # extract groups from BASH_REMATCH array
fi
For completeness, another approach is to use eval. There can be security dangers here; you have to make up your own mind whether they are justified.
I am using IFS for the split - not everyone's favourite, but it is another way to do it. The eval will execute each assignment as it finds it, in this case dynamically creating variables q, a, and p.
fname='dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext'
OldIFS="$IFS"
IFS='_'
for val in $fname
do
if [[ $val == *=* ]]
then
val=${val%.ext}
eval "$val"
fi
done
IFS="$OldIFS"
echo "$q"
echo "$a"
echo "$p"
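If you want any of the properties but would rather not use eval, a sketch along the following lines might do. get_prop is a made-up name, and it assumes the naming scheme from the question: properties are written as _key=value and the file name ends in an extension.

```shell
#!/bin/sh
# get_prop is a hypothetical helper. Usage: get_prop PATH KEY
# Assumes properties look like _KEY=VALUE and the file name ends in an extension.
get_prop() {
    case $1 in
        *"_$2="*) ;;            # key present, carry on
        *) return 1 ;;          # key missing: fail without output
    esac
    rest=${1##*"_$2="}          # text after the last "_KEY="
    case $rest in
        *_*) printf '%s\n' "${rest%%_*}" ;;   # another property follows
        *)   printf '%s\n' "${rest%.*}" ;;    # last property: drop the extension
    esac
}

f='dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext'
p_value=$(get_prop "$f" p)
printf 'p is %s\n' "$p_value"   # prints: p is 1.2
```

Nothing from the file name is ever executed here, which is the main difference from the eval version.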

Why am I getting command not found error on numeric comparison?

I am trying to parse each line of a file and look for a particular string. The script seems to be doing its intended job, however, in parallel it tries to execute the if command on line 6:
#!/bin/bash
for line in $(cat $1)
do
echo $line | grep -e "Oct/2015"
if($?==0); then
echo "current line is: $line"
fi
done
and I get the following (my script is readlines.sh)
./readlines.sh: line 6: 0==0: command not found
First: As Mr. Llama says, you need more spaces. Right now your script tries to look for a file named something like /usr/bin/0==0 to run. Instead:
[ "$?" -eq 0 ] # POSIX-compliant numeric comparison
[ "$?" = 0 ] # POSIX-compliant string comparison
(( $? == 0 )) # bash-extended numeric comparison
Second: Don't test $? at all in this case. In fact, you don't even have good cause to use grep; the following is both more efficient (because it uses only functionality built into bash and requires no invocation of external commands) and more readable:
if [[ $line = *"Oct/2015"* ]]; then
echo "Current line is: $line"
fi
If you really do need to use grep, write it like so:
if echo "$line" | grep -q "Oct/2015"; then
echo "Current line is: $line"
fi
That way if operates directly on the pipeline's exit status, rather than running a second command testing $? and operating on that command's exit status.
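To make the exit-status point concrete, here is a small sketch. has_date is a made-up helper and the two sample strings are invented; the function's exit status is simply that of its last command, grep -q, and if branches on it directly.

```shell
#!/bin/sh
# has_date is a hypothetical helper: succeeds (status 0) when $1 contains Oct/2015.
has_date() {
    printf '%s\n' "$1" | grep -q -F 'Oct/2015'
}

# No $? anywhere: if looks at the exit status of has_date itself.
if has_date '12/Oct/2015:10:00:00'; then first=match; else first=no-match; fi
if has_date '03/Nov/2015:11:00:00'; then second=match; else second=no-match; fi

printf '%s %s\n' "$first" "$second"   # prints: match no-match
```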
@Charles Duffy has a good answer which I have up-voted as correct (and it is), but here's a detailed, line by line breakdown of your script and the correct thing to do for each part of it.
for line in $(cat $1)
As I noted in my comment elsewhere this should be done as a while read construct instead of a for cat construct.
This construct will wordsplit each line making spaces in the file separate "lines" in the output.
All empty lines will be skipped.
In addition when you cat $1 the variable should be quoted. If it is not quoted spaces and other less-usual characters appearing in the file name will cause the cat to fail and the loop will not process the file.
The complete line would read:
while IFS= read -r line
An illustrative example of the tradeoffs can be found here. The linked test script follows. I tried to include an indication of why IFS= and -r are important.
#!/bin/bash
mkdir -p /tmp/testcase
pushd /tmp/testcase >/dev/null
printf '%s\n' '' two 'three three' '' ' five with leading spaces' 'c:\some\dos\path' '' > testfile
printf '\nwc -l testfile:\n'
wc -l testfile
printf '\n\nfor line in $(cat) ... \n\n'
let n=1
for line in $(cat testfile) ; do
echo line $n: "$line"
let n++
done
printf '\n\nfor line in "$(cat)" ... \n\n'
let n=1
for line in "$(cat testfile)" ; do
echo line $n: "$line"
let n++
done
let n=1
printf '\n\nwhile read ... \n\n'
while read line ; do
echo line $n: "$line"
let n++
done < testfile
printf '\n\nwhile IFS= read ... \n\n'
let n=1
while IFS= read line ; do
echo line $n: "$line"
let n++
done < testfile
printf '\n\nwhile IFS= read -r ... \n\n'
let n=1
while IFS= read -r line ; do
echo line $n: "$line"
let n++
done < testfile
rm -- testfile
popd >/dev/null
rmdir /tmp/testcase
Note that this is a bash-heavy example. Other shells do not tend to support -r for read, for example, nor is let portable. On to the next line of your script.
do
As a matter of style I prefer do on the same line as the for or while declaration, but there's no convention on this.
echo $line | grep -e "Oct/2015"
The variable $line should be quoted here. In general, meaning always unless you specifically know better, you should double-quote all expansion--and that means subshells as well as variables. This insulates you from most unexpected shell weirdness.
You declared your shell as bash, which means you have the "here string" operator <<< available to you. When available it can be used to avoid the pipe; each element of a pipeline executes in a subshell, which incurs extra overhead and can lead to unexpected behavior if you try to modify variables. This would be written as
grep -e "Oct/2015" <<<"$line"
Note that I have quoted the line expansion.
You have called grep with -e, which is not incorrect but is needless since your pattern does not begin with -. In addition you have full-quoted a string in shell but you don't attempt to expand a variable or use other shell interpolation inside of it. When you don't expect and don't want the contents of a quoted string to be treated as special by the shell you should single quote them. Furthermore, your use of grep is inefficient: because your pattern is a fixed string and not a regular expression you could have used fgrep or grep -F, which does string contains rather than regular expression matching (and is far faster because of this). So this could be
grep -F 'Oct/2015' <<<"$line"
Without altering the behavior.
if($?==0); then
This is the source of your original problem. In shell scripts, words are separated by whitespace, and parentheses start a subshell. When you write if($?==0), the $? expands, probably to 0, so bash parses this as if ( 0==0 ) and tries, inside the subshell, to execute a command called 0==0, which is a legal command name; that is the "0==0: command not found" in your error message. What you wanted to do was invoke the if command and give it a test to evaluate, which requires more whitespace and a proper test construct. I believe others have covered this sufficiently.
You should never need to test the value of $? in a shell script. The if command exists for branching behavior based on the return code of whatever command you pass to it, so you can inline your grep call and have if check its return code directly, thus:
if grep -F 'Oct/2015' <<<"$line" ; then
Note the generous whitespace around the ; delimiter. I do this because in shell whitespace is usually required and can only sometimes be omitted. Rather than try to remember when you can do which I recommend an extra one space padding around everything. It's never wrong and can make other mistakes easier to notice.
As others have noted, this grep will print matched lines to stdout, which is probably not something you want. If you are using GNU grep, which is standard on Linux, you will have the -q switch available to you. This will suppress the output from grep:
if grep -q -F 'Oct/2015' <<<"$line" ; then
If you are trying to be strictly standards compliant or are in any environment with a grep that doesn't know -q, the standard way to achieve this effect is to redirect stdout to /dev/null:
if printf '%s\n' "$line" | grep -F 'Oct/2015' >/dev/null ; then
In this example I also removed the here string bashism just to show a portable version of this line.
echo "current line is: $line"
There is nothing wrong with this line of your script, except that although echo is standard implementations vary to such an extent that it's not possible to absolutely rely on its behavior. You can use printf anywhere you would use echo and you can be fairly confident of what it will print. Even printf has some caveats: Some uncommon escape sequences are not evenly supported. See mascheck for details.
printf 'current line is: %s\n' "$line"
Note the explicit newline at the end; printf doesn't add one automatically.
fi
No comment on this line.
done
In the case where you did as I recommended and replaced the for line with a while read construct this line would change to:
done < "$1"
This directs the contents of the file in the $1 variable to the stdin of the while loop, which in turn passes the data to read.
In the interests of clarity I recommend copying the value from $1 into another variable first. That way when you read this line the purpose is more clear.
I hope no one takes great offense at the stylistic choices made above, which I have attempted to note; there are many ways to do this (but not a great many correct ways).
Be sure to always run interesting snippets through the excellent shellcheck and explain shell when you run into difficulties like this in the future.
And finally, here's everything put together:
#!/bin/bash
input_file="$1"
while IFS= read -r line ; do
if grep -q -F 'Oct/2015' <<<"$line" ; then
printf 'current line is %s\n' "$line"
fi
done < "$input_file"
If you like one-liners, you may use AND operator (&&), for example:
echo "$line" | grep -e "Oct/2015" && echo "current line is: $line"
or:
grep -qe "Oct/2015" <<<"$line" && echo "current line is: $line"
Spacing is important in shell scripting.
Also, double-parens is for numerical comparison, not single-parens.
if (( $? == 0 )); then

Using a variable to replace lines in a file with backslashes

I want to add the string %%% to the beginning of some specific lines in a text file.
This is my script:
#!/bin/bash
a="c:\Temp"
sed "s/$a/%%%$a/g" <File.txt
And this is my File.txt content:
d:\Temp
c:\Temp
e:\Temp
But nothing changes when I execute it.
I think the 'sed' command is not finding the pattern, possibly due to the \ backslashes in the variable a.
I can find the c:\Temp line if I use grep with -F option (to not interpret strings):
cat File.txt | grep -F "$a"
But sed does not seem to implement such a -F option.
These do not work either:
sed 's/$a/%%%$a/g' <File.txt
sed 's/"$a"/%%%"$a"/g' <File.txt
I have found similar threads about replacing with sed, but they don't refer to variables.
How can I use a variable to prepend the %%% string to the matching lines?
EDIT: It would be fine that the $a variable could be entered via parameter when calling the script, so it will be assigned like:
a=$1
Try it like this:
#!/bin/sh
a='c:\\Temp' # single quotes
sed "s/$a/%%%$a/g" <File.txt # double quotes
Output:
Johns-MacBook-Pro:sed jcreasey$ sh x.sh
d:\Temp
e:\Temp
%%%c:\Temp
You need the double backslash '\\' to escape the '\'.
The single quotes won't expand the variables.
So you escape the slash in single quotes and pass it into the double quotes.
Of course you could also just do this:
#!/bin/sh
sed 's/\(.*Temp\)/%%%&/' <File.txt
If you want to get input from the command line you have to allow for the fact that \ is an escape character there too. So the user needs to type 'c:\\' or the interpreter will just wait for another character. Then once you get it, you will need to escape it again. (printf %q).
#!/bin/sh
b=`printf "%q" $1`
sed "s/\($b\)/%%% &/" < File.txt
The issue you are having has to do with substitution of your variable providing a regular expression looking for a literal c:Temp with the \ interpreted as an escape by the shell. There are a number of workarounds. Seeing the comments and having worked through the possibilities, the following will allow the unquoted entry of the search term:
#!/bin/bash
## validate that needed input is given on the command line
[ -n "$1" -a "$2" ] || {
printf "Error: insufficient input. Usage: %s <term> <file>\n" "${0//*\//}" >&2
exit 1
}
## validate that the filename given is readable
[ -r "$2" ] || {
printf "Error: file not readable '%s'\n" "$2" >&2
exit 1
}
a="$1" # assign a
filenm="$2" # assign filename
## test and fix the search term entered
[[ "$a" =~ '/' ]] || a="${a/:/:\\}" # test if \ removed by shell, if so replace
a="${a/\\/\\\\}" # add second \
sed -e "s/$a/%%%$a/g" "$filenm" # call sed with output to stdout
Usage:
$ bash sedwinpath.sh c:\Temp dat/winpath.txt
d:\Temp
%%%c:\Temp
e:\Temp
Note: This allows both single-quoted or unquoted entry of the dos path search term. To edit in place use sed -i. Additionally, the [[ operator and =~ operator are limited to bash.
I could have sworn the original question said replace, but to append, just as you suggest in the comments. I have updated the code with:
sed -e "s/$a/%%%$a/g" "$filenm"
Which provides the new output:
$ bash sedwinpath.sh c:\Temp dat/winpath.txt
d:\Temp
%%%c:\Temp
e:\Temp
Remember: If you want to edit the file in place use sed -i or sed -i.bak which will edit the actual file (and if -i.bak is given create a backup of the original in originalname.bak). Let me know if that is not what you intended and I'm happy to edit again.
Creating your script with a positional parameter of $1
#!/bin/bash
a="$1"
cat <file path>|sed "s/"$1"/%%%"$1"/g" > "temporary file"
Now whenever you want sed to find "c:\Temp" you need to use your script command line as follows
bash <my executing script> c:\\\\Temp
Bash treats each backslash as an escape for the backslash that follows it, so what will be saved in variable "a" in your executing script is "c:\\Temp". Substituting this variable into sed then makes sed see one literal backslash, since the first backslash in the variable escapes the second.
When you open your temporary file you will see:
d:\Temp
%%%c:\Temp
e:\Temp
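Another option, sketched here under the assumption that the search term should always be matched literally, is to let the script do the escaping so the caller can pass a plain single-quoted c:\Temp. escape_pattern is a made-up helper; it backslash-escapes the characters that are special in a basic regular expression, plus the / delimiter used by the final sed command. The inner sed uses , as its own delimiter so that / can sit inside the bracket expression without ambiguity.

```shell
#!/bin/sh
# escape_pattern is a hypothetical helper: make a literal string safe to use as
# the pattern in sed 's/PATTERN/.../' (basic regex, / as delimiter).
escape_pattern() {
    printf '%s\n' "$1" | sed 's,[][\.*^$/],\\&,g'
}

a='c:\Temp'
pattern=$(escape_pattern "$a")     # c:\\Temp

# & in the replacement stands for the matched text, so the replacement needs no escaping.
result=$(printf '%s\n' 'd:\Temp' 'c:\Temp' 'e:\Temp' | sed "s/$pattern/%%%&/")
printf '%s\n' "$result"
# prints:
#   d:\Temp
#   %%%c:\Temp
#   e:\Temp
```

In a real script, a="$1" and the file name would replace the hard-coded sample lines; invoked as script 'c:\Temp' File.txt, the single quotes keep the shell from eating the backslash.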

Resources