I'm trying to clean up filenames by removing special characters and whitespace.
For some reason my sed regex doesn't work when I declare that dashes and slashes should not be replaced.
Example:
echo "/path/to/file 20-456 (1).jpg" | sed -e 's/ /_/g' -e 's/[^0-9a-zA-Z\.\_\-\/]//g'
Output:
/path/to/file_20456_1.jpg
So the dash isn't in.
When I try this command:
echo "/path/to/file 20-456 (1).jpg" | sed -e 's/ /_/g' -e 's/[^0-9a-zA-Z\.\_\-]//g'
Output:
pathtofile_20-456_1.jpg
The dash is there, but without the directory slashes I can't move the files.
I wonder why the dash is no longer preserved once I add \/ to the regex pattern.
Any suggestions?
With your shown samples and attempts, please try following awk code.
echo "/path/to/file 20-456 (1).jpg" |
awk 'BEGIN{FS=OFS="/"} {gsub(/ /,"_",$NF);gsub(/-|\(|\)/,"",$NF)} 1'
Explanation: echo sends /path/to/file 20-456 (1).jpg to awk on standard input. The BEGIN block sets both FS and OFS to /, so the filename is the last field ($NF). The first gsub replaces every space in $NF with _, the second gsub deletes every -, ( and ) from $NF, and the trailing 1 prints the resulting line.
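Note that the command above also deletes the dash, which the question wanted to keep. A possible variant, offered only as a sketch, deletes everything in the last field except letters, digits, dot, underscore and dash. The function name clean_name is made up for illustration and is not part of the original answer.

```shell
# clean_name is a hypothetical helper name, not part of the original answer.
clean_name() {
  printf '%s\n' "$1" |
    awk 'BEGIN { FS = OFS = "/" }
         {
           gsub(/ /, "_", $NF)               # spaces -> underscores
           gsub(/[^0-9A-Za-z._-]/, "", $NF)  # delete every other unwanted character
         }
         1'
}

clean_name "/path/to/file 20-456 (1).jpg"
```

For the sample path this should print /path/to/file_20-456_1.jpg. The dash sits last in the bracket expression so that it is taken literally.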
You may get the result using string manipulation in Bash:
#!/bin/bash
path="/path/to/file 20-456 (1).jpg"
fldr="${path%/*}" # Get the folder
file="${path##*/}" # Get the file name
file="${file// /_}" # Replace spaces with underscores in filename
echo "$fldr/${file//[^[:alnum:]._-]/}" # Get the result
See the online demo yielding /path/to/file_20-456_1.jpg.
Quick notes:
${path%/*} - Removes the smallest chunk up to / from the end of the path
${path##*/} - Removes the largest text chunk from start of path to last / (including it)
${file// /_} replaces all spaces with _ in file
${file//[^[:alnum:]._-]/} removes all chars that are not alphanumeric, ., _ and - from file.
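As for why the original sed attempt loses the dash: the likely cause is that inside a bracket expression a backslash is an ordinary character rather than an escape, so \- followed by another character is read as a range starting at the backslash instead of a literal dash. A minimal sketch of a fix, assuming the same allowed characters as in the question, puts the dash last in the bracket expression and uses | as the delimiter so the slash needs no escaping. The function name sanitize_path is hypothetical.

```shell
# sanitize_path is a hypothetical name used only for this sketch.
sanitize_path() {
  printf '%s\n' "$1" |
    sed -e 's/ /_/g' -e 's|[^0-9a-zA-Z._/-]||g'   # "-" last, "|" as delimiter
}

sanitize_path "/path/to/file 20-456 (1).jpg"
```

For the sample path this should print /path/to/file_20-456_1.jpg, keeping both the dash and the directory slashes.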
I want to trim a string from one character, the last /, to either : or #, whichever appears first. An example would be:
https://www.example.com/?client=safari/this-text:not-this:or_this
would be trimmed to:
this-text
and
https://www.example.com/?client=safari/this-text#not-this:or_this
would be trimmed to:
this-text
I know I can trim text in bash from a specific character to another character, but is there a way to trim from one character to either of 2 characters?
Use grep like so: grep -Po '^.*/\K[^:#]*'
Examples:
echo 'https://www.example.com/?client=safari/this-text:not-this:or_this' | grep -Po '^.*/\K[^:#]*'
or:
echo 'https://www.example.com/?client=safari/this-text#not-this:or_this' | grep -Po '^.*/\K[^:#]*'
Output:
this-text
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only, 1 match/line, not the entire lines.
The regex ^.*/\K[^:#]* does the following:
^.*/ : Match from the beginning of the string (^) all the way up to the last slash ('/').
\K : Pretend that the match started at this position.
[^:#]* : zero or more occurrences (greedy) of any characters except : or #. This matches either until the end of the line, or until the next : or #, whichever comes first.
SEE ALSO:
grep manual
NOTE:
This works with GNU grep, which may need to be installed, depending on your system. For example, to install GNU grep on macOS, see this answer: https://apple.stackexchange.com/a/357426/329079
With a little Bash function:
trim() {
local str=${1##*/}
printf '%s\n' "${str%%[:#]*}"
}
This first trims everything up to and including the last /, then everything starting from the first occurrence of : or #.
In use:
$ trim 'https://www.example.com/?client=safari/this-text:not-this:or_this'
this-text
$ trim 'https://www.example.com/?client=safari/this-text#not-this:or_this'
this-text
Another way is to use sed: sed -e 's,^.*/,,' -e 's,[:#].*$,,'.
The first -e command (s,regex,replacement, with a comma as the delimiter) removes the text from the start through the last /; the second -e removes everything from the first : or # to the end of the text.
echo 'https://www.example.com/?client=safari/this-text:not-this:or_this' | sed -e 's,^.*/,,' -e 's,[:#].*$,,'
this-text
I want to split this line
/home/edwprod/abortive_visit/bin/abortive_proc_call.ksh
to
/edwprod/abortive_visit/bin/abortive_proc_call.ksh
Can I use sed or awk command for this?
You don't need awk or sed; just try this:
echo -n "/"; echo "/home/edwprod/abortive_visit/bin/abortive_proc_call.ksh" |cut -f3-6 -d/
echo '/home/edwprod/abortive_visit/bin/abortive_proc_call.ksh' | sed 's#^/[^/]\+##'
Explanation: sed's s command lets us pick a delimiter other than the usual /, here #, which saves escaping the slashes in the pattern. The regex is anchored at the beginning of the line with ^ and matches the first slash followed by one or more non-slash characters; replacing that with nothing removes the first element of the path (not the root, btw). Note that \+ is a GNU sed extension.
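The same result can probably be had without any external command by using shell parameter expansion. The following is a sketch that assumes an absolute path with at least two components; strip_first_dir is a hypothetical helper name.

```shell
# strip_first_dir is a hypothetical helper name.
strip_first_dir() {
  local path=$1
  # ${path#/*/} removes the shortest leading match of "/*/", here "/home/";
  # the leading slash is then put back by the format string.
  printf '/%s\n' "${path#/*/}"
}

strip_first_dir '/home/edwprod/abortive_visit/bin/abortive_proc_call.ksh'
```

For the sample path this should print /edwprod/abortive_visit/bin/abortive_proc_call.ksh.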
I need to capitalize a text file, but I ran into problems when trying to add a space after every punctuation mark with sed. For instance: "Hello,World" -> "Hello, World"
I tried the following:
#!/bin/bash
if [ $# != 1 ]; then
echo "No parameter"
exit
fi
cp $1 $1.bak
ARCH1=/tmp/`basename $1`.$$
sed 's/[A-Z]*/\L&/g' $1 > $ARCH1
sed -i 's/^./\u&/' $ARCH1
sed 's/ */\ /g' $ARCH1 #Here I replace >= 2 spaces for 1
sed 's/, */, /g' $ARCH1
#These 2 lines don't work well
sed 's/. */. /g' $ARCH1
sed 's/; */; /g' $ARCH1
mv $ARCH1 $1
The script doesn't crash, but the output is not the one that I expect.
I believe the reason your script doesn't work is that you forgot to pass -i to sed in several calls, and also that you don't escape . in the regex, so that . matches any character.
I also believe that a simpler way to do what you're trying to do is
sed -i.bak 's/[A-Z]*/\L&/g; s/\([.,;]\) */\1 /g' "$1"
-i.bak edits the file in-place and creates a backup with the .bak extension, and the script is simply
s/[A-Z]*/\L&/g # lower-case everything (I got that from your code)
s/\([.,;]\) */\1 /g # put exactly one space after each period, comma or semicolon
Here
[.,;] is a character set matching period, comma or semicolon,
\(stuff\) captures stuff in a group for later use, and
\1 is a back reference referring to the first such capture.
Note that this is a very simple approach. If your text, for example, contains ellipses (...), it'll waltz right over that and make ... into . . ., and similar caveats apply for ?! and such.
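One way to soften the ellipsis caveat, sketched here under the assumption that a run of periods, commas or semicolons should be treated as a single unit, is to match one or more punctuation characters at once. The function name space_after_punct is hypothetical.

```shell
# space_after_punct is a hypothetical helper name.
space_after_punct() {
  # \{1,\} makes a run such as "..." count as one unit, so the space is
  # added once after the whole run rather than after every dot.
  printf '%s\n' "$1" | sed 's/\([.,;]\{1,\}\) */\1 /g'
}

space_after_punct 'Hello,World...bye;now'
```

This should print "Hello, World... bye; now". Punctuation at the very end of a line still gets a trailing space, and ?! and similar marks remain unhandled.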
Using GNU sed:
$ echo "foo;BAR,BaZ.qux" | sed -r 's/[[:punct:]]+/& /g; s/[[:alnum:]]+/\L\u&/g'
Foo; Bar, Baz. Qux
\L lower cases the whole word, then \u upper cases the first character.
See your regex(7) man page for regular expression documentation.
I have a list of file names in a directory (/path/to/local). I would like to remove a certain number of characters from all of those filenames.
Example filenames:
iso1111_plane001_00321.moc1
iso1111_plane002_00321.moc1
iso2222_plane001_00123.moc1
In every filename I wish to remove the last 5 characters before the file extension.
For example:
iso1111_plane001_.moc1
iso1111_plane002_.moc1
iso2222_plane001_.moc1
I believe this can be done using sed, but I cannot determine the exact coding. Something like...
for filename in /path/to/local/*.moc1; do
mv $filname $(echo $filename | sed -e 's/.....^//');
done
...but that does not work. Sorry if I butchered the sed options, I do not have much experience with it.
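mv "$filename" "$(echo "$filename" | sed -e 's/.....\.moc1$/.moc1/')"
or
echo "${filename%%?????.moc1}.moc1"
%% is a shell parameter-expansion operator: it removes the longest suffix matching the pattern, here any five characters followed by .moc1, and the extension is then appended again.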
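Putting the parameter-expansion approach into a loop might look like the sketch below. It works on the base name only, so that ? cannot match a directory slash, and it skips names that have fewer than five characters before the extension. The function name strip_five is hypothetical, and an existing file with the target name would be overwritten.

```shell
# strip_five is a hypothetical helper: it removes the five characters that
# precede ".moc1" from every matching file in the given directory.
strip_five() {
  local dir=$1 f base stem
  for f in "$dir"/*.moc1; do
    [ -e "$f" ] || continue              # the glob matched nothing
    base=${f##*/}                        # file name without the directory
    stem=${base%%?????.moc1}             # drop 5 characters plus the extension
    [ "$stem" != "$base" ] || continue   # name too short, leave it alone
    mv -- "$f" "$dir/$stem.moc1"
  done
}
```

Calling strip_five /path/to/local would then rename iso1111_plane001_00321.moc1 to iso1111_plane001_.moc1.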
This sed command will work for all the examples you gave.
sed -e 's/\(.*\)_.*\.moc1/\1_.moc1/'
However, if you just want to specifically "remove 5 characters before the last extension in a filename" this command is what you want:
sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/'
You can implement this in your script like so:
for filename in /path/to/local/*.moc1; do
mv "$filename" "$(echo "$filename" | sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/')";
done
First Command Explanation
The first sed command works by grabbing all characters up to the last underscore (.* is greedy): \(.*\)_
Then it discards all characters until it finds .moc1: .*\.moc1
Then it replaces the text that it found with everything it grabbed at first inside the parentheses: /\1
And finally it adds the underscore and the .moc1 extension back on the end and ends the regex: _.moc1/
Second Command Explanation
The second sed command works by grabbing all characters at first: \(.*\)
And then it is forced to stop grabbing characters so it can discard five characters, or more specifically, five characters that lie in the ranges 0-9, a-z, and A-Z: [0-9a-zA-Z]\{5\}
Then comes the dot '.' character to mark the last extension : \.
And then it looks for all non-dot characters. This ensures that we are grabbing the last extension: \([^.]*\)
Finally, it replaces all that text with the first and second capture groups, separated by the . character, and ends the regex: /\1.\2/
This might work for you (GNU sed):
sed -r 's/(.*).{5}\./\1./' file
If I run these commands from a script:
#my.sh
PWD=bla
sed 's/xxx/'$PWD'/'
...
$ ./my.sh
xxx
bla
it is fine.
But, if I run:
#my.sh
sed 's/xxx/'$PWD'/'
...
$ ./my.sh
$ sed: -e expression #1, char 8: Unknown option to `s'
I read in tutorials that to substitute environment variables from shell you need to stop, and 'out quote' the $varname part so that it is not substituted directly, which is what I did, and which works only if the variable is defined immediately before.
How can I get sed to recognize a $var as an environment variable as it is defined in the shell?
Your two examples look identical, which makes problems hard to diagnose. Potential problems:
You may need double quotes, as in sed 's/xxx/'"$PWD"'/'
$PWD may contain a slash, in which case you need to find a character not contained in $PWD to use as a delimiter.
To nail both issues at once, perhaps
sed 's#xxx#'"$PWD"'#'
In addition to Norman Ramsey's answer, I'd like to add that you can double-quote the entire string (which may make the statement more readable and less error prone).
So if you want to search for 'foo' and replace it with the content of $BAR, you can enclose the sed command in double-quotes.
sed 's/foo/$BAR/g'
sed "s/foo/$BAR/g"
In the first, the single quotes keep the shell from expanding $BAR, so sed inserts the literal text $BAR; in the second, the shell expands $BAR before sed runs.
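A quick way to see the difference, as a minimal sketch with an assumed value of baz for BAR:

```shell
BAR=baz   # assumed sample value

single=$(echo foo | sed 's/foo/$BAR/g')   # single quotes: sed receives the text $BAR
double=$(echo foo | sed "s/foo/$BAR/g")   # double quotes: the shell substitutes baz first

printf '%s\n' "$single" "$double"
```

The first line printed should be the literal $BAR and the second baz.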
Another easy alternative:
Since $PWD will usually contain a slash /, use | instead of / for the sed statement:
sed -e "s|xxx|$PWD|"
You can use other characters besides "/" in substitution:
sed "s#$1#$2#g" -i FILE
1. Bad way: change the delimiter
sed 's/xxx/'"$PWD"'/'
sed 's:xxx:'"$PWD"':'
sed 's#xxx#'"$PWD"'#'
These may not be the final answer:
you cannot know which characters will occur in $PWD: /, : or #.
If the delimiter character occurs in $PWD, it breaks the expression.
The better way is to escape the special character in $PWD.
2. Good way: escape the delimiter
For example:
replace URL with $url (which contains : and /)
x.com:80/aa/bb/aa.js
in the string $tmp
URL
A. use / as delimiter
escape / as \/ in var (before use in sed expression)
## step 1: try escape
echo ${url//\//\\/}
x.com:80\/aa\/bb\/aa.js #escape fine
echo ${url//\//\/}
x.com:80/aa/bb/aa.js #escape failed
echo "${url//\//\/}"
x.com:80\/aa\/bb\/aa.js #escape fine, notice `"`
## step 2: do sed
echo $tmp | sed "s/URL/${url//\//\\/}/"
x.com:80/aa/bb/aa.js
echo $tmp | sed "s/URL/${url//\//\/}/"
x.com:80/aa/bb/aa.js
OR
B. use : as delimiter (more readable than /)
escape : as \: in var (before use in sed expression)
## step 1: try escape
echo ${url//:/\:}
x.com:80/aa/bb/aa.js #escape failed
echo "${url//:/\:}"
x.com\:80/aa/bb/aa.js #escape fine, notice `"`
## step 2: do sed
echo $tmp | sed "s:URL:${url//:/\:}:g"
x.com:80/aa/bb/aa.js
With your question edit, I see your problem. Let's say the current directory is /home/yourname ... in this case, your command below:
sed 's/xxx/'$PWD'/'
will be expanded to
sed 's/xxx//home/yourname/'
which is not valid. You need to put a \ character in front of each / in your $PWD if you want to do this.
Actually, the simplest thing (in GNU sed, at least) is to use a different separator for the sed substitution (s) command. So, instead of s/pattern/'$mypath'/ being expanded to s/pattern//my/path/, which will of course confuse the s command, use s!pattern!'$mypath'!, which will be expanded to s!pattern!/my/path!. I’ve used the bang (!) character (or use anything you like) which avoids the usual, but-by-no-means-your-only-choice forward slash as the separator.
Dealing with VARIABLES within sed
[root#gislab00207 ldom]# echo domainname: None > /tmp/1.txt
[root#gislab00207 ldom]# cat /tmp/1.txt
domainname: None
[root#gislab00207 ldom]# echo ${DOMAIN_NAME}
dcsw-79-98vm.us.oracle.com
[root#gislab00207 ldom]# cat /tmp/1.txt | sed -e 's/domainname: None/domainname: ${DOMAIN_NAME}/g'
--- Below is the result -- very funny.
domainname: ${DOMAIN_NAME}
--- You need to close the single quotes around your variable so that the shell expands it, like this ...
[root#gislab00207 ldom]# cat /tmp/1.txt | sed -e 's/domainname: None/domainname: '${DOMAIN_NAME}'/g'
--- The right result is below
domainname: dcsw-79-98vm.us.oracle.com
VAR=8675309
echo "abcde:jhdfj$jhbsfiy/.hghi$jh:12345:dgve::" |\
sed 's/:[0-9]*:/:'$VAR':/1'
where VAR contains what you want to replace the field with
I had a similar problem: I had a list and had to build a SQL script based on a template (that contained #INPUT# as the element to replace):
for i in LIST
do
awk "sub(/\#INPUT\#/,\"${i}\");" template.sql >> output
done
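A possible variation, sketched here rather than taken from the answer above, passes the value with awk's -v option instead of splicing it into the program text. Unlike the loop above, it prints every line of the template, not only the lines containing #INPUT#. The function name render_template and its arguments are hypothetical; note that awk still interprets backslash escapes in -v values and treats & in the replacement as the matched text.

```shell
# render_template is a hypothetical helper: the first argument is the template
# file, the remaining arguments are the values to substitute for #INPUT#.
render_template() {
  local template=$1 value
  shift
  for value in "$@"; do
    # -v hands the value to awk as a variable; the trailing 1 prints each line
    awk -v val="$value" '{ gsub(/#INPUT#/, val) } 1' "$template"
  done
}
```

It could then be used as render_template template.sql a b c >> output.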
If your replacement string may contain other sed control characters, then a two-step substitution (first escaping the replacement string) may be what you want:
PWD='/a\1&b$_' # these are problematic for sed
PWD_ESC=$(printf '%s\n' "$PWD" | sed -e 's/[\/&]/\\&/g')
echo 'xxx' | sed "s/xxx/$PWD_ESC/" # now this works as expected
For me, replacing some text in a file with the value of an environment variable using sed works only with quotes, as follows:
sed -i 's/original_value/'"$MY_ENVIRONMENT_VARIABLE"'/g' myfile.txt
BUT when the value of MY_ENVIRONMENT_VARIABLE contains a URL (e.g. https://andreas.gr), the above does not work.
In that case, use a different delimiter:
sed -i "s|original_value|$MY_ENVIRONMENT_VARIABLE|g" myfile.txt