How to get only some symbols in a bash script? [duplicate] - linux

Given a string file path such as /foo/fizzbuzz.bar, how would I use bash to extract just the fizzbuzz portion of said string?

Here's how to do it with the # and % operators in Bash.
$ x="/foo/fizzbuzz.bar"
$ y=${x%.bar}
$ echo ${y##*/}
fizzbuzz
${x%.bar} could also be ${x%.*} to remove everything after the last dot, or ${x%%.*} to remove everything after the first dot.
Example:
$ x="/foo/fizzbuzz.bar.quux"
$ y=${x%.*}
$ echo $y
/foo/fizzbuzz.bar
$ y=${x%%.*}
$ echo $y
/foo/fizzbuzz
Documentation can be found in the Bash manual; look for ${parameter%word} and ${parameter%%word} under "remove matching suffix pattern" in the Shell Parameter Expansion section.

Look at the basename command:
NAME="$(basename /foo/fizzbuzz.bar .bar)"
The second argument instructs it to remove the suffix .bar, which results in NAME=fizzbuzz.

Pure bash, done in two separate operations:
Remove the path from a path-string:
path=/foo/bar/bim/baz/file.gif
file=${path##*/}
#$file is now 'file.gif'
Remove the extension from a path-string:
base=${file%.*}
#${base} is now 'file'.
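If you want both steps in one place, here is a minimal sketch (the function name is just for illustration):
# strip the directory part, then the extension
strip_path_and_ext() {
  local f=${1##*/}              # remove everything up to the last '/'
  printf '%s\n' "${f%.*}"       # remove everything from the last '.' on
}
strip_path_and_ext /foo/bar/bim/baz/file.gif   # prints: file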

Using basename, I did the following to achieve this:
for file in *; do
ext=".${file##*.}"
fname=$(basename "$file" "$ext")
# Do things with $fname
done;
This requires no a priori knowledge of the file extension and works even when the filename contains dots in front of its extension; it does require the basename program, but that is part of GNU coreutils, so it should ship with any distro.

The basename and dirname commands are what you're after:
mystring=/foo/fizzbuzz.bar
echo basename: $(basename "${mystring}")
echo basename + remove .bar: $(basename "${mystring}" .bar)
echo dirname: $(dirname "${mystring}")
Has output:
basename: fizzbuzz.bar
basename + remove .bar: fizzbuzz
dirname: /foo

Pure bash way:
~$ x="/foo/bar/fizzbuzz.bar.quux.zoom";
~$ y=${x/\/*\//};
~$ echo ${y/.*/};
fizzbuzz
This functionality is explained in man bash under "Parameter Expansion". Non-bash ways abound: awk, perl, sed and so on.
EDIT: Works with dots in file suffixes and doesn't need to know the suffix (extension), but doesn’t work with dots in the name itself.
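For example, the caveat in action: a dot inside the name itself trips the second substitution, which cuts at the first dot.
~$ x="/foo/bar/my.file.bar"
~$ y=${x/\/*\//}
~$ echo ${y/.*/}
my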

Using basename assumes that you know what the file extension is, doesn't it?
And I believe that the various regular expression suggestions don't cope with a filename containing more than one "."
The following seems to cope with double dots. Oh, and filenames that contain a "/" themselves (just for kicks)
To paraphrase Pascal, "Sorry this script is so long. I didn't have time to make it shorter"
#!/usr/bin/perl
$fullname = $ARGV[0];
($path,$name) = $fullname =~ /^(.*[^\\]\/)*(.*)$/;
($basename,$extension) = $name =~ /^(.*)(\.[^.]*)$/;
print $basename . "\n";

In addition to the POSIX conformant syntax used in this answer,
basename string [suffix]
as in
basename /foo/fizzbuzz.bar .bar
GNU basename supports another syntax:
basename -s .bar /foo/fizzbuzz.bar
with the same result. The difference and advantage is that -s implies -a, which supports multiple arguments:
$ basename -s .bar /foo/fizzbuzz.bar /baz/foobar.bar
fizzbuzz
foobar
This can even be made filename-safe by separating the output with NUL bytes using the -z option, for example for these files containing blanks, newlines and glob characters (quoted by ls):
$ ls has*
'has'$'\n''newline.bar' 'has space.bar' 'has*.bar'
Reading into an array:
$ readarray -d $'\0' arr < <(basename -zs .bar has*)
$ declare -p arr
declare -a arr=([0]=$'has\nnewline' [1]="has space" [2]="has*")
readarray -d requires Bash 4.4 or newer. For older versions, we have to loop:
while IFS= read -r -d '' fname; do arr+=("$fname"); done < <(basename -zs .bar has*)

perl -pe 's/\..*$//;s{^.*/}{}'

If you can't use basename as suggested in other posts, you can always use sed. Here is an (ugly) example. It isn't the greatest, but it works by extracting the wanted string and replacing the input with the wanted string.
echo '/foo/fizzbuzz.bar' | sed 's|.*\/\([^\.]*\)\(\..*\)$|\1|g'
Which will get you the output
fizzbuzz

Beware of the suggested perl solution: it removes anything after the first dot.
$ echo some.file.with.dots | perl -pe 's/\..*$//;s{^.*/}{}'
some
If you want to do it with perl, this works:
$ echo some.file.with.dots | perl -pe 's/(.*)\..*$/$1/;s{^.*/}{}'
some.file.with
But if you are using Bash, the solutions with y=${x%.*} (or basename "$x" .ext if you know the extension) are much simpler.

basename does that: it removes the path. It will also remove the suffix if one is given and it matches the file's suffix, but you need to know the suffix to pass to the command. Otherwise you can use mv and figure out what the new name should be some other way.
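For example, with the path from the question:
$ basename /foo/fizzbuzz.bar
fizzbuzz.bar
$ basename /foo/fizzbuzz.bar .bar
fizzbuzz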

Combining the top-rated answer with the second-top-rated answer to get the filename without the full path:
$ x="/foo/fizzbuzz.bar.quux"
$ y=$(basename "${x%%.*}")
$ echo $y
fizzbuzz

If you want to keep just the filename with the extension and strip the file path:
$ x="myfile/hello/foo/fizzbuzz.bar"
$ echo ${x##*/}
fizzbuzz.bar
The explanation is in the Bash manual; see ${parameter##word}.

You can use
mv *<PATTERN>.jar "$(basename *<PATTERN>.jar <PATTERN>.jar).jar"
For example, I wanted to remove -SNAPSHOT from my file names; for that I used the command below:
mv *-SNAPSHOT.jar "$(basename *-SNAPSHOT.jar -SNAPSHOT.jar).jar"

Related

mv renaming filename to _*_

Say, for example, that my file name is
A_BC_DEF_GH_IJ_LMNO_PQ_11111111_1111111111_111111_AB.dat.meta
I am trying to rename this with a Unix command, but when I tried using this command
for f in *.meta; do mv "$f" "$(echo $f|sed s/[0-9]/?/g|sed 's/-/*/g')" ; done
my file is renamed to
A_BC_DEF_GH_IJ_LMNO_PQ_????????_????????????????????_???????_AB.dat.meta
but I expected it to rename the file to
A_BC_DEF_GH_IJ_LMNO_PQ__????????_????????????????????_*_AB.dat.meta
I'm quite new to Unix commands; is there any approach I should try?
Since [0-9] and ? are undergoing filename expansion, you should quote them to avoid nasty error messages. With this in mind, I did a
echo A_BC_DEF_GH_IJ_LMNO_PQ_11111111_1111111111_111111_AB.dat.meta | sed 's/[0-9]/?/g'|sed 's/-/*/g'
and got as output A_BC_DEF_GH_IJ_LMNO_PQ_????????_??????????_??????_AB.dat.meta, which makes sense to me. Why would you expect an asterisk in the resulting filename? In your second sed command, you are turning hyphens into asterisks, but there is no hyphen in the input.
Of course it is pretty unwise to use question marks and asterisks in a file name, as this is just begging for trouble, but there is no law that you must not do this.
A_BC_DEF_GH_IJ_LMNO_PQ_11111111_1111111111_111111_AB.dat.meta
Match it with a regex. Remember which characters need to be escaped in sed, and remember proper quoting: if you write $f it should be inside double quotes. Note that if there are no files named *.meta, the loop will just iterate over the literal string *.meta unless nullglob is set (see the note after the example below).
$ touch A_BC_DEF_GH_IJ_LMNO_PQ_11111111_1111111111_111111_AB.dat.meta
$ for f in *.meta; do mv "$f" "$(echo "$f" | sed 's/[0-9]/?/g; s/_\(?*\)_\(?*\)_\(?*\)_\([^_]*\)$/__\1_\2_*_\4/')" ; done
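If you want the loop body to simply not run when there are no .meta files, enable nullglob first, as mentioned above:
$ shopt -s nullglob   # an unmatched *.meta now expands to nothing instead of itself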

Remove part of filename with common delimiter

I have a number of files with the following naming:
name1.name2.s01.ep01.RANDOMWORD.mp4
name1.name2.s01.ep02.RANDOMWORD.mp4
name1.name2.s01.ep03.RANDOMWORD.mp4
I need to remove everything between the last . and ep# from the file names and only have name1.name2.s01.ep01.mp4 (sometimes the extension can be different)
name1.name2.s01.ep01.mp4
name1.name2.s01.ep02.mp4
name1.name2.s01.ep03.mp4
This is a simpler version of @Jesse's [answer]:
for file in /path/to/base_folder/* #Globbing to get the files
do
epno=${file#*.ep}
mv "$file" "${file%.ep*}."ep${epno%%.*}".${file##*.}"
#For the renaming part,see the note below
done
Note: haven't gotten a grasp of shell parameter expansion yet? Check [this].
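For reference, here is what each expansion used above produces for one of the sample names (illustrative only):
file=name1.name2.s01.ep01.RANDOMWORD.mp4
epno=${file#*.ep}
echo "$epno"           # 01.RANDOMWORD.mp4 - strip through the first '.ep'
echo "${epno%%.*}"     # 01                - keep only the episode number
echo "${file%.ep*}"    # name1.name2.s01   - strip from the last '.ep' to the end
echo "${file##*.}"     # mp4               - just the extension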
Using Linux string manipulation (refer: http://www.tldp.org/LDP/abs/html/string-manipulation.html) you could achieve like so:
You need to do this per file-extension type.
for file in <directory>/*
do
name=${file}
firstchar="${name:0:1}"
extension=${name##${firstchar}*.}
lastchar=$(echo ${name} | tail -c 2)
strip1=${name%.*$lastchar}
lastchar=$(echo ${strip1} | tail -c 2)
strip2=${strip1%.*$lastchar}
mv "$name" "${strip2}.${extension}"
done
You can use rename (you may need to install it). But it works like sed on filenames.
As an example
$ for i in `seq 3`; do touch "name1.name2.s01.ep0$i.RANDOMWORD.txt"; done
$ ls -l
name1.name2.s01.ep01.RANDOMWORD.txt
name1.name2.s01.ep02.RANDOMWORD.txt
name1.name2.s01.ep03.RANDOMWORD.txt
$ rename 's/(name1.name2.s01.ep\d{2})\..*(.txt)$/$1$2/' name1.name2.s01.ep0*
$ ls -l
name1.name2.s01.ep01.txt
name1.name2.s01.ep02.txt
name1.name2.s01.ep03.txt
This expression matches your filenames and uses two capture groups, so that the $1$2 in the replacement keeps the parts outside the "RANDOMWORD":
(name1.name2.s01.ep\d{2})\..*(.txt)$
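If your rename is the Perl-based one, it also accepts -n (no-act), so a cautious dry run with the same expression could look like this:
$ rename -n 's/(name1.name2.s01.ep\d{2})\..*(.txt)$/$1$2/' name1.name2.s01.ep0*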

Bash loop through directory including hidden file

I am looking for a way to make a simple loop in bash over everything my directory contains, i.e. files, directories and links including hidden ones.
I would prefer it to be specifically in bash, but it should be as general as possible. Of course, file names (and directory names) can contain white space, line breaks, and symbols: everything but "/" and ASCII NUL (0x00), even as the first character. Also, the result should exclude the '.' and '..' directories.
Here is a generator of files that the loop has to deal with:
#!/bin/bash
mkdir -p test
cd test
touch A 1 ! "hello world" \$\"sym.dat .hidden " start with space" $'\n start with a newline'
mkdir -p ". hidden with space" $'My Personal\nDirectory'
So my loop should look like (but has to deal with the tricky stuff above):
for i in *; do
echo ">$i<"
done
My closest try, using ls and a bash array, is the following, but it is not working:
IFS=$(echo -en "\n\b")
l=( $(ls -A .) )
for i in ${l[@]} ; do
echo ">$i<"
done
unset IFS
Or using bash arrays but the ".." directory is not exclude:
IFS=$(echo -en "\n\b")
l=( [[:print:]]* .[[:print:]]* )
for i in ${l[@]} ; do
echo ">$i<"
done
unset IFS
* doesn't match files beginning with ., so you just need to be explicit:
for i in * .[^.]*; do
echo ">$i<"
done
.[^.]* will match all files and directories starting with ., followed by a non-. character, followed by zero or more characters. In other words, it's like the simpler .*, but excludes . and ... If you need to match something like ..foo, then you might add ..?* to the list of patterns.
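Putting that together, a sketch that also covers names like ..foo (with nullglob enabled so unmatched patterns disappear instead of being passed through literally):
shopt -s nullglob
for i in * .[^.]* ..?*; do
echo ">$i<"
done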
As chepner noted in the comments below, this solution assumes you're running GNU bash along with GNU find and GNU sort...
GNU find can be prevented from recursing into subdirectories with the -maxdepth option. Then use -print0 to end every filename with a 0x00 byte instead of the newline you'd usually get from -print.
The sort -z sorts the filenames between the 0x00 bytes.
Then, you can use sed to get rid of the dot and dot-dot directory entries (although GNU find seems to exclude the .. already).
I also used sed to get rid of the ./ in front of every filename. basename could do that too, but older systems didn't have basename, and you might not trust it to handle the funky characters right.
(These sed commands each required two cases: one for a pattern at the start of the string, and one for the pattern between 0x00 bytes. These were so ugly I split them out into separate functions.)
The read command doesn't have a -z or -0 option like some commands, but you can fake it with -d "" and blanking the IFS environment variable.
The additional -r option prevents a backslash-newline combo from being interpreted as a line continuation. (A file called backslash\\nnewline would otherwise be mangled to backslashnewline.) It might be worth seeing if other backslash-combos get interpreted as escape sequences.
remove_dot_and_dotdot_dirs()
{
sed \
-e 's/^[.]\{1,2\}\x00//' \
-e 's/\x00[.]\{1,2\}\x00/\x00/g'
}
remove_leading_dotslash()
{
sed \
-e 's/^[.]\///' \
-e 's/\x00[.]\//\x00/g'
}
IFS=""
find . -maxdepth 1 -print0 |
sort -z |
remove_dot_and_dotdot_dirs |
remove_leading_dotslash |
while read -r -d "" filename
do
echo "Doing something with file '${filename}'..."
done
It may not be the most favorable way, but I tried the thing below:
while read line ; do echo $line; done <<< $(ls -a | grep -v -w ".")
Check the trial I did below.
Try the find command, something like:
find .
That will list all the files in all recursive directories.
To output only files excluding the leading . or .. try:
find . -type f -printf '%P\n'
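To loop over such output safely (spaces, newlines and glob characters included), you can combine find -print0 with the read loop shown earlier; a minimal sketch:
find . -maxdepth 1 ! -name . -print0 |
while IFS= read -r -d '' f
do
echo ">${f#./}<"    # strip the leading './'
done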

Looping through the elements of a path variable in Bash

I want to loop through a path list that I have gotten from an echo $VARIABLE command.
For example:
echo $MANPATH will return
/usr/lib:/usr/sfw/lib:/usr/info
So that is three different paths, each separated by a colon. I want to loop though each of those paths. Is there a way to do that? Thanks.
Thanks for all the replies so far, it looks like I actually don't need a loop after all. I just need a way to take out the colon so I can run one ls command on those three paths.
You can set the Internal Field Separator:
( IFS=:
for p in $MANPATH; do
echo "$p"
done
)
I used a subshell so the change in IFS is not reflected in my current shell.
The canonical way to do this, in Bash, is to use the read builtin appropriately:
IFS=: read -r -d '' -a path_array < <(printf '%s:\0' "$MANPATH")
This is the only robust solution: it will do exactly what you want, namely split the string on the delimiter : while staying safe with respect to spaces, newlines, and glob characters like *, [ ], etc. (unlike the other answers, which break on such input).
After this command, you'll have an array path_array, and you can loop on it:
for p in "${path_array[#]}"; do
printf '%s\n' "$p"
done
You can use Bash's pattern substitution parameter expansion to populate your loop variable. For example:
MANPATH=/usr/lib:/usr/sfw/lib:/usr/info
# Replace colons with spaces to create list.
for path in ${MANPATH//:/ }; do
echo "$path"
done
Note: Don't enclose the substitution expansion in quotes. You want the expanded values from MANPATH to be interpreted by the for-loop as separate words, rather than as a single string.
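To see why the quotes matter here, compare these two loops (a small illustration; the exact output depends on your MANPATH):
for path in "${MANPATH//:/ }"; do echo "[$path]"; done   # quoted: one single word, the loop runs once
for path in ${MANPATH//:/ }; do echo "[$path]"; done     # unquoted: split into separate paths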
In this way you can safely go through the $PATH with a single loop, while $IFS will remain the same inside or outside the loop.
while IFS=: read -d: -r path; do # `$IFS` is only set for the `read` command
echo $path
done <<< "${PATH:+"${PATH}:"}" # append an extra ':' if `$PATH` is set
You can check the value of $IFS,
IFS='xxxxxxxx'
while IFS=: read -d: -r path; do
echo "${IFS}${path}"
done <<< "${PATH:+"${PATH}:"}"
and the output will be something like this.
xxxxxxxx/usr/local/bin
xxxxxxxx/usr/bin
xxxxxxxx/bin
Reference to another question on StackExchange.
for p in $(echo $MANPATH | tr ":" " ") ;do
echo $p
done
IFS=:
arr=(${MANPATH})
for path in "${arr[#]}" ; do # <- quotes required
echo $path
done
... it does take care of spaces :o) but also adds empty elements if you have something like:
:/usr/bin::/usr/lib:
... then indexes 0 and 2 will be empty (''); index 4 isn't set at all because a trailing delimiter does not produce a trailing empty field during word splitting
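One way to keep the same approach but skip those empty entries (a small sketch):
IFS=:
arr=(${MANPATH})
for path in "${arr[@]}" ; do
[ -n "$path" ] || continue   # ignore empty elements from '::' or a leading ':'
echo "$path"
done
unset IFS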
This can also be solved with Python, on the command line:
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" echo {}
Or as an alias:
alias foreachpath="python -c \"import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]\""
With example usage:
foreachpath echo {}
The advantage to this approach is that {} will be replaced by each path in succession. This can be used to construct all sorts of commands, for instance to list the size of all files and directories in the directories in $PATH, including directories with spaces in the name:
foreachpath 'for e in "{}"/*; do du -h "$e"; done'
Here is an example that shortens the length of the $PATH variable by creating symlinks to every file and directory in the $PATH in $HOME/.allbin. This is not useful for everyday usage, but may be useful if you get the too many arguments error message in a docker container, because bitbake uses the full $PATH as part of the command line...
mkdir -p "$HOME/.allbin"
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" 'for e in "{}"/*; do ln -sf "$e" "$HOME/.allbin/$(basename $e)"; done'
export PATH="$HOME/.allbin"
This should also, in theory, speed up regular shell usage and shell scripts, since there are fewer paths to search for every command that is executed. It is pretty hacky, though, so I don't recommend that anyone shorten their $PATH this way.
The foreachpath alias might come in handy, though.
Combining ideas from:
https://stackoverflow.com/a/29949759 - gniourf_gniourf
https://stackoverflow.com/a/31017384 - Yi H.
code:
PATHVAR='foo:bar baz:spam:eggs:' # demo path with space and empty
printf '%s:\0' "$PATHVAR" | while IFS=: read -d: -r p; do
echo $p
done | cat -n
output:
1 foo
2 bar baz
3 spam
4 eggs
5
You can use Bash's pattern substitution, ${PATH//:/$'\n'}, in the for loop to accomplish this:
for p in ${PATH//:/$'\n'} ; do
echo $p;
done
OP's update wants to ls the resulting folders, and has pointed out that ls only requires a space-separated list.
ls $(echo $PATH | tr ':' ' ') is nice and simple and should fit the bill nicely.

Linux shell script to add leading zeros to file names

I have a folder with about 1,700 files. They are all named like 1.txt or 1497.txt, etc. I would like to rename all the files so that all the filenames are four digits long.
I.e., 23.txt becomes 0023.txt.
What is a shell script that will do this? Or a related question: How do I use grep to only match lines that contain \d.txt (i.e., one digit, then a period, then the letters txt)?
Here's what I have so far:
for a in [command i need help with]
do
mv $a 000$a
done
Basically, run that three times, with commands there to find one digit, two digits, and three digit filenames (with the number of initial zeros changed).
Try:
for a in [0-9]*.txt; do
mv $a `printf %04d.%s ${a%.*} ${a##*.}`
done
Change the filename pattern ([0-9]*.txt) as necessary.
A general-purpose enumerated rename that makes no assumptions about the initial set of filenames:
X=1;
for i in *.txt; do
mv "$i" "$(printf %04d.%s "$X" "${i##*.}")"
let X="$X+1"
done
On the same topic:
Bash script to pad file names
Extract filename and extension in bash
Using the rename (prename in some cases) script that is sometimes installed with Perl, you can use Perl expressions to do the renaming. The script skips renaming if there's a name collision.
The command below renames only files that have four or fewer digits followed by a ".txt" extension. It does not rename files that do not strictly conform to that pattern. It does not truncate names that consist of more than four digits.
rename 'unless (/0+[0-9]{4}.txt/) {s/^([0-9]{1,3}\.txt)$/000$1/g;s/0*([0-9]{4}\..*)/$1/}' *
A few examples:
Original Becomes
1.txt 0001.txt
02.txt 0002.txt
123.txt 0123.txt
00000.txt 00000.txt
1.23.txt 1.23.txt
Other answers given so far will attempt to rename files that don't conform to the pattern, produce errors for filenames that contain non-digit characters, perform renames that produce name collisions, try and fail to rename files that have spaces in their names and possibly other problems.
for a in *.txt; do
b=$(printf %04d.txt ${a%.txt})
if [ $a != $b ]; then
mv $a $b
fi
done
One-liner:
ls | awk '/^([0-9]+)\.txt$/ { printf("%s %04d.txt\n", $0, $1) }' | xargs -n2 mv
How do I use grep to only match lines that contain \d.txt (i.e. one digit, then a period, then the letters txt)?
grep -E '^[0-9]\.txt$'
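For instance, piping a few sample names through it:
$ printf '%s\n' 1.txt 23.txt a.txt | grep -E '^[0-9]\.txt$'
1.txt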
Let's assume you have files with the extension .dat in your folder. Just copy this code to a file named run.sh, make it executable by running chmod +x run.sh, and then execute it using ./run.sh:
#!/bin/bash
num=0
for i in *.dat
do
a=`printf "%05d" $num`
mv "$i" "filename_$a.dat"
let "num = $(($num + 1))"
done
This will convert all files in your folder to filename_00000.dat, filename_00001.dat, etc.
This version also supports handling strings before (and after) the number. Basically you can do any regex matching + printf, as long as your awk supports it (the three-argument match() used below is a gawk extension). It supports whitespace characters (except newlines) in filenames too.
for f in *.txt ;do
mv "$f" "$(
awk -v f="$f" '{
if ( match(f, /^([a-zA-Z_-]*)([0-9]+)(\..+)/, a)) {
printf("%s%04d%s", a[1], a[2], a[3])
} else {
print(f)
}
}' <<<''
)"
done
To only match single digit text files, you can do...
$ ls | grep '[0-9]\.txt'
One-liner hint:
while [ -f ./result/result`printf "%03d" $a`.txt ]; do a=$((a+1));done
RESULT=result/result`printf "%03d" $a`.txt
To provide a solution that's cautiously written to be correct even in the presence of filenames with spaces:
#!/usr/bin/env bash
pattern='%04d%s' # pad the number to four digits, then re-attach the suffix: change the padding to taste
# enable extglob syntax: +([[:digit:]]) means "one or more digits"
# enable the nullglob flag: If no matches exist, a glob returns nothing (not itself).
shopt -s extglob nullglob
for f in [[:digit:]]*; do # iterate over filenames that start with digits
suffix=${f##+([[:digit:]])} # find the suffix (everything after the last digit)
number=${f%"$suffix"} # find the number (everything before the suffix)
printf -v new "$pattern" "$number" "$suffix" # pad the number, then append the suffix
if [[ $f != "$new" ]]; then # if the result differs from the old name
mv -- "$f" "$new" # ...then rename the file.
fi
done
There is a rename.ul command from the util-linux package, installed by default (at least in Ubuntu).
Its usage is (see man rename.ul):
rename [options] expression replacement file...
The command will replace the first occurrence of expression with the given replacement for the provided files.
While forming the command you can use:
rename.ul -nv replace-me with-this in-all?-these-files*
to make no changes but see what changes the command would make. When sure, just re-execute the command without the -v (verbose) and -n (no-act) options.
For your case the commands are:
rename.ul "" 000 ?.txt
rename.ul "" 00 ??.txt
rename.ul "" 0 ???.txt
