Extract file basename without path and extension in bash [duplicate] - linux

Given file names like these:
/the/path/foo.txt
bar.txt
I hope to get:
foo
bar
Why doesn't this work?
#!/bin/bash
fullfile=$1
fname=$(basename $fullfile)
fbname=${fname%.*}
echo $fbname
What's the right way to do it?

You don't have to call the external basename command. Instead, you could use the following commands:
$ s=/the/path/foo.txt
$ echo "${s##*/}"
foo.txt
$ s=${s##*/}
$ echo "${s%.txt}"
foo
$ echo "${s%.*}"
foo
Note that this solution should work in all recent (post-2004) POSIX-compliant shells (e.g. bash, dash, ksh).
Source: Shell Command Language 2.6.2 Parameter Expansion
More on bash String Manipulations: http://tldp.org/LDP/LG/issue18/bash.html
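A minimal sketch that wraps the two expansions above in a function. The name stem and the variable stem_name are made up for illustration and are not part of any standard tool:

```shell
# stem: hypothetical helper; prints a file name without directory or last extension.
stem() {
    stem_name=${1##*/}                  # drop the longest prefix ending in "/"
    printf '%s\n' "${stem_name%.*}"     # drop the shortest suffix starting with "."
}

stem /the/path/foo.txt      # foo
stem bar.txt                # bar
stem /tmp/archive.tar.gz    # archive.tar (only the last extension is removed)
```

Using %%.* instead of %.* in the second expansion would strip everything from the first dot onward.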

The basename command has two different invocations; in one, you specify just the path, in which case it gives you the last component, while in the other you also give a suffix that it will remove. So, you can simplify your example code by using the second invocation of basename. Also, be careful to correctly quote things:
fbname=$(basename "$1" .txt)
echo "$fbname"
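To see the two invocations side by side, here is a small sketch; the paths are just examples, and the variable spaced is a name I made up:

```shell
# Without a suffix argument, basename only strips the directory part.
basename "/the/path/foo.txt"            # foo.txt
# With a suffix argument, a matching trailing suffix is stripped as well.
basename "/the/path/foo.txt" .txt       # foo
# A suffix that does not match leaves the name untouched.
basename "/the/path/foo.txt" .md        # foo.txt

# Quoting keeps a path containing spaces together as one argument.
spaced="/the/my path/foo bar.txt"
fbname=$(basename "$spaced" .txt)
echo "$fbname"                          # foo bar
```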

A combination of basename and cut works fine, even in case of double ending like .tar.gz:
fbname=$(basename "$fullfile" | cut -d. -f1)
It would be interesting to know whether this solution needs less computing power than Bash parameter expansion.
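A quick comparison, with example paths of my own choosing. cut -d. -f1 keeps only the text before the first dot, which is why it copes with .tar.gz, but for the same reason it also shortens names that contain other dots:

```shell
fullfile=/the/path/archive.tar.gz        # example path, chosen for illustration

by_cut=$(basename "$fullfile" | cut -d. -f1)
by_expansion=${fullfile##*/}
by_expansion=${by_expansion%.*}

echo "$by_cut"          # archive
echo "$by_expansion"    # archive.tar

# A dot inside the name is treated like the start of an extension.
dotted=$(basename "/the/path/my.notes.txt" | cut -d. -f1)
echo "$dotted"          # my
```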

Here are oneliners:
$(basename "${s%.*}")
$(basename "${s}" ".${s##*.}")
I needed this, the same as asked by bongbang and w4etwetewtwet.

Pure bash, no basename, no variable juggling. Set a string and echo:
p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"
Output:
foo
Note: the bash extglob option must be on (Ubuntu sets extglob on by default). If it is not, run:
shopt -s extglob
Walking through the ${p//+(*\/|.*)}:
1. ${p -- start with $p.
2. // substitute every instance of the pattern that follows.
3. +( match one or more of the pattern list in parentheses (i.e. until item #7 below).
4. 1st pattern: *\/ matches anything before a literal "/" char.
5. pattern separator | which in this instance acts like a logical OR.
6. 2nd pattern: .* matches anything after a literal "." -- that is, in bash the "." is just a period char, and not a regex dot.
7. ) end pattern list.
8. } end parameter expansion. With a string substitution, there's usually another / there, followed by a replacement string. But since there's no / there, the matched patterns are substituted with nothing; this deletes the matches.
Relevant man bash background:
pattern substitution:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and the longest
match of pattern against its value is replaced with string. If pattern
begins with /, all matches of pattern are replaced with string.
Normally only the first match is replaced. If pattern begins with #,
it must match at the beginning of the expanded value of parameter. If
pattern begins with %, it must match at the end of the expanded value
of parameter. If string is null, matches of pattern are deleted and
the / following pattern may be omitted. If parameter is @ or *, the
substitution operation is applied to each positional parameter in
turn, and the expansion is the resultant list. If parameter is an
array variable subscripted with @ or *, the substitution operation is
applied to each member of the array in turn, and the expansion is the
resultant list.
extended pattern matching:
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the
following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
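A short sketch of the extglob one-liner on two inputs, assuming bash. The second path is my own example, picked to show a dot inside a directory name and a double extension:

```shell
shopt -s extglob                 # bash only; enables +(...) patterns

p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"           # foo

q=/the/pa.th/archive.tar.gz      # dot in a directory name, double extension
echo "${q//+(*\/|.*)}"           # archive
```

Everything from the first dot of the file name onward is deleted, so this behaves like %%.* rather than %.* for names with several dots.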

Here is another (more complex) way of getting either the filename or the extension: first use the rev command to reverse the file path, cut at the first ".", and then reverse the result again, like this:
filename=`rev <<< "$1" | cut -d"." -f2- | rev`
fileext=`rev <<< "$1" | cut -d"." -f1 | rev`

If you want to play nice with Windows file paths (under Cygwin) you can also try this:
fname=${fullfile##*[/\\]}
This will account for backslash separators when using Bash on Windows. (Inside a bracket expression a | is a literal character, not an OR, so it is left out here.)
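As a sketch, a bracket expression holding just the two separator characters handles both styles of path. The sample paths and variable names below are hypothetical:

```shell
win='C:\Users\me\foo.txt'        # example Windows-style path (made up)
nix=/the/path/foo.txt            # example Unix-style path

win_name=${win##*[/\\]}          # strip through the last "/" or "\"
nix_name=${nix##*[/\\]}

echo "$win_name"                 # foo.txt
echo "$nix_name"                 # foo.txt
```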

Just an alternative that I came up with to extract an extension, using the posts in this thread with my own small knowledge base that was more familiar to me.
ext="$(rev <<< "$(cut -f "1" -d "." <<< "$(rev <<< "file.docx")")")"
Note: Please advise on my use of quotes; it worked for me but I might be missing something on their proper use (I probably use too many).

Use the basename command. Its manpage is here: http://unixhelp.ed.ac.uk/CGI/man-cgi?basename

Related

Find the depth of the current path

How can I write a shell script to find the depth of the current path?
Assuming I am in:
/home/user/test/test1/test2/test3
It should return 6.
With shell parameter expansions, no external commands:
$ var=${PWD//[!\/]}
$ echo ${#var}
6
The first expansion removes all characters that are not /; the second one prints the length of var.
Explanations with details for support by POSIX shell or Bash (the links in parentheses go to the corresponding sections in the POSIX standard or the Bash manual):
$PWD contains the path to the current working directory. (sh/Bash)
The ${parameter/pattern/string} expansion replaces the first occurrence of pattern in the expansion of parameter with string. (Bash)
If the first slash is doubled (as in our case), all occurrences are replaced.
If string is empty, the slash after pattern is optional (as in our case).
The pattern [!\/] is a bracket expression and stands for "any character other than slash". (sh/Bash)
The slash has to be escaped, \/, or it is interpreted as ending the pattern.
! as the first character in a bracket expression negates the expression: any character other than the ones in the expression matches the pattern. POSIX sh requires support for ! and says the behaviour for using ^ is undefined; Bash supports both ! and ^. Notice that this is not a bracket expression as seen in regular expressions, where only ^ is valid.
${#parameter} expands to the length of parameter. (sh/Bash)
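Put together as a function, the two expansions might look like this. The name path_depth is hypothetical, and the sketch assumes bash (or another shell that supports ${var//pattern}):

```shell
# path_depth: hypothetical helper name; needs ${var//pattern} support (e.g. bash).
path_depth() {
    local slashes=${1//[!\/]}    # delete every character that is not "/"
    echo "${#slashes}"           # the length is the number of slashes
}

path_depth /home/user/test/test1/test2/test3    # 6
path_depth "$PWD"                               # depth of the current directory
```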
A simple approach in fish:
count (string split / $PWD)
You could count the number of slashes in the current path:
pwd | awk -F"/" '{print NF-1}'
You can do this with a pipeline. Pipe the string into grep with the -o option, which prints each "/" on its own line, then pipe that into wc -l to count the lines printed.
echo "$path_str" | grep -o '/' - | wc -l
Assuming you don't have trailing "/", you can just count the "/".
So you would
Remove everything that is not a "/"
Count the length of the resulting string
In fish, this would be done with something like
string replace --regex --all '[^/]' '' -- $PWD | string length
The regular expression - [^/] here matches every single character that is not a "/". With "--all", this will be done as often as possible, and replace it with '', i.e. nothing.
The -- is the option separator, so that nothing in the argument is interpreted as an option (otherwise you'd have issues if an argument started with a "-a").
$PWD is the current directory.
string length simply outputs the length of its input.
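For comparison, the same two steps (delete everything that is not a "/", then count what is left) can be sketched with POSIX tools instead of fish. The function name count_slashes is my own:

```shell
# count_slashes: hypothetical helper name, POSIX tools only.
count_slashes() {
    # tr -cd '/' deletes everything except "/"; wc -c counts the bytes left.
    # The $(( ... )) wrapper strips the padding some wc implementations print.
    echo "$(( $(printf '%s' "$1" | tr -cd '/' | wc -c) ))"
}

count_slashes /home/user/test/test1/test2/test3    # 6
```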
Using perl :
echo '/home/user/test/test1/test2/test3' |
perl -lne '@_ = split /\//; print scalar @_ -1'
Output
6
You could use find just like that :
find / -printf "%d %p\n" 2>/dev/null | grep "$PWD$" | awk '{print $1}'
Maybe not the most efficient, but handles slashes well.

ksh shell script to find first occurrence of _ in string and remove everything until that

I'm new to shell scripting and am using the ksh shell. Could you please help me with this?
My string is like errorfile101_ApplicationData_2_333.txt. I want to remove everything up to and including the first occurrence of _.
My output should be ApplicationData_2_333.txt
This is an easy one, assuming you can assign your string to a variable, i.e.
str="errorfile101_ApplicationData_2_333.txt"
echo ${str#*_}
output
ApplicationData_2_333.txt
The # operator in ${str#*_} means remove the shortest match of the following pattern from the left of the variable's value.
There is also ##, which removes the longest match from the left, which would give you
333.txt
There are also similar removal operators for working from the right side of the string, % and a longest match (from right) with %%.
All versions of ksh (and bash, and other shells) support these operators. (sorry if this is the wrong term).
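Here are all four removal operators applied to the same string, as a small sketch. These are standard POSIX expansions, so they should behave the same in ksh, bash and dash:

```shell
str="errorfile101_ApplicationData_2_333.txt"

echo "${str#*_}"     # ApplicationData_2_333.txt       (shortest match, from the left)
echo "${str##*_}"    # 333.txt                         (longest match, from the left)
echo "${str%_*}"     # errorfile101_ApplicationData_2  (shortest match, from the right)
echo "${str%%_*}"    # errorfile101                    (longest match, from the right)
```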
ksh93 and later (as well as bash, zsh and probably others) also support a sed-like pattern match and substitution, such as
echo ${str/*_/xx}
#----------|--|>replacement
#----------> pattern to match
output
xx333.txt
which means that / works like sed matching the longest possible string.
IHTH
You can use the cut command:
echo "errorfile101_ApplicationData_2_333.txt" | cut -d"_" -f2-

Understanding sed expression 's/^\.\///g'

I'm studying Bash programming and I found this example, but I don't understand what it means:
filtered_files=`echo "$files" | sed -e 's/^\.\///g'`
In particular the argument passed to sed after '-e'.
It's a bad example; you shouldn't follow it.
First, understanding the sed expression at hand.
s/pattern/replacement/flags is a sed command, described in detail in man sed. In this case, pattern is a regular expression; replacement is what that pattern gets replaced with when/where found; and flags describe details about how that replacement should be done.
In this case, the s/^\.\///g breaks down as follows:
s is the sed command being run.
/ is the sigil used to separate the sections of this command. (Any character can be used as a sigil, and the person who chose to use / for this expression was, to be charitable, not thinking about what they were doing very hard).
^\.\/ is the pattern to be replaced. The ^ means that this replaces anything only at the beginning; \. matches only a period, vs . (which is regex for matching any character); and \/ matches only a / (vs /, which would go on to the next section of this sed command, being the selected sigil).
The next section is an empty string, which is why there's no content between the two following sigils.
g in the flags section indicates that more than one replacement can happen each line. In conjunction with ^, this has no meaning, since there can only be one beginning-of-the-line per line; further evidence that the person who wrote your example wasn't thinking much.
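A small sketch of the breakdown above, run on made-up input. It shows that only a leading ./ is removed, and that the result is the same with and without the g flag:

```shell
# Three sample lines (invented for illustration); the second has a ./ in the middle.
files='./foo
./bar/./baz
qux'

with_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///g')
without_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///')

printf '%s\n' "$with_g"
# foo
# bar/./baz
# qux

[ "$with_g" = "$without_g" ] && echo "g made no difference"
```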
Using the same data structures, doing it better:
All of the below are buggy when handling arbitrary filenames, because storing arbitrary filenames in scalar variables is buggy in general.
Still using sed:
# Use printf instead of echo to avoid bugginess if your "files" string is "-n" or "-e"
# Use "#" as your sigil to avoid needing to backslash-escape all the "/"s
filtered_files=$(printf '%s\n' "$files" | sed -e 's#^[.]/##g')
Replacing sed with a bash builtin:
# This is much faster than shelling out to any external tool, but note that it removes every ./ in the string, not only a leading one
filtered_files=${files//.\//}
Using better data structures
Instead of running
files=$(find .)
...instead:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -print0)
That stores files in an array; it looks complex, but it's far safer -- works correctly even with filenames containing spaces, quote characters, newline literals, etc.
Also, this means you can do the following:
# Remove the leading ./ from each name; don't remove ./ at any other position in a name
filtered_files=( "${files[@]#./}" )
This means that a file named
./foo/this directory name (which has spaces) ends with a period./bar
will correctly be transformed to
foo/this directory name (which has spaces) ends with a period./bar
rather than
foo/this directory name (which has spaces) ends with a periodbar
...which would have happened with the original approach.
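An end-to-end sketch of the array approach, assuming bash and a find that supports -print0. The temporary directory and the file names in it are invented for the demonstration:

```shell
# Throwaway directory containing an awkward name (all names here are made up).
tmp=$(mktemp -d)
mkdir -p "$tmp/ends with a period./sub"
touch "$tmp/ends with a period./sub/bar" "$tmp/plain.txt"

# Read NUL-delimited names into an array, one element per file.
files=()
while IFS= read -r -d '' filename; do
    files+=( "$filename" )
done < <(cd "$tmp" && find . -type f -print0)

# Strip only the leading ./ from each element.
filtered_files=( "${files[@]#./}" )
printf '<%s>\n' "${filtered_files[@]}"

rm -rf "$tmp"
```

The order in which find reports the two files is not guaranteed, so the printed order may vary.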
See man sed. In particular:
-e script, --expression=script
add the script to the commands to be executed
And:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success-
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
In this case, it replaces any occurrence of ./ at the beginning of a line with the empty string, in other words removing it.

Use sed/grep to get string from tail to a char in middle

I have some strings (variables) that I need to edit in bash for further analysis.
They consist of things like
str="~/folder/item"
How can I use sed or grep to grab just "item" (meaning from tail to char '/')?
Refer to Shell Parameter Expansion. You can say:
$ str="~/folder/item"
$ echo ${str##*/}
item
Quoting from the manual:
${parameter##word}
The word is expanded to produce a pattern just as in filename
expansion (see Filename Expansion). If the pattern matches the
beginning of the expanded value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ‘#’ case) or the longest matching pattern (the
‘##’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern removal
operation is applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with ‘@’ or ‘*’, the pattern removal operation is applied
to each member of the array in turn, and the expansion is the
resultant list.
Using grep:
$ grep -Po '.*/\K.*' <<< "$str"
item
Assuming you are dealing with a text file containing lines like the one shown, then:
sed 's%.*/\([^/]*\)"%\1%' <<< 'str="~/folder/item"'
This yields:
item
If you are dealing with a variable str that contains a string ~/folder/item, then you can use:
basename "$str"
or:
echo "${str##*/}"
basename "${str}" gives the last part of the string after the final /, including any file extensions you may have. If you want to do the opposite and grab the directory, use dirname "${str}"
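Side by side, as a small sketch: the two external commands and the parameter expansions that give the same results for this input:

```shell
str="~/folder/item"        # a literal "~": tilde is not expanded inside quotes

basename "$str"            # item
dirname "$str"             # ~/folder

echo "${str##*/}"          # item      (strip through the last "/")
echo "${str%/*}"           # ~/folder  (strip from the last "/" onward)
```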
echo $str|awk -F"/" '{print $NF}'
or
echo "$str" | perl -pe 's/.*\///g'
basename is also suitable for this. For example:
str="~/folder/item"
name=$(basename "$str")
echo "$name"
The output would be "item".

Bash: String manipulation terminate on whitespace

I have a string variable $LIBRARIES="abc.so.1 def.so.1 hij.so.3.1" and I want to replace all the .so such that they look like this:
"abc.so* def.so* hij.so*"
How can I do this? I tried NEW_LIBRARIES=${LIBRARIES//.so*/.so$star} but it doesn't work. How can I tell it to end on whitespace?
or simpler
${LIBRARIES//.[0-9]/*}
the ones with 2 extensions will get 2 ** but that should be fine
This should do it:
LIBRARIES="abc.so.1 def.so.1 hij.so.3.1"
NEW_ALL_LIBRARIES=$(sed 's/\.so[^ ]*/\.so\*/g' <<< "$LIBRARIES")
echo "$NEW_ALL_LIBRARIES"
Output:
abc.so* def.so* hij.so*
Explanation:
LIBRARIES="...": When assigning a string to a variable, the variable is not prefixed with $
NEW_ALL_LIBRARIES=$(...): The $(...) notation is called command substitution; it runs the commands inside it in a subshell and captures whatever they write to stdout (here saving it to NEW_ALL_LIBRARIES).
sed: invoke sed, the Streaming EDitor tool
's/\.so[^ ]*/\.so\*/g': Use regular expressions (regex) to match patterns and substitute. Let's break this syntax down a bit further:
s/ "substitute"; For example: s/A/B/g replaces all occurrences of A with B
\.so[^ ]*/: Match any patterns that start with .so, and the [^ ]* part means "followed by zero or more non-space characters"
\.so\*/: Replace that with literally .so*
Some symbols such as [, ], . and * have special meaning in regex, so if you mean to use them literally, you just "escape" them by prefixing a \
g: Do so for all occurrences, not just the first.
<<< "$LIBRARIES": the <<< notation is called herestring: in this context it accomplishes the same thing as echo "$LIBRARIES" | sed ..., but it saves a subshell.
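To see why the [^ ]* part matters, here is a sketch comparing it with a plain .* on the same input. The variable names greedy and bounded are my own:

```shell
LIBRARIES="abc.so.1 def.so.1 hij.so.3.1"

# .* is greedy and runs to the end of the line, swallowing the other names.
greedy=$(printf '%s\n' "$LIBRARIES" | sed 's/\.so.*/.so*/g')
# [^ ]* stops at the next space, so each name is handled separately.
bounded=$(printf '%s\n' "$LIBRARIES" | sed 's/\.so[^ ]*/.so*/g')

echo "$greedy"     # abc.so*
echo "$bounded"    # abc.so* def.so* hij.so*
```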

Resources