How do I get the path of the directory in which a Bash script is located, inside that script?
I want to use a Bash script as a launcher for another application. I want to change the working directory to the one where the Bash script is located, so I can operate on the files in that directory, like so:
$ ./application
#!/usr/bin/env bash
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
is a useful one-liner which will give you the full directory name of the script no matter where it is being called from.
It will work as long as the last component of the path used to find the script is not a symlink (directory links are OK). If you also want to resolve any links to the script itself, you need a multi-line solution:
#!/usr/bin/env bash
SOURCE=${BASH_SOURCE[0]}
while [ -L "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink
DIR=$( cd -P "$( dirname "$SOURCE" )" >/dev/null 2>&1 && pwd )
SOURCE=$(readlink "$SOURCE")
[[ $SOURCE != /* ]] && SOURCE=$DIR/$SOURCE # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
done
DIR=$( cd -P "$( dirname "$SOURCE" )" >/dev/null 2>&1 && pwd )
This last one will work with any combination of aliases, source, bash -c, symlinks, etc.
Beware: if you cd to a different directory before running this snippet, the result may be incorrect!
Also, watch out for $CDPATH gotchas, and stderr output side effects if the user has smartly overridden cd to redirect output to stderr instead (including escape sequences, such as when calling update_terminal_cwd >&2 on Mac). Adding >/dev/null 2>&1 at the end of your cd command will take care of both possibilities.
To understand how it works, try running this more verbose form:
#!/usr/bin/env bash
SOURCE=${BASH_SOURCE[0]}
while [ -L "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink
TARGET=$(readlink "$SOURCE")
if [[ $TARGET == /* ]]; then
echo "SOURCE '$SOURCE' is an absolute symlink to '$TARGET'"
SOURCE=$TARGET
else
DIR=$( dirname "$SOURCE" )
echo "SOURCE '$SOURCE' is a relative symlink to '$TARGET' (relative to '$DIR')"
SOURCE=$DIR/$TARGET # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
fi
done
echo "SOURCE is '$SOURCE'"
RDIR=$( dirname "$SOURCE" )
DIR=$( cd -P "$( dirname "$SOURCE" )" >/dev/null 2>&1 && pwd )
if [ "$DIR" != "$RDIR" ]; then
echo "DIR '$RDIR' resolves to '$DIR'"
fi
echo "DIR is '$DIR'"
And it will print something like:
SOURCE './scriptdir.sh' is a relative symlink to 'sym2/scriptdir.sh' (relative to '.')
SOURCE is './sym2/scriptdir.sh'
DIR './sym2' resolves to '/home/ubuntu/dotfiles/fo fo/real/real1/real2'
DIR is '/home/ubuntu/dotfiles/fo fo/real/real1/real2'
Use dirname "$0":
#!/usr/bin/env bash
echo "The script you are running has basename $( basename -- "$0"; ), dirname $( dirname -- "$0"; )";
echo "The present working directory is $( pwd; )";
Using pwd alone will not work if you are not running the script from the directory it is contained in.
[matt#server1 ~]$ pwd
/home/matt
[matt#server1 ~]$ ./test2.sh
The script you are running has basename test2.sh, dirname .
The present working directory is /home/matt
[matt#server1 ~]$ cd /tmp
[matt#server1 tmp]$ ~/test2.sh
The script you are running has basename test2.sh, dirname /home/matt
The present working directory is /tmp
The dirname command is the most basic, simply parsing the path up to the filename off of the $0 (script name) variable:
dirname -- "$0";
But, as matt b pointed out, the path returned is different depending on how the script is called. pwd doesn't do the job because that only tells you what the current directory is, not what directory the script resides in. Additionally, if a symbolic link to a script is executed, you're going to get a (probably relative) path to where the link resides, not the actual script.
Some others have mentioned the readlink command, but at its simplest, you can use:
dirname -- "$( readlink -f -- "$0"; )";
readlink will resolve the script path to an absolute path from the root of the filesystem. So, any paths containing single or double dots, tildes and/or symbolic links will be resolved to a full path.
Here's a script demonstrating each of these, whatdir.sh:
#!/usr/bin/env bash
echo "pwd: `pwd`"
echo "\$0: $0"
echo "basename: `basename -- "$0"`"
echo "dirname: `dirname -- "$0"`"
echo "dirname/readlink: $( dirname -- "$( readlink -f -- "$0"; )"; )"
Running this script in my home dir, using a relative path:
>>>$ ./whatdir.sh
pwd: /Users/phatblat
$0: ./whatdir.sh
basename: whatdir.sh
dirname: .
dirname/readlink: /Users/phatblat
Again, but using the full path to the script:
>>>$ /Users/phatblat/whatdir.sh
pwd: /Users/phatblat
$0: /Users/phatblat/whatdir.sh
basename: whatdir.sh
dirname: /Users/phatblat
dirname/readlink: /Users/phatblat
Now changing directories:
>>>$ cd /tmp
>>>$ ~/whatdir.sh
pwd: /tmp
$0: /Users/phatblat/whatdir.sh
basename: whatdir.sh
dirname: /Users/phatblat
dirname/readlink: /Users/phatblat
And finally using a symbolic link to execute the script:
>>>$ ln -s ~/whatdir.sh whatdirlink.sh
>>>$ ./whatdirlink.sh
pwd: /tmp
$0: ./whatdirlink.sh
basename: whatdirlink.sh
dirname: .
dirname/readlink: /Users/phatblat
There is however one case where this doesn't work, when the script is sourced (instead of executed) in bash:
>>>$ cd /tmp
>>>$ . ~/whatdir.sh
pwd: /tmp
$0: bash
basename: bash
dirname: .
dirname/readlink: /tmp
pushd . > '/dev/null';
SCRIPT_PATH="${BASH_SOURCE[0]:-$0}";
while [ -h "$SCRIPT_PATH" ];
do
cd "$( dirname -- "$SCRIPT_PATH"; )";
SCRIPT_PATH="$( readlink -f -- "$SCRIPT_PATH"; )";
done
cd "$( dirname -- "$SCRIPT_PATH"; )" > '/dev/null';
SCRIPT_PATH="$( pwd; )";
popd > '/dev/null';
It works for all versions, including
when called via multiple depth soft link,
when the file it
when script called by command "source" aka . (dot) operator.
when arg $0 is modified from caller.
"./script"
"/full/path/to/script"
"/some/path/../../another/path/script"
"./some/folder/script"
Alternatively, if the Bash script itself is a relative symlink you want to follow it and return the full path of the linked-to script:
pushd . > '/dev/null';
SCRIPT_PATH="${BASH_SOURCE[0]:-$0}";
while [ -h "$SCRIPT_PATH" ];
do
cd "$( dirname -- "$SCRIPT_PATH"; )";
SCRIPT_PATH="$( readlink -f -- "$SCRIPT_PATH"; )";
done
cd "$( dirname -- "$SCRIPT_PATH"; )" > '/dev/null';
SCRIPT_PATH="$( pwd; )";
popd > '/dev/null';
SCRIPT_PATH is given in full path, no matter how it is called.
Just make sure you locate this at start of the script.
You can use $BASH_SOURCE:
#!/usr/bin/env bash
scriptdir="$( dirname -- "$BASH_SOURCE"; )";
Note that you need to use #!/bin/bash and not #!/bin/sh since it's a Bash extension.
Here is an easy-to-remember script:
DIR="$( dirname -- "${BASH_SOURCE[0]}"; )"; # Get the directory name
DIR="$( realpath -e -- "$DIR"; )"; # Resolve its full path if need be
Short answer:
"`dirname -- "$0";`"
or (preferably):
"$( dirname -- "$0"; )"
This should do it:
DIR="$(dirname "$(realpath "$0")")"
This works with symlinks and spaces in path.
Please see the man pages for dirname and realpath.
Please add a comment on how to support MacOS. I'm sorry I can verify it.
pwd can be used to find the current working directory, and dirname to find the directory of a particular file (command that was run, is $0, so dirname $0 should give you the directory of the current script).
However, dirname gives precisely the directory portion of the filename, which more likely than not is going to be relative to the current working directory. If your script needs to change directory for some reason, then the output from dirname becomes meaningless.
I suggest the following:
#!/usr/bin/env bash
reldir="$( dirname -- "$0"; )";
cd "$reldir";
directory="$( pwd; )";
echo "Directory is ${directory}";
This way, you get an absolute, rather than a relative directory.
Since the script will be run in a separate Bash instance, there isn't any need to restore the working directory afterwards, but if you do want to change back in your script for some reason, you can easily assign the value of pwd to a variable before you change directory, for future use.
Although just
cd "$( dirname -- "$0"; )";
solves the specific scenario in the question, I find having the absolute path to more more useful generally.
SCRIPT_DIR=$( cd ${0%/*} && pwd -P )
I don't think this is as easy as others have made it out to be. pwd doesn't work, as the current directory is not necessarily the directory with the script. $0 doesn't always have the information either. Consider the following three ways to invoke a script:
./script
/usr/bin/script
script
In the first and third ways $0 doesn't have the full path information. In the second and third, pwd does not work. The only way to get the directory in the third way would be to run through the path and find the file with the correct match. Basically the code would have to redo what the OS does.
One way to do what you are asking would be to just hardcode the data in the /usr/share directory, and reference it by its full path. Data shoudn't be in the /usr/bin directory anyway, so this is probably the thing to do.
This gets the current working directory on Mac OS X v10.6.6 (Snow Leopard):
DIR=$(cd "$(dirname "$0")"; pwd)
$(dirname "$(readlink -f "$BASH_SOURCE")")
This is Linux specific, but you could use:
SELF=$(readlink /proc/$$/fd/255)
Here is a POSIX compliant one-liner:
SCRIPT_PATH=`dirname "$0"`; SCRIPT_PATH=`eval "cd \"$SCRIPT_PATH\" && pwd"`
# test
echo $SCRIPT_PATH
The shortest and most elegant way to do this is:
#!/bin/bash
DIRECTORY=$(cd `dirname $0` && pwd)
echo $DIRECTORY
This would work on all platforms and is super clean.
More details can be found in "Which directory is that bash script in?".
Summary:
FULL_PATH_TO_SCRIPT="$(realpath "${BASH_SOURCE[-1]}")"
# OR, if you do NOT need it to work for **sourced** scripts too:
# FULL_PATH_TO_SCRIPT="$(realpath "$0")"
# OR, depending on which path you want, in case of nested `source` calls
# FULL_PATH_TO_SCRIPT="$(realpath "${BASH_SOURCE[0]}")"
# OR, add `-s` to NOT expand symlinks in the path:
# FULL_PATH_TO_SCRIPT="$(realpath -s "${BASH_SOURCE[-1]}")"
SCRIPT_DIRECTORY="$(dirname "$FULL_PATH_TO_SCRIPT")"
SCRIPT_FILENAME="$(basename "$FULL_PATH_TO_SCRIPT")"
Details:
How to obtain the full file path, full directory, and base filename of any script being run OR sourced...
...even when the called script is called from within another bash function or script, or when nested sourcing is being used!
For many cases, all you need to acquire is the full path to the script you just called. This can be easily accomplished using realpath. Note that realpath is part of GNU coreutils. If you don't have it already installed (it comes default on Ubuntu), you can install it with sudo apt update && sudo apt install coreutils.
get_script_path.sh (for the latest version of this script, see get_script_path.sh in my eRCaGuy_hello_world repo):
#!/bin/bash
# A. Obtain the full path, and expand (walk down) symbolic links
# A.1. `"$0"` works only if the file is **run**, but NOT if it is **sourced**.
# FULL_PATH_TO_SCRIPT="$(realpath "$0")"
# A.2. `"${BASH_SOURCE[-1]}"` works whether the file is sourced OR run, and even
# if the script is called from within another bash function!
# NB: if `"${BASH_SOURCE[-1]}"` doesn't give you quite what you want, use
# `"${BASH_SOURCE[0]}"` instead in order to get the first element from the array.
FULL_PATH_TO_SCRIPT="$(realpath "${BASH_SOURCE[-1]}")"
# B.1. `"$0"` works only if the file is **run**, but NOT if it is **sourced**.
# FULL_PATH_TO_SCRIPT_KEEP_SYMLINKS="$(realpath -s "$0")"
# B.2. `"${BASH_SOURCE[-1]}"` works whether the file is sourced OR run, and even
# if the script is called from within another bash function!
# NB: if `"${BASH_SOURCE[-1]}"` doesn't give you quite what you want, use
# `"${BASH_SOURCE[0]}"` instead in order to get the first element from the array.
FULL_PATH_TO_SCRIPT_KEEP_SYMLINKS="$(realpath -s "${BASH_SOURCE[-1]}")"
# You can then also get the full path to the directory, and the base
# filename, like this:
SCRIPT_DIRECTORY="$(dirname "$FULL_PATH_TO_SCRIPT")"
SCRIPT_FILENAME="$(basename "$FULL_PATH_TO_SCRIPT")"
# Now print it all out
echo "FULL_PATH_TO_SCRIPT = \"$FULL_PATH_TO_SCRIPT\""
echo "SCRIPT_DIRECTORY = \"$SCRIPT_DIRECTORY\""
echo "SCRIPT_FILENAME = \"$SCRIPT_FILENAME\""
IMPORTANT note on nested source calls: if "${BASH_SOURCE[-1]}" above doesn't give you quite what you want, try using "${BASH_SOURCE[0]}" instead. The first (0) index gives you the first entry in the array, and the last (-1) index gives you the last last entry in the array. Depending on what it is you're after, you may actually want the first entry. I discovered this to be the case when I sourced ~/.bashrc with . ~/.bashrc, which sourced ~/.bash_aliases with . ~/.bash_aliases, and I wanted the realpath (with expanded symlinks) to the ~/.bash_aliases file, NOT to the ~/.bashrc file. Since these are nested source calls, using "${BASH_SOURCE[0]}" gave me what I wanted: the expanded path to ~/.bash_aliases! Using "${BASH_SOURCE[-1]}", however, gave me what I did not want: the expanded path to ~/.bashrc.
Example command and output:
Running the script:
~/GS/dev/eRCaGuy_hello_world/bash$ ./get_script_path.sh
FULL_PATH_TO_SCRIPT = "/home/gabriel/GS/dev/eRCaGuy_hello_world/bash/get_script_path.sh"
SCRIPT_DIRECTORY = "/home/gabriel/GS/dev/eRCaGuy_hello_world/bash"
SCRIPT_FILENAME = "get_script_path.sh"
Sourcing the script with . get_script_path.sh or source get_script_path.sh (the result is the exact same as above because I used "${BASH_SOURCE[-1]}" in the script instead of "$0"):
~/GS/dev/eRCaGuy_hello_world/bash$ . get_script_path.sh
FULL_PATH_TO_SCRIPT = "/home/gabriel/GS/dev/eRCaGuy_hello_world/bash/get_script_path.sh"
SCRIPT_DIRECTORY = "/home/gabriel/GS/dev/eRCaGuy_hello_world/bash"
SCRIPT_FILENAME = "get_script_path.sh"
If you use "$0" in the script instead of "${BASH_SOURCE[-1]}", you'll get the same output as above when running the script, but this undesired output instead when sourcing the script:
~/GS/dev/eRCaGuy_hello_world/bash$ . get_script_path.sh
FULL_PATH_TO_SCRIPT = "/bin/bash"
SCRIPT_DIRECTORY = "/bin"
SCRIPT_FILENAME = "bash"
And, apparently if you use "$BASH_SOURCE" instead of "${BASH_SOURCE[-1]}", it will not work if the script is called from within another bash function. So, using "${BASH_SOURCE[-1]}" is therefore the best way to do it, as it solves both of these problems! See the references below.
Difference between realpath and realpath -s:
Note that realpath also successfully walks down symbolic links to determine and point to their targets rather than pointing to the symbolic link. If you do NOT want this behavior (sometimes I don't), then add -s to the realpath command above, making that line look like this instead:
# Obtain the full path, but do NOT expand (walk down) symbolic links; in
# other words: **keep** the symlinks as part of the path!
FULL_PATH_TO_SCRIPT="$(realpath -s "${BASH_SOURCE[-1]}")"
This way, symbolic links are NOT expanded. Rather, they are left as-is, as symbolic links in the full path.
The code above is now part of my eRCaGuy_hello_world repo in this file here: bash/get_script_path.sh. Reference and run this file for full examples both with and withOUT symlinks in the paths. See the bottom of the file for example output in both cases.
References:
How to retrieve absolute path given relative
taught me about the BASH_SOURCE variable: Unix & Linux: determining path to sourced shell script
taught me that BASH_SOURCE is actually an array, and we want the last element from it for it to work as expected inside a function (hence why I used "${BASH_SOURCE[-1]}" in my code here): Unix & Linux: determining path to sourced shell script
man bash --> search for BASH_SOURCE:
BASH_SOURCE
An array variable whose members are the source filenames where the corresponding shell function names in the FUNCNAME array variable are defined. The shell function ${FUNCNAME[$i]} is defined in the file ${BASH_SOURCE[$i]} and called from ${BASH_SOURCE[$i+1]}.
See also:
[my answer] Unix & Linux: determining path to sourced shell script
#!/bin/sh
PRG="$0"
# need this for relative symlinks
while [ -h "$PRG" ] ; do
PRG=`readlink "$PRG"`
done
scriptdir=`dirname "$PRG"`
Here is the simple, correct way:
actual_path=$(readlink -f "${BASH_SOURCE[0]}")
script_dir=$(dirname "$actual_path")
Explanation:
${BASH_SOURCE[0]} - the full path to the script. The value of this will be correct even when the script is being sourced, e.g. source <(echo 'echo $0') prints bash, while replacing it with ${BASH_SOURCE[0]} will print the full path of the script. (Of course, this assumes you're OK taking a dependency on Bash.)
readlink -f - Recursively resolves any symlinks in the specified path. This is a GNU extension, and not available on (for example) BSD systems. If you're running a Mac, you can use Homebrew to install GNU coreutils and supplant this with greadlink -f.
And of course dirname gets the parent directory of the path.
I tried all of these and none worked. One was very close, but it had a tiny bug that broke it badly; they forgot to wrap the path in quotation marks.
Also a lot of people assume you're running the script from a shell, so they forget when you open a new script it defaults to your home.
Try this directory on for size:
/var/No one/Thought/About Spaces Being/In a Directory/Name/And Here's your file.text
This gets it right regardless how or where you run it:
#!/bin/bash
echo "pwd: `pwd`"
echo "\$0: $0"
echo "basename: `basename "$0"`"
echo "dirname: `dirname "$0"`"
So to make it actually useful, here's how to change to the directory of the running script:
cd "`dirname "$0"`"
This is a slight revision to the solution e-satis and 3bcdnlklvc04a pointed out in their answer:
SCRIPT_DIR=''
pushd "$(dirname "$(readlink -f "$BASH_SOURCE")")" > /dev/null && {
SCRIPT_DIR="$PWD"
popd > /dev/null
}
This should still work in all the cases they listed.
This will prevent popd after a failed pushd. Thanks to konsolebox.
Try using:
real=$(realpath "$(dirname "$0")")
I would use something like this:
# Retrieve the full pathname of the called script
scriptPath=$(which $0)
# Check whether the path is a link or not
if [ -L $scriptPath ]; then
# It is a link then retrieve the target path and get the directory name
sourceDir=$(dirname $(readlink -f $scriptPath))
else
# Otherwise just get the directory name of the script path
sourceDir=$(dirname $scriptPath)
fi
For systems having GNU coreutils readlink (for example, Linux):
$(readlink -f "$(dirname "$0")")
There's no need to use BASH_SOURCE when $0 contains the script filename.
$_ is worth mentioning as an alternative to $0. If you're running a script from Bash, the accepted answer can be shortened to:
DIR="$( dirname "$_" )"
Note that this has to be the first statement in your script.
These are short ways to get script information:
Folders and files:
Script: "/tmp/src dir/test.sh"
Calling folder: "/tmp/src dir/other"
Using these commands:
echo Script-Dir : `dirname "$(realpath $0)"`
echo Script-Dir : $( cd ${0%/*} && pwd -P )
echo Script-Dir : $(dirname "$(readlink -f "$0")")
echo
echo Script-Name : `basename "$(realpath $0)"`
echo Script-Name : `basename $0`
echo
echo Script-Dir-Relative : `dirname "$BASH_SOURCE"`
echo Script-Dir-Relative : `dirname $0`
echo
echo Calling-Dir : `pwd`
And I got this output:
Script-Dir : /tmp/src dir
Script-Dir : /tmp/src dir
Script-Dir : /tmp/src dir
Script-Name : test.sh
Script-Name : test.sh
Script-Dir-Relative : ..
Script-Dir-Relative : ..
Calling-Dir : /tmp/src dir/other
Also see: https://pastebin.com/J8KjxrPF
This works in Bash 3.2:
path="$( dirname "$( which "$0" )" )"
If you have a ~/bin directory in your $PATH, you have A inside this directory. It sources the script ~/bin/lib/B. You know where the included script is relative to the original one, in the lib subdirectory, but not where it is relative to the user's current directory.
This is solved by the following (inside A):
source "$( dirname "$( which "$0" )" )/lib/B"
It doesn't matter where the user is or how he/she calls the script. This will always work.
I've compared many of the answers given, and came up with some more compact solutions. These seem to handle all of the crazy edge cases that arise from your favorite combination of:
Absolute paths or relative paths
File and directory soft links
Invocation as script, bash script, bash -c script, source script, or . script
Spaces, tabs, newlines, Unicode, etc. in directories and/or filename
Filenames beginning with a hyphen
If you're running from Linux, it seems that using the proc handle is the best solution to locate the fully resolved source of the currently running script (in an interactive session, the link points to the respective /dev/pts/X):
resolved="$(readlink /proc/$$/fd/255 && echo X)" && resolved="${resolved%$'\nX'}"
This has a small bit of ugliness to it, but the fix is compact and easy to understand. We aren't using bash primitives only, but I'm okay with that because readlink simplifies the task considerably. The echo X adds an X to the end of the variable string so that any trailing whitespace in the filename doesn't get eaten, and the parameter substitution ${VAR%X} at the end of the line gets rid of the X. Because readlink adds a newline of its own (which would normally be eaten in the command substitution if not for our previous trickery), we have to get rid of that, too. This is most easily accomplished using the $'' quoting scheme, which lets us use escape sequences such as \n to represent newlines (this is also how you can easily make deviously named directories and files).
The above should cover your needs for locating the currently running script on Linux, but if you don't have the proc filesystem at your disposal, or if you're trying to locate the fully resolved path of some other file, then maybe you'll find the below code helpful. It's only a slight modification from the above one-liner. If you're playing around with strange directory/filenames, checking the output with both ls and readlink is informative, as ls will output "simplified" paths, substituting ? for things like newlines.
absolute_path=$(readlink -e -- "${BASH_SOURCE[0]}" && echo x) && absolute_path=${absolute_path%?x}
dir=$(dirname -- "$absolute_path" && echo x) && dir=${dir%?x}
file=$(basename -- "$absolute_path" && echo x) && file=${file%?x}
ls -l -- "$dir/$file"
printf '$absolute_path: "%s"\n' "$absolute_path"
I believe I've got this one. I'm late to the party, but I think some will appreciate it being here if they come across this thread. The comments should explain:
#!/bin/sh # dash bash ksh # !zsh (issues). G. Nixon, 12/2013. Public domain.
## 'linkread' or 'fullpath' or (you choose) is a little tool to recursively
## dereference symbolic links (ala 'readlink') until the originating file
## is found. This is effectively the same function provided in stdlib.h as
## 'realpath' and on the command line in GNU 'readlink -f'.
## Neither of these tools, however, are particularly accessible on the many
## systems that do not have the GNU implementation of readlink, nor ship
## with a system compiler (not to mention the requisite knowledge of C).
## This script is written with portability and (to the extent possible, speed)
## in mind, hence the use of printf for echo and case statements where they
## can be substituded for test, though I've had to scale back a bit on that.
## It is (to the best of my knowledge) written in standard POSIX shell, and
## has been tested with bash-as-bin-sh, dash, and ksh93. zsh seems to have
## issues with it, though I'm not sure why; so probably best to avoid for now.
## Particularly useful (in fact, the reason I wrote this) is the fact that
## it can be used within a shell script to find the path of the script itself.
## (I am sure the shell knows this already; but most likely for the sake of
## security it is not made readily available. The implementation of "$0"
## specificies that the $0 must be the location of **last** symbolic link in
## a chain, or wherever it resides in the path.) This can be used for some
## ...interesting things, like self-duplicating and self-modifiying scripts.
## Currently supported are three errors: whether the file specified exists
## (ala ENOENT), whether its target exists/is accessible; and the special
## case of when a sybolic link references itself "foo -> foo": a common error
## for beginners, since 'ln' does not produce an error if the order of link
## and target are reversed on the command line. (See POSIX signal ELOOP.)
## It would probably be rather simple to write to use this as a basis for
## a pure shell implementation of the 'symlinks' util included with Linux.
## As an aside, the amount of code below **completely** belies the amount
## effort it took to get this right -- but I guess that's coding for you.
##===-------------------------------------------------------------------===##
for argv; do :; done # Last parameter on command line, for options parsing.
## Error messages. Use functions so that we can sub in when the error occurs.
recurses(){ printf "Self-referential:\n\t$argv ->\n\t$argv\n" ;}
dangling(){ printf "Broken symlink:\n\t$argv ->\n\t"$(readlink "$argv")"\n" ;}
errnoent(){ printf "No such file: "$#"\n" ;} # Borrow a horrible signal name.
# Probably best not to install as 'pathfull', if you can avoid it.
pathfull(){ cd "$(dirname "$#")"; link="$(readlink "$(basename "$#")")"
## 'test and 'ls' report different status for bad symlinks, so we use this.
if [ ! -e "$#" ]; then if $(ls -d "$#" 2>/dev/null) 2>/dev/null; then
errnoent 1>&2; exit 1; elif [ ! -e "$#" -a "$link" = "$#" ]; then
recurses 1>&2; exit 1; elif [ ! -e "$#" ] && [ ! -z "$link" ]; then
dangling 1>&2; exit 1; fi
fi
## Not a link, but there might be one in the path, so 'cd' and 'pwd'.
if [ -z "$link" ]; then if [ "$(dirname "$#" | cut -c1)" = '/' ]; then
printf "$#\n"; exit 0; else printf "$(pwd)/$(basename "$#")\n"; fi; exit 0
fi
## Walk the symlinks back to the origin. Calls itself recursivly as needed.
while [ "$link" ]; do
cd "$(dirname "$link")"; newlink="$(readlink "$(basename "$link")")"
case "$newlink" in
"$link") dangling 1>&2 && exit 1 ;;
'') printf "$(pwd)/$(basename "$link")\n"; exit 0 ;;
*) link="$newlink" && pathfull "$link" ;;
esac
done
printf "$(pwd)/$(basename "$newlink")\n"
}
## Demo. Install somewhere deep in the filesystem, then symlink somewhere
## else, symlink again (maybe with a different name) elsewhere, and link
## back into the directory you started in (or something.) The absolute path
## of the script will always be reported in the usage, along with "$0".
if [ -z "$argv" ]; then scriptname="$(pathfull "$0")"
# Yay ANSI l33t codes! Fancy.
printf "\n\033[3mfrom/as: \033[4m$0\033[0m\n\n\033[1mUSAGE:\033[0m "
printf "\033[4m$scriptname\033[24m [ link | file | dir ]\n\n "
printf "Recursive readlink for the authoritative file, symlink after "
printf "symlink.\n\n\n \033[4m$scriptname\033[24m\n\n "
printf " From within an invocation of a script, locate the script's "
printf "own file\n (no matter where it has been linked or "
printf "from where it is being called).\n\n"
else pathfull "$#"
fi
Try the following cross-compatible solution:
CWD="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
As the commands such as realpath or readlink could be not available (depending on the operating system).
Note: In Bash, it's recommended to use ${BASH_SOURCE[0]} instead of $0, otherwise path can break when sourcing the file (source/.).
Alternatively you can try the following function in Bash:
realpath () {
[[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
}
This function takes one argument. If argument has already absolute path, print it as it is, otherwise print $PWD variable + filename argument (without ./ prefix).
Related:
How can I set the current working directory to the directory of the script in Bash?
Bash script absolute path with OS X
Reliable way for a Bash script to get the full path to itself
Am trying to grep pattern from dozen files .tar.gz but its very slow
am using
tar -ztf file.tar.gz | while read FILENAME
do
if tar -zxf file.tar.gz "$FILENAME" -O | grep "string" > /dev/null
then
echo "$FILENAME contains string"
fi
done
If you have zgrep you can use
zgrep -a string file.tar.gz
You can use the --to-command option to pipe files to an arbitrary script. Using this you can process the archive in a single pass (and without a temporary file). See also this question, and the manual.
Armed with the above information, you could try something like:
$ tar xf file.tar.gz --to-command "awk '/bar/ { print ENVIRON[\"TAR_FILENAME\"]; exit }'"
bfe2/.bferc
bfe2/CHANGELOG
bfe2/README.bferc
I know this question is 4 years old, but I have a couple different options:
Option 1: Using tar --to-command grep
The following line will look in example.tgz for PATTERN. This is similar to #Jester's example, but I couldn't get his pattern matching to work.
tar xzf example.tgz --to-command 'grep --label="$TAR_FILENAME" -H PATTERN ; true'
Option 2: Using tar -tzf
The second option is using tar -tzf to list the files, then go through them with grep. You can create a function to use it over and over:
targrep () {
for i in $(tar -tzf "$1"); do
results=$(tar -Oxzf "$1" "$i" | grep --label="$i" -H "$2")
echo "$results"
done
}
Usage:
targrep example.tar.gz "pattern"
Both the below options work well.
$ zgrep -ai 'CDF_FEED' FeedService.log.1.05-31-2019-150003.tar.gz | more
2019-05-30 19:20:14.568 ERROR 281 --- [http-nio-8007-exec-360] DrupalFeedService : CDF_FEED_SERVICE::CLASSIFICATION_ERROR:408: Classification failed even after maximum retries for url : abcd.html
$ zcat FeedService.log.1.05-31-2019-150003.tar.gz | grep -ai 'CDF_FEED'
2019-05-30 19:20:14.568 ERROR 281 --- [http-nio-8007-exec-360] DrupalFeedService : CDF_FEED_SERVICE::CLASSIFICATION_ERROR:408: Classification failed even after maximum retries for url : abcd.html
If this is really slow, I suspect you're dealing with a large archive file. It's going to uncompress it once to extract the file list, and then uncompress it N times--where N is the number of files in the archive--for the grep. In addition to all the uncompressing, it's going to have to scan a fair bit into the archive each time to extract each file. One of tar's biggest drawbacks is that there is no table of contents at the beginning. There's no efficient way to get information about all the files in the archive and only read that portion of the file. It essentially has to read all of the file up to the thing you're extracting every time; it can't just jump to a filename's location right away.
The easiest thing you can do to speed this up would be to uncompress the file first (gunzip file.tar.gz) and then work on the .tar file. That might help enough by itself. It's still going to loop through the entire archive N times, though.
If you really want this to be efficient, your only option is to completely extract everything in the archive before processing it. Since your problem is speed, I suspect this is a giant file that you don't want to extract first, but if you can, this will speed things up a lot:
tar zxf file.tar.gz
for f in hopefullySomeSubdir/*; do
grep -l "string" $f
done
Note that grep -l prints the name of any matching file, quits after the first match, and is silent if there's no match. That alone will speed up the grepping portion of your command, so even if you don't have the space to extract the entire archive, grep -l will help. If the files are huge, it will help a lot.
For starters, you could start more than one process:
tar -ztf file.tar.gz | while read FILENAME
do
(if tar -zxf file.tar.gz "$FILENAME" -O | grep -l "string"
then
echo "$FILENAME contains string"
fi) &
done
The ( ... ) & creates a new detached (read: the parent shell does not wait for the child)
process.
After that, you should optimize the extracting of your archive. The read is no problem,
as the OS should have cached the file access already. However, tar needs to unpack
the archive every time the loop runs, which can be slow. Unpacking the archive once
and iterating over the result may help here:
local tempPath=`tempfile`
mkdir $tempPath && tar -zxf file.tar.gz -C $tempPath &&
find $tempPath -type f | while read FILENAME
do
(if grep -l "string" "$FILENAME"
then
echo "$FILENAME contains string"
fi) &
done && rm -r $tempPath
find is used here, to get a list of files in the target directory of tar, which we're iterating over, for each file searching for a string.
Edit: Use grep -l to speed up things, as Jim pointed out. From man grep:
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would
normally have been printed. The scanning will stop on the first match. (-l is specified
by POSIX.)
Am trying to grep pattern from dozen files .tar.gz but its very slow
tar -ztf file.tar.gz | while read FILENAME
do
if tar -zxf file.tar.gz "$FILENAME" -O | grep "string" > /dev/null
then
echo "$FILENAME contains string"
fi
done
That's actually very easy with ugrep option -z:
-z, --decompress
Decompress files to search, when compressed. Archives (.cpio,
.pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
.tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
matching pathnames of files in archives are output in braces. If
-g, -O, -M, or -t is specified, searches files within archives
whose name matches globs, matches file name extensions, matches
file signature magic bytes, or matches file types, respectively.
Supported compression formats: gzip (.gz), compress (.Z), zip,
bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
Which requires just one command to search file.tar.gz as follows:
ugrep -z "string" file.tar.gz
This greps each of the archived files to display matches. Archived filenames are shown in braces to distinguish them from ordinary filenames. For example:
$ ugrep -z "Hello" archive.tgz
{Hello.bat}:echo "Hello World!"
Binary file archive.tgz{Hello.class} matches
{Hello.java}:public class Hello // prints a Hello World! greeting
{Hello.java}: { System.out.println("Hello World!");
{Hello.pdf}:(Hello)
{Hello.sh}:echo "Hello World!"
{Hello.txt}:Hello
If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:
$ ugrep -z Hello -l --format="%z%~" archive.tgz
Hello.bat
Hello.class
Hello.java
Hello.pdf
Hello.sh
Hello.txt
All of the code above was really helpful, but none of it quite answered my own need: grep all *.tar.gz files in the current directory to find a pattern that is specified as an argument in a reusable script to output:
The name of both the archive file and the extracted file
The line number where the pattern was found
The contents of the matching line
It's what I was really hoping that zgrep could do for me and it just can't.
Here's my solution:
pattern=$1
for f in *.tar.gz; do
echo "$f:"
tar -xzf "$f" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true";
done
You can also replace the tar line with the following if you'd like to test that all variables are expanding properly with a basic echo statement:
tar -xzf "$f" --to-command 'echo "f:`basename $TAR_FILENAME` s:'"$pattern\""
Let me explain what's going on. Hopefully, the for loop and the echo of the archive filename in question is obvious.
tar -xzf: x extract, z filter through gzip, f based on the following archive file...
"$f": The archive file provided by the for loop (such as what you'd get by doing an ls) in double-quotes to allow the variable to expand and ensure that the script is not broken by any file names with spaces, etc.
--to-command: Pass the output of the tar command to another command rather than actually extracting files to the filesystem. Everything after this specifies what the command is (grep) and what arguments we're passing to that command.
Let's break that part down by itself, since it's the "secret sauce" here.
'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
First, we use a single-quote to start this chunk so that the executed sub-command (basename $TAR_FILENAME) is not immediately expanded/resolved. More on that in a moment.
grep: The command to be run on the (not actually) extracted files
--label=: The label to prepend the results, the value of which is enclosed in double-quotes because we do want to have the grep command resolve the $TAR_FILENAME environment variable passed in by the tar command.
basename $TAR_FILENAME: Runs as a command (surrounded by backticks) and removes directory path and outputs only the name of the file
-Hin: H Display filename (provided by the label), i Case insensitive search, n Display line number of match
Then we "end" the first part of the command string with a single quote and start up the next part with a double quote so that the $pattern, passed in as the first argument, can be resolved.
Realizing which quotes I needed to use where was the part that tripped me up the longest. Hopefully, this all makes sense to you and helps someone else out. Also, I hope I can find this in a year when I need it again (and I've forgotten about the script I made for it already!)
And it's been a bit a couple of weeks since I wrote the above and it's still super useful... but it wasn't quite good enough as files have piled up and searching for things has gotten more messy. I needed a way to limit what I looked at by the date of the file (only looking at more recent files). So here's that code. Hopefully it's fairly self-explanatory.
if [ -z "$1" ]; then
echo "Look within all tar.gz files for a string pattern, optionally only in recent files"
echo "Usage: targrep <string to search for> [start date]"
fi
pattern=$1
startdatein=$2
startdate=$(date -d "$startdatein" +%s)
for f in *.tar.gz; do
filedate=$(date -r "$f" +%s)
if [[ -z "$startdatein" ]] || [[ $filedate -ge $startdate ]]; then
echo "$f:"
tar -xzf "$f" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
fi
done
And I can't stop tweaking this thing. I added an argument to filter by the name of the output files in the tar file. Wildcards work, too.
Usage:
targrep.sh [-d <start date>] [-f <filename to include>] <string to search for>
Example:
targrep.sh -d "1/1/2019" -f "*vehicle_models.csv" ford
while getopts "d:f:" opt; do
case $opt in
d) startdatein=$OPTARG;;
f) targetfile=$OPTARG;;
esac
done
shift "$((OPTIND-1))" # Discard options and bring forward remaining arguments
pattern=$1
echo "Searching for: $pattern"
if [[ -n $targetfile ]]; then
echo "in filenames: $targetfile"
fi
startdate=$(date -d "$startdatein" +%s)
for f in *.tar.gz; do
filedate=$(date -r "$f" +%s)
if [[ -z "$startdatein" ]] || [[ $filedate -ge $startdate ]]; then
echo "$f:"
if [[ -z "$targetfile" ]]; then
tar -xzf "$f" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
else
tar -xzf "$f" --no-anchored "$targetfile" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
fi
fi
done
zgrep works fine for me, only if all files inside is plain text.
it looks nothing works if the tgz file contains gzip files.
You can mount the TAR archive with ratarmount and then simply search for the pattern in the mounted view:
pip install --user ratarmount
ratarmount large-archive.tar mountpoint
grep -r '<pattern>' mountpoint/
This is much faster than iterating over each file and piping it to grep separately, especially for compressed TARs. Here are benchmark results in seconds for a 55 MiB uncompressed and 42 MiB compressed TAR archive containing 40 files:
Compression
Ratarmount
Bash Loop over tar -O
none
0.31 +- 0.01
0.55 +- 0.02
gzip
1.1 +- 0.1
13.5 +- 0.1
bzip2
1.2 +- 0.1
97.8 +- 0.2
Of course, these results are highly dependent on the archive size and how many files the archive contains. These test examples are pretty small because I didn't want to wait too long. But, they already exemplify the problem well enough. The more files there are, the longer it takes for tar -O to jump to the correct file. And for compressed archives, it will be quadratically slower the larger the archive size is because everything before the requested file has to be decompressed and each file is requested separately. Both of these problems are solved by ratarmount.
This is the code for benchmarking:
function checkFilesWithRatarmount()
{
local pattern=$1
local archive=$2
ratarmount "$archive" "$archive.mountpoint"
'grep' -r -l "$pattern" "$archive.mountpoint/"
}
function checkEachFileViaStdOut()
{
local pattern=$1
local archive=$2
tar --list --file "$archive" | while read -r file; do
if tar -x --file "$archive" -O -- "$file" | grep -q "$pattern"; then
echo "Found pattern in: $file"
fi
done
}
function createSampleTar()
{
for i in $( seq 40 ); do
head -c $(( 1024 * 1024 )) /dev/urandom | base64 > $i.dat
done
tar -czf "$1" [0-9]*.dat
}
createSampleTar myarchive.tar.gz
time checkEachFileViaStdOut ABCD myarchive.tar.gz
time checkFilesWithRatarmount ABCD myarchive.tar.gz
sleep 0.5s
fusermount -u myarchive.tar.gz.mountpoint
In my case the tarballs have a lot of tiny files and I want to know what archived file inside the tarball matches. zgrep is fast (less than one second) but doesn't provide the info I want, and tar --to-command grep is much, much slower (many minutes)1.
So I went the other direction and had zgrep tell me the byte offsets of the matches in the tarball and put that together with the list of offsets in the tarball of all archived files to find the matching archived files.
#!/bin/bash
set -e
set -o pipefail
function tar_offsets() {
# Get the byte offsets of all the files in a given tarball
# based on https://stackoverflow.com/a/49865044/60422
[ $# -eq 1 ]
tar -tvf "$1" -R | awk '
BEGIN{
getline;
f=$8;
s=$5;
}
{
offset = int($2) * 512 - and((s+511), compl(512)+1)
print offset,s,f;
f=$8;
s=$5;
}'
}
function tar_byte_offsets_to_files() {
[ $# -eq 1 ]
# Convert the search results of a tarball with byte offsets
# to search results with archived file name and offset, using
# the provided tar_offsets output (single pass, suitable for
# process substitution)
offsets_file="$1"
prev_offset=0
prev_offset_filename=""
IFS=' ' read -r last_offset last_len last_offset_filename < "$offsets_file"
while IFS=':' read -r search_result_offset match_text
do
while [ $last_offset -lt $search_result_offset ]; do
prev_offset=$last_offset
prev_offset_filename="$last_offset_filename"
IFS=' ' read -r last_offset last_len last_offset_filename < "$offsets_file"
# offsets increasing safeguard
[ $prev_offset -le $last_offset ]
done
# now last offset is the first file strictly after search result offset so prev offset is
# the one at or before it, and must be the one it is in
result_file_offset=$(( $search_result_offset - $prev_offset ))
echo "$prev_offset_filename:$result_file_offset:$match_text"
done
}
# Putting it together e.g.
zgrep -a --byte-offset "your search here" some.tgz | tar_byte_offsets_to_files <(tar_offsets some.tgz)
1 I'm running this in Git for Windows' minimal MSYS2 fork unixy environment, so it's possible that the launch overhead of grep is much much higher than on any kind of real Unix machine and would make `tar --to-command grep` good enough there; benchmark solutions for your own needs and platform situation before selecting.