Removing com.apple.quarantine from files on Linux - linux

From time to time a user uploads a file carrying a "com.apple.quarantine" extended attribute. This is added, I think, when the user has downloaded the file onto their computer from the internet.
My question is: how do I remove this from a file if I'm on Linux?
Thanks

Use setfattr. On Linux the extended attribute should be in the "user." namespace (your mileage may vary):
setfattr -x 'user.com.apple.quarantine' file1 [ file2 [ ... ] ]
Unfortunately, the -xattr predicate hasn't made it into GNU find yet so processing a complete hierarchy involves a brute-force-and-ignorance approach looking something like this:
cd /path/to/search
errors=/var/tmp/setfattr.errors
find . -exec setfattr -x 'user.com.apple.quarantine' {} + 2> "$errors"
After which the $errors file should contain entries only for files which didn't have the relevant attribute; filter those out to see any other failures:
grep -v 'No such attribute' -- "$errors"
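If you want to verify which extended attributes a file actually carries before or after stripping them, getfattr (shipped alongside setfattr in the attr package) can dump them; a quick sketch, with file1 standing in for a real file:
getfattr -d file1
# lists the user.* attributes, e.g. user.com.apple.quarantine="..." if it is still set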

If the "com.apple.quarantine" string appears in the file name itself rather than as an extended attribute, you can strip it by renaming the files instead. Create two test files:
touch /tmp/com.apple.quarantine.test1
touch /tmp/com.apple.quarantine.test2
Then run the following code:
for f in $(find /tmp/ -type f | grep -i 'com.apple.quarantine'); do
    OLD_NAME=$(echo "$f" | awk -F "/" '{print $NF}')
    NEW_NAME=$(echo "$OLD_NAME" | sed "s/com\.apple\.quarantine\.//g")
    echo "$NEW_NAME"
    DIR_NAME=$(dirname "$f")
    cd "$DIR_NAME"
    mv "$OLD_NAME" "$NEW_NAME"
done
Now only test1 and test2 remain under /tmp.

Related

Deleting all files except ones mentioned in config file

Situation:
I need a bash script that deletes all files in the current folder, except the files mentioned in a file called ".rmignore". This file may contain paths relative to the current folder, which might also contain asterisks (*). For example:
1.php
2/1.php
1/*.php
What I've tried:
I tried to use GLOBIGNORE but that didn't work well.
I also tried to use find with grep, like follows:
find . | grep -Fxv $(echo $(cat .rmignore) | tr ' ' "\n")
It is considered bad practice to pipe the output of find to another command. You can use -exec or -execdir followed by the command, with '{}' as a placeholder for the file and ';' to indicate the end of your command. You can also end the command with '+' instead of ';' so that many files are passed to a single invocation.
In your case, you want to list all the contents of a directory and remove files one by one.
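As a minimal illustration of the -exec forms described above (the '*.tmp' pattern is only a placeholder, and echo is used so nothing is actually deleted):
find . -name '*.tmp' -exec echo rm -v '{}' \;   # runs one command per file found
find . -name '*.tmp' -exec echo rm -v '{}' +    # batches many files into one invocation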
#!/usr/bin/env bash
set -o nounset
set -o errexit
shopt -s nullglob # allow a glob to expand to nothing if there is no match
shopt -s globstar # make ** match recursively through subdirectories
my:rm_all() {
    local ignore_file=".rmignore"
    local ignore_array=()
    while read -r glob; # read the ignore patterns
    do
        ignore_array+=(${glob});
    done < "${ignore_file}"
    echo "${ignore_array[@]}"
    for file in **; # iterate over all the contents of the current directory
    do
        if [ -f "${file}" ]; # exists and is a regular file
        then
            local do_rmfile=true;
            # Keep the file if it matches an ignore entry
            for ignore in "${ignore_array[@]}"; # iterate over files to keep
            do
                [[ "${file}" == "${ignore}" ]] && do_rmfile=false; # rm ${file};
            done
            ${do_rmfile} && echo "Removing ${file}"
        fi
    done
}
my:rm_all;
If we assume that none of the files in .rmignore contain newlines in their name, the following might suffice:
# Gather our exclusions...
mapfile -t excl < .rmignore
# Reverse the array (put data in indexes)
declare -A arr=()
for file in "${excl[#]}"; do arr[$file]=1; done
# Walk through files, deleting anything that's not in the associative array.
shopt -s globstar
for file in **; do
[ -n "${arr[$file]}" ] && continue
echo rm -fv "$file"
done
Note: untested. :-) Also, associative arrays were introduced with Bash 4.
An alternate method might be to populate an array with the whole file list, then remove the exclusions. This might be impractical if you're dealing with hundreds of thousands of files.
shopt -s globstar
declare -A filelist=()
# Build a list of all files...
for file in **; do filelist[$file]=1; done
# Remove files to be ignored.
while read -r file; do unset filelist[$file]; done < .rmignore
# And .. delete.
echo rm -v "${!filelist[@]}"
Also untested.
Warning: rm at your own risk. May contain nuts. Keep backups.
I note that neither of these solutions will handle wildcards in your .rmignore file. For that, you might need some extra processing...
shopt -s globstar
declare -A filelist=()
# Build a list...
for file in **; do filelist[$file]=1; done
# Remove PATTERNS...
while read -r glob; do
    for file in $glob; do
        unset filelist[$file]
    done
done < .rmignore
# And remove whatever's left.
echo rm -v "${!filelist[@]}"
And .. you guessed it. Untested. This depends on $glob expanding as a glob.
Lastly, if you want a heavier-weight solution, you can use find and grep:
find . -type f -not -exec grep -q -f '{}' .rmignore \; -delete
This runs a grep for EACH file being considered. And it's not a bash solution; it relies only on find, which is pretty universal.
Note that ALL of these solutions are at risk of errors if you have files that contain newlines.
This line does the job perfectly:
find . -type f | grep -vFf .rmignore
If you have rsync, you might be able to copy an empty directory to the target one, with suitable rsync ignore files. Try it first with -n, to see what it will attempt, before running it for real!
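A minimal sketch of that rsync idea, assuming the patterns in .rmignore are acceptable to rsync and using an empty directory as the source; with --delete, excluded files on the receiving side are left alone unless --delete-excluded is also given. Keep -n (dry run) until the output looks right:
mkdir -p /tmp/empty
rsync -a -n --delete --exclude-from=.rmignore /tmp/empty/ ./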
This is another bash solution that seems to work ok in my tests:
while read -r line;do
exclude+=$(find . -type f -path "./$line")$'\n'
done <.rmignore
echo "ignored files:"
printf '%s\n' "$exclude"
echo "files to be deleted"
echo rm $(LC_ALL=C sort <(find . -type f) <(printf '%s\n' "$exclude") | uniq -u) # intentionally unquoted so the newlines collapse into spaces
Alternatively, you may want to look at the simplest format:
rm $(ls -1 | grep -v .rmignore)

Bash: Move files to specific folder if name contains one of a list of strings

I have a script that queries the Twitter API for several queries, and then writes the raw data to a file with the query in the name, plus a timestamp. I'd like a script that, given a list of query strings (regexes?), goes through all files in a folder and, if one of the query strings is a substring of a file's name, moves it to a specific folder. Right now I have a script with just a few dozen mv commands, but I'd like a simpler and more maintainable version. Here's an example of what I'm doing now:
mv /home/nick/TwitterSearchToDatabase/queries_for_amita/*femin* /home/nick/TwitterSearchToDatabase/queries_for_amita/feminism
mv /home/nick/TwitterSearchToDatabase/queries_for_amita/*patriarchy* /home/nick/TwitterSearchToDatabase/queries_for_amita/feminism
mv /home/nick/TwitterSearchToDatabase/queries_for_amita/*yesallwomen* /home/nick/TwitterSearchToDatabase/queries_for_amita/feminism
mv /home/nick/TwitterSearchToDatabase/queries_for_amita/*womanpower* /home/nick/TwitterSearchToDatabase/queries_for_amita/feminism
I would use a for loop:
for i in femin patriarchy yesallwomen womanpower; do
mv /home/nick/TwitterSearchToDatabase/queries_for_amita/*$i* /home/nick/TwitterSearchToDatabase/queries_for_amita/feminism
done
That way the list is in the first line so it is easy to amend.
I would separate the data (the keywords that map to feminism) from the code.
When you have more keyword groups (feminism, so, and so on), you can create one keyword file per group and check those keyword files against the files you are considering moving.
With ${from} for where the files come from, ${to} for where you want them, and ${keyfiledir} holding the keyword files, you get something like
for keyfile in ${keyfiledir}/*; do
    key="${keyfile##*/}"
    find $from -type f | sed 's#.*/##' | while read -r file; do
        echo "${file}" | grep -q -f "${keyfiledir}"/"${key}" && mv "${from}"/"${file}" "${to}"/"${key}"
    done
done
How does that work? I tested the solution above with the following script.
from=fromdir
to=todir
keyfiledir=keyfiledir
rm -rf ${from} ${to} ${keyfiledir}
mkdir ${from} ${to} ${keyfiledir}
mkdir ${to}/feminism ${to}/so
touch ${from}/yesallwomen ${from}/women ${from}/some_femin ${from}/"help move"
cat <<@ > ${keyfiledir}/feminism
femin
patriarchy
yesallwomen
womanpower
@
touch ${from}/yesallwomen ${from}/women ${from}/some_femin
cat <<@ > ${keyfiledir}/so
stack
exchange
help
@
test ! -d "${from}" && echo " Wrong dir ${from}" && exit 1
test ! -d "${to}" && echo " Wrong dir ${to}" && exit 1
test ! -d "${keyfiledir}" && echo " Wrong dir ${keyfiledir}" && exit 1
for keyfile in ${keyfiledir}/*; do
    key="${keyfile##*/}"
    find $from -type f | sed 's#.*/##' | while read -r file; do
        echo "${file}" | grep -q -f "${keyfiledir}"/"${key}" && mv "${from}"/"${file}" "${to}"/"${key}"
    done
done
echo "Not moved"
ls ${from}
echo "Moved"
ls -R ${to}
A simple combination of mv and egrep should suffice. egrep can take a pattern list from a file (and then you get to use full regexp syntax, not just glob syntax.) Make sure to exclude the name of the target folder.
cd /home/nick/TwitterSearchToDatabase/queries_for_amita
mv $(ls | egrep -f patterns.txt | grep -v '^feminism$') feminism

How do I search for a file based on what is output by a command running on that file

I am working on a project for one of my professors, and he asked me to sort a couple hundred .fits images based on their header files (specifically, what star they are images of). I think that grep would be the best way to do this; however, I can't seem to figure out how to use grep based on the header.
I am entering:
ls | imhead *.fits | grep -E -r "PG\ 1104+243" *
to just list them out for now, once they are listed I know how to copy them into a directory.
I am new to using grep, so I am unsure where my error lies. Any help would be greatly appreciated! Thanks!
Assuming that imhead will extract the headers of the .fits files as text, you can use a simple shell script to do it:
script.sh
#!/bin/bash
grep "$1" "$2" > /dev/null 2>&1 && echo "$2"
Note that + is a special character if you use extended regular expressions, that is, if you pass -E as in the question. A simple grep without any options should do the trick here.
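For example, with header.txt as a hypothetical extracted header:
grep 'PG 1104+243' header.txt        # basic regex: + is literal, matches as intended
grep -E 'PG 1104\+243' header.txt    # extended regex: + must be escaped to stay literal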
Use find to exec the script on every *.fits file in the current folder:
find -maxdepth 1 -name '*.fits' -exec ./script.sh 'PG 1104+243' {} \;
If you are going to copy/move/alter or do something with the files you find, you might be better off, in terms of complexity and ease of quoting, using a loop like this:
#!/bin/bash
find . -name \*.fits -print0 | while read -d '' -r file; do
echo Checking file: $file
imhead "$file" | grep -q 'PG 1104+243'
if [ $? -eq 0 ]; then
echo Object matches: $file
fi
done

grep from tar.gz without extracting [faster one]

I am trying to grep a pattern from a dozen .tar.gz files, but it is very slow.
I am using:
tar -ztf file.tar.gz | while read FILENAME
do
if tar -zxf file.tar.gz "$FILENAME" -O | grep "string" > /dev/null
then
echo "$FILENAME contains string"
fi
done
If you have zgrep you can use
zgrep -a string file.tar.gz
You can use the --to-command option to pipe files to an arbitrary script. Using this you can process the archive in a single pass (and without a temporary file). See also this question, and the manual.
Armed with the above information, you could try something like:
$ tar xf file.tar.gz --to-command "awk '/bar/ { print ENVIRON[\"TAR_FILENAME\"]; exit }'"
bfe2/.bferc
bfe2/CHANGELOG
bfe2/README.bferc
I know this question is 4 years old, but I have a couple different options:
Option 1: Using tar --to-command grep
The following line will look in example.tgz for PATTERN. This is similar to @Jester's example, but I couldn't get his pattern matching to work.
tar xzf example.tgz --to-command 'grep --label="$TAR_FILENAME" -H PATTERN ; true'
Option 2: Using tar -tzf
The second option is using tar -tzf to list the files, then go through them with grep. You can create a function to use it over and over:
targrep () {
    for i in $(tar -tzf "$1"); do
        results=$(tar -Oxzf "$1" "$i" | grep --label="$i" -H "$2")
        echo "$results"
    done
}
Usage:
targrep example.tar.gz "pattern"
Both of the options below work well.
$ zgrep -ai 'CDF_FEED' FeedService.log.1.05-31-2019-150003.tar.gz | more
2019-05-30 19:20:14.568 ERROR 281 --- [http-nio-8007-exec-360] DrupalFeedService : CDF_FEED_SERVICE::CLASSIFICATION_ERROR:408: Classification failed even after maximum retries for url : abcd.html
$ zcat FeedService.log.1.05-31-2019-150003.tar.gz | grep -ai 'CDF_FEED'
2019-05-30 19:20:14.568 ERROR 281 --- [http-nio-8007-exec-360] DrupalFeedService : CDF_FEED_SERVICE::CLASSIFICATION_ERROR:408: Classification failed even after maximum retries for url : abcd.html
If this is really slow, I suspect you're dealing with a large archive file. It's going to uncompress it once to extract the file list, and then uncompress it N times--where N is the number of files in the archive--for the grep. In addition to all the uncompressing, it's going to have to scan a fair bit into the archive each time to extract each file. One of tar's biggest drawbacks is that there is no table of contents at the beginning. There's no efficient way to get information about all the files in the archive and only read that portion of the file. It essentially has to read all of the file up to the thing you're extracting every time; it can't just jump to a filename's location right away.
The easiest thing you can do to speed this up would be to uncompress the file first (gunzip file.tar.gz) and then work on the .tar file. That might help enough by itself. It's still going to loop through the entire archive N times, though.
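A sketch of that first suggestion, reusing the loop from the question but pointed at the uncompressed .tar (this assumes you have the disk space for file.tar):
gunzip file.tar.gz            # leaves file.tar behind
tar -tf file.tar | while read FILENAME
do
    if tar -xf file.tar "$FILENAME" -O | grep -q "string"
    then
        echo "$FILENAME contains string"
    fi
done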
If you really want this to be efficient, your only option is to completely extract everything in the archive before processing it. Since your problem is speed, I suspect this is a giant file that you don't want to extract first, but if you can, this will speed things up a lot:
tar zxf file.tar.gz
for f in hopefullySomeSubdir/*; do
grep -l "string" $f
done
Note that grep -l prints the name of any matching file, quits after the first match, and is silent if there's no match. That alone will speed up the grepping portion of your command, so even if you don't have the space to extract the entire archive, grep -l will help. If the files are huge, it will help a lot.
For starters, you could start more than one process:
tar -ztf file.tar.gz | while read FILENAME
do
(if tar -zxf file.tar.gz "$FILENAME" -O | grep -l "string"
then
echo "$FILENAME contains string"
fi) &
done
The ( ... ) & creates a new detached (read: the parent shell does not wait for the child)
process.
After that, you should optimize the extracting of your archive. The read is no problem,
as the OS should have cached the file access already. However, tar needs to unpack
the archive every time the loop runs, which can be slow. Unpacking the archive once
and iterating over the result may help here:
tempPath=$(mktemp -d) && tar -zxf file.tar.gz -C "$tempPath" &&
find "$tempPath" -type f | while read FILENAME
do
    (if grep -l "string" "$FILENAME"
    then
        echo "$FILENAME contains string"
    fi) &
done && rm -r "$tempPath"
find is used here, to get a list of files in the target directory of tar, which we're iterating over, for each file searching for a string.
Edit: Use grep -l to speed up things, as Jim pointed out. From man grep:
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would
normally have been printed. The scanning will stop on the first match. (-l is specified
by POSIX.)
Regarding the original problem (grepping a pattern from a dozen .tar.gz files with a per-file tar loop, which is very slow): that's actually very easy with ugrep option -z:
-z, --decompress
Decompress files to search, when compressed. Archives (.cpio,
.pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
.tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
matching pathnames of files in archives are output in braces. If
-g, -O, -M, or -t is specified, searches files within archives
whose name matches globs, matches file name extensions, matches
file signature magic bytes, or matches file types, respectively.
Supported compression formats: gzip (.gz), compress (.Z), zip,
bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
Which requires just one command to search file.tar.gz as follows:
ugrep -z "string" file.tar.gz
This greps each of the archived files to display matches. Archived filenames are shown in braces to distinguish them from ordinary filenames. For example:
$ ugrep -z "Hello" archive.tgz
{Hello.bat}:echo "Hello World!"
Binary file archive.tgz{Hello.class} matches
{Hello.java}:public class Hello // prints a Hello World! greeting
{Hello.java}: { System.out.println("Hello World!");
{Hello.pdf}:(Hello)
{Hello.sh}:echo "Hello World!"
{Hello.txt}:Hello
If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:
$ ugrep -z Hello -l --format="%z%~" archive.tgz
Hello.bat
Hello.class
Hello.java
Hello.pdf
Hello.sh
Hello.txt
All of the code above was really helpful, but none of it quite answered my own need: grep all *.tar.gz files in the current directory to find a pattern that is specified as an argument in a reusable script to output:
The name of both the archive file and the extracted file
The line number where the pattern was found
The contents of the matching line
It's what I was really hoping that zgrep could do for me and it just can't.
Here's my solution:
pattern=$1
for f in *.tar.gz; do
echo "$f:"
tar -xzf "$f" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true";
done
You can also replace the tar line with the following if you'd like to test that all variables are expanding properly with a basic echo statement:
tar -xzf "$f" --to-command 'echo "f:`basename $TAR_FILENAME` s:'"$pattern\""
Let me explain what's going on. Hopefully, the for loop and the echo of the archive filename in question is obvious.
tar -xzf: x extract, z filter through gzip, f based on the following archive file...
"$f": The archive file provided by the for loop (such as what you'd get by doing an ls) in double-quotes to allow the variable to expand and ensure that the script is not broken by any file names with spaces, etc.
--to-command: Pass the output of the tar command to another command rather than actually extracting files to the filesystem. Everything after this specifies what the command is (grep) and what arguments we're passing to that command.
Let's break that part down by itself, since it's the "secret sauce" here.
'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
First, we use a single-quote to start this chunk so that the executed sub-command (basename $TAR_FILENAME) is not immediately expanded/resolved. More on that in a moment.
grep: The command to be run on the (not actually) extracted files
--label=: The label to prepend the results, the value of which is enclosed in double-quotes because we do want to have the grep command resolve the $TAR_FILENAME environment variable passed in by the tar command.
basename $TAR_FILENAME: Runs as a command (surrounded by backticks) and removes directory path and outputs only the name of the file
-Hin: H Display filename (provided by the label), i Case insensitive search, n Display line number of match
Then we "end" the first part of the command string with a single quote and start up the next part with a double quote so that the $pattern, passed in as the first argument, can be resolved.
Realizing which quotes I needed to use where was the part that tripped me up the longest. Hopefully, this all makes sense to you and helps someone else out. Also, I hope I can find this in a year when I need it again (and I've forgotten about the script I made for it already!)
And it's been a couple of weeks since I wrote the above and it's still super useful... but it wasn't quite good enough, as files have piled up and searching for things has gotten messier. I needed a way to limit what I looked at by the date of the file (only looking at more recent files). So here's that code. Hopefully it's fairly self-explanatory.
if [ -z "$1" ]; then
echo "Look within all tar.gz files for a string pattern, optionally only in recent files"
echo "Usage: targrep <string to search for> [start date]"
fi
pattern=$1
startdatein=$2
startdate=$(date -d "$startdatein" +%s)
for f in *.tar.gz; do
filedate=$(date -r "$f" +%s)
if [[ -z "$startdatein" ]] || [[ $filedate -ge $startdate ]]; then
echo "$f:"
tar -xzf "$f" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
fi
done
And I can't stop tweaking this thing. I added an argument to filter by the name of the output files in the tar file. Wildcards work, too.
Usage:
targrep.sh [-d <start date>] [-f <filename to include>] <string to search for>
Example:
targrep.sh -d "1/1/2019" -f "*vehicle_models.csv" ford
while getopts "d:f:" opt; do
case $opt in
d) startdatein=$OPTARG;;
f) targetfile=$OPTARG;;
esac
done
shift "$((OPTIND-1))" # Discard options and bring forward remaining arguments
pattern=$1
echo "Searching for: $pattern"
if [[ -n $targetfile ]]; then
echo "in filenames: $targetfile"
fi
startdate=$(date -d "$startdatein" +%s)
for f in *.tar.gz; do
filedate=$(date -r "$f" +%s)
if [[ -z "$startdatein" ]] || [[ $filedate -ge $startdate ]]; then
echo "$f:"
if [[ -z "$targetfile" ]]; then
tar -xzf "$f" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
else
tar -xzf "$f" --no-anchored "$targetfile" --to-command 'grep --label="`basename $TAR_FILENAME`" -Hin '"$pattern ; true"
fi
fi
done
zgrep works fine for me, but only if all the files inside are plain text.
It seems nothing works if the tgz file contains gzip files.
You can mount the TAR archive with ratarmount and then simply search for the pattern in the mounted view:
pip install --user ratarmount
ratarmount large-archive.tar mountpoint
grep -r '<pattern>' mountpoint/
This is much faster than iterating over each file and piping it to grep separately, especially for compressed TARs. Here are benchmark results in seconds for a 55 MiB uncompressed and 42 MiB compressed TAR archive containing 40 files:
Compression    Ratarmount      Bash Loop over tar -O
none           0.31 +- 0.01    0.55 +- 0.02
gzip           1.1 +- 0.1      13.5 +- 0.1
bzip2          1.2 +- 0.1      97.8 +- 0.2
Of course, these results are highly dependent on the archive size and how many files the archive contains. These test examples are pretty small because I didn't want to wait too long. But, they already exemplify the problem well enough. The more files there are, the longer it takes for tar -O to jump to the correct file. And for compressed archives, it will be quadratically slower the larger the archive size is because everything before the requested file has to be decompressed and each file is requested separately. Both of these problems are solved by ratarmount.
This is the code for benchmarking:
function checkFilesWithRatarmount()
{
local pattern=$1
local archive=$2
ratarmount "$archive" "$archive.mountpoint"
'grep' -r -l "$pattern" "$archive.mountpoint/"
}
function checkEachFileViaStdOut()
{
local pattern=$1
local archive=$2
tar --list --file "$archive" | while read -r file; do
if tar -x --file "$archive" -O -- "$file" | grep -q "$pattern"; then
echo "Found pattern in: $file"
fi
done
}
function createSampleTar()
{
for i in $( seq 40 ); do
head -c $(( 1024 * 1024 )) /dev/urandom | base64 > $i.dat
done
tar -czf "$1" [0-9]*.dat
}
createSampleTar myarchive.tar.gz
time checkEachFileViaStdOut ABCD myarchive.tar.gz
time checkFilesWithRatarmount ABCD myarchive.tar.gz
sleep 0.5s
fusermount -u myarchive.tar.gz.mountpoint
In my case the tarballs have a lot of tiny files and I want to know what archived file inside the tarball matches. zgrep is fast (less than one second) but doesn't provide the info I want, and tar --to-command grep is much, much slower (many minutes) [1].
So I went the other direction and had zgrep tell me the byte offsets of the matches in the tarball and put that together with the list of offsets in the tarball of all archived files to find the matching archived files.
#!/bin/bash
set -e
set -o pipefail
function tar_offsets() {
# Get the byte offsets of all the files in a given tarball
# based on https://stackoverflow.com/a/49865044/60422
[ $# -eq 1 ]
tar -tvf "$1" -R | awk '
BEGIN{
getline;
f=$8;
s=$5;
}
{
offset = int($2) * 512 - and((s+511), compl(512)+1)
print offset,s,f;
f=$8;
s=$5;
}'
}
function tar_byte_offsets_to_files() {
[ $# -eq 1 ]
# Convert the search results of a tarball with byte offsets
# to search results with archived file name and offset, using
# the provided tar_offsets output (single pass, suitable for
# process substitution)
offsets_file="$1"
prev_offset=0
prev_offset_filename=""
IFS=' ' read -r last_offset last_len last_offset_filename < "$offsets_file"
while IFS=':' read -r search_result_offset match_text
do
while [ $last_offset -lt $search_result_offset ]; do
prev_offset=$last_offset
prev_offset_filename="$last_offset_filename"
IFS=' ' read -r last_offset last_len last_offset_filename < "$offsets_file"
# offsets increasing safeguard
[ $prev_offset -le $last_offset ]
done
# now last offset is the first file strictly after search result offset so prev offset is
# the one at or before it, and must be the one it is in
result_file_offset=$(( $search_result_offset - $prev_offset ))
echo "$prev_offset_filename:$result_file_offset:$match_text"
done
}
# Putting it together e.g.
zgrep -a --byte-offset "your search here" some.tgz | tar_byte_offsets_to_files <(tar_offsets some.tgz)
[1] I'm running this in Git for Windows' minimal MSYS2-fork unixy environment, so it's possible that the launch overhead of grep is much, much higher than on any kind of real Unix machine and would make `tar --to-command grep` good enough there; benchmark solutions for your own needs and platform situation before selecting.

How to get full path of a file?

Is there an easy way I can print the full path of file.txt ?
file.txt = /nfs/an/disks/jj/home/dir/file.txt
The <command>
dir> <command> file.txt
should print
/nfs/an/disks/jj/home/dir/file.txt
Use readlink:
readlink -f file.txt
I suppose you are using Linux.
I found a utility called realpath in coreutils 8.15.
realpath -s file.txt
/data/ail_data/transformed_binaries/coreutils/test_folder_realpath/file.txt
Since the question is about how to get the full/absolute path of a file and not about how to get the target of symlinks, use -s or --no-symlinks which means don't expand symlinks.
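A quick way to see the difference, assuming a symlink named link pointing at file.txt in the current directory:
ln -s file.txt link
realpath link       # symlink resolved: prints the absolute path of file.txt
realpath -s link    # symlink kept: prints the absolute path of link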
As per @styrofoam-fly's and @arch-standton's comments, realpath alone doesn't check for file existence; to solve this, add the -e argument: realpath -e file
The following usually does the trick:
echo "$(cd "$(dirname "$1")" && pwd -P)/$(basename "$1")"
I know there's an easier way than this, but darned if I can find it...
jcomeau@intrepid:~$ python -c 'import os; print(os.path.abspath("cat.wav"))'
/home/jcomeau/cat.wav
jcomeau@intrepid:~$ ls $PWD/cat.wav
/home/jcomeau/cat.wav
On Windows:
Holding Shift and right clicking on a file in Windows Explorer gives you an option called Copy as Path.
This will copy the full path of the file to clipboard.
On Linux:
You can use the command realpath yourfile to get the full path of a file as suggested by others.
find $PWD -type f | grep "filename"
or
find $PWD -type f -name "*filename*"
If you are in the same directory as the file:
ls "`pwd`/file.txt"
Replace file.txt with your target filename.
I know that this is an old question now, but just to add to the information here:
The Linux command which can be used to find the filepath of a command file, i.e.
$ which ls
/bin/ls
There are some caveats to this; please see https://www.cyberciti.biz/faq/how-do-i-find-the-path-to-a-command-file/.
You could use the fpn (full path name) script:
% pwd
/Users/adamatan/bins/scripts/fpn
% ls
LICENSE README.md fpn.py
% fpn *
/Users/adamatan/bins/scripts/fpn/LICENSE
/Users/adamatan/bins/scripts/fpn/README.md
/Users/adamatan/bins/scripts/fpn/fpn.py
fpn is not a standard Linux package, but it's a free and open github project and you could set it up in a minute.
Works on Mac, Linux, *nix:
This will give you a quoted csv of all files in the current dir:
ls | xargs -I {} echo "$(pwd -P)/{}" | xargs | sed 's/ /","/g'
The output of this can be easily copied into a python list or any similar data structure.
echo $(cd $(dirname "$1") && pwd -P)/$(basename "$1")
This is an explanation of what is going on in @ZeRemz's answer:
This script get relative path as argument "$1"
Then we get dirname part of that path (you can pass either dir or file to this script): dirname "$1"
Then we cd "$(dirname "$1")" into this relative dir
&& pwd -P to get the absolute path for it. The -P option will avoid all symlinks
After that we append basename to absolute path: $(basename "$1")
As final step we echo it
You may use this function. If the file name is given without relative path, then it is assumed to be present in the current working directory:
abspath() { old=`pwd`;new=$(dirname "$1");if [ "$new" != "." ]; then cd $new; fi;file=`pwd`/$(basename "$1");cd $old;echo $file; }
Usage:
$ abspath file.txt
/I/am/in/present/dir/file.txt
Usage with relative path:
$ abspath ../../some/dir/some-file.txt
/I/am/in/some/dir/some-file.txt
With spaces in file name:
$ abspath "../../some/dir/another file.txt"
/I/am/in/some/dir/another file.txt
You can save this in your shell.rc or just put in console
function absolute_path { echo "$PWD/$1"; }
alias ap="absolute_path"
example:
ap somefile.txt
will output
/home/user/somefile.txt
I was surprised no one mentioned locate.
If you have the locate package installed, you don't even need to be in the directory containing the file of interest.
Say I am looking for the full pathname of a setenv.sh script. This is how to find it.
$ locate setenv.sh
/home/davis/progs/devpost_aws_disaster_response/python/setenv.sh
/home/davis/progs/devpost_aws_disaster_response/webapp/setenv.sh
/home/davis/progs/eb_testy/setenv.sh
Note, it finds three scripts in this case, but if I wanted just one I
would do this:
$ locate *testy*setenv.sh
/home/davis/progs/eb_testy/setenv.sh
This solution uses commands that exist on Ubuntu 22.04, but they generally exist on most other Linux distributions, unless they are just too hardcore for s'mores.
The shortest way to get the full path of a file on Linux or Mac is to use the ls command and the PWD environment variable.
<0.o> touch afile
<0.o> pwd
/adir
<0.o> ls $PWD/afile
/adir/afile
You can do the same thing with a directory variable of your own, say d.
<0.o> touch afile
<0.o> d=/adir
<0.o> ls $d/afile
/adir/afile
Notice that without flags ls <FILE> and echo <FILE> are equivalent (for valid names of files in the current directory), so if you're using echo for that, you can use ls instead if you want.
If the situation is reversed, so that you have the full path and want the filename, just use the basename command.
<0.o> touch afile
<0.o> basename $PWD/afile
afile
In a similar scenario, I'm launching a cshell script from some other location. For setting the correct absolute path of the script so that it runs in the designated directory only, I'm using the following code:
set script_dir = `pwd`/`dirname $0`
$0 stores the exact string how the script was executed.
For example, if the script was launched like this: $> ../../test/test.csh,
then $script_dir will contain /home/abc/sandbox/v1/../../test
For Mac OS X, I kept the utilities that come with the operating system and installed a newer version of coreutils alongside them. This gives you access to tools like readlink -f (for the absolute path to a file) and realpath (the absolute path to a directory) on your Mac.
The Homebrew version prepends a 'g' (for GNU Tools) to the command name -- so the equivalents become greadlink -f FILE and grealpath DIRECTORY.
Instructions for how to install the coreutils/GNU Tools on Mac OS X through Homebrew can be found in this StackExchange article.
NB: The readlink -f and realpath commands should work out of the box for non-Mac Unix users.
I like many of the answers already given, but I have found this really useful, especially within a script to get the full path of a file, including following symlinks and relative references such as . and ..
dirname `readlink -e relative/path/to/file`
This returns the absolute path of the directory containing the file, with symlinks and relative components resolved.
This can be used in a script so that the script knows which path it is running from, which is useful in a repository clone which could be located anywhere on a machine.
basePath=`dirname \`readlink -e $0\``
I can then use the ${basePath} variable in my scripts to directly reference other scripts.
Hope this helps,
Dave
This worked pretty well for me. It doesn't rely on the file system (a pro/con depending on need) so it'll be fast; and, it should be portable to most any *NIX. It does assume the passed string is indeed relative to the PWD and not some other directory.
function abspath () {
    echo "$1" | awk '
        # Root parent directory refs to the PWD for replacement below
        /^\.\.\// { sub("^", "./") }
        # Replace the symbolic PWD refs with the absolute PWD
        /^\.\// { sub("^\.", ENVIRON["PWD"]) }
        # Print absolute paths
        /^\// { print }'
}
This is naive, but I had to make it to be POSIX compliant. Requires permission to cd into the file's directory.
#!/bin/sh
if [ ${#} = 0 ]; then
echo "Error: 0 args. need 1" >&2
exit 1
fi
if [ -d ${1} ]; then
# Directory
base=$( cd ${1}; echo ${PWD##*/} )
dir=$( cd ${1}; echo ${PWD%${base}} )
if [ ${dir} = / ]; then
parentPath=${dir}
else
parentPath=${dir%/}
fi
if [ -z ${base} ] || [ -z ${parentPath} ]; then
if [ -n ${1} ]; then
fullPath=$( cd ${1}; echo ${PWD} )
else
echo "Error: unsupported scenario 1" >&2
exit 1
fi
fi
elif [ ${1%/*} = ${1} ]; then
if [ -f ./${1} ]; then
# File in current directory
base=$( echo ${1##*/} )
parentPath=$( echo ${PWD} )
else
echo "Error: unsupported scenario 2" >&2
exit 1
fi
elif [ -f ${1} ] && [ -d ${1%/*} ]; then
# File in directory
base=$( echo ${1##*/} )
parentPath=$( cd ${1%/*}; echo ${PWD} )
else
echo "Error: not file or directory" >&2
exit 1
fi
if [ ${parentPath} = / ]; then
fullPath=${fullPath:-${parentPath}${base}}
fi
fullPath=${fullPath:-${parentPath}/${base}}
if [ ! -e ${fullPath} ]; then
echo "Error: does not exist" >&2
exit 1
fi
echo ${fullPath}
This works with both Linux and Mac OSX:
echo $(pwd)/$(ls file.txt)
find / -samefile file.txt -print
This will find all the links to the file, i.e. every path with the same inode number as file.txt.
Adding a -xdev flag will keep find from crossing device boundaries ("mount points"). (But this will probably cause nothing to be found if the find does not start at a directory on the same device as file.txt.)
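For example, to stay on the filesystem the search starts from:
find / -xdev -samefile file.txt -print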
Do note that find can report multiple paths for a single filesystem object, because an Inode can be linked by more than one directory entry, possibly even using different names. For instance:
find /bin -samefile /bin/gunzip -ls
Will output:
12845178 4 -rwxr-xr-x 2 root root 2251 feb 9 2012 /bin/uncompress
12845178 4 -rwxr-xr-x 2 root root 2251 feb 9 2012 /bin/gunzip
Usually:
find `pwd` | grep <filename>
Alternatively, just for the current folder:
find `pwd` -maxdepth 1 | grep <filename>
This will work for both file and folder:
getAbsolutePath(){
[[ -d $1 ]] && { cd "$1"; echo "$(pwd -P)"; } ||
{ cd "$(dirname "$1")" || exit 1; echo "$(pwd -P)/$(basename "$1")"; }
}
Another Linux utility, that does this job:
fname <file>
For Mac OS, if you just want to get the path of a file in the finder, control click the file, and scroll down to "Services" at the bottom. You get many choices, including "copy path" and "copy full path". Clicking on one of these puts the path on the clipboard.
fp () {
PHYS_DIR=`pwd -P`
RESULT=$PHYS_DIR/$1
echo $RESULT | pbcopy
echo $RESULT
}
Copies the text to your clipboard and displays the text on the terminal window.
:)
(I copied some of the code from another stack overflow answer but cannot find that answer anymore)
In Mac OSX, do the following steps:
cd into the directory of the target file.
Type either of the following terminal commands.
Terminal
ls "`pwd`/file.txt"
echo $(pwd)/file.txt
Replace file.txt with your actual file name.
Press Enter
you@you:~/test$ ls
file
you@you:~/test$ path="`pwd`/`ls`"
you@you:~/test$ echo $path
/home/you/test/file
Beside "readlink -f" , another commonly used command:
$find /the/long/path/but/I/can/use/TAB/to/auto/it/to/ -name myfile
/the/long/path/but/I/can/use/TAB/to/auto/it/to/myfile
$
This also gives the full path and file name at the console.
Off-topic: This method just gives relative links, not absolute. The readlink -f command is the right one.
