How to extract the directory from full file path


I have the following script, kindly supplied by another user (choroba), which prints various file stats.
Is there a way to amend it so that it reports just the directory name of each file, rather than the full file path with the file name? I tried replacing filepath with dirname and got a series of "No such file or directory" errors. Thanks for any advice.
#!/bin/bash
set -eu
filepath=$1
qfilepath=${filepath//\\/\\\\} # Quote backslashes.
qfilepath=${qfilepath//\"/\\\"} # Quote doublequotes.
file=${qfilepath##*/} # Remove the path.
stats=($(stat -c "%s %W %Y" "$filepath"))
size=${stats[0]}
ctime=$(date --date @"${stats[1]}" +'%d/%m/%Y %H:%M:%S')
mtime=$(date --date @"${stats[2]}" +'%d/%m/%Y %H:%M:%S')
md5=$(md5sum < "$filepath")
md5=${md5%% *} # Remove the dash.
printf '"%s","%s",%s,%s,%s,%s\n' \
"$file" "$qfilepath" "$size" "$ctime" "$mtime" $md5

You can use a combination of dirname and basename, where:
dirname strips the last component from the full path;
basename gets the last component of the path.
So, to summarize: "$(basename "$(dirname "$qfilepath")")" gives you the name of the last directory in the path.
Or, for the full path without the file name, just "$(dirname "$qfilepath")".
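To make the difference between the two commands concrete, here is a small sketch on a made-up path (/home/me/docs/report.txt is purely illustrative):

```shell
#!/bin/bash
# Hypothetical sample path, used only to compare dirname and basename.
path="/home/me/docs/report.txt"

full_dir=$(dirname -- "$path")        # strips the last component: /home/me/docs
last_dir=$(basename -- "$full_dir")   # last component of what remains: docs

printf 'full directory: %s\n' "$full_dir"
printf 'last directory: %s\n' "$last_dir"
```

This prints "full directory: /home/me/docs" and "last directory: docs".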

Although I do not see anything in the script snippet that would produce a recursive list, you can get the directory name in the output by adding dir=$(dirname "$filepath") and changing your printf arguments to use "$dir" instead of "$file".
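A minimal sketch of that change, written as a function (the name csv_line is hypothetical) so it can be reused or tested. It assumes GNU coreutils (stat -c, date --date @N, md5sum), takes the directory from the already-quoted path so it stays CSV-safe, and for brevity omits the creation-time column:

```shell
#!/bin/bash
# Sketch: one CSV line per file, with a directory column in place of the
# file-name column. Assumes GNU stat/date/md5sum.
csv_line() {
    local filepath=$1 qfilepath dir size mtime md5
    qfilepath=${filepath//\\/\\\\}      # quote backslashes
    qfilepath=${qfilepath//\"/\\\"}     # quote double quotes
    dir=$(dirname -- "$qfilepath")      # directory part only
    size=$(stat -c %s -- "$filepath")
    mtime=$(date --date "@$(stat -c %Y -- "$filepath")" +'%d/%m/%Y %H:%M:%S')
    md5=$(md5sum < "$filepath")
    md5=${md5%% *}                      # drop the trailing " -"
    printf '"%s","%s",%s,%s,%s\n' "$dir" "$qfilepath" "$size" "$mtime" "$md5"
}
```

Calling csv_line /some/dir/file.txt would print a line starting with "/some/dir","/some/dir/file.txt".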

Related

How do I create a recursive file list with md5sum in Linux and output to csv

I would like to list the files (ideally with an md5sum) within a directory and subdirectories in Ubuntu and output the results to a csv file. I would like the output to be in the following format.
File Name, File Path, File Size (bytes), Created Date Time (dd/mm/yyyy hh:mm:ss), Modified Date Time (dd/mm/yyyy hh:mm:ss), md5sum
I have played around with the ls command but can't seem to get the output right. Is there a better way to do this?
Thanks

Create the following script that outputs a CSV line for a given filepath argument:
#!/bin/bash
set -eu
filepath=$1
qfilepath=${filepath//\\/\\\\} # Quote backslashes.
qfilepath=${qfilepath//\"/\\\"} # Quote doublequotes.
file=${qfilepath##*/} # Remove the path.
stats=($(stat -c "%s %W %Y" "$filepath"))
size=${stats[0]}
ctime=$(date --date @"${stats[1]}" +'%d/%m/%Y %H:%M:%S')
mtime=$(date --date @"${stats[2]}" +'%d/%m/%Y %H:%M:%S')
md5=$(md5sum < "$filepath")
md5=${md5%% *} # Remove the dash.
printf '"%s","%s",%s,%s,%s,%s\n' \
"$file" "$qfilepath" "$size" "$ctime" "$mtime" $md5
Now call it with
find /path/to/dir -type f -exec ~/csvline.sh {} \;
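Note that the creation (birth) time is often not supported by the file system. In that case GNU stat prints 0 for %W, so the created column shows the start of the Unix epoch (1970) instead of a real date.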
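One way to avoid printing a misleading 1970 date is to leave the field empty when %W is 0. A sketch, with fmt_epoch as a hypothetical helper name:

```shell
#!/bin/bash
# Format stat epoch seconds as dd/mm/yyyy hh:mm:ss. GNU stat's %W prints 0
# when the birth time is unknown, so 0 yields an empty field instead.
fmt_epoch() {
    local secs=$1
    if [[ $secs == 0 ]]; then
        return 0                      # unknown birth time: print nothing
    fi
    date --date "@$secs" +'%d/%m/%Y %H:%M:%S'
}
```

In the script above it would replace the ctime line as ctime=$(fmt_epoch "${stats[1]}").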

Linux: batch filename change adding creation date

I have a directory with a lot of sub-directories containing files.
For each WAV file I would like to rename it by adding its creation date (the date the WAV file was first created) at the beginning of the file name, without changing the timestamps of the file itself.
The next step would be to convert the WAV file to an MP3 file to save hard drive space.
For that purpose I'm trying to write a bash script, but I'm having some issues.
I want to keep the same structure as the original directory, so I was thinking of something like:
for file in `ls -1 *.wav`
do name=`stat -c %y $file | awk -F"." '{ print $1 }' | sed -e "s/\-//g" -e "s/\://g" -e "s/[ ]/_/g"`.wav
cp -r --preserve=timestampcp $dir_original/$file $dir_converted/$name
done

Don't use ls to generate a list of file names; just let the shell glob them (the shell expands the glob before ls even runs when you type ls *.wav):
for file in ./*.wav ; do
I think you want the timestamp in the format YYYYMMDD_HHMMSS?
You could use GNU date with stat to get somewhat neater control of the output format:
epochtime=$(stat -c %Y "$file" )
name=$(date -d "@$epochtime" +%Y%m%d_%H%M%S).wav
stat -c %Y (or %y) gives the last modification date; you generally can't get a file's creation date on Linux. GNU stat's %W and %w report the birth time only where the kernel and file system support it.
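The question asks for the date at the beginning of the name, keeping the original name. A sketch of that variant (stamp_name is a hypothetical helper; like the code above it uses the modification time, not a true creation date):

```shell
#!/bin/bash
# Prefix a file's modification time (YYYYMMDD_HHMMSS) to its base name,
# e.g. take.wav -> 20240131_235959_take.wav. Assumes GNU stat and date.
stamp_name() {
    local file=$1 epoch
    epoch=$(stat -c %Y -- "$file")    # mtime in seconds since the epoch
    printf '%s_%s\n' "$(date -d "@$epoch" +%Y%m%d_%H%M%S)" "${file##*/}"
}
```

The timestamp is rendered in the local time zone, so the same file can get different names on machines with different TZ settings.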
That cp looks OK, except for the stray cp at the end of timestampcp, but that must be a typo. If you use *.wav, the file names are relative to the current directory anyway, so there is no need to prefix them with $dir_original/.
If you want to walk through a whole subdirectory, use Bash's globstar feature, or find. Something like this:
shopt -s globstar
cd "$sourcedir"
for file in ./**/*.wav ; do
epochtime=$(stat -c %Y "$file" )
name=$(date -d "@$epochtime" +%Y%m%d_%H%M%S).wav
dir=$(dirname "$file")
mkdir -p "$target/$dir"
cp --preserve=timestamps "$file" "$target/$dir/$name"
done
The slight inconvenience here is that cp can't create the missing directories in the target path, so we need mkdir -p first. Also, I'm not sure whether you wanted to keep the original file name as part of the new one; this code drops it and replaces each file name with the timestamp.
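For the find route mentioned above, a sketch along the same lines (copy_stamped and its two arguments are hypothetical; pass the source directory without a trailing slash). Reading NUL-separated names keeps paths with spaces or newlines intact:

```shell
#!/bin/bash
# Copy every *.wav under SOURCE into TARGET, recreating subdirectories and
# naming each copy after its modification time. Assumes GNU find/date/cp.
copy_stamped() {
    local source=$1 target=$2 file rel dir name
    while IFS= read -r -d '' file; do
        rel=${file#"$source"/}                      # path relative to SOURCE
        dir=$(dirname -- "$rel")
        name=$(date -r "$file" +%Y%m%d_%H%M%S).wav  # date -r: file's mtime
        mkdir -p -- "$target/$dir"
        cp --preserve=timestamps -- "$file" "$target/$dir/$name"
    done < <(find "$source" -type f -name '*.wav' -print0)
}
```

Two recordings in the same directory with the same modification second would map to the same name, and the second copy would overwrite the first.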

I experimented with the calculation of name to see if I could make it more succinct, and came up with this (date -r prints the file's last-modification time):
name=$(date -r "$file" +%Y%m%d_%H%M%S).wav

I wanted to prefix all file names in a folder with the date they were created (strictly, date -r uses the last-modification time), and the script below works well.
#!/bin/sh
# Rename IMG.JPG to YYYYmmdd_HH_MM_SS_IMG.jpg using each file's mtime.
for file in *.JPG
do
mv -f -- "$file" "$(date -r "$file" +"%Y%m%d_%H_%M_%S")_${file%.JPG}.jpg"
done

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and am trying to come up with a simple script. Could anyone give me some direction? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path 2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete the files in /tmp that have no corresponding backup file; the file names in /tmp appear within the file names in /home/storage or in directories under /home/storage.
Appreciate your time and response.

You can also approach the deletion using grep. Loop through the files in /tmp, check each one with ls piped to grep, and delete it if there is no match:
#!/bin/bash
if [ -z "$1" ] || [ -z "$2" ]; then ## validate input
printf "error: insufficient input. Usage: %s tmpfiles storage\n" "${0##*/}"
exit 1
fi
for i in "$1"/*; do
fn=${i##*/} ## strip path, leaving filename only
## if any name under the backup tree contains the filename, skip rest of loop
ls -R "$2" | grep -qF -- "$fn" && continue
printf "removing %s\n" "$i"
# rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
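The loop above re-lists the backup directory once for every file in /tmp, which adds up with 300+ files. A sketch of the same grep idea that lists the backup tree only once (would_remove is a hypothetical name; find -printf is a GNU extension). It is a dry run and only prints:

```shell
#!/bin/bash
# Substring-match each file name in TMP_DIR against a single listing of
# the base names under STORAGE; print the ones that have no backup.
would_remove() {
    local tmp_dir=$1 storage=$2 listing i
    listing=$(find "$storage" -type f -printf '%f\n')   # base names only
    for i in "$tmp_dir"/*; do
        [[ -f $i ]] || continue                          # skip directories
        grep -qF -- "${i##*/}" <<< "$listing" || printf 'removing %s\n' "$i"
    done
}
```

With the example files from the question, only 300ijkl would be reported.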

You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
while IFS= read -r path
do
name=${path##*/}
InTarget[$name]=$path
done < <(find "$1" -type f)
while IFS= read -r path
do
name=${path##*/}
[[ -z ${InTarget[$name]} ]] && rm -f -- "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
The done < <(find ... -type f) construct is bash process substitution: it lets the script read the list of file names from find without running the loop, and therefore its array assignments, in a subshell. This matters for the first loop: if the array were filled in a subshell (as with find ... | while read), the assignments would be lost and the second loop would see an empty array.
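A tiny demonstration of the subshell point, independent of the script above: the same loop fills an associative array through a pipe and through process substitution.

```shell
#!/bin/bash
# By default (no "shopt -s lastpipe"), each part of a pipeline runs in a
# subshell, so the piped loop fills a copy of the array that is discarded.
declare -A via_pipe=() via_procsub=()

printf '%s\n' a b c | while IFS= read -r k; do via_pipe[$k]=1; done
while IFS= read -r k; do via_procsub[$k]=1; done < <(printf '%s\n' a b c)

echo "pipe: ${#via_pipe[@]} keys, process substitution: ${#via_procsub[@]} keys"
```

This prints "pipe: 0 keys, process substitution: 3 keys".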
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
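The script above matches identical file names, while the question's backups embed the name (backupfile_100abcd_str1 for 100abcd). A hedged adaptation, assuming every backup follows the pattern backupfile_<name>_<suffix> and that the suffix contains no underscore; list_orphans is a hypothetical name, and it only prints candidates:

```shell
#!/bin/bash
# Build the set of backed-up names from STORAGE (searched recursively),
# then print each file in TMP_DIR whose name is not in that set.
list_orphans() {
    local tmp_dir=$1 storage=$2 path name key
    local -A backed_up=()
    while IFS= read -r -d '' path; do
        name=${path##*/}
        key=${name#backupfile_}     # drop the assumed fixed prefix
        key=${key%_*}               # drop the assumed trailing _<suffix>
        backed_up[$key]=1
    done < <(find "$storage" -type f -name 'backupfile_*' -print0)
    for path in "$tmp_dir"/*; do
        [[ -f $path ]] || continue
        name=${path##*/}
        [[ -n ${backed_up[$name]-} ]] || printf '%s\n' "$path"
    done
}
```

Once the list looks right, piping it into a deletion step is a separate, deliberate choice.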

I spent quite some time on this today because I needed to delete files that have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need a reference to the files we want to keep and not delete.
# Let's assume you want to keep the .jpeg files in the first folder that match
# .pdf files in the second, so map the names to the desired extension first.
FILES_TO_KEEP=$(ls -1 "${2}" | sed 's/\.pdf$/.jpeg/')
# Iterate through the files in the first argument's path.
for file in "${1}"/*; do
# In my case I did not want to touch directories, so skip them.
if [[ -d $file ]]; then
continue
fi
# Strip the path with basename so we can compare against the files to keep.
NAME_WITHOUT_PATH=$(basename "$file")
# I use a Mac, whose command-line tools are limited for string handling,
# so use a bash pattern match to check whether FILES_TO_KEEP contains
# NAME_WITHOUT_PATH (a substring test: a.jpeg also matches ba.jpeg).
if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
echo "Not deleting: $NAME_WITHOUT_PATH"
else
# The file has no counterpart in the other directory, so remove it.
echo "deleting: $NAME_WITHOUT_PATH"
rm -f -- "$file"
fi
done
Usage: bash deleteDifferentFiles.sh path/from/where path/source/of/truth
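The pattern test in the script above is a substring match, so a.jpeg survives whenever the keep list contains ba.jpeg. A sketch of an exact-match variant using an associative array (dry_run_delete is a hypothetical name; it keeps the same .pdf to .jpeg mapping and only prints what it would delete):

```shell
#!/bin/bash
# Exact-name version: build a set of names to keep from TRUTH, then report
# every regular file in TARGET whose name is not in the set.
dry_run_delete() {
    local target=$1 truth=$2 f base
    local -A keep=()
    for f in "$truth"/*; do
        [[ -e $f ]] || continue
        base=${f##*/}
        [[ $base == *.pdf ]] && base=${base%.pdf}.jpeg   # same mapping as the sed
        keep[$base]=1
    done
    for f in "$target"/*; do
        [[ -f $f ]] || continue                           # skip directories
        base=${f##*/}
        [[ -n ${keep[$base]-} ]] || printf 'deleting: %s\n' "$base"
    done
}
```

Associative arrays need bash 4 or later; the /bin/bash shipped with macOS is 3.2, so on a Mac this needs a newer bash.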

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files. I need to remove the extension of all .bak files in that folder. How do I write a command that accepts a folder name and then removes the extension of all .bak files in that folder?
Thanks.

To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it in a script to work in an arbitrary directory:
#!/bin/bash
# If we're passed a parameter, cd into that directory (and stop if that fails).
if [ -n "$1" ]; then
cd "$1" || exit 1
fi
shopt -s nullglob # with no .bak files, skip the loop instead of running mv on "*.bak"
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
Note that because the shell expands the *.bak glob itself and the variable is quoted, for FILENAME in *.bak copes with file names containing spaces; what breaks on white space is looping over command output such as $(ls *.bak) or $(find ...). Read David W.'s answer for a more robust, recursive solution, and this document for alternative solutions.

There are several ways to remove file suffixes:
In Bash and KornShell, you can use parameter expansion to filter a variable's value. Search for ${parameter%word} in the Bash manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of % on a US keyboard.
If you use a double filter (i.e. ## or %%), you filter on the longest match. If you use a single filter (i.e. # or %), you filter on the shortest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo "${file#*/}" # Matches "this/" and prints "is/my/file/name.txt"
echo "${file##*/}" # Matches "this/is/my/file/" and prints "name.txt"
echo "${file%/*}" # Matches "/name.txt" and prints "this/is/my/file"
echo "${file%%/*}" # Matches "/is/my/file/name.txt" and prints "this"
Notice this is a glob match and not a regular expression match! If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* matches the period and all characters after it. Since it is a single %, it removes the shortest match on the right side of the string. If the filter can't match anything, the result is the same as your original string.
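A short worked example of shortest versus longest suffix removal, plus one pitfall of ${file%.*} on a full path (the paths are made up for illustration):

```shell
#!/bin/bash
# % removes the shortest matching suffix, %% the longest. If the file name
# itself has no dot, %.* can match a dot in a directory name instead.
f1="backup/archive.tar.gz"
f2="release.v2/README"

echo "${f1%.*}"     # backup/archive.tar   (shortest match ".gz" removed)
echo "${f1%%.*}"    # backup/archive       (longest match ".tar.gz" removed)
echo "${f2%.*}"     # release              (matched ".v2/README")

name=${f2##*/}      # take the base name first...
echo "${name%.*}"   # README               (no dot, so nothing removed)
```

Stripping the directory with ## before removing the suffix avoids the pitfall.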
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that the period is removed too, so file_suffix holds just the extension (for example, bak).
Next, we will loop:
find . -name "*.bak" -print0 | while IFS= read -r -d $'\0' file
do
printf 'mv -- %q %q\n' "$file" "${file%.bak}"
done | tee find.out
The find command finds the files you specify. The -print0 option separates the file names with a NUL character, which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separators are NUL characters. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) method. It fails if any file name contains white space.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out

rename .bak '' *.bak
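(rename is in the util-linux package; on Debian and Ubuntu the rename command is usually the Perl version with a different syntax, and the util-linux one is installed as rename.ul)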

Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done

You can also use the find command to include all subdirectories (note that this loop breaks on file names containing white space):
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done
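The backtick loop above splits find's output on white space. A sketch of a whitespace-safe recursive version (strip_bak_recursive is a hypothetical name): find passes the names directly to a small sh script as arguments, so no word splitting happens.

```shell
#!/bin/bash
# Strip the .bak suffix from every *.bak file under DIR (default: .).
# "-exec ... {} +" hands batches of names to sh as positional parameters.
strip_bak_recursive() {
    find "${1:-.}" -type f -name '*.bak' -exec sh -c '
        for f do
            mv -f -- "$f" "${f%.bak}"
        done' sh {} +
}
```

Like the loop above, mv -f silently overwrites an existing file that already has the target name.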

linux relative path to fullpath file name

I need a way of getting the full path name of a file in a Linux shell script.
Either the full path or a relative file name may be supplied, so that
afile.txt
/home/me/bfile.txt
become
/home/me/afile.txt
/home/me/bfile.txt
Any ideas?

Use readlink(1).
readlink -f afile

Quick hack:
get_fn()
{
echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}
But it can be costly: each call spawns several subshells and external commands.
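If the paths do not need to be canonical, a string-only sketch avoids the subprocesses entirely (to_abs is a hypothetical name). Unlike readlink -f, it does not resolve "..", "." or symbolic links:

```shell
#!/bin/bash
# Prefix $PWD to relative paths; leave absolute paths unchanged.
to_abs() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$PWD/$1" ;;
    esac
}
```

Run from /home/me, to_abs afile.txt prints /home/me/afile.txt, while to_abs ../x prints /home/me/../x.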

If the directory will be the same, you can list the files in that directory like this (with an absolute DIRECTORY, each line is a full path):
DIRECTORY=/some/directory
FILE_NAME="my-file-list"
for i in "$DIRECTORY"/*
do
echo "$i" >> "$FILE_NAME"
done
Otherwise, you could use the find command as described in "How can I list files with their absolute path in linux?".
