How can I delete directories based on their numeric name value with a shell script? - linux

I have a directory that contains numerically named subdirectories ( eg. 1, 2, 3, 32000, 43546 ). I need to delete all directories over a certain number. For example, I need to delete all subdirectories that have a name that is numerically larger than 14234. Can this be done with a single command line action?
rm -r /directory/subdirectories_over_14234 ( how can I do this? )

In bash, I'd write
for dir in *; do [[ -d $dir ]] && (( dir > 14234 )) && echo rm -r "$dir"; done
Remove the echo at your discretion.
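If the directory might also contain entries whose names are not purely numeric, a slightly more defensive variant of the loop above could look like the following. This is only a sketch: list_dirs_over is a hypothetical helper name, and it prints the matching directories instead of deleting them.

```bash
# list_dirs_over DIR LIMIT (hypothetical helper): print the subdirectories of
# DIR whose name is a plain decimal number greater than LIMIT.
list_dirs_over() {
    local parent="$1" limit="$2" path name
    for path in "$parent"/*/; do
        path=${path%/}
        [ -d "$path" ] || continue       # also skips the literal pattern when nothing matches
        name=${path##*/}
        case $name in
            ''|*[!0-9]*) continue ;;     # skip names that are not made of digits only
        esac
        [ "$name" -gt "$limit" ] && printf '%s\n' "$path"
    done
    return 0
}
```

Once the printed list looks right, it can be fed to rm, for example with list_dirs_over /directory 14234 | while IFS= read -r d; do rm -r -- "$d"; done.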

You can use a bash for loop to iterate over the entries in the directory, extract the number from each name, and compare it with the test command.
It should be something like this:
for file in /your/path/*
do
# extract the number here; stripping the leading path is enough for purely numeric names
name=${file##*/}
if test "$name" -gt your_value
then
rm -R "$file"
fi
done

You don't mention which shell you're using. I'm using Zsh and it has a very cool feature: it can select files based on numbers just like you want! So you can do
$ rm -r /directory/<14235->(/)
to select all the subdirectories of /directory with a numeric value over 14234 (the range is inclusive, so it starts at 14235).
In general, you use
<a-b>
to select paths with a numeric value between a and b, inclusive. You append a (/) to only match directories. Use (.) to only match files. The glob patterns in Zsh are very powerful and can mostly (if not always) replace the good old find command.

Related

How to concat "/" delimiter to string?

Task: concatenate an array of strings with a delimiter; the delimiter is "/".
Metatask: I have a folder with many files and need to copy them into another folder.
So I need to get the "name of file" and the "path to folder".
What's wrong: the delimiter "/" works incorrectly. It doesn't concatenate with my strings. If I try to use "\/", the string disappears entirely.
What's going on?
loc_path='./test/*'
delim='\/'
for itt in $loc_path; do
IFS=$delim
read -ra res <<< "$itt"
str=''
for ((i = 1; i <= ${#res[@]}; i++)); do
#str=($str${res[$i]}$delim)
str="$str${res[$i]}$delim"
done
echo $str
done
Please give me two answers:
how to solve the task problem
a better way to solve the metatask
There is an issue in delim='\/'. Firstly, you do not need to escape the slash. Secondly, all characters are already taken literally between single quotes, so the backslash becomes part of the value.
There is a syntax issue with the concatenation in your commented-out line. You must not use parentheses there: str=(...) creates an array, and bare parentheses open a subshell. Neither is needed here.
To solve your 'meta-task', you should avoid IFS and read. They are tricky to use: for example, by modifying IFS globally as you do, you change how echo displays the res array, which can mislead you while you troubleshoot. I suggest simpler tools such as basename.
Here are a few scripts that solve your meta-task:
# one line :-)
cp src/* dst/
# to illustrate basename etc
for file in "$SRC/"*; do
dest="$DST/$(basename "$file")"
cp "$file" "$dest"
done
# with a change of directory
cd "$SRC"
for file in *; do cp "$file" "$DST/$file"; done
cd -
# Change of directory and a sub shell
(cd "$SRC" ; for file in *; do cp "$file" "$DST/$file"; done)
Task solution:
arr=( string1 string2 string3 ) # array of strings
str=$( IFS='/'; printf '%s' "${arr[*]}" ) # concatenated with / as delimiter
$str will be the single string string1/string2/string3.
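The same IFS technique can be wrapped in a small function so that the changed IFS never leaks into the rest of the script. This is a sketch only; join_by is a hypothetical helper name, and the sample path is made up to show that splitting a path into folder and file name needs no loop at all.

```bash
# join_by DELIM ITEM... (hypothetical helper): print the items joined by DELIM.
join_by() {
    local IFS="$1"      # "$*" joins the arguments with the first character of IFS
    shift
    printf '%s\n' "$*"
}

joined=$(join_by / string1 string2 string3)
echo "$joined"          # prints: string1/string2/string3

# Splitting a path with parameter expansion instead of IFS and read:
path=./test/example.txt
folder=${path%/*}       # everything before the last slash: ./test
name=${path##*/}        # everything after the last slash: example.txt
echo "$folder $name"
```

Because IFS is declared local, it is restored automatically when the function returns.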
Meta task solution:
Few files:
cp path/to/source/folder/* path/to/dest/folder
Note that * matches any type of file and that it does not match hidden names. For hidden names, use shopt -s dotglob in bash. This will fail if there are thousands of files (argument list too long).
Few or many files, only non-directories:
for pathname in path/to/source/folder/*; do
[ ! -d "$pathname" ] && cp "$pathname" path/to/dest/folder
done
or, with find,
find path/to/source/folder -maxdepth 1 ! -type d -exec cp {} path/to/dest/folder \;
The difference between these two is that the shell loop will refuse to copy symbolic links that resolve to directories, while the find command will copy them.

How do I move files to specific directories based on a pattern in the filename?

If any of this isn't particularly clear, please let me know and I'll do my best to clarify.
I basically need to sort a set of files with various extensions and similar patterns to the filename, into directories and subdirectories that match the pattern and type of extension.
To elaborate a bit:
All files, regardless of extension, begin with the pattern "zz####" where #### is a number from 1 to 900; "zz1.zip through zz950.zip, zz1.mov through zz950.mov, zz1.mp4 through zz950.mp4"
Some files contain additional characters; "zz360_hello_world.zip"
Some files contain spaces; "zz370_hello world.zip"
I need these files to be sorted and moved into directories and subdirectories following a particular format: "/home/hello/zz1/zip, /home/hello/zz1/vid"
If the directories and/or subdirectories don't exist, I need them created.
Example:
zz400_testing.zip ----> /home/hello/zz400/zip
zz400 testing video.mov ----> /home/hello/zz400/vid
zz500.zip ----> /home/hello/zz500/zip
zz500_testing another video.mp4 ----> /home/hello/zz500/vid
I found a few answers around here for simpler use-cases, but wasn't able to get anything working for my particular needs.
Any help at all would be much appreciated.
Thank you!
EDIT: Adding the code I've been messing with
for f in *.zip; do
set=`echo "$f"|sed 's/[0-9].*//'`
dir="/home/demo/$set/photos"
mkdir -p "$dir"
mv "$f" "$dir"
done
I think I'm just having trouble wrapping my head around how to match with regex. I've got this far with it:
[demo@alpha grep]$ echo zz433.zip|sed 's/[0-9].*//'
zz
The script will run the mkdir, and even move the zip files into their proper place. I just can't get it to create the proper top-level directory (zz433).
The sed command here doesn't do what you're trying to achieve:
set=`echo "$f"|sed 's/[0-9].*//'`
The meaning of the regular expression [0-9].* is "a digit followed by anything".
The s/// command of sed performs a replacement.
The result is effectively removing everything from the input starting from the first digit.
So for "zz360_hello_world.zip" it removes everything starting from "3",
leaving only "zz".
Note also that to match the files, the pattern *.zip doesn't match your description. You're looking for files starting with "zz" and a number from 1 up to 900. If you don't mind including numbers > 900 then you can write the loop expression like this:
for f in zz[0-9][^0-9]* zz[0-9][0-9][^0-9]* zz[0-9][0-9][0-9][^0-9]*; do
Or the same thing more compactly:
for f in zz{[0-9],[0-9][0-9],[0-9][0-9][0-9]}[^0-9]*; do
These are glob patterns.
zz[0-9][^0-9]* means "start with 'zz', followed by a digit, followed by a non-digit, followed by anything".
In the above example I use three patterns to cover the cases of "zz" followed by 1, 2 or 3 digits, followed by a non-digit.
The second example is a more compact form of the first,
the idea is that a{b,c}d expands to abd and acd.
Next, to get the appropriate prefix, you could use pattern matching with a case statement and extract substrings.
The syntax of these patterns is the same glob syntax as in the previous example in the for statement.
case "$f" in
zz[0-9][0-9][0-9]*) prefix=${f:0:5} ;;
zz[0-9][0-9]*) prefix=${f:0:4} ;;
zz[0-9]*) prefix=${f:0:3} ;;
esac
It seems you also want to create grouping by file type. You could get the file extension by chopping off the beginning of the name until the dot with ext=${f##*.}, and then use a case statement as in the earlier example to map extensions to the desired directory names.
Putting the above together:
for f in zz{[0-9],[0-9][0-9],[0-9][0-9][0-9]}[^0-9]*; do
case "$f" in
zz[0-9][0-9][0-9]*) prefix=${f:0:5} ;;
zz[0-9][0-9]*) prefix=${f:0:4} ;;
zz[0-9]*) prefix=${f:0:3} ;;
esac
ext=${f##*.}
case "$ext" in
mov|mp4) group=vid ;;
*) group=$ext ;;
esac
dir="/home/demo/$prefix/$group"
mkdir -p "$dir"
mv "$f" "$dir"
done
I've answered part of my own question!
for f in *.zip; do
set=`echo "$f"|grep -o -P 'zz[0-9]+.{0,0}'`
dir="/home/demo/$set/photos"
mkdir -p "$dir"
mv "$f" "$dir"
done
Basically, the following script will grab files like:
zz232.zip
zz233test.zip
zz234 test.zip
Then it will create the top-level directory (zz####), the photos sub-directory, and move the file into place:
/home/demo/zz232/photos/zz232.zip
/home/demo/zz233/photos/zz233test.zip
/home/demo/zz234/photos/zz234 test.zip
Moving on to expanding the script for additional functionality.
Thanks all!
How about:
#!/bin/bash
IFS=$'\n'
for file in *; do
if [[ $file =~ ^(zz[0-9]+).*\.(zip|mov|mp4)$ ]]; then
ext=${BASH_REMATCH[2]}
if [ "$ext" = "mov" ] || [ "$ext" = "mp4" ]; then
ext="vid"
fi
dir="/home/hello/${BASH_REMATCH[1]}/$ext"
mkdir -p "$dir"
mv "$file" "$dir"
fi
done
Hope this helps.
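To check the name-to-directory mapping before anything is moved, the regex part of the script above can be isolated in a function that only prints the target directory. This is a sketch under assumptions: target_dir and the BASE variable are hypothetical names, and bash is required for [[ =~ ]] and BASH_REMATCH.

```bash
# BASE is an assumed name for the root directory; override it as needed.
BASE=${BASE:-/home/hello}

# target_dir FILE (hypothetical helper): print the directory FILE belongs in,
# or return 1 if the name does not look like zz<number>...<ext>.
target_dir() {
    local re='^(zz[0-9]+).*\.(zip|mov|mp4)$'
    [[ $1 =~ $re ]] || return 1
    local prefix=${BASH_REMATCH[1]} group=${BASH_REMATCH[2]}
    case $group in
        mov|mp4) group=vid ;;   # both video extensions share one directory
    esac
    printf '%s\n' "$BASE/$prefix/$group"
}
```

A dry run could then be for f in *; do d=$(target_dir "$f") && echo "$f -> $d"; done, which prints the planned moves without touching any file.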

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and trying to come up with a simple code. Could anyone give me some direction here. Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete file 300ijkl in /tmp as the corresponding backup file is not present in /home/storage. The /tmp file contains more than 300 files. I need to delete the files in /tmp for which the corresponding backup files are not present and the file names in /tmp will match file names in /home/storage or directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking each name with ls piped to grep, and delete the file if there is no match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
printf "error: insufficient input. Usage: %s tmpfiles storage\n" ${0//*\//}
exit 1
}
for i in "$1"/*; do
fn=${i##*/} ## strip path, leaving filename only
## if file in backup matches filename, skip rest of loop
ls "${2}"* | grep -q "$fn" &>/dev/null && continue
printf "removing %s\n" "$i"
# rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without trailing /) as the first argument and with /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
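The matching rule from the question (a file in /tmp is kept if its name occurs inside some backup file name) can also be sketched without running ls once per file. This is a minimal sketch rather than the script above; list_orphans is a hypothetical helper name, and it only prints the candidates for removal.

```bash
# list_orphans TMPDIR STORAGE (hypothetical helper): print the regular files in
# TMPDIR whose name does not occur inside the name of any file under STORAGE.
list_orphans() {
    local names f name
    # base names of all files under STORAGE (subdirectories included), one per line
    names=$(find "$2" -type f | sed 's|.*/||')
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        case $names in
            *"$name"*) ;;                 # some backup name contains it: keep
            *) printf '%s\n' "$f" ;;      # no backup found: candidate for removal
        esac
    done
}
```

The quoted "$name" inside the case pattern is matched literally, so characters such as * or [ in a file name are not treated as wildcards.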
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
while IFS= read -r path
do
name=${path##*/}
InTarget[$name]=$path
done < <(find "$1" -type f)
while IFS= read -r path
do
name=${path##*/}
[[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find ... -type f) uses process substitution, a bash feature which lets the script read the list of filenames from find without running the loop that assigns to the array in a subprocess. This matters because an array updated in a subprocess would leave the array in the script unchanged, and the second loop would then see it empty.
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
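The subshell effect described above is easy to reproduce. The following sketch fills one associative array through a pipe and another through process substitution; the array names and sample paths are made up, and it assumes bash 4 or later (for declare -A) with the default settings, where lastpipe is off.

```bash
declare -A seen_pipe seen_procsub

# The loop on the right-hand side of a pipe runs in a subshell,
# so its assignments are lost when the pipeline ends.
printf '%s\n' /a/x /b/y | while IFS= read -r path; do
    seen_pipe[${path##*/}]=$path
done

# With process substitution the loop runs in the current shell.
while IFS= read -r path; do
    seen_procsub[${path##*/}]=$path
done < <(printf '%s\n' /a/x /b/y)

echo "pipe: ${#seen_pipe[@]} entries, process substitution: ${#seen_procsub[@]} entries"
# prints: pipe: 0 entries, process substitution: 2 entries
```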
I spent some really nice time on this today because I needed to delete files which have same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need a reference list of the files we want to keep and not delete.
# Let's assume you want to keep the .jpeg files in the first folder, so the
# names from the second folder are mapped to that extension first.
FILES_TO_KEEP=`ls -1 ${2} | sed 's/\.pdf$/.jpeg/g'`
#iterate through files in first argument path
for file in ${1}/*; do
# In my case, I did not want to do anything with directories, so let's continue cycle when hitting one.
if [[ -d $file ]]; then
continue
fi
# strip the path from the iterated file with basename so we can compare it to the files we want to keep
NAME_WITHOUT_PATH=`basename "$file"`
# I use a Mac, whose command-line tools are limited
# when it comes to operating on strings;
# this should be a safe check to see if FILES_TO_KEEP contains NAME_WITHOUT_PATH
if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]];then
echo "Not deleting: $NAME_WITHOUT_PATH"
else
# If the other directory has no matching file, remove it.
echo "deleting: $NAME_WITHOUT_PATH"
rm -rf "$file"
fi
done
Usage: bash deleteDifferentFiles.sh path/from/where path/source/of/truth (the script uses [[ ]], so it needs bash rather than plain sh)

Find and delete files that contain same string in filename in linux terminal

I want to delete all files in a folder whose filename contains a non-unique numerical string, using the Linux terminal. E.g.:
werrt-110009.jpg => delete
asfff-110009.JPG => delete
asffa-123489.jpg => maintain
asffa-111122.JPG => maintain
Any suggestions?
I only now understand your question, I think. You want to remove all files that contain a numeric value that is not unique (in a particular folder). If a filename contains a value that is also found in another filename, you want to remove both files, right?
This is how I would do that (it may not be the fastest way):
# put all files in your folder in a list
# for array=(*) to work make sure you have enabled nullglob: shopt -s nullglob
array=(*)
delete=()
for elem in "${array[@]}"; do
# for each elem in your list extract the number
num_regex='([0-9]+)\.'
[[ "$elem" =~ $num_regex ]] || continue
num="${BASH_REMATCH[1]}"
# use the extracted number to check if it is unique;
# bash regexes have no backreferences, so the number is written out twice
dup_regex="(^|[^0-9])$num\..*[^0-9]$num\."
# if it is not unique, put the file in the files-to-delete list
if [[ "${array[*]}" =~ $dup_regex ]]; then
delete+=("$elem")
fi
done
# delete all found duplicates
for elem in "${delete[@]}"; do
rm "$elem"
done
In your example, array would be:
array=(werrt-110009.jpg asfff-110009.JPG asffa-123489.jpg asffa-111122.JPG)
And the result in delete would be:
delete=(werrt-110009.jpg asfff-110009.JPG)
Is this what you meant?
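An alternative to scanning the whole list with a regex for every file is to count how often each number occurs and then report the files whose number was seen more than once. This is a sketch, not the method above: non_unique_files is a hypothetical helper name, it assumes bash 4 or later for associative arrays, and it takes "the number" to be the digits directly before the extension.

```bash
# non_unique_files NAME... (hypothetical helper): print every name whose number
# (the digits right before the extension) also occurs in another name.
non_unique_files() {
    local -A count=()
    local re='([0-9]+)\.[^.]*$' f num
    for f in "$@"; do                       # first pass: count each number
        [[ $f =~ $re ]] || continue
        num=${BASH_REMATCH[1]}
        count[$num]=$(( ${count[$num]:-0} + 1 ))
    done
    for f in "$@"; do                       # second pass: report the repeats
        [[ $f =~ $re ]] || continue
        num=${BASH_REMATCH[1]}
        [ "${count[$num]}" -gt 1 ] && printf '%s\n' "$f"
    done
    return 0
}
```

Running non_unique_files * inside the folder prints the candidates for deletion, which can be reviewed before being passed to rm.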
you can use the linux find command along with the -regex parameter and the -delete parameter
to do it in one command
Use the rm command to delete all files in the directory whose name contains a given string:
cd <path-to-directory>/ && rm *110009*
This deletes all files with the matching string, regardless of where the string appears in the file name.
I mention rm here as another option for deleting files with a matching string.
Below is the complete script to achieve your requirement:
#!/bin/bash -eu
# bash is required: the script uses [[ =~ ]] and BASH_REMATCH
# provide the destination folder path
DEST_FOLDER_PATH="$1"
TEMP_BUILD_DIR="/tmp/$( date +%Y%m%d-%H%M%S)_cleanup_duplicate_files"
#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
clean_up()
{
if [ -d $TEMP_BUILD_DIR ]; then
rm -rf $TEMP_BUILD_DIR
fi
}
trap clean_up EXIT
[ ! -d $TEMP_BUILD_DIR ] && mkdir -p $TEMP_BUILD_DIR
TEMP_FILES_LIST_FILE="$TEMP_BUILD_DIR/folder_file_names.txt"
echo "$(ls $DEST_FOLDER_PATH)" > $TEMP_FILES_LIST_FILE
while read filename
do
#check files with number pattern (the regex must stay unquoted, or bash matches it as a literal string)
if [[ "$filename" =~ ([0-9]+)\. ]]; then
#fetch the number to find files with similar number
matching_string="${BASH_REMATCH[1]}"
# use the extracted number to check if it is unique
#find the files count with matching_string
if [ $(ls -1 $DEST_FOLDER_PATH/*$matching_string* | wc -l) -gt 1 ]; then
rm $DEST_FOLDER_PATH/*$matching_string*
fi
fi
#reload remaining files in folder (this optimizes the loop and speeds up the operation
#(this helps lot when folder contains more files))
echo "$(ls $DEST_FOLDER_PATH)" > $TEMP_FILES_LIST_FILE
done < $TEMP_FILES_LIST_FILE
exit 0
How to execute this script:
Save this script into a file as path-to-script/delete_duplicate_files.sh (you can rename it to whatever you want).
Make the script executable:
chmod +x {path-to-script}/delete_duplicate_files.sh
Execute the script, providing the path of the directory where duplicate files (files with a matching number pattern) need to be deleted:
{path-to-script}/delete_duplicate_files.sh "{path-to-directory}"

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
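Note that the glob in for FILENAME in *.bak is itself safe for filenames containing spaces, provided the variable is quoted as above; it is command substitution such as $(ls *.bak) that breaks on whitespace. If no file matches, the loop runs once with the literal string *.bak and mv reports an error. Read David W.'s answer for a find-based approach, and this document for alternative solutions.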
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use parameter expansion filtering. Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of % on a US keyboard.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo ${file#*/} # Matches "this/" and will print out "is/my/file/name.txt"
echo ${file##*/} # Matches "this/is/my/file/" and will print out "name.txt"
echo ${file%/*} # Matches "/name.txt" and will print out "this/is/my/file"
echo ${file%%/*} # Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match! If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match on the period and all characters after it. Since it is a single %, it will match on the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
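To make the smallest-match versus biggest-match distinction concrete, here is a small illustration with a name that contains two dots. The path is made up for the example.

```bash
file="backup/archive.tar.gz"

short=${file%.*}      # smallest match of ".*" on the right removed
long=${file%%.*}      # biggest match of ".*" on the right removed
ext=${file##*.}       # biggest match of "*." on the left removed

echo "$short"         # prints: backup/archive.tar
echo "$long"          # prints: backup/archive
echo "$ext"           # prints: gz

plain="README"
echo "${plain%.*}"    # no period to match, so this prints: README
```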
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that the extracted suffix does not include the period of the file extension.
Next, we will loop:
find . -name "*.bak" -print0 | while IFS= read -r -d $'\0' file
do
echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates out the names of the files with a NUL symbol, which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separators are NUL symbols. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) method. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
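Once the dry run looks right, the same find -print0 and read -d pairing can also perform the renames directly. This is a minimal sketch assuming bash; strip_bak is a hypothetical helper name, and refusing to overwrite an existing target is a design choice that is not part of the answer above.

```bash
# strip_bak DIR (hypothetical helper): remove the .bak suffix from every
# regular file under DIR, skipping any file whose new name already exists.
strip_bak() {
    local file
    while IFS= read -r -d '' file; do
        if [ -e "${file%.bak}" ]; then
            printf 'skipping %s: target exists\n' "$file" >&2
        else
            mv -- "$file" "${file%.bak}"
        fi
    done < <(find "$1" -type f -name '?*.bak' -print0)
}
```

The '?*.bak' pattern requires at least one character before the suffix, so a file named just .bak is left alone.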
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
You can always use the find command to include all the subdirectories (note that this form splits on whitespace, so it only works when no path contains spaces):
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done
