Script for deleting multiple directories based on directory name using shell or bash script - linux

I am trying to write a shell script which removes directories and their contents based on directory name rather than last-modified time.
I have the following directories in /tmp/:
2015-05-25
2015-05-26
2015-05-27
2015-05-28
2015-05-29
2015-05-30
2015-05-31
Now I would like to delete all the directories up to and including 2015-05-29. The last-modified date is the same for all the directories.
Can anyone please suggest a way?

A straightforward but not flexible way (in bash) is:
rm -rf 2015-05-{25..29}
A more flexible way would involve some coding:
ls -d ./2015-* | sort | sed '/2015-06-02/,$d' | xargs rm -r
Sort lexically all the directories that match the name pattern 2015-*
Use sed to delete from the list every entry from the first directory to keep (here 2015-06-02) through the end, so those directories survive
Use xargs to rm -r the remaining ones
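Adapted to the dates in the question (delete everything before 2015-05-30), a dry run of that pipeline might look like this, where the sed pattern is the first directory you want to keep:
ls -d ./2015-* | sort | sed '/2015-05-30/,$d' | xargs echo rm -r
Replace echo rm -r with rm -r once the printed list looks right.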

A simple solution is:
rm -r /tmp/2015-05-2?
If you want to keep the 2 newest folders, try:
ls -d ./2015-* | sort | head -n -2 | xargs echo
Replace -2 with the negative number of folders to keep. Replace echo with rm -r when the output looks correct.

The intent of this question is to find directories whose names encode a date. Therefore I'd propose calculating the corresponding timestamps to decide whether to delete or not:
ref=$(date --date 2015-05-29 +%s)
for d in ????-??-??; do [[ $(date --date "$d" +%s) -le $ref ]] && rm -rf "$d"; done
ref is the reference date, the names of the other directories are compared to this one.
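Note that date --date is GNU coreutils syntax. On BSD/macOS, a rough equivalent (an assumption; adjust to your platform) would be:
ref=$(date -j -f '%Y-%m-%d' 2015-05-29 +%s)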

Related

How can I make a bash script where I can move certain files to certain folders which are named based on a string in the files?

This is the script that I'm using to move files with the string "john" in them (124334_john_rtx.mp4 , 3464r64_john_gty.mp4 etc) to a certain folder
find /home/peter/Videos -maxdepth 1 -type f -iname '*john*' -print0 | \
xargs -0 --no-run-if-empty echo mv --target-directory=/home/peter/Videos/john/
Since I have a large number of videos with various names written in the files, I want to make a bash script which moves each video with a string between the underscores to a folder named after that string. So for example, if a file is named 4345655_ben_rts.mp4, the script would identify the string "ben" between the underscores, create a folder named "ben", and move the file to that folder. Any advice is greatly appreciated!
My way to do it:
cd /home/peter/Videos # change to your start directory
for name in $(ls *.mp4 | cut -d'_' -f2 | sort -u) # loop over the unique names found between the first two underscores
do
mkdir -p /home/peter/Videos/${name} # create the target directory if it doesn't exist
mv *_${name}_*.mp4 /home/peter/Videos/${name} # move the matching files
done
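Since parsing the output of ls can misbehave with unusual file names (a later answer makes the same point), here is a glob-based sketch of the same idea, assuming the name always sits between the first two underscores:
cd /home/peter/Videos
for f in *_*_*.mp4; do
    name=${f#*_}      # drop everything through the first underscore
    name=${name%%_*}  # keep only the part before the next underscore
    mkdir -p "$name"
    mv "$f" "$name"/
done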
This bash loop should do what you need:
find dir -maxdepth 1 -type f -iname '*mp4' -print0 | while IFS= read -r -d '' file
do
if [[ $file =~ _([^_]+)_ ]]; then
TARGET_DIR="/PARENTPATH/${BASH_REMATCH[1]}"
mkdir -p "$TARGET_DIR"
mv "$file" "$TARGET_DIR"
fi
done
It'll only move the files if it finds a directory token.
I used _([^_]+)_ to make sure there is no _ in the dir name, but you didn't specify what you want if there are more than two _ in the file name. _(.+)_ will work if foo_bar_baz_buz.mp4 is meant to go into directory bar_baz.
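A quick way to see the difference between the two patterns on that hypothetical name:
[[ foo_bar_baz_buz.mp4 =~ _([^_]+)_ ]] && echo "${BASH_REMATCH[1]}"  # prints bar
[[ foo_bar_baz_buz.mp4 =~ _(.+)_ ]] && echo "${BASH_REMATCH[1]}"     # prints bar_baz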
And this answer to a different question explains the find | while logic: https://stackoverflow.com/a/64826172/3216427 .
EDIT: As per a question in the comments, I added mkdir -p to create the target directory. The -p means recursively create any part of the path that doesn't already exist, and will not error out if the full directory already exists.

How to copy the contents of a folder to multiple folders based on number of files?

I want to copy the files from a folder (named: 1) to multiple folders based on the number of files (here: 50).
The code given below works: I moved all the files from the folder into subfolders of 50 files each, and then copied all the files back to the initial folder.
However, I need something cleaner and more efficient. Apologies for the mess below, I'm a newbie.
bf=1 #breakfolder
cd 1 #the folder from where I wanna copy stuff, contains 179 files
flies_exist=$(ls -1q * | wc -l) #assign the number of files in folder 1
#move 50 files from 1 to various subfolders
while [ $flies_exist -gt 50 ]
do
mkdir ../CompiledPdfOutput/temp/1-$bf
set --
for f in .* *; do
[ "$#" -lt 50 ] || break
[ -f "$f" ] || continue
[ -L "$f" ] && continue
set -- "$@" "$f"
done
mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
flies_exist=$(ls -1q * | wc -l)
bf=$(($bf + 1))
done
#move the rest of the files into one final subdir
mkdir ../CompiledPdfOutput/temp/1-$bf
set --
for f in .* *; do
[ "$#" -lt 50 ] || break
[ -f "$f" ] || continue
[ -L "$f" ] && continue
set -- "$@" "$f"
done
mv -- "$@" ../CompiledPdfOutput/temp/1-$bf/
#get out of 1
cd ..
# copy back the contents from subdir to 1
find CompiledPdfOutput/temp/ -exec cp {} 1 \;
The required directory structure is:
parent
├── 1 (179)
└── CompiledPdfOutput
    └── temp
        ├── 1-1 (50)
        ├── 1-2 (50)
        ├── 1-3 (50)
        └── 1-4 (29)
The number inside "()" denotes the number of files.
BTW, the final step of my code gives the warnings below; I'd be glad if anyone could explain what's happening and suggest a solution.
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-4'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-3'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-1'
cp: -r not specified; omitting directory 'CompiledPdfOutput/temp/1-2'
I don't want to copy the directories as well, just the files, so giving -r would be bad.
Assuming that you need something more compact/efficient, you can leverage existing tools (find, xargs) to create a pipeline, eliminating the need to program each step using bash.
The following will move the files into the split folders. It finds the files, groups them 50 per folder, uses awk to generate the output folder name, and moves the files. The solution is not as elegant as the original one :-(
find 1 -type f |
xargs -L50 echo |
awk '{ print "CompiledOutput/temp/1-" NR, $0 }' |
xargs -L1 echo mv -t
As a side note, the current script moves the files from the '1' folder to the numbered folders, and then copies the files back to the original folder. Why not just copy the files to the numbered folders? You can use cp -p to preserve timestamps, if that's needed.
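A sketch of that copy-only variant (dry run via the echo; the same caveats about whitespace in file names apply):
find 1 -type f |
xargs -L50 echo |
awk '{ print "CompiledOutput/temp/1-" NR, $0 }' |
xargs -L1 echo cp -p -t
As with the original, the target folders must already exist (or add a mkdir -p step), and the echo should be dropped once the output looks right.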
Supporting file names with new lines (and spaces)
A clarification to the question indicates the solution should work with file names containing embedded newlines (and white space). This requires a minor change, using the NUL character as separator.
# Count files and derive the number of output folders (50 files each, rounded up)
FILE_COUNT=$(find 1 -type f -print0 | xargs -0 -I{} echo X | wc -l)
DIR_COUNT=$(( (FILE_COUNT + 49) / 50 ))
# Remove previous tree, and create folder
OUT=CompiledOutput/temp
rm -rf $OUT
eval mkdir -p $OUT/1-{1..$DIR_COUNT}
# Process file, use NUL as separator
find 1 -type f -print0 |
awk -vRS="\0" -v"OUT=$OUT" 'NR%50 == 1 { printf "%s/1-%d%s",OUT,1+int(NR/50),RS } { printf "%s", ($0 RS) }' |
xargs -0 -L51 -t mv -t
I did limited testing with both spaces and newlines in file names; it looks OK on my machine.
I find a couple of issues with the posted script:
The logic of copying maximum 50 files per folder is overcomplicated, and the code duplication of an entire loop is error-prone.
It reuses the $@ array of positional parameters for internal storage purposes. This variable was not intended for that; it would be better to use a new dedicated array.
Instead of moving files to sub-directories and then copying them back, it would be simpler to just copy them in the first step, without ever moving.
Parsing the output of ls is not recommended.
Consider this alternative, simpler logic:
Initialize an empty array to_copy, to keep files that should be copied
Initialize a folder counter, used to compute the target folder
Loop over the source files:
    Apply filters as before (skip if not a file)
    Add the file to to_copy
    If to_copy contains the target number of files, then:
        Create the target folder
        Copy the files contained in to_copy
        Reset the content of to_copy to empty
        Increment folder_counter
If to_copy is not empty:
    Create the target folder
    Copy the files contained in to_copy
Something like this:
#!/usr/bin/env bash
set -euo pipefail
distribute_to_folders() {
    local src=$1
    local target=$2
    local max_files=$3
    local to_copy=()
    local folder_counter=1
    for file in "$src"/* "$src"/.*; do
        [ -f "$file" ] || continue
        to_copy+=("$file")
        if (( ${#to_copy[@]} == max_files )); then
            mkdir -p "$target/$folder_counter"
            cp -v "${to_copy[@]}" "$target/$folder_counter/"
            to_copy=()
            ((++folder_counter))
        fi
    done
    if (( ${#to_copy[@]} > 0 )); then
        mkdir -p "$target/$folder_counter"
        cp -v "${to_copy[@]}" "$target/$folder_counter/"
    fi
}
distribute_to_folders "$@"
To distribute files in path/to/1 into directories of maximum 50 files under path/to/compiled-output, you can call this script with:
./distribute.sh path/to/1 path/to/compiled-output 50
BTW, the final step of my code gives this warning; I'd be glad if anyone can explain what's happening and suggest a solution.
Sure. The command find CompiledPdfOutput/temp/ -exec cp {} 1 \; finds files and directories, and tries to copy them. When cp encounters a directory and the -r parameter is not specified, it issues the warning you saw. You could add a filter for files, with -type f. If there are not excessively many files then a simple shell glob will do the job:
cp -v CompiledPdfOutput/temp/*/* 1
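With find, the -type f filter mentioned above would look like:
find CompiledPdfOutput/temp/ -type f -exec cp {} 1 \;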
This will copy files to multiple folders of fixed size. Change source, target, and folderSize as per your requirement. This also works with file names containing special characters (e.g. 'file 131!@#$%^&*()_+-=;?').
source=1
target=CompiledPDFOutput/temp
folderSize=50
find $source -type f -printf "\"%p\"\0" \
| xargs -0 -L$folderSize \
| awk '{system("mkdir -p '$target'/1-" NR); printf "'$target'/1-" NR " %s\n", $0}' \
| xargs -L1 cp -t

Retaining n most recent directories in a backup script

I have a directory in /home/backup/ that stores yearly backups. Inside the backup folder, we have these directories:
/home/backup/2012
/home/backup/2013
/home/backup/2014
/home/backup/2015
/home/backup/2016
/home/backup/2017
and every year I have to clean up the data, keeping only the last three years of backup.
In the above case, I have to delete:
/home/backup/2012
/home/backup/2013
/home/backup/2014
What is the best way to find the directories to be deleted? I have this but it doesn't work:
find /home/ecentrix/recording/ -maxdepth 1 -mindepth 1 -type d -ctime +1095 -exec rm -rf {} \;
Do you guys have another idea to do that?
Since your directories have well-defined integer names, I'd just use bash to calculate the appropriate targets:
mkdir -p backup/201{2..7} # just for testing
cd backup
rm -fr $(seq 2012 $(( $(date +"%Y") - 3)))
seq generates a list of numbers from 2012 through the current year minus 3, which are then passed to rm to blast them.
A more generic solution
I think it is best to traverse the directories in the descending order and then delete the ones after the third. This way, there is no danger of losing a directory when the script is run again and again:
#!/bin/bash
backups_to_keep=3
count=0
cd /home/backup
while read -d '' -r dir; do
[[ -d "$dir" ]] || continue # skip if not directory
((++count <= backups_to_keep)) && continue # skip if we are within retaining territory
echo "Removing old backup directory '$dir'" # it is good to log what was cleaned up
echo rm -rf -- "$dir"
done < <(find . -maxdepth 1 -name '[2-9][0-9][0-9][0-9]' -type d -print0 | sort -nrz)
Remove the echo before rm -rf after testing. For your example, it gives this output:
rm -rf -- ./2014
rm -rf -- ./2013
rm -rf -- ./2012
cd /home/backup restricts rm -rf to just that directory for extra safety
find . -maxdepth 1 -name '[2-9][0-9][0-9][0-9]' -type d gives the top level directories that match the glob
sort -nrz makes sure newer directories come first, -z processes the null terminated output of find ... -print0
This solution doesn't hardcode the years - it just assumes that the directories to be removed are named in numerically sortable way
It is resilient to any other files or directories being present in the backup directory
There are no side effects if the script is run again and again
This can easily be extended to support different naming conventions for the backup directory - just change the glob expression
Solution
# Check if extended globbing is on
shopt extglob
# If extended globbing is off, run this line
shopt -s extglob
# Remove all files except 2015, 2016, and 2017
rm -r -i /home/backup/!(2015|2016|2017)
# Turn off extended globbing (optional)
shopt -u extglob
Explanation
shopt -s extglob allows you to match any files except the ones inside !(...). So that line means remove any file in /home/backup except 2015, 2016, or 2017.
The -i flag in rm -r -i ... allows you to interactively confirm the removal of each file. Remove -i if you want the files to be removed automatically.
Dynamic Dates
This solution is valid for automation (e.g. cron jobs)
# Number of latest years to keep
LATEST_YEARS=3
# Get the current year
current_year=$(date '+%Y')
# Get the first/earliest year to keep
first_year=$(( current_year - LATEST_YEARS + 1 ))
# Turn on extended globbing
shopt -s extglob
# Store years to keep in an array
keep_years=( $(seq $first_year $current_year) )
# Specify files to keep
rm -r /home/backup/!(${keep_years[0]}|${keep_years[1]}|${keep_years[2]})
NOTE: ALL FILES IN BACKUP DIRECTORY WILL BE REMOVED EXCEPT LAST 3 YEARS
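To generalize beyond exactly three years, the !(...) pattern can be built from the array instead of spelling out three elements; a sketch, assuming extglob is already enabled:
pattern=$(IFS='|'; echo "${keep_years[*]}")
echo rm -r /home/backup/!($pattern)
Drop the echo when the expansion looks right.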
Consider this:
find /home/backup/2* -maxdepth 1 | sort -r | awk "NR>3" | xargs rm -rf
How this works
Produce a list of filenames starting with "2", only under /home/backup/
Alphabetically sort the list, in reverse order.
Use AWK to filter the list by row number. NR is the row number, so NR>3 prints every row after the third, keeping the three newest entries off the deletion list. Change the 3 to however many entries you want to retain: for the latest two years make it NR>2; to keep the latest 10, make it NR>10.
Pass the resulting list to rm -rf via xargs.
Run as dedicated user, for safety
The danger here is that I'm suggesting rm -rf. This is risky. If something goes wrong, you could delete things you want to keep. I mitigate this risk by only invoking these commands by a dedicated user that ONLY has permissions to delete backup files (and not beyond).
Merit
The merit of this approach is that when you throw it in a cron job and time advances, it'll continue to retain only the latest few directories. So this, I consider to be a general solution to your problem.
Demonstration
To test this, I created a test directory with all the same directories you have. I altered it just to see what would be executed at the end, so I've tried:
find test01/2* -maxdepth 1 | sort -r | awk "NR>4" | xargs echo rm -rf
I used NR>4 rather than NR>3 (as you'd want) to show that the number selects how many rows are dropped from the deletion list, i.e., how many of the newest directories are kept.
Here's what I get:
rm -rf test01/2013 test01/2012
Removing the echo from the final stage makes the pipeline actually perform the deletions rather than print them.
I keep a crude copy of this in a script that I use on some servers of mine; you can view it here: https://github.com/docdawning/teenybackup
Required for success
This approach DEPENDS on the alphabetization of whatever the find command produces. In my case, I use ISO-8601 type dates, which lend themselves entirely to being inherently date-sorted when they're alphabetized. Your YYYY type dates totally qualify.
Additional Safety
I recommend that you change your backups to be stored as tar archives. Then you can change the rm -rf to a simple rm. This is a lot safer, though not fool-proof. Regardless, you really should run this as a dedicated otherwise unprivileged user (as you should do for any script calling a delete, in my opinion).
Be aware that if you start it with
find /home/backup
Then the call to xargs will include /home/backup itself, which would be a disaster, because it'd get deleted too. So you must search within that path. Instead, calling it as below would work:
find /home/backup/*
The 2* I gave above is just a way of somewhat limiting the search operation.
Warranty
None; this is the Internet. Be careful. Test things heavily to convince yourself. Also, maybe get some offline backups too.
Finally - I previously posted this as an answer, but made the fatal mistake of representing the find command based out of /home/backup and not /home/backup/* or /home/backup/2*. This caused /home/backup to also be sent for deletion, which would be a disaster. It's a very small distinction that I've tried to be clear about above. I've deleted that previous answer and replaced it with this one.
Here is one way.
Updated answer.
[dev]$ find backup/* | grep -vE "$(date '+%Y')|$(date +%Y --date='1 year ago')|$(date +%Y --date='2 year ago')" | xargs rm -rfv
removed directory: ‘backup/2012’
removed directory: ‘backup/2013’
removed directory: ‘backup/2014’
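A sketch that generalizes the same idea to any number of years, building the alternation dynamically (assumes GNU date and paste):
years=$(for i in 0 1 2; do date +%Y --date="$i year ago"; done | paste -sd'|')
find backup/* | grep -vE "$years" | xargs rm -rfv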

Unix - Only list directories which contain a subdirectory

How can I print in the Unix shell the number of directories in a tree which contain other directories?
I haven't found a solution yet with commands like find or ls.
You can use the find command: find . -type d -not -empty
That will print every subdirectory that is not empty. You can control how deep you want the search with -maxdepth.
To print the number, you can use wc -l.
find . -type d -not -empty | wc -l
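Note that -not -empty also matches directories whose only contents are regular files. If you strictly want directories that contain other directories, a GNU find sketch (assumes -printf support) prints the parent of every directory at depth 2 or more:
find . -mindepth 2 -type d -printf '%h\n' | sort -u | wc -l
Drop the wc -l to list the directories instead of counting them.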
If you generate a list of all the directories under a particular directory, and then remove the last component from the name, you have a list of the directories containing subdirectories, but there are likely to be repeats in that list. So, you need to post-process the list, yielding (as a first approximation):
find ${base:-.} -type d |
sed 's%/[^/]*$%%' |
sort -u
Find all the directories under the directory or directories listed in variable $base, defaulting to the current directory, and print their names. The code assumes you don't have directories with a newline in the name. If you do, there are fixes, but the best fix is to rename the directory. The sed command removes the last slash and everything after it. The sort eliminates duplicate entries. What's left is the list of directories containing subdirectories.
Well, more or less. There's the degenerate case to consider: the top-level directories in the list will be listed regardless of whether they have sub-directories or not. Fixing that is a bit harder. You need to eliminate any lines of output that exactly match the directories specified to find before removing trailing material. So, you need something like:
{
printf '\\#^%s$#d\n' ${base:-.}
echo 's%/[^/]*$%%'
} > sed.script
find ${base:-.} -type d |
sed -f sed.script |
sort -u
rm -f sed.script
The \\#^%s$#d assumes you don't use # in directory names. If you do use it, then you need to find a character you don't use in names (maybe Control-A) and use that in place of the #. If you could face absolutely any character, then you'll need to do more work escaping some obscure character, such as Control-A, when it appears in a directory name.
There's a problem still: using a fixed name like sed.script for a temporary file name is bad (for multiple reasons — such as two people trying to run the script at the same time in the same directory, though it can also be a security risk), so use mktemp to create a temporary file name:
tmp=$(mktemp ${TMPDIR:-/tmp}/dircnt.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15
{
printf '\\#^%s$#d\n' ${base:-.}
echo 's%/[^/]*$%%'
} > $tmp
find ${base:-.} -type d |
sed -f $tmp |
sort -u
rm -f $tmp
trap 0
This deals with the most common signals (HUP, INT, QUIT, PIPE, TERM) and removes the temporary file even if one of those arrives.
Clearly, if you want to simply count the number of directories, you can pipe the output from the commands above through wc -l to get the count.
ls -1d */*/. | cut -d / -f1 | uniq
This lists every second-level directory as dir/subdir/., trims each line back to its first path component, and collapses adjacent repeats with uniq, leaving the top-level directories that contain at least one subdirectory. Pipe through wc -l to count them.

How can I delete the directory with the highest number name?

I have a directory containing sub-directories, some of whose names are numbers. Without looking, I don't know what the numbers are. How can I delete the sub-directory with the highest number name? I reckon the solution might sort the sub-directories into reverse order and select the first sub-directory that begins with a number but I don't know how to do that. Thank you for your help.
cd $yourdir #go to that dir
ls -q -p | #list all files directly in dir and make directories end with /
grep '^[0-9]*/$' | #select directories (end with /) whose names are made of numbers
sort -n | #sort numerically
tail -n1 | #select the last one (largest)
xargs -r rmdir #or rm -r if nonempty
I recommend running it first without the xargs -r rmdir or xargs -r rm -r part to make sure you're deleting the right thing.
A pure Bash solution:
#!/bin/bash
shopt -s nullglob extglob
# Make an array of all the dir names that only contain digits
dirs=( +([[:digit:]])/ )
# If none found, exit
if ((${#dirs[@]}==0)); then
echo >&2 "No dirs found"
exit
fi
# Loop through all elements of array dirs, saving the greatest number
max=${dirs[0]%/}
for i in "${dirs[@]%/}"; do
((10#$max<10#$i)) && max=$i
done
# Finally, delete the dir with largest number found
echo rm -r "$max"
Note:
This will behave unpredictably when there are dirs whose names are the same number written differently, e.g., 2 and 0002.
Will fail if the numbers overflow Bash's integer arithmetic.
Doesn't take into account negative numbers and non-integer numbers.
Remove the echo in the last line if you're happy with it.
To be run from within your directory.
Let's make some directories to test the script:
mkdir test; cd test; mkdir $(seq 100)
Now
find -mindepth 1 -maxdepth 1 -type d | cut -c 3- | sort -k1n | tail -n 1 | xargs -r echo rm -r
Result:
rm -r 100
Now, remove the word echo from the command and xargs will execute rm -r 100.
