I have a problem. I need to write a bash script that will find all files and directories in a given path and display some info about the results. Allowed time: 30 seconds.
#YEAR_AGO=$(date -d "now - 1 year" +%s)
function check_dir {
    for entry in "$1"/*
    do
        if [ -d "$entry" ]; then
            ((DIRS+=1))
            check_dir "$entry"
        elif [ -f "$entry" ]; then
            ((FILES+=1))
            #SIZE=$(stat -c%s "$entry")
            #CREATE_DATE=$(date -r "$entry" +%s)
            #CREATE_DATE=$(stat -c%W "$entry")
            #if [ $DIFF -ge $SECONDS_IN_YEAR ]; then
            #    ((OLD_FILES+=1))
            #fi
        fi
    done
}
if [ $# -ne 2 ]; then
    echo "Usage: ./srpt path emailaddress"
    exit 1
fi
if [ ! -d "$1" ]; then
    echo "Provided path is invalid"
    exit 1
fi
check_dir "$1"
echo "Execution time $SECONDS"
echo "Dicrecoties $DIRS"
echo "Files $FILES"
echo "Sym links $SYM_LINKS"
echo "Old files $OLD_FILES"
echo "Large files $LARGE_FILES"
echo "Graphics files $IMG_FILES"
echo "Temporary files $TMP_FILES"
echo "Executable files $EXE_FILES"
echo "Total file size $TOTAL_BYTES"
Here is the result of running it with the lines above commented out:
Execution time 1
Dicrecoties 931
Files 14515
Sym links 0
Old files 0
Large files 0
Graphics files 0
Temporary files 0
Executable files 0
Total file size 0
If I uncomment
SIZE=$(stat -c%s "$entry")
I get:
Execution time 31
Dicrecoties 931
Files 14515
Sym links 0
Old files 0
Large files 0
Graphics files 0
Temporary files 0
Executable files 0
Total file size 447297022
31 seconds. How can I speed up my script?
Finding the files that were created more than one year ago adds another 30 seconds.
More often than not, using loops in shells is an indication that you're going for the wrong approach.
A shell is, above all, a tool to run other tools.
Though it can do counting, awk is a better tool to do it.
Though it can list and find files, find is better at it.
The best shell scripts are those that manage to have a few tools contribute to the task, not those that start millions of tools in sequence and where all the job is done by the shell.
Here, typically a better approach would be to have find find the files and gather all the data you need, and have awk munch it and return the statistics. Here using GNU find and GNU awk (for RS='\0') and GNU date (for -d):
find . -printf '%y.%s.%Ts%p\0' |
awk -v RS='\0' -F'[.]' -v yearago="$(date -d '1 year ago' +%s)" '
{
    type[$1]++
    if ($1 == "f") {
        total_size += $2
        if ($3 < yearago) old++
        if (!index($NF, "/")) ext[tolower($NF)]++
    }
}
END {
    printf("%20s: %d\n", "Directories", type["d"])
    printf("%20s: %d\n", "Total size", total_size)
    printf("%20s: %d\n", "old", old)
    printf("%20s: %d\n", "jpeg", ext["jpg"]+ext["jpeg"])
    printf("%20s: %d\n", "and so on...", 0)
}'
The key is to avoid firing up too many utilities. You seem to be invoking two or three per file, which will be quite slow.
Also, the comments show that handling filenames, in general, is complicated, particularly if the filenames might have spaces and/or newlines in them. But you don't actually need the filenames, if I understand your problem correctly, since you are only using them to collect information.
If you're using GNU find, you can extract the stat information directly from find, which will be quite a lot more efficient, since find needs to do a stat() anyway on every file. Here's an example, which pipes from find into awk for simplicity:
summary() {
    find "$@" '(' -type f -o -type d ')' -printf '%y %s %C@\n' |
    awk '$1=="d"{DIR+=1;next}
         {FILES+=1;SIZE+=$2}
         $3<'$(date +%s -d"last year")'{OLD+=1}
         END{printf "Directories: %d\nFiles: %d\nOld files: %d\nTotal Size: %d\n",
             DIR, FILES, OLD, SIZE}'
}
On my machine, that summarised 28718 files in 4817 directories in one-tenth of a second elapsed time. YMMV.
You surely want to avoid parsing the output of find as you did (see my comment): it'll break whenever you have spaces in filenames.
You surely want to avoid forking to external processes like your $(stat ...) or $(date ...) statements: each fork costs a lot!
It turns out that find on its own can do quite a lot. For example, if we want to count the numbers of files, dirs and links.
We all know the naive way in bash (pretty much what you've done):
shopt -s globstar
shopt -s nullglob
shopt -s dotglob
for f in ./**; do
    [[ -f $f ]] && ((++nbfiles))
    [[ -d $f ]] && ((++nbdirs))
done
echo "There are $nbdirs directories and $nbfiles files, and we're very happy."
Caveat. This method counts links according to what they link to: a link to a file will be counted as a file.
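To make the caveat concrete, here is a minimal sketch of the same glob loop that tests for a symbolic link first, so links are counted separately instead of as whatever they point to. The function name count_entries and the output format are assumptions made for illustration, not part of the answer above; it needs bash 4 or later for globstar.

```shell
#!/usr/bin/env bash
# count_entries is a hypothetical helper: count directories, regular files and
# symbolic links below the directory given as $1 (the directory itself is skipped).
# The body runs in a subshell so the shopt changes do not leak to the caller.
count_entries() (
    shopt -s globstar nullglob dotglob
    root=${1%/}
    nbfiles=0 nbdirs=0 nblinks=0
    for f in "$root"/**; do
        [[ $f == "$root/" ]] && continue      # skip the starting directory
        if [[ -L $f ]]; then                  # test links first: -f and -d follow them
            nblinks=$((nblinks + 1))
        elif [[ -d $f ]]; then
            nbdirs=$((nbdirs + 1))
        elif [[ -f $f ]]; then
            nbfiles=$((nbfiles + 1))
        fi
    done
    echo "dirs=$nbdirs files=$nbfiles links=$nblinks"
)

# Example call (hypothetical path):
#   count_entries /some/path
```

It still does all the work in the shell, so it is no faster than the loop above; it only changes what gets counted.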
How about the find way? Count number of files, directories and (symbolic) links:
while read t n; do
    case $t in
        dirs) ((nbdirs+=n+1)) ;;
        files) ((nbfiles+=n+1)) ;;
        links) ((nblinks+=n+1)) ;;
    esac
done < <(
    find . -type d -exec bash -c 'echo "dirs $#"' {} + \
        -or -type f -exec bash -c 'echo "files $#"' {} + \
        -or -type l -exec bash -c 'echo "links $#"' {} + 2> /dev/null
)
echo "There are $nbfiles files, $nbdirs dirs and $nblinks links. You're happy to know aren't you?"
Same principles, using associative arrays, more fields and more involved find logic:
declare -A fields
while read f n; do
    ((fields[$f]+=n))
done < <(
    find . -type d -exec bash -c 'echo "dirs $(($#+1))"' {} + \
        -or -type f -exec bash -c 'echo "files $(($#+1))"' {} + -printf 'size %s\n' \
            \( \
                \( -iname '*.jpg' -printf 'jpg 1\n' -printf 'jpg_size %s\n' \) \
                -or -size +100M -printf 'large 1\n' \
            \) \
        -or -type l -exec bash -c 'echo "links $(($#+1))"' {} + 2> /dev/null
)
for f in "${!fields[@]}"; do
    printf "%s: %s\n" "$f" "${fields[$f]}"
done
I hope this will give you some ideas! Good luck!
I have a few files named in year+month+date format.
As a file is generated daily, I need to move the first two (20220101, 20220102) and the last two (20220130, 20220131) files to a specific folder every month. Can someone help me write the script?
This helped me a while back:
cd "$DIR"
for file in *; do
    # Top-tier folder name
    year=$(stat -f "%Sm" -t "%Y" "$file")
    # Secondary folder name
    subfolderName=$(stat -f "%Sm" -t "%d-%m-%Y" "$file")
    if [ ! -d "$target/$year" ]; then
        mkdir "$target/$year"
        echo "starting new year: $year"
    fi
    if [ ! -d "$target/$year/$subfolderName" ]; then
        mkdir "$target/$year/$subfolderName"
        echo "starting new day & month folder: $subfolderName"
    fi
    echo "moving file $file"
    mv "$file" "$target/$year/$subfolderName"
done
Well, if you want to do this in bash, I would suggest having a single script file and one log file to keep track of the current month/previous month.
x=$((10#$(date +%m)))   # current month without a leading zero (1-12)
y=$(sed -n 1p date.log 2>/dev/null)
if ! [ -f date.log ]; then
    printf '%s\n' "$x" > date.log
    exit 0
fi
if [[ $y -ge 0 && $y -le 12 && $x != $y ]]; then
    # if the current month differs from the logged month, everything here will be executed
    echo "a new month is here"
    sed -i "1s/^.*$/$x/" date.log
fi
What this script essentially does is create a log file containing the current month if it doesn't already exist. When it is executed again, it compares the current month with the one stored in the log file. If they don't match, it executes everything in the block where the comment is, which will most likely be a bunch of mv commands.
Try this Shellcheck-clean code:
#! /bin/bash -p
datefiles=( 20[0-9][0-9][01][0-9][0-3][0-9] )
mv -n -v -- "${datefiles[@]:0:2}" "${datefiles[@]: -2}" /path/to/folder
datefiles=( 20[0-9][0-9][01][0-9][0-3][0-9] ) makes an array of the files in the current directory with date-formatted names, sorted by name.
"${datefiles[@]:0:2}" expands to the first two elements in the datefiles array.
"${datefiles[@]: -2}" expands to the last two elements in the datefiles array.
You'll need to change /path/to/folder.
Unless it is absolutely guaranteed that there will always be at least 4 date files, you should add a check on the number of files found (eg. if (( ${#datefiles[*]} >= 4 )) ...).
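The check on the number of files could look like the sketch below. The function name move_edge_files and its two arguments (source and destination directory) are assumptions made for illustration; the glob and the array slices are the ones from the code above.

```shell
#!/usr/bin/env bash
# move_edge_files is a hypothetical helper: move the first two and last two
# date-named files from directory $1 into directory $2, but only when at least
# four such files exist.
move_edge_files() {
    local src=$1 dest=$2 restore
    local -a datefiles
    restore=$(shopt -p nullglob)      # remember the caller's nullglob setting
    shopt -s nullglob                 # an unmatched glob expands to nothing
    datefiles=( "$src"/20[0-9][0-9][01][0-9][0-3][0-9] )
    eval "$restore"
    if (( ${#datefiles[@]} < 4 )); then
        echo "Expected at least 4 date files in $src, found ${#datefiles[@]}" >&2
        return 1
    fi
    mv -n -- "${datefiles[@]:0:2}" "${datefiles[@]: -2}" "$dest"
}

# Example call (hypothetical paths):
#   move_edge_files /path/to/files /path/to/folder
```

With fewer than four files the first-two and last-two slices would overlap, which is why the function refuses to run in that case.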
$ string="20220101 20220102 20220103 20220104 .. 20220130 20220131"
$ awk '{ print |"mv " $1" "$2" "$(NF-1)" "$NF " /your/folder"}' <<<"$string"
$ myArray=(20220101 20220102 20220103 20220104 .. 20220130 20220131)
$ mv ${myArray[0]} ${myArray[1]} ${myArray[-2]} ${myArray[-1]} /your/folder
Files to array
$ myArray=($(find /path/to/files -mindepth 1 -maxdepth 1 -type f -name "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"))
$ readarray -t myArray < <(find /path/to/files -mindepth 1 -maxdepth 1 -type f -name "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]")
My purpose has been served.
for i in {1..12}; do
    mv $(date -d "20220101 $i months" +'%Y%m%d') /path/to/folder/
    mv $(date -d "20220102 $i months" +'%Y%m%d') /path/to/folder/
    mv $(date -d "20220101 + $i month - 1 day" +'%Y%m%d') /path/to/folder/
    mv $(date -d "20220101 + $i month - 2 day" +'%Y%m%d') /path/to/folder/
done
That's the solution.
Thanks to everyone who participated.
I want to move all my files older than 1000 days, which are distributed over various subfolders, from /home/user/documents into /home/user/archive. The command I tried was
find /home/user/documents -type f -mtime +1000 -exec rsync -a --progress --remove-source-files {} /home/user/archive \;
The problem is, that (understandably) all files end up being moved into the single folder /home/user/archive. However, what I want is to re-construct the file tree below /home/user/documents inside /home/user/archive. I figure this should be possible by simply replacing a string with another somehow, but how? What is the command that serves this purpose?
Thank you!
I would take this route instead of rsync:
Change directories so we can deal with relative path names instead of absolute ones:
cd /home/user/documents
Run your find command and feed the output to cpio, requesting it to make hard-links (-l) to the files, creating the leading directories (-d) and preserve attributes (-m). The -print0 and -0 options use nulls as record terminators to correctly handle file names with whitespace in them. The -l option to cpio uses links instead of actually copying the files, so very little additional space is used (just what is needed for the new directories).
find . -type f -mtime +1000 -print0 | cpio -dumpl0 /home/user/archive
Re-run your find command and feed the output to xargs rm to remove the originals:
find . -type f -mtime +1000 -print0 | xargs -0 rm
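If cpio is not available, the same idea (work with relative paths, then recreate the leading directories under the destination) can be sketched in plain bash. The function name archive_old_files and its three arguments are assumptions made for illustration; unlike the cpio route, it moves the files directly and does not copy the attributes of the directories it creates.

```shell
#!/usr/bin/env bash
# archive_old_files is a hypothetical helper: move regular files older than
# $3 days from the tree below $1 into $2, recreating the relative directory
# structure. Both directories should be given as absolute paths.
archive_old_files() {
    local src=$1 dest=$2 days=$3 rel target_dir
    while IFS= read -r -d '' rel; do
        rel=${rel#./}                         # ./a/b/file -> a/b/file
        target_dir=$dest
        case $rel in
            */*) target_dir=$dest/${rel%/*} ;;   # file lives in a subdirectory
        esac
        mkdir -p -- "$target_dir" && mv -- "$src/$rel" "$target_dir/"
    done < <(cd "$src" && find . -type f -mtime +"$days" -print0)
}

# Example call, using the paths from the question:
#   archive_old_files /home/user/documents /home/user/archive 1000
```

The NUL-delimited read keeps file names with spaces or newlines intact, for the same reason -print0 and -0 are used above.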
Here's a script too.
[ -n "$BASH_VERSION" ] && [[ BASH_VERSINFO -ge 4 ]] || {
    echo "You need Bash version 4.0 to run this script."
    exit 1
}
# SOURCE=/home/user/documents/
# DEST=/home/user/archive/
declare -i DAYSOLD=10
declare -a DIRS=()
declare -A DIRS_HASH=()
declare -a FILES=()
declare -i E=0
# Check directories.
[[ -n $SOURCE && -d $SOURCE && -n $DEST && -d $DEST ]] || {
    echo "Source or destination directory may be invalid."
    exit 1
}
# Format source and dest variables properly:
SOURCE=${SOURCE%/}/
DEST=${DEST%/}/
# Copy directories first.
echo "Creating directories."
while read -r FILE; do
    DIR=${FILE%/*}/
    if [[ -z ${DIRS_HASH[$DIR]} ]]; then
        DIRS_HASH[$DIR]=1
        PARTIAL=${DIR#"$SOURCE"}
        if [[ -n $PARTIAL ]]; then
            TARGET=${DEST}${PARTIAL}
            echo "'$TARGET'"
            mkdir -p "$TARGET" || (( E += $? ))
            chmod --reference="$DIR" "$TARGET" || (( E += $? ))
            chown --reference="$DIR" "$TARGET" || (( E += $? ))
            touch --reference="$DIR" "$TARGET" || (( E += $? ))
        fi
    fi
done < <(exec find "$SOURCE" -mindepth 1 -type f -mtime +"$DAYSOLD")
# Copy files.
echo "Copying files."
while read -r FILE; do
    PARTIAL=${FILE#"$SOURCE"}
    cp -av "$FILE" "${DEST}${PARTIAL}" || (( E += $? ))
    FILES+=("$FILE")
done < <(exec find "$SOURCE" -mindepth 1 -type f -mtime +"$DAYSOLD")
# Remove old files.
if [[ E -eq 0 ]]; then
    echo "Removing old files."
    rm -fr "${DIRS[@]}" "${FILES[@]}"
else
    echo "An error occurred during copy. Not removing old files."
    exit 1
fi
I am trying to create a script that will find all the files in a folder that contain, for example, the string 'J34567' and process them. Right now my code processes every file in the folder rather than only the matching ones. In other words, when I run the script with a string, e.g. ./bashexample 'J37264', it still processes all the files, including those without that string. Here is my code:
directory=$(cd `dirname .` && pwd)
tag=$1
echo find: $tag on $directory
find $directory . -type f -exec grep -sl "$tag" {} \;
for files in $directory/*$tag*
do
    for i in *.std
    do
        /projects/OPSLIB/BCMTOOLS/sumfmt_linux < $i > $i.sum
    done
    for j in *.txt
    do
        egrep "device|Device|\(F\)" $i > $i.fail
    done
    echo $files
done
Kevin, you could try the following:
for files in $directory/*$tag*
do
    if [ -f "$files" ]
    then
        #do your stuff
        echo $files
    fi
done
where directory is your directory name (you could pass it as a command-line argument too) and tag is the search term you are looking for in a filename.
The following script will give you the list of files that contain the given pattern (inside the file, not in the file name).
for file in $(find "$directory" -type f -exec grep -l "$tag" {} \;); do
    echo $file
    # use $file for further operations
done
What is the relevance of .std, .txt, .sum and .fail files to the files containing given pattern?
It's assumed there are no special characters, spaces, etc. in file names.
If that is not the case, the following should help you work around those.
How can I escape white space in a bash loop list?
Capturing output of find . -print0 into a bash array
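As a minimal sketch of what those two links describe, the loop below reads NUL-delimited file names, so spaces and other special characters survive intact. It assumes GNU grep (for -Z, which ends each printed file name with a NUL byte) and treats the tag as a fixed string (-F); the function name process_matching_files and the printf placeholder are illustrative only.

```shell
#!/usr/bin/env bash
# process_matching_files is a hypothetical helper: visit every regular file
# below directory $1 whose content contains the fixed string $2.
process_matching_files() {
    local directory=$1 tag=$2 file
    while IFS= read -r -d '' file; do
        # Replace this line with the real processing of "$file".
        printf 'processing: %s\n' "$file"
    done < <(find "$directory" -type f -exec grep -lZF -- "$tag" {} +)
}

# Example call (hypothetical directory):
#   process_matching_files "$PWD" J34567
```

Using {} + instead of {} \; also lets find pass many files to each grep invocation rather than starting one grep per file.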
There are multiple issues in your script.
Following is not required to set the operating directory to current directory.
directory=$(cd `dirname .` && pwd)
find is executed twice for the current directory because both $directory and . are given as starting points.
find $directory . -type f -exec grep -sl "$tag" {} \;
Also, result/output of above find is not used in for loop.
For loop is run for files in the $directory (sub directories not considered) with their file name having the given pattern.
for files in $directory/*$tag*
Following for loop will run for all .txt files in current directory, but will result in only one output file due to use of $i from previous loop.
for j in *.txt
egrep "device|Device|\(F\)" $i > $i.fail
This is my temporary solution. Please check if it follows your intention.
directory=$(cd `dirname .` && pwd) ## Should this be just directory=$PWD ?
echo "find: $tag on $directory"
find "$directory" . -type f -exec grep -sl "$tag" {} \; ## Shouldn't you add -maxdepth 1 ? Are the files listed here the one that should be processed in the loop below instead?
for file in "$directory"/*"$tag"*; do
    if [[ $file == *.std ]]; then
        /projects/OPSLIB/BCMTOOLS/sumfmt_linux < "$file" > "${file}.sum"
    fi
    if [[ $file == *.txt ]]; then
        egrep "device|Device|\(F\)" "$file" > "${file}.fail"
    fi
    echo "$file"
done
Update 1
directory=$PWD ## Change this to another directory if needed.
echo "find: $tag on $directory"
while IFS= read -rd $'\0' file; do
    echo "$file"
    case "$file" in
    *.std)
        /projects/OPSLIB/BCMTOOLS/sumfmt_linux < "$file" > "${file}.sum"
        ;;
    *.txt)
        egrep "device|Device|\(F\)" "$file" > "${file}.fail"
        ;;
    *)
        echo "Unexpected match: $file"
        ;;
    esac
done < <(exec find "$directory" -maxdepth 1 -type f -name "*${tag}*" \( -name '*.std' -or -name '*.txt' \) -print0) ## Change or remove the maxdepth option as wanted.
Update 2
echo "find: $tag on $directory"
while IFS= read -rd $'\0' file; do
echo "$file"
/projects/OPSLIB/BCMTOOLS/sumfmt_linux < "$file" > "${file}.sum"
done < <(exec find "$directory" -maxdepth 1 -type f -name "*${tag}*" -name '*.std' -print0)
while IFS= read -rd $'\0' file; do
echo "$file"
egrep "device|Device|\(F\)" "$file" > "${file}.fail"
done < <(exec find "$directory" -maxdepth 1 -type f -name "*${tag}*" -name '*.txt' -print0)
I have written a script to zip a set of files into one zip file if the number of files go above a limit.
limit=1000 #limit the number of files
files=( /mnt/md0/capture/dcn/*.pcap) #file format to be zipped
if (( ${#files[@]} > limit )); then #if number of files above limit
    zip -j /mnt/md0/capture/dcn/capture_zip-$(date "+%b_%d_%Y_%H_%M_%S").zip /mnt/md0/capture/dcn/*.pcap
fi
I need to modify this so that the script checks the number of files from the previous month rather than the whole set of files. How do I implement that?
This script perhaps.
[ -n "$BASH_VERSION" ] || {
    echo "You need Bash to run this script."
    exit 1
}
shopt -s extglob || {
    echo "Unable to enable extglob option."
    exit 1
}
LIMIT=1000
FILES=(/mnt/md0/capture/dcn/*.pcap)
ONE_MONTH_OLD_FILES=()
read ONE_MONTH_BEFORE < <(date -d 'TODAY - 1 month' '+%s') && [[ $ONE_MONTH_BEFORE == +([[:digit:]]) && ONE_MONTH_BEFORE -gt 0 ]] || {
    echo "Unable to get timestamp one month before current day."
    exit 1
}
for F in "${FILES[@]}"; do
    read TIMESTAMP < <(date -r "$F" '+%s') && [[ $TIMESTAMP == +([[:digit:]]) && TIMESTAMP -le ONE_MONTH_BEFORE ]] && ONE_MONTH_OLD_FILES+=("$F")
done
if [[ ${#ONE_MONTH_OLD_FILES[@]} -gt LIMIT ]]; then
    # echo "Zipping ${FILES[*]}." ## Just an example message you can create.
    zip -j "/mnt/md0/capture/dcn/capture_zip-$(date '+%b_%d_%Y_%H_%M_%S').zip" "${ONE_MONTH_OLD_FILES[@]}"
fi
Make sure you save in unix file format and run bash script.sh.
You could also modify the script to get files by arguments instead by:
Complete update:
#Limit of your choice
LIMIT=1000
#Get the number of files with `.pcap` in their name that were last modified more than 30 days ago
NUMBER=$(find /yourdirectory -maxdepth 1 -name "*.pcap" -mtime +30 | wc -l)
if [[ $NUMBER -gt $LIMIT ]]
then
    FILES=$(find /yourdirectory -maxdepth 1 -name "*.pcap" -mtime +30)
    zip archive.zip $FILES
fi
The reason I run find twice is that a bash array built from the output is delimited by whitespace rather than by \n, and I couldn't find a clean way to count the number of files. You might want to do some research on that so that find runs only once.
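One way to run find only once is to read its NUL-delimited output into an array and count the elements. This is a sketch that assumes bash 4.4 or later (for mapfile -d ''); the array name old_pcaps and the function name collect_old_pcaps are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical names: old_pcaps (result array) and collect_old_pcaps.
old_pcaps=()

# Fill old_pcaps with the *.pcap files directly inside directory $1 that were
# last modified more than $2 days ago. find runs exactly once.
collect_old_pcaps() {
    local dir=$1 days=$2
    mapfile -d '' -t old_pcaps < <(
        find "$dir" -maxdepth 1 -type f -name '*.pcap' -mtime +"$days" -print0
    )
}

# Example use (hypothetical directory and limit):
#   collect_old_pcaps /yourdirectory 30
#   if (( ${#old_pcaps[@]} > LIMIT )); then
#       zip archive.zip "${old_pcaps[@]}"
#   fi
```

Because each array element is one complete file name, names containing spaces are counted once and passed to zip as single arguments.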
Just replace your if line with
if [[ "$(find $(dirname "$files") -maxdepth 1 -wholename "$files" -mtime -30 | wc -l)" -gt "$limit" ]]; then
From left to right this expression
searches (find)
in the path of your pattern ($(dirname "$files") strips away everything from the last "/")
but not in its subdirectories (-maxdepth 1)
for files matching your pattern (-wholename "$files")
that are newer than 30 days (-mtime -30)
and counts the number of those files (wc -l)
I prefer -gt for comparisons, but otherwise it is the same as in your example.
Note that this will only work when all your files are in the same directory!
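If "previous month" means the previous calendar month rather than the last 30 days, a sketch along these lines may be closer to what was asked. It assumes GNU find (for -newermt) and GNU date (for -d); the function name prev_month_files is hypothetical.

```shell
#!/usr/bin/env bash
# prev_month_files is a hypothetical helper: print, NUL-delimited, the *.pcap
# files directly inside directory $1 whose modification time falls within the
# previous calendar month.
prev_month_files() {
    local dir=$1 this_start prev_start
    this_start=$(date +%Y-%m-01)                               # first day of this month
    prev_start=$(date -d "$this_start 1 month ago" +%Y-%m-%d)  # first day of last month
    find "$dir" -maxdepth 1 -type f -name '*.pcap' \
        -newermt "$prev_start" ! -newermt "$this_start" -print0
}

# Example: count last month's files (hypothetical directory)
#   count=$(prev_month_files /mnt/md0/capture/dcn | tr -cd '\0' | wc -c)
```

Counting the NUL bytes instead of lines keeps the count correct even if a file name contains a newline.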
I'm trying to write a function that will traverse the file directory and give me the value of the deepest directory. I've written the function and it seems like it is going to each directory, but my counter doesn't seem to work at all.
dir_depth() {
    local olddir=$PWD
    local dir
    local counter=0
    cd "$1"
    for dir in *
    do
        if [ -d "$dir" ]
        then
            dir_depth "$1/$dir"
            echo "$dir"
            counter=$(( $counter + 1 ))
        fi
    done
    cd "$olddir"
}
What I want it to do is feed the function a directory, say /home, and it'll go down each subdirectory within and find the deepest value. I'm trying to learn recursion better, but I'm not sure what I'm doing wrong.
Obviously find should be used for this
find . -type d -exec bash -c 'echo $(tr -cd / <<< "$1"|wc -c):$1' -- {} \; | sort -n | tail -n 1 | awk -F: '{print $1, $2}'
At the end I use awk to just print the output, but if that were the output you wanted it would be better just to echo it that way to begin with.
Not that it helps learn about recursion, of course.
Here's a one–liner that's pretty fast:
find . -type d -printf '%d:%p\n' | sort -n | tail -1
Or as a function:
max_depth() {
    find "$1" -type d -printf '%d:%p\n' | sort -n | tail -1
}
Here is a version that seems to work:
dir_depth() {
    cd "$1"
    maxdepth=0
    for d in */.; do
        [ -d "$d" ] || continue
        depth=`dir_depth "$d"`
        maxdepth=$(($depth > $maxdepth ? $depth : $maxdepth))
    done
    echo $((1 + $maxdepth))
}
dir_depth "$@"
Just a few small changes to your script. I've added several explanatory comments:
dir_depth() {
    # don't need olddir and counter needs to be "global"
    local dir
    cd -- "$1" # the -- protects against dirnames that start with -
    # do this out here because we're counting depth not visits
    ((counter++))
    for dir in *
    do
        if [ -d "$dir" ]
        then
            # we want to descend from where we are rather than where we started from
            dir_depth "$dir"
        fi
    done
    if ((counter > max))
    then
        max=$counter # these are what we're after
        maxdir=$PWD
    fi
    ((counter--)) # decrement and test to see if we're back where we started
    if (( counter == 0 ))
    then
        echo $max $maxdir # ta da!
        unset counter # ready for the next run
    else
        cd .. # go up one level instead of "olddir"
    fi
}
It prints the max depth (including the starting directory as 1) and the first directory name that it finds at that depth. You can change the test if ((counter > max)) to >= and it will print the last directory name it finds at that depth.
The AIX (6.1) find command seems to be quite limited (e.g. no -printf option). If you would like to list all directories up to a given depth, try this combination of find and dirname. Save the script code as maxdepth.ksh. Unlike the Linux find -maxdepth option, AIX find will not stop at the given maximum level, which results in a longer runtime, depending on the size/depth of the scanned directory:
# Param 1: maxdepth
# Param 2: Directoryname
max_depth=0
netxt_dir=$2
while [[ "$netxt_dir" != "/" ]] && [[ "$netxt_dir" != "." ]]; do
    max_depth=$(($max_depth + 1))
    netxt_dir=$(dirname $netxt_dir)
done
if [ $1 -lt $max_depth ]; then
    ret=1
else
    ret=0
    ls -d $2
fi
exit $ret
Sample call:
find /usr -type d -exec maxdepth.ksh 2 {} \;
The traditional way to do this is to have dir_depth return the maximum depth too. So you'll return both the name and depth.
You can't return an array, struct, or object in bash, so you can return e.g. a comma-separated string instead.
dir_depth() {
    local dir
    local max_dir="$1"
    local max_depth=0
    for dir in "$1"/*
    do
        if [ -d "$dir" ]
        then
            cur_ret=$(dir_depth "$dir")
            cur_depth=$(expr "$cur_ret" : '\([^,]*\)')
            cur_dir=$(expr "$cur_ret" : '.*,\(.*\)')
            if [[ "$cur_depth" -gt "$max_depth" ]]; then
                max_depth=$cur_depth
                max_dir=$cur_dir
            fi
        fi
    done
    max_depth=$(($max_depth + 1))
    echo "$max_depth,$max_dir"
}
EDIT: Fixed now. It starts with the directory you passed in as level 1, then counts upwards. I removed the cd, as it isn't necessary. Note that this will fail if filenames contain commas.
You might want to consider using a programming language with more built-in data structures, like Python.