Get filenames sorted by mtime - Linux

How can I get file names sorted by modification timestamp, descending?
I should add that file names may potentially contain any character except \0.
Here is what I have so far: a loop that reads each file name and its mtime, but the result is unsorted:
while IFS= read -r -d '' fname; do
    read -r -d '' mtime
done < <(find . -maxdepth 3 -printf '%p\0%T@\0')

If you reorder your find printf, it becomes easy to sort:
find . -maxdepth 3 -printf '%T@ :: %p\0' |
sort -zrn |
sed -z 's/^[0-9.]* :: //' |
xargs -0 -n1 echo
The sed and xargs lines are just examples of stripping out the mtime and then doing something with the filenames.
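If you would rather keep the names in the shell than hand them to xargs, here is a minimal sketch of the same pipeline (my addition, assuming bash 4.4+ for mapfile -d ''):
# Hedged sketch: load the sorted, NUL-delimited names into an array.
mapfile -d '' sorted < <(
    find . -maxdepth 3 -printf '%T@ :: %p\0' | sort -zrn | sed -z 's/^[0-9.]* :: //'
)
printf '%s\n' "${sorted[@]}"    # names containing newlines will still span lines here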

For files within the same folder, this will do:
$ ls -t
If you want to cross a whole tree, one of these will do, depending on your platform (GNU and BSD stat have different syntaxes). With GNU stat:
$ find . -type f -exec stat -c '%Y %n' {} \; | sort -nr | cut -d' ' -f2-
Or with BSD/macOS stat:
$ find . -type f -exec stat -f '%m %N' {} \; | sort -nr | cut -d' ' -f2-
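If you do not want to guess which variant you have, here is a small probe sketch (my assumption, not part of the original answer): it checks whether the GNU -c syntax is accepted and falls back to the BSD one.
# Probe which stat syntax this system supports, then use it.
if stat -c '%Y' . >/dev/null 2>&1; then
    fmt=(-c '%Y %n')    # GNU coreutils stat
else
    fmt=(-f '%m %N')    # BSD/macOS stat
fi
find . -type f -exec stat "${fmt[@]}" {} + | sort -nr | cut -d' ' -f2-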
I hope this helps.

Assuming that you want a list of files and timestamps, ordered by timestamp:
while IFS=: read -r mtime fname; do
    echo "mtime = [$mtime] / fname = [$fname]"
done < <(find . -printf '%T@:%f\n' | sort -t: -k1,1 -nr)
I've chosen : as the delimiter since it is quite a rare character in filenames, being even prohibited in DOS/NTFS.
With requirements as stringent as yours (filenames may contain : or \n), you can try:
while IFS= read -r -d '' mtime; do
    read -r -d '' fname
    echo "[$mtime][$fname]"
done < <(find . -maxdepth 3 -printf '%T@\0%p\0') | sort -nr
To also cope with newlines embedded in the filenames:
while IFS= read -r -d '' mtime; do
    read -r -d '' fname
    printf '[%s][%s]\0' "$mtime" "$fname"
done < <(find . -maxdepth 3 -printf '%T@\0%p\0') \
    | sort -nrz | tr '\0' '\n'

All you need is:
find . -maxdepth 3 -printf '%T@\t%p\0' | sort -zn
If you then want just the filenames, newline-terminated, pipe it to awk to strip the timestamp and tab and replace each NUL with a newline:
find . -maxdepth 3 -printf '%T@\t%p\0' | sort -zn | awk -v RS='\0' '{sub(/^[^\t]+\t/,"")}1'
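For completeness, a hedged sketch of consuming that NUL-terminated stream directly in a bash loop (adding -r to sort for the descending order the question asked for):
while IFS=$'\t' read -r -d '' mtime fname; do
    printf '%s\n' "$fname"    # fname may itself contain newlines
done < <(find . -maxdepth 3 -printf '%T@\t%p\0' | sort -znr)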

Related

Bash command to recursively find directories with the newest file older than 3 days

I was wondering if there was a single command which would recursively find the directories whose newest file is older than 3 days. Other solutions seem to only print the newest file across all subdirectories; I was wondering if there was a way to do it recursively and print all such subdirectories. I tried find -newermt "aug 27, 2022" -ls, but this only gets me directories that have files younger than the date specified, not the newest file for each directory.
A long one-liner to sort files by date, get unique directory names, and list each directory by modification time, keeping the first entry:
find ~/.config -type f -newermt "aug 29, 2022" -print0 | xargs -r0 ls -l --time-style=+%s | sort -r -k 6 | gawk '{ print $7}' | xargs -I {} dirname {} | sort | uniq | xargs -I {} bash -c "ls -lt --time-style=full-iso {} | head -n2" | grep -v 'total '
With comments:
find ~/.config -type f -newermt "aug 29, 2022" -print0 |
    xargs -r0 ls -l --time-style=+%s |
    sort -r -k 6 |      # newer files first, sorted by reverse date
    gawk '{ print $7 }' |
    xargs -I {} dirname {} |      # get directory names
    sort | uniq |      # get unique directory names
    xargs -I {} bash -c "ls -lt --time-style=full-iso {} | head -n2" |      # list each directory by time, keep first
    grep -v 'total '
If I'm understanding the requirements correctly, would you please try:
#!/bin/bash
# Traverse "dir" recursively, assigning each found directory name to "$d".
find dir -type d -print0 | while IFS= read -r -d "" d; do
    # If "$d" contains file(s) and none of them is newer than 3 days,
    if [[ -n $(find "$d" -maxdepth 1 -type f) \
       && -z $(find "$d" -maxdepth 1 -type f -mtime -3) ]]; then
        # then print the directory name "$d".
        echo "$d"
    fi
done
A one-liner version:
find dir -type d -print0 | while IFS= read -r -d "" d; do if [[ -n $(find "$d" -maxdepth 1 -type f) && -z $(find "$d" -maxdepth 1 -type f -mtime -3) ]]; then echo "$d"; fi; done
Please modify the top directory name dir according to your file location.

Bash: Compare alphanumeric string with lower and upper case

Given a directory with files with an alphanumeric name:
file45369985.xml
file45793220.xml
file0005461x.xml
Also, given a csv table with a list of files
file45369985.xml,file,45369985,.xml,https://www.tib.eu/de/suchen/id/FILE:45369985/Understanding-terrorism-challenges-perspectives?cHash=16d713678274dd2aa205fc07b2fc5b86
file0005461X.xml,file,0005461X,.xml,https://www.tib.eu/de/suchen/id/FILE:0005461X/The-reality-of-social-construction?cHash=5d8152fbbfae77357c1ec6f443f8c8a4
I would like to match all files in the csv table with the directory's content and move them somewhere else. However, I cannot switch off the case sensitivity in this command:
while read p; do
    data_set=$(echo "$p" | cut -f1 -d",")
    # do something else
done
How can the "X-Files" be correctly matched as well?
Given the format of the csv file (no quotes around the first field), I show an answer for filenames without newlines.
List all files in current directory
find . -maxdepth 1 -type f -printf "%f\n"
Look for one filename in that list (ignoring case)
grep -Fix file0005461X.xml <(find . -maxdepth 1 -type f -printf "%f\n")
Show first field only from file
cut -d"," -f1 csvfile
Pretend that the output is a file
<(cut -d"," -f1 csvfile)
Tell grep to use that "file" for strings to look for with option f
grep -Fixf <(cut -d"," -f1 csvfile) <(find . -maxdepth 1 -type f -printf "%f\n")
Move to /tmp
grep -Fixf <(cut -d"," -f1 csvfile) <(find . -maxdepth 1 -type f -printf "%f\n") |
    xargs -I{} mv "{}" /tmp
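A slight variant of that last step, sketched under the assumption of GNU mv and xargs: -t names the target directory once so mv runs in batches, and -r skips mv entirely when there are no matches.
grep -Fixf <(cut -d"," -f1 csvfile) <(find . -maxdepth 1 -type f -printf "%f\n") |
    xargs -r -d '\n' mv -t /tmp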
You can use join to perform an inner join between the CSV and the file list:
join -i -t, \
<(sort -t, -k1 list.csv) \
<(find given_dir -maxdepth 1 -mindepth 1 -type f -printf "%f\n" | sort) \
-o "2.1"
Explanation:
-i: perform a case-insensitive comparison for the join
-t,: use the comma as the field separator
<(sort -t, -k1 list.csv): sort the CSV file on its first field (comma-separated) and expose the output as a file via process substitution (see the Bash manual page)
<(find given_dir -maxdepth 1 -mindepth 1 -type f -printf "%f\n" | sort): list all the files stored at the top level of given_dir (not in its subdirectories), sort them, and expose the output the same way
-o "2.1": output the first column of the second input (the find output) in the join result
Note: this solution relies on GNU find for the -printf action
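To make the behaviour concrete, here is a small hypothetical demo reusing the mismatched-case name from the question:
join -i -t, \
    <(printf 'file0005461X.xml,file,0005461X,.xml\n') \
    <(printf 'file0005461x.xml\n') \
    -o "2.1"
# prints: file0005461x.xml   (the name as it exists in the directory)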
awk -F '[,.]' '{
    base = substr($1, 1, length($1)-1)                  # name minus its final character
    print base toupper(substr($1, length($1))) "." $2   # upper-case variant
    print base tolower(substr($1, length($1))) "." $2   # lower-case variant
}' csvfile | while read -r line
do
    find /path -name "$line" -exec mv '{}' /newpath \;
done
Use awk and set the field separator to . and ,. Take each line and generate both an uppercase and a lowercase X version of the file name.
Loop through this output and find the file in a given path. If the file exists, execute the move command to a given path.
You can use grep -i to make case-insensitive matches:
while read -r p; do
    data_set=$(echo "$p" | cut -f1 -d",")
    match=$(ls $your_dir | grep -i "^$data_set\$")
    if [ -n "$match" ]; then
        mv "$match" $another_dir
    fi
done
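If you would rather avoid parsing ls output, here is a hedged alternative sketch using bash's nocasematch option (variable names $your_dir, $another_dir, and csvfile kept from the answer above):
# nocasematch makes [[ == ]] compare case-insensitively.
shopt -s nocasematch
while IFS= read -r p; do
    data_set=${p%%,*}                  # first CSV field
    for f in "$your_dir"/*; do
        [[ ${f##*/} == "$data_set" ]] && mv -- "$f" "$another_dir"
    done
done < csvfile
shopt -u nocasematch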

Bash Script to find the Duplicate filenames in the same directory and send a Notification email

My aim is to find any duplicate file names by comparing all the file names (abc.xyz, def.csv) in the same directory. If there aren't any duplicate file names, then move all those files (.csv, .xlsx) from the mentioned file path into the archive path.
If there are duplicate file names, then fetch only those duplicates along with their modification timestamps, send a notification email to the team, and move the remaining non-duplicate files to the archive folder.
As you can see, I am trying to achieve this with the following code.
If the find command output is empty, the if condition should perform the 'mv' commands and exit the script entirely; if there are duplicate files, it should skip the if condition, pipe the duplicate file names onward, and perform the mail and date-stamp operations.
However, what the code actually does is send a notification email whether or not it finds any duplicate files.
If there are duplicate files, it sends an email with the duplicate file names and modification times; if there are no duplicate file names, it sends an email with a blank file name and the current time as the modified time.
Currently there are no files outside Archive (only files inside Archive, and all of those are unique and look good), so technically it shouldn't send any notification email.
{
    DATE=`date +"%Y-%m-%d"`
    dirname=/marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation
    tempfile=myTempfileName
    find $dirname -type f > $tempfile
    cat $tempfile | sed 's_.*/__' | sort | uniq -d |
    while read fileName
    do
        grep "$fileName" $tempfile
    done
}
if ["$fileName" == ""]; then
    mv /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/*.xlsx /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/Archive
    mv /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/*.csv /marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/Archive
    exit 1
fi | tee '/marketsource/scripts/tj_var.txt' | awk -F"/" '{print $NF}' | tee '/marketsource/scripts/tj_var.txt' | sort -u | tee '/marketsource/scripts/tj_mail.txt'
DATE=`date +"%Y-%m-%d"`
printf "%s\n" "$(</marketsource/scripts/tj_mail.txt)" | while IFS= read -r filename; do
    mtime=$(stat -c %y "/marketsource/SrcFiles/Target_Shellscript_Autodownload/Airtime_Activation/$filename")
    printf 'Duplicate Filename - %s Uploaded time - %s\n\n' "$filename" "$mtime"
done | mail -s "Duplicate file found ${DATE}" ti@gmail.com
"find any duplicate file names by comparing all the file names (abc.xyz, def.csv) in the same directory", with .xlsx and .csv extensions.
I'm assuming there are no whitespace characters in the filenames:
IFS=$'\n'
duplicates=($(
    find . -maxdepth 1 -type f '(' -name '*.xlsx' -o -name '*.csv' ')' \
        -exec bash -c 'printf "%s %s\n" "$1" "${1%.*}"' -- {} \; |
    sort -k2 |
    uniq -f1 -d |
    cut -d' ' -f2
))
# or simpler:
duplicates=($(
    find . -type f '(' -name '*.xlsx' -o -name '*.csv' ')' |
    sed 's/\.[^.]*$//' |
    sort |
    uniq -d
))
IFS=$' \t\n'
# if there aren't any duplicate file names then move all those files(.csv , .xlsx) in the mentioned file path into Archive path
if ((${#duplicates[@]} == 0)); then
find . -type f '(' -name '*.xlsx' -o -name '*.csv' ')' \
-exec mv -v {} "$the_archive_path" \;
# If there are duplicate filenames, then
else
# fetch the names of those duplicate filenames only with their modified date timestamp
duplicate_filenames_with_modified_date=$(
    {
        printf "%s.xlsx\n" "${duplicates[@]}"
        printf "%s.csv\n" "${duplicates[@]}"
    } |
    xargs -d$'\n' stat -c '%n %y\n'
)
# and send a notification email to the team and
mail the_team <<<"a notification email"
# move the remaining non-duplicate filenames to the archive folder.
find . -maxdepth 1 -type f '(' -name '*.xlsx' -o -name '*.csv' ')' \
    -exec bash -c 'echo "$1" "${1%.*}"' -- {} \; | tee /dev/stderr |
sort -k2 |
uniq -f1 -u |
cut -d' ' -f1 |
xargs -r -d$'\n' -I{} echo mv -v {} "$the_archive_folder"   # drop the echo to actually move
fi

How to count all the lines of files

I also need the directory name to be output as well. What I was able to do is output the total number of lines in all directories along with the directory name:
find . -name '*.c' | xargs wc -l | xargs -I{} dirname {} | xargs -I{} dirname {}
I have jumbled up a mixture of bash commands, mostly GNU-specific; make sure you have them: GNU grep and GNU Awk.
find . -type f -print0 | xargs -0 grep -c ';$' | \
awk -F":" '$NF>0{cmd="dirname "$1; while ( ( cmd | getline result ) > 0 ) {printf "%s\t%s\n",result,$2} close(cmd) }'
The idea is that grep -c returns the pattern count in the format file-name:count, which I pass to GNU Awk to filter those files whose count is greater than zero and print the directory of the file containing them along with the count itself.
As a fancy one-liner as they call it these days,
find . -type f -print0 | xargs -0 grep -c ';$' | awk -F":" '$NF>0{cmd="dirname "$1; while ( ( cmd | getline result ) > 0 ) {printf "%s\t%s\n",result,$2} close(cmd) }'
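A hedged alternative sketch (still assuming GNU awk, and the same caveat about colons in paths) that derives the directory inside awk instead of shelling out to dirname for every file:
find . -type f -print0 | xargs -0 grep -c ';$' |
    awk -F':' '$NF > 0 { d = $1; sub(/\/[^\/]*$/, "", d); printf "%s\t%s\n", d, $NF }'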
Here is a script:
#!/usr/bin/env bash
for dir in */; do (
    cd "$dir"
    count=$(find . -name '*.c' -print0 | xargs -0 -r grep '[;]$' | wc -l)
    echo -e "${count}\t${dir}"
) done
If you want numbers for each sub-directory:
#!/usr/bin/env bash
for dir in $(find . -type d); do (
    cd "$dir"
    count=$(find . -maxdepth 1 -name '*.c' -print0 | \
        xargs -0 -r grep '[;]$' | wc -l)
    echo -e "${count}\t${dir}"
) done
Using -maxdepth 1 makes sure the calculation is only done in the current directory, not its sub-directories. So each file is counted once.

Finding the number of files in a directory for all directories in pwd

I am trying to list all directories and place its number of files next to it.
I can find the total number of files with ls -lR | grep .*.mp3 | wc -l. But how can I get an output like this:
dir1 34
dir2 15
dir3 2
...
I don't mind writing to a text file or CSV to get this information if it's not possible to get it on screen.
Thank you all for any help on this.
This seems to work assuming you are in a directory where some subdirectories may contain mp3 files. It omits the top level directory. It will list the directories in order by largest number of contained mp3 files.
find . -mindepth 2 -name \*.mp3 -print0 | xargs -0 -n 1 dirname | sort | uniq -c | sort -r | awk '{print $2 "," $1}'
I updated this with print0 to handle filenames with spaces and other tricky characters and to print output suitable for CSV.
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c
Or, if the order (dir -> count instead of count -> dir) is really important to you:
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c | awk '{print $2" "$1}'
There are probably much better ways, but this seems to work.
Put this in a shell script:
#!/bin/sh
for f in *
do
    if [ -d "$f" ]
    then
        cd "$f"
        c=$(ls -l *.mp3 2>/dev/null | wc -l)
        if test "$c" -gt 0
        then
            echo "$f $c"
        fi
        cd ..
    fi
done
With Perl:
perl -MFile::Find -le'
find {
wanted => sub {
return unless /\.mp3$/i;
++$_{$File::Find::dir};
}
}, ".";
print "$_,$_{$_}" for
sort {
$_{$b} <=> $_{$a}
} keys %_;
'
Here's yet another way that even handles file names containing unusual (but legal) characters, such as newlines:
# count .mp3 files (using GNU find)
find . -xdev -type f -iname "*.mp3" -print0 | tr -dc '\0' | wc -c
# list directories with number of .mp3 files
find "$(pwd -P)" -xdev -depth -type d -exec bash -c '
    for (( i=1; i<=$#; i++ )); do
        d="${@:i:1}"
        mp3s="$(find "${d}" -xdev -type f -iname "*.mp3" -print0 | tr -dc "${0}" | wc -c )"
        [[ $mp3s -gt 0 ]] && printf "%s\n" "${d}, ${mp3s// /}"
    done
' "'\\0'" '{}' +
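Under the same assumptions (GNU find, sort, and uniq, which all support NUL-terminated records), a shorter sketch of the per-directory count:
# %h is each file's directory; count NUL-terminated groups with uniq -zc.
find . -xdev -type f -iname '*.mp3' -printf '%h\0' |
    sort -z | uniq -zc | tr '\0' '\n'   # tr only for display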
