sort files from directory based on unique time stamp and group them - linux

I want to get the list of files from a directory and group them in an array or variable based on their unique time stamp (the month and day columns shown by ls -ltr) using bash. The time stamp spans 2-3 columns of that output.
Any suggestions?

This is a one-liner; I don't know if it is exactly what you are asking for:
array=($(ls -ltr | awk -v x=9 '{print $x}'))
It creates an array holding the filenames from the output of ls -ltr.
To print the contents of the array:
printf "%s\n" "${array[@]}"
But it is also worth reading "Why you shouldn't parse the output of ls(1)".
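If you want to avoid parsing ls entirely, here is a minimal sketch that groups files by their modification date instead, using a glob plus GNU stat (the associative array needs bash 4+, filenames containing newlines are not handled, and the per-day grouping is an assumption about what "unique time stamp" means; adapt the granularity as needed):
declare -A groups
for f in *; do
    # %y prints "YYYY-MM-DD HH:MM:SS..."; keep only the date part
    day=$(stat -c '%y' -- "$f" | cut -d' ' -f1)
    groups[$day]+="$f"$'\n'
done
# Print each unique date followed by the files that share it
# (iteration order of the keys is arbitrary; sort them if you need order)
for day in "${!groups[@]}"; do
    printf '== %s ==\n%s' "$day" "${groups[$day]}"
done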

Related

How to sort by name then date modification in BASH

Let's say I have a folder of .txt files that have a dd-MM-yyyy_HH-mm-ss time stamp followed by _name.txt. I want to be able to sort by name first, then by time. Example:
BEFORE
15-2-2010_10-01-55_greg.txt
10-2-1999_10-01-55_greg.txt
10-2-1999_10-01-55_jason.txt
AFTER
greg_1_10-2-1999_10-01-55
greg_2_15-2-2010_10-01-55
jason_1_10-2-1999_10-01-55
Edit: Apologies; as the cp line shows, I meant to copy them into another directory under a different name.
Something I tried is making a copy with a count appended, but it doesn't sort files that share a name correctly by date:
cd data/unfilteredNames
for filename in *.txt; do
    n=${filename%.*}
    n=${filename##*_}
    filteredName=${n%.*}
    count=0
    find . -type f -name "*_$n" | while read name; do
        count=$(($count+1))
        cp -p $name ../filteredNames/"$filteredName"_"$count"
    done
done
I'm not sure that renaming the files is actually part of your requirement; if you only want to sort the file names, you don't need to rename anything.
You can do this by only using GNU sort command:
sort -t- -k5.4 -k3.1,3.4 -k2.1,2.1 -k1.1,1.2 -k3.6,3.13 <(printf "%s\n" *.txt)
-t sets the field separator to a dash -.
-k selects the sort key fields. As explained in the sort man page, the syntax is -k<start>,<stop>, where <start> and <stop> are each of the form <field number>.<character position>. Adding several -k options to the command sorts on multiple keys; the first one on the command line takes precedence over those that follow.
For example, the first key, -k5.4, sorts on the 5th field starting at character 4. There is no stop position because this key runs to the end of the filename.
The -k3.1,3.4 option sorts on the 3rd field, from character 1 through character 4.
The same principle applies to other -k options.
In your example the month field has only 1 digit. If you also have files where the month has 2 digits, you may want to zero-pad the single-digit months before sorting. This can be done by filtering the printf output through sed, i.e. <(printf "%s\n" *.txt | sed 's/-\([0-9]\)-/-0\1-/'), and changing -k2.1,2.1 to -k2.1,2.2.
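If you also want the renamed copies shown in the question, here is a hedged sketch built on top of that sort: it keeps a per-name counter in an associative array (bash 4+) and copies into ../filteredNames, which is assumed to already exist; adjust the paths and extension to your layout:
declare -A count
while IFS= read -r f; do
    base=${f%.txt}              # drop the extension
    name=${base##*_}            # text after the last underscore, e.g. greg
    stamp=${base%_"$name"}      # the dd-MM-yyyy_HH-mm-ss part
    count[$name]=$(( ${count[$name]:-0} + 1 ))
    cp -p -- "$f" "../filteredNames/${name}_${count[$name]}_${stamp}"
done < <(sort -t- -k5.4 -k3.1,3.4 -k2.1,2.1 -k1.1,1.2 -k3.6,3.13 <(printf "%s\n" *.txt))
Because the sort groups each name together before ordering by date, the counter restarts naturally for every new name.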

Search for multiple strings (from a file) in another file and list the missing strings

I have a file with 200 student names and another, huge file that contains data for those 200 students. I want to make sure that none of the student names got missed. I'm looking for a script that takes each string from students.txt, searches for it in alldata.txt, and lists the ones that are missing.
I tried using
find /tmp/alldata.txt -type f -exec grep -iHFf students.txt {} +
But that lists all the matches and fails to provide the list of strings that were not found in alldata.txt.
You don't need find if you're just searching one file. But if your data file is unstructured text and the names can appear anywhere, you may need to look for them one at a time:
while read -r name; do
    fgrep -q "$name" alldata.txt || echo "$name"
done < students.txt
Assuming the student names are in the first field of alldata.txt:
comm -23 <(sort students.txt) <(awk '{print $1}' alldata.txt | sort -u)
comm -23 prints all the lines that are in the first file but not in the second file. This uses process substitution to treat the output of the two commands as files.
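For example, with hypothetical inputs where students.txt lists alice, bob and carol but alldata.txt only has rows for alice and carol, only the missing name is printed:
$ comm -23 <(sort students.txt) <(awk '{print $1}' alldata.txt | sort -u)
bob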

Shell Script to check the folder created date against the current date/time

I am preparing a shell script: if the folder's created date is equal to the current date, it needs to call another script.
The script should check only the folder's created date against the current date/time, not the date/time of the files inside the folder.
Thanks in advance
You can have a look at the stat command and the change time of the folder. The change time gives the last date when the folder's metadata was changed (see this stackexchange link); timestamps of permission changes are also included, so this may not be exactly what you need.
For the current time, you can use the date command. You can compare the two timestamps if you print both as seconds since the epoch.
stat --format="%Z" /path/to/folder
date +%s
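Putting the two together, a minimal sketch that compares only the date portion and then calls the other script (GNU stat and date assumed; /path/to/folder and /path/to/other_script.sh are placeholders):
folder_day=$(date -d "@$(stat --format='%Z' /path/to/folder)" +%F)  # change time as YYYY-MM-DD
today=$(date +%F)
if [ "$folder_day" = "$today" ]; then
    /path/to/other_script.sh  # hypothetical script to call
fi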
Suppose TestToday is your directory name.
Please replace TestToday with your directory name in the script below.
Test.sh
ls -lrt | grep ^d | grep TestToday | grep "`date +%b`" | awk -v var="`date +%d | bc`" '$7==var {print $NF}' > DIR_NAME
DIR_EXISTS=`cat DIR_NAME`
#echo $DIR_EXISTS
if [ "$DIR_EXISTS" == "TestToday" ];then
echo "Calling Shell Script Here."
else
echo "Directed not created or updated today."
fi

Clearing archive files with linux bash script

Here is my problem:
I have a folder where multiple files are stored with a specific format:
Name_of_file.TypeMM-DD-YYYY-HH:MM
where MM-DD-YYYY-HH:MM is the time of its creation. There can be multiple files with the same name, but of course not with the same time.
What I want is a script that keeps the 3 newest versions of each file.
So, I found one example there:
Deleting oldest files with shell
But I don't want to delete a fixed number of files; I want to keep a certain number of newer ones. Is there a way to get that find command to parse out the Name_of_file and keep the 3 newest?
Here is the code I've tried so far, but it's not exactly what I need.
find /the/folder -type f -name 'Name_of_file.Type*' -mtime +3 -delete
Thanks for help!
So I decided to add my final solution in case anyone would like to use it. It's a combination of the 2 solutions given.
ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}" | awk 'NR > 3' | xargs rm
One line, super efficient. If anything changes in the pattern of the date or name, just change the grep -P pattern to match it. This way you are sure that only the files matching this pattern will get deleted.
Can you be extra, extra sure that the timestamp on the file is the exact same timestamp on the file name? If they're off a bit, do you care?
The ls command can sort files by timestamp order. You could do something like this:
$ ls -t | awk 'NR > 3' | xargs rm
The ls -t lists the files by modification time, newest first.
The awk 'NR > 3' prints the list of files except for the first three lines, which are the three newest.
The xargs rm removes the files that are older than the first three.
Now, this isn't the exact solution. There are possible problems with xargs because file names might contain weird characters or whitespace. If you can guarantee that's not the case, this should be okay.
Also, you probably want to group the files by name, and keep the last three. Hmm...
ls | sed 's/MM-DD-YYYY-HH:MM*$//' | sort -u | while read file
do
    ls -t $file* | awk 'NR > 3' | xargs rm
done
The ls lists all of the files in the directory. The sed 's/MM-DD-YYYY-HH:MM*$//' removes the date-time stamp from the file names (substitute your real timestamp pattern here). The sort -u makes sure you only have the unique file names. Thus
file1.txt-01-12-1950
file2.txt-02-12-1978
file2.txt-03-12-1991
Will be reduced to just:
file1.txt
file2.txt
These are run through the loop, and ls -t $file* lists all of the files that start with that base name, newest first; awk drops the first three (the newest) from the list, and xargs rm deletes the remaining, older files, leaving only the three newest.
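A concrete version of that loop, under the assumption that the timestamp suffix always matches MM-DD-YYYY-HH:MM and that no filename contains whitespace (per the caveat above):
ls | sed -E 's/[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}$//' | sort -u |
while read -r file; do
    # newest first; keep the first three, remove the rest
    ls -t "$file"* | awk 'NR > 3' | xargs -r rm --
done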
Assuming we're using the date in the filename to date the archive file, and that it is possible to change the date format to YYYY-MM-DD-HH:MM (as established in comments above), here's a quick-and-dirty shell script to keep the newest 3 versions of each file within the present working directory:
#!/bin/bash
KEEP=3 # number of versions to keep
while read FNAME; do
    NODATE=${FNAME:0:-16} # get filename without the date (remove last 16 chars)
    if [ "$NODATE" != "$LASTSEEN" ]; then # new file found
        FOUND=1; LASTSEEN="$NODATE"
    else # same file, different date
        let FOUND="FOUND + 1"
        if [ $FOUND -gt $KEEP ]; then
            echo "- Deleting older file: $FNAME"
            rm "$FNAME"
        fi
    fi
done < <(\ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}")
Example run:
[me#home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2011-12-12-12:11
some_file.exe2012-01-11-23:11
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
[me#home]$ ./delete_old.sh
- Deleting older file: some_file.exe2012-01-11-23:11
- Deleting older file: some_file.exe2011-12-12-12:11
[me#home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
Essentially, by changing the dates in the file names to the form YYYY-MM-DD-HH:MM, a normal string sort (such as the one done by ls) automatically groups similar files together, sorted by date-time.
The ls -r on the last line simply lists all files within the current working directory and prints the results in reverse order, so newer archive files appear first.
We pass the output through grep to extract only files that are in the correct format.
The output of that command combination is then looped through (see the while loop) and we can simply start deleting after 3 occurrences of the same filename (minus the date portion).
This pipeline will get you the 3 newest files (by modification time) in the current dir
stat -c $'%Y\t%n' file* | sort -n | tail -3 | cut -f 2-
To get all but the 3 newest:
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2-
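To actually delete those older files, one hedged follow-up is to feed that list to xargs (GNU xargs assumed, and the filenames must not contain newlines):
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2- | xargs -d '\n' -r rm --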

How to view last created file?

I have uploaded a file to a Linux computer, but I do not know its name. How can I list files by their last created date attribute?
ls -lat
will show a list of all files sorted by date: with the -l long listing, the -t flag sorts by modification time, newest first. If you only need the filename (for a script, maybe), then try something like:
ls -lat | head -2 | tail -1 | awk '{print $9}'
This lists all files as before, takes the first 2 rows (the first one will be something like 'total 260'), then takes the last of those (the line describing the newest file), and finally prints the 9th column, which contains the filename.
find / -cmin -5
will print the files whose status changed (ctime) in the last five minutes. Increase the period a minute at a time until you find your file.
Assuming you know the folder you'll be searching in, the easiest solution is:
ls -t | head -1
# use -A in case the file can start with a dot
ls -tA | head -1
ls -t sorts by modification time, newest first (from ls --help itself)
head -1 keeps only the first line of whatever it receives
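In a script you can capture the same idea in a variable (hedged: this still breaks on filenames that contain newlines):
newest=$(ls -tA | head -n 1)
echo "Most recently modified file: $newest"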
Use ls -lUt or ls -lUtr, as you wish. You can take a look at the ls command documentation typing man ls on a terminal.
