Bash - Get files for last 12 hours / sophisticated name format - linux

I have a set of logs which have the names as follows:
SystemOut_15.07.20_23.00.00.log SystemOut_15.07.21_10.27.17.log
SystemOut_15.07.21_16.48.29.log SystemOut_15.07.22_15.57.46.log
SystemOut_15.07.22_13.03.46.log
From that list I need to get only the files from the last 12 hours.
So as output I would receive:
SystemOut_15.07.22_15.57.46.log SystemOut_15.07.22_13.03.46.log
I had a similar issue with files named as below, but was able to resolve it quickly since the date comes in an easier format:
servicemix.log.2015-07-21-11 servicemix.log.2015-07-22-12
servicemix.log.2015-07-22-13
So I created a variable called 'day':
day=$(date -d '-12 hour' +%F-%H)
And used the below command to get the files for the last 12 hours:
ls | awk -F. -v x=$day '$3 >= x'
Can you help me do the same with the SystemOut files? Their name syntax, with the underscores, confuses me.

Assuming the date-time in the log file's name is in the format
YY.MM.DD_HH24.MI.SS,
day=$(date -d '-12 hour' +%Y.%m.%d_%H.%M.%S.log)
Prepend the century to the 2-digit year in the log file name and then compare:
ls | awk -F_ -v x=$day '"20"$2"_"$3 >= x'
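Putting the two lines together, and restricting the listing to the matching files (a minimal sketch, assuming the names follow the SystemOut_YY.MM.DD_HH.MM.SS.log pattern shown above):
day=$(date -d '-12 hour' +%Y.%m.%d_%H.%M.%S.log)
ls SystemOut_*.log | awk -F_ -v x="$day" '"20"$2"_"$3 >= x'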
Alternatively, as Ed Morton suggested, find can be used like so:
find . -type f -name '*.log' -cmin -720
This returns the log files created within the last 720 minutes. To be precise, it matches files whose status was last changed within the past 720 minutes; the -mmin option can be used to search by modification time instead.
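For example, to select on modification time instead over the same 12-hour window (a minimal variant of the above):
find . -type f -name '*.log' -mmin -720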

Related

How to find a last updated file with the prefix name in bash?

How can I find the last updated file with a specific prefix in bash?
For example, I have three files, and I just want to see the file whose name starts with "abc" and has the latest Last_UpdatedDateTime (i.e., sorted by Last_UpdatedDateTime descending).
fileName Last_UpdatedDateTime
abc123 7/8/2020 10:34am
abc456 7/6/2020 10:34am
def123 7/8/2020 10:34am
You can list files sorted in the order they were modified with ls -t:
-t sort by modification time, newest first
You can use globbing (abc*) to match all files starting with abc.
Since you will get more than one match and only want the newest (that is first):
head -1
Combined:
ls -t abc* | head -1
If there are a lot of these files scattered across a variety of directories, find might be better.
find -name abc\* -printf "%T# %f\n" |sort -nr|sed 's/^.* //; q;'
Breaking that out -
find -name 'abc*' -printf "%T# %f\n" |
find has a ton of options. This is the simplest case, assuming the current directory as the root of the search. You can add a lot of refinements, or just give / to search the whole system.
-name 'abc*' picks just the filenames you want. Quote it to protect any globs, but you can use normal globbing rules. -iname makes the search case-insensitive.
-printf defines the output. %f prints the filename, but you want it ordered on the date, so print that first for sorting so the filename itself doesn't change the order. %T accepts another character to define the date format - # is the unix epoch, seconds since 00:00:00 01/01/1970, so it is easy to sort numerically. On my git bash emulation it returns fractions as well, so it's great granularity.
$: find -name abc\* -printf "%T# %f\n"
1594219755.7741618000 abc123
1594219775.5162510000 abc321
1594219734.0162554000 abc456
find may not return them in the order you want, though, so -
sort -nr |
-n makes it a numeric sort. -r sorts in reverse order, so that the latest file will pop out first and you can ignore everything after that.
sed 's/^.* //; q;'
Since the first record is the one we want, sed can just use s/^.* //; to strip off everything up to the space, which we know will be the timestamp numbers since we controlled the output explicitly. That leaves only the filename. q explicitly quits after the s/// scrubs the record, so sed spits out the filename and stops without reading the rest, which prevents the need for another process (head -1) in the pipeline.

Bash command to archive files daily based on date added

I have a suite of scripts that involve downloading files from a remote server and then parsing them. Each night, I would like to create an archive of the files downloaded that day.
Some constraints are:
Downloading from a Windows server to an Ubuntu server.
Inability to delete files on the remote server.
Require the date added to the local directory, not the date the file was created.
I have deduplication running at the downloading stage; however (using ncftp), the check involves comparing the remote and local directories. One strategy is to create a new folder each day, download files into it, and then tar it sometime after midnight. A problem arises in that the first scheduled download of the new day will grab ALL files on the remote server, because the new local folder is empty.
Because of the constraints, I considered simply archiving files based on "date added" to a central folder. This works very well using a Mac because HFS+ stores extended metadata such as date created and date added. So I can combine a tar command with something like below:
mdls -name kMDItemFSName -name kMDItemDateAdded -raw *.xml | \
xargs -0 -I {} echo {} | \
sed 'N;s/\n/ /' | \
but there doesn't seem to be an analogue under Linux (at least not with ext4, as far as I am aware).
I am open to any form of solution to get around doubling up files into a subsequent day. The end result should be an archives directory full of tar.gz files looking something like:
files_$(date +"%Y-%m-%d").tar.gz
Depending on the method that is used to back up the files, the modified or changed date should reflect the time it was copied - for example, if you used cp -p to back them up, the modified date would not change but the changed date would reflect the time of copy.
You can get this information using the stat command:
stat <filename>
which will return the following (along with other file related info not shown):
Access: 2016-05-28 20:35:03.153214170 -0400
Modify: 2016-05-28 20:34:59.456122913 -0400
Change: 2016-05-29 01:39:52.070336376 -0400
This output is from a file that I copied using cp -p at the time shown as 'change'.
You can get just the change time by calling stat with a specified format:
stat -c '%z' <filename>
2016-05-29 01:39:56.037433640 -0400
or with capital Z for that time in seconds since epoch. You could combine that with the date command to pull out just the date (or use grep, etc)
date -d "$(stat -c '%z' <filename>)" -I
2016-05-29
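Or, using the capital-Z (seconds since epoch) form mentioned above, an equivalent sketch:
date -d "@$(stat -c '%Z' <filename>)" -I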
The command find can be used to find files by time frame, in this case using the flags -cmin 'changed minutes', -mmin 'modified minutes', or, less likely, -amin 'accessed minutes'. The sequence of commands to get the minutes since midnight is a little ugly, but it works.
We have to pass find an argument of "minutes since a file was last changed" (or modified, if that criteria works). So first you have to calculate the minutes since midnight, then run find.
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
Unrolling that a bit:
$(date +%s) == seconds since epoch until 'now'
"(date -I) 0" == todays date in format "YYYY-MM-DD 0" with 0 indicating 0 seconds into the day
$(date -d "(date -I 0" +%s)) == seconds from epoch until today at midnight
Then we (effectively) echo ( $now - $midnight ) / 60 to bc to convert the results into minutes.
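The same value can be computed with shell arithmetic alone, without bc (an equivalent sketch):
min_since_mid=$(( ( $(date +%s) - $(date -d "$(date -I) 0" +%s) ) / 60 ))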
The find call is passed the minutes since midnight with a leading '-' indicating up to X minutes ago. A '+' would indicate X minutes or more ago.
find /path/to/base/folder -cmin -"$min_since_mid"
The actual answer
Finally to create a tgz archive of files in the given directory (and subdirectories) that have been changed since midnight today, use these two commands:
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
find /path/to/base/folder -cmin -"${min_since_mid:-0}" -print0 | tar -czvf /path/to/new/tarball.tgz --null -T -
The -print0 argument to find tells it to delimit the file names with a null byte, and tar's --null -T - reads that null-delimited list from standard input, which prevents issues with spaces in names, among other things.
The only thing I'm not sure about is whether you should use the changed time (-cmin), the modified time (-mmin), or the accessed time (-amin). Take a look at your backup files and see which field accurately reflects the date/time of the backup - I would think changed time, but I'm not certain.
Update: changed -"$min_since_mid" to -"${min_since_mid:-0}" so that if min_since_mid isn't set you won't error out with invalid argument - you just won't get any results. You could also surround the find with an if statement to block the call if that variable isn't set properly.
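A sketch of that guard, reusing the same command and paths as above:
if [ -n "$min_since_mid" ]; then
    find /path/to/base/folder -cmin -"$min_since_mid" -print0 | tar -czvf /path/to/new/tarball.tgz --null -T -
else
    echo "min_since_mid is not set; skipping archive" >&2
fi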

grep/touch -mtime check log within last XX minutes

I want to run a query on a Linux machine. I want to check a log file, but only within the last 15 minutes, assuming there were changes and additions to the log file. What is the correct query?
grep 'test condition' /home/somelogpath/file.log -mtime 15
thanks.
To check only files modified in the last day (24 hours):
find /path/to/logfiles -mtime -1 -print
So you could:
find /path/to/logfiles -mtime -1 -print | xargs grep "some condition" /dev/null
(Note: I add a file (/dev/null) so that grep will always have at least 2 files and thus will be forced to display the name(s) of the matching files. As it's /dev/null, it won't match anything you grep for, so you don't run the risk of seeing "/dev/null" in the output of grep. But it serves its purpose: grep will prefix any match with the filename, even if only 1 file matched.)
For minutes, etc., please check your:
man find
(it depends on your OS, etc.) (and I don't have a Linux box at hand right now)
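On GNU find, for the 15-minute window from the question, -mmin works the same way (a minimal sketch using the same paths as above):
find /path/to/logfiles -mmin -15 -print | xargs grep "some condition" /dev/null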
If you meant "only match the last X seconds in a log file": use awk, and put a condition on the line. It depends on the log file format: if it uses "seconds since startup" or "epoch", you can simply test the relevant field to be >= some value. If it uses "2014-05-15 hh:mm" you can also find ways to do it, but it will be cumbersome.
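As a rough sketch of that awk approach, assuming (hypothetically) that the first field of each line is an epoch timestamp:
awk -v cutoff="$(date -d '-15 min' +%s)" '$1 >= cutoff' /home/somelogpath/file.log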

Clearing archive files with linux bash script

Here is my problem:
I have a folder where multiple files are stored with a specific name format:
Name_of_file.TypeMM-DD-YYYY-HH:MM
where MM-DD-YYYY-HH:MM is the time of its creation. There could be multiple files with the same name but not the same time of course.
What I want is a script that can keep the 3 newest versions of each file.
So, I found one example there:
Deleting oldest files with shell
But I don't want to delete a fixed number of files; I want to keep a certain number of the newer ones. Is there a way to get that find command to parse out the Name_of_file and keep the 3 newest?
Here is the code I've tried so far, but it's not exactly what I need.
find /the/folder -type f -name 'Name_of_file.Type*' -mtime +3 -delete
Thanks for help!
So I decided to add my final solution in case anyone would like it. It's a combination of the 2 solutions given.
ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}" | awk 'NR > 3' | xargs rm
One line, super efficient. If anything changes in the date or name pattern, just change the grep -P pattern to match it. This way you can be sure that only the files fitting this pattern will get deleted.
Can you be extra, extra sure that the timestamp on the file is the exact same timestamp on the file name? If they're off a bit, do you care?
The ls command can sort files by timestamp order. You could do something like this:
$ ls -t | awk 'NR > 3' | xargs rm
The ls -t lists the files by modification time, with the newest first.
The awk 'NR > 3' prints out the list of files except for the first three lines, which are the three newest.
The xargs rm will remove the files that are older than the first three.
Now, this isn't the exact solution. There are possible problems with xargs because file names might contain weird characters or whitespace. If you can guarantee that's not the case, this should be okay.
Also, you probably want to group the files by name, and keep the last three. Hmm...
ls | sed -E 's/[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}$//' | sort -u | while read -r file
do
ls -t "$file"* | awk 'NR > 3' | xargs rm
done
The ls will list all of the files in the directory. The sed expression will remove the MM-DD-YYYY-HH:MM date-time stamp from the end of each file name. The sort -u will make sure you only have the unique file names. Thus
file1.txt-01-12-1950
file2.txt-02-12-1978
file2.txt-03-12-1991
Will be reduced to just:
file1.txt
file2.txt
These are fed through the loop, and the ls -t "$file"* will list all of the files that start with that file name, pipe the list to awk, which drops the newest three from the list, and pipe the rest to xargs rm, which deletes all but the newest three.
Assuming we're using the date in the filename to date the archive file, and that it is possible to change the date format to YYYY-MM-DD-HH:MM (as established in comments above), here's a quick and dirty shell script to keep the newest 3 versions of each file within the present working directory:
#!/bin/bash
KEEP=3 # number of versions to keep
while read FNAME; do
NODATE=${FNAME:0:-16} # get filename without the date (remove last 16 chars)
if [ "$NODATE" != "$LASTSEEN" ]; then # new file found
FOUND=1; LASTSEEN="$NODATE"
else # same file, different date
let FOUND="FOUND + 1"
if [ $FOUND -gt $KEEP ]; then
echo "- Deleting older file: $FNAME"
rm "$FNAME"
fi
fi
done < <(\ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}")
Example run:
[me#home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2011-12-12-12:11
some_file.exe2012-01-11-23:11
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
[me#home]$ ./delete_old.sh
- Deleting older file: some_file.exe2012-01-11-23:11
- Deleting older file: some_file.exe2011-12-12-12:11
[me#home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
Essentially, by changing the date in the file name to the form YYYY-MM-DD-HH:MM, a normal string sort (such as that done by ls) will automatically group similar files together, sorted by date-time.
The ls -r on the last line simply lists all files within the current working directory and prints the results in reverse order, so newer archive files appear first.
We pass the output through grep to extract only files that are in the correct format.
The output of that command combination is then looped through (see the while loop) and we can simply start deleting after 3 occurrences of the same filename (minus the date portion).
This pipeline will get you the 3 newest files (by modification time) in the current dir
stat -c $'%Y\t%n' file* | sort -n | tail -3 | cut -f 2-
To get all but the 3 newest:
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2-
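If the goal is to delete everything but the 3 newest, that second list can be fed to rm (a sketch; assumes GNU xargs for -d, and that no names contain newlines):
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2- | xargs -d '\n' rm --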

How to view last created file?

I have uploaded a file to a Linux computer, but I do not know its name. So how can I view files by their last created date attribute?
ls -lat
will show a list of all files sorted by date. When listing with the -l flag, adding the -t flag sorts by modification date. If you only need the filename (for a script maybe) then try something like:
ls -lat | head -2 | tail -1 | awk '{print $9}'
This will list all files as before, get the first 2 rows (the first one will be something like 'total 260'), then get the last one (the one which shows the details of the file), and then get the 9th column, which contains the filename.
find / -cmin -5
Will print the files created in the last five minutes. Increase the period one minute at a time to find your file.
Assuming you know the folder where you'll be searching, the easiest solution is:
ls -t | head -1
# use -A in case the file can start with a dot
ls -tA | head -1
ls -t will sort by time, newest first (from ls --help itself)
head -1 will only keep 1 line at the top of anything
Use ls -lUt or ls -lUtr, as you wish (note that with BSD/macOS ls, -U uses the file creation time for sorting, while GNU ls treats -U as "do not sort"). You can take a look at the ls command documentation by typing man ls in a terminal.
