Bash command to archive files daily based on date added - linux

I have a suite of scripts that involve downloading files from a remote server and then parsing them. Each night, I would like to create an archive of the files downloaded that day.
Some constraints are:
Downloading from a Windows server to an Ubuntu server.
Inability to delete files on the remote server.
Require the date added to the local directory, not the date the file was created.
I have deduplication running at the downloading stage; however (using ncftp), the check involves comparing the remote and local directories. One strategy is to create a new folder each day, download files into it, and then tar it sometime after midnight. A problem arises in that the first scheduled download on the new day will grab ALL files on the remote server, because the new local folder is empty.
Because of the constraints, I considered simply archiving files based on "date added" to a central folder. This works very well using a Mac because HFS+ stores extended metadata such as date created and date added. So I can combine a tar command with something like below:
mdls -name kMDItemFSName -name kMDItemDateAdded -raw *.xml | \
xargs -0 -I {} echo {} | \
sed 'N;s/\n/ /'
but there doesn't seem to be an analogue under Linux (at least not on ext4, as far as I am aware).
I am open to any form of solution to get around doubling up files into a subsequent day. The end result should be an archives directory full of tar.gz files looking something like:
files_$(date +"%Y-%m-%d").tar.gz

Depending on the method used to back up the files, the modified or changed date should reflect the time it was copied. For example, if you used cp -p to back them up, the modified date would not change, but the changed date would reflect the time of the copy.
You can get this information using the stat command:
stat <filename>
which will return the following (along with other file related info not shown):
Access: 2016-05-28 20:35:03.153214170 -0400
Modify: 2016-05-28 20:34:59.456122913 -0400
Change: 2016-05-29 01:39:52.070336376 -0400
This output is from a file that I copied using cp -p at the time shown as 'change'.
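To reproduce this yourself, here is a quick demonstration (the file names are just illustrative):
touch source.txt
sleep 2
cp -p source.txt backup.txt
stat -c 'Modify: %y' backup.txt   # same mtime as source.txt
stat -c 'Change: %z' backup.txt   # reflects the time of the copy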
You can get just the change time by calling stat with a specified format:
stat -c '%z' <filename>
2016-05-29 01:39:56.037433640 -0400
or with a capital Z for that time in seconds since the epoch. You could combine that with the date command to pull out just the date (or use grep, etc.):
date -d "$(stat -c '%z' <filename>)" -I
2016-05-29
The find command can be used to find files by time frame, in this case using the flags -cmin ('changed' minutes), -mmin ('modified' minutes), or, less likely, -amin ('accessed' minutes). The sequence of commands to get the minutes since midnight is a little ugly, but it works.
We have to pass find an argument of "minutes since a file was last changed" (or modified, if that criteria works). So first you have to calculate the minutes since midnight, then run find.
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
Unrolling that a bit:
$(date +%s) == seconds since epoch until 'now'
"(date -I) 0" == todays date in format "YYYY-MM-DD 0" with 0 indicating 0 seconds into the day
$(date -d "(date -I 0" +%s)) == seconds from epoch until today at midnight
Then we (effectively) echo ( $now - $midnight ) / 60 to bc to convert the result into minutes.
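As an aside, the pipe through bc isn't strictly needed; bash's arithmetic expansion handles the integer division itself. A minimal equivalent:
min_since_mid=$(( ( $(date +%s) - $(date -d "$(date -I) 0" +%s) ) / 60 ))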
The find call is passed the minutes since midnight with a leading '-', indicating up to X minutes ago. A '+' would indicate X minutes or more ago.
find /path/to/base/folder -cmin -"$min_since_mid"
The actual answer
Finally to create a tgz archive of files in the given directory (and subdirectories) that have been changed since midnight today, use these two commands:
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
find /path/to/base/folder -type f -cmin -"${min_since_mid:-0}" -print0 | tar czvf /path/to/new/tarball.tgz --null -T -
The -print0 action tells find to delimit the file names with null bytes, and tar's --null -T - reads that null-delimited list from standard input, which prevents issues with spaces (and even newlines) in names. Piping into a single tar invocation also avoids the trap of -exec tar ... +, which can split a long file list across several tar runs, each overwriting the previous archive.
The only thing I'm not sure about is whether you should use the changed time (-cmin), the modified time (-mmin), or the accessed time (-amin). Take a look at your backup files and see which field accurately reflects the date/time of the backup. I would think changed time, but I'm not certain.
Update: changed -"$min_since_mid" to -"${min_since_mid:-0}" so that if min_since_mid isn't set, you won't error out with an invalid argument; you just won't get any results. You could also surround the find with an if statement to block the call if that variable isn't set properly.
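Putting it together with the archive naming from the question, a nightly job might look like the sketch below (the paths are placeholders; it assumes GNU find, date, and tar, and is meant to run shortly before midnight):
#!/bin/bash
# Archive files changed since midnight into files_YYYY-MM-DD.tar.gz
min_since_mid=$(( ( $(date +%s) - $(date -d "$(date -I) 0" +%s) ) / 60 ))
find /path/to/base/folder -type f -cmin -"${min_since_mid:-0}" -print0 |
    tar czvf "/path/to/archives/files_$(date +%Y-%m-%d).tar.gz" --null -T -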

Related

Rename files of the past two days with their file creation time as suffix and scp the same

I am stuck with commands in Linux where the logic is as below: get the list of files from yesterday and the day before yesterday with their creation dates, rename them, and transfer them via scp.
file name should be like <<<YYYYMMDD_creation_date_of_that_filename>>>_filename.
for e.g.
getting output
ls -lrth --time-style=+"%Y%m%d" | egrep "$(date -d '1 day ago' +%Y%m%d)|$(date -d '2 days ago' +%Y%m%d)" > tmp.txt
-rw-rw-rw- 1 user user 418K 20211225 log.897.gz
-rw-rw-rw- 1 user user 419K 20211225 log.898.gz
-rw-rw-rw- 1 user user 419K 20211225 log.899.gz
Renaming: this will rename with the creation time of that same file as a suffix:
for f in tmp.txt|awk '{print$7}'; do cp "$f" "$(stat -c %Y "$f" | date +%Y%m%d)_$f"; done
Transfer all these files in one go:
for i in list_of_file_from_above; do scp "$i" user@server.com:/target/folder; done
I am stuck somewhere in the loop.
Also, every time I have to enter a password for scp.
Please help.
"tar" option can also be considered.
It seems like you have two questions that are critical blockers.
How do I filter files by date?
How can I use ssh without passwords?
For the first question, use find. There is a nice clue here, but that doesn't quite get you to the finish line. Note that the upper bound is exclusive, which is why I set it to the current date rather than yesterday.
find . -type f -newermt $(date +%Y%m%d --date "2 days ago") \! -newermt $(date +%Y%m%d)
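If it helps, those results can be fed straight into the rename-and-copy step with a null-delimited loop. This is only a sketch: the target directory is a placeholder, and it uses the modification time via GNU stat, since Linux generally doesn't track creation time:
find . -type f -newermt "$(date +%Y%m%d --date '2 days ago')" \! -newermt "$(date +%Y%m%d)" -print0 |
while IFS= read -r -d '' f; do
    ts=$(date -d @"$(stat -c %Y "$f")" +%Y%m%d)      # mtime as YYYYMMDD
    cp "$f" "/tmp/outgoing/${ts}_$(basename "$f")"   # date-prefixed copy
done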
For the second question, you will want to use SSH keys. The documentation for ssh-keygen has good examples and is easy to read. In brief, you will:
Create a public/private key pair;
Copy the public key to the server (the destination machine for your ssh command);
and specify the location of the private key in the scp command with the -i option.
Tip: since you want to use the key in an automated application, do not enter a key password. This is a risk if your key is compromised, so be smart about it.
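A minimal sketch of those three steps (the key path and hostname are placeholders):
ssh-keygen -t ed25519 -f ~/.ssh/transfer_key -N ""        # empty passphrase, per the tip above
ssh-copy-id -i ~/.ssh/transfer_key.pub user@server.com    # install the public key on the server
scp -i ~/.ssh/transfer_key somefile user@server.com:/target/folder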

shell - faster alternative to "find"

I'm writing a shell script which should output the oldest file in a directory.
This directory is on a remote server and has (worst case) between 1000 and 1500 (temporary) files in it. I have no access to the server and no influence on how the files are stored. The server is connected through a stable but not very fast line.
The result of my script is passed to a monitoring system which in turn alerts the staff if there are too many (= unprocessed) files in the directory.
Unfortunately, the monitoring system only allows a maximum execution time of 30 seconds for my script before a timeout occurs.
This wasn't a problem when testing with small directories, but with the target directory over the remote-mounted connection (approx. 1000 files) it is.
So I'm looking for the fastest way to get things like "the oldest / newest / largest / smallest" file in a directory (not recursive) without using 'find' or sorting the output of 'ls'.
Currently I'm using this statement in my sh script:
old)
# return oldest file (age in seconds)
oldest=`find $2 -maxdepth 1 -type f | xargs ls -tr | head -1`
timestamp=`stat -f %B $oldest`
curdate=`date +%s`
echo `expr $(($curdate-$timestamp))`
;;
and I tried this one:
gfind /livedrive/669/iwt.save -type f -printf "%T# %P\n" | sort -nr | tail -1 | cut -d' ' -f 2-
which are two of the many variants of statements one can find using Google.
Additional information:
I'm writing this on a FreeBSD box with sh and bash installed. I have full access to the box and can install programs if needed. For reference: gfind is the GNU "find" utility as known from Linux, since FreeBSD ships a different "find" by default.
Any help is appreciated.
With kind regards,
dura-zell
For the oldest/newest file issue, you can use the -t option of ls, which sorts the output by time modified.
-t Sort by descending time modified (most recently modified first).
If two files have the same modification timestamp, sort their
names in ascending lexicographical order. The -r option reverses
both of these sort orders.
For the size issue, you can use -S to sort file by size.
-S Sort by size (largest file first) before sorting the operands in
lexicographical order.
Notice that for both cases, -r will reverse the order of the output.
-r Reverse the order of the sort.
Those options are available on FreeBSD and Linux; and must be pretty common in most implementations of ls.
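For example, to grab just the oldest and the largest entry, where $dir is your target directory (a sketch; like any parsing of ls output, it assumes file names without embedded newlines):
oldest=$(ls -t "$dir" | tail -n 1)     # last in newest-first order
largest=$(ls -S "$dir" | head -n 1)    # first in largest-first order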
Let us know if it's fast enough.
In general, you shouldn't be parsing the output of ls. In this case, it's just acting as a wrapper around stat anyway, so you may as well just call stat on each file, and use sort to get the oldest.
old) now=$(date +%s)
read name timestamp < <(stat -f "%N %B" "$2"/* | sort -k2,2n)
echo $(( $now - $timestamp ))
The above is concise, but doesn't distinguish between regular files and directories in the glob. If that is necessary, stick with find, but use a different form of -exec to minimize the number of calls to stat:
old ) now=$(date +%s)
read name timestamp < <(find "$2" -maxdepth 1 -type f -exec stat -f "%N %B" '{}' + | sort -k2,2n)
echo $(( $now - $timestamp ))
(Neither approach works if a filename contains a newline, although since you aren't using the filename in your example anyway, you can avoid that problem by dropping %N from the format and just sorting the timestamps numerically. For example:
read timestamp < <(stat -f %B "$2"/* | sort -n)
# or
read timestamp < <(find "$2" -maxdepth 1 -type f -exec stat -f %B '{}' + | sort -n)
)
Can you try creating a shell script that resides on the remote host and, when executed, provides the required output? Then from your local machine just use ssh or something like that to run it, so the script executes locally on the server. Just a thought :-)
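For instance, if such a script lived on the server as /usr/local/bin/oldest.sh (a hypothetical path and user), the monitoring side would only need:
ssh monitor@remotehost /usr/local/bin/oldest.sh /livedrive/669/iwt.save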

Bash - Get files for last 12 hours / sophisticated name format

I have a set of logs which have the names as follows:
SystemOut_15.07.20_23.00.00.log SystemOut_15.07.21_10.27.17.log
SystemOut_15.07.21_16.48.29.log SystemOut_15.07.22_15.57.46.log
SystemOut_15.07.22_13.03.46.log
From that list I need to get only the files from the last 12 hours.
So as an output I will receive:
SystemOut_15.07.22_15.57.46.log SystemOut_15.07.22_13.03.46.log
I had a similar issue with files having the names below, but was able to resolve it quickly as the date comes in an easy format:
servicemix.log.2015-07-21-11 servicemix.log.2015-07-22-12
servicemix.log.2015-07-22-13
So I created a variable called 'day':
day=$(date -d '-12 hour' +%F-%H)
And used below command to get the files for last 12 hours:
ls | awk -F. -v x=$day '$3 >= x'
Can you help me do the same for the SystemOut files? Their name syntax contains an underscore, which confuses me.
Assuming the date-time in the log file's name is in the format
YY.MM.DD_HH24.MI.SS,
day=$(date -d '-12 hour' +%Y.%m.%d_%H.%M.%S.log)
Prepend the century to the 2-digit year in the log file name and then compare:
ls | awk -F_ -v x=$day '"20"$2"_"$3 >= x'
Alternatively, as Ed Morton suggested, find can be used like so:
find . -type f -name '*.log' -cmin -720
This returns the log files changed within the last 720 minutes. To be precise, -cmin matches on when the file's status was last changed; the -mmin option can be used to search by modification time instead.
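If you would rather not hard-code the 720, the minute count can be derived in the shell:
find . -type f -name '*.log' -mmin -"$((12 * 60))"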

Get mtime of specific file using Bash?

I am well aware of being able to do find myfile.txt -mtime +5 to check if my file is older than 5 days or not. However I would like to fetch mtime in days of myfile.txt and store it into a variable for further usage. How would I do that?
stat can give you that info:
filemtime=$(stat -c %Y myfile.txt)
%Y gives you the last modification as "seconds since the Epoch", but there are lots of other options; see the stat man page for more. So if the file was modified on 2011-01-22 at 15:30 GMT, the above would return a number in the region of 1295710237.
Edit: Ah, you want the time in days since it was modified. That's going to be more complicated, not least because a "day" is not a fixed period of time (some "days" have only 23 hours, others 25 — thanks to daylight savings time).
The naive version might look like this:
filemtime=$(stat -c %Y "$1")
currtime=$(date +%s)
diff=$(( (currtime - filemtime) / 86400 ))
echo $diff
...but again, that's assuming a day is always exactly 86,400 seconds long.
More about arithmetic in bash here.
The date utility has a convenient switch for extracting the mtime from a file, which you can then display or store using a format string.
date -r file "+%F"
# 2021-01-12
file_mtime=$(date -r file "+%F")
See man date; the output of date is controlled by a format string beginning with "+".
Useful format strings for comparing many dates might include:
"+%j": day of year
"+%s": unix epoch time
Arithmetic with dates is a bit of a pain in bash, so if you need relative time that will work in all corner cases, you may be better off with another language.
AGE=$(perl -e 'print -M $ARGV[0]' $file)
will set $AGE to the age of $file in days, as Perl's -M operator handles the stat call and the conversion to days for you.
The return value is a floating-point value (e.g., 6.62849537 days). Add int to the expression if you need an integer result:
AGE=$(perl -e 'print int -M $ARGV[0]' $file)
Ruby and Python also have their one-liners to stat a file and return some data, but I believe Perl has the most concise way.
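For comparison, a roughly equivalent Python one-liner (my sketch, not from the original answer):
AGE=$(python3 -c 'import os, sys, time; print((time.time() - os.stat(sys.argv[1]).st_mtime) / 86400)' "$file")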
Is this the answer?
A=$(stat -c "%y" myfile.txt)
Look at the stat help:
stat --help
Usage: stat [OPTION]... FILE...
Display file or file system status.
[...]
-c --format=FORMAT use the specified FORMAT instead of the default;
output a newline after each use of FORMAT
[...]
The valid format sequences for files
[...]
%y Time of last modification, human-readable
%Y Time of last modification, seconds since Epoch
[...]

how do I check in bash whether a file was created more than x time ago?

I want to check in Linux bash whether a file was created more than x time ago.
Let's say the file is called text.txt and the time is 2 hours.
if [ what? ]
then
echo "old enough"
fi
Only for modification time
if test `find "text.txt" -mmin +120`
then
echo old enough
fi
You can use -cmin for change time or -amin for access time. As others have pointed out, I don't think you can track creation time.
I always liked using date -r /the/file +%s to get its modification time for working out its age.
You can also do touch --date '2015-10-10 9:55' /tmp/file to get extremely fine-grained time on an arbitrary date/time.
Using stat to figure out the last modification date of the file, date to figure out the current time, and a liberal use of bashisms, one can do the test you want based on the file's last modification time.
if [ "$(( $(date +"%s") - $(stat -c "%Y" "$somefile") ))" -gt "7200" ]; then
echo "'$somefile' is older then 2 hours"
fi
While the code is a bit less readable than the find approach, I think it's a better approach than running find to look at a file you already "found". Also, date manipulation is fun ;-)
As Phil correctly noted, creation time is not recorded, but use %Z instead of %Y above to get the "change time", which may be what you want.
[Update]
For mac users, use stat -f "%m" "$somefile" instead of the Linux specific syntax above
Creation time isn't stored.
What are stored are three timestamps (generally, they can be turned off on certain filesystems or by certain filesystem options):
Last access time
Last modification time
Last change time
a "Change" to the file is counted as permission changes, rename etc. While the modification is contents only.
Although ctime isn't technically the time of creation, it quite often is.
For files that are written once and then left alone - which is very common - ctime still reflects when the file was put there. And yes - I can hear you all screaming - it's also updated if you later modify the contents, or change the access permissions or ownership... but generally that's something done once, usually at the same time you put the file there.
Personally I always use mtime for everything, and I imagine that is what you want. But anyway... here's a rehash of Guss's "unattractive" bash, in an easy to use function.
#!/bin/bash
function age() {
local filename=$1
local changed=`stat -c %Y "$filename"`
local now=`date +%s`
local elapsed
let elapsed=now-changed
echo $elapsed
}
file="/"
echo The age of $file is $(age "$file") seconds.
The find one is good, but I think you can use another way, especially if you need to know how many seconds old the file is:
date -d "now - $( stat -c "%Y" $filename ) seconds" +%s
using GNU date
Consider the output of the stat tool:
File: `infolog.txt'
Size: 694 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 11635578 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ fdr) Gid: ( 1000/ fdr)
Access: 2009-01-01 22:04:15.000000000 -0800
Modify: 2009-01-01 22:05:05.000000000 -0800
Change: 2009-01-01 22:05:05.000000000 -0800
You can see here the three dates for access/modify/change. There is no created date. You can only really be sure when the file contents were modified (the "modify" field) or its inode changed (the "change" field).
Examples of when both fields get updated:
"Modify" will be updated if someone concatenated extra information to
the end of the file.
"Change" will be updated if someone changed permissions via chmod.
I use
file_age() {
local filename=$1
echo $(( $(date +%s) - $(date -r $filename +%s) ))
}
is_stale() {
local filename=$1
local max_minutes=20
[ $(file_age $filename) -gt $(( $max_minutes*60 )) ]
}
if is_stale /my/file; then
...
fi
