Rename files of the past two days with their creation date as a prefix and scp them - linux

I am stuck with commands in Linux where the logic is as follows: get the list of files from yesterday and the day before yesterday with their creation dates, rename them, and transfer them via scp.
The file name should be like <<<YYYYMMDD_creation_date_of_that_file>>>_filename.
For example, getting the output:
ls -lrth --time-style=+"%Y%m%d" | egrep "$(date -d '1 day ago' '+%Y%m%d')|$(date -d '2 days ago' '+%Y%m%d')" > tmp.txt
-rw-rw-rw- 1 user user 418K 20211225 log.897.gz
-rw-rw-rw- 1 user user 419K 20211225 log.898.gz
-rw-rw-rw- 1 user user 419K 20211225 log.899.gz
Renaming: this should rename each file with its creation date as a prefix:
for f in $(awk '{print $7}' tmp.txt); do cp "$f" "$(date -d @"$(stat -c %Y "$f")" +%Y%m%d)_$f"; done
Transferring all these files in one go:
for i in list_of_file_from_above; do scp "$i" user@server.com:/target/folder; done
I am stuck somewhere in the loop.
Also, every time I have to enter a password for scp.
Please help.
A "tar" option can also be considered.

It seems like you have two questions that are critical blockers.
How do I filter files by date?
How can I use ssh without passwords?
For the first question, use find. There is a nice clue here but that doesn't quite get you to the finish line. Note that the upper bound is exclusive, so that is why I set it to the current date rather than yesterday.
find . -type f -newermt $(date +%Y%m%d --date "2 days ago") \! -newermt $(date +%Y%m%d)
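To tie that back to the renaming step, here is a sketch of the whole rename loop built on find's output (assuming GNU coreutils; date -r prints a file's modification time, since Linux does not generally record creation time):
find . -maxdepth 1 -type f -newermt $(date +%Y%m%d --date "2 days ago") \! -newermt $(date +%Y%m%d) |
while IFS= read -r f; do
    # prefix each copy with the file's modification date, e.g. 20211225_log.897.gz
    cp "$f" "$(date -r "$f" +%Y%m%d)_$(basename "$f")"
done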
For the second question, you will want to use SSH keys. The documentation for ssh-keygen has good examples and is easy to read. In brief, you will:
Create a public/private key pair;
Copy the public key to the server (the destination machine for your ssh command);
and specify the location of the private key in the scp command with the -i option.
Tip: since you want to use the key in an automated application, do not set a passphrase on the key. This is a risk if your key is compromised, so be smart about it.
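A minimal sketch of those three steps, assuming the user@server.com destination from the question and a hypothetical key path ~/.ssh/backup_key:
# generate a key pair with no passphrase (hypothetical path)
ssh-keygen -t ed25519 -N "" -f ~/.ssh/backup_key
# install the public key on the destination server
ssh-copy-id -i ~/.ssh/backup_key.pub user@server.com
# now scp runs without a password prompt
scp -i ~/.ssh/backup_key 20211225_log.897.gz user@server.com:/target/folder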

Related

Bash command to archive files daily based on date added

I have a suite of scripts that involve downloading files from a remote server and then parsing them. Each night, I would like to create an archive of the files downloaded that day.
Some constraints are:
Downloading from a Windows server to an Ubuntu server.
Inability to delete files on the remote server.
Require the date added to the local directory, not the date the file was created.
I have deduplication running at the downloading stage; however, the check (using ncftp) involves comparing the remote and local directories. A strategy is to create a new folder each day, download files into it and then tar it sometime after midnight. A problem arises in that the first scheduled download on the new day will grab ALL files on the remote server because the new local folder is empty.
Because of the constraints, I considered simply archiving files based on "date added" to a central folder. This works very well using a Mac because HFS+ stores extended metadata such as date created and date added. So I can combine a tar command with something like below:
mdls -name kMDItemFSName -name kMDItemDateAdded -raw *.xml | \
xargs -0 -I {} echo {} | \
sed 'N;s/\n/ /' | \
but there doesn't seem to be an analogue under Linux (at least not with ext4, as far as I am aware).
I am open to any form of solution to get around doubling up files into a subsequent day. The end result should be an archives directory full of tar.gz files looking something like:
files_$(date +"%Y-%m-%d").tar.gz
Depending on the method used to back up the files, the modified or changed date should reflect the time they were copied. For example, if you used cp -p to back them up, the modified date would not change but the changed date would reflect the time of the copy.
You can get this information using the stat command:
stat <filename>
which will return the following (along with other file related info not shown):
Access: 2016-05-28 20:35:03.153214170 -0400
Modify: 2016-05-28 20:34:59.456122913 -0400
Change: 2016-05-29 01:39:52.070336376 -0400
This output is from a file that I copied using cp -p at the time shown as 'change'.
You can get just the change time by calling stat with a specified format:
stat -c '%z' <filename>
2016-05-29 01:39:56.037433640 -0400
or with capital Z for that time in seconds since epoch. You could combine that with the date command to pull out just the date (or use grep, etc)
date -d "`stat -c '%z' <filename>`" -I
2016-05-29
The command find can be used to find files by time frame, in this case using the flags -cmin ('changed' minutes), -mmin ('modified' minutes), or (less likely) -amin ('accessed' minutes). The sequence of commands to get the minutes since midnight is a little ugly, but it works.
We have to pass find an argument of "minutes since a file was last changed" (or modified, if that criteria works). So first you have to calculate the minutes since midnight, then run find.
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
Unrolling that a bit:
$(date +%s) == seconds since epoch until 'now'
"(date -I) 0" == todays date in format "YYYY-MM-DD 0" with 0 indicating 0 seconds into the day
$(date -d "(date -I 0" +%s)) == seconds from epoch until today at midnight
Then we (effectively) echo ( $now - $midnight ) / 60 to bc to convert the results into minutes.
The find call is passed the minutes since midnight, with a leading '-' indicating up to X minutes ago. A '+' would indicate X minutes or more ago.
find /path/to/base/folder -cmin -"$min_since_mid"
The actual answer
Finally to create a tgz archive of files in the given directory (and subdirectories) that have been changed since midnight today, use these two commands:
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
find /path/to/base/folder -cmin -"${min_since_mid:-0}" -exec tar czvf /path/to/new/tarball.tgz {} +
The -exec ... {} + form passes the filenames to tar directly as arguments, which avoids issues with spaces in names, among other things.
The only thing I'm not sure of is whether you should use the changed time (-cmin), the modified time (-mmin) or the accessed time (-amin). Take a look at your backup files and see which field accurately reflects the date/time of the backup; I would think changed time, but I'm not certain.
Update: changed -"$min_since_mid" to -"${min_since_mid:-0}" so that if min_since_mid isn't set you won't error out with invalid argument - you just won't get any results. You could also surround the find with an if statement to block the call if that variable isn't set properly.
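As an aside, GNU find can also take a date string directly via the -newerXY tests, which sidesteps the minutes arithmetic entirely. A sketch, assuming GNU find and date:
# files whose change time is after midnight today (use -newermt for modification time)
find /path/to/base/folder -newerct "$(date -I)"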

shell - faster alternative to "find"

I'm writing a shell script which should output the oldest file in a directory.
This directory is on a remote server and (worst case) has between 1000 and 1500 (temporary) files in it. I have no access to the server and no influence on how the files are stored. The server is connected through a stable but not very fast line.
The result of my script is passed to a monitoring system which in turn alerts the staff if there are too many (= unprocessed) files in the directory.
Unfortunately the monitoring system only allows a maximum execution time of 30 seconds for my script before a timeout occurs.
This wasn't a problem when testing with small directories; testing with the target directory over the remote-mounted directory (approx. 1000 files), it is.
So I'm looking for the fastest way to get things like "the oldest / newest / largest / smallest" file in a directory (not recursive) without using 'find' or sorting the output of 'ls'.
Currently I'm using this statement in my sh script:
old)
# return oldest file (age in seconds)
oldest=`find $2 -maxdepth 1 -type f | xargs ls -tr | head -1`
timestamp=`stat -f %B $oldest`
curdate=`date +%s`
echo `expr $(($curdate-$timestamp))`
;;
and I tried this one:
gfind /livedrive/669/iwt.save -type f -printf "%T# %P\n" | sort -nr | tail -1 | cut -d' ' -f 2-
which are two of many variants one can find using Google.
Additional information:
I'm writing this on a FreeBSD box with sh and bash installed. I have full access to the box and can install programs if needed. For reference: gfind is the GNU "find" utility as known from Linux, since FreeBSD ships a different "find" by default.
Any help is appreciated.
With kind regards,
dura-zell
For the oldest/newest file issue, you can use the -t option to ls, which sorts the output by time modified.
-t Sort by descending time modified (most recently modified first).
If two files have the same modification timestamp, sort their
names in ascending lexicographical order. The -r option reverses
both of these sort orders.
For the size issue, you can use -S to sort files by size.
-S Sort by size (largest file first) before sorting the operands in
lexicographical order.
Notice that for both cases, -r will reverse the order of the output.
-r Reverse the order of the sort.
Those options are available on FreeBSD and Linux, and are pretty common in most implementations of ls.
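For example, a sketch of the oldest-file lookup using ls, with the directory path taken from the question (this assumes filenames without embedded newlines, uses modification time to match -t, and does not distinguish files from directories):
dir=/livedrive/669/iwt.save
oldest=$(ls -tr "$dir" | head -1)                        # -tr puts the oldest entry first
echo $(( $(date +%s) - $(stat -f %m "$dir/$oldest") ))   # age in seconds (BSD stat)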
Let us know if it's fast enough.
In general, you shouldn't be parsing the output of ls. In this case, it's just acting as a wrapper around stat anyway, so you may as well just call stat on each file, and use sort to get the oldest.
old) now=$(date +%s)
read name timestamp < <(stat -f "%N %B" "$2"/* | sort -k2,2n)
echo $(( $now - $timestamp ))
The above is concise, but doesn't distinguish between regular files and directories in the glob. If that is necessary, stick with find, but use a different form of -exec to minimize the number of calls to stat:
old ) now=$(date +%s)
read name timestamp < <(find "$2" -maxdepth 1 -type f -exec stat -f "%N %B" '{}' + | sort -k2,2n)
echo $(( $now - $timestamp ))
(Neither approach works if a filename contains a newline, although since you aren't using the filename in your example anyway, you can avoid that problem by dropping %N from the format and just sorting the timestamps numerically. For example:
read timestamp < <(stat -f %B "$2"/* | sort -n)
# or
read timestamp < <(find "$2" -maxdepth 1 -type f -exec stat -f %B '{}' + | sort -n)
)
Can you try creating a shell script that resides on the remote host and, when executed, provides the required output? Then from your local machine just use ssh or something like that to run it. That way the script runs locally on the remote side. Just a thought :-)
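A sketch of that idea, assuming shell access on the server and a hypothetical check_oldest.sh already in place there:
# only the result travels over the slow link; the directory scan happens server-side
ssh user@remotehost 'sh /path/to/check_oldest.sh /livedrive/669/iwt.save'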

Getting the Canonical Time Zone name in shell script

Is there a way of getting the canonical time zone name from a Linux shell script? For example, if my configured time zone is PDT, then I would like to get "America/Los_Angeles".
I know I could get it from the symbolic link /etc/localtime if it were configured, but as it might not be configured on all servers, I cannot rely on that.
On the other hand, I can get the short time zone name with the command date +%Z, but I still need the canonical name.
Is there a way to either get the canonical name of the current time zone or transform the time zone gotten with the date +%Z command, even if the symbolic link /etc/localtime is not set?
This is more complicated than it sounds. Most Linux distributions do it differently, so there is no 100% reliable way to get the Olson TZ name.
Below is the heuristic that I have used in the past:
First check /etc/timezone; if it exists, use it.
Next check whether /etc/localtime is a symlink into the time zone database.
Otherwise find a file in /usr/share/zoneinfo with the same content as the file /etc/localtime.
Untested example code:
if [ -f /etc/timezone ]; then
OLSONTZ=`cat /etc/timezone`
elif [ -h /etc/localtime ]; then
OLSONTZ=`readlink /etc/localtime | sed "s/\/usr\/share\/zoneinfo\///"`
else
checksum=`md5sum /etc/localtime | cut -d' ' -f1`
OLSONTZ=`find /usr/share/zoneinfo/ -type f -exec md5sum {} \; | grep "^$checksum" | sed "s/.*\/usr\/share\/zoneinfo\///" | head -n 1`
fi
echo $OLSONTZ
Note that this quick example does not handle the case where multiple TZ names match the given file (when looking in /usr/share/zoneinfo). Disambiguating the appropriate TZ name will depend on your application.
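On systemd-based distributions, timedatectl is another route worth trying before the heuristic above. A sketch (the show verb requires a reasonably recent systemd):
# prints the Olson name directly, e.g. America/Los_Angeles
timedatectl show -p Timezone --value
# older systemd versions: parse the status output instead
timedatectl | awk -F': +' '/Time zone/ {print $2}' | cut -d' ' -f1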
-nick

Pruning old backups in several steps

I am looking for a way to thin out old backups. The backups are run on a daily basis, and I want to increase the interval as the backups become older.
After a couple of days I'd like to remove the daily backups, leaving only the "Sunday" backup. After a couple of weeks, only the first available backup of each month should be kept.
Since I am dealing with historic backups, I cannot just change the naming scheme.
I tried to use 'find' for it, but couldn't find the right options.
Anyone got anything that might help?
I know it is historical data, but you might prefer coming up with a naming scheme to assist with this problem. It might be far easier to tackle the problem in two passes: first, rename the directories based on the date; then, select the directories to keep going forward.
You could make a quick approximation, if all the directory dates in ls -l output look good enough:
ls -l | awk '{print "mv " $8 " " $6;}' > /tmp/runme
Look at /tmp/runme, and if it looks good, you can run it with sh /tmp/runme. You might wish to prune the entries or something like that, up to you.
If all the backups are stored in directories named, e.g:
2011-01-01/
2011-01-02/
2011-01-03/
...
2011-02-01/
2011-02-02/
...
2011-03-07/
then your problem is reduced to computing the names to keep and delete. That is much easier to solve than searching through all your files and trying to select which ones to keep and delete based on when they were made. (See the output of date "+%Y-%m-%d" for a quick way to generate this sort of name.)
Once they are named conveniently, you can keep the first backup of every month with a script like this:
for y in `seq 2008 2010`
do for m in `seq -w 1 12`
do for d in `seq -w 2 31`
do echo "rm $y-$m-$d"
done
done
done
Save its output, inspect it :) and then run the output, similar to the rename script.
Once you've got the past backups under control, you can generate the 2010 with date --date="last year" "+%Y", and make other improvements so it handles "one a week" for the current month and maintains itself going forward.
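For instance, a sketch of making the year range dynamic with GNU date:
# prune everything except the 1st of each month, up to the end of last year
last_year=`date --date="last year" +%Y`
for y in `seq 2008 $last_year`
do for m in `seq -w 1 12`
do for d in `seq -w 2 31`
do echo "rm $y-$m-$d"
done
done
done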
I've developed a solution for my similar needs on top of @ajreal's starting point. My backups are named like "backup-2015-06-01T01:00:01" (using date "+%Y-%m-%dT%H:%M:%S").
Two simple steps: touch the files to keep, using a shell glob pattern that matches the first of each month, then use find and xargs to delete anything more than 30 days old.
cd $BACKUPS_DIR
# touch backups from the first of each month
touch *-01T*
# delete backups more than 30 days old
echo "Deleting these backups:"
find -maxdepth 1 -mtime +30
find -maxdepth 1 -mtime +30 -print0 | xargs -0 rm -r
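To run those steps automatically each night, a crontab sketch (assuming the commands above are saved to a hypothetical /usr/local/bin/prune_backups.sh):
# m h dom mon dow command: prune backups at 02:30 every night
30 2 * * * /usr/local/bin/prune_backups.sh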
Yup, for example:
find -type f -mtime +30
Details:
http://www.gnu.org/software/findutils/manual/html_mono/find.html#Age-Ranges

how do I check in bash whether a file was created more than x time ago?

I want to check in Linux bash whether a file was created more than x time ago.
Let's say the file is called text.txt and the time is 2 hours.
if [ what? ]
then
echo "old enough"
fi
Only for modification time:
if test `find "text.txt" -mmin +120`
then
echo old enough
fi
You can use -cmin for change time or -amin for access time. As others have pointed out, I don't think you can track creation time.
I always liked using date -r /the/file +%s to get its modification time.
You can also do touch --date '2015-10-10 9:55' /tmp/file to get extremely fine-grained time on an arbitrary date/time.
Using stat to figure out the last modification date of the file, date to figure out the current time, and a liberal use of bashisms, one can do the test that you want based on the file's last modification time.
if [ "$(( $(date +"%s") - $(stat -c "%Y" "$somefile") ))" -gt "7200" ]; then
echo "'$somefile' is older than 2 hours"
fi
While the code is a bit less readable than the find approach, I think it's a better approach than running find to look at a file you already "found". Also, date manipulation is fun ;-)
As Phil correctly noted, creation time is not recorded, but you can use %Z instead of %Y above to get the "change time", which may be what you want.
[Update]
For Mac users, use stat -f "%m" "$somefile" instead of the Linux-specific syntax above.
Creation time isn't stored.
What are stored are three timestamps (generally, they can be turned off on certain filesystems or by certain filesystem options):
Last access time
Last modification time
Last change time
A "change" to the file covers permission changes, renames, etc., while a modification is to the contents only.
Although ctime isn't technically the time of creation, it quite often is.
Since ctime isn't affected by changes to the contents of the file, it's usually only updated when the file is created. And yes - I can hear you all screaming - it's also updated if you change the access permissions or ownership... but generally that's something that's done once, usually at the same time you put the file there.
Personally I always use mtime for everything, and I imagine that is what you want. But anyway... here's a rehash of Guss's "unattractive" bash, in an easy-to-use function.
#!/bin/bash
function age() {
local filename=$1
local changed=`stat -c %Y "$filename"`
local now=`date +%s`
local elapsed
let elapsed=now-changed
echo $elapsed
}
file="/"
echo The age of $file is $(age "$file") seconds.
The find approach is good, but I think you can do it another way, especially if you need to know how many seconds old the file is:
date -d "now - $( stat -c "%Y" "$filename" ) seconds" +%s
using GNU date
Consider the output of the tool 'stat':
File: `infolog.txt'
Size: 694 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 11635578 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ fdr) Gid: ( 1000/ fdr)
Access: 2009-01-01 22:04:15.000000000 -0800
Modify: 2009-01-01 22:05:05.000000000 -0800
Change: 2009-01-01 22:05:05.000000000 -0800
You can see here the three dates for access/modify/change. There is no created date. You can only really be sure when the file contents were modified (the "modify" field) or its inode changed (the "change" field).
Examples of when both fields get updated:
"Modify" will be updated if someone concatenated extra information to
the end of the file.
"Change" will be updated if someone changed permissions via chmod.
I use
file_age() {
local filename=$1
echo $(( $(date +%s) - $(date -r "$filename" +%s) ))
}
is_stale() {
local filename=$1
local max_minutes=20
[ $(file_age "$filename") -gt $(( $max_minutes*60 )) ]
}
if is_stale /my/file; then
...
fi
