rsync only files younger than xy days - linux

I am using rsync to copy photos from our satellite servers to the main server. The script doing it basically connects from PC to PC and executes rsync.
I have been trying to use find to determine files younger than xy days (it will always be days, but the number can vary), specifying the files with --files-from=<(). But the command find /var/dav/davserver/ -mtime -3 -type f -exec basename {} \; is very, very slow on some machines, and even makes rsync time out. These are also production servers, so running this command every few minutes would cost more processor power than I am willing to take away.
The second approach was to take advantage of the way we store those files, under the /var/dav/davserver/year/month/day/ directory structure. However, as I started to work on it, I realized I would need to write quite some code to take care of month and year boundaries, all the more because the number of days is not fixed (it can be more than 31 days, so the script might need to walk through several months).
So I was wondering: is there an easier way to achieve this without hammering the source PCs' processors or writing a whole new library to take care of all the month/year boundaries?
EDIT:
I have prepared a script that generates the paths to the files for me. What I did was leave the handling of month/year boundaries to date..
#!/bin/bash
# Usage: date_extract.sh <days> <server_id>
DATE_now=$(date +"%Y-%m-%d")
DATE_end=$(date -d "-$1 days" +"%Y-%m-%d")
echo "Date now: $DATE_now | Date end: $DATE_end"
start_d=$(date +%s)
end_d=$(date -d "-$1 days" +%s)
synced_day=$DATE_now
synced_day_s=$start_d
daycount=1
: > "/tmp/$2_paths"   # truncate the list; 'echo "" >' would leave a blank first line
while [ "$synced_day_s" -ge "$end_d" ]; do
    DAY=$(date -d "$synced_day" '+%d')
    MONTH=$(date -d "$synced_day" '+%m')
    YEAR=$(date -d "$synced_day" '+%Y')
    SYNC_DIR="/var/dav/davserver/$YEAR/$MONTH/$DAY/**"
    echo "Adding day ($synced_day) directory: \"$SYNC_DIR\" to synced paths | Day: $daycount"
    echo "$SYNC_DIR" >> "/tmp/$2_paths"
    synced_day=$(date -d "$synced_day -1 days" +"%Y-%m-%d")
    synced_day_s=$(date -d "$synced_day" +%s)
    daycount=$((daycount+1))
done
and counting down the days with it, then just extracting the needed info. This script gives me a list of directories to rsync:
rrr@rRr-kali:~/bash_devel$ bash date_extract.sh 8 Z00163
Date now: 2017-06-29 | Date end: 2017-06-21
Adding day (2017-06-29) directory: "/var/dav/davserver/2017/06/29/**" to synced paths | Day: 1
Adding day (2017-06-28) directory: "/var/dav/davserver/2017/06/28/**" to synced paths | Day: 2
Adding day (2017-06-27) directory: "/var/dav/davserver/2017/06/27/**" to synced paths | Day: 3
Adding day (2017-06-26) directory: "/var/dav/davserver/2017/06/26/**" to synced paths | Day: 4
Adding day (2017-06-25) directory: "/var/dav/davserver/2017/06/25/**" to synced paths | Day: 5
Adding day (2017-06-24) directory: "/var/dav/davserver/2017/06/24/**" to synced paths | Day: 6
Adding day (2017-06-23) directory: "/var/dav/davserver/2017/06/23/**" to synced paths | Day: 7
Adding day (2017-06-22) directory: "/var/dav/davserver/2017/06/22/**" to synced paths | Day: 8
rrr@rRr-kali:~/bash_devel$ cat /tmp/Z00163_paths
/var/dav/davserver/2017/06/29/**
/var/dav/davserver/2017/06/28/**
/var/dav/davserver/2017/06/27/**
/var/dav/davserver/2017/06/26/**
/var/dav/davserver/2017/06/25/**
/var/dav/davserver/2017/06/24/**
/var/dav/davserver/2017/06/23/**
/var/dav/davserver/2017/06/22/**
rrr@rRr-kali:~/bash_devel$
However, now my problem is how to use this list. I have been trying many combinations of the --include and --exclude options with both --files-from and --include-from, but I only ever get one of two results: either everything is rsynced, or nothing.

Since you already have files ordered by date (in directories), it's easy and efficient to just rsync those directories:
#!/bin/bash
maxage=45 # in days, from today
for ((d=0; d<=maxage; d++)); do
    dir="/var/dav/davserver/$(date -d "-$d day" +"%Y/%m/%d")"
    rsync -avrz server:"$dir" localdir
done
We're using date to calculate "today minus x days" and iterating over all day offsets from 0 to your maxage.
Edit: use an arithmetic for loop instead of iterating over a GNU seq range.

So, I have solved it with a combination of:
A script generating paths according to the current date. Details are presented in my edit of the initial post. It simply uses date to step through the previous days, handling month and year boundaries, and generates the paths from those dates. However, radomir's solution is simpler, so I will use that. (It is basically the same as mine, just a simpler way of writing it down.)
Then I used --files-from=/tmp/files_list together with the -r (a.k.a. --recursive) argument to consume this list of paths properly.
(It was copying only empty directories without -r, and everything or nothing when I used --include-from instead of --files-from.)
The final rsync command is:
rsync --timeout=300 -Sazrv --force --delete --numeric-ids --files-from=/tmp/date_paths app_core@172.23.160.1:/var/dav/davserver/ /data/snapshots/
However, this solution does not delete old files on my side, despite the --delete argument: with --files-from, rsync only visits the listed paths, so files that fall outside them are never considered for deletion. I will probably need to write an extra script for that.
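A minimal sketch of such a cleanup, assuming the retention window is passed in days ($1, as in the script above) and that /data/snapshots/ from the command above is the local root; do a dry run by replacing -delete with -print before trusting it:
# Hedged cleanup sketch: remove local files older than $1 days,
# then prune the day/month/year directories left empty.
find /data/snapshots/ -type f -mtime +"$1" -delete
find /data/snapshots/ -mindepth 1 -type d -empty -delete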

Related

How can I stop my script from overwriting existing files

I have been learning bash for 6 days and I think I have got some of the basics.
Anyway, for the wallpapers downloaded from Variety I've written two scripts. One of them moves downloaded photos older than 12 days to a folder and renames them all as "Aday 1, 2, 3...", and the other lets me select some of these, moves them to another folder, and removes the photos I didn't select. The first script works just as I intended; my question is about the other.
I think I should write the script down to explain my problem better.
Script:
#!/bin/bash
#Move victors of 'Seçme-Eleme' to 'Kazananlar'
cd /home/eurydice/Bulunur\ Bir\ Şeyler/Dosyamsılar/Seçme-Eleme
echo "Select victors"
read vct
for i in $vct; do
    mv -i "Aday $i.png" /home/eurydice/"Bulunur Bir Şeyler"/Dosyamsılar/Kazananlar/"Bahar $RANDOM.png"
    mv -i "Aday $i.jpg" /home/eurydice/"Bulunur Bir Şeyler"/Dosyamsılar/Kazananlar/"Bahar $RANDOM.jpg"
done
#Now let's remove the rest
rm /home/eurydice/Bulunur\ Bir\ Şeyler/Dosyamsılar/Seçme-Eleme/*
In this script I originally intended to define another variable (let's call it "n"), copying and adapting the counter from the first script. It was something like this:
n=1
for i in $vct; do
    mv "Aday $i.png" /home/eurydice/"Bulunur Bir Şeyler"/Dosyamsılar/Kazananlar/"Bahar $n.png"
    mv "Aday $i.jpg" /home/eurydice/"Bulunur Bir Şeyler"/Dosyamsılar/Kazananlar/"Bahar $n.jpg"
    n=$((n+1))
done
When I ran it for the first time, the script worked just as I intended. However, in my second test run the script overwrote the files that already existed. For example, after the first run I had 5 files named "Bahar 1, 2, 3, 4, 5", and the second time I chose 3 files to add. I wanted their names to be "Bahar 6, 7, 8", but instead my script made them the new 1, 2 and 3. I tried many solutions, and when I couldn't fix it I just assigned random numbers to them.
Is there a way to make this script work as I intended?
This command finds the biggest number among the file names in the current directory. If no numbered file is found, the biggest number is set to 0.
biggest_number=$(ls -1 | sed -n 's/^[^0-9]*\([0-9]\+\)\(\.[a-zA-Z]\+\)\?$/\1/p' | sort -r -g | head -n 1)
[[ ! -z "$biggest_number" ]] || biggest_number=0
The regex in the sed command assumes that there are no digits in the filenames before the trailing number intended for incrementing.
Once you have found the biggest number, you can use it to start your loop and prevent overwrites:
n=$((biggest_number+1))
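Put together, a hedged sketch of the fixed selection loop; $dest is an illustrative shorthand for the Kazananlar directory from the question, and the numbering scan runs there because that is where the "Bahar N" files live:
#!/bin/bash
# Hedged sketch combining the answer's counter with the original loop.
dest=/home/eurydice/"Bulunur Bir Şeyler"/Dosyamsılar/Kazananlar
cd "$dest" || exit 1
biggest_number=$(ls -1 | sed -n 's/^[^0-9]*\([0-9]\+\)\(\.[a-zA-Z]\+\)\?$/\1/p' | sort -r -g | head -n 1)
[[ ! -z "$biggest_number" ]] || biggest_number=0
cd /home/eurydice/"Bulunur Bir Şeyler"/Dosyamsılar/Seçme-Eleme || exit 1
echo "Select victors"
read vct
n=$((biggest_number+1))
for i in $vct; do
    mv -i "Aday $i.png" "$dest/Bahar $n.png"
    mv -i "Aday $i.jpg" "$dest/Bahar $n.jpg"
    n=$((n+1))
done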

.bashrc code to execute once a day upon first login

Is there a way to make a specific piece of code in my .bashrc file execute only on the first login of a specific day of the week? I already know that the command substitution
"$(date +%u)" will give me a number from 1-7 that corresponds to the day of the week (1 being Monday). However, I do not want this code to execute for every subsequent login that day. Any tips would be much appreciated. Thanks in advance!
You should not have to write anything to disk.
I would extract the day out of the commands:
lastlog -u $USER
and
date
Then do the appropriate matches/comparisons.
The logic would be something like:
get day from date
if day from date is the magic day, then
get day from lastlog -u $USER
if day does not match today's day then
run your command
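As a rough, untested bash sketch of that logic (assuming GNU date; lastlog's output format varies between systems, so the day-name scan below is an assumption):
# Hedged sketch of the lastlog/date comparison.
magic_day=1                 # 1 = Monday, per date +%u
if [[ "$(date +%u)" -eq "$magic_day" ]]; then
    today=$(date +%a)       # e.g. "Mon"
    last_day=$(lastlog -u "$USER" | awk 'NR==2 {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) { print $i; exit }
    }')
    if [[ "$last_day" != "$today" ]]; then
        echo "running the once-a-day command"   # replace with your command
    fi
fi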
You can also use what is called a 'semaphore file', something like this:
if [[ ! -e /tmp/$(date +%u).sem ]]
then
    touch /tmp/$(date +%u).sem
    # Do your one-time stuff
fi
However, whichever approach you choose, I would recommend using a full date (date +"%Y%m%d") to avoid a potential bug when the user logs in on a Monday and their next login is on the following Monday.
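A minimal sketch combining the semaphore idea with that full-date recommendation; the location under $HOME is an assumption (it also avoids clashes with other users in /tmp):
# Hedged sketch: run a block only on the first login of the magic day.
sem="$HOME/.cache/once_$(date +%Y%m%d).sem"
if [[ "$(date +%u)" -eq 1 && ! -e "$sem" ]]; then   # 1 = Monday
    mkdir -p "${sem%/*}"
    touch "$sem"
    # Do your one-time stuff here
fi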
date +%u | ## Generate timestamp (could be a better date-spec)
tee timestamp.tmp | ## Save a copy for later usage
cmp - timestamp || ## Fail if date-spec changed
{
    ## In that case, update the timestamp
    mv timestamp.tmp timestamp &&
    ## And only if that succeeds, run your code
    echo "Once a day"
}
I prefer to touch the timestamp BEFORE running the command, because it is usually safer not to run anything at all than to run it repeatedly. (The partition could have been remounted read-only, the disk might be full, permissions could have been changed...)

Linux, rename image files to create a sequential list of files

I'm finding it difficult to word my question in a way I can search for the answer; my problem is as follows.
I have a webcam that takes a photo every 2 minutes and saves it as a numbered file. The first photo of the day is taken at 0000hrs and named image001.jpg, at 0002hrs image002.jpg, and so on. At 2359hrs all the photos are turned into a 24hr time-lapse video saved as daily_video.mov. At 0000hrs (of the next day) the old image001.jpg is overwritten and the whole process repeats, including generation of a new daily_video.mov.
This is all working fine, with the webcam doing the file naming and overwriting, and a cron job running ffmpeg once a day to make the video.
What I want to do now is make a time-lapse video over, say, a month by copying every 30th file from each day's images to a new folder and naming them in sequential order, i.e.:
Day 1: image030.jpg, image060.jpg, etc. are renamed to Archive001.jpg, Archive002.jpg, etc.
But on day 2, image030.jpg, image060.jpg, etc. will need to be named Archive025.jpg, Archive026.jpg, etc., and so on until the end of the month, copying files from each day into a sequentially named list of files to use at the end of the month, when the process can be repeated.
Does that make sense?!!
You could use a bash script like the following. Just call it at 2359hrs.
Remember to make it executable using chmod +x myScript
I did not rename the files to Archive00X.jpg; by adding the current date instead, they still end up in proper alphabetical order.
example output:
cp files/image000.jpg >> archive/image_2012-08-29_000.jpg
cp files/image030.jpg >> archive/image_2012-08-29_030.jpg
....
Adapt pSource and pDest to your paths (preferably absolute paths).
Adapt offset and maxnum to your needs. If maxnum is too big, the script will tell you some files are missing, but will otherwise work properly.
Remove the echo lines if they disturb you ;)
Code:
#!/bin/bash
pSource="files"
pDest="archive"
offset=30
maxnum=721
curdate=$(date "+%F")

function rename_stuff()
{
    myvar=0
    while [ $myvar -lt $maxnum ]
    do
        forg=$(printf image%03d.jpg ${myvar})
        fnew=$(printf image_%s_%03d.jpg ${curdate} ${myvar})
        forg="$pSource/$forg"
        fnew="$pDest/$fnew"
        if [ -f "$forg" ]; then
            echo "cp $forg >> $fnew"
            cp "$forg" "$fnew"
        else
            echo "missing file $forg"
        fi
        myvar=$(( myvar + offset ))
    done
}
rename_stuff
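Since the script is meant to be called at 2359hrs, the matching cron entry would look something like this (the path to the script is an assumption):
# m  h  dom mon dow  command  -- run the archiving script at 23:59 daily
59 23 * * * /home/user/bin/myScript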

Get a file's last modification date in a shell script

I'm using bash to build a script where I will get a filename in a variable and then get that file's Unix last modification date.
I need to get this modification date value, and I can't use the stat command.
Do you know any way to get it with the commonly available *nix commands?
Why you shouldn't use ls:
Parsing ls is a bad idea. Not only is the behaviour of certain characters in filenames undefined and platform-dependent, but for your purposes it will also mess with dates once they are six months in the past. In short: yes, it will probably work for you in your limited testing, but it will not be platform-independent (so no portability), and the behaviour of your parsing is not guaranteed given the range of 'legal' filenames on various systems. (Ext4, for example, allows spaces and newlines in filenames.)
Having said all that, personally, I'd use ls because it's fast and easy ;)
Edit
As pointed out by Hugo in the comments, the OP doesn't want to use stat. I should also point out that the section below is BSD-stat specific (the %Sm flag doesn't work when I test on Ubuntu; Linux has its own stat command, and if you're interested in it, read the man page).
So, a non-stat solution: use date
date, at least on Linux, has a flag: -r, which according to the man page:
display the last modification time of FILE
So, the scripted solution would be similar to this:
date -r "${MY_FILE_VARIABLE}"
which would return you something similar to this:
zsh% date -r MyFile.foo
Thu Feb 23 07:41:27 CST 2012
To address the OP's comment:
If possible with a configurable date format
date has a rather extensive set of time-format variables; read the man page for more information.
I'm not 100% sure how portable date is across all 'UNIX-like systems'. For BSD-based systems (such as OS X), this will not work; the -r flag for BSD date does something completely different. The question doesn't specify exactly how portable a solution is required to be. For a BSD-based solution, see the section below ;)
A better solution for BSD systems (tested on OS X, using BSD stat; GNU stat is slightly different but could be made to work in the same way):
Use stat. You can format the output of stat with the -f flag, and you can select to display only the file modification data (which, for this question, is nice).
For example, stat -f "%m%t%Sm %N" ./*:
1340738054 Jun 26 21:14:14 2012 ./build
1340738921 Jun 26 21:28:41 2012 ./build.xml
1340738140 Jun 26 21:15:40 2012 ./lib
1340657124 Jun 25 22:45:24 2012 ./tests
Where the first bit is the UNIX epoch time, the date is the file modification time, and the rest is the filename.
Breakdown of the example command
stat -f "%m%t%Sm %N" ./*
stat -f: call stat, and specify the format (-f).
%m: The UNIX epoch time.
%t: A tab separator in the output.
%Sm: S says to display the output as a string, m says to use the file modification data.
%N: Display the name of the file in question.
A command in your script along the lines of the following:
stat -f "%Sm" ${FILE_VARIABLE}
will give you output such as:
Jun 26 21:28:41 2012
Read the man page for stat for further information; timestamp formatting is done by strftime.
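For a custom layout, BSD stat also accepts a strftime format string via the -t flag, for example (a sketch; check the exact behaviour in your local man page):
stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "${FILE_VARIABLE}"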
have perl?
perl -MFile::stat -e "print scalar localtime stat('FileName.txt')->mtime"
How about:
find "$DIR" -maxdepth 1 -name "$FILE" -printf %Tc
(I've used $DIR rather than $PATH for the directory, so as not to collide with the shell's search path.) See the find manpage for other values you can use with %T.
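For instance, with GNU find you can compose a custom timestamp out of the individual %T directives (an illustrative format):
find "$DIR" -maxdepth 1 -name "$FILE" -printf '%TY-%Tm-%Td %TH:%TM\n'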
You can use the "date" command adding the desired format option the format:
date +%Y-%m-%d -r /root/foo.txt
2013-05-27
date +%H:%M -r /root/foo.txt
23:02
You can use ls -l, which lists the last modification time, and then use cut to cut out the modification date:
mod_date=$(ls -l "$file_name" | cut -c35-46)
This works on my system because the date appears between columns 35 to 46. You might have to play with it on your system.
The date is in two different formats:
Mmm dd hh:mm
Mmm dd yyyy
Files modified more than a year ago will have the latter format; files modified less than a year ago will have the first format. You can search for a ":" to know which format a file is in:
if echo "$mod_date" | grep -q ":"
then
echo "File was modified within the year"
else
echo "File was modified more than a year ago"
fi

perl while loop

In this code I parse a file (containing the output of ls -lrt) for each log file's modification date. Then I move all the log files into a new folder, with their modification dates added to the filenames, and then make a tar archive of all those files.
The problem I am getting is in the while loop. Because it reads the data for all the files, the while loop keeps running 15 times. I understand that there is some issue in the code, but I can't figure it out.
Inside the while loop I split the ls -lrt records to find each log file's modification date. $file holds the output of the ls command, which I store in the text file /scripts/yagya.txt in order to get the modification dates. The while loop executes 15 times because there are 15 log files in the folder that match the pattern.
#!/usr/bin/perl
use File::Find;
use strict;
my @field;
my $filenew;
my $date;
my $file = `ls -lrt /scripts/*log*`;
my $directory="/scripts/*.log";
my $current = localtime;
my $current_time = $current;
$current_time = s/\s+//g;
my $freetime = $current_time;
my $daytime = substr($current_time,0,8);
my $seconddir = "/$freetime/";
system ("mkdir $seconddir");
open (MYFILE,">/scripts/yagya.txt");
print MYFILE "$file";
close (MYFILE);
my $data = "/scripts/yagya.txt";
my $datas = "/scripts/";
my %options = (
    wanted => \&wanted,
    untaint => 1
);
find (\%options, $datas);

sub wanted {
    if (/[._]log\d*$/){
        my $files;
        my @fields;
        my $fields;
        chomp;
        $files=$_;
        open (MYFILE,$data);
        while(<MYFILE>){
            chop;
            s/#.*//;
            next unless /\S/;
            @fields = (split)[5,6,7];
            $fields = join('',@fields), "\n";
        }
        close (MYFILE);
        system ("mv $files $seconddir$fields$files");
    }
}

system ("tar cvf /$daytime/$daytime.tar.gz /$daytime/*log*");
system ("rm $seconddir*log*");
system ("rm $data");
Your code is very difficult to read. It looks like you have written the program as a single big chunk before you started to test it. That way of working is common but very wrong. You should start by implementing a small part of the program and testing that before you add a little more functionality, test again, and so on. That way you won't be overwhelmed with fixing many problems at once in a large untested program.
It would also help you a lot if you added use warnings to your use strict at the top of the program. It helps to catch simple errors that you may overlook.
Also, are you aware that File::Find will call your wanted callback subroutine every time it encounters a file? It doesn't pass all the files at once.
The problem seems to be that you are reading all the way through the yagya.txt file when you should be stopping when you find the record that matches the current file that File::Find has found. What you need to do is to check whether the current record in the ls output ends with the name of the current file. If you write the loop like this
while (<MYFILE>) {
    if (/\Q$files\E$/) {
        my @fields = (split)[5,6,7];
        $fields = join('',@fields);
        last;
    }
}
then $fields will end up with the modification date of the current file, which is what you want.
But this would be a thousand times easier if you used Perl to read the file modification date for you.
Instead of writing an ls listing to a file and reading it back, you should do something like this
use File::stat;
my $mtime = localtime(stat($files)->mtime);
which will give you a string like Wed Jun 13 11:25:23 2012. The date from my ls output includes only the month name, day of month, and time of day, like Jun 8 12:37. That isn't very specific and you perhaps should at least include a year, but to generate the same string from this $mtime you can write
my $fields = join '', (split ' ', $mtime)[1,2,3];
There is a lot more I could say about your program, but I hope this gets it going for you for now.
Another couple of things I have noticed:
The line $current_time = s/\s+//g should be $current_time =~ s/\s+//g to remove all spaces from the current time string
A value like Sun Jun 3 11:50:54 2012 will be reduced to SunJun311:50:542012, and $daytime will then take the value SunJun31, which is incorrect
I don't usually recommend bash instead of perl, but sometimes it is much shorter.
This problem has 2 parts:
rename the files into another directory, adding a timestamp to the filenames
archive them by minute, hour, day... etc.
For 1):
find ./scripts -name \*[_.]log\* -type f -printf "%p\0./logs/%TY%Tm%Td-%TH%TM%TS-%f\0" | xargs -0 -L 2 mv
The above will find all plain files with [_.]log in their names and rename them into the ./logs directory with a timestamp prefix, e.g.
./scripts/aaa.log12 gets renamed to ./logs/20120403-102233-aaa.log12
For 2), archiving:
ls logs | sed 's/\(........-....\).*/\1/' | sort -u | while read groupby
do
    ( cd logs && echo tar cvzf ../$groupby.tgz $groupby* )
done
This will create tar archives grouped by timestamp prefix (assuming ./logs contains only files with valid, timestamped names). Note the echo in the loop: remove it to actually create the archives.
Of course, the above sed pattern is not pretty, but it clearly shows the seconds being deleted from the timestamp, so archives are created per minute. If you want another grouping, you can use:
sed 's/\(........-..\).*/\1/' - by hours
sed 's/\(........\).*/\1/' - by days
Other notes:
the -printf action for find is supported only in the GNU version of find, which is common on Linux
it is usually not good practice to work directly under '/', like /scripts, which is why my example uses ./
if the same filename with the same timestamp exists more than once in your ./scripts subtree, the mv will overwrite the earlier one, e.g. both ./scripts/a/a.log and ./scripts/x/a.log with the same timestamp will be renamed to ./logs/TIMESTAMP-a.log
