Keep files updated from remote server - linux

I have a server at hostname.com/files. Whenever a file has been uploaded I want to download it.
I was thinking of creating a script that constantly checked the files directory. It would check the timestamp of the files on the server and download them based on that.
Is it possible to check the files timestamp using a bash script? Are there better ways of doing this?
I could just download all the files in the server every 1 hour. Would it therefore be better to use a cron job?

If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As #Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av http://hostname.com/ >> rsync.log
would grab files from the server in that location and append the details to rsync.log on the 17th minute of every hour. Right now, though, I can't seem to get rsync to get files from a webserver.
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N re-downloads only files newer than the timestamp on the local version, -b sends
the process to the background, -r recurses into directories and -o specifies a log file. This works from an arbitrary web server. -np makes sure it doesn't go up into a parent directory, effectively spidering the entire server's content.
More details, as usual, will be in the man pages of rsync or wget.

Related

mysqldump problem in Crontab and bash file

I have created a cron tab to backup my DB each 30 minutes...
*/30 * * * * bash /opt/mysqlbackup.sh > /dev/null 2>&1
The cron tab works well.. Each 30 minutes I have my backup with the script bellow.
#!/bin/sh
find /opt/mysqlbackup -type f -mtime +2 -exec rm {} +
mysqldump --single-transaction --skip-lock-tables --user=myuser --
password=mypass mydb | gzip -9 > /opt/mysqlbackup/$(date +%Y-%m-%d-%H.%M)_mydb.sql.gz
But my problem is that the rm function to delete old data isn't working.. this is never deleted.. Do you know why ?
and also... the name of my backup is 2020-02-02-12.12_mydb.sql.gz?
I always have a ? at the end of my file name.. Do you know why ?
Thank you for your help
The question mark typically indicates a character that can't be displayed; the fact that it's at the end of a line makes me think that your script has Windows line endings rather than Unix. You can fix that with the dos2unix command:
dos2unix /path/to/script.sh
It's also good practice not to throw around MySQL passwords on the CLI or store them in executable scripts. You can accomplish this by using MySQL Option files, specifically the file that defines user-level options (~/.my.cnf).
This would require us to figure out which user is executing that cronjob, however. My assumption is that you did not make that definition inside the system-level crontab; if you had, you'd actually be trying to execute /opt/mysqlbackup.sh > /dev/null 2>&1 as the user bash. This user most likely doesn't (and shouldn't) exist, so cron would fail to execute the script entirely.
As this is not the case (you say it's executing the mysqldump just fine), this makes me believe you have the definition in a user-level crontab instead. Once we figure out which user that actually is as I asked for in my comment, we can identify the file permissions issue as well as create the aforementioned MySQL options file.
Using find with mtime is not the best choice. If for some reason mysqldump stops creating backups, then in two days all backups will be deleted.
You can use my Python script "rotate-archives" for smart delete backups. (https://gitlab.com/k11a/rotate-archives). The script adds the current date at the beginning of the file or directory name. Like 2020-12-31_filename.ext. Subsequently uses this date to decide on deletion.
Running a script on your question:
rotate-archives.py test_mode=off age_from-period-amount_for_last_timeslot=0-0-48 archives_dir=/mnt/archives
In this case, 48 new archives will always be saved. Old archives in excess of this number will be deleted.
An example of more flexible archives deletion:
rotate-archives.py test_mode=off age_from-period-amount_for_last_timeslot=7-5,31-14,365-180-5 archives_dir=/mnt/archives
As a result, there will remain archives from 7 to 30 days old with a time interval between archives of 5 days, from 31 to 364 days old with time interval between archives 14 days, from 365 days old with time interval between archives 180 days and the number of 5.

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week, but always have the same file names. e.g new upload file1.jpg will replace local file1.jpg
This is how I am grabbing them, nothing fancy :
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file name using option -i or --input-file.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
Alternatively, If all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow links on your page to all linked resources. You might need to limit the "recursion level" and file types in order to prevent downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html

Debian: Cron bash script every 5 minutes and lftp

We have to run a script every 5 minutes for downloading data from an FTP server. We have arranged the FTP script, but now we want to download automatic every 5 minutes the data.
We can use: "0 * * * * /home/kbroeren/import.ch"
where import the ftp script is for downloading the data files.
The point is, the data files become every 5 minutes available on the FTP server. Sometimes this where will be a minute offset. It would be nice to download the files when they become a couple of seconds be available on the FTP server. Maybe a function that scans the ftp file folder if the file is allready available, and then download the file, if not... the script will retry it again in about 10 seconds.
One other point to fix is the time of the FTP script. there are 12k files in the map. We should only the newest every time we run the script. Now scanning the folder takes about 3 minutes time thats way too long. The filename of all the datafiles contains date and time, is there a possibility to make a dynamic filename to download the right file every 5 minutes ?
Lot os questions, i hope someone could help me out with this!
Thank you
Kevin Broeren
Our FTP script:
#!/bin/bash
HOST='ftp.mysite.com'
USER='****'
PASS='****'
SOURCEFOLDER='/'
TARGETFOLDER='/home/kbroeren/datafiles'
lftp -f "
open $HOST
user $USER $PASS
LCD $SOURCEFOLDER
mirror --newer-than=now-1day --use-cache $SOURCEFOLDER $TARGETFOLDER
bye
"
find /home/kbroeren/datafiles/* -mtime +7 -exec rm {} \;
Perhaps you might want to give a try to curlftpfs. Using this FUSE filesystem you can mount an FTP share into your local filesystem. If you do so, you won't have to download the files from FTP and you can iterate over the files as if they were local. You can give it a try following these steps:
# Install curlftpfs
apt-get install curlftpfs
# Make sure FUSE kernel module is loaded
modprobe fuse
# Mount the FTP Directory to your datafiles directory
curlftpfs USER:PASS#ftp.mysite.com /home/kbroeren/datafiles -o allow_other,disable_eprt
You are now able to process these files as you wish. You'll always have the most recent files in this directory. But be aware of the fact, that this is not a copy of the files. You are working directly on the FTP server. For example removing a file from /home/kbroeren/datafiles will remove it from the FTP server.
If this works foor you, you might want to write this information into /etc/fstab, to make sure the directory is mounted with each start of the mashine:
curlftpfs#USER:PASS#ftp.mysite.com /home/kbroeren/datafiles fuse auto,user,uid=USERID,allow_other,_netdev 0 0
Make sure to change USERID to match the UID of the user who needs access to this files.

wget newest file in another server's folder

I have an automatic backup of a file running on a cronjob. It outputs into a folder, let's call /backup, and appends a timestamp to each file, every hour, like so:
file_08_07_2013_01_00_00.txt, file_08_07_2013_02_00_00.txt, etc.
I want to download these to another server, to keep as a separate backup. I normally just use wget and download a specific file, but was wondering how I could automate this, ideally every hour it would download the most recent file.
What would I need to look into to set this up?
Thanks!
wget can handle that, just enable time-stamping. I'm not even going to attempt my own explanation, here's a direct quote from the manual:
The usage of time-stamping is simple. Say you would like to download a
file so that it keeps its date of modification.
wget -S http://www.gnu.ai.mit.edu/
A simple ls -l shows that the time stamp on the local file equals the state of the Last-Modified
header, as returned by the server. As you can see, the time-stamping
info is preserved locally, even without ā€˜-Nā€™ (at least for http).
Several days later, you would like Wget to check if the remote file
has changed, and download it if it has.
wget -N http://www.gnu.ai.mit.edu/
Wget will ask the server for the last-modified date. If the local file has the same timestamp as
the server, or a newer one, the remote file will not be re-fetched.
However, if the remote file is more recent, Wget will proceed to fetch
it.

Cron job to SFTP files in a directory

Scenario: I need to create a cron job that scans through a directory and sftps each file to another machine
bash script : /home/user/sendFiles.sh
cron interval : 1 minute
directory: /home/user/myfiles
sftp destination: 10.10.10.123
Create the cron job
crontab -u user 1 * * * /home/user/sendFiles.sh
Create the Script
#!/bin/bash
/usr/bin/scp -r user#10.10.10.123:/home/user/myfiles .
#REMOVE FILES AFTER ALL HAVE BEEN SENT
rm -rf *
Problem: Not exactly sure if that cron tab is correct or how to sftp an entire directory with the cron tab
If is going to be executed on a cronjob, I'm assuming its in order to sync the dir.
In that case, I would use rdiff-backup to make an incremental backup. That way, only the things that change will transferred.
This system will use ssh for transferring the data, but using rdiff-backup instead of a plain scp. The major benefit of doing it this way is speed; is faster to transfer only the parts that have changed.
This is the command to do a copy over ssh using rdiff-backup:
rdiff-backup /some/local-dir hostname.net::/whatever/remote-dir
Add that to a cronjob, making sure the user that executes the rdiff backup has the ssh keys, so it does not require a password.
(About ssh keys: Read about ssh keys here: http://www.linuxproblem.org/art_9.html Once is set, try to do a regular ssh to see if you can log without password.)
Something like this:
* * * * * rdiff-backup /some/local-dir hostname.net::/whatever/remote-dir
will do the copy every minute. (your example, 1 * * * * will execute every first minute of each hour; that is, once every hour, instead of once every minute)
Keep in mind that can cause problems if the transfer is huge it hasn't finished to transfer. But I guess that you want is to do transfers of not so huge files. Or that your network speed is large. Otherwise, change it to do the transfer every 5 minutes by using */5 * * * * instead.
And finally, read more about rdiff-backup here : http://www.nongnu.org/rdiff-backup/examples.html
rdiff-backup is a good option, there is also rsync
rsync -az user#10.10.10.123:/home/user/myfiles .
I notice you are also deleting files, is this simply because you don't want to recopy them? rsync will only copy updated files.
You might also be interested in unison which does a two way sync
These are both good answers; if you stick with scp you may want to make a slight change to your script:
#!/bin/bash
/usr/bin/scp -r user#10.10.10.123:/home/user/myfiles .
#REMOVE FILES AFTER ALL HAVE BEEN SENT
cd /home/user/myfiles # make sure you're in the right directory before rm
rm -rf *

Resources