Debian: Cron bash script every 5 minutes and lftp

We need to run a script every 5 minutes that downloads data from an FTP server. The FTP script is ready, but now we want it to download the data automatically every 5 minutes.
We could use: "*/5 * * * * /home/kbroeren/import.ch"
where import.ch is the FTP script that downloads the data files.
The catch is that a new data file becomes available on the FTP server every 5 minutes, sometimes with an offset of about a minute. It would be nice to download each file within a couple of seconds of it appearing on the FTP server: maybe a function that checks whether the file is already in the FTP folder and, if so, downloads it; if not, the script retries in about 10 seconds.
Another thing to fix is the run time of the FTP script. There are 12k files in the folder, and we only need the newest one each time the script runs. Scanning the folder currently takes about 3 minutes, which is way too long. The filename of every data file contains its date and time; is it possible to build the filename dynamically so the script downloads the right file every 5 minutes?
Lots of questions, I hope someone can help me out with this!
Thank you
Kevin Broeren
Our FTP script:
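#!/bin/bash
HOST='ftp.mysite.com'
FTPUSER='****'
FTPPASS='****'
SOURCEFOLDER='/'
TARGETFOLDER='/home/kbroeren/datafiles'
# lftp -f expects a script *file*, so pass the commands on stdin instead.
# mirror takes an explicit local target, so no lcd is needed.
lftp <<EOF
open $HOST
user $FTPUSER $FTPPASS
mirror --newer-than=now-1day --use-cache $SOURCEFOLDER $TARGETFOLDER
bye
EOF
# Remove local data files older than 7 days
find "$TARGETFOLDER" -type f -mtime +7 -delete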
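A sketch of the two ideas in the question: compute the name of the file for the current 5-minute slot from the clock (so the 12k-file folder never has to be scanned), then fetch just that file, retrying every 10 seconds until it appears. Everything about the naming is an assumption: the pattern data_YYYYmmdd_HHMM.csv, the use of UTC, and the helper names slot_filename, retry and fetch_file are hypothetical, so adapt them to the real filenames on the server. It relies on GNU date (standard on Debian) and lftp's -c option and get -O.

```shell
#!/bin/bash
# Sketch: fetch only the newest data file instead of mirroring the whole folder.
# The filename pattern in slot_filename is HYPOTHETICAL; adapt it to the server.
HOST='ftp.mysite.com'
FTPUSER='****'
FTPPASS='****'
TARGETFOLDER='/home/kbroeren/datafiles'

# Print the expected filename for the 5-minute slot that contains the given
# Unix timestamp (default: now). Uses UTC; drop -u if the names use local time.
slot_filename() {
  local now=${1:-$(date +%s)}
  local slot=$(( now - now % 300 ))   # round down to a multiple of 300 s
  date -u -d "@$slot" +'data_%Y%m%d_%H%M.csv'
}

# Run a command until it succeeds, at most TRIES times, sleeping in between.
# Usage: retry TRIES INTERVAL_SECONDS command [args...]
retry() {
  local tries=$1 interval=$2 n
  shift 2
  for (( n = 1; n <= tries; n++ )); do
    "$@" && return 0
    (( n < tries )) && sleep "$interval"
  done
  return 1
}

# Download one file; lftp exits non-zero if it is not on the server yet.
# Assumes no spaces or commas in the credentials, folder or filename.
fetch_file() {
  lftp -c "open -u $FTPUSER,$FTPPASS $HOST; get -O $TARGETFOLDER $1"
}

# Only act when called with --run (e.g. from cron), so the functions can
# also be sourced or tested without touching the network.
if [[ ${1:-} == --run ]]; then
  # 6 tries, 10 s apart: give up after about a minute; cron starts the next run.
  retry 6 10 fetch_file "$(slot_filename)"
fi
```

Saved as, say, /home/kbroeren/fetch_latest.sh (a hypothetical name) and made executable, it would be scheduled with a crontab entry such as */5 * * * * /home/kbroeren/fetch_latest.sh --run. Because the slot name is computed once at start-up, the retries keep asking for the same file even if it shows up a minute late.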

You might want to give curlftpfs a try. This FUSE filesystem lets you mount an FTP share into your local filesystem. That way you don't have to download the files from FTP, and you can iterate over them as if they were local. You can try it with the following steps:
# Install curlftpfs
apt-get install curlftpfs
# Make sure FUSE kernel module is loaded
modprobe fuse
# Mount the FTP Directory to your datafiles directory
curlftpfs USER:PASS@ftp.mysite.com /home/kbroeren/datafiles -o allow_other,disable_eprt
You can now process these files as you wish, and this directory will always show the most recent files. But be aware that this is not a copy of the files: you are working directly on the FTP server. For example, removing a file from /home/kbroeren/datafiles will remove it from the FTP server.
If this works for you, you might want to add it to /etc/fstab so that the directory is mounted every time the machine starts:
curlftpfs#USER:PASS@ftp.mysite.com /home/kbroeren/datafiles fuse auto,user,uid=USERID,allow_other,_netdev 0 0
Make sure to change USERID to the UID of the user who needs access to these files.

Related

Downloading Specific Filenames with FTP

I have about 1,000,000 files and need to fetch some specific ones over FTP.
Out of the 1,000,000 files, named ML0000000 to ML1000000, I want the specific files from ML00002222 to ML00899999.
Can anyone help me edit the mget command for this?
######login to FTP server:#####
ftp -inv 172.0.0.1
user Codegirl $$$$
#######cd to ftp server#########
cd /root/desktop
######cd to local PC#############
lcd /root/myfile
mget ML*    ??? (how can I change this to specific file names?)

If it were me, I'd use a loop and wget.
cd /root/myfile
for i in $(seq -f "%08g" 2222 899999)
do
wget --user=un --password=pw "ftp://172.0.0.1/root/desktop/ML${i}"
done
This does require wget to reconnect for each file. It's going to take a while anyway, so start the script and grab a cup of tea. I use loops like this all the time and they work well.
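Reconnecting for every file is the slow part of that loop. A sketch that swaps in lftp, which accepts commands on standard input, so that all the get commands share a single FTP session; the helper names make_get_list and fetch_range are hypothetical, and un/pw are the same placeholder credentials as in the wget loop:

```shell
#!/bin/bash
# Print one lftp "get" command for each file ML<8-digit number> in [first, last].
# printf is used rather than seq -f "%08g": %g switches to exponent notation
# (1e+06) once the numbers reach 1000000.
make_get_list() {
  local first=$1 last=$2 i
  for (( i = first; i <= last; i++ )); do
    printf 'get ML%08d\n' "$i"
  done
}

# Send all the commands through one lftp session (a single login).
# Host and directories are the ones from the question; un/pw are placeholders.
fetch_range() {
  {
    echo 'open -u un,pw 172.0.0.1'
    echo 'cd /root/desktop'
    echo 'lcd /root/myfile'
    make_get_list "$1" "$2"
  } | lftp
}
```

fetch_range 2222 899999 would then request the whole range; lftp reports files that do not exist and carries on with the next get.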

uploading file to google-drive using gdrive is not working on crontab

I wrote a backup script for my computer. The backup scenario is as follows:
All directories under root are packed into a tar.gz twice a day (at 3 AM and 12 PM), and the archives are uploaded to Google Drive with the gdrive app every day at 3 AM.
Here is the script:
#!/bin/bash
#Program: arklab backup script version 2.0
#Author: namil son
#Last modified date: 160508
#Contact: 21100352@handong.edu
#It should be executed as a super user
export LANG=en
MD=`date +%m%d`
TIME=`date +%y%m%d_%a_%H`
filename=`date +%y%m%d_%a_%H`.tar.gz
HOST=$HOSTNAME
backuproot="/local_share/backup/"
backup=`cat $backuproot/backup.conf`
gdriveID="blablabla" #This argument should be manually substituted to google-drive directory ID for each server.
#Start a new backup period at January first and June first.
if [ $MD = '0101' -o $MD = '0601' ]; then
mkdir $backuproot/`date +%y%m`
rm -rf $backuproot/`date --date '1 year ago' +%y%m`
echo $backuproot/`date +%y%m` > $backuproot/backup.conf #Save directory name for this period in backup.conf
backup=`cat $backuproot/backup.conf`
gdrive mkdir -p $gdriveID `date +%y%m` > $backup/dir
awk '{print $2}' $backup/dir > $backup/dirID
rm -f $backup/dir
fi
#make tar ball
tar -g $backup/snapshot -czpf $backup/$filename / --exclude=/tmp/* --exclude=/mnt/* --exclude=/media/* --exclude=/proc/* --exclude=/lost+found/* --exclude=/sys/* --exclude=/local_share/backup/* --exclude=/home/* \
--exclude=/share/*
#upload backup file using gdrive under the path written in dirID
if [ `date +%H` = '03' ]; then
gdrive upload -p `cat $backup/dirID` $backup/$filename
gdrive upload -p `cat $backup/dirID` $backup/`date --date '15 hour ago' +%y%m%d_%a_%H`.tar.gz
fi
Here is the problem!
When I run this script from crontab, everything works except uploading the tarball to Google Drive, even though the whole script works perfectly when I run it manually. Only the upload fails when it is run from crontab!
Can anybody help me?
The crontab entry looks like this:
0 3,12 * * * /bin/bash /local_share/backup/backup2.0.sh >> /local_share/backup/backup.sh.log 2>&1

I had the same problem.
This was my solution:
use the absolute path to the gdrive command.
Example:
Don't write the cron entry like this:
0 1 * * * gdrive upload abc.tar.gz
Use the absolute path:
0 1 * * * /usr/local/bin/gdrive upload abc.tar.gz
It will work perfectly
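The absolute path helps because cron starts jobs with a minimal environment: with Debian's (Vixie) cron the default PATH is only /usr/bin:/bin, so a gdrive binary installed in /usr/local/bin is not found. A sketch of the alternative, fixing PATH once at the top of the backup script and failing loudly when a tool is missing; require_cmd is a hypothetical helper name:

```shell
#!/bin/bash
# cron's default PATH is minimal, so prepend the usual install locations.
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH"

# Return non-zero, with a message on stderr, if a command cannot be found.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "required command not found in PATH: $1" >&2
    return 1
  fi
}

# In the backup script, check every external tool before doing any work, e.g.:
# require_cmd gdrive || exit 1
```

With the cron output redirected (>> /local_share/backup/backup.sh.log 2>&1), a missing binary then shows up in the log instead of the upload silently doing nothing.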

I had the exact same issue, with minor differences. I'm using gdrive on a CentOS system, and setup was fine: as root, I set up gdrive, and 'drive list' worked fine from the command line. I used the following blog post to set up gdrive:
http://linuxnewbieguide.org/?p=1078
I wrote a PHP script to do a backup of some directories. When I ran the PHP script as root from the command line, everything worked and uploaded to Google Drive just fine.
So I put:
1 1 * * * php /root/my_backup_script.php
into root's crontab. The script executed fine, but the upload to Google Drive didn't work. I did some debugging; the line:
drive upload --file /root/myfile.bz2
just wasn't working. The only command-line output was an empty string. Very confusing. I'm no Unix expert, but I thought that when cron runs a job as a user (in this case root), it runs exactly as that user would. To test this, I did the following, which is very insecure and not recommended:
I created a file with the root password at /root/.rootpassword
chmod 500 .rootpassword
Changed the crontab line to:
1 1 * * * cat /root/.rootpassword | sudo -kS php /root/my_backup_script.php
And now it works, but this is a horrible solution, as the root password is stored in a plain text file on the system. The file is readable only by root, but it is still a very bad solution.
I don't know why (again, no Unix expert) root's crontab has to run the command through sudo to make this work. I know the issue is with the gdrive token generated during gdrive setup: when cron runs the script, the token doesn't match and the upload fails. But when cron uses sudo to run the PHP script as root, it works.
I have thought of a possible solution that doesn't require storing the root password in a text file on the system. I am tired right now and haven't tried it. I have been working on this issue for about 4 days, trying various Google Drive backup solutions... all failing. It basically goes like this:
Run the whole gdrive setup within the PHP/Apache interpreter. This will (perhaps) tie the gdrive token to the apache user. For example:
Create a PHP script at /home/public_html/gdrive_setup.php. This file needs to step through the entire gdrive and token setup.
Run the script in a browser, get gdrive and the token all set up.
Test gdrive, write a PHP script something like:
<?php
// exec() returns only the last line of output; collect every line in $output
exec("drive list", $output);
echo implode("\n", $output);
Save it as gdrive_test.php and run it in a browser. If it outputs your Google Drive files, it's working.
Write your backup script in PHP. Put it in a non-indexable web directory and give it a random name like 2DJAj23DAJE123.php.
Now whenever you pull up 2DJAj23DAJE123.php in a web browser your backup should run.
Finally, edit crontab for root and add:
1 1 * * * wget -q -O /dev/null http://my-website.com/non-indexable-directory/2DJAj23DAJE123.php >/dev/null 2>&1
In theory this should work. No passwords are stored. The only security hole is someone else might be able to run your backup if they executed 2DJAj23DAJE123.php.
Further checks could be added, like checking the system time at the start of 2DJAj23DAJE123.php and making sure it matches the crontab run time before doing anything. If the times don't match, just exit the script and do nothing.
The above is all theory and not tested. I think it should work, but I am very tired from this issue.
I hope this was helpful and not overly complicated, but Google Drive IS complicated since they switched authentication methods earlier this year. Many of the posts and blog posts you find online just won't work.

Sometimes the problem is the gdrive config path: gdrive cannot find its default configuration. To avoid this, add the --config flag:
gdrive upload --config /home/<you>/.gdrive -p <google_drive_folder_id> /path/to/file_to_be_uploaded
Source: GDrive w/ CRON

I had the same issue and fixed it by specifying the full path to the drive command.
Example:
/usr/sbin/drive upload --file xxx..

Why rm command in Linux can delete file/dir in seconds while delete in FTP is really slow

Recently I created, by mistake, a directory containing a lot of files and subdirectories. I then tried to delete it through my FTP software (FileZilla), but it was really slow: it took 2 or 3 seconds to delete each file.
So I stopped it, logged in over SSH and used the rm -rf command instead, and the target directory was deleted in about a second.
My question is: why is deleting so slow over FTP but fast over SSH?
Many thanks!

To delete a directory tree, you have to iterate it, retrieve lists of all files and subdirectories, and delete them one by one.
When you use the rm -rf command on the remote shell, it has direct access to the file system, so it is relatively quick.
The FTP client, on the other hand, has to retrieve the file lists (which involves a couple of FTP command exchanges, opening a data channel, transferring the listing, etc.) and then delete the files one by one. Each delete involves sending an FTP command and waiting for the response, so it takes a long time.
There's no "delete whole tree" command in FTP protocol that would be an equivalent of the rm -rf command executed on the remote shell.

tar a folder into multiple files over SSH

Here is the thing
I have a server with 85 GB of disk space in total, and right now I have a 50 GB folder containing over 60,000 files.
I want to download these files to my local machine, and to do that I need to tar the folder, but I can't tar the whole folder because of the disk space limitation.
So I'm looking for a way to archive the folder into two 25 GB tar files, like part1.tar and part2.tar, where tar waits after finishing the first part (asking for the next part's name, or for permission, or anything) so I can transfer the first part to another server and then continue archiving part 2. Alternatively, a way to tar half of the folder (say, the first 30,000 files) and then tar the rest.
Any ideas? Thanks in advance.

One of the earliest applications of rsync was to implement mirroring or backup for multiple Unix clients to a central Unix server using rsync/ssh and standard Unix accounts.
I use rsync to move compressed (and uncompressed) files between servers.
I think the command should be something like this (a single colon runs rsync over SSH, while host::src would need an rsync daemon on the server):
rsync -av host:src /dest

The rsync solution was good enough, but I found a solution to the main question:
tar -c -M --tape-length=30000000 --file=filename.tar foldername
After about 29 GB, tar asks you to change the tape (in my case: transfer the first part and remove it) and press Enter to continue. You can also give the next part a name:
Prepare volume #2 for `filename.tar' and hit return:
n filename2.tar
Because this is going to take a while, I suggest running it in a screen session over SSH:
http://thelinuxnoob.com/linux/screen-in-ssh/
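The other idea from the question, archiving the first half of the files and then the rest, can also be scripted: build a sorted file list, cut it into slices, and feed each slice to GNU tar with -T - (read member names from stdin). A sketch; the function name tar_in_parts and the part_N.tar naming are hypothetical, it must be run from a directory outside the one being archived, and it assumes file names contain no newlines:

```shell
#!/bin/bash
# Create part PART of PARTS tar archives from the regular files under DIR.
# Each call archives one slice of a sorted file list, so part 1 can be copied
# off the server and deleted before part 2 is created.
# Run it from a directory OUTSIDE DIR (the archive is written to ./part_N.tar).
# Assumes file names contain no newlines; empty directories are not archived.
# Usage: tar_in_parts DIR PARTS PART
tar_in_parts() {
  local dir=$1 parts=$2 part=$3
  local list total per first last
  list=$(mktemp)
  (cd "$dir" && find . -type f | sort) > "$list"
  total=$(wc -l < "$list")
  per=$(( (total + parts - 1) / parts ))   # files per part, rounded up
  first=$(( (part - 1) * per + 1 ))
  last=$(( part * per ))
  # -T - reads the member names from stdin; -C makes them relative to DIR
  sed -n "${first},${last}p" "$list" | tar -cf "part_${part}.tar" -C "$dir" -T -
  local status=$?
  rm -f "$list"
  return "$status"
}
```

For the folder in the question this would be tar_in_parts foldername 2 1, then transfer and delete part_1.tar, then tar_in_parts foldername 2 2. Splitting by file count rather than by size means the parts are only roughly 25 GB each.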

Keep files updated from remote server

I have a server at hostname.com/files. Whenever a file is uploaded there, I want to download it.
I was thinking of writing a script that constantly checks the files directory, compares the timestamps of the files on the server, and downloads them based on that.
Is it possible to check the files' timestamps using a bash script? Are there better ways of doing this?
I could also just download all the files on the server every hour. Would a cron job be better for that?

If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As @Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av hostname.com:files/ /path/to/local/files/ >> rsync.log 2>&1
would pull the files directory from the server over SSH and append the details to rsync.log at minute 17 of every hour. Note that rsync cannot fetch over HTTP: it needs SSH access (host:path) or an rsync daemon (host::module) on the server, which is why it won't work against a plain web server.
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N re-downloads only files newer than the local copy, -r recurses into directories, -b sends the process to the background and -o specifies a log file. -np (--no-parent) keeps wget from ascending into the parent directory, which would otherwise end up spidering the entire server's content. This works against an ordinary web server.
More details, as usual, will be in the man pages of rsync or wget.
