transferring files with cron? - cron

I am trying to figure out if it is possible to set up a cron job that will transfer all files from one folder on my server to another folder, and then take a set number of files (chosen randomly) from another folder and put them in the original folder. If so, any hints on how to do this? I have no experience with cron at all; I just don't want to have to log in with FTP and do the transfers manually.

Cron is really simple: all it does is run a command of your choice at the specified times of day.
In your case, you probably want to write a shell script that uses rsync, scp or ftp to transfer the files, makes sure the transfer exits successfully (check the exit code, stored in the $? variable), and then moves the set of files into the original folder.
I would use rsync and passwordless authentication via ssh keys. That's good for security, and if you want to, you can even limit the receiving side to only allow that ssh key to run rsync's server side.
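A rough sketch of such a script (the directory names, remote destination, and file count below are placeholders you would adapt to your own setup):
#!/bin/bash
# Sketch: transfer everything from the outgoing folder, then refill it
# with a few randomly chosen files from a pool folder.
SRC=/home/user/outgoing            # folder whose files get transferred
POOL=/home/user/pool               # folder to pick random replacements from
DEST=user@server:/data/incoming    # remote destination
COUNT=5                            # how many random files to move back

# transfer the files; --remove-source-files deletes each local copy after it is sent
rsync -av --remove-source-files "$SRC"/ "$DEST"/
if [ $? -ne 0 ]; then
    echo "transfer failed, not refilling the folder" >&2
    exit 1
fi

# move COUNT randomly chosen files back (filenames without newlines assumed)
ls "$POOL" | shuf -n "$COUNT" | while read -r f; do
    mv "$POOL/$f" "$SRC/"
done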
If this script is called /opt/scripts/myscript.sh, and is to be run once every 10 minutes, add the following to your crontab (run crontab -e to edit your crontab):
*/10 * * * * /opt/scripts/myscript.sh
Remember that the environment variables available in your interactive shell are not the same as those available when the cron job runs, so PATH, etc., may be different. This often causes cron jobs to fail the first few times you run them (see my law on cron jobs: http://efod.se/blog/archive/2010/02/19/forsbergs-law-on-cron-jobs :-)). Any output from a cron job is sent by mail to the user running it, which is helpful for debugging. Simply writing debug messages to a file in /tmp/ is also often a good way to get your cron jobs running.
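If PATH turns out to be the culprit, you can set it (and, if you like, MAILTO, which controls where the output mail goes) explicitly at the top of the crontab; the values below are just examples:
PATH=/usr/local/bin:/usr/bin:/bin
MAILTO=you@example.com
*/10 * * * * /opt/scripts/myscript.sh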
In many cases it makes sense to run cron jobs as a special user. Don't run your cron jobs as root unless they absolutely must have root access; it's better to run things as special users that only have limited permissions in the file system.

To edit your cron file:
crontab -e
An example entry for transferring files would look like:
30 3 * * * rsync -av School/* username@server:~/School/ >| /home/username/CronLogs/school_update
the fields are: minute, hour, day, month, day of week, command
So in my example, I transfer files every day at 3:30am by executing the rsync command listed. Note that the *'s mean "any value" for those fields.

Related

how to limit job submission depending on location or partition

I am wondering if there is a way to limit job submission depending on the location where the submission was made on an HPC system.
The thing is, storage for a scratch disk was recently added, so now I have two partitions:
home directory
scratch directory
I want all HPC users to be forced to submit their jobs only from the scratch directory, not from the home directory.
The HPC system uses LSF as its job scheduler. So, can job submission (i.e. bsub) be controlled through LSF such that only jobs submitted from under the scratch directory run on the HPC system?
Thanks in advance.
I don't think there's a way to do this natively, but there is a way to customize LSF submission-time checks to reject jobs submitted from the wrong directory. Take a look at this documentation:
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/chap_sub_exec_controls_lsf_admin.html
In particular, what you want to do is define an esub script which checks for the appropriate submission CWD. Let's say you name your script esub.dircheck; it would look something like this:
#!/bin/bash
# Reject the job if it is being submitted from somewhere under /home
if [[ $PWD/ = /home/* ]]; then
    echo "Job submission from /home forbidden"
    exit $LSB_SUB_ABORT_VALUE
fi
Now you can place esub.dircheck into $LSF_SERVERDIR (make sure it's executable by all). Finally, if you want the check to happen for every job submission, set the following parameter in lsf.conf:
LSB_ESUB_METHOD=dircheck
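Concretely, the setup boils down to something like this (assuming the script currently sits in your working directory):
# copy the script into place and make it readable and executable by everyone
cp esub.dircheck "$LSF_SERVERDIR"/
chmod a+rx "$LSF_SERVERDIR"/esub.dircheck
# then add LSB_ESUB_METHOD=dircheck to lsf.conf as shown above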
One final note: I'm just checking that PWD has /home as a prefix in the code above, but you'll probably need something a bit more sophisticated if you want to be sure the directory you're in is really under /home, because symbolic links can defeat a simple prefix check.
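A sketch of one way to handle that (assuming GNU readlink is available): canonicalize the submission directory first, so symlinks are resolved before the prefix comparison:
# resolve symlinks before comparing against /home
real_dir=$(readlink -f "$PWD")
if [[ $real_dir/ = /home/* ]]; then
    echo "Job submission from /home forbidden"
    exit $LSB_SUB_ABORT_VALUE
fi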

How can I configure SLURM at the user level (e.g. with something like a ".slurmrc")?

Is there something like a .slurmrc for SLURM that would allow each user to set their own defaults for parameters that they would normally specify on the command line?
For example, I run 95% of my jobs on what I'll call our HighMem partition. Since my routine jobs can easily go over the default of 1GB, I almost always request 10GB of RAM. To make the best use of my time, I would like to put the partition and RAM requests in a configuration file so that I don't have to type them in all the time. So, instead of typing the following:
sbatch --partition=HighMem --mem=10G script.sh
I could just type this:
sbatch script.sh
I tried searching for multiple variations on "SLURM user-level configuration" and it seemed that all SLURM-related hits dealt with slurm.conf (a global-level configuration file).
I even tried creating slurm.conf and .slurmrc in my home directory, just in case that worked, but they didn't have any effect on the partition used.
update 1
Yes, I thought about scontrol, but the only configuration file it deals with is global and most parameters in it aren't even relevant for a normal user.
update 2
My supervisor pointed out the SLURM Perl API to me. The last time I looked at it, it seemed too complicated, but this time, looking at the code for https://github.com/SchedMD/slurm/blob/master/contribs/perlapi/libslurm/perl/t/06-complete.t, it seems it wouldn't be too hard to create a script that behaves similarly to sbatch, reads in a default configuration file, and sets the desired parameters. However, I haven't had any success in setting 'std_out' to a file name that gets written to.
If your example is representative, defining an alias
alias sbatch='sbatch --partition=HighMem --mem=10G'
could be the easiest way. Alternatively, a Bash function could also be used
sbatch() {
    command sbatch --partition=HighMem --mem=10G "$@"
}
Put either of these in your .bash_profile for persistence.

Are "crontab -e" & "/etc/crontab" the same?

I'm wondering if it matters whether I add a cron job to /etc/crontab or with crontab -e?
I have an Ubuntu 17 and a Debian 9 VM running, and I'm confused about which one is the right place.
Thanks in advance!
They are not the same.
The crontab command is specific to a user. When you edit your
crontab (via crontab -e) you are really saving it to
/var/spool/cron/. I find this to be more geared toward interactive
setup/maintenance: it uses your $EDITOR. (Though I have seen tools
like whenever that will automatically populate a user crontab.)
The "system" cron files live in /etc/crontab and /etc/cron.d.
These are similar to your user's crontab, but the format has an
additional (sixth) field to specify which user to run as, and you'll
need root privileges to change these. The latter directory is often
where packages, system installers, or your own deployment routines
drop their cron scripts.
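For example, the same hypothetical job looks like this in the two places; note the extra user field in the system version:
# user crontab (edited with crontab -e), runs as that user:
*/10 * * * * /opt/scripts/myscript.sh
# /etc/crontab or a file in /etc/cron.d/, user named in the sixth field:
*/10 * * * * someuser /opt/scripts/myscript.sh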
You'll also find related system directories in /etc/, such as
cron.daily/, cron.hourly/, etc. These hold normal scripts that are
run on their respective cadence. E.g., /etc/cron.daily/logrotate
rotates system log files daily. They are typically orchestrated by
your /etc/anacrontab, which adds a small random delay across systems.
There are a few places to look for documentation of the various pieces
of cron. The relevant man pages are:
crontab(1) -- the command
crontab(5) -- spec formatting
cron(8) -- the daemon
On systemd-based systems, an alternative to cron is systemd timers.
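As a rough sketch (the unit names and script path here are made up), a daily cron entry could be replaced by a service/timer pair like the following, enabled with systemctl enable --now backup.timer:
# /etc/systemd/system/backup.service
[Unit]
Description=Nightly backup

[Service]
Type=oneshot
ExecStart=/opt/scripts/backup.sh

# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup.service daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target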

Mysterious find command hogging memory on Linux Mint

I'm running Linux Mint 17 and I notice that every so often my computer slows to a crawl. When I look at top I see "/usr/bin/find / -ignore_readdir_race (..." etc. sucking up most of my memory. It runs for a really long time (several hours) and my guess is that it's an automated indexing process for my hard drive.
I'm working on a project that requires me to have over 6 million audio files on a mounted SSD so another guess is that the filesystem manager is trying to index all these files for quick search. Is that the case? Is there any way to turn it off for the SSD?
The locate command reports data collected for its database by a regular cron task. You can exclude directories from the database, making the task run more quickly. According to updatedb.conf(5):
PRUNEPATHS
A whitespace-separated list of path names of directories which should not be scanned by updatedb(8). Each path name must be exactly in the form in which the directory would be reported by locate(1).
By default, no paths are skipped.
On my Debian machine for instance, /etc/updatedb.conf contains this line:
PRUNEPATHS="/tmp /var/spool /media"
You could modify your /etc/updatedb.conf to add the directories which you want to ignore. Only the top-level directory of a directory tree need be listed; subdirectories are ignored when the parent is ignored.
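For example, if the audio files are mounted under a hypothetical /mnt/ssd, the line could become:
PRUNEPATHS="/tmp /var/spool /media /mnt/ssd"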
Further reading:
Tip of the day: Speed up `locate`
How do I get mlocate to only index certain directories?
It's a daily cron job that updates databases used by the locate command. See updatedb(8) if you want to learn more. Having six million audio files will likely cause this process to eat up a lot of CPU as it's trying to index your local filesystems.
If you don't use locate, I'd recommend simply disabling updatedb, something like this:
sudo kill -9 <PID>
sudo chmod -x /etc/cron.daily/mlocate
sudo mv /var/lib/mlocate/mlocate.db /var/lib/mlocate/mlocate.db.bak
If all else fails just remove the package.
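On a Debian-based system such as Mint that would be something along the lines of:
sudo apt-get remove mlocate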

detect if something is modified in directory, and if so, backup - otherwise do nothing

I have a "Data" directory, that I rsync to a remote NAS periodically via a shell script.
However, I'd like to make this more efficient. I'd like to detect if something has changed in "Data" before running rsync. This is so that I don't wake up the drives on the NAS unnecessarily.
I was thinking of modifying the shell script to get the latest modified time of the files in Data (by using a recursive find), and write that to a file every time Data is rsynced.
Before every sync, the shell script can compare the current timestamp of "Data" with the previous timestamp when "Data" was sync'd. If the current timestamp is newer, then rsync, otherwise do nothing.
My question is, is there a more efficient way to figure out if the "Data" directory has been modified since the last rsync? Note that Data has many, many layers of sub-directories.
If I understand correctly, you just want to see if any files have been modified so you can figure out whether to proceed to the rsync portion of your script?
It's a pretty simple task to figure out when the data was last synced, especially if you do this nightly. As soon as you find one file with mtime greater than the time of the last sync, you know you have to proceed to the full rsync.
find has this functionality built in:
# find all files modified in the last 24 hours
find . -mtime -1
Rsync already does this. There is no on-demand solution that doesn't require checking the mtime and ctime properties of the inodes.
However, you could create a daemon that uses inotify to track changes as they occur and fires rsync at intervals, or whenever you feel enough events have occurred to justify calling rsync.
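A minimal sketch of that idea, assuming inotify-tools is installed and using placeholder paths for the data directory and the NAS target:
# block until something changes under Data, then sync and wait again
while inotifywait -r -e modify,create,delete,move /path/to/Data; do
    rsync -av /path/to/Data/ user@nas:/backup/Data/
done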
I would use the find command, but do it this way: When the rsync runs, touch a file, like "rsyncranflag". Then you can run
find Data -newer rsyncranflag
That will say definitively whether any files were changed since the last rsync (subject to the accuracy of mtime).
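A sketch of how that could be wired into the script (the NAS destination is a placeholder, and the flag file must be created once by hand before the first run); the flag is only refreshed when the rsync succeeds:
if [ -n "$(find Data -newer rsyncranflag -print -quit)" ]; then
    rsync -av Data/ user@nas:/backup/Data/ && touch rsyncranflag
fi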
