I am currently using a cron job to run a crawler every night; it can only run at night. Sometimes there is so much data that one night is not enough to crawl everything, so I have to kill the process in the morning, at around 6:00 am. How can I kill just the crawler process using a cron job?
It depends on what you use for crawling. With StormCrawler, which runs continuously, you can have one cron job start the crawl by calling the 'storm jar ...' command and another one kill it with 'storm kill ...'. With Apache Nutch, you can achieve the same thing by listing the Hadoop jobs currently running and killing them. It would, however, be cleaner to let the current iteration finish, then parse and index the segment, before terminating the crawl. Again, it depends on the crawler you are using.
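For a crawler that has no kill command of its own, a different technique from the start/kill pair described above is to cap the run time when the job is started, using the coreutils timeout command. This is only a minimal sketch: run_with_deadline is a hypothetical helper name, and "sleep 5" stands in for the real crawler command.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, but stop it after a time budget.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    budget=$1
    shift
    # timeout sends SIGTERM once the budget expires and, if the command is
    # still alive 30 seconds later, SIGKILL (-k 30).
    # It exits with status 124 when the command was stopped by the timeout.
    timeout -k 30 "$budget" "$@"
}

# Stand-in for a crawler: "sleep 5" is cut off after 1 second.
run_with_deadline 1 sleep 5
status=$?
if [ "$status" -eq 124 ]; then
    echo "crawler stopped: time budget exhausted" >&2
fi
```

A job started from cron in the evening could be given a budget that ends at 6:00 am, so no second cron job is needed to stop it.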
I have an uploader service which needs to run every 5 minutes. It definitely finishes within 5 minutes, so there are never two parallel sessions.
I am wondering what a good strategy for running it would be: either schedule it as a cron job on the host, or start a Go program with an infinite loop that executes the program and then sleeps (see "Golang: Implementing a cron / executing tasks at a specific time").
If your task is...
On Unix
Stand alone
Periodic
Has an acceptable startup time
cron will be better than rolling your own scheduler just for the one service. It will guarantee the process always runs at the correct time, and it has rudimentary error reporting. There's no need to add a watchdog in case your infinite loop has an error: cron will run the process again in 5 minutes.
If cron is insufficient, look into other job schedulers before rolling your own.
"I have an uploader service which needs to run every 5 minutes. It definitely finishes within 5 minutes, so there are never two parallel sessions."
These are famous last words. I would suggest adding in some form of locking. For example, write your PID to a file in /var/run and check if that process is running. There's even a little pidfile library for Go.
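The PID file check could look roughly like the following. It is a sketch under assumptions: the PID file path and the already_running helper are hypothetical, and the file lives under the temp directory here because /var/run normally requires root.

```shell
#!/bin/sh
# Hypothetical PID file location; /var/run usually requires root.
PIDFILE="${TMPDIR:-/tmp}/uploader.pid"

# Succeeds if the PID recorded in the file belongs to a live process
# that the current user is allowed to signal.
already_running() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    # kill -0 sends no signal; it only checks that the process exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

if already_running; then
    echo "uploader already running, skipping this run" >&2
else
    echo $$ > "$PIDFILE"
    # ... do the upload here ...
    rm -f "$PIDFILE"
fi
```

Note that checking the file and then writing it is not atomic, and a recorded PID can be reused by an unrelated process, so this is a guard against the common case rather than a strict lock.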
Take a look at systemd: you can run a script from a timer and set a maximum execution time for it.
https://wiki.archlinux.org/index.php/Systemd/Timers
I can't seem to get my script to run in parallel every minute via cron on Ubuntu 14.
I have created a cron job which executes every minute. The cron job executes a script that runs much longer than a minute. When a minute expires it seems the new cron execution overwrites the previous execution. Is this correct? Any ideas welcomed.
I need concurrent, independently running jobs. The cron job runs a script which queries a MySQL database. The idea is to poll the database and, if the poll finds something, execute the script in its own process.
cron will not stop a previous execution of a process to start a new one. cron will simply kick off the new process even though the old process is still running.
If you need cron to terminate the previous process, you'll need to modify your script to handle that itself.
You need a locking mechanism to identify that the script is already running.
There are several ways of doing this but you need to be careful to use an atomic method.
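I use lock directories, as creating a directory is guaranteed to be atomic:
LOCKDIR=/tmp/myproc.lock
# mkdir fails if the directory already exists, i.e. another run holds the lock
if ! mkdir "$LOCKDIR" >/dev/null 2>&1
then
    echo "Processing already running - terminating" >&2
    exit 1
fi
# remove the lock directory when this script exits
trap 'rm -rf "$LOCKDIR"' EXIT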
This is a common occurrence. Try adding a check in your script to see if a lockfile already exists. If it does, exit. If not, continue.
Cron jobs are not overwritten. They can, however, overlap. Unless your script explicitly kills any pre-existing process, a new run cannot stop the previously running one.
Introducing lockfiles will save you from this confusion altogether.
I have a cron job I need to run every 7 days to aggregate up a bunch of data using a php script. The process is pretty CPU intensive and can take a decent amount of time. Despite setting it to run at 4 am (when we get the least amount of traffic) users are starting to notice some down time when the script runs. Is there a way to run this in the background only when the CPU is not being used or has an open thread?
Thanks!
In the cron job line, you can wrap the php command line with either the 'nice', 'chrt' or 'loadwatch' programs.
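As a rough illustration of the nice option, the wrapper below starts a command at a lower CPU priority. The function name run_low_priority is hypothetical, and the inner "nice" call (which prints the niceness it runs at) is only a stand-in for the real php command line.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command at low CPU priority so that other
# processes on the host are scheduled first.
# Usage: run_low_priority COMMAND [ARGS...]
run_low_priority() {
    # nice -n 19 raises the niceness by 19; on Linux the value is capped
    # at 19, the lowest priority.
    nice -n 19 "$@"
}

# Stand-in for the aggregation script: print the niceness the child runs at.
run_low_priority nice
```

nice does not wait for the CPU to become idle; it lets the scheduler favour other processes whenever they compete with the job for CPU time.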
Suppose I have a cron task that runs every minute. If the task takes more than one minute to run each time, what will happen? Will the next run wait for the first one to finish, or will it start without any checks?
I want to run a cron task every minute, and I don't want overlapping runs when a task takes a long time.
Please help.
It depends on what you run. If it's your own script, you can implement a locking/lock checking mechanism to avoid running duplicates.
But that's not cron's job.
Yes, cron will go ahead and start your 1+ minute-running process every minute until something crashes.
You'll want to put a lock of some sort into your job if you can to basically do this at start-up:
if not get_lock()
    print "Another process is running"
    exit
This, of course, assumes that you own the code running. If you're running a command that you didn't code, then I'd recommend building a shell wrapper that implements the above pseudocoded logic where get_lock() will see if another process like this one is running.
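One way to build such a shell wrapper is with the flock utility from util-linux, which takes the lock and runs the command in one step. This is a sketch, not the answerer's own implementation: run_locked and the lock file path are hypothetical, and "sleep 1" stands in for the command you did not write.

```shell
#!/bin/sh
# Hypothetical wrapper: run COMMAND only if nobody else holds LOCKFILE.
# Usage: run_locked LOCKFILE COMMAND [ARGS...]
run_locked() {
    lockfile=$1
    shift
    # flock -n fails immediately (exit status 1) instead of waiting when
    # the lock is taken. The lock is held while COMMAND runs and is
    # released by the kernel when it ends, even if it crashes.
    flock -n "$lockfile" "$@"
}

# Stand-in job: "sleep 1" plays the role of the real command.
if ! run_locked "${TMPDIR:-/tmp}/job.lock" sleep 1; then
    echo "Another process is running (or the command itself failed)" >&2
fi
```

Because the lock is tied to the running process rather than to the existence of a file, a crashed run cannot leave a stale lock behind.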
As others have mentioned, cron will run your script every minute regardless of whether another instance of your script is still running.
If you want to avoid this and don't fancy implementing your own locking mechanism, you could try a cron alternative called The Fat Controller, a daemon that continually re-runs scripts. You can optionally specify an interval between runs and a maximum execution time, so a script that goes AWOL can be killed.
There are some use cases and more information on the website:
http://fat-controller.sourceforge.net/
Need some advice, I'm after a decent process/task manager for Ubuntu.
Basically, I have a few scripts/programs which I want to run as long-running processes, but I want to shut them down at various periods (say, over the weekend, or for a few hours every day). If a process crashes during the time it needs to be up, I would like the task scheduler to restart it automatically.
So, for example, I want to run program X between 9:00 and 17:00 every day. If the process is still running at 17:00, it should be killed. If the process crashes between 9:00 and 17:00, it should be restarted automatically.
Are there any easy-to-use tools which can do this? I would like to avoid having to manage PID files and cron jobs that do the starting and stopping...
Anything anyone recommends? Any advice appreciated!
Cheers.
I do not know whether a tool exists for this but, unless you have many interactive tasks, it is really not that big an issue to manage for a few jobs:
1) You can start your cron jobs whenever you like thanks to the crontab.
2) You can insert a "commit suicide" step into these scripts, triggered by a time condition for example.
# your script doing things
# then it commits suicide
if [ your_condition ]; then
    kill $$
fi
Please note that if you want to allow users to log in only at certain periods of time, then it's a different question.
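For the 9:00-17:00 example from the question, the time condition could compare the current hour against the window. The sketch below assumes a script that works in repeated short steps; in_window and current_hour are hypothetical helper names, and it only covers stopping at the end of the window, not restarting after a crash.

```shell
#!/bin/sh
# Hypothetical helper: succeeds while HOUR (0-23) lies in [START, END).
# Usage: in_window HOUR START END
in_window() {
    [ "$1" -ge "$2" ] && [ "$1" -lt "$3" ]
}

# Current hour without a leading zero, e.g. "09" becomes "9".
current_hour() {
    h=$(date +%H)
    echo "${h#0}"
}

# One check of the condition; a long-running script would repeat this
# between units of work and exit once it fails.
if in_window "$(current_hour)" 9 17; then
    echo "inside the 9:00-17:00 window, keep working"
else
    echo "outside the window, time to stop"
fi
```

A cron entry at 9:00 would start the script, and the script would leave its work loop on its own once in_window fails.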