Create historical data directories with crontab

For this specific project I am collecting data from the internet on an hourly basis and placing it into a directory within a Hadoop framework. I need to run specific programs at specific times, and crontab is the best way to do this. The problem is that I don't know how to create the new files, put them in the proper directories, and name them based on the time they were made. Crontab takes commands exactly as they would appear on the command line. Do I have to write a separate program that modifies the crontab file so it names everything properly?
Basically what I want it to do is something like
1 * * * * python /location/of/pyfile/stream.py > /home/hadoop/project/d2015/"currentmonth"/subfolder/"filenametimestamp".txt
Everything in quotes is what needs to change on an hourly/monthly basis.
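(For reference, a minimal sketch of the usual approach, not taken from the original thread: cron passes the command to the shell, so date can build the names at run time. Note that % is special in crontabs and must be escaped as \%, and the timestamp format below is just an example.)
1 * * * * mkdir -p /home/hadoop/project/d2015/$(date +\%m)/subfolder && python /location/of/pyfile/stream.py > /home/hadoop/project/d2015/$(date +\%m)/subfolder/$(date +\%Y\%m\%d\%H).txt
The same trick would cover the year directory too, e.g. d$(date +\%Y), so no separate program is needed to rewrite the crontab.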

Related

How do I set a custom crontab file inside project?

I want to keep a crontab file in my project repo so that it is tracked and easy to manage. I'd prefer to add cron jobs in a file in my project, e.g. /home/user1/project/.crontab, instead of /var/spool/cron/crontab or /etc/crontab. Is there any way to do this?
The operating system is Ubuntu.
There is no way to make cron use a different file. The daemon's files are stored in a location owned by the daemon itself, which generally cannot be overridden.
But of course, you can always make sure the file that cron reads is identical to yours.
crontab < /home/user1/project/.crontab
will replace any cron schedule for the current user with the contents of the input file.
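If you want that to happen automatically, one option is a git hook (a sketch; it assumes a plain git checkout, and the hook must be made executable with chmod +x):
#!/bin/sh
# .git/hooks/post-merge: reinstall the tracked crontab after every git pull
crontab /home/user1/project/.crontab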

Why is my cron job set up by Ansible not running?

I've set up a cron job with Ansible, but when I run crontab -l it says my crontab is empty.
Here's the Ansible task that sets it up:
- name: Setup cron to run backup.sh every midnight
  cron:
    name="Backup S3 to GS"
    minute="0"
    hour="0"
    job="sh ~/backup.sh"
    cron_file=backup_s3
    user=vagrant
When I go into the Vagrant machine and run ls /etc/cron.d/, I can see that the backup_s3 file is there, yet crontab -l still says the crontab is empty.
This is the content of backup_s3:
#Ansible: Backup S3 to GS
0 0 * * * vagrant sh ~/backup.sh
I know it's not running because I don't get an email saying the backup is done, and when I run the script manually it works fine.
Okay. There are several layers of confusion here.
First, the crontab you see when you edit (crontab -e) or view (crontab -l) is a per-user crontab. It sits in a magic spool directory/file that you can't (practically speaking) edit directly, and it's not a good place to put any serious cron jobs.
Since you are using cron_file=, Ansible is doing the appropriate thing by placing an entry in /etc/cron.d/. Individual files can be dropped there, which is much saner than trying to edit one shared document (look at all the people struggling with lineinfile here on Stack Overflow).
This is why it isn't showing up in crontab -l, and it's a good thing.
As far as output from cron is concerned, does email even work on your Vagrant system? It likely doesn't. There are good ways around this. First, look at /var/log/cron; if you don't have one, look for CRON entries in /var/log/syslog. They may indicate whether there are problems.
Next, cron jobs typically don't run with a full user shell environment, so you should avoid ~. Further, if the permissions on backup.sh are wrong, it may not get executed. Finally, you can redirect the output somewhere you can see it. Here's what I'd recommend for your cron entry:
job="/bin/sh /home/vagrant/backup.sh >> /home/vagrant/backup.log"
You can also change the minute/hour so it runs more frequently, so you don't have to wait overnight to see what happens.
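Put together, the generated /etc/cron.d/backup_s3 would then look something like this (a sketch with a temporary every-five-minutes schedule; revert to 0 0 once it works):
#Ansible: Backup S3 to GS
*/5 * * * * vagrant /bin/sh /home/vagrant/backup.sh >> /home/vagrant/backup.log 2>&1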
Once you've done that, you have plenty of places to look for information: the two locations in /var/log, plus the new backup.log (if it exists, the cron job has run; if there is data in it, you should be able to figure out any problems).
TL;DR
Change the job line, then look for evidence of execution in /var/log and /home/vagrant/backup.log.

On Ubuntu, is there a way to automatically move files to another directory as they are SFTP'd?

I am SFTP'ing files to a directory on my Ubuntu server. Ideally I would like these files in my Apache public html folder, as they are pictures that a user is uploading.
I've found that I can't simply SFTP the files directly to my public html folder, so I am researching other methods. My picture server runs Ubuntu, so I thought there may be some native command or setting that I could use to automatically move pictures that show up in my SFTP directory to my public html directory.
Hopefully I am making sense; I'm not sure where else I should be asking this question.
Three possibilities:
Why can't you simply upload the files directly into your public html folder? I assume that has something to do with access restrictions on writing to that directory, so you could try changing that directory's write permissions for the user you are uploading as.
Access restrictions are changed with the command chmod, and the ownership of files and directories with chown. It's best to read the documentation for these commands (man chmod and man chown).
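For example, to let an upload account write to a directory that Apache serves (a sketch; "sftpuser" and the path are placeholders, while www-data is the usual Apache group on Ubuntu):
sudo chown sftpuser:www-data /var/www/html/images
sudo chmod 775 /var/www/html/images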
You can run a script periodically that takes all uploaded files and moves them to the specified target directory. For this you need to write a short bash script, for example:
#!/bin/bash
mv /home/user/UPLOADS/*.jpg /var/www/images/
(This script simply takes all files with the extension .jpg from the directory /home/user/UPLOADS and moves them, without further checks, to the directory /var/www/images.)
Place this script somewhere (e.g. /home/user/bin/) and make it executable: chmod a+x /home/user/bin/SCRIPTNAME
This script can be run periodically via cron: call crontab -e and add a new line
like so:
*/5 * * * * /home/user/bin/SCRIPTNAME
which executes the script every 5 minutes.
The drawback is that the script only runs every 5 minutes, so there can be a gap of up to 5 minutes between upload and move. Additionally, if the script runs while new images are still being uploaded, it may move half-transferred files.
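One way to soften that second drawback (a sketch, assuming an upload never stalls for more than a minute) is to move only files that have not been modified recently:
#!/bin/bash
# move only .jpg files untouched for at least one minute,
# so files still being uploaded are left alone
find /home/user/UPLOADS -maxdepth 1 -name '*.jpg' -mmin +1 -exec mv {} /var/www/images/ \;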
The third possibility is to execute a script as soon as an upload finishes, by watching the upload directory with the kernel's inotify feature. If you want to do this, it's best to search for inotify examples, as it is a little more complicated. Here is another SO answer on that:
Monitor Directory for Changes
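As a rough illustration (a sketch using inotifywait from the inotify-tools package, with the same example paths as above; the close_write event fires only after a file has been fully written):
#!/bin/bash
# watch the upload directory and move each finished .jpg immediately
inotifywait -m -e close_write --format '%w%f' /home/user/UPLOADS |
while read -r file; do
    case "$file" in
        *.jpg) mv "$file" /var/www/images/ ;;
    esac
done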

When are cron.d jobs executed?

This is probably a stupidly obvious beginner question, but somehow I can't find the answer.
On my Debian box, I have a script in /etc/cron.d. It executes every once in a while, but I can't find the schedule or initiator. I've tried looking at all users' crontabs, as described here, but no user crontab runs the script. I've looked at /etc/crontab, which holds the entries for cron.daily, cron.monthly and cron.hourly, but not cron.d.
Where is this schedule held?
From the output of man cron:
Support for /etc/cron.d is included in the cron daemon itself, which handles this location as the system-wide crontab spool. This directory can contain any file defining tasks following the format used in /etc/crontab, i.e. unlike the user cron spool, these files must provide the username to run the task as in the task definition.
This implies that the file inside /etc/cron.d should not be a script, but rather a configuration file that looks similar to /etc/crontab. It will carry its own scheduling information, the same way that /etc/crontab does. In particular, the format should look like:
# m h dom mon dow user command
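For instance, a complete /etc/cron.d file (hypothetical name and command) could look like:
# /etc/cron.d/cleanup: run a cleanup script as root at the top of every hour
0 * * * * root /usr/local/bin/cleanup.sh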

Delete expired cron jobs from crontab

My product requires a cron job for every message a user sends to another user. The cron job gets added to the crontab on the server, and everything works fine. Once a job is done, is there a way to remove the expired entry from the crontab?
Since the number of messages is huge, my crontab keeps growing, so I want to clean up the old job entries. Is there a neat way of achieving this?
At least in most Linux distributions there is a crontab command that allows you to fetch and set the contents of the user's crontab. You can use it like this:
crontab -l > myfile
# edit myfile, removing the expired entry
crontab myfile
However, this is clunky and error-prone. A better way is to wrap your job in a script that checks a condition (e.g. a flag file) to decide whether to run the underlying logic, and to manipulate that flag file instead of the crontab.
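If you do script the removal instead, a common one-liner (a sketch; it assumes each generated entry carries a unique marker string, which is hypothetical here) is:
# drop every crontab entry containing the marker for message 42
crontab -l | grep -vF 'msg-42' | crontab -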
