I have this command in a cron job to create a log file from its output:
cd /root/amazon-crawler/ && python batchscript.py >> `date +%Y%m%d%H%M%S`cronlog.log 2>&1
I run this cron job twice a day, and each log file is 400 MB to 700 MB.
As you can see, a new file is created every time, because I don't want to lose or delete older log files, though I can manually delete files older than a week.
Is there any way to zip the log file after the cron job has finished?
Better still, use logrotate. It can automatically:
rename logfiles
compress them
discard old logfiles
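For example, here is a minimal sketch of a logrotate config, assuming you change the cron command to append to one fixed file (say /root/amazon-crawler/cronlog.log) and let logrotate take care of the renaming and compression:
/root/amazon-crawler/cronlog.log
{
daily
rotate 14
compress
delaycompress
missingok
notifempty
}
Dropped into /etc/logrotate.d/, this would compress each rotated log and keep two weeks' worth; adding dateext would give the rotated files a date suffix much like the one you generate now, and rotate controls how many old logs are kept before the oldest is discarded.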
I've started to see problems with corrupt gzip files. I'm not sure exactly when it happens, and the colleague who set up our storage has quit, so I'm not an expert in cron jobs, but this is how it looks today:
/var/spool/new_files/*.csv
{
daily
rotate 12
missingok
notifempty
delaycompress
compress
sharedscripts
postrotate
service capture_data restart >/dev/null 2>&1 || true
endscript
}
In principle, at midnight the job rotates all CSV files in /var/spool/new_files/, renames them (incrementing the number by 1), gzips the one that is then named "2", and moves it to our long-term storage.
I don't know whether the files are corrupt right after they have been gzipped or whether this happens during the "transfer" to the storage. If I run zcat file_name | tail I get an invalid compressed data--length error. This happens randomly 1-3 times per month.
So what I want to do is:
Run gzip -k so the original is kept
Check whether the file is corrupt after it has been gzipped; if it is:
Retry once
If this also fails, add an error to the logs
Stop the cron job
If the gzip file is OK after creation, move it to long-term storage
Test again that it is OK there; if not:
Retry once
If this also fails, add an error to the logs
Stop the cron job
Throw away the original file
Does the logic I suggest make sense? Any suggestions on how to add it to the cron job?
It seems OK... for the integrity check you could look at How to check if a Unix .tar.gz file is a valid file without uncompressing?
You can put all the commands you need in a .sh script and then add that script to the crontab in:
/var/spool/cron/
If you want to run the .sh script as root, just add or modify the /var/spool/cron/root file... in a similar way you can add cron jobs run by other users.
The cron entry would be something like:
0 0 * * * sh <path to your sh>
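As a rough sketch of what that .sh could contain for the verify/retry/move logic you describe (the paths, the *.csv.2 pattern and the error-log location are assumptions to adapt to your setup):
#!/bin/sh
# Sketch only: compress, verify, copy to storage, verify again, then clean up.
SRC_DIR=/var/spool/new_files
DEST_DIR=/mnt/long_term_storage      # placeholder for your long-term storage
ERR_LOG=/var/log/csv_archive_errors.log

fail() {
    echo "$(date): $1" >> "$ERR_LOG"
    exit 1    # stopping here effectively stops the rest of the cron run
}

for csv in "$SRC_DIR"/*.csv.2; do
    [ -e "$csv" ] || continue

    # 1. compress but keep the original (-k needs gzip >= 1.6)
    gzip -kf "$csv"

    # 2. verify the local archive with gzip -t, retrying once
    if ! gzip -t "$csv.gz"; then
        gzip -kf "$csv"
        gzip -t "$csv.gz" || fail "$csv.gz corrupt after compression (two attempts)"
    fi

    # 3. copy to long-term storage and verify it there, retrying once
    name=$(basename "$csv").gz
    cp "$csv.gz" "$DEST_DIR/$name"
    if ! gzip -t "$DEST_DIR/$name"; then
        cp "$csv.gz" "$DEST_DIR/$name"
        gzip -t "$DEST_DIR/$name" || fail "$name corrupt after transfer (two attempts)"
    fi

    # 4. both checks passed, throw away the local copies
    rm -f "$csv" "$csv.gz"
done
If the script takes over the compression, you would drop compress and delaycompress from the logrotate stanza so the files aren't gzipped twice.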
My cron file my_cron works when it's created directly in /etc/cron.d/:
sudo nano /etc/cron.d/my_cron
# Add content:
* * * * * username /path/to/python /path/to/file 2>/path/to/log
But it doesn't work when I copy/move it to the directory:
sudo cp ./my_cron /etc/cron.d/my_cron
ls -l /etc/cron.d outputs the same permissions both times: -rw-r--r--. The files are owned by root.
The only reason I can imagine at the moment is that I have to refresh/activate something after copying, which happens automatically on creation.
Tested on Ubuntu and Raspbian.
Any idea? Thanks!
Older cron daemons used to examine /etc/cron.d for updated content only when they saw that the last-modified timestamp of that directory, or of the /etc/crontab file, had changed since the last time cron scanned it. Recent cron daemons also examine the timestamps of the individual files in /etc/cron.d but maybe you're dealing with an old one here.
If you have an old cron, then if you copied a brand new file into /etc/cron.d then the directory's timestamp should change and cron should notice the new file.
However, if your cp was merely overwriting an existing file then that would not change the directory timestamp and cron would not pick up the new file content.
Editing a file in-place in /etc/cron.d would not necessarily update the directory timestamp, but some editors (certainly vi, unless you've configured it otherwise) will create temporary working files and perhaps a backup file in the directory where the file being edited lives. The creation and deletion of those other files will cause the directory timestamp to be updated, and that will cause cron to put the edited file into effect. This could explain why editing behaves differently for you than cp'ing does.
To force a timestamp to be updated you could do something like sudo touch /etc/crontab or create and immediately remove a scratch file (or a directory) in /etc/cron.d after you've cp'ed or rm'ed a file in there. Obviously touch is easier. If you want to go the create+delete route then mktemp would be a good tool to use for that, in order to avoid clobbering someone else's legitimate file.
If you were really paranoid, you'd wait at least a second between making file changes and then doing whatever you choose to do to force a timestamp update. That should avoid the situation where a cron rescan, your file updates, and your touch or scratch create+delete could all happen within the granularity of the timestamp.
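For example, something like this right after copying the file into place:
sudo cp ./my_cron /etc/cron.d/my_cron

# easiest: bump a timestamp cron is known to watch
sudo touch /etc/crontab

# or: create and immediately remove a scratch file so the directory's
# own mtime changes (mktemp avoids clobbering anything legitimate)
scratch=$(sudo mktemp /etc/cron.d/.refresh.XXXXXX)
sudo rm "$scratch"
On Debian-based systems cron ignores /etc/cron.d entries whose names contain dots, so the scratch file itself is never parsed as a crontab.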
If you want to see what your cron is actually doing, you can sudo strace -p <pid-of-cron>. Mostly it sleeps for a minute at a time, but you'll see it stat some files and directories (including /etc/crontab and /etc/cron.d) each time it wakes up. And of course if it decides that it needs to run a job, you'll see that activity too.
I have a server at hostname.com/files. Whenever a file has been uploaded I want to download it.
I was thinking of creating a script that constantly checked the files directory. It would check the timestamp of the files on the server and download them based on that.
Is it possible to check the files timestamp using a bash script? Are there better ways of doing this?
I could just download all the files on the server every hour. Would it therefore be better to use a cron job?
If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As @Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av http://hostname.com/ /path/to/local/dir/ >> rsync.log
would grab files from the server into /path/to/local/dir/ and append rsync's output to rsync.log at the 17th minute of every hour. Right now, though, I can't seem to get rsync to pull files from a web server.
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N downloads only files newer than the timestamp on the local version, -b sends
the process to the background, -r recurses into directories (effectively spidering the content under the starting URL), -np makes sure it doesn't ascend into a parent directory, and -o specifies a log file. This works against an arbitrary web server.
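To put that on the hourly schedule discussed above, a crontab entry along these lines should work (the user name and log path are placeholders, and -b is dropped since cron already runs the job unattended):
# min hour day month day-of-week user command
0 * * * * user wget -N -r -np -o /path/to/wget.log http://hostname.com/files/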
More details, as usual, will be in the man pages of rsync or wget.
I have a bash script which creates a mysqldump backup every hour in a certain directory.
The filenames of the backup files include the date and hour as per the following schema:
backupfile_<day>-<month>-<year>_<hour>.sql.gz
and to clarify here are some example filenames:
backupfile_30-05-2012_0800.sql.gz
backupfile_01-06-2012_0100.sql.gz
backupfile_05-06-2012_1500.sql.gz
Would someone help me with creating a script that will loop through all files in the directory and then delete files LEAVING the following:
Keep alternate hour backups older than a day
Keep twice daily backups older than a week
Keep once daily backups older than a month.
I have the following beginnings of the script:
#!/bin/bash
cd /backup_dir
for file in *
do
# do the magic to find out if this files time is up (i.e. needs to be deleted)
# delete the file
done
I have seen many fancy scripts like this for taking scheduled backups and wonder why folks don't make use of the logrotate utility, which is available on most *nix distros today and supports the following options of interest:
compress
Old versions of log files are compressed with gzip by default.
dateext
Archive old versions of log files adding a daily extension like YYYYMMDD instead
of simply adding a number.
olddir directory
Logs are moved into directory for rotation. The directory must be on the same
physical device as the log file being rotated, and is assumed to be relative to
the directory holding the log file unless an absolute path name is specified.
When this option is used all old versions of the log end up in directory. This
option may be overridden by the noolddir option.
notifempty
Do not rotate the log if it is empty (this overrides the ifempty option).
postrotate/endscript
The lines between postrotate and endscript (both of which must appear on lines by
themselves) are executed after the log file is rotated. These directives may
only appear inside of a log file definition. See prerotate as well.
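For illustration, a sketch of what that could look like here, assuming the dump script is changed to always write to one fixed file such as /backup_dir/backupfile.sql and that logrotate itself is invoked hourly from cron (the hourly directive needs a reasonably recent logrotate, and the stock cron.daily run only fires once a day):
/backup_dir/backupfile.sql
{
hourly
rotate 720
compress
# archive/ must already exist inside /backup_dir
olddir archive
missingok
notifempty
}
rotate 720 keeps roughly a month of hourly archives; the thinning scheme from the question (alternate hours after a day, twice daily after a week, daily after a month) is not something logrotate expresses on its own, which is where find comes in.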
You can parse your timestamps by iterating over filenames, or you can use the -cmin flag in the find command (see man 1 find for details).
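A rough sketch of that route, with find doing the age checks and the hour pulled out of the filename (the thresholds and the even-hour rule are taken from the question; adjust to taste):
#!/bin/bash
# Thin out backupfile_DD-MM-YYYY_HHMM.sql.gz files by age and hour.
cd /backup_dir || exit 1

for file in backupfile_*.sql.gz; do
    [ -e "$file" ] || continue
    hhmm=${file##*_}    # e.g. "0800.sql.gz"
    hour=${hhmm:0:2}    # e.g. "08"

    if [ -n "$(find "$file" -mtime +30)" ]; then
        # older than a month: keep only the daily 00:00 backup
        [ "$hour" != "00" ] && rm -f "$file"
    elif [ -n "$(find "$file" -mtime +7)" ]; then
        # older than a week: keep only the 00:00 and 12:00 backups
        [ "$hour" != "00" ] && [ "$hour" != "12" ] && rm -f "$file"
    elif [ -n "$(find "$file" -mtime +1)" ]; then
        # older than a day: keep only even hours
        [ $((10#$hour % 2)) -ne 0 ] && rm -f "$file"
    fi
done
Note that this trusts each file's modification time rather than the date encoded in its name; if the two can drift apart you would parse the DD-MM-YYYY part instead.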
I have a nightly backup script that backs up any files that have been modified on one server and then syncs them across to our backup server.
/var/backups/backup-2011-04-02/backuped/ backuped files and folders
The format above is the nightly incremental backup, which copies all the files and folders into a date-stamped folder with another folder underneath.
I'm thinking of a script that would run after the backup script and merge all the files in /var/backups/backup-2011-04-02/backuped/ into /var/www/live/documents.
So in theory I need to merge a number of these date-stamped folders from the backups into the live www directory on the backup server.
So what's the best way to go about this script?
You could run rsync on each backup directory to the destination in order of creation (ls -tr lists oldest first, so newer backups overwrite older ones at the destination):
$ for f in `ls -tr /var/backups`; do rsync -aL "/var/backups/$f" /var/www/live/documents/; done
Of course you can put this line in a nightly cron job. The only thing to look out for is that the line above will choke if the filenames in your backup directory have spaces in them, but it looks like they don't, so you should be OK.
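If spaces (or ordering) ever do become a concern, a variant that globs the date-stamped directories directly instead of parsing ls could look like this; the trailing slash after backuped/ makes rsync merge the contents rather than copy the folder itself:
#!/bin/bash
# Date-stamped names sort chronologically, so the newest backup is applied last.
for dir in /var/backups/backup-*/backuped/; do
    rsync -aL "$dir" /var/www/live/documents/
done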