Empty log files daily using cron task

I want to empty (not delete) log files daily at a particular time. Something like
echo "" > /home/user/dir/log/*.log
but it returns
-bash: /home/user/dir/log/*.log: ambiguous redirect
Is there any way to achieve this?

You can't redirect to more than one file, but you can tee to multiple files.
tee /home/user/dir/log/*.log </dev/null
The redirect from /dev/null also avoids writing an empty line to the beginning of each file, which was another bug in your attempt. (Perhaps specify nullglob to avoid creating a file with the name *.log if the wildcard doesn't match any existing files, though.)
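A minimal sketch of the same idea as a script you could call from cron. The function name truncate_logs and the 03:00 schedule in the comment are assumptions for illustration. It uses a plain file test rather than nullglob to skip an unmatched pattern, so it also runs under plain sh:

```shell
#!/bin/sh
# Truncate every *.log file in the given directory to zero bytes.
# The files themselves (and their ownership/permissions) stay in place.
truncate_logs() {
    for f in "$1"/*.log; do
        # If nothing matched, $f is the literal pattern; skip it
        # so that no file named "*.log" gets created.
        [ -f "$f" ] || continue
        : > "$f"
    done
}

# Hypothetical crontab entry (crontab -e), running a script that calls
# truncate_logs /home/user/dir/log every day at 03:00:
#   0 3 * * * /path/to/truncate-logs.sh
```

Unlike echo "" > file, the : > file redirect writes nothing at all, so each file ends up with a size of exactly zero.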
However, a much better solution is probably logrotate, which is installed out of the box on every Debian (and thus also Ubuntu, Mint, etc.) installation. It runs nightly by default and can be configured by dropping a file into its configuration directory. It lets you compress the previous version of a log file instead of just overwriting it, and it takes care to preserve ownership and permissions.
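For comparison, a logrotate setup could look roughly like the sketch below. The helper name logrotate_stanza, the file name /etc/logrotate.d/userlogs and the retention of 7 rotations are assumptions. copytruncate is used here because the question asks for the files to be emptied in place rather than replaced:

```shell
#!/bin/sh
# Print a logrotate stanza for the given log file pattern.
logrotate_stanza() {
    cat <<EOF
$1 {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Hypothetical installation step (needs root):
#   logrotate_stanza '/home/user/dir/log/*.log' | sudo tee /etc/logrotate.d/userlogs
```

With copytruncate, logrotate copies the log and then truncates the original, so a process that keeps the file open can go on writing to it.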

Related

How to move a file to cron.d in Linux?

my_cron-file works when it's created directly in /etc/cron.d/:
sudo nano /etc/cron.d/my_cron
# Add content:
* * * * * username /path/to/python /path/to/file 2>/path/to/log
But it doesn't work when I copy/move it to the directory:
sudo cp ./my_cron /etc/cron.d/my_cron
ls -l /etc/cron.d outputs the same permissions both times: -rw-r--r--. The files are owned by root.
The only reason I can imagine at the moment is that I have to refresh or activate something after copying, which happens automatically on creation.
Tested on Ubuntu and Raspbian.
Any idea? Thanks!

Older cron daemons used to examine /etc/cron.d for updated content only when they saw that the last-modified timestamp of that directory, or of the /etc/crontab file, had changed since the last time cron scanned it. Recent cron daemons also examine the timestamps of the individual files in /etc/cron.d but maybe you're dealing with an old one here.
If you have an old cron, then if you copied a brand new file into /etc/cron.d then the directory's timestamp should change and cron should notice the new file.
However, if your cp was merely overwriting an existing file then that would not change the directory timestamp and cron would not pick up the new file content.
Editing a file in-place in /etc/cron.d would not necessarily update the directory timestamp, but some editors (certainly vi, unless you've configured it otherwise) will create temporary working files and perhaps a backup file in the directory where the file being edited lives. The creation and deletion of those other files will cause the directory timestamp to be updated, and that will cause cron to put the edited file into effect. This could explain why editing behaves differently for you than cp'ing does.
To force a timestamp to be updated you could do something like sudo touch /etc/crontab or create and immediately remove a scratch file (or a directory) in /etc/cron.d after you've cp'ed or rm'ed a file in there. Obviously touch is easier. If you want to go the create+delete route then mktemp would be a good tool to use for that, in order to avoid clobbering someone else's legitimate file.
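The create-and-delete route could be sketched like this, assuming a mktemp that accepts a template path (GNU coreutils and the BSDs do). The function name bump_dir_mtime and the .bump prefix are made up for the example:

```shell
#!/bin/sh
# Create and immediately remove a uniquely named scratch file so that
# the directory's last-modified timestamp changes.
bump_dir_mtime() {
    dir=$1
    # mktemp picks a name that does not exist yet, so no legitimate
    # file in the directory can be clobbered.
    scratch=$(mktemp "$dir/.bump.XXXXXX") || return 1
    rm -f "$scratch"
}

# The simpler touch route after installing a job file (needs root):
#   sudo cp ./my_cron /etc/cron.d/my_cron
#   sleep 1
#   sudo touch /etc/crontab
```

For /etc/cron.d the function has to run as root, since sudo cannot call a shell function directly.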
If you were really paranoid, you'd wait at least a second between making file changes and then doing whatever you choose to do to force a timestamp update. That should avoid the situation where a cron rescan, your file updates, and your touch or scratch create+delete could all happen within the granularity of the timestamp.
If you want to see what your cron is actually doing, you can sudo strace -p <pid-of-cron>. Mostly it sleeps for a minute at a time, but you'll see it stat some files and directories (including /etc/crontab and /etc/cron.d) each time it wakes up. And of course if it decides that it needs to run a job, you'll see that activity too.

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week but always keep the same file names, e.g. a newly uploaded file1.jpg will replace the local file1.jpg.
This is how I am grabbing them, nothing fancy :
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version, then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!

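wget cannot expand wildcards in HTTP URLs (its globbing only works for FTP), so the *.jpg form won't work. Instead, you can list the files to be downloaded one per line in a text file and pass that file name with the -i or --input-file option.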
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
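Put together, a single script for cron might look like the sketch below. The function names build_url_list and fetch_list are hypothetical, and the file names passed in are placeholders for your real ones; -N and -P are the options from the question:

```shell
#!/bin/sh
# Print one URL per line for each file name under a base URL.
build_url_list() {
    base=$1
    shift
    for name in "$@"; do
        # ${base%/} drops a trailing slash so the URL has exactly one.
        printf '%s/%s\n' "${base%/}" "$name"
    done
}

# Download every URL in list file $1 into directory $2, fetching a file
# only when the remote copy is newer than the local one (-N).
fetch_list() {
    wget -N -P "$2" --input-file "$1"
}

# Hypothetical usage:
#   build_url_list http://xx.xxx.xxx.xxx/remote/files file1.jpg file2.jpg > list.txt
#   fetch_list list.txt /path/to/local/folder/
```

One cron entry running this every 5 minutes would then replace the 100 separate download scripts.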
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow links on your page to all linked resources. You might need to limit the "recursion level" and file types in order to prevent downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html

Logrotate every 30th second and store logfiles in date-named directories

I've got a CentOS installation with a busy webserver.
I need to acquire stats from the log files and keep the old ones, ordered by date.
Every 30 seconds, the current logfile should be closed and processed (analysing entries and storing them into a database). Since this generates a lot of logfiles, I want to group them into directories, named by date.
At the moment, I have two files: rotation.conf and rotatenow.sh. The shell file creates directories, based on YmdHMS.
After that, I run the command "logrotate ./rotation.conf -v --force" in order to invoke the process, but how do I make the config file put the log into the newly generated directory? Can the whole thing be done inside the config file?
now="$(date +'%Y-%m-%d-%H:%M:%S')"
foldernavn="/var/www/html/stats/logs/nmdstats/closed/$now"
mkdir -p "$foldernavn"
logrotate ./nmdhosting.conf -v --force
At the moment, the config-file looks like this:
/var/www/html/stats/logs/nmdhosting/access_log {
ifempty
missingok
(I am stuck)
(do some post-processing - run a Perl-script)
}
Any ideas would be deeply appreciated.
Update: I tried a different approach, adding this to the httpd.conf:
TransferLog "|/usr/sbin/rotatelogs /var/www/html/stats/logs/nmdstats/closed/activity_log.%Y%m%d%H%M%S 30"
It works, but apparently, it can't run a pre/post processing script when using this method. This is essential in order to update the database. I could perhaps run a shell/Perl-script using a cronjob, but I don't trust that method. The search goes on...
Update 2:
I've also tested cronolog, but the functionality my project requires hasn't been implemented yet; it is only on the to-do list. Since the latest version is from 2002, I'm not going to wait around for it to happen :)
However, I was unaware of the inotify-tools, so I managed to set up a listener:
srcdir="/var/www/html/stats/logs/nmdstats/history/"
inotifywait -m -e create "$srcdir" |
while read -r filename eventlist eventfile
do
echo "This logfile has just been closed: $eventfile"
done
I think I can handle it from here. Thank you, John

No need for cron: if you use the TransferLog httpd.conf option to create a new log file every 30 seconds, you can run a post-processing daemon which watches the output directory with inotifywait (or Python's pyinotify, etc.). See "inotify and bash"; this will let you get notified by the OS very soon after a new file is created.
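A rough sketch of such a watcher, assuming inotify-tools is installed. The function names and the printf placeholder are hypothetical; the real post-processing (your Perl script, the database update) would go where the placeholder is. It listens for close_write rather than create, on the assumption that you want to react once a log file has been closed by its writer:

```shell
#!/bin/sh
# Read one file path per line and hand each to the post-processing step.
process_closed_logs() {
    while IFS= read -r path; do
        # Placeholder: replace with the real analysis/database import.
        printf 'processing %s\n' "$path"
    done
}

# Watch a directory forever (-m) and emit the full path (%w%f) of every
# file that was closed after being open for writing.
watch_closed_logs() {
    inotifywait -m -q -e close_write --format '%w%f' "$1" |
        process_closed_logs
}

# Hypothetical usage:
#   watch_closed_logs /var/www/html/stats/logs/nmdstats/closed/
```

Splitting the reader from the watcher lets you try the processing step by piping file names into it by hand.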

"Spoof" File Extension In Bash

Is there a way to "spoof" the file extension of a file in bash for consumption by another program? I can think of doing some shell scripting and making lots of soft-links, but that isn't very scalable.
Let's imagine I have a program I'm trying to use that requires input files to be of a specific file extension, and it has no method of turning off this check.

You could make a fifo with the requisite extension and cat any other file type into it. So, if your crazy program needs to see files that end in .funky, you can do this:
mkfifo file.funky
cat someotherfile > file.funky &
someprogram file.funky

Create a symbolic link for each file you want to have a particular extension, then pass the name of the symlink to the command.
For example suppose you have files with names of the form *.foo and you need to refer to them with extensions of .bar:
for file in *.foo ; do
ln -s "$file" "_$$_${file}.bar"
done
I precede each symlink name with _$$_ to avoid the possibility of colliding with an existing file name (you don't want to do ln -s file.foo file.bar if file.bar already exists).
With a little more programming, your script can keep track of which symlinks it created and, if you like, clean them up after executing the command.
This assumes, as you stated in the question, that the command can't be forced to accept a different extension.
You could, without too much difficulty, create a wrapper script that replaces the command in question, creating the symlinks, invoking the command, and cleaning up after itself automatically.
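Such a wrapper might look like the sketch below. The name run_with_ext and its argument order are assumptions. It places the symlink in a fresh temporary directory instead of using a _$$_ prefix, which also rules out name collisions:

```shell
#!/bin/sh
# Usage: run_with_ext EXT FILE COMMAND [ARGS...]
# Runs COMMAND with a symlink to FILE whose name ends in .EXT appended
# as the last argument, then removes the symlink again.
run_with_ext() {
    ext=$1
    src=$2
    shift 2
    tmpdir=$(mktemp -d) || return 1
    # A symlink in another directory needs an absolute target.
    case $src in
        /*) target=$src ;;
        *)  target=$PWD/$src ;;
    esac
    link=$tmpdir/$(basename "$src").$ext
    ln -s "$target" "$link"
    status=0
    "$@" "$link" || status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Hypothetical usage: someprogram sees a file ending in .bar
#   run_with_ext bar data.foo someprogram
```

The wrapper returns the wrapped command's exit status, so it can stand in for the original command in scripts.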

rsync : copy files if local file doesn't exist. Don't check filesize, time, checksum etc

I am using rsync to back up a million images from my Linux server to my computer (Windows 7 using Cygwin).
The command I am using now is:
rsync -rt --quiet --rsh='ssh -p2200' root@X.X.X.X:/home/XXX/public_html/XXX /cygdrive/images
Whenever the process is interrupted and I start it again, it takes a long time to start the copying process.
I think it is checking each file to see if there is any update.
The images on my server won't change once they are created.
So, is there a faster way to run the command, so that it copies a file only if the local file doesn't exist, without checking file size, time, checksum, etc.?
Please suggest.
Thank you

Did you try this flag? It might help, but it might still take some time to resume the transfer:
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore
existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn't affect the data that goes into the
file-lists, and thus it doesn't affect deletions. It just limits the files that the receiver requests
to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to
continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory
hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files
don't get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that
this option is only looking at the existing files in the destination hierarchy itself.
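Applied to the command from the question, that could look like the sketch below. The function name copy_missing is hypothetical; the other options are the ones already in use. Note that rsync still has to build the file list on both sides, so a restart will not be instant:

```shell
#!/bin/sh
# Copy only files that do not yet exist at the destination.
# $1 = source (local path or user@host:path), $2 = destination directory
copy_missing() {
    rsync -rt --quiet --ignore-existing "$1" "$2"
}

# Over ssh on a non-standard port, as in the question, the direct form
# would be (host and paths left as placeholders):
#   rsync -rt --quiet --ignore-existing --rsh='ssh -p2200' \
#       user@host:/path/to/images /cygdrive/images
```

A file that already exists locally is skipped by name alone, even if its size or timestamp differs from the remote copy.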
