Analyze previous months' data from logs using AWStats? - ubuntu-10.04

Hi, I have configured AWStats on my Ubuntu machine to analyze nginx access logs. The problem is that AWStats gives me a report only for the present day, but I want to analyze the previous months' logs as well. I combined all the logs into a single file and ran the update script, yet I still don't get a report for the previous months. I checked the log files and the data for the previous months is available. What am I doing wrong?

You can use a real-time log analyzer such as GoAccess. It's really fast and you can combine all your logs, both Apache and nginx:
http://goaccess.prosoftcorp.com/
zcat -f access.log* | goaccess -a -s -b
OR
zcat access.log.*.gz | goaccess -a -s -b

What worked for me was to use logresolvemerge.pl to create a new log file. I then deleted all the previous logs, used the one really big merged file as the access.log, and set it to the appropriate permissions. Then I deleted the domain.hash file that stored all the AWStats data. You can find it by first looking in the /etc/awstats/awstats.yourdomain.conf file; it has a DirData setting. Mine was /var/lib/awstats. After deleting that hash file, I reran the update command and it took forever. Once it was done, I reloaded apache2 and I had all my data from the last year and a half. You might not need to reload Apache, but I didn't test it before I did.
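A minimal sketch of that procedure on a Debian/Ubuntu layout; the script locations and the cached data file names are assumptions, so check them against your own package (dpkg -L awstats) and against the DirData setting in /etc/awstats/awstats.yourdomain.conf:
# merge all rotated logs (logresolvemerge.pl keeps the entries in chronological order)
perl /usr/share/awstats/tools/logresolvemerge.pl /var/log/nginx/access.log* > /var/log/nginx/access_merged.log
# point LogFile in /etc/awstats/awstats.yourdomain.conf at the merged file, then
# clear the cached data under DirData so AWStats rebuilds its history from scratch
rm /var/lib/awstats/awstats*yourdomain*
# rebuild the statistics (this can take a long time on a year of logs)
perl /usr/lib/cgi-bin/awstats.pl -config=yourdomain -update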

Related

Empty log files daily using cron task

I want to empty (not delete) log files daily at a particular time, with something like
echo "" > /home/user/dir/log/*.log
but it returns
-bash: /home/user/dir/log/*.log: ambiguous redirect
is there any way to achieve this?
You can't redirect to more than one file, but you can tee to multiple files.
tee /home/user/dir/log/*.log </dev/null
The redirect from /dev/null also avoids writing an empty line to the beginning of each file, which was another bug in your attempt. (You may also want to set nullglob so that a file literally named *.log isn't created if the wildcard doesn't match any existing files.)
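As a sketch of how that could look inside the cron script (the path is just the example directory from the question):
#!/bin/bash
shopt -s nullglob                            # expand to nothing if no .log files exist
logs=(/home/user/dir/log/*.log)
[ ${#logs[@]} -gt 0 ] && tee "${logs[@]}" </dev/null >/dev/null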
However, a much better solution is probably to use the logrotate utility, which is installed out of the box on every Debian (and thus also Ubuntu, Mint, etc.) installation. It runs nightly by default and can be configured by dropping a file into its configuration directory. It lets you compress the previous version of a log file instead of just overwriting it, and it takes care to preserve ownership, permissions, and so on.
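A minimal sketch of such a drop-in file (it could live at, say, /etc/logrotate.d/userlogs, the name is arbitrary), reusing the example path from the question:
/home/user/dir/log/*.log {
    daily
    rotate 7          # keep a week of old copies
    compress          # gzip the rotated copies
    missingok         # don't complain if no log matches
    notifempty        # skip logs that are already empty
    copytruncate      # truncate in place so the writing process keeps its file handle
}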

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week but always have the same file names, e.g. a newly uploaded file1.jpg will replace the local file1.jpg.
This is how I am grabbing them, nothing fancy:
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file name using option -i or --input-file.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow the links on your page to all linked resources. You might need to limit the recursion level and the file types to avoid downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html
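Tying that back to the "one script every 5 minutes" part of the question, a sketch of a single user crontab entry using the list-file approach (list.txt and the folder are the example names from above):
# m h dom mon dow  command
*/5 * * * * wget -N -P /path/to/local/folder/ --input-file=/path/to/list.txt -a /path/to/wget.log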

Logrotate every 30 seconds and store logfiles in date-named directories

I've got a CentOS installation with a busy webserver.
I need to acquire stats from the log files and keep the old ones, ordered by date.
Every 30 seconds, the current logfile should be closed and processed (analysing the entries and storing them in a database). Since this generates a lot of logfiles, I want to group them into directories named by date.
At the moment, I have two files: rotation.conf and rotatenow.sh. The shell script creates directories based on YmdHMS.
After that, I run the command "logrotate ./rotation.conf -v --force" to invoke the process, but how do I make the config file put the log into the newly generated directory? Can the whole thing be done inside the config file?
now="$(date)"
now="$(date +'%Y-%m-%d-%H:%M:%S')"
foldernavn="/var/www/html/stats/logs/nmdstats/closed/$now"
mkdir $foldernavn
logrotate ./nmdhosting.conf -v --force
At the moment, the config-file looks like this:
/var/www/html/stats/logs/nmdhosting/access_log {
ifempty
missingok
(I am stuck)
(do some post-processing - run a Perl-script)
}
Any ideas would be deeply appreciated.
Update: I tried a different approach, adding this to the httpd.conf:
TransferLog "|usr/sbin/rotatelogs /var/www/html/stats/logs/nmdstats/closed/activity_log.%Y%m%d%H%M%S 30".
It works, but apparently, it can't run a pre/post processing script when using this method. This is essential in order to update the database. I could perhaps run a shell/Perl-script using a cronjob, but I don't trust that method. The search goes on...
update 2:
I've also tested cronolog, but the functionality my project requires hasn't been implemented yet; it's on the to-do list. Since the latest version is from 2002, I'm not going to wait around for that to happen :)
However, I was unaware of inotify-tools, so I managed to set up a listener:
srcdir="/var/www/html/stats/logs/nmdstats/history/"
inotifywait -m -e create $srcdir |
while read filename eventlist eventfile
do
echo "This logfile has just been closed: $eventfile"
done
I think I can handle it from here. Thank you, John.
No need for cron: if you use the TransferLog httpd.conf option to create a new log file every 30 seconds, you can run a post-processing daemon which watches the output directory with inotifywait (or Python's pyinotify, etc.). See here: inotify and bash - this will let you get notified by the OS very soon after each new file is created.
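A sketch of what such a watcher daemon could look like, reusing the directory from the question's TransferLog line; the import script name is only a placeholder for whatever loads the entries into the database, and depending on how rotatelogs finishes a file you may need to watch a different event than close_write:
#!/bin/bash
watchdir="/var/www/html/stats/logs/nmdstats/closed/"
# report every file that is finished being written in the rotated-logs directory
inotifywait -m -e close_write -e moved_to --format '%f' "$watchdir" |
while read -r logfile
do
    /usr/local/bin/import_log.pl "$watchdir$logfile"   # placeholder post-processing script
done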

wget newest file in another server's folder

I have an automatic backup of a file running on a cronjob. It outputs into a folder, let's call it /backup, and appends a timestamp to each file, every hour, like so:
file_08_07_2013_01_00_00.txt, file_08_07_2013_02_00_00.txt, etc.
I want to download these to another server, to keep as a separate backup. I normally just use wget and download a specific file, but was wondering how I could automate this, ideally every hour it would download the most recent file.
What would I need to look into to set this up?
Thanks!
wget can handle that; just enable time-stamping. I'm not even going to attempt my own explanation; here's a direct quote from the manual:
The usage of time-stamping is simple. Say you would like to download a file so that it keeps its date of modification.
wget -S http://www.gnu.ai.mit.edu/
A simple ls -l shows that the time stamp on the local file equals the state of the Last-Modified header, as returned by the server. As you can see, the time-stamping info is preserved locally, even without '-N' (at least for http).
Several days later, you would like Wget to check if the remote file has changed, and download it if it has.
wget -N http://www.gnu.ai.mit.edu/
Wget will ask the server for the last-modified date. If the local file has the same timestamp as the server, or a newer one, the remote file will not be re-fetched. However, if the remote file is more recent, Wget will proceed to fetch it.
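Putting that together for the hourly backups, a sketch of a crontab entry; the URL and local path are placeholders, and it assumes the /backup directory is served with an index listing so that -r can discover the file names. With -N, files that already exist locally are skipped, so each run only fetches the newest ones:
# m h dom mon dow  command
5 * * * * wget -N -r -np -nd -A 'file_*.txt' -P /local/backup/ -a /local/backup/wget.log http://example.com/backup/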

Keep files updated from remote server

I have a server at hostname.com/files. Whenever a file has been uploaded I want to download it.
I was thinking of creating a script that constantly checked the files directory. It would check the timestamp of the files on the server and download them based on that.
Is it possible to check the files timestamp using a bash script? Are there better ways of doing this?
I could just download all the files on the server every hour. Would it therefore be better to use a cron job?
If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As @Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av http://hostname.com/ >> rsync.log
would grab files from the server in that location and append the details to rsync.log on the 17th minute of every hour. Right now, though, I can't seem to get rsync to get files from a webserver.
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N re-downloads only files newer than the timestamp on the local version, -b sends the process to the background, -r recurses into directories, -o specifies a log file, and -np makes sure it doesn't ascend into the parent directory, which would otherwise risk spidering the entire server's content. This works from an arbitrary web server.
More details, as usual, will be in the man pages of rsync or wget.
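If you have SSH access to the server rather than just HTTP, the rsync route mentioned above becomes straightforward; a sketch of an /etc/crontab entry in the same format as the one above, with the host, user and paths as placeholders:
# min hour day month day-of-week user command
0 * * * * user rsync -av user@hostname.com:/path/to/files/ /local/files/ >> /local/files/rsync.log 2>&1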
