Clearing Large Apache Domain Logs - linux

I am having an issue where Apache logs are growing out of proportion on several servers (Linux CentOS 5)... I will eventually disable logging completely but for now I need a quick fix to reclaim the hard disk space.
I have tried echo " " > /path/to/log.log and the bare > /path/to/log.log redirection, but they take too long and almost crash the server, as the logs are as large as 100 GB.
Deleting the files works fast, but my question is: will it cause a problem when I restart Apache? My servers are live and full of users, so I can't crash them.
Your help is appreciated.

Use the truncate command
truncate -s 0 /path/to/log.log
In the longer term you should use logrotate to keep the logs from getting out of hand.
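A minimal sketch of what that could look like, assuming a stock logrotate on CentOS and reusing the question's placeholder path (drop a file like this into /etc/logrotate.d/ and adjust the rotation counts to taste):
/path/to/log.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
copytruncate avoids having to restart Apache, at the cost of possibly losing a few lines written during rotation; the alternative is a postrotate script that gracefully restarts Apache.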

Try this:
cat /dev/null > /path/to/log.log

mv /path/to/log.log /path/to/log.log.1
Do this for your access and error logs and, if you are really running it in production, your rewrite logs.
This doesn't affect Apache on *nix, since it keeps writing to the still-open file handle. Then restart Apache. Yes, I know I said restart, but this usually takes a second or so, so I doubt that anyone will notice -- or they'll blame it on the network. The restarted Apache will be running with a new set of log files.
In terms of your current logs, IMO you need to keep at least the last 3 months of error logs and 1 month of access logs, but look at your volumetrics to work out your rough per-week volumes for error and access logs. Don't truncate the old files. If necessary, do a nice tail piped to gzip -c of these to create archives. If you want to split them, use a loop doing tail | head | gzip with the --bytes=nnG option. OK, you'll split across the odd line, but that's better than deleting the lot as you suggest.
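A rough sketch of that archiving step (sizes and paths here are illustrative; this assumes GNU tail, which accepts size suffixes like G):
# keep a compressed copy of the last ~2 GB of the renamed log
nice tail --bytes=2G /path/to/log.log.1 | gzip -c > /path/to/archive/log.log.1.tail.gz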
Of course, you could just delete the lot as you and others propose, but what are you going to do if you've realised that the site has been hacked recently? "Sorry: too late; I've deleted the evidence!"
Then for goodness sake implement a logrotate regime.

Related

Opening syslog files safely

I have a syslog server running syslog-ng (soon to be running rsyslog) on RHEL 6 with almost 1,000 hosts logging to it. I want to write a script to open each file (read-only and one at a time), pull some data from them and close them (probably in Ruby). Will this mess up syslog or cause any other issues? What other pitfalls might I need to be aware of?
My main worry is syslog trying to write data to a file that I have open, even though I may only have it open for a very short amount of time (maybe less than a second).
Possible pseudocode:
for file in /var/log/hosts/*.log; do   # path is illustrative
    grep "search string" "$file"       # grep opens the file read-only and closes it when done
done
If the daemon only appends to the file, I think there's no risk. It's deleting/truncating I would worry about. I think it depends on the configuration of your daemon (not so much on which daemon is used), but I would assume most configs don't actually remove data.
Deletion is typically done by logrotate and other tools. I would look at those tools' configs (e.g. make them run at a different time than the script you're mentioning).

rsync hang in the middle of transfer with a fixed position

I am trying to use rsync to transfer some big files between servers.
For some reason, when the file is big enough (2 GB - 4 GB), rsync hangs in the middle at exactly the same position, i.e. the progress at which it hangs is always the same, even if I retry.
If I remove the file from the destination server first, then rsync works fine.
This is the command I used:
/usr/bin/rsync --delete -avz --progress --exclude-from=excludes.txt /path/to/src user@server:/path/to/dest
I have tried adding --delete-during and --delete-delay, with no luck.
The rsync version is 3.1.0, protocol version 31.
Any advice please? Thanks!
Eventually I solved the problem by removing the compression option: -z
I still don't know why that is.
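In other words (a sketch reusing the asker's own placeholders), the working command was the same invocation minus -z:
/usr/bin/rsync --delete -av --progress --exclude-from=excludes.txt /path/to/src user@server:/path/to/dest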
I had the same problem (trying to rsync multiple files of up to 500GiB each between my home NAS and a remote server).
In my case the solution (mentioned here) was to add to "/etc/ssh/sshd_config" (on the server to which I was connecting) the following:
ClientAliveInterval 15
ClientAliveCountMax 240
"ClientAliveInterval X" will send a kind of message/"probe" every X seconds to check if the client is still alive (default is not to do anything).
"ClientAliveCountMax Y" will terminate the connection if after Y-probes there has been no reply.
I guess that the root cause of the problem is that in some cases the compression (and/or block diff) performed locally on the server takes so long that the SSH connection created by rsync gets dropped while that work is still ongoing.
Another workaround (e.g. if "sshd_config" cannot be changed) might be to use rsync's "--new-compress" option and/or a lower compression level (e.g. "rsync --new-compress --compress-level=1"): in my case the new compression (and diff) algorithm is a lot faster than the old/classical one, so the SSH timeout might not occur as it does with the default settings.
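If you do go the sshd_config route, remember that the SSH daemon has to re-read its configuration after the edit; on a systemd-based server that might look like the following (the service name varies by distro, e.g. ssh vs. sshd):
sudo systemctl reload sshd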
The problem for me was that I thought I had plenty of disk space on a drive, but the partition was not using the whole disk and was half the size I expected.
So check the available space with lsblk and df -h, and make sure the disk you are writing to reports all the space available on the device.
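A quick check along those lines (the destination path is a placeholder):
lsblk                   # disks and partition sizes as the kernel sees them
df -h /path/to/dest     # space actually available on the filesystem you are writing to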

centos free space on disk not updating

I am new to Linux and am working with a CentOS system.
Running df -H shows the disk 82% full, i.e. only 15 GB free.
I wanted some extra space, so using WinSCP I shift-deleted a 15 GB file.
I then ran df -H again, but it still shows only 15 GB free. Where did the space from the deleted file go?
Please help me find a solution to this.
In most Unix filesystems, if a file is open, the OS will remove the file's directory entry right away, but will not release the space until the file is closed. Why? Because the file is still in use by the process that has it open.
On the other side, Windows used to complain that it can't delete a file because it is in use; it seems that in later incarnations Explorer will pretend to delete the file.
Some applications are famous for bad behavior in this regard. For example, I have had to deal with some versions of MySQL that would not properly close some files; over time I could find several GB of space wasted in /tmp.
You can use the lsof command to list open files (man lsof). If the problem is related to open files and you can afford a reboot, that is most likely the easiest way to fix it.
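For example (a hedged illustration; output formats differ between lsof versions):
lsof +L1                 # open files with link count 0, i.e. deleted but still held open
lsof | grep -i deleted   # a looser variant of the same check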

Can /tmp in Linux ever fill up?

I'm putting some files in /tmp on a web server that are used by a web application for a limited amount of time. If the files get left in the server's /tmp after the user quits using the application, and this happens repeatedly, should I be concerned about the directory filling up? I read online that rebooting cleans out the /tmp directory, but this box doesn't get rebooted very much.
Tom
Yes, it will fill up. Consider implementing a cron job that will delete old files after a while.
Something like this should do the trick:
/usr/bin/find /tmp/mydata -type f -atime +1 -exec rm -f {} \;
This will delete files that haven't been accessed for more than a day (the command uses -atime; switch to -mtime if you'd rather go by modification time).
Or as a crontab entry:
# run five minutes after midnight, every day
5 0 * * * /usr/bin/find /tmp/mydata -type f -atime +1 -exec rm -f {} \;
where /tmp/mydata is a subdirectory where your application stores its temporary files. (Simply deleting old files under /tmp would be a very bad idea, as someone else pointed out here.)
Look at the crontab and find man pages for the details. Don't go running scripts that delete files on your filesystem without understanding all the details - that's how bad things happen to good servers. :)
Of course, if you can just modify your application to delete temporary files when it's done with them, that would be a far better solution, generally.
You can't just blindly delete everything that hasn't been modified for a certain amount of time. A lot of programs store sockets in there, which never get modified but are still an integral part of the program's operation. Take, for instance, MySQL on one of my servers:
srwxrwxrwx 1 mysql mysql 0 Sep 11 04:01 mysql.sock=
That's a valid, working "file" in /tmp. It just looks old because mysql hasn't been restarted in a while. Either limit your find with '-type f' or '-atime', or use one of the distro-provided tools others have mentioned.
The only thing you can write to without worrying it will fill up is /dev/null. Everything else will eventually run out of space if you keep dumping things in it.
One simple approach would be to have a cron job clean up all your /tmp files that are older than, say, a few days.
Yep, it lives on one of your disks/partitions and can fill up.
It gets cleaned out on a reboot.
When the user quits the application, you should clean up the files after them.
In which language is your web application written? A lot of languages provide temp-file facilities:
C
python
php
...
Check whether your language offers such a feature.
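As a rough shell-level illustration of the same idea (the file names here are made up; most languages wrap something like mkstemp for you):
# create a unique temp file and make sure it is removed when the script exits
tmpfile=$(mktemp /tmp/mydata.XXXXXX)
trap 'rm -f "$tmpfile"' EXIT
echo "scratch data" > "$tmpfile"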
Just a warning: not all Linux installations clean the /tmp directory after each reboot.
Some Linux distros have a package that will clean up old files in /tmp for you. It isn't hard to implement your own, as mentioned above. One thing to look out for is long-running processes, especially "zombies", i.e. ones that have died but haven't finished cleaning up after themselves. If a process has a file open, just deleting it from /tmp won't actually reclaim its space; you have to kill the process or somehow coerce it into closing the file. Many programs that write log or temporary files are designed to catch a signal (often SIGUSR1) and close and re-open any log or temporary files for exactly that reason.
Many Linux distributions include something named 'tmpwatch', or similar, which runs via cron and deletes things on a pre-defined gradient. Some are smart enough to go by the owner of the file: stuff owned by daemon users gets cleaned out faster than stuff owned by regular users. Check the mailing lists for your distro of choice to find out.
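For illustration only, the classic invocation looks roughly like this; the age is given in hours, the path is a placeholder, exact options vary between versions, and newer systems tend to use systemd-tmpfiles instead:
tmpwatch 240 /tmp/mydata    # remove files not accessed for 240 hours (10 days)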
Still, you should have SNMP or some other kind of monitoring watching how much room is available; if /tmp fills up, services like Apache aren't going to be happy. For instance, eAccelerator for PHP needs plenty of room, some mail scanners don't clean up properly, etc.

rm not freeing diskspace [closed]

I've rm'ed a 2.5 GB log file, but it doesn't seem to have freed any space.
I did:
rm /opt/tomcat/logs/catalina.out
then this:
df -hT
and df reported my /opt mount still at 100% used.
Any suggestions?
Restart Tomcat: if the file is in use and you remove it, the space only becomes available when the process that has it open finishes.
As others suggested, the file is probably still open in other processes. To find out which ones, you can run
lsof /opt/tomcat/logs/catalina.out
which lists those processes. You will probably find Tomcat in that list.
Your Problem:
It's possible that a running program is still holding on to the file.
Your Solution:
Per the other answers here, you can simply shut down Tomcat to stop it from holding on to the file.
If that is not an option, or if you simply want more details, check out this question: Find and remove large files that are open but have been deleted - it suggests some harsher ways of dealing with the situation that may be more useful to you.
More Details:
The Linux/Unix filesystem treats an open file handle as just another "name" for the file. rm removes the name as seen in the directory tree; until all handles are closed, the file still has other "names"/references and so it still exists. The filesystem doesn't reap a file until it is completely unnamed.
It might seem a little odd, but doing it this way allows for useful things like hard links: a hard link is essentially an alternate name for the same file.
This is why it is important to always call your language's equivalent of close() on a file handle when you are done with it. This notifies the OS that the file is no longer being used. Sometimes this can't be helped, which is likely the case with Tomcat; refer to Bill Karwin's answer to read why.
Depending on the file system, this is usually implemented as a sort of reference count, so there may not be any real names involved. It can also get weird if things like stdin and stderr are redirected to a file or another byte stream (most commonly done with services).
This whole idea is closely related to the concept of 'inodes', so if you are the curious type, I'd recommend reading up on that first.
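A tiny demonstration of the names-vs-inodes idea (the file names are made up; stat -c is GNU coreutils syntax):
echo data > file_a
ln file_a file_b                                  # a hard link: a second name for the same inode
stat -c '%n: %h links, inode %i' file_a file_b    # both names show the same inode, link count 2
rm file_a                                         # removes one name; the data survives
cat file_b                                        # still prints "data"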
Discussion
It doesn't work so well anymore, but you used to be able to update the entire OS, start up a new HTTP daemon using the new libraries, and finally close the old one when it was no longer serving any clients (releasing the old handles). HTTP clients wouldn't even miss a beat.
Basically, you can completely replace the kernel and all the libraries "from underneath" running programs. But since the "name" still exists for the older copies, those files still exist on disk/in memory for the programs using them. Then it is just a matter of restarting all the services, etc. While this is an advanced usage scenario, it is one reason why some Unix systems have years of uptime on record.
Restarting Tomcat will release any hold Tomcat has on the file. However, to avoid restarting Tomcat (e.g. if this is a production environment and you don't want to bring the services down unnecessarily), you can usually just overwrite the file:
cp /dev/null /opt/tomcat/logs/catalina.out
Or even shorter and more direct:
> /opt/tomcat/logs/catalina.out
I use these methods all the time to clear log files for currently running server processes in the course of troubleshooting or disk clearing. This leaves the inode alone but clears the actual file data, whereas trying to delete the file often either doesn't work or at the very least confuses the running process's log writer.
As FerranB and Paul Tomblin have noted on this thread, the file is in use and the disk space won't be freed until the file is closed.
The problem is that you can't signal the Catalina process to close catalina.out, because the file handle isn't under control of the java process. It was opened by shell I/O redirection in catalina.sh when you started up Tomcat. Only by terminating the Catalina process can that file handle be closed.
There are two solutions to prevent this in the future:
Don't allow output from Tomcat apps to go into catalina.out. Instead use the swallowOutput property, and configure log channels for output. Logs managed by log4j can be rotated without restarting the Catalina process.
Modify catalina.sh to pipe output to cronolog instead of simply redirecting to catalina.out. That way cronolog will rotate logs for you.
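A hedged sketch of that second option (the exact redirection line in catalina.sh differs between Tomcat versions; the command name and log pattern below are placeholders):
# in catalina.sh, pipe the server's output through cronolog instead of ">> catalina.out 2>&1"
tomcat_start_command 2>&1 | /usr/sbin/cronolog /opt/tomcat/logs/catalina.%Y-%m-%d.out &   # tomcat_start_command stands in for the real java invocation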
The best solution is using 'echo' (as @ejoncas suggested):
$ echo '' > huge_file.log
This operation is quite safe and fast (it freed about 1 GB of data per second for me), especially when you are operating on a production server.
Don't simply remove the file with 'rm', because first you would have to stop the process that is writing to it; otherwise the disk space won't be freed.
refer to: http://siwei.me/blog/posts/how-to-deal-with-huge-log-file-in-production
UPDATE: the origin of my story
In 2013, when I was working for youku.com, one Saturday I found that a core server was down; the reason was that the disk was full (of log files).
So I simply ran rm log_file.log (without stopping the web app process), but found that: 1. no disk space was freed, and 2. the log file was no longer visible to me.
So I had to restart my web server (a Rails app), and the disk space was finally freed.
This was quite an important lesson for me. It taught me that echo '' > log_file.log is the correct way to free disk space if you don't want to stop the running process that is writing logs to the file.
If something still has it open, the file won't actually go away. You probably need to signal catalina somehow to close and re-open its log files.
If there is a second hard link to the file then it won't be deleted until that is removed as well.
Run this command to check which deleted files are still occupying disk space:
$ sudo lsof | grep deleted
It will show the deleted files that are still holding on to disk space.
Then kill the owning process by PID (or name):
$ sudo kill <pid>
$ df -h
Check again; the space should now have been reclaimed.
If not, run the command below to see which files are taking up the space:
# cd /
# du --threshold=(SIZE)
Give it any size; it will show the files above that threshold, and you can then delete the offending ones.
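For instance (an illustrative invocation; adjust the threshold and starting directory as needed):
du -ah / --threshold=1G 2>/dev/null | sort -h    # everything larger than 1 GB, smallest first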
Is the rm journaled/scheduled? Try a 'sync' command to force the write.
