How to remove data from main memory/RAM - Linux

Is there any way to remove specific data from main memory in Linux, so that it has to be read in again from the hard disk?

Rather than removing a specific piece of data, this will drop everything from the Linux cache. (I am assuming this is what you mean when you want Linux to reload a file from the hard disk into main memory.)
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
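For reference, the value written selects what gets dropped: 1 frees the page cache only, 2 frees reclaimable slab objects (dentries and inodes), and 3 frees both. For example, to drop only the page cache:
sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"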
Link: How to clear memory cache on linux
Also, my apologies to Mohit M. as he answered this in the comment section before me.

You could always change your app to use the O_DIRECT flag, which bypasses the page cache and fetches the file from disk for a specific read call. It may not work in all cases (especially with stacked block devices etc.), in which case you should stick to drop_caches as others before me explained.
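If you cannot change the application, GNU dd offers a rough shell-level equivalent: it can read with O_DIRECT, or advise the kernel to drop one specific file's cached pages (the path below is just a placeholder, and the nocache flag needs a reasonably recent coreutils):
dd if=/path/to/datafile of=/dev/null bs=4k iflag=direct    # read bypassing the page cache
dd if=/path/to/datafile iflag=nocache count=0              # drop this file's cached pages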

Related

Artificially modify server load in Ubuntu

I am curious whether it is possible to artificially modify the server load in Ubuntu, or more generally Linux. I am working on an application that reacts to the server load, and in order to test it, it would be nice if I could change the server load easily.
I am currently running an over-active program that will literally generate load, but I'd prefer to not continue overheating my laptop (it's getting hot!).
One of the most important things to know about Linux (or Unix) systems is that everything is just a file. Since you are just reading from /proc/loadavg, the easiest way to accomplish what you are after is simply to make a text file that contains a line of text like the one you see when running cat /proc/loadavg. Then have your program read from that file instead of /proc/loadavg and it will be none the wiser. If you want to test under different "artificial" situations, just change the text in this file and save. When your testing is done, simply change your program back to reading from /proc/loadavg and you can be sure it will work as expected.
Note, you can make this text file anywhere you want...in your home directory, in the program directory, wherever. However, you shouldn't make it in /proc. That directory is reserved for system objects.
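A minimal sketch of that fake file, using /tmp/loadavg as an assumed location; the five fields mirror the real /proc/loadavg format (1-, 5- and 15-minute averages, running/total tasks, last PID):
echo "8.00 6.50 4.25 3/180 12345" > /tmp/loadavg
cat /tmp/loadavg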
You can use the stress command, see http://weather.ou.edu/~apw/projects/stress/
A tool to impose load on and stress test a computer system
sudo apt-get install stress
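For example, a quick run that keeps four CPU workers busy for 60 seconds:
stress --cpu 4 --timeout 60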
To avoid warming up your CPU, you can run the load generator inside a virtual machine with a small CPU allocation. VirtualBox and qemu-kvm are free.
Use chroot to run the various pieces of software you're testing with a specified directory as the root directory. Set up a manufactured/modified /proc/loadavg relative to that new root directory, too.
chroot will let you create a dummy file that appears to have /proc/loadavg as its path, so the software will observe your manufactured values even if you can't change your code to look for load data in a different location.
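A rough sketch of that setup, assuming the program and the libraries it needs have already been copied into /srv/fakeroot (the paths, values and program name here are only placeholders):
mkdir -p /srv/fakeroot/proc
echo "8.00 6.50 4.25 3/180 12345" > /srv/fakeroot/proc/loadavg
sudo chroot /srv/fakeroot /usr/local/bin/myprog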
Since you don't want to actually/literally stress the machine, something like stress is not what you are after.
As stated, /proc/loadavg would be the place to set system load averages (faux loads).
But if that's also not the meat of what you're after, I would absolutely suggest
getloadavg
watchdog
and even possibly Munin plugins
There are two methods.
Hacking /proc/loadavg:
- The machine is not overstressed
- Your program reads load values from a file
- To do: hack Linux to report a fake load value
Modifying your program:
- The machine is not overstressed
- Your program reads load values from a file
- To do: change a few characters in your program: replace /proc/loadavg with /tmp/loadavg
You can decide now. Calculate the costs ;)

Does the Linux filesystem cache files efficiently?

I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: Though, as cdhowie points out, there is no 'linux filesystem'. However, I believe the relevant code is in linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the linux source about handling vm_area_struct objects in linux/mm/mmap.c, mainly.
As people have mentioned, mmap is a good solution here.
But one 250K file is very small. You might want to read it in at startup and put it into some sort of in-memory structure that matches what you want to send back to the user. E.g., if it is a text file, an array of lines might be a good choice.
The file should be cached, but make sure the noatime option is set on the mount; otherwise each read will also trigger an access-time update being written back to disk.
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap() which can avoid the copy, by mapping the cache pages directly into the process.
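If you want to verify this yourself, a quick check is to time two consecutive reads and inspect cache residency with fincore from util-linux (the file path is a placeholder):
time cat /srv/app/data.bin > /dev/null   # first read comes from disk
time cat /srv/app/data.bin > /dev/null   # second read is served from the page cache
fincore /srv/app/data.bin                # reports how many of the file's pages are resident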
I guess putting that file onto a ramdisk (tmpfs) may give you enough of an advantage without big modifications, unless you are really serious about response times at the microsecond level.
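A minimal sketch of that approach, with a made-up mount point and file name (note that tmpfs contents are lost on reboot):
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=16m tmpfs /mnt/ramdisk
cp /srv/app/data.bin /mnt/ramdisk/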

How do I measure net used disk space change due to activity by a given process in Linux?

I'd like to monitor disk space requirements of a running process. Ideally, I want to be able to point to a process and find out the net change in used disk space attributable to it. Is there an easy way of doing this in Linux? (I'm pretty sure it would be feasible, though maybe not very easy, to do this in Solaris with DTrace)
Probably you'll have to ptrace it (or get strace to do it for you and parse the output), and then try to work out how much disc space is being used.
This is nontrivial, as your tracing process will need to understand which file operations use disc space - and be free of race conditions. However, you might be able to do an approximation.
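As a starting point for that approximation, something like the following strace invocation logs the space-relevant file syscalls of a running process and its children (the PID and syscall list are placeholders, not exhaustive):
strace -f -p 1234 -e trace=openat,write,truncate,ftruncate,unlink,unlinkat,rename -o /tmp/disk-trace.log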
It's also not always obvious how much disc space a given write actually consumes, because most Linux filesystems support "holes". I suppose you could count holes as well for accounting purposes.
Another problem is knowing what filesystem operations free up disc space - for example, opening a file for writing may, in some cases, truncate it. This clearly frees up space. Likewise, renaming a file can free up space if it's renamed over an existing file.
Another issue is processes which invoke helper processes to do stuff - for example if myprog does a system("rm -rf somedir").
Also it's somewhat difficult to know when a file has been completely deleted, as it might be deleted from the filesystem but still open by another process.
Happy hacking :)
If you know the PID of the process to monitor, you'll find plenty of information about it in /proc/<PID>.
The file /proc/<PID>/io contains statistics about bytes read and written by the process; it should be what you are looking for.
Moreover, in /proc/<PID>/fd/ you'll find links to all the files opened by your process, so you could monitor them.
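For example, with a placeholder PID of 1234:
cat /proc/1234/io      # read_bytes / write_bytes counters for the process
ls -l /proc/1234/fd    # symlinks to every file the process currently has open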
There is a DTrace port for Linux available: http://librenix.com/?inode=13584

How to find or calculate a Linux process's page table size and other kernel accounting?

How can I find out how big a Linux process's page table is, along with any other variable-size process accounting?
If you are really interested in the page tables, do a
$ cat /proc/meminfo | grep PageTables
PageTables: 24496 kB
Since Linux 2.6.10, the amount of memory used by a single process' page tables has been exposed via the VmPTE field of /proc/<pid>/status.
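For example, to read it for a single process (using the shell's own PID here purely as an illustration):
grep VmPTE /proc/$$/status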
Not sure about Linux, but most UNIX variants provide sysctl(3) for this purpose. There is also the sysctl(8) command line utility.
Hmmm, back in Ye Olden Tymes, we used to call nlist(3) to get the system address for the data we were interested in, then open /dev/kmem, seek to the address, then read the data. Not sure if this works in Linux, but it might be worth typing "man 3 nlist" and seeing what comes back.
You should describe your problem, and not ask about details. If you fork too much (especially with a process that has a large address space) there are all kinds of things that can go wrong (including running out of memory); hitting a maximum page table size is IMHO not a realistic problem.
That said, I would also be interested in reading a process's page table share on Linux.
As a simple rule of thumb you can, however, assume that each process occupies a share of the page tables proportional to its virtual size, for example about 6 bytes for each mapped page. So, for example, if you have an Oracle database with an 8 GB SGA and 500 processes sharing it, each of the processes will use roughly 14 MB of page tables, which results in 7 GB of page tables + 8 GB SGA. (Sample numbers from http://kevinclosson.wordpress.com/2009/07/25/little-things-doth-crabby-make-%E2%80%93-part-ix-sometimes-you-have-to-really-really-want-your-hugepages/)
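A back-of-the-envelope version of that arithmetic, using the answer's own figure of roughly 6 bytes of page-table overhead per 4 KiB page (which lands in the same ballpark as the 14 MB / 7 GB numbers quoted above):
pages=$(( 8 * 1024 * 1024 * 1024 / 4096 ))   # ~2.1 million 4 KiB pages in an 8 GB SGA
per_proc=$(( pages * 6 ))                    # ~12 MB of page tables per process
total=$(( per_proc * 500 ))                  # ~6 GB of page tables across 500 processes
echo "$per_proc bytes per process, $total bytes total"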

rm not freeing diskspace [closed]

I've rm'ed a 2.5 GB log file, but it doesn't seem to have freed any space.
I did:
rm /opt/tomcat/logs/catalina.out
then this:
df -hT
and df reported my /opt mount still at 100% used.
Any suggestions?
Restart Tomcat. If the file is in use and you remove it, the space only becomes available when the process holding it open finishes.
As others suggested, the file probably is still opened by other processes. To find out by which ones, you can do
lsof /opt/tomcat/logs/catalina.out
which lists the processes holding it open. You will probably find Tomcat in that list.
Your Problem:
It's possible that a running program is still holding on to the file.
Your Solution:
Per the other answers here, you can simply shutdown tomcat to stop it from holding on to the file.
If that is not an option, or if you simply want more details, check out this question: Find and remove large files that are open but have been deleted - it suggests some harsher ways to deal with it that may be more useful to your situation.
More Details:
The Linux/Unix filesystem treats an open file handle as just another name for the file. rm removes the "name" from the file as seen in the directory tree. Until the handles are closed, the file still has other "names" and so it still exists. The filesystem doesn't reap a file until it is completely unnamed.
It might seem a little odd, but doing it this way allows for useful things like hard links, which are essentially alternate names for the same file.
This is why it is important to always call your language's equivalent of close() on a file handle when you are done with it. This notifies the OS that the file is no longer being used. Sometimes this can't be helped, which is likely the case with Tomcat; refer to Bill Karwin's answer to read why.
Depending on the filesystem, this is usually implemented as a sort of reference count, so there may not be any real names involved. It can also get weird if things like stdin and stderr are redirected to a file or another byte stream (most commonly done with services).
This whole idea is closely related to the concept of inodes, so if you are the curious type, I'd recommend checking that out first.
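You can watch this behaviour directly. The scratch file below is made up, and lsof's +L1 option lists open files whose link count has dropped to zero, i.e. deleted but still open (pick a directory other than /tmp if /tmp is a tmpfs on your system):
dd if=/dev/zero of=/tmp/big.log bs=1M count=100   # create a 100 MB scratch file
tail -f /tmp/big.log &                            # keep a handle on it
rm /tmp/big.log
df -h /tmp                                        # the space still shows as used
lsof +L1                                          # lists the deleted-but-open file
kill %1; df -h /tmp                               # space is released once the handle closes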
Discussion
It doesn't work so well anymore, but you used to be able to update the entire OS, start up a new HTTP daemon using the new libraries, and finally close the old one when no more clients were being serviced by it (releasing the old handles). HTTP clients wouldn't even miss a beat.
Basically, you can completely wipe out the kernel and all the libraries "from underneath" running programs. But since the "name" still exists for the older copies, those files still exist in memory/on disk for the programs using them. Then it would be a matter of restarting all the services, etc. While this is an advanced usage scenario, it is a reason why some Unix systems have years of uptime on record.
Restarting Tomcat will release any hold Tomcat has on the file. However, to avoid restarting Tomcat (e.g. if this is a production environment and you don't want to bring the services down unnecessarily), you can usually just overwrite the file:
cp /dev/null /opt/tomcat/logs/catalina.out
Or even shorter and more direct:
> /opt/tomcat/logs/catalina.out
I use these methods all the time to clear log files for currently running server processes in the course of troubleshooting or disk clearing. This leaves the inode alone but clears the actual file data, whereas trying to delete the file often either doesn't work or at the very least confuses the running process' log writer.
As FerranB and Paul Tomblin have noted on this thread, the file is in use and the disk space won't be freed until the file is closed.
The problem is that you can't signal the Catalina process to close catalina.out, because the file handle isn't under control of the java process. It was opened by shell I/O redirection in catalina.sh when you started up Tomcat. Only by terminating the Catalina process can that file handle be closed.
There are two solutions to prevent this in the future:
Don't allow output from Tomcat apps to go into catalina.out. Instead use the swallowOutput property, and configure log channels for output. Logs managed by log4j can be rotated without restarting the Catalina process.
Modify catalina.sh to pipe output to cronolog instead of simply redirecting to catalina.out. That way cronolog will rotate logs for you.
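A sketch of what that catalina.sh change typically looks like; the exact redirection line varies between Tomcat versions, and cronolog's install path and log name pattern here are assumptions:
# original style of redirection in catalina.sh (roughly):
#   ... >> "$CATALINA_OUT" 2>&1 &
# piped through cronolog for daily rotation instead:
#   ... 2>&1 | /usr/bin/cronolog "$CATALINA_BASE"/logs/catalina.%Y-%m-%d.out &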
The best solution is using echo (as #ejoncas suggested):
$ echo '' > huge_file.log
This operation is quite safe and fast (it frees roughly 1 GB of data per second), especially when you are operating on your production server.
Don't simply remove the file using rm, because you would first have to stop the process writing to it; otherwise the disk space won't be freed.
refer to: http://siwei.me/blog/posts/how-to-deal-with-huge-log-file-in-production
UPDATED: the origin of my story
In 2013, when I was working for youku.com, one Saturday I found that a core server was down; the reason was that the disk was full (of log files).
So I simply ran rm log_file.log (without stopping the web app process), but found that: 1. no disk space was freed, and 2. the log file was no longer visible to me.
So I had to restart my web server (a Rails app), and the disk space was finally freed.
This was quite an important lesson for me. It taught me that echo '' > log_file.log is the correct way to free disk space if you don't want to stop the running process that is writing logs to this file.
If something still has it open, the file won't actually go away. You probably need to signal catalina somehow to close and re-open its log files.
If there is a second hard link to the file then it won't be deleted until that is removed as well.
Enter this command to check which deleted files still occupy disk space:
$ sudo lsof | grep deleted
It will show the deleted files that are still held open.
Then kill the owning process by PID or name:
$ sudo kill <pid>
$ df -h
Check again; the space should now be freed.
If not, run the commands below to see which files are taking up the space:
# cd /
# du --threshold=(SIZE)
Specify a size; du will then show the files and directories above that threshold, and you can delete the offenders.
Is the rm journaled/scheduled? Try a sync command to force the write.

Resources