Accessing system performance data directly from the Linux kernel

I need to write an application that gets performance statistics on a Linux machine. Unfortunately the environment is extremely memory constrained and so using the standard command line tools isn't really an option as I would need to poll them pretty frequently.
Ideally, I would like to get the performance data directly from the kernel itself, using the same buffers and data it uses, to reduce my application's RAM requirements as much as possible. Tying my app this closely to the Linux kernel isn't really a problem: we have only ever used Linux in production, and I can't see that changing.
I've spent the last day or two looking through the kernel source but I have to admit to being somewhat lost. Can anyone point me to the right place for getting access to CPU performance information / I/O performance information / networking performance information and bandwidth usage information please?

I think there are several files under /proc, such as /proc/stat, /proc/diskstats, /proc/net/*.
For CPU performance information, use /proc/stat; its format is defined in ./fs/proc/stat.c in the Linux kernel source tree.
For disk access information, use /proc/diskstats; its format is defined by the function diskstats_show() in ./block/genhd.c in the Linux kernel source tree.
For network-related statistics, refer to the files under /proc/net/. However, I don't know how to calculate bandwidth usage from the files in /proc/net.


Is there a way to see what happens during program execution in detail on Linux?

I am trying to debug the performance of my program. Ideally, I would like to see in detail when the thread was doing useful work, when it was blocked by page faults, when it was performing memory reads and writes, etc.
I would simply like a detailed understanding of what's going on. Is that possible?
The Linux kernel sources come with the perf tool, which can measure a large number of performance counters, including all of those you listed. It can print statistics about them, annotate symbols, instructions and source lines with them (if debug symbols are available), and can track any single process or any logical CPU core.
Your Linux distribution probably ships the tool as a standalone package. Some kernel hardening settings (for example the kernel.perf_event_paranoid sysctl) may limit what information root or non-root users can collect with it.
You can use perf to record a profile and then visualize the perf output file graphically with hotspot.

Telling Linux not to keep a file in the cache when it is written to disk

I am writing a large file to disk from a user-mode application. In parallel, I am writing one or more smaller files. The large file won't be read back anytime soon, but the small files could be. I have enough RAM for the application and the smaller files, but not enough for the large file. Can I tell the OS not to keep parts of the large file in the cache after they are written to disk, so that more cache is available for the smaller files? I still want writes to the large file to be fast enough.
Can I tell the OS not to keep parts of the large file in cache?
Yes, you probably want to use a system call such as posix_fadvise(2) or madvise(2). In unusual cases, you might use readahead(2) or userfaultfd(2), Linux-specific flags to mmap(2), or very cleverly handle SIGSEGV (see signal(7), signal-safety(7), eventfd(2) and signalfd(2)). You'll need to write a C program to do that.
But I am not sure it is worth the development effort. In many cases, the behavior of a recent Linux kernel is good enough.
See also proc(5) and
You may want to read the GC handbook; it is relevant to your concerns.
Consider studying, for inspiration, the source code of existing open-source software such as GCC, Qt, RefPerSys, PostgreSQL, GNU Bash, etc.
Most of the time, it is simply not worth the effort to explicitly code something to manage your page cache.
I guess that mount(2) options in your /etc/fstab file (see fstab(5)) matter more in practice, as does changing or tuning your file system (e.g. ext4(5), xfs(5)), or read(2)-ing in large pieces (around 1 MB).
Play with dd(1) to measure. See also time(7).
Most applications are not disk-bound, and for those that are, renting more disk space is cheaper than adding and debugging extra code.
Don't forget to benchmark, e.g. using strace(1) and time(1).
PS. Don't forget your developer costs. They are often much higher than the price of a RAM module (or of a faster SSD).

Limit Number of Core Dumps by Process Name

QUESTION Is there an easy, established and accepted way to limit the number of core dumps for a given process on Linux?
WHAT I WANT My ideal solution would be a one-line command to set the per-application limit of x core dumps for all applications. Alternatively, I would be happy with a method to set the limit for each application individually.
WHAT I DON'T WANT I know I can already set a limit for the size of the core dumps using ulimit. I don't want to limit the size of the dumps, just the number of them. I also know I could modify the apport script to get any functionality I desire, but I would like to avoid this if there is a less intrusive solution.
MOTIVATION I am working on a system that is sensitive to excessive disk usage. If a given application cores, I want to keep the core file so that I can debug the problem. If it cores again, which is highly likely since several applications are restarted by a watcher if they die, I don't want to keep the core file because it is unlikely to contain new information and it will just take up disk space.
A process can dump core only once, since it is then killed. I presume you mean programs, as in the rest of the question.
There is nothing of the sort in stock kernels, but patch sets such as grsecurity at least used to offer a related feature, intended to hamper brute-force attacks against ASLR.
What do you need this for?

Linux CPU Usage Tools

I've written a tool to capture CPU usage on a per-thread basis. The tool's output is a binary file that I feed into a parsing utility I wrote, and the parsing utility outputs a CSV file that I can import into Excel to chart pretty graphs of process/thread CPU usage.
This CPU usage capture tool runs on an embedded ARM platform with a Linux kernel based on … That being said, I was concerned about keeping the tool lightweight. I didn't want it to write directly to a CSV file, in order to minimize both the processing time and the size of the captured data.
The tool works, but I'm wondering if I took the long way around the problem? Is there already a tool out there that does this (or something like it)?
You're probably wondering why I care, since I already have a tool that works. Well, it's not as lightweight as I'd like: it uses about 10% of the CPU. As a benchmark, top uses about 1% at most.
I've decided to continue using my tool for now. At least until a better solution becomes available. I was able to shave off a couple percentage points by using open() instead of fopen() on /proc/stat. I'm also using read() instead of fgets().
IBM has a tool called nmon that does the same thing (for AIX and Linux). According to IBM's documentation, it uses about 2% CPU. You may want to look at that.
Comparing nmon with your tool could give you a fair idea of your program's performance and of how you might improve your CSV capture.
This might involve a steep learning curve, but you might want to look into SystemTap.

Does the Linux filesystem cache files efficiently?

I'm creating a web application running on a Linux server. The application constantly accesses a 250 KB file: it loads it into memory, reads it and sends some information back to the user. Since this file is read all the time, my client is suggesting using something like memcache to cache it in memory, presumably because that would make reads faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's data in the page cache (as shared, copy-on-write pages if you map it privately), and "loading" the file into memory should be very fast (a page table update at worst).
Edit: as cdhowie points out, there is no single "Linux filesystem". However, I believe the relevant code is in Linux's memory management and is therefore independent of the filesystem in question. If you're curious, you can read about the handling of vm_area_struct objects in the Linux source, mainly in linux/mm/mmap.c.
As people have mentioned, mmap is a good solution here.
But a single 250 KB file is very small. You might want to read it once at startup into an in-memory structure that matches what you send back to the user. For example, if it is a text file, an array of lines might be a good choice.
The file should be cached, but consider setting the noatime option on the mount; otherwise reads may trigger access-time updates, which cost extra inode writes (they do not invalidate the cached file data, and the relatime default on modern kernels already limits them).
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the posix_fadvise(2) system call; see its man page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck there, consider using mmap(), which can avoid the copy by mapping the cache pages directly into the process.
I guess putting the file on a RAM disk (tmpfs) may give enough of an advantage without big modifications, unless you really care about response times at the microsecond level.
