linux get process name from pid within kernel - linux

hi
i have used sys_getpid() from within kernel to get process id
how can I find out process name from kernel struct? does it exist in kernel??
thanks very much

struct task_struct contains a member called comm, it contains executable name excluding path.
Get current macro from this file will get you the name of the program that launched the current process (as in insmod / modprobe).
Using above info you can use get the name info.

Not sure, but find_task_by_pid_ns might be useful.

My kernel module loads with "modprobe -v my_module --allow-unsupported -o some-data" and I extract the "some-data" parameter. The following code gave me the entire command line, and here is how I parsed out the parameter of interest:
struct mm_struct *mm;
unsigned char x, cmdlen;
mm = get_task_mm(current);
down_read(&mm->mmap_sem);
cmdlen = mm->arg_end - mm->arg_start;
for(x=0; x<cmdlen; x++) {
if(*(unsigned char *)(mm->arg_start + x) == '-' && *(unsigned char *)(mm->arg_start + (x+1)) == 'o') {
break;
}
}
up_read(&mm->mmap_sem);
if(x == cmdlen) {
printk(KERN_ERR "inject: ERROR - no target specified\n");
return -EINVAL;
}
strcpy(target,(unsigned char *)(mm->arg_start + (x+3)));
"target" holds the string after the -o parameter. You can compress this somewhat - the caller (in this case, modprobe) will be the first string in mm->arg_start - to suit your needs.

you can look at the special files in /proc/<pid>/
For example, /proc/<pid>/exe is a symlink pointing to the actual binary.
/proc/<pid>/cmdline is a null-delimited list of the command line, so the first word is the process name.

Related

Configure kern.log to give more info about a segfault

Currently I can find in kern.log entries like this:
[6516247.445846] ex3.x[30901]: segfault at 0 ip 0000000000400564 sp 00007fff96ecb170 error 6 in ex3.x[400000+1000]
[6516254.095173] ex3.x[30907]: segfault at 0 ip 0000000000400564 sp 00007fff0001dcf0 error 6 in ex3.x[400000+1000]
[6516662.523395] ex3.x[31524]: segfault at 7fff80000000 ip 00007f2e11e4aa79 sp 00007fff807061a0 error 4 in libc-2.13.so[7f2e11dcf000+180000]
(You see, apps causing segfault are named ex3.x, means exercise 3 executable).
Is there a way to ask kern.log to log the complete path? Something like:
[6...] /home/user/cclass/ex3.x[3...]: segfault at 0 ip 0564 sp 07f70 error 6 in ex3.x[4...]
So I can easily figure out from who (user/student) this ex3.x is?
Thanks!
Beco
That log message comes from the kernel with a fixed format that only includes the first 16 letters of the executable excluding the path as per show_signal_msg, see other relevant lines for segmentation fault on non x86 architectures.
As mentioned by Makyen, without significant changes to the kernel and a recompile, the message given to klogd which is passed to syslog won't have the information you are requesting.
I am not aware of any log transformation or injection functionality in syslog or klogd which would allow you to take the name of the file and run either locate or file on the filesystem in order to find the full path.
The best way to get the information you are looking for is to use crash interception software like apport or abrt or corekeeper. These tools store the process metadata from the /proc filesystem including the process's commandline which would include the directory run from, assuming the binary was run with a full path, and wasn't already in path.
The other more generic way would be to enable core dumps, and then to set /proc/sys/kernel/core_pattern to include %E, in order to have the core file name including the path of the binary.
The short answer is: No, it is not possible without making code changes and recompiling the kernel. The normal solution to this problem is to instruct your students to name their executable <student user name>_ex3.x so that you can easily have this information.
However, it is possible to get the information you desire from other methods. Appleman1234 has provided some alternatives in his answer to this question.
How do we know the answer is "Not possible to the the full path in the kern.log segfault messages without recompiling the kernel":
We look in the kernel source code to find out how the message is produced and if there are any configuration options.
The files in question are part of the kernel source. You can download the entire kernel source as an rpm package (or other type of package) for whatever version of linux/debian you are running from a variety of places.
Specifically, the output that you are seeing is produced from whichever of the following files is for your architecture:
linux/arch/sparc/mm/fault_32.c
linux/arch/sparc/mm/fault_64.c
linux/arch/um/kernel/trap.c
linux/arch/x86/mm/fault.c
An example of the relevant function from one of the files(linux/arch/x86/mm/fault.c):
/*
* Print out info about fatal segfaults, if the show_unhandled_signals
* sysctl is set:
*/
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
unsigned long address, struct task_struct *tsk)
{
if (!unhandled_signal(tsk, SIGSEGV))
return;
if (!printk_ratelimit())
return;
printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
print_vma_addr(KERN_CONT " in ", regs->ip);
printk(KERN_CONT "\n");
}
From that we see that the variable passed to printout the process identifier is tsk->comm where struct task_struct *tsk and regs->ip where struct pt_regs *regs
Then from linux/include/linux/sched.h
struct task_struct {
...
char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
it with task_lock())
- initialized normally by setup_new_exec */
The comment makes it clear that the path for the executable is not stored in the structure.
For regs->ip where struct pt_regs *regs, it is defined in whichever of the following are appropriate for your architecture:
arch/arc/include/asm/ptrace.h
arch/arm/include/asm/ptrace.h
arch/arm64/include/asm/ptrace.h
arch/cris/include/arch-v10/arch/ptrace.h
arch/cris/include/arch-v32/arch/ptrace.h
arch/metag/include/asm/ptrace.h
arch/mips/include/asm/ptrace.h
arch/openrisc/include/asm/ptrace.h
arch/um/include/asm/ptrace-generic.h
arch/x86/include/asm/ptrace.h
arch/xtensa/include/asm/ptrace.h
From there we see that struct pt_regs is defining registers for the architecture. ip is just: unsigned long ip;
Thus, we have to look at what print_vma_addr() does. It is defined in mm/memory.c
/*
* Print the name of a VMA.
*/
void print_vma_addr(char *prefix, unsigned long ip)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
/*
* Do not print if we are in atomic
* contexts (in exception stacks, etc.):
*/
if (preempt_count())
return;
down_read(&mm->mmap_sem);
vma = find_vma(mm, ip);
if (vma && vma->vm_file) {
struct file *f = vma->vm_file;
char *buf = (char *)__get_free_page(GFP_KERNEL);
if (buf) {
char *p;
p = d_path(&f->f_path, buf, PAGE_SIZE);
if (IS_ERR(p))
p = "?";
printk("%s%s[%lx+%lx]", prefix, kbasename(p),
vma->vm_start,
vma->vm_end - vma->vm_start);
free_page((unsigned long)buf);
}
}
up_read(&mm->mmap_sem);
}
Which shows us that a path was available. We would need to check that it was the path, but looking a bit further in the code gives a hint that it might not matter. We need to see what kbasename() did with the path that is passed to it. kbasename() is defined in include/linux/string.h as:
/**
* kbasename - return the last part of a pathname.
*
* #path: path to extract the filename from.
*/
static inline const char *kbasename(const char *path)
{
const char *tail = strrchr(path, '/');
return tail ? tail + 1 : path;
}
Which, even if the full path is available prior to it, chops off everything except for the last part of a pathname, leaving the filename.
Thus, no amount of runtime configuration options will permit printing out the full pathname of the file in the segment fault messages you are seeing.
NOTE: I've changed all of the links to kernel source to be to archives, rather than the original locations. Those links will get close to the code as it was at the time I wrote this, 2104-09. As should be no surprise, the code does evolve over time, so the code which is current when you're reading this may or may not be similar or perform in the way which is described here.

How to use proc_pid_cmdline in kernel module

I am writing a kernel module to get the list of pids with their complete process name. The proc_pid_cmdline() gives the complete process name;using same function /proc/*/cmdline gets the complete process name. (struct task_struct) -> comm gives hint of what process it is, but not the complete path.
I have included the function name, but it gives error because it does not know where to find the function.
How to use proc_pid_cmdline() in a module ?
You are not supposed to call proc_pid_cmdline().
It is a non-public function in fs/proc/base.c:
static int proc_pid_cmdline(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
However, what it does is simple:
get_cmdline(task, m->buf, PAGE_SIZE);
That is not likely to return the full path though and it will not be possible to determine the full path in every case. The arg[0] value may be overwritten, the file could be deleted or moved, etc. A process may exec() in a way which obscures the original command line, and all kinds of other maladies.
A scan of my Fedora 20 system /proc/*/cmdline turns up all kinds of less-than-useful results:
-F
BUG:
WARNING: at
WARNING: CPU:
INFO: possible recursive locking detecte
ernel BUG at
list_del corruption
list_add corruption
do_IRQ: stack overflow:
ear stack overflow (cur:
eneral protection fault
nable to handle kernel
ouble fault:
RTNL: assertion failed
eek! page_mapcount(page) went negative!
adness at
NETDEV WATCHDOG
ysctl table check failed
: nobody cared
IRQ handler type mismatch
Machine Check Exception:
Machine check events logged
divide error:
bounds:
coprocessor segment overrun:
invalid TSS:
segment not present:
invalid opcode:
alignment check:
stack segment:
fpu exception:
simd exception:
iret exception:
/var/log/messages
--
/usr/bin/abrt-dump-oops
-xtD
I have managed to solve a version of this problem. I wanted to access the cmdline of all PIDs but within the kernel itself (as opposed to a kernel module as the question states), but perhaps these principles can be applied to kernel modules as well?
What I did was, I added the following function to fs/proc/base.c
int proc_get_cmdline(struct task_struct *task, char * buffer) {
int i;
int ret = proc_pid_cmdline(task, buffer);
for(i = 0; i < ret - 1; i++) {
if(buffer[i] == '\0')
buffer[i] = ' ';
}
return 0;
}
I then added the declaration in include/linux/proc_fs.h
int proc_get_cmdline(struct task_struct *, char *);
At this point, I could access the cmdline of all processes within the kernel.
To access the task_struct, perhaps you could refer to kernel: efficient way to find task_struct by pid?.
Once you have the task_struct, you should be able to do something like:
char cmdline[256];
proc_get_cmdline(task, cmdline);
if(strlen(cmdline) > 0)
printk(" cmdline :%s\n", cmdline);
else
printk(" cmdline :%s\n", task->comm);
I was able to obtain the commandline of all processes this way.
To get the full path of the binary behind a process.
char * exepathp;
struct file * exe_file;
struct mm_struct *mm;
char exe_path [1000];
//straight up stolen from get_mm_exe_file
mm = get_task_mm(current);
down_read(&mm->mmap_sem); //lock read
exe_file = mm->exe_file;
if (exe_file) get_file(exe_file);
up_read(&mm->mmap_sem); //unlock read
//reduce exe path to a string
exepathp = d_path( &(exe_file->f_path), exe_path, 1000*sizeof(char) );
Where current is the task struct for the process you are interested in. The variable exepathp gets the string of the full path. This is slightly different than the process cmd, this is the path of binary which was loaded to start the process. Combining this path with the process cmd should give you the full path.

fuse: Setting offsets for the filler function in readdir

I am implementing a virtual filesystem using the fuse, and need some understanding regarding the offset parameter in readdir.
Earlier we were ignoring the offset and passing 0 in the filler function, in which case the kernel should take care.
Our filesystem database, is storing: directory name, filelength, inode number and parent inode number.
How do i calculate get the offset?
Then is the offset of each components, equal to their size sorted in incremental form of their inode number? What happens is there is a directory inside a directory, is the offset in that case equal to the sum of the files inside?
Example: in case the dir listing is - a.txt b.txt c.txt
And inode number of a.txt=3, b.txt=5, c.txt=7
Offset of a.txt= directory offset
Offset of b.txt=dir offset + size of a.txt
Offset of c.txt=dir offset + size of b.txt
Is the above assumption correct?
P.S: Here are the callbacks of fuse
The selected answer is not correct
Despite the lack of upvotes on this answer, this is the correct answer. Cracking into the format of the void buffer should be discouraged, and that's the intent behind declaring such things void in C code - you shouldn't write code that assumes knowledge of the format of the data behind void pointers, use whatever API is provided properly instead.
The code below is very simple and straightforward, as it should be. No knowledge of the format of the Fuse buffer is required.
Fictitious API
This is a contrived example of what some device's API could look
like. This is not part of Fuse.
// get_some_file_names() -
// returns a struct with buffers holding the names of files.
// PARAMETERS
// * path - A path of some sort that the fictitious device groks.
// * offset - Where in the list of file names to start.
// RETURNS
// * A name_list, it has some char buffers holding the file names
// and a couple other auxiliary vars.
//
name_list *get_some_file_names(char *path, size_t offset);
Listing the files in parts
Here's a Fuse callback that can be registered with the Fuse system to
list the filenames provided by get_some_file_names(). It's arbitrarily named readdir_callback() so its purpose is obvious.
int readdir_callback( char *path,
void *buf, // This is meant to be "opaque".
fuse_fill_dir_t *filler, // filler takes care of buf.
off_t off, // Last value given to filler.
struct fuse_file_info *fi )
{
// Call the fictitious API to get a list of file names.
name_list *list = get_some_file_names(path, off);
for (int i = 0; i < list->length; i++)
{
// Feed the file names to filler() one at a time.
if (filler(buf, list->names[i], NULL, off + i + 1))
{
break; // filler() returned 1, requesting a break.
}
incr_num_files_listed(list);
}
if (all_files_listed(list))
{
return 1; // Tell Fuse we're done.
}
return 0;
}
The off (offset) value is not used by the filler function to fill its opaque buffer, buf. The off value is, however, meaningful to the callback as an offset base as it provides file names to filler(). Whatever value was last passed to filler() is what gets passed back to readdir_callback() on its next invocation. filler()
itself only cares whether the off value is 0 or not-0.
Indicating "I'm done listing!" to Fuse
To signal to the Fuse system that your readdir_callback() is done listing file names in parts (when the last of the list of names has been given to filler()), simply return 1 from it.
How off Is Used
The off, offset, parameter should be non-0 to perform the partial listings. That's its only requirement as far as filler() is concerned. If off is 0, that indicates to Fuse that you're going to do a full listing in one shot (see below).
Although filler() doesn't care what the off value is beyond it being non-0, the value can still be meaningfully used. The code above is using the index of the next item in its own file list as its value. Fuse will keep passing the last off value it received back to the read dir callback on each invocation until the listing is complete (when readdir_callback() returns 1).
Listing the files all at once
int readdir_callback( char *path,
void *buf,
fuse_fill_dir_t *filler,
off_t off,
struct fuse_file_info *fi )
{
name_list *list = get_all_file_names(path);
for (int i = 0; i < list->length; i++)
{
filler(buf, list->names[i], NULL, 0);
}
return 0;
}
Listing all the files in one shot, as above, is simpler - but not by much. Note that off is 0 for the full listing. One may wonder, 'why even bother with the first approach of reading the folder contents in parts?'
The in-parts strategy is useful where a set number of buffers for file names is allocated, and the number of files within folders may exceed this number. For instance, the implementation of name_list above may only have 8 allocated buffers (char names[8][256]). Also, buf may fill up and filler() start returning 1 if too many names are given at once. The first approach avoids this.
The offset passed to the filler function is the offset of the next item in the directory. You can have the entries in the directory in any order you want. If you don't want to return an entire directory at once, you need to use the offset to determine what gets asked for and stored. The order of items in the directory is up to you, and doesn't matter what order the names or inodes or anything else is.
Specifically, in the readdir call, you are passed an offset. You want to start calling the filler function with entries that will be at this callback or later. In the simplest case, the length of each entry is 24 bytes + strlen(name of entry), rounded up to the nearest multiple of 8 bytes. However, see the fuse source code at http://sourceforge.net/projects/fuse/ for when this might not be the case.
I have a simple example, where I have a loop (pseudo c-code) in my readdir function:
int my_readdir(const char *path, void *buf, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi)
{
(a bunch of prep work has been omitted)
struct stat st;
int off, nextoff=0, lenentry, i;
char namebuf[(long enough for any one name)];
for (i=0; i<NumDirectoryEntries; i++)
{
(fill st with the stat information, including inode, etc.)
(fill namebuf with the name of the directory entry)
lenentry = ((24+strlen(namebuf)+7)&~7);
off = nextoff; /* offset of this entry */
nextoff += lenentry;
/* Skip this entry if we weren't asked for it */
if (off<offset)
continue;
/* Add this to our response until we are asked to stop */
if (filler(buf, namebuf, &st, nextoff))
break;
}
/* All done because we were asked to stop or because we finished */
return 0;
}
I tested this within my own code (I had never used the offset before), and it works fine.

Segmentation fault while trying to print parts of a pointer struct

I'm writing a program that must take user input to assign values to parts of a structure. I need to create a pointer to the structure that I will pass through as a one and only parameter for a function that will print each part of the structure individually. I also must malloc memory for the structure. As it is now, the program compiles and runs through main and asks the user for inputs. A segmentation fault occurs after the last user input is collected and when I'm assuming the call to the printContents function is run. Any help would be appreciated!
#include <stdio.h>
#include <stdlib.h>
struct info
{
char name[100], type;
int size;
long int stamp;
};
void printContents(struct info *iptr);
int main(void)
{
struct info *ptr=malloc(sizeof(struct info));
printf("Enter the type: \n");
scanf("%c", &(*ptr).type);
printf("Enter the filename: \n");
scanf("%s", (*ptr).name);
printf("Enter the access time: \n");
scanf("%d", &(*ptr).stamp);
printf("Enter the size: \n");
scanf("%d", &(*ptr).size);
printf("%c", (*ptr).type);
printContents(ptr);
}
void printContents(struct info *iptr)
{
printf("Filename %s Size %d Type[%s] Accessed # %d \n", (*iptr).name, (*iptr).size, (*iptr).type, (*iptr).stamp);
}
Check the operator precedence. Is this &(*ptr).type the thing you're trying to do? Maybe &((*ptr).type) ?
ptr->member is like access to structure variable right? Also same for scanf() usr &ptr->member to get value. For char input use only ptr->charmember .
First let's do it the hard way. We'll assume that the code is already written, the compiler tells us nothing useful, and we don't have a debugger. First we put in some diagnostic output statements, and we discover that the crash happens in printContents:
printf("testing four\n"); /* we see this, so the program gets this far */
printf("Filename %s Size %d Type[%s] Accessed # %d \n", (*iptr).name, (*iptr).size, (*iptr).type, (*iptr).stamp);
printf("testing five\n"); /* the program crashes before this */
If we still can't see the bug, we narrow the problem down by preparing a minimal compete example. (This is a very valuable skill.) We compile and run the code over and over, commenting things out. When we comment something out and the code still segfaults, we remove it entirely; but if commenting it out makes the problem go away, we put it back in. Eventually we get down to a minimal case:
#include <stdio.h>
int main(void)
{
char type;
type = 'a';
printf("Type[%s]\n", type);
}
Now it should be obvious: when we printf("%s", x) something, printf expects x to be a string. That is, x should be a pointer to (i.e. the address of) the first element of a character array which ends with a null character. Instead we've given it a character (in this case 'a'), which it interprets as a number (in this case 97), and it tries to go to that address in memory and start reading; we're lucky to get nothing worse than a segfault. The fix is easy: decide whether type should be a char or a char[], if it's char then change the printf statement to "%c", if it's char[] then change its declaration.
Now an easy way. If we're using a good compiler like gcc, it will warn us that we're doing something fishy:
gcc foo.c -o foo
foo.c:35: warning: format ‘%s’ expects type ‘char *’, but argument 4 has type ‘int’
In future, there's a way you can save yourself all this trouble. Instead of writing a lot of code, getting a mysterious bug and backtracking, you can write in small increments. If you had added one term to that printf statement at a time, you would have seen exactly when the bug appeared, and which term was to blame.
Remember: start small and simple, add complexity a little at a time, test at every step, and never add to code that doesn't work.

Tools to reduce risk regarding password security and HDD slack space

Down at the bottom of this essay is a comment about a spooky way to beat passwords. Scan the entire HDD of a user including dead space, swap space etc, and just try everything that looks like it might be a password.
The question: part 1, are there any tools around (A live CD for instance) that will scan an unmounted file system and zero everything that can be? (Note I'm not trying to find passwords)
This would include:
Slack space that is not part of any file
Unused parts of the last block used by a file
Swap space
Hibernation files
Dead space inside of some types of binary files (like .DOC)
The tool (aside from the last case) would not modify anything that can be detected via the file system API. I'm not looking for a block device find/replace but rather something that just scrubs everything that isn't part of a file.
part 2, How practical would such a program be? How hard would it be to write? How common is it for file formats to contain uninitialized data?
One (risky and costly) way to do this would be to use a file system aware backup tool (one that only copies the actual data) to back up the whole disk, wipe it clean and then restore it.
I don't understand your first question (do you want to modify the file system? Why? Isn't this dead space exactly where you want to look?)
Anyway, here's an example of such a tool:
#include <stdio.h>
#include <alloca.h>
#include <string.h>
#include <ctype.h>
/* Number of bytes we read at once, >2*maxlen */
#define BUFSIZE (1024*1024)
/* Replace this with a function that tests the passwort consisting of the first len bytes of pw */
int testPassword(const char* pw, int len) {
/*char* buf = alloca(len+1);
memcpy(buf, pw,len);
buf[len] = '\0';
printf("Testing %s\n", buf);*/
int rightLen = strlen("secret");
return len == rightLen && memcmp(pw, "secret", len) == 0;
}
int main(int argc, char* argv[]) {
int minlen = 5; /* We know the password is at least 5 characters long */
int maxlen = 7; /* ... and at most 7. Modify to find longer ones */
int avlen = 0; /* available length - The number of bytes we already tested and think could belong to a password */
int i;
char* curstart;
char* curp;
FILE* f;
size_t bytes_read;
char* buf = alloca(BUFSIZE+maxlen);
if (argc != 2) {
printf ("Usage: %s disk-file\n", argv[0]);
return 1;
}
f = fopen(argv[1], "rb");
if (f == NULL) {
printf("Couldn't open %s\n", argv[1]);
return 2;
}
for(;;) {
/* Copy the rest of the buffer to the front */
memcpy(buf, buf+BUFSIZE, maxlen);
bytes_read = fread(buf+maxlen, 1, BUFSIZE, f);
if (bytes_read == 0) {
/* Read the whole file */
break;
}
for (curstart = buf;curstart < buf+bytes_read;) {
for (curp = curstart+avlen;curp < curstart + maxlen;curp++) {
/* Let's assume the password just contains letters and digits. Use isprint() otherwise. */
if (!isalnum(*curp)) {
curstart = curp + 1;
break;
}
}
avlen = curp - curstart;
if (avlen < minlen) {
/* Nothing to test here, move along */
curstart = curp+1;
avlen = 0;
continue;
}
for (i = minlen;i <= avlen;i++) {
if (testPassword(curstart, i)) {
char* found = alloca(i+1);
memcpy(found, curstart, i);
found[i] = '\0';
printf("Found password: %s\n", found);
}
}
avlen--;
curstart++;
}
}
fclose(f);
return 0;
}
Installation:
Start a Linux Live CD
Copy the program to the file hddpass.c in your home directory
Open a terminal and type the following
su || sudo -s # Makes you root so that you can access the HDD
apt-get install -y gcc # Install gcc
This works only on Debian/Ubuntu et al, check your system documentation for others
gcc -o hddpass hddpass.c # Compile.
./hddpass /dev/YOURDISK # The disk is usually sda, hda on older systems
Look at the output
Test (copy to console, as root):
gcc -o hddpass hddpass.c
</dev/zero head -c 10000000 >testdisk # Create an empty 10MB file
mkfs.ext2 -F testdisk # Create a file system
rm -rf mountpoint; mkdir -p mountpoint
mount -o loop testdisk mountpoint # needs root rights
</dev/urandom head -c 5000000 >mountpoint/f # Write stuff to the disk
echo asddsasecretads >> mountpoint/f # Write password in our pagefile
# On some file systems, you could even remove the file.
umount testdisk
./hdpass testdisk # prints secret
Test it yourself on an Ubuntu Live CD:
# Start a console and type:
wget http://phihag.de/2009/so/hddpass-testscript.sh
sh hddpass-testscript.sh
Therefore, it's relatively easy. As I found out myself, ext2 (the file system I used) overwrites deleted files. However, I'm pretty sure some file systems don't. Same goes for the pagefile.
How common is it for file formats to contain uninitialized data?
Less and less common, I would've thought. The classic "offender" is older versions of MS office applications that (essentially) did a memory dump to disk as its "quicksave" format. No serialisation, no selection of what to dump and a memory allocator that doesn't zero newly allocated memory pages. That lead to not only juicy things from previous versions of the document (so the user could use undo), but also juicy snippets from other applications.
How hard would it be to write?
Something that clears out unallocated disk blocks shouldn't be that hard. It'd need to run either off-line or as a kernel module, so as to not interfer with normal file-system operations, but most file systems have an "allocated"/"not allocated" structure that is fairly straight-forward to parse. Swap is harder, but as long as you're OK with having it cleared on boot (or shutdown), it's not too tricky. Clearing out the tail block is trickier, definitely not something I'd want to try to do on-line, but it shouldn't be TOO hard to make it work for off-line cleaning.
How practical would such a program be?
Depends on your threat model, really. I'd say that on one end, it'd not give you much at all, but on the other end, it's a definite help to keep information out of the wrong hands. But I can't give a hard and fast answer,
Well, if I was going to code it for a boot CD, I'd do something like this:
File is 101 bytes but takes up a 4096-byte cluster.
Copy the file "A" to "B" which has nulls added to the end.
Delete "A" and overwrite it's (now unused) cluster.
Create "A" again and use the contents of "B" without the tail (remember the length).
Delete "B" and overwrite it.
Not very efficient, and would need a tweak to make sure you don't try to copy the first (and therefor full) clusters in a file. Otherwise, you'll run into slowness and failure if there's not enough free space.
There's tools that do this efficiently that are open source?

Resources