Recently, I have been working on an application whitelisting solution for embedded Linux, based on the Linux Security Modules (LSM) framework. The main focus of my LSM is the bprm_check_security hook, which is invoked when a program is executed from user space (kernel threads are not considered).
This hook receives a pointer of type struct linux_binprm *bprm. The structure contains a file pointer (referring to the executable file of the program being executed) and a char pointer (holding the name of the executed program).
Our application whitelisting solution is based on hash calculation. Accordingly, in my LSM I use the file pointer (contained in the bprm structure) to calculate a hash value and store it, together with the filename (also in the bprm structure), as an entry in a list.
However, during Linux boot (before /sbin/init is executed), there are mismatches between the filename and the file pointer.
For instance, for one of the first programs executed, the filename in the bprm structure is "/bin/cat", but the file pointer in the same bprm structure does not refer to the actual /bin/cat file; it refers to busybox.
After a lot of research I found out that those programs are executed by busybox from the initial initrd, which subsequently sets up the actual rootfs, and that all of those files carry the magic number RAMFS_MAGIC (stored in inode->i_sb->s_magic). So I use this number to filter out those processes (see the sketch after the struct excerpt below), but I am not sure whether this is the right way to do it. I would appreciate any help.
Note that I use the file pointer (contained in the bprm structure) to calculate the hash values; in other words, I do not read files by their filename or filepath from user space.
/include/linux/binfmts.h (excerpt):

struct linux_binprm {
        struct file *file;
        const char *filename;   /* Name of binary as seen by procps */
};
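For reference, a minimal sketch of the filter described above, assuming a 3.9+ kernel (for file_inode()); the LSM registration plumbing and the actual hashing are omitted, and wl_bprm_check_security is a made-up name:

#include <linux/binfmts.h>
#include <linux/fs.h>
#include <linux/magic.h>

static int wl_bprm_check_security(struct linux_binprm *bprm)
{
        struct inode *inode = file_inode(bprm->file);

        /* Early-boot binaries are executed out of rootfs (a ramfs/tmpfs
           instance), so their superblock magic is RAMFS_MAGIC; skip them. */
        if (inode->i_sb->s_magic == RAMFS_MAGIC)
                return 0;

        /* Otherwise: hash bprm->file (e.g. via the kernel crypto API) and
           store the digest together with bprm->filename in the whitelist. */
        return 0;
}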
Related
I am currently developing a simple kernel module that hooks system calls such as open, read, and write, replacing them with wrapper functions that log the files being opened, read, or written to a file and then invoke the original system calls.
My problem: I can get the file descriptor in the read and write system calls, but I cannot figure out how to obtain the file name from it.
Currently I am able to access the file structure associated with a given FD using the following code:
struct file *file;
file = fcheck(fd);
This file structure has two entities in it that I believe are relevant:
f_path
f_inode
Can anybody help me get the dentry, inode, or path name associated with this fd, using its file structure?
Is my approach correct? Or do I need to do something different?
I am using Ubuntu 14.04 and my kernel version is 3.19.0-25-generic, for the kernel module development.
.f_inode is actually an inode.
.f_path.dentry is a dentry (f_path is embedded in struct file, so it is accessed with '.', not '->').
Traversing this dentry via its ->d_parent link until the f_path.mnt->mnt_root dentry is reached, and collecting the dentry->d_name components along the way, constructs the file's path relative to the mount point. This is essentially what d_path does, but in a more careful way.
Instead of fcheck(fd), which must be used inside an RCU read-side critical section, you can also use fget(fd), which must be paired with fput().
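For example, a minimal sketch assuming a 3.x/4.x kernel (log_fd_path is a made-up helper):

#include <linux/file.h>
#include <linux/fs.h>
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/printk.h>

static void log_fd_path(unsigned int fd)
{
        struct file *file = fget(fd);   /* takes a reference */
        char buf[256];
        char *name;

        if (!file)
                return;
        /* d_path() resolves file->f_path to a path string inside buf */
        name = d_path(&file->f_path, buf, sizeof(buf));
        if (!IS_ERR(name))
                pr_info("fd %u -> %s\n", fd, name);
        fput(file);                     /* drop the reference */
}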
The approach is completely incorrect - see http://www.watson.org/~robert/2007woot/
Linux already has a reliable mechanism for doing this (audit). If you want to implement it anyway (for fun, I presume), you want to place your hooks roughly where audit places its own. Chances are the LSM hooks are in the appropriate places; I have not checked.
Can someone please summarize the events/steps that happen when I execute a read()/write() system call? How does the kernel know which file system to issue these commands to?
Let's say a process calls write(). It will then call sys_write().
Now, since sys_write() is executed on behalf of the current process, it can presumably access the struct task_struct, and hence the struct files_struct and struct fs_struct, which contain file system information.
But after that, I do not see how this fs_struct helps to identify the file system.
Edit: Now that Alex has described the flow, I still have a doubt about how read/write gets routed to a file system: since the VFS does not do it, it must be happening somewhere else. Also, how is the underlying block device, and finally the hardware protocol (PCI/USB), attached?
A simple flow chart involving the actual data structures would be helpful.
Please help.
This answer is based on kernel version 4.0. I traced out some of the code which handles a read syscall. I recommend you clone the Linux source repo and follow along in the source code.
The syscall handler for read, at fs/read_write.c:620, is called. It receives a file descriptor (an integer) as an argument and calls fdget_pos to convert it to a struct fd.
fdget_pos calls __fdget_pos, which calls __fdget, which calls __fget_light. __fget_light uses current->files, the file descriptor table of the current process, to look up the struct file corresponding to the passed file descriptor number.
Back in the syscall handler, the file struct is passed to vfs_read, at fs/read_write.c:478.
vfs_read calls __vfs_read, which calls file->f_op->read. From here on, you are in filesystem-specific code.
So the VFS doesn't really bother "identifying" the filesystem a file lives on; it simply uses the table of "file operation" function pointers stored in the struct file. When that struct file is initialized, it is given the f_op function-pointer table that implements all the filesystem-specific operations for its filesystem.
Each filesystem registers itself with the VFS. When a filesystem is mounted, its superblock is read and the VFS superblock is populated with this information. The function-pointer table for this filesystem is also populated at this time. When the file->f_op->read call happens, the function registered by the filesystem is actually called. You can refer to the text at http://www.science.unitn.it/~fiorella/guidelinux/tlk/node102.html
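To illustrate the dispatch, a minimal sketch (the myfs_* names are invented; on a 4.0 kernel the .read slot is what __vfs_read ends up calling):

#include <linux/fs.h>

/* filesystem-specific implementation of read */
static ssize_t myfs_read(struct file *file, char __user *buf,
                         size_t len, loff_t *ppos)
{
        /* fetch data from this filesystem's backing store */
        return 0;
}

/* installed as file->f_op when a file on this filesystem is opened */
static const struct file_operations myfs_file_operations = {
        .read = myfs_read,
};

/* the VFS then simply dispatches: file->f_op->read(file, buf, len, ppos) */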
Given that in Linux utimes(2) is a system call and futimes(3) is a library function, I would think that futimes is implemented in terms of utimes. However, utimes takes a pathname, whereas futimes takes a file descriptor.
Since it is "not possible" to determine a pathname from a file descriptor or i-node number, I wonder how this can be done. Does the "real" system call always work on i-node numbers?
First, you likely mentioned POSIX wrongly, because POSIX does not distinguish between system calls and library functions. Treating futimes() as a library call is Linux-specific. In glibc (file sysdeps/unix/sysv/linux/futimes.c), there is this comment:
/* Change the access time of the file associated with FD to TVP[0] and
the modification time of FILE to TVP[1].
Starting with 2.6.22 the Linux kernel has the utimensat syscall which
can be used to implement futimes. Earlier kernels have no futimes()
syscall so we use the /proc filesystem. */
So this is done using utimensat() with the given descriptor as the reference descriptor, as with all *at() calls. Previously, this worked by calling utimes() on the path /proc/${pid}/fd/${fd} (cumbersome, and only possible if /proc is mounted). This answers your second question: although it isn't generally possible to determine a file name from its descriptor, the file can still be accessed through the descriptor. (BTW, the initial path used to open a file is sometimes stored; see /proc/$pid/{cwd,exe} for a Linux process.)
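As a userspace illustration of the modern path, a sketch that converts futimes()-style timevals and calls futimens(), which glibc implements on top of the utimensat syscall (my_futimes is a hypothetical name, and tv is assumed non-NULL):

#include <sys/stat.h>
#include <sys/time.h>

int my_futimes(int fd, const struct timeval tv[2])
{
        struct timespec ts[2];

        ts[0].tv_sec  = tv[0].tv_sec;           /* access time */
        ts[0].tv_nsec = tv[0].tv_usec * 1000;
        ts[1].tv_sec  = tv[1].tv_sec;           /* modification time */
        ts[1].tv_nsec = tv[1].tv_usec * 1000;

        /* the kernel sees this as utimensat(fd, NULL, ts, 0) */
        return futimens(fd, ts);
}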
For comparison, FreeBSD provides explicit futimes() and futimesat() syscalls (though I wonder why the latter isn't named "utimesat").
I am writing a program consisting of a user program and a kernel module. The kernel module needs to gather data that it will then "send" to the user program, and this has to be done via a /proc file. I create the file and everything is fine, but after ages of searching the internet I still cannot find an answer: how do you read/write a /proc file from kernel space? The write_proc and read_proc callbacks supplied to the proc file are used to read and write data from USER space, whereas I need the module itself to be able to write the /proc file.
That's not how it works. When a userspace program opens a /proc file, its contents are generated on the fly, on a case-by-case basis. Most such files are read-only and generated by a common mechanism:
Register an entry with create_proc_read_entry
Supply a callback function (called read_proc by convention) which is called when the file is read
This callback function should populate a supplied buffer and (typically) call proc_calc_metrics to update the file position and EOF state reported back to userspace.
You (from the kernel) do not "write" to procfs files, you supply the results dynamically when userspace requests them.
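A minimal sketch of that legacy pattern (create_proc_read_entry was removed around Linux 3.10; "mydata" is just an example name):

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/proc_fs.h>

/* called whenever userspace reads /proc/mydata */
static int mydata_read_proc(char *page, char **start, off_t off,
                            int count, int *eof, void *data)
{
        int len = sprintf(page, "value=%d\n", 42);
        *eof = 1;       /* the whole content fits in one page */
        return len;
}

static int __init mydata_init(void)
{
        create_proc_read_entry("mydata", 0444, NULL, mydata_read_proc, NULL);
        return 0;
}
module_init(mydata_init);
MODULE_LICENSE("GPL");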
One approach to getting data across to user space is seq_file. To configure (write) kernel parameters, you may want to consider sysfs nodes.
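A minimal seq_file sketch for the read direction, assuming a pre-5.6 kernel where proc_create still takes a struct file_operations (again, "mydata" is an example name):

#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* regenerates the file contents each time userspace reads it */
static int mydata_show(struct seq_file *m, void *v)
{
        seq_printf(m, "value=%d\n", 42);
        return 0;
}

static int mydata_open(struct inode *inode, struct file *file)
{
        return single_open(file, mydata_show, NULL);
}

static const struct file_operations mydata_fops = {
        .owner   = THIS_MODULE,
        .open    = mydata_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
};

static int __init mydata_init(void)
{
        proc_create("mydata", 0444, NULL, &mydata_fops);
        return 0;
}
module_init(mydata_init);
MODULE_LICENSE("GPL");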
When a program accesses files, uses system(), etc., how and where is that program's current working directory physically stored? Since the working directory of a program is logically similar to a global variable, it should ideally be thread-local, especially in languages like D, where "global" variables are thread-local by default. Would it be possible to make the current working directory of a program thread-local?
Note: If you are not familiar with D specifically, even a language-agnostic answer would be useful.
On Linux, each process is represented by a process descriptor - a task_struct. This structure is defined in include/linux/sched.h in the kernel source.
One of the fields of task_struct is a pointer to an fs_struct, which stores filesystem-related information. fs_struct is defined in include/linux/fs_struct.h.
fs_struct has a field called pwd, which stores information about the current working directory (the filesystem it is on, and the details of the directory itself).
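A minimal in-kernel sketch of following those links (print_cwd is a made-up helper; assumes a 3.x/4.x kernel):

#include <linux/sched.h>
#include <linux/fs_struct.h>
#include <linux/path.h>
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/printk.h>

static void print_cwd(void)
{
        struct path pwd;
        char buf[256];
        char *p;

        get_fs_pwd(current->fs, &pwd);  /* copies fs->pwd, takes references */
        p = d_path(&pwd, buf, sizeof(buf));
        if (!IS_ERR(p))
                pr_info("cwd of %s: %s\n", current->comm, p);
        path_put(&pwd);                 /* drop the references */
}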
The current directory is maintained by the OS, not by the language or framework. See the description of the GetCurrentDirectory WinAPI function for details.
From the description:
Multithreaded applications and shared library code should not use the GetCurrentDirectory function and should avoid using relative path names. The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value.