Is the sscanf function in the Linux kernel susceptible to buffer overflow attacks?

From what I understand, a typical buffer overflow attack occurs when an attacker overflows a buffer on the stack, allowing the attacker to inject malicious code and overwrite the return address on the stack so that it points to that code.
This is a common concern when using functions (such as sscanf) that blindly copy data from one area to another, checking only for a terminating byte:
char str[8]; /* holds up to 8 bytes of data */
char *buf = "lots and lots of foobars"; /* way more than 8 bytes of data */
sscanf(buf, "%s", str); /* buffer overflow occurs here! */
I noticed some sysfs_ops store functions in the Linux kernel are implemented with the Linux kernel's version of the sscanf function:
static char str[8]; /* global string */
static ssize_t my_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t size)
{
sscanf(buf, "%s", str); /* buf holds more than 8 bytes! */
return size;
}
Suppose this store callback function is set to a writable sysfs attribute. Would a malicious user be able to intentionally overflow the buffer via a write call?
Normally, I would expect guards against buffer overflow attacks -- such as limiting the number of bytes read -- but I see none in a good number of functions (for example in drivers/scsi/scsi_sysfs.c).
Does the implementation of the Linux kernel version of sscanf protect against buffer overflow attacks; or is there another reason -- perhaps buffer overflow attacks are impossible given how the Linux kernel works under the hood?

The Linux sscanf() is vulnerable to buffer overflows; inspecting its source shows this. You can use a width specifier (for example %7s for an 8-byte buffer) to limit how much a %s conversion is allowed to write. At some point, buf must also have gone through copy_from_user(), since otherwise user space could pass a garbage pointer to the kernel.
In the version of Linux you cited, scsi_sysfs.c does have a buffer overflow. The latest version does not; the committed fix should resolve the issue you see.

Short answer:
When called correctly, sscanf will not cause a buffer overflow, especially in a sysfs xxx_store() function (there are many examples of sscanf in sysfs xxx_store() functions), because the Linux kernel appends a '\0' terminator after the written bytes (buf[len] = 0;) before calling your xxx_store() function.
Long answer:
Normally, sysfs attributes are defined to hold strictly formatted data. Since you expect 8 bytes at most, it's reasonable to limit the size you accept like this:
static char str[9]; /* global string: up to 8 characters plus the terminating NUL */
static ssize_t my_store(struct device *dev,
		struct device_attribute *attr,
		const char *buf, size_t size)
{
	if (size > 8) {
		pr_err("Error: Input size > 8: too large\n");
		return -EINVAL;
	}
	/* the %8s width guarantees at most 8 characters plus NUL are stored */
	if (sscanf(buf, "%8s", str) != 1)
		return -EINVAL;
	return size;
}
(Note: if you expect an 8-byte string plus a trailing '\n', compare against 9 rather than 8; the %8s width still keeps the write inside str.)
(Note that this also rejects some inputs, such as those with many leading whitespace characters. However, who would send a string with many leading spaces? Those who want to break your code, right? If they don't follow your spec, just reject them.)
Note that the Linux kernel deliberately inserts a '\0' at offset len (i.e. buf[len] = 0;) when the user writes len bytes to a sysfs file, precisely so that sscanf can scan the data safely, as a comment in kernel 2.6's fs/sysfs/file.c says:
static int
fill_write_buffer(struct sysfs_buffer * buffer, const char __user * buf, size_t count)
{
int error;
if (!buffer->page)
buffer->page = (char *)get_zeroed_page(GFP_KERNEL);
if (!buffer->page)
return -ENOMEM;
if (count >= PAGE_SIZE)
count = PAGE_SIZE - 1;
error = copy_from_user(buffer->page,buf,count);
buffer->needs_read_fill = 1;
/* if buf is assumed to contain a string, terminate it by \0,
so e.g. sscanf() can scan the string easily */
buffer->page[count] = 0;
return error ? -EFAULT : count;
}
...
static ssize_t
sysfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
struct sysfs_buffer * buffer = file->private_data;
ssize_t len;
mutex_lock(&buffer->mutex);
len = fill_write_buffer(buffer, buf, count);
if (len > 0)
len = flush_write_buffer(file->f_path.dentry, buffer, len);
if (len > 0)
*ppos += len;
mutex_unlock(&buffer->mutex);
return len;
}
Later kernel versions keep the same logic, although the code has since been completely rewritten.

Related

Why does cat call read() twice when once was enough?

I am new to Linux kernel modules. I am learning to write a char driver module from a web course. I have a very simple module that creates /dev/chardevexample, and I have a question to check my understanding:
When I do echo "hello4" > /dev/chardevexample, I see the write execute exactly once as expected. However, when I do cat /dev/chardevexample, I see the read executed two times.
I see this both in my code and in the course material. All the data was returned in the first read(), so why does cat call it again?
All the things I did so far are as follows:
insmod chardev.ko to load my module
echo "hello4" > /dev/chardevexample. This is the write and I see it happening exactly once in dmesg
cat /dev/chardevexample. This is the read, and dmesg shows it happening twice.
I ran strace cat /dev/chardevexample, and I indeed see read being called twice, with a write in between:
read(3, "hello4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 4096
write(1, "hello4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096hello4) = 4096
read(3, "", 131072) = 0
dmesg after read (cat command)
[909836.517402] DEBUG-device_read: To User hello4 and bytes_to_do 4096 ppos 0 # Read #1
[909836.517428] DEBUG-device_read: Data send to app hello4, nbytes=4096 # Read #1
[909836.519086] DEBUG-device_read: To User and bytes_to_do 0 ppos 4096 # Read #2
[909836.519093] DEBUG-device_read: Data send to app hello4, nbytes=0 # Read #2
Code snippets for read, write and file_operations are attached below. Any guidance would help; I searched extensively and couldn't understand it, hence the post.
/*!
 * @brief Write to device from userspace to kernel space
 * @returns Number of bytes written
 */
static ssize_t device_write(struct file *file, //!< File pointer
const char *buf,//!< from for copy_from_user. Takes 'buf' from user space and writes to
//!< kernel space in 'buffer'. Happens on fwrite or write
size_t lbuf, //!< length of buffer
loff_t *ppos) //!< position to write to
{
int nbytes = lbuf - copy_from_user(
buffer + *ppos, /* to */
buf, /* from */
lbuf); /* how many bytes */
*ppos += nbytes;
buffer[strcspn(buffer, "\n")] = 0; // Remove End of line character
pr_info("Received data \"%s\" from apps, nbytes=%d\n", buffer, nbytes);
return nbytes;
}
/*!
 * @brief Read from device - from kernel space to user space
 * @returns Number of bytes read
 */
static ssize_t device_read(struct file *file,//!< File pointer
char *buf, //!< for copy_to_user. buf is 'to' from buffer
size_t lbuf, //!< Length of buffer
loff_t *ppos)//!< Position
{
int nbytes;
int maxbytes;
int bytes_to_do;
maxbytes = PAGE_SIZE - *ppos;
if(maxbytes >lbuf)
bytes_to_do = lbuf;
else
bytes_to_do = maxbytes;
buffer[strcspn(buffer, "\n")] = 0; // Remove End of line character
printk("DEBUG-device_read: To User %s and bytes_to_do %d ppos %lld\n", buffer + *ppos, bytes_to_do, *ppos);
nbytes = bytes_to_do - copy_to_user(
buf, /* to */
buffer + *ppos, /* from */
bytes_to_do); /* how many bytes*/
*ppos += nbytes;
pr_info("DEBUG-device_read: Data send to app %s, nbytes=%d\n", buffer, nbytes);
return nbytes;
}
/* Every Device is like a file - this is device file operation */
static struct file_operations device_fops = {
.owner = THIS_MODULE,
.write = device_write,
.open = device_open,
.read = device_read,
};
The Unix convention for indicating end-of-file is to have read return 0 bytes.
In this case, cat asks for 131072 bytes and only receives 4096. This is normal and not to be interpreted as having reached the end of the file. For example, it happens when you read from the keyboard but the user only inputs a small amount of data.
Because cat has not yet seen EOF (i.e. read did not return 0), it continues to issue read calls until it does. This means that if there's any data, you will always see a minimum of two read calls: one (or more) for the data, and one final one that returns 0.

Is it possible to dump inode information from the inotify subsystem?

I am trying to figure out which files my editor is watching.
I have learned that it is possible to count the number of inotify fds in /proc/${PID}/fd. My question is: is it possible to dump the list of inodes watched by one process?
UPDATE:
I have added a working solution; thanks for a helpful reference here.
UPDATE 2: I recently found that kallsyms_lookup_name (and other symbols) are no longer exported since Linux kernel v5.7, so I decided to update my solution in case anyone else is interested.
Solved.
With the help of the kprobe mechanism used in khook, I simply hook __x64_sys_inotify_add_watch and use user_path_at to obtain the dentry.
The code snippet is listed below, and my working solution is provided here.
#define IN_ONLYDIR 0x01000000 /* only watch the path if it is a directory */
#define IN_DONT_FOLLOW 0x02000000 /* don't follow a sym link */
//regs->(di, si, dx, r10), reference: arch/x86/include/asm/syscall_wrapper.h#L125
//SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname, u32, mask)
KHOOK_EXT(long, __x64_sys_inotify_add_watch, const struct pt_regs *);
static long khook___x64_sys_inotify_add_watch(const struct pt_regs *regs)
{
	int wd;
	struct path path;
	unsigned int flags = 0;
	char *buf;
	char *pname;
	// decode the registers
	int fd = (int) regs->di;
	const char __user *pathname = (char __user *) regs->si;
	u32 mask = (u32) regs->dx;
	// do the original syscall
	wd = KHOOK_ORIGIN(__x64_sys_inotify_add_watch, regs);
	// translate the mask into lookup flags the same way inotify_add_watch does
	if (!(mask & IN_DONT_FOLLOW))
		flags |= LOOKUP_FOLLOW;
	if (mask & IN_ONLYDIR)
		flags |= LOOKUP_DIRECTORY;
	if (wd >= 0 && user_path_at(AT_FDCWD, pathname, flags, &path) == 0) {
		// PATH_MAX (4096 bytes) is too large for the kernel stack, so use the heap
		buf = kmalloc(PATH_MAX, GFP_KERNEL);
		if (buf) {
			pname = dentry_path_raw(path.dentry, buf, PATH_MAX); // "pname" points into "buf"
			if (!IS_ERR(pname))
				pr_info("%s, PID %d add (%d,%d): %s\n", current->comm, task_pid_nr(current), fd, wd, pname);
			kfree(buf);
		}
		path_put(&path);
	}
	return wd;
}

Write from mmapped buffer to `O_DIRECT` output file

I have a device which writes to a video buffer. This buffer is allocated in system memory using CMA, and I want to implement streaming writes from this buffer to a block device. My application opens the video buffer with mmap, and I would like to use O_DIRECT writes to avoid page-cache overhead. Basically, the pseudo-code of the application looks like this:
f_in = open("/dev/videobuf", O_RDONLY);
f_mmap = mmap(0, BUFFER_SIZE, PROT_READ, MAP_SHARED, f_in, 0);
f_out = open("/dev/sda", O_WRONLY | O_DIRECT);
write(f_out, f_mmap, BLOCK_SIZE);
where BLOCK_SIZE is a sector-aligned value. f_out is opened without any errors, but write fails with EFAULT. I tried to track down this issue, and it turned out that the mmap implementation in the video buffer's driver uses remap_pfn_range(), which sets the VM_IO and VM_PFNMAP flags on the VMA. The O_DIRECT path in block device drivers checks these flags and returns EFAULT. As far as I understand, O_DIRECT writes need to pin the memory pages, but these VMA flags indicate that the underlying memory has no struct page, which causes the error. Am I right here?
And the main question is how to correctly implement O_DIRECT write from mmapped buffer? I have video buffer driver and can modify it appropriately.
I found a similar question, but there is no clear answer there.
remap_pfn_range() marks the page table entries as special via pte_mkspecial() and adds VM_IO/VM_PFNMAP to the VMA, so the buffer cannot pass the checks made when pinning user pages for Direct I/O (get_user_pages() refuses VM_IO/VM_PFNMAP VMAs with -EFAULT).
Since your memory comes from CMA, that's good news: CMA memory already has struct page support, so you can simply use vm_insert_pages() with the following steps:
Declare a CMA region via a kernel argument or the device tree (dts).
Get struct pages from CMA:
dma_page = dma_alloc_contiguous(&pdev->dev, size, GFP_KERNEL);
if (!dma_page) {
pr_err("%s %d, dma_alloc_contiguous fail\n", __func__, __LINE__);
return -ENOMEM;
}
nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
for (i = 0; i < nr_pages; i++)
pages[i] = &dma_page[i];
Insert the pages into the VMA at mmap time:
int your_mmap(struct file *file, struct vm_area_struct *vma) {
int ret = 0;
unsigned long temp_nr_pages;
if (vma->vm_end - vma->vm_start > size)
return -EINVAL;
/* copy nr_pages, because vm_insert_pages() updates its count argument */
temp_nr_pages = nr_pages;
ret = vm_insert_pages(vma, vma->vm_start, pages, &temp_nr_pages);
if (ret < 0)
pr_err("%s vm_insert_pages fail, error is %d\n", __func__, ret);
return ret;
}
Export dma_alloc_contiguous() (the only change needed in mm code, and a small one):
modified kernel/dma/contiguous.c
@@ -332,6 +332,7 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 	return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);
 }
+EXPORT_SYMBOL(dma_alloc_contiguous);
/**
* dma_free_contiguous() - release allocated pages

kmalloc: only allocating 4 bytes

So I am trying to dynamically allocate a buffer on module initialization. The buffer needs to be in scope at all times as it stores data that user space programs interact with. So here is my code:
static char* file_data;
#define MAX_SIZE 256
.
.
.
{
file_data = kzalloc(MAX_SIZE, GFP_KERNEL);
.
.
.
}
However, when I do sizeof file_data, it always returns 4. What am I doing wrong?
Edit: The buffer stores input from a user space program, but 4 characters is all that can be stored.
ssize_t read_file(char __user *buf, size_t count)
{
	if (count > MAX_SIZE)
		count = MAX_SIZE; /* never read past the end of file_data */
	if (copy_to_user(buf, file_data, count))
		return -EFAULT;
	return count;
}
ssize_t write_file(const char __user *buf, size_t count)
{
	if (count >= MAX_SIZE)
		return -EINVAL;
	if (copy_from_user(file_data, buf, count))
		return -EFAULT;
	return count;
}
file_data is a pointer. On a 32-bit platform, its size is 32 bits, or 4 bytes. What you want to know is the size of the data pointed to by file_data. You can't use the sizeof operator for this, because sizeof is evaluated at compile time; it can't tell you the size of something allocated dynamically at run time.
(Besides, you already know the size of the data pointed to by file_data: it's MAX_SIZE.)
char *file_data is a pointer to a char. Evidently you're on a 32-bit system so any pointer is 4 bytes. The compiler (which handles sizeof) doesn't know or care how much memory you're allocating for file_data to point to, it just knows you're asking for the size of the pointer (which you are, whether you meant to or not). If you want the size of the memory it points to, you'll have to keep track of it yourself.

String manipulation in Linux kernel module

I am having a hard time manipulating strings while writing a Linux kernel module. My problem is that I have an int Array[10] with different values in it, and I need to produce a string that I can send to the buffer in the my_read procedure. If my array is {0,1,112,20,4,0,0,0,0,0}
then my output should be:
0:(0)
1:-(1)
2:-------------------------------------------------------------------------------------------------------(112)
3:--------------------(20)
4:----(4)
5:(0)
6:(0)
7:(0)
8:(0)
9:(0)
When I try to place the above strings in char[] arrays, somehow weird characters end up there.
Here is the code:
int my_read (char *page, char **start, off_t off, int count, int *eof, void *data)
{
int len;
if (off > 0){
*eof =1;
return 0;
}
/* get process tree */
int task_dep=0; /* depth of a task from INIT*/
get_task_tree(&init_task,task_dep);
char tmp[1024];
char A[ProcPerDepth[0]],B[ProcPerDepth[1]],C[ProcPerDepth[2]],D[ProcPerDepth[3]],E[ProcPerDepth[4]],F[ProcPerDepth[5]],G[ProcPerDepth[6]],H[ProcPerDepth[7]],I[ProcPerDepth[8]],J[ProcPerDepth[9]];
int i=0;
for (i=0;i<1024;i++){ tmp[i]='\0';}
memset(A, '\0', sizeof(A));memset(B, '\0', sizeof(B));memset(C, '\0', sizeof(C));
memset(D, '\0', sizeof(D));memset(E, '\0', sizeof(E));memset(F, '\0', sizeof(F));
memset(G, '\0', sizeof(G));memset(H, '\0', sizeof(H));memset(I, '\0', sizeof(I));memset(J, '\0', sizeof(J));
printk("A:%s\nB:%s\nC:%s\nD:%s\nE:%s\nF:%s\nG:%s\nH:%s\nI:%s\nJ:%s\n",A,B,C,D,E,F,G,H,I,J);
memset(A,'-',sizeof(A));
memset(B,'-',sizeof(B));
memset(C,'-',sizeof(C));
memset(D,'-',sizeof(D));
memset(E,'-',sizeof(E));
memset(F,'-',sizeof(F));
memset(G,'-',sizeof(G));
memset(H,'-',sizeof(H));
memset(I,'-',sizeof(I));
memset(J,'-',sizeof(J));
printk("A:%s\nB:%s\nC:%s\nD:%s\nE:%s\nF:%s\nG:%s\nH:%s\nI:%s\nJ:%s\n",A,B,C,D,E,F,G,H,I,J);
len = sprintf(page,"0:%s(%d)\n1:%s(%d)\n2:%s(%d)\n3:%s(%d)\n4:%s(%d)\n5:%s(%d)\n6:%s(%d)\n7:%s(%d)\n8:%s(%d)\n9:%s(%d)\n",A,ProcPerDepth[0],B,ProcPerDepth[1],C,ProcPerDepth[2],D,ProcPerDepth[3],E,ProcPerDepth[4],F,ProcPerDepth[5],G,ProcPerDepth[6],H,ProcPerDepth[7],I,ProcPerDepth[8],J,ProcPerDepth[9]);
return len;
}
it worked out with this:
char s[500];
memset(s,'-',498);
for (i=len=0;i<10;++i){
len+=sprintf(page+len,"%d:%.*s(%d)\n",i,ProcPerDepth[i],s,ProcPerDepth[i]);
}
I wonder if there is an easy sprintf flag to repeat a character. Thanks.
Here are some issues:
You have entirely filled the A, B, C ... arrays with characters. Then, you pass them to an I/O routine that is expecting null-terminated strings. Because your strings are not null-terminated, printk() will keep printing whatever is in stack memory after your object until it finds a null by luck.
Multi-threaded kernels like Linux have strict and relatively small limits on stack allocations. All the frames in a kernel call chain must fit into a fixed-size stack, or something will be overwritten. You may not get any detection of this error, just some kind of downstream crash as memory corruption leads to a panic or a wedge. Allocating large and variable-length arrays on a kernel stack is just not a good idea.
If you are going to write the tmp[] array and properly nul-terminate it, there is no reason to also initialize it. But if you did want to initialize it, you could let the compiler generate the code by writing: char tmp[1024] = { 0 }; (C99 requires that when an aggregate is partially initialized, its remaining elements are initialized to zero.) A similar observation applies to the other arrays.
How about getting rid of most of those arrays and most of that code and just doing something along the lines of:
for(i = j = 0; i < n; ++i)
j += sprintf(page + j, "...", ...);
