How to read and write with mmap() on a Linux system - linux

I need to make some stream in and out classes using mmap() in Linux. To do so I tried to make some test code that writes some integers to a file, saves it, loads it again and write the data in the file to cout. If that test code works, then It wont be a problem making stream in and out afterwards.
When I first started out I got segment faults and If I did not get that nothing happened, so I googled a little. I found this book http://www.advancedlinuxprogramming.com/alp-folder/alp-ch05-ipc.pdf where around page 107 there is some usefull code. I copy pasted that code and made some small changes and got this code:
int fd;
void* file_memory;
/* Prepare a file large enough to hold an unsigned integer. */
fd = open ("mapTester", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
//Make the file big enough
lseek (fd, 4 * 10 + 1, SEEK_SET);
write (fd, "", 1);
lseek (fd, 0, SEEK_SET);
/* Create the memory mapping. */
file_memory = mmap (0, 4 * 10, PROT_WRITE, MAP_SHARED, fd, 0);
close (fd);
/* Write a random integer to memory-mapped area. */
sprintf((char*) file_memory, "%d\n", 22);
/* Release the memory (unnecessary because the program exits). */
munmap (file_memory, 4 * 10);
cout << "Mark" << endl;
//Start the part where I read from the file
int integer;
/* Open the file. */
fd = open (argv[1], O_RDWR, S_IRUSR | S_IWUSR);
/* Create the memory mapping. */
file_memory = mmap (0, 4 * 10, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
close (fd);
/* Read the integer, print it out, and double it. */
scanf ((char *) file_memory, "%d", &integer);
printf ("value: %d\n", integer);
sprintf ((char*) file_memory, "%d\n", 2 * integer);
/* Release the memory (unnecessary because the program exits). */
munmap (file_memory, 4 * 10);
But I get a segment fults after the "mark" cout.
Then I replace the "read part" with this:
fd = open("mapTester", O_RDONLY);
int* buffer = (int*) malloc (4*10);
read(fd, buffer, 4 * 10);
for(int i = 0; i < 1; i++)
{
cout << buffer[i] << endl;
}
That is some working code that shows me that the file is empty. I tries out a couple of ways to write to the mapping without any change in result.
So how am I amble to make my code write?
And does my mmap read code seem okay (just in case you can see some obvious flaws)?
I have found some other resources that did not help me yet, but because I am a new user I may only post max 2 links.

You should test the result of mmap. If it gives MAP_FAILED check out errno to find out why.
And you'll better mmap a multiple of pages, often 4K bytes each, and given by sysconf(_SC_PAGESIZE)
You can use stat to find out the size of (and many other numbers about) some given file.
You could use strace on existing Linux programs to learn what syscalls they are doing.
See also this about /proc/ etc.

Your scanf() call should be sscanf(), and the second open should use "mapTester" instead of argv[1] as the filename. When I fix these bugs, your posted program works (printing out 22 and leaving 44 in the file).

Related

Why does cat call read() twice when once was enough?

I am new to Linux kernel module. I am learning char driver module based on a web course. I have a very simple module that creates a /dev/chardevexample, and I have a question for my understanding:
When I do echo "hello4" > /dev/chardevexample, I see the write execute exactly once as expected. However, when I do cat /dev/chardevexample, I see the read executed two times.
I see this both in my code and in the course material. All the data was returned in the first read(), so why does cat call it again?
All the things I did so far are as follows:
insmod chardev.ko to load my module
echo "hello4" > /dev/chardevexample. This is the write and I see it happening exactly once in dmesg
cat /dev/chardevexample. This is the read, and dmesg shows it happening twice.
I did strace cat /dev/chardevexample, and I indeed see the function call being called twice for read. There is a write in between as well
read(3, "hello4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 4096
write(1, "hello4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096hello4) = 4096
read(3, "", 131072)
dmesg after read (cat command)
[909836.517402] DEBUG-device_read: To User hello4 and bytes_to_do 4096 ppos 0 # Read #1
[909836.517428] DEBUG-device_read: Data send to app hello4, nbytes=4096 # Read #1
[909836.519086] DEBUG-device_read: To User and bytes_to_do 0 ppos 4096 # Read #2
[909836.519093] DEBUG-device_read: Data send to app hello4, nbytes=0 # Read #2
Code snippet for read, write and file_operations is attached. Any
guidance would help. I searched extensively and couldn't understand.
Hence the post.
/*!
* #brief Write to device from userspace to kernel space
* #returns Number of bytes written
*/
static ssize_t device_write(struct file *file, //!< File pointer
const char *buf,//!< from for copy_from_user. Takes 'buf' from user space and writes to
//!< kernel space in 'buffer'. Happens on fwrite or write
size_t lbuf, //!< length of buffer
loff_t *ppos) //!< position to write to
{
int nbytes = lbuf - copy_from_user(
buffer + *ppos, /* to */
buf, /* from */
lbuf); /* how many bytes */
*ppos += nbytes;
buffer[strcspn(buffer, "\n")] = 0; // Remove End of line character
pr_info("Recieved data \"%s\" from apps, nbytes=%d\n", buffer, nbytes);
return nbytes;
}
/*!
* #brief Read from device - from kernel space to user space
* #returns Number of bytes read
*/
static ssize_t device_read(struct file *file,//!< File pointer
char *buf, //!< for copy_to_user. buf is 'to' from buffer
size_t lbuf, //!< Length of buffer
loff_t *ppos)//!< Position {
int nbytes;
int maxbytes;
int bytes_to_do;
maxbytes = PAGE_SIZE - *ppos;
if(maxbytes >lbuf)
bytes_to_do = lbuf;
else
bytes_to_do = maxbytes;
buffer[strcspn(buffer, "\n")] = 0; // Remove End of line character
printk("DEBUG-device_read: To User %s and bytes_to_do %d ppos %lld\n", buffer + *ppos, bytes_to_do, *ppos);
nbytes = bytes_to_do - copy_to_user(
buf, /* to */
buffer + *ppos, /* from */
bytes_to_do); /* how many bytes*/
*ppos += nbytes;
pr_info("DEBUG-device_read: Data send to app %s, nbytes=%d\n", buffer, nbytes);
return nbytes;} /* Every Device is like a file - this is device file operation */ static struct file_operations device_fops = {
.owner = THIS_MODULE,
.write = device_write,
.open = device_open,
.read = device_read,};
The Unix convention for indicating end-of-file is to have read return 0 bytes.
In this case, cat asks for 131072 bytes and only receives 4096. This is normal and not to be interpreted as having reached the end of the file. For example, it happens when you read from the keyboard but the user only inputs a small amount of data.
Because cat has not yet seen EOF (i.e. read did not return 0), it continues to issue read calls until it does. This means that if there's any data, you will always see a minimum of two read calls: one (or more) for the data, and one final one that returns 0.

Write from mmapped buffer to `O_DIRECT` output file

I have a device which writes to a video buffer. This buffer is allocated in system memory using CMA and I want to implement streaming write from this buffer to a block device. My application opens video buffer with mmap and I would like to use O_DIRECT write to avoid page cache related overhead. Basically, the pseudo-code of the application looks like this:
f_in = open("/dev/videobuf", O_RDONLY);
f_mmap = mmap(0, BUFFER_SIZE, PROT_READ, MAP_SHARED, f_in, 0);
f_out = open("/dev/sda", O_WRONLY | O_DIRECT);
write(f_out, f_mmap, BLOCK_SIZE);
where BLOCK_SIZE is sector aligned value. f_out is opened without any errors, but write results in EFAULT. I tried to track down this issue and it turned out that mmap implementation in video buffer's driver uses remap_pfn_range(), which sets VM_IO and VM_PFNMAP flags for VMA. The O_DIRECT path in block device drivers checks these flags and returns EFAULT. As far as I understand, O_DIRECT writes need to pin the memory pages, but VMA flags indicate the absence of struct page for underlying memory which causes an error. Am I right here?
And the main question is how to correctly implement O_DIRECT write from mmapped buffer? I have video buffer driver and can modify it appropriately.
I found similar question, but these is no clear answer there.
remap_pfn_range will set your vma as special via pte_mkspecial and add VM_IO/VM_PFNMAP to vma, so you cannot pass the following checks when do Direct I/O.
You say your memory comes from CMA, that's good because cma memory already has struct page support, so you can just use vm_insert_pages as the following steps:
declare cma region from kernel argument or dts
get struct pages from cma:
dma_page = dma_alloc_contiguous(&pdev->dev, size, GFP_KERNEL);
if (!dma_page) {
pr_err("%s %d, dma_alloc_contiguous fail\n", __func__, __LINE__);
return -ENOMEM;
}
nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
for (i = 0; i < nr_pages; i++)
pages[i] = &dma_page[i];
insert pages to vma when mmap
int your_mmap(struct file *file, struct vm_area_struct *vma) {
int ret = 0;
unsigned long temp_nr_pages;
if (vma->vm_end - vma->vm_start > size)
return -EINVAL;
/* duplicitate nr_pages in that vm_insert_pages can change nr_pages */
temp_nr_pages = nr_pages;
ret = vm_insert_pages(vma, vma->vm_start, pages, &temp_nr_pages);
if (ret < 0)
pr_err("%s vm_insert_pages fail, error is %d\n", __func__, ret);
return ret;
}
export dma_alloc_contiguous(the only mm code change, but not so bad).
modified kernel/dma/contiguous.c
## -332,6 +332,7 ## struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
return cma_alloc_aligned(dma_contiguous_default_area, size, gfp); }
+EXPORT_SYMBOL(dma_alloc_contiguous);
/**
* dma_free_contiguous() - release allocated pages

ioctl() call resets file descriptor to 0

Consider the following code:
file_fd = open(device, O_RDWR);
if (file_fd < 0) {
perror("open");
return -1;
}
printf("File descriptor: %d\n", file_fd);
uint32_t DskSize;
if (ioctl(file_fd, BLKGETSIZE, &DskSize) < 0) {
perror("ioctl");
return -1;
}
printf("File descriptor after: %d\n", file_fd);
This snippet yields this:
File descriptor: 3
File descriptor after: 0
Why does my file descriptor get reset to 0? The program writes the stuff out to stdout instead of my block device.
This should not happen. I expect my file_fd to be non-zero and retain its value.
Looks like you smash your stack.
Since there are only two stack variables file_fd and DskSize and changing DskSize changes file_fd suggests that DiskSize must be unsigned long or size_t (a 64-bit value), not uint32_t.
Looking at BLKGETSIZE implementation confirms that the value type is unsigned long.
You may like to run your applications under valgrind, it reports this kind of errors.

Accessing another process virtual memory in Linux (debugging)

How does gdb access another process virtual memory on Linux? Is it all done via /proc?
How does gdb access another process virtual memory on Linux? Is it all done via /proc?
On Linux for reading memory:
1) If the number of bytes to read is fewer than 3 * sizeof (long) or the filesystem /proc is unavailable or reading from /proc/PID/mem is unsuccessful then ptrace is used with PTRACE_PEEKTEXT to read data.
These are these conditions in the function linux_proc_xfer_partial():
/* Don't bother for one word. */
if (len < 3 * sizeof (long))
return 0;
/* We could keep this file open and cache it - possibly one per
thread. That requires some juggling, but is even faster. */
xsnprintf (filename, sizeof filename, "/proc/%d/mem",
ptid_get_pid (inferior_ptid));
fd = gdb_open_cloexec (filename, O_RDONLY | O_LARGEFILE, 0);
if (fd == -1)
return 0;
2) If the number of bytes to read is greater or equal to 3 * sizeof (long) and /proc is available then pread64 or (lseek() and read() are used:
static LONGEST
linux_proc_xfer_partial (struct target_ops *ops, enum target_object object,
const char *annex, gdb_byte *readbuf,
const gdb_byte *writebuf,
ULONGEST offset, LONGEST len)
{
.....
/* If pread64 is available, use it. It's faster if the kernel
supports it (only one syscall), and it's 64-bit safe even on
32-bit platforms (for instance, SPARC debugging a SPARC64
application). */
#ifdef HAVE_PREAD64
if (pread64 (fd, readbuf, len, offset) != len)
#else
if (lseek (fd, offset, SEEK_SET) == -1 || read (fd, readbuf, len) != len)
#endif
ret = 0;
else
ret = len;
close (fd);
return ret;
}
On Linux for writing memory:
1) ptrace with PTRACE_POKETEXT or PTRACE_POKEDATA is used.
As for your second question:
where can I find information about ... setting hardware watchpoints
gdb, Internals Watchpoint:s http://sourceware.org/gdb/wiki/Internals%20Watchpoints
Reference:
http://linux.die.net/man/2/ptrace
http://www.alexonlinux.com/how-debugger-works

mmap slower than ioremap

I am developing for an ARM device running Linux 2.6.37. I am trying to toggle an IO pin as fast as possible. I made a little kernel module and a user space application. I tried two things :
Manipulate the GPIO control registers directly from the kernel space using ioremap.
mmap() the GPIO control registers without caching and using them from user space.
Both methods work, but the second is about 3 times slower than the first (observed on oscilloscope). I think I disabled all caching mechanisms.
Of course I'd like to get the best of the two worlds : flexibility and ease of development from user space with the speed of kernel space.
Does anybody know why the mmap() could be slower than the ioremap() ?
Here's my code :
Kernel module code
static int ti81xx_usmap_mmap(struct file* pFile, struct vm_area_struct* pVma)
{
pVma->vm_flags |= VM_RESERVED;
pVma->vm_page_prot = pgprot_noncached(pVma->vm_page_prot);
if (io_remap_pfn_range(pVma, pVma->vm_start, pVma->vm_pgoff,
pVma->vm_end - pVma->vm_start, pVma->vm_page_prot))
return -EAGAIN;
pVma->vm_ops = &ti81xx_usmap_vm_ops;
return 0;
}
static void ti81xx_usmap_test_gpio(void)
{
u32* pGpIoRegisters = ioremap_nocache(TI81XX_GPIO0_BASE, 0x400);
const u32 pin = 1 << 24;
int i;
/* I should use IO read/write functions instead of pointer deferencing,
* but portability isn't the issue here */
pGpIoRegisters[OMAP4_GPIO_OE >> 2] &= ~pin; /* Set pin as output*/
for (i = 0; i < 200000000; ++i)
{
pGpIoRegisters[OMAP4_GPIO_SETDATAOUT >> 2] = pin;
pGpIoRegisters[OMAP4_GPIO_CLEARDATAOUT >> 2] = pin;
}
pGpIoRegisters[OMAP4_GPIO_OE >> 2] |= pin; /* Set pin as input*/
iounmap(pGpIoRegisters);
}
User space application code
int main(int argc, char** argv)
{
int file, i;
ulong* pGpIoRegisters = NULL;
ulong pin = 1 << 24;
file = open("/dev/ti81xx-usmap", O_RDWR | O_SYNC);
if (file < 0)
{
printf("open failed (%d)\n", errno);
return 1;
}
printf("Toggle from kernel space...");
fflush(stdout);
ioctl(file, TI81XX_USMAP_IOCTL_TEST_GPIO);
printf(" done\n");
pGpIoRegisters = mmap(NULL, 0x400, PROT_READ | PROT_WRITE, MAP_SHARED, file, TI81XX_GPIO0_BASE);
printf("Toggle from user space...");
fflush(stdout);
pGpIoRegisters[OMAP4_GPIO_OE >> 2] &= ~pin;
for (i = 0; i < 30000000; ++i)
{
pGpIoRegisters[OMAP4_GPIO_SETDATAOUT >> 2] = pin;
pGpIoRegisters[OMAP4_GPIO_CLEARDATAOUT >> 2] = pin;
}
pGpIoRegisters[OMAP4_GPIO_OE >> 2] |= pin;
printf(" done\n");
fflush(stdout);
munmap(pGpIoRegisters, 0x400);
close(file);
return 0;
}
This is because ioremap_nocache() still enables the CPU write buffer in your VM mapping whereas pgprot_noncached() disables both bufferability and cacheability.
Apples to apples comparison would be to use ioremap_strongly_ordered() instead.
My guess would be that since mmap has to check to make sure you're writing to memory you're allowed to write to, it's going to be slower than the kernel version (which I believe/assume doesn't do that kind of checking--with a kernel module you're responsible for testing until you're very sure you're not breaking things).
Try using do_mmap (I believe that's the one) to use mmap from kernel space, and see how that compares. If it's comparably faster, then I'm right. If it's not, it's something else.

Resources