resizing file size with ftruncate() after mmap() - linux

The following code snippet works fine on my machine (Linux/x86-64):
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char *addr;
    int fd;
    const size_t PAGE_SIZE = 4096; // assuming the page size is 4096
    char buf[PAGE_SIZE];

    memset(buf, 'x', sizeof(buf));

    // error checking is omitted, for demonstration purposes
    fd = open("abc", O_RDWR | O_CREAT, S_IWUSR | S_IRUSR);
    ftruncate(fd, 0);
    write(fd, buf, 4090);

    // the file size is less than one page, but we map two pages of address space
    addr = mmap(NULL, PAGE_SIZE * 2, PROT_WRITE, MAP_SHARED, fd, 0);
    // it would crash if we read/write addr[4096] here

    // extend the size after mmap
    ftruncate(fd, PAGE_SIZE * 2);
    // now we can access (read/write) addr[4096] ... addr[4096*2 - 1]

    munmap(addr, PAGE_SIZE * 2);
    close(fd);
    exit(EXIT_SUCCESS);
}
But POSIX says:
If the size of the mapped file changes after the call to mmap() as a result of some other operation on the mapped file, the effect of references to portions of the mapped region that correspond to added or removed portions of the file is unspecified.
So I guess this is not portable. But is it guaranteed to work on Linux?
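For comparison, here is a minimal sketch of the ordering that POSIX does cover, reusing the file name and sizes from the snippet above (error checking again omitted): extend the file to its final size first, so that no page of the mapping lies beyond end-of-file.

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const size_t PAGE_SIZE = 4096;
    // error checking omitted, as in the original snippet
    int fd = open("abc", O_RDWR | O_CREAT, S_IWUSR | S_IRUSR);
    ftruncate(fd, PAGE_SIZE * 2);            // grow the file before mapping
    char *addr = mmap(NULL, PAGE_SIZE * 2, PROT_WRITE, MAP_SHARED, fd, 0);
    addr[PAGE_SIZE * 2 - 1] = 'x';           // already backed by the file, no SIGBUS
    munmap(addr, PAGE_SIZE * 2);
    close(fd);
    return EXIT_SUCCESS;
}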

Related

mmap failed to allocate virtual memory

I got the following output in strace:
mmap(0x200000000000, 17179869184, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
My code:
void alloc_page_full_reverse()
{
    printf("Allocating default pagesize pages > 128TB \n");
    mmap_chunks_higher(24575, 0);
    printf("Allocating default pagesize pages < 128TB \n");
    /* Note: Allocating a 16GB chunk less due to heap space required
       for other mappings */
    mmap_chunks_lower(8190, 0);
}

int mmap_chunks_higher(unsigned long no_of_chunks, unsigned long hugetlb_arg)
{
    unsigned long i;
    char *hptr;
    char *hint;

    for (i = 0; i < no_of_chunks; i++) {
        hint = hind_addr();
        hptr = mmap(hint, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | hugetlb_arg, -1, 0); // MAP_CHUNK_SIZE = 16GB
        if (hptr == MAP_FAILED) {
            printf("\n Map failed at address %p < 384TB in iteration = %lu \n", hptr, i);
            exit(-1);
        }
        if (validate_addr(hptr, 1)) {
            printf("\n Address failed, not in > 128Tb iteration = %lu\n", i);
            exit(-1);
        }
    }
    printf("> 128Tb: \n chunks allocated= %lu \n", i);
    return 0;
}

static char *hind_addr(void)
{
    int bits = 48 + rand() % 15;
    return (char *) (1UL << bits);
}
static char *hind_addr(void)
{
int bits = 48 + rand() % 15;
return (char *) (1UL << bits);
}
I need to understand how to validate all the arguments of void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset); before calling it, for example how the size_t length argument can be validated.
I still want to make sure I have enough memory before doing an mmap.
There isn't an interface that allows a process to check this, and for good reason. Suppose such a syscall existed, and the kernel told a process it could allocate 1 GB of memory. However, it is possible the kernel is no longer able to allocate that memory by the time the process actually requests the allocation. So this information would not be useful.
Instead, you should attempt to allocate memory, and handle ENOMEM.
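A minimal sketch of that pattern, with a made-up CHUNK_SIZE; the point is only that failure is detected at the mmap() call itself (on a 64-bit system), rather than probed for in advance:

#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

/* CHUNK_SIZE is a placeholder; pick whatever granularity your allocator uses. */
#define CHUNK_SIZE (16UL << 30)   /* 16 GB */

static void *alloc_chunk(void)
{
    void *p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        if (errno == ENOMEM) {
            /* address space (or overcommit limit) exhausted: back off,
               free something, or fall back to smaller chunks */
            return NULL;
        }
        perror("mmap");
        return NULL;
    }
    return p;
}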

why is the fd still available after shm_unlink()?

I'm reading the source code at https://wayland-book.com/surfaces/shared-memory.html .
The author creates a shared memory object using shm_open(), shm_unlink()s it immediately, then ftruncate()s the fd to a specific size, mmap()s the fd, and fills the region with pixels.
I'm confused about why the fd is still available after shm_unlink().
According to the man page:
The operation of shm_unlink() is analogous to unlink(2): it removes a shared memory object name, and, once all processes have unmapped the object, de-allocates and destroys the contents of the associated memory region. After a successful shm_unlink(), attempts to shm_open() an object with the same name will fail (unless O_CREAT was specified, in which case a new, distinct object is created).
So shm_unlink() should cause the memory to be destroyed, because no process has mmap()ed
the region yet. But how is the fd still available?
here is the code:
static int
create_shm_file(void)
{
    int retries = 100;
    do {
        char name[] = "/wl_shm-XXXXXX";
        randname(name + sizeof(name) - 7);
        --retries;
        int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        if (fd >= 0) {
            shm_unlink(name); // unlink immediately
            return fd;
        }
    } while (retries > 0 && errno == EEXIST);
    return -1;
}

static int
allocate_shm_file(size_t size)
{
    int fd = create_shm_file();
    if (fd < 0)
        return -1;
    int ret;
    do {
        ret = ftruncate(fd, size); // why is the fd still available?
    } while (ret < 0 && errno == EINTR);
    if (ret < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
// after the above, the fd is mmap()ed:
uint32_t *data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
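This is the same behaviour the man page's unlink(2) analogy points at; a minimal sketch with a regular file (the path is arbitrary, most error checking omitted):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/unlink-demo", O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) { perror("open"); return 1; }

    unlink("/tmp/unlink-demo");          // the name is gone immediately...

    const char msg[] = "still writable"; // ...but the open fd keeps the file alive
    write(fd, msg, sizeof(msg));

    char buf[sizeof(msg)];
    pread(fd, buf, sizeof(buf), 0);
    printf("%s\n", buf);                 // prints "still writable"

    close(fd);                           // only now is the storage released
    return 0;
}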

How to get memory address from shm_open?

I want to share memory, using a file descriptor, with another process created via fork.
The problem is that I get different address regions from mmap.
I want mmap to return the same address value; only then can I be sure that I am really sharing the memory.
It is probably possible to use the MAP_FIXED flag with mmap, but how do I get a memory address from shm_open?
Is it possible to share memory via shm_open at all?
Or must shmget be used instead?
Here is a minimal working example:
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    /* Create a new memory object */
    int fd = shm_open("/dummy", O_RDWR | O_CREAT, 0777);
    if (fd == -1) {
        fprintf(stderr, "Open failed: %m\n");
        return 1;
    }

    /* Set the memory object's size */
    size_t size = 4096; /* minimal */
    if (ftruncate(fd, size) == -1) {
        fprintf(stderr, "ftruncate: %m\n");
        return 1;
    }

    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) return 1;

    void *ptr2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr2 == MAP_FAILED) return 1;

    printf("%p\n%p\n", ptr, ptr2);
    return 0;
}
Compile it with gcc test.c -lrt.
This is the output:
0x7f3247a78000
0x7f3247a70000
EDIT
If I try to use the method described in a comment, the child does not see changes made to the memory by the parent. Why? This is how I do it:
In parent:
shm_data = mmap(NULL, shm_size, PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
...
snprintf(shared_memory, 20, "%p", shm_data);
execl("/path/to/prog", "prog", shared_memory, (char *) NULL);
In child:
...
void *base_addr;
sscanf(argv[1], "%p", (void **)&base_addr);
shm_data = mmap(base_addr, shm_size, PROT_READ, MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED, -1, 0);
...
EDIT2
See also this question: How to get memory address from memfd_create?
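For reference, a minimal sketch of sharing via shm_open() across fork() with MAP_SHARED; it makes no attempt to force a common address, because both processes refer to the same underlying object either way (the object name and size are made up for illustration; compile with -lrt as above):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t size = 4096;
    int fd = shm_open("/demo_shared", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, size);

    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    if (fork() == 0) {
        /* child: the mapping is inherited across fork(), so the pages are
           shared; a process that maps the object itself may get a different
           address, but it still sees the same memory */
        strcpy(p, "hello from child");
        _exit(0);
    }

    wait(NULL);
    printf("parent sees: %s\n", p);      /* prints "hello from child" */

    munmap(p, size);
    shm_unlink("/demo_shared");
    return 0;
}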

how to implement splice_read for a character device file with uncached DMA buffer

I have a character device driver. It includes a 4 MB coherent DMA buffer, which is managed as a ring buffer. I also implemented the splice_read call for the driver to improve performance, but this implementation does not work well. Below is how it is used (a rough user-space sketch follows this list):
(1) splice 16 pages of device buffer data to pipefd[1] (the DMA buffer is managed in page units).
(2) splice pipefd[0] to the socket.
(3) the receiving side (TCP client) receives the data and then checks its correctness.
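A minimal sketch of that user-space flow, assuming the driver's redefined ppos/len convention described in the code comment further down (the device node name and the already-connected socket are placeholders; error handling is trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* sock is assumed to be an already-connected TCP socket;
 * "/dev/rdma_ring" is a placeholder device node name. */
static int send_pages(int sock, int npages)
{
    int pipefd[2];
    int devfd = open("/dev/rdma_ring", O_RDONLY);
    if (devfd < 0 || pipe(pipefd) < 0)
        return -1;

    loff_t pos = 1;  /* driver convention: data pages start at page 1 */

    /* (1) device -> pipe: handled by the driver's splice_read */
    ssize_t n = splice(devfd, &pos, pipefd[1], NULL, npages, SPLICE_F_MOVE);
    if (n < 0) { perror("splice device->pipe"); return -1; }

    /* (2) pipe -> socket */
    ssize_t m = splice(pipefd[0], NULL, sock, NULL, n, SPLICE_F_MOVE);
    if (m < 0) { perror("splice pipe->socket"); return -1; }

    close(pipefd[0]);
    close(pipefd[1]);
    close(devfd);
    return 0;
}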
I found that the TCP client got errors. The splice_read implementation is shown below (I adapted it from the vmsplice implementation):
/* splice related functions */
static void rdma_ring_pipe_buf_release(struct pipe_inode_info *pipe,
                                       struct pipe_buffer *buf)
{
    put_page(buf->page);
    buf->flags &= ~PIPE_BUF_FLAG_LRU;
}

void rdma_ring_spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
{
    put_page(spd->pages[i]);
}

static const struct pipe_buf_operations rdma_ring_page_pipe_buf_ops = {
    .can_merge = 0,
    .map = generic_pipe_buf_map,
    .unmap = generic_pipe_buf_unmap,
    .confirm = generic_pipe_buf_confirm,
    .release = rdma_ring_pipe_buf_release,
    .steal = generic_pipe_buf_steal,
    .get = generic_pipe_buf_get,
};

/* In order to simplify the caller's work, the meanings of the ppos and len
 * parameters have been changed to fit the driver's internal ring buffer.
 * ppos indicates which page is referred to (it should start from 1, as the
 * CSR page is not allowed to be spliced), and len indicates how many pages
 * are needed. We also constrain the maximum page count per splice to 16
 * pages; otherwise EINVAL is returned. If a high-speed device needs a
 * larger page count, it can rework this routine. ppos is also used to
 * return the total number of bytes that should be transferred; the caller
 * can compare it with the return value to determine whether all bytes have
 * been transferred.
 */
static ssize_t do_rdma_ring_splice_read(struct file *in, loff_t *ppos,
                                        struct pipe_inode_info *pipe, size_t len,
                                        unsigned int flags)
{
    struct rdma_ring *priv = to_rdma_ring(in->private_data);
    struct rdma_ring_buf *data_buf;
    struct rdma_ring_dstatus *dsta_buf;
    struct page *pages[PIPE_DEF_BUFFERS];
    struct partial_page partial[PIPE_DEF_BUFFERS];
    ssize_t total_sz = 0, error;
    int i;
    unsigned offset;
    struct splice_pipe_desc spd = {
        .pages = pages,
        .partial = partial,
        .nr_pages_max = PIPE_DEF_BUFFERS,
        .flags = flags,
        .ops = &rdma_ring_page_pipe_buf_ops,
        .spd_release = rdma_ring_spd_release_page,
    };

    /* Init the spd. Currently we omit the packet header; if a control
     * header is needed, it may be implemented by defining a control
     * variable in the device struct. */
    spd.nr_pages = len;
    for (i = 0; i < len; i++) {
        offset = (unsigned)(*ppos) + i;
        data_buf = get_buf(priv, offset);
        dsta_buf = get_dsta_buf(priv, offset);
        pages[i] = virt_to_page(data_buf);
        get_page(pages[i]);
        partial[i].offset = 0;
        partial[i].len = dsta_buf->bytes_xferred;
        total_sz += partial[i].len;
    }

    error = _splice_to_pipe(pipe, &spd);
    /* use ppos to return the theoretical total number of bytes to transfer */
    *ppos = total_sz;
    return error;
}

/* splice read */
static ssize_t rdma_ring_splice_read(struct file *in, loff_t *ppos,
                                     struct pipe_inode_info *pipe, size_t len,
                                     unsigned int flags)
{
    ssize_t ret;

    MY_PRINT("%s: *ppos = %lld, len = %ld\n", __func__, *ppos, (long)len);
    if (unlikely(len > PIPE_DEF_BUFFERS))
        return -EINVAL;
    ret = do_rdma_ring_splice_read(in, ppos, pipe, len, flags);
    return ret;
}
_splice_to_pipe is identical to the kernel's splice_to_pipe; since that function is not an exported symbol, I re-implemented it.
I think the main cause is that some kind of page locking is omitted, but I don't know where or how.
My kernel version is 3.10.

Quickly close mmap discarding unflushed changes

I am using an mmap'ed file as a virtual memory arena; the file is manually allocated because I want to control its location. On munmap, all the current contents of the buffers are flushed to the file, but I don't really need the file contents. Is it possible to simply discard the mmap'ed area without writing it back?
Linux-specific solutions are OK.
I mean something like:
const char *myswaparea = "/tmp/myswaparea";
int64_t len = 1LL << 30;
int fd = open(myswaparea, O_CREAT | O_RDWR, 0600);
ftruncate(fd, len);
void *arena = mmap(NULL, len, .... fd ...);
/* use arena */
munmap(arena, len); /* here comes an unnecessary flush */
close(fd);
unlink(myswaparea);
If you don't need / want to write back the changes to the file, just use the MAP_PRIVATE flag when you create the map (4th argument to mmap(2)).
From the manpage:
MAP_PRIVATE
    Create a private copy-on-write mapping. Updates to the mapping are
    not visible to other processes mapping the same file, and are not
    carried through to the underlying file. It is unspecified whether
    changes made to the file after the mmap() call are visible in the
    mapped region.
EXAMPLE

int fd = open("myfile", O_RDWR);
if (fd < 0) {
    /* Handle error... */
}

size_t len = 1024;
void *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
if (ptr == MAP_FAILED) {
    /* Handle error... */
}

/* ... */

if (munmap(ptr, len) < 0) {
    /* Handle error... */
}
