What is the purpose of MAP_ANONYMOUS flag in mmap system call? - linux

From the man page,
MAP_ANONYMOUS
The mapping is not backed by any file; its contents are initialized to zero. The fd and offset arguments are ignored; however, some implementations require
fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this. The use of MAP_ANONYMOUS in conjunction with
MAP_SHARED is only supported on Linux since kernel 2.4.
What is the purpose of using MAP_ANONYMOUS? Any example would be good. Also From where the memory will be mapped?
It is written on man page that The use of MAP_ANONYMOUS in conjunction with MAP_SHARED is only supported on Linux since kernel 2.4.
How can i share the memory mapped with MAP_ANONYMOUS with other process?

Anonymous mappings can be pictured as a zeroized virtual file.
Anonymous mappings are simply large, zero-filled blocks of memory ready for use.
These mappings reside outside of the heap, thus do not contribute to data segment fragmentation.
MAP_ANONYMOUS + MAP_PRIVATE:
every call creates a distinct mapping
children inherit parent's mappings
childrens' writes on the inherited mapping are catered in copy-on-write manner
the main purpose of using this kind of mapping is to allocate a new zeroized memory
malloc employs anonymous private mappings to serve memory allocation requests larger than MMAP_THRESHOLD bytes.
typically, MMAP_THRESHOLD is 128kB.
MAP_ANONYMOUS + MAP_SHARED:
each call creates a distinct mapping that doesn't share pages with any other mapping
children inherit parent's mappings
no copy-on-write when someone else sharing the mapping writes on the shared mapping
shared anonymous mappings allow IPC in a manner similar to System V memory segments, but only between related processes
On Linux, there are two ways to create anonymous mappings:
specify MAP_ANONYMOUS flag and pass -1 for fd
addr = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED)
exit(EXIT_FAILURE);
open /dev/zero and pass this opened fd
fd = open("/dev/zero", O_RDWR);
addr = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
(this method is typically used on systems like BSD, that do not have MAP_ANONYMOUS flag)
Advantages of anonymous mappings:
- no virtual address space fragmentation; after unmapping, the memory is immediately returned to the system
- they are modifiable in terms of allocation size, permissions and they can also receive advice just like normal mappings
- each allocation is a distinct mapping, separate from global heap
Disadvantages of anonymous mappings:
- size of each mapping is an integer multiple of system's page size, thus it can lead to wastage of address space
- creating and returning mappings incur more overhead than that of from the pre-allocated heap
if a program containing such mapping, forks a process, the child inherits the mapping.
The following program demonstrates this kinda inheritance:
#ifdef USE_MAP_ANON
#define _BSD_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
/*Pointer to shared memory region*/
int *addr;
#ifdef USE_MAP_ANON /*Use MAP_ANONYMOUS*/
addr = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
exit(EXIT_FAILURE);
}
#else /*Map /dev/zero*/
int fd;
fd = open("/dev/zero", O_RDWR);
if (fd == -1) {
fprintf(stderr, "open() failed\n");
exit(EXIT_FAILURE);
}
addr = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
exit(EXIT_FAILURE);
}
if (close(fd) == -1) { /*No longer needed*/
fprintf(stderr, "close() failed\n");
exit(EXIT_FAILURE);
}
#endif
*addr = 1; /*Initialize integer in mapped region*/
switch(fork()) { /*Parent and child share mapping*/
case -1:
fprintf(stderr, "fork() failed\n");
exit(EXIT_FAILURE);
case 0: /*Child: increment shared integer and exit*/
printf("Child started, value = %d\n", *addr);
(*addr)++;
if (munmap(addr, sizeof(int)) == -1) {
fprintf(stderr, "munmap()() failed\n");
exit(EXIT_FAILURE);
}
exit(EXIT_SUCCESS);
default: /*Parent: wait for child to terminate*/
if (wait(NULL) == -1) {
fprintf(stderr, "wait() failed\n");
exit(EXIT_FAILURE);
}
printf("In parent, value = %d\n", *addr);
if (munmap(addr, sizeof(int)) == -1) {
fprintf(stderr, "munmap()() failed\n");
exit(EXIT_FAILURE);
}
exit(EXIT_SUCCESS);
}
Sources:
The Linux Programming Interface
Chapter 49: Memory Mappings,
Author: Michael Kerrisk
Linux System Programming (3rd edition)
Chapter 8: Memory Management,
Author: Robert Love

Related

Is it possible to update the flags of an existing memory mapping?

I am writing a small program in x64 assembly that will spawn children, all sharing their memory mappings so they can modify each other's code. For this, since the argument CLONE_VM of sys_clone seems to place the program into undefined behaviour, I plan to use mmap's MAP_SHARED argument.
However, I would also need the children to modify the code of the father. One option is to also allocate a MAP_SHARED mapping and give it to the father, but I'd like to avoid this if possible (only for elegance reasons).
Since the base mapping (the 0x400000 one on 64-bits Linux) of the program will not have the MAP_SHARED flag by default, is it possible to update it using a syscall to set this flag? munmap then mmap will not do and cause a SIGSEGV, and mprotect can only change the RWX permissions.
You can't change whether an existing mapping is private or shared, but you can map a new shared mapping over an existing private mapping. You can even do so in C, like this:
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main(void) {
FILE *stream = fopen("/proc/self/maps", "rb");
if(!stream) {
perror("fopen");
return 1;
}
char *text_start, *text_end;
do {
if(fscanf(stream, " %p-%p%*[^\n]", &text_start, &text_end) != 2) {
perror("scanf");
return 1;
}
} while(!(text_start <= main && main < text_end));
if(fclose(stream)) {
perror("fclose");
return 1;
}
size_t text_len = text_end - text_start;
char *mem = mmap(NULL, text_len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if(mem == MAP_FAILED) {
perror("mmap");
return 1;
}
memcpy(mem, text_start, text_len);
__builtin___clear_cache(mem, mem + text_len);
if(mremap(mem, text_len, text_len, MREMAP_MAYMOVE|MREMAP_FIXED, text_start) == MAP_FAILED) {
perror("mremap");
return 1;
}
/* you can check /proc/PID/maps now to see the new mapping */
getchar();
}
As a bonus, this program supports ASLR and doesn't require that the text section start at 0x400000.

How to get memory address from shm_open?

I want to share memory using a file descriptor with another process created via fork.
The problem is that I get different address regions from mmap.
I want that mmap returns the same address value. Only in such case I can be sure that I really share the memory.
Probably it is possible to use MAP_FIXED flag to mmap, but how to get memory address from shm_open?
Is it possible to share memory via shm_open at all?
Maybe shmget must be used instead?
This is the minimal working example:
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
int main(void)
{
/* Create a new memory object */
int fd = shm_open( "/dummy", O_RDWR | O_CREAT, 0777 );
if(fd == -1) {
fprintf(stderr, "Open failed:%m\n");
return 1;
}
/* Set the memory object's size */
size_t size = 4096; /* minimal */
if (ftruncate(fd, size) == -1) {
fprintf(stderr, "ftruncate: %m\n");
return 1;
}
void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) return 1;
void *ptr2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr2 == MAP_FAILED) return 1;
printf("%p\n%p\n", ptr, ptr2);
return 0;
}
Compile it with gcc test.c -lrt.
This is the output:
0x7f3247a78000
0x7f3247a70000
EDIT
If I try to use method described in comment, child does not see changes in memory made by parent. Why? This is how I do it:
In parent:
shm_data = mmap(NULL, shm_size, PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
...
snprintf(shared_memory, 20, "%p", shm_data);
execl("/path/to/prog", "prog", shared_memory, (char *) NULL);
In child:
...
void *base_addr;
sscanf(argv[1], "%p", (void **)&base_addr);
shm_data = mmap(base_addr, shm_size, PROT_READ, MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED, -1, 0);
...
EDIT2
See also this question: How to get memory address from memfd_create?

resizing file size with ftruncate() after mmap()

The code snippet works fine on my machine(Linux/x86-64)
int main()
{
char* addr;
int rc;
int fd;
const size_t PAGE_SIZE = 4096; // assuming the page size is 4096
char buf[PAGE_SIZE];
memset(buf, 'x', sizeof(buf));
// error checking is ignored, for demonstration purpose
fd = open("abc", O_RDWR | O_CREAT, S_IWUSR | S_IRUSR);
ftruncate(fd, 0);
write(fd, buf, 4090);
// the file size is less than one page, but we allocate 2 page address space
addr = mmap(NULL, PAGE_SIZE * 2, PROT_WRITE, MAP_SHARED, fd, 0);
// it would crash if we read/write from addr[4096]
// extend the size after mmap
ftruncate(fd, PAGE_SIZE * 2);
// now we can access(read/write) addr[4096]...addr[4096*2 -1]
munmap(addr, PAGE_SIZE * 2);
close(fd);
exit(EXIT_SUCCESS);
}
But POSIX says:
If the size of the mapped file changes after the call to mmap() as a result of some other operation on the mapped file, the effect of references to portions of the mapped region that correspond to added or removed portions of the file is unspecified.
So I guess this is not a portable way. But is it guaranteed to work on Linux?

Quickly close mmap discarding unflushed changes

I am using an mmap'ed file as a virtual memory arena - the file is manually allocated because I want to control its location. On munmap, all the current contents of the buffers are flushed to the file, but I don't really need the file contents. Is it possible to simply discard the mmap area without write back?
Linux-specific solutions are OK.
I mean something like
char* myswaparea = "/tmp/myswaparea";
int64_t len = 1LL << 30;
fd = open(myswaparea, O_CREAT|O_RDWR, 0600);
ftruncate(fd, len);
void* arena = mmap(NULL, len, .... fd ...);
/* use arena */
munmap(arena, len); /* here comes an unnecessary flush */
close(fd);
unlink(myswaparea);
If you don't need / want to write back the changes to the file, just use the MAP_PRIVATE flag when you create the map (4th argument to mmap(2)).
From the manpage:
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the
mapping are not visible to other processes mapping the same file, and
are not carried through to the underlying file. It is unspecified
whether changes made to the file after the mmap() call are visible in
the mapped region.
EXAMPLE
fd = open("myfile", O_RDWR);
if (fd < 0) {
/* Handle error... */
}
void *ptr;
size_t len = 1024;
ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
if (ptr == MAP_FAILED) {
/* Handle error... */
}
/* ... */
if (munmap(ptr, len) < 0) {
/* Handle error... */
}

Shared memory across processes on Linux/x86_64

I have a few questions on using shared memory with processes. I looked at several previous posts and couldn't glean the answers precisely enough. Thanks in advance for your help.
I'm using shm_open + mmap like below. This code works as intended with parent and child alternating to increment g_shared->count (the synchronization is not portable; it works only for certain memory models, but good enough for my case for now). However, when I change MAP_SHARED to MAP_ANONYMOUS | MAP_SHARED, the memory isn't shared and the program hangs since the 'flag' doesn't get flipped. Removing the flag confirms what's happening with each process counting from 0 to 10 (implying that each has its own copy of the structure and hence the 'count' field). Is this the expected behavior? I don't want the memory to be backed by a file; I really want to emulate what might happen if these were threads instead of processes (they need to be processes for other reasons).
Do I really need shm_open? Since the processes belong to the same hierarchy, can I just use mmap alone instead? I understand this would be fairly straightforward if there wasn't an 'exec,' but how do I get it to work when there is an 'exec' following the 'fork?'
I'm using kernel version 3.2.0-23 on x86_64 (Intel i7-2600). For this implementation, does mmap give the same behavior (correctness as well as performance) as shared memory with pthreads sharing the same global object? For example, does the MMU map the segment with 'cacheable' MTRR/TLB attributes?
Is the cleanup_shared() code correct? Is it leaking any memory? How could I check? For example, is there an equivalent of System V's 'ipcs?'
thanks,
/Doobs
shmem.h:
#ifndef __SHMEM_H__
#define __SHMEM_H__
//includes
#define LEN 1000
#define ITERS 10
#define SHM_FNAME "/myshm"
typedef struct shmem_obj {
int count;
char buff[LEN];
volatile int flag;
} shmem_t;
extern shmem_t* g_shared;
extern char proc_name[100];
extern int fd;
void cleanup_shared() {
munmap(g_shared, sizeof(shmem_t));
close(fd);
shm_unlink(SHM_FNAME);
}
static inline
void init_shared() {
int oflag;
if (!strcmp(proc_name, "parent")) {
oflag = O_CREAT | O_RDWR;
} else {
oflag = O_RDWR;
}
fd = shm_open(SHM_FNAME, oflag, (S_IREAD | S_IWRITE));
if (fd == -1) {
perror("shm_open");
exit(EXIT_FAILURE);
}
if (ftruncate(fd, sizeof(shmem_t)) == -1) {
perror("ftruncate");
shm_unlink(SHM_FNAME);
exit(EXIT_FAILURE);
}
g_shared = mmap(NULL, sizeof(shmem_t),
(PROT_WRITE | PROT_READ),
MAP_SHARED, fd, 0);
if (g_shared == MAP_FAILED) {
perror("mmap");
cleanup_shared();
exit(EXIT_FAILURE);
}
}
static inline
void proc_write(const char* s) {
fprintf(stderr, "[%s] %s\n", proc_name, s);
}
#endif // __SHMEM_H__
shmem1.c (parent process):
#include "shmem.h"
int fd;
shmem_t* g_shared;
char proc_name[100];
void work() {
int i;
for (i = 0; i &lt ITERS; ++i) {
while (g_shared->flag);
++g_shared->count;
sprintf(g_shared->buff, "%s: %d", proc_name, g_shared->count);
proc_write(g_shared->buff);
g_shared->flag = !g_shared->flag;
}
}
int main(int argc, char* argv[], char* envp[]) {
int status, child;
strcpy(proc_name, "parent");
init_shared(argv);
fprintf(stderr, "Map address is: %p\n", g_shared);
if (child = fork()) {
work();
waitpid(child, &status, 0);
cleanup_shared();
fprintf(stderr, "Parent finished!\n");
} else { /* child executes shmem2 */
execvpe("./shmem2", argv + 2, envp);
}
}
shmem2.c (child process):
#include "shmem.h"
int fd;
shmem_t* g_shared;
char proc_name[100];
void work() {
int i;
for (i = 0; i &lt ITERS; ++i) {
while (!g_shared->flag);
++g_shared->count;
sprintf(g_shared->buff, "%s: %d", proc_name, g_shared->count);
proc_write(g_shared->buff);
g_shared->flag = !g_shared->flag;
}
}
int main(int argc, char* argv[], char* envp[]) {
int status;
strcpy(proc_name, "child");
init_shared(argv);
fprintf(stderr, "Map address is: %p\n", g_shared);
work();
cleanup_shared();
return 0;
}
Passing MAP_ANONYMOUS causes the kernel to ignore your file descriptor argument and give you a private mapping instead. That's not what you want.
Yes, you can create an anonymous shared mapping in a parent process, fork, and have the child process inherit the mapping, sharing the memory with the parent and any other children. That obvoiusly doesn't survive an exec() though.
I don't understand this question; pthreads doesn't allocate memory. The cacheability will depend on the file descriptor you mapped. If it's a disk file or anonymous mapping, then it's cacheable memory. If it's a video framebuffer device, it's probably not.
That's the right way to call munmap(), but I didn't verify the logic beyond that. All processes need to unmap, only one should call unlink.
2b) as a middle-ground of a sort, it is possible to call:
int const shm_fd = shm_open(fn,...);
shm_unlink(fn);
in a parent process, and then pass fd to a child process created by fork()/execve() via argp or envp. since open file descriptors of this type will survive the fork()/execve(), you can mmap the fd in both the parent process and any dervied processes. here's a more complete code example copied and simplified/sanitized from code i ran successfully under Ubuntu 12.04 / linux kernel 3.13 / glibc 2.15:
int create_shm_fd( void ) {
int oflags = O_RDWR | O_CREAT | O_TRUNC;
string const fn = "/some_shm_fn_maybe_with_pid";
int fd;
neg_one_fail( fd = shm_open( fn.c_str(), oflags, S_IRUSR | S_IWUSR ), "shm_open" );
if( fd == -1 ) { rt_err( strprintf( "shm_open() failed with errno=%s", str(errno).c_str() ) ); }
// for now, we'll just pass the open fd to our child process, so
// we don't need the file/name/link anymore, and by unlinking it
// here we can try to minimize the chance / amount of OS-level shm
// leakage.
neg_one_fail( shm_unlink( fn.c_str() ), "shm_unlink" );
// by default, the fd returned from shm_open() has FD_CLOEXEC
// set. it seems okay to remove it so that it will stay open
// across execve.
int fd_flags = 0;
neg_one_fail( fd_flags = fcntl( fd, F_GETFD ), "fcntl" );
fd_flags &= ~FD_CLOEXEC;
neg_one_fail( fcntl( fd, F_SETFD, fd_flags ), "fcntl" );
// resize the shm segment for later mapping via mmap()
neg_one_fail( ftruncate( fd, 1024*1024*4 ), "ftruncate" );
return fd;
}
it's not 100% clear to me if it's okay spec-wise to remove the FD_CLOEXEC and/or assume that after doing so the fd really will survive the exec. the man page for exec is unclear; it says: "POSIX shared memory regions are unmapped", but to me that's redundant with the general comments earlier that mapping are not preserved, and doesn't say that shm_open()'d fd will be closed. any of course there's the fact that, as i mentioned, the code does seem to work in at least one case.
the reason i might use this approach is that it would seem to reduce the chance of leaking the shared memory segment / filename, and it makes it clear that i don't need persistence of the memory segment.

Resources