Creating a process in Linux with a different mount namespace

I'm trying to create a process that has a different mount namespace from its parent.
For that, I use the following code:
static int childFunc(void *arg)
{
    if (mount("/", "/myfs", "sysfs", 0, NULL) == -1)
        errExit("mount");

    printf("Starting new bash. Child PID is %d\n", getpid());

    execl("/bin/bash", "/bin/bash", (char *) NULL);

    printf("Shouldn't arrive here.\n");
    return 0;               /* Child terminates now */
}
#define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

int main(int argc, char *argv[])
{
    char *stack;                    /* Start of stack buffer */
    char *stackTop;                 /* End of stack buffer */
    pid_t pid;

    /* Allocate stack for child */
    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        errExit("malloc");
    stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

    /* Create child that has its own mount namespace */
    pid = clone(childFunc, stackTop, CLONE_NEWNS | SIGCHLD, argv[1]);
    if (pid == -1)
        errExit("clone");
    printf("clone() returned %ld\n", (long) pid);

    sleep(1);

    if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
        errExit("waitpid");
    printf("child has terminated\n");
    exit(EXIT_SUCCESS);
}
When running it, I do get a bash shell, running in a different mount namespace.
To verify it, I execute sudo ls -l /proc/<child_pid>/ns from another shell, and I indeed see that the child process has a different mount namespace from the rest of the processes in the system.
However, if I execute mount in both of the shells, I get the same output, and the line myfs on /myfs type sysfs (rw,relatime) appears in both of them.
What is the explanation for that?

You need to mark the existing mounts as "private" before creating the new namespace. On systemd-based distributions the root mount is marked shared by default, so mount and unmount events propagate between namespaces even after CLONE_NEWNS:
mount --make-rprivate /
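
The same fix can be applied from C; a minimal sketch, run inside the child right after clone() and before mounting sysfs (MS_REC | MS_PRIVATE is the programmatic equivalent of mount --make-rprivate /, and errExit is the helper from the question):

    /* Recursively mark every existing mount private, so mounts made in
       the new namespace no longer propagate back to the parent. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        errExit("mount --make-rprivate");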

Related

How to invoke an executable in the path /usr/bin using a C++ program?

I have a GUI-based executable in the path /usr/bin on a Linux machine.
This executable takes three arguments: two integer values and one char.
Can you let me know how to invoke and run this executable from a user-space C++ program?
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

pid_t runprocess(int arg1, int arg2, char arg3)
{
    static const char program[] = "/usr/bin/...";
    char arg1c[12];
    char arg2c[12];
    char arg3c[2];

    sprintf(arg1c, "%d", arg1);
    sprintf(arg2c, "%d", arg2);
    arg3c[0] = arg3;
    arg3c[1] = 0;

    pid_t pid = fork();  /* fork, not vfork: we do more than just exec below */
    if (pid == 0) {
        signal(SIGHUP, SIG_IGN);  /* since it's a GUI program, detach from console HUP */
        close(0);                 /* and detach from stdin */
        if (open("/dev/null", O_RDWR)) _exit(137);  /* must land on fd 0 */
        execl(program, program, arg1c, arg2c, arg3c, (char *) NULL);
        _exit(errno);
    }
    return pid;
}
Build the arguments as strings, then fork and exec. Trivial, really; just don't forget to wait() for the child eventually.
Since the child is a GUI process, we ignore HUP from the terminal we may or may not be running on and replace stdin with /dev/null.
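
A hypothetical call site (the argument values here are placeholders); reaping the child with waitpid() avoids leaving a zombie behind:

    #include <sys/wait.h>

    int status;
    pid_t pid = runprocess(10, 20, 'x');   /* spawn the GUI program */
    if (pid > 0)
        waitpid(pid, &status, 0);          /* reap the child when it exits */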

How to set the LD_PRELOAD environment variable for a ptrace child

I'm trying to load a preload library into a ptrace'd child process using environment variables, but somehow I get an error when creating the child process:
int main(int argc, char **argv)
{
    char *env[] = {"LD_PRELOAD=/<path-to-the-preload-library>/preload.so"};
    pid_t pid = fork();
    switch (pid) {
    case -1:    /* error */
        log_fatal("%s. pid -1", strerror(errno));
        break;
    case 0:     /* child, executing the tracee */
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        execve(argv[1], argv + 1, env);  // Fails to launch the ptrace child!
        //execvp(argv[1], argv + 1);     // It works fine!
        log_fatal("%s. child", strerror(errno));
    }
    waitpid(pid, 0, 0);  // sync with PTRACE_TRACEME
    ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
The simple preloaded library code:
$ cat preload.c
#include <stdio.h>

static void _init() __attribute__((constructor));

void _init() {
    printf("I'm a constructor\n");
}
Any idea why it fails?
It'd be nice if you told us what the error message was, but I think I can guess: "Bad address"?
The env vector passed to execve needs to be terminated with a NULL pointer, just like the argv vector. So you want
char *env[] = {"LD_PRELOAD=/<path-to-the-preload-library>/preload.so", NULL};
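
Incidentally, this also explains why the execvp variant works: it passes the parent's environment through unchanged. If you don't actually need to replace the whole environment, an alternative sketch (keeping the question's placeholder path) is to extend the inherited environment in the child instead:

    /* Add LD_PRELOAD to the inherited environment, then let execvp
       pass environ through as usual. */
    setenv("LD_PRELOAD", "/<path-to-the-preload-library>/preload.so", 1);
    execvp(argv[1], argv + 1);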

Linux process data structure

According to the GNU website, the Linux shell uses the following data structure for a process:
typedef struct process
{
    struct process *next;   /* next process in pipeline */
    char **argv;            /* for exec */
    pid_t pid;              /* process ID */
    char completed;         /* true if process has completed */
    char stopped;           /* true if process has stopped */
    int status;             /* reported status value */
} process;
Why can't the shell use the task_struct data structure for a process when it is already present in the kernel? Why use a separate data structure?

Shared memory across processes on Linux/x86_64

I have a few questions on using shared memory with processes. I looked at several previous posts and couldn't glean the answers precisely enough. Thanks in advance for your help.
1. I'm using shm_open + mmap as below. This code works as intended, with parent and child alternating to increment g_shared->count (the synchronization is not portable; it works only for certain memory models, but is good enough for my case for now). However, when I change MAP_SHARED to MAP_ANONYMOUS | MAP_SHARED, the memory isn't shared and the program hangs, since the 'flag' never gets flipped. Removing the flag check confirms what's happening: each process counts from 0 to 10, implying that each has its own copy of the structure and hence its own 'count' field. Is this the expected behavior? I don't want the memory to be backed by a file; I really want to emulate what would happen if these were threads instead of processes (they need to be processes for other reasons).
2. Do I really need shm_open? Since the processes belong to the same hierarchy, can I just use mmap alone instead? I understand this would be fairly straightforward if there weren't an 'exec', but how do I get it to work when there is an 'exec' following the 'fork'?
3. I'm using kernel version 3.2.0-23 on x86_64 (Intel i7-2600). For this implementation, does mmap give the same behavior (correctness as well as performance) as shared memory with pthreads sharing the same global object? For example, does the MMU map the segment with 'cacheable' MTRR/TLB attributes?
4. Is the cleanup_shared() code correct? Is it leaking any memory? How could I check? For example, is there an equivalent of System V's 'ipcs'?
shmem.h:
#ifndef __SHMEM_H__
#define __SHMEM_H__

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define LEN 1000
#define ITERS 10
#define SHM_FNAME "/myshm"

typedef struct shmem_obj {
    int count;
    char buff[LEN];
    volatile int flag;
} shmem_t;

extern shmem_t* g_shared;
extern char proc_name[100];
extern int fd;

static inline
void cleanup_shared() {
    munmap(g_shared, sizeof(shmem_t));
    close(fd);
    shm_unlink(SHM_FNAME);
}

static inline
void init_shared() {
    int oflag;
    if (!strcmp(proc_name, "parent")) {
        oflag = O_CREAT | O_RDWR;
    } else {
        oflag = O_RDWR;
    }
    fd = shm_open(SHM_FNAME, oflag, (S_IRUSR | S_IWUSR));
    if (fd == -1) {
        perror("shm_open");
        exit(EXIT_FAILURE);
    }
    if (ftruncate(fd, sizeof(shmem_t)) == -1) {
        perror("ftruncate");
        shm_unlink(SHM_FNAME);
        exit(EXIT_FAILURE);
    }
    g_shared = mmap(NULL, sizeof(shmem_t),
                    (PROT_WRITE | PROT_READ),
                    MAP_SHARED, fd, 0);
    if (g_shared == MAP_FAILED) {
        perror("mmap");
        cleanup_shared();
        exit(EXIT_FAILURE);
    }
}

static inline
void proc_write(const char* s) {
    fprintf(stderr, "[%s] %s\n", proc_name, s);
}

#endif // __SHMEM_H__
shmem1.c (parent process):
#include "shmem.h"
int fd;
shmem_t* g_shared;
char proc_name[100];
void work() {
int i;
for (i = 0; i &lt ITERS; ++i) {
while (g_shared->flag);
++g_shared->count;
sprintf(g_shared->buff, "%s: %d", proc_name, g_shared->count);
proc_write(g_shared->buff);
g_shared->flag = !g_shared->flag;
}
}
int main(int argc, char* argv[], char* envp[]) {
int status, child;
strcpy(proc_name, "parent");
init_shared(argv);
fprintf(stderr, "Map address is: %p\n", g_shared);
if (child = fork()) {
work();
waitpid(child, &status, 0);
cleanup_shared();
fprintf(stderr, "Parent finished!\n");
} else { /* child executes shmem2 */
execvpe("./shmem2", argv + 2, envp);
}
}
shmem2.c (child process):
#include "shmem.h"
int fd;
shmem_t* g_shared;
char proc_name[100];
void work() {
int i;
for (i = 0; i &lt ITERS; ++i) {
while (!g_shared->flag);
++g_shared->count;
sprintf(g_shared->buff, "%s: %d", proc_name, g_shared->count);
proc_write(g_shared->buff);
g_shared->flag = !g_shared->flag;
}
}
int main(int argc, char* argv[], char* envp[]) {
int status;
strcpy(proc_name, "child");
init_shared(argv);
fprintf(stderr, "Map address is: %p\n", g_shared);
work();
cleanup_shared();
return 0;
}
1. Passing MAP_ANONYMOUS causes the kernel to ignore your file descriptor argument and give you an anonymous mapping instead, not one backed by the shm object; since each exec'd process then creates its own independent mapping, nothing is shared between them. That's not what you want.
2. Yes, you can create an anonymous shared mapping in a parent process, fork, and have the child process inherit the mapping, sharing the memory with the parent and any other children (see the sketch after this list). That obviously doesn't survive an exec(), though.
3. I don't understand this question; pthreads doesn't allocate memory. The cacheability will depend on the file descriptor you mapped. If it's a disk file or an anonymous mapping, then it's cacheable memory. If it's a video framebuffer device, it's probably not.
4. That's the right way to call munmap(), but I didn't verify the logic beyond that. All processes need to unmap; only one should call shm_unlink().
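
A minimal sketch of point 2, with error handling trimmed for brevity:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* Anonymous shared mapping: no fd and no filesystem name involved. */
        int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        *shared = 0;
        if (fork() == 0) {          /* the child inherits the same pages */
            *shared = 42;
            _exit(0);
        }
        wait(NULL);
        printf("parent sees %d\n", *shared);   /* prints 42: memory is shared */
        munmap(shared, sizeof *shared);
        return 0;
    }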
2b) As a middle ground of sorts, it is possible to call:
    int const shm_fd = shm_open(fn, ...);
    shm_unlink(fn);
in a parent process, and then pass the fd to a child process created by fork()/execve() via argv or envp. Since open file descriptors of this type survive the fork()/execve(), you can mmap the fd in both the parent process and any derived processes. Here's a more complete code example, copied and simplified/sanitized from code I ran successfully under Ubuntu 12.04 / Linux kernel 3.13 / glibc 2.15:
int create_shm_fd( void ) {
    int oflags = O_RDWR | O_CREAT | O_TRUNC;
    string const fn = "/some_shm_fn_maybe_with_pid";
    int fd;
    neg_one_fail( fd = shm_open( fn.c_str(), oflags, S_IRUSR | S_IWUSR ), "shm_open" );
    if( fd == -1 ) { rt_err( strprintf( "shm_open() failed with errno=%s", str(errno).c_str() ) ); }
    // for now, we'll just pass the open fd to our child process, so
    // we don't need the file/name/link anymore, and by unlinking it
    // here we can try to minimize the chance / amount of OS-level shm
    // leakage.
    neg_one_fail( shm_unlink( fn.c_str() ), "shm_unlink" );
    // by default, the fd returned from shm_open() has FD_CLOEXEC
    // set. it seems okay to remove it so that it will stay open
    // across execve.
    int fd_flags = 0;
    neg_one_fail( fd_flags = fcntl( fd, F_GETFD ), "fcntl" );
    fd_flags &= ~FD_CLOEXEC;
    neg_one_fail( fcntl( fd, F_SETFD, fd_flags ), "fcntl" );
    // resize the shm segment for later mapping via mmap()
    neg_one_fail( ftruncate( fd, 1024*1024*4 ), "ftruncate" );
    return fd;
}
It's not 100% clear to me whether it's okay spec-wise to remove FD_CLOEXEC and/or to assume that, after doing so, the fd really will survive the exec. The man page for exec is unclear: it says "POSIX shared memory regions are unmapped", but to me that's redundant with the earlier general statement that mappings are not preserved, and it doesn't say that a shm_open()'d fd will be closed. And of course there's the fact that, as I mentioned, the code does seem to work in at least one case.
The reason I might use this approach is that it seems to reduce the chance of leaking the shared memory segment / filename, and it makes it clear that I don't need persistence of the memory segment.
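
To illustrate the hand-off, here is a hypothetical sketch of both sides in one self-re-exec'ing program. It assumes a create_shm_fd() along the lines of the answer above (with FD_CLOEXEC already cleared); the segment size and the idea of passing the fd number in argv[1] are illustrative choices, not part of the original code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SHM_SIZE (1024*1024*4)

    int create_shm_fd(void);    /* assumed: as in the answer above */

    int main(int argc, char *argv[]) {
        int fd;
        if (argc > 1) {          /* child: the fd number arrives in argv[1] */
            fd = atoi(argv[1]);
        } else {                 /* parent: create the segment, re-exec self */
            char fdstr[12];
            fd = create_shm_fd();
            snprintf(fdstr, sizeof fdstr, "%d", fd);
            if (fork() == 0)
                execl(argv[0], argv[0], fdstr, (char *) NULL);
        }
        /* both sides now map the same segment through the inherited fd */
        void *p = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        /* ... use p ... */
        return 0;
    }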

Can I set the process group of an existing process?

I have a bunch of mini-server processes running. They're in the same process group as a FastCGI server I need to stop. The FastCGI server will kill everything in its process group, but I need those mini-servers to keep running.
Can I change the process group of a running, non-child process (they're children of PID 1)? setpgid() fails with "No such process" even though I'm positive it's there.
This is on Fedora Core 10.
NOTE: the processes are already running. New servers call setsid(); these are servers spawned by older code which did not.
One thing you could try is to do setsid() in the miniservers. That will make them session and process group leaders.
Also, keep in mind that you can't change the process group id to one from another session, and that you have to do the call to change the process group either from within the process that you want to change the group of, or from the parent of the process.
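
As a minimal sketch, the suggestion amounts to running this early in each miniserver (note that setsid() fails with EPERM if the caller is already a process group leader):

    /* Become session (and thus process group) leader, detaching from
       the spawning server's group. */
    if (setsid() == (pid_t) -1)
        perror("setsid");   /* EPERM: already a process group leader */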
I recently wrote some test code that periodically changes the process group of a set of processes, for a very similar task. (You don't need to change the group id periodically; I just thought I might evade a certain script that checked for any group running longer than a certain amount of time.) It may also help you track down the error you get with setpgid():
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

void err(const char *msg);
void prn(const char *msg);
void mydaemon();

int main(int argc, char *argv[]) {
    mydaemon();
    if (setsid() < 0)
        err("setsid");

    int secs = 5*60;

    /* creating a pipe for the group leader to send changed
       group ids to the child */
    int pidx[2];
    if (pipe(pidx))
        err("pipe");
    fcntl(pidx[0], F_SETFL, O_NONBLOCK);
    fcntl(pidx[1], F_SETFL, O_NONBLOCK);

    prn("begin");

    /* here the child forks; it's a stand-in for the set of
       processes that need to have their group ids changed */
    int child = fork();
    switch (child) {
    case -1: err("fork3");
    case 0:
        close(pidx[1]);
        while (1) {
            sleep(7);
            secs -= 7;
            if (secs <= 0) { prn("end child"); exit(0); }
            int pid;
            /* read new pid if available */
            if (read(pidx[0], &pid, sizeof pid) != sizeof pid) continue;
            /* set new process group id */
            if (setpgid(getpid(), pid)) err("setpgid2");
            prn("child group changed");
        }
    default: break;
    }
    close(pidx[0]);

    /* here the group leader is forked every 20 seconds so that
       a new process group can be sent to the child via the pipe */
    while (1) {
        sleep(20);
        secs -= 20;
        int pid = fork();
        switch (pid) {
        case -1: err("fork2");
        case 0:
            pid = getpid();
            /* set process group leader for this process */
            if (setpgid(pid, pid)) err("setpgid1");
            /* inform child of change */
            if (write(pidx[1], &pid, sizeof pid) != sizeof pid) err("write");
            prn("group leader changed");
            break;
        default:
            close(pidx[1]);
            _exit(0);
        }
        if (secs <= 0) { prn("end leader"); exit(0); }
    }
}

void prn(const char *msg) {
    char buf[256];
    strcpy(buf, msg);
    strcat(buf, "\n");
    write(2, buf, strlen(buf));
}

void err(const char *msg) {
    char buf[256];
    strcpy(buf, msg);
    strcat(buf, ": ");
    strcat(buf, strerror(errno));
    prn(buf);
    exit(1);
}

void mydaemon() {
    int pid = fork();
    switch (pid) {
    case -1: err("fork");
    case 0: break;
    default: _exit(0);
    }
    close(0);
    close(1);
    /* close(2); let's keep stderr */
}
After some research I figured it out. Inshalla got the essential problem: "you can't change the process group id to one from another session", which explains why my setpgid() was failing (with a misleading message). However, it seems you can change it from another process in the group (not necessarily the parent).
These processes were started by a FastCGI server, and that FastCGI server was still running in the same process group; hence the problem: I couldn't restart the FastCGI server without killing the servers it had spawned. I wrote a new CGI program which did a setpgid() on the running servers, executed it through a web request, and problem solved!
It sounds like you actually want to daemonise the processes rather than move process groups. (Note: one can move process groups, but I believe you need to be in the same session and the target process group needs to already exist.)
But first, see if daemonising works for you:
#include <unistd.h>
#include <stdio.h>

int main() {
    if (fork() == 0) {
        setsid();                   /* become session leader, detach */
        if (fork() == 0) {          /* grandchild: no longer a leader */
            printf("I'm still running! pid:%d\n", getpid());
            sleep(10);
        }
        _exit(0);
    }
    return 0;
}
Obviously you should actually check for errors and such in real code, but the above should work.
The inner process will continue running even when the main process exits. Looking at the status of the inner process from /proc we find that it is, indeed, a child of init:
Name: a.out
State: S (sleeping)
Tgid: 21513
Pid: 21513
PPid: 1
TracerPid: 0