Return code when OS kills your process - linux

I wanted to test whether, with multiple processes, I can use more than 4GB of RAM on a 32-bit OS (mine: Ubuntu with 1GB of RAM).
So I wrote a small program that mallocs slightly less than 1GB and does some work on that array, and ran 5 instances of this program via fork.
The thing is, I suspect that the OS killed 4 of them, and only one survived and displayed its "PID: I've finished" message.
(I tried it with small arrays and got 5 printouts; also, when I look at the running processes with top, I see only one instance.)
The weird thing is this: I received return code 0 (success?) from ALL of the instances, including the ones that were allegedly killed by the OS.
I didn't get any message stating that processes were killed.
Is this return code normal for this situation?
(If so, it reduces my trust in 'return codes'...)
thanks.
Edit: some of the answers suggested possible errors in the small program, so here it is. The program that forks and saves the return codes is larger, and I have trouble uploading it here, but I think (and hope) it's fine.
Also, I've noticed that if, instead of running it with my forking program, I run it from the terminal using './a.out & ./a.out & ./a.out & ./a.out &' (where ./a.out is the binary of the small program attached),
I do see some 'Killed' messages.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#define SMALL_SIZE 10000
#define BIG_SIZE 1000000000
#define SIZE BIG_SIZE
#define REAPETS 1
int
main()
{
pid_t my_pid = getpid();
char * x = malloc(SIZE*sizeof(char));
if (x == NULL)
{
printf("Malloc failed!");
return(EXIT_FAILURE);
}
int x2=0;
for(x2=0;x2<REAPETS;++x2)
{
int y;
for(y=0;y<SIZE;++y)
x[y] = (y+my_pid)%256;
}
printf("%d: I'm over.\n",my_pid);
return(EXIT_SUCCESS);
}

Well, if your process is unable to malloc() the 1GB of memory, the OS will not kill the process. All that happens is that malloc() returns NULL. So depending on how you wrote your code, it's possible that the process could return 0 anyway - if you wanted it to return an error code when a memory allocation fails (which is generally good practice), you'd have to program that behavior into it.

What signal was used to kill the processes?
Exit codes between 0 and 127, inclusive, can be used freely. Codes above 128, as reported by the shell, indicate that the process was terminated by a signal, where the reported code is
128 + the number of the signal used

A process's return status (as returned by wait, waitpid and system) contains more or less the following:
Exit code: only applies if the process terminated normally
Whether normal or abnormal termination occurred
Termination signal: only applies if the process was terminated by a signal
The exit code is utterly meaningless if your process was killed by the OOM killer (which will apparently send you a SIGKILL signal).
For more information, see the man page for wait(2).

This code shows how to get the termination status of a child:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int
main (void)
{
pid_t pid = fork();
if (pid == -1)
{
perror("fork()");
}
/* parent */
else if (pid > 0)
{
int status;
printf("Child has pid %ld\n", (long)pid);
if (wait(&status) == -1)
{
perror("wait()");
}
else
{
/* did the child terminate normally? */
if(WIFEXITED(status))
{
printf("%ld exited with return code %d\n",
(long)pid, WEXITSTATUS(status));
}
/* was the child terminated by a signal? */
else if (WIFSIGNALED(status))
{
printf("%ld terminated because it didn't catch signal number %d\n",
(long)pid, WTERMSIG(status));
}
}
}
/* child */
else
{
sleep(10);
exit(0);
}
return 0;
}

Have you checked the return value from fork()? There's a good chance that if fork() can't allocate enough memory for the new process' address space, then it will return an error (-1). A typical way to call fork() is:
pid_t pid;
switch(pid = fork())
{
case 0:
// I'm the child process
break;
case -1:
// Error -- check errno
fprintf(stderr, "fork: %s\n", strerror(errno));
break;
default:
// I'm the parent process
break;
}

Exit code is only "valid" when WIFEXITED macro evaluates to true. See man waitpid(2).
You can use WIFSIGNALED macro to see if your program has been signaled.

Related

Using execvp to read command line arguments as commands error

I'm pretty new to Linux and I'm trying to read in command line arguments on a Linux operating system. I want to be able to execute the commands I give as command line arguments programmatically. Here is what I have so far:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char* argv[])
{
int counter;
for(counter = 1; counter < argc; counter++){
pid_t pid = fork();
if(pid < 0)
{
perror("Forking failed");
exit(1);
}
else if(pid == 0)
{
char *args[] = {argv[counter], NULL};
printf("Argument to be passed: %s \n", argv[counter]);
execvp(args[0], args);
perror("Command failed.");
exit(0);
}
printf("Process %s completed successfully.\n", argv[counter]);
}
exit(0);
}
My output on terminal:
darren@darren-VirtualBox:~/Desktop$ ./cmdarguments /home/darren/Desktop/fullpathdoc1 /home/darren/Desktop/fullpathdoc2
Process /home/darren/Desktop/fullpathdoc1 completed successfully.
Process /home/darren/Desktop/fullpathdoc2 completed successfully.
darren@darren-VirtualBox:~/Desktop$ Argument to be passed: /home/darren/Desktop/fullpathdoc2
This is the second program that simply prints this statement.
Argument to be passed: /home/darren/Desktop/fullpathdoc1
This is the first program that simply prints this statement.
I want to be able to print out the process name, and say process completed after each command line argument has been successfully executed. For some reason, my output results in everything seeming to execute backwards, with my process completed messages coming up first as well as reading in the command lines from right to left. Can someone please help with my code and how I can rectify this?
When there are multiple processes, which process gets to run first is entirely up to the operating system's scheduler (here, Linux's).
Broadly, the parent process -- that's where fork() returns > 0 -- needs to wait for the child process to complete. Bear in mind that the three calls to execvp() result in three, concurrent, processes. So if you don't monitor them, they'll proceed in their own merry way. There is already a discussion of this issue on SO:
how to correctly use fork, exec, wait

How to prevent page faults after child exits?

A nice way of creating a snapshot of a process is to use fork() to create a child process. The memory of the child process will be a copy of the parent process.
Instead of eagerly copying all the memory, the OS simply marks the pages as copy-on-write: a page will be cloned in the event that one of the processes writes to it. This saves both time and space, which is great.
In the event the child process exits, the copy-on-write behavior should be deactivated. However, I'm getting page faults for the whole array -- is there any way of optimizing these page faults? e.g. similar to how MAP_POPULATE avoids page faults for the initial access to the pages of a mapped region.
Below there is a simple benchmark that demonstrates the behavior I'm asking about. I check for page faults via perf stat -e minor-faults,major-faults ./a.out.
If no child process is created (WITH_CHILD set to false) I have very few page faults (around 125 and constant). However, just by creating and reaping the child process, I get page faults in everything (around 131260, proportional to array size). As the pages are mapped by a single process, I wouldn't expect any page faults to happen! Why do they?
This is a follow-up of Kernel copying CoW pages after child process exit.
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <new>
#define ARRAY_SIZE 536870912 // 512MB
#define WITH_CHILD true
using inttype = uint64_t;
constexpr uint64_t NUM_ELEMS() {
return ARRAY_SIZE / sizeof(inttype);
}
int main() {
// allocate array
void *arraybuf = mmap(nullptr, ARRAY_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
assert(arraybuf != MAP_FAILED); // mmap reports failure with MAP_FAILED, not nullptr
std::array<inttype, NUM_ELEMS()> *array =
new (arraybuf) std::array<inttype, NUM_ELEMS()>();
#if WITH_CHILD
// spawn checkpointing process
int pid = fork();
assert(pid != -1);
// child process -- do nothing, just exit
if (pid == 0) {
exit(0);
}
// wait for child process to exit
assert(waitpid(pid, nullptr, 0) == pid);
#endif
// write to array -- this shouldnt generate page faults, right? :(
std::fill(array->begin(), array->end(), 0);
// cleanup
munmap(array, ARRAY_SIZE);
}

Python3 fuzzer get return code name

I've written a fuzzer to cause a buffer overflow on a vulnerable C application by creating a subprocess of it.
CASE #2 (Size = 24):
IN: AjsdfFjSueFmVnJiSkOpOjHk
OUT: -11
IN symbolizes the value passed to scanf
OUT symbolizes the return value
the vulnerable program:
#include <stdio.h>
#include <stdlib.h>
#define N 16 /* buffer size */
int main(void) {
char name[N]; /* buffer */
/* prompt user for name */
printf("What's your name? ");
scanf("%s", name);
printf("Hi there, %s!\n", name); /* greet the user */
return EXIT_SUCCESS;
}
running this vulnerable program manually with my above generated payload it returns:
Segmentation Fault
Now to properly print the error cause I'd like to map the int return value to an enumeration -> like Segmentation Fault = -11
However, during my research I could not find any information on how these error codes are actually mapped, even for my example -11 = Segmentation fault
I found the solution:
Popen.returncode
The child return code, set by poll() and wait() (and indirectly by communicate()). A None value indicates that the process hasn’t
terminated yet.
A negative value -N indicates that the child was terminated by signal N (Unix only).
-> Unix Signals
Hope this helps someone else too.

What are some conditions that may cause fork() or system() calls to fail on Linux?

And how can one find out whether any of them are occurring, and leading to an error returned by fork() or system()? In other words, if fork() or system() returns with an error, what are some things in Linux that I can check to diagnose why that particular error is happening?
For example:
Just plain out of memory (results in errno ENOMEM) - check memory use with 'free' etc.
Not enough memory for kernel to copy page tables and other accounting information of parent process (results in errno EAGAIN)
Is there a global process limit? (results in errno EAGAIN also?)
Is there a per-user process limit? How can I find out what it is?
...?
And how can one find out whether any of them are occurring?
Check the errno value if the result (return value) is -1
From the man page on Linux:
RETURN VALUE
On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately.
ERRORS
EAGAIN
fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN
It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
ENOMEM
fork() failed to allocate the necessary kernel structures because memory is tight.
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001.
nproc in /etc/security/limits.conf can limit the number of processes per user.
You can check for failure by examining the return from fork. A 0 means you are in the child, a positive number is the pid of the child and means you are in the parent, and a negative number means the fork failed. When fork fails it sets the external variable errno. You can use the functions in errno.h to examine it. I normally just use perror to print the error (with some text prepended to it) to stderr.
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char** argv) {
pid_t pid;
pid = fork();
if (pid == -1) {
perror("Could not fork");
return 1;
} else if (pid == 0) {
printf("in child\n");
return 0;
}
printf("in parent, child is %d\n", pid);
return 0;
}

What is a good Linux exit error code strategy?

I have several independent executable Perl, PHP CLI scripts and C++ programs for which I need to develop an exit error code strategy. These programs are called by other programs using a wrapper class I created to use exec() in PHP. So, I will be able to get an error code back. Based on that error code, the calling script will need to do something.
I have done a little bit of research and it seems like anything in the 1-254 (or maybe just 1-127) range could be fair game to user-defined error codes.
I was just wondering how other people have approached error handling in this situation.
The only convention is that you return 0 for success, and something other than zero for an error. Most well-known unix programs document the various return codes that they can return, and so should you. It doesn't make a lot of sense to try to make a common list for all possible error codes that any arbitrary program could return, or else you end up with tens of thousands of them like some other OS's, and even then, it doesn't always cover the specific type of error you want to return.
So just be consistent, and be sure to document whatever scheme you decide to use.
1-127 is the available range. Anything over 127 is supposed to be "abnormal" exit - terminated by a signal.
While you're at it, consider using stdout rather than the exit code. The exit code is by tradition used to indicate success, failure, and maybe one other state. Rather than using the exit code, try using stdout the way expr and wc use it. You can then use backticks or something similar in the caller to extract the result.
the unix manifesto states -
Exit as soon and as loud as possible on error
or something like that
Don't try to encode too much meaning into the exit value: detailed statuses and error reports should go to stdout / stderr as Arkadiy suggests.
However, I have found it very useful to represent just a handful of states in the exit values, using binary digits to encode them. For example, suppose you have the following contrived meanings:
0000 : 0 (no error)
0001 : 1 (error)
0010 : 2 (I/O error)
0100 : 4 (user input error)
1000 : 8 (permission error)
Then, a user input error would have a return value of 5 (4 + 1), while a log file not having write permission might have a return value of 11 (8 + 2 + 1). As the different meanings are independently encoded in the return value, you can easily see what's happened by checking which bits are set.
As a special case, to see if there was an error you can AND the return code with 1.
By doing this, you can encode a couple of different things in the return code, in a clear and simple way. I use this only to make simple decisions such as "should the process be restarted", "do the return value and relevant logs need to be sent to an admin", that sort of thing. Any detailed diagnostic information should go to logs or to stdout / stderr.
The normal exit statuses run from 0 to 255 (see Exit codes bigger than 255 possible for a discussion of why). Normally, status 0 indicates success; anything else is an implementation-defined error. I do know of a program that reports the state of a DBMS server via the exit status; that is a special case of implementation-defined exit statuses. Note that you get to define the implementation of the statuses of your programs.
I couldn't fit this into 300 characters; otherwise it would have been a comment to @Arkadiy's answer.
Arkadiy is right that in one part of the exit status word, values other than zero indicate the signal that terminated the process and the 8th bit normally indicates a core dump, but that section of the exit status is different from the main 0..255 status. However, the shell (whichever shell it is) is presented with a problem when a process dies as a result of a signal. There are 16 bits of data to be presented in an 8-bit value, which is always tricky. What the shells seem to do is to take the signal number and add 128 to it. So, if a process dies as a result of an interrupt (signal number 2, SIGINT), the shell reports the exit status as 130. However, the kernel reported the status as 0x0002; the shell has modified what the kernel reports.
The following C code demonstrates this. There are two programs
suicide which kills itself using a signal of your choosing (interrupt by default).
exitstatus which runs a command (such as suicide) and reports the kernel exit status.
Here's suicide.c:
/*
@(#)File: $RCSfile: suicide.c,v $
@(#)Version: $Revision: 1.2 $
@(#)Last changed: $Date: 2008/12/28 03:45:18 $
@(#)Purpose: Commit suicide using kill()
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 2008
@(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "stderr.h"
static const char usestr[] = "[-V][-s signal]";
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
extern const char jlss_id_suicide_c[];
const char jlss_id_suicide_c[] = "@(#)$Id: suicide.c,v 1.2 2008/12/28 03:45:18 jleffler Exp $";
#endif /* lint */
int main(int argc, char **argv)
{
int signum = SIGINT;
int opt;
char *end;
err_setarg0(argv[0]);
while ((opt = getopt(argc, argv, "Vs:")) != -1)
{
switch (opt)
{
case 's':
signum = strtol(optarg, &end, 0);
if (*end != '\0' || signum <= 0)
err_error("invalid signal number %s\n", optarg);
break;
case 'V':
err_version("SUICIDE", &"@(#)$Revision: 1.2 $ ($Date: 2008/12/28 03:45:18 $)"[4]);
break;
default:
err_usage(usestr);
break;
}
}
if (optind != argc)
err_usage(usestr);
kill(getpid(), signum);
return(0);
}
And here's exitstatus.c:
/*
@(#)File: $RCSfile: exitstatus.c,v $
@(#)Version: $Revision: 1.2 $
@(#)Last changed: $Date: 2008/12/28 03:45:18 $
@(#)Purpose: Run command and report 16-bit exit status
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 2008
@(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "stderr.h"
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
extern const char jlss_id_exitstatus_c[];
const char jlss_id_exitstatus_c[] = "@(#)$Id: exitstatus.c,v 1.2 2008/12/28 03:45:18 jleffler Exp $";
#endif /* lint */
int main(int argc, char **argv)
{
pid_t pid;
err_setarg0(argv[0]);
if (argc < 2)
err_usage("cmd [args...]");
if ((pid = fork()) < 0)
err_syserr("fork() failed: ");
else if (pid == 0)
{
/* Child */
execvp(argv[1], &argv[1]);
return(1);
}
else
{
pid_t corpse;
int status;
corpse = waitpid(pid, &status, 0);
if (corpse != pid)
err_syserr("waitpid() failed: ");
printf("0x%04X\n", status);
}
return(0);
}
The missing code, stderr.c and stderr.h, can easily be found in essentially any of my published programs. If you need it urgently, get it from the program SQLCMD at the IIUG Software Archive; alternatively, contact me by email (see my profile).
