What is a good Linux exit error code strategy? - linux

I have several independent executable Perl, PHP CLI scripts and C++ programs for which I need to develop an exit error code strategy. These programs are called by other programs using a wrapper class I created to use exec() in PHP. So, I will be able to get an error code back. Based on that error code, the calling script will need to do something.
I have done a little research, and it seems that anything in the 1-254 (or maybe just 1-127) range could be fair game for user-defined error codes.
I was just wondering how other people have approached error handling in this situation.

The only convention is that you return 0 for success and something other than zero for an error. Most well-known Unix programs document the various return codes they can return, and so should you. It doesn't make much sense to try to build a common list of every error code that any arbitrary program could return. You would end up with tens of thousands of them, like some other OSes, and even then the list would not always cover the specific kind of error you want to report.
So just be consistent, and be sure to document whatever scheme you decide to use.

1-127 is the available range. Anything over 127 is supposed to be an "abnormal" exit, i.e. termination by a signal.
While you're at it, consider using stdout rather than the exit code. By tradition, the exit code indicates success, failure, and maybe one other state. Rather than encoding results in the exit code, write them to stdout the way expr and wc do. The caller can then use backticks or something similar to extract the result.

The Unix philosophy says something like: exit as soon and as loudly as possible on error.

Don't try to encode too much meaning into the exit value: detailed statuses and error reports should go to stdout / stderr as Arkadiy suggests.
However, I have found it very useful to represent just a handful of states in the exit values, using binary digits to encode them. For example, suppose you have the following contrived meanings:
0000 : 0 (no error)
0001 : 1 (error)
0010 : 2 (I/O error)
0100 : 4 (user input error)
1000 : 8 (permission error)
Then, a user input error would have a return value of 5 (4 + 1), while a log file not having write permission might have a return value of 11 (8 + 2 + 1). As the different meanings are independently encoded in the return value, you can easily see what's happened by checking which bits are set.
As a special case, to see if there was an error you can AND the return code with 1.
By doing this, you can encode a couple of different things in the return code, in a clear and simple way. I use this only to make simple decisions such as "should the process be restarted", "do the return value and relevant logs need to be sent to an admin", that sort of thing. Any detailed diagnostic information should go to logs or to stdout / stderr.

The normal exit statuses run from 0 to 255 (see the question "Exit codes bigger than 255 possible" for a discussion of why). Normally, status 0 indicates success; anything else is an implementation-defined error. I do know of a program that reports the state of a DBMS server via the exit status; that is a special case of implementation-defined exit statuses. Note that you get to define the implementation of the statuses of your programs.
I couldn't fit this into 300 characters; otherwise it would have been a comment on @Arkadiy's answer.
Arkadiy is right that in one part of the exit status word, values other than zero indicate the signal that terminated the process, and that the 8th bit normally indicates a core dump; but that section of the exit status is different from the main 0..255 status. However, the shell (whichever shell it is) is presented with a problem when a process dies as a result of a signal. There are 16 bits of data to be presented in an 8-bit value, which is always tricky. What the shells seem to do is to take the signal number and add 128 to it. So, if a process dies as a result of an interrupt (signal number 2, SIGINT), the shell reports the exit status as 130. However, the kernel reported the status as 0x0002; the shell has modified what the kernel reports.
The following C code demonstrates this. There are two programs:
suicide, which kills itself using a signal of your choosing (interrupt by default);
exitstatus, which runs a command (such as suicide) and reports the kernel exit status.
Here's suicide.c:
/*
@(#)File: $RCSfile: suicide.c,v $
@(#)Version: $Revision: 1.2 $
@(#)Last changed: $Date: 2008/12/28 03:45:18 $
@(#)Purpose: Commit suicide using kill()
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 2008
@(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "stderr.h"
static const char usestr[] = "[-V][-s signal]";
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
extern const char jlss_id_suicide_c[];
const char jlss_id_suicide_c[] = "@(#)$Id: suicide.c,v 1.2 2008/12/28 03:45:18 jleffler Exp $";
#endif /* lint */
int main(int argc, char **argv)
{
    int signum = SIGINT;
    int opt;
    char *end;
    err_setarg0(argv[0]);
    while ((opt = getopt(argc, argv, "Vs:")) != -1)
    {
        switch (opt)
        {
        case 's':
            signum = strtol(optarg, &end, 0);
            if (*end != '\0' || signum <= 0)
                err_error("invalid signal number %s\n", optarg);
            break;
        case 'V':
            err_version("SUICIDE", &"@(#)$Revision: 1.2 $ ($Date: 2008/12/28 03:45:18 $)"[4]);
            break;
        default:
            err_usage(usestr);
            break;
        }
    }
    if (optind != argc)
        err_usage(usestr);
    kill(getpid(), signum);
    return(0);
}
And here's exitstatus.c:
/*
@(#)File: $RCSfile: exitstatus.c,v $
@(#)Version: $Revision: 1.2 $
@(#)Last changed: $Date: 2008/12/28 03:45:18 $
@(#)Purpose: Run command and report 16-bit exit status
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 2008
@(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "stderr.h"
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
extern const char jlss_id_exitstatus_c[];
const char jlss_id_exitstatus_c[] = "@(#)$Id: exitstatus.c,v 1.2 2008/12/28 03:45:18 jleffler Exp $";
#endif /* lint */
int main(int argc, char **argv)
{
    pid_t pid;
    err_setarg0(argv[0]);
    if (argc < 2)
        err_usage("cmd [args...]");
    if ((pid = fork()) < 0)
        err_syserr("fork() failed: ");
    else if (pid == 0)
    {
        /* Child */
        execvp(argv[1], &argv[1]);
        return(1);
    }
    else
    {
        pid_t corpse;
        int status;
        corpse = waitpid(pid, &status, 0);
        if (corpse != pid)
            err_syserr("waitpid() failed: ");
        printf("0x%04X\n", status);
    }
    return(0);
}
The missing code, stderr.c and stderr.h, can easily be found in essentially any of my published programs. If you need it urgently, get it from the program SQLCMD at the IIUG Software Archive; alternatively, contact me by email (see my profile).

Related

Cygwin FIFO vs native Linux FIFO - discrepancy in blocking behaviour?

The code shown is based on an example using named pipes from some tutorial site:
server.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#define FIFO_FILE "MYFIFO"
int main()
{
    int fd;
    char readbuf[80];
    int read_bytes;
    // mknod(FIFO_FILE, S_IFIFO|0640, 0);
    mkfifo(FIFO_FILE, 0777);
    while(1) {
        fd = open(FIFO_FILE, O_RDONLY);
        read_bytes = read(fd, readbuf, sizeof(readbuf));
        readbuf[read_bytes] = '\0';
        printf("Received string: \"%s\". Length is %d\n", readbuf, (int)strlen(readbuf));
    }
    return 0;
}
When I run the server on Windows under Cygwin, it enters an unwanted loop, repeating the same message. For example, if you type in a shell:
$ ./server
|
then the server waits for a client, but as soon as the FIFO is not empty, e.g. after writing to it from a new shell:
$ echo "Hello" > MYFIFO
the server enters an infinite loop, repeating the "Hello" string:
Received string: "Hello". Length is 4
Received string: "Hello". Length is 4
...
Furthermore, new strings written to the FIFO don't seem to be read by the server. On Linux, however, the behaviour is quite different: the server prints the string and waits for new data to appear on the FIFO. What is the reason for this discrepancy?
You need to fix your code to remove at least 3 bugs:
You're not doing a close(fd) so you will get a file descriptor leak and eventually be unable to open() new files.
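You're not checking the return value of open() (if fd is -1, there was an error).
You're not checking the return value of read() (if it returns -1, there was an error)... and in that case your readbuf[read_bytes] = '\0'; will not do what you expect, because it writes to readbuf[-1]. Separately, if read() fills all 80 bytes, the same statement writes one byte past the end of readbuf.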
When you get an error then errno will tell you what went wrong.
These bugs probably explain why you keep getting Hello output (especially the readbuf[read_bytes] problem).

Python3 fuzzer get return code name

I've written a fuzzer to cause a buffer overflow on a vulnerable C application by creating a subprocess of it.
CASE #2 (Size = 24):
IN: AjsdfFjSueFmVnJiSkOpOjHk
OUT: -11
IN symbolizes the value passed to scanf
OUT symbolizes the return value
The vulnerable program:
#include <stdio.h>
#include <stdlib.h>
#define N 16 /* buffer size */
int main(void) {
    char name[N]; /* buffer */
    /* prompt user for name */
    printf("What's your name? ");
    scanf("%s", name);
    printf("Hi there, %s!\n", name); /* greet the user */
    return EXIT_SUCCESS;
}
Running this vulnerable program manually with the payload generated above gives:
Segmentation Fault
Now, to print the cause of the error properly, I'd like to map the int return value to an enumeration, e.g. Segmentation Fault = -11.
However, during my research I could not find any information on how these error codes are actually mapped, not even for my example (-11 = segmentation fault).
I found the solution:
Popen.returncode
The child return code, set by poll() and wait() (and indirectly by communicate()). A None value indicates that the process hasn’t terminated yet.
A negative value -N indicates that the child was terminated by signal N (Unix only).
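So a return code of -11 means the child was terminated by signal 11, which is SIGSEGV on Linux (see the list of Unix signals).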
Hope this helps someone else too.

How to get a custom return value from system()

I need to pass one value between programs. In my case, I run a (very simple) program from within another one by calling system("SimpleProgram").
Is there a way to pass one value (an integer) returned by SimpleProgram? Neither "return 123" nor "exit(123)" works.
Is there an elegant way to pass such a value? (I don't want to write and read an external file.)
EDIT:
The language is C++, the programming is done on BeagleBone with Angstrom distribution.
retCode = system("cd /home/martin/uart/temp/xml_parser && ./xmldom");
Note what the man page for system(3) says about the return code:
The value returned is -1 on error (e.g. fork(2) failed), and the
return status of the command otherwise.
This latter return status is in the format specified in wait(2). Thus, the exit code of the command will
be WEXITSTATUS(status).
So you're almost there. If you have a simple program that returns 123, as you stated:
int main(int argc, char **argv) {
return 123;
}
then you can run it with system(3) and see its return code by using WEXITSTATUS():
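#include <cstdlib>
#include <iostream>
#include <sys/wait.h>   // WIFEXITED, WEXITSTATUS
using namespace std;
int main(int argc, char **argv) {
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " command\n";
        return 1;
    }
    int rc = system(argv[1]);
    if (rc != -1 && WIFEXITED(rc))
        cout << WEXITSTATUS(rc) << '\n';
    else
        cerr << "command did not exit normally\n";
}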
Naming the first program return123 and the second system:
$ ./system ./return123
123
If you leave off WEXITSTATUS() and just print rc directly, you will get the raw wait status instead of 123 (on Linux, 31488, i.e. 123 << 8).
The standard way to do this is with UNIX pipes.
If it's just a hack, you might as well just use the binary return value, but in either case, you'd have to use execve() instead of system().

Linux 2.6.23: error in receiving, read() returns -1

Please refer to the code below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <termios.h>
#define BAUDRATE B115200
#define SER_DEVICE "/dev/ttyS0"
#define FALSE 0
#define TRUE 1
int main()
{
int fd,c,res,i,n;
struct termios oldtio,newtio;
unsigned char buf[255] = "WELCOME TO THE WORLD OF LINUX PROGRAMMING\n";
unsigned char buf2[255]= {"\0"};
//Opening a Device for Reading Writing.
//O_NOCTTY : - The Port Never Becomes the Controlling Terminal of the Process.
//O_NDELAY : - Use NON-Blocking IO. on some system this also means Deactivating the DCD line.
fd=open("/dev/ttyS0",O_RDWR | O_NOCTTY | O_NDELAY);
if(fd<0)
{
printf("\nError in opening the File\n");
exit(0);
}
else
{
printf("File Opened SuccessFull..HurraYYY !!!!1\n");
}
//printf("--------------Test Begin---------------\n");
//Save Current Serial Port Settings
tcgetattr(fd,&oldtio);
//clear the struct for New port settings
memset(&newtio,0,sizeof(newtio));
//Baud rate : Set bps rate .
//You could also use cfsetispeed and cfsetospeed.
//CRTSCTS : Output Hardware Flow control
//CS8 : 8n1(8bit No Parity 1 Stopbit)
//CLOCAL : local connection no modem control
//CREAD : Enable Receiving character
//printf("Setting Configuration for Port");
newtio.c_cflag |= (BAUDRATE | CRTSCTS | CS8 | CLOCAL | CREAD);
//IGNPAR : Ignore bytes with parity error.
//ICRNL : map CR to NL
//printf("Setting Parity\n");
newtio.c_cflag |= (IGNPAR | ICRNL);
//RAW output
//printf("Raw Output\n");
newtio.c_oflag = 0;
//printf("Enabling Canonical format \n");
//ICANON : Enable canonical input.
newtio.c_lflag |= ICANON;
//printf("Initialising Char\n");
//Initialise all characters
newtio.c_cc[VMIN] = 1; /*Blocking read until one character arrives*/
newtio.c_cc[VTIME] = 0; /*Inter character timer unused*/
/*
Now clean the Modem Line and Activate the Settings for the Port.
*/
tcflush(fd,TCIFLUSH);
printf("Flushing Lines\n");
tcsetattr(fd,TCSANOW,&newtio);
n=write(fd,&buf,42);
printf("n=%d",n);
for(i=0;i<sizeof(unsigned int);i++);
for(i=0;i<sizeof(unsigned int);i++);
for(i=0;i<sizeof(unsigned int);i++);
for(i=0;i<200;i++)
printf("");
n=0;
n = read(fd,&buf2,42);
if(n==-1)
{
printf("\nError in Receiving");
}
else
printf("Received String = %s",buf2);
/*
Restore the Old Port Setting
*/
tcsetattr(fd,TCSANOW,&oldtio);
printf("==============TEST END==============\n");
close(fd);
}
I am able to transmit the string, which appears in HyperTerminal, but read() returns -1.
The possibilities I can think of are:
1. The configuration for receiving is wrong.
2. A loopback is (or is not) needed.
I tried a loopback too, but it does not work.
I also ran the transmit and receive code in a while(1) loop, breaking out of the loop once read returns something other than -1, but that doesn't work either.
What is the minimum delay one should add in the read/write cycle?
I am running this code on an MPC8641D processor.
Any suggestions or guidance would be much appreciated.
To know the detailed reason for read() failing, you need to see what value is stored in the global variable errno (this is documented in the man page for read). An easy way to do that is to use perror() instead of printf() when you print the failure message--perror() will append a human-readable string that tells you the reason.
Read John Zwinck's answer before this one ;)
For background info about errno: http://www.gnu.org/software/libc/manual/html_node/Checking-for-Errors.html
To elaborate on the significance of the specific errno with respect to read(): not all "errors" mean "you've done something wrong" or "this connection cannot be read from". They may simply mean that this connection cannot be read from at this instant, e.g. if errno is EAGAIN on a non-blocking connection.
That means you will have to figure out what the error is and, if it is of that sort, how to deal with it. Then you have to test against errno specifically, e.g.:
#include <errno.h>
ssize_t bytes = read(...);
if (bytes == -1) {
    // example of an error which may happen under normal conditions
    // for certain kinds of file descriptors (e.g. non-blocking ones):
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        // handle appropriately, e.g. retry later
    } else {
        // this is a real error which should not happen
    }
}
You can find the constants by printing the int value of errno and looking through errno.h. Chances are, they are actually in a file included by errno.h, such as /usr/include/asm-generic/errno.h and errno-base.h. Random example from the former on my system:
#define ECOMM 70 /* Communication error on send */
So perror() or strerror() would (probably) report "Communication error on send", and in any case the int value of this is 70. Do not hard-code that number in your code, because the values can vary across implementations; #include <errno.h> and use the constant ECOMM.

Return code when OS kills your process

I wanted to test whether, with multiple processes, I can use more than 4GB of RAM on a 32-bit OS (mine: Ubuntu with 1GB of RAM).
So I wrote a small program that mallocs slightly less than 1GB and does some work on that array, and ran 5 instances of it via fork().
The thing is, I suspect the OS killed 4 of them: only one survived and displayed its "PID: I've finished" message.
(I've tried it with small arrays and got 5 printouts; also, when I look at the running processes with top, I see only one instance.)
The weird thing is this: I received return code 0 (success?) from ALL of the instances, including the ones that were allegedly killed by the OS.
I didn't get any message stating that processes were killed.
Is this return code normal for this situation?
(If so, it reduces my trust in return codes...)
Thanks.
Edit: some of the answers suggested possible errors in the small program, so here it is. The program that forks and saves the return codes is larger, and I have trouble uploading it here, but I think (and hope) it's fine.
Also, I've noticed that if, instead of running it with my forking program, I run it from a terminal using './a.out & ./a.out & ./a.out & ./a.out &' (where ./a.out is the binary of the small program attached), I do see some 'Killed' messages.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#define SMALL_SIZE 10000
#define BIG_SIZE 1000000000
#define SIZE BIG_SIZE
#define REAPETS 1
int
main()
{
    pid_t my_pid = getpid();
    char * x = malloc(SIZE*sizeof(char));
    if (x == NULL)
    {
        printf("Malloc failed!");
        return(EXIT_FAILURE);
    }
    int x2=0;
    for(x2=0;x2<REAPETS;++x2)
    {
        int y;
        for(y=0;y<SIZE;++y)
            x[y] = (y+my_pid)%256;
    }
    printf("%d: I'm over.\n",my_pid);
    return(EXIT_SUCCESS);
}
Well, if your process is unable to malloc() the 1GB of memory, the OS will not kill the process. All that happens is that malloc() returns NULL. So depending on how you wrote your code, it's possible that the process could return 0 anyway - if you wanted it to return an error code when a memory allocation fails (which is generally good practice), you'd have to program that behavior into it.
What signal was used to kill the processes?
Exit codes between 0 and 127, inclusive, can be used freely, and codes above 128 indicate that the process was terminated by a signal, where the exit code is 128 + the number of the signal used.
A process' return status (as returned by wait, waitpid and system) contains more or less the following:
Exit code, only applies if process terminated normally
whether normal/abnormal termination occurred
Termination signal, only applies if process was terminated by a signal
The exit code is utterly meaningless if your process was killed by the OOM killer (which sends it a SIGKILL signal).
For more information, see the man page for the wait(2) system call.
This code shows how to get the termination status of a child:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int
main (void)
{
    pid_t pid = fork();
    if (pid == -1)
    {
        perror("fork()");
    }
    /* parent */
    else if (pid > 0)
    {
        int status;
        printf("Child has pid %ld\n", (long)pid);
        if (wait(&status) == -1)
        {
            perror("wait()");
        }
        else
        {
            /* did the child terminate normally? */
            if(WIFEXITED(status))
            {
                printf("%ld exited with return code %d\n",
                       (long)pid, WEXITSTATUS(status));
            }
            /* was the child terminated by a signal? */
            else if (WIFSIGNALED(status))
            {
                printf("%ld terminated because it didn't catch signal number %d\n",
                       (long)pid, WTERMSIG(status));
            }
        }
    }
    /* child */
    else
    {
        sleep(10);
        exit(0);
    }
    return 0;
}
Have you checked the return value from fork()? There's a good chance that if fork() can't allocate enough memory for the new process' address space, then it will return an error (-1). A typical way to call fork() is:
// needs <errno.h>, <stdio.h>, <string.h> and <unistd.h>
pid_t pid;
switch(pid = fork())
{
case 0:
    // I'm the child process
    break;
case -1:
    // Error -- check errno
    fprintf(stderr, "fork: %s\n", strerror(errno));
    break;
default:
    // I'm the parent process
    break;
}
Exit code is only "valid" when the WIFEXITED macro evaluates to true. See the waitpid(2) man page.
You can use the WIFSIGNALED macro to see if your program has been signaled.
