A process is considered to have completed correctly in Linux if its exit status was 0.
I've seen that segmentation faults often result in an exit status of 11, though I don't know if this is simply the convention where I work (the applications that failed like that have all been internal) or a standard.
Are there standard exit codes for processes in Linux?
Part 1: Advanced Bash Scripting Guide
As always, the Advanced Bash Scripting Guide has great information:
(This was linked in another answer, but to a non-canonical URL.)
1: Catchall for general errors
2: Misuse of shell builtins (according to Bash documentation)
126: Command invoked cannot execute
127: "command not found"
128: Invalid argument to exit
128+n: Fatal error signal "n"
255: Exit status out of range (exit takes only integer args in the range 0 - 255)
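To see two of these reserved values in practice, here is a minimal Python sketch that asks sh to run a command that does not exist and a file that exists but lacks the execute bit. The command name and the temporary script are made up for illustration, and the sketch assumes a POSIX-style sh is on the PATH.

```python
import os
import shlex
import stat
import subprocess
import tempfile

def shell_status(command):
    """Run a command line through sh and return the shell's exit status."""
    return subprocess.run(["sh", "-c", command],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode

# 127: the shell cannot find the command at all.
# (hypothetical name, assumed not to exist on this machine)
not_found = shell_status("definitely-not-a-real-command-xyz")

# 126: the file is found, but it cannot be executed.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("echo hi\n")
os.chmod(f.name, stat.S_IRUSR | stat.S_IWUSR)  # rw-------, no execute bit
not_executable = shell_status(shlex.quote(f.name))
os.unlink(f.name)

print(not_found, not_executable)
```

Both values come from the shell itself, not from the command, since the command never got to run.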
Part 2: sysexits.h
The ABSG references sysexits.h.
On Linux:
$ find /usr -name sysexits.h
/usr/include/sysexits.h
$ cat /usr/include/sysexits.h
/*
* Copyright (c) 1987, 1993
* The Regents of the University of California. All rights reserved.
(A whole bunch of text left out.)
#define EX_OK 0 /* successful termination */
#define EX__BASE 64 /* base value for error messages */
#define EX_USAGE 64 /* command line usage error */
#define EX_DATAERR 65 /* data format error */
#define EX_NOINPUT 66 /* cannot open input */
#define EX_NOUSER 67 /* addressee unknown */
#define EX_NOHOST 68 /* host name unknown */
#define EX_UNAVAILABLE 69 /* service unavailable */
#define EX_SOFTWARE 70 /* internal software error */
#define EX_OSERR 71 /* system error (e.g., can't fork) */
#define EX_OSFILE 72 /* critical OS file missing */
#define EX_CANTCREAT 73 /* can't create (user) output file */
#define EX_IOERR 74 /* input/output error */
#define EX_TEMPFAIL 75 /* temp failure; user is invited to retry */
#define EX_PROTOCOL 76 /* remote error in protocol */
#define EX_NOPERM 77 /* permission denied */
#define EX_CONFIG 78 /* configuration error */
#define EX__MAX 78 /* maximum listed value */
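Python exposes these constants as os.EX_* on Unix, so a program that wants to follow sysexits.h could look roughly like the sketch below. The main function and its one-argument interface are hypothetical, purely to show where the constants would be returned.

```python
import os
import sys

def main(argv):
    """Hypothetical CLI: expects exactly one argument, a readable file."""
    if len(argv) != 2:
        print("usage: %s FILE" % argv[0], file=sys.stderr)
        return os.EX_USAGE       # 64, command line usage error
    try:
        with open(argv[1]) as f:
            f.read()
    except OSError:
        return os.EX_NOINPUT     # 66, cannot open input
    return os.EX_OK              # 0, successful termination

# A real script would end with: sys.exit(main(sys.argv))
```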
8 bits of the return code and 8 bits holding the number of the killing signal are combined into a single value returned by wait(2) & co.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>
int main(void) {
    int status;

    /* First child exits normally with status 42 (the parent also exits if fork failed). */
    pid_t child = fork();
    if (child <= 0)
        exit(42);
    waitpid(child, &status, 0);
    if (WIFEXITED(status))
        printf("first child exited with %d\n", WEXITSTATUS(status));
    /* prints: "first child exited with 42" */

    /* Second child kills itself with SIGSEGV instead of exiting. */
    child = fork();
    if (child <= 0)
        kill(getpid(), SIGSEGV);
    waitpid(child, &status, 0);
    if (WIFSIGNALED(status))
        printf("second child died with %d\n", WTERMSIG(status));
    /* prints: "second child died with 11" */
    return 0;
}
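The same experiment can be sketched in Python, which wraps the identical macros in the os module. The helper names run_child and describe are my own, and the child resets SIGSEGV to its default action first in case the interpreter has a handler installed.

```python
import os
import signal

def run_child(action):
    """Fork, run action() in the child, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        action()
        os._exit(0)  # only reached if action() returns
    _, status = os.waitpid(pid, 0)
    return status

def describe(status):
    """Classify a raw wait status with the standard W* helpers."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)

def die_by_segv():
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)  # make sure the default action applies
    os.kill(os.getpid(), signal.SIGSEGV)

first = describe(run_child(lambda: os._exit(42)))
second = describe(run_child(die_by_segv))
print(first, second)
```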
How are you determining the exit status? Traditionally, the shell only stores an 8-bit return code, but sets the high bit if the process was abnormally terminated.
$ sh -c 'exit 42'; echo $?
42
$ sh -c 'kill -SEGV $$'; echo $?
Segmentation fault
139
$ expr 139 - 128
11
If you're seeing anything other than this, then the program probably has a SIGSEGV signal handler which then calls exit normally, so it isn't actually getting killed by the signal. (Programs can choose to handle any signals aside from SIGKILL and SIGSTOP.)
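If you start the process from Python rather than from a shell, the same death by signal shows up differently. This sketch assumes sh is available and uses the POSIX spelling kill -s SEGV; it runs the self-killing shell once directly and once through an intermediate shell that echoes its own $?.

```python
import signal
import subprocess

# Direct child: subprocess reports death by signal as a negative returncode.
direct = subprocess.run(["sh", "-c", "kill -s SEGV $$"],
                        stderr=subprocess.DEVNULL).returncode

# Same child, but launched by an intermediate shell that prints its own $?.
via_shell = subprocess.run(
    ["sh", "-c", 'sh -c "kill -s SEGV \\$\\$"; echo $?'],
    capture_output=True, text=True,
).stdout.strip()

print(direct)     # the negated signal number
print(via_shell)  # 128 plus the signal number, as text
```

So 11, -11 and 139 can all describe the same crash, depending on who is reporting it.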
None of the older answers describe exit status 2 correctly. Contrary to what they claim, status 2 is what your command line utilities actually return when called improperly. (Yes, an answer can be nine years old, have hundreds of upvotes, and still be wrong.)
Here is the real, long-standing exit status convention for normal termination, i.e. not by signal:
Exit status 0: success
Exit status 1: "failure", as defined by the program
Exit status 2: command line usage error
For example, diff returns 0 if the files it compares are identical, and 1 if they differ. By long-standing convention, Unix programs return exit status 2 when called incorrectly (unknown options, wrong number of arguments, etc.). For example, diff -N, grep -Y or diff a b c will all result in $? being set to 2. This has been the practice since the early days of Unix in the 1970s.
The accepted answer explains what happens when a command is terminated by a signal. In brief, termination due to an uncaught signal results in exit status 128 + <signal number>. E.g., termination by SIGINT (signal 2) results in exit status 130.
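A quick way to check the 0 / 1 / greater-than-1 split on your own machine is to drive grep from Python. This is only a sketch: it assumes a grep on the PATH, the helper grep_status is made up, and the third case triggers an unreadable-input error rather than a bad option, although both fall into the same status class.

```python
import subprocess

def grep_status(pattern, text=None, path=None):
    """Return grep's exit status for a search in text (via stdin) or in path."""
    cmd = ["grep", "-q", pattern] + ([path] if path else [])
    return subprocess.run(cmd, input=text, text=True,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode

found = grep_status("needle", text="hay needle hay\n")      # a line matches
missing = grep_status("needle", text="only hay\n")          # nothing matches
trouble = grep_status("needle", path="/nonexistent/file")   # input cannot be read

print(found, missing, trouble)
```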
Notes
Several answers define exit status 2 as "Misuse of bash builtins". This applies only when bash (or a bash script) exits with status 2. Consider it a special case of incorrect usage error.
In sysexits.h, mentioned in the most popular answer, exit status EX_USAGE ("command line usage error") is defined to be 64. But this does not reflect reality: I am not aware of any common Unix utility that returns 64 on incorrect invocation (examples welcome). Careful reading of the source code reveals that sysexits.h is aspirational, rather than a reflection of true usage:
* This include file attempts to categorize possible error
* exit statuses for system programs, notably delivermail
* and the Berkeley network.
* Error numbers begin at EX__BASE [64] to reduce the possibility of
* clashing with other exit statuses that random programs may
* already return.
In other words, these definitions do not reflect the common practice at the time (1993) but were intentionally incompatible with it. More's the pity.
'1': Catch-all for general errors
'2': Misuse of shell builtins (according to Bash documentation)
'126': Command invoked cannot execute
'127': "command not found"
'128': Invalid argument to exit
'128+n': Fatal error signal "n"
'130': Script terminated by Ctrl + C
'255': Exit status out of range
This is for Bash. However, for other applications, there are different exit codes.
There are no standard exit codes, aside from 0 meaning success. Non-zero doesn't necessarily mean failure either.
Header file stdlib.h does define EXIT_FAILURE as 1 and EXIT_SUCCESS as 0, but that's about it.
The 11 on segmentation fault is interesting, as 11 is the signal number that the kernel uses to kill the process in the event of a segmentation fault. There is likely some mechanism, either in the kernel or in the shell, that translates that into the exit code.
Header file sysexits.h has a list of standard exit codes. It seems to date back to at least 1993 and some big projects like Postfix use it, so I imagine it's the way to go.
From the OpenBSD man page:
According to style(9), it is not good practice to call exit(3) with arbitrary values to indicate a failure condition when ending a program. Instead, the predefined exit codes from sysexits should be used, so the caller of the process can get a rough estimation about the failure class without looking up the source code.
To a first approximation, 0 is success, non-zero is failure, with 1 being general failure, and anything larger than one being a specific failure. Aside from the trivial exceptions of false and test, which are both designed to give 1 for success, there are a few other exceptions I found.
More realistically, 0 means success or maybe failure, 1 means general failure or maybe success, 2 means general failure if 1 and 0 are both used for success, but maybe success as well.
The diff command gives 0 if files compared are identical, 1 if they differ, and 2 if binaries are different. 2 also means failure. The less command gives 1 for failure unless you fail to supply an argument, in which case, it exits 0 despite failing.
The more command and the spell command give 1 for failure, unless the failure is a result of permission denied, nonexistent file, or attempt to read a directory. In any of these cases, they exit 0 despite failing.
Then the expr command gives 1 for success unless the output is the empty string or zero, in which case, 0 is success. 2 and 3 are failure.
Then there are cases where success or failure is ambiguous. When grep fails to find a pattern, it exits 1, but it exits 2 for a genuine failure (like permission denied). klist also exits 1 when it fails to find a ticket, although this isn't really any more of a failure than when grep doesn't find a pattern, or when you ls an empty directory.
So, unfortunately, the Unix powers that be don't seem to enforce any logical set of rules, even on very commonly used executables.
The status a parent collects with wait(2) is a 16-bit value. On Linux, if the program was killed by a signal, the low-order byte contains the signal number; otherwise the high-order byte contains the exit status chosen by the programmer.
How that status is assigned to the variable $? is then up to the shell. Bash reports the low 8 bits of a normal exit status and uses 128 + (signal number) to indicate termination by a signal.
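For the curious, the unpacking can be done by hand. The sketch below assumes the traditional Linux layout of the wait status (terminating signal in the low 7 bits, core-dump flag in bit 7, exit status in bits 8 to 15) and ignores stopped and continued children; the function names are my own.

```python
def decode_wait_status(status):
    """Split a Linux wait status into (kind, value) by hand."""
    signal_number = status & 0x7F            # low 7 bits: terminating signal
    if signal_number == 0:
        return ("exited", (status >> 8) & 0xFF)  # bits 8-15: exit status
    return ("signaled", signal_number)

def shell_status(status):
    """Collapse a wait status into the single number a shell shows in $?."""
    kind, value = decode_wait_status(status)
    return value if kind == "exited" else 128 + value

print(decode_wait_status(42 << 8), shell_status(42 << 8))
print(decode_wait_status(11), shell_status(11))
```

In real code, os.WIFEXITED, os.WEXITSTATUS and friends are the portable way to do this; the bit twiddling is only meant to show where the numbers live.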
The only "standard" convention for programs is 0 for success, non-zero for error. Another convention used is to return errno on error.
Standard Unix exit codes are defined by sysexits.h, as David mentioned.
The same exit codes are used by portable libraries such as Poco - here is a list of them:
Class Poco::Util::Application, ExitCode
A signal 11 is a SIGSEGV (segment violation) signal, which is different from a return code. This signal is generated by the kernel in response to a bad page access, which causes the program to terminate. A list of signals can be found in the signal man page (run "man signal").
When Linux returns 0, it means success. Anything else means failure. Each program has its own exit codes, so it would be quite long to list them all!
About the 11 error code, it's indeed the segmentation fault number, mostly meaning that the program accessed a memory location that was not assigned.
Some are convention, but some other reserved ones are part of POSIX standard.
126 -- A file to be executed was found, but it was not an executable utility.
127 -- A utility to be executed was not found.
>128 -- A command was interrupted by a signal.
See the section RATIONALE of man 1p exit.
Related
I am running the following Python code and checked the docs of subprocess but still didn't find a map of the return codes:
import pkg_resources
from subprocess import call
packages = [dist.project_name for dist in pkg_resources.working_set]
call("sudo pip3 install --upgrade " + ' '.join(packages), shell=True)
Output: 2
What does 2 mean? Sometimes it's 0 or 1.
Every Linux command executed by a shell script or by the user has an exit status.
The Linux man pages state the exit statuses of each command.
An exit status of 0 means the command was successful without any errors.
A non-zero exit status (1-255) means the command failed.
Typical values are:
0 – Success.
1 – A built-in command failure.
2 – A syntax error has occurred.
3 – Signal received that is not trapped.
Certain exit status values have been reserved for special uses:
126 - A file to be executed was found, but it was not an executable utility.
127 - A utility to be executed was not found.
>128 - A command was interrupted by a signal.
A negative value -N indicates that the child was terminated by signal N (POSIX only).
The codes are platform specific, for example on Windows:
0 - Success
1 - Invalid function
2 - File not found
3 - Path not found
4 - Too many open files
etc...
From the Python documentation:
Some systems have a convention for assigning specific meanings to
specific exit codes, but these are generally underdeveloped; Unix
programs generally use 2 for command line syntax errors and 1 for all
other kind of errors.
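Putting that together, a value returned by subprocess.call can be read roughly as in the sketch below. The explain helper is hypothetical, and a small Python child that exits with status 2 stands in for the pip command from the question. Note that with shell=True the value you get is the shell's own exit status, so a signal death of the inner command would show up as 128 + N rather than -N.

```python
import signal
import subprocess
import sys

def explain(returncode):
    """Give a rough reading of a subprocess return code."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "program-defined failure status %d" % returncode

# Stand-in for the pip command in the question: a child that exits with status 2.
rc = subprocess.call([sys.executable, "-c", "import sys; sys.exit(2)"])
print(rc, explain(rc))
```

What a positive status means beyond "not success" still has to come from the documentation of the program that returned it.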
So I was following a tutorial about buffer overflow with the following code:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    volatile int modified;
    char buffer[64];

    modified = 0;
    gets(buffer);

    if(modified != 0) {
        printf("you have changed the 'modified' variable\n");
    } else {
        printf("Try again?\n");
    }
}
I then compile it with gcc, and additionally run sudo sysctl -w kernel.randomize_va_space=0 beforehand to disable address space randomization and allow the stack smashing (buffer overflow) exploit:
gcc protostar.c -g -z execstack -fno-stack-protector -o protostar
-g is to allow debugging in gdb ('list main')
-z execstack -fno-stack-protector is to remove the stack protection
and then execute it:
python -c 'print "A"*76' | ./protostar
Try again?
python -c 'print "A"*77' | ./protostar
you have changed the 'modified' variable
So I do not understand why the buffer overflow occurs at 77 characters when it should have been 65, a difference of 12 bytes. Does anyone have a clear explanation of the reason?
Also it remains this way from 77 to 87:
python -c 'print "A"*87' | ./protostar
you have changed the 'modified' variable
And from 88 it adds a segfault:
python -c 'print "A"*88' | ./protostar
you have changed the 'modified' variable
Segmentation fault (core dumped)
Regards
To fully understand what's happening, it's first important to make note of how your program is laying out memory.
From your comment, for this particular run, memory for buffer starts at 0x7fffffffdf10 and modified starts at 0x7fffffffdf5c (disabling randomize_va_space may keep this consistent across runs, but I'm not quite sure).
So you have something like this:
0x7fffffffdf10            0x7fffffffdf50      0x7fffffffdf5c
↓                         ↓                   ↓
(64 byte buffer)..........(some 12 bytes).....(modified)....
Essentially, you have the 64 character buffer, then when that ends, there's 12 bytes that are used for some other stack variable (likely 4 bytes argc and 8 bytes for argv), and then modified comes after, precisely starting 64+12 = 76 bytes after the buffer starts.
Therefore, when you write between 65 and 76 characters into the 64 byte buffer, it goes past and starts writing into those 12 bytes that are in-between the buffer and modified. When you start writing the 77th character, it starts overwriting what's in modified which causes you to see the "you have changed the 'modified' variable" message.
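The arithmetic behind those numbers, as a small Python check using only the two addresses quoted above (the variable names are mine):

```python
buffer_start = 0x7FFFFFFFDF10     # where buffer begins in this run
modified_start = 0x7FFFFFFFDF5C   # where modified begins in this run

offset = modified_start - buffer_start   # bytes from buffer[0] to modified
padding = offset - 64                    # bytes between the end of buffer and modified
first_overwriting_char = offset + 1      # first input character that lands inside modified

print(offset, padding, first_overwriting_char)
```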
You also asked, "why does it work if I go up to 87 and then at 88 there's a segfault?" The answer is that this is undefined behavior: as soon as you write into memory you don't have access to and the kernel detects it, it immediately kills your process.
Note that you should almost never use gets in practice and this is a big reason, since you don't know exactly how many bytes you will be reading so there's a chance to overwrite. Also note that the behavior you're seeing is not the same behavior I'm seeing on my machine when I run it. This is normal, and that's because it's undefined behavior. There are no guarantees to what will happen when you run it. On my machine, modified actually comes before buffer in memory, so I don't ever see the modified variable get overwritten. I think this is a good learning example to understand why undefined behavior like this is just so unpredictable.
I know $? in the shell holds the exit status of the last executed program.
For example, when I run the commands below, I see a different status in each situation.
test$ hello
-bash: hello: command not found
test$ echo $?
127
test$ expr 1 / 0
expr: division by zero
test$ echo $?
2
I was wondering if there is a common list of exit statuses, on the system or on the internet, where I can find all the exit statuses with their descriptions. I found a list here, but some codes are missing, for example status code 127.
There can be no comprehensive list, because the meaning of command exit statuses is inherently command-specific. For a given command, you can usually get information about this on the respective command's manual page and Info documents.
In the case of
test$ hello
-bash: hello: command not found
test$ echo $?
127
the exit code 127 comes from bash, because the requested command itself couldn't be found.
In the case of
test$ expr 1 / 0
expr: division by zero
test$ echo $?
2
the exit code 2 comes from expr.
Some of these codes might be standardized or at least coordinated across several commands or a group of commands (e.g. "sh-compatible shells", I could imagine), but unless a command wants to conform to one of these conventions (and there are probably multiple conflicting conventions around), the command's authors are completely free to decide what they want their exit status codes to mean.
There's one important exception: All UNIX commands should adhere to this loose rule to be good citizens and provide meaningful composability (e.g. with pipes) on the command line:
0 means 'success' or "true"/"truthy"
non-0 means (in a very broad sense) 'failure' or 'non-success' or "false"/"falsy"
As you can see, this still leaves a lot of room for interpretation, which is perfectly intended, because these meanings must be specific to the context of the individual commands. (Consider e.g. the false command, that has the very purpose to "fail", thus always returns a non-0 exit code.)
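This zero / non-zero rule is all a caller needs in order to compose commands. As a rough Python sketch (assuming the true and false utilities are installed; the helper names are made up), here is what a shell effectively does for if and &&:

```python
import subprocess

def succeeded(*argv):
    """True when the command exits with status 0, mirroring `if cmd; then`."""
    return subprocess.run(argv).returncode == 0

def and_then(first, second):
    """Rough analogue of `first && second`: run second only if first succeeds."""
    return succeeded(*first) and succeeded(*second)

print(succeeded("true"), succeeded("false"))
print(and_then(("true",), ("true",)), and_then(("false",), ("true",)))
```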
The list you found describes return codes for system calls. System calls are when a program makes a request (in)to the kernel and are not the same as command invocation, thus these return codes are not (necessarily) the same as command exit codes.
The exit status is a numeric value that is returned by a program to the calling program or shell. In C programs, this is represented by the return value of the main() function or the value you give to exit(3). The only part of the number that matters is the least significant 8 bits, which means the only possible values are 0 to 255.
Code      Description
0         success
1-255     failure (in general)
126       the requested command (file) can't be executed (but was found)
127       command (file) not found
128       according to ABS it's used to report an invalid argument to the exit
          builtin, but I wasn't able to verify that in the source code of Bash
          (see code 255)
128 + N   the shell was terminated by the signal N (also used like this by
          various other programs)
255       wrong argument to the exit builtin (see code 128)
The lower codes 0 to 125 are not reserved and may be used for whatever the program likes to report. A value of 0 means successful termination, a value not 0 means unsuccessful termination. This behaviour (== 0, != 0) is also what Bash reacts on in some code flow control statements like if or while.
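The 8-bit limit is easy to observe from a parent process. In this sketch a child Python interpreter calls sys.exit with a value that does not fit in a byte, and the parent only sees the low 8 bits; exit_status_of is a made-up helper name.

```python
import subprocess
import sys

def exit_status_of(value):
    """Run a child Python that calls sys.exit(value); return what the parent sees."""
    child_code = "import sys; sys.exit(%d)" % value
    return subprocess.run([sys.executable, "-c", child_code]).returncode

seen = exit_status_of(300)
print(seen, 300 & 0xFF)
```

A practical consequence is that a status like 256 wraps around to 0 and looks like success to the caller.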
The above excerpt is taken from the Exit Status section of the Bash Hackers Wiki.
The list you showed is really the closest possible thing to a "standardization", but frankly it looks more legit than it actually is. As far as I am aware, almost no one pays much attention to these guys, but instead everyone names their own exit statuses:
Execute test1.sh
#!/bin/bash
a=10 ; [ "$a" -eq 9 ] && echo "Cool!" || exit 200
Output:
:~$ test1.sh
:~$ echo $?
200
I have tried those commands.
~$top
(ctrl + z)stopped the process
~$echo $?
147
~$top
(ctrl + c)killed the process
~$echo $?
0
What happened here? Please explain why it shows these particular values and what they mean.
$? is the return code from the last run process. 0 means no error happened. Other values represent some kind of unusual condition.
Values 128 and above usually represent some kind of signal. 147 - 128 = 19, which means the program received signal 19 (SIGSTOP on Linux). Normally, pressing ^Z sends SIGTSTP (a different signal from SIGSTOP), which probably means that top caught that signal, did some (probably terminal-related) cleanup, and reissued SIGSTOP to actually suspend the program.
top also caught SIGINT (which is normally issued after pressing ^C), to do cleanup and exit cleanly (with exit value 0).
You can run kill -l to see what all the signal numbers are for the current platform. Note that the numbers are different for different platforms; for example, SIGSTOP is 17 on Darwin and 19 on Linux.
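If you'd rather do the lookup programmatically, Python's signal module knows the numbering of the platform it runs on. A small sketch (the function name is mine), which should report SIGSTOP for 147 on x86 Linux but may name a different signal elsewhere:

```python
import signal

def signal_from_status(status):
    """Return the signal name encoded in a shell status above 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:   # no signal with that number on this platform
        return None

print(signal_from_status(147))
print(signal_from_status(128 + signal.SIGINT))
```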
echo $? returns the return value (exit status) of the last executed command (0 is usually success).
I have a binary that occasionally throws a fatal error. I have no access to the source code, but it can be something like the following C++ code compiled as test.a:
#include <ctime>
int main() {
    int *a = new int;
    delete a;
    time_t t=time(0);
    if (t%2==0)
        delete a;
    return 0;
}
In approximately 50% of runs it leads to the error. Importantly, this type of error cannot be captured by redirecting stderr:
$ ./test.a 2>/tmp/Error
*** Error in `./test.a': double free or corruption (fasttop): 0x00000000007de010 ***
Aborted (core dumped)
When I asked why this happens, I was told that "gcc (and probably glibc) open /dev/tty directly to output such fatal errors bypassing IO redirection".
However, it would be very nice to have a way to catch this error from a script. As it occurs randomly and not all the time, it would be useful to retry, e.g.:
while :; do
    ./test.a 2>/tmp/Error
    ERROR="$(</tmp/Error)"
    if [ -z "$ERROR" ]; then
        echo "Smooth run!"
        break
    else
        echo "Error occurred. Retry!"
    fi
done
This of course does not work and in 50% cases outputs this:
*** Error in `./test.a': double free or corruption (fasttop): 0x0000000000df3010 ***
./test.sh: line 10: 16788 Aborted (core dumped) ./test.a 2> /tmp/Error
Smooth run!
Therefore, I wonder whether there is a way to catch a fatal error and then retry running the buggy binary (I am primarily looking for a Linux solution).
EDIT:
If a binary is called directly from the script, the desired behaviour can be achieved by checking the exit status:
while :; do
    ./test.a
    if (( $? == 0 )); then
        echo "Smooth run!"
        break
    else
        echo "Error occurred. Retry!"
    fi
done
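The same retry-on-nonzero idea can be sketched in Python. Unlike the shell loop, this version gives up after a fixed number of attempts; that limit and the function name are my own additions for illustration.

```python
import subprocess

def run_until_success(argv, max_attempts=5):
    """Re-run argv until it exits with 0; return the attempt number, or None."""
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(argv).returncode
        if returncode == 0:
            return attempt
        # A negative value means the child died from a signal (e.g. -6 for SIGABRT on Linux).
        print("attempt %d failed with %d, retrying" % (attempt, returncode))
    return None

# Usage with the binary from the question would be: run_until_success(["./test.a"])
```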
Of course, for this to work the executable being called must itself be written properly. In my real case, for example, the error-causing binary is called from another binary. Consider test2.a, the result of compiling the following code:
#include <cstdlib>
#include <iostream>
using namespace std;
int main() {
    std::system("./test.a");
    cout<<"Execution of test.a finished"<<endl;
    return 0;
}
In this case a call to ./test2.a always exits with status 0, because a crash during the system() call does not interrupt the execution of the binary:
*** Error in `./test.a': double free or corruption (fasttop): 0x0000000002431010 ***
Aborted (core dumped)
Execution of test.a finished
Smooth run!
I still do not know how to circumvent this (as I cannot modify the binaries).