What function in the Linux API implements execution of a script file with a shebang?

From https://unix.stackexchange.com/a/2910/674
... the way shebang (#!) is typically implemented:
The kernel opens the executable, and finds that it starts with #!.
The kernel closes the executable and opens the interpreter instead.
The kernel inserts the path to the script to the argument list (as argv[1]), and executes the interpreter.
I was wondering what function in Linux API implements the above steps for execution of a script file with a shebang?
I have considered the following possibilities, but none of them seems a match:
execve() will fail to execute a script.
Either execlp() or execvp() seems to be just for executing a
script without any shebang, default to be /bin/sh, according to
APUE:
If either execlp() or execvp() finds an executable file using one of
the path prefixes, but the file isn’t a machine executable that was
generated by the link editor, the function assumes that the file is a
shell script and tries to invoke /bin/sh with the filename as input to
the shell.
Can either execlp() or execvp() execute a script with a shebang
for any language's interpreter (Python, Perl, Bash, ...)?
Thanks.

It is implemented by execve(). All the other functions in the exec family are just wrappers around it (the ones ending with p perform the $PATH search to find the executable argument, the ones with l build the argv array by iterating through the variadic argument list).
It works the same for any language's interpreter -- the mechanism doesn't really care what the program in the shebang line does, it just executes it with the script pathname as an argument. You can even do:
#!/bin/cat
to create a file that just prints itself when you execute it.
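A minimal sketch of the point above, written in Python purely for illustration (the original answer gives no code). It assumes a Linux-like system where /bin/cat exists and the temporary directory is not mounted noexec; the helper name run_shebang_script is hypothetical. subprocess.run with a list argument does not involve a shell: it forks and ultimately reaches execve() on the script path, so any output is the result of the kernel's shebang handling.

```python
import os
import stat
import subprocess
import tempfile


def run_shebang_script(contents: str) -> str:
    """Write `contents` to an executable temp file, exec it, return its stdout."""
    fd, path = tempfile.mkstemp(suffix=".script")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
        # Owner read/write/execute; the kernel refuses to exec without the x bit.
        os.chmod(path, stat.S_IRWXU)
        # No shell=True here: the path is handed straight to an exec call.
        result = subprocess.run([path], capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.unlink(path)


if __name__ == "__main__":
    # With /bin/cat as the "interpreter", the file simply prints itself.
    print(run_shebang_script("#!/bin/cat\nhello from a self-printing file\n"), end="")
```

Because the interpreter here is cat, the captured output is the file's own text, shebang line included.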

Related

What is the purpose of a shebang

In a script you must include a #! on the first line followed by the path to the program that will execute the script (e.g.: sh, perl).
As far as I know, the # character denotes the start of a comment and that line is supposed to be ignored by the program executing the script. It would seem, that this first line is at some point read by something in order for the script to be executed by the proper program.
Could somebody please shed more light on the workings of the #!?
I'm really curious about this, so the more in-depth the answer the better.
Recommended reading:
The UNIX FAQ: Why do some scripts start with #! ... ?
The #! magic, details about the shebang/hash-bang mechanism on various Unix flavours
Wikipedia: Shebang
The Unix kernel's program loader is responsible for doing this. When one of the exec() functions is called, it asks the kernel to load the program from the file named by its argument. The kernel then checks the first 16 bits of the file to see what executable format it has. If it finds that these bits are #!, it uses the rest of the first line of the file to find which program it should launch, and it passes the name of the file it was asked to launch (the script) to the interpreter as an argument, placed after any argument given on the shebang line and before the script's own arguments.
The interpreter then runs as normal, and treats the #! as a comment line.
The Linux kernel exec system call uses the initial bytes #! to identify file type
When you run the following in bash:
./something
on Linux, this calls the exec system call with the path ./something.
This line gets called in the kernel on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
It reads the very first bytes of the file, and compares them to #!.
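If the file does start with #!, the rest of the line is parsed by the Linux kernel. For example, with a first line of #!/usr/bin/env python, the kernel makes another exec call with path /usr/bin/env, python as the first argument, and the path of the current file as the second argument: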
/usr/bin/env python /path/to/script.py
and this works for any scripting language that uses # as a comment character.
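The parsing step described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described for Linux (interpreter path, then at most one optional argument made of everything else on the line), not a transcription of the kernel code; the function names parse_shebang and interpreter_argv are hypothetical.

```python
from typing import List, Optional, Sequence


def parse_shebang(first_line: bytes) -> Optional[List[bytes]]:
    """Return [interpreter] or [interpreter, arg]; None if there is no shebang."""
    if not first_line.startswith(b"#!"):
        return None
    # Only the first line matters; drop the newline and surrounding blanks.
    rest = first_line[2:].split(b"\n", 1)[0].strip(b" \t")
    if not rest:
        return None
    # The interpreter path ends at the first space or tab.
    cut = len(rest)
    for i, byte in enumerate(rest):
        if byte in b" \t":
            cut = i
            break
    interpreter = rest[:cut]
    # Everything after the path is kept together as ONE argument.
    argument = rest[cut:].strip(b" \t")
    return [interpreter, argument] if argument else [interpreter]


def interpreter_argv(first_line: bytes, script_path: bytes,
                     script_args: Sequence[bytes]) -> List[bytes]:
    """Build the argument list the interpreter ends up being started with."""
    parsed = parse_shebang(first_line)
    if parsed is None:
        raise ValueError("no shebang line")
    return parsed + [script_path] + list(script_args)


if __name__ == "__main__":
    print(interpreter_argv(b"#!/usr/bin/env python\n", b"/path/to/script.py", []))
```

Note that a line such as #!/bin/bash -v -x yields a single argument "-v -x" in this model, which matches the one-argument limitation mentioned elsewhere on this page.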
And yes, you can make an infinite loop with:
printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a
Bash recognizes the error:
-bash: /a: /a: bad interpreter: Too many levels of symbolic links
#! is human readable, but that is not necessary.
If the file started with different bytes, then the exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for bytes 7f 45 4c 46 (which also happens to be human readable as .ELF). Let's confirm that by reading the first 4 bytes of /bin/ls, which is an ELF executable:
head -c 4 "$(which ls)" | hd
output:
00000000 7f 45 4c 46 |.ELF|
00000004
So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
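The same magic-number check can be sketched in Python. This is an illustration only: it covers just the two built-in formats discussed above, and the function names classify_header and classify_file are hypothetical.

```python
def classify_header(header: bytes) -> str:
    """Classify a file by its leading bytes, as the exec handlers do."""
    if header.startswith(b"\x7fELF"):
        return "elf"      # bytes 7f 45 4c 46
    if header.startswith(b"#!"):
        return "script"   # bytes 23 21
    return "unknown"      # some other handler (e.g. binfmt_misc) or ENOEXEC


def classify_file(path: str) -> str:
    """Read only the first 4 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, classify_file(name))
```

Run against a binary such as /bin/ls and against a shell script, it should report the two different kinds on a typical Linux system.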
Finally, you can add your own shebang handlers with the binfmt_misc mechanism. For example, you can add a custom handler for .jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.
I don't think POSIX specifies shebangs, however: https://unix.stackexchange.com/a/346214/32558 , although it does mention them in rationale sections, and in the form "if executable scripts are supported by the system something may happen". macOS and FreeBSD also seem to implement them.
Short story: The shebang (#!) line is read by the operating system's program loader (not by the shell, e.g. sh or bash, as this answer originally claimed). While it formally looks like a comment, the fact that it's the very first two bytes of a file marks the whole file as a text file and as a script. The script will be passed to the executable mentioned on the first line after the shebang. Voilà!
Slightly longer story: Imagine you have your script, foo.sh, with the executable bit (x) set. This file contains e.g. the following:
#!/bin/sh
# some script commands follow...:
# *snip*
Now, on your shell, you type:
> ./foo.sh
Edit: Please also read the comments below after or before you read the following! As it turns out, I was mistaken. It's apparently not the shell that passes the script to the target interpreter, but the operating system (kernel) itself.
Remember that you type this inside the shell process (let's assume this is the program /bin/sh). Therefore, that input will have to be processed by that program. It interprets this line as a command, since it discovers that the very first thing entered on the line is the name of a file that actually exists and which has the executable bit(s) set.
/bin/sh then starts reading the file's contents and discovers the shebang (#!) right at the very beginning of the file. To the shell, this is a token ("magic number") by which it knows that the file contains a script.
Now, how does it know which programming language the script is written in? After all, you can execute Bash scripts, Perl scripts, Python scripts, ... All the shell knows so far is that it is looking at a script file (which is not a binary file, but a text file). Thus it reads the next input up to the first line break (which will result in /bin/sh, compare with the above). This is the interpreter to which the script will be passed for execution. (In this particular case, the target interpreter is the shell itself, so it doesn't have to invoke a new shell for the script; it simply processes the rest of the script file itself.)
If the script was destined for e.g. /bin/perl, all that the Perl interpreter would (optionally) have to do is look whether the shebang line really mentions the Perl interpreter. If not, the Perl interpreter would know that it cannot execute this script. If indeed the Perl interpreter is mentioned in the shebang line, it reads the rest of the script file and executes it.

How to know what script header to use and why it matters?

I infrequently have to write bash scripts for various unrelated purposes and while I usually have a good idea what commands I want in the script, I often have no idea what header to use or why I'm using one when I do find it. For example(s):
Standard shell script:
#!/bin/bash
Python:
#!/usr/bin/env python
Scripts seem to work fine without headers but if headers are the standard, there's a reason for them and they shouldn't be ignored. If it has an effect, then it's a valuable tool that could be used to accomplish more.
Minimally, I'd like to know what headers to use with MySQL scripts and what the headers do on Standard, Python, and MySQL scripts. Ideally, I'd like a generic list of headers or an understanding of how to create a header based on what program is being used.
How the Kernel Executes Things
Simplified (a bit), there are two ways the kernel in a POSIX system knows how to execute a program. One, if the program is in a binary format the kernel understands (such as ELF), the kernel can execute it "directly" (more detail out of scope). If the program is a text file starting with a shebang, such as
#!/usr/bin/somebinary -arg
or what-have-you, the kernel actually executes the command as if it had been directed to execute:
/usr/bin/somebinary -arg "$0"
where $0 here is the name of the script file you just tried to execute. (So you can immediately tell why so many scripting languages use # as a comment-starter – it means they don't have to treat the shebang as special.)
PATH and the env command
The kernel does not look at the PATH environment variable to determine which executable you're talking about, so if you are distributing a python script to systems that may have multiple versions of python installed, you can't guarantee that there will be a
#!/usr/bin/python
env, however, is specified by POSIX and is found at /usr/bin/env on practically every system, so you can count on it existing, and it will look up python in PATH. Thus,
#!/usr/bin/env python
will execute the script with the first python found in your PATH.
BASH, SH and Special Meanings for Invocation
Some programs have special semantics for how they're invoked. In particular, on many systems /bin/sh is a symlink to another shell, such as /bin/bash. While bash is not a perfectly strict POSIX implementation of sh, when it is invoked as sh it enters POSIX mode and is stricter than it would be if invoked as plain-old-bash.
MySQL and arg limitations
The shebang line can be length limited and technically, it can only support one argument, so mysql is a bit tricky – you can't expect to pass a username and database name to a mysql script.
#!/usr/bin/env mysql
use mydb;
select * from mytbl;
Will fail because the kernel will try mysql "$0". Even if you have your credentials in a .my.cnf file, mysql itself will try to treat "$0" as a database name. Likewise:
#!/usr/bin/mysql -e
use mydb;
select * from mytbl;
will fail because again, "$0" is not a valid SQL statement (you hope).
There does not seem to be an appropriate syntax for directly executing a mysql script this way. Your best bet is to pipe the sql commands to mysql directly:
mysql < my_sql_commands
http://mywiki.wooledge.org/BashGuide/Practices#Choose_Your_Shell
When the first line of a script starts with #!, that's what's called a "shebang". When that script is run as an executable, the operating system uses that line to determine how to run the script -- that is to say, to find the program with which the script should be executed.
It's incorrect that "scripts work fine without headers" -- if you don't have a shebang line, your script can't be invoked using the execve() call, which means that many (most?) programs won't be able to execute it. Sometimes invocation from a shell will try to use that shell itself in the absence of a shebang, but you can't trust that to be the case.
(There's an exception to that -- if someone starts your script by running sh yourscript or bash yourscript, the shebang line isn't read at all, and the interpreter they chose is used; however, running scripts this way is a bad practice, as the author typically knows better than the user what the correct interpreter is).
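The failure mode can be observed from a program that execs files directly. The following is a small Python sketch, assuming a Linux-like system and a temporary directory that is not mounted noexec; the helper name exec_errno_without_shebang is hypothetical. Python's subprocess does not apply the shell's fallback, so the error from the underlying exec call is reported as is.

```python
import errno
import os
import stat
import subprocess
import tempfile


def exec_errno_without_shebang() -> int:
    """Exec an executable text file that has no shebang; return the errno (0 if it ran)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as f:
            f.write("echo this line is never reached via execve\n")
        os.chmod(path, stat.S_IRWXU)
        try:
            # List form, no shell=True: the file is handed straight to exec.
            subprocess.run([path], check=False)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        os.unlink(path)


if __name__ == "__main__":
    code = exec_errno_without_shebang()
    # errno.errorcode maps the number back to its symbolic name.
    print(errno.errorcode.get(code, code))
```

Typing the same file's path at a bash prompt would usually still run it, because the shell catches this error and interprets the file itself.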
In short:
If you want to use modern features, and you want the user to be able to override the shell version in use by putting a different release of bash earlier in their path, use #!/usr/bin/env bash
If you want to use modern features and ensure that you always run with the system shell, use #!/bin/bash
If you're going to write your script to strictly conform with POSIX sh, use #!/bin/sh
There's not a limited list of shebang lines we can give you, since any native executable (non-script program) can be used as a script interpreter, and thus be placed in a shebang. If you created a file called myscript with #!/usr/bin/env yourprogram, gave it executable permissions, and ran ./myscript foo bar, this would result in /usr/bin/env yourprogram ./myscript foo bar being invoked; yourprogram would be run by /usr/bin/env (after a PATH lookup), and would be responsible for knowing what to do with ./myscript and its arguments.
For an extremely detailed history of shebang lines and how they work across systems both modern and ancient, see http://www.in-ulm.de/~mascheck/various/shebang/

Shell commands are written in what language?

There are many shell commands, like
ls, cd, cat etc.
what programming language is used in writing these commands? How are they compiled?
My understanding:
The shell is a program which takes commands; does this mean that it interprets those commands (i.e. is ls interpreted by the shell program)?
One more question, what language is Shell program written in?
Most of the basic utilities in Linux are written in C. You can verify this in the BusyBox source code, which implements most of the basic Linux command-line utilities in C.
So commands like ls, cd, etc. are written in C.
For how the shell interprets commands, see the link below:
in an operating system there is a special program called the shell. The shell accepts human readable commands and translates them into something the kernel can read and process.
http://www.math.iitb.ac.in/resources/manuals/Unix_Unleashed/Vol_1/ch08.htm
These programs are mainly written in the C programming language, as is the Linux kernel.
The programs are ordinary executables written in any language (mostly C).
The shell takes the command entered, which is just a string. It then looks for certain sequences of characters that have special meaning to the shell, such as environment variables, which are $ followed by a word, or redirects, which are > followed by a path. After this substitution has been performed, the resulting string is split on spaces to produce the name of an executable and its parameters. The shell then searches for the executable in the list of directories in the PATH environment variable. Finally, the shell uses system calls to create a process from the executable with the parameters.
For example, to execute the command ls $HOME, the shell would first recognize that $HOME is an environment variable and substitute its value, in this case /home/user, leaving the command ls /home/user. It then splits the command on the space to get the executable name ls and the parameter /home/user. The shell finds the first executable that matches ls, usually /bin/ls. It then uses either posix_spawn() or fork() followed by exec() to create the new process.
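The steps in this answer can be sketched as a toy command runner in Python. It is a simplification under stated assumptions: real shells have far more elaborate expansion and quoting rules, redirects are not handled, and the function names resolve_command and run_line are hypothetical.

```python
import os
import shlex
import shutil
import subprocess
from string import Template
from typing import Dict, List, Optional


def resolve_command(line: str, env: Dict[str, str]) -> List[str]:
    """Expand $VARS, split into words, and locate the program via PATH."""
    # Step 1: substitute environment variables such as $HOME.
    expanded = Template(line).safe_substitute(env)
    # Step 2: split the string into the program name and its parameters.
    words = shlex.split(expanded)
    if not words:
        raise ValueError("empty command")
    # Step 3: search the directories listed in PATH for the executable.
    program = shutil.which(words[0], path=env.get("PATH", os.defpath))
    if program is None:
        raise FileNotFoundError(words[0])
    return [program] + words[1:]


def run_line(line: str, env: Optional[Dict[str, str]] = None) -> str:
    """Resolve `line` and run it in a new process, returning its stdout."""
    env = dict(os.environ if env is None else env)
    argv = resolve_command(line, env)
    # Step 4: subprocess creates the child process and execs the program in it.
    return subprocess.run(argv, env=env, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print(run_line("ls $HOME"), end="")
```

Keeping the resolution step separate from the process creation makes it easy to inspect the final argument list before anything is executed.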

In bash, what does dot command ampersand do?

I'm trying to understand a bash script I'm supposed to be maintaining and got stuck. The command is of this form:
. $APP_LOCATION/somescript.sh param1 param2 &
The line is not being called in a loop, nor is any return code being sent back to the calling script from somescript.sh.
I know that the "." will make the process run in the same shell. But "&" will spawn off a different process.
That sounds contradictory. What is really happening here? Any ideas?
The script is running in a background process, but it is a subshell, not a separately-invoked interpreter as it would be without the dot.
That is to say -- the current interpreter forks and then begins running the command (sourcing the script). As such, it inherits shell variables, not just environment variables.
Without the dot, the forked child would invoke the new script's interpreter via an exec-family call such as execv(), which would replace the copy of the current interpreter running in that child with a new one. That's usually the right thing, because it provides more flexibility -- you can't run anything but a script written for the same shell with . or source, after all, whereas starting a new interpreter means that your other script could be rewritten in Python, Perl, a compiled binary, etc without its callers needing to change.
(This is part of why scripts intended to be exec'd, as opposed to libraries meant to be sourced, should not have filename extensions -- and part of why bash libraries should be .bash, not .sh, such that inaccurate information isn't provided about what kind of interpreter they can be sourced into).
TL;DR
. $APP_LOCATION/somescript.sh param1 param2 &
This sources a script as a background job in the current shell.
Sourcing a Script
In Bash, using . is equivalent to the source builtin. The help for the source builtin says (in part):
$ help source
source: source filename [arguments]
Execute commands from a file in the current shell.
In other words, it reads in your Bash script and evaluates it in the current shell rather than in a sub-shell. This is often important to give a script access to unexported variables.
Background Jobs
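The ampersand executes the script in the background using job control. In this case, the sourced script is evaluated in a forked copy of the current shell (a subshell): it starts with the current shell's variables, but it runs as a separate process that can be managed using job control builtins, and any variable changes it makes do not propagate back to the calling shell.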

Why do you need to put #!/bin/bash at the beginning of a script file?

I have made Bash scripts before and they all ran fine without #!/bin/bash at the beginning.
What's the point of putting it in? Would things be any different?
Also, how do you pronounce #? I know that ! is pronounced as "bang."
How is #! pronounced?
It's a convention so the *nix shell knows what kind of interpreter to run.
For example, older flavors of AT&T Unix defaulted to sh (the Bourne shell), while older versions of BSD defaulted to csh (the C shell).
Even today (where most systems run bash, the "Bourne Again Shell"), scripts can be in bash, python, perl, ruby, PHP, etc, etc. For example, you might see #!/bin/perl or #!/bin/perl5.
PS:
The exclamation mark (!) is affectionately called "bang". The shell comment symbol (#) is sometimes called "hash".
PPS:
Remember - under *nix, associating a suffix with a file type is merely a convention, not a "rule". An executable can be a binary program, any one of a million script types and other things as well. Hence the need for #!/bin/bash.
To be more precise, the shebang #!, when it forms the first two bytes of an executable (x mode) file, is interpreted by the execve(2) system call (which executes programs). However, the POSIX specification for execve doesn't mention the shebang.
It must be followed by a file path of an interpreter executable (which BTW could even be relative, but most often is absolute).
A nice trick (or perhaps not so nice one) to find an interpreter (e.g. python) in the user's $PATH is to use the env program (found at /usr/bin/env on virtually all Linux systems), e.g.
#!/usr/bin/env python
Any ELF executable can be an interpreter. You could even use #!/bin/cat or #!/bin/true if you wanted to (but that would often be useless).
It's called a shebang. In unix-speak, # is called sharp (like in music) or hash (like hashtags on twitter), and ! is called bang. (You can actually reference your previous shell command with !!, called bang-bang). So when put together, you get haSH-BANG, or shebang.
The part after the #! tells Unix what program to use to run it. If it isn't specified, the shell you launched it from will typically fall back to running the file itself or with sh, but if it's there it will use that program. Plus, # is a comment in most scripting languages, so the line gets ignored in the subsequent execution.
Every distribution has a default shell. Bash is the default on the majority of systems. If you happen to work on a system that has a different default shell, then scripts written specifically for Bash might not work as intended.
Bash has evolved over the years taking code from ksh and sh.
Adding #!/bin/bash as the first line of your script, tells the OS to invoke the specified shell to execute the commands that follow in the script.
#! is often referred to as a "hash-bang", "she-bang" or "sha-bang".
The shebang is a directive to the loader to use the program which is specified after the #! as the interpreter for the file in question when you try to execute it. So, if you try to run a file called foo.sh which has #!/bin/bash at the top, the actual command that runs is /bin/bash foo.sh. This is a flexible way of using different interpreters for different programs. This is something implemented at the system level and the user level API is the shebang convention.
It's also worth knowing that the shebang is a magic number - a human readable one that identifies the file as a script for the given interpreter.
Your point about it "working" even without the shebang is only because the program in question is a shell script written for the same shell as the one you are using. For example, you could very well write a javascript file and then put a #! /usr/bin/js (or something similar) to have a javascript "Shell script".
The operating system uses the default shell to run your shell script. By specifying the shell path at the beginning of the script, you are asking the OS to use that particular shell. It is also useful for portability.
It is called a shebang. It consists of a number sign and an exclamation point character (#!), followed by the full path to the interpreter, such as /bin/bash. All scripts under UNIX and Linux execute using the interpreter specified on the first line.
Bash stands for “Bourne-Again shell”. It is just one of the many shells available
in Linux.
A shell is a command line interpreter that accepts and runs commands.
Bash is the default shell in most Linux distributions, which is why bash is
often treated as synonymous with shell.
The shell scripts often have almost the same syntaxes, but they also differ sometimes. For example, array index starts at 1 in Zsh instead of 0 in bash. A script
written for Zsh shell won’t work the same in bash if it has arrays.
To avoid unpleasant surprises, you should tell the system that your shell script
is written for the bash shell. How do you do that?
Simply begin your bash script with #!/bin/bash
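You will also sometimes see options after #!/bin/bash,
for example
#!/bin/bash -vx
(On Linux everything after the interpreter path is passed as a single argument, so several options are usually combined into one word like this rather than written as -v -x.)
Read this to get a better idea of what these options do:
https://unix.stackexchange.com/questions/124272/what-do-the-arguments-v-and-x-mean-to-bash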
It can be useful to someone who uses a different system where bash is not the default shell. If the interpreter is not declared and your script uses features that the default shell on that system does not support, the script can fail, so you should declare #!/bin/bash. I've run into this problem before at work and now I just include it as a practice.
