Is chaining interpreters via shebang lines portable? - linux

Tying a script to a specific interpreter via a so-called shebang line is a well-known practice on POSIX operating systems. For example, if the following script is executed (given sufficient file-system permissions), the operating system will launch the /bin/sh interpreter with the file name of the script as its first argument. The shell will then execute the commands in the script, skipping over the shebang line, which it treats as a comment.
#! /bin/sh
date -R
echo hello world
Possible output:
Sat, 01 Apr 2017 12:34:56 +0100
hello world
I used to believe that the interpreter (/bin/sh in this example) must be a native executable and cannot be a script itself that, in turn, would require yet another interpreter to be launched.
However, I went ahead and tried the following experiment nonetheless.
Using the following dumb shell saved as /tmp/interpreter.py, …
#! /usr/bin/python3
import sys
import subprocess

for script in sys.argv[1:]:
    with open(script) as istr:
        status = any(
            map(
                subprocess.call,
                map(
                    str.split,
                    filter(
                        lambda s: s and not s.startswith('#'),
                        map(str.strip, istr)
                    )
                )
            )
        )
    if status:
        sys.exit(status)
… and the following script saved as /tmp/script.xyz,
#! /tmp/interpreter.py
date -R
echo hello world
… I was able (after making both files executable), to execute script.xyz.
5gon12eder:/tmp> ls -l
total 8
-rwxr-x--- 1 5gon12eder 5gon12eder 493 Jun 19 01:01 interpreter.py
-rwxr-x--- 1 5gon12eder 5gon12eder 70 Jun 19 01:02 script.xyz
5gon12eder:/tmp> ./script.xyz
Mon, 19 Jun 2017 01:07:19 +0200
hello world
This surprised me. I was even able to launch script.xyz via another script.
So, what I am asking is this:
Is the behavior observed by my experiment portable?
Was the experiment even conducted correctly or are there situations where this doesn't work? How about different (Unix-like) operating systems?
If this is supposed to work, is it true that there is no observable difference between a native executable and an interpreted script as far as invocation is concerned?

New executables in Unix-like operating systems are started by the system call execve(2). The man page for execve includes:
Interpreter scripts
An interpreter script is a text file that has execute
permission enabled and whose first line is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable which
is not itself a script. If the filename argument of execve()
specifies an interpreter script, then interpreter will be invoked
with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv
argument of execve().
For portable use, optional-arg should either be absent, or be
specified as a single word (i.e., it should not contain white
space); see NOTES below.
So within those constraints (Unix-like, optional-arg at most one word), yes, shebang scripts are portable. Read the man page for more details, including other differences in invocation between binary executables and scripts.
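One way to see that argument order concretely is to abuse a real binary as the "interpreter" so that the argv the kernel constructs is simply printed back. This is a rough sketch, assuming /bin/echo exists; the file name /tmp/echo-demo is made up:
# Create a "script" whose interpreter is /bin/echo, with one optional-arg word.
cat > /tmp/echo-demo <<'EOF'
#! /bin/echo shebang-argv:
EOF
chmod +x /tmp/echo-demo

# The kernel runs: /bin/echo shebang-argv: /tmp/echo-demo one two
/tmp/echo-demo one two
# Expected output (on Linux): shebang-argv: /tmp/echo-demo one two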

See the quoted Wikipedia text below; the key point is that scripts can be used even as interpreters of other scripts:
This mechanism allows scripts to be used in virtually any context
normal compiled programs can be, including as full system programs,
and even as interpreters of other scripts. As a caveat, though, some
early versions of kernel support limited the length of the interpreter
directive to roughly 32 characters (just 16 in its first
implementation), would fail to split the interpreter name from any
parameters in the directive, or had other quirks. Additionally, some
modern systems allow the entire mechanism to be constrained or
disabled for security purposes (for example, set-user-id support has
been disabled for scripts on many systems). -- WP
And see this output from COLUMNS=75 man execve | grep -nA 23 "Interpreter scripts" | head -39 on an Ubuntu 17.04 box, particularly lines #186-#189, which tell us what works on Linux (i.e. scripts can be interpreters, up to four levels deep):
166: Interpreter scripts
167- An interpreter script is a text file that has execute permission
168- enabled and whose first line is of the form:
169-
170- #! interpreter [optional-arg]
171-
172- The interpreter must be a valid pathname for an executable file.
173- If the filename argument of execve() specifies an interpreter
174- script, then interpreter will be invoked with the following argu‐
175- ments:
176-
177- interpreter [optional-arg] filename arg...
178-
179- where arg... is the series of words pointed to by the argv argu‐
180- ment of execve(), starting at argv[1].
181-
182- For portable use, optional-arg should either be absent, or be
183- specified as a single word (i.e., it should not contain white
184- space); see NOTES below.
185-
186- Since Linux 2.6.28, the kernel permits the interpreter of a script
187- to itself be a script. This permission is recursive, up to a
188- limit of four recursions, so that the interpreter may be a script
189- which is interpreted by a script, and so on.
--
343: Interpreter scripts
344- A maximum line length of 127 characters is allowed for the first
345- line in an interpreter scripts.
346-
347- The semantics of the optional-arg argument of an interpreter
348- script vary across implementations. On Linux, the entire string
349- following the interpreter name is passed as a single argument to
350- the interpreter, and this string can include white space. How‐
351- ever, behavior differs on some other systems. Some systems use
352- the first white space to terminate optional-arg. On some systems,
353- an interpreter script can have multiple arguments, and white spa‐
354- ces in optional-arg are used to delimit the arguments.
355-
356- Linux ignores the set-user-ID and set-group-ID bits on scripts.
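To check this on the machine at hand, the experiment from the question can be redone with a plain-shell "dumb interpreter" in place of the Python one. A hedged sketch, with made-up file names under /tmp:
# An interpreter that is itself a script: it feeds every non-comment,
# non-empty line of the file it is handed to /bin/sh.
cat > /tmp/dumb-interp <<'EOF'
#! /bin/sh
grep -v -e '^#' -e '^$' "$1" | /bin/sh
EOF
chmod +x /tmp/dumb-interp

# A script whose interpreter is the script above.
cat > /tmp/chained.xyz <<'EOF'
#! /tmp/dumb-interp
date -R
echo hello world
EOF
chmod +x /tmp/chained.xyz

# On Linux >= 2.6.28 this prints the date and "hello world". Older kernels
# and other systems reject or reinterpret the nested shebang; Solaris, for
# example, hands the script straight to the final real interpreter (see the
# next answer).
/tmp/chained.xyz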

From Solaris 11 exec(2) man page:
An interpreter file begins with a line of the form
#! pathname [arg]
where pathname is the path of the interpreter, and arg is an
optional argument. When an interpreter file is executed, the
system invokes the specified interpreter. The pathname
specified in the interpreter file is passed as arg0 to the
interpreter. If arg was specified in the interpreter file,
it is passed as arg1 to the interpreter. The remaining
arguments to the interpreter are arg0 through argn of the
originally exec'd file. The interpreter named by pathname
must not be an interpreter file.
As the last sentence states, chaining interpreters is not supported at all on Solaris; attempting it results in the last real (non-script) interpreter (such as /usr/bin/python3) interpreting the first script (such as /tmp/script.xyz, so the final command line becomes /usr/bin/python3 /tmp/script.xyz), without any chaining.
So chaining script interpreters is not portable at all.

Related

can we pass options to tcl source command in tcl 8.5

I am using this command to source get.tcl file and giving options 'verbose' and 'instant':
source -verbose -instant get.tcl
the above command worked for me in tcl 8.4 but shows this error in tcl 8.5:
source (script wrong # args: should be "source_orig ?-encoding name?
fileName"
If I write only
source get.tcl
it works in tcl 8.5.
Is there any change related to this in tcl 8.5?
The source command only accepts one option (since 8.5), -encoding, which is used to specify what encoding the file being read is in (instead of the default guess of encoding as returned by encoding system). All it does is read the file into memory and (internally-equivalent-to-) eval the contents.
You can write to any variable you want prior to doing the source, including global variables like argv. With that (plus appropriate use of uplevel and catch, as required, and maybe also interp create) you can simulate running the script as a subprocess. But it's probably easier to not have the file expect to be handling arguments like that, and instead for it to define a command that you call immediately after the sourcing.
You can pass arguments to your sourced file by doing the following:
set ::argv [list -verbose -instant]
source get.tcl
I recommend using:
set ::argv [list -- -verbose -instant]
The -- will stop tclsh from processing any arguments after the --.
Sometimes tclsh will recognize an argument that is meant for your
program and process it. Your programs will need to know about
the -- and handle it appropriately.

How to get the complete calling command of a BASH script from inside the script (not just the arguments)

I have a BASH script that has a long set of arguments and two ways of calling it:
my_script --option1 value --option2 value ... etc
or
my_script val1 val2 val3 ..... valn
This script in turn compiles and runs a large FORTRAN code suite that eventually produces a netcdf file as output. I already have all the metadata in the netcdf output global attributes, but it would be really nice to also include the full run command one used to create that experiment. Thus another user who receives the netcdf file could simply reenter the run command to rerun the experiment, without having to piece together all the options.
So that is a long way of saying, in my BASH script, how do I get the last command entered from the parent shell and put it in a variable? i.e. the script is asking "how was I called?"
I could try to piece it together from the option list, but the very long option list and two interface methods would make this long and arduous, and I am sure there is a simple way.
I found this helpful page:
BASH: echoing the last command run
but this only seems to work to get the last command executed within the script itself. The asker also refers to use of history, but the answers seem to imply that the history will only contain the command after the programme has completed.
Many thanks if any of you have any idea.
You can try the following:
myInvocation="$(printf %q "$BASH_SOURCE")$((($#)) && printf ' %q' "$@")"
$BASH_SOURCE refers to the running script (as invoked), and $@ is the array of arguments; (($#)) && ensures that the following printf command is only executed if at least 1 argument was passed; printf %q is explained below.
While this won't always be a verbatim copy of your command line, it'll be equivalent - the string you get is reusable as a shell command.
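For example, saving that line in a hypothetical script my_script and printing the result shows that the captured string stays reusable even with awkward arguments (written here with an explicit space after $( to avoid any ambiguity with arithmetic expansion):
#!/bin/bash
# my_script (hypothetical): capture and print its own invocation.
myInvocation="$(printf %q "$BASH_SOURCE")$( (($#)) && printf ' %q' "$@" )"
echo "reusable invocation: $myInvocation"

# $ ./my_script --msg 'hello world' 'a |b'
# reusable invocation: ./my_script --msg hello\ world a\ \|b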
chepner points out in a comment that this approach will only capture what the original arguments were ultimately expanded to:
For instance, if the original command was my_script $USER "$(date +%s)", $myInvocation will not reflect these arguments as-is, but will rather contain what the shell expanded them to; e.g., my_script jdoe 1460644812
chepner also points out that getting the actual raw command line as received by the parent process will be (next to) impossible. Do tell me if you know of a way.
However, if you're prepared to ask users to do extra work when invoking your script or you can get them to invoke your script through an alias you define - which is obviously tricky - there is a solution; see bottom.
Note that use of printf %q is crucial to preserving the boundaries between arguments - if your original arguments had embedded spaces, something like $0 $* would result in a different command.
printf %q also protects against other shell metacharacters (e.g., |) embedded in arguments.
printf %q quotes the given argument for reuse as a single argument in a shell command, applying the necessary quoting; e.g.:
$ printf %q 'a |b'
a\ \|b
a\ \|b is equivalent to single-quoted string 'a |b' from the shell's perspective, but this example shows how the resulting representation is not necessarily the same as the input representation.
Incidentally, ksh and zsh also support printf %q, and ksh actually outputs 'a |b' in this case.
If you're prepared to modify how your script is invoked, you can pass $BASH_COMMAND as an extra argument: $BASH_COMMAND contains the raw[1]
command line of the currently executing command.
For simplicity of processing inside the script, pass it as the first argument (note that the double quotes are required to preserve the value as a single argument):
my_script "$BASH_COMMAND" --option1 value --option2
Inside your script:
# The *first* argument is what "$BASH_COMMAND" expanded to,
# i.e., the entire (alias-expanded) command line.
myInvocation=$1 # Save the command line in a variable...
shift # ... and remove it from "$@".
# Now process "$@", as you normally would.
Unfortunately, there are only two options when it comes to ensuring that your script is invoked this way, and they're both suboptimal:
The end user has to invoke the script this way - which is obviously tricky and fragile (you could, however, check in your script whether the first argument contains the script name and error out if not).
Alternatively, provide an alias that wraps the passing of $BASH_COMMAND as follows:
alias my_script='/path/to/my_script "$BASH_COMMAND"'
The tricky part is that this alias must be defined in all end users' shell initialization files to ensure that it's available.
Also, inside your script, you'd have to do extra work to re-transform the alias-expanded version of the command line into its aliased form:
# The *first* argument is what "$BASH_COMMAND" expanded to,
# i.e., the entire (alias-expanded) command line.
# Here we also re-transform the alias-expanded command line to
# its original aliased form, by replacing everything up to and including
# "$BASH_COMMAND" with the alias name.
myInvocation=$(sed 's/^.* "\$BASH_COMMAND"/my_script/' <<<"$1")
shift # Remove the first argument from "$@".
# Now process "$@", as you normally would.
Sadly, wrapping the invocation via a script or function is not an option, because the $BASH_COMMAND truly only ever reports the current command's command line, which in the case of a script or function wrapper would be the line inside that wrapper.
[1] The only thing that gets expanded is aliases, so if you invoked your script via an alias, you'll still see the underlying script in $BASH_COMMAND, but that's generally desirable, given that aliases are user-specific.
All other arguments and even input/output redirections, including process substitutions <(...), are reflected as-is.
"$0" contains the script's name, "$@" contains the parameters.
Do you mean something like echo $0 $*?
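A quick sketch of the difference between that simple approach and the %q-based one above, using a hypothetical /tmp/args-demo:
#!/bin/bash
# /tmp/args-demo (hypothetical); run as:  /tmp/args-demo 'hello world' 'a |b'

# Naive reconstruction: argument boundaries and metacharacters are lost,
# so the printed string is not safely reusable as a command.
echo "$0 $*"                    # -> /tmp/args-demo hello world a |b

# %q-based reconstruction: each argument is quoted for reuse.
printf '%q ' "$0" "$@"; echo    # -> /tmp/args-demo hello\ world a\ \|b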

Bash script execution with and without shebang in Linux and BSD

What determines what gets executed when a Bash-like script without a shebang is run as if it were a binary, and who makes that decision?
I guess that running a normal script with a shebang is handled by the binfmt_script Linux module, which checks for the shebang, parses the command line, and runs the designated script interpreter.
But what happens when someone runs a script without a shebang? I've tested the direct execv approach and found out that there's no kernel magic in there - i.e. given a file like this:
$ cat target-script
echo Hello
echo "bash: $BASH_VERSION"
echo "zsh: $ZSH_VERSION"
Running a compiled C program that does just an execv call yields:
$ cat test-runner.c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char *const args[] = { "./target-script", NULL };
    if (execv("./target-script", args) == -1)
        perror("./target-script");
}
$ ./test-runner
./target-script: Exec format error
However, if I do the same thing from another shell script, it runs the target script using the same shell interpreter as the original one:
$ cat test-runner.bash
#!/bin/bash
./target-script
$ ./test-runner.bash
Hello
bash: 4.1.0(1)-release
zsh:
If I do the same trick with other shells (for example, Debian's default sh - /bin/dash), it also works:
$ cat test-runner.dash
#!/bin/dash
./target-script
$ ./test-runner.dash
Hello
bash:
zsh:
Mysteriously, it doesn't quite work as expected with zsh and doesn't follow the general scheme. Looks like zsh executed /bin/sh on such files after all:
greycat@burrow-debian ~/z/test-runner $ cat test-runner.zsh
#!/bin/zsh
echo ZSH_VERSION=$ZSH_VERSION
./target-script
greycat@burrow-debian ~/z/test-runner $ ./test-runner.zsh
ZSH_VERSION=4.3.10
Hello
bash:
zsh:
Note that ZSH_VERSION in parent script worked, while ZSH_VERSION in child didn't!
How does a shell (Bash, dash) determine what gets executed when there's no shebang? I've tried to dig up that place in the Bash/dash sources, but, alas, I seem to be lost in there. Can anyone shed some light on the magic that determines whether a target file without a shebang should be executed as a script or as a binary in Bash/dash? Or maybe there is some sort of interaction with the kernel / libc, in which case I'd welcome explanations of how it works in the Linux and FreeBSD kernels / libcs.
Since this happens in dash and dash is simpler, I looked there first.
Seems like exec.c is the place to look, and the relevant function is tryexec, which is called from shellexec, which is called whenever the shell thinks a command needs to be executed. And (a simplified version of) the tryexec function is as follows:
STATIC void
tryexec(char *cmd, char **argv, char **envp)
{
        char *const path_bshell = _PATH_BSHELL;
repeat:
        execve(cmd, argv, envp);
        if (cmd != path_bshell && errno == ENOEXEC) {
                *argv-- = cmd;
                *argv = cmd = path_bshell;
                goto repeat;
        }
}
So, it simply always replaces the command to execute with the path to itself (_PATH_BSHELL defaults to "/bin/sh") if ENOEXEC occurs. There's really no magic here.
I find that FreeBSD exhibits identical behavior in bash and in its own sh.
The way bash handles this is similar but much more complicated. If you want to look into it further, I recommend reading bash's execute_cmd.c and looking specifically at execute_shell_script and then shell_execve. The comments are quite descriptive.
(Looks like Sorpigal has covered it but I've already typed this up and it may be of interest.)
According to Section 3.16 of the Unix FAQ, the shell first looks at the magic number (first two bytes of the file). Some numbers indicate a binary executable; #! indicates that the rest of the line should be interpreted as a shebang. Otherwise, the shell tries to run it as a shell script.
Additionally, it seems that csh looks at the first byte, and if it's #, it'll try to run it as a csh script.
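A minimal way to watch the shell-level ENOEXEC fallback described above, using a made-up path:
# A file with no shebang and no binary magic number.
printf 'echo "no shebang here"\n' > /tmp/no-shebang
chmod +x /tmp/no-shebang

# execve() on this file fails with ENOEXEC ("Exec format error"), as the
# C test program in the question shows; an invoking shell catches that
# error and re-runs the file with itself (dash/bash) or /bin/sh, so this
# prints the message:
/tmp/no-shebang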

Running Scheme from the command line

How do you run Scheme programs from the terminal in linux(ubuntu)? Also how to accept arguments from the command-line in a Scheme program?
Edit: I'm using the DrScheme implementation.
The DrScheme Scheme implementation's command-line executable is named mzscheme. The documentation for starting a command-line script is found here: Unix Scripts (PLT Scheme documentation). Use of the command-line args is explained here: Command-line Parsing (PLT Scheme documentation).
The upshot is that you can use shebang scripts like this:
#! /usr/bin/env mzscheme
#lang scheme/base
(...scheme s-exps...)
or if you want more control over the command line flags for mzscheme, you need to start the script like this:
#! /bin/sh
#|
exec mzscheme -cu "$0" ${1+"$@"}
|#
#lang scheme/base
(...scheme s-exps...)
The function you use to process command line args is command-line. You will find examples of how to use it in the article linked to by the second link.
It is not standardized in the R6RS. There is a recommendation SRFI-22, which some interpreters support. If your interpreter does not support SRFI-22 then it depends on your implementation.
Below is an example from the SRFI. It assumes your interpreter is a binary named scheme-r5rs. Basically it calls a function named main with a single arg that is a list of command line args.
#! /usr/bin/env scheme-r5rs

(define (main arguments)
  (for-each display-file (cdr arguments))
  0)

(define (display-file filename)
  (call-with-input-file filename
    (lambda (port)
      (let loop ()
        (let ((thing (read-char port)))
          (if (not (eof-object? thing))
              (begin
                (write-char thing)
                (loop))))))))
This solution works for me
#! /usr/bin/env guile
!#
(display "hello")
(newline)
Also how to accept arguments from the
command-line in a Scheme program?
The R6RS library defines a function called command-line which returns the list of the arguments (the first one being the name of the program). Not all implementations of Scheme implement R6RS though; your implementation might have some other function for this.
How do you run Scheme programs from
the terminal in linux(ubuntu)?
It depends on which implementation of Scheme you're using.

How do I increase the /proc/pid/cmdline 4096 byte limit?

For my Java apps with very long classpaths, I cannot see the main class specified near the end of the arg list when using ps. I think this stems from my Ubuntu system's size limit on /proc/pid/cmdline. How can I increase this limit?
For looking at Java processes jps is very useful.
This will give you the main class and jvm args:
jps -vl | grep <pid>
You can't change this dynamically; the limit is hard-coded in the kernel to PAGE_SIZE in fs/proc/base.c:
int res = 0;
unsigned int len;
struct mm_struct *mm = get_task_mm(task);
if (!mm)
        goto out;
if (!mm->arg_end)
        goto out_mm;    /* Shh! No looking before we're done */

len = mm->arg_end - mm->arg_start;

if (len > PAGE_SIZE)
        len = PAGE_SIZE;

res = access_process_vm(task, mm->arg_start, buffer, len, 0);
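To check whether the kernel you are running still truncates at PAGE_SIZE, a rough probe is to start a process with several kilobytes of arguments and count what /proc exposes (the sleep command and the numeric filler below are arbitrary):
# Background a dummy process carrying roughly 9 KB of arguments; the extra
# words only become unused positional parameters of the sh -c script.
sh -c 'sleep 300' cmdline-probe $(seq 1 2000) &
pid=$!

# /proc/<pid>/cmdline is NUL-separated; count the bytes it exposes.
tr '\0' ' ' < "/proc/$pid/cmdline" | wc -c
# ~4096 on kernels with the PAGE_SIZE limit, the full length otherwise.

kill "$pid"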
The way I temporarily get around the 4096 character command line argument limitation of ps (or rather /proc/PID/cmdline) is by using a small script to replace the java command.
During development, I always use an unpacked JDK version from Sun and never the installed JRE or JDK of the OS, whether Linux or Windows (e.g. I download the bin rather than the rpm.bin).
I do not recommend changing the script for your default Java installation (e.g. because it might break updates or get overwritten or create problems or ...)
So assuming the java command is in /x/jdks/jdk1.6.0_16_x32/bin/java
first move the actual binary away:
mv /x/jdks/jdk1.6.0_16_x32/bin/java /x/jdks/jdk1.6.0_16_x32/bin/java.orig
then create a script /x/jdks/jdk1.6.0_16_x32/bin/java like e.g.:
#!/bin/bash
echo "$@" > /tmp/java.$$.cmdline
/x/jdks/jdk1.6.0_16_x32/bin/java.orig "$@"
and then make the script runnable
chmod a+x /x/jdks/jdk1.6.0_16_x32/bin/java
In case of copying and pasting the above, you should make sure that there are no extra spaces in /x/jdks/jdk1.6.0_16_x32/bin/java and that #!/bin/bash is the first line.
The complete command line ends up in e.g. /tmp/java.26835.cmdline where 26835 is the PID of the shell script.
I think there is also some shell limit on the length of the command line; I cannot remember exactly, but it was possibly 64K characters.
You can change the script to remove the command line text from /tmp/java.PROCESS_ID.cmdline at the end.
After I got the commandline, I always move the script to something like "java.script" and copy (cp -a) the actual binary java.orig back to java. I only use the script when I hit the 4K limit.
There might be problems with escaped characters and maybe even spaces in paths or such, but it works fine for me.
You can use jconsole to get access to the original command line without all the length limits.
It is possible to use newer linux distributions, where this limit was removed, for example RHEL 6.8 or later
"The /proc/pid/cmdline file length limit for the ps command was previously hard-coded in the kernel to 4096 characters. This update makes sure the length of /proc/pid/cmdline is unlimited, which is especially useful for listing processes with long command line arguments. (BZ#1100069)"
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.8_Release_Notes/new_features_kernel.html
For Java based programs where you are just interested in inspecting the command line args your main class got, you can run:
jps -m
I'm pretty sure that if you're actually seeing the arguments truncated in /proc/$pid/cmdline then you're actually exceeding the maximum argument length supported by the OS. As far as I can tell, in Linux, the size is limited to the memory page size. See "ps ww" length restriction for reference.
The only way to get around that would be to recompile the kernel. If you're interested in going that far to resolve this then you may find this post useful: "Argument list too long": Beyond Arguments and Limitations
Additional reference:
ARG_MAX, maximum length of arguments for a new process
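For reference, the limits discussed above can be queried directly with standard getconf keys:
# Maximum combined size of argv[] and the environment for a new process:
getconf ARG_MAX

# Kernel page size, which is what older kernels truncate /proc/<pid>/cmdline reads to:
getconf PAGESIZE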
Perhaps the 'w' parameter to ps is what you want. Add two 'w' for greater output. It tells ps to ignore the line width of the terminal.
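For example (pid is a placeholder):
# Unlimited-width output for one process; without the doubled 'w' (or -ww),
# ps may clip the line to the terminal width.
ps -ww -p "$pid" -o args=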
