Why are some Bash commands both built-in and external?

Some commands are internal built-in Bash commands while others are external (other programs). I see why certain commands need to be built-in. Some of the reasons are:
1. If a command needs to change the internal state of the shell process.
2. If a command performs a very basic operation in the shell.
3. If a command is called often and needs to be fast. An external command is executed by spawning a new process to load an external program, and hence is slower.
But why are some commands both built-in and external, for example echo and test? I understand echo is used a lot and thus is built-in (Reason 3). But then why also have it as an external command and have a binary for it in /bin/echo? The built-in version of echo will always take precedence over the external version and thus, the external version is hardly ever used. So, why then have an external version of it at all?

It's exactly your point 3. When a command does very little (echo is a good example), the cost of spawning a new process dominates the run time. With growing disks, bandwidth and code bases, you always reach a point where you have so much data and so many files (our code base at work has 100k files!!) that one less spawn per file makes a difference of minutes.
That's also why the typical built-in is a drop-in replacement which takes (perhaps a superset of) the same arguments as the binary.
You also ask why the old binary is still retained even though Bash has it as a built-in — the answer is that a lot of programs rely on the existence of that /bin/echo. It's actually standardized.
Bash is only one of many user interfaces and command interpreters. They all have different sets of built-ins. Some shells are purposefully small and rely heavily on what you could call "legacy" binaries. One example is ash and its successor, Dash. Dash is now the default /bin/sh in Ubuntu and Debian due to its speed, and it is popular for embedded systems due to its small size. (But even Dash has builtins for echo, test and dozens of other commands, and provides a command history for interactive use.)
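You can see both versions side by side in Bash, and measure the spawn cost that point 3 refers to. A quick sketch (the binary may live at /usr/bin/echo on some systems):
type -a echo                                                # lists the builtin first, then the binary in PATH
builtin echo hello                                          # force the builtin
/bin/echo hello                                             # force the external binary (same output)
time for i in $(seq 1000); do echo hi; done >/dev/null      # builtin: no processes spawned
time for i in $(seq 1000); do /bin/echo hi; done >/dev/null # external: 1000 fork+execs, far slower
The second loop is typically one to two orders of magnitude slower, which is the whole argument for making echo a builtin.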

Related

What are the differences between running a process with and without a shell?

In the Node.js docs for child_process, I came upon this line:
Since a shell is not spawned, behaviors such as I/O redirection and file globbing are not supported.
That’s good to know, but the “such as” worries me. What other behaviors are missing? What even counts as running without a shell — isn’t sh/cmd.exe still parsing the command-line input?
It looks like my initial assumption was flawed: neither the Bourne shell (sh) nor Windows' cmd.exe parses the command when it is invoked without a shell; it's up to the consuming application to split the command line into arguments itself.
The most significant lost feature is filename expansion (globbing): a pattern like *.txt is passed to the program as a literal argument instead of being expanded to the matching filenames. Other shell-only expansions are gone too:
No ~ for the user's home directory (tilde expansion)
No $VAR environment-variable expansion on the command line
(Plain relative paths such as ./ and ../ still work, though: those are resolved by the kernel, not the shell.)
Shell syntax characters such as redirection operators (> for example) and pipes (|) are also unsupported; they arrive as literal arguments.
Basically, if it's listed as a feature on the shell's Wikipedia page, non-shell execution won't have it.

Sensitive Data in Command Line Interfaces

I know it's frowned upon to use passwords in command line interfaces like in this example:
./commandforsomething -u username -p plaintextpassword
My understanding is that the reason for that (on Unix systems at least) is that the password can be read in the terminal scrollback as well as in the .bash_history file (or whatever your flavor of shell uses).
HOWEVER, I was wondering if it is safe to use that sort of interface with sensitive data programmatically. For example, in Perl you can execute a command using backticks (``), the exec function, or the system function (I'm not 100% sure of the differences between these, apart from backticks returning the output of the executed command rather than its exit status... but that's a question for another post, I guess).
So, my question is this: Is it safe to do things LIKE
system("command", "userarg", "passwordarg");
as it essentially does the same thing, just without being recorded in scrollback or history? (Note that I only use Perl as an example; I don't care about the Perl-specific answer, but about the generally accepted principle.)
It's not only about shell history.
ps shows all arguments passed to the program. The reason passing a password like this is bad is that any user could potentially see other users' passwords just by running ps in a loop. The cited code won't change much, as it still passes the password on the command line in exactly the same way.
You can try to pass secrets via the environment instead: if a user doesn't have access to a given process, its environment won't be shown to them. This is better, but still a pretty bad solution (e.g. if the program fails and dumps core, all the secrets get written to disk).
If you use environment variables, test with ps e (BSD-style syntax; some systems use ps -E), which shows a process's environment variables. Run it as a different user than the one executing the program; basically, simulate the attacker and see whether you can snoop the password. On a properly configured system you shouldn't be able to.
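A quick way to play attacker on Linux (a sketch; ./commandforsomething and its arguments are the question's placeholders, and the /proc paths are Linux-specific):
./commandforsomething -u username -p plaintextpassword &
pid=$!
ps -o args= -p "$pid"                  # any user on the machine sees: ... -p plaintextpassword
tr '\0' '\n' < "/proc/$pid/environ"    # the environment: readable only by the process owner (and root)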

Which shell should be used for Linux/MacOS/UNIX best compatibility?

My question may seem related to the SO question "What Linux shell should I use?", but my problem is knowing which shell should be used to write an application start script, given that this is a cross-platform Java application (almost all Linux distributions, MacOS, Solaris, ...). So I'm adding compatibility concerns here.
Please note that I'm not asking "which is the best shell to use" in general (which, in my opinion, may be a meaningless question: it's subjective and depends on needs). Rather, I'd like to know which shell has the best chance, today, of being available (and suitable for starting a Java application) on most operating systems.
Also, is it enough to simply use the shebang #!/bin/bash to "use bash" (or, for example, #!/bin/ksh for the Korn shell)? What if that shell is not available on a given OS?
We're currently using a ".sh" file with the shebang #!/bin/sh (which is the Bourne shell, I guess), but some users are reporting errors on some Linux distributions (we don't know yet which ones they use, but we'd like a more global approach instead of fixing errors one by one). MacOS currently uses bash as the default shell, but so far we have no issues on MacOS with /bin/sh...
Note: we'd like to avoid having several start scripts (i.e. using different shells)
For maximum portability, your best bet is /bin/sh using only POSIX sh features (no extensions). Any other shell you pick might not be installed on some system (BSDs rarely have bash, while Linux rarely has ksh).
The problem you can run into is that /bin/sh is frequently not actually the Bourne sh or a strictly POSIX sh; it's often just a link to /bin/bash or /bin/ksh that runs that other shell in sh-compatibility mode. That means that while any POSIX sh script should run fine, extensions will also be accepted, so constructs that are illegal per POSIX may run as well. You might therefore have a script that you think is fine (it runs fine when you test it), but which actually depends on some bash or ksh extension that other shells don't support.
You can try running your script with multiple shells in POSIX compatibility mode (try say, bash, ksh, and dash) and make sure it runs on all of them and you're not accidentally using some extension that only one supports.
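A small harness along those lines, as a sketch (start.sh stands in for your start script, and it assumes dash, bash and ksh are installed):
#!/bin/sh
for shell in "dash" "bash --posix" "ksh"; do
    printf '== %s ==\n' "$shell"
    $shell ./start.sh || printf '%s: FAILED\n' "$shell"   # $shell is unquoted on purpose so "bash --posix" splits into two words
done
Debian's checkbashisms tool (from the devscripts package) can also flag many non-POSIX constructs statically.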
You won't find a shell implementation that is installed on every one of these OSes; however, all of them are either POSIX compliant or more or less close to being compliant.
You should then restrict your shell scripts to the POSIX standard as far as possible.
However, there is no simple way to declare that a script is to be executed in a POSIX context, and in particular no portable shebang to specify for it. I would suggest using a post-installer script that inserts the correct shebang for the target platform, retrieved using this command:
#!/bin/sh
printf '#!%s\n' "$(PATH=$(getconf PATH) command -v sh)"
Your scripts should also include this line once, before calling any external command:
export PATH=$(getconf PATH):$PATH
to make sure the utilities called are the POSIX ones. Moreover, beware that some Unix implementations require an environment variable to be set for them to behave in a POSIX way (e.g. BIN_SH=xpg4 is required on Tru64/OSF1, XPG_SUS_ENV=ON on AIX, ...).
To develop your script, I would recommend using a shell that has the fewest extensions to the standard, such as dash. That helps to quickly detect errors caused by bashisms (or kshisms, or whatever).
PS: beware that despite popular belief, /bin/sh is not guaranteed to be POSIX compliant even on a POSIX compliant OS.

Can you load a tree structure in memory with Linux shell?

I want to create an application with a Linux shell script like this, but can it be done?
This application will create a tree containing data. The tree should be loaded into memory, and the tree (loaded in memory) should be readable by any other external Linux script.
Is it possible to do it with a Linux shell?
If yes, how can you do it?
And are there any simple examples for that?
There are a large number of misconceptions on display in the question.
Each process normally has its own memory; there's no trivial way to load 'the tree' into one process's memory and make it available to all other processes. You might devise a system of related programs that know about a shared memory segment (somehow — there's a problem right there) that contains the tree, but that's about it. They'd be special programs, not general shell scripts. That doesn't meet your 'any other external Linux script' requirement.
What you're seeking is simply not available in the Linux shell infrastructure. That answers your first question; the other two are moot given the answer to the first.
There is a related discussion here. They use /dev/shm, the RAM-backed shared-memory filesystem, and, ostensibly, it works for multiple users. At least, it's worth a try:
http://www.linuxquestions.org/questions/linux-newbie-8/bash-is-it-possible-to-write-to-memory-rather-than-a-file-671891/
Edit: just tried it with two users on Ubuntu - looks like a normal directory and REALLY WORKS with the right chmod.
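A minimal sketch of that approach (Linux-specific; /dev/shm/mytree and the serialization format are placeholders):
# writer: serialize the tree into RAM-backed tmpfs
mkdir -p /dev/shm/mytree
printf '%s\n' 'root' 'root/child1' 'root/child2' > /dev/shm/mytree/tree.txt
chmod 644 /dev/shm/mytree/tree.txt     # let other users read it
# reader: any other script (or user) can load it without touching the disk
cat /dev/shm/mytree/tree.txt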
See also:
http://www.cyberciti.biz/tips/what-is-devshm-and-its-practical-usage.html
I don't think there is a way to do this if you want to keep all of the requirements:
Building this as a shell script
In-memory
Usable across terminals / from external scripts
You would have to give up at least one requirement:
Give up the shell script req - Build this in C to run as a Linux process. I understand this only well enough to say that it would be non-trivial.
Give up the in-memory req - You can serialize the tree and keep the data in a temp file. This works as long as the file is small and the performance bottleneck isn't access to the tree. The good news is you can then use the data across terminals and from external scripts.
Give up the usable-from-external-scripts req - You can technically build a script and run it by sourcing it, adding many (read: a mess of) variables representing the tree into your current shell session (see the sketch after this list).
None of these alternatives are great, but if you had to go with one, number 2 is probably the least problematic.
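For completeness, a minimal sketch of option 3, assuming a generated file named tree.sh and a simple parent/children variable-naming convention (both are placeholders):
# tree.sh (generated by the producer): the tree flattened into variables
tree_root='fruits colors'
tree_fruits='apple banana'
tree_colors='red'
# consumer: source it into the current shell, then walk the variables
. ./tree.sh
for node in $tree_root; do
    eval "children=\$tree_$node"
    printf '%s -> %s\n' "$node" "$children"
done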

TCL, Linux, and FLOCK

So I am working with a program written in Tcl that uses a flock command to lock files. I am testing it on a newer version of Linux than the one it currently runs on, and I found that when the newer machine runs the script, it uses flock from /usr/bin/flock, which differs from the Tcl version of flock. The Tcl version uses options like -read and -write, while the Linux version uses completely different options.
In short, the program stops working and errors out when it gets to any FLOCK call. If I change the options to fit the Linux version, it breaks the program on the other machines.
Is there a way to make it use the TCL version as opposed to the Linux one?
Tcl itself does not come with a flock command, though you might be seeing Tcl automatically trying to use the system command if you're testing interactively. Such automated use of system commands is not done in scripts (that would be hellishly prone to instability due to varying PATHs), so when writing a script you should be explicit about what you mean.
If you want to use the system command (itself non-portable, especially to non-Linux systems) then just do:
exec flock $options...
Be aware that Tcl uses a different form of argument quoting to the shell. This can sometimes catch people out when writing exec calls.
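For reference, this is roughly what the Linux flock(1) utility expects, whether invoked from a shell or via exec (a sketch; /tmp/app.lock is a placeholder):
flock -x -n /tmp/app.lock -c 'echo got the lock'   # exclusive, non-blocking, run one command
# or hold the lock for a group of commands via a file descriptor:
(
    flock -x 9 || exit 1
    # ... critical section ...
) 9>/tmp/app.lock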
Alternatively, use the flock Tcl command that is in the TclX package. The syntax is a little different to that of the Linux system utility, in large part because it's a bit lower-level. In its favor, it is rather more portable.
