What happens when you execute "ls" in Bash? (Linux)

Can someone provide a detailed description of what happens when you execute the "ls" command in Linux? What system calls are used? What does the file system do (obviously this depends on which file system is used)? If someone can provide an in-depth discussion of this topic or point me to some good resources, that would be great! Thanks!

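1. Bash, as the command interpreter, first checks whether the word means something in its own language. ls is not a shell keyword, but it is often an alias, which bash replaces with its value; on many distributions there is something like ls='ls --color=auto'.
2. Bash then looks for a shell function or a shell builtin called ls. There is none, so it searches the directories listed in the PATH environment variable for an ls executable, usually /bin/ls or /usr/bin/ls, and remembers the result in its hash table.
3. It forks a new process with fork() (on Linux the underlying system call is clone()), and the child replaces its program image with ls by calling execve(). The parent's exported environment is passed on to the new ls process.
4. In an interactive shell with job control, the child is put into its own process group, and that group becomes the terminal's foreground process group while bash waits. (The child does not become a session leader.)
5. The dynamic linker (ld.so) loads the shared libraries ls needs, found via LD_LIBRARY_PATH, /etc/ld.so.cache and the default library directories; ldd /bin/ls lists them.
6. ls then makes many system calls, which you can watch with strace. The main ones for listing are openat(), which opens the directory, and getdents64() (getdents() on older systems), which reads its entries. Options such as -l also make it call a stat-family system call on each entry.
7. Finally ls writes its output with write() and exits. The kernel sends SIGCHLD to the parent, and bash, which has been waiting in waitpid(), reaps the finished child and collects its exit status.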
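A rough way to watch these steps from bash itself is sketched below. It is a minimal sketch, assuming bash on Linux: type -a and type -P show the lookup, running ls in the background makes the forked child's PID visible through $!, and wait collects the exit status. The strace call only runs if strace is installed (and may be refused in containers that restrict ptrace); the variable names are illustrative.

```shell
# Steps 1-2: how would bash resolve "ls"? (alias/keyword/function/builtin/file)
type -a ls
# type -P forces a PATH search and prints the file bash would exec.
ls_path=$(type -P ls)

# Step 3: fork + execve. Starting ls as a background job makes the child's
# PID visible; a foreground ls goes through the same fork/execve.
"$ls_path" -d / &
child_pid=$!
echo "shell PID: $$, ls child PID: $child_pid"

# Step 7: the parent waits for, reaps, and gets the exit status of the child.
wait "$child_pid"
ls_status=$?
echo "ls exited with status $ls_status"

# Optional: show the directory-reading system calls if strace is available.
if command -v strace >/dev/null 2>&1; then
  strace -e trace=openat,getdents64 ls / >/dev/null \
    || echo "strace failed here (ptrace may be restricted)" >&2
fi
```

In an interactive shell, type -a ls will also list your alias first if one is defined.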

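1. The current process (we call it the parent process, or parent) finds ls in the directories listed in $PATH, e.g. /usr/bin/ls.
2. The parent forks a child process and passes on its environment; the child then execs /usr/bin/ls, which replaces its process image.
3. With no arguments, ls lists ".", i.e. the current working directory it inherited from the parent (e.g. /foo/bar). It does not look at $PWD to find it.
4. The child process writes its output and exits.
5. The parent reaps the child and becomes interactive again.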

Related

Bash built-in "history" can be executed, but is not in $PATH

I know shell builtins are loaded into memory, and I thought I could find all builtins in /usr/bin or somewhere in $PATH. I was trying to find out how the history command works. My assumption was that it reads from ~/.bash_history. So I tried objdump -S $(which history)
which history
echo $?
1
This did not return the path of the command, which makes me wonder where the binary for history is located.
type -t history
builtin
I assume this means that it is loaded into memory. So does a shell process load builtins that are stored outside of $PATH?
A shell builtin is literally built into the shell executable itself. It is invoked by the shell not as a separate process, but simply as a regular function call within the shell process. So if you want to find its source code, you need to look in the source code of bash.
For many builtins, such as cd, the main reason for it being a builtin is that it modifies the state of the shell itself. It would be pointless to have cd be a separate process, as that process would only change its own current directory, not that of the shell process. In the case of history, the reason is presumably that ~/.bash_history is only written when the shell exits, so the command also needs access to the in-memory history of the current session, which is contained within the running bash process. For other builtins, such as echo, the reason is performance: the command is assumed to be so frequently used that we want to avoid spawning a new process every time it is invoked (but if you really do want a process, there is also /bin/echo, which may behave differently).
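A quick way to see which names bash resolves without touching $PATH is the type builtin. The loop below is a small sketch assuming bash; in an interactive shell ls may show up as alias instead of file if your startup files define one. It also shows why which history fails: which is an external program that only searches $PATH, while type asks bash itself.

```shell
# Ask bash how it classifies each name. Keywords and builtins are
# handled inside the bash process; only "file" means a $PATH search.
for name in if cd history echo ls; do
  printf '%-8s %s\n' "$name" "$(type -t "$name")"
done

# echo is a builtin, but a separate executable usually exists as well;
# type -a lists every match in lookup order.
type -a echo

# env only searches $PATH, so this runs the external echo, not the builtin.
env echo "printed by the external echo"
```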

Why doesn't calling a script by "scriptName" work?

I have a simple script cmakeclean to clean cmake temp files:
#!/bin/bash -f
rm CMakeCache.txt
rm *.cmake
which I call like
$ cmakeclean
And it does remove CMakeCache.txt, but it doesn't remove cmake_install.cmake:
rm: *.cmake: No such file or directory
When I run it like:
$ . cmakeclean
it does remove both.
What is the difference, and can I make this script work like a usual Linux command (without the . in front)?
P.S.
I am sure the same script is executed both times. To check this, I added echo meme to the script and reran it both ways.
Remove the -f from your #!/bin/bash -f line.
-f disables pathname expansion (globbing), so *.cmake is not expanded and rm receives the literal name *.cmake, which does not exist. When you run the file as a script, the shebang line is honoured and in effect /bin/bash -f scriptname is run. When you run it as . scriptname, your current shell reads the file, the shebang is just a comment and is ignored, and because -f is not set in your current shell, the glob works as expected.
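The effect of -f can be reproduced in a scratch directory. This sketch assumes bash and mktemp; the directory is temporary and the file names simply mirror the question. Removing -f from the shebang (or putting set +f at the top of the script) restores normal globbing.

```shell
# Scratch directory holding the two files from the question.
demo_dir=$(mktemp -d)
touch "$demo_dir/CMakeCache.txt" "$demo_dir/cmake_install.cmake"

# Normal bash: the glob is expanded before the command runs.
with_glob=$(cd "$demo_dir" && bash -c 'echo *.cmake')

# bash -f (noglob), as in the question's shebang: the pattern is passed
# through literally, so rm would look for a file named "*.cmake".
without_glob=$(cd "$demo_dir" && bash -f -c 'echo *.cmake')

printf 'default: %s\nwith -f: %s\n' "$with_glob" "$without_glob"
rm -rf "$demo_dir"
```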
. script is equivalent to source script (. is the POSIX spelling, source is the bash synonym), which means the current shell executes the commands in the script. If there's an exit in there, the current shell will exit (and, e.g., the terminal window will close).
This is typically used to modify the environment of the current shell (set variables etc.).
Running script (without the dot) makes the shell fork itself, exec the given script in the child process, and then wait in the parent for the child to terminate. If there's an exit in the script, it is executed by the child shell and thus terminates only the child. The parent shell stays intact and is unaltered by this call.
This is typically used to start other programs from the current shell.
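The difference in how exit behaves can be checked without closing your terminal by wrapping each case in its own outer bash -c. A minimal sketch, assuming bash and mktemp; the script contents are hypothetical, and bash "$1" stands in for running an executable script directly (both start a new process).

```shell
# Hypothetical script that prints a line and then calls exit.
script=$(mktemp)
printf 'echo "inside script"\nexit 3\n' > "$script"

# Executed: a child bash runs the script, so only the child exits and
# the outer shell goes on to run its next command.
after_exec=$(bash -c 'bash "$1"; echo "outer shell still running"' _ "$script")

# Sourced: the outer shell itself runs "exit 3", so its echo never runs
# and the outer shell's own exit status becomes 3.
source_status=0
after_source=$(bash -c '. "$1"; echo "outer shell still running"' _ "$script") \
  || source_status=$?

printf '%s\n---\n%s (status %s)\n' "$after_exec" "$after_source" "$source_status"
rm -f "$script"
```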
Is this about ClearCase? What did you do in your poor life where you've been assigned to work in the deepest bowels of hell?
For years, I was a senior ClearCase administrator. I haven't touched it in over a decade. My life is way better now. The sky is bluer, bird songs are more melodious, and my dread over coming to work every day is now a bit less.
Getting back to your issue: It's hard to say exactly what's going on. ClearCase does some wacky things. In a dynamic view, the ClearCase repository on Unix systems is hidden in the shell's environment. Now you see it, now you don't.
When you run a shell script, it starts up a new environment. If a particular shell variable is not exported, it is invisible to that shell script. When you merely run cmakeclean from the command line, you are spawning a new shell, one that does not contain your ClearCase environment.
When you run a shell script with a dot prefix like . cmakeclean, you are running that shell script in the current shell which contains your ClearCase environment. Thus, it can see your ClearCase view.
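The actual ClearCase variables aren't shown in the question, so the sketch below uses two hypothetical variable names to illustrate the general rule this answer relies on: a new shell started for a script inherits only exported variables. Assumes bash.

```shell
# A plain shell variable stays in this shell; an exported one is copied
# into the environment of every child process started afterwards.
plain_var="only in this shell"
export exported_var="copied into children"

# The child shell prints "unset" for anything it did not inherit.
child_view=$(bash -c 'printf "%s | %s" "${plain_var-unset}" "${exported_var-unset}"')
echo "$child_view"
```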
If you're using a snapshot view, it is possible that a shell startup file is changing directories on you. When a new shell starts, it may first run a startup file: an interactive BASH (the default shell on most Linux systems and, for many years, on Mac OS X) runs $HOME/.bashrc, and a non-interactive BASH running a script runs the file named in $BASH_ENV, if that is set. If this file changes to a particular directory, you end up in that directory and not in the directory where you ran your shell script. I used to see this when I too was involved in ClearCase hell. People set up their .kshrc script (back then most people used the Korn shell rather than BASH) to set up their views. Unfortunately, this made running any other shell script almost impossible.

Is there a way to run a shell script as one whole task (with a single PID)?

I have a shell script called run.sh. In it, I may call other shell scripts like:
./run_1.sh
./run_2.sh
.........
If I call the script with ./run.sh, I have found that it actually runs the different tasks inside the script sequentially with different PIDs (i.e., run_1.sh is one task and run_2.sh is another). This prevents me from killing the whole group of tasks with one "kill" command, or from running the whole group of tasks in the background with "./run.sh &".
So is there a way to run the script just as one whole task?
pkill can be used to kill the children of a process, using the -P option:
pkill -P $PID
where $PID is the PID of the parent process. Note that -P matches only direct children (processes whose parent PID is $PID), not grandchildren.
You can source run_1.sh so that it is executed in the same shell (this can cause side effects, since all the scripts will then share the same scope):
source run_1.sh
source run_2.sh
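The PID difference between executing and sourcing can be made visible with a one-line stand-in for run_1.sh that prints $$. This is a sketch assuming bash; the temporary file is hypothetical, and bash file is used instead of ./run_1.sh so the file doesn't need execute permission (both start a new process).

```shell
# Hypothetical stand-in for run_1.sh: it reports the PID of the shell running it.
work_dir=$(mktemp -d)
printf 'echo "$$"\n' > "$work_dir/run_1.sh"

# Executed: a new bash process runs the file and reports its own PID.
executed_pid=$(bash "$work_dir/run_1.sh")

# Sourced: this shell runs the line itself ($$ keeps this shell's PID
# even inside the $(...) subshell).
sourced_pid=$(. "$work_dir/run_1.sh")

echo "this shell: $$, executed: $executed_pid, sourced: $sourced_pid"
rm -rf "$work_dir"
```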

What occurs when a file is `source`-d in Unix/Linux context?

I've seen shell scripts that include a line such as:
source someOtherFile
I know that causes the content of someOtherFile to execute, but what is the significance of source?
Follow-up questions: Can ANY script be sourced, or only certain type of scripts? Are there any side-effects other than environment variables when a script is sourced (as opposed to normally executing it)?
Running the command source on a script executes the script within the context of the current process. This means that variables (exported or not) set by the script remain available after it has finished running. This is in contrast to running a script normally, in which case variables set within the newly spawned process are lost once the script exits.
You can source any shell script written for the shell you are running; it does not even need execute permission, and its shebang line is ignored. The end effect will be the same as if you had typed the commands in the script into your terminal. For example, if the script changes directories, your current working directory will have changed when it finishes running.
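A sketch of the variable and directory behaviour described above, assuming bash and mktemp; the script it writes (a variable assignment plus cd /) is hypothetical. Note that the demo leaves the running shell in /, exactly as described.

```shell
# Hypothetical script that sets a variable and changes directory.
script=$(mktemp)
printf 'greeting="set by script"\ncd /\n' > "$script"
start_dir=$(mktemp -d)
cd "$start_dir" || exit 1
unset greeting

# Executed in a child shell: neither change survives in this shell.
bash "$script"
exec_greeting=${greeting-unset}
exec_pwd=$PWD

# Sourced into this shell: both changes stick.
. "$script"
echo "executed: greeting=$exec_greeting, pwd=$exec_pwd"
echo "sourced:  greeting=$greeting, pwd=$PWD"

rm -rf "$start_dir" "$script"
```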
If you tell the shell, e.g. bash, to read a file and execute the commands in the file, it's called sourcing. The main point is, the current process (shell) does this, not a new child process.
In BASH you can use the source command or simply . to source a file.
source is a shell builtin (in bash and some other shells; . is the POSIX equivalent) that reads the file following the command and executes it as a list of commands in the current context. You can also use . for sourcing the file.
source my-script.sh;
. my-script.sh;
Both commands will have the same effect.
In contrast, passing the script filename to the desired shell (e.g. bash my-script.sh) runs the script in a new child shell process, not in the current context.

How does the shell know which directory it's in?

I have been trying to figure out how a shell knows which directory you're currently in. I know there is an environment variable $PWD but when I try changing it manually, it changes what my shell shows at the prompt but commands like ls and cd are unaffected.
cd is an internal shell command, so I can understand that it might use information stored in the shell's memory, but ls is external, and yet running ls without arguments lists whatever directory I was originally in, regardless of what I do to $PWD.
Each process has its own individual current working directory which the Linux system tracks. This is one of the pieces of information the OS manages for each process. There is a system call getcwd() which retrieves this directory.
The shell sets the $PWD environment variable itself when it starts and whenever it changes directory, so it normally matches what getcwd() would return, but changing it does not actually change the current directory. To do that, the shell would have to call chdir() when $PWD changes, which it does not do.
This also is the reason cd has to be a shell built-in. When you run a sub-process that child process gets its own working directory, so if cd were an executable then its calls to chdir() would be useless as that would not change its parent's working directory. It would only be changing its own (short-lived) working directory. Hence, cd is a shell built-in to avoid a sub-process being launched.
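Both points can be checked on Linux by asking the kernel for the working directory through /proc/self/cwd (here /proc/self is the readlink process, which inherited its directory from the shell). A minimal sketch assuming bash, coreutils readlink and a Linux /proc; /tmp and /usr are just convenient example directories.

```shell
cd /tmp || exit 1
physical=$(pwd -P)   # getcwd() result, with symlinks resolved

# 1. Overwriting $PWD only changes a string held by the shell.
PWD=/usr
after_pwd_edit=$(readlink /proc/self/cwd)   # the kernel's view, still /tmp

# 2. A child process that calls chdir() moves only itself.
bash -c 'cd / && echo "child is now in $(pwd)"'
after_child_cd=$(readlink /proc/self/cwd)

echo "\$PWD says $PWD, kernel says $after_pwd_edit, after child cd: $after_child_cd"
cd "$physical"   # a real cd also resets $PWD to match
```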
The shell sets that variable, but keeps track of the directory separately (which is why cd cannot be an external program; it must be a built-in). The shell prompt is composed just before it is displayed each time, and yours is configured to use $PWD, so the shell reads that variable.
Remember: the shell is just a program, like any other program. It can, and does, store things in variables.
As AndiDog and John point out, Unix-like systems (including Linux) actually maintain the working directory of each process and expose it through a set of system calls. It is still per-process state, however.
The Linux kernel stores the current directory of each process. You can look it up in the /proc filesystem (for example, "/proc/1/cwd" for the init process).
The current directory can be changed with the chdir syscall and retrieved with getcwd.
The current directory is a property of a running program (process) that gets inherited by processes created by that process. Changing the current directory is made via an operating system call. The shell maps the cd operation to that system call. When you write an external program like ls, that program inherits the current directory.
The $PWD variable is how the shell exposes the current directory to you, so that you can use it as a variable if you need it. Changing it has no effect on the real current directory of the shell itself.
You (OP) launch ls via your command shell, and the shell launches every process in the context of its own current working directory. So each process you launch starts with a copy of the shell's working directory (and, in a way, its own $PWD variable).

Resources