How does the shell know which directory it's in? - linux

I have been trying to figure out how a shell knows which directory you're currently in. I know there is an environment variable $PWD but when I try changing it manually, it changes what my shell shows at the prompt but commands like ls and cd are unaffected.
cd is an internal shell command so I can understand it might use info stored within the shell memory, but ls is external and yet running ls without anything will give me whatever directory I was originally in regardless what I do to $PWD.

Each process has its own individual current working directory which the Linux system tracks. This is one of the pieces of information the OS manages for each process. There is a system call getcwd() which retrieves this directory.
The $PWD environment variable reflects what getcwd() was the last time the shell checked, but changing it does not actually change the current directory. To do that the shell would have to call chdir() when $PWD changes, which it does not do.
This also is the reason cd has to be a shell built-in. When you run a sub-process that child process gets its own working directory, so if cd were an executable then its calls to chdir() would be useless as that would not change its parent's working directory. It would only be changing its own (short-lived) working directory. Hence, cd is a shell built-in to avoid a sub-process being launched.

The shell sets that variable, but stores the knowledge internally (which is why you can't make cd an external program, it must be a built-in). The shell prompt is composed just before it is displayed each time, and you have specified using $PWD in yours, so the shell reads that in.
Remember: the shell is just a program, like any other program. It can---and does---store things in variables.
As AndiDog and John point out unix-like systems (i.e. including linux) actually maintains the working directory for each process through a set of system calls. The storage is still process local, however.

The Linux kernel stores the current directory of each process. You can look it up in the /proc filesystem (for example, "/proc/1/cwd" for the init process).
The current directory can be changed with the chdir syscall and retrieved with getcwd.

The current directory is a property of a running program (process) that gets inherited by processes created by that process. Changing the current directory is made via an operating system call. The shell maps the cd operation to that system call. When you write an external program like ls, that program inherits the current directory.
The $PWD variable is how the shell shows you the current directory for you to use it as a variable if you need it. Changing it does not have effect in the real current directory of the shell itself.

You (OP) launch ls via your command shell, and any process you launch, the shell launches in the context of its current working directory. So, each process you launch has its own $PWD variable (in a way).

Related

Bash built-in "history" can be executed, but is not in $PATH

I know shell builtins are loaded into memory and thought I could find all builtins in /usr/bin or somewhere in echo $PATH. I was trying to find out how the history command works. My assumption was that it is reading from ~/bash_history. So I tried objdump -S $(which history)
which history
echo $?
1
This did not return the path of the command which makes me where the binary for history is located.
type -t which
builtin
I assume this means that it is loaded onto memory. So does a shell process load builtins that are stored outside of echo $PATH
A shell builtin is literally built into the shell executable itself. It is invoked by the shell not as a separate process, but simply as a regular function call within the shell process. So if you want to find its source code, you need to look in the source code of bash.
For many builtins, such as cd, the main reason for it being a builtin is that it modifies the state of the shell itself. It would be pointless to have cd be a separate process, as that process would only change its own current directory, not that of the shell process. In the case of history, the reason is presumably that ~/.bash_history is only written when the shell exits, so the command also needs access to the in-memory history of the current session, which is contained within the running bash process. For other builtins, such as echo, the reason is performance: the command is assumed to be so frequently used that we want to avoid spawning a new process every time it is invoked (but if you really do want a process, there is also /bin/echo, which may behave differently).

Why do we need execution permission although we can run any script without it using "bash script file"?

I am wondering when and why do we need execution permission in linux although we can run any script without execute permission when we execute that script using the syntax bellow?
bash SomeScriptFile
Not all programs are scripts — bash for example isn't. So you need execute permission for executable programs.
Also, when you say bash SomeScriptFile, the script has to be in the current directory. If you have the script executable and in a directory on your PATH (e.g. $HOME/bin), then you can run the script without the unnecessary circumlocution of bash $HOME/bin/SomeScriptFile (or bash ~/bin/SomeScriptFile); you can simply run SomeScriptFile. This economy is worth having.
Execute permission on a directory is somewhat different, of course, but also important. It permits the 'class of user' (owner, group, others) to access files in the directory, subject to per-file permissions also allowing that.
Executing the script by invoking it directly and running the script through bash are two very different things.
When you run bash ~/bin/SomeScriptFile you are really just executing bash -- a command interpreter. bash in turns load the scripts and runs it.
When you run ~/bin/SomeSCriptFile directly, the system is able to tell this file is a script file and finds the interpreter to run it. There is a big of magic invoking the #! on the first line to look for the right interpreter.
The reason we run scripts directly is that the user (and system) couldn't know or care of the command we are running is a script or a compiled executable.
For instance, if I write a nifty shell script called fixAllIlls and later I decide to re-write it in C, as long a I keep the same interface, the users don't have to do anything different.
To them, it is just a program to run.
edit
The operating system checks permissions first for several reasons:
Checking permissions is faster
In the days of old, you could have SUID scripts, so one needed to check the permission bits.
As a result, it was possible to run scripts that you could not actually read the contents of. (That is still true of binaries.)

How to implement a new shell command?

I want to implement my own version of ps command & I am trying to understand how shell commands are implemented in Linux. Is it part of the Shell application or part of the module?
In my understanding, when I type ps in shell prompt: the actual implementation for ps can only reside in kernel & shell just invokes a kernel api/binary.
Now, if I want to add a new command (say myps): what should I modify in shell application? & What should I modify in kernel module?
How shell application knows the list of commands supported in individual modules (kernel, network, fs, etc) ?
Lastly, if (just for example) network module is configured & built only for ipv4 and then there is no point in supporting ipv6 commands in shell? How is this taken care?
ps is just a regular program (typically located at /bin/ps). You can find its location on your system by running which ps.
When you run a command (assuming it's not a builtin function, alias, etc. implemented by the shell itself), the shell searches the directories listed in your PATH environment variable. If you're using sh or bash as your shell, you can see it by running echo $PATH. The shell searches the directories in the order they're listed in PATH, and runs the first matching program it finds.
If you want to create a new version of ps, just write a program and put it in one of the directories on your PATH. Typically somewhere like /usr/local/bin/ (accessible by all users) or ~/bin (in your home directory). Or you could add a new directory to your PATH. No need to mess with the kernel or the shell itself, thankfully.

Why calling a script by "scriptName" doesn't work?

I have a simple script cmakeclean to clean cmake temp files:
#!/bin/bash -f
rm CMakeCache.txt
rm *.cmake
which I call like
$ cmakeclean
And it does remove CMakeCache.txt, but it doesn't remove cmake_install.cmake:
rm: *.cmake: No such file or directory
When I run it like:
$ . cmakeclean
it does remove both.
What is the difference and can I make this script work like an usual linux command (without . in front)?
P.S.
I am sure the both times is same script is executed. To check this I added echo meme in the script and rerun it in both ways.
Remove the -f from your #!/bin/bash -f line.
-f prevents pathname expansion, which means that *.cmake will not match anything. When you run your script as a script, it interprets the shebang line, and in effect runs /bin/bash -f scriptname. When you run it as . scriptname, the shebang is just seen as a comment line and ignored, so the fact that you do not have -f set in your current environment allows it to work as expected.
. script is short for source script which means the current shell executes the commands in the script. If there's an exit in there, the current shell will exit (and e. g. the terminal window will close).
This is typically used to modify the environment of the current shell (set variables etc.).
script asks the shell to fork itself, then exec the given script in the child process, and then wait in the father for the termination of the child. If there's an exit in the script, this will be executed by the child shell and thus only terminate this. The father shell stays intact and unaltered by this call.
This is typically used to start other programs from the current shell.
Is this about ClearCase? What did you do in your poor life where you've been assigned to work in the deepest bowels of hell?
For years, I was a senior ClearCase Administer. I haven't touched it in over a decade. My life is way better now. The sky is bluer, bird songs are more melodious, and my dread over coming to work every day is now a bit less.
Getting back to your issue: It's hard to say exactly what's going on. ClearCase does some wacky things. In a dynamic view, the ClearCase repository on Unix systems is hidden in the shell's environment. Now you see it, now you don't.
When you run a shell script, it starts up a new environment. If a particular shell variable is not imported, it is invisible that shell script. When you merely run cmakeclean from the command line, you are spawning a new shell -- one that does not contain your ClearCase environment.
When you run a shell script with a dot prefix like . cmakeclean, you are running that shell script in the current shell which contains your ClearCase environment. Thus, it can see your ClearCase view.
If you're using a snapshot view, it is possible that you have a $HOME/.bashrc that's changing directories on you. When a new shell environment runs in BASH (the default shell in MacOS X and Linux), it first runs $HOME/.bashrc. If this sets a particular directory, then you end up in that directory and not in the directory where you ran your shell script. I use to see this when I too was involved in ClearCase hell. People setup their .kshrc script (it was the days before BASH and most people used Kornshell) to setup their views. Unfortunately, this made running any other shell script almost impossible to do.

Location of cd executable

I read that the executables for the commands issued using exec() calls are supposed to be stored in directories that are part of the PATH variable.
Accordingly, I found the executables for ls, chmod, grep, cat in /bin.
However, I could not find the executable for cd.
Where is it located?
A process can only affect its own working directory. When an executable is executed by the shell it executes as a child process, so a cd executable (if one existed) would change that child process's working directory without affecting the parent process (the shell), hence the cd command must be implemented as a shell built-in that actually executes in the shell's own process.
cd is a shell built-in, unfortunately.
$ type cd
cd is a shell builtin
...from http://www.linuxquestions.org/questions/linux-newbie-8/whereis-cd-sudo-doesnt-find-cd-464767/
But you should be able to get it working with:
sh -c "cd /somedir; do something"
Not all utilities that you can execute at a shell prompt need actually exist as actual executables in the filesystem. They can also be so-called shell built-ins, which means – you guessed it – that they are built into the shell.
The Single Unix Specification does, in general, not specify whether a utility has to be provided as an executable or as a built-in, that is left as a private internal implementation detail to the OS vendor.
The only exceptions are the so-called special built-ins, which must be provided as built-ins, because they affect the behavior of the shell itself in a manner that regular executables (or even regular built-ins) can't (for example set, which sets variables that persist even after set exits). Those special built-ins are:
break
:
continue
.
eval
exec
exit
export
readonly
return
set
shift
times
trap
unset
Note that cd is not on that list, which means that cd is not a special built-in. In fact, according to the specification, it would be perfectly legal to implement cd as a regular executable. It's just not possible, for the reasons given by the other answers.
And if you scroll down to the non-normative section of the specification, i.e. to the part that is not officially part of the specification but only purely informational, you will find that fact explicitly mentioned:
Since cd affects the current shell execution environment, it is always provided as a shell regular built-in.
So, the specification doesn't require cd to be a built-in, but it's simply impossible to do otherwise.
Note that sometimes utilities are provided both as a built-in and as an executable. A good example is the time utility, which on a typical GNU system is provided both as an executable by the Coreutils package and as a shell regular built-in by Bash. This can lead to confusion, because when you do man time, you get the manpage of the time executable (the time builtin is documented in man builtins), but when you execute time you get the time built-in, which does not support the same features as the time executable whose manpage you just read. You have to explicitly run /usr/bin/time (or whatever path you installed Coreutils into) to get the executable.
According to this, cd is always a built-in command and never an executable:
Since cd affects the current shell execution environment, it is always provided as a shell regular built-in.
cd is part of the shell; an internal command. There is no binary for it.
The command cd is built-in in your command line shell. It could not affect the working directory of your shell otherwise.
I also searched the executable of "cd" and there is no such.
You can work with chdir (pathname) in C, it has the same effect.

Resources