A confusion in APUE2(about symbolic link in UNIX) - linux

The original text is below.It is in Section 4.22
The program in Figure 4.24 changes to a specific directory and then calls getcwd to print the working directory. If we run the program, we get
$ ./a.out
cwd = /var/spool/uucppublic
$ ls -l /usr/spool
lrwxrwxrwx 1 root 12 Jan 31 07:57 /usr/spool -> ../var/spool
Note that chdir follows the symbolic link as we expect it to, from Figure 4.17 .but when it goes up the directory tree, getcwd has no idea when it hits the /var/spool directory that it is pointed to by the symbolic link /usr/spool. This is a characteristic of symbolic links.
What does the author really mean by saying that the program hits the /var/spool?
What is the characteristic of symbolic links pointed out by the author?
I did not really understand.

Note that some shells, notably bash, keep track of whether you arrived at a given directory by chasing a symbolic link, and print the current directory accordingly. At least bash has options to cd to do a physical or logical change directory:
cd [-L|-P] [dir]
Change the current directory to dir. The variable HOME is the default dir. [...] The -P option says to use the physical directory structure instead of following symbolic links (see also the -P option to the set builtin command); the -L option forces symbolic links to be followed. An argument of - is equivalent to $OLDPWD. If a non-empty directory name from CDPATH is used, or if - is the first argument, and the directory change is successful, the absolute pathname of the new working directory is written to the standard output. The return value is true if the directory was successfully changed; false otherwise.
In the scenario shown, where /usr/spool is a symbolic link to /var/spool, then:
$ pwd
/
$ cd /usr/spool/uucppublic
/usr/spool/uucppublic
$ cd -L ..
/usr/spool
$ cd /usr/spool/uucppublic
/usr/spool/uucppublic
$ cd -P ..
/var/spool
$
For most people, a plain cd .. would do the same as cd -L ... You can choose to have bash do the same as cd -P .. instead if you prefer (using set -P or set -L).
The process of finding the pathname of the current directory should be understood too. Logically, the process (kernel) opens the current directory (.) and reads the inode number (and device number). It then opens the parent directory (..), and reads entries from that until it finds one with the matching inode number (and device number). This then gives it the last component of the pathname. It can now repeat the process, finding the the inode number of the next directory up, and opening its parent directory (../..), etc, until it reaches the root directory (where the inode number for both . and .. is the same, and the value is conventionally 2). Note that this even works across mount points. Beware of auto-mounted remote (NFS) file systems, though; it can be really slow if you scan through a directory containing several hundred automounted machines - as the naïve search outline above mounts all the machines until it finds the correct one. So, actual getcwd() functions are cleverer than this, but it explains how the path of the current directory is found. And it also shows why the process will not encounter /usr/spool when evaluating the directory under /var/spool/uucppublic - it simply never opens the /usr directory.
Note that the realpath() function (system call) takes a name possibly referencing symlinks and resolves it to a name that contains no symlinks at all. Passed /usr/spool/uucppublic, it would return /var/spool/uucppublic, for example.

Expanding on what #undor_gongor wrote:
Each process has a current working directory. It's not stored as the path name of the directory; it's a reference to the directory itself.
If it were stored as a path name, then the getcwd() function's job would be trivial: just print the path name. Instead, it has to readi the current directory, open its .. entry, then open that directory's .. entry, and so forth until it reaches the root (i.e., a directory whose .. entry points to the directory itself). It builds up the full path of the current directory in reverse order as it does this.
Since .. can't be a symlink, this process is not affected by symbolic links.
(Shells might have a $PWD or $CWD variable, or a pwd built-in, that is affected by symlinks; these typically work by remembering the string that was passed to cd or pushd.)

Assume you have a symlink /usr/spool pointing to /var/spool.
It says if you follow that symlink (e.g. cd /usr/spool), you end up in the pointed-to directory (/var/spool). Then, the information that you followed a symlink is lost. You are in /var/spool as if you had done cd /var/spool directly.
A further cd .. brings you to /var (as opposed to /usr).
UPDATE:
As pointed out by Keith Thompson and Jonathan Leffler, there are some shells that do remember the path you followed (i.e. /usr/spool). In such shells, cd ..
would go to /usr/. However, programs started from such a shell would still see /var/spool as the working directory.
This is probably the reason the author let you write a program for displaying cwd (to work-around such shells' internals).

Related

What is the Root directoy in a Linux System? [duplicate]

When creating filepaths and URLs, I noticed that many times the path starts with ./ or ~/.
What is the difference between filepaths that start with ./ and ~/?
What do each of them mean?
For the sake of completeness ...
Just path is a file or directory named path in the current directory.
./path is a file or directory named path in the current directory, with the directory spelled out. The dot directory . represents the current directory, and path is the name of the file or directory within this directory.
~/path is a shorthand for $HOME/path where $HOME is a variable which refers to your home directory. Typically your home directory will be somewhere like /home/you or /Users/you where you is your account name. (The command echo "$HOME" will display your home directory.) The expanded value is an absolute path (unless you have messed up the value of $HOME thoroughly), as indicated by the initial slash.
/path is an absolute path which refers to a file or directory named path which is in the root directory /. Every file on Unix is ultimately somewhere in the directory tree which starts with the root directory.
A file name which begins with $ includes the value of a shell variable in its name (like for example $HOME above); you have to know the value of that variable to determine whether it ends up containing a relative or an absolute path. Similarly, ~ at the beginning of a file name gets replaced ("expanded") by the shell to a different string, as outlined above.
(Technically, it's possible for a file name to begin with a literal dollar sign or tilde, too; you would then have to quote or backslash-escape that character to avoid having the shell expand it to something else. This is rather inconvenient, so these file names tend to be rare in practice.)
In the following exposition, we refer to the result of any such replacements, and ignore the complication of possible quoting.
Every file name which begins with / is an absolute path (aka full path) which explains how to reach a particular node starting from the root directory. For example, /var/tmp/you/reminder.txt refers to a file or directory reminder.txt (probably a file, judging from the name; but Unix doesn't care what you call your files or directories) which is in the directory you which is in the directory tmp which is in the directory var which is in the root directory.
Every file name which doesn't begin with / is a relative path which indicates how to reach a particular file or directory starting from the current directory. The special directory .. is the parent directory (that is, the directory which contains this directory) and the special directory . is the current directory. So path/there refers to the file or directory there inside the directory path in the current directory; and (hover the mouse over the gray area to display the spoiler)
there/.././and/back/.. is a (wicked complicated) way to refer to the directory and in the current directory, where we traverse the there directory and then move back to the current directory; then stay in the current directory; then refer to the directory back inside the directory and, but then move back to the parent directory of that, ending up with ./and.
In addition to ~/ for the current user's home directory, some shells and applications allow the notation ~them/ to refer to the home directory of the user account them. Also, some web server configurations allow each user to have a public web site in their directory ~/public_html and the URL notation http://server/~them/ would serve up the site of the user account them for outside visitors.
The current directory is a convenience which the shell provides so you don't have to type long paths all the time. You can, if you want to.
/bin/ls /home/you/Documents/unix-101/directories.txt
is a longwinded but perfectly valid way to say (assuming you are in your home directory),
ls Documents/unix-101/directories.txt
You could also say
cd Documents/unix-101
ls directories.txt
and until you cd again, all your commands will run in this directory.
See What exactly is current working directory? for a longer exposition of this related concept.
A "directory" is sometimes called a "folder" by people who are not yet old enough to prefer the former.
Tangentially, don't confuse the directory name . with the Bourne shell command which comprises a single dot (also known by its Bash alias source). The command
. ./scriptname
runs the commands from the file ./scriptname in the context of the current shell instance, as opposed to in a separate subshell (which is what just ./scriptname does). In other words, this command line invokes the dot command on a file scriptname in the dot directory.
The Bourne shell (and derivatives like Bash, Zsh, etc) use single quotes to prevent variable expansion and wildcard expansion, and double quotes to permit variable expansion, but inhibit wildcard expansion in a string. The quoting rules on Windows are different, and generally use double quotes to keep whitespace-separated values as a single string (and % instead of $ for variable substitutions).
./ means "starting from the current directory". . refers to the current working directory, so something like ./foo.bar would be looking for a file called foo.bar in the current directory. (As a side note, .. means refers to the parent directory of the current directory. So ../foo.bar would be looking for that file one directory above.)
~/ means "starting from the home directory". This could have different meanings in different scenarios. For example, in a Unix environment ~/foo.bar would be looking for a file called foo.bar in your home directory, something like /home/totzam/foo.bar. In many web applications, ~/foo.bar would be looking for a file called foo.bar in the web application root, something like /var/http/mywebapp/foo.bar.
./ is the current directory
~/ is the home directory of the current user
./ means that path is relative to your current position.
~/ means that path is relative to your home directory.
I will explain a simple example of it. As developers mentioned:
./ is current directory.
~/ is the home directory of the current user.
How both of the file path expressions can help us? Suppose you want to execute a script (.sh) and you're in the same directory where file exists then you can simply do it ./filename.sh
I mostly use ~/ to access my home directory files like .bashrc when I want to add any config in it. It's easier since the file path expression (for home directory) feels much easier and makes accessibility to the file from anywhere, without worrying about the path or changing the path.
. represents current directory
.. represents the parent directory
~ represents the home directory for the current user. Home directory is also represented by HOME env variable. you can do echo $HOME on the shell to see it.
These are generally used to specify relative paths. The / in the end of each notation is a separator that you can use when using these notations together.
Ex:
$ cd ../.. # Go 2 directories backwards
$ cd ~ # Takes you to $HOME directory
$ cd . # Does nothing :) As it literally means go to the directory that you are already present in.
$ cd ~/dir1 $ go to `$HOME/dir1`
On Unix, in any directory if you do ls -a you would see that . and .. will be mentioned (even for empty directory). Like mentioned, these have special meaning and are generated by default in Unix systems and are generally helpful to specify relative paths (i.e, path to a different directory relative to your current directory)
cd command is harmless. So, just play around by combining notations with cd command. You will eventually get a grip of them.

Odd behaviour of pwd with symlinks in terminal tabs

If I create a symlink through
ln -s /path/to/linked/dir current/path/link_name
and change the directory to current/path/link_name via
cd link_name
then I can check where I am using pwd-command. The output will be
current/path/link_name
But if I use some terminal emulator, such as terminator, konsole or others, I can split the tab or create a new tab in the same directory. The output of the pwd-command in the newly created tab will be
/path/to/linked/dir
In many cases, this is not convenient. Does anybody know how to change this behaviour in some terminal emulator(s)?
P.S. I also noticed that the output of ls typed from /current/path/link_name is the same as the output of ls typed from /path/to/linked/dir.
You can't. The reason is that you lose the information on how did you get there after the system call has executed. Some terminal emulators and mainly the bash(1) shell try to remember this, and implement pwd as an internal command, to cope with this scenarios. But in general if you try
/bin/pwd
You'll discover that all the information about how did you get to that final directory was lost in the course of time.
Ask yourself how can the /bin/pwd work and how can it determine the directory you are on, and you'll answer yourself the question:
The system maintains a current directory (the pwd command inherits this from its parent shell) in the system data for each process, but to save resources, it only stores the inode number of the directory that is actually your current directory (not actually, it maintains a reference to the in-core inode structure). It doesn't store the path you used to locate it, and it stores that info only to be able to get a starting point when you ask for a relative path when opening a file. The problem is the same as determining which directory a multiple linked file belongs to... no parent directory is stored for a file, as it can be in multiple directories linking to it all at the same time... this is also true for directories, but they have an .. entry inside themselves that links to their parents (their true parents, as one directory is now allowed to belong to different directories by means on normal links, this is forbidden by system, and ensured by the mkdir(2) system call) The pwd(1) commands uses precisely these links to find the parent directory (and then find the current dir in the parent directory, by searching for the inode number of the current directory on it), until this algorithm leads to the same inode (the root directory has this special characteristic, its .. entry points again to itself) so it stops going up. pwd can only work because it is following directories, and never files.
Terminator 1.90 does what you want. In an example session:
$ cd -- "$(mktemp --directory)"
$ mkdir a
$ ln -s a b
$ cd b
Press Ctrl-Shift-e (or o, or t). At this point I'm still in b.

No man page for the cd command

Ubuntu Linux 15.10 - I just noticed that there is no man page for cd
This seems a bit strange.
I tried:
man cd
at the cmd line and I get back
No manual entry for cd
I was trying to find documentation on
cd -
which is super-handy for flipping between the last dir and the current dir
and cd --
which seems to be an alias for
cd ~
Am I missing something very obvious here, or should the man page be present?
cd is not a command, it's built into your shell. This is necessary because your current working directory is controlled by the PWD environment variable named after the pwd or "print working directory" command.
The environment variables of a parent process cannot be changed by a child process. So if your shell ran /bin/cd which changed PWD it would only affect /bin/cd and anything it ran. It would not change the shell's PWD.
Some systems, like OS X and CentOS, map the cd man page to builtin which lists all the shell built ins and lets you know you should look at your shell's man page.
You can check what shell you have with echo $SHELL, it's probably bash.
cd is a builtin shell command.
$ type cd
cd is a shell builtin
You can open a help page for cd on Bash with
$ help cd
Which currently shows (Ubuntu 16.04):
$ help cd
cd: cd [-L|[-P [-e]] [-#]] [dir]
Change the shell working directory.
Change the current directory to DIR. The default DIR is the value of the
HOME shell variable.
The variable CDPATH defines the search path for the directory containing
DIR. Alternative directory names in CDPATH are separated by a colon (:).
A null directory name is the same as the current directory. If DIR begins
with a slash (/), then CDPATH is not used.
If the directory is not found, and the shell option `cdable_vars' is set,
the word is assumed to be a variable name. If that variable has a value,
its value is used for DIR.
Options:
-L force symbolic links to be followed: resolve symbolic links in
DIR after processing instances of `..'
-P use the physical directory structure without following symbolic
links: resolve symbolic links in DIR before processing instances
of `..'
-e if the -P option is supplied, and the current working directory
cannot be determined successfully, exit with a non-zero status
-# on systems that support it, present a file with extended attributes
as a directory containing the file attributes
The default is to follow symbolic links, as if `-L' were specified.
`..' is processed by removing the immediately previous pathname component
back to a slash or the beginning of DIR.
Exit Status:
Returns 0 if the directory is changed, and if $PWD is set successfully when
-P is used; non-zero otherwise.
Unfortunately, it does not answer your questions. There is documentation that does, however.
You can get to it with
$ man builtins
It opens many pages of help with less, my default viewer. I can find the help for cd by pressing the / key, then typing cd, then Enter, and pressing n twice gets me to the third instance of the substring, and the help, which reads:
cd [-L|[-P [-e]] [-#]] [dir]
Change the current directory to dir. if dir is not supplied,
the value of the HOME shell variable is the default. Any addi‐
tional arguments following dir are ignored. The variable CDPATH
defines the search path for the directory containing dir: each
directory name in CDPATH is searched for dir. Alternative
directory names in CDPATH are separated by a colon (:). A null
directory name in CDPATH is the same as the current directory,
i.e., ``.''. If dir begins with a slash (/), then CDPATH is not
used. The -P option causes cd to use the physical directory
structure by resolving symbolic links while traversing dir and
before processing instances of .. in dir (see also the -P option
to the set builtin command); the -L option forces symbolic links
to be followed by resolving the link after processing instances
of .. in dir. If .. appears in dir, it is processed by removing
the immediately previous pathname component from dir, back to a
slash or the beginning of dir. If the -e option is supplied
with -P, and the current working directory cannot be success‐
fully determined after a successful directory change, cd will
return an unsuccessful status. On systems that support it, the
-# option presents the extended attributes associated with a
file as a directory. An argument of - is converted to $OLDPWD
before the directory change is attempted. If a non-empty direc‐
tory name from CDPATH is used, or if - is the first argument,
and the directory change is successful, the absolute pathname of
the new working directory is written to the standard output.
The return value is true if the directory was successfully
changed; false otherwise.
Look for the - argument about the seventh line from the end:
An argument of - is converted to $OLDPWD before the directory change is attempted.
Note that there is no -- argument - which seems to mean that it actually ignores it.
relevant excerpt from the bash man page covering usage of cd -
cd [-L|[-P [-e]] [-#]] [dir]
Change the current directory to dir.
...
An argument of -
is converted to $OLDPWD before the directory change is attempted. If a non-
empty directory name from CDPATH is used, or if - is the first argument, and
the directory change is successful, the absolute pathname of the new working
directory is written to the standard output. The return value is true if the
directory was successfully changed; false otherwise.

How exactly does the cd ./ command work in Bash?

I'm just curious, when I'm using the cd command in bash, commands
cd foobar
and
cd ./foobar
work in the same way. I understand, that ./ refers to the current catalogue directory, but why then does "cd foobar" work? Is it just default, that when I'm not writing ./ on the beginning, program adds it on its own, or is it more complicated?
The cd command in bash (somewhat indirectly, as described below) invokes the chdir() syscall, behavior of which is specified by POSIX. The cd shell comand itself also has a POSIX specification for its behavior, in perhaps more detail than is appropriate for an accessible/readable definition.
Yes, the default for all operations (not only chdir() but also fopen() and others) is the current working directory.
This isn't specific to bash, but is operating-system-level behavior: "Current working directory" is part of the metadata about each process tracked by the kernel, and impacts filesystem-level requests made to the kernel: Any program, in any language, can call chdir("foo") or open("foo", O_RDONLY), and behavior will be to look for foo in the current directory, as inherited from the parent process or modified with prior chdir() calls.
That said, for the purposes of bash, cd ./foo is more specific than merely cd foo: The former says that you explicitly want the foo subdirectory of the current working directory. In bash, if the CDPATH shell variable is set, then cd foo will look in all directories listed in it, whereas cd ./foo explicitly only asks for foo under the current working directory.
Try this experiment:
# setup
tempdir=$(mkdir -d "${TMPDIR:-/tmp}/cdpath-test.XXXXXX")
mkdir -p "$tmpdir/i-am-a-test-directory"
CDPATH=".:$tempdir"
# this works, because CDPATH is honored
cd /
cd i-am-a-test-directory
# this does not, because you're explicitly asking for a directory that does not exist
cd /
cd ./i-am-a-test-directory
# cleanup
rm -rf "$tempdir"
unset CDPATH
Ignoring the CDPATH environment variable for the time being, the name after cd is a directory path name. There are only two sorts of path name in Unix:
absolute names that begin with a /
relative names which don't start with a /
All relative names are treated by the kernel as relative to the current directory — as if the name was prefixed by ./.
If you have CDPATH set, it complicates the search, and it is then conceivable that cd somewhere and cd ./somewhere land you in different directories. A directory name with no leading / and with no leading . or .. component (where cd .hidden doesn't count — the leading component is .hidden— but cd ./visible or cd ../visible do count) is searched for using CDPATH.
For example, consider this tree structure (only directories shown):
. - current directory
./somewhere
./src
./src/somewhere
Suppose you have CDPATH=src:. Then cd somewhere will go to ./src/somewhere and cd ./somewhere will go (unsurprisingly) to ./somewhere. If you type a name such as cd src/somewhere, the cd command will search for a sub-directory /xyz/pqr/src/somewhere for each directory /xyz/pqr on CDPATH in turn. I perpetually use CDPATH (7 directories on my own machines; 14 on my work machines).
Rule of thumb: use CDPATH=:…whatever…
That puts the current directory first on the search path, and is usually the most sane behaviour.
A similar story applies to searching for ordinary external (non-aliased, non-function, non-builtin) commands. Command names that start with a slash are absolute pathnames; command names containing a slash are relative to the current directory; command names without a slash are searched for on $PATH.

symbolic link without expanding $HOME or "~"?

the basic idea is that I want to link to path that's relative to $HOME, rather than explicitly expand the $HOME variable, as I want to make sure the link works on multiple machines, e.g.,
when I do
ln -s ~/data datalnk
I want it to be directed to directory /home/user/data on one machine which has a user $HOME of /home/user, and to /home/machine/user/data on another machine which has a user $HOME of /home/machine/user/data.
I cannot create a symbolic link on the second machine using
ln -s /home/machine/user /home/user
because I don't have the permission to do that, and I cannot link relative paths as the two machines have different hierarchies of directories.
anyideas on possible ways to fix or circumvent this?
EDIT:
what I am really trying to accompanish is to make the same link work on two macihnes, where the targets have the same directories in terms of their relative path to $/HOME only, not their absolute path, and not their relative path to the link either.
tl,dr it won't work
You can use an escaping mechanism such as single-quotes to get the ~ into the symbolic link:
> cd ~
> echo hello > a
> ln -s '~/a' b
However, ~ is a shell expansion and is not understood by the filesystem (actually, to the filesystem it's "just another character"). This is a good thing -- want the file-system layer to know about environment variables, as ~ is generally determined by $HOME?
> ls -l b
lrwxrwxrwx 1 root root 3 Oct 27 17:39 b -> ~/a
> ls b
ls: b: No such file or directory
You could still "manually" look at said symbolic link entries (as done with ls -l), but that would have to be done in a non-transparent fashion by a program (think of a ".LNK" in Windows). As can be seen, the filesystem just doesn't understand ~.
Happy sh'ing.
First of all: It can't be done directly. Symbolic links are plain text files, no extensions are performed. If you can't formulate a fixed relative or absolute path to the place you are referring, you can't symbolically link to it.
You can build a script to put links to appropriate directories in appropriate places, but the best way depends on your application.
The only way to make symlinks dynamic in this way is to use a relative path instead of an absolute path. In other words, don't start your path with /.
For example:
cd
ln -s data datalnk
At runtime your app or script will need to refer to ~/datalnk or $HOME/datalnk.
You haven't really said what you're trying to accomplish, so I can't really tell whether I'm solving your problem or suggesting that you need to go at it a different way.

Resources