chown does not set SGID - linux

I am trying to create a a directory with permissions 02770, so that the resultant permissions would be drwxrws---
When I run the below commands I get the expected behavior
rsam.svtest2.serendipity> (/home/svtest2)
$ mkdir abc
rsam.svtest2.serendipity> (/home/svtest2)
$ ls -lrt
drwxrwxr-x 2 svtest2 users 6 Apr 18 10:57 abc
rsam.svtest2.serendipity> (/home/svtest2)
$ chmod 02770 abc
rsam.svtest2.serendipity> (/home/svtest2)
$ ls -lrt
drwxrws--- 2 svtest2 users 6 Apr 18 10:57 abc
UPDATE#1
Following from above, after running mkdir and chmod on a directory, when I run chown the SGID bit gets cleared off.
rsam.svtest2.serendipity> (/home/svtest2)
$ chown svtest2:users abc
rsam.svtest2.serendipity> (/home/svtest2)
$ ls -lrt
drwxrwx--- 2 svtest2 users 6 Apr 18 10:57 abc
From the chown documentation,
Only a privileged process (Linux: one with the CAP_CHOWN capability)
may change the owner of a file. The owner of a file may change the
group of the file to any group of which that owner is a member. A
privileged process (Linux: with CAP_CHOWN) may change the group
arbitrarily.
The problem is that my user svtest does not have CAP_CHOWN capability.
Now the question boils down to - How to I get the user to have CAP_CHOWN capability?
It looks like there is some instruction here - SO - setting CAP_CHOWN
but I am yet to try it out.
However, when I run below C++ code(part of tuxedo server)
// Check if the directory exists and if not creates the directory
// with the given permissions.
struct stat st;
int lreturn_code = stat(l_string, &st);
if (lreturn_code != 0 &&
(mkdir(l_string, lpermission) != 0 ||
chmod(l_string, lpermission) != 0)) {
....
....
}
....
....
// Convert group name to group id into lgroup
if (chown(l_string, -1, lgroup) != 0) {
// System error.
}
The directory is created as below:
$ ls -l|grep DirLevel1
drwxrwx--- 2 svtest2 users 6 Apr 18 11:14 DirLevel1
Notice that the SGUID bit is not set as against when the commands were run directly as mentioned above.
Excerpt from strace for the operation:
5864 stat("/home/svtest2/data/server/log/DirLevel1/", 0x7ffd235f29f0) = -1 ENOENT (No such file or directory)
5864 mkdir("/home/svtest2/data/server/log/DirLevel1/", 02770) = 0
5864 chmod("/home/svtest2/data/server/log/DirLevel1/", 02770) = 0
5864 socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 15
5864 connect(15, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
5864 close(15) = 0
5864 socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 15
5864 connect(15, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
5864 close(15) = 0
5864 open("/etc/group", O_RDONLY|O_CLOEXEC) = 15
5864 fstat(15, {st_mode=S_IFREG|0644, st_size=652, ...}) = 0
5864 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f414d7c4000
5864 read(15, "root:x:0:\nbin:x:1:\ndaemon:x:2:\ns"..., 4096) = 652
5864 close(15) = 0
5864 munmap(0x7f414d7c4000, 4096) = 0
5864 chown("/home/svtest2/data/server/log/DirLevel1/", 4294967295, 100) = 0
5864 write(7, "\0\0\2~\6\0\0\0\0\0\21i\216\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\1\0\0"..., 638) = 638
5864 read(7, "\0\0\0\300\6\0\0\0\0\0\10\0\0\0\0\250\0\0\0\0\0\0\0\0\0(\0\0\0\0\0\0"..., 8208) = 192
5864 write(7, "\0\0\1}\6\0\0\0\0\0\3h\221\1\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\250\0\0"..., 381) = 381
5864 read(7, "\0\0\0\26\6\0\0\0\0\0\10\4\0\0\0\t\1\0\0\0\215\f", 8208) = 22
5864 msgsnd(43679799, {805306373, "y\0\0\0007\200\232\2\0\0\0\0\f\2\0\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0\0\0"...}, 516, IPC_NOWAIT) = 0
5864 msgrcv(43614264,
From http://man.sourcentral.org/RHEL7/2+chown,
When the owner or group of an executable file are changed by an
unprivileged user the S_ISUID and S_ISGID mode bits are cleared. POSIX
does not specify whether this also should happen when root does the
chown(); the Linux behavior depends on the kernel version. In case of
a non-group-executable file (i.e., one for which the S_IXGRP bit is
not set) the S_ISGID bit indicates mandatory locking, and is not
cleared by a chown().
The above highlights a possible scenario, but I'm not sure how that is applicable to my case because it is not executable file but a directory.

Since *nix system consider a file to be an executable by seeing the 'x' permission bit, I believe a searchable directory might be considered as an executable, too.

Related

crontab with ed by commands on stream, results in "no modification made"

I am trying to append a line to my crontab file. I know there are other ways to work around this problem, but still want to know what caused it. The command is run on raspberry pi 3 B+, raspbian lite is installed, with GNU ed 1.15, cron 3.0pl1-134+deb10u1.
The command that I'm stuck on is this:
$ echo -e 'a\n#asdf\n.\nwQ' | EDITOR=ed crontab -e
902
909
No modification made
I'm expecting it to add line #asdf at the end of my crontab file, but it doesn't.
Setting EDITOR='tee -a' as suggested on https://stackoverflow.com/a/30123606/8842387 does not solve the problem. So I guess it is the problem with cron.
Strangely enough, when I give ed commands from the keyboard directly, rather than streaming it, it just works. Maybe subshell creation caused the problem?
Here I'm attaching a few of the last lines from strace result.
$ echo -e 'a\n#asdf\n.\nwQ' | EDITOR=ed strace crontab -e
execve("/usr/bin/crontab", ["crontab", "-e"], 0x7ee54c14 /* 29 vars */) = 0
access("/etc/suid-debug", F_OK) = -1 ENOENT (No such file or directory)
...
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0\0\0\0"..., 4096) = 659
_llseek(3, -393, [266], SEEK_CUR) = 0
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0\0\0\0"..., 4096) = 393
close(3) = 0
getpid() = 18579
socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/dev/log"}, 110) = 0
send(3, "<78>Nov 20 15:31:25 crontab[1857"..., 56, MSG_NOSIGNAL) = 56
openat(AT_FDCWD, "crontabs/pi", O_RDONLY) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(4, "# Locale name alias data base.\n#"..., 4096) = 2995
read(4, "", 4096) = 0
close(4) = 0
openat(AT_FDCWD, "/usr/share/locale/en_GB.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_GB.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_GB/LC_MESSAGES/libc.mo", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1433, ...}) = 0
mmap2(NULL, 1433, PROT_READ, MAP_PRIVATE, 4, 0) = 0x76f50000
close(4) = 0
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "crontabs/pi/: fdopen: Permission"..., 39crontabs/pi/: fdopen: Permission denied) = 39
exit_group(1) = ?
+++ exited with 1 +++
openat(AT_FDCWD, "crontabs/pi", O_RDONLY) = -1 EACCES (Permission denied) looks a bit suspicious, but not sure why it opens the file read-only.
EDIT:
As suggested by #tink, I ran EDITOR=ed strace crontab -e to see what strace gives on an interactive session. The result was almost same (only varying on pid and fd numbers).
I noticed that running echo "..." | EDITOR=ed crontab -e exited with message No modification made but with strace the process halts without any messages. (EDITOR=ed strace crontab -e 2>&1 | grep "No mod" prints nothing). Guess the strace triggers different errors.
Following up on my VISUAL comment, these worked for me:
( unset VISUAL; printf '%s\n' a '#abcd' . wq | EDITOR=ed crontab -e )
printf '%s\n' a '#abcd' . wq | VISUAL=ed crontab -e
In my environment, both VISUAL and EDITOR are set to "vim"
Or, more roundabout, but don't need to monkey with env vars. This one also allows you to do it silently:
crontab <(printf '%s\n' a '#asdf' . '%p' | ed -s <(crontab -l))
I was doing the above on a Mac. On Linux, I can reproduce your observations, but can't explain them.
A small tweak to the last command works:
printf '%s\n' a '#asdf' . '%p' Q | ed -s <(crontab -l) | crontab -
TLDR; (sleep 1; echo -e 'a\n#asdf\n.\nwQ') | EDITOR=ed crontab -e works!
The problem was on crontab.
When I invoke crontab -e it creates a temporary copy of the user cron table in /tmp directory.
Then opens the temporary file with an editor specified by $EDITOR.
After the editing is done, crontab check if the file modification date have changed since its creation.
This is implemented in the patch that enables editing cron table via temporary file.
In my case, ed getting its command from stdin finished the editing too fast so that even a single digit of the modification timestamp of the temporary file had not been changed.
As crontab considered no human can make edition that fast, it assumes no modification made and discards it.
To bypass this behavior, I added sleep 1 before the release of the command.
This will hold ed to wait for its command from stdin after crontab created tempfile, which effectively lets the modification timestamp different.

Copy and move's command effect on inode

I interpret inode as a pointer to the actual place where the file is stored.
But I have problem understanding:
If I use cp file1 file2 in a place where file2 already exists, the inode doesn't change. And If there is originally a hard-link to file2, they now both point to the new file just copied here.
The only reason I can think of is that Linux interprets this as modifying
the file instead of deleting and creating a new file. I don't understand why it's designed this way?
But when I use mv file1 file2, the inode changes to the inode of file1.
You are correct in stating that cp will modify the file instead of deleting and recreating.
Here is a view of the underlying system calls as seen by strace (part of the output of strace cp file1 file2):
open("file2", O_WRONLY|O_TRUNC) = 4
stat("file2", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
stat("file1", {st_mode=S_IFREG|0664, st_size=3, ...}) = 0
stat("file2", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
open("file1", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=3, ...}) = 0
open("file2", O_WRONLY|O_TRUNC) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "hi\n", 65536) = 3
write(4, "hi\n", 3) = 3
read(3, "", 65536) = 0
close(4) = 0
close(3) = 0
As you can see, it detects that file2 is present (stat returns 0), but then opens it for writing (O_WRONLY|O_TRUNC) without first doing an unlink.
See for example POSIX.1-2017, which specifies that the destination file shall only be unlink-ed where it could not be opened for writing and -f is used:
A file descriptor for dest_file shall be obtained by performing
actions equivalent to the open() function defined in the System
Interfaces volume of POSIX.1-2017 called using dest_file as the path
argument, and the bitwise-inclusive OR of O_WRONLY and O_TRUNC as the
oflag argument.
If the attempt to obtain a file descriptor fails and the -f option is
in effect, cp shall attempt to remove the file by performing actions
equivalent to the unlink() function defined in the System Interfaces
volume of POSIX.1-2017 called using dest_file as the path argument. If
this attempt succeeds, cp shall continue with step 3b.
This implies that if the destination file exists, the copy will succeed (without resorting to -f behaviour) if the cp process has write permission on it (not necessarily run as the user that owns the file), even if it does not have write permission on the containing directory. By contrast, unlinking and recreating would require write permission on the directory. I would speculate that this is behind the reason why the standard is as it is.
The --remove-destination option on GNU cp will make it do instead what you thought ought to be the default.
Here is the relevant part of the output of strace cp --remove-destination file1 file2. Note the unlink this time.
stat("file2", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
stat("file1", {st_mode=S_IFREG|0664, st_size=3, ...}) = 0
lstat("file2", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
unlink("file2") = 0
open("file1", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=3, ...}) = 0
open("file2", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "hi\n", 65536) = 3
write(4, "hi\n", 3) = 3
read(3, "", 65536) = 0
close(4) = 0
close(3) = 0
When you use mv and the source and destination paths are on the same file filesystem, it will do an rename, and this will have the effect of unlinking any existing file at the target path. Here is the relevant part of the output of strace mv file1 file2.
access("file2", W_OK) = 0
rename("file1", "file2") = 0
In either case where an destination path is unlinked (whether explicitly by unlink() as called from cp --remove-destination, or as part of the effect of rename() as called from mv), the link count of the inode to which it was pointing will be decremented, but it will remain on the filesystem if either the link count is still >0 or if any processes have open filehandles on it. Any other (hard) links to this inode (i.e. other directory entries for it) will remain.
Investigating using ls -i
ls -i will show the inode numbers (as the first column when combined with -l), which helps demonstrate what is happening.
Example with default cp action
$ rm file1 file2 file3
$ echo hi > file1
$ echo world > file2
$ ln file2 file3
$ ls -li file*
49 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 10:43 file1
50 -rw-rw-r-- 2 myuser mygroup 6 Jun 13 10:43 file2
50 -rw-rw-r-- 2 myuser mygroup 6 Jun 13 10:43 file3
$ cp file1 file2
$ ls -li file*
49 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 10:43 file1
50 -rw-rw-r-- 2 myuser mygroup 3 Jun 13 10:43 file2 <=== exsting inode
50 -rw-rw-r-- 2 myuser mygroup 3 Jun 13 10:43 file3 <=== exsting inode
(Note existing inode 50 now has size 3).
Example with --remove-destination
$ rm file1 file2 file3
$ echo hi > file1
$ echo world > file2
$ ln file2 file3
$ ls -li file*
49 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 10:46 file1
50 -rw-rw-r-- 2 myuser mygroup 6 Jun 13 10:46 file2
50 -rw-rw-r-- 2 myuser mygroup 6 Jun 13 10:46 file3
$ cp --remove-destination file1 file2
$ ls -li file*
49 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 10:46 file1
55 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 10:47 file2 <=== new inode
50 -rw-rw-r-- 1 myuser mygroup 6 Jun 13 10:46 file3 <=== existing inode
(Note new inode 55 has size 3. Unmodified inode 50 still has size 6.)
Example with mv
$ rm file1 file2 file3
$ echo hi > file1
$ echo world > file2
$ ln file2 file3
$ ls -li file*
49 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 11:05 file1
50 -rw-rw-r-- 2 myuser mygroup 6 Jun 13 11:05 file2
50 -rw-rw-r-- 2 myuser mygroup 6 Jun 13 11:05 file3
$ mv file1 file2
$ ls -li file*
49 -rw-rw-r-- 1 myuser mygroup 3 Jun 13 11:05 file2 <== existing inode
50 -rw-rw-r-- 1 myuser mygroup 6 Jun 13 11:05 file3 <== existing inode
#alaniwi's answer covers what is is happening, but there's an implicit why here as well.
The reason cp works the way it does is to provide a way of replacing a file with mulitple names, having all of those names refer to the new file. When the destination of cp is a file that already exists, possibly with multiple names via hard or soft links, cp will make all of those names refer to the new file. There will be no 'orphan' references to the old file left over.
Given this command, it is pretty easy to get the 'just change the file for one name' behavior -- unlink the file first. Given just that as a primitive it would be very hard to implement the 'change all references to point to the new contents' behavior.
Of course, doing rm+cp has some race condition issues (it is two commands), which is why the install command got added in BSD unix -- it basically just does rm + cp, along with some checks to make it atomic in the rare case two people try to install to the same path simultaneously, as well as the more serious problems of someone reading from the file you're trying to install to (a problem with plain cp). Then the GNU version added options to backup the old version and various other useful bookkeeping.
An inode is a collection of metadata for a file, i.e. information about a file, in a Unix/ Unix-like filesystem. It includes permission data, last access/ modify time, file size, etc.
Notably, a file's name/ path is not part of the inode. A filename is just a human-readable identifier for an inode. A file can have one or more names, the number of which is represented in the inode by its number of "links" (hard links). The number associated with the inode, the inode number, which I believe you're interpreting as its physical location on disk, is rather simply a unique identifier for the inode. An inode does contain the location of the file on disk, but that is not the inode number.
So knowing this, the difference you're seeing is in how cp and mv function. When you cp a file you're creating a new inode with a new name and copying the contents of the old file to a new location on disk. When you mv a file all you're doing is changing one of its names. If the new name is already the name of another file, the name is disassociated with the old file (and the old file's link count is reduced by 1) and associated with the new file.
You can read more about inodes here.

What happens internally when `ls *.c` is executed?

I've got very interested in Linux internals recently, and currently trying to understand how things work.
I knew that when I type ls
opendir() - function is called;
readdir() - function called for each directory entry in directory data store;
stat() - function can be called to get additional information on files, if required.
Please correct me if I'm missing something or if it's wrong.
The part which is mystery to me is filename expansion(globbing).
I've compared the output of strace ls
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=270336, ...}) = 0
getdents(3, /* 14 entries */, 32768) = 440
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
write(1, "2q.c ds.c fglob fnoglob\n", 272q.c ds.c fglob fnoglob
and strace ls *.c,
stat("2q.c", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lstat("2q.c", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
stat("ds.c", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lstat("ds.c", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, "2q.c ds.c\n", 112q.c ds.c
) = 11
and from my limited knowledge can tell that in the first case, it indeed behaves as I expected open, stat followed by getdents.
But the later one, with shell globbing isn't clear to me, because it there's already a list of files, which match the pattern. Where does this list came from?
Thanks!
Shell globbing patterns on the command line are expanded by the shell before the utility is invoked.
You can see this by enabling tracing in the shell with set -x:
$ set -x
$ ls -l f*
+ ls -l file1 file2 file3
-rw-r--r-- 1 kk wheel 0 May 11 16:49 file1
-rw-r--r-- 1 kk wheel 0 May 11 16:49 file2
-rw-r--r-- 1 kk wheel 0 May 11 16:49 file3
As you can see, the shell tells you what command it invokes (at the + prompt), and at that point it has already expanded the pattern on the command line.
The ls command does not do filename globbing. In fact, if you single quote the globbing pattern to protect it from the shell, ls is bound to be confused:
$ ls -l 'f*'
+ ls -l f*
ls: f*: No such file or directory
(unless there's actually something in the current directory called f* of course).

Which one is PID1: /sbin/init or systemd

I'm using Arch Linux. I have read about systemd, and as I understand it, systemd is the first process, and it starts the rest of the processes.
But when I use:
ps -aux
The result shows that /sbin/init has PID 1. And when I use:
pstree -Apn
The result shows that systemd has PID 1. Which is correct? Is /sbin/init starting systemd?
They're probably both right.
$ sudo ls -ltrh /proc/1/exe
[sudo] password for user:
lrwxrwxrwx 1 root root 0 May 30 21:22 /proc/1/exe -> /lib/systemd/systemd
$ echo $(tr '\0' ' ' < /proc/1/cmdline )
/sbin/init splash
$ stat /sbin/init
File: '/sbin/init' -> '/lib/systemd/systemd'
Size: 20 Blocks: 0 IO Block: 4096 symbolic link
Device: 801h/2049d Inode: 527481 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-05-30 21:27:12.058023583 -0500
Modify: 2016-10-26 08:04:58.000000000 -0500
Change: 2016-11-19 11:38:45.749226284 -0600
Birth: -
The commands above show us:
what is the file corresponding to pid 1's executable image?
what was invoked (passed to exec()) when pid 1 was started?
what are the characteristics of the path at /sbin/init?
On my system, /sbin/init is a symlink to "/lib/systemd/systemd". This is likely similar to your system. We can see what information ps -aux is using by straceing it.
$ strace ps -aux
...
open("/proc/1/cmdline", O_RDONLY) = 6
read(6, "/sbin/init\0splash\0", 131072) = 18
read(6, "", 131054) = 0
close(6) = 0
...
and likewise for pstree:
$ strace pstree -Apn
...
getdents(3, /* 332 entries */, 32768) = 8464
open("/proc/1/stat", O_RDONLY) = 4
stat("/proc/1", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "1 (systemd) S 0 1 1 0 -1 4194560"..., 8192) = 192
read(4, "", 7168) = 0
open("/proc/1/task", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
...
So the difference in output is because they use different sources of information. /proc/1/cmdline tells us how the process was invoked. Whereas /proc/1/stat shows that the process' name is systemd.
$ cat /proc/1/stat
1 (systemd) S 0 1 1 0 -1 4194560 34371 596544 1358 3416 231 144 298 1758 20 0 1 0 4 190287872 772 18446744073709551615 1 1 0 0 0 0 671173123 4096 1260 0 0 0 17 2 0 0 12188 0 0 0 0 0 0 0 0 0 0

Linux proc/pid/fd for stdout is 11?

Executing a script with stdout redirected to a file. So /proc/$$/fd/1 should point to that file (since stdout fileno is 1). However, actual fd of the file is 11. Please, explain, why.
Here is session:
$ cat hello.sh
#!/bin/sh -e
ls -l /proc/$$/fd >&2
$ ./hello.sh > /tmp/1
total 0
lrwx------ 1 nga users 64 May 28 22:05 0 -> /dev/pts/0
lrwx------ 1 nga users 64 May 28 22:05 1 -> /dev/pts/0
lr-x------ 1 nga users 64 May 28 22:05 10 -> /home/me/hello.sh
l-wx------ 1 nga users 64 May 28 22:05 11 -> /tmp/1
lrwx------ 1 nga users 64 May 28 22:05 2 -> /dev/pts/0
I have a suspicion, but this is highly dependent on how your shell behaves. The file descriptors you see are:
0: standard input
1: standard output
2: standard error
10: the running script
11: a backup copy of the script's normal standard out
Descriptors 10 and 11 are close on exec, so won't be present in the ls process. 0-2 are, however, prepared for ls before forking. I see this behaviour in dash (Debian Almquist shell), but not in bash (Bourne again shell). Bash instead does the file descriptor manipulations after forking, and incidentally uses 255 rather than 10 for the script. Doing the change after forking means it won't have to restore the descriptors in the parent, so it doesn't have the spare copy to dup2 from.
The output of strace can be helpful here.
The relevant section is
fcntl64(1, F_DUPFD, 10) = 11
close(1) = 0
fcntl64(11, F_SETFD, FD_CLOEXEC) = 0
dup2(2, 1) = 1
stat64("/home/random/bin/ls", 0xbf94d5e0) = -1 ENOENT (No such file or
+++++++>directory)
stat64("/usr/local/bin/ls", 0xbf94d5e0) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/ls", 0xbf94d5e0) = -1 ENOENT (No such file or directory)
stat64("/bin/ls", {st_mode=S_IFREG|0755, st_size=96400, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
+++++++>child_tidptr=0xb75a8938) = 22748
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 22748
--- SIGCHLD (Child exited) # 0 (0) ---
dup2(11, 1) = 1
So, the shell moves the existing stdout to an available file descriptor above 10 (namely, 11), then moves the existing stderr onto its own stdout (due to the >&2 redirect), then restores 11 to its own stdout when the ls command is finished.

Resources