socket fd from /proc/$PID/fd seems invalid - linux

I know pid of the process, and I need to obtain socket fd used by it, so I look for it in /proc/$pid/fd, for instance:
$ ls -la /proc/1442/fd | grep socket
lrwx------ 1 root root 64 Jan 23 16:22 7 -> socket:[21807]
$
Now, when I pass the value 7 representing socket descriptor to getsockopt() I'm getting EBADF error. Is it not allowed to do this from another process even with root privileges?
What am I doing wrong?

File descriptors are per-process. They are not shared between processes.
If you want to access a file descriptor owned by another process, you can sometimes open() the path in /proc/<pid>/fd to get a copy of it. However, this only works on normal files; it doesn't work on sockets. (This question addresses the internal details.)
So, in short, no. There's no straightforward way I'm aware of for one process to "take over" a socket from another process, without that process's cooperation.

It seems you can "takeover" a socket being root, look:

Related

What kind of file is the smp_affinity-file?

The IRQ affinity can be set by writing a bit mask to /proc/irq/<irqid>/smp_affinity.
I guess there is a kernel module behind smp_affinity, however, ls tells me it is a normal file:
# ls
-rw-r--r-- 1 root root 0 Feb 9 16:06 smp_affinity
So I wonder, what kind of file /proc/irq/<irqid>/smp_affinity is?
Read about procfs - https://man7.org/linux/man-pages/man5/procfs.5.html https://en.wikipedia.org/wiki/Procfs etc.
smp_affinity is a file inside /proc filesystem. File operation on that file are handled specially by the kernel. Writing or reading - instead of storing or retrieving the data using some non-volatile medium - the kernel executes special function with special semantics instead.
The file would be created somewhere in kernel/irq/proc.c.

Identify other end of a unix domain socket connection

I'm trying to figure out what process is holding the other end of a unix domain socket. In some strace output I've identified a given file descriptor which is involved in the problem I'm currently debugging, and I'd like to know which process is on the other end of that. As there are multiple connections to that socket, simply going by path name won't work.
lsof provides me with the following information:
dbus-daem 4175 mvg 10u unix 0xffff8803e256d9c0 0t0 12828 #/tmp/dbus-LyGToFzlcG
So I know some address (“kernel address”?), I know some socket number, and I know the path. I can find that same information in other places:
$ netstat -n | grep 12828
unix 3 [ ] STREAM CONNECTED 12828 #/tmp/dbus-LyGToFzlcG
$ grep -E '12828|ffff8803e256d9c0' /proc/net/unix
ffff8803e256d9c0: 00000003 00000000 00000000 0001 03 12828 #/tmp/dbus-LyGToFzlcG
$ ls -l /proc/*/fd/* 2>/dev/null | grep 12828
lrwx------ 1 mvg users 64 10. Aug 09:08 /proc/4175/fd/10 -> socket:[12828]
However, none of this tells me what the other end of my socket connection is. How can I tell which process is holding the other end?
Similar questions have been asked on Server Fault and Unix & Linux. The accepted answer is that this information is not reliably available to the user space on Linux.
A common suggestion is to look at adjacent socket numbers, but ls -l /proc/*/fd/* 2>/dev/null | grep 1282[79] gave no results here. Perhaps adjacent lines in the output from netstat can be used. It seems like there was a pattern of connections with and without an associated socket name. But I'd like some kind of certainty, not just guesswork.
One answer suggests a tool which appears to be able to address this by digging through kernel structures. Using that option requires debug information for the kernel, as generated by the CONFIG_DEBUG_INFO option and provided as a separate package by some distributions. Based on that answer, using the address provided by lsof, the following solution worked for me:
# gdb /usr/src/linux/vmlinux /proc/kcore
(gdb) p ((struct unix_sock*)0xffff8803e256d9c0)->peer
This will print the address of the other end of the connection. Grepping lsof -U for that number will provide details like the process id and the file descriptor number.
If debug information is not available, it might be possible to access the required information by knowing the offset of the peer member into the unix_sock structure. In my case, on Linux 3.5.0 for x86_64, the following code can be used to compute the same address without relying on debugging symbols:
(gdb) p ((void**)0xffff8803e256d9c0)[0x52]
I won't make any guarantees about how portable that solution is.
Update: It's been possible to to do this using actual interfaces for a while now. Starting with Linux 3.3, the UNIX_DIAG feature provides a netlink-based API for this information, and lsof 4.89 and later support it. See https://unix.stackexchange.com/a/190606/1820 for more information.

running slattach on pseudo tty

I try to open a network connection through a pair of pseudo tty's on linux os.
# slattach -v /dev/ptmx
cslip started on /dev/ptmx interface sl0
OK, this was the "creating side" for the pseudo tty.
I can look in /dev/pts and find the new pty there.
If I now try to use slattach also on this side I got:
slattach -v /dev/pts/3
slattach: tty_open(/dev/pts/3, RW): Input/output error
I traced with strace:
28 5505 write(1, "slattach: tty_open: trying to op"..., 46) = 46
29 5505 open("/dev/pts/3", O_RDWR|O_NONBLOCK) = -1 EIO (Input/output error)
30 5505 write(2, "slattach: tty_open(/dev/pts/3, R"..., 55) = 55
31 5505 exit_group(3)
All this happens on different distros of ubuntu, tested on 10.04 and 11.04, both are failing.
What I'm doing wrong?
You may want to take a look at the man page pty(7).
Basically, /dev/ptmx uses the Unix 98 pseudo-terminal interface and requires that your program uses grantpt(3) and unlockpt(3). Here, slattach (the one that opens /dev/ptmx, not the other one) doesn't do so, and any program that tries to open the slave pseudo-terminal associated to the master will fail, as you experienced.
You can force slattach to do grantpt() and unlockpt() by overloading the open() call with an external
routine, see this example

Less gets keyboard input from stderr?

I'm taking a look at the code to the 'less' utility, specifically how it gets keyboard input. Interestingly, on line 80 of ttyin.c, it sets the file descriptor to read from:
/*
* Try /dev/tty.
* If that doesn't work, use file descriptor 2,
* which in Unix is usually attached to the screen,
* but also usually lets you read from the keyboard.
*/
#if OS2
/* The __open() system call translates "/dev/tty" to "con". */
tty = __open("/dev/tty", OPEN_READ);
#else
tty = open("/dev/tty", OPEN_READ);
#endif
if (tty < 0)
tty = 2;
Isn't file descriptor 2 stderr? If so, WTH?! I thought keyboard input was sent through stdin.
Interestingly, even if you do ls -l * | less, after the file finishes loading, you can still use the keyboard to scroll up and down, but if you do ls -l * | vi, then vi will yell at you because it doesn't read from stdin. What's the big idea? How did I end up in this strange new land where stderr is both a way to report errors to the screen and read from the keyboard? I don't think I'm in Kansas anymore...
$ ls -l /dev/fd/
lrwx------ 1 me me 64 2009-09-17 16:52 0 -> /dev/pts/4
lrwx------ 1 me me 64 2009-09-17 16:52 1 -> /dev/pts/4
lrwx------ 1 me me 64 2009-09-17 16:52 2 -> /dev/pts/4
When logged in at an interative terminal, all three standard file descriptors point to the same thing: your TTY (or pseudo-TTY).
$ ls -fl /dev/std{in,out,err}
lrwxrwxrwx 1 root root 4 2009-09-13 01:57 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 2009-09-13 01:57 /dev/stdout -> fd/1
lrwxrwxrwx 1 root root 4 2009-09-13 01:57 /dev/stderr -> fd/2
By convention, we read from 0 and write to 1 and 2. However, nothing prevents us from doing otherwise.
When your shell runs ls -l * | less, it creates a pipe from ls's file descriptor 1 to less's file descriptor 0. Obviously, less can no longer read the user's keyboard input from file descriptor 0 – it tries to get the TTY back however it can.
If less has not been detached from the terminal, open("/dev/tty") will give it the TTY.
However, in case that fails... what can you do? less makes one last attempt at getting the TTY, assuming that file descriptor 2 is attached to the same thing that file descriptor 0 would be attached to, if it weren't redirected.
This is not failproof:
$ ls -l * | setsid less 2>/dev/null
Here, less is given its own session (so it is no longer a part of the terminal's active process group, causing open("/dev/tty") to fail), and its file descriptor 2 has been changed – now less exits immediately, because it is outputting to a TTY yet it fails to get any user input.
Well... first off, you seem to missing the open() call which opens '/dev/tty'. It only uses file descriptor 2 if the call to open() fails. On a standard Linux system, and probably many Unices, '/dev/tty' exists and is unlikely to cause a fail.
Secondly, the comment at the top provides a limited amount of explanation as to why they fall back to file descriptor 2. My guess is that stdin, stdout, and stderr are pretty much connected to '/dev/tty/' anyway, unless redirected. And since the most common redirections for for stdin and/ or stdout (via piping or < / >), but less often for stderr, odds on are that using stderr would be most likely to still be connect to the "keyboard".
The same question with an answer ultimately from the person who asked it is on linuxquestions although they quote slightly different source from less. And no, I don't understand most of it so I can't help beyond that :)
It appears to be Linux specific functionality that sends keyboard input to FD 2.

core dump files on Linux: how to get info on opened files?

I have a core dump file from a process that has probably a file descriptor leak (it opens files and sockets but apparently sometimes forgets to close some of them). Is there a way to find out which files and sockets the process had opened before crashing? I can't easily reproduce the crash, so analyzing the core file seems to be the only way to get a hint on the bug.
If you have a core file and you have compiled the program with debugging options (-g), you can see where the core was dumped:
$ gcc -g -o something something.c
$ ./something
Segmentation fault (core dumped)
$ gdb something core
You can use this to do some post-morten debugging. A few gdb commands: bt prints the stack, fr jumps to given stack frame (see the output of bt).
Now if you want to see which files are opened at a segmentation fault, just handle the SIGSEGV signal, and in the handler, just dump the contents of the /proc/PID/fd directory (i.e. with system('ls -l /proc/PID/fs') or execv).
With these information at hand you can easily find what caused the crash, which files are opened and if the crash and the file descriptor leak are connected.
Your best bet is to install a signal handler for whatever signal is crashing your program (SIGSEGV, etc.).
Then, in the signal handler, inspect /proc/self/fd, and save the contents to a file. Here is a sample of what you might see:
Anderson cxc # ls -l /proc/8247/fd
total 0
lrwx------ 1 root root 64 Sep 12 06:05 0 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 1 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 10 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Sep 12 06:05 11 -> socket:[124061]
lrwx------ 1 root root 64 Sep 12 06:05 12 -> socket:[124063]
lrwx------ 1 root root 64 Sep 12 06:05 13 -> socket:[124064]
lrwx------ 1 root root 64 Sep 12 06:05 14 -> /dev/driver0
lr-x------ 1 root root 64 Sep 12 06:05 16 -> /temp/app/whatever.tar.gz
lr-x------ 1 root root 64 Sep 12 06:05 17 -> /dev/urandom
Then you can return from your signal handler, and you should get a core dump as usual.
One of the ways I jump to this information is just running strings on the core file. For instance, when I was running file on a core recently, due to the length of the folders I would get a truncated arguments list. I knew my run would have opened files from my home directory, so I just ran:
strings core.14930|grep jodie
But this is a case where I had a needle and a haystack.
If the program forgot to close those resources it might be because something like the following happened:
fd = open("/tmp/foo",O_CREAT);
//do stuff
fd = open("/tmp/bar",O_CREAT); //Oops, forgot to close(fd)
now I won't have the file descriptor for foo in memory.
If this didn't happen, you might be able to find the file descriptor number, but then again, that is not very useful because they are continuously changing, by the time you get to debug you won't know which file it actually meant at the time.
I really think you should debug this live, with strace, lsof and friends.
If there is a way to do it from the core dump, I'm eager to know it too :-)
You can try using strace to see the open, socket and close calls the program makes.
Edit: I don't think you can get the information from the core; at most it will have the file descriptors somewhere, but this still doesn't give you the actual file/socket. (Assuming you can distinguish open from closed file descriptors, which I also doubt.)
Recently during my error troubleshooting and analysis , my customer provided me a coredump which got generated in his filesystem and he went out of station in order to quickly scan through the file and read its contents i used the command
strings core.67545 > coredump.txt
and later i was able to open the file in file editor.
A core dump is a copy of the memory the process had access to when crashed. Depending on how the leak is occurring, it might have lost the reference to the handles, so it may prove to be useless.
lsof lists all currently open files in the system, you could check its output to find leaked sockets or files. Yes, you'd need to have the process running. You could run it with a specific username to easily discern which are the open files from the process you are debugging.
I hope somebody else has better information :-)
Another way to find out what files a process has opened - again, only during runtime - is looking into /proc/PID/fd/ , which contains symlinks to open files.

Resources