How do I find the inode of a TCP socket? - linux

How do I tie values in the 'inode' column of /proc/net/tcp to files in /proc/<pid>/fd/?
I was under the impression that the inode column in the TCP had a decimal representation of the socket's inode, but that doesn't seem to be the case.
For example, if I run telnet localhost 80, I see the following (telnet is pid 9021).
/proc/net/tcp contains
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
23: 0100007F:CE2A 0100007F:0050 01 00000000:00000000 00:00000000 00000000 1000 0 361556 1 00000000 20 0 0 10 -1
which makes me think that the inode of the socket connected to 127.0.0.1:80 is 361556. But if I run ls --inode -alh /proc/9021/fd, I see
349886 lrwx------ 1 me me 64 Dec 26 10:51 3 -> socket:[361556]
The inode is 349886, which is different from the value in the inode column of the tcp table: 361556. But the link target seems to have the right name. Similarly, stat /proc/9021/3 shows:
File: ‘/proc/9021/fd/3’ -> ‘socket:[361556]’
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: 3h/3d Inode: 349886 Links: 1
What's the number in the inode column of tcp table? Why doesn't it line up with the inode as reported by ls or stat?
(I'm running Ubuntu 14.10, if that matters)

The inode shown by ls and stat is for the symlink that points to the inode associated with the socket. Running ls -iLalh shows the right inode. Ditto for stat -L.
Herpa derp derp. I only figured this out when I was composing my question. ;_;

Inode id represent a file id per fs mount (proc, sys, ntfs, ext...), so as you probably understand you deal with two different fs here: procfs and some pseudo socket fs.
The files under the /proc/pid/fd/ directories are soft links which have inode representation in the procfs fs.
These links are "pointing" to different "fs" - socket fs.
What stat -L and ls -iLalh do, is to give you the inode of the file the links points to.
You can do this also explicitly with readlink /proc/#pid/fd/#fdnum

Related

In ss -s, what is the kernel counter actually counting?

While troubleshooting a problem on an OEL 7 server (3.10.0-1062.9.1.el7.x86_64), I ran the command
sudo ss -s
Which gave me the output of:
Total: 601 (kernel 1071)
TCP: 8 (estab 2, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0
Transport Total IP IPv6
1071 - -
RAW 2 0 2
UDP 6 4 2
TCP 8 5 3
INET 16 9 7
FRAG 0 0 0
Doing an ss -a | wc -l came back with 225 entries.
It leads me to the question, what is kernel 1071 actually counting?
Looking through the various man pages did not provide an answer.
Using strace, I can see where ss reads:
/proc/net/sockstat
/proc/net/sockstat6
/proc/net/snmp
/proc/slabinfo
Looking through those files and the docs, it looks like the value is coming from /proc/slabinfo.
Grepping through /proc/slabinfo for 1071 came back with one entry:
sock_inode_cache 1071 1071 640 51 8 : tunables 0 0 0 : slabdata 21 21 0
Looking through the files and docs on sock_inode_cache has not helped so far. I am hoping someone here knows what the kernel counter is actually counting, or can point me in the right direction.
what is kernel 1071 actually counting?
sock_inode_cache represents Linux kernel Slab statistics. It shows how many socket inodes (active objects) are there.
struct socket_alloc corresponds to the sock_inode_cache slab cache and contains the struct socket and struct inode, so it is connected to VFS.

What is there behind a symbolic link?

How are symbolic links managed internally by UNIX/Linux systems. It is known that a symbolic link may exist even without an actual target file (Dangling link). So what is that which represents a symbolic link internally.
In Windows, the answer is a reparse point.
Questions:
Is the answer an inode in UNIX/Linux?
If yes, then will the inode number be same for target and links?
If yes, can the link inode can have permissions different from that of target's inode (if one exists)?
It is not about UNIX/Linux but about filesystem implementation - but yes, Unix/Linux uses inodes at kernel level and filesystem implementations have inodes (at least virtual ones).
In the general, symbolic links are simply files (btw, directories are also files), that have:
the flag file-type in the "inode" that tells to the system this file is a "symbolic link"
file-content: path to the target - in other words: a symbolic link is simply a file which contains a filename with a flag in the inode.
Virtual filesystems can have symbolic links too, so, check FUSE or some other filesystem implementation sources. (ext2/ext3/ufs..etc)
So,
Is the answer an inode in UNIX/Linux?
depends on filesystem implementation, but yes, generally the inode contains a "file-type" (and owners, access rights, timestamps, size, pointers to data blocks). There are filesystems that don't have inodes (in a physical implementation) but have only "virtual inodes" for maintaining compatibility with the kernel.
If yes, then will the inode number be same for target and links?
No. Usually, the symlink is a file with its own inode, (with file-type, own data blocks, etc.)
If yes, can the link inode can have permissions different from that of target's
inode(if one exists)?
This is about how symlink files are handled. Usually, the kernel doesn't allow changes to the symlink permissions - and symlinks always have default permissions. You could write your own filesystem that would allow different permissions for symlinks, but you would get into trouble because common programs like chmod don't change permissions on symlinks themselves, so making such a filesystem would be pointless anyway)
To understand the difference between hard links and symlinks, you should understand directories first.
Directories are files (with differentiated by a flag in the inode) that tell the kernel, "handle this file as a map of file-name to inode_number". Hard-links are simply file names that map to the same inode. So if the directory-file contains:
file_a: 1000
file_b: 1001
file_c: 1000
the above means, in this directory, are 3 files:
file_a described by inode 1000
file_b described by inode 1001 and
file_c again described by inode 1000 (so it is a hard link with file_a, not hardlink to file_a - because it is impossible to tell which filename came first - they are identical).
This is the main difference to symlinks, where the inode of file_b (inode 1001) could have content "file_a" and a flag meaning "this is a symlink". In this case, file_b would be a symlink pointing to file_a.
You can also easily explore this on your own:
$ touch a
$ ln -s a b
$ ln a c
$ ls -li
total 0
95905 -rw-r--r-- 1 regnarg regnarg 0 Jun 19 19:01 a
96990 lrwxrwxrwx 1 regnarg regnarg 1 Jun 19 19:01 b -> a
95905 -rw-r--r-- 2 regnarg regnarg 0 Jun 19 19:01 c
The -i option to ls shows inode numbers in the first column. You can see that the symlink has a different inode number while the hardlink has the same. You can also use the stat(1) command:
$ stat a
File: 'a'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 28h/40d Inode: 95905 Links: 2
[...]
$ stat b
File: 'b' -> 'a'
Size: 1 Blocks: 0 IO Block: 4096 symbolic link
Device: 28h/40d Inode: 96990 Links: 1
[...]
If you want to do this programmatically, you can use the lstat(2) system call to find information about the symlink itself (its inode number etc.), while stat(2) shows information about the target of the symlink, if it exists. Example in Python:
>>> import os
>>> os.stat("b").st_ino
95905
>>> os.lstat("b").st_ino
96990

Identify other end of a unix domain socket connection

I'm trying to figure out what process is holding the other end of a unix domain socket. In some strace output I've identified a given file descriptor which is involved in the problem I'm currently debugging, and I'd like to know which process is on the other end of that. As there are multiple connections to that socket, simply going by path name won't work.
lsof provides me with the following information:
dbus-daem 4175 mvg 10u unix 0xffff8803e256d9c0 0t0 12828 #/tmp/dbus-LyGToFzlcG
So I know some address (“kernel address”?), I know some socket number, and I know the path. I can find that same information in other places:
$ netstat -n | grep 12828
unix 3 [ ] STREAM CONNECTED 12828 #/tmp/dbus-LyGToFzlcG
$ grep -E '12828|ffff8803e256d9c0' /proc/net/unix
ffff8803e256d9c0: 00000003 00000000 00000000 0001 03 12828 #/tmp/dbus-LyGToFzlcG
$ ls -l /proc/*/fd/* 2>/dev/null | grep 12828
lrwx------ 1 mvg users 64 10. Aug 09:08 /proc/4175/fd/10 -> socket:[12828]
However, none of this tells me what the other end of my socket connection is. How can I tell which process is holding the other end?
Similar questions have been asked on Server Fault and Unix & Linux. The accepted answer is that this information is not reliably available to the user space on Linux.
A common suggestion is to look at adjacent socket numbers, but ls -l /proc/*/fd/* 2>/dev/null | grep 1282[79] gave no results here. Perhaps adjacent lines in the output from netstat can be used. It seems like there was a pattern of connections with and without an associated socket name. But I'd like some kind of certainty, not just guesswork.
One answer suggests a tool which appears to be able to address this by digging through kernel structures. Using that option requires debug information for the kernel, as generated by the CONFIG_DEBUG_INFO option and provided as a separate package by some distributions. Based on that answer, using the address provided by lsof, the following solution worked for me:
# gdb /usr/src/linux/vmlinux /proc/kcore
(gdb) p ((struct unix_sock*)0xffff8803e256d9c0)->peer
This will print the address of the other end of the connection. Grepping lsof -U for that number will provide details like the process id and the file descriptor number.
If debug information is not available, it might be possible to access the required information by knowing the offset of the peer member into the unix_sock structure. In my case, on Linux 3.5.0 for x86_64, the following code can be used to compute the same address without relying on debugging symbols:
(gdb) p ((void**)0xffff8803e256d9c0)[0x52]
I won't make any guarantees about how portable that solution is.
Update: It's been possible to to do this using actual interfaces for a while now. Starting with Linux 3.3, the UNIX_DIAG feature provides a netlink-based API for this information, and lsof 4.89 and later support it. See https://unix.stackexchange.com/a/190606/1820 for more information.

How to read ring buffer within linux kernel space?

I'm writing a Linux character driver which can print system logs in user space. Just as the command 'dmesg' does.
I've learned that all the log that we print with 'printk' will be sent to a space named ring buffer. So I have the questions:
Is ring buffer inside kernel space?
If so, how can I read the ring buffer inside kernel space? (I've tried to read the source code of dmesg.c. But it did not help.)
What you are looking for is /proc/kmsg. This is the kernel ring buffer!
Yes, this is inside kernel space. Any process trying to read it should have super user privileges to read it!
How to read it the ring buffer? Here is a beautiful illustration from IBM Developerworks
dmesg would be your first resort! How does dmesg accomplish its task? By a call to syslog()! How does syslog do its job? Through the system call interface which in turn call do_syslog(). do_syslog() does the finishing act like this.
Here are a few more resources to get you more info about /proc/kmsg and kernel logging in general-
http://www.makelinux.net/ldd3/chp-4-sect-2
http://www.ibm.com/developerworks/linux/library/l-kernel-logging-apis/index.html
http://oguzhanozmen.blogspot.in/2008/09/kernel-log-buffering-printk-syslog-ng.html
This is further to Pavan's very good answer (taught me a lot):
Different distro may redirect the output of /proc/kmsg to any physical log files or virtual devices (/dev/xxx) they like. But "/proc/kmsg" is the original source of the kernel log, because the kernel implement its ring buffer operation inside fs/proc/kmsg.c:
static const struct file_operations proc_kmsg_operations = {
.read = kmsg_read,
.poll = kmsg_poll,
.open = kmsg_open,
.release = kmsg_release,
.llseek = generic_file_llseek,
};
So how you see the output is this:
sudo tail -f /proc/kmsg
But you can only see all the messages generated AFTER you have issued this command - all previous messages in the ring buffer will not be printed out. And so to see the physical file output, you can search for the user of "/proc/kmsg":
sudo lsof |grep proc.kmsg
And my machine indicated this:
rsyslogd 1743 syslog 3r REG 0,3 0 4026532041 /proc/kmsg
in:imuxso 1743 1755 syslog 3r REG 0,3 0 4026532041 /proc/kmsg
in:imklog 1743 1756 syslog 3r REG 0,3 0 4026532041 /proc/kmsg
rs:main 1743 1757 syslog 3r REG 0,3 0 4026532041 /proc/kmsg
So now it is pid 1743, let's see the files fd opened by 1743:
sudo ls -al /proc/1743/fd
lrwx------ 1 root root 64 Dec 11 08:36 0 -> socket:[14472]
l-wx------ 1 root root 64 Dec 11 08:36 1 -> /var/log/syslog
l-wx------ 1 root root 64 Dec 11 08:36 2 -> /var/log/kern.log
lr-x------ 1 root root 64 Dec 11 08:36 3 -> /proc/kmsg
l-wx------ 1 root root 64 Dec 11 08:36 4 -> /var/log/auth.log
And so there you go, pid 1743 is rsyslogd, and it redirect the output of /proc/kmsg to files like /var/log/syslog and /var/log/kern.log etc.

How to tie a network connection to a PID without using lsof or netstat?

Is there a way to tie a network connection to a PID (process ID) without forking to lsof or netstat?
Currently lsof is being used to poll what connections belong which process ID. However lsof or netstat can be quite expensive on a busy host and would like to avoid having to fork to these tools.
Is there someplace similar to /proc/$pid where one can look to find this information? I know what the network connections are by examining /proc/net but can't figure out how to tie this back to a pid. Over in /proc/$pid, there doesn't seem to be any network information.
The target hosts are Linux 2.4 and Solaris 8 to 10. If possible, a solution in Perl, but am willing to do C/C++.
additional notes:
I would like to emphasize the goal here is to tie a network connection to a PID. Getting one or the other is trivial, but putting the two together in a low cost manner appears to be difficult. Thanks for the answers to so far!
I don't know how often you need to poll, or what you mean with "expensive", but with the right options both netstat and lsof run a lot faster than in the default configuration.
Examples:
netstat -ltn
shows only listening tcp sockets, and omits the (slow) name resolution that is on by default.
lsof -b -n -i4tcp:80
omits all blocking operations, name resolution, and limits the selection to IPv4 tcp sockets on port 80.
On Solaris you can use pfiles(1) to do this:
# ps -fp 308
UID PID PPID C STIME TTY TIME CMD
root 308 255 0 22:44:07 ? 0:00 /usr/lib/ssh/sshd
# pfiles 308 | egrep 'S_IFSOCK|sockname: '
6: S_IFSOCK mode:0666 dev:326,0 ino:3255 uid:0 gid:0 size:0
sockname: AF_INET 192.168.1.30 port: 22
For Linux, this is more complex (gruesome):
# pgrep sshd
3155
# ls -l /proc/3155/fd | fgrep socket
lrwx------ 1 root root 64 May 22 23:04 3 -> socket:[7529]
# fgrep 7529 /proc/3155/net/tcp
6: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7529 1 f5baa8a0 300 0 0 2 -1
00000000:0016 is 0.0.0.0:22. Here's the equivalent output from netstat -a:
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
Why don't you look at the source code of netstat and see how it get's the information? It's open source.
For Linux, have a look at the /proc/net directory
(for example, cat /proc/net/tcp lists your tcp connections). Not sure about Solaris.
Some more information here.
I guess netstat basically uses this exact same information so i don't know if you will be able to speed it up a whole lot. Be sure to try the netstat '-an' flags to NOT resolve ip-adresses to hostnames realtime (as this can take a lot of time due to dns queries).
The easiest thing to do is
strace -f netstat -na
On Linux (I don't know about Solaris). This will give you a log of all of the system calls made. It's a lot of output, some of which will be relevant. Take a look at the files in the /proc file system that it's opening. This should lead you to how netstat does it. Indecently, ltrace will allow you to do the same thing through the c library. Not useful for you in this instance, but it can be useful in other circumstances.
If it's not clear from that, then take a look at the source.
Take a look at these answers which thoroughly explore the options available:
How I can get ports associated to the application that opened them?
How to do like "netstat -p", but faster?

Resources