How to get the reference count of a socket descriptor at user level? - linux

I want to get the socket descriptor reference count. Where is this count stored? I didn't find it in the inode structure.
How can I get this value?

It is available per protocol, in /proc/net/* files.
For instance, the official /proc/net/tcp documentation indicates there is a socket reference count column, just after the inode value. See https://askubuntu.com/a/243441
$ cat /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 0100007F:0CEA 00000000:0000 0A 00000000:00000000 00:00000000 00000000 115 0 14759 1 0000000000000000 100 0 0 10 -1
Here the inode is 14759, and the socket reference count is 1.
There is a similar ref column for UDP - see https://stackoverflow.com/a/18322579/458259
$ cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
40: 00000000:0202 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 3466 2 ffff88013abc8340 0
Here the inode is 3466, and the socket reference count is 2.
Note that only newer kernels have this socket reference count column.
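If you just want the count for one socket, you can pull it out by inode with awk (a minimal sketch, assuming the modern column layout shown above, where the inode is the 10th whitespace-separated field and the reference count the 11th; 14759 is the example inode from the listing):
$ awk -v ino=14759 '$10 == ino { print "inode", $10, "refcount", $11 }' /proc/net/tcp
inode 14759 refcount 1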

Related

build-id data offset in the ELF file

I need to modify the build-id of the ELF notes section. I found out that it is possible here. I also found out that I can do it by modifying this code. What I can't figure out is the data location. Here is what I'm talking about.
$ eu-readelf -S myelffile
Section Headers:
[Nr] Name Type Addr Off Size ES Flags Lk Inf Al
...
[ 2] .note.ABI-tag NOTE 000000000000028c 0000028c 00000020 0 A 0 0 4
[ 3] .note.gnu.build-id NOTE 00000000000002ac 000002ac 00000024 0 A 0 0 4
...
$ eu-readelf -n myelffile
Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x28c:
Owner Data size Type
GNU 16 GNU_ABI_TAG
OS: Linux, ABI: 3.14.0
Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x2ac:
Owner Data size Type
GNU 20 GNU_BUILD_ID
Build ID: d75a086c288c582036b0562908304bc3a8033235
.note.gnu.build-id section is 36 bytes. The build id is 20 bytes. What are the other 16 bytes?
I played with the code a bit and read 36 bytes of myelffile at offset 0x2ac. I got the following: 040000001400000003000000474e5500d75a086c288c582036b0562908304bc3a8033235.
Then I decided to use the Elf64_Shdr definition, so I read data at address 0x2ac + sizeof(Elf64_Shdr.sh_name) + sizeof(Elf64_Shdr.sh_type) + sizeof(Elf64_Shdr.sh_flags) and I got my build id, d75a086c288c582036b0562908304bc3a8033235. It does make sense why I got it: sizeof(Elf64_Shdr.sh_name) + sizeof(Elf64_Shdr.sh_type) + sizeof(Elf64_Shdr.sh_flags) = 16 bytes, but according to the Elf64_Shdr definition I should then be pointing at Elf64_Addr sh_addr, i.e. the section virtual address.
So what is not clear to me is: what are the other 16 bytes of the section? What do they represent? I can't reconcile the Elf64_Shdr definition with the results I'm getting from my experiments.
.note.gnu.build-id section is 36 bytes. The build id is 20 bytes. What are the other 16 bytes?
Each .note.* section starts with Elf64_Nhdr (12 bytes), followed by (4-byte aligned) note name of variable size (GNU\0 here), followed by (4-byte aligned) actual note data. Documentation.
Looking at /bin/date on my system:
eu-readelf -Wn /bin/date
Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x2c4:
Owner Data size Type
GNU 16 GNU_ABI_TAG
OS: Linux, ABI: 3.2.0
Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x2e4:
Owner Data size Type
GNU 20 GNU_BUILD_ID
Build ID: 979ae4616ae71af565b123da2f994f4261748cc9
What are the bytes at offset 0x2e4?
dd bs=1 skip=$((0x2e4)) count=36 < /bin/date | xxd
00000000: 0400 0000 1400 0000 0300 0000 474e 5500 ............GNU.
00000010: 979a e461 6ae7 1af5 65b1 23da 2f99 4f42 ...aj...e.#./.OB
00000020: 6174 8cc9 at..
So we have: .n_namesz == 4, .n_descsz == 20, .n_type == 3 == NT_GNU_BUILD_ID, followed by 4-byte GNU\0 note name, followed by 20 bytes of actual build-id bytes 0x97, 0x9a, etc.
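Given that layout, you can pull the build-id bytes out with plain dd arithmetic (a sketch using the /bin/date offsets from above; adjust the 0x2e4 offset and the file name for your own binary):
$ OFF=$((0x2e4))
$ # Elf64_Nhdr is three 4-byte little-endian words: n_namesz, n_descsz, n_type
$ dd if=/bin/date bs=1 skip=$OFF count=12 2>/dev/null | od -An -td4
         4        20         3
$ # skip the 12-byte header plus the 4-byte padded name "GNU\0" to reach the descriptor
$ dd if=/bin/date bs=1 skip=$((OFF + 12 + 4)) count=20 2>/dev/null | xxd -p
979ae4616ae71af565b123da2f994f4261748cc9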

Which PID is using a PORT inside a k8s pod without net tools

Sorry about the long question post, but I think it can be useful for others to learn how this works.
What I know:
On any Linux host (not using a Docker container), I can look at /proc/net/tcp to extract TCP-socket-related information.
So, I can detect the ports in LISTEN state with:
cat /proc/net/tcp |
grep " 0A " |
sed 's/^[^:]*: \(..\)\(..\)\(..\)\(..\):\(....\).*/echo $((0x\4)).$((0x\3)).$((0x\2)).$((0x\1)):$((0x\5))/g' |
bash
Results:
0.0.0.0:111
10.174.109.1:53
127.0.0.53:53
0.0.0.0:22
127.0.0.1:631
0.0.0.0:8000
/proc/net/tcp gives the UID and GID but unfortunately does not provide the PID. It does return the inode, though, which I can use to discover the PID by finding the process that holds it as a file descriptor.
So one way is to search /proc looking for the socket inode. It's slow, but it works on the host:
cat /proc/net/tcp |
grep " 0A " |
sed 's/^[^:]*: \(..\)\(..\)\(..\)\(..\):\(....\).\{72\}\([^ ]*\).*/echo $((0x\4)).$((0x\3)).$((0x\2)).$((0x\1)):$((0x\5))\\\t$(find \/proc\/ -type d -name fd 2>\/dev\/null \| while read f\; do ls -l $f 2>\/dev\/null \| grep -q \6 \&\& echo $f; done)/g' |
bash
output:
0.0.0.0:111 /proc/1/task/1/fd /proc/1/fd /proc/924/task/924/fd /proc/924/fd
10.174.109.1:53 /proc/23189/task/23189/fd /proc/23189/fd
127.0.0.53:53 /proc/923/task/923/fd /proc/923/fd
0.0.0.0:22 /proc/1194/task/1194/fd /proc/1194/fd
127.0.0.1:631 /proc/13921/task/13921/fd /proc/13921/fd
0.0.0.0:8000 /proc/23122/task/23122/fd /proc/23122/fd
Permission tip 1: You will only see what you have permission to look at.
Permission tip 2: fake root used in containers does not have access to all file descriptors in /proc/*/fd. You need to query it for each user.
If you run as normal user the results are:
0.0.0.0:111
10.174.109.1:53
127.0.0.53:53
0.0.0.0:22
127.0.0.1:631
0.0.0.0:8000 /proc/23122/task/23122/fd /proc/23122/fd
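For reference, the same inode-to-PID scan can be written more directly with readlink instead of ls | grep (a sketch; findsockpid is just a name I made up, pass it the inode column value from /proc/net/tcp):
findsockpid() {
    for fd in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$1]" ] && echo "${fd%/fd/*}"
    done
}
It prints the /proc/<pid> directory of every process holding the socket, subject to the same permission caveats as above.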
Using unshare to isolate the environment, it works as expected:
$ unshare -r --fork --pid unshare -r --fork --pid --mount-proc -n bash
# ps -fe
UID PID PPID C STIME TTY TIME CMD
root 1 0 2 07:19 pts/6 00:00:00 bash
root 100 1 0 07:19 pts/6 00:00:00 ps -fe
# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
# python -m SimpleHTTPServer &
[1] 152
# Serving HTTP on 0.0.0.0 port 8000 ...
netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 152/python
# cat /proc/net/tcp |
> grep " 0A " |
> sed 's/^[^:]*: \(..\)\(..\)\(..\)\(..\):\(....\).\{72\}\([^ ]*\).*/echo $((0x\4)).$((0x\3)).$((0x\2)).$((0x\1)):$((0x\5))\\\t$(find \/proc\/ -type d -name fd 2>\/dev\/null \| while read f\; do ls -l $f 2>\/dev\/null \| grep -q \6 \&\& echo $f; done)/g' |
> bash
0.0.0.0:8000 /proc/152/task/152/fd /proc/152/fd
# ls -l /proc/152/fd
total 0
lrwx------ 1 root root 64 mai 25 07:20 0 -> /dev/pts/6
lrwx------ 1 root root 64 mai 25 07:20 1 -> /dev/pts/6
lrwx------ 1 root root 64 mai 25 07:20 2 -> /dev/pts/6
lrwx------ 1 root root 64 mai 25 07:20 3 -> 'socket:[52409024]'
lr-x------ 1 root root 64 mai 25 07:20 7 -> /dev/urandom
# cat /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:1F40 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 52409024 1 0000000000000000 100 0 0 10 0
Inside a Docker container on my host, this seems to work the same way.
The problem:
I have a container inside a Kubernetes pod running Jitsi. Inside this container, I am unable to get the PID of the service listening on the ports.
Not even after installing netstat:
root@jitsi-586cb55594-kfz6m:/# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:5222 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5269 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8888 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5280 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5347 0.0.0.0:* LISTEN -
tcp6 0 0 :::5222 :::* LISTEN -
tcp6 0 0 :::5269 :::* LISTEN -
tcp6 0 0 :::5280 :::* LISTEN -
# ps -fe
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 May22 ? 00:00:00 s6-svscan -t0 /var/run/s6/services
root 32 1 0 May22 ? 00:00:00 s6-supervise s6-fdholderd
root 199 1 0 May22 ? 00:00:00 s6-supervise jicofo
jicofo 203 199 0 May22 ? 00:04:17 java -Xmx3072m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dnet.java.sip.communicator.SC_HOME_DIR_LOCATION=/ -Dnet.java.sip.communicator.SC_HOME_DIR_NAME=config -Djava
root 5990 0 0 09:48 pts/2 00:00:00 bash
root 10926 5990 0 09:57 pts/2 00:00:00 ps -fe
Finally the Questions:
a) Why can't I read the file descriptors of the process listening on port 5222?
root@jitsi-586cb55594-kfz6m:/# cat /proc/net/tcp | grep " 0A "
0: 00000000:1466 00000000:0000 0A 00000000:00000000 00:00000000 00000000 101 0 244887827 1 ffff9bd749145800 100 0 0 10 0
...
root@jitsi-586cb55594-kfz6m:/# echo $(( 0x1466 ))
5222
root@jitsi-586cb55594-kfz6m:/# ls -l /proc/*/fd/* 2>/dev/null | grep 244887827
root@jitsi-586cb55594-kfz6m:/# echo $?
1
root@jitsi-586cb55594-kfz6m:/# su - svc
svc@jitsi-586cb55594-kfz6m:~$ id -u
101
svc@jitsi-586cb55594-kfz6m:~$ ls -l /proc/*/fd/* 2>/dev/null | grep 244887827
svc@jitsi-586cb55594-kfz6m:~$ echo $?
1
b) Is there another way to list the inode and link it to a PID without searching /proc/*/fd?
Update 1:
Based on Anton Kostenko's tip, I looked at AppArmor. That is not the cause here, because the server doesn't use AppArmor, but the search took me to SELinux.
On an Ubuntu machine where AppArmor is running, I get:
$ sudo apparmor_status | grep dock
docker-default
On the OKE (Oracle Kubernetes Engine, my case) node there is no AppArmor. I got SELinux instead:
$ man selinuxenabled | grep EXIT -A1
EXIT STATUS
It exits with status 0 if SELinux is enabled and 1 if it is not enabled.
$ selinuxenabled && echo $?
0
Now, I do believe that SELinux is blocking the /proc/*/fd listing from root inside the container. But I don't know yet how to unlock it.
References:
https://jvns.ca/blog/2016/10/10/what-even-is-a-container/
The issue is solved by adding the POSIX capability: CAP_SYS_PTRACE
In my case the containers are under Kubernetes orchestration.
This reference explains kubectl and POSIX capabilities.
So I have
root@jitsi-55584f98bf-6cwpn:/# cat /proc/1/status | grep Cap
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
So I carefully read the POSIX capabilities manual. But even after adding CAP_SYS_ADMIN, the PID did not appear in netstat. So I tested all the capabilities, and CAP_SYS_PTRACE is The Chosen One:
root@jitsi-65c6b5d4f7-r546h:/# cat /proc/1/status | grep Cap
CapInh: 00000000a80c25fb
CapPrm: 00000000a80c25fb
CapEff: 00000000a80c25fb
CapBnd: 00000000a80c25fb
CapAmb: 0000000000000000
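A quick way to confirm which capability the new bit corresponds to is capsh from the libcap package (a sketch; the two masks are the CapEff values from the pods above):
$ capsh --decode=00000000a80425fb
$ capsh --decode=00000000a80c25fb
The second set decodes to the same list plus cap_sys_ptrace; the only difference between the two masks is bit 19 (0x00080000).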
So here is my deployment spec change:
...
spec:
...
template:
...
spec:
...
containers:
...
securityContext:
capabilities:
add:
- SYS_PTRACE
...
I still don't know what security reasoning SELinux applies here, but for now this is good enough for me.
References:
https://man7.org/linux/man-pages/man7/capabilities.7.html
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

Why using conv=notrunc when cloning a disk with dd?

If you look up how to clone an entire disk to another one on the web, you will find something like this:
dd if=/dev/sda of=/dev/sdb conv=notrunc,noerror
While I understand the noerror, I am having a hard time understanding why people think that notrunc is required for "data integrity" (as ArchLinux's Wiki states, for instance).
Indeed, I do agree with that if you are copying a partition onto another partition on another disk and do not want to overwrite the entire disk, just that one partition. In this case notrunc, according to dd's manual page, is what you want.
But if you're cloning an entire disk, what does notrunc change for you? Just time optimization?
TL;DR version:
notrunc is only important to prevent truncation when writing into a file. This has no effect on a block device such as sda or sdb.
Educational version
I looked into the coreutils source code which contains dd.c to see how notrunc is processed.
Here's the segment of code that I'm looking at:
int opts = (output_flags
| (conversions_mask & C_NOCREAT ? 0 : O_CREAT)
| (conversions_mask & C_EXCL ? O_EXCL : 0)
| (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC));
/* Open the output file with *read* access only if we might
need to read to satisfy a `seek=' request. If we can't read
the file, go ahead with write-only access; it might work. */
if ((! seek_records
|| fd_reopen (STDOUT_FILENO, output_file, O_RDWR | opts, perms) < 0)
&& (fd_reopen (STDOUT_FILENO, output_file, O_WRONLY | opts, perms) < 0))
error (EXIT_FAILURE, errno, _("opening %s"), quote (output_file));
We can see here that if notrunc is not specified, then the output file will be opened with O_TRUNC. Looking below at how O_TRUNC is treated, we can see that a normal file will get truncated if written into.
O_TRUNC
If the file already exists and is a regular file and the open
mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated
to length 0. If the file is a FIFO or terminal device file, the
O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is
unspecified.
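You can also watch the flag difference directly instead of inferring it from the source (a sketch, assuming strace is available; junk.txt is just a scratch file):
$ strace -e trace=open,openat dd if=/dev/zero of=junk.txt bs=512 count=1 2>&1 | grep junk.txt
$ strace -e trace=open,openat dd if=/dev/zero of=junk.txt bs=512 count=1 conv=notrunc 2>&1 | grep junk.txt
The first open of junk.txt carries O_TRUNC in its flags; the second does not.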
Effects of notrunc / O_TRUNC I
In the following example, we start out by creating junk.txt of size 1024 bytes. Next, we write 512 bytes to the beginning of it with conv=notrunc. We can see that the size stays the same at 1024 bytes. Finally, we try it without the notrunc option and we can see that the new file size is 512. This is because it was opened with O_TRUNC.
$ dd if=/dev/urandom of=junk.txt bs=1024 count=1
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 1024 Dec 11 17:08 junk.txt
$ dd if=/dev/urandom of=junk.txt bs=512 count=1 conv=notrunc
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 1024 Dec 11 17:10 junk.txt
$ dd if=/dev/urandom of=junk.txt bs=512 count=1
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 512 Dec 11 17:10 junk.txt
Effects of notrunc / O_TRUNC II
I still haven't answered your original question of why conv=notrunc is important when doing a disk-to-disk clone. According to the above definition, O_TRUNC seems to be ignored when opening certain special files, and I would expect this to be true for block device nodes too. However, I don't want to assume anything and will attempt to prove it here.
openclose.c
I've written a simple C program here which opens and closes a file given as an argument with the O_TRUNC flag.
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
int main(int argc, char * argv[])
{
if (argc < 2)
{
fprintf(stderr, "Not enough arguments...\n");
return (1);
}
int f = open(argv[1], O_RDWR | O_TRUNC);
if (f >= 0)
{
fprintf(stderr, "%s was opened\n", argv[1]);
close(f);
fprintf(stderr, "%s was closed\n", argv[1]);
} else {
perror("Opening device node");
}
return (0);
}
Normal File Test
We can see below that the simple act of opening and closing a file with O_TRUNC will cause it to lose anything that was already there.
$ dd if=/dev/urandom of=junk.txt bs=1024 count=1^C
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 1024 Dec 11 17:26 junk.txt
$ ./openclose junk.txt
junk.txt was opened
junk.txt was closed
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 0 Dec 11 17:27 junk.txt
Block Device File Test
Let's try a similar test on a USB flash drive. We can see that we start out with a single partition on the flash drive. If it gets 'truncated', perhaps the partition will go away (considering it's defined in the first 512 bytes of the disk)?
$ ls -l /dev/sdc*
brw-rw---- 1 root disk 8, 32 Dec 11 17:22 /dev/sdc
brw-rw---- 1 root disk 8, 33 Dec 11 17:22 /dev/sdc1
$ sudo ./openclose /dev/sdc
/dev/sdc was opened
/dev/sdc was closed
$ sudo ./openclose /dev/sdc1
/dev/sdc1 was opened
/dev/sdc1 was closed
$ ls -l /dev/sdc*
brw-rw---- 1 root disk 8, 32 Dec 11 17:31 /dev/sdc
brw-rw---- 1 root disk 8, 33 Dec 11 17:31 /dev/sdc1
It looks like it has no effect whatsoever to open either the disk or the disk's partition 1 with the O_TRUNC option. From what I can tell, the filesystem is still mountable and the files are accessible and intact.
Effects of notrunc / O_TRUNC III
Okay, for my final test I will use dd on my flash drive directly. I will start by writing 512 bytes of random data, then writing 256 bytes of zeros at the beginning. Finally, we will verify that the last 256 bytes remained unchanged.
$ sudo dd if=/dev/urandom of=/dev/sdc bs=256 count=2
$ sudo hexdump -n 512 /dev/sdc
0000000 3fb6 d17f 8824 a24d 40a5 2db3 2319 ac5b
0000010 c659 5780 2d04 3c4e f985 053c 4b3d 3eba
0000020 0be9 8105 cec4 d6fb 5825 a8e5 ec58 a38e
0000030 d736 3d47 d8d3 9067 8db8 25fb 44da af0f
0000040 add7 c0f2 fc11 d734 8e26 00c6 cfbb b725
0000050 8ff7 3e79 af97 2676 b9af 1c0d fc34 5eb1
0000060 6ede 318c 6f9f 1fea d200 39fe 4591 2ffb
0000070 0464 9637 ccc5 dfcc 3b0f 5432 cdc3 5d3c
0000080 01a9 7408 a10a c3c4 caba 270c 60d0 d2f7
0000090 2f8d a402 f91a a261 587b 5609 1260 a2fc
00000a0 4205 0076 f08b b41b 4738 aa12 8008 053f
00000b0 26f0 2e08 865e 0e6a c87e fc1c 7ef6 94c6
00000c0 9ced 37cf b2e7 e7ef 1f26 0872 cd72 54a4
00000d0 3e56 e0e1 bd88 f85b 9002 c269 bfaa 64f7
00000e0 08b9 5957 aad6 a76c 5e37 7e8a f5fc d066
00000f0 8f51 e0a1 2d69 0a8e 08a9 0ecf cee5 880c
0000100 3835 ef79 0998 323d 3d4f d76b 8434 6f20
0000110 534c a847 e1e2 778c 776b 19d4 c5f1 28ab
0000120 a7dc 75ea 8a8b 032a c9d4 fa08 268f 95e8
0000130 7ff3 3cd7 0c12 4943 fd23 33f9 fe5a 98d9
0000140 aa6d 3d89 c8b4 abec 187f 5985 8e0f 58d1
0000150 8439 b539 9a45 1c13 68c2 a43c 48d2 3d1e
0000160 02ec 24a5 e016 4c2d 27be 23ee 8eee 958e
0000170 dd48 b5a1 10f1 bf8e 1391 9355 1b61 6ffa
0000180 fd37 7718 aa80 20ff 6634 9213 0be1 f85e
0000190 a77f 4238 e04d 9b64 d231 aee8 90b6 5c7f
00001a0 5088 2a3e 0201 7108 8623 b98a e962 0860
00001b0 c0eb 21b7 53c6 31de f042 ac80 20ee 94dd
00001c0 b86c f50d 55bc 32db 9920 fd74 a21e 911a
00001d0 f7db 82c2 4d16 3786 3e18 2c0f 47c2 ebb0
00001e0 75af 6a8c 2e80 c5b6 e4ea a9bc a494 7d47
00001f0 f493 8b58 0765 44c5 ff01 42a3 b153 d395
$ sudo dd if=/dev/zero of=/dev/sdc bs=256 count=1
$ sudo hexdump -n 512 /dev/sdc
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100 3835 ef79 0998 323d 3d4f d76b 8434 6f20
0000110 534c a847 e1e2 778c 776b 19d4 c5f1 28ab
0000120 a7dc 75ea 8a8b 032a c9d4 fa08 268f 95e8
0000130 7ff3 3cd7 0c12 4943 fd23 33f9 fe5a 98d9
0000140 aa6d 3d89 c8b4 abec 187f 5985 8e0f 58d1
0000150 8439 b539 9a45 1c13 68c2 a43c 48d2 3d1e
0000160 02ec 24a5 e016 4c2d 27be 23ee 8eee 958e
0000170 dd48 b5a1 10f1 bf8e 1391 9355 1b61 6ffa
0000180 fd37 7718 aa80 20ff 6634 9213 0be1 f85e
0000190 a77f 4238 e04d 9b64 d231 aee8 90b6 5c7f
00001a0 5088 2a3e 0201 7108 8623 b98a e962 0860
00001b0 c0eb 21b7 53c6 31de f042 ac80 20ee 94dd
00001c0 b86c f50d 55bc 32db 9920 fd74 a21e 911a
00001d0 f7db 82c2 4d16 3786 3e18 2c0f 47c2 ebb0
00001e0 75af 6a8c 2e80 c5b6 e4ea a9bc a494 7d47
00001f0 f493 8b58 0765 44c5 ff01 42a3 b153 d395
Summary
Through the above experimentation, it seems that notrunc is only important when you are writing into a regular file that you don't want truncated. It has no effect on a block device such as sda or sdb.

How can you obtain a queue field from netstat -i in Linux?

In Solaris, the output of 'netstat -i' gives something like the following:
root# netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 136799 0 136799 0 0 0
igb0 1500 vulture vulture 1272272 0 347277 0 0 0
Note that there is a Queue field on the end.
In Linux, 'netstat -i' gives output with no Queue field:
[root@roseate ~]# netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 2806170 0 0 0 791768 0 0 0 BMRU
eth1 1500 0 0 0 0 0 0 0 0 0 BMU
eth2 1500 0 0 0 0 0 0 0 0 0 BMU
eth3 1500 0 0 0 0 0 0 0 0 0 BMU
lo 16436 0 1405318 0 0 0 1405318 0 0 0 LRU
I've figured out how to get collisions in Linux by adding the -e option, but is there a way to get the Queue in Linux?
The only reference to queue I ever saw in netstat on Linux was when using -s, but that's probably too garrulous for your use-case?
$ netstat -na | awk 'BEGIN { RecvQ=0; SendQ=0; } { RecvQ+=$2; SendQ+=$3; } END { print "RecvQ " RecvQ/1024; print "SendQ " SendQ/1024; }'
RecvQ 255.882
SendQ 0.0507812
For per-interface numbers, I have a dirty way:
[spatel#us04 ~]$ for qw in `/sbin/ifconfig | grep 'inet addr:' | cut -d: -f2 | awk '{print $1}'`; do echo `/sbin/ip addr | grep $qw | awk '{print $7}'` : ; echo `netstat -na | grep $qw | awk 'BEGIN { RecvQ=0; SendQ=0; } { RecvQ+=$2; SendQ+=$3; } END { print "RecvQ " RecvQ/1024; print "SendQ " SendQ/1024; }'`; done
eth0 :
RecvQ 0 SendQ 0
eth2 :
RecvQ 0.0703125 SendQ 1.56738
:
RecvQ 0 SendQ 0
I ended up using
tc -s -d qdisc
[root@roseate ~]# tc -s -d qdisc
qdisc mq 0: dev eth2 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc mq 0: dev eth3 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
Sent 218041403 bytes 1358829 pkt (dropped 0, overlimits 0 requeues 1)
rate 0bit 0pps backlog 0b 0p requeues 1
qdisc mq 0: dev eth1 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
which gives backlog bytes and packets.
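If you only want the queue figures, you can pull just the backlog line per interface (a sketch, assuming iproute2's tc):
$ for dev in /sys/class/net/*; do
>   d=${dev##*/}
>   echo "$d: $(tc -s qdisc show dev "$d" | awk '/backlog/ { print $2, $3; exit }')"
> done
which prints the queued bytes and packets (e.g. 0b 0p) for each device.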
Source

List of possible internal socket statuses from /proc

I would like to know the possible values of the st column in /proc/net/tcp. I think the st column equates to the STATE column from netstat(8) or ss(8).
I have managed to identify three codes:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 0100007F:08A0 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7321 1 ffff81002f449980 3000 0 0 2 -1
1: 00000000:006F 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 6656 1 ffff81003a30c080 3000 0 0 2 -1
2: 00000000:0272 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 6733 1 ffff81003a30c6c0 3000 0 0 2 -1
3: 0100007F:0277 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7411 1 ffff81002f448d00 3000 0 0 2 -1
4: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7520 1 ffff81002f4486c0 3000 0 0 2 -1
5: 0100007F:089F 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7339 1 ffff81002f449340 3000 0 0 2 -1
6: 0100007F:E753 0100007F:0016 01 00000000:00000000 02:000AFA92 00000000 500 0 18198 2 ffff81002f448080 204 40 20 2 -1
7: 0100007F:E752 0100007F:0016 06 00000000:00000000 03:000005EC 00000000 0 0 0 2 ffff81000805dc00
The above shows:
On line sl 0: a listening port on tcp/2208. st = 0A = LISTEN
On line sl 6: an established session on tcp/22. st = 01 = ESTABLISHED
On line sl 7: a socket in TIME_WAIT state after an ssh logout. No inode. st = 06 = TIME_WAIT
Can anyone expand on this list? The proc(5) manpage is quite terse on the subject stating:
/proc/net/tcp
Holds a dump of the TCP socket table. Much of the information is not of use apart from debugging. The "sl" value is the kernel hash slot for the socket, the "local address" is the local address and port number pair. The "remote address" is the remote address and port number pair (if connected). 'St' is the internal status of the socket. The 'tx_queue' and 'rx_queue' are the outgoing and incoming data queue in terms of kernel memory usage. The "tr", "tm->when", and "rexmits" fields hold internal information of the kernel socket state and are only useful for debugging. The "uid" field holds the effective UID of the creator of the socket.
And on a related note, the above /proc/net/tcp output shows a few listening processes (2208, 62, 111 etc). However, I cannot see a listening TCP socket on tcp/22, although the established and time_wait states are shown. Yes, I can see them in /proc/net/tcp6, but should they not also be present in /proc/net/tcp? Netstat shows it differently from applications bound only to IPv4. E.g.
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 4231/portmap
tcp 0 0 :::22 :::* LISTEN 4556/sshd
Many thanks,
-Andrew
They should match the enum in ./include/net/tcp_states.h in the Linux kernel sources:
enum {
TCP_ESTABLISHED = 1,
TCP_SYN_SENT,
TCP_SYN_RECV,
TCP_FIN_WAIT1,
TCP_FIN_WAIT2,
TCP_TIME_WAIT,
TCP_CLOSE,
TCP_CLOSE_WAIT,
TCP_LAST_ACK,
TCP_LISTEN,
TCP_CLOSING, /* Now a valid state */
TCP_MAX_STATES /* Leave at the end! */
};
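Given that enum, mapping the hex st column to a name is a one-liner (a sketch; strtonum requires GNU awk):
$ awk 'BEGIN { split("ESTABLISHED SYN_SENT SYN_RECV FIN_WAIT1 FIN_WAIT2 TIME_WAIT CLOSE CLOSE_WAIT LAST_ACK LISTEN CLOSING", s, " ") }
> NR > 1 { print $2, $3, s[strtonum("0x" $4)] }' /proc/net/tcp
This prints the local address, remote address and state name for every row (0A maps to s[10] = LISTEN, 01 to ESTABLISHED, and so on).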
As for your second question: are you really sure there's not an sshd listening on e.g. 0.0.0.0:22? If not, I suspect what you're seeing is related to v4-mapped-on-v6 sockets; see e.g. man 7 ipv6.
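One thing worth checking is the bindv6only sysctl (a sketch):
$ sysctl net.ipv6.bindv6only
net.ipv6.bindv6only = 0
With the default of 0, a socket bound to :: (such as sshd's :::22 above) also accepts IPv4 connections as v4-mapped addresses, which is why it appears only in /proc/net/tcp6.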
