How can you obtain a Queue field from netstat -i in Linux?

In Solaris, the output of 'netstat -i' gives something like the following:
root# netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 136799 0 136799 0 0 0
igb0 1500 vulture vulture 1272272 0 347277 0 0 0
Note that there is a Queue field at the end.
In Linux, 'netstat -i' gives output with no Queue field:
[root@roseate ~]# netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 2806170 0 0 0 791768 0 0 0 BMRU
eth1 1500 0 0 0 0 0 0 0 0 0 BMU
eth2 1500 0 0 0 0 0 0 0 0 0 BMU
eth3 1500 0 0 0 0 0 0 0 0 0 BMU
lo 16436 0 1405318 0 0 0 1405318 0 0 0 LRU
I've figured out how to get collisions in Linux by adding the -e option, but is there a way to get the Queue in Linux?

The only reference to queues I ever saw in netstat on Linux was when using -s, but that's probably too verbose for your use case?

$ netstat -na | awk 'BEGIN { RecvQ=0; SendQ=0; } { RecvQ+=$2; SendQ+=$3; } END { print "RecvQ " RecvQ/1024; print "SendQ " SendQ/1024; }'
RecvQ 255.882
SendQ 0.0507812
For per-interface numbers, I have a dirty way:
[spatel@us04 ~]$ for qw in `/sbin/ifconfig | grep 'inet addr:' | cut -d: -f2 | awk '{print $1}'`; do echo `/sbin/ip addr | grep $qw | awk '{print $7}'` : ; echo `netstat -na | grep $qw | awk 'BEGIN { RecvQ=0; SendQ=0; } { RecvQ+=$2; SendQ+=$3; } END { print "RecvQ " RecvQ/1024; print "SendQ " SendQ/1024; }'`; done
eth0 :
RecvQ 0 SendQ 0
eth2 :
RecvQ 0.0703125 SendQ 1.56738
:
RecvQ 0 SendQ 0

I ended up using
tc -s -d qdisc
[root@roseate ~]# tc -s -d qdisc
qdisc mq 0: dev eth2 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc mq 0: dev eth3 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
Sent 218041403 bytes 1358829 pkt (dropped 0, overlimits 0 requeues 1)
rate 0bit 0pps backlog 0b 0p requeues 1
qdisc mq 0: dev eth1 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
which gives backlog bytes and packets.
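If you only want the backlog column, a small awk sketch over the same output works; since field positions vary between iproute2 versions, this searches for the "dev" and "backlog" keywords instead of relying on fixed columns:
tc -s qdisc show | awk '
  /^qdisc/ { for (i = 1; i < NF; i++) if ($i == "dev") dev = $(i + 1) }   # remember the interface name
  { for (i = 1; i < NF; i++) if ($i == "backlog") print dev ": backlog " $(i + 1) " " $(i + 2) }'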

Related

How to set RSS in DPDK 18.11 with mlx5

How do I configure RSS correctly so that all queues receive packets?
My RSS configuration is as follows:
.rxmode.mq_mode = ETH_MQ_RX_RSS;
.rx_adv_conf.rss_conf.rss_key = NULL;
.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP;
.txmode.mq_mode = ETH_MQ_TX_NONE;
But only queue 0 receives packets; the other queues show 0 packets.
The command is as follows:
/root/dpdk-18.11/app/test-pmd/testpmd -w 0000:3c:00.0 -l 0-9 -n 4 -- --rxq=10 --txq=10 --rxd=256 --txd=4096 -i --nb-cores=9
testpmd> show port xstats all
NIC extended statistics for port 0
rx_good_packets: 56
tx_good_packets: 18
rx_good_bytes: 4932
tx_good_bytes: 1767
rx_missed_errors: 0
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0packets: 56
rx_q0bytes: 4932
rx_q0errors: 0
rx_q1packets: 0
rx_q1bytes: 0
rx_q1errors: 0
rx_q2packets: 0
rx_q2bytes: 0
rx_q2errors: 0
rx_q3packets: 0
rx_q3bytes: 0
rx_q3errors: 0
rx_q4packets: 0
rx_q4bytes: 0
rx_q4errors: 0
rx_q5packets: 0
rx_q5bytes: 0
rx_q5errors: 0
rx_q6packets: 0
rx_q6bytes: 0
rx_q6errors: 0
rx_q7packets: 0
rx_q7bytes: 0
rx_q7errors: 0
rx_q8packets: 0
rx_q8bytes: 0
rx_q8errors: 0
rx_q9packets: 0
rx_q9bytes: 0
rx_q9errors: 0

Which PID is using a PORT inside a k8s pod without net tools

Sorry about the long question post, but I think it can be useful to others to learn how this works.
What I know:
On any Linux host (not using a Docker container), I can look at /proc/net/tcp to extract TCP-socket-related information.
So, I can detect the ports in LISTEN state with:
cat /proc/net/tcp |
grep " 0A " |
sed 's/^[^:]*: \(..\)\(..\)\(..\)\(..\):\(....\).*/echo $((0x\4)).$((0x\3)).$((0x\2)).$((0x\1)):$((0x\5))/g' |
bash
Results:
0.0.0.0:111
10.174.109.1:53
127.0.0.53:53
0.0.0.0:22
127.0.0.1:631
0.0.0.0:8000
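To see what the sed is doing, here is one entry decoded by hand; the address bytes are stored little-endian, which is why the sed reverses them:
# 0100007F:0277 -> bytes reversed -> 127.0.0.1, port 0x0277
echo $((0x7F)).$((0x00)).$((0x00)).$((0x01)):$((0x0277))   # prints 127.0.0.1:631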
/proc/net/tcp gives the UID and GID, but unfortunately not the PID. It does give the inode, which I can use to find the PID by looking for the process whose file descriptors reference that socket inode.
So one way is to search /proc for the socket inode. It's slow, but it works on the host:
cat /proc/net/tcp |
grep " 0A " |
sed 's/^[^:]*: \(..\)\(..\)\(..\)\(..\):\(....\).\{72\}\([^ ]*\).*/echo $((0x\4)).$((0x\3)).$((0x\2)).$((0x\1)):$((0x\5))\\\t$(find \/proc\/ -type d -name fd 2>\/dev\/null \| while read f\; do ls -l $f 2>\/dev\/null \| grep -q \6 \&\& echo $f; done)/g' |
bash
output:
0.0.0.0:111 /proc/1/task/1/fd /proc/1/fd /proc/924/task/924/fd /proc/924/fd
10.174.109.1:53 /proc/23189/task/23189/fd /proc/23189/fd
127.0.0.53:53 /proc/923/task/923/fd /proc/923/fd
0.0.0.0:22 /proc/1194/task/1194/fd /proc/1194/fd
127.0.0.1:631 /proc/13921/task/13921/fd /proc/13921/fd
0.0.0.0:8000 /proc/23122/task/23122/fd /proc/23122/fd
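For reference, here is a simpler sketch of the same inode-to-PID search using readlink (the inode number is just an example, taken from the unshare session below):
inode=52409024
for fd in /proc/[0-9]*/fd/*; do
    [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$inode]" ] && echo "${fd%%/fd/*}"   # prints /proc/<pid>
done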
Permission tip 1: You will only see what you have permission to look at.
Permission tip 2: the fake root used in containers does not have access to all file descriptors in /proc/*/fd. You need to query them per user.
If you run as a normal user, the results are:
0.0.0.0:111
10.174.109.1:53
127.0.0.53:53
0.0.0.0:22
127.0.0.1:631
0.0.0.0:8000 /proc/23122/task/23122/fd /proc/23122/fd
Using unshare to isolate the environment, it works as expected:
$ unshare -r --fork --pid unshare -r --fork --pid --mount-proc -n bash
# ps -fe
UID PID PPID C STIME TTY TIME CMD
root 1 0 2 07:19 pts/6 00:00:00 bash
root 100 1 0 07:19 pts/6 00:00:00 ps -fe
# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
# python -m SimpleHTTPServer &
[1] 152
# Serving HTTP on 0.0.0.0 port 8000 ...
netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 152/python
# cat /proc/net/tcp |
> grep " 0A " |
> sed 's/^[^:]*: \(..\)\(..\)\(..\)\(..\):\(....\).\{72\}\([^ ]*\).*/echo $((0x\4)).$((0x\3)).$((0x\2)).$((0x\1)):$((0x\5))\\\t$(find \/proc\/ -type d -name fd 2>\/dev\/null \| while read f\; do ls -l $f 2>\/dev\/null \| grep -q \6 \&\& echo $f; done)/g' |
> bash
0.0.0.0:8000 /proc/152/task/152/fd /proc/152/fd
# ls -l /proc/152/fd
total 0
lrwx------ 1 root root 64 mai 25 07:20 0 -> /dev/pts/6
lrwx------ 1 root root 64 mai 25 07:20 1 -> /dev/pts/6
lrwx------ 1 root root 64 mai 25 07:20 2 -> /dev/pts/6
lrwx------ 1 root root 64 mai 25 07:20 3 -> 'socket:[52409024]'
lr-x------ 1 root root 64 mai 25 07:20 7 -> /dev/urandom
# cat /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:1F40 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 52409024 1 0000000000000000 100 0 0 10 0
Inside a docker container on my host, it seems to work the same way.
The problem:
I have a container inside a kubernetes pod running jitsi. Inside this container, I am unable to get the PID of the service listening on the ports.
Not even after installing netstat:
root@jitsi-586cb55594-kfz6m:/# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:5222 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5269 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8888 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5280 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5347 0.0.0.0:* LISTEN -
tcp6 0 0 :::5222 :::* LISTEN -
tcp6 0 0 :::5269 :::* LISTEN -
tcp6 0 0 :::5280 :::* LISTEN -
# ps -fe
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 May22 ? 00:00:00 s6-svscan -t0 /var/run/s6/services
root 32 1 0 May22 ? 00:00:00 s6-supervise s6-fdholderd
root 199 1 0 May22 ? 00:00:00 s6-supervise jicofo
jicofo 203 199 0 May22 ? 00:04:17 java -Xmx3072m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dnet.java.sip.communicator.SC_HOME_DIR_LOCATION=/ -Dnet.java.sip.communicator.SC_HOME_DIR_NAME=config -Djava
root 5990 0 0 09:48 pts/2 00:00:00 bash
root 10926 5990 0 09:57 pts/2 00:00:00 ps -fe
Finally, the questions:
a) Why can't I read the file descriptors of the process listening on port 5222?
root@jitsi-586cb55594-kfz6m:/# cat /proc/net/tcp | grep " 0A "
0: 00000000:1466 00000000:0000 0A 00000000:00000000 00:00000000 00000000 101 0 244887827 1 ffff9bd749145800 100 0 0 10 0
...
root@jitsi-586cb55594-kfz6m:/# echo $(( 0x1466 ))
5222
root@jitsi-586cb55594-kfz6m:/# ls -l /proc/*/fd/* 2>/dev/null | grep 244887827
root@jitsi-586cb55594-kfz6m:/# echo $?
1
root@jitsi-586cb55594-kfz6m:/# su - svc
svc@jitsi-586cb55594-kfz6m:~$ id -u
101
svc@jitsi-586cb55594-kfz6m:~$ ls -l /proc/*/fd/* 2>/dev/null | grep 244887827
svc@jitsi-586cb55594-kfz6m:~$ echo $?
1
b) Is there another way to link an inode to a PID without searching /proc/*/fd?
Update 1:
Based on Anton Kostenko's tip, I looked at AppArmor. That's not it, because this server doesn't use AppArmor, but the search took me to SELinux.
On an Ubuntu machine where AppArmor is running, I get:
$ sudo apparmor_status | grep dock
docker-default
On the OKE (Oracle Kubernetes Engine, my case) node there is no AppArmor; I found SELinux instead:
$ man selinuxenabled | grep EXIT -A1
EXIT STATUS
It exits with status 0 if SELinux is enabled and 1 if it is not enabled.
$ selinuxenabled && echo $?
0
Now, I do believe that SELinux is blocking the /proc/*/fd listing from root inside the container. But I don't know yet how to unlock it.
References:
https://jvns.ca/blog/2016/10/10/what-even-is-a-container/
The issue is solved by adding the POSIX capability: CAP_SYS_PTRACE
In my case the containers are under Kubernetes orchestration.
This reference explains kubectl and POSIX capabilities.
Before the change I had:
root@jitsi-55584f98bf-6cwpn:/# cat /proc/1/status | grep Cap
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
So I carefully read the POSIX capabilities manual. But even after adding CAP_SYS_ADMIN, the PID did not appear in netstat, so I tested the remaining capabilities one by one. CAP_SYS_PTRACE is The Chosen One:
root@jitsi-65c6b5d4f7-r546h:/# cat /proc/1/status | grep Cap
CapInh: 00000000a80c25fb
CapPrm: 00000000a80c25fb
CapEff: 00000000a80c25fb
CapBnd: 00000000a80c25fb
CapAmb: 0000000000000000
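As a sanity check: the difference between the two masks is bit 19 (1 << 19 = 0x80000), which is exactly CAP_SYS_PTRACE. If the libcap utilities are installed, you can also decode the mask with capsh:
# decode the new effective capability mask from above
capsh --decode=00000000a80c25fb | tr ',' '\n' | grep ptrace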
So here is my deployment spec change:
...
spec:
...
template:
...
spec:
...
containers:
...
securityContext:
capabilities:
add:
- SYS_PTRACE
...
I still don't know what security reasoning SELinux uses to block this, but for now it's good enough for me.
References:
https://man7.org/linux/man-pages/man7/capabilities.7.html
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

How to parse netstat command to get the send-q number from the line

I have this output from netstat -naputeo:
tcp 0 0 :::44500 :::* LISTEN 2000 773788772 18117/java off (0.00/0/0)
tcp 0 0 :::22 :::* LISTEN 0 9419 4186/sshd off (0.00/0/0)
tcp 0 0 ::ffff:127.0.0.1:61666 ::ffff:127.0.0.1:43940 ESTABLISHED 2000 788032760 18122/java off (0.00/0/0)
tcp 0 0 ::ffff:192.168.1.202:56510 ::ffff:192.168.1.202:3000 ESTABLISHED 0 791652028 6804/java_ndsagent keepalive (7185.05/0/0)
tcp 0 0 ::ffff:192.168.1.202:56509 ::ffff:192.168.1.202:3000 TIME_WAIT 0 0 - timewait (41.13/0/0)
tcp 0 0 ::ffff:192.168.1.202:56508 ::ffff:192.168.1.202:3000 TIME_WAIT 0 0 - timewait (21.13/0/0)
tcp 0 4656 ::ffff:192.168.1.202:22 ::ffff:84.208.36.125:48507 ESTABLISHED 0 791474860 24141/1 on (0.19/0/0)
tcp 0 0 ::ffff:127.0.0.1:61616 ::ffff:127.0.0.1:45121 ESTABLISHED 2000 788032761 18117/java off (0.00/0/0)
tcp 0 0 ::ffff:192.168.1.202:3000 ::ffff:192.168.1.202:56510 ESTABLISHED 0 791651217 8044/rmiregistry off (0.00/0/0)
The Send-Q is the 3rd field; here the offender is port 22, with 4656 bytes queued.
The problem is that I need to output that specific line (the number/port/process) to a file, but only if the Send-Q is above 4000; that file will be sent to my inbox to alert me.
I have seen similar answers but I can't extract the line using those suggestions. I don't know which process will be filling the queue, but I know the ports. It's not just 22; it could be more ports at any given time.
I tried:
netstat -naputeo | awk '$3 == 0 && $4 ~ /[^0-9]22$/'
But that gives me the wrong line (the :::22 one).
netstat -naputeo | awk '{if(($3)>0) print $3;}'
That is all wrong because it just prints that field for every line.
All I need is that number and its line sent to a CSV. I can deal with error checking later and refine it.
Any suggestions?
I ended up using this; it works for now, but there is room for improvement:
filterQs() {
while read recv send address pid_program; do
ip=${address%%:*}
port=${address##*:}
pid=${pid_program%%/*}
program=${pid_program#*/}
echo "recv=${recv} send=${send} ip=${ip} port=${port} pid=${pid} program=${program}"
if [[ ${port} -eq 35487 || ${port} -eq 65485 || ${port} -eq CalorisPort || ${port} -eq 22 ]]
then
echo "recv=${recv} send=${send} ip=${ip} port=${port} pid=${pid} program=${program}" >> Qmonitor.txt
fi
done < <(netstat -napute 2>/dev/null | awk '$1 ~ /^(tcp|udp)/ && ($2 > 500 || $3 > 500) { print $2, $3, $4, $9 }')
}
Thanks all
Something like
$ netstat -naputeo 2>/dev/null | awk -v OFS=';' '$1 ~ /^tcp/ && $3 > 4000 { sub(/^.+:/, "", $4); print $3, $4, $9 }'
?
That would output the 3rd column (Send-Q), the port part of the 4th column (Local Address) and the 9th column (PID/Program name) if Send-Q > 4000, separated by semicolons so you can pipe it into your CSV.
E.g. (for Send-Q > 0 on my box)
$ netstat -naputeo 2>/dev/null | awk -v OFS=';' '$1 ~ /^tcp/ && $3 > 0 { sub(/^.+:/, "", $4); print $3, $4, $9 }'
52;22;4363/sshd:
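To get that into your CSV file (hypothetical filename), just redirect the output:
$ netstat -naputeo 2>/dev/null | awk -v OFS=';' '$1 ~ /^tcp/ && $3 > 4000 { sub(/^.+:/, "", $4); print $3, $4, $9 }' >> qmonitor.csv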
EDIT:
If you really need to further process the values in bash, then you can just print the respective columns via awk and iterate over the lines like this:
#!/bin/bash
while read recv send address pid_program; do
ip=${address%%:*}
port=${address##*:}
pid=${pid_program%%/*}
program=${pid_program#*/}
echo "recv=${recv} send=${send} ip=${ip} port=${port} pid=${pid} program=${program}"
# do stuff here
done < <(netstat -naputeo 2>/dev/null | awk '$1 ~ /^(tcp|udp)/ && ($2 > 4000 || $3 > 4000) { print $2, $3, $4, $9 }')
E.g.:
$ ./t.sh
recv=0 send=52 ip=x.x.x.x port=22 pid=12345 program=sshd:
Note: I don't understand why you need the -o switch to netstat since you don't seem to be interested in the timers output, so you could probably drop that.
Try this:
netstat -naputeo | awk '{ if (($3 + 0) >= 4000) { sub(/.*:/, "", $4); print $3, $4, $9;} }'
This filters out the header line (whose third field is not numeric, so $3 + 0 is 0) and extracts the port number from field $4.
Pure bash solution:
#!/bin/bash
filterHuge() {
while read -r -a line; do
if (( line[2] > 4000 )) && [[ ${line[3]##*:} == '22' ]]; then # if Send-Q is higher than 4000 and port number is 22
echo "Size: ${line[2]} Whole line: ${line[#]}"
fi
done
}
netstat -naputeo | filterHuge
I have a Lineage 2 server and have some problems with Send-Q.
I used your script and got:
Size: 84509 Whole line: tcp 0 84509 144.217.255.80:6254 179.7.212.0:35176 ESTABLISHED 0 480806 2286/java on (46.42/11/0)
Size: 12130 Whole line: tcp 0 12130 144.217.255.80:6254 200.120.203.238:52295 ESTABLISHED 0 410043 2286/java on (0.69/0/0)
Size: 13774 Whole line: tcp 0 13774 144.217.255.80:6254 190.30.75.253:63749 ESTABLISHED 0 469361 2286/java on (0.76/0/0)
Size: 12319 Whole line: tcp 0 12319 144.217.255.80:6254 200.120.203.238:52389 ESTABLISHED 0 487569 2286/java on (0.37/0/0)
Size: 9800 Whole line: tcp 0 9800 144.217.255.80:6254 186.141.200.7:63572 ESTABLISHED 0 478974 2286/java on (0.38/0/0)
Size: 12150 Whole line: tcp 0 12150 144.217.255.80:6254 200.120.203.238:52298 ESTABLISHED 0 410128 2286/java on (0.26/0/0)
Size: 9626 Whole line: tcp 0 9626 144.217.255.80:6254 186.141.200.7:63569 ESTABLISHED 0 482721 2286/java on (0.44/0/0)
Size: 11443 Whole line: tcp 0 11443 144.217.255.80:6254 200.120.203.238:52291 ESTABLISHED 0 411061 2286/java on (0.89/0/0)
Size: 79254 Whole line: tcp 0 79254 144.217.255.80:6254 179.7.212.0:6014 ESTABLISHED 0 501998 2286/java on (89.42/10/0)
Size: 10722 Whole line: tcp 0 10722 144.217.255.80:6254 179.7.111.208:12925 ESTABLISHED 0 488352 2286/java on (0.23/0/0)
Size: 126708 Whole line: tcp 0 126708 144.217.255.80:6254 190.11.106.181:3481 ESTABLISHED 0 487867 2286/java on (85.32/7/0)
The problem is on one port: 6254.
What could I put in place to reset or drop connections whose Send-Q grows above 4000?

Unable to set speed limit using tc tbf

I am using tc to do QoS on an Asterisk server. I want to prioritize voice and SIP traffic but also cap all the rest at a fixed limit.
Here is my script:
#!/bin/bash
IFACE=eth1
UPSPEED=1.5mbit
tc qdisc del dev $IFACE root
tc qdisc add dev $IFACE root handle 1: prio priomap 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 0
tc qdisc add dev $IFACE parent 1:1 handle 10: sfq perturb 10
tc qdisc add dev $IFACE parent 1:2 handle 20: sfq perturb 10
tc qdisc add dev $IFACE parent 1:3 handle 30: tbf rate $UPSPEED burst 4kb mtu 1500 latency 100ms
tc filter add dev $IFACE protocol ip parent 1: prio 1 u32 match ip tos 0xb8 0xff flowid 1:1
tc filter add dev $IFACE protocol ip parent 1: prio 1 u32 match ip tos 0x60 0xff flowid 1:2
RTP and SIP traffic is well managed, being sent to the first and second bands. All other traffic is also well managed, being sent to the third band. However, for some reason, if I download from the server, it is always at 10-16k/sec instead of the 185-190k (1.5mbit) specified in my script.
Worse, it seems that no matter how I change the tbf variables, the speed stays the same.
Using tc -s qdisc ls, I managed to find out that packets were being dropped:
qdisc prio 1: bands 3 priomap 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 0
Sent 12610974 bytes 45683 pkt (dropped 1147, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 126p quantum 1514b perturb 10sec
Sent 7802180 bytes 36590 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 20: parent 1:2 limit 126p quantum 1514b perturb 10sec
Sent 181620 bytes 283 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc tbf 30: parent 1:3 rate 1500Kbit burst 4Kb lat 100.0ms
Sent 4627174 bytes 8810 pkt (dropped 1147, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
But I don't know why. Again, changing the tbf variables doesn't change anything; packets keep being dropped.
Please note that the mtu of eth1 is 1500.
Anyone?
Add mtu 100000 to your tc qdisc creation command. See this post for more info.
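Applied to the script above, that would be something like this (a sketch; the value just has to exceed the largest GSO/TSO super-packet):
tc qdisc add dev $IFACE parent 1:3 handle 30: tbf rate $UPSPEED burst 4kb mtu 100000 latency 100ms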
Quoting the post, in case it disappears:
Basically if your interface has TSO/GSO enabled (check using ethtool -k ethX), or you're using the loopback interface - then you'll probably hit a problem. It turns out that the loopback interface has GSO/TSO enabled as default, plus since it is a software interface its default mtu is 16384 (as compared to 1500 for normal Ethernet interface). This matters as the tbf queue checks the size of the incoming 'packets' - which in the case of GSO/TSO are much larger than a normal on-the-wire packet - instead they're up to 9 x iface's mtu. So for normal interfaces it's about 12K, but for loopback it is about 100k!
This appears to apply to Xen interfaces, too:
# ethtool -k eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-unneeded: off [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on # THIS IS TSO
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on # THIS IS GSO
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
I had the same issue. Turning off segmentation offloading will solve the problem (and you won't need to pass mtu to tc anymore).
sudo ethtool -K eth1 tso off
sudo ethtool -K eth1 gso off
sudo ethtool -K eth1 gro off
sudo ethtool --offload eth1 rx off tx off
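You can verify afterwards that the offloads are actually off:
ethtool -k eth1 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload'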

List of possible internal socket statuses from /proc

I would like to know the possible values of the st column in /proc/net/tcp. I think the st column equates to the STATE column from netstat(8) or ss(8).
I have managed to identify three codes:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 0100007F:08A0 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7321 1 ffff81002f449980 3000 0 0 2 -1
1: 00000000:006F 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 6656 1 ffff81003a30c080 3000 0 0 2 -1
2: 00000000:0272 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 6733 1 ffff81003a30c6c0 3000 0 0 2 -1
3: 0100007F:0277 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7411 1 ffff81002f448d00 3000 0 0 2 -1
4: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7520 1 ffff81002f4486c0 3000 0 0 2 -1
5: 0100007F:089F 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7339 1 ffff81002f449340 3000 0 0 2 -1
6: 0100007F:E753 0100007F:0016 01 00000000:00000000 02:000AFA92 00000000 500 0 18198 2 ffff81002f448080 204 40 20 2 -1
7: 0100007F:E752 0100007F:0016 06 00000000:00000000 03:000005EC 00000000 0 0 0 2 ffff81000805dc00
The above shows:
On line sl 0: a listening port on tcp/2208. st = 0A = LISTEN
On line sl 6: An established session on tcp/22. st = 01 = ESTABLISHED
On line sl 7: a socket in TIME_WAIT state after ssh logout. No inode. st = 06 = TIME_WAIT
Can anyone expand on this list? The proc(5) manpage is quite terse on the subject stating:
/proc/net/tcp
Holds a dump of the TCP socket table. Much of the information is not of use apart from debugging. The "sl" value is the kernel hash slot for the socket, the "local address" is the local address and port number pair. The "remote address" is the remote address and port number pair (if connected). "St" is the internal status of the socket. The "tx_queue" and "rx_queue" are the outgoing and incoming data queue in terms of kernel memory usage. The "tr", "tm->when", and "rexmits" fields hold internal information of the kernel socket state and are only useful for debugging. The "uid" field holds the effective UID of the creator of the socket.
And on a related note, the above /proc/net/tcp output shows a few listeners (ports 2208, 62, 111 etc.). However, I cannot see a listening TCP socket on tcp/22, although the established and time_wait states are shown. Yes, I can see it in /proc/net/tcp6, but shouldn't it also be present in /proc/net/tcp? Netstat shows it differently from applications bound only to IPv4, e.g.:
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 4231/portmap
tcp 0 0 :::22 :::* LISTEN 4556/sshd
Many thanks,
-Andrew
They should match the enum in include/net/tcp_states.h in the Linux kernel sources:
enum {
    TCP_ESTABLISHED = 1,
    TCP_SYN_SENT,
    TCP_SYN_RECV,
    TCP_FIN_WAIT1,
    TCP_FIN_WAIT2,
    TCP_TIME_WAIT,
    TCP_CLOSE,
    TCP_CLOSE_WAIT,
    TCP_LAST_ACK,
    TCP_LISTEN,
    TCP_CLOSING,    /* Now a valid state */
    TCP_MAX_STATES  /* Leave at the end! */
};
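So the hex st values decode as 01 = ESTABLISHED, 0A (decimal 10) = LISTEN, and 06 = TIME_WAIT, which matches what you observed. A quick sketch (gawk, which provides strtonum) that counts the sockets in /proc/net/tcp per state name:
awk 'BEGIN { split("ESTABLISHED SYN_SENT SYN_RECV FIN_WAIT1 FIN_WAIT2 TIME_WAIT CLOSE CLOSE_WAIT LAST_ACK LISTEN CLOSING", name, " ") }
     NR > 1 { count[strtonum("0x" $4)]++ }   # $4 is the st column
     END { for (s in count) print name[s], count[s] }' /proc/net/tcp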
As for your second question: are you really sure there isn't an sshd listening on e.g. 0.0.0.0:22? If not, I suspect what you're seeing is related to v4-mapped-on-v6 sockets; see e.g. man 7 ipv6.
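One quick thing to check (a sketch): with the default setting below, a socket bound to :::22 also accepts IPv4 connections via v4-mapped addresses, which is why the listener shows up only in /proc/net/tcp6:
sysctl net.ipv6.bindv6only   # 0 (the default) means v6 sockets also accept v4-mapped connections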
