Linux iptables string module does not match all packets

This is the first time I'm using the string match module for iptables, and I've hit some strange behaviour I can't overcome.
In short, it looks like iptables passes "trimmed" packets to the module, so the module cannot scan the whole packet for the string.
I can't find any information about this behaviour on the net. All the examples and tutorials suggest it should work like a charm, but it doesn't.
Now the details.
OS: debian testing, kernel 3.2.0-3-686-pae
IPTABLES: iptables v1.4.14
OTHER:
tcpdump version 4.3.0,
libpcap version 1.3.0
# lsmod|grep ipt
ipt_LOG 12533 0
iptable_nat 12800 0
nf_nat 17924 1 iptable_nat
nf_conntrack_ipv4 13726 3 nf_nat,iptable_nat
nf_conntrack 43121 3 nf_conntrack_ipv4,nf_nat,iptable_nat
iptable_filter 12488 0
ip_tables 17079 2 iptable_filter,iptable_nat
x_tables 18121 6 ip_tables,iptable_filter,iptable_nat,xt_string,xt_tcpudp,ipt_LOG
I've reset all rules to the defaults and added only two rules, in the following order:
iptables -t filter -A OUTPUT --protocol tcp --dport 80 --match string --algo bm --from 0 --to 1500 --string "/index.php" --jump LOG --log-prefix "matched :"
iptables -t filter -A OUTPUT --protocol tcp --dport 80 --jump LOG --log-prefix "OUT : "
The idea is obvious: match requests to any IP on port 80 containing the string /index.php and log them, and also log all data sent to port 80.
So iptables -L -xvn shows:
Chain INPUT (policy ACCEPT 3 packets, 693 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 3 packets, 184 bytes)
pkts bytes target prot opt in out source destination
0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 STRING match "/index.php" ALGO name bm TO 1500 LOG flags 0 level 4 prefix "matched :"
0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 LOG flags 0 level 4 prefix "OUT : "
Counters for rules are zeroed.
OK, now the browser goes to www.gentoo.org/index.php. This URL is just an example for illustration.
It is the only URL requested in the browser.
And I get the following from iptables -t filter -L -xvn:
Chain INPUT (policy ACCEPT 61 packets, 16657 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 63 packets, 4394 bytes)
pkts bytes target prot opt in out source destination
1 380 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 STRING match "/index.php" ALGO name bm TO 1500 LOG flags 0 level 4 prefix "matched :"
13 1392 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 LOG flags 0 level 4 prefix "OUT : "
So we have only ONE match for the first rule, but that is wrong. Here is the tcpdump output for the connection.
First, the request to IP 89.16.167.134:
<handshake omitted>
16:56:38.440308 IP 192.168.66.106.54704 > 89.16.167.134.80: Flags [P.], seq 1:373, ack 1, win 913,
options [nop,nop,TS val 85359 ecr 3026115253], length 372
<...>
0x0030: b45e dab5 4745 5420 2f69 6e64 6578 2e70 .^..GET./index.p
0x0040: 6870 2048 5454 502f 312e 310d 0a48 6f73 hp.HTTP/1.1..Hos
0x0050: 743a 2077 7777 2e67 656e 746f 6f2e 6f72 t:.www.gentoo.or
0x0060: 670d 0a55 7365 722d 4167 656e 743a 204d g..User-Agent:.M
0x0070: 6f7a 696c 6c61 2f35 2e30 2028 5831 313b ozilla/5.0.(X11;
Here we see one match in the HTTP GET. Some packets later we have a request for some more content, to IP 140.211.166.176:
16:56:38.772432 IP 192.168.66.106.59766 > 140.211.166.176.80: Flags [P.], seq 1:329, ack 1, win 913,
options [nop,nop,TS val 85442 ecr 110101513], length 328
<...>
0x0030: 0690 0409 4745 5420 2f20 4854 5450 2f31 ....GET./.HTTP/1
<...>
0x0130: 6566 6c61 7465 0d0a 436f 6e6e 6563 7469 eflate..Connecti
0x0140: 6f6e 3a20 6b65 6570 2d61 6c69 7665 0d0a on:.keep-alive..
0x0150: 5265 6665 7265 723a 2068 7474 703a 2f2f Referer:.http://
0x0160: 7777 772e 6765 6e74 6f6f 2e6f 7267 2f69 www.gentoo.org/i
0x0170: 6e64 6578 2e70 6870 0d0a 0d0a ndex.php....
Here we see "/index.php" again.
But the LOG rule gives the following info:
kernel: [ 641.386182] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=89.16.167.134 LEN=60
kernel: [ 641.435946] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=89.16.167.134 LEN=52
kernel: [ 641.436226] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=89.16.167.134 LEN=424
kernel: [ 641.512594] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=89.16.167.134 LEN=52
kernel: [ 641.512762] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=89.16.167.134 LEN=52
kernel: [ 641.512819] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=89.16.167.134 LEN=52
kernel: [ 641.567496] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=60
kernel: [ 641.767707] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=52
kernel: [ 641.768328] matched :IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=380 <--
kernel: [ 641.768352] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=380
kernel: [ 641.990287] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=52
kernel: [ 641.990455] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=52
kernel: [ 641.990507] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=52
kernel: [ 641.990559] OUT : IN= OUT=eth0 SRC=192.168.66.106 DST=140.211.166.176 LEN=52
So we have a match on only one packet, the one going to 140.211.166.176. But where is the first match?
Even stranger for me: on another machine with Ubuntu I get different counters, e.g. 6 matches for the string.
I've also run the same simple test on my home notebook with Ubuntu 12.04, and there I get 2 clear matches.
Maybe there is some option for tuning how data is passed to the module, or something like that?
UPDATE:
A very simple experiment; maybe if someone can explain this behaviour it will give a hint.
Two PCs now: the server is CentOS 6.3, listening on port 80 with netcat and answering with a string:
# echo "List-IdServer"|nc -l 80
List-Id
And the client (Debian) connects to the server, sends data with nc, and reads the answer:
% echo "List-Id"|nc butorabackup 80
List-IdServer
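For reference, the iptables commands behind the counters below weren't posted; judging from the -L output, the server side was presumably of this form (the client uses the mirror-image dport/sport matches):
iptables -A INPUT -p tcp --dport 80 -m string --algo bm --to 6500 --string "List-Id" -j LOG
iptables -A INPUT -p tcp --dport 80 -j LOG
iptables -A OUTPUT -p tcp --sport 80 -m string --algo bm --to 6500 --string "List-Id" -j LOG
iptables -A OUTPUT -p tcp --sport 80 -j LOG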
After the data exchange I have, on the SERVER SIDE:
Chain INPUT (policy ACCEPT 33 packets, 2164 bytes)
pkts bytes target prot opt in out source destination
1 60 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 STRING match "List-Id" ALGO name bm TO 6500 LOG flags 0 level 4
5 276 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 LOG flags 0 level 4
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 23 packets, 4434 bytes)
pkts bytes target prot opt in out source destination
0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:80 STRING match "List-Id" ALGO name bm TO 6500 LOG flags 0 level 4
5 282 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:80 LOG flags 0 level 4
and on CLIENT SIDE:
Chain INPUT (policy ACCEPT 28 packets, 2187 bytes)
pkts bytes target prot opt in out source destination
1 66 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:80 STRING match "List-Id" ALGO name bm TO 6500 LOG flags 0 level 4
5 282 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:80 LOG flags 0 level 4
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 23 packets, 1721 bytes)
pkts bytes target prot opt in out source destination
1 60 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 STRING match "List-Id" ALGO name bm TO 6500 LOG flags 0 level 4
5 276 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 LOG flags 0 level 4
So there is NO OUTPUT rule match on the server: the server matched IN only, while the client matched both IN and OUT.
I don't understand how this works :(
Thanks in advance.

Related

firewalld port forward to k8s node port not working

I want to configure port forwarding 80->32181 and 443->30598. 32181 and 30598 are NodePorts of the k8s ingress controller, to which I can connect correctly:
$ curl http://localhost:32181
<html>
<head><title>404 Not Found</title></head>
<body>
...
$ curl https://localhost:30598 -k
<html>
<head><title>404 Not Found</title></head>
<body>
...
What I have done is:
$ cat /proc/sys/net/ipv4/ip_forward
1
$ firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: eth0
sources:
services: cockpit dhcpv6-client frp http https kube-apiserver kube-kubelet ssh
ports:
protocols:
forward: no
masquerade: yes
forward-ports:
port=80:proto=tcp:toport=32181:toaddr=
port=443:proto=tcp:toport=30598:toaddr=
source-ports:
icmp-blocks:
rich rules:
but I can't access my nginx via 80 or 443:
$ curl https://localhost:443 -k
curl: (7) Failed to connect to localhost port 443: Connection refused
and more info:
centos: 8.2 4.18.0-348.2.1.el8_5.x86_64
k8s: 1.22 (with calico (v3.21.0) network plugin)
firewalld: 0.9.3
and iptables output:
$ iptables -nvL -t nat --line-numbers
Chain PREROUTING (policy ACCEPT 51 packets, 2688 bytes)
num pkts bytes target prot opt in out source destination
1 51 2688 cali-PREROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:6gwbT8clXdHdC1b1 */
2 51 2688 KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
3 51 2688 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 50 packets, 2648 bytes)
num pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 1872 packets, 112K bytes)
num pkts bytes target prot opt in out source destination
1 1894 114K cali-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:O3lYWMrLQYEMJtB5 */
2 1862 112K KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
3 0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 1922 packets, 116K bytes)
num pkts bytes target prot opt in out source destination
1 1894 114K cali-OUTPUT all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:tVnHkvAo15HuiPy0 */
2 1911 115K KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
3 758 45480 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain DOCKER (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
Chain KUBE-SERVICES (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-SVC-JD5MR3NA4I4DYORP tcp -- * * 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
2 0 0 KUBE-SVC-Z6GDYMWE5TV2NNJN tcp -- * * 0.0.0.0/0 10.110.193.197 /* kubernetes-dashboard/dashboard-metrics-scraper cluster IP */ tcp dpt:8000
3 0 0 KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- * * 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
4 0 0 KUBE-SVC-EDNDUDH2C75GIR6O tcp -- * * 0.0.0.0/0 10.97.201.174 /* ingress-nginx/ingress-nginx-controller:https cluster IP */ tcp dpt:443
5 0 0 KUBE-SVC-EZYNCFY2F7N6OQA2 tcp -- * * 0.0.0.0/0 10.103.242.141 /* ingress-nginx/ingress-nginx-controller-admission:https-webhook cluster IP */ tcp dpt:443
6 0 0 KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- * * 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
7 0 0 KUBE-SVC-TCOU7JCQXEZGVUNU udp -- * * 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
8 0 0 KUBE-SVC-CEZPIJSAUFW5MYPQ tcp -- * * 0.0.0.0/0 10.97.166.112 /* kubernetes-dashboard/kubernetes-dashboard cluster IP */ tcp dpt:443
9 0 0 KUBE-SVC-H5K62VURUHBF7BRH tcp -- * * 0.0.0.0/0 10.104.154.95 /* lens-metrics/kube-state-metrics:metrics cluster IP */ tcp dpt:8080
10 0 0 KUBE-SVC-MOZMMOD3XZX35IET tcp -- * * 0.0.0.0/0 10.96.73.22 /* lens-metrics/prometheus:web cluster IP */ tcp dpt:80
11 0 0 KUBE-SVC-CG5I4G2RS3ZVWGLK tcp -- * * 0.0.0.0/0 10.97.201.174 /* ingress-nginx/ingress-nginx-controller:http cluster IP */ tcp dpt:80
12 1165 69528 KUBE-NODEPORTS all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Chain KUBE-POSTROUTING (1 references)
num pkts bytes target prot opt in out source destination
1 1859 112K RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match ! 0x4000/0x4000
2 3 180 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK xor 0x4000
3 3 180 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */ random-fully
Chain KUBE-MARK-DROP (0 references)
num pkts bytes target prot opt in out source destination
1 0 0 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK or 0x8000
Chain KUBE-NODEPORTS (1 references)
num pkts bytes target prot opt in out source destination
1 2 120 KUBE-SVC-EDNDUDH2C75GIR6O tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:https */ tcp dpt:30598
2 1 60 KUBE-SVC-CG5I4G2RS3ZVWGLK tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:http */ tcp dpt:32181
Chain KUBE-MARK-MASQ (27 references)
num pkts bytes target prot opt in out source destination
1 3 180 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000
Chain KUBE-SEP-IPE5TMLTCUYK646X (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.147 0.0.0.0/0 /* kube-system/kube-dns:metrics */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */ tcp to:192.168.103.147:9153
Chain KUBE-SEP-3LZLTHU4JT3FAVZK (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.149 0.0.0.0/0 /* kube-system/kube-dns:metrics */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */ tcp to:192.168.103.149:9153
Chain KUBE-SVC-JD5MR3NA4I4DYORP (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
2 0 0 KUBE-SEP-IPE5TMLTCUYK646X all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */ statistic mode random probability 0.50000000000
3 0 0 KUBE-SEP-3LZLTHU4JT3FAVZK all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */
Chain KUBE-SEP-ZOAMCQDU54EOM4EJ (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.141 0.0.0.0/0 /* kubernetes-dashboard/dashboard-metrics-scraper */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes-dashboard/dashboard-metrics-scraper */ tcp to:192.168.103.141:8000
Chain KUBE-SVC-Z6GDYMWE5TV2NNJN (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.110.193.197 /* kubernetes-dashboard/dashboard-metrics-scraper cluster IP */ tcp dpt:8000
2 0 0 KUBE-SEP-ZOAMCQDU54EOM4EJ all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes-dashboard/dashboard-metrics-scraper */
Chain KUBE-SEP-HYE2IFAO6PORQFJR (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.0.176 0.0.0.0/0 /* default/kubernetes:https */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */ tcp to:192.168.0.176:6443
Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
2 0 0 KUBE-SEP-HYE2IFAO6PORQFJR all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */
Chain KUBE-SEP-GJ4OJHBKIREWLMRS (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.146 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:https */
2 2 120 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:https */ tcp to:192.168.103.146:443
Chain KUBE-SVC-EDNDUDH2C75GIR6O (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.97.201.174 /* ingress-nginx/ingress-nginx-controller:https cluster IP */ tcp dpt:443
2 2 120 KUBE-MARK-MASQ tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:https */ tcp dpt:30598
3 2 120 KUBE-SEP-GJ4OJHBKIREWLMRS all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:https */
Chain KUBE-SEP-K2CVHZPTBE2YAD6P (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.146 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller-admission:https-webhook */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller-admission:https-webhook */ tcp to:192.168.103.146:8443
Chain KUBE-SVC-EZYNCFY2F7N6OQA2 (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.103.242.141 /* ingress-nginx/ingress-nginx-controller-admission:https-webhook cluster IP */ tcp dpt:443
2 0 0 KUBE-SEP-K2CVHZPTBE2YAD6P all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller-admission:https-webhook */
Chain KUBE-SEP-S6VTWHFP6KEYRW5L (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.147 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ tcp to:192.168.103.147:53
Chain KUBE-SEP-SFGZMYIS2CE4JD3K (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.149 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ tcp to:192.168.103.149:53
Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
2 0 0 KUBE-SEP-S6VTWHFP6KEYRW5L all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ statistic mode random probability 0.50000000000
3 0 0 KUBE-SEP-SFGZMYIS2CE4JD3K all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
Chain KUBE-SEP-IJUMPPTQDLYXOX4B (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.147 0.0.0.0/0 /* kube-system/kube-dns:dns */
2 0 0 DNAT udp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ udp to:192.168.103.147:53
Chain KUBE-SEP-C4W6TKYY5HHEG4RV (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.149 0.0.0.0/0 /* kube-system/kube-dns:dns */
2 0 0 DNAT udp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ udp to:192.168.103.149:53
Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ udp -- * * !192.168.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
2 0 0 KUBE-SEP-IJUMPPTQDLYXOX4B all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ statistic mode random probability 0.50000000000
3 0 0 KUBE-SEP-C4W6TKYY5HHEG4RV all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */
Chain KUBE-SEP-GX372II3CQAGUHFM (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.145 0.0.0.0/0 /* kubernetes-dashboard/kubernetes-dashboard */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes-dashboard/kubernetes-dashboard */ tcp to:192.168.103.145:8443
Chain KUBE-SVC-CEZPIJSAUFW5MYPQ (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.97.166.112 /* kubernetes-dashboard/kubernetes-dashboard cluster IP */ tcp dpt:443
2 0 0 KUBE-SEP-GX372II3CQAGUHFM all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes-dashboard/kubernetes-dashboard */
Chain KUBE-SEP-I3RZS3REJP7POFLG (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.143 0.0.0.0/0 /* lens-metrics/kube-state-metrics:metrics */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* lens-metrics/kube-state-metrics:metrics */ tcp to:192.168.103.143:8080
Chain KUBE-SVC-H5K62VURUHBF7BRH (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.104.154.95 /* lens-metrics/kube-state-metrics:metrics cluster IP */ tcp dpt:8080
2 0 0 KUBE-SEP-I3RZS3REJP7POFLG all -- * * 0.0.0.0/0 0.0.0.0/0 /* lens-metrics/kube-state-metrics:metrics */
Chain KUBE-SEP-ROTMHDCXAI3T7IOR (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.144 0.0.0.0/0 /* lens-metrics/prometheus:web */
2 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* lens-metrics/prometheus:web */ tcp to:192.168.103.144:9090
Chain KUBE-SVC-MOZMMOD3XZX35IET (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.96.73.22 /* lens-metrics/prometheus:web cluster IP */ tcp dpt:80
2 0 0 KUBE-SEP-ROTMHDCXAI3T7IOR all -- * * 0.0.0.0/0 0.0.0.0/0 /* lens-metrics/prometheus:web */
Chain KUBE-SEP-OAYGOO6JHJEB65WC (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ all -- * * 192.168.103.146 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:http */
2 1 60 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:http */ tcp to:192.168.103.146:80
Chain KUBE-SVC-CG5I4G2RS3ZVWGLK (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.97.201.174 /* ingress-nginx/ingress-nginx-controller:http cluster IP */ tcp dpt:80
2 1 60 KUBE-MARK-MASQ tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:http */ tcp dpt:32181
3 1 60 KUBE-SEP-OAYGOO6JHJEB65WC all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress-nginx/ingress-nginx-controller:http */
Chain KUBE-PROXY-CANARY (0 references)
num pkts bytes target prot opt in out source destination
Chain cali-nat-outgoing (1 references)
num pkts bytes target prot opt in out source destination
1 49 3274 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:flqWnvo8yq4ULQLa */ match-set cali40masq-ipam-pools src ! match-set cali40all-ipam-pools dst random-fully
Chain cali-POSTROUTING (1 references)
num pkts bytes target prot opt in out source destination
1 1894 114K cali-fip-snat all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:Z-c7XtVd2Bq7s_hA */
2 1894 114K cali-nat-outgoing all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:nYKhEzDlr11Jccal */
3 0 0 MASQUERADE all -- * tunl0 0.0.0.0/0 0.0.0.0/0 /* cali:SXWvdsbh4Mw7wOln */ ADDRTYPE match src-type !LOCAL limit-out ADDRTYPE match src-type LOCAL random-fully
Chain cali-PREROUTING (1 references)
num pkts bytes target prot opt in out source destination
1 51 2688 cali-fip-dnat all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:r6XmIziWUJsdOK6Z */
Chain cali-fip-snat (1 references)
num pkts bytes target prot opt in out source destination
Chain cali-OUTPUT (1 references)
num pkts bytes target prot opt in out source destination
1 1894 114K cali-fip-dnat all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:GBTAv2p5CwevEyJm */
Chain cali-fip-dnat (2 references)
num pkts bytes target prot opt in out source destination
Chain KUBE-KUBELET-CANARY (0 references)
num pkts bytes target prot opt in out source destination
To clarify, I am posting a Community Wiki answer.
The problem existed only when forwarding to a k8s service NodePort.
To solve the problem, you have to set up an external NGINX as a TCP proxy.
Documentation about running an external NGINX is available.
Ingress does not directly support TCP services, so some additional configuration is necessary. Your NGINX Ingress Controller may have been deployed directly (i.e. with a Kubernetes spec file) or through the official Helm chart. The configuration of the TCP pass-through will differ depending on the deployment approach.
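For illustration, a minimal TCP-proxy sketch using the NGINX stream module (this assumes nginx is built with the stream module and that the proxy runs on the node itself, hence 127.0.0.1; the listen ports and NodePorts are taken from this question):
# goes at the top level of nginx.conf, outside the http {} block
stream {
    server {
        listen 80;
        proxy_pass 127.0.0.1:32181;
    }
    server {
        listen 443;
        proxy_pass 127.0.0.1:30598;
    }
}
Passing 443 through as plain TCP keeps TLS termination on the ingress controller, which is the point of a TCP (rather than HTTP) proxy here.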

Docker container cannot connect to the Internet, ping works, wget fails

I have been trying to find a solution for days now and ended up asking here...
I have Debian 10 with Docker installed. A container connects to the other containers without any problem, but I cannot figure out what needs to be done to access the Internet from the containers.
A container can do a ping and gets the replies:
docker run -i -t busybox ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=53 time=10.156 ms
64 bytes from 8.8.8.8: seq=1 ttl=53 time=10.516 ms
64 bytes from 8.8.8.8: seq=2 ttl=53 time=10.218 ms
64 bytes from 8.8.8.8: seq=3 ttl=53 time=10.487 ms
Unfortunately, when I try to use wget it fails:
docker run -i -t busybox wget -S -T 5 http://google.com
Connecting to google.com (216.58.209.14:80)
wget: download timed out
The container's DNS seems to be properly set up:
docker run -i -t busybox cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
OS details and docker version:
uname -a
Linux host1 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
docker -v
Docker version 19.03.8, build afacb8b7f0
Docker bridge network details:
docker network inspect bridge
[
{
"Name": "bridge",
"Id": "970f8f04c009361b831d8ff8b4fa6d223645aadbbe93a27576d4934c0a8710e0",
"Created": "2020-04-23T17:15:37.376767708+02:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]
iptables is enabled and configured; however, I have also tried with cleared rules (ACCEPT all) and still no luck:
iptables -nvL
Chain INPUT (policy DROP 484 packets, 40785 bytes)
pkts bytes target prot opt in out source destination
2 116 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 state NEW
2501 309K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
3 192 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:1337
0 0 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
8 498 ACCEPT udp -- * * 0.0.0.0/0 0.0.0.0/0 udp dpt:1194 state NEW
10 640 ACCEPT all -- tun0 * 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
70 4889 DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0
46 3449 DOCKER-ISOLATION-STAGE-1 all -- * * 0.0.0.0/0 0.0.0.0/0
14 1164 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
24 1607 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
8 678 ACCEPT all -- tun0 * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- tun0 ens192 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
0 0 ACCEPT all -- ens192 tun+ 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
Chain OUTPUT (policy ACCEPT 10 packets, 733 bytes)
pkts bytes target prot opt in out source destination
1782 1233K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
0 0 ACCEPT all -- * lo 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * tun0 0.0.0.0/0 0.0.0.0/0
Chain DOCKER (1 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
pkts bytes target prot opt in out source destination
24 1607 DOCKER-ISOLATION-STAGE-2 all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
46 3449 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
pkts bytes target prot opt in out source destination
24 1440 REJECT tcp -- ens192 * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
46 3449 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
pkts bytes target prot opt in out source destination
0 0 DROP all -- * docker0 0.0.0.0/0 0.0.0.0/0
24 1607 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 516 packets, 43250 bytes)
pkts bytes target prot opt in out source destination
290 14045 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 18 packets, 1101 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 10 packets, 744 bytes)
pkts bytes target prot opt in out source destination
10 590 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80
Chain OUTPUT (policy ACCEPT 9 packets, 666 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
Any idea why my containers cannot connect to the outside world?
EDIT:
I have tried completely clearing my iptables rules and allowing all traffic:
iptables -nvL
Chain INPUT (policy ACCEPT 12968 packets, 945K bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 83 packets, 7850 bytes)
pkts bytes target prot opt in out source destination
iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 12871 packets, 939K bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 37 packets, 1856 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 29 packets, 2447 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 29 packets, 2447 bytes)
pkts bytes target prot opt in out source destination
iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
In that case, even pings do not make it out of the container:
docker run -i -t busybox ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
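A note on this last experiment: flushing the nat table also removes Docker's MASQUERADE rule (visible in the earlier iptables -t nat -nvL output), which by itself would explain why even ping stops working after the cleanup. Restarting the Docker daemon should regenerate Docker's own chains and NAT rules, e.g.:
systemctl restart docker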

iptables block IP for x hours not working?

On my Linux server, I want to use iptables to ban, for 24 hours, IPs that access certain ports. For this, I use the following iptables rules:
# Check if IP is on banlist, if yes then drop
-A INPUT -m state --state NEW -j bancheck
-A bancheck -m recent --name blacklist --rcheck --reap --seconds 86400 -j LOG --log-prefix "IPT blacklist_ban: "
-A bancheck -m recent --name blacklist --rcheck --reap --seconds 86400 -j DROP
# PUT IPs on banlist
-A banlist -m recent --set --name blacklist -j LOG --log-prefix "IPT add_IP_to_blacklist: "
-A banlist -j DROP
# Ban access to these ports
-A INPUT -p tcp -m multiport --dports 23,25,445,1433,2323,3389,4899,5900 -j LOG --log-prefix "IPT syn_naughty_ports: "
-A INPUT -p tcp -m multiport --dports 23,25,445,1433,2323,3389,4899,5900 -j banlist
In the logs, I can verify that this works:
Mar 13 02:12:23 kernel: [39534099.648488] IPT syn_naughty_ports: IN=eth0 OUT= MAC=... SRC=218.189.140.2 DST=... LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=29768 DF PROTO=TCP SPT=65315 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Mar 13 02:12:23 kernel: [39534099.648519] IPT add_IP_to_blacklist: IN=eth0 OUT= MAC=... SRC=218.189.140.2 DST=...4 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=29768 DF PROTO=TCP SPT=65315 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Mar 13 02:12:26 kernel: [39534102.664136] IPT blacklist_ban: IN=eth0 OUT= MAC=... SRC=218.189.140.2 DST=... LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=4724 DF PROTO=TCP SPT=65315 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
But then the logs also show that just over 2 hours later, the same IP accesses my system again. Rather than being blocked right at the start by the "bancheck" chain, the IP can access the port, which results in it being put on the "banlist" again (the destination port in both cases was the same, port 25).
Mar 13 04:35:59 kernel: [39542718.875859] IPT syn_naughty_ports: IN=eth0 OUT= MAC=... SRC=218.189.140.2 DST=... LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=4533 DF PROTO=TCP SPT=57719 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Mar 13 04:35:59 kernel: [39542718.875890] IPT add_IP_to_blacklist: IN=eth0 OUT= MAC=... SRC=218.189.140.2 DST=... LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=4533 DF PROTO=TCP SPT=57719 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Mar 13 04:36:02 kernel: [39542721.880524] IPT blacklist_ban: IN=eth0 OUT= MAC=... DST=... LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=12505 DF PROTO=TCP SPT=57719 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Mar 13 04:36:08 kernel: [39542727.882973] IPT blacklist_ban: IN=eth0 OUT= MAC=... SRC=218.189.140.2 DST=... LEN=48 TOS=0x00 PREC=0x00 TTL=113 ID=29092 DF PROTO=TCP SPT=57719 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
But if I understand the iptables rules correctly, the IP should be blocked within the first few lines for as long as it is within the 24 hours, and should never get far enough down the rule set to be found violating the ports rule and be put on the "banlist" again.
Am I doing something wrong, or do I misunderstand the way the rules work?
A working example for ssh from my server:
iptables -X black
iptables -N black
iptables -A black -m recent --set --name blacklist -j DROP
iptables -X ssh
iptables -N ssh
iptables -I ssh 1 -m recent --update --name blacklist --reap --seconds 86400 -j DROP
iptables -A INPUT -p TCP --dport ssh -m state --state NEW -j ssh
Don't forget to create the iptables chains with iptables -N
Compare this with your config and see if there are any notable differences.
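Adapted to the naughty-ports case above, a minimal sketch could look like this (the chain name is illustrative; note that --update, unlike --rcheck, also refreshes the entry's last-seen timestamp, so a repeat offender keeps extending its own ban):
iptables -N naughty
iptables -A naughty -m recent --update --name blacklist --reap --seconds 86400 -j DROP
iptables -A naughty -m recent --set --name blacklist -j DROP
iptables -A INPUT -p tcp -m multiport --dports 23,25,445,1433,2323,3389,4899,5900 -m state --state NEW -j naughty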
A more elegant solution would be to use ipset combined with iptables:
timeout
All set types support the optional timeout parameter when creating a set and adding entries. The value of the timeout parameter for the create command is the default timeout value (in seconds) for new entries. If a set is created with timeout support, then the same timeout option can be used to specify non-default timeout values when adding entries. A zero timeout value means the entry is added to the set permanently. The timeout value of already added elements can be changed by re-adding the element using the -exist option.
Example:
ipset create test hash:ip timeout 300
ipset add test 192.168.0.1 timeout 60
ipset -exist add test 192.168.0.1 timeout 600
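Wiring such a set into iptables could then look like this sketch (set name "test" as above; the SET target's --exist flag resets the timeout for repeat offenders):
iptables -A INPUT -m set --match-set test src -j DROP
iptables -A INPUT -p tcp -m multiport --dports 23,25,445,1433,2323,3389,4899,5900 -j SET --add-set test src --exist
iptables -A INPUT -p tcp -m multiport --dports 23,25,445,1433,2323,3389,4899,5900 -j DROP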

Need to drop established connections with iptables

For app testing purposes, I need to simulate a situation where a stateful firewall drops an established TCP connection from client to server after a timeout. I installed 3 guest VMs in VirtualBox:
Client, network1 ip: 10.0.2.110
Firewall, network1 ip: 10.0.2.5, network2 ip: 10.0.3.5
Server, network2 ip: 10.0.3.6
Client and Server are Fedora 19 with iptables disabled.
Firewall is Ubuntu 13.10 with the following settings:
cat /etc/iptables.conf
*filter
:INPUT ACCEPT [201:13136]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [110:14472]
-A FORWARD -j LOG --log-prefix "[netfilter] "
-A FORWARD -p icmp -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m conntrack --ctstate INVALID -j DROP
-A FORWARD -p tcp -m tcp --dport 2000 -m state --state NEW -j ACCEPT
-A FORWARD -j DROP
COMMIT
sysctl net.netfilter
...
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 30
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 30
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 30
...
With these settings, conntrack/iptables should drop established TCP connections after 30 seconds of inactivity.
To run the test, I set up "server" on Server:
# ncat -l 2000 --keep-open --exec "/bin/cat"
and connect there with telnet on Client:
$ telnet 10.0.3.6 2000
Trying 10.0.3.6...
Connected to 10.0.3.6.
Escape character is '^]'.
In the iptables log I get a normal TCP handshake:
Dec 2 12:24:23 ubuntu kernel: [ 5231.169804] [netfilter] IN=eth0 OUT=eth1 MAC=08:00:27:b8:68:9f:08:00:27:4f:ee:15:08:00 SRC=10.0.2.110 DST=10.0.3.6 LEN=60 TOS=0x10 PREC=0x00 TTL=63 ID=44926 DF PROTO=TCP SPT=47899 DPT=2000 WINDOW=29200 RES=0x00 SYN URGP=0
Dec 2 12:24:23 ubuntu kernel: [ 5231.170489] [netfilter] IN=eth1 OUT=eth0 MAC=08:00:27:00:72:8c:08:00:27:74:b7:df:08:00 SRC=10.0.3.6 DST=10.0.2.110 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=TCP SPT=2000 DPT=47899 WINDOW=28960 RES=0x00 ACK SYN URGP=0
Dec 2 12:24:23 ubuntu kernel: [ 5231.171315] [netfilter] IN=eth0 OUT=eth1 MAC=08:00:27:b8:68:9f:08:00:27:4f:ee:15:08:00 SRC=10.0.2.110 DST=10.0.3.6 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=44927 DF PROTO=TCP SPT=47899 DPT=2000 WINDOW=229 RES=0x00 ACK URGP=0
While the connection is established, I send several packets with telnet, and from the conntrack -L command I get:
tcp 6 24 ESTABLISHED src=10.0.2.110 dst=10.0.3.6 sport=47899 dport=2000 src=10.0.3.6 dst=10.0.2.110 sport=2000 dport=47899 [ASSURED] mark=0 use=1
And in the iptables log I get:
Dec 2 12:24:38 ubuntu kernel: [ 5245.917564] [netfilter] IN=eth0 OUT=eth1 MAC=08:00:27:b8:68:9f:08:00:27:4f:ee:15:08:00 SRC=10.0.2.110 DST=10.0.3.6 LEN=55 TOS=0x10 PREC=0x00 TTL=63 ID=44928 DF PROTO=TCP SPT=47899 DPT=2000 WINDOW=229 RES=0x00 ACK PSH URGP=0
Dec 2 12:24:38 ubuntu kernel: [ 5245.917961] [netfilter] IN=eth1 OUT=eth0 MAC=08:00:27:00:72:8c:08:00:27:74:b7:df:08:00 SRC=10.0.3.6 DST=10.0.2.110 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=36952 DF PROTO=TCP SPT=2000 DPT=47899 WINDOW=227 RES=0x00 ACK URGP=0
Dec 2 12:24:38 ubuntu kernel: [ 5245.918326] [netfilter] IN=eth1 OUT=eth0 MAC=08:00:27:00:72:8c:08:00:27:74:b7:df:08:00 SRC=10.0.3.6 DST=10.0.2.110 LEN=55 TOS=0x00 PREC=0x00 TTL=63 ID=36953 DF PROTO=TCP SPT=2000 DPT=47899 WINDOW=227 RES=0x00 ACK PSH URGP=0
Dec 2 12:24:38 ubuntu kernel: [ 5245.918535] [netfilter] IN=eth0 OUT=eth1 MAC=08:00:27:b8:68:9f:08:00:27:4f:ee:15:08:00 SRC=10.0.2.110 DST=10.0.3.6 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=44929 DF PROTO=TCP SPT=47899 DPT=2000 WINDOW=229 RES=0x00 ACK URGP=0
that's also OK.
Next, I wait for several minutes and check that conntrack -L returns an empty table. Then I send some more packets with telnet and expect that it freezes or says something like "connection closed", but, to my surprise, the connection isn't actually closed, and I get messages like these in the iptables log:
Dec 2 12:29:51 ubuntu kernel: [ 5558.925402] [netfilter] IN=eth0 OUT=eth1 MAC=08:00:27:b8:68:9f:08:00:27:4f:ee:15:08:00 SRC=10.0.2.110 DST=10.0.3.6 LEN=55 TOS=0x10 PREC=0x00 TTL=63 ID=44930 DF PROTO=TCP SPT=47899 DPT=2000 WINDOW=229 RES=0x00 ACK PSH URGP=0
Dec 2 12:29:51 ubuntu kernel: [ 5558.925927] [netfilter] IN=eth1 OUT=eth0 MAC=08:00:27:00:72:8c:08:00:27:74:b7:df:08:00 SRC=10.0.3.6 DST=10.0.2.110 LEN=55 TOS=0x00 PREC=0x00 TTL=63 ID=36954 DF PROTO=TCP SPT=2000 DPT=47899 WINDOW=227 RES=0x00 ACK PSH URGP=0
Dec 2 12:29:51 ubuntu kernel: [ 5558.926237] [netfilter] IN=eth0 OUT=eth1 MAC=08:00:27:b8:68:9f:08:00:27:4f:ee:15:08:00 SRC=10.0.2.110 DST=10.0.3.6 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=44931 DF PROTO=TCP SPT=47899 DPT=2000 WINDOW=229 RES=0x00 ACK URGP=0
There is no TCP handshake that could indicate telnet silently re-established the connection; there is no difference from the previous log, where the connection was established according to conntrack.
How can I make iptables really close an established connection after 30 seconds of inactivity?
This might not be the answer, but it is some explanation of the behaviour: it seems that ip_conntrack tries to allocate on the external interface the same source port the internal client used, if it is available. That means that even after a complete wipe of the conntrack table, the connection is "transparently" re-established on the same port and TCP sees no interruption. You could actually consider this a feature.
To verify this behaviour you would need two clients connecting to the outside world from the same source port, and then check conntrack again (hard to simulate, as the OS assigns source port numbers freely). You should then get two different port numbers. Only in that case might the TCP connection recognize that something happened in the meantime (which in most cases will mean a closed connection)...
Try inserting into FORWARD: -m state --state INVALID -j DROP
This will drop packets that do not belong to an established connection.
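A related knob, for what it's worth: by default conntrack "loosely" picks up already-established TCP flows from mid-stream packets, which fits the re-established-connection behaviour described in the question. Disabling that pickup makes mid-stream packets classify as INVALID, so the INVALID DROP rule can then take effect:
sysctl -w net.netfilter.nf_conntrack_tcp_loose=0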

TCP messages getting coalesced

I have a Java application that is writing to the network. It is writing messages in the region of 764 bytes, +/- 5. A pcap shows that the stream is getting IP-fragmented, and we can't explain this.
Linux 2.6.18-238.1.1.el5
A strace shows:
(strace -vvvv -f -tt -o strace.out -e trace=network -p $PID)
1: 2045 12:48:23.984173 sendto(45, "\0\0\0\0\0\0\2\374\0\0\0\0\0\3\n\0\0\0\0\3upd\365myData"..., 764, 0, NULL, 0) = 764
2: 15206 12:48:23.984706 sendto(131, "\0\0\0\0\0\0\2\374\0\0\0\0\0\3\n\0\0\0\0\3upd\365myData"..., 764, 0, NULL, 0 <unfinished ...>
3: 2046 12:48:23.984811 sendto(46, "\0\0\0\0\0\0\2\374\0\0\0\0\0\3\n\0\0\0\0\3upd\365myData"..., 764, 0, NULL, 0 <unfinished ...>
4: 15206 12:48:23.984893 <... sendto resumed> ) = 764
5: 2046 12:48:23.984948 <... sendto resumed> ) = 764
I am seeing packets larger than the MTU when I capture on the network, which is causing fragmentation.
4809 5.848987 10.0.0.2 -> 10.0.0.5 TCP 40656 > taiclock [ACK] Seq=325501 Ack=1 Win=46 Len=1448 TSV=344627654 TSER=270108068 # First Fragment
4810 5.848991 10.0.0.5 -> 10.0.0.2 TCP taiclock > 40656 [ACK] Seq=1 Ack=326949 Win=12287 Len=0 TSV=270108081 TSER=344627643 # TCP ack
4811 5.849037 10.0.0.2 -> 10.0.0.5 TCP 40656 > taiclock [PSH, ACK] Seq=326949 Ack=1 Win=46 Len=82 TSV=344627654 TSER=270108081 # Second Frag
Questions:
1) It appears the server is trying to batch the two sendto() calls into one IP packet, which is larger than the MTU and is therefore getting fragmented. Why?
2) Looking at the strace output for PID 2046, is the figure after the equals sign on the <... sendto resumed> line a total for what was sent? I.e. were 764 bytes sent in total across line 3 and line 5, or are 764 bytes being sent per line?
3) Are there any options I can pass to strace to log all of the sendto() output? I can't seem to find anything...
To answer your questions, in order:
1) It is perfectly normal for multiple send calls to be coalesced when using TCP: it is a stream protocol, so it does not preserve user-level send boundaries in any way. I don't see any evidence of IP fragmentation (which would be bad) in your trace, just TCP segmentation (which is completely normal).
2) Yes, that is the size - more specifically it is reporting the value that the system call returned after it resumed.
3) You can use "-e write=" with a set of file descriptors to get strace to perform a full hexadecimal and ASCII dump of the data written to those descriptors.
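For example, using the file descriptors from the strace output above (45, 46 and 131):
strace -f -tt -e trace=network -e write=45,46,131 -p $PID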