MySQL warning: got packets out of order. Expected 2 but received 0 - amazon-rds

Warning: got packets out of order. Expected 2 but received 0
I use MySQL and changed wait_timeout from 28800 to 180, and after 180 seconds I get that warning message.
In the MySQL process list I can see the Sleep command's Time reach 180 seconds, and then the warning appears.
Should I change wait_timeout back from 180 to 28800? What does this warning mean?
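This warning typically shows up when a client reuses a connection that the server already dropped after wait_timeout expired, so the next packet arrives with an unexpected sequence number. A minimal Python sketch of a client-side guard, assuming mysql-connector-python and placeholder connection details:

import mysql.connector

# Placeholder endpoint/credentials - adjust for your RDS instance.
cnx = mysql.connector.connect(host="my-rds-endpoint", user="app",
                              password="secret", database="appdb")

cur = cnx.cursor()
cur.execute("SHOW VARIABLES LIKE 'wait_timeout'")
print(cur.fetchone())   # e.g. ('wait_timeout', '180')
cur.close()

# ... connection sits idle for longer than wait_timeout ...

# Re-establish the session if the server dropped it, instead of writing
# on a dead connection (which is what triggers "packets out of order").
cnx.ping(reconnect=True, attempts=3, delay=1)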

Related

chrony: Find the most recent sync time

I am trying to find out the most recent time chrony checked to see if we are in sync. I can see chrony's sources with the following command:
$ chronyc sources
210 Number of sources = 3
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* a.b.c 1 10 377 32m -8638us[-8930us] +/- 103ms
^- d.e.f 1 10 366 77m +2928us[+1960us] +/- 104ms
^+ g.h.i 2 10 377 403 -14ms[ -14ms] +/- 137ms
The * symbol indicates the server we are synced to, and the + symbol indicates "acceptable" servers. I can see that chrony is indeed polling both a.b.c and g.h.i.
So what I want to know is whether I can use the minimum of those LastRx values as the most recent time we confirmed we are in sync. In this example we are synced to a.b.c, but 403 s < 32 m, so does that mean that 403 seconds ago chrony checked against g.h.i and determined we are in sync? In other words, was the last time I know system time was in sync 403 seconds ago, or 32 minutes ago?
There is some documentation on the chronyc command here: https://chrony.tuxfamily.org/doc/3.5/chronyc.html
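If the goal is simply "when did chrony last correct the clock", chronyc tracking reports a Ref time field for the last measurement that was actually applied, which avoids reasoning about per-source LastRx values. A small sketch, assuming chronyc is on PATH and its plain-text output format:

import subprocess

# Ask chrony for its tracking status and pick out the reference time,
# i.e. the time of the last clock update that was applied.
out = subprocess.run(["chronyc", "tracking"], capture_output=True,
                     text=True, check=True).stdout

for line in out.splitlines():
    if line.startswith("Ref time"):
        # e.g. "Ref time (UTC)  : Tue Feb 08 14:09:53 2022"
        print("Last clock update:", line.split(":", 1)[1].strip())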

MDNS causes local HTTP requests to block for 2 seconds

It takes 0.02 seconds to send a message via the Python call requests.post("http://172.16.90.18:8080", files=files), but it takes 2 seconds to send the same message via requests.post("http://sdss-server.local:8080", files=files).
The following is from the packet capture I took with Wireshark; from frames 62 to 107 you can see that it took 2 seconds for mDNS to resolve the domain name.
My system is Ubuntu 18.04. Following this answer: Mac OS X slow connections - mdns 4-5 seconds - bonjour slow,
I edited the /etc/hosts file and changed this line to
127.0.0.1 localhost sdss-server.local
After that modification it still takes 2 seconds to send the message via requests.post("http://sdss-server.local:8080", files=files).
Normally it should take 0.02 to 0.03 seconds. What should I do to fix this and reduce the time from 2 seconds to 0.02 seconds?
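To pin down whether the 2 seconds are really spent in name resolution (and whether the /etc/hosts entry is even being consulted), it helps to time the lookup and the HTTP round trip separately. A rough sketch, using the hostname and port from the question:

import socket
import time

import requests

host = "sdss-server.local"

# Time only the name resolution step.
t0 = time.monotonic()
addrs = socket.getaddrinfo(host, 8080)
print("resolution: %.3f s -> %s" % (time.monotonic() - t0, addrs[0][4][0]))

# Time the full HTTP request (resolution + connection + transfer).
t0 = time.monotonic()
requests.post("http://%s:8080" % host, files={"f": b"test"})
print("request   : %.3f s" % (time.monotonic() - t0))

If the first number is around 2 s while the second is around 0.02 s, the delay is purely in name resolution rather than in the server itself.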

TCP write operation (in non-blocking mode) sometimes blocks for more than a few msec

I'm having a weird blocking issue when streaming data over TCP from client to server.
My client opens a TCP socket in a non-blocking mode.
Most of the time, everything is ok and the write calls return immediately (500-700 usec).
But sometimes the write call blocks for a few consecutive calls, and then things go back to normal (write calls return immediately, in 500-700 usec).
One time the issue happened, I noticed that the data I needed to write was 1324752 bytes, while the pending data in the TCP socket queue (read with ioctl(c->fd, SIOCOUTQ, &pending)) was 331188 bytes, and the write call blocked for 0.9 msec.
On the next write, the data I needed to write was 1249300 bytes, while the pending data in the TCP socket queue was 330144 bytes, and the write call blocked for 22 msec.
The following write operations also took a few msec (13 msec and 4 msec), and then things started to cool down and go back to normal...
I'll try to summarize it in a table, where:
seq number 2269863 is the first write operation in this trace (2269864 is the second operation etc.)
data size is the size of the data we need to send.
nwritten is the amount of data we actually succeeded in sending.
pending data is the amount of unsent data in the socket send queue.
write duration is the time the write operation was blocking.
seq number | data size [bytes] | nwritten [bytes] | pending data [bytes] | write duration [usec]
2269863    | 1324752           | 225992           | 331188               | 934
2269864    | 1249300           | 718431           | 330144               | 22227
2269865    | 651301            | 651301           | 0                    | 13648
2269866    | 150540            | 150540           | 0                    | 4262
2269867    | 30108             | 30108            | 0                    | 755
2269868    | 30108             | 30108            | 0                    | 754
2269869    | 30108             | 30108            | 0                    | 613
2269870    | 30108             | 30108            | 0                    | 857
2269871    | 30108             | 30108            | 0                    | 555
2269872    | 30108             | 30108            | 0                    | 569
2269873    | 30108             | 30108            | 0                    | 636
2269874    | 30108             | 30108            | 0                    | 814
2269875    | 30108             | 30108            | 0                    | 812
Some important notes:
I've verified that the NON_BLOCKING flag is set.
I'm running 6 TCP clients, sending their data to the same TCP server - when the write call blocks, it happens on most of the clients simultaneously (sometimes on all of them).
The server runs on linux 4.15 and the clients run on linux 4.19.
Question:
Why does the write block for so long when it's configured as a non-blocking operation?
Any tips or ideas are more than welcome.
Thanks!
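For reference, a condensed Python sketch of the measurement loop described above: a non-blocking TCP socket, the pending byte count read via SIOCOUTQ, and the duration of each send timed. The server address is a placeholder, and SIOCOUTQ = 0x5411 is the Linux value from <linux/sockios.h>:

import fcntl
import socket
import struct
import time

SIOCOUTQ = 0x5411  # Linux: bytes still unsent in the socket send queue

sock = socket.create_connection(("192.0.2.10", 9000))  # placeholder server
sock.setblocking(False)                                # non-blocking mode

def send_chunk(data: bytes) -> None:
    # How much earlier data is still sitting in the send queue?
    pending = struct.unpack("I",
                            fcntl.ioctl(sock.fileno(), SIOCOUTQ, b"\0" * 4))[0]
    t0 = time.monotonic()
    try:
        nwritten = sock.send(data)
    except BlockingIOError:
        nwritten = 0   # queue full: a non-blocking send should fail, not stall
    dur_usec = (time.monotonic() - t0) * 1e6
    print(f"size={len(data)} nwritten={nwritten} "
          f"pending={pending} duration={dur_usec:.0f} usec")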

Nuxt.js / Node.js Nginx requests per second

I'm trying to prepare a CentOS server to run a Nuxt.js (Node.js) application behind an Nginx reverse proxy.
First, I fire up a simple test server that returns an HTTP 200 response with the text "ok". It easily handles ~10,000 requests/second with ~10 ms of mean latency.
Then I switch to the hello-world Nuxt example app (npx create-nuxt-app) and run the weighttp HTTP benchmarking tool with the following command:
weighttp -n 10000 -t 4 -c 100 localhost:3000
The results are as follows:
starting benchmark...
spawning thread #1: 25 concurrent requests, 2500 total requests
spawning thread #2: 25 concurrent requests, 2500 total requests
spawning thread #3: 25 concurrent requests, 2500 total requests
spawning thread #4: 25 concurrent requests, 2500 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done
finished in 9 sec, 416 millisec and 115 microsec, 1062 req/s, 6424 kbyte/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed,
0 errored
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 61950000 bytes total, 2000000 bytes http, 59950000 bytes data
As you can see, it won't climb over 1062 req/s. Sometimes I can reach ~1700 req/s if I raise the concurrency parameter, but no more than that.
I'm expecting a simple hello-world example app to handle at least ~10,000 req/s on this machine without high latency.
I've tried checking file limits, open connection limits, Nginx workers, etc., but couldn't find the root cause, so I would welcome any ideas on where to even start looking.
I can provide any logs or any other additional info if needed.
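One quick sanity check is to relate the benchmark numbers to per-request latency: with 100 concurrent connections, ~1062 req/s implies roughly 100 / 1062 ≈ 94 ms per request, so measuring a single serial request shows whether the ceiling comes from server-side render time rather than from Nginx or file-descriptor limits. A small sketch against the same endpoint as the benchmark:

import time

import requests

samples = []
for _ in range(50):
    t0 = time.monotonic()
    requests.get("http://localhost:3000/")
    samples.append(time.monotonic() - t0)

samples.sort()
print("median: %.1f ms" % (samples[len(samples) // 2] * 1000))

If the median is already in the tens of milliseconds, the per-request SSR render is the bottleneck, and a common next step is running several Node.js instances behind Nginx rather than tuning connection limits.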

Troubleshooting varnish sess_closed_err

We deliver IPTV services to our customers with Varnish Cache 6.0, and I have been a bit worried that there might be a problem with our Varnish Cache servers. This suspicion is based on the number of customer incident reports we get when the IPTV stream flows through our Varnish Cache rather than coming directly from the backend server.
That is why I would like to eliminate all errors from varnishstat, to narrow down the possible reasons for the incidents, since at the moment I don't have a better angle for troubleshooting the problem.
I should state up front that I am far from being familiar with, let alone an expert in, Varnish.
So let's dig into the "problem":
varnishstat -1 output:
MAIN.sess_closed 38788 0.01 Session Closed
MAIN.sess_closed_err 15260404 3.47 Session Closed with error
Basically almost all of the connections to the Varnish Cache servers close with an error. I set up a virtualized demo server on our network with an identical Varnish configuration, and there the only sess_closed_err entries were generated when I changed channels in my VLC media player. Note that I was only able to run a few VLC instances against the server at the same time, and that our customers use STB boxes to access the service.
So my actual question is: how can I troubleshoot what causes the sessions to close with an error?
There are some other counters that will show more specifically what happens with the sessions. The next step in your troubleshooting is therefore to look at these counters:
varnishstat -1 | grep ^MAIN.sc_
I'll elaborate a bit with a typical example:
$ sudo varnishstat -1 | egrep "(sess_closed|sc_)"
MAIN.sess_closed 8918046 1.45 Session Closed
MAIN.sess_closed_err 96244948 15.69 Session Closed with error
MAIN.sc_rem_close 86307498 14.07 Session OK REM_CLOSE
MAIN.sc_req_close 8402217 1.37 Session OK REQ_CLOSE
MAIN.sc_req_http10 45930 0.01 Session Err REQ_HTTP10
MAIN.sc_rx_bad 0 0.00 Session Err RX_BAD
MAIN.sc_rx_body 0 0.00 Session Err RX_BODY
MAIN.sc_rx_junk 132 0.00 Session Err RX_JUNK
MAIN.sc_rx_overflow 2 0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout 96193210 15.68 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe 0 0.00 Session OK TX_PIPE
MAIN.sc_tx_error 0 0.00 Session Err TX_ERROR
MAIN.sc_tx_eof 3 0.00 Session OK TX_EOF
MAIN.sc_resp_close 0 0.00 Session OK RESP_CLOSE
MAIN.sc_overload 0 0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow 0 0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short 0 0.00 Session Err RANGE_SHORT
MAIN.sc_req_http20 0 0.00 Session Err REQ_HTTP20
MAIN.sc_vcl_failure 0 0.00 Session Err VCL_FAILURE
The output from this specific environment shows that the majority of the sessions that close with an error do so due to a receive timeout (MAIN.sc_rx_timeout). This timeout controls how long Varnish will keep idle connections open, and is set using the timeout_idle parameter of varnishd. Its value is 5 seconds by default. Use varnishadm to see the current value and the description of the timeout:
$ sudo varnishadm param.show timeout_idle
timeout_idle
Value is: 10.000 [seconds]
Default is: 5.000
Minimum is: 0.000
Idle timeout for client connections.
A connection is considered idle until we have received the full
request headers.
This parameter is particularly relevant for HTTP1 keepalive
connections which are closed unless the next request is
received before this timeout is reached.
Increasing timeout_idle will likely reduce the number of sessions that are closed due to the idle timeout. This can be done by passing the value as a parameter when starting Varnish. Example:
varnishd [...] -p timeout_idle=15
Note that there are pros and cons related to increasing this timeout.
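When revisiting these counters over time, it can be handy to rank the MAIN.sc_* close reasons programmatically instead of eyeballing varnishstat output. A small sketch, assuming varnishstat's JSON output (newer Varnish versions nest counters under a "counters" key, older ones keep them at the top level, so both cases are handled):

import json
import subprocess

raw = json.loads(subprocess.run(["varnishstat", "-j"], capture_output=True,
                                text=True, check=True).stdout)
counters = raw.get("counters", raw)

# Keep only the session-close reason counters and sort them by volume.
reasons = {name: c["value"] for name, c in counters.items()
           if name.startswith("MAIN.sc_") and isinstance(c, dict)}

for name, value in sorted(reasons.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {value}")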
