Nginx in Docker keeps hanging every few (~10) requests on Docker for macOS (M1) - node.js

Edit: I found this to be a networking issue, but I don't have an answer on how to fix it yet, so hopefully someone else knows something about it:
When I'm inside the Nginx container, I can query node.js like this:
curl http://192.168.65.2:3001/api/getTest
and that works, but it shows the same erratic behaviour as Nginx does below. So it does indeed mostly time out towards the backend, for some networking reason I do not understand.
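For context, 192.168.65.2 appears to be the address host.docker.internal resolves to inside the container here; a quick way to double-check that mapping and to reproduce a single backend request with a hard timeout (a small sketch, assuming getent is available in the nginx image) is:
# inside the nginx container
getent hosts host.docker.internal
curl --max-time 5 http://host.docker.internal:3001/api/getTest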
So when I run ab from the Nginx container:
> ab -n 10000 -c 5 http://192.168.65.2:3001/api/getTest
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.65.2 (be patient)
apr_pollset_poll: The timeout specified has expired (70007)
Total of 16 requests completed
Which is indeed the same behaviour I saw from Nginx -> backend. Node.js itself works fine (see the ab run against node.js directly below; I even ran both simultaneously: the run against node.js always finished with 0 errors, while the ab run from inside Docker never finishes correctly, as shown above and below).
------- Old question, needed to understand the full case:
I have a standard nginx docker image:
image: nginx
port mapping:
- "8080:80"
I have a node server running outside docker with node v17.3.0 on port 3001.
I proxy pass requests from nginx 8080 to 3001 with the following configuration:
upstream backend {
    server host.docker.internal:3001;
    keepalive 32;
}

location /api {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-NginX-Proxy true;
    proxy_pass http://backend;
    proxy_set_header Host $http_host;
}
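(Side note: according to the nginx documentation, the keepalive directive in an upstream block only keeps connections open if the proxied requests use HTTP/1.1 with the Connection header cleared, roughly like this inside the location block:)
proxy_http_version 1.1;
proxy_set_header Connection "";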
When I make requests with curl or via the web frontend, both 8080 and 3001 work as expected. However, I noticed that on :8080 Nginx hangs even after very few subsequent requests: when I manually issue curl requests, it hangs after roughly 10 in a row (it seems to come in 'blocks' of about 10 subsequent requests, since ab always hangs at around 50 completed requests when concurrency=5, around 30 when concurrency=3, and so on, while manually it always hangs after about 10 completed requests). And it always hangs for longer than 30s.
So I try:
> curl http://localhost:8080/api/getTest
<myjsonbody>
> curl http://localhost:3001/api/getTest
<myjsonbody>
But sometimes(!), simply:
> curl http://localhost:8080/api/getTest
And nothing happens for 30+ seconds, after which it sometimes returns the correct result and sometimes does not:
> time curl http://localhost:8080/api/getTest
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.21.4</center>
</body>
</html>
0.01s user 0.01s system 0% cpu 1:00.04 total
Looking at the node.js logs while sending requests to nginx, the requests that hang either do not arrive at node.js until 30-60s later or, as in the 504 case above, never arrive at node.js at all.
So, to test, I try the following ab run:
> ab -n 10000 -c 5 http://localhost:8080/api/getTest
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
apr_pollset_poll: The timeout specified has expired (70007)
Total of 48 requests completed
And, directly to node.js:
ab -n 10000 -c 5 http://localhost:3001/api/getTest
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: localhost
Server Port: 3001
Document Path: /api/getTest
Document Length: 41017 bytes
Concurrency Level: 5
Time taken for tests: 2.965 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 412610000 bytes
HTML transferred: 410170000 bytes
Requests per second: 3372.14 [#/sec] (mean)
Time per request: 1.483 [ms] (mean)
Time per request: 0.297 [ms] (mean, across all concurrent requests)
Transfer rate: 135877.00 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 0 1 3.7 1 118
Waiting: 0 1 3.1 1 118
Total: 0 1 3.7 1 119
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 2
95% 2
98% 3
99% 4
100% 119 (longest request)
All requests succeed when sent directly to node.js, while through nginx they hang almost immediately. I tried node.js directly with many more requests, both concurrent and in total; it simply never hangs. Through Nginx it almost always hangs almost immediately.
Any ideas? I wouldn't even know how to start debugging this. I used Wireshark to trace the traffic: when a request fails, it reaches Nginx but often is never actually forwarded to node.js (and then Nginx returns the gateway error above), while when it does reach node.js, the sequence looks like this:
Request received by Nginx
40s nothing
Request received by Node.js from Nginx
Immediate response by Node.js to Nginx
Nginx delivers the end result after 40s + some ms.
I tried a lot of configs, but nothing fixes the problem.
Could it be a networking problem? It feels like one, but how would I investigate further, beyond what I have tried already?
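One way to narrow it down further might be to time each phase of a request from inside the container, to see whether the delay sits in the TCP connect or in the response itself (a sketch using standard curl -w timing variables):
curl -o /dev/null -s -w 'dns: %{time_namelookup}s  connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n' http://192.168.65.2:3001/api/getTest
Running that in a loop should show whether the ~10-request pattern lines up with long connect times.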

Related

How to try all servers in DNS using libcurl?

I need to regularly and randomly test with Linux/C++/libcurl the responses of several servers that are available through a single DNS name, such as
$ host example.com
n1.example.com 1.2.3.4
n2.example.com 1.2.3.5
n3.example.com 1.2.3.6
The list changes. When I try https://example.com, libcurl always uses the same IP for the span of the TTL, and I cannot switch to the next host. There is the CURLOPT_DNS_CACHE_TIMEOUT setopt, but setting it to zero does not help; even if I fully recreate the easy curl object, I still get the same IP. Therefore, these do not help: curl - How to set up TTL for dns cache & How to clear the curl cache
I can of course manually resolve the DNS names and iterate over them myself, but are there other options? Polling randomly is okay. I see curl uses c-ares. Is there a way to clear the cache there, and would it help?
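(For illustration, "manually resolve and iterate" would be something like the following shell sketch using --resolve, or the equivalent CURLOPT_RESOLVE in libcurl, to pin each resolved IP in turn:)
for ip in $(dig +short example.com); do
  curl -s -o /dev/null -w "$ip -> %{http_code}\n" --resolve example.com:443:$ip https://example.com/
done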
I could not do exactly what I need with curl without doing the resolve myself, but here are some findings to share with others:
First of all, as a well-written TCP client, curl will try the hosts from the DNS list from top to bottom until a successful connection is made. From then on it will use that host, even if it returns some higher-level error (such as an SSL error or HTTP 500). This is fine for most cases.
The command line of newer curl versions has --retry and --retry-all-errors, but there are no such options in libcurl, unfortunately. The feature is being enhanced right now, and as of 2021-07-14 there is no release yet that will enumerate all DNS hosts until one of them returns HTTP 200. Instead, the released curl versions (I tried 7.76 and 7.77) always retry against the same host. The nightly build (2021-07-14), however, does enumerate all DNS hosts. Here is how it behaves with two retries and three nonexistent hosts (note: the retries also happen if a host returns HTTP 5xx):
$ ./src/curl http://nohost.sureno --trace - --retry 2 --retry-all-errors
== Info: Trying 192.168.1.112:80...
== Info: connect to 192.168.1.112 port 80 failed: No route to host
== Info: Trying 192.168.1.113:80...
== Info: connect to 192.168.1.113 port 80 failed: No route to host
== Info: Trying 192.168.1.114:80...
== Info: connect to 192.168.1.114 port 80 failed: No route to host
== Info: Failed to connect to nohost.sureno port 80 after 9210 ms: No route to host
== Info: Closing connection 0
curl: (7) Failed to connect to nohost.sureno port 80 after 9210 ms: No route to host
Warning: Problem (retrying all errors). Will retry in 1 seconds. 2 retries
Warning: left.
== Info: Hostname nohost.sureno was found in DNS cache
== Info: Trying 192.168.1.112:80...
== Info: connect to 192.168.1.112 port 80 failed: No route to host
== Info: Trying 192.168.1.113:80...
== Info: connect to 192.168.1.113 port 80 failed: No route to host
== Info: Trying 192.168.1.114:80...
== Info: connect to 192.168.1.114 port 80 failed: No route to host
== Info: Failed to connect to nohost.sureno port 80 after 9206 ms: No route to host
== Info: Closing connection 1
curl: (7) Failed to connect to nohost.sureno port 80 after 9206 ms: No route to host
Warning: Problem (retrying all errors). Will retry in 2 seconds. 1 retries
This behavior can be very helpful for users of libcurl, but unfortunately these retry flags presently have no mapping to curl_easy_setopt. As a result, if you pass --libcurl to the command line tool, you will not see any retry-related code in the generated source.

Getting 502 Bad Gateway error with ngrok when I use an https localhost URL in a Node App

I'm developing a Node App. I need https for receiving callback URLs from 3rd party apps, so I added an SSL certificate.
ngrok works only with the http URL (http://localhost:3000).
I'm using the command ngrok http 3000. But when I access the ngrok https URL, I get a 502 Bad Gateway error in the browser.
How do I make ngrok work with the https://localhost:3000 URL?
If you are using it for signup or login with Google/Facebook, then I can suggest another way. You can use
https://tolocalhost.com/
and configure how it should redirect the callback to your localhost. This is only for development purposes.
ngrok can provide https support itself - this is one of its major use cases (at least for me) - so you don't need to create any SSL certificates.
Step-by-step guide
Here's a simple testing file:
$ cat t.html
<body>
<h1>test</h1>
</body>
Bringing up a simple http server on localhost:
python -m SimpleHTTPServer 7070
Running ngrok
$ ngrok http 7070
ngrok by @inconshreveable (Ctrl+C to quit)
Session Status online
Session Expires 7 hours, 59 minutes
Update update available (version 2.2.8, Ctrl-U to update)
Version 2.2.4
Region United States (us)
Web Interface http://127.0.0.1:4040
Forwarding http://4580e823.ngrok.io -> localhost:7070
Forwarding https://4580e823.ngrok.io -> localhost:7070
Connections ttl opn rt1 rt5 p50 p90
0 0 0.00 0.00 0.00 0.00
Checking
curl -D - https://4580e823.ngrok.io/t.html
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.10
Date: Tue, 23 Oct 2018 20:03:45 GMT
Content-type: text/html
Content-Length: 33
Last-Modified: Tue, 23 Oct 2018 19:53:09 GMT
Connection: keep-alive
<body>
<h1>test</h1>
</body>
That's it
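If the local app itself must be served over https (as in the question), newer ngrok versions also accept a full URL as the forwarding target, something along the lines of:
ngrok http https://localhost:3000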

Debian Linux Raspbian - Raspberry Pi time offset is 65s ahead of UTC

For some strange reason unknown to me, my RPi appears to have been set incorrectly to UTC +65s. The output I receive is the following:
sudo ntpd -gq
ntpd: time set -65.706156s
I have tried stopping and restarting ntp server (no effect).
When I check the sync servers using the following command, I do receive a ping back so it's not a case of the servers not responding, or a firewall issue:
grep -P "^server" /etc/ntp.conf
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst
ping -c 1 0.debian.pool.ntp.org
PING 0.debian.pool.ntp.org (193.1.219.116) 56(84) bytes of data.
64 bytes from tbag.heanet.ie (193.1.219.116): icmp_req=1 ttl=51 time=18.8 ms
--- 0.debian.pool.ntp.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 18.818/18.818/18.818/0.000 ms
I'm at a loss as to how to correct this.
UPDATE:
Running the ntpq -p command yields the following info:
remote refid st t when poll reach delay offset jitter
==============================================================================
*adsl-172-10-0-1 117.70.*.110 4 u 2 64 7 0.617 -0.070 0.109
Is this the NTP server that I'm trying to sync to? Because that IP belongs to CHINANET (I don't know how or why).
I also tried to set the RPi time manually, by stopping the ntp service, setting the time correctly and restarting the service.
What I noticed was that the time was correctly set for a good 5 seconds before reverting back to its 65s offset. So it appears that this is the issue.
I found the solution, as described in post 6 of this link:
http://forum.openmediavault.org/index.php/Thread/13035-Raspberry-Pi-NTP-service-not-using-etc-ntp-conf/
Basically, when the RPi is connected to the network, the DHCP server acts as the NTP server and creates a copy of the ntp.conf file at /var/lib/ntp/ntp.conf.dhcp
This file overrides the default /etc/ntp.conf file, so deleting it, stopping the ntp service, performing a resync and then starting the service again is the only way to resolve this.
The command for resync is:
sudo ntpdate -b pool.ntp.org
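Put together, the whole fix looks roughly like this (the service name may be ntp or ntpd depending on the Raspbian version):
sudo service ntp stop
sudo rm /var/lib/ntp/ntp.conf.dhcp
sudo ntpdate -b pool.ntp.org
sudo service ntp start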
The original issue was that the ntp service was syncing with a CHINANET server and causing a 65s offset, which I suspect is down to a misconfigured DHCP/NTP server on our network.

HAProxy decreasing throughput

I think I am doing something wrong with my HAProxy configuration, because my throughput drops to 25% in a real-world test with HAProxy in front of a single AWS instance. The following is my relevant (extremely simple) configuration:
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 20000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 30000

frontend localnodes
    bind *:80
    mode http
    default_backend nodes

backend nodes
    mode http
    balance roundrobin
    hash-type consistent
    option httpchk /health
    server w1 xx.xx.xx.xx:80 check id 1
I have logging enabled. A typical log entry looks like this:
Dec 2 09:29:05 localhost haproxy[2782]: xx.xx.xx.xx:43908 [02/Dec/2016:09:29:05.940] localnodes nodes/w1 38/0/0/1/41 200 130 - - ---- 36/36/12/2/0 0/0 "GET /ep?key=123&message=XXQSYI HTTP/1.1"
Dec 2 09:29:05 localhost haproxy[2782]: xx.xx.xx.xx:43920 [02/Dec/2016:09:29:05.941] localnodes nodes/web01 39/0/0/0/40 200 160 - - ---- 35/35/11/0/0 0/0 "GET /q1?key=123&val=123 HTTP/1.1"
Dec 2 09:29:05 localhost haproxy[2782]: xx.xx.xx.xx:43933 [02/Dec/2016:09:29:05.955] localnodes nodes/web01 24/0/0/1/26 200 134 - - ---- 34/34/11/1/0 0/0 "GET /q1?key=123&val=123 HTTP/1.1"
My throughput is 25% of what direct traffic to my instance achieves. This is terrible performance. Am I doing something really wrong?
EDIT
Going down the log, some entries clearly show that the time taken to reach the server from HAProxy is far too high:
Dec 2 10:56:59 localhost haproxy[25988]: xx.xx.xx.xx:39789 [02/Dec/2016:10:56:58.729] main app/app1 0/0/1000/1/1002 200 449 - - ---- 13/13/13/7/0 0/0 "GET / HTTP/1.1"
Dec 2 10:56:59 localhost haproxy[25988]: xx.xx.xx.xx:39803 [02/Dec/2016:10:56:58.730] main app/app1 0/0/999/1/1000 200 377 - - ---- 12/12/12/7/0 0/0 "GET / HTTP/1.1"
Dec 2 10:56:59 localhost haproxy[25988]: xx.xx.xx.xx:39804 [02/Dec/2016:10:56:58.730] main app/app1 0/0/999/1/1000 200 277 - - ---- 11/11/11/7/0 0/0 "GET / HTTP/1.1"
From your log, most of your time is being spent connecting to the server: for example, you spend 1000, 999 and 999 milliseconds on the connect phase. This may be because you are closing the connection to the server immediately after each transaction by using option http-server-close, so the TCP connection has to be re-established for every request (even when it is the same client across requests).
Overall, it looks like you're spending about 1 second per request, which doesn't sound horrible to me. What were you seeing before using HAProxy?
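If reusing server-side connections is acceptable for your application, one thing worth trying (a sketch, not tested against your setup) is switching from server-close to keep-alive in the defaults section:
defaults
    option http-keep-alive
    # (and remove: option http-server-close)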

AWS - EC2 - MongoDB replica set time sync issue - NTP - replication lag

We are encountering clock drift issues with our MongoDB replica set running on AWS. This seemed to start happening recently, after we added additional data to the set; before then we did not really notice the issue unless the system was under heavy load. The following error is logged sporadically in the mongod.log file even when the system is not under load.
To test this, we isolated a set of machines with the same dataset, not in use by our web application, and the error still occurs:
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target
because current sync target's most recent OpTime is Dec 12 13:32:42:c
which is more than 30 seconds behind member mongo1:27017 whose most
recent OpTime is 1418391230
From the above, the timestamp shows that one of the MongoDB replica set members is over a minute behind. The worst we have seen is 12 minutes out of sync.
This error in turn causes replication lag, and we receive a notification about it from the Mongo Monitoring Service, although it does correct itself.
The setup is 3 x r3.xlarge AWS Linux instances, one in each availability zone of the eu-west-1 region. The machines have been set up using the Mongo-recommended settings, with a RAID array and the CloudFormation scripts provided by Mongo. The data is around 4GB in size.
We think the issue is related to NTP sync; by default, on the Amazon Linux AMI, the ntpd service is configured to use a pool of Amazon NTP servers hosted on pool.ntp.org.
To try to rule this out, we set up our own NTP server on AWS that the MongoDB servers could sync to. The issue still occurred, so we changed the maxpoll and minpoll times for the ntpd service on the Mongo machines to sync the time every 16 seconds from the NTP server, but the error is still occurring.
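(For reference, "every 16 seconds" corresponds to minpoll 4 / maxpoll 4 on the server line, since ntpd poll intervals are powers of two in seconds, so the relevant ntp.conf line would look roughly like this:)
server time-server.domain.com iburst minpoll 4 maxpoll 4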
We increased the MongoDB OpLog size as well to see if that would make any difference but it didn’t.
Does anyone else encounter this type of issue? Is there something we are missing?
Cheers,
Colin.
ps -ef |grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5#1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
After upgrading to MongoDB 3 with the WiredTiger storage engine, we no longer see this issue.
