How to try all servers in DNS using libcurl?

I need to regularly and randomly test, with Linux/C++/libcurl, the responses of several servers that are reachable through a single DNS name, such as
$ host example.com
n1.example.com 1.2.3.4
n2.example.com 1.2.3.5
n3.example.com 1.2.3.6
The list changes. When I try https://example.com, libcurl always uses the same IP for the span of the TTL, and I cannot switch to the next host. There is the CURLOPT_DNS_CACHE_TIMEOUT option, but setting it to zero does not help; even if I fully recreate the curl easy object I still get the same IP. Therefore these do not help: curl - How to set up TTL for dns cache & How to clear the curl cache
I can of course resolve the DNS name myself and iterate over the addresses, but are there any built-in options? Polling randomly is okay. I see curl can use c-ares. Is there a way to clear its cache, and would that help?

I could not do exactly what I need with libcurl without resolving the addresses myself, but here are some findings worth sharing:
First of all, as a well-behaved TCP client, curl tries the hosts from the DNS list from top to bottom until a connection succeeds. From then on it keeps using that host, even if it returns some higher-level error (such as an SSL error or HTTP 500). This is fine for most use cases.
The curl command line in newer versions has --retry and --retry-all-errors, but unfortunately there are no equivalents in libcurl. The feature is being enhanced right now, and as of 2021-07-14 there is no release yet that will enumerate all DNS hosts until one returns HTTP 200. The released versions (I tried 7.76 and 7.77) always retry with the same host. The nightly build (2021-07-14), however, does enumerate all DNS hosts. Here is how it behaves with two retries and three nonexistent hosts (note that the retries also happen if any host returns HTTP 5xx):
$ ./src/curl http://nohost.sureno --trace - --retry 2 --retry-all-errors
== Info: Trying 192.168.1.112:80...
== Info: connect to 192.168.1.112 port 80 failed: No route to host
== Info: Trying 192.168.1.113:80...
== Info: connect to 192.168.1.113 port 80 failed: No route to host
== Info: Trying 192.168.1.114:80...
== Info: connect to 192.168.1.114 port 80 failed: No route to host
== Info: Failed to connect to nohost.sureno port 80 after 9210 ms: No route to host
== Info: Closing connection 0
curl: (7) Failed to connect to nohost.sureno port 80 after 9210 ms: No route to host
Warning: Problem (retrying all errors). Will retry in 1 seconds. 2 retries
Warning: left.
== Info: Hostname nohost.sureno was found in DNS cache
== Info: Trying 192.168.1.112:80...
== Info: connect to 192.168.1.112 port 80 failed: No route to host
== Info: Trying 192.168.1.113:80...
== Info: connect to 192.168.1.113 port 80 failed: No route to host
== Info: Trying 192.168.1.114:80...
== Info: connect to 192.168.1.114 port 80 failed: No route to host
== Info: Failed to connect to nohost.sureno port 80 after 9206 ms: No route to host
== Info: Closing connection 1
curl: (7) Failed to connect to nohost.sureno port 80 after 9206 ms: No route to host
Warning: Problem (retrying all errors). Will retry in 2 seconds. 1 retries
This behavior could be very helpful for libcurl users, but unfortunately these retry flags currently have no curl_easy_setopt mapping. As a result, if you pass --libcurl to the command line tool, you will not see any retry-related code in the generated source.
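If you do end up resolving the name yourself, libcurl can still do the rest of the work: CURLOPT_RESOLVE lets you pin a single transfer to one specific address, so you can enumerate all A records and probe them one by one. Below is a minimal sketch of that idea, using the pycurl binding for brevity (the option maps one-to-one to CURLOPT_RESOLVE in the C API); example.com and port 443 are taken from the question, and the RESOLVE constant is assumed to be exposed by your pycurl build:
import socket
import pycurl
from io import BytesIO

host, port = "example.com", 443
# Resolve every address ourselves instead of letting libcurl pick one.
addresses = sorted({ai[4][0] for ai in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)})
for ip in addresses:
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "https://%s/" % host)
    # Pin this transfer to one address; the entry format is "HOST:PORT:ADDRESS".
    c.setopt(pycurl.RESOLVE, ["%s:%d:%s" % (host, port, ip)])
    c.setopt(pycurl.WRITEDATA, buf)
    try:
        c.perform()
        print(ip, "->", c.getinfo(pycurl.RESPONSE_CODE))
    except pycurl.error as err:
        print(ip, "->", err)
    finally:
        c.close()
This keeps connection handling, TLS and HTTP inside libcurl while you stay in control of which backend is probed on each run.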

Related

Remote access to OpenShift Local (CRC) running on Win11

I've got CRC running on Windows 11 and I would like to connect there from a RHEL9 VM.
CRC is listening on 127.0.0.1:6443.
A port forwarding rule was created on the Windows machine to forward connections arriving on 192.168.1.156 (local interface) to 127.0.0.1:
$ netsh interface portproxy show v4tov4
Listen on ipv4:             Connect to ipv4:
Address         Port        Address         Port
192.168.1.156   9000        127.0.0.1       6443
A firewall rule was added to allow connections to port 9000.
From the VM:
[test@workstation ~]$ telnet 192.168.1.156 9000
Trying 192.168.1.156...
Connected to 192.168.1.156.
Escape character is '^]'.
Connection closed by foreign host.
[test@workstation ~]$ oc login -u developer -p developer https://192.168.1.156:9000
The server is using a certificate that does not match its hostname: x509: certificate is valid for 10.217.4.1, not 192.168.1.156
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y
Error from server (InternalError): Internal error occurred: unexpected response: 412
Any idea how I can fix this and connect from my VM to CRC?
Thanks

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code; however, I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing, I should say that I can ping all three nodes from each of the nodes. I have also tried opening TCP port 2380, still without success; the error stays the same.
(2) Before that error I had the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In the /etc/hosts file these names are resolved as:
server2 10.0.0.135
server1 10.0.0.134
server3 10.0.0.136
(3) The initial setup on each node, however, looks like this:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up: each node produces the initial setup log (3), then adds the members (2), and once these steps are done it fails with (1). As far as I know, etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without the source code it is really hard to debug, but are there any ideas on the error and what could cause it?
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
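For what it's worth, the mismatch between (3) and (2) looks suspicious: each node listens for peers only on https://127.0.0.1:2380 while advertising https://serverN:2380, so a dial from another node to 10.0.0.x:2380 reaches a port nothing is listening on externally. In the linked tutorial the peer listener is bound to a reachable address. A hypothetical example of the relevant flags for server1 (the flag names come from the etcd documentation; the addresses are assumed from the /etc/hosts listing above, and how this project actually launches etcd is unknown):
etcd --name server1 \
  --listen-peer-urls https://10.0.0.134:2380 \
  --initial-advertise-peer-urls https://server1:2380 \
  --listen-client-urls https://10.0.0.134:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://server1:2379 \
  --initial-cluster "server1=https://server1:2380,server2=https://server2:2380,server3=https://server3:2380" \
  --initial-cluster-state new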

python3 requests hangs when accessing port 25564 or higher on Ubuntu 20.04 LTS

I have a program that creates a simple web server on localhost with a random port between 10000 and 65535 (the highest unsigned 16-bit integer). You can also specify a port, but if you don't know which port it is running on, it's hard to find out.
I have written a little helper program that should show every port that is being listened on.
The helper:
import requests

for port in range(10000, 65535):
    try:
        print(port, requests.get("http://localhost:{}".format(port)))
    except Exception as e:
        print("{}: {}".format(type(e).__name__, port), end="\r")
I expect it to show ConnectionError: 10000 and keep counting up to 65535, showing any connections it finds. But it always hangs on port 25565, last showing the message for port 25564. And if I make a completely unrelated request to http://localhost:25564 or any higher port, it hangs.
The script hangs on port 25565 when I start a server on 25564.
Normally, if no server is listening on a port, the connection is refused immediately and I get a ConnectionError. Above port 25564 that does not happen; it just waits until I stop it.
This behaviour seems completely random, as port 25564 is unassigned according to speedguide.net.
Port 25565 is the standard MySQL and Minecraft Dedicated Server port (according to speedguide.net), neither of which I have running on my machine. Therefore the hang still seems random.
I'm using python3 on Ubuntu 20.04 LTS.
Interestingly, it didn't fail like this on my laptop with Linux Mint 21...
As @root requested in the comments, here is the output of nmap localhost:
Starting Nmap 7.80 ( https://nmap.org ) at 2022-09-25 11:42 CEST
Host is up (0.00014s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
80/tcp open http
631/tcp open ipp
8080/tcp open http-proxy
9050/tcp open tor-socks
Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds
Just a little note: port 80/tcp is listened on by apache2 with the "You are an idiot" flash animation.
As per the comments, you can try something like this:
Note that I have added the timeout parameter to the request. Its unit is seconds. The default timeout is None, which means it will wait (hang) until the connection is closed.
import requests

for port in range(10_000, 65_535):
    try:
        r = requests.get(f'http://localhost:{port}', timeout=5)
        print(port)
    except Exception as e:
        print(f'{type(e).__name__}, {port}', end='\r')
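If the goal is only to discover which local ports are being listened on, a bare TCP connect check avoids one HTTP round trip per port and applies the same timeout idea at the socket level. A rough sketch (not part of the original answer) that also makes the difference between 'refused' and 'silently dropped' visible:
import errno
import socket

for port in range(10_000, 65_536):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)  # avoid hanging on ports where packets are dropped
        result = s.connect_ex(("127.0.0.1", port))
        if result == 0:
            print("listening:", port)
        elif result != errno.ECONNREFUSED:
            print("no response (dropped/filtered?):", port)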

Connection refused with a basic HTTP server on AWS EC2

I know there are lots of resources on this topic, but I think I've done everything correctly and I still can't connect to my server.
I've started a simple node.js server on port 80.
sudo netstat -tnlp | grep 80
tcp 0 0 127.0.0.1:80 0.0.0.0:* LISTEN 3657/node
curl localhost:80
Welcome Node.js
I've configured the security group for this instance, as well as the VPC, to allow traffic.
I've made sure there is no local firewall and that the VPC ACL is not blocking traffic (not that I expected it to, since this is a completely new instance).
service iptables status
Redirecting to /bin/systemctl status iptables.service
Unit iptables.service could not be found.
The output when I try to connect from my local machine:
curl 3.xxx.xxx.xxx
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
curl: (7) Failed to connect to 3.xxx.xxx.xxx port 80: Connection refused
Are there any other ideas on what to check next?
The answer to my problem was https://stackoverflow.com/a/14045163/2369000. The boilerplate code that I copied listened only for requests originating from localhost. This could have been spotted in the netstat output, which shows 127.0.0.1:80 as the listening address. The fix was to use .listen(80, "0.0.0.0"), or just .listen(80), since the default behavior is to listen for requests from any IP address.
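The same distinction shows up in any server framework, not just Node.js. As an illustration only (a hypothetical Python stand-in, not the server from the question): bound to 127.0.0.1 it reproduces the external "Connection refused", while bound to 0.0.0.0 it accepts connections on every interface, firewall and security group permitting.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Welcome Node.js\n")

# ("127.0.0.1", 80) -> only local clients can connect (what netstat showed above)
# ("0.0.0.0", 80)   -> reachable from other machines as well
HTTPServer(("0.0.0.0", 80), Hello).serve_forever()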

Spark in Kubernetes Connection Refused

I am trying to deploy a Spark job in a Kubernetes cluster (running on AWS EKS). I deploy a pod that executes spark-submit in client mode. The pod becomes the driver pod and then begins to launch executor pods. The executor pods try to connect to the driver but fail, causing the executors to crash. Here is the error message from the executor log:
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: data-loom-stats/10.135.131.239:9902
Caused by: java.net.ConnectException: Connection refused
The driver pod is exposed through a headless Kubernetes service (per the Spark recommendations: https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode-networking). The service exposes the driver under the DNS name data-loom-stats. Based on the error message, DNS resolution appears to be working, since the name is correctly translated to the pod IP address 10.135.131.239. To see what is happening on the driver end, I opened a shell in the running driver container and listed the listening ports with netstat:
[root@data-loom-stats-7496b69994-9t8zs work-dir]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:4040            0.0.0.0:*               LISTEN      673/java
tcp        0      0 127.0.0.1:40077         0.0.0.0:*               LISTEN      673/java
tcp        0      0 127.0.0.1:9902          0.0.0.0:*               LISTEN      673/java
tcp        0      0 0.0.0.0:41267           0.0.0.0:*               LISTEN      673/java
As you can see, port 9902 is bound to the loopback address. Port 4040 is the Spark UI and is bound to 0.0.0.0. Since the executor pods are not stable, I did some testing from another pod that is. I was able to curl port 4040:
/merida/src # curl -v http://10.135.131.239:4040
* Trying 10.135.131.239:4040...
* TCP_NODELAY set
* Connected to 10.135.131.239 (10.135.131.239) port 4040 (#0)
> GET / HTTP/1.1
> Host: 10.135.131.239:4040
> User-Agent: curl/7.67.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< Date: Fri, 29 May 2020 22:50:46 GMT
< Location: http://10.135.131.239:4040/jobs/
< Content-Length: 0
< Server: Jetty(9.3.z-SNAPSHOT)
<
* Connection #0 to host 10.135.131.239 left intact
But trying to connect to port 9902 gives the connection refused error, just like in the executor log.
/merida/src # curl -v http://10.135.131.239:9902
* Trying 10.135.131.239:9902...
* TCP_NODELAY set
* connect to 10.135.131.239 port 9902 failed: Connection refused
* Failed to connect to 10.135.131.239 port 9902: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 10.135.131.239 port 9902: Connection refused
So it appears that my address/port binding needs to be fixed. Is this conclusion correct? If so, is this something I can fix in the Kubernetes manifest, or is it caused by something in the Spark configuration?
I can supply more details to help identify the root cause.
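If the loopback binding is indeed the problem, Spark keeps two separate settings for it: spark.driver.bindAddress (the address the driver process binds to) and spark.driver.host (the address advertised to executors). A hypothetical sketch, assuming the driver is started from PySpark and should advertise the data-loom-stats headless service mentioned above; with spark-submit the same properties can be passed as --conf key=value:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Bind the driver's RPC endpoint to all interfaces (or the pod IP)...
    .config("spark.driver.bindAddress", "0.0.0.0")
    # ...but advertise the service name the executors should dial.
    .config("spark.driver.host", "data-loom-stats")
    .config("spark.driver.port", "9902")
    .config("spark.blockManager.port", "9903")
    .getOrCreate()
)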
