Presto agents keep restarting: "Error fetching node state from" in server.log

We have a Hadoop cluster that includes Presto, Hive, HDFS, etc.
Our Presto production cluster has 254 Presto agent (worker) machines and one Presto coordinator.
All Presto services are installed on RHEL 7.6 machines.
We are seeing strange behavior where the Presto agents restart roughly every 60 seconds, and so far we haven't been able to pinpoint the root cause.
The server.log looks like the following:
grep "Error fetching node state from" /presto/data/var/log/server.log
2022-02-20T07:55:21.841Z WARN http-client-node-manager-39 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:56:12.998Z WARN http-client-node-manager-39 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:56:17.998Z WARN http-client-node-manager-44 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:56:23.002Z WARN http-client-node-manager-39 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:57:18.729Z WARN http-client-node-manager-42 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:57:23.726Z WARN http-client-node-manager-44 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:58:19.616Z WARN http-client-node-manager-40 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
2022-02-20T07:58:24.615Z WARN http-client-node-manager-39 io.prestosql.metadata.RemoteNodeState Error fetching node state from http://34.2.37.240:4444/v1/info/state: Server refused connection: http://34.2.37.240:4444/v1/info/state
What is the meaning of the error "Error fetching node state from"? Shortly after these warnings, the log also shows the JVM shutting down:
2022-02-20T08:04:36.152Z INFO Thread-43 io.airlift.bootstrap.LifeCycleManager JVM is shutting down, cleaning up
2022-02-20T08:04:36.153Z INFO Thread-40 io.airlift.bootstrap.LifeCycleManager JVM is shutting down, cleaning up
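The warning itself only means that the node writing this log polled http://34.2.37.240:4444/v1/info/state and the connection was refused, i.e. nothing was listening on port 4444 on that machine at that moment, which is consistent with a worker being down or mid-restart rather than being the cause of the restarts. A minimal way to check the endpoint by hand (IP and port are taken from the log above; adjust to your environment):

curl -v http://34.2.37.240:4444/v1/info/state
# a healthy node answers with its state (e.g. "ACTIVE"); "Connection refused"
# means no Presto process is currently listening on port 4444 on that node

Since the JVM shutdown messages appear a few minutes after the warnings, the more useful clue is usually whatever precedes the shutdown in server.log on an affected agent.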

Related

502 Bad Gateway issue while starting JFrog

I am trying to bring JFrog up. Locally, Tomcat is running and the Artifactory service also looks fine, but in the UI JFrog is not coming up.
I'm getting a 502 Bad Gateway error. Below is the console log:
[TRACE] [Service registry ping] operation attempt #94 failed. retrying in 1s. current error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
[TRACE] [Service registry ping] running retry attempt #95
[INFO ] Cluster join: Retry 95: Service registry ping failed, will retry. Error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
[TRACE] [Service registry ping] operation attempt #95 failed. retrying in 1s. current error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
2022-09-10T06:14:20.271Z [jffe ] [INFO ] [ ] [ ] [main ] - pinging artifactory, attempt number 90
2022-09-10T06:14:20.274Z [jffe ] [INFO ] [ ] [ ] [main ] - pinging artifactory attempt number 90 failed with code : ECONNREFUSED
[TRACE] [Service registry ping] running retry attempt #96
[DEBUG] Cluster join: Retry 96: Service registry ping failed, will retry. Error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
[TRACE] [Service registry ping] operation attempt #96 failed. retrying in 1s. current error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
2022-09-10T06:14:21.188Z [jfrou] [INFO ] [2b4bfed554e45cf6] [join_executor.go:169 ] [main ] [] - Cluster join: Retry 100: Service registry ping failed, will retry. Error: could not parse error from service registry, status code: 404, raw body: <!doctype html><html lang="en"><head><title>HTTP Status 404 – Not Found</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 404 – Not Found</h1></body></html>
[TRACE] [Service registry ping] running retry attempt #97
[DEBUG] Cluster join: Retry 97: Service registry ping failed, will retry. Error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
[TRACE] [Service registry ping] operation attempt #97 failed. retrying in 1s. current error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
2022-09-10T06:14:22.016Z [jfmd ] [INFO ] [ ] [accessclient.go:60 ] [main ] - Cluster join: Retry 100: Service registry ping failed, will retry. Error: Error while trying to connect to local router at address 'http://localhost:8046/access': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused [access_client]
[TRACE] [Service registry ping] running retry attempt #98
[DEBUG] Cluster join: Retry 98: Service registry ping failed, will retry. Error: error while trying to connect to local router at address 'http://localhost:8046/access/api/v1/system/ping': Get "http://localhost:8046/access/api/v1/system/ping": dial tcp 127.0.0.1:8046: connect: connection refused
And this is the error I am getting in the UI:
502 Bad Gateway error
Is it a newly installed Artifactory instance? If yes, first verify that the required ports are open at the firewall level. If the ports are already available, disable IPv6 on the VM where Artifactory is installed and restart Artifactory. This error can occur if the application tries to pick up the IPv6 address for initialisation instead of IPv4.
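A minimal sketch of those checks (8046 is the router port from the log above; 8081/8082 are JFrog's default application ports; the service name and the sysctl approach to disabling IPv6 are assumptions that may differ on your installation):

ss -tlnp | grep -E '8046|8081|8082'              # are the router / Artifactory ports listening locally?
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1      # temporarily disable IPv6
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sudo systemctl restart artifactory                   # restart so the services re-bind on IPv4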

Error starting vreplication engine: error in connecting to mysql db with connection <nil> (Vitess on Kubernetes)

Kubernetes version: v1.16.3
Linux version: 7.3.1611
I'm starting a Vitess cluster on Kubernetes using the default operator.yaml and 101_initial_cluster.yaml, and one of the example-vttablet-zone1-xxx pods keeps restarting forever.
Using kubectl logs -f example-vttablet-zone1-2548885007-46a852d0 -c vttablet to see the logs, I got:
W0706 07:42:02.200507 1 tm_init.go:531] Cannot get current mysql port, will keep retrying every 1s: net.Dial(/vt/socket/mysql.sock) to local server failed: dial unix /vt/socket/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
E0706 07:42:02.285406 1 engine.go:213] Error starting vreplication engine: error in connecting to mysql db with connection <nil>, err net.Dial(/vt/socket/mysql.sock) to local server failed: dial unix /vt/socket/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000), will keep retrying.
E0706 07:42:02.285504 1 state_manager.go:276] Error transitioning to the desired state: MASTER, Serving, will keep retrying: net.Dial(/vt/socket/mysql.sock) to local server failed: dial unix /vt/socket/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
I0706 07:42:02.285527 1 state_manager.go:661] State: exiting lameduck
E0706 07:42:02.285539 1 tm_state.go:258] Cannot start query service: net.Dial(/vt/socket/mysql.sock) to local server failed: dial unix /vt/socket/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
I0706 07:42:02.285553 1 tm_state.go:305] Publishing state: alias:<cell:"zone1" uid:2548885007 > hostname:"10.233.107.217" port_map:<key:"grpc" value:15999 > port_map:<key:"vt" value:15000 > keyspace:"commerce" shard:"-" key_range:<> type:MASTER db_name_override:"vt_commerce" mysql_hostname:"10.233.107.217" master_term_start_time:<seconds:1625527268 nanoseconds:196807555 >
I didn't change any YAML in the operator directory. Does anyone know why this happens?
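All of the errors above boil down to /vt/socket/mysql.sock not existing, i.e. vttablet cannot reach a running mysqld inside the pod, so the usual next step is to inspect the MySQL container in the same pod. A minimal sketch (the mysqld container name matches the Vitess operator examples, but confirm it with kubectl describe first):

kubectl describe pod example-vttablet-zone1-2548885007-46a852d0       # containers, events, OOMKilled, unbound PVCs
kubectl logs example-vttablet-zone1-2548885007-46a852d0 -c mysqld     # why mysqld never created /vt/socket/mysql.sock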

AWS Elastic Beanstalk / nginx: connect() failed (111: Connection refused)

I got this message:
connect() failed (111: Connection refused)
Here is my log:
-------------------------------------
/var/log/nginx/error.log
-------------------------------------
2018/10/21 06:16:33 [error] 4282#0: *2 connect() failed (111: Connection refused) while connecting to upstream, client: 172.31.4.119, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8081/", host: "hackingdeal-env.qnyexn72ga.ap-northeast-2.elasticbeanstalk.com"
2018/10/21 06:16:33 [error] 4282#0: *2 connect() failed (111: Connection refused) while connecting to upstream, client: 172.31.4.119, server: , request: "GET /favicon.ico HTTP/1.1", upstream: "http://127.0.0.1:8081/favicon.ico", host: "hackingdeal-env.qnyexn72ga.ap-northeast-2.elasticbeanstalk.com", referrer: "http://hackingdeal-env.qnyexn72ga.ap-northeast-2.elasticbeanstalk.com/"
I am using a Node.js/Express Elastic Beanstalk environment.
I have one nginx-related file at
.ebextensions/nginx/conf.d/proxy.conf
The file contains:
client_max_body_size 50M;
Whenever I try to load my webpage I get a 502 Bad Gateway.
What's wrong with my app?
Just recording my incident here in case it helps someone or my future self. I had a Django application with SECURE_SSL_REDIRECT set to True. Since I had no load balancer configured to handle HTTPS traffic, I was getting a timeout. Setting it to False fixed the issue. A couple of days wasted on that one.
A 111 Connection refused error likely means your app isn't actually running on that server/port combination. Also check that the security group for your app instance (or load balancer) has an inbound rule that allows traffic from the nginx instance.
I was dealing with this error on my Node.js application (Next.js). Posting this here in case it is useful for someone.
My error was that the deploy command failed at the build step (next build), which means the Node server never restarted. For that reason nginx could not find the server. You can find this kind of error in web.stdout.log.
I tested my build command locally, fixed the errors, and it worked!
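A minimal sketch of the checks described in the answers above, run on the Beanstalk instance itself (port 8081 is the upstream from the nginx error log; the web.stdout.log path is the Amazon Linux 2 location and may differ on older platforms):

ss -tlnp | grep 8081                  # is anything listening on the port nginx proxies to?
curl -s http://127.0.0.1:8081/        # does the Node app answer locally?
tail -n 100 /var/log/web.stdout.log   # build/startup errors if it is not running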

How to fix etcd cluster misconfigured error

I have two servers: pg1 (10.80.80.195) and pg2 (10.80.80.196).
Version of etcd :
etcd Version: 3.2.0
Git SHA: 66722b1
Go Version: go1.8.3
Go OS/Arch: linux/amd64
I'm trying to run etcd like this:
pg1 server:
etcd --name infra0 --initial-advertise-peer-urls http://10.80.80.195:2380 --listen-peer-urls http://10.80.80.195:2380 --listen-client-urls http://10.80.80.195:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.80.80.195:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.80.80.195:2380,infra1=http://10.80.80.196:2380 --initial-cluster-state new
pg2 server:
etcd --name infra1 --initial-advertise-peer-urls http://10.80.80.196:2380 --listen-peer-urls http://10.80.80.196:2380 --listen-client-urls http://10.80.80.196:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.80.80.196:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.80.80.195:2380,infra1=http://10.80.80.196:2380 --initial-cluster-state new
When trying to check the cluster health on pg1:
etcdctl cluster-health
I get this error:
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
; error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
What am I doing wrong, and how do I fix it?
Both servers run on virtual machines with a Bridged Adapter.
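One thing worth ruling out first: the 127.0.0.1:4001 in the error comes from etcdctl's legacy default endpoints, not from your flags, so point etcdctl at the advertised client URLs explicitly. A minimal sketch (IPs taken from the commands above):

etcdctl --endpoints http://10.80.80.195:2379,http://10.80.80.196:2379 cluster-health
# or, with the v3 API:
ETCDCTL_API=3 etcdctl --endpoints=http://10.80.80.195:2379,http://10.80.80.196:2379 endpoint health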
I got a similar error when I set up etcd clusters using systemd, following the official Kubernetes tutorial.
It was three CentOS 7 medium instances on AWS. I'm pretty sure the security groups were correct. I just ran:
$ systemctl restart network
and then
$ etcdctl cluster-health
gave a healthy result.

rpc_address and broadcast_rpc_address in cassandra.yaml for DataStax OpsCenter

I have a single-node Cassandra cluster running on an AWS machine which also has OpsCenter installed. I'm trying to manage it with the OpsCenter GUI from a Windows machine (which is on the same private network as the Cassandra node), however I keep getting the following error:
"No HTTP communication to the agent"
The OpsCenter logs show the following information:
2017-02-19 18:08:17,622 [Test_Cluster] INFO: Node 172.18.51.175 changed its mode to normal (MainThread)
2017-02-19 18:08:17,773 [Test_Cluster] INFO: Using 1.2.3.4 as the RPC address for node 172.18.51.175 (MainThread)
2017-02-19 18:09:12,046 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: User timeout caused connection failure. (MainThread)
2017-02-19 18:10:12,045 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: User timeout caused connection failure. (MainThread)
2017-02-19 18:11:12,046 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
2017-02-19 18:12:12,045 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
2017-02-19 18:13:12,433 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
2017-02-19 18:14:12,045 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
2017-02-19 18:15:12,045 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: User timeout caused connection failure. (MainThread)
2017-02-19 18:16:12,044 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
2017-02-19 18:17:12,044 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
2017-02-19 18:18:12,045 [Test_Cluster] WARN: These nodes reported this message, Nodes: ['172.18.51.175'] Message: HTTP request http://1.2.3.4:61621/connection-status? failed: IPv4Address(TCP, '1.2.3.4', 61621) (MainThread)
So I guess my cassandra.yaml file needs some change?
Currently I have set listen_address to the private IP of my node,
my rpc_address is 0.0.0.0,
and my broadcast_rpc_address is set to 1.2.3.4,
which is how the DataStax docs recommend.
I tried setting rpc_address and broadcast_rpc_address to the node's private IP, and it failed in that scenario as well.
netstat --listen shows the following lines for ports 61620 and 61621:
tcp6 0 0 [::]:61620 [::]:* LISTEN
tcp6 0 0 [::]:61621 [::]:* LISTEN
I'm not sure what I'm doing wrong or how to set these parameters in cassandra.yaml for it to work with OpsCenter.
Note: I seem to be having issues only with OpsCenter with the above config. Cassandra services start up fine and my web application connects to the cluster using the DataStax driver. Does anyone have comments on what might be going wrong?
Thanks
my rpc_address is 0.0.0.0
and my broadcast_rpc_address is set as 1.2.3.4
That is your mistake: change rpc_address to the local IP, 172.18.51.175 (if that is the node's IP).
Also check in the cassandra.yaml file that listen_address is set to 172.18.51.175.
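A minimal sketch of the cassandra.yaml settings this answer describes (172.18.51.175 is the node IP taken from the OpsCenter log above; substitute your own), followed by a restart of Cassandra and the DataStax agent so OpsCenter picks up the reachable RPC address:

listen_address: 172.18.51.175
rpc_address: 172.18.51.175
# broadcast_rpc_address is only required when rpc_address is 0.0.0.0;
# if you keep it, it must be an address the OpsCenter agent can actually reach
# broadcast_rpc_address: 172.18.51.175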
