I scaled in a TiDB cluster a few weeks ago to remove a misbehaving TiKV peer.
The peer refused to tombstone even after a full week, so I shut the server down, left it for a few days to make sure nothing broke, and then ran a forced scale-in to remove it from the cluster.
Even though tiup cluster display {clustername} no longer shows that server, some of the other TiKV servers keep trying to contact it.
Example log entries:
[2022/10/13 14:14:58.834 +00:00] [ERROR] [raft_client.rs:840] ["connection abort"] [addr=1.2.3.4:20160] [store_id=16025]
[2022/10/13 14:15:01.843 +00:00] [ERROR] [raft_client.rs:567] ["connection aborted"] [addr=1.2.3.4:20160] [receiver_err="Some(RpcFailure(RpcStatus { code: 14-UNAVAILABLE, message: \"failed to connect to all addresses\", details: [] }))"] [sink_error=Some(RemoteStopped)] [store_id=16025]
(IP replaced with 1.2.3.4, but the rest is verbatim)
The server in question was removed from the cluster about a month ago, yet the TiKV nodes still think it's there.
How do I correct this?
The store_id might be a clue: I believe there is a Raft store for which the removed server was the leader, but how do I force it to elect a new leader? The documentation is not clear on this, but I believe the solution has something to do with the PD servers.
Could you first check the store ID in pd-ctl (for example: store 16025) to confirm it is in the Tombstone state?
For pd-ctl usage, please refer to https://docs.pingcap.com/tidb/dev/pd-control.
You can use pd-ctl to delete a store (store delete <store_id>), and once it is Tombstone, use store remove-tombstone to remove it completely.
For all Regions in TiKV, if the leader is disconnected the followers will re-elect a leader, so the dead TiKV node won't remain the leader of any Region anyway.
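If you prefer to script the check, the same operations are available through PD's HTTP API, which pd-ctl wraps. A minimal sketch in Java, assuming PD is reachable at pd-host:2379 and that the endpoints below match your PD version (verify them first); the store ID is the one from the log lines above:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PdStoreCheck {
    public static void main(String[] args) throws Exception {
        String pd = "http://pd-host:2379"; // assumed PD address
        HttpClient client = HttpClient.newHttpClient();

        // Roughly `pd-ctl store 16025`: inspect the store and check its state_name.
        HttpRequest get = HttpRequest.newBuilder(URI.create(pd + "/pd/api/v1/store/16025"))
                .GET().build();
        System.out.println(client.send(get, HttpResponse.BodyHandlers.ofString()).body());

        // Roughly `pd-ctl store remove-tombstone`: once the store reports Tombstone,
        // remove all tombstone stores from PD's metadata.
        HttpRequest rm = HttpRequest.newBuilder(URI.create(pd + "/pd/api/v1/stores/remove-tombstone"))
                .DELETE().build();
        System.out.println(client.send(rm, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}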
Related
A development server I was using ran low on disk space, causing the system to crash. When I checked the replica set cluster, it reported one node as unreachable. I removed the bad node and forced the config. When I came back the next day the status was still not good, showing one node as unreachable. I worked on something else, and when I checked rs.status later that day it came back primary and secondary. I then added back the 3rd node that had run out of space. Now I can connect to each node and the data looks OK, but I cannot connect to the replica set group from PHP/Node.js or Studio 3T. When I use the group connection it returns an auth error, even though the same credentials work for each individual node.
Any ideas what could be going on and how to fix it?
What I needed to do was take down the three services making up the replica set in Docker swarm and redeploy them using my scripts with auth turned on. When I checked the replica status it returned host unreachable, but when I checked again a few hours later it had come back online. I was unable to get the replica set back online with rs.add / rs.remove, but I did get it back up and running by recreating the services.
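For reference, here is a minimal sketch of checking replica-set health from a driver connection (Java here; the connection-string options are the same for the PHP and Node.js drivers). Host names, credentials, database and replica-set name are placeholders, and authSource=admin is only needed if the users were created in the admin database:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class ReplicaSetCheck {
    public static void main(String[] args) {
        // Replica-set connection string (placeholders): connects to the set, not a single node.
        String uri = "mongodb://appUser:secret@node1:27017,node2:27017,node3:27017/"
                + "mydb?replicaSet=rs0&authSource=admin";
        try (MongoClient client = MongoClients.create(uri)) {
            // replSetGetStatus shows each member's state (PRIMARY, SECONDARY, unreachable, ...).
            Document status = client.getDatabase("admin")
                    .runCommand(new Document("replSetGetStatus", 1));
            System.out.println(status.toJson());
        }
    }
}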
Any reason why com.datastax.driver.core.Metadata:getHosts() would return state UP for a host that has shut down?
However, nodetool status returns DN for that host.
No matter how many times I check Host.getState(), it still says UP for that dead host.
This is how I'm querying Metadata:
import com.datastax.driver.core.policies.ConstantReconnectionPolicy;
import com.datastax.driver.dse.DseCluster;

DseCluster cluster = DseCluster.builder()
        .addContactPoints("192.168.1.1", "192.168.1.2", "192.168.1.3")
        .withPort(9042)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(2000)) // retry every 2 s
        .build();
cluster.getMetadata().getAllHosts(); // the driver's current view of every host
EDIT: Updated the code to reflect that I'm trying to connect to 3 hosts. I should've stated that the cluster I'm connecting to has 3 nodes, 2 in DC1 and 1 in DC2.
Also, whenever I relaunch my Java process running this code, the behavior changes. Sometimes it gives me the right states, then when I restart it again, it gives me the wrong states, and so on.
I will post an answer which I got from the datastaxacademy slack:
Host.getState() is the driver's view of what it thinks the host state is, whereas nodetool status is what that C* node thinks the state of all nodes in the cluster is from its own view (propagated via gossip). There is not a way to get the latter via the driver.
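If it helps, you can at least watch the driver's view change over time instead of polling Host.getState(), by registering a state listener on the Cluster. A minimal sketch against the 3.x/DSE driver used above (the class name and output format are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

public class DriverStateWatcher implements Host.StateListener {
    // Each callback fires when the driver's own view of a host changes.
    @Override public void onAdd(Host host) { System.out.println("ADD " + host); }
    @Override public void onUp(Host host) { System.out.println("UP " + host); }
    @Override public void onDown(Host host) { System.out.println("DOWN " + host); }
    @Override public void onRemove(Host host) { System.out.println("REMOVE " + host); }
    @Override public void onRegister(Cluster cluster) { }
    @Override public void onUnregister(Cluster cluster) { }
}

Register it once after building the cluster, e.g. cluster.register(new DriverStateWatcher()); the events still reflect only what the driver believes, not what gossip reports.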
We use vnodes on our cluster.
I noticed that when the token space of a node changes (automatically on vnodes, during a repair or a cleanup after adding new nodes), the datastax nodejs driver gets a lot of "Operation timed out - received only X responses" for a few minutes.
I tried using ONE and LOCAL_QUORUM consistencies.
I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.
What do you suggest we do to avoid this? A custom retry policy? Caching? Changing the consistency?
Example of the behavior
When we see this:
4/7/2016, 10:43am Info Host 172.31.34.155 moved from '8185241953623605265' to '-1108852503760494577'
We see a spike of those:
{
"message":"Operation timed out - received only 0 responses.",
"info":"Represents an error message from the server",
"code":4608,
"consistencies":1,
"received":0,
"blockFor":1,
"isDataPresent":0,
"coordinator":"172.31.34.155:9042",
"query":"SELECT foo FROM foo_bar LIMIT 10"
}
"I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue."
In fact, when adding a new node there will be token range movement, but Cassandra can still serve read requests using the old token ranges until the scale-out has finished completely. So the behavior you're facing is very suspicious.
If you can reproduce this error, please activate query tracing to narrow down the issue.
The error can also be related to a node being under heavy load and not replying fast enough.
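As a sketch of what activating tracing can look like, here is one way to trace a single occurrence of the timing-out query with the DataStax Java driver (the thread uses the Node.js driver, so treat this purely as an illustration; an existing Session is assumed and the query text is taken from the error above):

import com.datastax.driver.core.*;

public class TraceProbe {
    static void traceOnce(Session session) {
        // Enable tracing for this one statement only; tracing every request is expensive.
        Statement stmt = new SimpleStatement("SELECT foo FROM foo_bar LIMIT 10").enableTracing();
        ResultSet rs = session.execute(stmt);
        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        System.out.println("coordinator=" + trace.getCoordinator()
                + " duration(us)=" + trace.getDurationMicros());
        // Each event shows which node did what, and how long after the query started.
        for (QueryTrace.Event e : trace.getEvents()) {
            System.out.printf("%8d us  %-15s  %s%n",
                    e.getSourceElapsedMicros(), e.getSource(), e.getDescription());
        }
    }
}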
I've got an application that uses CouchDB as its database. The database (let's call it x.cloudant.com) is hosted on cloudant.com.
During development, I changed the account from x.cloudant.com to y.cloudant.com. After that (it may or may not have something to do with switching to the new account) I ran into problems:
The traffic on cloudant.com spiked really high (on one day I had over 700k requests). Mostly they were light HTTP requests (GETs, HEADs), and the user agent was CouchDB 1.5 or CouchDB 1.6.
Looking in the CouchDB logs, I see a lot of error messages: HTTP 500s, replication being cancelled since the database was shut down, and failures to connect to the database.
My application still sometimes connects to the old account (x.cloudant.com) even though I switched accounts over a week ago, meaning that alongside replicating the data on the new account (y.cloudant.com) it also tries to replicate the data on my old account (x.cloudant.com).
My replication settings are the defaults. I want to reduce the amount of traffic on cloudant.com. Has anyone experienced the same issues? How did you solve it?
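One thing worth checking, assuming the replications were set up through the _replicator database (rather than one-shot POSTs to _replicate): list the active tasks and replication documents on the new account and look for any whose source or target still points at x.cloudant.com. A minimal sketch (credentials are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ReplicationAudit {
    public static void main(String[] args) throws Exception {
        String auth = Base64.getEncoder()
                .encodeToString("apiuser:apipassword".getBytes(StandardCharsets.UTF_8));
        HttpClient client = HttpClient.newHttpClient();
        for (String path : new String[] { "/_active_tasks", "/_replicator/_all_docs?include_docs=true" }) {
            HttpRequest req = HttpRequest.newBuilder(URI.create("https://y.cloudant.com" + path))
                    .header("Authorization", "Basic " + auth)
                    .GET().build();
            System.out.println(path + " ->");
            System.out.println(client.send(req, HttpResponse.BodyHandlers.ofString()).body());
        }
        // Any task or _replicator document whose source/target still references
        // x.cloudant.com is a leftover replication; deleting that document stops it.
    }
}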
I added nodes to a cluster which initially used the wrong network interface as listen_address. I fixed it by changing the listen_address to the correct IP. The cluster is running well with that configuration, but clients trying to connect to the cluster still receive the wrong IPs in the metadata they get from it. Is there any way to refresh the metadata of a cluster without decommissioning the nodes and setting up new ones again?
First of all, you may try to follow this advice: http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
You will need to restart the entire cluster on a rolling basis, one node at a time.
If this does not work, try this on each node:
USE system;
-- look for rows whose peer or rpc_address still shows the old (wrong) interface
SELECT * FROM peers;
Then delete the bad rows from peers (e.g. DELETE FROM peers WHERE peer = '<stale-ip>';), restart that node, then go to the next node and do it again.
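Before deleting anything, it can help to compare what the driver reports against what the contact point's system.peers table holds, so you know exactly which rows are stale. A minimal sketch with the Java driver (the contact point is a placeholder):

import com.datastax.driver.core.*;

public class PeersAudit {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {
            System.out.println("Driver metadata view:");
            for (Host h : cluster.getMetadata().getAllHosts()) {
                System.out.println("  " + h.getAddress());
            }
            System.out.println("system.peers on the contact point:");
            for (Row row : session.execute("SELECT peer, rpc_address FROM system.peers")) {
                System.out.println("  peer=" + row.getInet("peer") + " rpc_address=" + row.getInet("rpc_address"));
            }
        }
    }
}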