Ports used in async replication in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
When we set up async replication between two clusters using the cdc_stream and setup_universe_replication commands, is all the communication between the clusters done over ports 5433/9042?

Note that async replication between universes does not use the client API ports (https://docs.yugabyte.com/preview/reference/configuration/default-ports/#client-apis). It uses the inter-node RPC communication ports instead, namely 7100 (yb-master) and 9100 (yb-tserver): https://docs.yugabyte.com/preview/reference/configuration/default-ports/#internode-rpc-communication
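For example, a quick reachability check from a node in the target universe might look like this (the IP addresses are placeholders for nodes in the source universe):
nc -zvw5 10.0.0.11 7100   # yb-master inter-node RPC
nc -zvw5 10.0.0.12 9100   # yb-tserver inter-node RPC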

Related

postgresql.conf and hba.conf in YugabyteDB YSQL?

[Question posted by a user on YugabyteDB Community Slack]
When I set up YugaByte I don't remember editing ysql_hba.conf or postgresql.conf - so I didn't set it up to allow connections from everywhere to its port 5433. But still, I am able to access it. That's not the case with regular Postgres. So how does this work? That said I am not able to connect to YB running on the same machine using ysqlsh, so looks like postgresql.conf does matter.
The relevant yb-tserver flags are ysql_pg_conf_csv and ysql_hba_conf_csv, which generate ysql_pg.conf and ysql_hba.conf. See the docs: yugabyte.com/preview/secure
ysql_hba.conf is the one that determines authentication: for example, you can specify which IPs can authenticate with which methods.
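As a rough sketch (the CIDR range and auth methods here are only illustrative, and <other flags> stands for whatever else you pass to the server), the flag takes a comma-separated list of hba entries on the yb-tserver command line:
yb-tserver <other flags> \
  --ysql_hba_conf_csv='host all all 10.20.0.0/16 md5,host all all 0.0.0.0/0 reject'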

How do I limit access to Cassandra from specific hosts?

I'm trying to control access to the Cassandra database so it can be accessed from specific hosts only (deny access from hosts that aren't configured). I have the following configuration in the cassandra.yaml file:
start_rpc: true
rpc_address: 0.0.0.0
broadcast_rpc_address: x.x.x.x
rpc_port: 9160
Are these configurations correct, or is something missing? And is there another way to limit access to Cassandra to specific hosts?
Not sure which version of Cassandra you are using, but 9160 is for thrift protocol connections. It's been deprecated in Cassandra 3.0, and removed in Cassandra 4.0.
If it were me, I'd be closing that avenue of access by setting start_rpc: false.
All client connection requests should be using the CQL native binary protocol on port 9042 (9142 if client-to-node SSL is used in v4.0+).
"control the access to Cassandra database so it can be accessed from specific hosts only"
For this, your best option would be to filter with iptables on each node. Here's a resource which details how to do that. Basically, you'll need to ACCEPT connections to/from each IP address, on each node in the cluster:
Allow incoming connections from 192.168.0.1, only on port 9042:
iptables -A INPUT -p tcp -s 192.168.0.1 --dport 9042 -j ACCEPT
Allow outgoing connections back to 192.168.0.1:
iptables -A OUTPUT -d 192.168.0.1 -j ACCEPT
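Note that ACCEPT rules on their own don't block anything; to actually deny the remaining hosts, follow them with a catch-all drop for the port (order matters, since iptables evaluates rules top-down, so the ACCEPT rules must come first):
iptables -A INPUT -p tcp --dport 9042 -j DROP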
I want to echo Aaron's comments. Thrift was deprecated and replaced by CQL in 2012. Support for Thrift in the Cassandra tools was dropped in 2014 (CASSANDRA-8358), and the Thrift RPC server has been disabled by default since Cassandra 2.2 (CASSANDRA-9319).
Development on Thrift clients also ceased nearly 10 years ago. Nate McCall, the current Chair of the Cassandra project and author of the Hector client, closed it down in 2015 in favour of the Java driver, so I wouldn't use Thrift anymore.
Instead of the Thrift server, you should configure the CQL native transport server. These are the properties you should focus on in cassandra.yaml:
listen_address: private_ip
rpc_address: public_ip
native_transport_port: 9042
If your nodes only have a single IP address, you can use it for both listen_address and rpc_address. It isn't really necessary to use broadcast_address unless you have a complicated network topology where nodes can only talk to nodes in a remote DC using public IP addresses, for example with EC2 multi-region deployments.
Your question isn't really about Cassandra but about networking. You need to talk to your network admin to configure the firewalls to only allow connections to port 9042 from the application servers. Cheers!

Throughput issue in Cassandra client communication

I've got a single-node Cassandra cluster (3.11.2) on RHEL 6.5. I'm observing huge differences in throughput when my client is on the same node on which my database resides versus when my client is on some other computer. The difference is more than 4x! I don't think this is normal.
I had read that port 9042 is used for client communications in Cassandra. If the same port is being used in both scenarios, is the latency being observed in the 2nd scenario due to slow connectivity between the 2 nodes?
For the 2nd scenario, I used the following command on the client side:
time nc -zw30 172.16.129.140 9042   # 172.16.129.140 is the IP address of the database node
Connection to 172.16.129.140 9042 port [tcp/*] succeeded!
real 0m0.007s
user 0m0.005s
sys 0m0.001s
Are these values too high? What other Linux commands would be useful to get a quantitative measure of the latency of client communication in both scenarios?
I'm using the DataStax C++ driver for the client.
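One rough way to compare the network path in both scenarios is plain round-trip time (the IP below is the one from the question); driver-side metrics are still needed for the full picture:
ping -c 20 172.16.129.140   # from the remote client machine
ping -c 20 127.0.0.1        # on the database node itself, for comparison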

Cassandra client port enable

How do I enable the Cassandra port so a BI application can connect? My setup has multiple Cassandra nodes (192.xxx.xx.01, 192.xxx.xx.02, 192.xxx.xx.03). In this scenario, which node will act as the master / coordinator for my application?
I have configured listen_address, rpc_address, broadcast_rpc_address and seeds, and I opened both TCP ports 9042 and 9160.
version: 3.10
Kindly lead me in the right direction.
Cassandra uses a masterless architecture. All nodes are equal in Cassandra.
When you connect to one of the nodes, that node acts as the coordinator; any of the nodes can be the coordinator.
The coordinator is selected by the driver based on the policy you have set. Common policies are DCAwareRoundRobinPolicy and TokenAwarePolicy.
For DCAwareRoundRobinPolicy, the driver selects the coordinator node based on its round robin policy. See more here: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.html
For TokenAwarePolicy, it selects a coordinator node that has the data being queried - to reduce "hops" and latency. More info: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html
native_transport_port is 9042 by default and clients use native transport by default.
Hence your BI application should connect to the Cassandra hosts on port 9042.
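As a quick sanity check that the BI host can reach the native transport port (use one of your real node addresses in place of the masked one):
nc -zv 192.xxx.xx.01 9042
cqlsh 192.xxx.xx.01 9042   # if cqlsh is available on the BI host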

CouchDB: transparency when connecting to replicated databases

Let's say I have set up continuous bidirectional replication in CouchDB:
HostA <-> HostB.
As far as I understand, this is the simplest master-master replication, so both HostA and HostB accept reads and writes. Is it possible for the choice of database to be transparent to the client? I use Ektorp as a Java driver for CouchDB. When I initialize the connection I have to provide the IP address, for example http://HostA.mydomain.com:5984. What happens when this host is actually down? I would like to achieve transparency, like in RavenDB's client library: when HostA is down, it tries to connect to the other replicated hosts (http://ravendb.net/docs/article-page/3.0/csharp/client-api/bundles/how-client-integrates-with-replication-bundle).
In other words - is CouchDB aware of its replication targets so that the client doesn't have to manually choose the currently-alive host?
One way to achieve this is to use a load balancer like HAProxy. HAProxy is commonly used with web application servers, but it can be used with any server that speaks HTTP. You can find an example configuration in the CouchDB repo.
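A minimal haproxy.cfg sketch along those lines, assuming both nodes serve CouchDB on port 5984 and that the /_up health-check endpoint exists (CouchDB 2.x and later); the names and timeouts are illustrative:
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend couchdb_front
    bind *:5984
    default_backend couchdb_back

backend couchdb_back
    option httpchk GET /_up
    server hosta HostA.mydomain.com:5984 check
    server hostb HostB.mydomain.com:5984 check
The Ektorp client then points at the HAProxy address instead of HostA or HostB directly, and HAProxy stops routing to whichever node fails its health check.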
