How to prevent an unauthorized node from joining a Cassandra cluster? - cassandra

How is security handled between Cassandra seeds and nodes? I.e., how do I prevent someone from replicating my datacenter's data by posing as one of my nodes? I could create a firewall with a whitelist, but is there another mechanism as well?

Enabling node-to-node SSL solves this issue. Essentially, a rogue node posing as a legitimate one won't have a certificate that matches your Java truststore, and will thus be denied when it tries to join the cluster.
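For reference, node-to-node encryption is turned on via server_encryption_options in cassandra.yaml. A minimal sketch, with placeholder paths and passwords (adjust the exact keys to your Cassandra version):

server_encryption_options:
    internode_encryption: all                           # encrypt all inter-node traffic
    keystore: /etc/cassandra/conf/node.keystore         # this node's own certificate
    keystore_password: <keystore-password>
    truststore: /etc/cassandra/conf/node.truststore     # certificates this node trusts
    truststore_password: <truststore-password>
    require_client_auth: true                           # reject peers whose certificate is not in the truststore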

Related

Apache Cassandra Server and Datastax Client - Changing IP Addresses

We are using the latest Apache Cassandra database server, and the Datastax Node.js client, running in the cloud.
When our Cassandra servers are rebuilt, they get new IP addresses. Any running service clients then can't find the new servers; the client driver apparently caches the IP addresses instead of using DNS.
Is there some way around this problem, other than shutting down the client and creating a new one in our services whenever we encounter an error accessing the database?
If you only have one server, there is nothing you can do.
Otherwise, when a node is rebuilt (assuming it is one node in a cluster of many), it advertises its new IP to the cluster and the cluster topology is updated. The system.peers table is updated and the driver can pick up this change (AFAIK).
But why not use static private addresses for your Cassandra nodes?
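As an illustration of that point, passing several contact points (DNS names work too) lets the driver bootstrap as long as one of them is reachable, after which it tracks topology changes via system.peers. A minimal sketch with the DataStax Java driver 3.x API (the Node.js driver accepts an equivalent contactPoints array; the hostnames here are hypothetical):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ContactPointsExample {
    public static void main(String[] args) {
        // Several contact points: as long as one of them is reachable at startup,
        // the driver discovers the rest of the cluster from system.peers and is
        // notified when a rebuilt node gossips its new IP address.
        Cluster cluster = Cluster.builder()
                .addContactPoints("cassandra1.mydomain.com",
                                  "cassandra2.mydomain.com",
                                  "cassandra3.mydomain.com")
                .build();
        Session session = cluster.connect();
        System.out.println("Known hosts: " + cluster.getMetadata().getAllHosts());
        cluster.close();
    }
}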

What are the risks of exposing Cassandra's secured internode port in a shared private network such as DigitalOcean's?

In Cassandra's cassandra.yaml config file, there's this:
# SSL port, for encrypted communication. Unused unless enabled in
# encryption_options
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
ssl_storage_port: 7001
I don't know the specifics of why the Cassandra team has given this guidance, but what are the risks within a shared private network such as the one provided by DigitalOcean? VMs of other DO clients are on the same non-routed internal network. Using iptables to limit source IPs is an option.
While I can't claim to know what the author of the comment had in mind, there are a few different ways you can set up internode encryption. If it is set up with correctly configured truststores and require_client_auth = true, then internode encryption verifies that all participating nodes are using trusted certificates, which minimises the risk. Without those settings, certificates are not necessarily verified as trusted, leaving you open to untrusted nodes joining the cluster.
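A minimal sketch of the settings mentioned above, under server_encryption_options in cassandra.yaml (paths and passwords are placeholders):

server_encryption_options:
    internode_encryption: all
    keystore: /path/to/node.keystore
    keystore_password: <password>
    truststore: /path/to/node.truststore      # contains only the certificates you trust
    truststore_password: <password>
    require_client_auth: true                 # peers must present a certificate from the truststore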
There is a great, detailed article on setting this up properly here: http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
Of course, all that said, this is security, so the more layers of defence the better; restricting access with iptables, etc., is probably still a good idea.
Ben

CouchDB: transparency when connecting to replicated databases

Let's say I have set up continuous bidirectional replication in CouchDB:
HostA <-> HostB.
As far as I understand, this is the simplest master-master replication, so both HostA and HostB accept reads and writes. Is it possible for the client to be agnostic about which database it operates on? I use Ektorp as the Java driver for CouchDB. When I initialize the connection I have to provide an address, for example http://HostA.mydomain.com:5984. What happens when this host is actually down? I would like to achieve transparency like in RavenDB's client library: when HostA is down, it tries to connect to the other replicated hosts (http://ravendb.net/docs/article-page/3.0/csharp/client-api/bundles/how-client-integrates-with-replication-bundle).
In other words - is CouchDB aware of its replication targets so that the client doesn't have to manually choose the currently-alive host?
One way to achieve this is to use a load balancer like HAProxy. HAProxy is commonly used with web application servers, but it can sit in front of any server that speaks HTTP. You can find an example configuration in the CouchDB repo.
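Once a load balancer is in front of both hosts, the client only needs that one address. A minimal Ektorp sketch, assuming a hypothetical HAProxy frontend at couch-lb.mydomain.com:5984 that forwards to HostA and HostB:

import org.ektorp.CouchDbConnector;
import org.ektorp.CouchDbInstance;
import org.ektorp.http.HttpClient;
import org.ektorp.http.StdHttpClient;
import org.ektorp.impl.StdCouchDbInstance;

public class CouchViaLoadBalancer {
    public static void main(String[] args) throws Exception {
        // The client talks only to the load balancer, which routes each request
        // to whichever replica (HostA or HostB) is currently healthy.
        HttpClient httpClient = new StdHttpClient.Builder()
                .url("http://couch-lb.mydomain.com:5984")
                .build();
        CouchDbInstance dbInstance = new StdCouchDbInstance(httpClient);
        CouchDbConnector db = dbInstance.createConnector("mydb", true);
        System.out.println("Databases: " + dbInstance.getAllDatabases());
    }
}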

What happens if I don't put all the C* hosts in DevCenter?

Let's say I have 4 nodes: host1, host2, host3 and host4. However, I only add host1 and host2 as contact hosts. What would happen if I perform any operation in DevCenter? Will the action propagate to host3 and host4? Will this cause data corruption?
Here's what will happen:
DevCenter will use the whitelist load balancing policy to connect to the provided nodes.
Although DevCenter uses the DataStax Java driver as the underlying connector, it uses this whitelist policy instead of the driver's default load balancing policy, to reduce the time needed to obtain connections (the default policy discovers all the nodes in the cluster and initiates connection pools to all of them).
DevCenter will send requests only to the nodes in the list you provided.
If the data is local to these nodes, they will handle the requests themselves. If the data lives on other nodes in the cluster, the nodes used for the connection will act as coordinators (basically, they relay the requests to the nodes that own the data); see the sketch after this list.
Bottom line: there is no risk of data corruption, and the results you get will be exactly the same as if you had connected to all the nodes.
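To illustrate, here is roughly what such a whitelist setup looks like when done by hand with the DataStax Java driver 3.x (host names and the port are placeholders, not something DevCenter asks you to write): queries only ever go to host1 and host2, which coordinate for data owned by host3 and host4.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

public class WhitelistExample {
    public static void main(String[] args) {
        // Only host1 and host2 are contacted directly; they act as coordinators
        // for any data that actually lives on host3 and host4.
        List<InetSocketAddress> whiteList = Arrays.asList(
                new InetSocketAddress("host1", 9042),
                new InetSocketAddress("host2", 9042));

        Cluster cluster = Cluster.builder()
                .addContactPointsWithPorts(whiteList)
                .withLoadBalancingPolicy(new WhiteListPolicy(new RoundRobinPolicy(), whiteList))
                .build();
        Session session = cluster.connect();
        session.execute("SELECT release_version FROM system.local");
        cluster.close();
    }
}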

How to check which node in the cluster a client is connected to

We want to check which node in the cluster each client connection goes to. Is there any way to track this?
I saw some audit log entries that have host and source fields:
host:FQDN/5.6.7.8|source:/1.2.3.4
Does this mean the client at IP address 1.2.3.4 is connected to server 5.6.7.8 to run the query?
Is 5.6.7.8 the coordinator node for this session?
Yes, indeed. Your assumption is correct. Take a look at the "CQL logging examples" part of the DataStax Cassandra documentation:
http://www.datastax.com/docs/datastax_enterprise3.0/security/data_auditing
It explains that in your example 5.6.7.8 is the Cassandra node coordinating the requests from the client with IP 1.2.3.4.
