Securing Cassandra communication with TLS/SSL

We would like to protect Cassandra against man-in-the-middle attacks. Is there any way to configure Cassandra so that both client-server and server-server (replication) communications are SSL encrypted?
Thank you.

Short answer: no :)
For client-server: THRIFT-151
Edit: You might want to follow this thread on the mailing list.

Encrypted server-to-server communication seems to be available now:
https://issues.apache.org/jira/browse/CASSANDRA-1567
Provide configurable encryption support for internode communication
Resolution: Fixed
Fix Version/s: 0.8 beta 1
Resolved: 19/Jan/11 18:11
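For reference, the knobs that ticket added live in cassandra.yaml and look roughly like this (values are illustrative, not defaults you should ship; later releases renamed the section to server_encryption_options):
encryption_options:
    internode_encryption: all        # none | all (encrypt all internode traffic)
    keystore: conf/.keystore         # JKS keystore holding this node's certificate
    keystore_password: cassandra     # placeholder password
    truststore: conf/.truststore     # JKS truststore with the certificates to trust
    truststore_password: cassandra   # placeholder password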

The strategy I employ is to have the Apache Cassandra nodes communicate through a site-to-site VPN tunnel.
Specific configurations for the cassandra.yaml file:
listen_address: 10.x.x.x # VPN network IP
rpc_address: 172.16.x.x # non-VPN network for client access (although I actually leave it blank so it listens on all interfaces)
The benefit of this approach is that you can deploy Apache Cassandra to many different environments and stay provider agnostic: for example, hosting nodes in various Amazon EC2 environments, in your own physical data center, and a few others under your desk!
Is cost an issue preventing you from looking into this approach? Check out Vyatta ...
As KajMagnus pointed out, there is a resolved JIRA ticket, available in the stable version of Apache Cassandra (https://issues.apache.org/jira/browse/CASSANDRA-1567), which enables you to accomplish this via TLS/SSL .. so there are a few ways to achieve what you would like.
Finally, if you want to host your instances on Amazon EC2, region-to-region communication can be problematic, and although there is a patch available in 1.x.x, is it really the right approach? I have found the VPN approach reduces latency between nodes in different regions while still maintaining the necessary level of security.
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Running-across-multiple-EC2-regions-td6634995.html
Finally -- part 2 --
If you want to secure client-to-server communications, have your clients (web servers) communicate through the same VPN .. The configuration I have:
Front-end web servers communicate via an internal network to the application servers
Application servers sit on their own internal network and the VPN network, and communicate with the data layer via the VPN tunnel and with each other on the internal network
The data layer sits on its own network per data centre / rack and receives requests via the VPN network

Node-to-node (gossip) communication can be secured per the issue above. Client and server will both soon support Kerberos (in Hector master as of commit https://github.com/rantav/hector/commit/08149a03c81b559cba5680d115943dbf334f58fa; support should hit the Cassandra side shortly).

Related

What is the best architecture for a web-app communicating with a gRPC service?

I have built a website with the chess.js and Java chess libraries that communicates with a custom C++ chess engine via gRPC with Python. I am new to web dev and especially gRPC, so I am not sure about the architecture I should be going for when it comes to hosting.
My questions are below:
Do the website and gRPC service need to be hosted on separate server instances and connected via API?
Everything is hosted locally right now, and I use two ports (5000 for the website and 8080 for the server). If the site and server aren't separate, is this how they will communicate with each other on a single server (one local port each)?
I am using this website just as a showcase of my portfolio for job searching, so I am looking for free/cheap hosting that also provides decent RAM, since the C++ chess engine is fairly computationally intensive. Does anyone have suggestions for which hosting service I should use?
I was considering free hosting for the website and then a cheap dedicated server for the service (if the two should be separate). Is this a bad idea?
I'm taking all tips and tricks that anyone has to offer. Again, I'm a total novice to web dev, hosting, servers, etc.
NOTE This is an architecture question rather than a programming question and is discouraged on Stack Overflow.
The website and gRPC service may be hosted on the same server (as you're doing locally). You have the flexibility of running both processes (website and gRPC service) on a single, more powerful host or separately on two hosts.
NOTE Although gRPC most often communicates over TCP sockets, it is possible to use UNIX sockets and even in-memory transports too.
If you run both processes on a single host, you will want to consider connecting the website to the gRPC service via localhost (127.0.0.1 or the loopback device). Using localhost, network traffic doesn't leave the host.
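As a sketch of what that looks like from the website process (the generated-stub module and RPC names below are hypothetical stand-ins for whatever your .proto defines; grpc.insecure_channel itself is the standard Python API):
import grpc
# engine_pb2 / engine_pb2_grpc stand in for your generated protobuf modules
import engine_pb2
import engine_pb2_grpc

# Connect over loopback: this traffic never leaves the host
channel = grpc.insecure_channel("127.0.0.1:8080")
stub = engine_pb2_grpc.EngineStub(channel)
reply = stub.BestMove(engine_pb2.Position(fen="startpos"))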
If you run both processes on different hosts, traffic must travel across a network. This is slower and will likely incur charges when hosted.
You will want to decide whether the gRPC service should be exposed to any network traffic other than your website's. In many cases, a gRPC service is used to provide an API that facilitates integration by third parties. If you definitely don't want the gRPC service accessed by anything else, ensure either that it's bound to localhost (see above; it is then inaccessible to anything other than processes on the same host, e.g. your website) or that it's firewalled such that only the website is permitted to send traffic to it.
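A minimal sketch of the loopback binding on the server side (servicer registration elided; the address choice is the point here):
from concurrent import futures
import grpc
# import your generated engine_pb2_grpc module and register your servicer here

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
# "127.0.0.1:8080" is reachable only from this host;
# "0.0.0.0:8080" would accept traffic from any network interface
server.add_insecure_port("127.0.0.1:8080")
server.start()
server.wait_for_termination()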
You can find cheap hosting of virtual machines (VMs) and you'll likely want to consider hosting both processes on a single VM, ensure that you constrain the resources that you pay for and that you secure traffic (as above).
You may wish to consider containerizing the application. In this case, while it's possible to run both processes in a single container, this is considered bad practice; you should thus consider two containers (website and gRPC server). Many hosting/cloud platforms provide container hosting, and this is generally easier than managing VMs (since you don't need to patch/update the OS and its dependencies). If you can find a platform that accepts a Docker Compose file or a Kubernetes Deployment in which you describe both services and how they interact, such that the gRPC service is only accessible to the website, that could be ideal.
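For example, a Docker Compose sketch along those lines (image names are placeholders). Only the website publishes a port, so the gRPC service is reachable solely on the internal Compose network:
services:
  website:
    image: my-website          # placeholder image name
    ports:
      - "5000:5000"            # only the website is exposed externally
    depends_on:
      - engine
  engine:
    image: my-chess-engine     # placeholder image name
    # no "ports:" entry, so it is only reachable as engine:8080 inside the network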

Redis deployment configuration - master slave replication

Currently I have two servers on which I have deployed Node.js/Express-based web service APIs. I am using Redis for caching JSON strings.
What would be the best option for deploying this setup to production? I have seen it advised here to run Redis on a dedicated server. OK, I take that and use a dedicated server to run the Redis master. Can I use the existing app servers as slave nodes? Note: these app servers are running a Node/Express application.
What other options do I have?
You can.
It all depends on the load those other servers have; it's a problem of resource sharing. To be honest, my main issue with your architecture is not dedicated vs. non-dedicated servers; it's the fact that you are placing a Redis server (master or not) on a host that will most likely be facing the internet (the Express app), meaning it's quite exposed.
If you can simulate HTTP load on your Node/Express servers, compare the results of running some benchmark tests on your dedicated server vs. the non-dedicated ones:
On a running Redis server, type:
redis-benchmark -q -n 100000
If the app servers are being hammered and frequently using all cores, you should see a substantial difference in the benchmarks.
My suggestion is: go ahead with your first setup, add monitoring for the Redis response times, and only act when you have to, which might be now if the benchmarks show very poor results.
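One quick way to watch those response times from an app server is redis-cli's built-in latency mode (the hostname here is a placeholder for your Redis master):
redis-cli -h redis-master.example.internal --latency
It continuously samples round-trip time with PING and reports min/max/avg in milliseconds, which is a reasonable first signal before investing in heavier monitoring.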
As a side note, consider not sharing hosts between services that you expose to the internet and services that perform internal functions for your application.

Wide area service discovery via bonjur / avahi

I'm looking into wide-area service discovery, and Bonjour / Avahi seem to be really good.
However, I'm a bit confused about how all this works.
So:
I have a bunch of services running in a cloud.
I have clients which can be located anywhere in the world.
I want the clients to automatically discover the services in the cloud.
I need the clients to be absolutely zero conf, so they don't know IPs, ports, nothing.
If I understand correctly, this can be done using the above-mentioned DNS-SD libraries. I have full access to a DNS server, so I suppose the services can register themselves on startup using these libraries, and the data can then be spread through DNS servers worldwide.
The clients can obtain the advertised info by querying the DNS records of my domain using Bonjour / Avahi, right?
All I need to do is link the client with the Bonjour / Avahi libraries and tell it which domain it should use (query).
Is this correct?
Am I missing something here, or is this how it works?
Thanks in advance!
Avahi does not currently support publishing to a wide-area server, though it can browse wide-area. So if you can dynamically update a DNS server somewhere with the appropriate records, Avahi would be able to see them.
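For a sense of what "the appropriate records" means, a wide-area DNS-SD zone typically carries PTR, SRV, and TXT records shaped like these (domain, service type, instance name, port, and TXT contents are all placeholders):
_myservice._tcp.example.com.              PTR  MyInstance._myservice._tcp.example.com.
MyInstance._myservice._tcp.example.com.   SRV  0 0 8080 host1.example.com.
MyInstance._myservice._tcp.example.com.   TXT  "path=/api"
Clients browse the PTR record to enumerate instances, then resolve the SRV/TXT records to get host, port, and metadata. On the client side, wide-area lookups can be switched on in /etc/avahi/avahi-daemon.conf via enable-wide-area=yes in the [wide-area] section.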
You do, however, potentially have more problems to solve here, including port mapping / NAT traversal, which Avahi does not address at all.

create a load balancer and radius authentication under linux

I have a CentOS 6 server working as a gateway, receiving two internet connections from two ISPs. What I need to do is load-balance those two connections and forward the traffic to a third network card into the internal network.
I also need to use a RADIUS server to perform network authentication for the users.
Solutions already tried:
I tried to create a bridge between the two incoming connections; it worked, but I'm not able to perform traffic control.
I also tried to install FreeRADIUS.
My questions are:
1. Is it possible to perform the load balancing from FreeRADIUS, meaning I could use it alone for the whole solution?
2. If not, can anyone please guide me to a solution or a utility to perform such a task?
P.S. I can't use a dedicated firewall distribution such as ZeroShell or EndianFirewall; I need to implement the solution under CentOS 6.
Yes, you can create pools of home servers and load-balance requests between them; see raddb/proxy.conf.
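A proxy.conf sketch of such a pool (IPs, secrets, and the realm name are placeholders; syntax as in FreeRADIUS 2.x):
home_server radius1 {
    type = auth
    ipaddr = 10.0.0.10
    port = 1812
    secret = testing123      # placeholder shared secret
}
home_server radius2 {
    type = auth
    ipaddr = 10.0.0.11
    port = 1812
    secret = testing123      # placeholder shared secret
}
home_server_pool auth_pool {
    type = load-balance      # spread auth requests across the home servers
    home_server = radius1
    home_server = radius2
}
realm example.com {
    auth_pool = auth_pool
}
Note this balances RADIUS authentication requests only; it does not load-balance the two ISP links themselves, which is a separate routing problem.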

Cassandra on Amazon EC2 with Elastic IP addresses

Can I use Cassandra on EC2 instances without Elastic IP addresses? I believe that in that case any instance that goes down would create an issue.
If I use Elastic IP addresses for the Cassandra nodes, I have to configure them to use the public IP address for internal communication (gossip etc.). But that will increase the network latency.
Please suggest how should I configure my nodes such that the problems can be minimized.
My answer would be: use Rackspace Cloud Servers instead, because you get better I/O performance as well as both public and internal IPs.
But there are several people in the community using EC2; I'd ask on the cassandra-user list if you insist on that. :)
Many people (me included) use Cassandra on Amazon's EC2 without issue. As the internal IP addresses are prone to change at will, you'll just need to use your internal EC2 DNS names (not your public IP address or public DNS name, as that would both be a security hole and mean Amazon charges you for all your Cassandra traffic).
It does mean that if your Cassandra node goes down for any reason, you'll lose the data on that node (unless you're using persistent storage, which is slower), but this can easily be fixed by increasing your replication factor (we use RF=3).
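In cassandra.yaml terms that means something like the following (hostnames are placeholder EC2-internal DNS names; the internal names resolve to the current internal IP from inside EC2):
listen_address: ip-10-0-0-1.ec2.internal
rpc_address: ip-10-0-0-1.ec2.internal
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "ip-10-0-0-1.ec2.internal,ip-10-0-0-2.ec2.internal"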
We use a VPN solution that has worked exceptionally well with EC2 & Cassandra and allows us to not use Elastic IPs. Some information about it can be found here.
Spreading MongoDB across EC2 regions
http://mail-archives.apache.org/mod_mbox/cassandra-user/201105.mbox/%3CBANLkTi=mB5joMKZHasodJcPUJOQ7+V7O-Q#mail.gmail.com%3E
Now, this option isn't for everyone, but it allows us to mitigate the issues with EC2 nodes changing IP addresses without having to use Elastic IP. We also find that we get performance increases by NOT using the Amazon interfaces (internal/external). Don't ask me why .. I don't know enough about their architecture to be able to explain it, except that it just does!
Further, it lets us nicely leverage the suggestion jbellis proposes, to use Rackspace. We have a mixed-provider setup .. so we can leverage EC2, Rackspace, and our own internally hosted nodes. Being vendor / service provider agnostic is quite important to me...
