Securing Cassandra on a public machine? - cassandra

I am considering a Cassandra cluster deployment to Google Compute Engine. However one of our principal db clients would be an App Engine app. Since GCE firewalls do not include App Engine instances (meaning App Engine instances are considered "outside" the firewall) we would need to open ports in the firewall to the Cassandra nodes, effectively putting our database on the public Internet.
Is this reasonable to do? I have read up on Cassandra's authentication scheme (http://www.datastax.com/documentation/cassandra/2.0/cassandra/security/securityTOC.html) but I'm certainly not an expert and thus I don't trust that I can properly evaluate whether this scheme is strong enough to protect a publicly available database server.
If this is a bad idea, what's our best alternative? Writing some kind of authenticating app in front of each database is rather unappealing since (1) we obviously want the db to be fast, so any extra steps in the way are counter to that goal, and (2) it might necessitate custom changes to the standard Cassandra client libs/programs.
Is there a standard practice here?

Related

Hosting NodeJS service application

I would like to build a service using NodeJS. However, this question is more of an architectural nature. Let's say I have 2 companies, each with its own network security. Company A has a SQL Server instance, while Company B would host the NodeJS service application. In order to get data, the NodeJS service has to go to the SQL Server instance in Company A. Is this considered "bad practice"? If that's the case, what's the alternative? As a note, there is also the option of connecting to the SQL Server instance from AWS.
From an architectural standpoint, it's generally undesirable for an application to reach its database across multiple network layers (potentially over the Internet), for several reasons: latency overhead, security (maybe), and management overhead (if the DB is owned by another company).
Generally, the DB should be as close as possible to the app, because usually it's the main bottleneck of a system, and it will impact the throughput of the application at some point.
However, the right answer here depends on the requirements of your app. If the traffic volumes are not very big and the performance hit is acceptable, then you can use that approach (with all the pros and cons it may have).
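To get a rough feel for the latency overhead mentioned above, here is a back-of-the-envelope sketch. It assumes the app issues its queries one after another, so each query pays one network round trip. The function name and the round-trip figures are illustrative assumptions, not measurements from any real setup.

```python
def added_latency_ms(sequential_queries: int, round_trip_ms: float) -> float:
    """Time per request spent purely waiting on network round trips."""
    return sequential_queries * round_trip_ms


# Illustrative numbers only (assumed, not measured).
same_network = added_latency_ms(sequential_queries=10, round_trip_ms=0.5)
cross_internet = added_latency_ms(sequential_queries=10, round_trip_ms=40.0)
print(same_network, cross_internet)
```

Plugging in round-trip times you measure yourself (for example with ping) is a quick way to judge whether the performance hit would be acceptable for your traffic.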
Ideally you should not do this. Instead, you could set up a replica of the database on your application's network and keep the replica in sync over a VPN connection.

Create multiple front-ends hitting same data source

I want to create and host 4-5 websites using the same database. The only difference between the sites will be:
branding (colours and header)
data will be filtered per website (through sql query) and
Each site will be on a separate domain (but can be hosted on same server)
My 1st thought was to use API / Rest model and provision five front-ends in their own sub-domain. But as sites can be hosted on same server (I'm assuming one hosting account which enables multiple sub-domains), I think I can simply connect all sites with connection string to same database, avoiding complexities of using REST.
Is this possible, and would I run into database conflicts doing this?
If later, I wanted to add a mobile app client, then will I need to build out a rest interface anyway?
Thanks
The right thing to do here depends a lot on your specific use case, expected load, preferred backend/edge technology, future plans, etc.
Site domains and servers -
The main point here is that you can host your domains/subdomains on the same or different servers. You simply need to update the DNS to point to the correct IP (update the subdomain's A record).
Note: If these sites are all public-facing, then I highly recommend using an edge/proxy server (for example Nginx or Apache Web Server) and even considering a load balancer, depending on the expected number of visitors.
Decoupled architecture is almost always preferred -
I would definitely have an API/REST layer to abstract the database from the sites. This ensures that you establish a contract through which any clients can interact with the backend, including your mobile application. You also don't have to duplicate DB-specific code across the various clients. What if you decided to change your schema? Or even your database solution? Then all clients will be broken and your customers would be unhappy. As a guiding principle, think: if I change any one thing in my architecture, how many other things will need to change as a result? In terms of scalability, this architecture will also allow you to easily spin up more instances of whatever it is you need (databases, REST service, etc) should the need arise.
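As a minimal sketch of that idea, the snippet below keeps the per-site filtering inside one backend function, so every front-end (and later a mobile app) goes through the same contract instead of carrying its own SQL. The table, the host names and the function names are hypothetical, and an in-memory SQLite database stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical mapping from the requested host name to a site key.
SITE_BY_HOST = {
    "site-a.example.com": "a",
    "site-b.example.com": "b",
}


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, site TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO products (site, name) VALUES (?, ?)",
        [("a", "red mug"), ("a", "red cap"), ("b", "blue mug")],
    )
    return conn


def list_products(conn: sqlite3.Connection, host: str) -> list:
    """Return only the rows belonging to the site served on `host`."""
    site = SITE_BY_HOST[host]  # raises KeyError for an unknown host
    rows = conn.execute(
        "SELECT name FROM products WHERE site = ? ORDER BY id", (site,)
    )
    return [name for (name,) in rows]
```

A REST handler would call `list_products` and serialize the result. If the schema changes later, only this function needs to change, not every front-end.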
How do I build and deploy a REST API?
Re: #2, to set up a simple custom REST service running on Node.js (and express), this is a good tutorial. The example also walks through setting up and integrating with an in-memory MongoDB database.
Database collisions?
If you follow the above steps, this should be a moot point. Node.js/express and the databases expose ways to configure connection pools if the defaults do not suffice. Again, this will depend on your needs, such as how many concurrent users you expect.

Mongodb hosting remote vs on the same network

What is the killer reason to use a remote DB hosting service for MongoDB (like compose.io) for a nodejs application, versus hosting MongoDB on the same network (in the same datacenter, etc.), for example when using PaaS providers (like modulus.io) which offer "integrated" MongoDB hosting?
What percentage of speed/performance may be lost when using remote DBs over the Internet, and how do DB providers solve this? How do I make the right decision on this?
The reason you use something like compose.io is that you don't want to deal with running the database on your own, and would rather have experts who know what they are doing take care of it. In the best case it comes with support, so you can take further advantage of those experts. And that's the only reason.
If you use Modulus that has this anyway and you run your application there as well - even better. There is no real reason to run your node application on Modulus and your mongodb on a different cloud hosting service.
In practice that probably doesn't matter as much because they all use AWS anyway ;)
Important: If they DON'T run in the same network, make sure your MongoDB is protected properly(!!). If you do run in the same network, just make sure MongoDB is not accessible from the outside at all, which is definitely the better solution!
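One quick way to check the "not accessible from the outside" part is to attempt a TCP connection to the database port from a machine outside your network. This is only a sketch: the function name is hypothetical, and an open port tells you nothing about whether authentication is configured.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from outside, something like `is_port_reachable("your-db-host.example.com", 27017)` (27017 being MongoDB's default port, the host name a placeholder) should come back `False` if the firewall is doing its job.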
Hope that helps

couchbase security, can I restrict the moxi port 11211 to local host

I feel like I must be really thick here but I'm struggling with couchbase configuration.
I am looking to replace memcached with couchbase and want to secure things more to my liking. On the server there are a number of applications that are set up to use memcached, so it needs to be as drop-in as possible without changing the applications' configuration.
What I have done is installed couchbase on each of the webservers like I did with memcached and so far with my config everything is working.
The problem I have is that port 11211 is open to the world at large and this terrifies me. Either I'm thick or I'm not looking in the right place, but I want to restrict port 11211 to only listen on localhost (127.0.0.1).
Now couchbase seems to have reams and reams of documentation but I cannot find how to do this and am starting to feel like you need to be a couchbase expert to make simple changes.
I'm aware that the security of couchbase is to use password protected buckets with SASL auth but for me this isn't an option.
While I'm on the subject, and if I can change the listening interface, are there any other couchbase ports that don't need to be open to the world and can be restricted to localhost?
Many many thanks in advance for any help, I'm at my wits end.
Let's back up a bit. Unlike memcached, Couchbase is really meant to be installed as a separate tier in your infrastructure and as a cluster, even if you are just using it as a high availability cache. A better architecture would be to have Couchbase installed on its own properly sized nodes (in a cluster using Couchbase buckets) and then install Moxi (Couchbase's memcached proxy) on each web server, where it will talk to the Couchbase cluster on your app's behalf. This architecture will give you the functionality you need, plus the high availability, replication, failover, etc. that Couchbase is known for.
In the long run, I would recommend that you transition your code to using Couchbase's client SDKs to access the cluster as that will give you the most benefits, performance, resilience, etc. for your caching needs. Moxi is meant more as an interim step, not a final destination.

To understand Milton WebDAV server working with cloud environment load balancer

We want to use Milton WebDAV to transfer files in our web application, which is eventually going to be deployed in a cloud environment (most likely Azure) as IaaS.
We are aware that the WebDAV standard is stateless and hence should not create any problems with a cloud load balancer, but we are not sure about Milton and have a few questions:
1.) Does Milton implement WebDAV as specified, so that all communication remains stateless? I assume that it passes an authentication token with every request, but I am not sure where the token is stored on the server. Does it store it in the database, or in some sort of cache, etc.?
2.) Does the locking mechanism work fine if a load balancer is used and there are 5-6 servers to handle the load? Again, where does the Milton server store the lock token?
Sorry for the late comment. The two most important aspects of WebDAV which affect load balancing are digest authentication tokens (nonce values) and lock tokens.
As the Resource implementor you get to control both of those. Lock tokens are typically stored in a database (you must implement the methods on LockableResource which will do the persistence), so they will be shared across servers, although it's not uncommon to use memory-based lock tokens, in which case you need to find some way to share that information across servers.
Digest nonces are only a consideration if you've implemented DigestResource. The default NonceProvider uses a simple HashMap so this will not be shared across servers. But the interface is trivial so you can easily implement a database store. If your load-balancing solution uses sticky sessions then that won't be an issue because clients will go to the server which has their nonce.
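To illustrate why an in-memory store breaks behind a load balancer while a database-backed one does not, here is a language-neutral sketch in Python. It is not Milton's NonceProvider interface (Milton is Java); the class and method names are hypothetical, and a SQLite file stands in for the shared database.

```python
import secrets
import sqlite3


class InMemoryNonceStore:
    """Per-process store: other servers cannot see these nonces."""

    def __init__(self) -> None:
        self._nonces = set()

    def issue(self) -> str:
        nonce = secrets.token_hex(16)
        self._nonces.add(nonce)
        return nonce

    def is_known(self, nonce: str) -> bool:
        return nonce in self._nonces


class DatabaseNonceStore:
    """Store backed by a shared database, visible to every server."""

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS nonces (value TEXT PRIMARY KEY)"
            )

    def issue(self) -> str:
        nonce = secrets.token_hex(16)
        with self._conn:  # commits on success
            self._conn.execute(
                "INSERT INTO nonces (value) VALUES (?)", (nonce,)
            )
        return nonce

    def is_known(self, nonce: str) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM nonces WHERE value = ?", (nonce,)
        ).fetchone()
        return row is not None
```

With two `InMemoryNonceStore` instances (two servers), a nonce issued by one is unknown to the other. With two `DatabaseNonceStore` instances pointing at the same database, either server can validate it. A real implementation would also need nonce expiry, which this sketch leaves out.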
Note that Tomcat session replication won't help with the above issues, because WebDAV clients typically don't support cookies, so there is no Servlet session.
I have never used Milton WebDAV before but from the looks of it, it is used to modify and edit files on a server.
However, Azure's local storage is not shared. Each instance is a completely separate server. If you modify a file on one server, it will not be replicated to the next.
Azure works by uploading a deployment package. When a new instance needs to come up it uses the deployment package and starts a completely new server.
From your perspective, the instances don't share anything. Because of this, you will never know which server you are hitting.
If you have a shared file storage system behind them, then it may be a different story. However, that scenario looks odd on Azure. Amazon EC2 with a shared EBS might do it though.
