Is it possible to fake Cassandra connection? - cassandra

I have been given a task to configure Cassandra DB for the project. We are facing a problem - for all environments there is a dedicated server for Cassandra. But, for the DEV environment, the client does not want to provide a seperate server and current DEV servers are already fully packaged and we can't afford to install Cassandra on them.
My question is, is there any possibility to fake connection to Cassandra in an environment? I've created CassandraConfiguration.java class, configured session, cluster etc etc, it all works smoothly on other envs, but on DEV, well, it fails, as it cannot connect, because there's no Cassandra... Commiting the cassandraconfiguration file will kill the dev.

You can use scassandra (simulated cassandra), or Simulacron that are emulating Cassandra. Or you can use cassandra-unit that will run Cassandra in the same JVM as your test.

Related

How to update configuration of a Cassandra cluster

I have a 3 node Cassandra cluster and I want to make some adjustments to the cassandra.yaml
My question is, how should I perform this? One node at a time or is there a way to make it happen without shutting down nodes?
Btw, I am using Cassandra 2.2 and this is a production cluster.
There are multiple approaches here:
If you edit the cassandra.yaml file, you need to restart cassandra to re-read the contents of that file. If you restart all nodes at once, your cluster will be unavailable. Restarting one node at a time is almost always safe (provided you have sane replication-factors and consistency-levels). If your cluster is configured to survive a rack or datacenter outage, then you can safely restart more nodes concurrently.
Many settings can be changed without a restart via JMX, though I don't have a documentation link handy. Changing via JMX WON'T change cassandra.yml though, so you'll need to update that also or your config will revert back to what's in the file when the node restarts.
If you're using DSE, OpsCenter's Lifecycle Manager feature makes updating configs a simple point-and-click affair (disclaimer, I'm biased as I'm an LCM dev).

Cassandra 2+ HPC Deployment

I am trying to deploy Cassandra on a Linux Based HPC cluster and I need some guidelines if possible. Specifically, what is the difference between running Cassandra locally and in cluster.
When managing locally (in which case it runs smoothly) we duplicate the original files for every node inside our Cassandra directory and we apply the appropriate changes for IP address, rcp, JMX etc... however, when managing a network which files do we need to install in each node. The whole package with all the files or just some of the required ones
like, bin/cassandra.in.sh, conf/cassandra.yaml, bin/cassandra.
I am a little bit confused on what to store in each node separately so to start working on the cluster.
You need to install Cassandra on each node (VM), i.e. the whole package and then update config files as neccessary. As described here to configure cluster in a single data center you need:
Install Cassandra on each node
Configure cluster name
Configure seeds
Configure snitch, if needed

How to create a Cassandra copy to a test machine?

We have a staging environment which runs a one node cluster completely separate from our production environment. What I'd like to do is copy this one node cluster over to a test machine that I have for the sole purpose of testing.
What is the correct way to do this? The server and test server are running Centos 6.x, and the version of DSE is 4.5.1 and Cassandra 2.0.8.39
All you need is to follow the steps described in this document:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html.
If your test cluister's topology is different from the original cluster then you will need to use a tool like a sstableloader:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_move_cluster.html
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html

titan rexster with external cassandra instance

I have a cassandra cluster (2.1.0) running fine.
After installing titan 5.1, and editing the titan-cassandra.properties to point to cluster hostname list rather than localhost, i run following -
titan.sh -c conf/titan-cassandra.properties start
It is able to recognize running cassandra instance, starts elastic search, but times out while connecting to rexster.
If i run it with local cassandra, everything runs fine using following ->br>
titan.sh start
do i need to make any change in rexster properties to point to running cassandra cluster..
Thanks in advance
Titan Server started by titan.sh represents a quick way to get started with Titan/Rexster/ES. It is designed to simplify running all those things with one startup script. Once you start breaking things apart (e.g. a separate cassandra cluster), you might not want to use titan.sh anymore because, it still forks a cassandra process when it starts up. Presumably, you don't need that anymore, given that you have a separate cassandra cluster.
Given the more advanced nature of your environment, I would simply download Rexster and configure it to connect to your cluster.

Cassandra native transport port 9042 slow on EC2 Machine

I have a 5 node Cassandra cluster set up on EC2, all in the same region.
If I connect over cqlsh (9160), queries respond in under a second.
When I connect via Dev Center, or using the native Java Driver, both of which use port 9042, the queries take over 20 seconds to respond.
They consistently respond in the same 21 second region. Never fast and then slow.
I have set up a few Cassandra Clusters on EC2 and have seen this before but do not know how to fix the problem. The last time, I scrapped the cluster and built a new one and the response time on port 9042 was fine.
Any help in how to debug or fix this problem would be appreciated, thanks.
The current version of DevCenter was designed to support as main scenario running (longish) CQL scripts (vs an interactive console with queries executed one after another). DevCenter is using as an underlying connector the DataStax Java driver for Cassandra.
For the above mentioned scenario, in order to ensure there are no "conflicts", a new Session is created for each execution. When a Session is initialized, the driver performs an auto-node discovery, creates connection pools, etc. Basically it does a lot of preparation work. Depending on the latency from your client machine to the EC2 nodes, the size of the cluster and also the configuration of these nodes (see the connection requirements), this initialization phase can be quite expensive.
As you can imagine the time spent preparing wouldn't represent a large percentage of running a DDL script and a decent size of inserts/updates. But for an interactive scenario, it will result in a suboptimal behavior (the one you are describing)
The next version(s) of DevCenter will address the interactive scenario and optimize for it so the user experience would be what you'd expect. And supporting this scenario is pretty high on our list of priorities.
The underlying Java driver obtains the whole cluster topology when it initially connects. This enables it to automatically connect to any node in the cluster. On EC2 it only obtains the private addresses, tries each one, and then times out. It then sends the request over the initial connection

Resources