I tried to shift the Cassandra listener from the RPC port to the native port. However, after I turned off the RPC port, OpsCenter would not work, so I had to keep both the native and RPC ports running.
Is there any way to make OpsCenter use the native port? Also, if Cassandra listens on both the native and RPC ports, is there any performance penalty?
The answer depends on what version of OpsCenter you are using. Prior to version 5.2, OpsCenter requires the Thrift RPC port; 5.2+ uses only the native transport, so you can disable Thrift.
That said, having both open will only use a marginal amount of additional memory, and you should not see any performance penalty having both servers bound.
Typically, port 9042 is provided along with the Cassandra cluster's seed hosts to connect to the cluster for performing CRUD operations.
Does the Cassandra Java client driver have any knowledge of port 7000 (used for peer communication) after the client establishes a connection with the cluster?
Thanks,
Deepak
The Java driver doesn't make use of the internode communication port 7000 because it doesn't need to participate in gossip with the nodes in the cluster.
Instead, the Java driver establishes a control connection with one of the nodes the first time it connects to the cluster. The driver uses the control connection (1) to query the system tables to discover the cluster's topology and schema, and (2) to listen for topology and schema changes.
It is through point (2) above that the driver recognises when nodes are added to or decommissioned from the cluster, as well as finds out when the schema has been updated. This is the reason the driver doesn't need to gossip with the nodes.
For more information, see Control connection for the Java driver. Cheers!
No, the Java driver doesn't know and should not know about in-cluster node-to-node communications. Why should it, anyway?
I am able to see 160 native clients on a particular node in OpsCenter.
But none of the applications are pointing to this DC or to any of its nodes.
If that is the situation, what are those 160 native clients?
A few of them are there because I have connected to that node using DevCenter.
Could the rest be because of inter-DC communication?
No keyspace has an RF set for this DC either. I am about to decommission this DC,
but I am not sure what those clients are.
Any idea?
In recent versions you can use nodetool clientstats, or SELECT * FROM system_views.clients; in cqlsh, to view the actual connections. This gives you the host and port, which you can then trace on the system to see which application is bound to it.
In older DSE versions you can also use dsetool perf userlatencytracking [enable|disable] to enable user latency tracking, which lets you do something similar with SELECT * FROM dse_perf.user_io;.
I have used the command below to find connecting clients:
sudo lsof -i -n -P | grep 9042 | grep ESTABLISHED
By running the above, I found processes, and those processes were all Java applications connecting to Cassandra. But I had not listed any of those hosts as contact points, yet requests were still coming to them.
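To see who those clients actually are, you can group the established port-9042 connections by remote host. Below is a small, hedged sketch of that bookkeeping; the sample lines are fabricated and real lsof output has more columns, but the shape of the `local:port->remote:port` field is what lsof prints.

```python
from collections import Counter

def count_clients(lsof_output):
    """Count ESTABLISHED native-protocol (9042) connections per remote host."""
    counts = Counter()
    for line in lsof_output.splitlines():
        if "ESTABLISHED" not in line or ":9042" not in line:
            continue
        # lsof prints the connection as local->remote, e.g.
        # java 1234 cassandra ... TCP 10.0.0.5:9042->10.0.0.9:53211 (ESTABLISHED)
        conn = next(f for f in line.split() if "->" in f)
        remote = conn.split("->")[1].rsplit(":", 1)[0]
        counts[remote] += 1
    return counts

# Fabricated sample resembling `sudo lsof -i -n -P` output on a Cassandra node
sample = """\
java 1234 cassandra 88u IPv4 TCP 10.0.0.5:9042->10.0.0.9:53211 (ESTABLISHED)
java 1234 cassandra 89u IPv4 TCP 10.0.0.5:9042->10.0.0.9:53212 (ESTABLISHED)
java 1234 cassandra 90u IPv4 TCP 10.0.0.5:9042->10.0.0.7:41100 (ESTABLISHED)
java 1234 cassandra 91u IPv4 TCP 10.0.0.5:7000->10.0.0.8:42000 (ESTABLISHED)"""

print(count_clients(sample))  # only 9042 clients; port 7000 (internode) is excluded
```

Note how the port-7000 connection is filtered out: internode traffic never shows up as a "native client".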
I found that all those requests were because the client applications use Consistency Level QUORUM. Although the applications do not refer to those DCs directly, to achieve QUORUM, requests were going to all DCs.
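This happens because QUORUM is computed over the sum of the replication factors across all data centers, while LOCAL_QUORUM only counts the coordinator DC's replicas. A minimal sketch of the arithmetic (the RF numbers below are made up for illustration):

```python
def quorum(replication_factors):
    """QUORUM = (sum of RF over all data centers) // 2 + 1."""
    return sum(replication_factors) // 2 + 1

def local_quorum(local_rf):
    """LOCAL_QUORUM only counts replicas in the coordinator's own DC."""
    return local_rf // 2 + 1

# Hypothetical keyspace replicated as DC1: 3, DC2: 3
rfs = {"DC1": 3, "DC2": 3}
print(quorum(rfs.values()))      # 4 -> more than DC1's 3 replicas, so the
                                 #      remote DC must be involved
print(local_quorum(rfs["DC1"]))  # 2 -> satisfied entirely within DC1
```

With RF 3 + 3, QUORUM needs 4 acknowledgements, so the coordinator must contact the remote DC on every request; switching the applications to LOCAL_QUORUM would have kept the traffic local.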
One more thing:
The Java client applications were using the superuser username 'cassandra'; authentication with that default superuser also requires consistency level QUORUM.
Inter-DC communication does not happen on 9042, so my assumption that some of the connections came from inter-DC traffic was also incorrect.
The above was the solution to my problem.
I am new to Cassandra and learning it.
So the question is: how is communication done between nodes in Cassandra?
Basic communication - failure detection and other
Data transmission from node to node and client
Any other type of communication
The answer to the 1st one is the Gossip protocol: http://www.datastax.com/resources/faq
But I am a little curious about the protocol and methodology Cassandra uses to transfer data from one node to another, or to a client.
Communication between nodes is through Gossip, as stated by you.
Failure detection is again through Gossip: each node tracks the gossip heartbeats it receives from every other node and maintains a suspicion level (phi) based on how overdue the next heartbeat is. When phi exceeds the phi_convict_threshold configured in cassandra.yaml, the node is considered dead.
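The idea behind phi can be sketched in a few lines. This is a simplification of Cassandra's accrual failure detector (the real one keeps a sliding window of observed heartbeat inter-arrival times); here we assume exponentially distributed intervals, so phi is just -log10 of the probability that a heartbeat this late would still arrive.

```python
import math

def phi(time_since_last_heartbeat, mean_interval):
    """Suspicion level: -log10 of the probability that a heartbeat this
    overdue would still arrive, assuming exponentially distributed
    inter-arrival times (a simplification of Cassandra's detector)."""
    p_arrives_later = math.exp(-time_since_last_heartbeat / mean_interval)
    return -math.log10(p_arrives_later)

mean = 1.0  # average seconds between gossip heartbeats from a peer
for t in (1.0, 5.0, 20.0):
    print(f"silent for {t:>4}s -> phi = {phi(t, mean):.2f}")
# phi grows the longer a node stays silent; it is convicted once phi
# exceeds phi_convict_threshold (default 8 in cassandra.yaml)
```

The advantage over a fixed timeout is that phi adapts to the observed heartbeat rhythm: a node with naturally jittery gossip needs to be silent for longer before crossing the threshold.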
I am not sure exactly what Cassandra uses for data transfer; most probably it is a simple layer built over TCP. One of the major features of Cassandra is that you don't have to worry about how it handles replication; you only have to think about the replication strategy.
Cassandra's inter-node communication is separate from communication between nodes and clients.
Gossip - is used so that nodes are aware of failures (clients are not involved)
This needs to be split in two: nodes communicate and send data to each other over the storage_port (see cassandra.yaml; default 7000), while clients connect to port 9042 (or 9160 for old Thrift clients) and speak a proprietary binary protocol specified here: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec
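For reference, these ports map to the following settings in cassandra.yaml (the values below are the usual defaults; check your own file, as deployments do change them):

```yaml
# Inter-node communication (gossip and data); never exposed to clients
storage_port: 7000
ssl_storage_port: 7001        # used instead when internode encryption is on

# Client-facing ports
native_transport_port: 9042   # CQL binary protocol (Java driver, DevCenter, cqlsh)
rpc_port: 9160                # legacy Thrift clients
```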
Other communication you might care about is JMX, which nodetool uses.
More details here: http://www.datastax.com/documentation/cassandra/2.1/cassandra/security/secureFireWall_r.html
I have a 5 node Cassandra cluster set up on EC2, all in the same region.
If I connect over cqlsh (9160), queries respond in under a second.
When I connect via Dev Center, or using the native Java Driver, both of which use port 9042, the queries take over 20 seconds to respond.
They consistently respond in the same 21-second region; never fast and then slow.
I have set up a few Cassandra Clusters on EC2 and have seen this before but do not know how to fix the problem. The last time, I scrapped the cluster and built a new one and the response time on port 9042 was fine.
Any help in how to debug or fix this problem would be appreciated, thanks.
The current version of DevCenter was designed with one main scenario in mind: running (longish) CQL scripts, rather than an interactive console where queries are executed one after another. As its underlying connector, DevCenter uses the DataStax Java driver for Cassandra.
For the above mentioned scenario, in order to ensure there are no "conflicts", a new Session is created for each execution. When a Session is initialized, the driver performs an auto-node discovery, creates connection pools, etc. Basically it does a lot of preparation work. Depending on the latency from your client machine to the EC2 nodes, the size of the cluster and also the configuration of these nodes (see the connection requirements), this initialization phase can be quite expensive.
As you can imagine, the time spent preparing wouldn't represent a large percentage of running a DDL script plus a decent number of inserts/updates. But in an interactive scenario it results in the suboptimal behavior you are describing.
The next version(s) of DevCenter will address the interactive scenario and optimize for it so the user experience would be what you'd expect. And supporting this scenario is pretty high on our list of priorities.
The underlying Java driver obtains the whole cluster topology when it initially connects, which enables it to automatically connect to any node in the cluster. On EC2, however, it only obtains the private addresses, tries each one, and times out; it then sends the request over the initial connection.
Is it possible to use the CQL3 Java client (the one using execute_prepared_cql3_query etc.) with the native protocol in 2.0?
Or is the Datastax java client the only one that supports the native protocol?
Is there a performance benefit in using the native protocol especially when inserting 1MB blobs?
I have existing applications that are using the CQL3 client, hence would prefer not to port unless the performance benefit is large.
The DataStax Java Driver for Apache Cassandra (https://github.com/datastax/java-driver) is the only Java Driver to support the CQL Native Protocol so far.
To clarify details about this new protocol: the CQL Native Protocol v1 has been introduced in Cassandra 1.2 and then enhanced as CQL Native Protocol v2 in Cassandra 2.0. While the Thrift interface will remain supported for a while in upcoming versions of Cassandra, you can expect some features to be only available with the CQL Native Protocol in the future, which is already the case with Automatic Paging in Cassandra 2.0 (see http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0).
On the performance side, there's still no clear benchmark publicly available comparing the efficiency of Thrift vs. the CQL Native Protocol. That's something we wish to see soon. Keep in mind, though, that there's no one-size-fits-all answer to this performance question, as it will heavily depend on the use case and workload being considered. That's why I would definitely advise you to run your own performance tests and see how you can improve the efficiency of your application. I would just note that for the 1MB blob case you've mentioned, I don't expect much difference, as the payload will be much bigger than the protocol overhead.
As a consequence, I wouldn't say it's urgent for you to upgrade to a driver supporting the CQL Native Protocol, but it's something you should start to experiment with, as most of the investment will happen there in the years to come.