What node does Cassandra store data on?

Is there a command, or any way at all, to know what data is stored on which nodes of Cassandra?
I'm pretty new to Cassandra and haven't had much luck googling this question.
Thanks!

You can get Cassandra to tell you which node(s) a particular key is on with nodetool getendpoints.
$ nodetool getendpoints mykeyspace tbl '8546200'
192.168.73.188
192.168.73.190
I don't know if that's what you're looking for or not. AFAIK there isn't a way to flat-out query the responsible nodes for all rows in a table or keyspace. But as Blake pointed out, your application doesn't really need to worry about that.
If you really wanted to find out, you could query your table using the token function on your partition key. Here's an example using Blake's schema:
SELECT token(partition_key),partition_key FROM tbl;
That would list the hashed tokens with your partition keys. Then you could run a nodetool ring to list out the token ranges for each node, and see which nodes are responsible for that range. Note that if you are using vNodes your output will be pretty big (256 lines for each, by default).
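The range lookup you would then do by hand against the nodetool ring output can be sketched in plain Python. This is a toy (the tokens below are made up, not real Murmur3 output; the addresses are the example nodes from above): each node owns the range from the previous token up to and including its own, and a token past the largest entry wraps around to the first.

```python
from bisect import bisect_left

# Hypothetical ring entries (token, node), as nodetool ring might report them.
# Each entry owns the range (previous token, its own token].
ring = [
    (-6148914691236517206, "192.168.73.188"),
    (-2,                   "192.168.73.190"),
    (6148914691236517202,  "192.168.73.188"),
    (9223372036854775807,  "192.168.73.190"),
]
tokens = [t for t, _ in ring]

def owner(token):
    """First ring entry whose token is >= the key's token; wrap past the end."""
    i = bisect_left(tokens, token)
    return ring[i % len(ring)][1]

print(owner(100))   # lands in (-2, 6148914691236517202]  -> 192.168.73.188
print(owner(-100))  # lands in (-6148914691236517206, -2] -> 192.168.73.190
```

With vNodes you would have hundreds of such entries per node, but the lookup is the same.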

Cassandra uses consistent hashing on the row's Partition key to determine where data is stored. Tokens are assigned to nodes and the consistent hash of the Partition key determines which node(s) will store the row.
The partition key is the first part of the PRIMARY KEY in your table definition; with a composite partition key it is the part inside the nested parentheses:
CREATE TABLE tbl (
    partition_key INT,
    clus_key TEXT,
    ...,
    PRIMARY KEY ((partition_key), clus_key)
);
Some reading here on the ring and consistent hashing. You're probably using vNodes so I'd read a bit here too.
At query time, you don't have to worry about which node has what. Your C* driver will select a coordinator node from the list provided that will find the rows based on your query.
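The placement described above can be sketched as a toy in Python (made-up tokens and node names; real Cassandra uses Murmur3 and the replication strategy configured on your keyspace): the first replica is the node owning the token's range, and with a replication factor above 1 the remaining replicas are the next distinct nodes walking clockwise around the ring.

```python
# Toy SimpleStrategy-style placement; tokens and nodes are hypothetical.
ring = [           # (token, node), sorted by token
    (-6000, "nodeA"),
    (-2000, "nodeB"),
    (2000,  "nodeC"),
    (6000,  "nodeA"),  # with vnodes the same node appears at many tokens
]

def replicas(token, rf):
    """First node owning the token's range, then next distinct nodes clockwise."""
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    out = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

print(replicas(0, 2))  # ['nodeC', 'nodeA']
```

This is also why your driver does not need the mapping: any coordinator can compute it the same way from the cluster's token metadata.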
If you want to see details about what a query is doing in CQLSH, try turning tracing on:
> TRACING ON;
> SELECT * FROM table;
Tracing session: 1f6b4440-050f-11e5-ba41-672ef88f159d
....
<Details about the query>
....


Cassandra query table without partition key

I am trying to extract data from a table as part of a migration job.
The schema is as follows:
CREATE TABLE IF NOT EXISTS ${keyspace}.entries (
username text,
entry_type int,
entry_id text,
PRIMARY KEY ((username, entry_type), entry_id)
);
In order to query the table we need the partition keys, the first part of the primary key.
Hence, if we know the username and the entry_type, we can query the table.
In this case the username can be whatever, but the entry_type is an integer in the range 0-9.
When doing the extraction, we iterate over the table 10 times for every username to make sure we try all values of entry_type.
We can no longer find any entries, as we have depleted our list of usernames. But nodetool tablestats reports that there is still data left in the table, gigabytes even. Hence we assume the table is not empty.
But I cannot find a way to inspect the table to figure out which usernames remain in it. If I could inspect it, I could add the remaining usernames to our extraction job and eventually we could deplete the table. But I cannot simply query the table like this:
SELECT * FROM ${keyspace}.entries LIMIT 1
as Cassandra requires the partition keys to make meaningful queries.
What can I do to figure out what is left in our table?
As per the comment, the migration process includes a DELETE operation on the Cassandra table, but the engine delays actually removing the affected records from disk; this process is controlled internally with tombstones and the gc_grace_seconds attribute of the table. The reason for this delay is fully explained in this blog entry; as a tl;dr, if the default value is still in place, at least 10 days (864,000 seconds) must pass after the delete executes before the data is actually removed.
For your case, one way to proceed is:
Ensure that all your nodes are "Up" and "Healthy" (UN)
Decrease the gc_grace_seconds attribute of your table; the example below sets it to 1 minute, while the default is 864,000 seconds (10 days):
ALTER TABLE ${keyspace}.entries WITH GC_GRACE_SECONDS = 60;
Manually compact the table:
nodetool compact ${keyspace} entries
Once the process is completed, nodetool tablestats should be up to date.
To answer your first question, I would like to put more light on gc_grace_seconds property.
In Cassandra, data isn't deleted the way it is in an RDBMS. Cassandra is designed for high write throughput and avoids reads-before-writes. So in Cassandra a delete is actually an update, and updates are actually inserts: a "tombstone" marker is written to indicate that the data is now (logically) deleted (also known as a soft delete). Tombstoned records must eventually be removed to reclaim storage space, which is done by a process called compaction. But remember that tombstones are eligible for physical deletion / garbage collection only after a specific number of seconds, known as gc_grace_seconds. This is a very good blog to read in more detail: https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
Possibly you are looking at the table size before gc_grace_seconds has elapsed, and the data is still there.
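The timing rule can be sketched in a couple of lines of Python (a toy check, not Cassandra's actual compaction logic):

```python
import time

GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days

def purgeable(deleted_at, now, gc_grace=GC_GRACE_SECONDS):
    """A tombstone is eligible for physical removal only once gc_grace has elapsed."""
    return now - deleted_at >= gc_grace

deleted = time.time()
print(purgeable(deleted, deleted + 3600))        # False: only an hour after the delete
print(purgeable(deleted, deleted + 11 * 86400))  # True: 11 days after the delete
```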
Coming to your second issue, where you want to fetch some samples from the table without providing partition keys: you can analyze your table content using Spark. The Spark Cassandra Connector allows you to create Java applications that use Spark to analyze database data. You can follow the articles / documentation to write a quick, handy Spark application to analyze Cassandra data.
https://www.instaclustr.com/support/documentation/cassandra-add-ons/apache-spark/using-spark-to-sample-data-from-one-cassandra-cluster-and-write-to-another/
https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/spark/sparkJavaApi.html
I would recommend not deleting records while you do the migration. Rather, first complete the migration, and after that do a quick validation / verification to ensure all records were migrated successfully (this you can easily do using Spark by comparing dataframes from the old and new tables). After successful verification, truncate the old table; truncate does not create tombstones and is hence more efficient. Note that a huge number of tombstones is not good for cluster health.
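The verification idea can be sketched without Spark: extract the primary keys from both tables and diff them. Here they are plain Python sets with made-up keys; with real data you would build the two sets from the old- and new-table dataframes.

```python
# Hypothetical primary keys ((username, entry_type), entry_id) from each table.
old_table = {("alice", 0, "e1"), ("alice", 3, "e2"), ("bob", 7, "e3")}
new_table = {("alice", 0, "e1"), ("alice", 3, "e2")}

missing = old_table - new_table  # rows not yet migrated
extra   = new_table - old_table  # rows present only in the target

print(sorted(missing))  # [('bob', 7, 'e3')]
print(sorted(extra))    # []
```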

Cassandra query collection while specifying a partition key

I've been reading about indexes in Cassandra but I'm a little confused when it comes to creating an index on a collection like a set, list or map.
Let's say I have the following table and index on users like the following
CREATE TABLE chatter.channels (
    id text PRIMARY KEY,
    users set<text>
);
CREATE INDEX channels_users_idx ON chatter.channels (values(users));
INSERT INTO chatter.channels (id, users) VALUES ('ch1', {'jeff', 'jenny'});
In the docs, at least what I've found so far, it says that this can have a huge performance hit because the indexes are created locally on each node. And all the examples given query the tables like below:
SELECT * FROM chatter.channels WHERE users CONTAINS 'jeff';
From my understanding this would have the performance hit because the partition key is not specified and all nodes must be queried. However, if I was to issue a query like below
SELECT * FROM chatter.channels WHERE id = 'ch1' AND users CONTAINS 'jeff';
(giving the partition key) then would I still have the performance hit?
How would I be able to check this for myself? In SQL I can run EXPLAIN and get some useful information. Is there something similar in Cassandra?
Cassandra provides a tracing capability; this helps to trace the progression of the reads and writes of queries in Cassandra.
To view traces, open -> cqlsh on one of your Cassandra nodes and run the following command:
cqlsh> tracing on;
Now tracing requests.
cqlsh> use [KEYSPACE];
I hope this helps in checking the performance of query.

Determining write nodes in Cassandra

I've just started reading about Cassandra and I can't quite understand how Cassandra manages to decide which nodes should it write the data to.
What I understand, is that Cassandra uses a part of primary key, specifically partition key, and partitioner to get a token by hashing the partition key, therefore a node/vnode to which that token is bound to.
Now let's say I have 2 nodes in my cluster, each has 256 vnodes on it + I'm not using any clustering keys, just a simple PK and a bunch of simple columns. Hashing partition key would clearly determine where the data should go. By this logic, there would be only 512 unique records available for storage.
Would be funny if true. So am I wrong at the partitioner part?
Consider the base case: just a single node, with a single token. Do you think it can store only one record? Of course not.
The hash determines which node the row will go to, true. But the primary key determines where in the node the row will be stored. And many distinct primary keys may result in the same hash, but they will all be stored separately by the node.
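That distinction can be shown with a toy in Python (Python's built-in hash stands in for the partitioner; it is not Murmur3): the hash only picks the node, while the full primary key addresses the row within it, so a thousand distinct keys happily coexist on two nodes.

```python
nodes = {0: {}, 1: {}}  # two "nodes", each storing {primary_key: row}

def store(pk, row):
    node = hash(pk) % len(nodes)  # stand-in for the partitioner
    nodes[node][pk] = row         # distinct keys stay distinct within a node

for i in range(1000):
    store("user-%d" % i, {"id": i})

# All 1000 rows survive, spread over just two nodes.
print(sum(len(n) for n in nodes.values()))  # 1000
```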

Cassandra: Sort by query

I have a somewhat special request.
Situation: I use a Redis DB to store geo data and use GEORADIUS to get it back, sorted by distance. With these keys I look up the data in Cassandra. But Cassandra's result is sorted by the key or something else.
What I want is to get the information back in the same order I requested it.
The partition key is built from an id (which I get back from Redis) and a status.
Could I tell Cassandra to sort by the id array?
Partition keys are designed to be randomly distributed across different nodes. You can use the ByteOrderedPartitioner to do ordered queries, but the BOP is considered an anti-pattern in Cassandra and I would highly recommend against it. You can read more about it here: Cassandra ByteOrderedPartitioner.
You can add more columns to the primary key, which determine how data is stored on disk. These are known as clustering keys. You can do ORDER BY queries on clustering keys. This is a good document on clustering keys: https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
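The effect of a clustering key can be sketched with a toy model of one partition in Python (real Cassandra maintains this order on disk, it does not sort at read time): rows within a partition are kept sorted by the clustering column, and ORDER BY ... DESC just reads them in reverse.

```python
# One partition; keys are hypothetical clustering-column values.
partition = {3: "row-c", 1: "row-a", 2: "row-b"}

rows_asc  = [partition[k] for k in sorted(partition)]                # default order
rows_desc = [partition[k] for k in sorted(partition, reverse=True)]  # ORDER BY ... DESC

print(rows_asc)   # ['row-a', 'row-b', 'row-c']
print(rows_desc)  # ['row-c', 'row-b', 'row-a']
```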
If you can share more schema details, I can suggest what to use as clustering key.

Is it possible to have a Cassandra table that doesn't replicate?

There is one particular table in the system that I actually want to stay unique on a per server basis.
i.e. http://server1.example.com/some_stuff.html and http://server2.example.com/some_stuff.html should store and show data unique to that particular server. If one of the server dies, that table and its data goes with it.
I think CQL does not support table-level replication factors (see available create table options). One alternative is to create a key-space with a replication factor = 1:
CREATE KEYSPACE <ksname>
WITH replication = {'class':'<strategy>' [,'<option>':<val>]};
Example:
To create a keyspace with SimpleStrategy and "replication_factor" option
with a value of "1" you would use this statement:
CREATE KEYSPACE <ksname>
WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
then all tables created in that key-space will have no replication.
If you would like to have a table for each node, then I think Cassandra does not directly support that. One work-around is to start an additional Cassandra cluster for each node, where each cluster has only one node.
I am not sure why you would want to do that, but here is my take on this:
Distribution of your actual data among the nodes in your Cassandra cluster is determined by the row key.
Just setting the replication factor to 1 will not put all data from one column family/table on 1 node. The data will still be split/distributed according to your row key.
Exactly WHERE your data will be stored is determined by the row key along with the partitioner as documented here. This is an inherent part of a DDBS and there is no easy way to force this.
The only way I can think of to have all the data for one server physically in one table on one node is:
use one row key per server and create (very) wide rows, (maybe using composite column keys)
and trick your token selection so that all row key tokens map to the node you are expecting (http://wiki.apache.org/cassandra/Operations#Token_selection)
