Using CQL3, is it possible to define multiple formats of composite columns within a table/column family? With the syntax:
PRIMARY KEY(A, B, C)
It looks like A becomes the row key, a composite column B:C is created, and an additional composite column is created for each remaining (non-key) column, with B:C prepended.
What if I wanted to have, in that same column family, another composite column X:Y? Can that be accomplished?
I am not sure whether that is possible using the CLI, but we are using PlayOrm for Cassandra, and there it is very much possible. Basically, you can have millions of composite columns. Read this for more details. The example given is for a OneToMany relation, where you can have multiple columns with names like activities.act1, activities.act2. Similarly, you can have more entities with *ToMany relationships and save them in composite columns. If you do not want to use a *ToMany relationship, you may try its @NoSqlEmbedded pattern, which also stores the data in composite columns.
The primary key (all of its listed components together) identifies a single row of the table. A is the partition key, which determines which node(s) this partition is placed on.
In CQL3 you have only one primary key per table/column family. If you need to access the data differently, you may (with some limitations) use materialized views, or duplicate the data into a separate table.
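A minimal sketch of the "duplicate the data into a separate table" option, assuming plain Python dicts as stand-ins for two hypothetical CQL tables: `table_by_a` with PRIMARY KEY (a, b, c) and `table_by_x` with PRIMARY KEY (x, y). None of these names come from a real Cassandra API; the point is only that the application writes each record twice, and each table answers queries on its own key.

```python
from collections import defaultdict

# Hypothetical stand-ins for two CQL tables (not a Cassandra API):
#   table_by_a ~ PRIMARY KEY (a, b, c)
#   table_by_x ~ PRIMARY KEY (x, y)
table_by_a = defaultdict(dict)  # a -> {(b, c): row}
table_by_x = defaultdict(dict)  # x -> {y: row}


def write(row):
    """Write the same logical record into both tables (application-side denormalization)."""
    table_by_a[row["a"]][(row["b"], row["c"])] = row
    table_by_x[row["x"]][row["y"]] = row


def rows_for_a(a):
    # Clustering columns (b, c) keep rows sorted inside the partition.
    return [table_by_a[a][k] for k in sorted(table_by_a[a])]


def rows_for_x(x):
    # Same idea for the second table, clustered by y.
    return [table_by_x[x][k] for k in sorted(table_by_x[x])]


write({"a": 1, "b": "b1", "c": 2, "x": "x1", "y": 10, "val": "first"})
write({"a": 1, "b": "b1", "c": 1, "x": "x2", "y": 20, "val": "second"})
write({"a": 2, "b": "b0", "c": 1, "x": "x1", "y": 5, "val": "third"})

print([r["val"] for r in rows_for_a(1)])    # ['second', 'first']
print([r["val"] for r in rows_for_x("x1")])  # ['third', 'first']
```

The trade-off is the usual one for Cassandra: extra writes and storage in exchange for each query pattern reading a single partition.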
Related
I am new to Cassandra and found the following on Wikipedia:
A column family (called "table" since CQL 3) resembles a table in an RDBMS (Relational Database Management System). Column families contain rows and columns. Each row is uniquely identified by a row key. Each row has multiple columns, each of which has a name, value, and a timestamp. Unlike a table in an RDBMS, different rows in the same column family do not have to share the same set of columns, and a column may be added to one or multiple rows at any time.[29]
It says that 'different rows in the same column family do not have to share the same set of columns', but how do I implement that? I have read almost all of the documentation on the official site.
I can create a table and insert data as below:
CREATE TABLE Emp_record(E_id int PRIMARY KEY,E_score int,E_name text,E_city text);
INSERT INTO Emp_record(E_id, E_score, E_name, E_city) values (101, 85, 'ashish', 'Noida');
INSERT INTO Emp_record(E_id, E_score, E_name, E_city) values (102, 90, 'ankur', 'meerut');
This is very much like what I did in a relational database. So how do I create multiple rows with different columns?
I also found that the official documentation mentions a 'flexible schema'. How should I understand that here?
Thanks very much in advance.
"Column family" comes from the original design of Cassandra, when the data model resembled Google Bigtable or Apache HBase and the Thrift protocol was used for communication. But this required the schema to be defined inside the application, which makes accessing the data from many applications more problematic, because you need to update the schema in all of them.
The CREATE TABLE and INSERT statements are part of the Cassandra Query Language (CQL), which was introduced a long time ago and replaced the Thrift-based implementation (Cassandra 4.0 removed Thrift support completely). In CQL a table must have a defined schema, in which you provide each column's name and type. If you really need dynamic columns, there are several approaches (I'll link answers that I have already written over time, so there won't be duplicates):
1. If all values have the same type, you can use one column to hold the name of the attribute/column and another to store the value, as described here.
2. If the values have different types, you can again use one column for the attribute/column name and define multiple value columns, one per data type (int, text, ...), inserting each value only into the column of the matching type (described here).
3. You can use maps (described here). This is similar to the first or second approach, but is mostly intended for a very small number of "dynamic columns" and has other limitations, e.g. you need to read the full map to fetch one value.
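As a rough illustration of the second approach (it reduces to the first if you keep only one value column), here is a Python toy model of a hypothetical table `entity_attrs` with PRIMARY KEY (id, attr) and two value columns, `int_value` and `text_value`. The table and function names are made up, and the dicts only mimic the layout; this is not a driver API.

```python
from collections import defaultdict

# Toy model of a hypothetical table:
#   CREATE TABLE entity_attrs (id int, attr text, int_value int, text_value text,
#                              PRIMARY KEY (id, attr));
# Each "dynamic column" of an entity becomes one row inside that entity's partition.
entity_attrs = defaultdict(dict)  # id -> {attr: {value_column: value}}


def set_attr(entity_id, attr, value):
    # Fill only the value column that matches the value's type; the other stays unset.
    if isinstance(value, int):
        cell = {"int_value": value}
    else:
        cell = {"text_value": str(value)}
    entity_attrs[entity_id][attr] = cell


def get_attrs(entity_id):
    result = {}
    for attr in sorted(entity_attrs[entity_id]):  # rows come back in clustering order
        cell = entity_attrs[entity_id][attr]
        result[attr] = cell.get("int_value", cell.get("text_value"))
    return result


set_attr(101, "E_name", "ashish")
set_attr(101, "E_score", 85)
set_attr(102, "E_city", "meerut")
print(get_attrs(101))  # {'E_name': 'ashish', 'E_score': 85}
print(get_attrs(102))  # {'E_city': 'meerut'}
```

Entities 101 and 102 end up with different sets of "columns", even though the table schema itself is fixed.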
My English is not good!
I have a table in Cassandra 3.5 where the columns of a row do not all arrive at the same time. Uniqueness in the table is defined by a set of columns that are unique together in a row, but some of them are null at first, so I cannot make them the primary key. Instead, I have defined a column named id of type uuid in Cassandra.
How can I have a unique key made of those columns together in Cassandra?
Is my data model correct?
How can I solve this problem?
You can't; it's not a relational DB. Use clustering and/or partition keys to get a uniqueness constraint (note that a write with an existing primary key overwrites that row rather than failing).
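A small Python toy model of what the primary key does and does not give you, assuming a dict keyed by the full primary key stands in for a table (the names are hypothetical). Two behaviours mirror Cassandra: null primary key components are rejected, and inserting a row whose key already exists silently overwrites it instead of raising a uniqueness error.

```python
# Toy stand-in for a table: full primary key tuple -> regular columns.
table = {}


def insert(primary_key, columns):
    if any(part is None for part in primary_key):
        # Cassandra rejects null values in primary key columns.
        raise ValueError("primary key columns cannot be null")
    # Upsert: merge the new values into whatever row already has this key.
    table.setdefault(primary_key, {}).update(columns)


insert(("alice@example.com",), {"name": "Alice"})
insert(("alice@example.com",), {"name": "Alicia"})  # no error, silently replaces
print(len(table), table[("alice@example.com",)])  # 1 {'name': 'Alicia'}
```

So the primary key guarantees at most one row per key combination, but it does not tell you that a second writer collided with the first.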
See this answer
To store unique values, create a separate table with your unique value as its key. Check whether the value exists by querying this table before inserting a row. But beware: even then, you cannot guarantee uniqueness in your final table if two inserts run concurrently.
Basically, I would recommend using Cassandra for what it really is, a data store, and implementing your business logic where it belongs: in your code.
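The race described above can be replayed deterministically with a toy model. `unique_emails` is a hypothetical lookup table (modelled as a dict), and the two "clients" are interleaved by hand: both checks pass before either write happens, so the second write overwrites the first.

```python
# Hypothetical lookup table keyed by the value that must be unique: email -> owner id.
unique_emails = {}


def is_free(email):
    # Step 1: read (check whether the value is already taken).
    return email not in unique_emails


def claim(email, owner_id):
    # Step 2: plain write with no condition, so it overwrites any existing entry.
    unique_emails[email] = owner_id


# Two concurrent clients, interleaved by hand:
a_ok = is_free("x@example.com")  # client A checks: free
b_ok = is_free("x@example.com")  # client B checks before A has written: also free
if a_ok:
    claim("x@example.com", "A")
if b_ok:
    claim("x@example.com", "B")  # overwrites A's claim

print(a_ok, b_ok, unique_emails["x@example.com"])  # True True B
```

Because the check and the write are separate operations, nothing stops another client from slipping in between them.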
I'm trying to learn Cassandra, but I'm confused by the terminology.
In many places it says that a row stores key/value pairs.
But when I define a table, it's more like declaring a SQL table, i.e. you create a table and specify the column names and data types.
Can someone clarify this?
Cassandra is a column-based NoSQL database. While at its lowest level it does store simple key-value pairs, it stores these key-value pairs in collections. This grouping of keys and collections is analogous to rows and columns in a traditional relational model. Cassandra tables have a schema and can be queried (with restrictions) using a SQL-like language called CQL.
In your comment you ask about apples being stored in a different table from oranges. The answer to that specific question is no, they will be in the same table. However, Cassandra tables have an additional concept called the partition key that doesn't really have an analogous concept in the relational world. Take, for example, the following table definition:
CREATE TABLE fruit_types (
fruit text,
location text,
cost float,
PRIMARY KEY ((fruit), location)
);
In this table definition you will notice that we are defining the schema for the table. You will also notice that we are defining a PRIMARY KEY. This primary key is similar to, but not exactly like, the relational concept. In Cassandra the PRIMARY KEY is made up of two parts: the PARTITION KEY and the CLUSTERING COLUMNS. The PARTITION KEY is the first element of the PRIMARY KEY and can consist of one or more fields enclosed in parentheses. The PARTITION KEY is hashed, and the hash determines which node(s) own the data; it also keeps all of a partition's data together on disk. The CLUSTERING COLUMNS are the remaining columns listed in the PRIMARY KEY and, among other things, define the order in which rows are physically stored on disk within each partition. I suggest you do some additional reading on the PRIMARY KEY here if you're interested in more detail:
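A toy Python model of the fruit_types example, under loud simplifying assumptions: a hypothetical three-node cluster, and MD5 modulo the node count standing in for Cassandra's real token calculation (by default, Murmur3 tokens mapped onto a token ring). It shows that all rows for one fruit land in one partition on one node, sorted by the clustering column `location`.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3"]  # hypothetical 3-node cluster


def owner(partition_key):
    # Simplification: real Cassandra maps a Murmur3 token onto a token ring;
    # here an MD5 digest modulo the node count stands in for that.
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# fruit_types: PRIMARY KEY ((fruit), location)
storage = defaultdict(lambda: defaultdict(dict))  # node -> fruit -> {location: cost}


def insert(fruit, location, cost):
    storage[owner((fruit,))][fruit][location] = cost


def select(fruit):
    partition = storage[owner((fruit,))][fruit]
    # The clustering column 'location' orders the rows inside the partition.
    return [(loc, partition[loc]) for loc in sorted(partition)]


insert("apple", "Seattle", 1.5)
insert("apple", "Boston", 1.2)
insert("orange", "Miami", 0.9)
print(select("apple"))  # [('Boston', 1.2), ('Seattle', 1.5)]
```

Apples and oranges share one table, but each fruit is its own partition, and different partitions may live on different nodes.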
https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
Basically, Cassandra storage is like a sparse matrix. Earlier versions had a command-line tool called cassandra-cli which could show the exact storage footprint of your column family (aka table in the latest versions). Later, the community decided to adopt an RDBMS-style syntax for easier understanding, because the query language (CQL) syntax is similar to SQL.
The main storage unit is the partition, located by the hash of the partition key column(s) of your table; the rest of the columns are attached to it like entries in a sparse matrix.
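The "sparse matrix" idea can be sketched with nested dicts: a partition only holds the cells that were actually written. In this toy model (the names are hypothetical), `None` is treated as "not set" and skipped; note that in real Cassandra explicitly writing a null is not free, because it is recorded as a tombstone.

```python
# Toy model of sparse storage: partition key -> {column name: value}.
# Columns that were never set take no space; they are simply absent,
# unlike a fixed-width relational row.
partitions = {}


def write(partition_key, **cells):
    row = partitions.setdefault(partition_key, {})
    for name, value in cells.items():
        if value is not None:  # model "unset": skip instead of storing a placeholder
            row[name] = value


write("k1", a=1, b=2)
write("k2", c=3)
write("k3", a=4, b=None)
print(partitions)  # {'k1': {'a': 1, 'b': 2}, 'k2': {'c': 3}, 'k3': {'a': 4}}
```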
I'm reading the documentation on the DataStax site at http://www.datastax.com/documentation/cassandra/1.2/cassandra/cql_reference/create_table_r.html and I see:
"When you use a composite partition key, Cassandra treats the columns in nested parentheses as partition keys and stores columns of a row on more than one node. "
The example given is:
CREATE TABLE Cats (
block_id uuid,
breed text,
color text,
short_hair boolean,
PRIMARY KEY ((block_id, breed), color, short_hair)
);
I understand how the clustering columns (in this case, color and short_hair) work with regard to how they are actually stored on disk as contiguous "columns" for the given row. What I don't understand is the line "...stores columns of a row on more than one node". Is this right?
For a given block_id and breed, doesn't this composite key just make a partition key similar to "block_id + breed", in which case the columns/clusters would be in the same row, whose physical location is determined by the partition key (block_id + breed) ?
Or is there some kind of splitting in this row going on because the primary key is based on two fields?
EDIT:
I think Richard's answer below is probably right, but I've also come across this in the DataStax documentation for 1.2, which reinforces the first quote I posted:
"composite partition key - Stores columns of a row on more than one node using partition keys declared in nested parentheses of the PRIMARY KEY definition of a table."
Why would it say "partition keys" in the plural? The fields that make up the composite key form the only row key, as far as I know, and they are all used together to make the key.
Then they say the columns of a row can be split, which to me means that a single row (with a given partition key) could have its columns split across different nodes, which would mean the fields of the composite key are being handled separately.
I'm still a little confused by the DataStax documentation and whether it's actually right.
I think what it means is that rows with the same block_id may be stored on different nodes. As you say, the partition key is like "block_id + breed", so columns with the same block_id but a different breed will in general be stored on different nodes. But columns with the same block_id and breed will be stored on the same node.
Basically, the nodes that store a partition are determined by a function of the partition key alone. Whether the key is composite or not, nothing else can join rows together or split them.
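To make this concrete, here is a toy placement function for the Cats table, assuming a hypothetical four-node cluster and MD5 modulo the node count as a stand-in for the real partitioner. The whole (block_id, breed) pair is hashed as one value, so the clustering columns (color, short_hair) never influence placement.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster


def owner(partition_key):
    # Stand-in for the partitioner: the whole composite key (block_id, breed)
    # is hashed as a single value. Real Cassandra uses Murmur3 tokens on a ring.
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# (block_id, breed, color, short_hair) rows of the Cats table
rows = [
    ("block-1", "siamese", "cream", True),
    ("block-1", "siamese", "brown", False),
    ("block-1", "persian", "white", False),
]

placement = {}
for row in rows:
    block_id, breed, _color, _short_hair = row
    # Only the partition key decides placement; color/short_hair are ignored.
    placement[row] = owner((block_id, breed))
    print(row, "->", placement[row])
```

Both siamese rows always land on the same node. The persian row may or may not share that node; it depends only on the hash of its own partition key.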
I was going through the CQL v3.0 documentation.
Should we specify composite keys in updates and selects like 'a:b:1' in case my comparator or key_validation is ascii, ascii, int?
There is no mention of this in the <select expression> of SELECT, nor any way to specify composite columns and rows in the <primary/composite key name> of UPDATE.
Any help would be appreciated.
CQL 3 takes care of managing the actual composite types and values for you. CQL 3 rows are not necessarily the same as the underlying Cassandra rows ("storage engine rows"). The composite values are abstracted into separate columns.
The example at http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 may help in understanding the transformation.
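A simplified sketch of that transformation for the pre-3.0 storage engine, assuming a table with PRIMARY KEY (a, b, c) and one regular column `val`: each partition key becomes one storage row, and each non-key CQL column becomes a cell whose composite name is (b, c, column). Internal marker cells and timestamps are left out, and the function name is made up for illustration.

```python
def to_storage_rows(cql_rows):
    """Map CQL 3 rows of a hypothetical table PRIMARY KEY (a, b, c) to storage rows."""
    storage = {}
    for row in cql_rows:
        cells = storage.setdefault(row["a"], {})  # one storage row per partition key
        for column, value in row.items():
            if column in ("a", "b", "c"):
                continue  # key columns are encoded in the row key / cell names
            cells[(row["b"], row["c"], column)] = value
    # Cells inside a storage row are kept sorted by their composite name.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}


cql_rows = [
    {"a": "k", "b": "x", "c": 2, "val": "v2"},
    {"a": "k", "b": "x", "c": 1, "val": "v1"},
]
print(to_storage_rows(cql_rows))
# {'k': {('x', 1, 'val'): 'v1', ('x', 2, 'val'): 'v2'}}
```

Two CQL 3 rows collapse into one storage-engine row, which is why you never have to write composite values like 'a:b:1' yourself.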