How to understand CQL3 table model from scratch? - cassandra

I recently visited http://cassandra.apache.org/doc/cql3/CQL.html#CQLSyntax and just read this:
[...] please note that as such, they do not refer to the concept of
rows and columns found in the internal implementation of Cassandra and
in the thrift and CQL v2 API.
Assuming I have understood Cassandra's data model (column families, etc.), I do not understand how CQL3 differs from it.
Is CQL3's table model related to Cassandra's column family at all?
I mean, what about performance?
How does the CQL3 implementation compare to relational tables?
What is the internal implementation of, and the concept behind, CQL3's row-and-column model?
I know there is something like a composite column model. Is this what distinguishes it from the classic Cassandra row/column model?
I am also interested in the theoretical details.
Given these questions: what have I not understood?
Unfortunately, I have only just started getting deeper into CQL, but I am very familiar with cassandra-cli and (My)SQL.

This should be your one-stop shop for learning the CQL data model:
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

Related

Is Cassandra column-based or column-oriented?

What are the differences between column-based and column-oriented?
Does the distinction matter for Cassandra?
Please give an example of each.
Column-based and column-oriented are essentially the same thing: data for each column is stored together, which makes querying that data faster and more scalable. Examples of columnar DBMS products are Druid, MonetDB, and Vertica.
As for how Cassandra relates to this: it doesn't. Cassandra is a partitioned row store. Column values are stored by partition and by row.
You are not alone in this perception, as many people mistake Cassandra for a "columnar" data store. Earlier versions of Cassandra were considered "schemaless," so that may be where some of the confusion originates. But Cassandra has never embraced a storage model which keeps data for specific columns together.
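To make the distinction concrete, here is a toy Python sketch contrasting the two layouts for the same three rows. The sensor data and helper names are made up for illustration, and neither structure is meant to reflect Cassandra's actual on-disk (SSTable) format; the point is only which values end up stored next to each other.

```python
# Toy layouts only: neither structure is Cassandra's (or any DBMS's) real on-disk format.
readings = [
    {"sensor": "s1", "ts": 1, "temp": 20.5},
    {"sensor": "s1", "ts": 2, "temp": 21.0},
    {"sensor": "s2", "ts": 1, "temp": 18.0},
]


def column_oriented(rows, columns):
    """Columnar layout: all values of one column are stored together."""
    return {col: [row[col] for row in rows] for col in columns}


def partitioned_row_store(rows, partition_key):
    """Partitioned row store (Cassandra-like): rows are grouped by partition
    key, and each row keeps its own column values together."""
    partitions = {}
    for row in rows:
        values = {col: val for col, val in row.items() if col != partition_key}
        partitions.setdefault(row[partition_key], []).append(values)
    return partitions


columnar = column_oriented(readings, ["sensor", "ts", "temp"])
row_store = partitioned_row_store(readings, "sensor")

print(columnar["temp"])  # [20.5, 21.0, 18.0]
print(row_store["s1"])   # [{'ts': 1, 'temp': 20.5}, {'ts': 2, 'temp': 21.0}]
```

Reading every temperature is one contiguous list in the columnar layout, whereas reading everything about sensor s1 is one contiguous partition in the row-store layout.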

Cassandra version differences

I started reading Cassandra: The Definitive Guide, which is based on Cassandra 0.7. Now I'm trying to experiment with Cassandra 2.1.5, and there seem to be a lot of differences, which is really confusing.
For example, CQL did not exist in version 0.7. The data model also seems quite different: you can now define a schema with CQL, whereas in version 0.7 there was no schema.
Can anyone briefly explain the differences, especially regarding the data model?
I understand that in version 0.7 the idea was about variable-length rows, that is, rows with different numbers of columns. But now I understand that each column is actually a field that contains a number of parameters, so you can have as many fields as you want within the same row (same key).
Can someone summarize the differences? Maybe I have not understood correctly.
An important point to consider is that the underlying storage model remains the same. CQL is simply an abstraction layer on top of that model, which makes it easier to work with and to model your data. DataStax MVP John Berryman has a great article on this: Understanding How CQL3 Maps to Cassandra’s Internal Data Structure
In this article, Berryman observes that:
The value of the CQL primary key is used internally as the row key (which in the new CQL paradigm is being called a “partition key”).
The names of the non-primary key CQL fields are used internally as columns names. The values of the non-primary key CQL fields are then internally stored as the corresponding column values.
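The mapping Berryman describes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's storage engine: the function and table names are hypothetical, internal column names are represented as tuples, and timestamps and any extra bookkeeping cells are ignored. For a compound primary key, the first component is taken as the partition (row) key and the clustering column values are prefixed onto each internal column name.

```python
def to_internal(cql_rows, partition_key, clustering=()):
    """Toy model of the CQL-to-storage mapping.

    The partition key value becomes the internal row key. Each non-key CQL
    field becomes an internal column whose name is the tuple
    (clustering values..., field name). Timestamps, bookkeeping cells and
    on-disk sort order are deliberately ignored.
    """
    storage = {}
    key_columns = {partition_key, *clustering}
    for row in cql_rows:
        internal_row = storage.setdefault(row[partition_key], {})
        prefix = tuple(row[col] for col in clustering)
        for name, value in row.items():
            if name not in key_columns:
                internal_row[prefix + (name,)] = value
    return storage


# Simple primary key: PRIMARY KEY (id)
users = [{"id": "u1", "name": "ann", "email": "ann@example.com"}]
print(to_internal(users, "id"))
# {'u1': {('name',): 'ann', ('email',): 'ann@example.com'}}

# Compound primary key: PRIMARY KEY (user, tweet_id)
tweets = [
    {"user": "ann", "tweet_id": 1, "body": "hello"},
    {"user": "ann", "tweet_id": 2, "body": "again"},
]
print(to_internal(tweets, "user", clustering=("tweet_id",)))
# {'ann': {(1, 'body'): 'hello', (2, 'body'): 'again'}}
```

In the compound case, two CQL rows collapse into one internal row (partition) with two internal columns, which is why the old "wide row" patterns can be expressed as ordinary CQL tables.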
Additionally, he outlines the benefits of using the CQL-based approach:
It provides fast look-up by partition key and efficient scans and slices by cluster key.
It groups together related data as CQL rows. This means that you can do in one query what would otherwise take multiple queries into different column families.
It allows for individual fields to be added, modified, and deleted independently.
It is strictly better than the old Cassandra paradigm. Proof: you can coerce CQL Tables to behave exactly like old-style Cassandra ColumnFamilies. (See the examples here.)
It extends easily to implementation of sets lists and maps (which are super ugly if you’re working directly in old cassandra) — but that’s for another blog post.
The CQL protocol allows for asynchronous communication as compared with the synchronous, call-response communication required by Thrift. As a result, CQL is capable of being much faster and less resource intensive than Thrift – especially when using single threaded clients.
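The difference between the two styles of communication is essentially request multiplexing. The CQL native protocol tags each request frame with a stream id, so a client can keep several requests in flight on one connection and match the responses, which may arrive out of order, back to their requests. The asyncio sketch below is a toy model of that idea only: there is no network or real driver, and `fake_node`, the latencies, and the query strings are hypothetical.

```python
import asyncio

# Hypothetical per-request latencies (seconds), indexed by stream id.
LATENCIES = [0.03, 0.01, 0.02]


async def fake_node(stream_id, query):
    """Stand-in for a server answering one request after some latency."""
    await asyncio.sleep(LATENCIES[stream_id % len(LATENCIES)])
    return stream_id, f"result of {query}"


async def run_multiplexed(queries):
    """Put every request in flight at once, each tagged with a stream id,
    and match responses back to requests by id as they arrive."""
    tasks = [asyncio.create_task(fake_node(sid, q)) for sid, q in enumerate(queries)]
    by_stream = {}
    arrival_order = []
    for next_done in asyncio.as_completed(tasks):
        sid, payload = await next_done
        arrival_order.append(sid)
        by_stream[sid] = payload
    # Responses came back out of order; the stream id restores the pairing.
    return [by_stream[sid] for sid in range(len(queries))], arrival_order


if __name__ == "__main__":
    results, arrivals = asyncio.run(run_multiplexed(["q0", "q1", "q2"]))
    print(arrivals)  # [1, 2, 0]
    print(results)   # ['result of q0', 'result of q1', 'result of q2']
```

With a strictly synchronous call-response exchange, the same three requests would take roughly the sum of the latencies (about 60 ms here) instead of the longest one (about 30 ms).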
As for "you can have as many fields as you want within the same row (same key)": actually, there is a hard limit of about 2 billion (internal) columns per partition (row key).
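Because a CQL row is spread over several internal cells, the limit applies to cells rather than to CQL rows. A back-of-the-envelope sketch, assuming one cell per non-primary-key column per CQL row (a simplification; the function name is hypothetical):

```python
# "About 2 billion" internal columns (cells) per partition, as stated above.
CELL_LIMIT = 2_000_000_000


def max_cql_rows_per_partition(non_key_columns: int) -> int:
    """Rough upper bound on CQL rows in one partition, assuming every CQL row
    stores one internal cell per non-primary-key column (toy estimate that
    ignores any extra bookkeeping cells)."""
    return CELL_LIMIT // max(non_key_columns, 1)


print(max_cql_rows_per_partition(1))  # 2000000000
print(max_cql_rows_per_partition(4))  # 500000000
```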

Is it possible to insert/write data without defining columns in Cassandra?

I am trying to understand the fundamentals of the Cassandra data model. I am using CQL. As far as I know, the schema must be defined before anyone can insert into new columns. If someone needs to add a column, they can use ALTER TABLE and then INSERT values into that new column.
But Cassandra: The Definitive Guide says that Cassandra is schemaless:
In Cassandra, you don’t define the columns up front; you just define the column
families you want in the keyspace, and then you can start writing data without defining
the columns anywhere. That’s because in Cassandra, all of a column’s names are
supplied by the client.
I am getting confused and cannot find the answer I expected. Can someone please explain it to me, or tell me if I am missing something?
Thanks in advance.
There are two different APIs for writing data to Cassandra. First, there is the Thrift API, which has always allowed columns to be created dynamically, but also supports adding metadata for your columns.
Next, there is the newer CQL-based API. CQL was created to provide another abstraction layer that makes it more user-friendly to work with Cassandra. With CQL you are required to define a schema up front for your column names and data types. However, that doesn't mean it's impossible to use dynamic columns with CQL.
See here for differences:
http://www.datastax.com/dev/blog/thrift-to-cql3
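In CQL, dynamic columns are usually modeled by turning the would-be column name into a clustering column, for example a table whose primary key is (key, column1): each new "column" is then simply another CQL row in the same partition. The Python sketch below models only that correspondence with plain data structures; the names key, column1 and value, the helper functions, and the sample data are illustrative assumptions, not taken from the linked post.

```python
def dynamic_row_to_cql(row_key, columns):
    """Flatten one Thrift-style dynamic row (column names chosen by the client
    at write time) into CQL-style (key, column1, value) rows, ordered by
    column name the way a clustering column would order them."""
    return [(row_key, name, value) for name, value in sorted(columns.items())]


def cql_to_dynamic_row(cql_rows):
    """Group (key, column1, value) rows back into {row_key: {column: value}}."""
    out = {}
    for key, column1, value in cql_rows:
        out.setdefault(key, {})[column1] = value
    return out


events = {"2015-06-03": "logout", "2015-06-01": "login", "2015-06-02": "purchase"}
cql_rows = dynamic_row_to_cql("user42", events)
print(cql_rows[0])  # ('user42', '2015-06-01', 'login')
print(cql_to_dynamic_row(cql_rows) == {"user42": events})  # True
```

The round trip loses nothing, which is the sense in which a fixed CQL schema can still hold an open-ended set of "columns".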
You are reading Cassandra: The Definitive Guide, a three- to four-year-old book that describes something that changed a long time ago. Now you have to define the table structure before you can write data.
Here you can find some of the reasons behind the introduction of CQL and the abandonment of the schemaless model.
The official DataStax documentation should be your definitive guide.
HTH,
Carlo
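The contrast the answers describe can be modeled with two toy Python classes: one accepts any client-supplied column name, the other rejects undeclared columns until a step standing in for ALTER TABLE ... ADD. Both classes and their methods are hypothetical and do not talk to Cassandra.

```python
class SchemalessColumnFamily:
    """Thrift-era style: any column name supplied by the client is accepted."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_key, **columns):
        self.rows.setdefault(row_key, {}).update(columns)


class CqlStyleTable:
    """CQL style: columns must be declared before they can be written."""

    def __init__(self, *columns):
        self.columns = set(columns)
        self.rows = {}

    def alter_add(self, column):
        # Plays the role of ALTER TABLE ... ADD in this toy model.
        self.columns.add(column)

    def insert(self, row_key, **columns):
        unknown = set(columns) - self.columns
        if unknown:
            raise ValueError(f"undeclared column(s): {sorted(unknown)}")
        self.rows.setdefault(row_key, {}).update(columns)


cf = SchemalessColumnFamily()
cf.insert("u1", nickname="annie")  # accepted without any declaration

users_table = CqlStyleTable("name")
try:
    users_table.insert("u1", nickname="annie")  # column not declared yet
except ValueError as err:
    print("rejected:", err)  # rejected: undeclared column(s): ['nickname']

users_table.alter_add("nickname")
users_table.insert("u1", nickname="annie")
print(users_table.rows)  # {'u1': {'nickname': 'annie'}}
```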

cassandra data model step by step

I've started learning Cassandra. First I want to learn the Cassandra data model, but I don't know where to start. I have looked at many web pages and at the Cassandra documentation (http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_intro_c.html),
but I am really confused. The documentation only presents examples that look very similar to a relational database, without covering super columns or other concepts that appear on other sites.
I need a straightforward, step-by-step tutorial on data modeling.
Regards
Although CQL looks similar to SQL, they are very different. CQL is very limited compared to SQL and you need to understand how data is stored and retrieved in Cassandra based on the partition key and clustering columns. Until you understand how the keys work, you will be lost.
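To see why the keys matter, here is a toy Python model of the access pattern (not Cassandra's implementation; the class and method names are hypothetical). Rows are located by partition key, and within a partition they are kept sorted by clustering key, so a range over the clustering key is a cheap contiguous slice, while a query that does not name a partition key has no single place to look.

```python
from bisect import bisect_left, bisect_right


class ToyPartitionedTable:
    """Toy model of the CQL access pattern: a row is found by its partition
    key, and rows inside a partition are kept sorted by clustering key."""

    def __init__(self):
        self._partitions = {}  # partition key -> sorted list of (clustering key, row)

    def insert(self, pk, ck, row):
        part = self._partitions.setdefault(pk, [])
        keys = [k for k, _ in part]  # rebuilt per call: fine for a toy
        i = bisect_left(keys, ck)
        if i < len(part) and part[i][0] == ck:
            part[i] = (ck, row)  # same primary key overwrites (an upsert)
        else:
            part.insert(i, (ck, row))

    def select(self, pk, ck_from=None, ck_to=None):
        """Roughly: SELECT ... WHERE pk = ? [AND ck >= ?] [AND ck <= ?].
        The partition key is required; the clustering range is a contiguous slice."""
        part = self._partitions.get(pk, [])
        keys = [k for k, _ in part]
        lo = 0 if ck_from is None else bisect_left(keys, ck_from)
        hi = len(part) if ck_to is None else bisect_right(keys, ck_to)
        return [row for _, row in part[lo:hi]]


table = ToyPartitionedTable()
for ts, temp in [(3, 22.0), (1, 20.0), (2, 21.0)]:
    table.insert("s1", ts, {"temp": temp})
table.insert("s2", 1, {"temp": 15.0})

print(table.select("s1", 2, 3))  # [{'temp': 21.0}, {'temp': 22.0}]
print(table.select("s2"))        # [{'temp': 15.0}]
```

Note that the API deliberately offers no way to ask "all rows with temp > 20" across partitions; in this model that would mean visiting every partition, which is the kind of query a Cassandra table should be designed to avoid.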
I haven't seen a very good overview of Cassandra on the web, but if you're willing to spring for a book, a good introduction to Cassandra and how it works is called "Apache Cassandra Hands-On Training Level One".

Confusing between Thrift API and CQL

I am working on a Java web application using NoSQL (the target is Cassandra). I use Astyanax as the Cassandra client, since it is recommended as the best Cassandra client at the moment. I've only been working with Cassandra for two weeks, so many things seem strange to me.
While working, I have run into some problems that I do not know how to overcome:
1. Is a table created with CQL the same as a column family created through the Thrift API? I feel they are similar, but maybe there are some differences behind the scenes. For example:
a table created by a CQL command cannot be accessed through the Thrift API;
Thrift-based APIs cannot work with tables created by CQL, but CQL methods can access column families created through the Thrift API!
2. Does the primary key in a table correspond to the row key in a column family?
3. In CQL I can declare a table that contains a collection/set/map. Can I do the same thing through the Thrift API?
4. If my application needs both of them (column families and tables), how can they work together?
5. I notice one thing: I cannot use the Thrift API to manipulate data in tables created by CQL, and vice versa. How can I keep track of which table/column family was created which way, so that I use the correct API to process its data? For the time being, we don't have a general way to handle both of them, do we? AFAIK, the Thrift API and CQL do not share an interface, so they cannot understand each other?!
Could you please help explain these things? Thank you so much.
1. Yes. It's impossible to update the Thrift APIs to be CQL-aware without breaking existing applications. So if you use CQL, you are committing to CQL-only clients such as the Java driver, not Astyanax, Hector, et al. But this is no great sacrifice, since CQL is much more usable.
2. For a simple (single-column) PK, yes. For a compound PK, it's a bit more complicated: the first component (the partition key) becomes the row key, and the values of the remaining clustering columns become part of the internal column names.
3. No. The Thrift API operates at a lower level, by design. (So you'd see the individual storage cells that make up the Map, for instance.)
4. I don't understand the question. With CQL you can do everything you could do with Thrift, but more easily.
5. Simple: don't mix the two. Stick with one or the other.
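To illustrate the answer about collections, here is a toy sketch of how one CQL row holding a map might look when exploded into individual storage cells, which is roughly what a lower-level, Thrift-style view exposes. The function is hypothetical and the tuple cell names are a simplification, not Cassandra's real composite encoding.

```python
def explode_row(row_key, fields):
    """Toy model: turn one CQL row into the storage cells a lower-level,
    Thrift-style client would see. A scalar field becomes one cell named
    (field,); each map entry becomes its own cell named (field, map_key)."""
    cells = []
    for name, value in fields.items():
        if isinstance(value, dict):
            for map_key, map_value in sorted(value.items()):
                cells.append(((name, map_key), map_value))
        else:
            cells.append(((name,), value))
    return row_key, cells


key, cells = explode_row("user42", {"name": "ann", "phones": {"work": "222", "home": "111"}})
for cell_name, cell_value in cells:
    print(key, cell_name, cell_value)
# user42 ('name',) ann
# user42 ('phones', 'home') 111
# user42 ('phones', 'work') 222
```

One CQL value (the map) shows up as several cells at the lower level, which is why a Thrift client cannot simply read it back as a single column.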
In my opinion, the focus is shifting towards making Cassandra look like an RDBMS with SQL-style queries in order to gain wider adoption.
But I think the inconsistencies between work done with Hector/Astyanax (Thrift) and CQL will hurt adoption. It's almost a U-turn from Hector/Astyanax to CQL in the middle of the journey.
At the very least, CQL should have been planned so that the Thrift API (and the high-level Java APIs built on top of it) could transition without problems.

Resources