I am new to Cassandra, and this may have been covered somewhere, but I haven't been able to find it here, on Planet Cassandra, or in the DataStax documentation.
I have inherited a set of keyspaces created by another programmer who has left the company. There is a particular data item that he was supposed to have stored in the keyspace; however, it's not listed in the schema (as displayed by the Cassandra CLI).
The programmer stated that it was in the 'blob'; however, there aren't any columns in the keyspace defined as a 'blob'.
When I use the DataStax DevCenter tool, however, there is a 'key' column listed as a 'blob' that isn't in the schema:
key (blob)
assignExpirydate (text)
bookingClass (text)
... etc.
Since it isn't in the schema, I'm assuming that the column is created by Cassandra and is not what I'm looking for, but I would like to verify that.
So my question is: is there any documentation (or anyone who knows) on whether Cassandra creates this column? A quick explanation of it would be appreciated as well.
Thanks
If the table was created through the cassandra-cli using an older version of Cassandra, you may not see all of its data in DevCenter. Take a look at the docs on using Thrift tables from CQL3 to see what is going on. If you use the cassandra-cli and run a list on the table, it should show you all the data in it. When using Thrift you can insert data with any column name you want; it doesn't have to be defined in the schema.
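On the 'key (blob)' column specifically: when CQL3 exposes a Thrift-created column family, the row key is typically shown as a column named key, and if no key validator was configured it defaults to BytesType, which CQL displays as blob (a hex literal such as 0x...). So that column is most likely the row key itself, which may well hold the data item. The sketch below is a hypothetical helper (not part of any Cassandra tool) that decodes such a hex literal, assuming the keys were written as UTF-8 text; in cqlsh, the built-in blobAsText() conversion function should do the same thing server-side.

```python
def decode_blob_literal(literal: str, encoding: str = "utf-8") -> str:
    """Decode a CQL blob literal such as '0x6b6579' into text.

    Hypothetical helper: assumes the row key bytes are text in `encoding`.
    Undecodable bytes are replaced instead of raising, so binary keys
    (e.g. serialized UUIDs or longs) still print something inspectable.
    """
    hex_part = literal[2:] if literal.lower().startswith("0x") else literal
    raw = bytes.fromhex(hex_part)
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Made-up blob values, as DevCenter or cqlsh might display them.
    for value in ["0x626f6f6b696e672d3432", "0x41424331"]:
        print(value, "->", decode_blob_literal(value))
```

If the decoded keys look like garbage, the keys were probably written with a non-text type, and the original validator or application code would tell you how to interpret them.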
Links to thrift/cql3 information:
http://www.datastax.com/dev/blog/thrift-to-cql3
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
I'm aware of the restrictions Cassandra places on modifying a column's data type once a table is created, or even on dropping a column and re-adding one with the same name but a different data type.
Dropping and re-adding is allowed, with restrictions.
But in real projects, it's not uncommon to modify the table schema during the initial phase.
For example: changing the Name column of the User table from TEXT to a UDT (user-defined type) that can encapsulate more information.
Coming from an RDBMS background, I find this behaviour very strange; maybe someone with real project experience on Cassandra can answer this.
How do we handle a scenario like this, where column data types need to change? And what are the best practices?
Also, is this common behaviour in other NoSQL or columnar databases?
I've seen it written in multiple sources that it is perfectly normal, with Cassandra, to store data in the column name while leaving the column value empty. I'm not sure I completely understand how that's possible. Can anyone shed more light on this, preferably with an example schema?
No, not any more. This used to be possible, but it required the old (pre-3.x) storage engine and a Thrift-based API. Tables built with CQL (and the new storage engine) require all columns to be defined up front, and do not allow new columns to be created at runtime (at least, not in the same way that Thrift did).
The article referenced above is dated 2015, when this was still possible. Apache Cassandra is one of those technologies that has changed a lot in a short time, quickly outdating once-accepted practices and recommendations.
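For anyone wondering what the old pattern bought you: in a Thrift wide row, column names were kept sorted by the comparator, so you could store values (tags, timestamps and so on) as column names with an empty value and read back a sorted slice. As far as I know, the closest CQL equivalent is a clustering column, e.g. a table with PRIMARY KEY ((user_id), tag) and no other columns, which gives one CQL row per former "column name". The sketch below models both views in plain Python; the names user_tags, user_id and tag are made up for illustration, and nothing here talks to Cassandra.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

# Thrift-era view (illustrative): row key -> {column name: value}.
# The data lives in the column *names*; every value is empty.
thrift_rows: Dict[str, Dict[str, bytes]] = {
    "user1": {"python": b"", "cassandra": b"", "java": b""},
    "user2": {"go": b""},
}


def to_cql_rows(rows: Dict[str, Dict[str, bytes]]) -> List[Tuple[str, str]]:
    """Flatten wide rows into (partition key, clustering key) pairs.

    Models a hypothetical table:
        CREATE TABLE user_tags (user_id text, tag text,
                                PRIMARY KEY ((user_id), tag));
    Within a partition, rows come out sorted by the clustering key,
    just as Thrift sorted column names with the comparator.
    """
    result = []
    for user_id, columns in rows.items():
        for tag in sorted(columns):
            result.append((user_id, tag))
    return result


def slice_tags(rows: List[Tuple[str, str]], user_id: str,
               start: str, end: str) -> List[str]:
    """Return the tags in [start, end] for one partition (a 'column slice')."""
    tags = sorted(tag for uid, tag in rows if uid == user_id)
    return tags[bisect_left(tags, start):bisect_right(tags, end)]
```

For example, slice_tags(to_cql_rows(thrift_rows), "user1", "a", "k") returns ["cassandra", "java"], which corresponds to a CQL query like SELECT tag FROM user_tags WHERE user_id = 'user1' AND tag >= 'a' AND tag <= 'k'.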
Consider a scenario where you have a User table with id as the PRIMARY KEY.
You have a column called email and a column called name.
You want to UPDATE User.name based on User.email.
I realized that the UPDATE command requires you to pass in a PRIMARY KEY. Does this mean I can't use a pure CQL migration, and would need to first query for the User.id primary key before I can UPDATE?
In this case, I DO know the PRIMARY KEY because the UUIDs are the same for dev and prod, but it feels dirty.
Yes, you're correct: you need to know the primary key of a record to update it, or to delete that specific record. There are several options here, depending on your data model:
1. Perform a full scan of the table using an efficient token range scan (look at this answer for more details);
2. If this is required very often, you can create a materialized view with User.email as the partition key, and fetch the matching user IDs that you need to update (but you'll need to do this from your application; there is no nested query support in CQL). Be aware, though, that materialized views are an "experimental" feature in Cassandra and may not always work correctly (they're more stable in DataStax Enterprise). Also, if some users have hundreds of thousands of emails, this may create big partitions.
3. Do the same as in item 2 in your own code, by maintaining an additional table.
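The full-scan option boils down to splitting the token ring into contiguous sub-ranges and querying each one with token() restrictions on the partition key. A minimal sketch, assuming the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1) and a table named users with partition key id (names taken from the question, lowercased as an assumption). It only builds the ranges and the CQL text; it does not execute anything.

```python
from typing import List, Tuple

# Token bounds of Cassandra's default Murmur3Partitioner.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

# Hypothetical table/column names based on the question (User table, id PK).
SCAN_QUERY = "SELECT id, email FROM users WHERE token(id) > ? AND token(id) <= ?"


def split_token_ring(splits: int) -> List[Tuple[int, int]]:
    """Split (MIN_TOKEN, MAX_TOKEN] into `splits` contiguous ranges.

    Each (start, end) pair is meant for `token(id) > start AND token(id) <= end`.
    Consecutive ranges share a boundary, so every token falls in exactly one.
    """
    if splits < 1:
        raise ValueError("splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))
```

With the DataStax Python driver you would prepare SCAN_QUERY once and call session.execute(prepared, (start, end)) for each range (ideally several ranges in parallel), keep the rows whose email matches, and then issue the UPDATE by id. The exclusive lower bound is intended to be safe because, as far as I know, Murmur3Partitioner never assigns the minimum token to a partition.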
I think Alex's answer covers your question: "how can I find a value in a PK column working backwards from a non-PK column's value?".
However, I think it's worth noting that asking this question indicates you should reconsider your data model. A rule of thumb in C* data model design is that you begin by considering the queries you need, and you've missed the UPDATE query use case. You can probably make things work without changing your model for now, but if you find you need to make other queries you're unprepared for, you'll run into operational issues with lots of indexes and/or MVs.
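To make the "start from the queries" advice concrete for this case: if updating by email is a real access pattern, the additional-table option above means keeping a second table keyed by email that maps to the user id, written alongside the main table. The sketch below simulates both tables with Python dicts and assumes each email belongs to exactly one user; the table names users and users_by_email and the class and method names are hypothetical. In a real cluster the two writes would be separate CQL statements (possibly in a logged batch), not dict assignments.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class User:
    id: uuid.UUID
    email: str
    name: str


class UserStore:
    """In-memory stand-in for two hypothetical Cassandra tables:

    users          (id uuid PRIMARY KEY, email text, name text)
    users_by_email (email text PRIMARY KEY, id uuid)
    """

    def __init__(self) -> None:
        self.users: Dict[uuid.UUID, User] = {}
        self.users_by_email: Dict[str, uuid.UUID] = {}

    def insert(self, email: str, name: str) -> uuid.UUID:
        user_id = uuid.uuid4()
        # Both writes must happen together to keep the lookup table in sync.
        self.users[user_id] = User(user_id, email, name)
        self.users_by_email[email] = user_id
        return user_id

    def update_name_by_email(self, email: str, new_name: str) -> bool:
        # Step 1: read the primary key from the lookup table.
        user_id = self.users_by_email.get(email)
        if user_id is None:
            return False
        # Step 2: the equivalent of UPDATE users SET name = ? WHERE id = ?
        self.users[user_id].name = new_name
        return True
```

The design choice is to pay for an extra write on every insert so that the update path becomes two single-partition operations instead of a full scan.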
More generally, search around for articles and other resources about Cassandra data modeling. It sounds like you're basically using C* for a relational use case so you'll want to look into that.
I am trying to understand the fundamentals of the Cassandra data model. I am using CQL. As far as I know, the schema must be defined before anyone can insert into new columns. Anyone who needs to add a column can use ALTER TABLE and then INSERT values into that new column.
But Cassandra: The Definitive Guide says that Cassandra is schema-less:
In Cassandra, you don’t define the columns up front; you just define the column families you want in the keyspace, and then you can start writing data without defining the columns anywhere. That’s because in Cassandra, all of a column’s names are supplied by the client.
I am confused and can't find an answer. Can someone please explain this to me, or tell me if I am missing something?
Thanks in advance.
There are two different APIs for writing data to Cassandra. First there's the Thrift API, which has always allowed columns to be created dynamically, but also supports adding metadata for your columns.
Next there's the newer CQL-based API. CQL was created to provide another abstraction layer that makes Cassandra more user-friendly to work with. With CQL you're required to define a schema up front for your column names and data types. However, that doesn't mean it's impossible to use dynamic columns with CQL.
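Roughly what the linked article describes: a Thrift column family with dynamic column names is presented through CQL3 as a COMPACT STORAGE table with one CQL row per (row key, column name) pair, using the default names key, column1 and value when no aliases were defined. The sketch below performs that transposition on a plain Python dict to illustrate the idea; it is a simplified model (one clustering component, everything as text) with made-up sensor data, not how Cassandra implements it.

```python
from typing import Dict, List, NamedTuple


class CqlRow(NamedTuple):
    key: str      # Thrift row key (partition key)
    column1: str  # Thrift column name (clustering column)
    value: str    # Thrift column value


def thrift_to_cql3(rows: Dict[str, Dict[str, str]]) -> List[CqlRow]:
    """Transpose dynamic Thrift rows into CQL3-style (key, column1, value) rows.

    Simplified model: a single clustering component, all values as str.
    Column names are sorted within each row, as a Thrift comparator would.
    """
    return [
        CqlRow(key, name, value)
        for key, columns in rows.items()
        for name, value in sorted(columns.items())
    ]


if __name__ == "__main__":
    # Each Thrift row may carry a different set of column names.
    dynamic = {
        "sensor-1": {"2014-01-02": "21.5", "2014-01-01": "20.0"},
        "sensor-2": {"2014-01-01": "18.2"},
    }
    for row in thrift_to_cql3(dynamic):
        print(row)
```

The schema is fixed (three columns), yet each partition can still hold any number of distinct "column names" as clustering values, which is how CQL accommodates the dynamic-column pattern.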
See here for differences:
http://www.datastax.com/dev/blog/thrift-to-cql3
You are reading "Cassandra: The Definitive Guide", a book that is 3-4 years old and describes something that changed a long time ago. Now you have to define the table structure before you can write data.
Here you can find some of the reasons behind the introduction of CQL and the abandonment of the schema-less approach.
The official Datastax documentation should be your definitive guide.
HTH,
Carlo
Our CREATE TABLE statement uses a user-defined type (the kind you create with CREATE TYPE). Is this supported by the stress tool in 2.1? It doesn't look that way from StressProfile.java.
Also, I was wondering whether there is a way to stress test multiple tables at the same time.
In my experience that is not possible. While using cassandra-stress (2.1), I also noticed that not only UDTs but also the CQL map data type are unsupported.
I ended up creating one user profile for each table and dropping the map-typed columns from the tables while stress testing.
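Given the one-profile-per-table workaround described above, one way to load several tables at the same time (an assumption on my part, not something cassandra-stress 2.1 does in a single run) is to start one cassandra-stress process per profile in parallel. The sketch below builds the command lines and can optionally launch them with subprocess; the profile file names are hypothetical, and the user-mode arguments (profile=..., ops(insert=1), n=...) follow the 2.1 syntax as I understand it, so check them against the cassandra-stress help output for your version.

```python
import subprocess
from typing import List, Sequence


def build_stress_commands(
    profiles: Sequence[str], ops: str = "insert=1", n: int = 100000
) -> List[List[str]]:
    """Build one cassandra-stress user-mode command per profile (one table each).

    Assumes `cassandra-stress` is on PATH; profile paths are hypothetical.
    Arguments are passed as a list, so ops(...) needs no shell quoting.
    """
    return [
        ["cassandra-stress", "user", f"profile={p}", f"ops({ops})", f"n={n}"]
        for p in profiles
    ]


def run_in_parallel(commands: List[List[str]]) -> List[int]:
    """Start every command at once, then wait for all and return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    cmds = build_stress_commands(["users.yaml", "bookings.yaml"])
    for cmd in cmds:
        print(" ".join(cmd))
    # Uncomment to actually run them (requires cassandra-stress and a cluster):
    # print(run_in_parallel(cmds))
```

Keep in mind that each process reports its own latency and throughput figures, so you would need to combine the results yourself.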