Extra column created by CQL inserts (comparing to cli) - cassandra

I see extra column being created in my column family when I use cql comparing to cli.
Create table using CQL and insert row:
cqlsh:cassandraSample> CREATE TABLE bedbugs(
... id varchar,
... name varchar,
... description varchar,
... primary key(id, name)
... ) ;
cqlsh:cassandraSample> insert into bedbugs (id, name, description)
values ('Cimex','Cimex lectularius','http://en.wikipedia.org/wiki/Bed_bug');
Now insert column using cli:
[default#cassandraSample] set bedbugs['BatBedBug']['C. pipistrelli:description']='google.com';
Value inserted.
Elapsed time: 1.82 msec(s).
[default#cassandraSample] list bedbugs
... ;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: Cimex
=> (column=Cimex lectularius:, value=, timestamp=1369682957658000)
=> (column=Cimex lectularius:description, value=http://en.wikipedia.org/wiki/Bed_bug, timestamp=1369682957658000)
-------------------
RowKey: BatBedBug
=> (column=C. pipistrelli:description, value=google.com, timestamp=1369688651442000)
2 Rows Returned.
cqlsh:cassandraSample> select * from bedbugs;
id | name | description
-----------+-------------------+--------------------------------------
Cimex | Cimex lectularius | http://en.wikipedia.org/wiki/Bed_bug
BatBedBug | C. pipistrelli | google.com
So, cql creates one extra column for each row, with empty non-primary key columns. Isn't it waste of space?

When you created a column family using CQLSh and specified primary key(Id, name) you make cassandra create two indices of the data stored one for data sorted by ID and the other for data sorted by name. but when you do this by cassandra-cli your column family doesn't have the index column. cassandra-cli doesn't support having secondary indexes. I hope I made sense to you I lack words to explain my understanding.

For compatibility with cassandra-cli and to prevent this extra column from being created, change your create table statement to include "WITH COMPACT STORAGE".
described here
So
CREATE TABLE bedbugs(
id varchar,
name varchar,
description varchar,
primary key(id, name)
);
becomes
CREATE TABLE bedbugs(
id varchar,
name varchar,
description varchar,
primary key(id, name)
) WITH COMPACT STORAGE;
WITH COMPACT STORAGE is also how you would go about supporting wide rows in cql.

Related

Cassandra add column after particular column

I need to alter the table to add a new column after a particular column or as last column, I have been through the document but no luck.
Let's say I'm starting with a table that has this definition:
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
1- Adding a column is a simple matter.
ALTER TABLE mykeyspace.letterstable ADD column_j TEXT;
2- After adding the new column, my table definition will look like this:
desc table mykeyspace.letterstable;
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_j TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
This is because columns in Cassandra are stored by ASCII-betical order, after the keys (so column_n will always be first, because it is the only key). I can't tell Cassandra that I want my new column_j to go after column_z. It's going to put it between column_c and column_z on its own.
Cassandra will store table data based on partition & clustering key.
Standard CQL for adding column:
ALTER TABLE keyspace.table ADD COLUMN column1 columnType;
Running DESC table for a given table via CQLSH does not portray how the data is stored. It will always list the partition key & clustering key first; then the remaining columns in alphabetical order.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
Cassandra create table won't keep column order

Cassandra migrate int to bigint

What would be the easiest way to migrate an int to a bigint in Cassandra? I thought of creating a new column of type bigint and then running a script to basically set the value of that column = the value of the int column for all rows, and then dropping the original column and renaming the new column. However, I'd like to know if someone has a better alternative, because this approach just doesn't sit quite right with me.
You could ALTER your table and change your int column to a varint type. Check the documentation about ALTER TABLE, and the data types compatibility matrix.
The only other alternative is what you said: add a new column and populate it row by row. Dropping the first column can be entirely optional: if you don't assign values when performing insert everything will stay as it is, and new records won't consume space.
You can ALTER your table to store bigint in cassandra with varint. See the example-
cassandra#cqlsh:demo> CREATE TABLE int_test (id int, name text, primary key(id));
cassandra#cqlsh:demo> SELECT * FROM int_test;
id | name
----+------
(0 rows)
cassandra#cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 215478936541111, 'abc');
cassandra#cqlsh:demo> SELECT * FROM int_test ;
id | name
---------------------+---------
215478936541111 | abc
(1 rows)
cassandra#cqlsh:demo> ALTER TABLE demo.int_test ALTER id TYPE varint;
cassandra#cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, 'abcd');
cassandra#cqlsh:demo> SELECT * FROM int_test ;
id | name
------------------------------------------------------------------------------------------------------------------------------+---------
215478936541111 | abc
9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 | abcd
(2 rows)
cassandra#cqlsh:demo>

Is there a performance hit with simplifying a partition key?

Let's say I have the following unsimplified column family:
CREATE TABLE emp (
empID int,
deptID int,
first_name varchar,
last_name varchar,
PRIMARY KEY ((empID, deptID)));
The partition key is both empID and deptID.
Under the assumption I will only search this table using both of these fields, can I simplify the table and rewrite is as following?
CREATE TABLE emp2 (
empID_deptID text
first_name varchar,
last_name varchar,
PRIMARY KEY (empID_deptID));
Yes you can, but I don't see any added value in doing it. In your first code example, Cassandra concatenates empID and deptID for you.
In the precise example that you provided, there will be no difference. As a matter of fact, that is how it was done before composite partition keys became allowed in the previous versions.

Cassandra Composite Columns - How are CompositeTypes chosen?

I'm trying to understand the type used when I create composite columns.
I'm using CQL3 (via cqlsh) to create the CF and then the CLI to issue a describe command.
The Types in the Columns sorted by: ...CompositeType(Type1,Type2,...) are not the ones I'm expecting.
I'm using Cassandra 1.1.6.
CREATE TABLE CompKeyTest1 (
KeyA int,
KeyB int,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(Int32,Int32,UTF8)
Shouldn't it be (Int32,Int32,Int32)?
CREATE TABLE CompKeyTest2 (
KeyA int,
KeyB varchar,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(UTF8,Int32,UTF8)
Why isn't it the same as the types used when I define the table? I'm probably missing something basic in the type assignment...
Thanks!
The composite column name is composed of the values of primary keys 2...n and the name of the non-primary key column being saved.
(So if you have 5 non-key fields then you'll have five such columns and their column names will differ only in the last composed value which would be the non-key field name.)
So in both examples the composite column is made up of the values of KeyB, KeyC and the name of the column being stored ("MyData", in both cases). That's why you're seeing those CompositeTypes being returned.
(btw, the first key in the primary key is the partitioning key and its value is only used as the row key (if you're familiar with Cassandra under the covers). It is not used as part of any of the composite column names.)

what is the reason for composite column, there must be at least one column which is not part of the primary key

From online document:
A CQL 3 table’s primary key can have any number (1 or more) of component columns, but there must be at least one column which is not part of the primary key.
What is the reason for that?
I tried to insert a row only with the columns in the composite key in CQL. I can't see it when I do SELECT
cqlsh:demo> CREATE TABLE DEMO (
user_id bigint,
dep_id bigint,
created timestamp,
lastupdated timestamp,
PRIMARY KEY (user_id, dep_id)
);
cqlsh:demo> INSERT INTO DEMO (user_id, dep_id)
... VALUES (100, 1);
cqlsh:demo> select * from demo;
cqlsh:demo>
But when I use cli, it shows up something:
default#demo] list demo;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: 100
1 Row Returned.
Elapsed time: 27 msec(s).
But can't see the values of the columns.
After I add the column which is not in the primary key, the value shows up in CQL
cqlsh:demo> INSERT INTO DEMO (user_id, dep_id, created)
... VALUES (100, 1, '7943-07-23');
cqlsh:demo> select * from demo;
user_id | dep_id | created | lastupdated
---------+--------+--------------------------+-------------
100 | 1 | 7943-07-23 00:00:00+0000 | null
Result from CLI:
[default#demo] list demo;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: 100
invalid UTF8 bytes 0000ab7240ab7580
[default#demo]
Any idea?
update: I found the reason why CLI returns invalid UTF8 bytes 0000ab7240ab7580, it's not compatible with the table created for from CQL3, if I use compact storage option, the value shows up correctly for CLI.
What's really happening under the covers is that the non-key values are being saved using the primary key values which make up the row key and column names. If you don't insert any non-key values then you're not really creating any new column family columns. The row key comes from the first primary key, so that's why Cassandra was able to create a new row for you, even though no columns were created with it.
This limitation is fixed in Cassandra 1.2, which is in beta now.

Resources