Cassandra: delete last record with static column

We are unable to delete the last row in a table with a static column. We have tried Cassandra 2.2, 3.0, and 3.11.2, with replication factors of 1 and higher.
You can reproduce this by creating the following table:
CREATE TABLE playlists (
    username text,
    playlist_id bigint,
    playlist_order bigint,
    last_modified bigint static,
    PRIMARY KEY ((username, playlist_id), playlist_order)
) WITH CLUSTERING ORDER BY (playlist_order DESC);
Then insert some test data:
INSERT INTO playlists (username, playlist_id, playlist_order, last_modified)
VALUES ('test', 123, 123, 123);
Then delete said row:
DELETE FROM playlists WHERE username = 'test' AND playlist_id = 123 AND playlist_order = 123;
Now do a select:
SELECT * FROM playlists WHERE username = 'test' AND playlist_id = 123;
Your result should look like this:
 username | playlist_id | playlist_order | last_modified
----------+-------------+----------------+---------------
     test |         123 |           null |           123
As you can see, the record is not deleted; only the clustering column is nulled out. We suspect this has to do with the static column but are unable to explain it beyond that.
However, if you omit the clustering key in the delete query, like so:
DELETE FROM playlists WHERE username = 'test' AND playlist_id = 123;
Then the record is deleted, but this requires unnecessary application logic to complete.
The behaviour only applies to the last record sharing the static column: you can populate the table with multiple records and delete those successfully, but the last one will always be left dangling, as the sketch below shows.
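A minimal illustration of the multi-row case (my own sketch, using the same table as above):

INSERT INTO playlists (username, playlist_id, playlist_order, last_modified) VALUES ('test', 123, 1, 1);
INSERT INTO playlists (username, playlist_id, playlist_order, last_modified) VALUES ('test', 123, 2, 2);

DELETE FROM playlists WHERE username = 'test' AND playlist_id = 123 AND playlist_order = 1;
-- row 1 is gone completely

DELETE FROM playlists WHERE username = 'test' AND playlist_id = 123 AND playlist_order = 2;
-- the dangling static row remains: test | 123 | null | 2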

Static columns exist once per partition, so in your case the last_modified value 123 is shared by all rows in the partition (username = 'test', playlist_id = 123).
Your DELETE statement will not delete the static column because you are specifying a specific row for deletion. The static column will remain even though there are no rows left in the partition.
To delete the static column you need to issue:
DELETE last_modified FROM playlists WHERE username = 'test' AND playlist_id = 123;
This will remove the static column from the partition.
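As a quick check (my own sketch, not part of the original answer), the SELECT from the question should now come back empty:

SELECT * FROM playlists WHERE username = 'test' AND playlist_id = 123;

-- expected output:
-- (0 rows)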

Related

Get value from specific map-key in Cassandra

For example, I have a map in the column 'users' in a table called 'table' with primary key 'Id'.
If the map looks like {'Phone': '1234567899', 'City': 'Dublin'}, I want to get the value for the key 'Phone' for a specific 'Id' in the Cassandra database.
Yes, that's possible to do with CQL when using a MAP collection.
To test this, I created a simple table using the specifications and data you mentioned above:
> CREATE TABLE stackoverflow.usermap (
id text PRIMARY KEY,
users map<text, text>);
> INSERT INTO usermap (id,users)
VALUES ('1a',{'Phone': '1234567899','City': 'Dublin'});
> SELECT * FROM usermap WHERE id='1a';
id | users
----+-------------------------------------------
1a | {'City': 'Dublin', 'Phone': '1234567899'}
(1 rows)
Then I queried with the same WHERE clause, but altered my SELECT to pull back the user's phone only:
> SELECT users['Phone'] FROM usermap WHERE id='1a';
users['Phone']
----------------
1234567899
(1 rows)
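As a side note (my own addition, not from the original answer), the same bracket syntax works for writes too, so individual map entries can be updated or deleted by key:

-- overwrite a single entry in the map
UPDATE usermap SET users['Phone'] = '0899999999' WHERE id = '1a';

-- remove a single entry from the map
DELETE users['City'] FROM usermap WHERE id = '1a';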

Why does a (upserted) row disappear after updating the column to null? (But not when it's been inserted)

My understanding of inserts and updates in Cassandra was that they were basically the same thing. That is also what the documentation says (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlUpdate.html?hl=upsert):
Note: Unlike the INSERT command, the UPDATE command supports counters. Otherwise, the UPDATE and INSERT operations are identical.
So aside from support for counters they should be the same.
But then I ran across a problem where rows that were created via UPDATE would disappear if I set their columns to null, whereas this doesn't happen for rows created with INSERT.
cqlsh:test> CREATE TABLE IF NOT EXISTS address_table (
... name text PRIMARY KEY,
... addresses text
... );
cqlsh:test> insert into address_table (name, addresses) values ('Alice', 'applelane 1');
cqlsh:test> update address_table set addresses = 'broadway 2' where name = 'Bob' ;
cqlsh:test> select * from address_table;
name | addresses
-------+-------------
Bob | broadway 2
Alice | applelane 1
(2 rows)
cqlsh:test> update address_table set addresses = null where name = 'Alice' ;
cqlsh:test> update address_table set addresses = null where name = 'Bob' ;
cqlsh:test> select * from address_table;
name | addresses
-------+-----------
Alice | null
(1 rows)
The same thing happens if I skip the separate step of first creating a row. With INSERT I can create a row with a null value, but if I use UPDATE the row is nowhere to be found.
cqlsh:test> insert into address_table (name, addresses) values ('Caroline', null);
cqlsh:test> update address_table set addresses = null where name = 'Dexter' ;
cqlsh:test> select * from address_table;
name | addresses
----------+-----------
Caroline | null
Alice | null
(2 rows)
Can someone explain what's going on?
We're using Cassandra 3.11.3
This is expected behavior. See details in https://issues.apache.org/jira/browse/CASSANDRA-14478
INSERT adds a row marker, while UPDATE does not. What does this mean? Basically, an UPDATE requests that individual cells of the row be added, but not that the row itself be added; so if one later deletes the same individual cells with DELETE, the entire row goes away. However, an INSERT not only adds the cells, it also requests that the row be added (this is implemented via a "row marker"). So if all the row's individual cells are later deleted, an empty row remains behind (i.e., the primary key of the row, which now has no content, is still remembered in the table).
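A minimal sketch of the difference, reusing address_table from the question (behaviour as described in the ticket):

-- INSERT writes a row marker, so deleting the only regular cell leaves an empty row behind
INSERT INTO address_table (name, addresses) VALUES ('Erin', 'elm street 3');
DELETE addresses FROM address_table WHERE name = 'Erin';
-- SELECT now returns: Erin | null

-- UPDATE writes only the cell, so deleting it makes the whole row disappear
UPDATE address_table SET addresses = 'fleet street 4' WHERE name = 'Frank';
DELETE addresses FROM address_table WHERE name = 'Frank';
-- SELECT returns no row for Frank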

cassandra data consistency issue

Hi, I'm new to Apache Cassandra and I found an article about the Basic Rules of Cassandra Data Modeling. In example 1, two tables are created:
CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    age int
)
CREATE TABLE users_by_email (
    email text PRIMARY KEY,
    username text,
    age int
)
These tables contain the same data (username, email, and age). What I don't understand is how to insert data into both tables. I think I have to execute two separate inserts, one for users_by_username and one for users_by_email. But how do I maintain data consistency between the tables? For example, what if I insert data into the first table and forget to insert it into the second table ... or the other way around?
It's your job as a developer to make sure that the data is in sync. That said, you can use things like materialized views to generate another "table" with a slightly different primary key (there are some rules on what can be changed). For your case, for example, you can have the following:
CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    age int
);

CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users_by_username
    WHERE email IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY (email, username);
and if you insert data like this:
INSERT INTO users_by_username (username, email, age)
VALUES ('test', 'test@domain.com', 30);
you can query the materialized view in addition to querying the base table by username:
SELECT * FROM users_by_username WHERE username = 'test';

 username | age | email
----------+-----+-----------------
     test |  30 | test@domain.com

SELECT * FROM users_by_email WHERE email = 'test@domain.com';

 email           | username | age
-----------------+----------+-----
 test@domain.com |     test |  30
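A note beyond the original answer: if you maintain both tables by hand instead of using a materialized view, the usual tool for keeping the two writes together is a logged batch. A sketch, using the asker's tables:

BEGIN BATCH
    INSERT INTO users_by_username (username, email, age) VALUES ('test', 'test@domain.com', 30);
    INSERT INTO users_by_email (email, username, age) VALUES ('test@domain.com', 'test', 30);
APPLY BATCH;

A logged batch guarantees that either both writes are eventually applied or neither is; it does not provide isolation and does not make the writes faster.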

Cassandra migrate int to bigint

What would be the easiest way to migrate an int to a bigint in Cassandra? I thought of creating a new column of type bigint and then running a script to basically set the value of that column = the value of the int column for all rows, and then dropping the original column and renaming the new column. However, I'd like to know if someone has a better alternative, because this approach just doesn't sit quite right with me.
You could ALTER your table and change your int column to a varint type. Check the documentation about ALTER TABLE, and the data types compatibility matrix.
The only other alternative is what you said: add a new column and populate it row by row. Dropping the first column can be entirely optional: if you don't assign values when performing insert everything will stay as it is, and new records won't consume space.
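A sketch of that add-and-backfill route (table and column names here are hypothetical). Plain CQL cannot copy one column into another, so the per-row copy has to be driven from application code; also note that ALTER TABLE ... RENAME generally only applies to primary key columns, so a regular column cannot simply be renamed afterwards:

-- 1. add the new bigint column
ALTER TABLE mytable ADD id_big bigint;

-- 2. backfill from application code, one row at a time:
--    UPDATE mytable SET id_big = ? WHERE pk = ?;

-- 3. once the backfill is verified, optionally drop the old column
ALTER TABLE mytable DROP id_int;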
You can ALTER your table in Cassandra to store values larger than int by using varint. See the example:
cassandra@cqlsh:demo> CREATE TABLE int_test (id int, name text, PRIMARY KEY (id));
cassandra@cqlsh:demo> SELECT * FROM int_test;
 id | name
----+------
(0 rows)
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES (215478936, 'abc');
cassandra@cqlsh:demo> SELECT * FROM int_test;
 id        | name
-----------+------
 215478936 |  abc
(1 rows)
cassandra@cqlsh:demo> ALTER TABLE demo.int_test ALTER id TYPE varint;
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, 'abcd');
cassandra@cqlsh:demo> SELECT * FROM int_test;
 id | name
------------------------------------------------------------------------------------------------------------------------------+---------
 215478936 | abc
 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 | abcd
(2 rows)
cassandra@cqlsh:demo>

Primary key in cassandra is unique?

It could be kind of a lame question, but does the primary key in Cassandra have to be unique?
For example in the following table:
CREATE TABLE users (
    name text,
    surname text,
    age int,
    address text,
    PRIMARY KEY (name, surname)
);
So is it possible to have 2 persons in my database with the same name and surname but different ages? That would mean the same primary key.
Yes, the primary key has to be unique. Otherwise there would be no way to know which row to return when you query with a duplicate key.
In your case, you can have 2 rows with the same name or with the same surname, but not with both the same.
By definition, the primary key has to be unique. But that doesn't mean you can't accomplish your goals. You just need to change your approach/terminology.
First of all, if you relax your goal of having the name+surname be a primary key, you can do the following:
CREATE TABLE users ( name text, surname text, age int, address text, PRIMARY KEY((name, surname),age) );
insert into users (name,surname,age,address) values ('name1','surname1',10,'address1');
insert into users (name,surname,age,address) values ('name1','surname1',30,'address2');
select * from users where name='name1' and surname='surname1';
name | surname | age | address
-------+----------+-----+----------
name1 | surname1 | 10 | address1
name1 | surname1 | 30 | address2
If, on the other hand, you wanted to ensure that the address is shared as well, then you probably just want to store a collection of ages in the user record. That could be achieved by:
CREATE TABLE users2 ( name text, surname text, age set<int>, address text, PRIMARY KEY(name, surname) );
insert into users2 (name,surname,age,address) values ('name1','surname1',{10,30},'address2');
select * from users2 where name='name1' and surname='surname1';
name | surname | address | age
-------+----------+----------+----------
name1 | surname1 | address2 | {10, 30}
So it comes back to what you actually need to accomplish. Hopefully the above examples give you some ideas.
The primary key is unique. With your data model, you can only have one age per (name, surname) combination.
Yes, as mentioned in the comments above, you can have a composite key with name, surname, and age to achieve your goal, but that still won't solve the problem completely. Instead, you can consider adding a new column userId and making that the primary key. Then, even if name, surname, and age are all duplicated, you don't have to revisit your data model.
CREATE TABLE users (
    userId int,
    name text,
    surname text,
    age int,
    address text,
    PRIMARY KEY (userId)
);
I would state specifically that the partition key should be unique. I could not find this spelled out in one place, but it follows from the statements below (the sketch after the list makes the third point concrete):

Cassandra needs all the partition key columns to be able to compute the hash that will allow it to locate the nodes containing the partition.

The partition key has a special use in Apache Cassandra beyond showing the uniqueness of the record in the database.

Please note that there will not be any error if you insert the same partition key again and again, as there is no constraint check.

Queries that you'll run equality searches on should be in a partition key.
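To make the "no constraint check" point concrete, here is a sketch against the users table from the question: re-inserting an existing primary key silently overwrites (an upsert), while a lightweight transaction can reject the duplicate at the cost of a Paxos round trip:

INSERT INTO users (name, surname, age) VALUES ('name1', 'surname1', 10);
INSERT INTO users (name, surname, age) VALUES ('name1', 'surname1', 99);
-- no error: the second insert silently overwrites age for ('name1', 'surname1')

INSERT INTO users (name, surname, age) VALUES ('name1', 'surname1', 10) IF NOT EXISTS;
-- [applied] = False, because a row with that primary key already exists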
References:
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
How Cassandra chooses the coordinator node and the replication nodes?
Insert query replaces rows having same data field in Cassandra clustering column
