I have a Cassandra table with data in it.
I added three new columns: country as text, and lat and long as double.
When these columns are added, null values show up for the rows already present in the table. However, null appears as text in the country column, and as a value in the lat and long columns.
Is this the default behavior, and can I insert null as a value into the newly created text columns?
Cassandra uses null to show that a value is missing, not that it was explicitly inserted. In your case, when you add new columns, they are just added to the table's specification stored in Cassandra itself; the existing data (stored in SSTables) is not modified. So when Cassandra reads the old data, it doesn't find values for those columns in the SSTables, and outputs null instead.
But you can get the same behavior without adding new columns: just don't insert a value for a specific regular column (you must have non-null values for the columns of the primary key!). For example:
cqlsh> create table test.abc (id int primary key, t1 text, t2 text);
cqlsh> insert into test.abc (id, t1, t2) values (1, 't1-1', 't2-1');
cqlsh> insert into test.abc (id, t1) values (2, 't1-2');
cqlsh> insert into test.abc (id, t2) values (3, 't3-3');
cqlsh> SELECT * from test.abc;
 id | t1   | t2
----+------+------
  1 | t1-1 | t2-1
  2 | t1-2 | null
  3 | null | t3-3
(3 rows)
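As for the second part of the question: you can also write null explicitly into the new text column, and it will display the same way. Keep in mind, though, that an explicit null is stored as a tombstone (a deletion marker), unlike the "missing" values above, so avoid doing this at scale. Continuing the example:
cqlsh> insert into test.abc (id, t1, t2) values (4, null, 't2-4');
cqlsh> SELECT * from test.abc where id = 4;
 id | t1   | t2
----+------+------
  4 | null | t2-4
(1 rows)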
I have a table with an AUTO_INCREMENT column in a Vertica DB and I am using this column as a foreign key for another table. For that I need the last inserted value of the AUTO_INCREMENT column.
CREATE TABLE orders.order_test
(
    order_id AUTO_INCREMENT(1,1,1) PRIMARY KEY,
    order_type VARCHAR(255)
);
I found this function, but I am not sure how it works across multiple sessions:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/LAST_INSERT_ID.htm
The link above says this:
Returns the last value of an AUTO_INCREMENT/IDENTITY column. If multiple sessions concurrently load the same table with an AUTO_INCREMENT/IDENTITY column, the function returns the last value generated for that column.
It is scoped by session.
Let's test it.
Two command line windows, starting vsql in both.
The transcript below covers both sessions in full.
Transaction 1:
sbx=> select export_objects('','id1',false);
CREATE TABLE dbadmin.id1
(
id IDENTITY ,
num int
);
[. . .]
sbx=> select * from id1;
   id   | num
--------+-----
 250001 |   1
sbx=> \pset null NULL
Null display is "NULL".
sbx=> SELECT LAST_INSERT_ID();
 LAST_INSERT_ID
----------------
           NULL
-- insert a row ...
sbx=> INSERT INTO id1 (num) VALUES(2);
 OUTPUT
--------
      1
sbx=> SELECT LAST_INSERT_ID();
 LAST_INSERT_ID
----------------
         500001
Transaction 2:
sbx=> SELECT LAST_INSERT_ID();
 LAST_INSERT_ID
----------------
           NULL
-- now insert another row ...
sbx=> INSERT INTO id1 (num) VALUES(3);
 OUTPUT
--------
      1
sbx=> SELECT LAST_INSERT_ID();
 LAST_INSERT_ID
----------------
         750001
Now, back to Transaction 1:
sbx=> SELECT LAST_INSERT_ID();
 LAST_INSERT_ID
----------------
         500001
Still at the old value ...
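Since LAST_INSERT_ID() is scoped per session, it is safe to use for populating a foreign key right after the insert, as long as both statements run in the same session. A minimal sketch against your orders.order_test table (the child table orders.order_item and its columns are made up for the illustration):
INSERT INTO orders.order_test (order_type) VALUES ('retail');
-- pick up the generated order_id in the same session
INSERT INTO orders.order_item (order_id, item_name)
SELECT LAST_INSERT_ID(), 'widget';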
My understanding of inserts and updates in Cassandra was that they are basically the same thing. That's also what the documentation says ( https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlUpdate.html?hl=upsert )
Note: Unlike the INSERT command, the UPDATE command supports counters. Otherwise, the UPDATE and INSERT operations are identical.
So aside from support for counters, they should be the same.
But then I ran across a problem where rows that were created via UPDATE would disappear if I set the columns to null, whereas this doesn't happen if the rows are created with INSERT.
cqlsh:test> CREATE TABLE IF NOT EXISTS address_table (
... name text PRIMARY KEY,
... addresses text
... );
cqlsh:test> insert into address_table (name, addresses) values ('Alice', 'applelane 1');
cqlsh:test> update address_table set addresses = 'broadway 2' where name = 'Bob' ;
cqlsh:test> select * from address_table;
 name  | addresses
-------+-------------
   Bob |  broadway 2
 Alice | applelane 1
(2 rows)
cqlsh:test> update address_table set addresses = null where name = 'Alice' ;
cqlsh:test> update address_table set addresses = null where name = 'Bob' ;
cqlsh:test> select * from address_table;
 name  | addresses
-------+-----------
 Alice |      null
(1 rows)
The same thing happens if I skip the separate step of first creating a row. With insert I can create a row with a null value, but if I use update the row is nowhere to be found.
cqlsh:test> insert into address_table (name, addresses) values ('Caroline', null);
cqlsh:test> update address_table set addresses = null where name = 'Dexter' ;
cqlsh:test> select * from address_table;
 name     | addresses
----------+-----------
 Caroline |      null
    Alice |      null
(2 rows)
Can someone explain what's going on?
We're using Cassandra 3.11.3
This is expected behavior. See details in https://issues.apache.org/jira/browse/CASSANDRA-14478
INSERT adds a row marker, while UPDATE does not. What does this mean? Basically, an UPDATE requests that individual cells of the row be added, but not that the row itself be added; so if one later deletes those same individual cells with DELETE, the entire row goes away. However, an INSERT not only adds the cells, it also requests that the row be added (this is implemented via a "row marker"). So if all the row's individual cells are later deleted, an empty row remains behind (i.e., the primary key of the row, which now has no content, is still remembered in the table).
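You can see the row marker in action with the table above: insert a row with a value, then delete that cell with DELETE, and the row itself survives (the 'Eve' row here is just for illustration):
cqlsh:test> insert into address_table (name, addresses) values ('Eve', 'mainstreet 4');
cqlsh:test> delete addresses from address_table where name = 'Eve' ;
cqlsh:test> select * from address_table where name = 'Eve' ;
 name | addresses
------+-----------
  Eve |      null
(1 rows)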
What would be the easiest way to migrate an int to a bigint in Cassandra? I thought of creating a new column of type bigint and then running a script to basically set the value of that column = the value of the int column for all rows, and then dropping the original column and renaming the new column. However, I'd like to know if someone has a better alternative, because this approach just doesn't sit quite right with me.
You could ALTER your table and change your int column to the varint type. Check the documentation about ALTER TABLE, and the data type compatibility matrix. (int cannot be altered directly to bigint because those are fixed 4-byte and 8-byte encodings, while a 4-byte int value is also a valid encoding of the variable-length varint type; that is why this particular conversion is allowed.)
The only other alternative is what you said: add a new column and populate it row by row. Dropping the old column is entirely optional: if you don't assign values to it when performing inserts, everything will stay as it is, and new records won't consume space for it.
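A rough sketch of that approach in CQL (the events table, its id key, and the counter_val column are made-up names, and the row-by-row copy has to be driven by a script, since CQL has no UPDATE ... SELECT). Note also that ALTER TABLE ... RENAME only works on primary key columns, so the new column would keep its new name:
-- hypothetical table with an int column counter_val to be widened
ALTER TABLE events ADD counter_val_big bigint;
-- a script then copies the value for every primary key, e.g.:
--   UPDATE events SET counter_val_big = 42 WHERE id = ...;
-- once all rows are copied, the old column can be dropped
ALTER TABLE events DROP counter_val;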
You can ALTER your table so that it stores values bigger than an int by using varint. See the example:
cassandra#cqlsh:demo> CREATE TABLE int_test (id int, name text, primary key(id));
cassandra#cqlsh:demo> SELECT * FROM int_test;
 id | name
----+------
(0 rows)
cassandra#cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 1547893654, 'abc');
cassandra#cqlsh:demo> SELECT * FROM int_test ;
 id         | name
------------+------
 1547893654 |  abc
(1 rows)
cassandra#cqlsh:demo> ALTER TABLE demo.int_test ALTER id TYPE varint;
cassandra#cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 999999999999999999999999999999, 'abcd');
cassandra#cqlsh:demo> SELECT * FROM int_test ;
 id                             | name
--------------------------------+------
                     1547893654 |  abc
 999999999999999999999999999999 | abcd
(2 rows)
I just started working with the SASI index on Cassandra 3.7.0 and I encountered a problem which, as I suspect, is a bug. It took some effort to track down the situation in which the bug shows up; here is what I found:
When querying with a SASI index, it may incorrectly return 0 rows, and after changing the conditions a little it works again, as in the following CQL code:
CREATE TABLE IF NOT EXISTS roles (
    name text,
    a int,
    b int,
    PRIMARY KEY ((name, a), b)
) WITH CLUSTERING ORDER BY (b DESC);
insert into roles (name,a,b) values ('Joe',1,1);
insert into roles (name,a,b) values ('Joe',2,2);
insert into roles (name,a,b) values ('Joe',3,3);
insert into roles (name,a,b) values ('Joe',4,4);
CREATE TABLE IF NOT EXISTS roles2 (
    name text,
    a int,
    b int,
    PRIMARY KEY ((name, a), b)
) WITH CLUSTERING ORDER BY (b ASC);
insert into roles2 (name,a,b) values ('Joe',1,1);
insert into roles2 (name,a,b) values ('Joe',2,2);
insert into roles2 (name,a,b) values ('Joe',3,3);
insert into roles2 (name,a,b) values ('Joe',4,4);
CREATE CUSTOM INDEX ON roles (b) USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
CREATE CUSTOM INDEX ON roles2 (b) USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
Note that the only thing I changed from table roles to table roles2 is 'CLUSTERING ORDER BY (b DESC)' versus 'CLUSTERING ORDER BY (b ASC)'.
When querying with the statement select * from roles2 where b<3;, the result is two rows:
 name | a | b
------+---+---
  Joe | 1 | 1
  Joe | 2 | 2
(2 rows)
However, querying with select * from roles where b<3; returns no rows at all:
 name | a | b
------+---+---
(0 rows)
This is not the only situation in which the bug shows up. One time I created a SASI index with a specific name like 'end_idx' on a column named 'end', and the bug appeared; when I didn't specify the index name, it was gone.
Please help me confirm this bug, or tell me if I have used the SASI index in the wrong way.
I want to write System.currentTimeMillis() into a Cassandra table for each column, with Cassandra filling it in. For example:
writeToCassandra(name, email)
in the Cassandra table:
--------------------------------
name | email | currentMilliseconds
Can Cassandra populate the currentMilliseconds column automatically, like auto increment?
BR!
Cassandra has a bit of a columnar-database flavor inside. If you read the docs on how columns are stored inside an SSTable, you'll notice that each column value has its own write timestamp attached (used for conflict resolution, as in the last-write-wins strategy). You can query that timestamp using the writetime() function:
cqlsh:so> create table ticks ( id text primary key, value int);
cqlsh:so> insert into ticks (id, value) values ('foo', 1);
cqlsh:so> insert into ticks (id, value) values ('bar', 2);
cqlsh:so> insert into ticks (id, value) values ('baz', 3);
cqlsh:so> select id, value from ticks;
 id  | value
-----+-------
 bar |     2
 foo |     1
 baz |     3
(3 rows)
cqlsh:so> select id, writetime(value) from ticks;
 id  | writetime(value)
-----+------------------
 bar | 1448282940862913
 foo | 1448282937031542
 baz | 1448282945591607
(3 rows)
As you requested, I have not explicitly inserted a write timestamp into the DB, but I am still able to query it. Note that you cannot use the writetime() function on primary key columns.
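If you ever need to control that timestamp yourself instead of letting Cassandra pick it, you can set it explicitly with USING TIMESTAMP (in microseconds); a quick sketch on the same ticks table:
cqlsh:so> insert into ticks (id, value) values ('qux', 4) using timestamp 1448282950000000;
cqlsh:so> select id, writetime(value) from ticks where id = 'qux';
 id  | writetime(value)
-----+------------------
 qux | 1448282950000000
(1 rows)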
You can try with: dateof(now())
e.g.
INSERT INTO YOUR_TABLE (NAME, EMAIL, DATE)
VALUES ('NAME', 'EMAIL', dateof(now()));
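Note that dateof() is deprecated since Cassandra 2.2; toTimestamp(now()) is the current equivalent:
INSERT INTO YOUR_TABLE (NAME, EMAIL, DATE)
VALUES ('NAME', 'EMAIL', toTimestamp(now()));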