Boolean in Cassandra

I see an issue with the Cassandra boolean datatype.
I have a table with one boolean field:
CREATE TABLE keyspace.issuetable (
"partitionId" text,
"name" text,
"field" text,
"testboolean" boolean,
PRIMARY KEY ("partitionId", "name"));
Now when I insert into the table, I omit the boolean 'testboolean':
INSERT into keyspace.issuetable("partitionId", "name", "field")
VALUES ('testpartition', 'cluster1_name','testfiled');
Issue :
1) If the boolean entry (the testboolean entry) is omitted from the INSERT query, then per the data type I expect it to default to 'false', but it is stored as null:
SELECT * FROM issuetable ;
partitionId | name | field | testboolean
---------------+---------------+-----------+-------------
testpartition | cluster1_name | testfiled | null
Could someone explain why? Also, how can I get 'false' instead of 'null' here?

Cassandra is not like the traditional SQL databases. It does not store rows in tables. The best way to think about Cassandra's data model is to imagine a sortedMap<rowKey, map<columnKey, value>>.
This means that any particular row is not required to have the same fields/columns as any other one. In your example the inserted row simply does not have a property named testboolean.
To understand more, I can recommend reading up on Cassandra's data model.
And no, you cannot set a default value for a column (or rather, you can do it only on the application side).
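For example, to actually store false you have to write it explicitly in the INSERT; a minimal sketch against the same table:
INSERT into keyspace.issuetable("partitionId", "name", "field", "testboolean")
VALUES ('testpartition', 'cluster1_name', 'testfiled', false);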

Related

Cassandra: delete of data with TTL, ROW with "null" value instead of being removed

Small question regarding Cassandra please.
I have created a table as follow:
CREATE TABLE contract (
    contractidentifier text,
    name text,
    telephone text,
    idnumber text,
    companyid text,
    company text,
    startdate timestamp,
    hiringdate timestamp,
    interviewdate timestamp,
    PRIMARY KEY (contractidentifier, company, name)
) WITH default_time_to_live = 2628000;
And the goal is very straightforward: the web application is just going to write some data about short-term contracts which only last for one month.
Since the employment is only a month long, what I would like to achieve from the table point of view is: "keep only the data for one month only. After that, it should be deleted".
With this requirement in mind, I simply used the TTL feature of Cassandra (see query, WITH default_time_to_live = 2628000).
Now, I come back after one month, expecting the data to be deleted. However, I can see the data is still there, with some null values:
C102403845 | null | null | SMITH | null | null | null | null | DELL | null | null | null | null | null | null
Questions:
What is the issue here please? Did I misunderstand the purpose of the TTL? (i.e. my understanding of TTL is that the row will be entirely deleted after one month, not that the row is still there after one month with only some of the values being null.)
If my understanding is correct, did I misconfigure something?
Finally, if the TTL is actually not the solution, what else could I use please?
Thank you
You must have inserted into the table with an overriding TTL via the USING TTL construct; otherwise it is not possible for the table to still have values after the TTL time has passed. You can check the remaining TTL for the columns that still show values using the following construct:
select ttl(column_name) from tablename where key = value;
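For reference, this is what such a per-write override looks like; a sketch against the contract table above (USING TTL 0 removes the TTL entirely, overriding the table's default_time_to_live, which would explain surviving rows):
INSERT INTO contract (contractidentifier, company, name)
VALUES ('C102403845', 'DELL', 'SMITH')
USING TTL 0;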

Cassandra find records where list is empty [duplicate]

How do I query in Cassandra for columns that are != null?
Select * from tableA where id != null;
Select * from tableA where name != null;
Then I want to store these values and insert them into a different table.
I don't think this is possible with Cassandra. First of all, Cassandra CQL doesn't support the use of the NOT or not-equal-to operators in the WHERE clause. Secondly, your WHERE clause can only contain primary key columns, and primary key columns will not allow null values to be inserted. I wasn't sure about secondary indexes though, so I ran this quick test:
create table nullTest (id text PRIMARY KEY, name text);
INSERT INTO nullTest (id,name) VALUES ('1','bob');
INSERT INTO nullTest (id,name) VALUES ('2',null);
I now have a table and two rows (one with null data):
SELECT * FROM nullTest;
id | name
----+------
2 | null
1 | bob
(2 rows)
I then try to create a secondary index on name, which I know contains null values.
CREATE INDEX nullTestIdx ON nullTest(name);
It lets me do it. Now, I'll run a query on that index.
SELECT * FROM nullTest WHERE name=null;
Bad Request: Unsupported null value for indexed column name
And again, this reinforces the premise: you can't expect to query for "not null" when you can't even query for column values that may actually be null.
So, I'm thinking this can't be done. Also, if null values are a possibility in your primary key, then you may want to re-evaluate your data model. Again, I know the OP's question is about querying where data is not null. But as I mentioned before, Cassandra CQL doesn't have a NOT or != operator, so that's going to be a problem right there.
Another option, is to insert an empty string instead of a null. You would then be able to query on an empty string. But that still doesn't get you past the fundamental design flaw of having a null in a primary key field. Perhaps if you had a composite primary key, and only part of it (the clustering columns) had the possibility of being empty (certainly not part of the partitioning key). But you'd still be stuck with the problem of not being able to query for rows that are "not empty" (instead of not null).
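As a quick sketch of that workaround, reusing the nullTest table and index from above (the values are made up):
INSERT INTO nullTest (id,name) VALUES ('3','');
SELECT * FROM nullTest WHERE name='';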
NOTE: Inserting null values was done here for demonstration purposes only. It is something you should do your best to avoid, as inserting a null column value WILL create a tombstone. Likewise, inserting lots of null values will create lots of tombstones.
1) select * from test;
name | id | address
------------------+----+------------------
bangalore | 3 | ramyam_lab
bangalore | 4 | bangalore_ramyam
bangalore | 5 | jasgdjgkj
prasad | 11 | null
prasad | 12 | null
india | 6 | karnata
india | 7 | karnata
ramyam-bangalore | 3 | jasgdjgkj
ramyam-bangalore | 5 | jasgdjgkj
2) Cassandra doesn't support selecting on null values; it shows null only for our understanding.
3) To handle null values, use sentinel strings like "not-available" or "null" instead; then we can select that data.
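A sketch of that sentinel approach against the test table above, assuming name and id make up the primary key and adding a (hypothetical) secondary index so the column is queryable:
create index on test (address);
insert into test (name, id, address) values ('prasad', 13, 'not-available');
select * from test where address = 'not-available';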

Automatic timestamp in Slick 3 - null on insert

I've defined column like this:
def lastChecked = column[Timestamp]("LAST_CHECKED", O.Default(new Timestamp(System.currentTimeMillis())))
And when I insert data into the table I omit this column. But Slick inserts this column as a null value. How can this be fixed?
You need to provide the default value for the field at the DB level. For HSQLDB, define the column this way:
last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
In Slick it is then enough to define the field with a timestamp type:
val lastChecked: Rep[java.sql.Timestamp] = column[java.sql.Timestamp]("last_checked")
According to the Slick documentation, O.Default is used only for DDL statements (i.e., when Slick generates the schema), not when inserting rows.

Cassandra : Data Modelling

I currently have a table in Cassandra called macrecord which looks something like this:
macadd | position | record | timestamp
-------------------+----------+--------+---------------------
23:FD:52:34:DS:32 | 1 | 1 | 2015-09-28 15:28:59
However, I now need to make queries that use the timestamp column to query for a range. I don't think it is possible to do so without timestamp being part of the primary key (macadd in this case), i.e. without it being a clustering key.
If I make timestamp part of the primary key, the table looks like below:
macadd | timestamp | position | record
-------------------+---------------------+----------+--------
23:FD:52:34:DS:32 | 2015-09-28 15:33:26 | 1 | 1
However, now I cannot update the timestamp column whenever I get a duplicate macadd.
update macrecord set timestamp = dateof(now()) where macadd = '23:FD:52:34:DS:32';
gives an error :
message="PRIMARY KEY part timestamp found in SET part"
I cannot think of another solution in this case other than to delete the whole row if there is a duplicate value of macadd and then insert a new row with an updated timestamp.
Is there a better way to update the timestamp whenever there is a duplicate value of macadd, or an alternative way to query for timestamp values in a range in my original table, where only macadd is the primary key?
To do a range query in CQL, you'll need to have timestamp as a clustering key. But as you have seen, you can't update key fields without doing a delete and insert of the new key.
One option that will become available in Cassandra 3.0 when it is released in October is materialized views. That would allow you to have timestamp as a value column in the base table and as a clustering column in the view. See an example here.
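A sketch of what that could look like once 3.0 is available (the view name macrecord_by_time is made up here; the IS NOT NULL restrictions are required by the materialized view syntax):
CREATE MATERIALIZED VIEW macrecord_by_time AS
    SELECT macadd, timestamp, position, record
    FROM macrecord
    WHERE macadd IS NOT NULL AND timestamp IS NOT NULL
    PRIMARY KEY (macadd, timestamp);

SELECT * FROM macrecord_by_time
WHERE macadd = '23:FD:52:34:DS:32' AND timestamp >= '2015-09-28 00:00:00';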

Updating a Column in Cassandra based on Where Clause

I have a very simple table
cqlsh:hell> describe columnfamily info ;
CREATE TABLE info (
nos int,
value map<text, text>,
PRIMARY KEY (nos)
)
The following is the query where I am trying to update the value:
update info set value = {'count' : '0' , 'onget' : 'function onget(value,count) { count++ ; return {"value": value, "count":count} ; }' } where nos <= 1000 ;
Bad Request: Invalid operator LTE for PRIMARY KEY part nos
Whichever operator I use to specify the constraint, it complains about an invalid operator. I am not sure what I am doing wrong here; according to the Cassandra 3.0 CQL docs, there are similar update queries.
The following is my version
[cqlsh 4.1.0 | Cassandra 2.0.3 | CQL spec 3.1.1 | Thrift protocol 19.38.0]
I have no idea what's going wrong.
The answer is really in my comment, but it needs a bit of elaboration. To restate from the comment...
The first predicate of the where clause has to uniquely identify the partition key. In your case, since the primary key is only one column, the partition key == the primary key.
Cassandra can't do range scans over partitions. In the language of CQL, a partition is a potentially wide storage row that is uniquely identified by a key; in this case, the values in your nos column. The values of the partition keys are hashed into tokens which explicitly identify where that data lives in the cluster. Since that hash has no order to it, Cassandra cannot use any operator other than equality to route a statement to the correct destination. This isn't a primary key index that could potentially be updated; it is the fundamental partitioning mechanism in Cassandra. So, you can't use inequality operators as the first clause of a predicate. You can use them in subsequent clauses, because by then the partition has been identified and you're dealing with an ordered set of columns.
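By contrast, an equality predicate on the partition key routes to exactly one partition and is accepted; a shortened sketch of the same update:
update info set value = {'count' : '0'} where nos = 1000;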
You can't use a non-equality condition on the partition key (nos is your partition key).
http://cassandra.apache.org/doc/cql3/CQL.html#selectWhere
Cassandra currently does not support user-defined functions inside a query such as the following.
update info set value = {'count' : '0' , 'onget' : 'function onget(value,count) { count++ ; return {"value": value, "count":count} ; }' } where nos <= 1000 ;
First, can you push this onget function into the application layer? You can first query all the rows where nos < 1000, then increment those rows via some batch query.
Otherwise, you can use a counter column for nos, not an int data type. Note, though, that you cannot mix the map data type with counter column families unless the non-counter columns are part of a composite key.
Also, you probably do not want nos, a column whose value changes, as the primary key.
CREATE TABLE info (
    id UUID,
    value map<text, text>,
    PRIMARY KEY (id)
);

CREATE TABLE nos_counter (
    info_id UUID,
    nos COUNTER,
    PRIMARY KEY (info_id)
);
Now you can update the nos counter like this:
update nos_counter set nos = nos + 1 where info_id = SOME_UUID;
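And read the current count back with:
select nos from nos_counter where info_id = SOME_UUID;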
