Automatic timestamp in Slick 3 - null on insert

I've defined a column like this:
def lastChecked = column[Timestamp]("LAST_CHECKED", O.Default(new Timestamp(System.currentTimeMillis())))
When I insert data into the table I omit this column, but Slick inserts it as a null value. How can this be fixed?

You need to provide the default value for the field at the DB level. For HSQLDB, define the column this way:
last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
In Slick it is then enough to define the field with the timestamp type:
val lastChecked: Rep[java.sql.Timestamp] = column[java.sql.Timestamp]("last_checked")
According to the Slick documentation, O.Default is used only for DDL statements.
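For illustration, a minimal sketch of the whole pattern, assuming Slick 3.2+ with the HSQLDB profile; everything except last_checked is a made-up name. The insert goes through a projection that omits last_checked, so the DB-level DEFAULT is applied:

import slick.jdbc.HsqldbProfile.api._

// Hypothetical table; its DDL gives "last_checked" DEFAULT CURRENT_TIMESTAMP.
class Items(tag: Tag) extends Table[(Long, String, java.sql.Timestamp)](tag, "items") {
  def id          = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def name        = column[String]("name")
  def lastChecked = column[java.sql.Timestamp]("last_checked")
  def * = (id, name, lastChecked)
}
val items = TableQuery[Items]

// Insert only "name"; the database fills in last_checked itself.
val insertAction: DBIO[Int] = items.map(_.name) += "example"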

Related

Delete records in Cassandra table based on time range

I have a Cassandra table with this schema:
CREATE TABLE IF NOT EXISTS TestTable(
    documentId text,
    sequenceNo bigint,
    messageData blob,
    clientId text,
    PRIMARY KEY(documentId, sequenceNo))
WITH CLUSTERING ORDER BY(sequenceNo DESC);
Is there a way to delete the records that were inserted within a given time range? I know that internally Cassandra must be using some timestamp to track the insertion time of each record, which is used by features like TTL.
Since there is no explicit column for insertion timestamp in the given schema, is there a way to use the implicit timestamp or is there any better approach?
There is never any update to the records after insertion.
It's an interesting question...
All columns that aren't part of the primary key have a so-called WriteTime, which can be retrieved with the writetime(column_name) function of CQL (warning: it doesn't work with collection columns, and returns null for UDTs!). But because CQL has no nested queries, you will need to write a program that fetches the data, filters entries by WriteTime, and deletes those whose WriteTime is older than your threshold. Note that writetime values are in microseconds, not milliseconds as in CQL's timestamp type.
The easiest way is to use the Spark Cassandra Connector's RDD API, something like this:
import com.datastax.spark.connector._  // cassandraTable, writeTime, deleteFromCassandra

// writetime is in MICROseconds, so convert the cutoff date accordingly
val timestamp = someDate.toInstant.getEpochSecond * 1000000L
val oldData = sc.cassandraTable(srcKeyspace, srcTable)
  .select("prk1", "prk2", "reg_col".writeTime as "writetime")
  .filter(row => row.getLong("writetime") < timestamp)
oldData.deleteFromCassandra(srcKeyspace, srcTable,
  keyColumns = SomeColumns("prk1", "prk2"))
where prk1, prk2, ... are all components of the primary key (documentId and sequenceNo in your case), and reg_col is any "regular" column of the table that isn't a collection or UDT (for example, clientId). It's important that the list of primary key columns in select and in deleteFromCassandra is the same.

Boolean in Cassandra

I see an issue with the Cassandra boolean datatype.
I have a table with one boolean field:
CREATE TABLE keyspace.issuetable (
"partitionId" text,
"name" text,
"field" text,
"testboolean" boolean,
PRIMARY KEY ("partitionId", "name"));
Now when I insert into the table, I don't set the boolean 'testboolean':
INSERT into keyspace.issuetable("partitionId", "name", "field")
VALUES ('testpartition', 'cluster1_name','testfiled');
Issue:
1) If the boolean entry (testboolean) is not included in the INSERT query, then per the data type I would expect it to be 'false', but it is stored as null:
SELECT * FROM issuetable ;
partitionId | name | field | testboolean
---------------+---------------+-----------+-------------
testpartition | cluster1_name | testfiled | null
Could someone explain why? Also, how can this be solved? I expect 'false', not 'null'.
Cassandra is not like the traditional SQL databases. It does not store rows in tables. The best way to think about Cassandra's data model is to imagine a sortedMap<rowKey, map<columnKey, value>>.
This means that any particular row is not required to have the same fields/columns as any other one. In your example the inserted row simply does not have a property named testboolean.
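Purely as an illustration of that mental model (this is not driver code; the names come from the question), the inserted row simply has no entry for the missing column:

// The row from the INSERT above, viewed as a map of column name -> value.
val row: Map[String, Any] = Map(
  "field" -> "testfiled"
  // no "testboolean" key at all, which cqlsh renders as null
)
val table: Map[(String, String), Map[String, Any]] =
  Map(("testpartition", "cluster1_name") -> row)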
To understand more, I can recommend reading up on Cassandra's data model.
And no, you cannot set a default value for a column (or rather, you can only do it on the application side).
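In practice the application-side fix is simply to always supply the boolean explicitly in the INSERT; a minimal sketch using the table from the question:

INSERT INTO keyspace.issuetable("partitionId", "name", "field", "testboolean")
VALUES ('testpartition', 'cluster1_name', 'testfiled', false);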

cassandra 2.0.9: query for undefined column

Using Cassandra 2.0.9 CQL, how does one query for rows that don't have a particular column defined? For example:
create table testtable ( id int primary key, thing int );
create index on testtable ( thing );
-- can now select rows by thing
insert into testtable( id, thing ) values ( 100, 100 );
-- row values will persist
update testtable using TTL 30 set thing=1 where id=100;
-- wait 30 seconds and the thing column will go away for the row
select * from testtable;
Ideally I'd like to be able to do something like this:
select * from testtable where NOT DEFINED thing;
or some such, and have the row with id==100 returned. Is there any way to search for rows that do not have a particular column value assigned?
I'm afraid I've been through the DataStax 2.0 manual as well as the cqlsh help, with no luck finding an operator or syntax for this. Thanks.
This doesn't appear to be available yet:
https://issues.apache.org/jira/browse/CASSANDRA-3783
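Until then, the filtering has to happen client-side. A minimal sketch, assuming the DataStax Java driver 2.x and a hypothetical contact point/keyspace (only testtable, id, and thing come from the question):

import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("mykeyspace")

// Fetch the candidate rows and keep those where "thing" is unset;
// the driver reports a missing column value as null.
val idsWithoutThing = session.execute("SELECT id, thing FROM testtable").asScala
  .filter(_.isNull("thing"))
  .map(_.getInt("id"))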

Selecting timeuuid columns corresponding to a specific date

Short version: Is it possible to query for all timeuuid columns corresponding to a particular date?
More details:
I have a table defined as follows:
CREATE TABLE timetest(
key uuid,
activation_time timeuuid,
value text,
PRIMARY KEY(key,activation_time)
);
I have populated this with a single row, as follows (f0532ef0-2a15-11e3-b292-51843b245f21 is a timeuuid corresponding to the date 2013-09-30 22:19:06+0100):
insert into timetest (key, activation_time, value) VALUES (7daecb80-29b0-11e3-92ec-e291eb9d325e, f0532ef0-2a15-11e3-b292-51843b245f21, 'some value');
And I can query for that row as follows:
select activation_time, dateof(activation_time) from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e;
which results in the following (using cqlsh)
activation_time | dateof(activation_time)
--------------------------------------+--------------------------
f0532ef0-2a15-11e3-b292-51843b245f21 | 2013-09-30 22:19:06+0100
Now let's assume there's a lot of data in my table and I want to retrieve all rows where activation_time corresponds to a particular date, say 2013-09-30 22:19:06+0100.
I would have expected to be able to query for the range of all timeuuids between minTimeuuid('2013-09-30 22:19:06+0100') and maxTimeuuid('2013-09-30 22:19:06+0100') but this doesn't seem possible (the following query returns zero rows):
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time>minTimeuuid('2013-09-30 22:19:06+0100') and activation_time<=maxTimeuuid('2013-09-30 22:19:06+0100');
It seems I need to use a hack whereby I increment the second date in my query (by a second) to catch the row(s), i.e.,
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time>minTimeuuid('2013-09-30 22:19:06+0100') and activation_time<=maxTimeuuid('2013-09-30 22:19:07+0100');
This feels wrong. Am I missing something? Is there a cleaner way to do this?
The CQL documentation discusses timeuuid functions but it's pretty short on gte/lte expressions with timeuuids, beyond:
The min/maxTimeuuid example selects all rows where the timeuuid column, t, is strictly later than 2013-01-01 00:05+0000 but strictly earlier than 2013-02-02 10:00+0000. The t >= maxTimeuuid('2013-01-01 00:05+0000') does not select a timeuuid generated exactly at 2013-01-01 00:05+0000 and is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000').
p.s. the following query also returns zero rows:
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time<=maxTimeuuid('2013-09-30 22:19:06+0100');
and the following query returns the row(s):
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time>minTimeuuid('2013-09-30 22:19:06+0100');
I'm sure the problem is that cqlsh does not display milliseconds for your timestamps.
So the real timestamp is something like '2013-09-30 22:19:06.123+0100'.
When you call maxTimeuuid('2013-09-30 22:19:06+0100'), the milliseconds are missing and zero is assumed, so it is the same as calling maxTimeuuid('2013-09-30 22:19:06.000+0100').
And since 22:19:06.123 > 22:19:06.000, the record is filtered out.
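So rather than incrementing the second, you can make the millisecond bounds explicit; a sketch of the query, where .000/.999 cover the whole second:

select * from timetest
where key=7daecb80-29b0-11e3-92ec-e291eb9d325e
  and activation_time > minTimeuuid('2013-09-30 22:19:06.000+0100')
  and activation_time <= maxTimeuuid('2013-09-30 22:19:06.999+0100');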
Not directly related to the answer, but as an addendum to dimas's answer:
cqlsh (version 5.0.1) seems to show the milliseconds now:
system.dateof(id)
---------------------------------
2016-06-03 02:42:09.990000+0000
2016-05-28 17:07:30.244000+0000

Cassandra-secondary index on part of the composite key?

I am using a composite primary key consisting of 2 strings Name1, Name2, and a timestamp (e.g. 'Joe:Smith:123456'). I want to query a range of timestamps given an equality condition for either Name1 or Name2.
For example, in SQL:
SELECT * FROM testcf WHERE (timestamp > 111111 AND timestamp < 222222 AND Name2 = 'Brown');
and
SELECT * FROM testcf WHERE (timestamp > 111111 AND timestamp < 222222 AND Name1 = 'Charlie');
From my understanding, the first part of the composite key is the partition key, so the second query is possible, but the first query would require some kind of index on Name2.
Is it possible to create a separate index on a component of the composite key? Or am I misunderstanding something here?
You will need to manually create and maintain an index of names if you want to use your schema and support the first query. Given this requirement, I question your choice of data model; your model should be designed with your read pattern in mind. I presume you are also storing some column values that you want to query by timestamp. If so, perhaps the following model would serve you better:
"[current_day]:Joe:Smith" {
123456:Field1 : value
123456:Field2 : value
123450:Field1 : value
123450:Field2 : value
}
With this model you can use the current day (or some known day) as a sentinel value, then filter on first and last names. You can also get a range of columns by timestamp using the composite column names.
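For reference, a hypothetical CQL3 rendering of that idea (all names here are made up; the day bucket is the sentinel):

CREATE TABLE testcf (
    day text,       -- sentinel bucket, e.g. '2013-09-30'
    name1 text,
    name2 text,
    ts bigint,
    field1 text,
    field2 text,
    PRIMARY KEY ((day), name1, name2, ts)
);

-- Equality on the names plus a range on the timestamp:
SELECT * FROM testcf
WHERE day = '2013-09-30' AND name1 = 'Joe' AND name2 = 'Smith'
  AND ts > 111111 AND ts < 222222;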
