[Question posted by a user on YugabyteDB Community Slack]
I'm trying to upsert, delete, and upsert the same record using the USING TIMESTAMP syntax. The first upsert and the delete are successful. After I delete the record, if I upsert the same record again, the update status is true, but the select statement does not show the row.
ycqlsh:test> CREATE TABLE todo ( id int, seq int, task text, status boolean, primary key (id, seq) );
ycqlsh:test> insert into todo(id, seq, task, status) values(1, 1, 'sample', false);
ycqlsh:test> insert into todo(id, seq, task, status) values(1, 2, 'sample2', false);
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
(2 rows)
ycqlsh:test> UPDATE todo using timestamp 1000 SET status = false, task='sample3' WHERE id=1 and seq=3 returns status as row;
[applied] | [message] | id | seq | task | status
-----------+-----------+------+------+------+--------
True | null | null | null | null | null
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
1 | 3 | sample3 | False
(3 rows)
ycqlsh:test> delete from todo WHERE id=1 and seq=3;
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
(2 rows)
ycqlsh:test> UPDATE todo using timestamp 2000 SET status = false, task='sample3' WHERE id=1 and seq=3 returns status as row;
[applied] | [message] | id | seq | task | status
-----------+-----------+------+------+------+--------
True | null | null | null | null | null
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
(2 rows)
This is because you used DELETE without USING TIMESTAMP. A delete without an explicit timestamp is written at the current physical time in microseconds since the epoch, which is far greater than 2000, so its tombstone shadows the second upsert. To reintroduce the row into the table, you would have to use UPDATE without USING TIMESTAMP.
The correct usage would have been:
delete from todo USING TIMESTAMP 1000 WHERE id=1 and seq=3;
If you want to use DELETE without USING TIMESTAMP, then your writes should use timestamp values that are actually close to physical time in microseconds since the epoch, rather than small numbers like 1000 or 2000.
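Putting it together, the corrected sequence would look something like this (a sketch; 1500 is just an arbitrary timestamp between the two upserts):
ycqlsh:test> UPDATE todo USING TIMESTAMP 1000 SET status = false, task='sample3' WHERE id=1 and seq=3;
ycqlsh:test> delete from todo USING TIMESTAMP 1500 WHERE id=1 and seq=3;
ycqlsh:test> UPDATE todo USING TIMESTAMP 2000 SET status = false, task='sample3' WHERE id=1 and seq=3;
ycqlsh:test> select * from todo;
Because the tombstone's timestamp (1500) is lower than the final upsert's (2000), the row with seq=3 shows up again in the last select.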
Given the following steps defined, when I dig into the internal tables, especially the viewflow_task_previous table, it seems the from and to are reversed?
pmas=> select * from viewflow_task_previous where from_task_id = 10248;
id | from_task_id | to_task_id
------+--------------+------------
9099 | 10248 | 10247
(1 row)
pmas=> select id, status, flow_task, status from viewflow_task where id = 10248;
id | status | flow_task | status
-------+----------+-------------------------------------------------------------------------+----------
10248 | ASSIGNED | connect_it/flows.new_circuit.flow.NewCircuit.external_task_installation | ASSIGNED
(1 row)
pmas=> select id, status, flow_task, status from viewflow_task where id = 10247;
id | status | flow_task | status
-------+--------+-------------------------------------------------------------------------+--------
10247 | DONE | connect_it/flows.new_circuit.flow.NewCircuit.external_task_provisioning | DONE
(1 row)
Could someone explain why and how this works?
The viewflow_task_previous table is created by the models.ManyToManyField previous field of the Task model:
https://github.com/viewflow/viewflow/blob/master/viewflow/models.py#L97
Yep, that naming gives some confusion at the SQL level: because the field is called previous, from_task_id is the task that owns the relation (the later task) and to_task_id points to its predecessor, so the edge runs backwards relative to the flow's direction.
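So to find a task's predecessor, you query in the direction the field is defined; for example (a sketch against the rows shown above):
select to_task_id as previous_task_id
from viewflow_task_previous
where from_task_id = 10248;
This returns 10247, the task that ran before task 10248.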
I'm trying to make an Entity using TypeORM in my NestJS app, and it's not working as I expected.
I have the following entity
@Entity('TableOne')
export class TableOneModel {
  @PrimaryGeneratedColumn()
  id: number

  @PrimaryColumn()
  tableTwoID: number

  @PrimaryColumn()
  tableThreeID: number

  @CreateDateColumn()
  createdAt?: Date
}
This code generates a migration that creates a table like the example below:
+--------------+-------------+------+-----+----------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+----------------------+-------+
| id | int(11) | NO | | NULL | |
| tableTwoID | int(11) | NO | | NULL | |
| tableThreeID | int(11) | NO | | NULL | |
| createdAt | datetime(6) | NO | | CURRENT_TIMESTAMP(6) | |
+--------------+-------------+------+-----+----------------------+-------+
That's OK; the problem is that I want the table to allow only one row per combination of tableTwoID and tableThreeID. What should I use in the Entity so the generated table behaves as I expect?
It should not allow rows like the example below:
+----+------------+--------------+----------------------------+
| id | tableTwoID | tableThreeID | createdAt |
+----+------------+--------------+----------------------------+
| 1 | 1 | 1 | 2019-10-30 19:27:43.054844 |
| 2 | 1 | 1 | 2019-10-30 19:27:43.819174 | <- should not allow the insert of this row
+----+------------+--------------+----------------------------+
Try marking the combination of columns as unique with the class-level @Unique decorator:
@Unique(['tableTwoID', 'tableThreeID'])
This is currently expected behavior from TypeORM. According to the documentation, if you have multiple @PrimaryColumn() decorators you create a composite key. Only the combination of the composite key columns must be unique (in your example above, '1' + '1' + '1' = '111' vs '2' + '1' + '1' = '211'). If you are looking to make each column unique as well as part of the composite primary key, you should be able to do something like @PrimaryColumn({ unique: true }).
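For reference, here is a minimal sketch of the entity using a class-level composite unique constraint instead of multiple primary columns (column names taken from the question; keeping id as the sole primary key is an assumption, not the only valid design):

import { Entity, PrimaryGeneratedColumn, Column, CreateDateColumn, Unique } from 'typeorm'

@Entity('TableOne')
@Unique(['tableTwoID', 'tableThreeID']) // rejects a second row with the same (tableTwoID, tableThreeID) pair
export class TableOneModel {
  @PrimaryGeneratedColumn()
  id: number

  @Column()
  tableTwoID: number

  @Column()
  tableThreeID: number

  @CreateDateColumn()
  createdAt?: Date
}

With this definition, the second insert in the example above would fail with a unique constraint violation instead of creating a duplicate row.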
According to this documentation, I was trying a select query with the token() function in it, but it gives wrong results.
I am using the Cassandra version below:
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
I was trying the token query on the table below:
CREATE TABLE price_key_test (
objectid int,
createdOn bigint,
price int,
foo text,
PRIMARY KEY ((objectid, createdOn), price));
Inserted data --
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,1000,100,'x');
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,2000,200,'x');
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,3000,300,'x');
Data in table --
objectid | createdon | price | foo
----------+-----------+-------+-----
1 | 3000 | 300 | x
1 | 2000 | 200 | x
1 | 1000 | 100 | x
Select query is --
select * from nasa.price_key_test where token(objectid,createdOn) > token(1,1000) and token(objectid,createdOn) < token(1,3000)
This query is supposed to return the row with createdOn 2000, but it returns zero rows.
objectid | createdon | price | foo
----------+-----------+-------+-----
(0 rows)
According to my understanding, token(objectid,createdOn) > token(1,1000) and token(objectid,createdOn) < token(1,3000) should select the row with partition key (1, 2000).
Is my understanding correct?
Try flipping your greater/less-than signs around:
aploetz@cqlsh:stackoverflow> SELECT * FROM price_key_test
WHERE token(objectid,createdOn) < token(1,1000)
AND token(objectid,createdOn) > token(1,3000) ;
objectid | createdon | price | foo
----------+-----------+-------+-----
1 | 2000 | 200 | x
(1 rows)
Adding the token() function to your SELECT should help you to understand why:
aploetz@cqlsh:stackoverflow> SELECT objectid, createdon, token(objectid,createdon),
price, foo FROM price_key_test ;
objectid | createdon | system.token(objectid, createdon) | price | foo
----------+-----------+-----------------------------------+-------+-----
1 | 3000 | -8449493444802114536 | 300 | x
1 | 2000 | -2885017981309686341 | 200 | x
1 | 1000 | -1219246892563628877 | 100 | x
(3 rows)
The hashed token values generated are not necessarily proportional to their original numeric values. With the default Murmur3Partitioner, tokens are hashes, so their ordering bears no relation to the ordering of the values they were computed from. In your case, token(1,3000) generated a hash that was the smallest of the three, not the largest.
I'm working on smart parking data stored in a Cassandra database, and I'm trying to get the last status of each device.
I'm working on a self-made dataset.
Here's the description of the table:
[screenshot: table description]
[screenshot: output of select * from parking.meters]
Need help, please!
trying to get the last status of each device
In Cassandra, you need to design your tables according to your query patterns. Building a table, filling it with data, and then trying to fulfill a query requirement is a very backward approach. The point is that if you really need to satisfy that query, then your table should have been designed to serve that query from the beginning.
That being said, there may still be a way to make this work. You haven't mentioned which version of Cassandra you are using, but if you are on 3.6+, you can use the PER PARTITION LIMIT clause on your SELECT.
If I build your table structure and INSERT some of your rows:
aploetz@cqlsh:stackoverflow> SELECT * FROM meters ;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 20 | 2017-01-10T09:11:51Z | True
1 | 20 | 2017-01-01T13:51:50Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 7 | 2016-12-02T16:50:04Z | True
1 | 7 | 2016-11-24T23:38:31Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
1 | 19 | 2016-11-22T15:15:23Z | False
(8 rows)
And I consider your PRIMARY KEY and CLUSTERING ORDER definitions:
PRIMARY KEY ((parking_id, device_id), date, status)
) WITH CLUSTERING ORDER BY (date DESC, status ASC);
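For context, a full table definition consistent with that key would look roughly like this (the column types are inferred from the sample data, a sketch rather than the asker's exact DDL):
CREATE TABLE meters (
    parking_id int,
    device_id int,
    date text,
    status boolean,
    PRIMARY KEY ((parking_id, device_id), date, status)
) WITH CLUSTERING ORDER BY (date DESC, status ASC);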
You are at least clustering by date (which should be an actual date or timestamp type, not text), so that will order your rows in a way that helps you here:
aploetz@cqlsh:stackoverflow> SELECT * FROM meters PER PARTITION LIMIT 1;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
(3 rows)
I have a Spark data frame that looks like this (simplifying timestamp and id column values for clarity):
| Timestamp | id | status |
--------------------------------
| 1 | 1 | pending |
| 2 | 2 | pending |
| 3 | 1 | in-progress |
| 4 | 1 | in-progress |
| 5 | 3 | in-progress |
| 6 | 1 | pending |
| 7 | 4 | closed |
| 8 | 1 | pending |
| 9 | 1 | in-progress |
It's a time series of status events. What I'd like to end up with is only the rows representing a status change. In that sense, the problem can be seen as one of removing redundant rows - e.g. entries at times 4 and 8 - both for id = 1 - should be dropped as they do not represent a change of status for a given id.
For the above set of rows, this would give (order being unimportant):
| Timestamp | id | status |
--------------------------------
| 1 | 1 | pending |
| 2 | 2 | pending |
| 3 | 1 | in-progress |
| 5 | 3 | in-progress |
| 6 | 1 | pending |
| 7 | 4 | closed |
| 9 | 1 | in-progress |
My original plan was to partition by id and status, order by timestamp, and pick the first row for each partition; however, this would give:
| Timestamp | id | status |
--------------------------------
| 1 | 1 | pending |
| 2 | 2 | pending |
| 3 | 1 | in-progress |
| 5 | 3 | in-progress |
| 7 | 4 | closed |
i.e. it loses repeated status changes.
Any pointers appreciated, I'm new to data frames and may be missing a trick or two.
Using the lag window function should do the trick:
case class Event(timestamp: Int, id: Int, status: String)
val events = sqlContext.createDataFrame(sc.parallelize(
Event(1, 1, "pending") :: Event(2, 2, "pending") ::
Event(3, 1, "in-progress") :: Event(4, 1, "in-progress") ::
Event(5, 3, "in-progress") :: Event(6, 1, "pending") ::
Event(7, 4, "closed") :: Event(8, 1, "pending") ::
Event(9, 1, "in-progress") :: Nil
))
events.registerTempTable("events")
val query = """SELECT timestamp, id, status FROM (
SELECT timestamp, id, status, lag(status) OVER (
PARTITION BY id ORDER BY timestamp
) AS prev_status FROM events) tmp
WHERE prev_status IS NULL OR prev_status != status
ORDER BY timestamp, id"""
sqlContext.sql(query).show
Inner query
SELECT timestamp, id, status, lag(status) OVER (
PARTITION BY id ORDER BY timestamp
) AS prev_status FROM events
creates a table like the one below, where prev_status is the previous value of status for a given id, ordered by timestamp:
+---------+--+-----------+-----------+
|timestamp|id| status|prev_status|
+---------+--+-----------+-----------+
| 1| 1| pending| null|
| 3| 1|in-progress| pending|
| 4| 1|in-progress|in-progress|
| 6| 1| pending|in-progress|
| 8| 1| pending| pending|
| 9| 1|in-progress| pending|
| 2| 2| pending| null|
| 5| 3|in-progress| null|
| 7| 4| closed| null|
+---------+--+-----------+-----------+
Outer query
SELECT timestamp, id, status FROM (...)
WHERE prev_status IS NULL OR prev_status != status
ORDER BY timestamp, id
simply keeps the rows where prev_status is NULL (the first row for a given id) or where prev_status differs from status (there was a status change between consecutive events). The ORDER BY is added just to make visual inspection easier.
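For completeness, the same logic can also be expressed with the DataFrame API instead of raw SQL. A rough equivalent (a sketch, assuming the same events DataFrame as above):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag
import sqlContext.implicits._

// Same window as the SQL version: one frame per id, ordered by timestamp
val byId = Window.partitionBy("id").orderBy("timestamp")

val changes = events
  .withColumn("prev_status", lag("status", 1).over(byId))
  // Keep the first event per id and genuine status transitions
  .filter($"prev_status".isNull || $"prev_status" !== $"status")
  .select("timestamp", "id", "status")
  .orderBy("timestamp", "id")

changes.show()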