Why don't the default delete and update procedures of VoltDB work on this table? - voltdb

I use the default procedures generated by VoltDB to update the table below.
schema:
create table sys_sec_user_org_role(
user_id bigint not null,
org_id integer not null,
role_id integer not null,
primary key(user_id,role_id,org_id)
);
partition table sys_sec_user_org_role on column user_id;
and then the default procedures were created successfully. I call the procedures in the following order:
insert:exec SYS_SEC_USER_ORG_ROLE.insert 2 3 4
success:modified_tuples:1
and then delete the inserted row
delete:exec SYS_SEC_USER_ORG_ROLE.delete 2 3 4
modified_tuples:0
I don't know why the default delete or update procedures don't work on this table, while they work in most other cases.

[Updated]
I work at VoltDB. Thank you for sharing this. I got the same results as you and initially thought this was a bug, but one of our engineers noticed the problem.
While the .insert procedure takes its parameters in the order the columns are defined in the table, the .update and .delete procedures are only generated if a primary key is defined, and the arguments that identify the row need to be in the order of the columns as defined in the primary key.
If you pass in the arguments in the order of the primary key columns, the default procedures will find the matching row and update or delete it.
--exec <tablename>.delete <user_id> <role_id> <org_id>;
exec SYS_SEC_USER_ORG_ROLE.delete 2 4 3;
(Returned 1 rows in 0.02s)
Here is the ticket I logged earlier for reference.
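For completeness, here is a hedged example of the default .update procedure on the same table, assuming the documented convention that its parameters are the new values for all columns in table-column order, followed by the primary key values in primary-key order (run while the row still exists):
--exec <tablename>.update <user_id> <org_id> <role_id> <user_id> <role_id> <org_id>;
exec SYS_SEC_USER_ORG_ROLE.update 2 3 4 2 4 3;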

Related

YCQL Secondary indexes on tables with TTL in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I have a table with TTL and a secondary index, using YugabyteDB 2.9.0 and I’m getting the following error when I try to insert a row:
SyntaxException: Feature Not Supported
Below is my schema:
CREATE TABLE lists.list_table (
item_value text,
list_id uuid,
created_at timestamp,
updated_at timestamp,
is_deleted boolean,
valid_from timestamp,
valid_till timestamp,
metadata jsonb,
PRIMARY KEY ((item_value, list_id))
) WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
CREATE INDEX list_created_at_idx ON lists.list_table (list_id, created_at)
WITH transactions = {'enabled': 'true'};
We have two types of queries (80% & 20% distribution):
select * from list_table where list_id= <id> and item_value = <value>
select * from list_table where list_id= <id> and created_at>= <created_at>
We expect there would be around 1000-10000 entries per list_id.
The TTL would be around 1 month.
This is a known restriction: it is currently not supported to transactionally expire rows using TTL out of a table that is indexed (i.e. atomic expiry of TTL entries in both the table and the index). There are several workarounds:
a) In YCQL, we also support an index with a weaker consistency. This is not well documented today, but you can see the details here: https://github.com/YugaByte/yugabyte-db/issues/1696
The main issue to call out when using this variant of index is error handling: on INSERT failure, it is an application-side responsibility to retry the INSERT. As noted in the above issue: << If an insert/update or batch of such operations fails, it is the app's responsibility to retry the operation so that the index is consistent. Much like in a 2-table case, it would have been the app's responsibility to retry (in case of a failure between the update to the two tables) to make sure both tables are in sync again. >>
This type of index supports a TTL at both the table and index level (it is recommended to keep the two the same): https://github.com/yugabyte/yugabyte-db/issues/2481#issuecomment-537177471
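For reference, a hedged sketch of what the weaker-consistency index from (a) could look like, based on the syntax discussed in the linked issues (verify against the documentation for your version; in this mode the base table is also created with transactions disabled):
CREATE INDEX list_created_at_idx ON lists.list_table (list_id, created_at)
WITH transactions = {'enabled': 'false', 'consistency_level': 'user_enforced'};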
b) Another workaround is to use a background cleanup job to periodically delete stale records (instead of using TTL).
c) Avoid using indexes and store the data in two tables: one organized by the original primary key and one organized by the index columns you wanted (as the primary key). Both tables can have a TTL, but it is an application-side responsibility to INSERT into both tables when data is added to the database.
d) Another workaround is to avoid the index and pick the PK to be ((list_id, item_value), created_at) (see the sketch below).
This would not affect the performance of Q1, because with both list_id and item_value provided it can use the PK to find the rows. But it would be slower for Q2, where list_id and created_at are provided: while it can still use list_id, it must filter the data on the created_at value without the help of an index. So if Q2 is really 20% of your queries, you probably do not want to scan 1k to 10k items to find your matching rows.
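A hedged DDL sketch of option (d), reusing the columns from the question (no index, so the TTL restriction does not apply; 2592000 seconds is roughly the 1-month TTL mentioned above):
CREATE TABLE lists.list_table (
item_value text,
list_id uuid,
created_at timestamp,
updated_at timestamp,
is_deleted boolean,
valid_from timestamp,
valid_till timestamp,
metadata jsonb,
PRIMARY KEY ((list_id, item_value), created_at)
) WITH default_time_to_live = 2592000;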
To clarify option (c), with the example in mind:
The first table's PK would be ((list_id, item_value)); it is the same as your current main table. Instead of an index you'll have a second table; the second table's PK would be ((list_id), created_at).
Both tables would have a TTL.
The application would have to insert entries into both tables.
In the 2nd table you have a choice:
(option 1) duplicate all the columns from the main table, including your JSON columns etc. This makes Q2 lookup fast, the row has everything it needs; but increases your storage requirements.
(option 2): in addition to the Primary Key, just store the item_value column in the second table. For Q2, you must first lookup the 2nd table and get the item_value, and then use list_id and item_value and retrieve the data from the main table (much like an index would do under the covers)
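Putting option (c), sub-option 2, into a hedged DDL sketch (the second table's name, list_by_created_at, is made up for illustration):
CREATE TABLE lists.list_table (
item_value text,
list_id uuid,
created_at timestamp,
updated_at timestamp,
is_deleted boolean,
valid_from timestamp,
valid_till timestamp,
metadata jsonb,
PRIMARY KEY ((item_value, list_id))
) WITH default_time_to_live = 2592000;
CREATE TABLE lists.list_by_created_at (
list_id uuid,
created_at timestamp,
item_value text,
PRIMARY KEY ((list_id), created_at)
) WITH default_time_to_live = 2592000;
The application inserts into both tables on every write; for Q2 it first reads list_by_created_at to get item_value, then looks up lists.list_table by (item_value, list_id).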

Can I use a counter type field as the primary key of my C* table?

When I am trying to create a table like below:
create table if not exists counter_temp(id counter PRIMARY KEY , comment text);
It is giving an error as below:
Multiple markers at this line
For a table with counter columns, all columns except the primary key must be type counter
counter type is not supported for PRIMARY KEY part
Question 1 :
What is the reason a counter column is not allowed as part of the primary key?
Question 2 :
While I am trying to create a table as below:
create table if not exists counter_temp(id uuid PRIMARY KEY, counter_t counter, comment text)
Error : Cant mix counter and non-counter columns in the same table
What is wrong here? How do I handle it the correct way?
Question 3 :
I have a table emp(emp_id counter, emp_name text) in a Dev env which has data. Now I need to copy that data into another emp(emp_id counter, emp_name text) table in a SIT env.
Can it be done, and will it copy the counter fields properly?
The short answer for question 1 is no, as communicated in the error message. But even if it were allowed, it wouldn't make sense: when you change the value of the primary key, you basically create a new row with a different primary key.
For Q2: if there is at least one counter column in the table, then all other regular columns must have type counter. If you need to add a comment field, just create a 2nd table with a UUID primary key and insert or read data to/from the 2 tables at the same time, as sketched below.
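A minimal sketch of that split, reusing the names from the question (counter_comments is a made-up table name):
CREATE TABLE IF NOT EXISTS counter_temp (id uuid PRIMARY KEY, counter_t counter);
CREATE TABLE IF NOT EXISTS counter_comments (id uuid PRIMARY KEY, comment text);
-- counters are only modified via UPDATE; the comment lives in the second table
UPDATE counter_temp SET counter_t = counter_t + 1 WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
INSERT INTO counter_comments (id, comment) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'some comment');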
For Q3: cqlsh's COPY command supports tables with counters in newer Cassandra versions (where the fix for CASSANDRA-9043 is implemented). The Spark Cassandra Connector is also able to read from tables with counters and write to them. But in both cases, make sure that the target table is empty; otherwise the new values will be added to the existing ones.
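A hedged example of the cqlsh COPY path for Q3 (keyspace names are placeholders; it assumes the Dev table is a valid counter table and the SIT table is empty):
-- on the Dev cluster
COPY dev_keyspace.emp TO 'emp.csv';
-- on the SIT cluster
COPY sit_keyspace.emp FROM 'emp.csv';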

Insert identical records into multiple tables with different primary keys

I have some billions of records with 15 fields that I want to insert into Cassandra (with the Java API). Since my queries' search key can be one of five different fields of the record (i.e. a search query on field 3, 7, 8, 13 or 14), I have created 5 identical tables with different primary keys in Cassandra (similar to the note that is mentioned in enter link description here).
Now I read a record (or a batch of records) and call "inserting into Cassandra" 5 times.
I want to know: is there a mechanism in Cassandra that lets me call "inserting into Cassandra" one time and store the record(s) into the 5 tables automatically?
For example, could the record(s) be stored in the memtable at once (from my code, by inserting once) and the Cassandra core store them in the 5 tables' SSTables?
Since Cassandra 3.0 there is support for materialized views, which could help you. But you need to design your source table carefully, as there are a number of limitations on how the structure of a materialized view can differ from its source table - most notably:
* you can add at most one column to the primary key that isn't in the primary key of the source table;
* the materialized view's primary key must contain all components of the primary key of the source table, but you can use a different order of columns in the primary key;
* all columns of the materialized view's primary key must be non-null.
You can find more details on these limitations in this blog post.
You also need to be careful when changing the partition key so you don't end up with big partitions (but you may have the same problem if you write the data manually). Also, take into account that this adds more load to the coordinator node, which will need to distribute the data to other servers when the partition key changes - when you write the data "manually", the driver sends each request directly to a replica that holds that data.
The syntax for creating materialized views is in the documentation - it is quite similar to SQL's but not exactly the same (example from the documentation):
CREATE TABLE cyclist_mv (cid UUID PRIMARY KEY,
name text, age int, birthday date, country text);
CREATE MATERIALIZED VIEW cyclist_by_age
AS SELECT age, birthday, name, country
FROM cyclist_mv
WHERE age IS NOT NULL AND cid IS NOT NULL
PRIMARY KEY (age, cid);
In this case, we move from one column in the primary key (cid) to 2 columns in the primary key (age and cid). Note the explicit check for non-NULL values in the WHERE condition.
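For example, the view can then be queried directly by its new partition key:
SELECT name, birthday, country FROM cyclist_by_age WHERE age = 18;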

Can consecutive updates to different fields on a row in Cassandra lead to inconsistency?

Assume you have a table with a column that serves as the primary (partition) key (let's say its name is "id") and the rest of the columns are "regular" (no clustering) - let's call them "field1", "field2", "field3", "field4", etc. The logic that currently exists in the system might generate 2 separate update commands to the same row. For example:
UPDATE table SET field1='value1' WHERE id='key';
UPDATE table SET field2='value2' WHERE id='key';
These commands run one after the other, at QUORUM consistency.
Occasionally, when you retrieve the row (QUORUM read) from the DB, it's as if one of the updates did not happen. Is it possible that the inconsistency is caused by this write pattern, and can it be circumvented by making one update call like this:
UPDATE table SET field1='value1',field2='value2' WHERE id='key';
This is happening on Cassandra 2.1.17
Yes, this is totally possible.
If you need to preserve the order of the two statements you can do 2 things (sketched below):
add USING TIMESTAMP to your queries and set it explicitly in client code - this will prevent the inconsistencies
use a batch
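A minimal sketch of both options, using my_table as a placeholder for the question's table (the timestamps are client-chosen microsecond values and are only illustrative):
-- option 1: explicit client-side timestamps; the higher timestamp wins per column
UPDATE my_table USING TIMESTAMP 1609459200000000 SET field1 = 'value1' WHERE id = 'key';
UPDATE my_table USING TIMESTAMP 1609459200000001 SET field2 = 'value2' WHERE id = 'key';
-- option 2: a logged batch; both mutations share the same write timestamp
BEGIN BATCH
UPDATE my_table SET field1 = 'value1' WHERE id = 'key';
UPDATE my_table SET field2 = 'value2' WHERE id = 'key';
APPLY BATCH;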
What I would have done is change the table definition:
CREATE TABLE TABLE_NAME(
id text,
field text,
value text,
PRIMARY KEY( id , field )
);
This way you don't have to worry about updates to fields for a particular key.
Your queries would be:
INSERT INTO TABLE_NAME (id , field , value ) VALUES ('key','fieldname1', 'value1' );
INSERT INTO TABLE_NAME (id , field , value ) VALUES ('key','fieldname2', 'value2' );
The drawback of this design is that if you have too much data for 'key', it would create a wide row.
For select queries -
SELECT * from TABLE_NAME where id ='key';
On client side, build your object.

How to make Cassandra have a varying column key for a specific row key?

I was reading the following article about Cassandra:
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/#.UzIcL-ddVRw
and it seemed to imply you can have varying column keys in Cassandra for a given row key. Is that true? And if it's true, how do you allow for varying column keys?
The reason I think this might be true is because, say we have a user that can like many items, we simply want the userId to be the row key. We let this row key (userId) map to all the items that specific user likes. Each user might like a different number of items. Therefore, if we could have multiple column keys, one for each itemId the user likes, then we could solve the problem that way.
Therefore, is it possible to have a varying number of Cassandra column keys for a specific row key? (And how do you do it?)
Providing an example and/or some cql code would be awesome!
The thing that is confusing me is that I have seen some .cql files and they define the schema beforehand, and it seems pretty inflexible as to how to make it dynamic, i.e. allow it to have additional columns as we please. For example:
CREATE TABLE IF NOT EXISTS results (
test blob,
tid timeuuid,
result text,
PRIMARY KEY(test, tid)
);
How can this even allow growing columns? Don't we need to specify the names beforehand anyway? Or additional custom columns as the application desires?
Yes, you can have a varying number of columns per row_key. From a relational perspective it's not obvious, but tid acts as a placeholder for the variable column key: each distinct tid value stored under the same row key becomes, in effect, a new column. Note in the INSERT statements below that no schema change is needed to add more tid values under a row key.
CREATE TABLE IF NOT EXISTS results (
test text,
tid timeuuid,
data text,
PRIMARY KEY (test, tid)
);
So in your example, you need to identify the row_key, column_key, and payload of the table.
The primary key contains both the row_key and column_key.
Test is your row_key.
tid is your column_key.
data is your payload.
The following inserts are all valid:
INSERT INTO your_keyspace.results (test, tid, data) VALUES ('row_key_1', a4a70900-24e1-11df-8924-001ff3591711, 'blob_1');
INSERT INTO your_keyspace.results (test, tid, data) VALUES ('row_key_1', a4a70900-24e1-11df-8924-001ff3591712, 'blob_2');
-- notice that the column_key changed but the row_key remained the same
INSERT INTO your_keyspace.results (test, tid, data) VALUES ('row_key_2', a4a70900-24e1-11df-8924-001ff3591711, 'blob_3');
See here
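For example, reading everything stored under one row key returns however many (tid, data) cells exist for it:
SELECT tid, data FROM your_keyspace.results WHERE test = 'row_key_1';
-- returns one row per column key (tid) stored under 'row_key_1'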
Have you thought of exploring collection support in Cassandra for handling such relations in a colocated way (e.g. on the same data node)?
Not sure if it helps, but what about keeping user id as row key and a map containing item id as key and some value?
-Vivel
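A hedged sketch of that map-based suggestion (table and column names are made up for illustration):
CREATE TABLE your_keyspace.user_likes (
user_id uuid PRIMARY KEY,
liked_items map<uuid, text>
);
-- add one liked item (item id -> arbitrary value) for a user
UPDATE your_keyspace.user_likes
SET liked_items[e7ae5cf3-d358-4d99-b900-85902fda9bb0] = 'liked'
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
Keep in mind that CQL collections are read in their entirety and are meant for relatively small numbers of elements, so the clustering-column design from the first answer scales better for large lists of likes.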
