Cassandra throws Bad Request: Batch with conditions cannot span multiple tables

The Cassandra docs and blog posts say that batches with conditional update statements work at the granularity of the partition, and a partition is defined by the first key in the primary key.
So I have the following tables:
CREATE TABLE SOCIAL_PROFILE (
soc_net_type text,
soc_net_user_id text,
user_prof_id text,
PRIMARY KEY (soc_net_type, soc_net_user_id));
CREATE TABLE SOCIAL_PROFILE_CONTACT (
soc_prof_soc_net_type text,
soc_prof_soc_net_user_id text,
soc_net_user_id text,
PRIMARY KEY (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id));
And this batch of inserts:
BEGIN BATCH
INSERT INTO social_profile (soc_net_type, soc_net_user_id, user_prof_id) VALUES ('vk', '1', '100') IF NOT EXISTS;
INSERT INTO social_profile_contact (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id) VALUES ('vk', '1', '2');
INSERT INTO social_profile_contact (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id) VALUES ('vk', '1', '3');
INSERT INTO social_profile_contact (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id) VALUES ('vk', '1', '4');
APPLY BATCH;
And social_profile_contact.soc_prof_soc_net_type = social_profile.soc_net_type; both columns hold the same values, so the rows should be in the same partition, but Cassandra throws:
Bad Request: Batch with conditions cannot span multiple tables
I haven't found a word about tables in the docs. What am I doing wrong?
cqlsh 4.1.1 | Cassandra 2.0.11 | CQL spec 3.1.1 | Thrift protocol 19.39.0

"Batch with conditions cannot span multiple tables"
Two different column families (tables) using the same primary key are still two different partitions.

You need to move the statement with IF NOT EXISTS into a separate batch (or run it on its own):
INSERT INTO social_profile (soc_net_type, soc_net_user_id, user_prof_id) VALUES ('vk', '1', '100') IF NOT EXISTS;
A batch with IF NOT EXISTS conditions can only contain statements for a single table.
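A minimal sketch of the split, assuming the application checks the [applied] column that Cassandra returns for conditional statements before writing the contacts:
-- Step 1: run the conditional insert on its own
INSERT INTO social_profile (soc_net_type, soc_net_user_id, user_prof_id)
VALUES ('vk', '1', '100') IF NOT EXISTS;
-- Step 2: only if [applied] = True, write the contacts in a plain (unconditional) batch
BEGIN BATCH
INSERT INTO social_profile_contact (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id) VALUES ('vk', '1', '2');
INSERT INTO social_profile_contact (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id) VALUES ('vk', '1', '3');
INSERT INTO social_profile_contact (soc_prof_soc_net_type, soc_prof_soc_net_user_id, soc_net_user_id) VALUES ('vk', '1', '4');
APPLY BATCH;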

Related

How to use ORDER BY (sorting) on a secondary index using Cassandra DB

My table schema is:
CREATE TABLE users
(user_id BIGINT PRIMARY KEY,
user_name text,
email_ text);
I inserted the rows below into the table.
INSERT INTO users(user_id, email_, user_name)
VALUES(1, 'abc@test.com', 'ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(2, 'abc@test.com', 'ZYX ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(3, 'abc@test.com', 'Test ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(4, 'abc@test.com', 'C ABC');
To search the user_name column, I created an index so I could use the LIKE operator with '%...%' patterns:
CREATE CUSTOM INDEX idx_users_user_name ON users (user_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};
Problem 1:
When I execute the query below, it returns only 3 records instead of 4.
select *
from users
where user_name like '%ABC%';
Problem 2:
When I use the query below, it gives this error:
ERROR: com.datastax.driver.core.exceptions.InvalidQueryException:
ORDER BY with 2ndary indexes is not supported.
Query:
select *
from users
where user_name like '%ABC%'
ORDER BY user_name ASC;
My requirement is to filter on user_name and order the results by user_name.
The first query does work correctly for me using cassandra:latest, which is now cassandra:3.11.3. You might want to double-check the inserted data (or just recreate it from scratch using the CQL statements you provided).
The second one gives you enough info: ordering by secondary indexes is not supported in Cassandra. You will have to sort the result set in your application.
That being said, I would not recommend running this setup in real apps. With some additional scale (when you have many records) it will be suicide performance-wise. I won't go into much detail, since you may already understand this and SO is not a wiki/documentation site, so here is a link.
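If sorted-by-name results are a hard requirement, the usual Cassandra approach is to denormalize into a second table whose clustering key is user_name, so rows come back pre-sorted without any index. A rough sketch; the single 'all' bucket is my own simplification and would itself become a hotspot with many users:
CREATE TABLE users_by_name (
bucket text,
user_name text,
user_id bigint,
email_ text,
PRIMARY KEY (bucket, user_name, user_id)
) WITH CLUSTERING ORDER BY (user_name ASC, user_id ASC);
-- Rows within the partition are stored in user_name order:
SELECT * FROM users_by_name WHERE bucket = 'all';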

CQL table design for temporal data

As a Cassandra novice, I have a CQL design question. I want to reuse a concept I've built before using RDBMS systems to keep a history of customerData. The customer himself will only see the latest version, so that read should be the fastest, but queries over the whole history can also be performed.
My suggested entity properties:
customerId text,
validFromDate date,
validUntilDate date,
customerData text
The first save of customerData just INSERTs it with validFromDate=NOW and validUntilDate=31-12-9999.
Subsequent saves of customerData change the last record, setting validUntilDate=NOW, and INSERT the new customerData with validFromDate=NOW and validUntilDate=31-12-9999.
Result:
This way, a query on (customerId, validUntilDate) = (id, 31-12-9999) will give the last saved version.
A query on (customerId) will give the whole history.
To query customerData at a certain time t, just use a query with validFromDate < t < validUntilDate.
My guess is PARTITION_KEY = customerId and the CLUSTERING KEY can be validFromDate. Or use PRIMARY KEY = customerId. Or I could create two tables: one for fast querying of the latest version (no history), and another for historical analyses.
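In CQL, that first idea would look roughly like this (the table name is just illustrative):
CREATE TABLE customer_data_history (
customerId text,
validFromDate date,
validUntilDate date,
customerData text,
PRIMARY KEY (customerId, validFromDate)
);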
How would you design this the CQL way? I think I'm thinking too RDBMS-ish.
Use the change timestamp as the CLUSTERING KEY with DESC order, e.g.:
CREATE TABLE customer_data_versions (
id text,
change_time timestamp,
name text,
PRIMARY KEY (id, change_time)
) WITH CLUSTERING ORDER BY ( change_time DESC );
It will allow you to store data versions per customer id in descending order.
Insert two versions for the same id:
INSERT INTO customer_data_versions (id, change_time, name) VALUES ('id1', totimestamp(now()),'John');
INSERT INTO customer_data_versions (id, change_time, name) VALUES ('id1', totimestamp(now()),'John Doe');
Get last saved version:
SELECT * FROM customer_data_versions WHERE id='id1' LIMIT 1;
Get all versions for the id:
SELECT * FROM customer_data_versions WHERE id='id1';
Get versions between dates:
SELECT * FROM customer_data_versions WHERE id='id1' AND change_time <= before_date AND change_time >= after_date;
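For example, with literal timestamps in place of the before_date/after_date placeholders:
SELECT * FROM customer_data_versions
WHERE id='id1'
AND change_time >= '2017-10-01 00:00:00'
AND change_time <= '2017-10-31 00:00:00';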
Please note, there are some limits on partition size (i.e. how many versions you will be able to store per customer id):
Cells in a partition: ~2 billion (2^31); single column value size: 2 GB (1 MB is recommended).

Nested map in Cassandra data modelling

I have the following requirement for my dataset and need to understand which datatype I should use and how to save my data accordingly:
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {'2017-10-09 04:10:05': 0},
'Dig2': {'2017-10-09 04:11:05': 0, '2017-10-09 04:15:05': 0},
'Dig3': {'2017-10-09 04:11:05': 0, '2017-10-09 04:15:05': 0, '2017-10-09 04:11:05': 0}
}
This gives an error:
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in a single column? Please suggest a datatype and an insert command for it.
Thanks
There is a limitation in Cassandra: you can't nest a collection (or UDT) inside a collection without making it frozen. So you need to "freeze" one of the collections - either the nested one:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or the top-level one:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See the documentation for more details.
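With the first layout (frozen inner maps), an insert and a per-key update might look like this, reusing the sample values from the question:
INSERT INTO events (id, evntoverlap)
VALUES ('id1', {'Dig1': {'2017-10-09 04:10:05': 0}});
-- The outer map is non-frozen, so a single key can be replaced in place:
UPDATE events
SET evntoverlap['Dig2'] = {'2017-10-09 04:11:05': 0, '2017-10-09 04:15:05': 0}
WHERE id = 'id1';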
CQL collections are limited to 64 KB, so if you are putting maps in maps you might push that limit. Especially with frozen maps, you are deserializing the entire map, modifying it, and re-inserting it. You might be better off with:
CREATE TABLE events (
id text,
evnt_key text,
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
);
Or even:
CREATE TABLE events (
id text,
evnt_key text,
evnt_time timestamp,
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
);
That would be more efficient and safer, while giving additional benefits like being able to read the evnt_time values in ascending or descending order.
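With the fully flattened layout, each (key, timestamp) pair from the original nested map becomes one row; a sketch using the question's sample values:
INSERT INTO events (id, evnt_key, evnt_time, value)
VALUES ('id1', 'Dig1', '2017-10-09 04:10:05', 0);
INSERT INTO events (id, evnt_key, evnt_time, value)
VALUES ('id1', 'Dig2', '2017-10-09 04:11:05', 0);
-- All timestamps for one key, returned in clustering order:
SELECT evnt_time, value FROM events WHERE id = 'id1' AND evnt_key = 'Dig2';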

Cassandra: List all tables in keyspace based on restriction such as LIKE or CONTAINS?

I have many tables per keyspace, so I would like to filter the tables based on some restriction criteria. I tried this query, but it is not giving the result that I want:
SELECT table_name FROM system_schema.tables
WHERE keyspace_name = 'test'
and table_name >= 'test_001_%';
The output shown is:
'table_name'
---------------------
'test_001_metadata'
'test_001_time1'
'test_001_time2'
'test_001_time3'
'test_001_time4'
'test_002_metadata'
'test_002_time1'
'test_002_time2'
'test_002_time3'
What I really want is:
'table_name'
---------------------
'test_001_metadata'
'test_001_time1'
'test_001_time2'
'test_001_time3'
'test_001_time4'
The other way out is to use the LIKE keyword by creating a secondary index on table_name. But I am a bit skeptical, since it might cause problems as it is a system table. Another concern: does a clustering column ACTUALLY support a secondary index?
Create a SASI index with mode CONTAINS on the table_name column (after removing the previous index) and try the query as:
SELECT table_name FROM system_schema.tables
WHERE keyspace_name = 'test'
and table_name LIKE '%test_001_%';
The command to create a SASI index with mode contains is as follows:
CREATE CUSTOM INDEX ON system_schema.tables(table_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false', 'tokenization_normalize_uppercase': 'true', 'mode': 'CONTAINS'};
And for your second question: you cannot create a secondary index on anything that is part of the PRIMARY KEY.
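Note that system keyspaces are usually not user-modifiable, so if the index cannot be created there, a simple fallback is to fetch all names for the keyspace and filter client-side:
SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'test';
-- then keep only the names containing 'test_001_' in your application
-- (or pipe cqlsh output through grep)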

Cassandra: Inserting value in UDT

I am trying to insert values into a UDT but am getting this error message:
message="unconfigured columnfamily my_object"
Below is my statement:
INSERT INTO home.my_object (id,type,quantity ,critical,page_count,stock,outer_envelope ) VALUES ('3.MYF','COM','D','A','VV','','');
What am I doing wrong?
That error means that the keyspace "home" exists, but does not contain a table (column family) called "my_object". I also noticed that your insert statement does not contain a UDT literal.
UDTs define a type, but you must also define a table with a column of that type before inserting any data. I assume your UDT is called "my_object". Try this:
create table home.test (key int primary key, object frozen<my_object>);
insert into home.test (key, object) values (0, {id: 'value', type: 'othervalue'});
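For completeness, the type itself must exist before the table that uses it. A hypothetical definition matching the columns named in the original INSERT (all fields assumed to be text):
create type home.my_object (
id text,
type text,
quantity text,
critical text,
page_count text,
stock text,
outer_envelope text
);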
