c* schema - denormalization vs materialized view


Schema is:
col0 int,
col1 text,
col2 text,
stamp timestamp,
somemap map<text, int>
I want to query for somemap
- using col0, col1 and a range of stamp
- using col0, col1, col2 and a range of stamp
I need every value of somemap for each distinct (col0, col1, col2, stamp) to be returned by either query (i.e., for the first query I want the rows for all values of col2 to be included).
I've tried various combinations of columns for primary key but I can't find one that permits both types of queries.
I can denormalize this and create both types of tables:
- primary key ((col0, col1), stamp, col2)
- primary key ((col0, col1), col2, stamp)
What I'm hoping for is a way to use a materialized view to accomplish this.
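One possible shape for this, as a minimal sketch rather than a tested solution: keep the first table as the base table and declare the second layout as a materialized view over it. This assumes Cassandra 3.0 or later (materialized views do not exist in earlier versions and are treated as experimental in later releases). The names readings_by_stamp and readings_by_col2 are hypothetical, and whether the map column can be carried into the view should be checked against your version.

```cql
-- Base table: serves queries on col0, col1 and a range of stamp.
-- Table and view names are hypothetical.
CREATE TABLE readings_by_stamp (
    col0 int,
    col1 text,
    col2 text,
    stamp timestamp,
    somemap map<text, int>,
    PRIMARY KEY ((col0, col1), stamp, col2)
);

-- View: same partition key, clustering columns swapped, so it serves
-- queries on col0, col1, col2 and a range of stamp.
-- Every primary key column of the view needs an IS NOT NULL restriction.
CREATE MATERIALIZED VIEW readings_by_col2 AS
    SELECT col0, col1, col2, stamp, somemap
    FROM readings_by_stamp
    WHERE col0 IS NOT NULL AND col1 IS NOT NULL
      AND col2 IS NOT NULL AND stamp IS NOT NULL
    PRIMARY KEY ((col0, col1), col2, stamp);

-- First query type: all values of col2 are returned.
SELECT col2, stamp, somemap FROM readings_by_stamp
WHERE col0 = 1 AND col1 = 'a'
  AND stamp >= '2016-01-01 00:00:00+0000' AND stamp < '2016-02-01 00:00:00+0000';

-- Second query type: restricted to one col2.
SELECT stamp, somemap FROM readings_by_col2
WHERE col0 = 1 AND col1 = 'a' AND col2 = 'b'
  AND stamp >= '2016-01-01 00:00:00+0000' AND stamp < '2016-02-01 00:00:00+0000';
```

Writes go only to the base table; the view is maintained by Cassandra, which is the trade-off against writing to two denormalized tables yourself.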

Related

Cassandra wide row with every column as composite key

I have a wide row table with columns
page_id int, user_id int, session_tid timeuuid and end_time timestamp
with partition key = user_id
I need to run multiple queries on the table, some based on one column and some on another, and it turns out that I have cases with a where clause on every column.
As Cassandra doesn't allow a where clause on a non-indexed, non-key column, is it OK if I make all of the columns part of my composite key? (Currently all but the end_time column are already in the composite key, with user_id as the partition key.)

Making all columns part of the primary key will not let you apply where conditions to each column in the way you're thinking.
As a simple example, if you create a primary key such as
PK(key1, key2, key3, key4)
you won't be able to perform a query like
select * from mytable where key2 = 'xyz';
This is because you have to follow the order of the keys when combining multiple where conditions.
So valid queries with multiple where conditions are the following:
select * from mytable where key1 = 'xyz' and key2 = 'abc';
select * from mytable where key1 = 'xyz' and key2 = 'abc' and key3 = 11;
select * from mytable where key1 = 'xyz' and key2 = 'abc' and key3 = 11 and key4 = 2014;
You can only restrict keyN if you also provide keyN-1 (and, in turn, every key before it).
HTH,
Carlo

Cassandra - dates before 1970

Is there a way to support dates older than 1970 in Cassandra while still supporting date operations on them? I can only see timestamps. If we need older dates, should I simulate my own dates as longs or perhaps as strings?
CQL doesn't return anything when I issue the query:
SELECT col1 FROM table1 WHERE ts >= '1900-01-01 00:00:00+0000'

This seems to work fine with Cassandra 2.0.9:
CREATE TABLE table1 (id int, col1 int, ts timestamp, PRIMARY KEY (id, ts));
INSERT INTO table1 (id, col1, ts) values (1, 10, '2000-02-03');
INSERT INTO table1 (id, col1, ts) values (1, 20, '1960-02-03');
INSERT INTO table1 (id, col1, ts) values (1, 30, '1890-02-03');
SELECT col1 FROM table1 WHERE id = 1 and ts >= '1900-01-01 00:00:00+0000' limit 10;
Output:
col1
------
20
10
Problems in earlier versions might be related to CASSANDRA-6395 (fixed in 2.0.4), or JAVA-264 (that was later reverted).

Will composite partition & compound key affect performance in Cassandra?

Given below is the CQL for 3 tables.
All have the same column structure, but differ in how the PRIMARY KEY is set.
tab1: no composite partition key (key1 alone is the partition key)
CREATE TABLE tab1
(
key1 text,
key2 text,
key3 text,
key4 text,
data1 text,
data2 text,
data3 int,
PRIMARY KEY(key1,key2,key3,key4));
tab2: (key1,key2) forms the composite partition key
CREATE TABLE tab2
(
key1 text,
key2 text,
key3 text,
key4 int,
data1 text,
data2 text,
data3 text,
PRIMARY KEY((key1,key2),key3,key4));
tab3: (key1,key2,key3) forms the composite partition key
CREATE TABLE tab3
(
key1 text,
key2 text,
key3 text,
key4 int,
data1 text,
data2 text,
data3 text,
PRIMARY KEY((key1,key2,key3),key4));
When querying, value1, value2 and value3 are known, and key4 is specified as a range.
Sample CQL query:
select data1,data2,data3 from tab3 where key1='value1' and key2='value2' and key3='value3' and key4 > 1000 and key4 < 1000000 ;
key4 may have some 50,000 records.
Which table design is better?
Which design has better read/write performance?

If you need to support range queries over key4, then it needs to be a clustering column, so that rules out tab1. Since you're always specifying an exact value for key3, there's no need to make it a clustering column, so tab3 is a better choice than tab2. Leaving key3 in the partition key will partition your data more evenly around the cluster.

ERROR CASSANDRA: 'ascii' codec can't decode byte 0xe1 in position 27: ordinal not in range(128) cqlsh

I'm new to Cassandra and I'm having trouble inserting some rows into a database; I get the error in the title.
I use Cassandra 1.0.8 and cqlsh to make changes to my database.
Below are the steps I took before getting the error:
CREATE A COLUMN FAMILY
CREATE TABLE test (
col1 int PRIMARY KEY,
col2 bigint,
col3 boolean,
col4 timestamp
);
INSERT SEVERAL ROWS WITHOUT SPECIFYING ALL OF THE COLUMNS OF THE TABLE
insert into test (col1, col2, col3) values (1, 100, true);
insert into test (col1, col2, col3) values (2, 200, false);
SELECT FOR CHECKING THAT ROWS HAVE BEEN INSERTED CORRECTLY
select * from test;
The result is the following:
INSERT A ROW SPECIFYING A VALUE FOR col4 (NOT SPECIFIED BEFORE)
insert into test (col1, col2, col3, col4) values (3, 100, true, '2011-02-03');
SELECT FOR CHECKING THAT ROW HAS BEEN INSERTED CORRECTLY
select * from test;
This SELECT produces the error. The result is the following:
SELECT EACH COLUMN OF THE TABLE SEPARATELY
select col1 from test;
select col2 from test;
select col3 from test;
select col4 from test;
This works fine and shows the right values:
So my question is: what's the problem with the select * query? What's wrong?
Thanks in advance!
NOTE:
If I define col4 as an integer rather than a timestamp, it works. However, I've tried inserting col4 in the normalized format yyyy-mm-dd HH:mm (I tried '2011-02-03 01:05' and '2011-02-03 01:05:10') but it doesn't work.

Cassandra 1.0.8 shipped with CQL2, and that's where your problem is coming from. I managed to recreate this in 1.0.8, but it works fine with 1.2.x, so my advice is to upgrade if you can.
In C* 1.2.10:
cqlsh> update db.user set date='2011-02-03 01:05' where user='JCTYpjJlM';
cqlsh> SELECT * from db.user ;
 user      | date                     | password
-----------+--------------------------+----------
 xvkYQKerQ |                     null |      765
 JCTYpjJlM | 2011-02-03 01:05:00+0200 |      391

#mol
Weird. Try inserting col4 as an integer (convert to milliseconds first) or use the normalized format: yyyy-mm-dd HH:mm
According to the docs, you can omit the time and just input the date, but it seems that breaks something in your case.

Cassandra: Query with where clause containing greater- or lesser-than (< and >)

I'm using Cassandra 1.1.2. I'm trying to convert an RDBMS application to Cassandra. In my RDBMS application I have the following table called table1:
| Col1 | Col2 | Col3 | Col4 |
Col1: String (primary key)
Col2: String (primary key)
Col3: Bigint (index)
Col4: Bigint
This table holds over 200 million records. The most frequently used query is something like:
Select * from table where col3 < 100 and col3 > 50;
In Cassandra I used the following statements to create the table:
create table table1 (primary_key varchar, col1 varchar,
col2 varchar, col3 bigint, col4 bigint, primary key (primary_key));
create index on table1(col3);
I changed the primary key to an extra column (I calculate the key inside my application).
After importing a few records I tried to execute the following CQL:
select * from table1 where col3 < 100 and col3 > 50;
The result is:
Bad Request: No indexed columns present in by-columns clause with Equal operator
The query select col1,col2,col3,col4 from table1 where col3 = 67 works.
Google says there is no way to execute this kind of query. Is that right? Any advice on how to write such a query?

Cassandra indexes don't actually support sequential access; see http://www.datastax.com/docs/1.1/ddl/indexes for a good quick explanation of where they are useful. But don't despair; the more classical way of using Cassandra (and many other NoSQL systems) is to denormalize, denormalize, denormalize.
It may be a good idea in your case to use the classic bucket-range pattern, which lets you use the recommended RandomPartitioner and keep your rows well distributed around your cluster, while still allowing sequential access to your values. The idea in this case is that you would make a second dynamic columnfamily mapping (bucketed and ordered) col3 values back to the related primary_key values. As an example, if your col3 values range from 0 to 10^9 and are fairly evenly distributed, you might want to put them in 1000 buckets of range 10^6 each (the best level of granularity will depend on the sort of queries you need, the sort of data you have, query round-trip time, etc). Example schema for cql3:
CREATE TABLE indexotron (
rangestart int,
col3val int,
table1key varchar,
PRIMARY KEY (rangestart, col3val, table1key)
);
When inserting into table1, you should insert a corresponding row in indexotron, with rangestart = int(col3val / 1000000). Then when you need to enumerate all rows in table1 with col3 > X, you need to query up to 1000 buckets of indexotron, but all the col3vals within each bucket will be sorted. Example queries to find all table1.primary_key values for which table1.col3 < 4021 (note that these use smaller buckets of width 1000, with rangestart set to the lower bound of each bucket, so that the range spans several buckets):
SELECT * FROM indexotron WHERE rangestart = 0 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 1000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 2000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 3000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 4000 AND col3val < 4021 ORDER BY col3val;
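For the 10^6-wide buckets described earlier, the write path and a range lookup might look like the sketch below. The key 'key-a' and the column values are made up for illustration, and the bucket number is assumed to be computed by the application as int(col3val / 1000000) before each insert.

```cql
-- Hypothetical row: col3 = 1234567, so rangestart = int(1234567 / 1000000) = 1.
INSERT INTO table1 (primary_key, col1, col2, col3, col4)
VALUES ('key-a', 'x', 'y', 1234567, 10);
INSERT INTO indexotron (rangestart, col3val, table1key)
VALUES (1, 1234567, 'key-a');

-- Range 1200000 < col3 < 2500000 touches buckets 1 and 2.
-- Each query hits a single partition and returns rows sorted by col3val.
SELECT col3val, table1key FROM indexotron
WHERE rangestart = 1 AND col3val > 1200000;
SELECT col3val, table1key FROM indexotron
WHERE rangestart = 2 AND col3val < 2500000;
```

The returned table1key values can then be used to fetch the full rows from table1 by primary key.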

If col3 only ever takes a small, known set of values or ranges, you may be able to get away with a simpler table that also maps back to the initial table, e.g.:
create table table2 (col3val int, table1key varchar,
primary key (col3val, table1key));
and use
insert into table2 (col3val, table1key) values (55, 'foreign_key');
insert into table2 (col3val, table1key) values (55, 'foreign_key3');
select * from table2 where col3val = 51;
select * from table2 where col3val = 52;
...
Or
select * from table2 where col3val in (51, 52, ...);
This may be OK if your ranges aren't too large. (You could get the same effect with your secondary index as well, though secondary indexes reportedly aren't highly recommended.) You could in theory also parallelize the per-value queries on the client side.
It seems the "Cassandra way" is to have some key like "userid" that you use as the first part of all your queries, so you may need to rethink your data model. Then you can have queries like select * from table1 where userid='X' and col3val > 3, and they can work (assuming col3val is a clustering key).
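A minimal sketch of that last idea, assuming the data really does have a natural grouping key: the table name table1_by_user and the userid column are hypothetical and do not appear in the original schema. The original primary_key is kept as the last clustering column so that two rows sharing the same userid and col3val do not overwrite each other.

```cql
-- Hypothetical remodelled table: userid is the partition key,
-- col3val is the first clustering column, so range predicates on it are allowed.
CREATE TABLE table1_by_user (
    userid varchar,
    col3val bigint,
    primary_key varchar,
    col4 bigint,
    PRIMARY KEY (userid, col3val, primary_key)
);

-- Range query within a single partition.
SELECT * FROM table1_by_user
WHERE userid = 'X' AND col3val > 50 AND col3val < 100;
```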
