CQL: Bad Request: Missing CLUSTERING ORDER for column - cassandra

What is the problem with this CQL query?
cqlsh> create table citybizz.notifications(
... userId varchar,
... notifId UUID,
... notification varchar,
... time bigint,read boolean,
... primary key (userId, notifId,time)
... ) with clustering order by (time desc);
It throws Bad Request: Missing CLUSTERING ORDER for column notifid. I am using Cassandra 1.2.2.

You need to specify the order for notifId too:
create table citybizz.notifications(
userId varchar,
notifId UUID,
notification varchar,
time bigint,read boolean,
primary key (userId, notifId,time)
) with clustering order by (notifId asc, time desc);
Cassandra doesn't assume a default ordering (asc) for the other clustering keys, so you need to specify it.
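One consequence of the table above is that rows inside a partition are sorted by notifId first, so time desc only orders rows that share the same notifId. If the goal is to read a user's newest notifications first, one possible variant is to put time ahead of notifId in the primary key. This is only a sketch and not part of the original answer; the table name notifications_by_time and the value 'alice' are made up for illustration.

```cql
-- Hypothetical variant: time is the first clustering column, notifId breaks ties.
CREATE TABLE citybizz.notifications_by_time (
    userId varchar,
    time bigint,
    notifId uuid,
    notification varchar,
    read boolean,
    PRIMARY KEY (userId, time, notifId)
) WITH CLUSTERING ORDER BY (time DESC, notifId ASC);

-- Newest ten notifications for one user ('alice' is an example value).
SELECT notifId, notification, time
FROM citybizz.notifications_by_time
WHERE userId = 'alice'
LIMIT 10;
```

Because the clustering order already stores the newest rows first, the SELECT needs no ORDER BY clause.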

Related

Cassandra Invalid Query: Some cluster keys are missing

I'm using Cassandra 3.0.
My table was created with this query, but when I try to insert data into the table, I get the error: 'Some cluster keys are missing: created'
Table Structure:
CREATE TABLE db.feed (
action_object_id int,
owner_id int,
created timeuuid,
action_object text,
action_object_type int,
actor text,
feed_type text,
target text,
target_type int,
verb text,
PRIMARY KEY (action_object_id, owner_id, created)
) WITH CLUSTERING ORDER BY (owner_id ASC, created ASC)
You must provide values for all the primary key columns: action_object_id, owner_id, and created must all be listed in your insert query.
Ex: insert into db.feed(action_object_id, owner_id, created, ...) values (?,?,?,...). You also cannot provide NULL values for primary key columns, so created cannot be null.
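As a concrete illustration of the point above, here is a minimal sketch of an insert that supplies every primary key column. The literal values are invented for the example; now() is the built-in CQL function that generates a timeuuid, which fits the created column.

```cql
-- All three primary key columns are present; now() fills the timeuuid column.
-- The values 1, 42, 'posted' and 'user:42' are example data only.
INSERT INTO db.feed (action_object_id, owner_id, created, verb, actor)
VALUES (1, 42, now(), 'posted', 'user:42');
```

Leaving created out of the column list, or binding null to it, would produce the error from the question.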

Is cassandra suitable for analytics storing?

I'm willing to develop an open-source analytics project which will store visits, referers, devices (by kind, family etc.).
I'm fairly new to the cassandra world so I'm asking a lot of questions about modeling with it.
I have read a lot of documentation about it, here is a part of my datamodel:
create table visits(
id UUID,
remote_addr VARCHAR,
method VARCHAR,
user_agent VARCHAR,
status_code INT,
host VARCHAR,
protocol VARCHAR,
path VARCHAR,
data VARCHAR,
headers VARCHAR,
query_string VARCHAR,
referer_id UUID,
device_id UUID,
browser_id UUID,
platform_id UUID,
created_at TIMEUUID,
PRIMARY KEY (id, created_at) ) WITH CLUSTERING ORDER BY (created_at DESC);
create table referers(
id UUID PRIMARY KEY,
host VARCHAR,
path VARCHAR,
first_seen TIMESTAMP,
last_seen TIMESTAMP,
seen_count INT );
create table browsers(
id UUID PRIMARY KEY,
key VARCHAR,
version VARCHAR,
first_seen TIMESTAMP,
last_seen TIMESTAMP,
seen_count INT );
create table platforms(
id UUID PRIMARY KEY,
key VARCHAR,
version VARCHAR,
first_seen TIMESTAMP,
last_seen TIMESTAMP,
seen_count INT );
With this model, if I want for example "all visits from status_code 200" I will have to create a secondary index, same for referers, devices, etc.
So do I need to create individual tables "visits_by_referers", "visits_by_devices" like so:
create table visits_by_referers(
visit_id UUID,
device_id UUID,
PRIMARY KEY (visit_id, device_id)
);
or am I completely wrong and cassandra is not suitable for this?
Thank you :)
Until 3.0 comes out with Materialized Views (https://issues.apache.org/jira/browse/CASSANDRA-6477), which will be HUGE for this type of use case, you need to create individual tables for things like 'visits by referrer' if you plan on doing direct querying.
What a lot of people tend to do is use a single large table, and then overlay something like Spark to actually read the data into memory and do much more complicated querying.
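A query table of the kind described above might look like the following. This is a sketch under assumptions: the table name visits_by_referer and the choice of copied columns are hypothetical, and the application would have to write each visit to both visits and this table.

```cql
-- Hypothetical query table: one partition per referer, newest visits first.
CREATE TABLE visits_by_referer (
    referer_id UUID,
    created_at TIMEUUID,
    visit_id UUID,
    status_code INT,
    path VARCHAR,
    PRIMARY KEY (referer_id, created_at, visit_id)
) WITH CLUSTERING ORDER BY (created_at DESC, visit_id ASC);

-- Latest 100 visits for one referer (the UUID is an example value).
SELECT visit_id, status_code, path
FROM visits_by_referer
WHERE referer_id = 123e4567-e89b-12d3-a456-426655440000
LIMIT 100;
```

The same pattern would be repeated for devices, browsers, and so on, with one table per query you want to serve directly.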

ORDER BY reloaded, cassandra

I would like to sort a given column family, and to do this I am trying to create a table with the CLUSTERING ORDER BY option. I always encounter the following errors:
1.) Variant A resulting in
Bad Request: Missing CLUSTERING ORDER for column userid
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
2.) Variant B resulting in
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
As far as I can see in the manual this is the correct syntax for creating a table for which I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname". How could I achieve this? (The column 'lastname' I would like to keep as the first part of the primary key, so that I could use it in delete statements with the WHERE-clause.)
Thanks a lot, Tamas
You can only specify clustering order on your clustering keys.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
In your first example, your only clustering key is userID. Thus, it is the only valid entry for CLUSTERING ORDER BY.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
The second example fails because you are specifying your partition key in CLUSTERING ORDER BY, and that's not going to work either.
Cassandra works by ordering CQL rows according to clustering keys, but only when a partition key is specified. This is because the whole idea of Cassandra wide-row modeling is to query by partition key, and read a series of ordered rows in one query operation.
I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname".
Given this statement, I am going to suggest that you need another column in this model before it will work the way you want. What you need is an appropriate partition key for your users table. Say...like group. With your users partitioned by group, and clustered by lastname, your definition would look something like this:
CREATE TABLE test.usersbygroup (
userID timeuuid,
firstname varchar,
lastname varchar,
group text,
PRIMARY KEY (group,lastname)
)WITH CLUSTERING ORDER BY (lastname desc);
Then, this query will work, returning users (in this case) who are fans of the show "Firefly," ordered by lastname (descending):
SELECT * FROM usersbygroup WHERE group='Firefly Fans';
Read through this DataStax doc on Compound Keys and Clustering to get a better understanding.
NOTE: You don't need to specify ORDER BY in your SELECT. The rows will come back ordered by their clustering key(s), and ORDER BY cannot change that. All ORDER BY can really do, is alter the sort direction (DESCending vs. ASCending).
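To illustrate the note above with the usersbygroup table from this answer: the stored order is lastname descending, and ORDER BY can only flip it for a query that restricts the partition key. A minimal sketch:

```cql
-- Stored order is lastname DESC; this returns the same rows in ascending order.
SELECT * FROM usersbygroup
WHERE group = 'Firefly Fans'
ORDER BY lastname ASC;
```

One caveat about that table, offered as an observation rather than as part of the original answer: with PRIMARY KEY (group, lastname), two users in the same group who share a last name occupy the same row, so adding userID as a trailing clustering column is one way to keep them distinct.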
Clustering order applies only within a partition. In your case the primary key is (lastname, userID), so Cassandra stores rows sorted by userID inside each lastname partition, and you need to supply both values to retrieve a specific row. This schema still isn't useful if you want to sort all the data in the table by last name: userID is unique (a timeuuid), so the clustering key does nothing for that.
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY (bucket)
)WITH CLUSTERING ORDER BY (lastname desc);
Here, if you give every user record the same bucket value, say 1, all users go into the same bucket, and a query on that bucket retrieves all rows sorted by last name. (By no means is this a good design; it is just to give you an idea.)
Revised (the definition above is rejected because lastname is not a clustering column in it):
CREATE TABLE user1 (
userID uuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY ((bucket), lastname,userID)
)WITH CLUSTERING ORDER BY (lastname desc);
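A short usage sketch for the revised user1 table follows. The names and UUIDs are example data, and bucket = 1 is the single illustrative bucket value mentioned above.

```cql
-- Every row uses bucket = 1, so all users land in one partition.
INSERT INTO user1 (bucket, lastname, userID, firstname)
VALUES (1, 'Reynolds', 123e4567-e89b-12d3-a456-426655440000, 'Malcolm');
INSERT INTO user1 (bucket, lastname, userID, firstname)
VALUES (1, 'Washburne', 123e4567-e89b-12d3-a456-426655440001, 'Zoe');

-- Rows come back in descending lastname order, as defined by the clustering order.
SELECT lastname, firstname FROM user1 WHERE bucket = 1;
```

Putting every row in one partition concentrates all reads and writes on the same replicas, which is why the answer calls this an illustration rather than a good design.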

How to design the cassandra table for one query with a ordering and limit?

Now I created a table:
CREATE TABLE posts_by_user(
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id,post_id)
);
I want to select the last 10 rows, using the IN operator on user_id and ordering by the post_at field.
Also I read a good article:
http://planetcassandra.org/blog/the-in-operator-in-cassandra-cql/
I cannot use the query WHERE post_at = time AND user_id IN (1,2), because I need all notes, not only those for a specific date.
How can I change my schema design? Thank you.
I changed it to:
CREATE TABLE posts_by_user (
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id, post_at)
) WITH CLUSTERING ORDER BY (post_at DESC);
I think this is good...
How about using this approach: http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use-slice-partition.html
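For the "last 10 rows" requirement, here is one hedged sketch that builds on the revised table in the question. Two assumptions are mine and not from the thread: post_id is added as a trailing clustering column so that two posts by the same user with an identical timestamp do not overwrite each other, and the per-user results are merged in the application, since clustering order sorts rows within each user's partition and not across users.

```cql
-- Variant of posts_by_user with post_id as a tie-breaker (an assumption).
CREATE TABLE posts_by_user (
    user_id bigint,
    post_at timestamp,
    post_id uuid,
    PRIMARY KEY (user_id, post_at, post_id)
) WITH CLUSTERING ORDER BY (post_at DESC, post_id ASC);

-- Newest ten posts per user; run once per user and merge client-side.
SELECT post_id, post_at FROM posts_by_user WHERE user_id = 1 LIMIT 10;
SELECT post_id, post_at FROM posts_by_user WHERE user_id = 2 LIMIT 10;
```

Each query reads the head of a single partition, and picking the overall newest ten from the combined results is a small in-memory sort.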

Non-EQ relation error Cassandra - how fix primary key?

I created a table, posts. When I run this SELECT request:
return $this->db->query('SELECT * FROM "posts" WHERE "id" IN(:id) LIMIT '.$this->limit_per_page, ['id' => $id]);
I get this error:
PRIMARY KEY column "id" cannot be restricted (preceding column
"post_at" is either not restricted or by a non-EQ relation)
My table dump is:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id bigint,
name text,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY (user_id,post_at,id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
I read some articles about primary and clustering keys and understood that when the primary key has several columns, I need to use the = operator together with IN. In my case, I cannot use a single-column PRIMARY KEY. What would you advise me to change in the table structure so that the error disappears?
My dummy table structure:
CREATE TABLE posts (
id timeuuid,
post_at timestamp,
user_id bigint,
PRIMARY KEY (id,post_at,user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
After inserting some dummy data, I ran this query:
select * from posts where id in (timeuuid1,timeuuid2,timeuuid3);
I was using Cassandra 2.0 with CQL 3.0.
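For the original table in the question, the error comes from the position of id in PRIMARY KEY (user_id, post_at, id): id is the last clustering column, so it can only be restricted once user_id and post_at are restricted with =. A minimal sketch of a query in that shape, with all literal values invented for the example:

```cql
-- Restrict the partition key and the preceding clustering column with "="
-- before restricting id. The values below are example data only.
SELECT * FROM posts
WHERE user_id = 1
  AND post_at = '2015-03-01 12:00:00+0000'
  AND id = 123e4567-e89b-12d3-a456-426655440000;
```

If lookups by id alone are the main access path, making id the partition key, as in the dummy table above, avoids the need to know user_id and post_at up front.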
