SELECT COLUMN which has null values (Cassandra 3.11.3)

I have a table (table1) with 14 columns, of which 10 have data. I need to import data into the remaining 4 columns from another table (for now these 4 columns have empty/null values).
My teammate has written code to perform the import from the other table, but he is facing some issues and has asked me for a select query that will display the columns which have a null/empty dataset (in this example, the 4 columns with null/empty data).
I have tried the select query below, where I have used DISTINCT with the partition key.
SELECT distinct host_name from table1 WHERE empty_column = '' ;
Note: the column host_name is the primary/partition key, and empty_column is the column which does not have any value (a null/empty dataset).
Getting error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="SELECT DISTINCT with WHERE clause only supports restriction by partition key and/or static columns."
Please help...
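One possible workaround, sketched below on the assumption that empty_column is a text column and that a full scan of table1 is acceptable, is to drop DISTINCT and filter on the empty value with ALLOW FILTERING. Note that this matches empty strings only; cells that were never written (true nulls) cannot be matched with an equality predicate in CQL.
SELECT host_name, empty_column FROM table1 WHERE empty_column = '' ALLOW FILTERING;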

Related

Cassandra Range queries on Map values using timestamp

I have the following Cassandra table.
create table person(
id int PRIMARY KEY,
name text,
imp_dates map<text,timestamp>
);
Data is inserted as below:
insert into person(id,name,imp_dates) values(1,'one',{'birth':'1982-04-01','marriage':'2018-04-01'});
insert into person(id,name,imp_dates) values(2,'two',{'birth':'1980-04-01','marriage':'2010-04-01'});
insert into person(id,name,imp_dates) values(3,'three',{'birth':'1980-04-01','graduation':'2012-04-01'});
id | name | imp_dates
----+-------+-----------------------------------------------------------------------------------------------
1 | one | {'birth': '1982-03-31 18:30:00.000000+0000', 'marriage': '2018-03-31 18:30:00.000000+0000'}
2 | two | {'birth': '1980-03-31 18:30:00.000000+0000', 'marriage': '2010-03-31 18:30:00.000000+0000'}
3 | three | {'birth': '1980-03-31 18:30:00.000000+0000', 'graduation': '2012-03-31 18:30:00.000000+0000'}
I have a requirement to write a query like the one below. This requires a range restriction on the map value column.
select id,name,imp_dates from person where id =1 and imp_dates['birth'] < '2000-04-01';
I get following error
Error from server: code=2200 [Invalid query] message="Only EQ relations are supported on map entries"
The possible solutions I can think of are:
1) Flatten the map into multiple columns and make them part of the primary key. This will work, but it is not flexible, since I may have to alter the schema.
2) Create another table, person_id_by_important_dates, to replace the map, but then I lose read consistency, as I have to read from two tables and join them myself (a sketch of this option appears at the end of this question).
I do not wish to include imp_dates (the map) as part of the primary key, as it would create a new row every time I insert new values.
I'd appreciate help with this.
Thanks
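A rough sketch of option 2, using an illustrative table name and layout (not taken from the post), keeps the date type and value as clustering columns so that the range restriction becomes an ordinary clustering-key query:
CREATE TABLE person_dates_by_type (
id int,
date_type text,
date_value timestamp,
name text static,
PRIMARY KEY (id, date_type, date_value)
);
SELECT id, name, date_type, date_value FROM person_dates_by_type
WHERE id = 1 AND date_type = 'birth' AND date_value < '2000-04-01';
The trade-off mentioned in the question remains: the application has to write to both tables and read from this one for date-range lookups.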

How can I run SELECT DISTINCT for any column in CQLSH?

Every time I try to run SELECT DISTINCT %column_name from %table_name I receive
InvalidRequest: Error from server: code=2200 [Invalid query] message="SELECT DISTINCT queries must only request partition key columns and/or static columns (not specified %column_name)"
You can run SELECT DISTINCT only on partition key columns (and any static columns). For example, if your schema looks like:
CREATE TABLE artist (
id int PRIMARY KEY,
band_name text,
name text,
role text
);
Then the query would be:
SELECT DISTINCT id FROM artist;
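The error message also mentions static columns: a static column may appear in SELECT DISTINCT alongside the partition key. A small sketch with an illustrative schema (not the one from the question):
CREATE TABLE artist_by_band (
band_name text,
member_name text,
label text static,
PRIMARY KEY (band_name, member_name)
);
SELECT DISTINCT band_name, label FROM artist_by_band;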

Is this type of counter table definition valid?

I want to create a table with wide partitions (or, put another way, a table which has no value columns (non-primary-key columns)) that allows the number of rows in any of its partitions to be retrieved efficiently. Here is a simple definition of such a table:
CREATE TABLE IF NOT EXISTS test_table
(
partitionKeyCol timestamp,
clusteringCol timeuuid,
partitionRowCountCol counter static,
PRIMARY KEY (partitionKeyCol, clusteringCol)
);
The problem with this definition, and others structured like it, is that their validity cannot be clearly deduced from the information contained in the docs.
What the docs do state (with regard to counters):
A counter column can neither be specified as part of a table's PRIMARY KEY, nor used to create an INDEX
A counter column can only be defined in a dedicated counter table (which I take to be a table which solely has counter columns defined as its value columns)
What the docs do not state (with regard to counters):
The ability of a table to have a static counter column defined for it (given the unique write path of counters, I feel that this is worth mentioning)
The ability of a table, which has zero value columns defined for it (making it a dedicated counter table, given my understanding of the term), to also have a static counter column defined for it
Given the information on this subject that is present in (and absent from) the docs, such a definition appears to be valid. However, I'm not sure how that is possible, given that the updates to partitionRowCountCol would require use of a write path different from that used to insert (partitionKeyCol, clusteringCol) tuples.
Is this type of counter table definition valid? If so, how are writes to the table carried out?
It looks like a table with this structure can be defined, but I'm struggling to find a good use case for it. It seems there is no way to actually write to that clustering column.
CREATE TABLE test.test_table (
a timestamp,
b timeuuid,
c counter static,
PRIMARY KEY (a, b)
);
cassandra#cqlsh:test> insert into test_table (a,b,c) VALUES (unixtimestampof(now()), now(), 3);
InvalidRequest: code=2200 [Invalid query] message="INSERT statements are not allowed on counter tables, use UPDATE instead"
cassandra#cqlsh:test> update test_table set c = c + 1 where a=unixtimestampof(now());
cassandra#cqlsh:test> update test_table set c = c + 1 where a=unixtimestampof(now());
cassandra#cqlsh:test> select * from test_table;
a | b | c
--------------------------+------+---
2016-03-24 15:04:31+0000 | null | 1
2016-03-24 15:04:37+0000 | null | 1
(2 rows)
cassandra#cqlsh:test> update test_table set c = c + 1 where a=unixtimestampof(now()) and b=now();
InvalidRequest: code=2200 [Invalid query] message="Invalid restrictions on clustering columns since the UPDATE statement modifies only static columns"
cassandra#cqlsh:test> insert into test_table (a,b) VALUES (unixtimestampof(now()), now());
InvalidRequest: code=2200 [Invalid query] message="INSERT statements are not allowed on counter tables, use UPDATE instead"
cassandra#cqlsh:test> update test_table set b = now(), c = c + 1 where a=unixtimestampof(now());
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY part b found in SET part"
What is it you're trying to model?
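If the goal is simply to keep a per-partition row count, a more conventional sketch (illustrative names; the application would increment it on every insert into the data table) drops the unusable clustering column entirely:
CREATE TABLE test.partition_row_count (
partitionKeyCol timestamp PRIMARY KEY,
rowCount counter
);
UPDATE test.partition_row_count SET rowCount = rowCount + 1
WHERE partitionKeyCol = '2016-03-24 15:04:31+0000';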

Cassandra queries performance, ranges

I'm quite new to Cassandra, and I was wondering whether there would be any performance impact if a query uses "date = '2015-01-01'" versus "date >= '2015-01-01' AND date <= '2015-01-01'".
The only reason I want to use ranges like that is that I need to make multiple queries and I want to have them prepared (as in prepared statements). This way the number of prepared statements is cut in half.
The primary keys are ((key1, key2), date) and (key1, date, key2) in the two tables where I want to use this. The query for the first table is similar to:
SELECT * FROM table1
WHERE key1 = val1
AND key2 = val2
AND date >= date1 AND date <= date2
For a PRIMARY KEY (key1, date, key2) that type of query just isn't possible. If you try it, you'll see an error like this:
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column
"key2" cannot be restricted (preceding column "date" is either not
restricted or by a non-EQ relation)"
Cassandra won't allow you to filter by a PRIMARY KEY component if the preceding column(s) are filtered by anything other than the equals operator.
On the other hand, your queries for PRIMARY KEY ((key1, key2), date) will work and perform well. The reason is that Cassandra uses the clustering key(s) (date in this case) to specify the on-disk sort order of data within a partition. As you are specifying the partition keys (key1 and key2), your result set will be sorted by date, allowing Cassandra to satisfy your query by performing a continuous read from disk.
Just to test that out, I'll even run two queries on a table with a similar key, and turn tracing on:
SELECT * FROM log_date2 WHERE userid=1001
AND time > 32671010-f588-11e4-ade7-21b264d4c94d
AND time < a3e1f750-f588-11e4-ade7-21b264d4c94d;
Returns 1 row and completes in 4068 microseconds.
SELECT * FROM log_date2 WHERE userid=1001
AND time=74ad4f70-f588-11e4-ade7-21b264d4c94d;
Returns 1 row and completes in 4001 microseconds.
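Tying this back to the original motivation: a single prepared statement of the range form can also serve the exact-date case by binding the same date to both markers. A sketch, using standard CQL bind markers:
SELECT * FROM table1
WHERE key1 = ? AND key2 = ?
AND date >= ? AND date <= ?;
For an exact-date lookup, bind the same date to both of the last two markers; for a true range, bind the two endpoints.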

cassandra, select via a non primary key

I'm new to Cassandra and I've run into a problem. I created a keyspace demodb and a table users. This table has 3 columns: id (int, the primary key), firstname (varchar), and name (varchar).
This request returns the correct result:
SELECT * FROM demodb.users WHERE id = 3;
but this one:
SELECT * FROM demodb.users WHERE firstname = 'francois';
doesn't work and I get the following error message:
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on the restricted columns support the provided operators: "
This request also doesn't work:
SELECT * FROM users WHERE firstname = 'francois' ORDER BY id DESC LIMIT 5;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
Thanks in advance.
This request also doesn't work:
That's because you are misunderstanding how sort order works in Cassandra. Instead of using a secondary index on firstname, create a table specifically for this query, like this:
CREATE TABLE usersByFirstName (
id int,
firstname text,
lastname text,
PRIMARY KEY (firstname,id));
This query should now work:
SELECT * FROM usersByFirstName WHERE firstname='francois'
ORDER BY id DESC LIMIT 5;
Note that I have created a compound primary key on firstname and id. This will partition your data on firstname (allowing you to query by it), while also clustering your data by id. By default, your data will be clustered by id in ascending order. To alter this behavior, you can specify a CLUSTERING ORDER in your table creation statement:
WITH CLUSTERING ORDER BY (id DESC)
...and then you won't even need an ORDER BY clause.
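For completeness, the full table definition with that clause would look like this:
CREATE TABLE usersByFirstName (
id int,
firstname text,
lastname text,
PRIMARY KEY (firstname, id))
WITH CLUSTERING ORDER BY (id DESC);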
I recently wrote an article on how clustering order works in Cassandra (We Shall Have Order). It explains this, and covers some ordering strategies as well.
There is one constraint in Cassandra: any field you want to use in the WHERE clause has to be part of the primary key of the table, or there must be a secondary index on it. So you have to create an index on firstname, and only after that can you use firstname in the WHERE condition and get the result you were expecting.
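A minimal sketch of that approach, using the keyspace and table names from the question:
CREATE INDEX ON demodb.users (firstname);
SELECT * FROM demodb.users WHERE firstname = 'francois';
Keep in mind the earlier error message still applies: ORDER BY combined with a secondary index is not supported, so the sorted five-row query needs the dedicated table shown above.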
