We have a table:
CREATE TABLE table (
col1 text,
col2 text,
col3 timestamp,
col4 int,
col5 timestamp,
PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 DESC, col3 DESC, col4 DESC);
When I try querying from this table like:
select * from table where col1 = 'something' and col3 < 'something'
and col4 = 12 limit 5 ALLOW FILTERING;
select * from table where col1 = 'something' and col4 < 23
and col3 >= 'something' ALLOW FILTERING;
I always get the error: Clustering column "col4" cannot be restricted (preceding column "col3" is restricted by a non-EQ relation).
I tried changing the clustering order of the table to col4, col3, col2, but then the second query fails and throws a similar error.
Any suggestion/advice on how to solve this problem?
We are on Cassandra 3.0.17.7.
You can use a non-equality condition only on the last clustering column restricted by the query.
For example, you can use col1 = val AND col2 <= ..., or col1 = val AND col2 = val2 AND col3 <= ..., or col1 = val AND col2 = val2 AND col3 = val3 AND col4 <= ..., but you can't put non-equality conditions on several clustering columns - that's a consequence of how Cassandra stores and reads data.
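For illustration, a quick sketch against the table above (placeholder values, not from the original answer) of which shapes are accepted:
-- Valid: equality on every clustering column preceding the range column.
select * from table where col1 = 'x' and col2 = 'y' and col3 < '2020-01-01';
select * from table where col1 = 'x' and col2 = 'y'
and col3 = '2020-01-01' and col4 <= 12;
-- Invalid: col4 is restricted while the preceding col3 has a non-EQ
-- relation, which is exactly the error above.
select * from table where col1 = 'x' and col3 < '2020-01-01' and col4 = 12;
If both query patterns are needed, the usual Cassandra approach is a second, query-specific table (e.g. clustering by col2, col4, col3) maintained alongside this one.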
I have a table that looks like this
select 'Alice' AS ID, 1 as col1, 3 as col2, -2 as col3, 9 as col4
union all
select 'Bob' AS ID, -9 as col1, 2 as col2, 5 as col3, -6 as col4
I would like to get the top 3 absolute values for each record across the four columns, and then format the output as a dictionary or STRUCT like below:
select
'Alice' AS ID, [STRUCT('col4' AS column, 9 AS value), STRUCT('col2',3), STRUCT('col3',-2)] output
union all
select
'Bob' AS ID, [STRUCT('col1' AS column, -9 AS value), STRUCT('col4',-6), STRUCT('col3',5)]
output
I would like it to be dynamic, so I want to avoid writing out the columns individually. It could go up to 100 columns, and they change.
For more context, I am trying to get the top three features from the batch local explanations output in Vertex AI
https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions
I have looked up some examples and would like something similar to the second answer here: How to get max value of column values in a record? (BigQuery)
EDIT: the data is actually structured like this. If this is easier to work with, it would be a better option to start from:
select 'Alice' AS ID, STRUCT(1 as col1, 3 as col2, -2 as col3, 9 as col4) AS featureAttributions
union all
SELECT 'Bob' AS ID, STRUCT(-9 as col1, 2 as col2, 5 as col3, -6 as col4) AS featureAttributions
Consider the query below. UNPIVOT rotates col1 through col4 into (column, value) rows, and ARRAY_AGG(... ORDER BY ABS(value) DESC LIMIT 3) then keeps the top three per ID.
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM sample_table UNPIVOT (value FOR column IN (col1, col2, col3, col4))
)
GROUP BY ID;
Query results
Dynamic Query
I would like it to be dynamic, so avoid writing out columns individually
You need dynamic SQL for this. Referring to the answer from @Mikhail that you linked in the post, you can write a dynamic query like the one below.
EXECUTE IMMEDIATE FORMAT("""
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM sample_table UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
""", ARRAY_TO_STRING(
REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);
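As a sanity check, a small sketch (against the sample data above) of what the inner expression produces: it JSON-encodes one row minus ID and extracts the keys, which yields the column list spliced into UNPIVOT.
-- Returns 'col1,col2,col3,col4' for the sample data above.
SELECT ARRAY_TO_STRING(
  REGEXP_EXTRACT_ALL(
    TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) FROM sample_table LIMIT 1)),
    r'"([^,{]+)":'),
  ',');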
For the updated sample table:
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
UNPIVOT (value FOR column IN (col1, col2, col3, col4))
)
GROUP BY ID;
EXECUTE IMMEDIATE FORMAT("""
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
""", ARRAY_TO_STRING(
REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT featureAttributions FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);
I have data that looks like below:
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15
2020-10-30 17:57:17,False,2020-07-01,14,2,False,0.0,True,30.0,True,30.0,True,True,True,False
2020-10-30 17:57:17,False,2020-07-01,15,2,True,28.0,False,0.0,False,0.0,True,True,True,False
2020-11-15 17:57:17,True,2020-07-01,5,2,True,28.0,False,0.0,False,0.0,True,True,True,False
2020-11-15 17:57:17,False,2020-07-01,7,2,False,0.0,True,30.0,True,30.0,True,True,True,False
My query looks like the following:
select distinct on (col3) col4
from table where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 <= '2020-09-16'
and col2 is false order by col3, col1 asc;
My expected answer should be [14, 15], since these are the earliest records for '2020-07-01'. However, using the above query I only get [15]. Any ideas what I might be doing wrong?
I was able to resolve this using the following query (DISTINCT ON (col3) keeps exactly one row per distinct col3 value, which is why the original query returned a single result):
select distinct col4
from table where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 = (select min(col1) from table
where col1 <= '2020-09-16' and col3 = '2020-07-01')
and col2 is false;
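A window function is another option that avoids the correlated subquery; this is a sketch under the same filters as above, keeping every row tied for the earliest col1:
select col4
from (
  select col4, rank() over (order by col1) as rnk
  from table
  where col13 is true and col15 is false
    and col3 = '2020-07-01'
    and col1 <= '2020-09-16'
    and col2 is false
) t
where rnk = 1;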
I have an update query as below:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table
ON (update_cte.col1 = my_table.col1)
AND (update_cte.col2 = my_table.col2)
It gives me the following error:
"Error: table name \"my_table\" specified more than once\n"}
I was able to resolve it by giving my_table an alias in the inner join.
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table as idr
ON (update_cte.col1 = idr.col1)
AND (update_cte.col2 = idr.col2)
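Note that the alias only silences the error: the UPDATE target my_table is not joined to the FROM result, so every target row is paired with arbitrary matching rows. The more idiomatic PostgreSQL form (a sketch using the same columns as above) joins the CTE to the target in the WHERE clause:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte
WHERE my_table.col1 = update_cte.col1
AND my_table.col2 = update_cte.col2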
I have two tables, Table1 and Table2, with the structure below.
Table1
PkCol1
PkCol2
Col3
Col4
Col5
Table2
PkCol1
PkCol2
Col3
Col4
Col5
But I am getting the primary key information as input. For example, I receive it as PkCol1,PkCol2. I may receive more primary key columns as input too.
How do I dynamically add my WHERE condition to Spark SQL?
Below is my code
df.createOrReplaceTempView("Table1")
df2.createOrReplaceTempView("Table2")
val primaryKeyString = ar(1)
val df3 = spark.sql("Select * from table1 where "+primaryKeyString+" not in (select "+primaryKeyString+" from table2)").toDF()
If there is a better way to do it with Dataframes let me know.
I am able to achieve my purpose by concatenating the key columns in Spark SQL as below:
val df3 = spark.sql("Select * from table1 where CONCAT("+primaryKeyString+") not in (select CONCAT("+primaryKeyString+") from table2)").toDF()
Trying to find if there is a better way to achieve it in Scala.
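One option worth trying (a sketch, with hypothetical aliases t1/t2): Spark SQL supports LEFT ANTI JOIN, which keeps the rows of table1 that have no match in table2. It avoids the collision risk of CONCAT (e.g. 'ab' + 'c' equals 'a' + 'bc') and behaves more predictably than NOT IN when key columns contain nulls.
-- Rows of table1 whose composite key does not appear in table2.
SELECT t1.*
FROM table1 t1
LEFT ANTI JOIN table2 t2
ON t1.PkCol1 = t2.PkCol1
AND t1.PkCol2 = t2.PkCol2
The ON clause can be assembled from the incoming key columns the same way primaryKeyString is spliced into the query above.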
I'm new to Cassandra and I'm having trouble inserting some rows into a database, getting the error in the title.
I use Cassandra 1.0.8 and cqlsh for making changes to my database.
Next, I explain the steps I take before getting the error:
CREATE A COLUMN FAMILY
CREATE TABLE test (
col1 int PRIMARY KEY,
col2 bigint,
col3 boolean,
col4 timestamp
);
INSERT SEVERAL ROWS WITHOUT SPECIFYING ALL OF THE COLUMNS OF THE TABLE
insert into test (col1, col2, col3) values (1, 100, true);
insert into test (col1, col2, col3) values (2, 200, false);
SELECT FOR CHECKING THAT ROWS HAVE BEEN INSERTED CORRECTLY
select * from test;
The result shows both rows as expected.
INSERT A ROW SPECIFYING A VALUE FOR col4 (NOT SPECIFIED BEFORE)
insert into test (col1, col2, col3, col4) values (3, 100, true, '2011-02-03');
SELECT FOR CHECKING THAT ROW HAS BEEN INSERTED CORRECTLY
select * from test;
This SELECT is where the error occurs.
SELECT EACH COLUMN OF THE TABLE SEPARATELY
select col1 from test;
select col2 from test;
select col3 from test;
select col4 from test;
Each works fine and shows the right values.
Then, my question is: what's the problem with the first SELECT? What's wrong?
Thanks in advance!!
NOTE:
If I define col4 as an integer rather than a timestamp, it works. However, I've tried inserting col4 in the normalized format yyyy-mm-dd HH:mm (I've tried '2011-02-03 01:05' and '2011-02-03 01:05:10'), but it doesn't work.
Cassandra 1.0.8 shipped with CQL2, and that's where your problem is coming from. I managed to recreate this in 1.0.8, but it works fine with 1.2.x, so my advice is to upgrade if you can.
In C* 1.2.10
cqlsh> update db.user set date='2011-02-03 01:05' where user='JCTYpjJlM';
cqlsh> SELECT * from db.user ;
 user      | date                     | password
-----------+--------------------------+----------
 xvkYQKerQ |                     null |      765
 JCTYpjJlM | 2011-02-03 01:05:00+0200 |      391
@mol
Weird. Try inserting col4 as an integer (convert to milliseconds first) or use the normalized format: yyyy-mm-dd HH:mm
According to the doc here, you can omit the time and just input the date, but it seems that breaks something in your case.
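For reference, a minimal sketch of the integer workaround mentioned above (hypothetical: col4 redeclared as bigint holding epoch milliseconds; 1296691200000 corresponds to 2011-02-03 00:00 UTC):
CREATE TABLE test (
col1 int PRIMARY KEY,
col2 bigint,
col3 boolean,
col4 bigint -- epoch milliseconds instead of timestamp
);
insert into test (col1, col2, col3, col4) values (3, 100, true, 1296691200000);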