How to do multiget in CQL3 for composite row key? - cassandra

CF schema:
CREATE TABLE mytable (
upperId int,
lowerId int,
hour timestamp,
counter text,
succ int,
fail int,
PRIMARY KEY ((upperId, lowerId), hour, counter));
Each record is keyed by the composite ID upperId:lowerId. How can I do a multiget with CQL3?
This is not valid:
select * from mytable where (upperid, lowerid) in ((10000, 1), (10000, 2), (20000, 1));
I can't do this either:
select * from mytable where (upperid = 10000 and lowerid in (1, 2)) or (upperid = 20000 and lowerid = 1);
I got error: missing EOF at ')'.
Please point me to an effective way to do a multiget for a composite row key in CQL3.
Thanks,
William

CQL does not yet support a logical "or" in select statements.
Instead, your application can run the two queries separately and combine their result sets:
select * from mytable where upperid = 10000 and lowerid in (1, 2);
select * from mytable where upperid = 20000 and lowerid = 1;
Reference:
SO question: Alternative for OR condition after where clause in select statement Cassandra
Latest CQL docs
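The client-side merge described above can be sketched roughly as follows. This is only an illustration, not driver code: run_query is a hypothetical callable standing in for whatever executes one CQL statement, and the in-memory FAKE_TABLE exists only so the sketch runs without a cluster.

```python
from itertools import chain

# One query per partition key; the `?` placeholders are illustrative.
QUERY = "SELECT * FROM mytable WHERE upperid = ? AND lowerid = ?"


def multiget(run_query, keys):
    """Run one query per (upperid, lowerid) pair and merge the rows."""
    return list(chain.from_iterable(run_query(QUERY, key) for key in keys))


# Hypothetical stand-in for a Cassandra session, keyed by partition key.
FAKE_TABLE = {
    (10000, 1): [{"upperid": 10000, "lowerid": 1, "succ": 3}],
    (10000, 2): [{"upperid": 10000, "lowerid": 2, "succ": 5}],
    (20000, 1): [{"upperid": 20000, "lowerid": 1, "succ": 7}],
}


def fake_run_query(query, key):
    # Ignores the query text and looks the partition up directly.
    return FAKE_TABLE.get(key, [])


rows = multiget(fake_run_query, [(10000, 1), (10000, 2), (20000, 1)])
```

With a real driver the per-key queries could also be issued concurrently, since each one targets a single partition.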

Related

Cassandra : Key Level access in Map type columns

In Cassandra, suppose we need to access a single key in a map-type column. How can this be done?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select emploc from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In this case, what should I do if I want to access the 'USA' key alone?
I tried using an index, but all values are still returned.
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But expected:
'CHE': 'Tata Consultancy Services'
As a data model change, I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through the data simply, because emploc_key is a clustering key instead of part of a CQL collection, which has multiple limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
You can also put these inserts in an unlogged batch, and it will still be applied efficiently and atomically because all the rows are in the same partition.
To keep the map as you have it, from 4.0 onward (CASSANDRA-7396) you can use [] selectors like:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
But I would still strongly recommend the data model change, as it's significantly more efficient and works in existing versions with:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';
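On versions without the [] selector, and where the schema cannot change, one fallback is to read the whole map and pick the entry in the application. A minimal sketch, assuming the driver hands back each row's map column as a dictionary; the rows list is hand-written sample data mirroring the insert from the question, and map_entry is a made-up helper name.

```python
def map_entry(rows, key):
    """Return {empid: value} for every row whose emploc map contains `key`."""
    return {
        row["empid"]: row["emploc"][key]
        for row in rows
        if key in row["emploc"]
    }


# Sample data standing in for the result of: select empid, emploc from collection_tab2;
rows = [
    {
        "empid": 100,
        "emploc": {
            "CHE": "Tata Consultancy Services",
            "CBE": "CTS",
            "USA": "Genpact LLC",
        },
    }
]

usa = map_entry(rows, "USA")
```

This still transfers the full map for every row, which is one reason the clustering-key model is the better fit.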

SQL Oracle Sub-query

I am having an issue getting this subquery to run. I am using Toad Data Point with Oracle. I get a syntax error. I have tried several different ways with no luck. I am new to subqueries.
Select *
from FINC.VNDR_ITEM_M as M
where M.ACCT_DOC_NBR = A.ACCT_DOC_NBR
(SELECT A.CLIENT_ID,
A.SRC_SYS_ID,
A.CO_CD,
A.ACCT_NBR,
A.CLR_DT,
A.ASGN_NBR,
A.FISCAL_YR,
A.ACCT_DOC_NBR,
A.LINE_ITEM_NBR,
A.MFR_PART_NBR,
A.POST_DT,
A.DRCR_IND,
A.DOC_CRNCY_AMT,
A.CRNCY_CD,
A.BSL_DT
FROM FINC.VNDR_ITEM_F A
WHERE A.CLR_DT IN (SELECT MAX(B.CLR_DT)
FROM FINC.VNDR_ITEM_F AS B
where (B.ACCT_DOC_NBR = A.ACCT_DOC_NBR and B.FISCAL_YR=A.FISCAL_YR and B.LINE_ITEM_NBR = A.LINE_ITEM_NBR and B.SRC_SYS_ID =A.SRC_SYS_ID and B.POST_DT=A.POST_DT and B.CO_CD=A.CO_CD)
and (B.CO_CD >='1000' and B.CO_CD <= '3000' or B.CO_CD ='7090') and (B.POST_DT Between to_date ('08/01/2018','mm/dd/yyyy')
AND to_date ('08/31/2018', 'mm/dd/yyyy')) and (B.SRC_SYS_ID ='15399') and (B.FISCAL_YR ='2018'))
GROUP BY
A.CLIENT_ID,
A.SRC_SYS_ID,
A.CO_CD,
A.ACCT_NBR,
A.CLR_DT,
A.ASGN_NBR,
A.FISCAL_YR,
A.ACCT_DOC_NBR,
A.LINE_ITEM_NBR,
A.MFR_PART_NBR,
A.POST_DT,
A.DRCR_IND,
A.DOC_CRNCY_AMT,
A.CRNCY_CD,
A.BSL_DT)
Your syntax is broken: you put the subquery at the very end. Now it looks like:
select *
from dual as m
where a.dummy = m.dummy
(select dummy from dual)
It is in the wrong place, not joined, and not aliased. What you should probably do is:
select *
from dual m
join (select dummy from dual) a on a.dummy = m.dummy
You also have some redundant brackets, but that's a minor flaw. Full code (I cannot test it without data access):
select *
from FINC.VNDR_ITEM_M M
join (SELECT A.CLIENT_ID, A.SRC_SYS_ID, A.CO_CD, A.ACCT_NBR, A.CLR_DT, A.ASGN_NBR,
A.FISCAL_YR, A.ACCT_DOC_NBR, A.LINE_ITEM_NBR, A.MFR_PART_NBR, A.POST_DT,
A.DRCR_IND, A.DOC_CRNCY_AMT, A.CRNCY_CD, A.BSL_DT
FROM FINC.VNDR_ITEM_F A
WHERE A.CLR_DT IN (SELECT MAX(B.CLR_DT)
FROM FINC.VNDR_ITEM_F B
where B.ACCT_DOC_NBR = A.ACCT_DOC_NBR
and B.FISCAL_YR=A.FISCAL_YR
and B.LINE_ITEM_NBR = A.LINE_ITEM_NBR
and B.SRC_SYS_ID =A.SRC_SYS_ID
and B.POST_DT=A.POST_DT
and B.CO_CD=A.CO_CD
and (('1000'<=B.CO_CD and B.CO_CD<='3000') or B.CO_CD='7090')
and B.POST_DT Between to_date ('08/01/2018', 'mm/dd/yyyy')
AND to_date ('08/31/2018', 'mm/dd/yyyy')
and B.SRC_SYS_ID ='15399' and B.FISCAL_YR ='2018')
GROUP BY A.CLIENT_ID, A.SRC_SYS_ID, A.CO_CD, A.ACCT_NBR, A.CLR_DT, A.ASGN_NBR,
A.FISCAL_YR, A.ACCT_DOC_NBR, A.LINE_ITEM_NBR, A.MFR_PART_NBR, A.POST_DT,
A.DRCR_IND, A.DOC_CRNCY_AMT, A.CRNCY_CD, A.BSL_DT) A
on M.ACCT_DOC_NBR = A.ACCT_DOC_NBR and M.CO_CD=A.CO_CD;
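You need to add an alias to the subselect (a derived table in Standard SQL). Standard SQL allows AS before the alias, but Oracle rejects AS for table aliases, so leave it out there:
select *
from
( select .......
) dt
join ....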
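The shape of the fix (join the main table to an aliased derived table that keeps only the latest CLR_DT per document) can be tried on a toy dataset. The sketch below uses SQLite through Python's sqlite3 module, not Oracle, and the table names, columns, and rows are invented for illustration.

```python
import sqlite3


def demo():
    """Join a master table to an aliased derived table of latest-cleared rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item_m (doc_nbr TEXT, co_cd TEXT, descr TEXT);
        CREATE TABLE item_f (doc_nbr TEXT, co_cd TEXT, clr_dt TEXT, amt REAL);
        INSERT INTO item_m VALUES ('D1', '1000', 'first'), ('D2', '7090', 'second');
        INSERT INTO item_f VALUES
            ('D1', '1000', '2018-08-05', 10.0),
            ('D1', '1000', '2018-08-20', 12.5),
            ('D2', '7090', '2018-08-11', 7.0);
        """
    )
    sql = """
        SELECT m.doc_nbr, m.descr, a.clr_dt, a.amt
        FROM item_m m
        JOIN (SELECT f.doc_nbr, f.co_cd, f.clr_dt, f.amt
              FROM item_f f
              -- correlated subquery: keep only the newest clr_dt per document
              WHERE f.clr_dt = (SELECT MAX(b.clr_dt)
                                FROM item_f b
                                WHERE b.doc_nbr = f.doc_nbr
                                  AND b.co_cd = f.co_cd)) a
          ON m.doc_nbr = a.doc_nbr AND m.co_cd = a.co_cd
        ORDER BY m.doc_nbr
    """
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


result = demo()
```

Note that the derived table carries the alias a without an AS keyword, which is the form Oracle also accepts.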

Need to fetch n percentage of rows in u-sql query

I need help writing a U-SQL query to fetch the top n percent of rows. I have one dataset from which I need to take the total row count and then take the top 3% of rows based on col1. The code I have written is:
@count = SELECT Convert.ToInt32(COUNT(*)) AS cnt FROM @telData;
@count1=SELECT cnt/100 AS cnt1 FROM @count;
DECLARE @cnt int=SELECT Convert.ToInt32(cnt1*3) FROM @count1;
@EngineFailureData=
SELECT vin,accelerator_pedal_position,enginefailure=1
FROM @telData
ORDER BY accelerator_pedal_position DESC
FETCH @cnt ROWS;
@telData is my basic dataset. Thanks for the help.
Some comments first:
FETCH currently only takes literals as arguments (https://msdn.microsoft.com/en-us/library/azure/mt621321.aspx)
@var = SELECT ... will assign the name @var to the rowset expression that starts with the SELECT. U-SQL (currently) does not provide you with stateful scalar variable assignment from query results. Instead you would use a CROSS JOIN or other JOIN to join the scalar value in.
Now to the solution:
To get the percentage, take a look at the ROW_NUMBER() and PERCENT_RANK() functions. For example, the following shows you how to use either to answer your question. Given the simpler code for PERCENT_RANK() (no need for the MAX() and CROSS JOIN), I would suggest that solution.
DECLARE @percentage double = 0.25; // 25%
@data = SELECT *
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)
) AS T(pos);
@data =
SELECT PERCENT_RANK() OVER(ORDER BY pos) AS p_rank,
ROW_NUMBER() OVER(ORDER BY pos) AS r_no,
pos
FROM @data;
@cut_off =
SELECT ((double) MAX(r_no)) * (1.0 - @percentage) AS max_r
FROM @data;
@r1 =
SELECT *
FROM @data CROSS JOIN @cut_off
WHERE ((double) r_no) > max_r;
@r2 =
SELECT *
FROM @data
WHERE p_rank >= 1.0 - @percentage;
OUTPUT @r1
TO "/output/top_perc1.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
OUTPUT @r2
TO "/output/top_perc2.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
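To make the two cut-off rules concrete, here is a small Python model of the same logic on the same 20 sample values. It is a sketch of the arithmetic only, not U-SQL; the function names are invented, and PERCENT_RANK is modelled with the usual definition (rank - 1) / (n - 1), with ties sharing a rank.

```python
def top_by_row_number(values, fraction):
    """Keep rows whose 1-based row number exceeds n * (1 - fraction)."""
    ordered = sorted(values)
    cutoff = len(ordered) * (1.0 - fraction)
    return [v for r_no, v in enumerate(ordered, start=1) if r_no > cutoff]


def top_by_percent_rank(values, fraction):
    """Keep rows whose percent rank is at least 1 - fraction."""
    ordered = sorted(values)
    n = len(ordered)
    kept = []
    rank = 0
    for i, v in enumerate(ordered):
        if i == 0 or v != ordered[i - 1]:
            rank = i + 1  # equal values share the rank of their first occurrence
        p_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        if p_rank >= 1.0 - fraction:
            kept.append(v)
    return kept


data = list(range(1, 21))
r1 = top_by_row_number(data, 0.25)
r2 = top_by_percent_rank(data, 0.25)
```

On this input both rules keep the same five largest values; with ties or other sizes the two definitions can differ slightly at the boundary.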

Datastax java driver Date Filter Issue

I have a table with one timestamp column. When I try to execute a date filter using this timestamp column, it doesn't return any results. The table structure and code segment are as follows.
create table status_well
(
wid int,
data_time timestamp,
primary key (wid ,data_time)
)
SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd");
PreparedStatement statement = session.prepare("select from status_well where data_time>? and data_time<?");
BoundStatement boundStatement=new BoundStatement(statement);
statement.setDate("data_time", DATE_FORMAT.parse("2015-05-01"));
statement.setDate("data_time", DATE_FORMAT.parse("2015-05-10"));
Data exists for the date range specified above, but no data is returned. I tried a string instead of DATE_FORMAT.parse("2015-05-01"), but that gives an invalid type error.
Please advise me on this.
There are two placeholders for the data_time column, so you need to use index-based setters, or named placeholders:
PreparedStatement statement = session.prepare("select * from status_well "
+ "where wid = :wid "
+ "and data_time > :min and data_time < :max");
BoundStatement boundStatement = new BoundStatement(statement)
.setInt("wid", 1)
.setDate("min", DATE_FORMAT.parse("2015-05-01"))
.setDate("max", DATE_FORMAT.parse("2015-05-10"));
(I also added a restriction on wid to get a valid CQL query, as was mentioned in the comments.)

CQL timeuuid comparison using maxTimeuuid / minTimeuuid

I am using Datastax cassandra distribution on Mac OS X (dsc-cassandra-1.2.6). I want to use timeuuid types, and was experimenting with queries against them.
Here is my table:
CREATE TABLE test_t (
canon_key text,
t timeuuid,
PRIMARY KEY (canon_key, t)
)
Now let's say I get a row.
cqlsh:pagedb> select canon_key, t, dateOf(t), unixTimestampOf(t) from test_t where canon_key = 'xxx' and t >= minTimeuuid('2013-08-08 18:43:58-0700');
canon_key | t | dateOf(t) | unixTimestampOf(t)
-----------+--------------------------------------+--------------------------+--------------------
xxx | 287d3c30-0095-11e3-9268-a7d2e09193eb | 2013-08-08 18:43:58-0700 | 1376012638067
Now, I want to delete this row. I don't see a good way of doing it, because there is no equality operator for the timeuuid type.
The nature of the data I am adding is such that I (possibly) wouldn't even mind doing this:
cqlsh:pagedb> select canon_key, t, dateOf(t), unixTimestampOf(t) from test_t where canon_key = 'xxx' and t >= minTimeuuid('2013-08-08 18:43:58-0700') and t <= maxTimeuuid('2013-08-08 18:43:58-0700');
But according to the documentation (http://cassandra.apache.org/doc/cql3/CQL.html#usingdates), that will not work. Quoting: "Please note that t >= maxTimeuuid('2013-01-01 00:05+0000') would still not select a timeuuid generated exactly at ‘2013-01-01 00:05+0000’ and is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000')."
So how do I delete this row?
Your premise is mistaken -- minTimeuuid and maxTimeuuid do exist to allow inequality operations on timeuuids, but that does not mean that simple equality is not supported:
cqlsh:foo> insert into test_t (canon_key, t) values ('k', now());
cqlsh:foo> select * from test_t;
canon_key | t
-----------+--------------------------------------
k | 27609890-0209-11e3-b862-59d5a2b25b8f
(1 rows)
cqlsh:foo> delete from test_t where canon_key = 'k' and t = 27609890-0209-11e3-b862-59d5a2b25b8f;
cqlsh:foo> select * from test_t;
(0 rows)
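It may help to see what a timeuuid actually contains. A timeuuid is a version 1 UUID: a 60-bit timestamp plus clock-sequence and node bits, so every stored value is one exact 128-bit value that can be matched with =, while minTimeuuid and maxTimeuuid only produce range bounds for a given instant. The sketch below uses Python's standard uuid module to recompute the millisecond timestamp of the UUID shown in the question; unix_millis_of is a made-up helper name.

```python
import uuid

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def unix_millis_of(timeuuid_text):
    """Return the Unix time in milliseconds embedded in a version 1 UUID."""
    u = uuid.UUID(timeuuid_text)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit timestamp in 100 ns units since the UUID epoch.
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


millis = unix_millis_of("287d3c30-0095-11e3-9268-a7d2e09193eb")
```

The result agrees with the unixTimestampOf(t) value in the question's output, 1376012638067. Many distinct timeuuids can share that millisecond, which is why a delete should use the full UUID literal.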
