CF schema:
CREATE TABLE mytable (
upperId int,
lowerId int,
hour timestamp,
counter text,
succ int,
fail int,
PRIMARY KEY ((upperId, lowerId), hour, counter));
Each record is keyed by the composite id upperId:lowerId. How can I do a multiget with CQL3?
This is not valid:
select * from mytable where (upperid, lowerid) in ((10000, 1), (10000, 2), (20000, 1));
I can't do this either:
select * from mytable where (upperid = 10000 and lowerid in (1, 2)) or (upperid = 20000 and lowerid = 1);
I got the error: missing EOF at ')'.
Please point me to an effective way to do a multiget on composite row keys in CQL3.
Thanks,
William
CQL does not yet support a logical "or" in select statements.
Instead, in your application you could combine the result sets from the two queries:
select * from mytable where upperid = 10000 and lowerid in (1, 2);
select * from mytable where upperid = 20000 and lowerid = 1;
Reference:
SO question: Alternative for OR condition after where clause in select statement Cassandra
Latest CQL docs
In Cassandra, suppose we need to access a map-type column at the key level. How do we do it?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select emploc from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In that case, if I want to access the 'USA' key alone, what should I do?
I tried using an index, but all the values come back.
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But I expected:
'CHE': 'Tata Consultancy Services'
Just as a data model change, I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through it simply, since emploc_key is a clustering key instead of part of a CQL collection, which has multiple limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
You can also put the inserts in an unlogged batch and they will still be applied efficiently and atomically, because they are all in the same partition.
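For example, a minimal sketch of the same three inserts wrapped in an unlogged batch (same table and values as above):
BEGIN UNLOGGED BATCH
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
APPLY BATCH;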
To do it with the model you have, you can, after 4.0 (CASSANDRA-7396), use [] element selectors like:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
But I would still strongly recommend the data model change, as it is significantly more efficient, and it works in existing versions with:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';
I am having an issue getting this sub-query to run. I am using Toad Data Point (Oracle). I get a syntax error. I have tried several different ways with no luck. I am new to sub-queries.
Select *
from FINC.VNDR_ITEM_M as M
where M.ACCT_DOC_NBR = A.ACCT_DOC_NBR
(SELECT A.CLIENT_ID,
A.SRC_SYS_ID,
A.CO_CD,
A.ACCT_NBR,
A.CLR_DT,
A.ASGN_NBR,
A.FISCAL_YR,
A.ACCT_DOC_NBR,
A.LINE_ITEM_NBR,
A.MFR_PART_NBR,
A.POST_DT,
A.DRCR_IND,
A.DOC_CRNCY_AMT,
A.CRNCY_CD,
A.BSL_DT
FROM FINC.VNDR_ITEM_F A
WHERE A.CLR_DT IN (SELECT MAX(B.CLR_DT)
FROM FINC.VNDR_ITEM_F AS B
where (B.ACCT_DOC_NBR = A.ACCT_DOC_NBR and B.FISCAL_YR=A.FISCAL_YR and B.LINE_ITEM_NBR = A.LINE_ITEM_NBR and B.SRC_SYS_ID =A.SRC_SYS_ID and B.POST_DT=A.POST_DT and B.CO_CD=A.CO_CD)
and (B.CO_CD >='1000' and B.CO_CD <= '3000' or B.CO_CD ='7090') and (B.POST_DT Between to_date ('08/01/2018','mm/dd/yyyy')
AND to_date ('08/31/2018', 'mm/dd/yyyy')) and (B.SRC_SYS_ID ='15399') and (B.FISCAL_YR ='2018'))
GROUP BY
A.CLIENT_ID,
A.SRC_SYS_ID,
A.CO_CD,
A.ACCT_NBR,
A.CLR_DT,
A.ASGN_NBR,
A.FISCAL_YR,
A.ACCT_DOC_NBR,
A.LINE_ITEM_NBR,
A.MFR_PART_NBR,
A.POST_DT,
A.DRCR_IND,
A.DOC_CRNCY_AMT,
A.CRNCY_CD,
A.BSL_DT)
Your syntax is broken; you put the subquery at the very end. Right now it looks like:
select *
from dual as m
where a.dummy = m.dummy
(select dummy from dual)
It is in the wrong place, not joined, and not aliased. What you should probably do is:
select *
from dual m
join (select dummy from dual) a on a.dummy = m.dummy
You also have some redundant, unnecessary brackets, but that's a minor flaw. Full code (I cannot test it without data access):
select *
from FINC.VNDR_ITEM_M M
join (SELECT A.CLIENT_ID, A.SRC_SYS_ID, A.CO_CD, A.ACCT_NBR, A.CLR_DT, A.ASGN_NBR,
A.FISCAL_YR, A.ACCT_DOC_NBR, A.LINE_ITEM_NBR, A.MFR_PART_NBR, A.POST_DT,
A.DRCR_IND, A.DOC_CRNCY_AMT, A.CRNCY_CD, A.BSL_DT
FROM FINC.VNDR_ITEM_F A
WHERE A.CLR_DT IN (SELECT MAX(B.CLR_DT)
FROM FINC.VNDR_ITEM_F B
where B.ACCT_DOC_NBR = A.ACCT_DOC_NBR
and B.FISCAL_YR=A.FISCAL_YR
and B.LINE_ITEM_NBR = A.LINE_ITEM_NBR
and B.SRC_SYS_ID =A.SRC_SYS_ID
and B.POST_DT=A.POST_DT
and B.CO_CD=A.CO_CD
and (('1000'<=B.CO_CD and B.CO_CD<='3000') or B.CO_CD='7090')
and B.POST_DT Between to_date ('08/01/2018', 'mm/dd/yyyy')
AND to_date ('08/31/2018', 'mm/dd/yyyy')
and B.SRC_SYS_ID ='15399' and B.FISCAL_YR ='2018')
GROUP BY A.CLIENT_ID, A.SRC_SYS_ID, A.CO_CD, A.ACCT_NBR, A.CLR_DT, A.ASGN_NBR,
A.FISCAL_YR, A.ACCT_DOC_NBR, A.LINE_ITEM_NBR, A.MFR_PART_NBR, A.POST_DT,
A.DRCR_IND, A.DOC_CRNCY_AMT, A.CRNCY_CD, A.BSL_DT) A
on M.ACCT_DOC_NBR = A.ACCT_DOC_NBR and M.CO_CD=A.CO_CD;
You need to add an alias to the SubSelect (or Derived Table in Standard SQL):
select *
from
( select .......
) AS dt
join ....
I need help writing a U-SQL query to fetch the top n percent of rows. I have one dataset from which I need to take the total count of rows and then take the top 3% of rows based on col1. The code I have written is:
#count = SELECT Convert.ToInt32(COUNT(*)) AS cnt FROM #telData;
#count1=SELECT cnt/100 AS cnt1 FROM #count;
DECLARE #cnt int=SELECT Convert.ToInt32(cnt1*3) FROM #count1;
#EngineFailureData=
SELECT vin,accelerator_pedal_position,enginefailure=1
FROM #telData
ORDER BY accelerator_pedal_position DESC
FETCH #cnt ROWS;
#telData is my base dataset. Thanks for the help.
Some comments first:
FETCH currently only takes literals as arguments (https://msdn.microsoft.com/en-us/library/azure/mt621321.aspx)
#var = SELECT ... will assign the name #var to the rowset expression that starts with the SELECT. U-SQL (currently) does not provide you with stateful scalar variable assignment from query results. Instead you would use a CROSS JOIN or other JOIN to join the scalar value in.
Now to the solution:
To get the percentage, take a look at the ROW_NUMBER() and PERCENT_RANK() functions. For example, the following shows you how to use either to answer your question. Given the simpler code for PERCENT_RANK() (no need for the MAX() and CROSS JOIN), I would suggest that solution.
DECLARE #percentage double = 0.25; // 25%
#data = SELECT *
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)
) AS T(pos);
#data =
SELECT PERCENT_RANK() OVER(ORDER BY pos) AS p_rank,
ROW_NUMBER() OVER(ORDER BY pos) AS r_no,
pos
FROM #data;
#cut_off =
SELECT ((double) MAX(r_no)) * (1.0 - #percentage) AS max_r
FROM #data;
#r1 =
SELECT *
FROM #data CROSS JOIN #cut_off
WHERE ((double) r_no) > max_r;
#r2 =
SELECT *
FROM #data
WHERE p_rank >= 1.0 - #percentage;
OUTPUT #r1
TO "/output/top_perc1.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
OUTPUT #r2
TO "/output/top_perc2.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
I have a table with one timestamp column. When I try to execute a date filter on this timestamp column, it doesn't give any results. The table structure and code segment are as follows.
create table status_well
(
wid int,
data_time timestamp,
primary key (wid ,data_time)
)
SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd");
PreparedStatement statement = session.prepare("select from status_well where data_time>? and data_time<?");
BoundStatement boundStatement=new BoundStatement(statement);
statement.setDate("data_time", DATE_FORMAT.parse("2015-05-01"));
statement.setDate("data_time", DATE_FORMAT.parse("2015-05-10"));
Data exists for the specified date range, but no data returns. I tried with a string instead of DATE_FORMAT.parse("2015-05-01"), but that gives an invalid type error.
Please advise me on this.
There are two placeholders for the data_time column, so you need to use index-based setters or named placeholders:
PreparedStatement statement = session.prepare("select * from status_well "
+ "where wid = :wid "
+ "and data_time > :min and data_time < :max");
BoundStatement boundStatement = new BoundStatement(statement)
.setInt("wid", 1)
.setDate("min", DATE_FORMAT.parse("2015-05-01"))
.setDate("max", DATE_FORMAT.parse("2015-05-10"));
(also added a restriction on wid to get a valid CQL query, as was mentioned in the comments)
I am using the DataStax Cassandra distribution on Mac OS X (dsc-cassandra-1.2.6). I want to use timeuuid types and was experimenting with queries against them.
Here is my table:
CREATE TABLE test_t (
canon_key text,
t timeuuid,
PRIMARY KEY (canon_key, t)
)
Now let's say I get a row.
cqlsh:pagedb> select canon_key, t, dateOf(t), unixTimestampOf(t) from test_t where canon_key = 'xxx' and t >= minTimeuuid('2013-08-08 18:43:58-0700');
canon_key | t | dateOf(t) | unixTimestampOf(t)
-----------+--------------------------------------+--------------------------+--------------------
xxx | 287d3c30-0095-11e3-9268-a7d2e09193eb | 2013-08-08 18:43:58-0700 | 1376012638067
Now, I want to delete this row. I don't see a good way of doing it, because there is no equality operator for the timeuuid type.
The nature of the data I am adding is such that I (possibly) wouldn't even mind doing this:
cqlsh:pagedb> select canon_key, t, dateOf(t), unixTimestampOf(t) from test_t where canon_key = 'xxx' and t >= minTimeuuid('2013-08-08 18:43:58-0700') and t <= maxTimeuuid('2013-08-08 18:43:58-0700');
But according to the documentation (http://cassandra.apache.org/doc/cql3/CQL.html#usingdates), that will not work. Quoting: "Please note that t >= maxTimeuuid('2013-01-01 00:05+0000') would still not select a timeuuid generated exactly at ‘2013-01-01 00:05+0000’ and is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000')."
So.. how do I delete this row?
Your premise is mistaken: minTimeuuid and maxTimeuuid exist to allow inequality operations on timeuuids, but that does not mean that simple equality is not supported:
cqlsh:foo> insert into test_t (canon_key, t) values ('k', now());
cqlsh:foo> select * from test_t;
canon_key | t
-----------+--------------------------------------
k | 27609890-0209-11e3-b862-59d5a2b25b8f
(1 rows)
cqlsh:foo> delete from test_t where canon_key = 'k' and t = 27609890-0209-11e3-b862-59d5a2b25b8f;
cqlsh:foo> select * from test_t;
(0 rows)
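Applied to the row from the question above, the same pattern works with the timeuuid value returned by the earlier select (just a usage sketch reusing those values):
cqlsh:pagedb> delete from test_t where canon_key = 'xxx' and t = 287d3c30-0095-11e3-9268-a7d2e09193eb;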