How do I select a RowKey range with Azure Table Storage?

I would like to query my Azure Table Storage using the PartitionKey, plus I would like to check that my RowKey is within a range, for example the range 02001 to 02999.
Can someone tell me how I can do this? I understand how to query the PK with a simple:
where fooEntity.PartitionKey == partition
but I don't know how I can query fooEntity.RowKey.
Also, if I do this by specifying a range, will it still retrieve all the entries for that partition and then check to see if they match the range?
Thank you for your advice,
Mariko

Your query could look something like this:
where fooEntity.PartitionKey == partitionKey
&& fooEntity.RowKey.CompareTo(lowerBoundRowKey) >= 0
&& fooEntity.RowKey.CompareTo(upperBoundRowKey) <= 0
This should return all of the items between lowerBoundRowKey and upperBoundRowKey, including the bound values themselves (if you don't want the range to be inclusive, use > 0 and < 0 rather than >= 0 and <= 0).
You will not need to do any other filtering than this; the filter is applied by the table service, so it will not pull back the whole partition and check each entity on the client.
It looks like you're already padding the numbers that you store in the RowKey with leading zeros, which is a good thing, because this range is a lexical (string) comparison, not a numeric one.
e.g. running this query with lowerBoundRowKey = 10 and upperBoundRowKey = 100 will not return an item with a RowKey of 20.
If you pad with zeros, however, lowerBoundRowKey = 00010 and upperBoundRowKey = 00100 will return an item with a RowKey of 00020.
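For reference, here is the whole query put together. This is only a sketch: it assumes the classic storage client's table service context, and FooEntity / "FooTable" are placeholder names for your own entity class and table.
// Sketch only: serviceContext is a TableServiceContext from the classic
// storage client; FooEntity and "FooTable" are placeholders.
string lowerBoundRowKey = "02001";
string upperBoundRowKey = "02999";

var results = (from fooEntity in serviceContext.CreateQuery<FooEntity>("FooTable")
               where fooEntity.PartitionKey == partitionKey
                  && fooEntity.RowKey.CompareTo(lowerBoundRowKey) >= 0
                  && fooEntity.RowKey.CompareTo(upperBoundRowKey) <= 0
               select fooEntity).ToList();

// If you build the RowKeys yourself, pad them consistently,
// e.g. someNumber.ToString("D5") turns 2001 into "02001".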

This will return the entities whose RowKey falls in the specified range for the specified PartitionKey (gt and lt exclude the bounds; use ge and le if they should be included):
"PartitionKey eq 'your partitionKey value' and (RowKey gt '02001' and RowKey lt '02999')"
Hope this helps.

Related

Is there any way we could limit an update?

Generally, I see we can limit a select with select * from table where predicate = value LIMIT N. I am currently in a situation where I have 200 records falling under a predicate, but I want to update the first 100 with something like update table set column = 1 where predicate = value limit...? and the second half with update table set column = 2 where predicate = value. I think it could be done by having <= / >= ranges in the predicate, but unfortunately I have none of them.
Currently, I don't think we have this feature, as the WHERE clause must identify the row or rows to be updated by their primary key. However, you could further limit the number of rows to be updated by using an IF EXISTS condition.
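Since there is no UPDATE ... LIMIT in CQL, one workaround is to do the split on the client: read the matching keys first, then update them one by one. A rough sketch with the DataStax C# driver; mykeyspace, mytable, id, predicate and col are invented names, and id is assumed to be a uuid.
// Sketch only: all names here are placeholders for your own schema.
using System;
using System.Linq;
using Cassandra;

var cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
var session = cluster.Connect("mykeyspace");

// Reading by a non-key column needs a secondary index (or ALLOW FILTERING).
var rows = session.Execute("SELECT id FROM mytable WHERE predicate = 'value'").ToList();

var update = session.Prepare("UPDATE mytable SET col = ? WHERE id = ?");
for (int i = 0; i < rows.Count; i++)
{
    int newValue = i < 100 ? 1 : 2;   // first 100 rows get 1, the rest get 2
    session.Execute(update.Bind(newValue, rows[i].GetValue<Guid>("id")));
}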

Performance difference between SELECT sum(column_name) FROM and SELECT column_name in CQL

I would like to know the performance difference between executing the following two queries on a table cycling.cyclist_points containing thousands of rows:
SELECT sum(race_points)
FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
select *
from cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
If sum(race_points) causes the query to be expensive, I will have to look for other solutions.
Performance difference between your queries:
Both of your queries need to scan the same number of rows (the number of rows in that partition).
The first query selects only a single column, so it is a little bit faster.
Instead of calculating the sum at run time, try to precompute it.
If race_points is an int or bigint, then use a counter table like the one below:
CREATE TABLE race_points_counter (
id uuid PRIMARY KEY,
sum counter
);
Whenever new data is inserted into cyclist_points, also increment the sum with the points you are inserting:
UPDATE race_points_counter SET sum = sum + ? WHERE id = ?
Now you can just select the sum for that id:
SELECT sum FROM race_points_counter WHERE id = ?

How do I select everything where two columns contain equal values in CQL?

I'm trying to select everything where two columns contain equal values. Here is my CQL query:
select count(someColumn) from somekeySpace.sometable where columnX = columnY
This doesn't work. How can I do this?
You can't query like that; Cassandra doesn't support it.
You can do this in a different way.
First you have to create a separate counter table.
CREATE TABLE match_counter(
partition int PRIMARY KEY,
count counter
);
At the time of insertion into your main table, if columnX = columnY, increment the value here. Since you only have a single count, you can use a static value for partition:
UPDATE match_counter SET count = count + 1 WHERE partition = 1;
Now you can get the count of matching rows:
SELECT * FROM match_counter WHERE partition = 1;

Reading the most recent updated row in cassandra

I have a use case and would like suggestions on the below.
Structure :
Rowkey_1:
Column1 = value1;
Column2 = value2;
Rowkey_2:
Column1 = value1;
Column2 = value2;
" Suppose i am writing 1000 rows into cassandra with each row having couple of columns. After sometime i update only 100 rows and make changes for column values ".
-> when i read data from cassandra i only want to get these 100 updated rows and not the entire row key information.
Is there a way to say to cassandra like give me all row keys from start - > end where time in between "Time_start" to "Time_end"
in SQL Lingo -- > select from "" to "" where time between "time_start" and "time_end".
P.S. i read Basic Time Series with Cassandra where it says you can annotate rowkey like the below
Inserting data — {:key => ‘server1-load-20110306′, :column_name => TimeUUID(now), :column_value => 0.75}
Here the column family has TimeUUID columns.
My question is: can you annotate your row key with a date and time like this: { :key => '2012-11-18 16:00:15' }?
Or is there any other way to get only the recently updated rows?
Any suggestion/guidance is really appreciated.
You can't do range queries on keys unless you use ByteOrderedPartitioner, which you shouldn't use. The way to do this is by writing known sentinel values as keys, such as a timestamp representing the beginning of the day. Then you can do the column slice by time.
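A CQL 3 equivalent of that pattern, as a rough sketch only (the table and column names are invented, and the DataStax C# driver is assumed): bucket each update by day, cluster by a timeuuid, then slice the time window you care about.
// Sketch only: updates_by_day and its columns are placeholder names.
using System;
using Cassandra;

var cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
var session = cluster.Connect("mykeyspace");

// The day string is the sentinel key; updated_at orders entries within it.
session.Execute(@"CREATE TABLE IF NOT EXISTS updates_by_day (
    day text,
    updated_at timeuuid,
    row_key text,
    PRIMARY KEY (day, updated_at))");

// Whenever you update a row in the main table, record it here as well.
session.Execute("INSERT INTO updates_by_day (day, updated_at, row_key) VALUES ('2012-11-18', now(), 'Rowkey_1')");

// Later, fetch only the keys touched in a given time window of that day.
var touched = session.Execute(
    "SELECT row_key FROM updates_by_day " +
    "WHERE day = '2012-11-18' " +
    "AND updated_at > minTimeuuid('2012-11-18 16:00:00') " +
    "AND updated_at < maxTimeuuid('2012-11-18 17:00:00')");

foreach (var row in touched)
    Console.WriteLine(row.GetValue<string>("row_key"));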

Cassandra - secondary index on part of the composite key?

I am using a composite primary key consisting of 2 strings Name1, Name2, and a timestamp (e.g. 'Joe:Smith:123456'). I want to query a range of timestamps given an equality condition for either Name1 or Name2.
For example, in SQL:
SELECT * FROM testcf WHERE (timestamp > 111111 AND timestamp < 222222 and Name2 = 'Brown');
and
SELECT * FROM testcf WHERE (timestamp > 111111 AND timestamp < 222222 and Name1 = 'Charlie');
From my understanding, the first part of the composite key is the partition key, so the second query is possible, but the first query would require some kind of index on Name2.
Is it possible to create a separate index on a component of the composite key? Or am I misunderstanding something here?
You will need to manually create and maintain an index of names if you want to use your schema and support the first query. Given this requirement, I question your choice of data model. Your model should be designed with your read pattern in mind. I presume you are also storing some column values that you want to query by timestamp. If so, perhaps the following model would serve you better:
"[current_day]:Joe:Smith" {
123456:Field1 : value
123456:Field2 : value
123450:Field1 : value
123450:Field2 : value
}
With this model you can use the current day (or some known day) as a sentinel value, then filter on first and last names. You can also get a range of columns by timestamp using the composite column names.
