Azure Table Storage - Retrieving all entities matching a partial row key - azure

I am just learning Azure Table Storage and I'm able to save and retrieve entities without any problem. However, I'd like to do the following. Say I have row keys (all with the same partition key) that look as follows:
KJV-C1-V1
KJV-C1-V2
KJV-C1-V3
KJV-C2-V1
KJV-C2-V2
KJV-C2-V3
I'd like to be able to perform two types of queries in .NET C#:
Retrieve all entities with row keys that start with 'KJV-C1'.
Retrieve all entities with row keys that contain '-C1-' in the key
Preferrably I'd like to be able to do this without reading all entities in the partition and filtering the ones that don't match the pattern I'm looking for. Is this possible with Azure Table Storage queries?

Retrieve all entities with row keys that start with 'KJV-C1'.
This is possible. Sample OData query:
PartitionKey eq 'your partition key' and (RowKey ge 'KJV-C1' and RowKey lt 'KJV-C2')
Retrieve all entities with row keys that contain '-C1-' in the key
This unfortunately is not possible. You would have to fetch all entities and filter the data on the client side.

You cannot do something like contains() over keys. But as it supports CompareTo("") method, you need to slightly modify your table design.
Maintain multiple partition keys instead of single. You can simple push 'KJV' part of your row key to partition key. Then start with C1-V1, C1-V2 as your row keys.
Then, if you want
All entries of KJV - Query for partition key 'KJV'
All 'C1' entries of KJV - Query for partition key 'KJV' and row key starting with 'C1-'
All entries for C1 - Query for row key starting with 'C1-'
OR with out design change in table, you need to loop through your major products like 'KJV' and build multiple queries with each starting with 'KJV-C1-', then union all of them to get final result.
Please mind that table storage does not allow all LINQ operations and sometimes you need to design the table keys keeping your majority of queries in mind.

Related

Azure Table Storage: Retrieve all data by the same, specific partition key

I'm trying to retrieve a list of orderitems with the same orderID (orderID as the partition keys) from the table storage. For example: U001, and under one orderID there will be multiple productsID acting as rowkeys linked under the aforementioned U001 orderID.
The problem is so far with my knowledge you only can retrieve a table storage's by specifically mentioning BOTH the partition key and rowkey. Is there anyway to get all the data in the table storage by the only specifying the partition key?
The problem is so far with my knowledge you only can retrieve a table
storage's by specifically mentioning BOTH the partition key and
rowkey.
Not true. Considering a partition key/row key combination uniquely identifies an entity, if you want to fetch a single entity then you would specify both partition key and row key to get that entity.
Is there anyway to get all the data in the table storage by the only
specifying the partition key?
Yes. You would need to query entities in your table for that. You query (filter criteria) would be PartitionKey eq 'your-partition-key'. That way you will be able to fetch entities matching your partition key.
Please see this link for more details: https://learn.microsoft.com/en-us/rest/api/storageservices/query-entities.

Regarding suggestion of best schema for a cassandra table?

I want to have a table in Cassandra that has a partition key say column 'A', and a column say 'B' which is of 'set' type and can have up to 10000 elements in the set.
But when i retrieve a row from this table then the whole set is retrieved at once and because of that the JVM heap increases rapidly. So should i stick to this schema or go with other schema where 'A' is partition key and i make dynamic columns for each element in the set in my other schema say 'B1', 'B2' ..... 'B10,000'where each of this column is a clustering key.
Which schema is suited best and will give the optimal performance please recommend.
NOTE: cqlsh 5.0.1v
Based off of what you've described, and the documentation I've read, I would not create a collection with 10k elements. Instead I would have two tables, one with everything but the collection, and then use the primary key values of the first table, as the partition key columns of the second table; adding the element name (or whatever you can use to identify an individual element) as a clustering column.
So for a given query, if you wanted everything for a particular primary key value (including all elements), you'd query the first table with the primary key, grab whatever you need, then hit the second table as well, looping/fetching through all elements.
If the query only provides a filter on the partition key (not the primary key - i.e. retrieving multiple rows) , the first query would have to retrieve all columns that make up the primary key for each row, and then query the second table looping for all elements - nested loop here - one loop for each primary key record retrieved from the first table, and a second loop to grab all elements for each pk record.
Probably the best way to go with this. That's how I would probably tackle this.
Does that make sense?
-Jim

How to have unique key except primary key in cassandra?

I am not good in English!
There is a table in Cassandra 3.5 which all columns of a row don't come at same time. Unique of table is some columns that are unique in a row together, but some of them are null at first. I can not set them the primary key because of null value. I have identify a column with name id and type uuid in Cassandra.
How can I have a unique key with that columns together in Cassandra?
Is my data model true?
How can I solve this problem?
You can't. It's not a relational DB. Use clustering and/or partitioning keys to add an unique constraint.
See this answer
To store unique values, create a separate table having your unique value as a key. Check if it exists by requesting this table before inserting a row. But beware, even doing this, you cannot ensure it will be unique in your final table if you have two concurrent inserts.
Basically, I would recommend using Cassandra as it really is: A data store. And find a way to implement your business logic where it belongs: in your code.

Is cassandra a row column database?

Im trying to learn cassandra but im confused with the terminology.
Many instances it says the row stores key/value pairs.
but, when I define a table its more like declaring a SQL table ie; you create a table and specify the column names and data types.
Can someone clarify this?
Cassandra is a column based NoSQL database. While yes at its lowest level it does store simple key-value pairs it stores these key-value pairs in collections. This grouping of keys and collections is analogous to rows and columns in a traditional relational model. Cassandra tables contain a schema and can be referenced (with restrictions) using a SQL-like language called CQL.
In your comment you ask about Apples being stored in a different table from oranges. The answer to that specific question is No it will be in the same table. However Cassandra tables have an additional concept call the Partition Key that doesn't really have an analgous concept in the relational world. Take for example the following table definition
CREATE TABLE fruit_types {
fruit text,
location text,
cost float,
PRIMARY KEY ((fruit), location)
}
In this table definition you will notice that we are defining the schema for the table. You will also notice that we are defining a PRIMARY KEY. This primary key is similar but not exactly like a relational concept. In Cassandra the PRIMAY KEY is made up of two parts the PARTITION KEY and CLUSTERING COLUMNS. The PARTITION KEY is the first fields specified in the PRIMARY KEY and can contain one or more fields delimitated by parenthesis. The purpose of the PARTITION KEY is to be hashed and used to define the node that owns the data and is also used to physically divide the information on the disk into files. The CLUSTERING COLUMNS make up the other columns listed in the PRIMARY KEY and amongst other things are used for defining how the data is physically stored on the disk inside the different files as specified by the PARTITION KEY. I suggest you do some additional reading on the PRIMARY KEY here if your interested in more detail:
https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
Basically cassandra storage is like sparse matrix, earlier version has a command line tool called cqlsh which can show the exact storage foot print of your columnfamily(aka table in latest version). Later community decided to keep RDBMS kind of syntax for better understanding coz the query language(CQL) syntax is similar to sql.
main storage is key(partition) (which is hash function result of chosen partition column in your table and rest of the columns will be tagged to it like sparse matrix.

Azure Table Storage: Order by

I am building a web site that has a wish list. I want to store the wish list(s) in azure table storage, but also want the user to be able to sort their wish list, when viewing it, a number of different ways - date added, date added reversed, item name etc. I also want to implement paging which I believe I can implement by making use of the continuation token.
As I understand it, "order by" isn't implemented and the order that results are returned from table storage is based on the partition key and row key. Therefore if I want to implement the paging and sorting that I describe, is the best way to implement this by storing the wish list multiple times with different partition key / row key?
In this simple case, it is likely that the wish list won't be that large and I could in fact restrict the maximum number of items that can appear in the list, then get rid of paging and sort in memory. However, I have more complex cases that I also need to implement paging and sorting for.
On today’ s hardware having 1000’s of rows to hold, in a list, in memory and sort is easily supportable. What the real issue is, how possible is it for you to access the rows in table storage using the Keys and not having to do a table scan. Duplicating rows across multiple tables could get quite cumbersome to maintain.
An alternate solution, would be to temporarily stage your rows into SQL Azure and apply an order by there. This may be effective if your result set is too large to work in memory. For best results the temporary table would need to have the necessary indexes.
Azure Storage keeps entities in lexicographical order, indexed by Partition Key as primary index and Row Key as secondary index. In general for your scenario it sounds like UserId would be a good fit for a partition key, so you have the Row Key to optimize for per each query.
If you want the user to see the wish lists latest on top, then you can use the log tail pattern where your row key will be the inverted Date Time Ticks of the DateTime when the wish list was entered by the user.
https://learn.microsoft.com/azure/storage/tables/table-storage-design-patterns#log-tail-pattern
If you want user to see their wish lists ordered by the item name you could have your item name as your row key, and so the entities will naturally sorted by azure.
When you are writing the data you may want to denormalize the data and do multiple writes with these different row key schemas. Since you will have the same partition key as user id, you can at that stage do a batch insert operation and not worry about consistency since azure table batch operations are atomic.
To differentiate the different rowkey schemas, you may want to prepend each with a const string value. Like your inverted ticks row key value for instance woul dbe something like "InvertedTicks_[InvertedDateTimeTicksOfTheWishList]" and your item names row key value would be "ItemName_[ItemNameOfTheWishList]"
Why not do all of this in .net using a List.
For this type of application I would have thought SQL Azure would have been more appropriate.
Something like this worked just fine for me:
List<TableEntityType> rawData =
(from c in ctx.CreateQuery<TableEntityType>("insysdata")
where ((c.PartitionKey == "PartitionKey") && (c.Field == fieldvalue))
select c).AsTableServiceQuery().ToList();
List<TableEntityType> sortedData = rawData.OrderBy(c => c.DateTime).ToList();

Resources